With research suggesting that 60% of the data used for AI and analytics projects will be synthetically generated by 2024 [1], tech start-up YData has created a Synthetic Data Community to facilitate an open-source approach to improving access to tabular and time-series data, the most common formats for storing data.
The Synthetic Data Community, established by YData, which created a data preparation platform to accelerate the development of AI solutions, aims to break down barriers for data science teams, researchers, and beginner learners, and in doing so unlock the power of synthetic data. YData’s Synthesizer leverages state-of-the-art deep learning techniques to learn the statistical properties of real data and reproduce them in a new dataset, without transforming the original data or copying real records.
“We believe that having quality data is truly a game-changer and that by creating high-quality data that resembles real-world data that was initially inaccessible, endless possibilities can be unlocked,” explains YData co-founder Gonçalo Martins Ribeiro.
Synthetic data is artificially generated yet retains the properties of the original data, preserving its business value while remaining compliant with data privacy regulations. Using synthetic data reduces the risk of profile re-identification and opens up potential for innovation, collaboration, and new revenue streams; individuals’ privacy and protection against re-identification attacks are underpinned by mathematical methods.
Besides preserving the statistical properties of the original data, YData’s synthesis approach preserves data quality and structure, ensuring high-quality data for purposes such as training machine learning models.
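The synthesizer itself is based on deep generative models, but the general fit-and-sample idea behind synthetic data can be sketched in a few lines. The toy example below is purely illustrative and is not YData’s method: it learns simple statistics (means, covariance, category frequencies) from a small hypothetical dataset and draws brand-new rows from them, so no original record is ever copied. All column names and values are invented for the example.

```python
# Toy illustration of the fit/sample idea behind tabular synthesizers.
# NOTE: this is NOT YData's deep-learning Synthesizer; it is a minimal,
# self-contained sketch using simple statistics instead of deep learning.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# A small "real" dataset (hypothetical values for illustration only).
real = pd.DataFrame({
    "age": rng.integers(18, 70, size=500),
    "income": rng.normal(40_000, 12_000, size=500).round(2),
    "segment": rng.choice(["retail", "sme", "corporate"], size=500, p=[0.6, 0.3, 0.1]),
})

# "Fit": capture the statistical properties of the real data.
numeric_cols = ["age", "income"]
mean = real[numeric_cols].mean().to_numpy()
cov = real[numeric_cols].cov().to_numpy()
segment_probs = real["segment"].value_counts(normalize=True)

# "Sample": generate new rows that follow the learned statistics,
# without copying or transforming any original record.
n = 1_000
numeric_samples = rng.multivariate_normal(mean, cov, size=n)
synthetic = pd.DataFrame(numeric_samples, columns=numeric_cols)
synthetic["age"] = synthetic["age"].round().clip(18, 70).astype(int)
synthetic["segment"] = rng.choice(
    segment_probs.index.to_numpy(), size=n, p=segment_probs.to_numpy()
)

# Sanity check: the synthetic data mirrors the real data's statistics.
print(real[numeric_cols].describe().loc[["mean", "std"]])
print(synthetic[numeric_cols].describe().loc[["mean", "std"]])
print(segment_probs, synthetic["segment"].value_counts(normalize=True), sep="\n")
```

A production synthesizer replaces the simple multivariate-normal fit with deep generative models that can also capture non-linear relationships, mixed data types, and time-series structure.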
Moreover, by leveraging synthetic data, organizations can balance their datasets, helping to address issues such as bias and ensure greater fairness in the data used to develop AI initiatives. YData also accelerates and simplifies data sharing and selling processes, speeding up the development of a trustworthy data economy.
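As a companion to the balancing point above, here is a minimal, hypothetical sketch of how synthetic rows can rebalance a skewed dataset. Again, this is not YData’s implementation: it fits simple per-class statistics to the minority class and samples extra rows until the classes are even, whereas a real synthesizer would use conditional deep generative models. Column names, class sizes, and values are assumptions made for the example.

```python
# Minimal sketch of dataset balancing with synthetic rows (illustrative only,
# not YData's implementation): learn the minority class's statistics and
# generate extra synthetic minority rows until the classes are balanced.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical imbalanced dataset: 950 rows of class 0, 50 rows of class 1.
n0, n1 = 950, 50
real = pd.DataFrame({
    "feature_a": np.concatenate([rng.normal(0.0, 1.0, n0), rng.normal(2.5, 0.8, n1)]),
    "feature_b": np.concatenate([rng.normal(5.0, 2.0, n0), rng.normal(9.0, 1.5, n1)]),
    "label": [0] * n0 + [1] * n1,
})

minority = real[real["label"] == 1]
deficit = n0 - n1  # how many synthetic minority rows are needed

# Learn simple statistics for the minority class and sample new rows from them.
mean = minority[["feature_a", "feature_b"]].mean().to_numpy()
cov = minority[["feature_a", "feature_b"]].cov().to_numpy()
synthetic_minority = pd.DataFrame(
    rng.multivariate_normal(mean, cov, size=deficit),
    columns=["feature_a", "feature_b"],
)
synthetic_minority["label"] = 1

balanced = pd.concat([real, synthetic_minority], ignore_index=True)
print(real["label"].value_counts())      # 950 vs 50
print(balanced["label"].value_counts())  # 950 vs 950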
“In 2020 we conducted a study that found that the biggest problem faced by data scientists was the unavailability of high-quality data even though it is widely accepted that data is the most valuable resource,” continues Ribeiro.
“Not every company, researcher, or student has access to the most valuable data like some tech giants do. As machine learning algorithms and coding frameworks evolve rapidly, it’s safe to say the scarcest resource in AI is high-quality data at scale. The Synthetic Data Community is a step towards addressing that.”
Established in 2019, YData’s data preparation platform was developed with a data-centric mindset, bringing together the major data science frameworks with proprietary tools for data access and profiling, synthetic data generation, and labelling, to deliver better data quality for AI. Better data means fewer errors, fewer biases, and a representative set that ensures AI is built responsibly. The company’s technology has already been adopted by organizations in the financial services, utilities, and telecoms sectors.
For more information, visit www.ydata.ai or www.syntheticdata.community
[1] Source: Gartner – https://blogs.gartner.com/andrew_white/2021/07/24/by-2024-60-of-the-data-used-for-the-development-of-ai-and-analytics-projects-will-be-synthetically-generated/
Media Contact
Company Name: Astute
Contact Person: Howard Robinson
City: London
Country: United Kingdom
Website: www.astuteuk.co.uk