Synthetic Data Generation for Machine Learning
2020 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
Sri.Krishnamurthy@qusandbox.com
www.quantuniversity.com
03/05/2020
Boston, MA
Speaker bio
• Quant, Data Science & ML practitioner
• Prior experience at MathWorks, Citigroup,
and Endeca, and with 25+ financial services
and energy customers
• Columnist for the Wilmott Magazine
• Author of forthcoming book
“Financial Modeling: A case study approach”
published by Wiley
• Teaches Data Science/AI at Northeastern
University, Boston
• Reviewer: Journal of Asset Management
Sri Krishnamurthy
Founder and CEO
QuantUniversity
About QuantUniversity
• Boston-based Data Science, Quant
Finance and Machine Learning
training and consulting advisory
• Trained more than 1000 students in
Quantitative methods, Data Science,
ML and Big Data Technologies
• Building a platform for
operationalizing AI and Machine
Learning in the Enterprise
1. Challenges with Real Datasets
2. Synthetic Dataset generation tools
▫ Proprietary
▫ Open Source
– Faker
– Data Synthesizer
– SDV
– Synthpop
– GANs
3. Demos
▫ Data Synthesizer
▫ Sales Data Generator
▫ VIX Data Generator
Agenda
Challenges with Real Datasets
• It may not be feasible to get samples for all
categories
• Lighting conditions
• Modifications (glasses/no glasses,
moustache/no moustache, etc.)
• Positions
Coverage
Challenges with real datasets
Not all scenarios have
played out
• Stress scenarios
• What-if scenarios
Challenges with real datasets
Figure ref: http://www.actuaries.org/CTTEES_SOLV/Documents/StressTestingPaper.pdf
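One simple way to produce a what-if scenario that history does not contain is to apply a hypothetical shock to an observed return series. The following is a minimal sketch under stated assumptions: the function name `stress_returns`, the volatility scale factor, the one-off shock, and the sample returns are all illustrative and do not come from the presentation.

```python
def stress_returns(returns, vol_scale=2.0, shock=-0.05, shock_index=None):
    """Build a hypothetical stress path from observed returns.

    Assumption: deviations from the sample mean are scaled by `vol_scale`,
    and an optional one-off `shock` is added at position `shock_index`.
    """
    mean = sum(returns) / len(returns)
    # Widen every deviation from the mean to mimic a higher-volatility regime.
    stressed = [mean + vol_scale * (r - mean) for r in returns]
    if shock_index is not None:
        stressed[shock_index] += shock
    return stressed


observed = [0.01, -0.02, 0.005, 0.015]  # made-up daily returns
scenario = stress_returns(observed, vol_scale=2.0, shock=-0.05, shock_index=1)
```

Scaling around the mean keeps the average return unchanged when no shock is applied, so the scenario differs from history only in its dispersion and in the injected event.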
Missing values
• Missing at random
• Missing sequences
• Need data to fill frames
Challenges with real datasets
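Filling a missing sequence can be as simple as interpolating between the observations on either side of the gap. The sketch below assumes a numeric series in which missing entries are marked with `None`. The helper name `fill_gaps` and the sample prices are hypothetical, and linear interpolation is only one of many possible fill strategies.

```python
def fill_gaps(series):
    """Fill None entries by linear interpolation between known neighbours.

    Leading and trailing gaps are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    # Edge gaps: carry the nearest observation outward.
    for i in range(known[0]):
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):
        filled[i] = series[known[-1]]
    # Interior gaps: straight line between the surrounding observations.
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


prices = [None, 10.0, None, None, 13.0, None]  # made-up data
print(fill_gaps(prices))
```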
• Access
▫ Hard to find
▫ Rare class problems
▫ Privacy concerns make data
difficult to share
Challenges with real datasets
Imbalanced
• Need more samples of rare
class
• Need proxies for data points
that were not observed or
recorded
Challenges with real datasets
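A common way to obtain more samples of a rare class is to interpolate between existing rare-class points. The sketch below works in the spirit of SMOTE but is deliberately simplified: it picks random pairs rather than nearest neighbours. The function name `oversample_rare` and the sample data are hypothetical.

```python
import random


def oversample_rare(samples, n_new, seed=0):
    """Create n_new synthetic points on segments between random pairs
    of rare-class samples (a simplified, SMOTE-like interpolation)."""
    if len(samples) < 2:
        raise ValueError("need at least two rare-class samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(samples, 2)
        t = rng.random()  # position along the segment from a to b
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


rare = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # made-up rare-class features
new_points = oversample_rare(rare, n_new=5)
```

Every generated point lies between two observed points, so this approach fills in the region the rare class already occupies and does not extrapolate beyond it.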
Labels
• Human labeling is hard
• Synthetic label generators
Challenges with real datasets
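One possible reading of "synthetic label generators" is programmatic labeling, where several noisy rules vote on a label in place of a human annotator. The sketch below assumes that reading. The rules, thresholds, and field names are hypothetical and only illustrate the voting mechanism.

```python
from collections import Counter

ABSTAIN = None


def label_by_vote(record, rules):
    """Return the majority label among rules that do not abstain, or None."""
    votes = [label for label in (rule(record) for rule in rules)
             if label is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


# Hypothetical rules for flagging risky loans.
rules = [
    lambda r: "risky" if r["debt_to_income"] > 0.5 else ABSTAIN,
    lambda r: "safe" if r["credit_score"] >= 700 else ABSTAIN,
    lambda r: "risky" if r["late_payments"] >= 3 else ABSTAIN,
]

loan = {"debt_to_income": 0.6, "credit_score": 720, "late_payments": 4}
print(label_by_vote(loan, rules))
```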
Tools for Synthetic Data Generation
Proprietary Tools
Company | Core Technology
Tonic.ai | All-in-one platform for data anonymization, subsetting, and synthesis, integrated with databases (Hadoop, Oracle, MySQL, MS SQL Server, MongoDB, Amazon Aurora/Redshift, and Google BigQuery); uses Condenser and Masquerade
Mostly.ai | Tabular data using generative deep neural networks (no image data)
CVEDIA | Sensor modeling and algorithm training; handles images using SynCity as a custom pocket laboratory to generate highly entropic scenes, conditions, and metadata; enables real-time Hardware-In-the-Loop (HWIL), Human-In-the-Loop (HITL), or Software-In-the-Loop (SIL) simulations, even with complex sensor configurations
Deep Vision Data | Image creation; synthetic training data
Synthesis.ai | The data generation platform for computer vision
Open Source Tools
SDV
https://www.computer.org/csdl/proceedings-article/dsaa/2016/07796926/12OmNwx3Q7S
Data Synthesizer
https://faculty.washington.edu/billhowe/publications/pdfs/ping17datasynthesizer.pdf
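The Data Synthesizer paper linked above describes, among other modes, an independent attribute mode in which each column is sampled from its own distribution. The sketch below illustrates that idea in plain Python. It is not the library's API and it omits the noise the tool adds for privacy. The function names and the sample loan rows are hypothetical.

```python
import random
from collections import Counter


def fit_marginals(rows):
    """Count value frequencies for each column independently."""
    columns = rows[0].keys()
    return {col: Counter(row[col] for row in rows) for col in columns}


def sample_rows(marginals, n, seed=0):
    """Draw n synthetic rows, sampling every column from its own marginal."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        row = {}
        for col, counts in marginals.items():
            values = list(counts.keys())
            weights = list(counts.values())
            row[col] = rng.choices(values, weights=weights, k=1)[0]
        out.append(row)
    return out


real = [  # made-up loan records
    {"grade": "A", "term": 36},
    {"grade": "B", "term": 36},
    {"grade": "A", "term": 60},
]
synthetic = sample_rows(fit_marginals(real), n=4)
```

Because the columns are sampled independently, any relationship between them (here, between grade and term) is lost. Preserving such relationships requires modeling the columns jointly.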
Synthpop
VAE
https://arxiv.org/pdf/1808.06444.pdf
GAN
https://developers.google.com/machine-learning/gan/gan_structure
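For reference, the slide text does not spell out the training objective. In the original GAN formulation, a generator G and a discriminator D play the following minimax game, where the symbols follow the usual convention: x is drawn from the real data and z from a noise prior.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator is trained to tell real samples from generated ones, while the generator is trained to produce samples the discriminator accepts as real.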
WGAN
1. Loan Data Synthesizer
2. Sales Data Generator
3. VIX Data Generator
Demo 1: Loan Data Synthesizer
Demo 2: Synthetic Sales Data Generation
Demo 3: Synthetic VIX Generation
If you want to be a part of QuSandbox private Beta
Contact us:
info@qusandbox
1. Model Governance in the Age of Data Science and AI
▫ GFMI Course, March 9th, 10th, New York, NY
2. Synthetic VIX data generation using deep learning techniques
▫ QWAFAFEW meeting - March 17th, 2020, Boston MA
3. Using synthetic data for ML in Finance
▫ 2nd Annual Machine Learning in Quantitative Finance – April 1st, 2020, New York, NY
4. Tackling the biggest limitations of ML
▫ 2nd Annual Machine Learning in Quantitative Finance – April 1st, 2020, New York, NY
5. Foundations of Machine learning and AI for Financial Professionals
▫ 8-week Online course offered in partnership with PRMIA – May 12th – June 30th, 2020, Online
6. A Master Class on AI and Machine Learning for Financial Professionals
▫ Invited session at the 73rd CFA Annual Conference – May 17th, 2020, Atlanta, GA
Upcoming events by QuantUniversity
Sri Krishnamurthy, CFA, CAP
Founder and Chief Data Scientist
sri@quantuniversity.com
srikrishnamurthy
www.QuantUniversity.com
www.analyticscertificate.com
www.qusandbox.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
