How to do Secure Data Labeling for Machine Learning
Technology enthusiast with 13+ years of experience working
in the information technology and services industry. Leads
cutting-edge solutions for businesses using Machine Learning
and Artificial Intelligence.
Areas of expertise include Architecture design, Solutioning,
Data Engineering and Deep Learning.
Mohit Juneja
Solutions Architect
The Speaker
Extensive experience building future tech products using
Machine Learning and Artificial Intelligence.
Areas of expertise include Deep Learning, Data Analysis,
full stack development and building world-class products
in the ecommerce, travel and healthcare sectors.
Shruti Tanwar
Lead - Data Science
The Speaker
Bikash Sharma
CTO and Co-founder at
Skyl.ai
CTO & Software Architect with 15 years of experience
working at the forefront of cutting-edge technology
leading innovative projects
Areas of expertise include Architecture design, rapid
product development, Deep Learning and Data Analysis
The Panelist
Getting familiar with ‘Zoom’
All dial-in participants will be muted to enable the presenters
to speak without interruption
Questions can be submitted via Zoom Questions chat
window and will be addressed at the end during Q&A
The recording will be emailed to you after the webinar
Please familiarize yourself with the Zoom ‘Control Panel’ on your screen
...In the next 45 minutes
1. Deep Dive into the Data Labeling Process
2. Live Demo of Secure Data Labeling Platform
A quick intro about Skyl.ai
Machine Learning automation platform for unstructured data
● Guided Machine Learning Workflow
● Build & deploy ML models faster on unstructured data
● Collaborative Data Collection & Labeling
● Easy-to-use & scalable AI SaaS platform
POLL #1
At what stage of machine learning adoption is your organization?
⊚ Exploring - Curious about it
⊚ Planning - Creating AI/ML strategy
⊚ Experimenting - Building proof of concepts
⊚ Scaling up - Some departments are using it
⊚ In production - Using it in product features
⊚ Transforming - AI/ML-driven business
01 Deep Dive: Data Labeling Process
What is Data Labeling?
Data labeling, also called data annotation/tagging, is the process of
preparing labeled datasets for machine learning.
[Diagram: Images -> Data labeling -> Image Classification ML Model]
Examples of Data labeling
Computer Vision - Image Classification
Computer Vision - Object Detection
NLP - Text Extraction (NER)
Types of Data Labeling
By Collaborator (Human-in-the-loop)
● In-house employees - Assigning tasks to an in-house labeling team /
employees of the organization.
● Hire data labeling companies.
Automated data labeling
● Data labeling through machine learning algorithms
● Reduces the number of labeling tasks in the data labeling process
● Speeds up the labeling process
3 Aspects for Building a Quality Labeled Dataset
● Right team to carry out the data labeling project
● Right data labeling process & workflow
● Right data labeling tools in place
Best Practices to Ensure a Quality Labeled Dataset
⊚ Conducting a mock data labeling test
⊚ Measuring data labeling consistency
⊚ Auditing (QC) of the labeled dataset periodically as it gets labeled
Labeling Quality: Conducting Mock Data Labeling Test
Qualify the right collaborators for your data labeling job
Labeling Quality: Measuring Data Labeling Consistency
Negative sentiment
Neutral sentiment
Positive sentiment
Measuring how consistently collaborators agree with each other
Labeling Quality: Review of Labeled Dataset
Reviewing the labeled dataset by flagging badly labeled data
Data Security
⊚ Access Control
⊚ Audit Log
⊚ Data Encryption
⊚ Data source behind firewall
Data Security: Access Control
● Data Scientist
● Project Lead / Data Manager
● Data Labeler (Collaborator)
● Data Labeling Job Reviewer
Having the right access control throughout the data labeling process
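A minimal sketch of the access-control idea on this slide, written as a role-to-permission lookup. The four roles come from the slide; the permission names and which role gets which permission are assumptions made for illustration, not a description of how any particular labeling platform assigns them.

```python
from enum import Enum, auto


class Permission(Enum):
    # Hypothetical permission names, chosen for illustration only.
    CONFIGURE_PROJECT = auto()
    LABEL_DATA = auto()
    REVIEW_LABELS = auto()
    EXPORT_DATASET = auto()


# Roles are taken from the slide; the grants below are assumptions.
ROLE_PERMISSIONS = {
    "project_lead": {Permission.CONFIGURE_PROJECT, Permission.EXPORT_DATASET},
    "data_labeler": {Permission.LABEL_DATA},
    "job_reviewer": {Permission.REVIEW_LABELS},
    "data_scientist": {Permission.EXPORT_DATASET},
}


def is_allowed(role: str, permission: Permission) -> bool:
    """Return True only if the role is known and explicitly grants the permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("data_labeler", Permission.LABEL_DATA))      # True
print(is_allowed("data_labeler", Permission.EXPORT_DATASET))  # False
```

Unknown roles get an empty permission set, so the check denies by default rather than failing open.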
Data Security: Audit Log
Gain insight into user activities to meet organizational and compliance needs
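One way to picture an audit log is as an append-only list of user actions. The sketch below additionally chains each entry to the previous one with a SHA-256 hash so that later edits are detectable; that chaining is an assumption added here for illustration and is not something the slide describes. The field names (`user`, `action`, `target`) are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(entry_without_hash: dict) -> str:
    """Hash a canonical JSON encoding of the entry."""
    payload = json.dumps(entry_without_hash, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log: list, user: str, action: str, target: str) -> dict:
    """Append one audit record, linked to the previous record's hash."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "target": target,
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    entry["hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every hash and check that the chain is unbroken."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, "labeler_01", "label_submitted", "image_0042")
append_entry(audit_log, "reviewer_01", "label_flagged", "image_0042")
print(verify(audit_log))
```

Changing any field of an earlier entry after the fact makes `verify` return `False`, which is the property a compliance review would rely on.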
Data Security: Encryption
Encrypted
Data at rest, Data in use, Data in motion
TLS/SSL
Securing data assets at rest, in motion and in use
Data Labeling Tool
Data Security: Firewall
Restricting data to a private network by using an on-prem data labeling solution
Private network, Public network
02 Demo of how to perform secure data labeling
Skyl Labelwise: Data Labeling Process
Demo of how to perform secure Data Labeling
Skyl.ai Labelwise
● Guided Workflow: Data labeling solution for computer vision & NLP
● Quality Labeled Dataset: Right process and metrics in place to ensure quality data labeling
● Effective Collaboration: Collaborate and manage data labeling projects efficiently
● Early Visibility: Get early visibility; visualize and affirm correctness at every step of the way
● Scalable, High-Performance: Access on-demand, scalable, high-performance infrastructure
● Security & Compliance: Access control, data encryption, audit log and on-prem solution
Our AI Consulting Services
We can help you with...
⊚ AI Adoption Assessment
⊚ AI Systems Integration
⊚ AI Performance Evaluation
⊚ AI-Enabled Software Development
www.skyl.ai contact@skyl.ai
Special offer for you...
⊚ Free 1-month trial + POC
⊚ Complimentary 30-min consultation
⊚ AI Implementation Playbook
Questions?
We hope to hear from you soon
Thank you for joining!
85 Broad Street, New York, NY, 10004
+1 718 300 2104, +1 646 202 9343
contact@skyl.ai

Editor's Notes

  • #2: Hello everyone and welcome. Thank you for joining today’s webinar on How to do Secure Data Labeling for Machine Learning. My name is Edwin Martinez and I’ll be your host today. First off, I’d like to introduce 3 expert speakers for today’s webinar..
  • #3: First we have Mohit Juneja. Mohit is a Solutions Architect and technology enthusiast with over 13 years of experience in the IT and services industry. He leads cutting-edge solutions for businesses using Machine Learning and AI. He’s an expert in architecture design, data engineering, and deep learning. Welcome, Mohit!
  • #4: Next we have Shruti Tanwar - Shruti is an expert in data science who is a veteran in building SaaS products using Machine Learning and AI. Her expertise includes Deep Learning and Data Analysis, as well as full stack development and building tech products in various different fields such as ecommerce, travel, and healthcare. Welcome, Shruti!
  • #5: Finally, we have Bikash Sharma, joining us as a panelist. Bikash is a CTO and Software Architect with 15 years of experience in leading innovative software projects and solutions. He co-founded Skyl, bringing his expert knowledge in AI and Machine Learning. Welcome, Bikash!
  • #6: Before we begin, I’d like to briefly talk about some Zoom features that will be relevant to us. All participants in the webinar will be muted to avoid any interruptions during the session. Any questions you might have can be submitted to the Zoom Questions chat window in the control panel, located on the bottom of the screen. We’ll make sure to address your questions during the Q&A session. Also, the recording of the webinar will be emailed to you afterwards, just in case you’ve missed any talking points or wish to view it again. So that’s all for the introduction - now we’ll get started with the webinar and I’ll hand over the session to Mohit
  • #8: Let me start with a quick intro about Skyl.ai and its capabilities. Skyl.ai is an ML automation platform for unstructured data, which includes text, images, audio, etc. Using Skyl.ai, businesses can build and deploy high-quality NLP and computer vision models in hours rather than days or weeks. So how does Skyl do that? Skyl.ai provides an easy-to-use, unified platform for the entire machine learning workflow: data collection, data labeling, feature engineering, training the model at scale with out-of-the-box algorithms, model evaluation once the model is trained, and finally one-click deployment and monitoring of the model in production. So with the Skyl.ai platform you can manage your ML projects in one place. It allows you to take your AI experiments to production at scale in no time, and leads to faster model release iteration cycles. The best part is doing all this with no infrastructure or MLOps effort required.
  • #9: Exploring - Curious about it; Planning - Creating AI/ML strategy; Experimenting - Building proof of concepts; Scaling up - Some departments are using it; In production - Using it in product features; Transforming - AI/ML-driven business
  • #10: ML problems start with data, preferably lots of data for which you already know the ground truth, or target answer. Data for which you already know the target answer is called labeled data. Supervised machine learning algorithms learn from labeled datasets, that is, data that has been tagged with labels. Programmers do not explicitly program machine learning algorithms on how to make decisions; they program the models that learn from labeled data. Often, however, data is not readily available in labeled form. Collecting and preparing high-quality datasets is the most important step in solving an ML problem.
  • #11: Data labeling, also referred to interchangeably as data annotation or tagging, is the process of preparing a labeled dataset that has both the input, which could be images or text, and one or more outputs, which are the ground-truth values for that input. Machine learning models learn to recognize repetitive patterns in labeled data. Supervised machine learning algorithms learn from labeled data, that is, data that has been tagged with labels. After a sufficient amount of labeled data is processed, machine learning models can identify the same patterns in data that has not been labeled.
  • #12: Let's now go through some examples of data labeling.
  • #13: Using an image classification ML model, you can classify or categorize one or more attributes (labels) and their classes from an image. To build such a computer vision ML model, we require a dataset of images along with their labels. In the example shown, we are trying to classify various attributes of apparel from the image: the type of article (tops, bottoms, headwear), the base color (blue, green, yellow), the gender the apparel is meant for (men or women), and so on. Basically, you would provide a series of such images to a data labeler, or collaborator, who labels these attributes.
  • #14: This is another example of a computer vision ML model, where we not only classify the attributes in an image but also locate them within a segment area referred to as a bounding box. In the example shown in the figure, to build a model that can detect surgical equipment in a tray, such as Mayo scissors, forceps, etc., we require a labeled dataset. We can run a data labeling job in which the data labeler annotates the various pieces of equipment using bounding boxes, building a labeled dataset that later becomes the input for model training.
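A bounding-box annotation of the kind described in note #14 can be represented as a label plus pixel coordinates. The sketch below is one possible record layout with a basic sanity check; the field names, the corner-coordinate convention, the file name and the image size are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    label: str   # class of the object, e.g. "forceps"
    x_min: int   # left edge, in pixels
    y_min: int   # top edge, in pixels
    x_max: int   # right edge, in pixels
    y_max: int   # bottom edge, in pixels

    def is_valid(self, image_width: int, image_height: int) -> bool:
        """A box must have positive area and lie inside the image."""
        return (0 <= self.x_min < self.x_max <= image_width
                and 0 <= self.y_min < self.y_max <= image_height)


@dataclass
class ImageAnnotation:
    image_path: str          # hypothetical file name
    width: int
    height: int
    boxes: List[BoundingBox]

    def invalid_boxes(self) -> List[BoundingBox]:
        """Return boxes a reviewer should look at before training."""
        return [b for b in self.boxes if not b.is_valid(self.width, self.height)]


annotation = ImageAnnotation(
    image_path="tray_001.jpg",
    width=640,
    height=480,
    boxes=[
        BoundingBox("mayo_scissor", 40, 60, 220, 140),
        BoundingBox("forceps", 600, 100, 700, 200),  # extends past the right edge
    ],
)
print([b.label for b in annotation.invalid_boxes()])  # ['forceps']
```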
  • #15: This is an NLP example where we extract text (the location of the words) from a given sentence and tag it under various categories. This is also referred to as NER, or named entity recognition. In this example we are labeling sentences that are customer reviews of cameras, and we are trying to extract key attributes: pros, cons and product mentions. This labeled dataset can then be used to train an NER model that extracts these key insights about cameras. Now that we have gone through these data labeling examples, a question arises: who does the data labeling, and how is it done?
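The "location of the word" mentioned in note #15 is commonly stored as character offsets into the sentence. Below is a small sketch of that representation; the review text, the product name and the label names `PRO`, `CON` and `PRODUCT` are made up for illustration, and the helper `tag` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str


def tag(text: str, phrase: str, label: str) -> EntitySpan:
    """Locate the first occurrence of `phrase` and wrap it as a labeled span."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"phrase not found: {phrase!r}")
    return EntitySpan(start, start + len(phrase), label)


def span_text(text: str, span: EntitySpan) -> str:
    """Recover the annotated text from its offsets."""
    return text[span.start:span.end]


review = "The autofocus is fast, but the battery drains quickly on my Acme X100."
spans = [
    tag(review, "autofocus is fast", "PRO"),
    tag(review, "battery drains quickly", "CON"),
    tag(review, "Acme X100", "PRODUCT"),
]

for span in spans:
    print(span.label, "->", span_text(review, span))
```

Storing offsets rather than the phrase itself keeps the annotation unambiguous when the same words appear more than once in a sentence.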
  • #16: There are 2 types of data labeling, based on who does the labeling and how it is done. The first is labeling by collaborator. This is the simplest labeling approach, where a human is employed to do the data labeling. You either assign tasks to employees within your organization who are the subject matter experts and know how and what exactly needs to be labeled, or hire a data labeling company, which manages all aspects of a data labeling project and is usually paid on an hourly basis. How this works is that you provide them with your raw unlabeled data, such as images or text, along with a set of instructions on what needs to be labeled and how. The second approach is automated data labeling, in which we automate the labeling through machine learning algorithms. One option is unsupervised learning, in which we cluster the unlabeled data into categories and then assign this semi-labeled dataset to a human labeler for validation. Another is active learning, where the algorithm learns as you label and automatically labels the next image, and the data labeler validates and modifies the annotations or labels. An example of active learning is video: a human labels the first few frames, and AI-assisted labeling learns from the previous frames and suggests labels for the next frames. Automated data labeling can be useful particularly where there is a significant amount of unlabeled data, such as videos, that would otherwise be extremely expensive or time-consuming to label. Basically, an automated labeling system speeds up the labeling process, and human labelers validate or correct its output.
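The "model suggests, human validates" loop in note #16 can be sketched as a routing step over model predictions. This is only an illustration under stated assumptions: the per-class probabilities are presumed to come from some already-trained model, the 0.9 threshold is arbitrary, and sending the least confident items to manual labeling first is one common active-learning heuristic, not necessarily what any given platform does.

```python
from typing import Dict, List, Tuple


def route_predictions(
    predictions: Dict[str, Dict[str, float]],
    threshold: float = 0.9,
) -> Tuple[Dict[str, str], List[str]]:
    """Split items into pre-labeled suggestions and a manual-labeling queue.

    predictions maps an item id to {label: probability}.
    Returns (suggested, manual_queue):
      suggested    - item id -> suggested label, for quick human validation
      manual_queue - item ids to label by hand, least confident first
    """
    suggested: Dict[str, str] = {}
    uncertain: List[Tuple[float, str]] = []
    for item_id, probs in predictions.items():
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            suggested[item_id] = label
        else:
            uncertain.append((confidence, item_id))
    manual_queue = [item_id for _, item_id in sorted(uncertain)]
    return suggested, manual_queue


# Hypothetical model output for three video frames.
frame_predictions = {
    "frame_1": {"scissor": 0.97, "forceps": 0.03},
    "frame_2": {"scissor": 0.60, "forceps": 0.40},
    "frame_3": {"scissor": 0.45, "forceps": 0.55},
}
suggested, manual_queue = route_predictions(frame_predictions)
print(suggested)     # {'frame_1': 'scissor'}
print(manual_queue)  # ['frame_3', 'frame_2']
```

Labels confirmed or corrected by the human would then be fed back into training, which is what lets the suggestions improve over time.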
  • #17: Now that we understand what data labeling is and how it's done, let's look at 3 aspects to consider when taking up a data labeling project and building a quality labeled dataset, which ultimately leads to a high-quality ML model. The first aspect is having the right team in place to carry out the data labeling project. This involves the data manager / project lead, whose responsibility is to make sure the data labeling project runs smoothly, which includes having the right data sources in place and having visibility of project progress; the data labelers, who are responsible for carrying out the labeling tasks based on the provided instructions; the QC reviewers, who are responsible for reviewing the labeled dataset for quality control and making sure that the job was carried out by the data labelers as per the instructions; and finally the data scientist or ML engineer, who validates and consumes the labeled dataset and builds the AI model. The second aspect is having a data labeling process or workflow. This involves defining the data labeling tasks with the right checks in place, so that you catch any errors or low-quality labels and flag them. The final aspect is having the right data labeling tools in place, that is, the software or labeling platform your team uses to configure the data labeling workflow that suits your needs. The tools should let your human collaborators or data labeling partners carry out data annotation effectively in a secure environment. They should also provide a mechanism to carry out QC activities, with the right quality metrics in place and visibility into the progress of labeling tasks. Lastly, the tools should give your data scientists, who use the labeled dataset to train the model, easy and secure access to it.
  • #18: So how do you ensure the labeled dataset is of high quality, and what are some of the best practices around it? We will go through 3 key best practices that you must have in place while carrying out a data labeling project. First, conducting a mock labeling test to qualify the right data labelers / collaborators. Second, measuring data labeling consistency to ensure labeling is reliable and consistent. Third, auditing, or quality review, of the labeled data periodically as it gets labeled, so that we can make improvements if we find any anomalies. Let's look at each of these best practices for quality data labeling in detail.
  • #19: The first one is conducting a mock data labeling test. Your model will only be as good as your labeled dataset, so it's important to have the right collaborators / data labelers perform the data labeling. A key thing to understand is that when we prepare the labeled dataset, the model picks up the attributes of these collaborators, that is, how they perceive the data based on their age, gender, demographics, and knowledge of the subject. Most of the time this is where bias is created in the labeled dataset, which may ultimately lead to a biased model. So the question is, how do we qualify a data labeler? Try to have collaborators with diverse personalities, ages, and demographics. These collaborators need to understand the subject well and should have no bias towards it. To gauge the capability of your data labelers and how well versed they are in the particular data labeling job you are going to assign to them, as a best practice run a mock data labeling test in which the same set of labeling tasks is served to all candidate collaborators. (CLICK) You can then evaluate these collaborators against each other and qualify them for the data labeling job. Mock data labeling tests help build a pool of qualified collaborators whose judgement will help build a high-quality labeled dataset.
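The mock test described in this note can be scored in a few lines. The following is a minimal sketch, not the platform's method: it assumes a small set of reference ("gold") labels agreed in advance, hypothetical labeler names, and an arbitrary 0.8 qualification threshold, none of which come from the talk.

```python
def score_mock_test(gold, answers):
    """Return the fraction of mock tasks where a labeler matched the reference label."""
    if not gold:
        raise ValueError("gold must contain at least one task")
    correct = sum(1 for task_id, label in gold.items() if answers.get(task_id) == label)
    return correct / len(gold)


def qualify_labelers(gold, submissions, threshold=0.8):
    """Score every candidate and list those at or above the (assumed) threshold."""
    scores = {name: score_mock_test(gold, answers) for name, answers in submissions.items()}
    qualified = sorted(name for name, score in scores.items() if score >= threshold)
    return scores, qualified


# Hypothetical reference labels for five mock sentiment tasks.
gold = {"t1": "positive", "t2": "negative", "t3": "neutral", "t4": "positive", "t5": "negative"}

# Hypothetical candidate answers for the same five tasks.
submissions = {
    "alice": {"t1": "positive", "t2": "negative", "t3": "neutral", "t4": "positive", "t5": "negative"},
    "bob": {"t1": "positive", "t2": "negative", "t3": "positive", "t4": "positive", "t5": "negative"},
    "carol": {"t1": "negative", "t2": "negative", "t3": "positive", "t4": "neutral", "t5": "negative"},
}

scores, qualified = qualify_labelers(gold, submissions)
print(scores)
print(qualified)
```

Accuracy against reference labels is only one possible qualification criterion; a project could equally weight harder tasks more heavily or require a minimum score per label class.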
  • #20: OK, so the next best practice is measuring data labeling consistency. In simple terms, measure how consistently collaborators agree with each other. As humans we may disagree with each other's opinions, and it's no different while performing a data labeling task. There could be tasks where collaborators do not give the same answer or label to the same question. This could have various causes, as mentioned earlier: differences in age, personality, demographics, or knowledge of the subject. Consider this example: three collaborators have been assigned the same task of tagging the sentiment of a tweet, and all three have tagged it differently, which shows that they don't agree with each other. In a data labeling job we need to measure this, and one of the metrics for it is the IRR, or Inter-Rater Reliability, score. The IRR metric is calculated by having collaborators label the same data, measuring how many of them agree, and deriving an IRR score. The higher the IRR score, the more consistent the data labeling job is.
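To make the agreement idea concrete, here is a minimal sketch of one simple IRR measure: the average fraction of collaborator pairs that chose the same label for an item. The talk does not say which IRR formula is used, so this is an assumption; chance-corrected statistics such as Cohen's or Fleiss' kappa are common alternatives. The tweet labels are made up.

```python
from itertools import combinations


def pairwise_agreement(labels_per_item):
    """Mean fraction of rater pairs that agree, averaged over items."""
    per_item = []
    for labels in labels_per_item:
        pairs = list(combinations(labels, 2))
        if not pairs:
            continue  # an item needs at least two labels to compare
        agreeing = sum(1 for a, b in pairs if a == b)
        per_item.append(agreeing / len(pairs))
    if not per_item:
        raise ValueError("need at least one item labeled by two or more raters")
    return sum(per_item) / len(per_item)


# Each inner list holds the labels three collaborators gave to one tweet.
tweet_labels = [
    ["positive", "negative", "neutral"],    # nobody agrees
    ["positive", "positive", "positive"],   # everybody agrees
    ["negative", "negative", "neutral"],    # one pair out of three agrees
]

irr_score = pairwise_agreement(tweet_labels)
print(round(irr_score, 3))
```

A score of 1.0 means every pair agreed on every item and 0.0 means no pair ever agreed, so the first tweet above, where all three collaborators disagree, pulls the score down.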
  • #21: The next best practice for data labeling quality is reviewing the labeled dataset. As we know, data labeling is the most time-consuming and resource-intensive part of ML development, and the labeled dataset is what data scientists / ML engineers use to build ML models. So it is important to review how well the data is annotated and whether it is good enough to build a model. Consider this example: a data labeling job is carried out for detecting pedestrians in video frames, and if you look at this particular video frame, the bounding box does not cover the complete area of the pedestrian. This may lead to a bad model that detects pedestrians inaccurately. So it is important for ML engineers / data scientists to review the labeled dataset. It is always recommended to do so while the data is being labeled, to flag labeled data that is inappropriate, and to provide feedback to the annotator for corrective measures.
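One way a reviewer could catch boxes like the one in this example automatically is to compare each labeled box with a reviewer-drawn reference box using intersection over union (IoU). This is a sketch under assumptions: the talk does not mention IoU, and the frame IDs, coordinates, and 0.7 threshold are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union for boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def flag_for_review(labeled, reference, threshold=0.7):
    """Return frame IDs whose labeled box overlaps the reference box too little."""
    return [
        frame_id
        for frame_id, box in labeled.items()
        if iou(box, reference[frame_id]) < threshold
    ]


# Hypothetical reviewer boxes covering the whole pedestrian.
reference_boxes = {"frame_1": (10, 10, 50, 110), "frame_2": (10, 10, 50, 110)}

# frame_1 covers only the upper half of the pedestrian; frame_2 is nearly right.
labeled_boxes = {"frame_1": (10, 10, 50, 60), "frame_2": (12, 10, 50, 110)}

flagged = flag_for_review(labeled_boxes, reference_boxes)
print(flagged)
```

Flagged frames would then go back to the annotator with feedback, as the note recommends.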
  • #22: Up next, we shall learn about the data security aspects of secure data labeling. There are four points to consider in a data labeling process: 1. Access control: regulating who has access to which aspects of data labeling. 2. Audit log: understanding who did what and when in a data labeling project. 3. Data encryption: securing data at rest, in motion, and in use. 4. Data source behind a firewall: adhering to organizational or business compliance requirements that data stay behind a firewall.
  • #23: Access control is a security technique that regulates who or what can view or use resources in a given environment. It is important to set up the right access control around a data labeling process, so that only authenticated and authorised users have access to data. Different levels of access can be set for different user profiles in a data labeling process. Project lead / data manager: access to set up the data sources or data assets that require data labeling, and access to view the progress of the labeling job and manage the job. Data labeler: read-only access to the data assigned to them for labeling. Data labeling job reviewer: read-only access to the data assigned to them for review. Data scientist: access via a secure API to the labeled dataset they require.
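The profiles above map naturally onto a role-to-permission table. The sketch below is illustrative only: the role names follow the note, but the action names, the `can_access` helper, and the item IDs are hypothetical and do not describe any particular labeling platform.

```python
from enum import Enum


class Role(Enum):
    PROJECT_LEAD = "project_lead"
    LABELER = "labeler"
    REVIEWER = "reviewer"
    DATA_SCIENTIST = "data_scientist"


# Actions each role may perform (action names are assumptions for illustration).
PERMISSIONS = {
    Role.PROJECT_LEAD: {"manage_data_sources", "view_progress", "manage_job"},
    Role.LABELER: {"read_assigned_items"},
    Role.REVIEWER: {"read_review_items"},
    Role.DATA_SCIENTIST: {"read_labeled_dataset_api"},
}

# Actions that are additionally limited to the items assigned to the user.
ITEM_SCOPED_ACTIONS = {"read_assigned_items", "read_review_items"}


def can_access(role, action, item_id=None, assigned_items=frozenset()):
    """Allow an action only if the role grants it and any item scope is satisfied."""
    if action not in PERMISSIONS.get(role, set()):
        return False
    if action in ITEM_SCOPED_ACTIONS:
        return item_id in assigned_items
    return True


labeler_items = {"img_7", "img_8"}
print(can_access(Role.LABELER, "read_assigned_items", "img_7", labeler_items))
print(can_access(Role.LABELER, "read_assigned_items", "img_9", labeler_items))
print(can_access(Role.LABELER, "manage_job"))
```

Keeping the table deny-by-default, so that anything not listed is refused, matches the note's point that only authorised users should reach the data.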
  • #24: Audit log or audit trail. An audit trail, also referred to as an audit log, is another important security requirement in a data labeling process. With audit log records, one can gain insight into who did what and when for data labeling activities, such as accessing the data to be labeled, downloading or viewing the labeled dataset, or granting access to outsourced third-party data labeling agencies. Audit trails are also important for meeting an organization's data security needs and industry compliance.
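A "who did what and when" record can be as small as three fields plus a timestamp. The following is a minimal in-memory sketch with hypothetical actor, action, and resource names; a real audit trail would be written to durable, tamper-resistant storage, which is not shown here.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    actor: str      # who
    action: str     # did what
    resource: str   # on which data
    timestamp: str  # when (UTC, ISO 8601)


class AuditLog:
    """Append-only list of audit events (in memory, for illustration only)."""

    def __init__(self):
        self._events = []

    def record(self, actor, action, resource):
        event = AuditEvent(actor, action, resource, datetime.now(timezone.utc).isoformat())
        self._events.append(event)
        return event

    def by_actor(self, actor):
        return [event for event in self._events if event.actor == actor]

    def to_json_lines(self):
        """Serialize one JSON object per line, a common format for log shipping."""
        return "\n".join(json.dumps(asdict(event)) for event in self._events)


log = AuditLog()
log.record("labeler_1", "view_item", "tweet_42")
log.record("lead_1", "download_dataset", "sentiment_v1")
log.record("labeler_1", "submit_label", "tweet_42")

print(log.to_json_lines())
```

Filtering with `by_actor` answers questions such as "what did this labeler touch?", which is the kind of insight the note describes.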
  • #25: TODO ⊚ Encryption helps protect private information and sensitive data, such as corporate secrets, medical records, and classified government information. ⊚ It enhances the security of communication between client apps and servers. Data at rest and data in motion: TODO. The Advanced Encryption Standard (AES) is used worldwide to protect sensitive data that is stored or sent online. Encryption via SSL protects data while it is being sent to and from a labeling platform, keeping it secure in transit.
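For the in-transit half of this note, a client can insist on a verified, encrypted connection using Python's standard `ssl` module. This is a sketch, not the platform's configuration: SSL has been superseded by TLS, which is what the module negotiates in practice, and the TLS 1.2 minimum is an assumed policy choice. Encrypting data at rest with AES needs a third-party library and is not shown.

```python
import ssl


def make_client_context():
    """Build a client-side TLS context that verifies the server certificate."""
    # The default context turns on certificate verification and hostname checking.
    context = ssl.create_default_context()
    # Assumed policy: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


tls_context = make_client_context()
print(tls_context.verify_mode, tls_context.check_hostname)
```

A context like this can be passed to standard-library clients, for example through the `context` argument of `urllib.request.urlopen`, when calling a labeling platform's API.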
  • #26: A firewall is a system that prevents unauthorised access to or from a private computer network. It provides a security boundary between network devices and untrusted access from the Internet, helping to secure your data from malicious attacks.
  • #27: How
  • #29: 5 minutes intro - 10 industry awareness - 15 min demo - 20 minutes QnA Define problem - Features model - How this model is built using skyl.ai
  • #30: TODO
  • #31: Thank you, Mohit and Shruti, for the wonderful presentation and demo. I'd like to mention that Skyl.ai is dedicated to helping people with their Machine Learning journey by offering consulting services, such as: AI Adoption Assessment, where Skyl will help find key areas in your organisation where AI is beneficial; AI Systems Integration, where Skyl will help find the best ways to integrate AI models with your current software systems; AI Performance Evaluation, where Skyl will assess your AI workflow and help find ways to improve your AI system's performance; and AI-Enabled Software Development, where the team at Skyl can develop highly customized, AI-enabled software solutions tailored to your organisation's needs. If you'd like to find out more, please check out the skyl.ai website, or you can send an email directly to [email protected].
  • #32: Skyl also has special offers for those of you who are curious about incorporating Machine Learning into your business. Skyl offers a free 1-month trial, plus a Proof of Concept. You'll be able to interact with real data on the screen, just like we showed in the demo. You'll experience the whole process, from collecting and labeling the data all the way to deploying a model! Skyl also offers a complimentary 30-minute consultation and an AI Implementation Playbook to go along with it. This is a great opportunity to see how Skyl can provide Machine Learning solutions to your challenges.
  • #33: All right, now it's Q&A time! As a reminder, if you have any questions, go to the question box in your control panel, located at the bottom of your Zoom screen. We'll try to answer as many questions as possible in the time that we have left. So let's answer some questions. Sample questions: Shruti: If I do not want to use cloud services for security purposes, how can Labelwise help me with my data labeling needs? Bikash/Mohit: Would Labelwise also provide the labeling workforce for data labeling, or does that need to be taken care of by the users / customers? Can I label unlimited data in Labelwise? OK, that's all the time we have for questions today, but feel free to contact us with your specific questions and we'll make sure to get them answered.
  • #34: All right, so we have reached the end of the webinar. We hope you enjoyed it. We have a lot more webinars coming up on different machine learning topics and how they can be implemented in different businesses and industries, so don't miss out, and make sure you sign up for the upcoming webinars as well. Thank you for joining, and I hope you have a wonderful day.