How to do Secure Data Labeling
for Machine Learning
Extensive experience building future tech products using
Machine Learning and Artificial Intelligence.
Areas of expertise include Deep Learning, Data Analysis,
full stack development and building world class products
in ecommerce, travel and healthcare sector.
Shruti Tanwar
Lead - Data Science
The Speaker
Solutions Consultant with experience working at the
forefront of cutting-edge technology and leading
innovative projects.
Areas of expertise include solutions analysis and design.
Fahid Basheer
Solutions Consultant
The Speaker
Bikash Sharma
CTO and Co-founder at
Skyl.ai
CTO & Software Architect with 15 years of experience
working at the forefront of cutting-edge technology
leading innovative projects
Areas of expertise include Architecture design, rapid
product development, Deep Learning and Data Analysis
The Panelist
Getting familiar with ‘Zoom’
All dial-in participants will be muted to enable the presenters
to speak without interruption
Questions can be submitted via Zoom Questions chat
window and will be addressed at the end during Q&A
The recording will be emailed to you after the webinar
Please familiarize yourself with the Zoom ‘Control Panel’ on your screen
...In the next 45 minutes
1 Deep Dive into the Data Labeling Process
2 Live Demo of Secure Data Labeling Platform
Machine Learning automation platform for unstructured data
A quick intro about Skyl.ai
Guided Machine Learning Workflow
Build & deploy ML models faster on
unstructured data
Collaborative Data Collection & Labeling
Easy-to-use & scalable AI SaaS platform
POLL #1
At what stage of Machine learning adoption is your
organization?
⊚ Exploring - Curious about it
⊚ Planning - Creating AI/ML strategy
⊚ Experimenting - Building proof of concepts
⊚ Scaling up - Some departments are using it
⊚ In production - Using it in product features
⊚ Transforming - AI/ML driven business
Deep Dive:
Data Labeling Process
01
What is Data Labeling?
Data labeling, also called data annotation/tagging, is the process of
preparing labeled datasets for machine learning.
Images Data labeling
Image Classification
ML Model
Examples of Data labeling
Computer Vision - Image Classification
Computer Vision - Object Detection
NLP - Text Extraction (NER)
By Collaborator (Human-in-the-loop)
● In-house employees - Assigning tasks to an in-house labeling team /
employees of the organization.
● Data labeling companies - Hiring an external company to do the labeling.
Automated data labeling
● Data labeling through machine learning algorithms
● Reduces the number of labeling tasks in the data labeling process
● Speeds up the labeling process
Types of Data Labeling
3 Aspects of Building a Quality Labeled Dataset
Right team to
carry out the data
labeling project
Right data
labeling process &
workflow
Right data
labeling tools in
place
⊚ Conducting mock data labeling test
⊚ Measuring data labeling consistency
⊚ Auditing (QC) the labeled dataset periodically as it gets labeled
Best Practices to Ensure a Quality Labeled Dataset
Labeling Quality: Conducting Mock Data Labeling Test
Qualify the right collaborator for your data labeling job
Labeling Quality: Measuring Data Labeling Consistency
Negative sentiment
Neutral sentiment
Positive sentiment
Measuring how consistently collaborators agree with each other
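The slide does not name a specific consistency metric. One common choice for two labelers is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal, standard-library version under that assumption; the function name and the sample sentiment labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Agreement between two labelers, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both labelers chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each labeler picked labels at their own observed rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical sentiment labels from two collaborators on the same five reviews.
labeler_1 = ["positive", "negative", "neutral", "positive", "negative"]
labeler_2 = ["positive", "negative", "positive", "positive", "negative"]
kappa = cohens_kappa(labeler_1, labeler_2)
```

A value near 1 indicates strong agreement and a value near 0 indicates agreement no better than chance. With more than two labelers, a multi-rater statistic such as Fleiss' kappa may be a better fit.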
Labeling Quality: Review of Labeled Dataset
Reviewing the labeled dataset and flagging badly labeled data
⊚ Access Control
⊚ Audit Log
⊚ Data Encryption
⊚ Data sources behind firewalls
Data Security
Data Security: Access Control
Data Scientist
Project Lead / Data Manager
Data Labeler (Collaborator)
Data Labeling Job Reviewer
Having the right access control throughout the data labeling process
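As an illustration of role-based access control for the four roles on this slide, here is a minimal sketch. The permission names and the mapping of permissions to roles are assumptions made for the example, not the policy of any particular platform.

```python
from enum import Enum


class Role(Enum):
    DATA_SCIENTIST = "data_scientist"
    PROJECT_LEAD = "project_lead"
    LABELER = "labeler"
    REVIEWER = "reviewer"


# Hypothetical permission matrix; adjust it to your own policy.
PERMISSIONS = {
    Role.PROJECT_LEAD: {"create_job", "assign_task", "view_progress", "export_dataset"},
    Role.LABELER: {"label_assigned_item"},
    Role.REVIEWER: {"view_labels", "flag_label"},
    Role.DATA_SCIENTIST: {"view_labels", "export_dataset"},
}


def is_allowed(role, action):
    """Deny by default: an action is permitted only if listed for the role."""
    return action in PERMISSIONS.get(role, set())
```

The deny-by-default lookup means a newly added role or action grants nothing until it is explicitly listed, which keeps labelers limited to the items assigned to them.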
Data Security: Audit Log
Gain insight into user activity to meet organizational and compliance needs
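One way to make an audit log useful for compliance is to make it tamper-evident. The sketch below chains each entry to the previous one with a SHA-256 hash, so a later change to a stored entry can be detected. The field names, user names and actions are hypothetical, and this illustrates the idea rather than how any specific labeling tool stores its log.

```python
import hashlib
import json
from datetime import datetime, timezone


def append_event(log, user, action, target):
    """Append an audit entry whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "target": target,
        "prev_hash": prev_hash,
    }
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry


def verify_chain(log):
    """Return True if every entry's hash and its link to the predecessor check out."""
    prev_hash = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if entry["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(payload).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_event(audit_log, "labeler_7", "label_created", "image_0042")
append_event(audit_log, "reviewer_2", "label_flagged", "image_0042")
```

Calling `verify_chain(audit_log)` after editing any stored field returns `False`. Note that the chain alone does not detect truncation of the most recent entries, so a production log would also need to be stored where labelers cannot rewrite it.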
Data Security: Encryption
Encrypted
Data at rest
Data in use
Data in motion
TLS/SSL
Securing data assets at rest, in motion and in use
Data Labeling Tool
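For the "data in motion" part, a client that talks to a labeling tool can insist on verified TLS connections. The sketch below uses Python's standard `ssl` module; the helper name is hypothetical and the choice of TLS 1.2 as the minimum version is an assumption. It does not cover encryption at rest or in use.

```python
import ssl


def make_tls_context():
    """Client-side TLS context that verifies certificates and rejects old protocol versions."""
    # create_default_context() turns on certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumed policy: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_tls_context()
```

The resulting context can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=context)`.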
Data Security: Firewall
Restricting data to a private network by using an on-prem data labeling solution
Private network
Public network
Demo of how to perform
secure data labeling
02
Skyl Labelwise: Data Labeling Process
Demo of how to perform secure
Data Labeling
Skyl.ai Labelwise
Guided
Workflow
Data labeling solution for computer
vision & NLP
Quality
Labeled dataset
Right process and metrics in place to ensure quality
data labeling
Effective
Collaboration
Collaborate and manage data labeling
projects efficiently
Early
Visibility
Get early visibility; visualize and affirm correctness
at every step of the way
Scalable
High-Performance
Access on-demand and scalable, high-performance
infrastructure
Security
& Compliance
Access control, data encryption, audit log and
on-prem solution
We can help you with...
⊚ AI Adoption Assessment
⊚ AI Systems Integration
⊚ AI Performance Evaluation
⊚ AI-Enabled Software Development
Our AI Consulting Services
www.skyl.ai contact@skyl.ai
⊚ Free 1 month Trial + POC
⊚ Complimentary 30 min consultation
⊚ AI Implementation Playbook
www.skyl.ai contact@skyl.ai
Special offer for you...
Questions?
We hope to hear from you soon
Thank you for joining!
85 Broad Street, New York, NY, 10004
+1 718 300 2104, +1 646 202 9343
contact@skyl.ai

Editor's Notes

  • #2: Hello everyone and welcome. Thank you for joining today’s webinar on How to do Secure Data Labeling for Machine Learning. My name is Edwin Martinez and I’ll be your host today. First off, I’d like to introduce 3 expert speakers for today’s webinar.
  • #3: First we have Shruti Tanwar - Shruti is an expert in data science who is a veteran in building SaaS products using Machine Learning and AI. Her expertise includes Deep Learning and Data Analysis, as well as full stack development and building tech products in various different fields such as ecommerce, travel, and healthcare. Welcome, Shruti!
  • #4: Next we have Fahid Basheer. Fahid is a Solutions Consultant with experience working at the forefront of cutting-edge technology and leading innovative projects. His areas of expertise include solutions analysis and design. Welcome Fahid.
  • #5: Finally, we have Bikash Sharma, joining us as a panelist. Bikash is a CTO and Software Architect with 15 years of experience in leading innovative software projects and solutions. He co-founded Skyl.ai, bringing his expert knowledge in AI and Machine Learning. Welcome, Bikash!
  • #6: Before we begin, I’d like to briefly talk about some Zoom features that will be relevant to us. All participants in the webinar will be muted to avoid any interruptions during the session. Any questions you might have can be submitted in the Zoom Questions chat window in the control panel, located at the bottom of the screen. We’ll make sure to address your questions during the Q&A session. Also, the recording of the webinar will be emailed to you afterwards, just in case you’ve missed any talking points or wish to view it again. So that’s all for the introduction. Now we’ll get started with the webinar, and I’ll hand the session over to Fahid.
  • #7: Thank you for the introductions, Edwin, and welcome everyone. My name is Fahid, I work as a Solutions Consultant at Skyl.ai, and I'll be one of your presenters today. Now, without further ado, let's take a look at what we are going to cover in the next 45 minutes. I will present the first part of this webinar, which is about the data labeling process for machine learning projects. We will first look at what data labeling means for these projects, then at the types of data labeling processes, some examples for different image- and text-based solutions, and a few best practices for maintaining quality labeled datasets. We will also cover a crucial part of data labeling, which is maintaining data security during the labeling process, so that you leave this webinar with a comprehensive understanding of data labeling management for your machine learning projects. In the second section, Shruti, our resident data scientist, will demonstrate live how to perform data labeling and build high-quality datasets using a secure data labeling platform like Skyl.ai. As Edwin mentioned earlier, we will have a Q&A session at the end of the webinar, so if you have questions about any of the sections we cover, we will address them at that time.
  • #8: Let me start with a quick intro about the Skyl.ai platform and its capabilities. Skyl.ai is a machine learning automation platform that works with unstructured data, which includes text, images, audio and so on. Using Skyl.ai, businesses can build and deploy high-quality NLP and computer vision models in hours rather than days or weeks. So how exactly does Skyl.ai do that? It provides an easy-to-use, unified platform for the entire machine learning workflow: data collection, data labeling, feature engineering, training the model at scale by choosing from out-of-the-box algorithms, model evaluation once the model is trained, and finally one-click deployment and monitoring of the model in production. So with the Skyl.ai platform you can manage all of your ML projects in one place. It lets you take your AI experiments to production quickly and at scale, which leads to faster model release iteration cycles. The best part is that all of this requires no infrastructure or MLOps effort from you; the platform takes care of your infrastructure needs.
  • #9: Now I'd like to launch a poll. It will give us an idea of what stage of machine learning adoption your organization is at right now, so please go ahead and select the option that applies to your organization. I'm launching the poll now. I'm just waiting for a few more people; if you could complete it in the next few seconds before I close the poll, that would be great. Okay, I'm about to close the poll. All right, interesting. About one third of our attendees are in the middle stage, experimenting and building proofs of concept, which is amazing. Following that, about 22% of our attendees are exploring or scaling up, so they sit just above and below that level. About 11 percent of attendees have their models used in production, and 11 percent are in the planning stage, so we have people at various stages, with more or less an equal distribution. At each stage of machine learning adoption you may have different types of questions, maybe on data labeling and management or on taking ML projects to completion, and we will be glad to answer those questions during the Q&A session.
  • #10: All right, now we come to the main part of the webinar: a deep dive into the data labeling process. All machine learning problems start with data, preferably lots of data for which you already know the ground truth or the target answer. This type of data is what we call labeled data. Supervised machine learning algorithms learn from this labeled dataset (data that has been tagged with labels). This means that programmers do not explicitly program machine learning algorithms on how to make decisions; they program the models to learn from these labeled datasets. Often, this data is not readily available in labeled form, and collecting and preparing these high-quality datasets is the most important step in solving a machine learning problem.
  • #11: Now let's take a look at what data labeling is. Data labeling, a term used interchangeably with data annotation or tagging, is the process of preparing a labeled dataset. The data can be in the form of images, text or audio, and the output of the data labeling process is one or more tags, or ground truth values, attached to the input data. You can see on the screen that the image of a T-shirt has been tagged with around six labels. Machine learning models learn to recognize repetitive patterns in this data; that is, supervised machine learning algorithms learn from the labeled dataset. After a sufficient amount of labeled data has been processed, machine learning models can identify the same patterns in data that has not been labeled. So labeling the data is the first step toward a working machine learning model.
  • #12: Okay, now as I said earlier we will be taking a look at some examples of data labeling, with reference to the kind of problem we are trying to solve.
  • #13: The first example is a computer vision implementation, specifically image classification. Image classification models classify or categorize images based on one or more attributes, or labels, that they can infer from the image. To train such a model we need a dataset of images labeled with these attributes. In this particular example we are trying to classify attributes of apparel from an image. These attributes could be the type of apparel (top wear, bottom wear or head wear), the base color of the apparel (blue, green, yellow), who the clothing is meant for (men or women), and so forth. As part of building this labeled dataset for model training, you would provide a series of such images to a data labeler, or collaborator, who then labels these attributes for each image.
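To make the apparel example concrete, here is a sketch of what one labeled record could look like, together with a small check that a labeler's output matches the allowed attribute values. The schema, attribute names and file name are assumptions chosen for illustration.

```python
# Hypothetical label schema for the apparel example; names and values are assumptions.
SCHEMA = {
    "type": {"top wear", "bottom wear", "head wear"},
    "base_color": {"blue", "green", "yellow"},
    "gender": {"men", "women"},
}


def validate_label(record):
    """Return a list of problems found in one labeled record (empty if valid)."""
    problems = []
    for attribute, allowed in SCHEMA.items():
        value = record["labels"].get(attribute)
        if value is None:
            problems.append(f"missing attribute: {attribute}")
        elif value not in allowed:
            problems.append(f"unexpected value for {attribute}: {value}")
    return problems


record = {
    "image": "tshirt_001.jpg",  # hypothetical file name
    "labels": {"type": "top wear", "base_color": "blue", "gender": "men"},
}
```

Running a check like this as labels come in catches missing or misspelled attribute values before they reach model training.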
  • #14: Another example of a computer vision ML model is object detection, where we not only classify the objects in an image but also pinpoint the location of each object within a rectangular region, referred to as a bounding box. In this example, as shown in the figure, we want to build a model that can detect surgical equipment in a tray, such as Mayo scissors, forceps and so on. To do that, we build a labeled dataset that has these object labels as well as the location of each object, marked out in the form of a bounding box. This process would again be done by a data labeler or workforce, after which the dataset can be used to train this particular ML model.
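A bounding-box label can be represented as a class name plus corner coordinates. The sketch below also computes intersection over union (IoU), a widely used measure of how closely two boxes for the same object match, for instance a labeler's box and a reviewer's box. The class, field names and pixel coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self):
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def iou(a, b):
    """Intersection over union: 1.0 for identical boxes, 0.0 for disjoint ones."""
    overlap_w = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    overlap_h = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0
    intersection = overlap_w * overlap_h
    return intersection / (a.area() + b.area() - intersection)


# Hypothetical boxes drawn around the same forceps by two people.
labeler_box = BoundingBox("forceps", 10, 10, 50, 50)
reviewer_box = BoundingBox("forceps", 30, 10, 70, 50)
overlap = iou(labeler_box, reviewer_box)
```

A review step could flag any box whose IoU with a reference box falls below a threshold the team agrees on.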
  • #15: This one is a natural language processing example, where we extract text (the location of words) from a given sentence and tag that text under various categories. This type of model is referred to as a named entity recognition (NER) model. In this example we are labeling sentences from customer reviews, like the ones you find under a product sold on Amazon, and we are trying to extract key attributes from these sentences, such as the pros and cons of the product or mentions of other products. This labeled dataset can then be used to train an NER model that can extract these key insights from other product reviews. Now that we have gone through these examples of data labeling, let's focus on who does the data labeling and how it is done.
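NER labels are commonly stored as character-offset spans over the original text. The sketch below builds such spans for a made-up product review; the review text, the helper name and the `PRO` / `CON` label names are assumptions for illustration.

```python
def make_span(text, phrase, label):
    """Locate `phrase` in `text` and return a character-offset span annotation."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"phrase not found: {phrase!r}")
    return {"start": start, "end": start + len(phrase), "label": label}


# Hypothetical product review and label names.
review = "The battery life is great but the screen scratches easily."
annotations = [
    make_span(review, "battery life is great", "PRO"),
    make_span(review, "screen scratches easily", "CON"),
]
```

Storing offsets rather than copied text keeps each label tied to an exact position, so the original sentence can always be sliced to recover what was tagged.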
  • #16: Now there are 2 types of data labeling, based on who does it and how it is done. The first type is what we refer to as collaborator-based data labeling. This is the simplest labeling approach, where a human is employed to do the data labeling. So you basically assign tasks to employees within your organization who are the subject matter experts, and these experts would know how to label data and what exactly needs to be labeled. Or you could hire a data labeling company, which would manage all aspects of a data labeling project and is usually paid on an hourly basis. How this works is, you provide the collaborators with your raw unlabeled data, like images or text, along with a set of instructions on what needs to be labeled and how. The second approach is automated data labeling. In this process we automate the data labeling through machine learning algorithms. We either use unsupervised learning, in which we cluster the unlabeled data into categories and then assign these clusters to a human labeler, who validates the semi-labeled dataset. Or we use active learning, where the algorithm learns as you start labeling and automatically labels the next item, and the data labeler basically validates and modifies the annotations or labels. An example of active learning could be labeling video frames, where a human labels the first few frames of a video and then the AI system learns from these previously labeled frames and suggests labels for the upcoming frames. Automated data labeling can be useful particularly in instances where there is a significant amount of unlabeled data, like video frames, or for data that would otherwise be extremely expensive or time consuming to label. So an automated labeling system speeds up the process of labeling, and human labelers basically validate or correct that labeled dataset accordingly.
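The video frame workflow can be sketched as a loop in which the system suggests a label and a human confirms or corrects it. This is only a toy: the "model" here is a majority vote over the last few confirmed frames, standing in for a real learned model, and all names and data are hypothetical.

```python
from collections import Counter

def suggest_label(confirmed_labels, window=3):
    """Suggest the most common label among the last `window` confirmed frames."""
    recent = confirmed_labels[-window:]
    return Counter(recent).most_common(1)[0][0] if recent else None

def label_frames(frames, human_review, seed_count=2):
    """The human labels the first seed_count frames unaided, then reviews suggestions."""
    confirmed = []
    for index, frame in enumerate(frames):
        suggestion = None if index < seed_count else suggest_label(confirmed)
        confirmed.append(human_review(frame, suggestion))
    return confirmed

# Simulated labeler: knows the true label and records which suggestions were wrong.
truth = ["car", "car", "car", "car", "person", "person", "person"]
corrections = []

def reviewer(frame, suggestion):
    if suggestion is not None and suggestion != truth[frame]:
        corrections.append(frame)
    return truth[frame]

labels = label_frames(range(len(truth)), reviewer)
```

In this run the labeler only has to correct the frames where the scene changes; for the other suggested frames a quick confirmation is enough, which is where the time saving comes from.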
  • #17: Great, now we understand what data labeling is and how it's done. Now let's try and understand 3 aspects of data labeling that you need to consider while taking up a data labeling project to build a quality labeled dataset, which ultimately leads to a high quality ML model. The first aspect is to have the right team in place to carry out the data labeling projects. This would involve data managers or project leads, whose responsibility is to make sure the data labeling projects run smoothly, which includes having the right data sources in place and having visibility of project progress. Then the data labelers, who shall be responsible for carrying out the labeling tasks based on the provided instructions. And Quality Control reviewers, who would be responsible for reviewing the labeled dataset and making sure that the job carried out by the data labelers is as per instructions. Finally the data scientist or Machine Learning engineer, who shall validate and consume these labeled datasets and build the AI model. The second aspect is having a data labeling process or workflow, which involves defining the data labeling tasks and having the right checks in place, so that you catch any errors or low quality labels and flag them, so as to not compromise your dataset. And the final aspect is having the right data labeling tools in place, meaning a software or labeling platform which your team shall use to configure the data labeling workflow that suits your needs, and in which your human collaborators or data labeling partner can effectively carry out data annotation or labeling in a secured environment. There should be a mechanism to carry out QC activities with the right quality metrics in place, and also visibility around the progress of labeling tasks. Lastly, the tool should provide easy and secured access to the labeled dataset for your data scientist, who shall use it to train the ML model.
  • #18: Alright, now let's explore how we can ensure a labeled dataset is of high quality and what some of the best practices are to do so. We shall go through 3 key practices which you must implement while carrying out a data labeling project. First is conducting mock labeling tests to qualify the right data labelers or collaborators. Second is measuring data labeling inconsistency, to ensure labeling is more reliable and consistent. Third is quality review of the labeled data periodically, as it gets labeled, so there is scope for improvement in the future if you find any anomalies. So let's take a closer look at these practices.
  • #19: Okay, first of all, your models will be only as good as your labeled dataset, so it's important to have the right collaborators or data labelers perform your data labeling process. Also, a key thing to learn and understand is that when we are preparing the labeled dataset, the machine learning model shall pick up the nuances of these collaborators, that is, how they perceive the data based on their age, gender, demographic and knowledge about the subject. Most of the time this is where a bias is created in the labeled dataset, which ultimately may lead to a biased model. So now the question comes, how do we qualify a data labeler? You need to try and have collaborators with diverse personalities, ages and demographics. These collaborators need to understand the subject well and should have no bias towards it. Now in order to know the capability of your data labelers and how well versed they are with the particular data labeling job which you are going to assign to them, as a best practice have a mock data labeling test, where a set of labeling tasks is served to all these candidates. You can then evaluate their results to qualify them for the data labeling job. Data labeling mock tests will help to build a pool of qualified collaborators, whose judgement on data labeling will help to build a high quality labeled dataset.
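One simple way to evaluate a mock labeling test is to score each candidate against a small set of tasks with known answers. The sketch below assumes such a reference set exists; the task ids, labels, candidate names, and the 0.75 pass threshold are all hypothetical.

```python
# Reference ("gold") answers for the mock test tasks, prepared by subject matter experts.
GOLD = {"t1": "positive", "t2": "negative", "t3": "neutral", "t4": "positive"}

def score(answers, gold=GOLD):
    """Fraction of mock tasks where the candidate matched the reference label."""
    correct = sum(1 for task, label in gold.items() if answers.get(task) == label)
    return correct / len(gold)

def qualify(candidates, threshold=0.75):
    """Names of candidates whose score meets the threshold, sorted alphabetically."""
    return sorted(
        name for name, answers in candidates.items() if score(answers) >= threshold
    )

candidates = {
    "ana": {"t1": "positive", "t2": "negative", "t3": "neutral", "t4": "positive"},
    "ben": {"t1": "positive", "t2": "negative", "t3": "positive", "t4": "positive"},
    "cy": {"t1": "positive", "t2": "positive", "t3": "positive", "t4": "positive"},
}
qualified = qualify(candidates)
```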
  • #20: Now the next best practice is measuring data labeling consistency, which in simple terms means measuring how consistently collaborators agree with each other. As humans we may disagree with each other's opinions, and it's no different while performing a data labeling task. There could be tasks where collaborators may not label the data as the other collaborators have. This could be due to various reasons, as mentioned earlier: it could be a difference in age, personality, demographic or knowledge of the subject. Consider this example: 3 collaborators have been assigned the same task of tagging the sentiment of a tweet, and all 3 of them have tagged it differently, which shows the 3 of them don't agree with each other. Now in a data labeling job we need to measure this degree of agreement, and one of the metrics to do it is IRR, or the Inter-Rater Reliability score. The IRR metric is calculated by having collaborators label the same data, measuring how many are in agreement, and assigning a score to that group of collaborators, the IRR score. The higher the IRR score of your collaborators, the better your data labeling job will turn out to be.
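As a minimal sketch of the idea, the code below computes one of the simplest agreement measures: the fraction of rater pairs that gave the same label, averaged over all items. This is an assumption about how an IRR score could be computed, not necessarily the formula any given platform uses; chance-corrected statistics such as Cohen's kappa or Fleiss' kappa are common alternatives. The rater names and labels are made up.

```python
from itertools import combinations

def pairwise_agreement(labels_by_rater):
    """labels_by_rater maps a rater name to a list of labels, in the same item order."""
    raters = list(labels_by_rater)
    n_items = len(labels_by_rater[raters[0]])
    agree = 0
    total = 0
    for a, b in combinations(raters, 2):
        for i in range(n_items):
            total += 1
            agree += labels_by_rater[a][i] == labels_by_rater[b][i]
    return agree / total

# Three collaborators tagging the sentiment of the same four tweets.
ratings = {
    "rater_1": ["positive", "negative", "neutral", "positive"],
    "rater_2": ["positive", "negative", "positive", "positive"],
    "rater_3": ["positive", "positive", "negative", "positive"],
}
irr = pairwise_agreement(ratings)  # 7 agreeing pairs out of 12
```

For the single tweet in the example above, where all three collaborators pick different sentiments, this measure gives 0, the lowest possible score.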
  • #21: Another best practice for data labeling quality is reviewing the labeled dataset. Now we understand that data labeling is the most time consuming and resource intensive part of ML development, and these labeled datasets shall be used to build ML models by data scientists or ML engineers. So it is important to review the data in terms of how well it is annotated and whether it is in fact good enough to build a model. Consider this example, where a data labeling job is carried out for detecting pedestrians from video frames. If you look at this particular video frame, the bounding box is not covering the complete area of the pedestrian. Now this may lead to a poor machine learning model which may detect pedestrians inaccurately. So it is important for ML engineers or data scientists to review the labeled dataset, and it is always recommended to do so while the data is getting labeled, flag the labeled data which is inappropriate, and also provide feedback to the annotator for corrective measures.
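One way a reviewer could flag a box like the one in the pedestrian example is to compare it with a reference box using intersection over union (IoU). The sketch below assumes a reference box is available for the frame, for instance drawn by the reviewer; the coordinates and the 0.7 threshold are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def needs_review(labeled_box, reference_box, threshold=0.7):
    """Flag a labeled box whose overlap with the reference is below the threshold."""
    return iou(labeled_box, reference_box) < threshold

reference = (100, 50, 200, 350)   # full pedestrian
labeled = (100, 50, 200, 200)     # labeler's box covers only the upper half
flagged = needs_review(labeled, reference)
```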
  • #22: Alright, now we approach a very important topic, the data security aspects of Secure Data Labeling. There are 4 points to consider in a data labeling process. 1. Access control - meaning regulating and controlling who has access to what aspects of data labeling. 2. Audit log - which is understanding who did what and when for a data labeling project. 3. Data encryption - which is securing the data when it's at rest, in motion and in use. 4. Data source behind a firewall - in this case the raw data might be kept behind a firewall to adhere to business compliance requirements. So let's take a quick look at each one of these points.
  • #23: First off, Access Control. So what is access control? It's a security technique that regulates who or what can view or use resources in a given environment. So understandably, it is very important to set up the right access control around a data labeling process, which only allows authenticated and authorised users to have access to data. Now different types of access can be set for different user profiles in a data labeling process. A Project Lead or Data Manager can have access to set up the data sources or data assets that require data labeling, and access to the progress of a labeling job (manage job). Data Labelers can have read only access to the data which is assigned to them for labeling. Data Labeling Job Reviewers can also have read only access to the data which is assigned to them for review. Data scientists can have access, via a secure API, to the labeled dataset which they require.
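The user profiles above map naturally onto a role-to-permission table. The following is a minimal sketch of that idea; the role and permission names are hypothetical, and a real platform would also authenticate the user before any such check.

```python
# Hypothetical permissions per role, following the profiles described above.
PERMISSIONS = {
    "project_lead": {"manage_data_sources", "view_job_progress", "manage_job"},
    "data_labeler": {"read_assigned_data"},
    "reviewer": {"read_review_data"},
    "data_scientist": {"read_labeled_dataset_api"},
}

def is_allowed(role, action):
    """True only when the role is known and the action is in its permission set."""
    return action in PERMISSIONS.get(role, set())

can_label = is_allowed("data_labeler", "read_assigned_data")
can_export = is_allowed("data_labeler", "read_labeled_dataset_api")
```

Unknown roles fall through to an empty permission set, so access is denied by default.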
  • #24: Audit Log. Audit logs, which can also be referred to as audit trails, are an important security requirement in data labeling processes. With an audit log record you can gain insight into who did what and when in data labeling activities, like accessing the data to be labeled, downloading or viewing the labeled dataset, or accessing the data to outsource it to third party data labeling agencies. Audit trails are also important for meeting an organization's data security or industry compliance needs.
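An audit log entry needs to capture at least the who, the what, and the when. Here is a minimal sketch of such a record written as a JSON line; the field names, user id, and resource path are assumptions for illustration.

```python
import json
from datetime import datetime, timezone

def audit_entry(user, action, resource, when=None):
    """Serialize one audit record: who did what, to which resource, and when (UTC)."""
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {
            "timestamp": when.isoformat(),
            "user": user,
            "action": action,
            "resource": resource,
        },
        sort_keys=True,
    )

# Append-only list standing in for a log file or log service.
log = []
log.append(
    audit_entry(
        "labeler_07",
        "view",
        "dataset/raw/image_0042.jpg",
        datetime(2020, 5, 1, 9, 30, tzinfo=timezone.utc),
    )
)
first = json.loads(log[0])
```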
  • #25: Now, Encryption of data. Like mentioned earlier, your data needs to be secured at all points in your labeling projects: when it's at rest, when it's being moved to other platforms, and when it's being used or transformed. Encryption helps in protecting private information and sensitive data, like corporate confidential information, medical records, government classified information, etc. It also enhances the security of communication between client apps and servers, where the data is secured using the Transport Layer Security protocol, or TLS, which is the successor to the older Secure Sockets Layer, or SSL, protocol. The Advanced Encryption Standard, or AES, is used worldwide, and when you're using a data labeling platform you have to ensure that your data is secured whether it's at rest or in transit.
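For the in-transit part, a client that talks to a labeling platform can insist on a modern TLS version and on certificate checks. The sketch below uses Python's standard `ssl` module; choosing TLS 1.2 as the minimum is an assumption for this example, and encryption at rest is not shown.

```python
import ssl

def make_client_context():
    """Client-side TLS settings: verify certificates and refuse old protocol versions."""
    context = ssl.create_default_context()            # certificate and hostname checks are on
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # rejects SSLv3, TLS 1.0 and TLS 1.1
    return context

context = make_client_context()
```

A context like this can be passed to standard library clients, for example through the `context` argument of `urllib.request.urlopen`.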
  • #26: And the final security initiative is a firewall system, to prevent unauthorised access to or from a private computer network. So it's an extra security layer between your private network devices and untrusted access from the Internet, hence securing your data from malicious attacks. In this instance there may be some reservations about moving the data beyond these firewalls, in which case you would want to use a platform that provides data labeling functionality on premise, meaning where the data resides. So now we have covered all of the data security aspects of data labeling.
  • #27: Okay, that was the end of the first part of the webinar. Thank you so much for listening to me, and now Shruti will present to you a live demonstration of how secure data labeling projects can be managed and executed. Thank you, and over to you Shruti.
  • #29: 5 minutes intro - 10 industry awareness - 15 min demo - 20 minutes QnA Define problem - Features model - How this model is built using skyl.ai
  • #30: TODO
  • #31: Thank you Fahid and Shruti, for the wonderful presentation and demo. I’d like to mention that Skyl.ai is dedicated to helping people with their Machine Learning journey by offering consulting services. These services include: AI Adoption Assessment, where Skyl will help find key areas in your organisation where AI is beneficial. AI Systems Integration, where Skyl will help find the best ways to integrate AI models with your current software systems. AI Performance Evaluation, where Skyl will assess your AI workflow and help find ways to improve your AI system’s performance. And AI-Enabled Software Development, where the team at Skyl can develop highly customized, AI-enabled software solutions catered towards your organisation’s needs. If you’d like to find out more, please check out the skyl.ai website or you can send an email directly to [email protected].
  • #32: Skyl also has special offers for those of you that are curious about incorporating Machine Learning into your business. Skyl offers a free 1 month trial, plus a Proof of Concept. You’ll be able to interact with real data on the screen, just like we showed in the demo. You’ll experience the process of going from collecting & labeling the data… all the way to deploying a model! Skyl also offers a complimentary 30 min consultation and an AI Implementation Playbook to go along with it. This is a great opportunity to see how Skyl can provide Machine Learning solutions to your challenges.
  • #33: Alright, now it’s Q&A time! As a reminder, if you have any questions, go to the question box in your control panel, located at the bottom of your Zoom screen. We’ll try to answer as many questions as possible in the time that we have left. So let’s answer some questions. Sample questions: Fahid - Ques: How do you price your product? Ans: We price our labeling tool on a pay-as-you-use basis, so the price varies depending upon the size of the data that you want labeled, but you can check out all our plans on the Skyl.ai/plans page on our website. Shruti - Ques: Would Labelwise also be providing the labeling workforce for data labeling, or does that need to be taken care of by the users / customers? Ques: Can I label my data on premise? Ok, that’s all the time we have for questions today, but feel free to contact us with your specific questions and we’ll make sure to get them answered.
  • #34: All right, so we have reached the end of the webinar. We hope you enjoyed it. We have a lot more webinars coming up on different machine learning topics and how they can be implemented in different businesses and industries, so don’t miss out, and make sure you sign up for the upcoming webinars as well. Thank you for joining and I hope you have a wonderful day.