How to classify documents
automatically using NLP
Technology leader with 20+ years of expertise in Product
Development, Business Strategy and Artificial Intelligence
acceleration. Active contributor to the New York AI
community
Has worked extensively with global organizations in BFSI,
Healthcare, Insurance, Manufacturing, Retail and Ecommerce
to define and implement AI strategies
Nisha Shoukath
Co-founder,
People10 & Skyl.ai
The Speaker
Extensive experience building future tech products using
Machine Learning and Artificial Intelligence.
Areas of expertise include Deep Learning, Data Analysis,
full-stack development and building world-class products in
the ecommerce, travel and healthcare sectors.
Shruti Tanwar
Lead - Data Science
The Speaker
Technology enthusiast with 13+ years of experience working in
the information technology and services industry. Leads cutting-edge
solutions for businesses using Machine Learning and
Artificial Intelligence.
Areas of expertise include Architecture design, Solutioning,
Data Engineering and Deep Learning.
Mohit Juneja
Solutions Architect
The Speaker
Bikash Sharma
CTO and Co-founder at
Skyl.ai
CTO & Software Architect with 15 years of experience
working at the forefront of cutting-edge technology, leading
innovative projects
Areas of expertise include Architecture design, rapid
product development, Deep Learning and Data Analysis
The Panelist
Getting familiar with ‘Zoom’
All dial-in participants will be muted to enable the presenters to
speak without interruption
Questions can be submitted via the Zoom Questions chat
window and will be addressed during the Q&A at the end
The recording will be emailed to you after the webinar
Please familiarize yourself with the Zoom ‘Control Panel’ on your screen
...In the next 45 minutes
1. Content classification and how businesses are leveraging it
2. Live demo of news category classification using NLP
3. How to quickly overcome the challenges in building ML models
Machine Learning automation platform for unstructured data
A quick intro about Skyl.ai
Guided Machine Learning Workflow
Build & deploy ML models faster on
unstructured data
Collaborative Data Collection & Labelling
Easy-to-use & scalable AI SaaS platform
POLL #1
At what stage of machine learning adoption is your
organization?
⊚ Exploring - Curious about it
⊚ Planning - Creating AI/ML strategy
⊚ Experimenting - Building proof of concepts
⊚ Scaling up - Some departments are using it
⊚ In production - Using it in product features
⊚ Transforming - AI/ML-driven business
01 Content Classification
& its applications
Text Classification
Text classification is the task of assigning natural
language texts to relevant categories from a
predefined set.
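As a concrete illustration of this definition, the sketch below trains a tiny classifier that assigns texts to a predefined set of news categories. It uses multinomial Naive Bayes with add-one smoothing, chosen here only because it fits in a few lines of standard-library Python; it is not necessarily the approach used by Skyl.ai or in the live demo, and the six training sentences are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per category
        self.word_counts = defaultdict(Counter)    # token frequencies per category
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # log P(category) + sum over tokens of log P(token | category)
            score = math.log(doc_count / n_docs)
            for token in tokenize(text):
                freq = self.word_counts[label][token]
                score += math.log((freq + 1) / (self.total_words[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training examples, one predefined category per text.
TRAIN_TEXTS = [
    "stocks rally as markets rise on earnings",
    "central bank raises interest rates",
    "team wins the championship final",
    "striker scores twice in league match",
    "new smartphone chip boosts battery life",
    "startup releases open source software update",
]
TRAIN_LABELS = ["Business", "Business", "Sports", "Sports", "Technology", "Technology"]

clf = NaiveBayesTextClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(clf.predict("markets fall after bank earnings"))  # Business
```

A real project would use far more labeled data and typically a library implementation (for example scikit-learn's MultinomialNB), but the shape of the problem stays the same: labeled texts in, a category from the predefined set out.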
Types of text classification
Text analysis can be performed at:
⊚ Document level - obtain relevant information from a full document
⊚ Paragraph level - extract the most important categories of a paragraph
⊚ Sentence level - get relevant information from a single sentence
⊚ Sub-sentence level - obtain relevant information from sub-expressions
within sentences
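The four levels differ mainly in which unit of text is handed to the classifier. A minimal sketch of producing those units, assuming simple regex rules (paragraphs separated by blank lines, sentences ending in ".", "!" or "?", clauses split on commas and semicolons); production systems usually rely on a proper sentence tokenizer instead.

```python
import re


def split_levels(document):
    """Break a document into the units used at each level of analysis.

    Naive rules: paragraphs are separated by blank lines, sentences end
    with '.', '!' or '?', and sub-sentence clauses are split on ',' or ';'.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    sentences = [s.strip() for p in paragraphs
                 for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]
    clauses = [c.strip() for s in sentences
               for c in re.split(r"[,;]", s) if c.strip()]
    return {
        "document": [document.strip()],
        "paragraph": paragraphs,
        "sentence": sentences,
        "sub_sentence": clauses,
    }


SAMPLE = "The app crashes on login. Support was slow, but helpful.\n\nPlease fix the bug!"
for level, units in split_levels(SAMPLE).items():
    print(level, len(units))
```

Each unit can then be classified on its own, which is how the same text yields one label at document level but several at sentence level.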
Document Classification
Why do we need it?
⊚ Extracting insights from large volumes of unstructured data, such as
articles, survey responses, or support tickets, is complex
⊚ Manual classification is time-consuming
⊚ Traditional rule-based systems can’t adapt to changing data
Business applications of
Content Classification
Legal Document Discovery
Find the relevant documents in the fastest way possible
⊚ File Type Identification
⊚ Sensitive Data Tagging
⊚ Identify the language
Source - Case Central
Enabling Customer Support
Email & query management
⊚ Scan and redirect emails to
the right office or department
⊚ Filter spam
⊚ Identify customer issues with
social listening and ticketing
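Once a model assigns a category, routing is a lookup from category to destination, with spam dropped. A minimal sketch: the queue addresses and the keyword-based toy_classify function are hypothetical stand-ins for a trained model, used only so the example runs on its own. The "bug" route mirrors the support-ticket example from the speaker notes.

```python
# Hypothetical mapping from predicted category to destination queue.
ROUTES = {
    "billing": "finance@example.com",
    "bug": "tech-support@example.com",
    "sales": "sales@example.com",
}
DEFAULT_ROUTE = "front-desk@example.com"


def toy_classify(text):
    """Stand-in for a trained model: a keyword lookup, for illustration only."""
    text = text.lower()
    if "winner" in text or "free prize" in text:
        return "spam"
    if "invoice" in text or "charged" in text:
        return "billing"
    if "crash" in text or "error" in text:
        return "bug"
    return "other"


def route_email(text, classify):
    """Return the destination for an email, or None if it is filtered as spam."""
    label = classify(text)
    if label == "spam":
        return None
    return ROUTES.get(label, DEFAULT_ROUTE)


print(route_email("The app shows an error on startup", toy_classify))
```

Passing the classifier in as a function keeps the routing rules independent of whichever model produces the labels.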
Online Content Classification
Provide better user experience
⊚ Tagging content/news or
products using categories as
a way to improve browsing
⊚ Identify related content on
the website
Example news categories: Business, Sports, Entertainment, Politics, Technology
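"Identify related content" can be sketched as a similarity search: represent each article as a bag of words and rank the others by cosine similarity. This is one simple technique chosen for illustration, not the method described in the webinar; real sites would more likely use TF-IDF weights or text embeddings. The article titles are invented.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word tokens in the text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def related_articles(target, articles, top_k=2):
    """Return up to top_k articles sharing vocabulary with target, most similar first."""
    target_vec = bag_of_words(target)
    scored = [(cosine_similarity(target_vec, bag_of_words(a)), a) for a in articles]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for score, a in scored[:top_k] if score > 0]


ARTICLES = [
    "parliament votes on election reform",
    "new phone released with faster chip",
    "football team wins the cup",
]
print(related_articles("election results spark debate in parliament", ARTICLES))
```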
02 Live Demo of news
category classification
Approach
Live Demo of news category
classification
Skyl.ai as an ML automation platform
Efficient
Data Management
Solve your data issues; collect and manage data
efficiently
Accuracy
& Quality
Maintain accuracy and quality; train and test faster;
monitor quality
Effective
Collaboration
Collaborate and manage projects efficiently
Early
Visibility
Get early visibility; visualize and affirm correctness at
every step of the way
Scalable
High-Performance
Access on-demand and scalable, high-performance
infrastructure
Reduce
Cost
Reduce cost of implementation; do it with less
specialized resources
POLL #2
What challenges are you facing while
implementing AI & Machine Learning?
⊚ Not started yet, so no challenges
⊚ Data collection
⊚ Data Labeling
⊚ Large volumes of data
⊚ Identifying the right data set to
train
⊚ Data Security
⊚ Lack of knowledge of ML tools
⊚ Lack of an end-to-end platform
⊚ Lack of expertise
⊚ Choosing the right algorithms
03 Overcoming the AI/ML
challenges with the right
tools and technologies
Best Practices for Data Collection
⊚ Use relevant data sources
for data collection
⊚ Establish proper data
collection mechanisms
⊚ Do not settle for a data sample
that is too small
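One quick way to spot a sample that is too small is to count labeled examples per category before training. A minimal sketch; the threshold of 50 is an arbitrary assumption (the right number depends on the task and model), and the label counts are invented.

```python
from collections import Counter

# Assumed minimum; tune for your task and model.
MIN_EXAMPLES_PER_CLASS = 50


def undersized_classes(labels, expected=(), minimum=MIN_EXAMPLES_PER_CLASS):
    """Return categories with fewer than `minimum` labeled examples.

    Categories listed in `expected` but absent from `labels` are reported with 0.
    """
    counts = Counter(labels)
    for label in expected:
        counts.setdefault(label, 0)
    return {label: n for label, n in counts.items() if n < minimum}


labels = ["Sports"] * 120 + ["Politics"] * 80 + ["Technology"] * 12
print(undersized_classes(labels, expected=["Business"]))
```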
Best Practices for Data Quality
⊚ Do validate your data and data
sources
⊚ Clean up your data regularly:
“garbage in, garbage out”
⊚ Data correction - remove duplicates, fix
missing data, etc.
⊚ Check the consistency of data during
data acquisition
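The correction steps above can be sketched as a single cleaning pass over labeled records. This assumes records are dicts with hypothetical "text" and "label" keys; it drops records with missing fields, normalizes whitespace for consistency, and removes case-insensitive duplicates.

```python
def clean_records(records):
    """Drop records with missing fields, normalize whitespace, and remove duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        text = record.get("text")
        label = record.get("label")
        if not text or not label:
            continue  # missing data
        text = " ".join(text.split())  # collapse repeated whitespace
        key = (text.lower(), label)
        if key in seen:
            continue  # duplicate of an earlier record
        seen.add(key)
        cleaned.append({"text": text, "label": label})
    return cleaned


RAW = [
    {"text": "Stocks  rise", "label": "Business"},
    {"text": "stocks rise", "label": "Business"},
    {"text": None, "label": "Sports"},
    {"text": "Team wins", "label": ""},
    {"text": "Team wins", "label": "Sports"},
]
print(clean_records(RAW))
```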
Best Practices for Data Security
⊚ Monitor data processes
continuously to mitigate risks
⊚ Increase data security with encryption
and tokenization
⊚ Control access flows according to
different organizational roles
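As an illustration of tokenization, sensitive values such as email addresses can be replaced with keyed tokens before text reaches the labeling team. This sketch uses HMAC-SHA256 from the Python standard library, which gives deterministic, one-way pseudonymization; vault-based tokenization that can be reversed by authorized roles is a different design. The key below is hardcoded only for the example; a real key belongs in a secrets manager, and the email regex is deliberately simple.

```python
import hashlib
import hmac
import re

# Deliberately simple pattern; real PII detection needs more care.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def tokenize_value(value, key):
    """Map a sensitive value to a stable token that does not reveal it."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "TOK_" + digest[:16]


def tokenize_emails(text, key):
    """Replace every email address in the text with its token."""
    return EMAIL_PATTERN.sub(lambda m: tokenize_value(m.group(0), key), text)


DEMO_KEY = b"demo-key"  # illustration only; never hardcode real keys
print(tokenize_emails("Contact jane.doe@example.com today", DEMO_KEY))
```

Because the same input and key always give the same token, records can still be joined or deduplicated on the tokenized field.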
Best Practices for Data Labeling
⊚ Define the problem you want to solve and
use relevant labels in line with the
entities you want to predict
⊚ Analyze trends and progress of your
data labeling in real time to find biases
⊚ Do not add new entity types midway
⊚ Use short tag lists and annotations
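Keeping a short, fixed tag list and watching the label distribution can both be automated. A minimal sketch, assuming the news categories from the demo as the agreed label set and annotations stored as (text, label) pairs; a heavily skewed share is one early signal of labeling bias.

```python
from collections import Counter

# Label set agreed before labeling starts (taken from the news demo categories).
LABELS = {"Business", "Sports", "Entertainment", "Politics", "Technology"}


def label_report(annotations):
    """Reject labels outside the agreed set and return each label's share."""
    unknown = sorted({label for _, label in annotations if label not in LABELS})
    if unknown:
        raise ValueError(f"labels not in the agreed set: {unknown}")
    counts = Counter(label for _, label in annotations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(counts[label] / total, 3) for label in sorted(LABELS)}


ANNOTATIONS = [("a", "Sports"), ("b", "Sports"), ("c", "Politics"), ("d", "Business")]
print(label_report(ANNOTATIONS))
```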
Challenges
⊚ Requisite Infrastructure
⊚ Cost of Infrastructure
⊚ Data and ML pipeline
⊚ Serving models at scale for
inference
Best Practices
⊚ Use a SaaS model (pay as you go) -
reliable, scalable and secure
⊚ The right software tuned and
optimized to fit the underlying hardware
⊚ A flexible infrastructure that can be
deployed in the cloud or in an on-premise
data center to optimize performance
Technology issues and solutions
Best Practices
⊚ Train existing employees in AI
and ML
⊚ Use SaaS products with good
documentation, support and
implementation that reduce the
need for highly skilled data scientists
and resources with multiple skills.
40%
Lack of skilled talent
Source: TechRepublic
Barrier to adopting AI
⊚ Companies face a shortage of the
necessary in-house talent.
Specialized skills and knowledge
Challenges
⊚ Long implementation time
⊚ Measuring the ROI of the AI
deployment
Best Practices
⊚ AI implementation results in
increased process efficiency and
automation.
⊚ Define your own AI KPIs and analyze
the difference between measurements
taken before and after AI
deployment.
TechRepublic claims that 56% of
global CEOs expect it to take 3-5
years to see any real ROI on their
AI investment.
Speed and time to market
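Comparing KPIs before and after deployment comes down to a percent change per metric. A minimal sketch; the KPI names and values are invented for illustration and are not results from the webinar.

```python
def kpi_changes(before, after):
    """Percent change for each KPI measured both before and after deployment."""
    return {
        name: round((after[name] - before[name]) / before[name] * 100, 1)
        for name in before
        if name in after and before[name] != 0
    }


# Hypothetical measurements for a document classification workflow.
BEFORE = {"minutes_per_document": 12.0, "documents_per_day": 400}
AFTER = {"minutes_per_document": 3.0, "documents_per_day": 1000}
print(kpi_changes(BEFORE, AFTER))
```

Whether a negative change is good depends on the KPI: fewer minutes per document is an improvement, fewer documents per day is not.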
Continuous Improvement cycle: Monitor the model → Collect feedback → Process feedback → Train and evaluate → Deploy the changes
Best Practices
⊚ Perform incremental and
measurable improvements
⊚ Monitor your deployed models
and analyze inference count,
accuracy and execution time.
⊚ Check model performance in
real time
Monitoring and continuous improvement
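Tracking inference count, accuracy and execution time for a deployed model can start as a thin wrapper around the prediction call. A minimal sketch, assuming the model is any function from text to label and that ground-truth feedback arrives later; ModelMonitor and its window size are hypothetical, and accuracy is computed over only the most recent feedback.

```python
import time
from collections import deque


class ModelMonitor:
    """Track inference count, rolling accuracy and average latency."""

    def __init__(self, window=100):
        self.inference_count = 0
        self.outcomes = deque(maxlen=window)   # 1 if prediction matched feedback, else 0
        self.latencies = deque(maxlen=window)  # seconds per prediction

    def predict(self, model, text):
        start = time.perf_counter()
        label = model(text)
        self.latencies.append(time.perf_counter() - start)
        self.inference_count += 1
        return label

    def record_feedback(self, predicted, actual):
        self.outcomes.append(1 if predicted == actual else 0)

    def summary(self):
        accuracy = sum(self.outcomes) / len(self.outcomes) if self.outcomes else None
        latency = sum(self.latencies) / len(self.latencies) if self.latencies else None
        return {
            "inference_count": self.inference_count,
            "rolling_accuracy": accuracy,
            "avg_latency_s": latency,
        }


monitor = ModelMonitor(window=3)
always_sports = lambda text: "Sports"  # stand-in for a real model
for text, actual in [("a", "Sports"), ("b", "Politics"), ("c", "Sports"), ("d", "Sports")]:
    monitor.record_feedback(monitor.predict(always_sports, text), actual)
print(monitor.summary())
```

A falling rolling accuracy or rising latency is the trigger to collect feedback and retrain.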
AI Project Management
More Challenges and
Concerns
⊚ Project Cost
⊚ Return on Investment
⊚ On-demand scalability
⊚ Iterative corrections in
AI projects
Source: AI for People and Business: A Framework for Better Human Experiences and Business Success
The TCPR Model: Time, Cost, Performance and Requirements, built around Data
⊚ Free 1 month Trial + POC
⊚ Complimentary 30 min consultation
⊚ AI Implementation Playbook
www.skyl.ai contact@skyl.ai
Special offer for you...
Questions?
85 Broad Street, New York, NY, 10004
+1 718 300 2104, +1 646 202 9343
contact@skyl.ai
We hope to hear from you soon
Thank you for joining!

Editor's Notes

  • #2: Hello everyone and welcome. Thank you for joining today’s webinar on How to classify documents automatically using NLP. My name is Edwin Martinez and I’ll be your host today. First off, I’d like to introduce 3 speakers for today’s webinar, who are experts in the field of AI.
  • #3: First we have Nisha Shoukath. Nisha is a technology entrepreneur with a background in investment banking. She has co-founded two successful technology startups and has worked with a wide variety of global organizations across different industries. She has an active presence in the AI community and helps enterprises define AI strategies and AI roadmaps. Welcome, Nisha!
  • #4: Next we have Shruti Tanwar. Shruti is a data science expert and a veteran in building SaaS products using Machine Learning and AI. Her expertise includes Deep Learning and Data Analysis, as well as full-stack development and building tech products in fields such as ecommerce, travel, and healthcare. Welcome, Shruti!
  • #5: Next we have Mohit Juneja. Mohit is a Solutions Architect and technology enthusiast with over 13 years of experience in the IT and services industry. He leads cutting-edge solutions for businesses using Machine Learning and AI. He’s an expert in Architecture design, Data Engineering, and Deep Learning. Welcome, Mohit!
  • #6: Finally, we have Bikash Sharma, joining us as a panelist. Bikash is a CTO and Software Architect with 15 years of experience leading innovative software projects and solutions. He co-founded Skyl.ai, bringing his expert knowledge of AI and Machine Learning. Welcome, Bikash!
  • #7: Before we begin, I’d like to briefly go over some Zoom features that are relevant to us. All participants in the webinar will be muted to avoid interruptions during the session. You can submit questions in the Zoom Questions chat window in the control panel, located at the bottom of the screen, and we’ll address them during the Q&A session. The recording of the webinar will also be emailed to you afterwards, in case you missed any talking points or wish to view it again. That’s all for the introduction, so let’s bring in our first speaker, Nisha, for more on this topic.
  • #10: Exploring (curious about it); Planning (creating an AI/ML strategy); Experimenting (building proofs of concept); Scaling up (some departments are using it); In production (using it in product features); Transforming (AI/ML-driven business).
  • #12: One of the first studies of sentiment in Twitter data examined public perception of Obama’s performance as President. Another example is exploring how sentiment about the TV series “Game of Thrones” varied over time: the unpredictable episode “The Rains of Castamere” triggered a flood of negative tweets and a sharp spike in negative sentiment.
  • #13: So, which one is better? Should you analyze your documents as a whole or break them into smaller units, such as sentences? Unfortunately, there is no single right answer; your choice will depend on your data and objectives.
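To make the whole-document versus smaller-units trade-off in note #13 concrete, here is a minimal sketch assuming a toy word-list scorer. The `POSITIVE`/`NEGATIVE` sets, `score`, and the naive sentence splitter are hypothetical illustrations, not the speakers' method or a production sentiment model. It scores a review once as a whole and once per sentence, showing how a document-level score can hide mixed opinions that a sentence-level view exposes.

```python
import re

# Hypothetical toy lexicons; a real system would use a trained classifier.
POSITIVE = {"great", "love", "fast", "excellent"}
NEGATIVE = {"slow", "broken", "hate", "terrible"}


def score(text: str) -> int:
    """Return (#positive words - #negative words) for a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def split_sentences(text: str) -> list[str]:
    """Naive splitter on ., ! and ? (assumes no abbreviations like 'e.g.')."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def analyze(text: str) -> dict:
    """Score the document as a whole and each sentence separately."""
    return {
        "document": score(text),
        "sentences": [(s, score(s)) for s in split_sentences(text)],
    }


review = ("The delivery was fast and the staff were great. "
          "The app is slow and the checkout is broken.")
result = analyze(review)
print(result["document"])  # 0: the positives and negatives cancel out
for sentence, s in result["sentences"]:
    print(s, sentence)
```

At the document level this review looks neutral; per sentence, it is clearly one positive and one negative opinion, which is why the choice depends on your objectives.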
  • #14: In the past, many companies have used traditional business intelligence tools to monitor social media. However, this approach falls short, because traditional BI tools cannot perform true sentiment analysis, detect sarcasm, or learn new slang.
  • #17: For example, let’s say you work for a software company and use document classification to tag incoming support tickets. You can then define a rule so that new tickets labeled Bug are automatically routed to the technical team.
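The routing rule from note #17 can be sketched as a lookup from predicted tag to team queue. This is a minimal illustration under stated assumptions: the `ROUTES` table, queue names, `route_ticket` helper, and the 0.8 confidence threshold are all hypothetical, and the predictions are assumed to come from an upstream document classifier.

```python
# Hypothetical routing table: predicted tag -> destination queue.
ROUTES = {
    "Bug": "technical-team",
    "Billing": "finance-team",
    "Feature Request": "product-team",
}
FALLBACK_QUEUE = "manual-triage"


def route_ticket(predicted_tag: str, confidence: float, threshold: float = 0.8) -> str:
    """Pick a queue for a ticket from the classifier's predicted tag.

    Low-confidence predictions and unknown tags go to a human triage queue
    instead of being auto-routed. The 0.8 threshold is an illustrative value.
    """
    if confidence < threshold:
        return FALLBACK_QUEUE
    return ROUTES.get(predicted_tag, FALLBACK_QUEUE)


# (ticket id, predicted tag, confidence), assumed to come from a classifier.
predictions = [
    ("T-101", "Bug", 0.95),
    ("T-102", "Billing", 0.55),
    ("T-103", "Praise", 0.90),
]
for ticket_id, tag, conf in predictions:
    print(ticket_id, route_ticket(tag, conf))
```

Sending uncertain or unrecognized predictions to a human queue is one common design choice; it keeps a misclassified ticket from silently landing with the wrong team.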
  • #19: How
  • #23: Not started yet, so no challenges; Data collection; Data labeling; Data bias; Large volumes of data; Identifying the right data set to train on; Lack of knowledge of ML tools; Lack of an end-to-end platform; Lack of expertise; Choosing the right algorithms; Monitoring model performance.
  • #24: Benefit
  • #25: Data is one of the most valuable resources today’s businesses have. The more information you have about your customers, the better you can understand their interests, wants, and needs. Use of relevant data sources: draw on sources that provide consistent, accurate data relevant to the problem you want to solve. Collection mechanisms: a formal data collection process is necessary because it ensures that the data gathered are both well defined and accurate. Small sample size: a small sample does not capture the edge cases of the data distribution, so the model will not be trained on exceptional cases. Keep in mind that machine learning is a process of induction: the model can only capture what it has seen. If your training data does not include edge cases, the model will very likely not support them.
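One quick way to spot the small-sample problem from note #25 is to check how often each label appears in the training data. The sketch below is a minimal illustration assuming labels are available as a plain list; the `underrepresented_labels` helper, the 5% threshold, and the news labels are hypothetical. Note that it can only flag rare labels that appear at all; edge cases missing entirely from the data will not show up in any count.

```python
from collections import Counter


def underrepresented_labels(labels: list[str], min_share: float = 0.05) -> dict[str, float]:
    """Return labels whose share of the dataset falls below min_share.

    min_share is an illustrative threshold, not a recommended value.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: count / total
        for label, count in counts.items()
        if count / total < min_share
    }


# Hypothetical label column from a news-classification training set.
labels = ["sports"] * 50 + ["politics"] * 45 + ["science"] * 3 + ["weather"] * 2
print(underrepresented_labels(labels))
```

Labels flagged here are candidates for collecting more examples before training.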
  • #26: These best practices can be achieved through data cleaning. Perform a detailed data analysis at the initial phase to recognize which kinds of irregularities and errors must be removed. In addition to a manual inspection of the data or data samples, analysis programs are often needed to gather metadata about the data sources and identify data quality issues. Don’t let bad data or records go unresolved: remove duplicates and fill in missing data. For missing data, flag and fill the values, marking each affected observation with an indicator variable for missingness. Incorrect or inconsistent data leads to false conclusions, so how well you clean and understand the data has a high impact on the quality of the results.
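The cleaning steps in note #26 (remove duplicates, then flag and fill missing values with a missingness indicator) can be sketched with the standard library alone. This is a minimal illustration, assuming records are plain dicts with hashable values; the `clean` helper, the `length` field, and median imputation are hypothetical choices, and other fill strategies may suit your data better.

```python
from statistics import median


def clean(records: list[dict], field: str) -> list[dict]:
    """Drop exact duplicate records, then flag and fill missing values in `field`.

    Missing values are filled with the median of the observed values, and a
    `<field>_missing` indicator column records which rows were imputed.
    """
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))  # copy so the input is not mutated

    observed = [r[field] for r in unique if r.get(field) is not None]
    fill_value = median(observed) if observed else None
    for r in unique:
        missing = r.get(field) is None
        r[f"{field}_missing"] = int(missing)
        if missing:
            r[field] = fill_value
    return unique


rows = [
    {"id": 1, "text": "Stocks rally", "length": 12},
    {"id": 1, "text": "Stocks rally", "length": 12},  # exact duplicate
    {"id": 2, "text": "Rain expected", "length": None},
    {"id": 3, "text": "Team wins final", "length": 15},
]
cleaned = clean(rows, "length")
for row in cleaned:
    print(row)
```

Keeping the indicator column lets a model (or an analyst) distinguish real values from imputed ones later.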
  • #27: Encrypted data sources: All data sources are encrypted, giving users an additional layer of security and making sure your data stays safe and protected. Access-controlled flow: Defined and controlled access flows with different organizational roles, such as business owner, project lead, and collaborators, allow selective restriction, so you have full command over who can view or use resources in your ML projects.
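The role-based access flow in note #27 can be sketched as a permission matrix checked on every action. This is a minimal, deny-by-default illustration; the three roles come from the note, but the `PERMISSIONS` matrix, action names, and `can` helper are hypothetical and do not describe Skyl.ai's actual access model.

```python
from enum import Enum


class Role(Enum):
    BUSINESS_OWNER = "business_owner"
    PROJECT_LEAD = "project_lead"
    COLLABORATOR = "collaborator"


# Hypothetical permission matrix; real platforms define their own actions.
PERMISSIONS = {
    Role.BUSINESS_OWNER: {"view_dataset", "view_model", "approve_deployment"},
    Role.PROJECT_LEAD: {"view_dataset", "edit_dataset", "view_model",
                        "train_model", "deploy_model"},
    Role.COLLABORATOR: {"view_dataset", "label_data"},
}


def can(role: Role, action: str) -> bool:
    """Return True if `role` may perform `action`; unknown actions are denied."""
    return action in PERMISSIONS.get(role, set())


print(can(Role.COLLABORATOR, "label_data"))
print(can(Role.COLLABORATOR, "deploy_model"))
```

Denying anything not explicitly granted keeps a newly added action locked down until someone deliberately assigns it to a role.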
  • #28: Adding tags midway: For example, the set of tags for a pizza chatbot might start with “Size,” “Topping,” and “Drink” before someone realizes that a “Side Dish” tag is also needed to capture Garlic Bread and Chicken Wings. Simply adding these tags and continuing work on the documents that haven’t been labeled yet poses a danger to the project. The new tags will be missing from all documents annotated before they were added. This means your test set will be wrong for those tags, and your training data won’t contain them, leading to a model that won’t capture them. <https://towardsdatascience.com/four-mistakes-you-make-when-labeling-data-7e431c4438a2> In an annotation process, increasing the number of choices the annotator needs to make slows them down and leads to poor data quality.
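One way to guard against the problem in note #28 is to version the tag schema and record which version each document was labeled under, so documents annotated before a new tag was added can be found and re-reviewed. The sketch below is a minimal illustration of that idea; `TagSchema`, `AnnotatedDoc`, `needs_relabel`, `added_tags`, and the sample utterances are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TagSchema:
    version: int
    tags: set[str]


@dataclass
class AnnotatedDoc:
    doc_id: str
    schema_version: int  # schema version in force when the doc was labeled
    labels: set[str] = field(default_factory=set)


def added_tags(old: TagSchema, new: TagSchema) -> set[str]:
    """Tags introduced by the new schema, which older annotations cannot contain."""
    return new.tags - old.tags


def needs_relabel(docs: list[AnnotatedDoc], current: TagSchema) -> list[str]:
    """Return IDs of docs labeled under an older schema version."""
    return [d.doc_id for d in docs if d.schema_version < current.version]


v1 = TagSchema(version=1, tags={"Size", "Topping", "Drink"})
v2 = TagSchema(version=2, tags={"Size", "Topping", "Drink", "Side Dish"})
docs = [
    AnnotatedDoc("utt-1", 1, {"Size", "Topping"}),
    AnnotatedDoc("utt-2", 1, {"Drink"}),
    AnnotatedDoc("utt-3", 2, {"Side Dish"}),
]
print(added_tags(v1, v2))
print(needs_relabel(docs, v2))
```

Re-reviewing the flagged documents only for the added tags keeps the extra annotation work small, while still giving both the training and test sets consistent coverage of the new tags.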
  • #29: Requisite infrastructure: When launching a machine learning initiative, organizations can easily underestimate the infrastructure resources they need. Machine learning can have substantial infrastructure requirements, especially for image, video, and audio processing. Cost of infrastructure: Building scalable infrastructure to train and deploy machine learning models can be expensive and difficult to maintain. A cloud approach allows experimentation with machine learning at scale without the overhead of acquiring, configuring, and deploying physical hardware. Data & ML pipeline: AI, machine learning, and deep learning solutions require high computation speeds. Models at scale for inference: Deploying scalable infrastructure to support machine learning can be expensive and difficult to maintain. Serving models at scale is far more tedious and harder to maintain than a single-server deployment or a developer’s environment, which rarely reflects production conditions.
  • #30: Skills challenges: Choosing the right algorithm (classical ML vs. deep learning). AI product management: dealing with the cold start problem, managing data labeling projects, keeping the project transparent, and keeping the model up to date. Adopting AI technologies requires specialists such as data scientists, data engineers, infrastructure engineers, and other SMEs (subject matter experts).
  • #31: Even with long implementation times, AI has the potential to cut expenses. TechRepublic claims that 56% of global CEOs expect it to take 3 to 5 years to see any real ROI on their AI investment. Machine automation produces quality products faster and more efficiently, while providing critical information that helps managers make better-informed business decisions.
  • #32: Making continuous improvement part of company culture is an excellent and cost-effective approach to tackling an organization’s most difficult challenges. When supported by improvement technology, results can be achieved quickly and success can be sustained over time.
  • #33: On-demand scalability: The principle that it’s better to have a working prototype of a smaller product than an unfinished large one still holds for machine learning products. New ML MVPs should be prioritized based on speed of delivery and value to the company. If you can deliver products quickly, even smaller ones, they can be a good, quick win for the whole team, so prioritize them first. Organizations need to keep in mind that machine learning is an iterative process, and models may be modified over time to support changing requirements. TCPR model: The TCPR model represents an indeterminate system, one in which more than one solution exists. Notice that the TCPR model rests on a foundation of data. This is critical: there’s no point discussing the four components of TCPR without first identifying what data sources and fields (aka attributes or features) are available. Link: TCPR
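The MVP prioritization advice in note #33 (rank by value to the company and speed of delivery) can be sketched as a simple value-per-week ranking. This is a minimal illustration under loud assumptions: the `Candidate` fields, the value/weeks ratio as the scoring rule, and the sample backlog are hypothetical, and real prioritization usually weighs more factors than these two.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    value: float            # estimated business value (arbitrary units, an assumption)
    weeks_to_deliver: float


def prioritize(candidates: list[Candidate]) -> list[str]:
    """Order candidate ML MVPs by estimated value per week of delivery, highest first."""
    ranked = sorted(candidates, key=lambda c: c.value / c.weeks_to_deliver, reverse=True)
    return [c.name for c in ranked]


backlog = [
    Candidate("news category classifier", value=8, weeks_to_deliver=2),
    Candidate("full document understanding suite", value=20, weeks_to_deliver=16),
    Candidate("support ticket router", value=6, weeks_to_deliver=3),
]
print(prioritize(backlog))
```

Under this rule, the largest project ranks last despite having the highest total value, which matches the note’s preference for quick wins first.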
  • #35: Thank you Nisha, Mohit, and Shruti for the wonderful presentation and demo. As mentioned earlier, the recording of the webinar will be emailed to you afterwards. [pause] Before we get to the Q&A, I want to mention some of the offers Skyl has for those of you who are curious about incorporating Machine Learning into your business. Skyl offers a free 1-month trial plus a Proof of Concept. You’ll be able to interact with real data on the screen, just like we showed in the demo, and experience the process of going from collecting and labeling the data… all the way to deploying a model! Skyl also offers a complimentary 30-minute consultation and an AI Implementation Playbook to go along with it. This is a great opportunity to see how Skyl can provide Machine Learning solutions to your challenges. If you’re interested in finding out more, please visit the skyl.ai website or send an email directly to [email protected].
  • #36: Alright, now it’s Q&A time! As a reminder, if you have any questions, use the question box in your control panel, located at the bottom of your Zoom screen. We’ll try to answer as many questions as possible in the time we have left. So let’s answer some questions. Sample questions for Shruti: (James) How do I know if my model performance is going down, and how do I fix it? (Anonymous) How can I know the fairness of a model in Skyl? (Julie) If I build a lot of models, how do I handle model deployment in that case? How do you avoid creating a biased model, and if you detect one, how do you rebuild it? Sample question for Nisha: (Anonymous) How can Skyl help me with my data labeling needs if I have data privacy issues? OK, that’s all the time we have for questions today, but feel free to contact us with your specific questions and we’ll make sure to get them answered.
  • #37: All right, we have reached the end of the webinar. We hope you enjoyed it. We have many more webinars coming up on different machine learning topics and how they can be implemented in different businesses and industries, so don’t miss out, and make sure you sign up for upcoming webinars as well. Thank you for joining, and I hope you have a wonderful day.