#TOSMAC
Toronto SMAC Meetup – Welcome!
An Intro to Text Analytics on Big Data with a use case
Toronto SMAC Team
© 2014 IBM Corporation
Lucas Silva, Felipe Mosquetta, Marcos de Mello
Twitter's numbers
As you know:
- 500 million Tweets are sent per day.
- Twitter supports 35+ languages.
- 255 million monthly active users.
A huge amount of data!
Overview
Section1 Section2 Section3 Section4 Section5
Let’s get started!
Input data
Section2
Demo
Next section
Extractor: used to extract structured information from unstructured and semi-structured data.
AQL (Annotation Query Language): a rule language with a familiar SQL-like syntax.
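As a rough illustration of what an extractor produces, the Python sketch below turns unstructured tweet text into structured rows, one per annotation, each holding a type, offsets and the matched text. It is not AQL and not the extractor from the talk: the `Annotation` type, the `RULES` table, the `extract` function and the sample tweet are all hypothetical.

```python
import re
from dataclasses import dataclass


# Hypothetical record type: one structured row per extracted annotation.
@dataclass(frozen=True)
class Annotation:
    kind: str   # annotation type, e.g. "hashtag" or "mention"
    begin: int  # start offset in the source text (inclusive)
    end: int    # end offset in the source text (exclusive)
    text: str   # the matched text


# Hypothetical rules: annotation type -> regular expression.
RULES = {
    "hashtag": re.compile(r"#\w+"),
    "mention": re.compile(r"@\w+"),
}


def extract(text: str) -> list[Annotation]:
    """Run every rule over the text and return the rows sorted by position."""
    rows = [
        Annotation(kind, m.start(), m.end(), m.group())
        for kind, pattern in RULES.items()
        for m in pattern.finditer(text)
    ]
    return sorted(rows, key=lambda a: (a.begin, a.end))


if __name__ == "__main__":
    tweet = "Great talk by @ibm at the #TOSMAC meetup on #BigData"
    for row in extract(tweet):
        print(row.kind, row.begin, row.end, row.text)
```

Because each row keeps its offsets, later steps can filter or merge annotations without re-reading the original text.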
Profiler: used to troubleshoot performance problems.
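A profiler answers the question of which rules cost the most time. The sketch below assumes an extractor made of independent regular-expression rules and simply times each one over a batch of documents with `time.perf_counter`; the rule names, the `profile_rules` function and the sample documents are hypothetical, and this does not reproduce the AQL profiler's actual output.

```python
import re
import time

# Hypothetical extraction rules to profile; a real extractor would have many more.
RULES = {
    "number": re.compile(r"\d+(?:\.\d+)?"),
    "word": re.compile(r"[A-Za-z]+"),
}


def profile_rules(documents, rules=RULES, repeat=3):
    """Return seconds spent in each rule over all documents (best of `repeat` runs)."""
    timings = {}
    for name, pattern in rules.items():
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            for doc in documents:
                # Consume every match so the full cost of the rule is measured.
                for _match in pattern.finditer(doc):
                    pass
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


if __name__ == "__main__":
    docs = ["Raised $7.5 million in 2014"] * 10_000
    # Print the slowest rules first.
    ranked = sorted(profile_rules(docs).items(), key=lambda kv: kv[1], reverse=True)
    for name, seconds in ranked:
        print(f"{name}: {seconds:.4f}s")
```

Taking the best of several runs reduces noise from other processes, so the ranking reflects the rules rather than the machine.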
Main concepts
Types of extraction specifications:
- Dictionaries
- Regular expressions
- Part of speech
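The dictionary approach can be sketched in Python as a whole-word, case-insensitive lookup of a term list. The `MAGNITUDES` list and the `dictionary_matches` function are hypothetical, and the sketch is not how AQL implements dictionaries; part-of-speech matching is not covered here because it needs a tagger.

```python
import re

# Hypothetical dictionary of terms to look for (magnitude words used in money amounts).
MAGNITUDES = ["thousand", "million", "billion"]


def dictionary_matches(text, terms):
    """Return (begin, end, text) for every whole-word, case-insensitive occurrence of a term."""
    # Longest terms first so that longer entries win over entries that are their prefixes.
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(dictionary_matches("Raised $7.5 Million, then $4 thousand more", MAGNITUDES))
```

The word boundaries (`\b`) keep "million" from matching inside a longer word such as "millionaire".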
Regular expression example, matching numbers: 7.5, 4, 13
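A pattern that finds the numbers 7.5, 4 and 13 might look like the following Python sketch. The `NUMBER` pattern and the `find_numbers` function are assumptions (digits with an optional decimal part); the pattern ignores signs, thousands separators and exponents.

```python
import re

# Assumed pattern: an integer part, optionally followed by a decimal point and digits.
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def find_numbers(text):
    """Return every number-shaped substring of text, in order of appearance."""
    return NUMBER.findall(text)


if __name__ == "__main__":
    print(find_numbers("Sales rose 7.5 percent in 4 regions across 13 months"))
    # ['7.5', '4', '13']
```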
Demo
AQL Guidelines
Basic feature AQL statements
- Develop the core building blocks of the extractor.
Candidate generation AQL statements
- Combine basic feature AQL statements.
$7.5 million
$4 thousand
$ 7.5 million
$7.5 million
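Candidate generation can be sketched by composing basic features into one larger rule. In this Python sketch (an illustration, not the AQL from the talk) the `CURRENCY`, `NUMBER` and `MAGNITUDE` patterns and the `money_candidates` function are assumptions, chosen so that the rule finds the candidates $7.5 million, $4 thousand and $ 7.5 million listed above.

```python
import re

# Basic features (hypothetical names): each is a small, reusable pattern.
CURRENCY = r"\$"
NUMBER = r"\d+(?:\.\d+)?"
MAGNITUDE = r"(?:thousand|million|billion)"

# Candidate rule: currency symbol, optional whitespace, number, whitespace, magnitude word.
MONEY = re.compile(rf"{CURRENCY}\s*{NUMBER}\s+{MAGNITUDE}\b", re.IGNORECASE)


def money_candidates(text):
    """Return (begin, end, text) for every money-amount candidate in text."""
    return [(m.start(), m.end(), m.group()) for m in MONEY.finditer(text)]


if __name__ == "__main__":
    text = "They raised $7.5 million, then $4 thousand, and later $ 7.5 million again."
    for begin, end, span in money_candidates(text):
        print(begin, end, span)
```

Keeping each feature as its own pattern means a fix to, say, `NUMBER` automatically improves every candidate rule built from it.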
Filter and consolidate AQL statements
- Refine results.
- Remove invalid annotations
- Resolve overlap between annotations.
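Filtering and consolidation can be sketched as two small functions over `(begin, end, text)` spans: one drops annotations that fail a validity check, the other resolves overlaps by keeping the longest span. The keep-the-longest policy, the `filter_spans` and `consolidate` functions, the validity check and the sample spans are assumptions for illustration; AQL offers its own consolidation policies, which may behave differently.

```python
# Each annotation is a (begin, end, text) span; the sample spans below are hypothetical.


def filter_spans(spans, is_valid):
    """Drop annotations that fail a validity check."""
    return [s for s in spans if is_valid(s)]


def consolidate(spans):
    """Resolve overlaps: keep the longest span and drop any span overlapping a kept one."""
    kept = []
    # Longest first; ties broken by the earlier start offset.
    for span in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        begin, end, _ = span
        if all(end <= kb or begin >= ke for kb, ke, _ in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s[0])


if __name__ == "__main__":
    spans = [
        (0, 12, "$7.5 million"),  # full money amount
        (1, 4, "7.5"),            # number contained in the amount
        (20, 22, "$0"),           # hypothetical invalid amount
    ]
    valid = filter_spans(spans, lambda s: s[2] != "$0")
    print(consolidate(valid))  # [(0, 12, '$7.5 million')]
```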
Demo
Conclusion
Checkpoint
What we have done
Section1 Section2 Section3
What are we going to do?
Section4 Section5
Demo
Also using R
1.75 0.32
Demo
So what?
Companies
Exporting to you
Thank you!
Let's network!