An Intro to Text Analytics on Big Data with a use case

The document is an introduction to text analytics on big data, highlighting the vast volume of tweets and the potential of extracting structured information from unstructured data. It discusses extraction techniques, including dictionaries and regular expressions, as well as the use of Annotation Query Language (AQL) for developing and refining text extraction features. The content emphasizes the importance of text analytics in handling large data sets and troubleshooting performance issues.

#TOSMAC
Toronto SMAC Meetup – Welcome!
An Intro to Text Analytics on Big Data with a use case

Toronto SMAC Team
Lucas Silva, Felipe Mosquetta, Marcos de Mello
© 2014 IBM Corporation

Twitter's numbers
As you know:
- 500 million Tweets are sent per day.
- Twitter supports 35+ languages.
- Twitter has 255 million monthly active users.
Huge amount of data!

Overview
Section1 Section2 Section3 Section4 Section5

Let’s get started!

Input data

Section2

Next section
Extractor: used to extract structured information from unstructured and semi-structured data.
AQL: Annotation Query Language. Rule language with familiar SQL-like syntax.

Profiler: used for troubleshooting performance problems.

Main concepts
Types of extraction specifications:
- Dictionaries
- Regular expressions
- Part of speech
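
As an illustration of the first specification type, the snippet below is a minimal Python sketch of dictionary-based extraction. It is not AQL and not the presenters' code. The dictionary entries, the function name `dictionary_matches`, and the sample sentence are assumptions made up for this example.

```python
import re

# Hypothetical dictionary of unit words; the slides do not list the actual entries.
UNIT_DICTIONARY = ["thousand", "million", "billion"]


def dictionary_matches(text, entries):
    """Return (start, end, matched_text) for every dictionary entry found in text."""
    # Try longer entries first so a longer phrase wins over one of its prefixes.
    ordered = sorted(entries, key=len, reverse=True)
    alternatives = "|".join(re.escape(entry) for entry in ordered)
    # \b keeps the match on whole words; matching is case-insensitive.
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


# Assumed sample sentence, used only for illustration.
sample = "Revenue grew to $7.5 million, up from $4 thousand."
for start, end, word in dictionary_matches(sample, UNIT_DICTIONARY):
    print(start, end, word)
```

Each match keeps its character offsets, so later steps can combine it with neighbouring matches.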

Main concepts
Types of extraction specifications:
- Dictionaries
- Regular expressions
- Part of speech
numbers:
7.5
4
13
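
The number examples above can be matched with a single regular expression. The following is a small Python sketch rather than AQL. The pattern, the name `extract_numbers`, and the sample sentence are assumptions for illustration only.

```python
import re

# Assumed pattern: one or more digits with an optional decimal part.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text):
    """Return (start, end, matched_text) for every number found in text."""
    return [(m.start(), m.end(), m.group(0)) for m in NUMBER_PATTERN.finditer(text)]


# Assumed sample sentence containing the three numbers from the slide.
sample = "Scores of 7.5, 4 and 13 were reported."
print([value for _, _, value in extract_numbers(sample)])
```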

AQL Guidelines
Basic feature AQL statements
- Develop the core building blocks of the extractor.

AQL Guidelines
Candidate generation AQL statements
- Combine basic feature AQL statements.

Candidate generation AQL statements
$7.5 million
$4 thousand
$ 7.5 million
$7.5 million
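
One way to picture candidate generation is to join basic feature matches that sit next to each other. The Python sketch below is an assumption-laden illustration, not AQL and not the presenters' extractor. The helper names `find_spans` and `follows`, the three patterns, the one-character gap limit, and the sample text are all hypothetical.

```python
import re


def find_spans(pattern, text):
    """Basic feature: all (start, end) spans where the pattern matches."""
    return [m.span() for m in re.finditer(pattern, text)]


def follows(left_spans, right_spans, text, max_gap=1):
    """Candidate generation: join a left span with a right span that starts
    at most max_gap whitespace characters after the left span ends."""
    combined = []
    for l_start, l_end in left_spans:
        for r_start, r_end in right_spans:
            gap = r_start - l_end
            if 0 <= gap <= max_gap and text[l_end:r_start].strip() == "":
                combined.append((l_start, r_end))
    return combined


# Assumed sample text built from the amounts shown on the slide.
text = "Deals: $7.5 million, $4 thousand and $ 7.5 million."

# Basic features (patterns are assumptions for this sketch).
symbols = find_spans(r"\$", text)
numbers = find_spans(r"\d+(?:\.\d+)?", text)
units = find_spans(r"\b(?:thousand|million|billion)\b", text)

# Candidates: currency symbol, then number, then unit.
amounts = follows(follows(symbols, numbers, text), units, text)
candidates = [text[start:end] for start, end in amounts]
print(candidates)
```

Allowing a gap of up to one whitespace character is what lets both "$7.5 million" and "$ 7.5 million" become candidates.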

AQL Guidelines
Filter and consolidate AQL statements
- Refine results
- Remove invalid annotations
- Resolve overlap between annotations.
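
The filter and consolidate step can be sketched in Python as follows. This is only an illustration under stated assumptions, not AQL. The validity rule (a candidate must start with a currency symbol), the overlap policy (drop any span contained in another span), the function names, and the candidate spans are all hypothetical.

```python
def filter_spans(spans, text, is_valid):
    """Remove invalid annotations: keep spans whose text passes is_valid."""
    return [(start, end) for start, end in spans if is_valid(text[start:end])]


def remove_contained(spans):
    """Resolve overlap: drop any span that lies entirely inside another span."""
    unique = sorted(set(spans))
    kept = []
    for span in unique:
        contained = any(
            other != span and other[0] <= span[0] and span[1] <= other[1]
            for other in unique
        )
        if not contained:
            kept.append(span)
    return kept


# Assumed input: overlapping candidate spans over one short text.
text = "$7.5 million"
candidate_spans = [(0, 4), (1, 4), (0, 12), (5, 12)]  # "$7.5", "7.5", "$7.5 million", "million"

# Hypothetical validity rule: an amount must start with a currency symbol.
valid = filter_spans(candidate_spans, text, lambda s: s.startswith("$"))
final = [text[start:end] for start, end in remove_contained(valid)]
print(final)
```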

Conclusion

Check point

What we have done
Section1 Section2 Section3

What are we going to do?
Section4 Section5

Also using R
1.75 0.32

So what?

Companies

Exporting to you

Thank you!
Let's network!
