Alyona Medelyan (Pingar)
@zelandiya

THE NEXT-GENERATION SHAREPOINT:
POWERED BY TEXT ANALYTICS
AGENDA
• Information tasks
• Text analytics
• APIs
• Demos
• Conclusions
Information tasks
What do they cost us?
How does SharePoint help?
Avg. hours per week
[Bar chart: activity labels not preserved. Bar values: 14.5, 13.3, 9.6, 9.5, 8.8, 8.3, 6.8, 6.7, 5.6, 5.6, 4.3, 4.2, 1]
= $37K / year / person
Source: IDC, Hidden Cost of Information (2005)
SHAREPOINT SAVES TIME
• Interact with SP from Outlook
• Create docs collaboratively
• Customize search configuration
• Use sites, sets & libraries
• Define Managed Metadata
• Configure forms
• Design Workflow
Text Analytics
What is it and how does it work?
What tasks does it solve?
WHAT IS TEXT ANALYTICS?
unstructured data

Linguistics                      Search
Statistics                       Data Extraction
Text Processing                  Document Organization
Machine Learning                 Business Intelligence
Natural Language Processing      Opinion Mining
Text Mining
TEXT ANALYTICS SAVES MORE TIME
… automatically:
• Compose search reports
• Extract entities
• Mine opinions & sentiment
• Cluster search results
• Redact
• Summarize
• Generate metadata
• Fill databases
• Profanity check
Text Analytics Software
What companies offer text analytics?
What are open source tools like?
TEXT ANALYTICS: GLOBAL PERSPECTIVE

User adoption grew by 25% in 2010,
creating an $835 million market, because:

• Unstructured data grows (e.g. social) → Text analytics!
• Text analytics is central to effective information access
• Many successes in NLP: IBM Watson, Wolfram Alpha

Full report by Seth Grimes:
http://altaplana.com/TA2011
APPLICATIONS OF TEXT ANALYTICS
Search & info access              39%
Customer experience management    39%
Brand management                  39%
Research                          36%
Competitive intelligence          33%
Customer service                  26%
E-discovery                       15%
Life sciences                     15%
Product design                    15%
Online commerce                   11%
Finance                           10%
Other                              9%
Content management                 8%
Insurance & fraud                  8%
Military intelligence              7%
Law enforcement                    6%

Source: http://altaplana.com/TA2011
SEARCH & INFO ACCESS
METADATA EXTRACTION

Document → Metadata

Easy to extract:
File type, name & location,
creation & modification date,
authors

Difficult to extract:
Keywords,
people & companies mentioned,
suppliers & addresses mentioned
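
Most of the "easy" fields come straight from the file system. A minimal Python sketch, assuming a local file; the function name easy_metadata is ours, and authors are left out because they usually live inside the document format itself:

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def easy_metadata(path):
    """Collect the metadata a file system gives away for free."""
    p = Path(path)
    stat = p.stat()
    file_type, _ = mimetypes.guess_type(p.name)  # guessed from the extension only
    return {
        "name": p.name,
        "location": str(p.parent.resolve()),
        "file_type": file_type,
        # st_ctime is creation time on Windows but metadata-change time on Unix
        "created_or_changed": datetime.fromtimestamp(stat.st_ctime, tz=timezone.utc),
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
    }


# Demo on a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "memo.txt")
    Path(sample).write_text("Hi All, ...")
    meta = easy_metadata(sample)
    print(meta["name"], meta["file_type"])
```

None of the "difficult" fields can be read this way: they require looking at the text itself.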
SEARCH & INFO ACCESS
KEYWORD EXTRACTION

Document → Candidates → Keywords

           Hi All,
           As of today, MetaStock has several new functions.
           The most important new feature is the ability to
           display forward heat rate charts.
           Also, notice that the interface looks different -- this
           reflects and accommodates the new features.
           If you have any questions regarding this new
           version of MetaStock, please contact Bella Santuri.
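
One common way to produce candidates, sketched in Python: take word sequences of up to three words that do not cross punctuation and do not start or end with a stopword. The stopword list and the three-word limit are our assumptions, not necessarily what any particular product does:

```python
import re

EMAIL = (
    "Hi All, As of today, MetaStock has several new functions. "
    "The most important new feature is the ability to display forward heat rate charts. "
    "Also, notice that the interface looks different -- this reflects and "
    "accommodates the new features. If you have any questions regarding this "
    "new version of MetaStock, please contact Bella Santuri."
)

# Small illustrative stopword list (an assumption, far from complete)
STOPWORDS = {
    "a", "all", "also", "and", "any", "as", "has", "have", "hi", "if", "is",
    "of", "please", "that", "the", "this", "to", "you",
}


def candidates(text, max_len=3):
    """Return lowercased n-grams that neither start nor end with a stopword."""
    found = set()
    # Split on punctuation first so that no candidate crosses a phrase boundary
    for chunk in re.split(r"[.,;:!?()]|--", text):
        words = re.findall(r"[A-Za-z]+", chunk)
        for i in range(len(words)):
            for j in range(i + 1, min(i + max_len, len(words)) + 1):
                gram = words[i:j]
                if gram[0].lower() in STOPWORDS or gram[-1].lower() in STOPWORDS:
                    continue
                found.add(" ".join(gram).lower())
    return found


cands = candidates(EMAIL)
print(sorted(cands))
```

The result is deliberately noisy: it contains good phrases such as "heat rate charts" next to weak ones, which is why candidates still need to be filtered.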
SEARCH & INFO ACCESS
KEYWORD EXTRACTION

Document → Candidates → Properties → Keywords

Properties: Frequency, Position, Corpus stats, Relatedness

           Hi All,
           As of today, MetaStock has several new functions.
           The most important new feature is the ability to
           display forward heat rate charts.
           Also, notice that the interface looks different -- this
           reflects and accommodates the new features.
           If you have any questions regarding this new
           version of MetaStock, please contact Bella Santuri.
SEARCH & INFO ACCESS
KEYWORD EXTRACTION

Document → Candidates → Properties → Scoring → Keywords

Scoring: Heuristic scoring, Machine learning

           Hi All,
           As of today, MetaStock has several new functions.
           The most important new feature is the ability to
           display forward heat rate charts.
           Also, notice that the interface looks different -- this
           reflects and accommodates the new features.
           If you have any questions regarding this new
           version of MetaStock, please contact Bella Santuri.
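
A rough Python sketch of heuristic scoring on single words. It uses two of the properties (frequency and position of first occurrence) and a hand-made set of common words as a stand-in for corpus statistics. The word set and the scoring formula are illustrative assumptions:

```python
import re

EMAIL = (
    "Hi All, As of today, MetaStock has several new functions. "
    "The most important new feature is the ability to display forward heat rate charts. "
    "Also, notice that the interface looks different -- this reflects and "
    "accommodates the new features. If you have any questions regarding this "
    "new version of MetaStock, please contact Bella Santuri."
)

# Stand-in for corpus statistics: words assumed to be common in any document
COMMON_WORDS = {
    "a", "all", "also", "and", "any", "as", "has", "have", "hi", "if", "is",
    "most", "new", "of", "please", "several", "that", "the", "this", "to", "you",
}


def keyword_properties(text):
    """Frequency and relative first position for every non-common word."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    props = {}
    for index, word in enumerate(words):
        if word in COMMON_WORDS:
            continue
        entry = props.setdefault(
            word, {"frequency": 0, "first_position": index / len(words)}
        )
        entry["frequency"] += 1
    return props


def heuristic_score(p):
    # Frequent words that appear early in the text score highest
    return p["frequency"] * (1.0 - p["first_position"])


props = keyword_properties(EMAIL)
ranked = sorted(props, key=lambda w: (-heuristic_score(props[w]), w))
print(ranked[:3])
```

"metastock" comes out on top because it is the only content word used twice. A machine-learning scorer would learn how to weight such properties from annotated examples instead of using a fixed formula.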
SEARCH & INFO ACCESS
NAMES EXTRACTION

Document → Examples → Properties → Learning → Names

If you have any questions regarding this new version of
MetaStock, please contact Bella Santuri.

Examples:    Training data (annotations)
Properties:  NLP, Heuristics, Text mining
Learning:    Machine Learning
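
To show what a single heuristic can and cannot do, here is a small Python sketch that treats two or more consecutive capitalised words as a name. The pattern is our own illustration, not a production rule:

```python
import re

SENTENCE = (
    "If you have any questions regarding this new version of "
    "MetaStock, please contact Bella Santuri."
)

# Heuristic: two or more consecutive capitalised words look like a person name
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)+\b")


def extract_names(text):
    """Return every run of two or more capitalised words."""
    return NAME_PATTERN.findall(text)


names = extract_names(SENTENCE)
print(names)
```

The rule finds "Bella Santuri" but has no way of recognising the product name "MetaStock", and it would wrongly pick up any capitalised phrase. Closing that gap is what the training data and machine learning steps are for.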
<SEARCH + TEXT ANALYTICS> COMPANIES




 Pingar, BasisTech, AlchemyAPI, LanguageComputer, OpenCalais, Extractiv
BRAND & CUSTOMER MANAGEMENT
SENTIMENT ANALYSIS

Documents (Reviews, Tweets, Surveys) → Sentiment Analysis → Visualization, Summary

Naïve approach: Sentiment-words dictionary!

Negative    Positive
suck        fantastic
terrible    excellent
awful       awesome

BUT:
If you are reading this because it is your darling fragrance, please
wear it at home exclusively, and tape the windows shut.

No sentiment words!
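
The naïve approach fits in a few lines of Python. The word lists are the ones from the slide; the scoring rule (positive count minus negative count) is our assumption:

```python
import re

NEGATIVE = {"suck", "terrible", "awful"}
POSITIVE = {"fantastic", "excellent", "awesome"}


def dictionary_sentiment(text):
    """Positive word count minus negative word count."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


review = (
    "If you are reading this because it is your darling fragrance, "
    "please wear it at home exclusively, and tape the windows shut."
)

print(dictionary_sentiment("The battery is awful but the screen is fantastic and awesome"))  # 1
print(dictionary_sentiment(review))  # 0
```

The fragrance review scores 0, i.e. neutral, although any human reader sees it as strongly negative.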
BRAND & CUSTOMER MANAGEMENT
SENTIMENT ANALYSIS

Documents (Reviews, Tweets, Surveys) → Examples → Properties → Learning → Visualization, Summary

Training data (annotations)
Lexicon induction
Properties: Presence, Position, Part-of-Speech, Negation, Generalization
Machine Learning

Important:
• Identifying sentiment-bearing sentences
• Attaching sentiment to a topic!
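
Two of the properties, presence and negation, can be sketched in Python as follows. Words that follow a negator are marked with a NOT_ prefix until the next punctuation mark. The negator list and the prefix convention are assumptions made for this example:

```python
import re

NEGATORS = {"not", "no", "never"}
PUNCTUATION = set(".,!?;")


def negation_features(text):
    """Set of words present in the text, with negated words prefixed by NOT_."""
    tokens = re.findall(r"[a-z']+|[.,!?;]", text.lower())
    features = set()
    negated = False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False  # negation scope ends at punctuation
            continue
        if tok in NEGATORS or tok.endswith("n't"):
            negated = True
            continue
        features.add("NOT_" + tok if negated else tok)
    return features


features = negation_features("This is not good, but the price is fine.")
print(sorted(features))
```

A classifier trained on such features can learn that NOT_good behaves differently from good, which a plain word dictionary cannot.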
SENTIMENT ANALYSIS COMPANIES
Attensity
AlchemyAPI
Lexalytics
Saplo
Medallia
SAS
RESEARCH
    TEXT SUMMARIZATION
          Address      Hi All,
    Announcement       As of today, MetaStock has several new functions.
           Details     The most important new feature is the ability to
                       display forward heat rate charts.
       More details    Also, notice that the interface looks different -- this
                       reflects and accommodates the new features.
         Conclusion    If you have any questions regarding this new
                       version of MetaStock, please contact Bella Santuri.

Extractive summary:   As of today, MetaStock has several new functions.
Sentence compression: MetaStock has several new functions.
                      The new interface looks different.
Abstractive summary: MetaStock has new features and a new interface.
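
A minimal Python sketch of an extractive summary: score each sentence by the average document frequency of its content words and keep the best one. The stopword list and the scoring rule are our assumptions, and the greeting is left out for simplicity:

```python
import re
from collections import Counter

BODY = (
    "As of today, MetaStock has several new functions. "
    "The most important new feature is the ability to display forward heat rate charts. "
    "Also, notice that the interface looks different -- this reflects and "
    "accommodates the new features. If you have any questions regarding this "
    "new version of MetaStock, please contact Bella Santuri."
)

# Small illustrative stopword list (an assumption)
STOPWORDS = {
    "a", "also", "and", "any", "as", "has", "have", "if", "is",
    "of", "please", "that", "the", "this", "to", "you",
}


def content_words(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text, n_sentences=1):
    """Keep the n sentences whose content words are most frequent in the text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    chosen = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    # Emit the chosen sentences in their original order
    return " ".join(s for s in sentences if s in chosen)


summary = extractive_summary(BODY)
print(summary)
```

On this email the sketch selects the same sentence as the extractive summary shown above. Sentence compression and abstractive summaries need to rewrite text and are considerably harder.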
TEXT SUMMARIZATION COMPANIES




Lexalytics, Pingar
COMPETITIVE INTELLIGENCE:
ENTITY & ENTITY RELATION EXTRACTION




     Companies:
     OpenCalais, Extractiv, Pingar, Evri, AlchemyAPI, Zemanta
FRAUD INVESTIGATION:
NORMALIZATION OF DATES & NAMES




           Companies:
           Cicero, BasisTech
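
As a rough illustration of normalization, the Python sketch below maps several spellings of the same date to ISO 8601 and several spellings of a name to one form. The list of formats, the day-first reading of 14/07/2011 and the name rules are all assumptions; real products handle far more variation. Month names are parsed according to the current locale, assumed to be English here:

```python
from datetime import datetime

# Formats tried in order; day-first is assumed for slash-separated dates
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%B %d, %Y", "%d %b %Y")


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    cleaned = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_name(raw):
    """Turn 'SANTURI, Bella' or '  bella   santuri ' into 'Bella Santuri'."""
    cleaned = " ".join(raw.split())
    if "," in cleaned:
        last, first = [part.strip() for part in cleaned.split(",", 1)]
        cleaned = f"{first} {last}"
    return cleaned.title()


for raw in ("14 July 2011", "July 14, 2011", "14/07/2011", "14 Jul 2011"):
    print(raw, "->", normalize_date(raw))
print(normalize_name("SANTURI, Bella"))
```

Once dates and names share one form, records that refer to the same event or person can be matched, which is the point of normalization in an investigation.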
OPEN-SOURCE TOOLS
• NLTK – Apache license; book; Python & academic datasets; nltk.org
• LingPipe – commercial licenses; tutorials; coreference & Chinese segmentation; alias-i.com/lingpipe
• OpenNLP – Apache license; parsing, MaxEnt ML; incubator.apache.org/opennlp
• GATE – restricted GPL; training courses; applications & framework; gate.ac.uk
• Stanford NLP – full GPL; online docs; full library; nlp.stanford.edu
APIs
What’s an API and how does it work?
What are the advantages of the API model?
Which API is the right one for you?
API ACCESS
Developer creates an application
  • SDK
  • usage examples

      calls via a web service
      includes API authentication

API: an interface that ensures communication
  • a call is an XML message describing the request
  • a protocol specifies how XML needs to be encoded:
    SOAP or REST

ENGINE: software engine that solves a specific task
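
To make "a call is an XML message" concrete, here is a Python sketch that builds a SOAP 1.1 envelope with the standard library. The service namespace, the ExtractKeywords operation and the ApiKey and Text fields are hypothetical; a real API publishes its own names in its WSDL:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.com/textanalytics"         # hypothetical service namespace


def build_request(api_key, text):
    """Serialise one API call, including authentication, as a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}ExtractKeywords")  # hypothetical operation
    ET.SubElement(call, f"{{{SERVICE_NS}}}ApiKey").text = api_key   # hypothetical field
    ET.SubElement(call, f"{{{SERVICE_NS}}}Text").text = text        # hypothetical field
    return ET.tostring(envelope, encoding="unicode")


message = build_request("my-key", "MetaStock has several new functions.")
print(message)
```

In practice an SDK or a generated client hides this step, so the developer only calls a function and receives a result.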
REST API ACCESS FROM A BROWSER
API request
http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=madonna&context=Italian+sculptors+and+painters+of+the+renaissance+favored+the+Virgin+Mary+for+inspiration
API response
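
The same request can be assembled in code rather than typed into the address bar. The Python sketch below only builds the URL from the parameters on the slide and does not send it, since the endpoint dates from the time of the talk and may no longer respond:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://search.yahooapis.com/WebSearchService/V1/webSearch"

params = {
    "appid": "YahooDemo",  # API authentication
    "query": "madonna",
    "context": (
        "Italian sculptors and painters of the renaissance "
        "favored the Virgin Mary for inspiration"
    ),
}

# urlencode escapes the values and turns spaces into "+"
request_url = f"{BASE_URL}?{urlencode(params)}"
print(request_url)

# The query string can be decoded again on the receiving side
decoded = parse_qs(urlsplit(request_url).query)
print(decoded["query"])
```

The context parameter is what lets the engine disambiguate the query: with this context, "madonna" refers to the Virgin Mary rather than the singer.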
SOAP API ACCESS FROM VS2010
SOAP API ACCESS IN POWERSHELL




Read complete blog post “Bulk metadata extraction in SharePoint”:
http://bit.ly/powershell-migrate
API = EASY INTEGRATION & FLEXIBILITY
• Integrate into existing architecture
  via any programming language
• Improve known flaws in the current system/process
• Minimize adoption barriers within the company:
  little or no training required for staff
• Only pay for the features you need
• Flexible deployment:
   • Host API on site = Secure data exchange
   • Access the API in the cloud = Save on tech support & hardware
WHICH API IS BEST FOR YOU?
I need to take some text and get a list of the
important entities/keywords/phrases.

APIs compared:
• Y: Term Extractor
• OpenCalais
• BeliefNetworks
• OpenAmplify
• AlchemyAPI (2nd)
• Evri (1st)

Criteria:
• API restrictions
• Supported languages
• Quality of results
• Semantic links
• Synonyms/Duplicates

Blog post on API comparison:
faganm.com/blog
HOW TO CHOOSE AN API:
• Define a specific task
• Think of what features are important
• Get prepared:
  • Subscribe for API keys
  • Get SDKs
  • Learn libraries
• Find representative data
• Build a test framework
• Compare results
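
The last two steps can be as simple as the Python sketch below: compare what each API returns with keywords chosen by hand, using precision, recall and F1. The API names, their outputs and the hand-picked keywords are made up for illustration:

```python
def precision_recall_f1(extracted, gold):
    """Compare extracted keywords with hand-picked ones, ignoring case."""
    extracted = {k.lower() for k in extracted}
    gold = {k.lower() for k in gold}
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Keywords a person assigned to the sample document (hypothetical)
gold_keywords = {"MetaStock", "heat rate charts", "interface"}

# Hypothetical outputs of two candidate APIs on the same document
api_outputs = {
    "api_a": ["MetaStock", "heat rate charts", "today", "questions"],
    "api_b": ["MetaStock", "interface", "heat rate charts"],
}

results = {
    name: precision_recall_f1(output, gold_keywords)
    for name, output in api_outputs.items()
}
best = max(results, key=lambda name: results[name][2])

for name, (p, r, f) in results.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print("best:", best)
```

One document proves little. Averaging the scores over a representative set of your own documents gives a much fairer comparison.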
METADATA EXTRACTION
IN SHAREPOINT
Demo
Pingar’s add-on for SharePoint 2010
built using a text analytics API
INTEGRATING APIS
INTO SCANNING
Video
Using Fuji Xerox SmartConnect and Pingar API
to scan documents in batch into SharePoint



http://www.youtube.com/watch?v=kluVp25upag
THE NEXT-GENERATION SHAREPOINT:
POWERED BY TEXT ANALYTICS
• What can be automated?
  • Metadata extraction, Data entry, Opinion mining,
    Sanitization, Doc approval, Summarization, …

• How to integrate text analytics
  into existing SharePoint applications?
  • Easy! Via an API

• How to find the right text analytics API?
  • Review what’s available
  • Set up an experiment
  • Compare results
Thank you to all of our Sponsors


Editor's Notes

  • #2: Opening slide please include
  • #3: How many hours per week does an average computer user spend on searching? What the heck is text analytics: a 101 introduction course… How APIs work and why they are great for both business people and developers.
  • #12: What are your primary applications where text comes into play?