Semantic Search
Ready to Use?


Dr Victoria Uren
Motivation



“The classic keyword search box exerts a powerful gravitational pull.
Academics and industry researchers need to achieve the intellectual
‘escape velocity’ necessary to revolutionize search. They must invest
much more in bold strategies that can achieve natural-language
searching and answering, rather than providing the electronic
equivalent of the index at the back of a reference book.”

Oren Etzioni, “Search needs a shake-up”, Nature, 4 Aug. 2011, vol. 476,
pp. 25-26


“A little semantics goes a long way”
Jim Hendler
Plan


 Introduction - What is semantic search?

 Research Background
    How it works
    Interface types
    Research Issues

 What is usable?
    For web search
    For corporate data management
Introduction
Search as we know it


 Full text search
     TF-IDF & other statistical approaches
     PageRank – exploiting hyperlink graph

 Controlled term search
     OPAC
     MeSH etc.

 Other metadata
     Date of publication, author etc.

 Output typically ranked pages, records, documents
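To make the statistical approach concrete, here is a minimal TF-IDF ranking sketch. The corpus, the document names and the exact weighting (raw term frequency times log inverse document frequency) are assumptions chosen for illustration; production engines use many refinements on top of this.

```python
import math
from collections import Counter

# Toy corpus; the documents are invented for illustration.
DOCS = {
    "d1": "semantic search over linked data",
    "d2": "keyword search over web pages",
    "d3": "linked data and ontologies",
}

def tf_idf_scores(query, docs):
    """Score each document by the sum of tf * idf over the query terms."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for t in tokenized.values() if term in t)
            if df == 0:
                continue  # a term absent from the corpus contributes nothing
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / df)
            score += tf * idf
        scores[name] = score
    return scores

def rank(query, docs):
    """Return document names, best match first."""
    scores = tf_idf_scores(query, docs)
    return sorted(scores, key=scores.get, reverse=True)

ranking = rank("semantic search", DOCS)
```

The output is a ranked list of documents, which is the typical result shape described on this slide.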
Semantic Search
Classic IR perspective


  Improve statistical/link based search of documents / webpages
  by better understanding user’s information need

  Resolve ambiguity
     Clustering

  Query expansion
     Past searches, WordNet etc. to suggest related terms
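Query expansion can be sketched in a few lines. The related-terms table below is hypothetical and hand-made; a real system might derive suggestions from WordNet or from logs of past searches, as the slide notes.

```python
# Hypothetical related-terms table, invented for illustration.
RELATED_TERMS = {
    "car": ["automobile", "vehicle"],
    "seat": ["chair"],
}

def expand_query(query, related=RELATED_TERMS):
    """Return the original terms followed by any suggested related terms."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for extra in related.get(term, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded

expanded = expand_query("car seat")
```

Keeping the user's own terms first makes it easy to weight them above the suggested ones when ranking.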
NetIKX Semantic Search Presentation
Semantic Search
Web 3.0 perspective


  Improve search over machine understandable data which
  may, or may not, include annotated documents

  Search for entities (people, products …)

  Search for facts (capital of Georgia?)

  Fuse knowledge from different sources

  Exploit structure of formal knowledge
     Broader / narrower plus much more
Web 3.0 Search is
Metadata search


  So more like
     Searching a relational database
        E.g. an OPAC
     Search of the deep web

  BUT linked data is “heterogeneous”
     Multiple domains mixed together

  Microformats & RDFa are from multiple sources
     Quality & consistency variable
Benefits of Semantic Search


 Machine understandability
    i.e. controlled by “ontologies” so you can reason over it
    Supports entity search

 Ambiguity
    Seat/SEAT

 Broader/narrower
     Exploiting hierarchical class relations

 Complex queries over triples
    E.g. Joint between mild steel and stainless steel

 Heterogeneity
     Mappings between ontologies (silo bridging)
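The broader/narrower benefit can be illustrated with a small sketch: a search for a broad class also returns items typed with any narrower class. The hierarchy and instance names are invented for illustration and are not taken from a real ontology.

```python
# Toy class hierarchy as (narrower, broader) pairs; all names are invented.
SUBCLASS_OF = [
    ("StainlessSteel", "Steel"),
    ("MildSteel", "Steel"),
    ("Steel", "Metal"),
    ("Metal", "Material"),
]

# Instances typed with their most specific class (also invented).
INSTANCES = {
    "weld-17": "MildSteel",
    "bolt-3": "StainlessSteel",
    "pipe-9": "Copper",  # deliberately outside the toy hierarchy
}

def narrower_classes(cls, pairs=SUBCLASS_OF):
    """Return cls plus every class below it in the hierarchy."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for child, parent in pairs:
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found

def search_by_class(cls):
    """Find instances whose class is cls or any narrower class."""
    wanted = narrower_classes(cls)
    return sorted(name for name, c in INSTANCES.items() if c in wanted)

hits = search_by_class("Metal")
```

A plain keyword search for "Metal" would miss both hits, since neither instance mentions that word.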
Research Systems
Formal queries over RDF


 SQL-like languages
    SPARQL, SeRQL

 XPath-like languages
    XQuery, RPath

 Others
    Metalog (controlled English)
    F-logic
    RDF-QBE (query by example)
          James Bailey et al., Web and Semantic Web Query
          Languages: A Survey. Reasoning Web 2005: 35-133
Sample SPARQL

Triple pattern: Subject  Predicate  Object

SELECT ?x
WHERE { ?x <http://www.w3.org/2001/vcard-rdf/3.0#FN> "John Smith" }

PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>

SELECT ?y ?givenName
WHERE { ?y vcard:Family "Smith" .
        ?y vcard:Given ?givenName . }


Examples from http://jena.sourceforge.net/ARQ/Tutorial/
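To show what evaluating such queries involves, here is a small sketch of triple-pattern matching over an in-memory list of triples. It is a teaching toy, not how Jena ARQ or any real SPARQL engine works internally; the subject identifiers and the data are invented.

```python
VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"

# Toy data; the subject identifiers are invented.
TRIPLES = [
    ("person:1", VCARD + "FN", "John Smith"),
    ("person:1", VCARD + "Family", "Smith"),
    ("person:1", VCARD + "Given", "John"),
    ("person:2", VCARD + "FN", "Sarah Jones"),
    ("person:2", VCARD + "Family", "Jones"),
    ("person:2", VCARD + "Given", "Sarah"),
]

def match(pattern, binding, triples=TRIPLES):
    """Yield extensions of binding under which pattern equals some triple.

    Terms starting with '?' are variables; everything else is a constant.
    """
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant does not match
        else:
            yield new

def select(variables, patterns):
    """Evaluate a conjunction of triple patterns, like a basic SELECT."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return [tuple(b[v] for v in variables) for b in bindings]

# The two sample queries, expressed as pattern lists.
first = select(["?x"], [("?x", VCARD + "FN", "John Smith")])
second = select(
    ["?y", "?givenName"],
    [("?y", VCARD + "Family", "Smith"), ("?y", VCARD + "Given", "?givenName")],
)
```

The second query shows the join: `?y` must take the same value in both patterns, so only the Smith record contributes a given name.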
Interfaces for Query Generation



  Keyword

  Forms

  Graph based

  Question answering

  Tabular browsers
Keyword based


 Aims to be as close as possible to Google-like keyword search

 Pluses
    Minimal learning curve for users
    Can handle heterogeneity

 Minus
    Query complexity is limited to entity search & simple
    triples
SemSearch




 Y. Lei, V. Uren, and E. Motta, A Ranking-Driven
 Approach to Semantic Search, Poster in ASWC 2008
SemSearch


 Query: “News: Victoria”
    4 matches (2 classes & 2 individuals)
    6 matches (relations)
    Total queries generated = 4*6 = 24
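The arithmetic on this slide can be read as a cross product: each entity match is paired with each relation match to give one candidate query. The sketch below assumes that reading; the match names are placeholders, since the slide gives only the counts.

```python
from itertools import product

# Placeholder names only; the slide gives the counts, not the actual matches.
entity_matches = ["class_1", "class_2", "individual_1", "individual_2"]
relation_matches = [f"relation_{i}" for i in range(1, 7)]

# One candidate query per (entity match, relation match) pairing.
candidate_queries = list(product(entity_matches, relation_matches))
total = len(candidate_queries)
```

Because the number of candidates grows multiplicatively with the number of matches per keyword, ranking the candidates matters as much as generating them.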
Forms


 Familiar interface metaphor
   Database search
   Product search

 Plus
    Allows construction of more complex searches

 Minus
    Can’t handle heterogeneous open web - forms need to be
    pre-defined
Graph-based Search


 Aim is to expose the structure of the ontology to the user to
 scaffold query formulation

 Pluses
    Good for single ontology environments
    Helps the user comprehend the domain

 Minuses
    Can become unwieldy with big and complex domains
Question Answering


 Natural language input
    “What is the capital of Georgia?”

 Translation process transforms the natural language into a formal
 query

 Pluses
    Relatively complex queries possible (intersection of 2 triples)
    Can deal with heterogeneity
    User doesn’t need to understand the ontology

 Minuses
    Heavy computation
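As a rough illustration of the translation step, the sketch below maps one question shape onto a two-triple query using a fixed template. This is a deliberate simplification swapped in for illustration: real question answering systems use linguistic analysis rather than templates, and the property names `rdfs:label` and `ex:capital` are assumptions, not taken from a specific ontology.

```python
import re

# Hypothetical question templates; the query shape and the ex:capital
# property are invented for illustration.
TEMPLATES = [
    (
        re.compile(r"^what is the capital of (?P<entity>.+?)\??$", re.IGNORECASE),
        'SELECT ?answer WHERE {{ ?place rdfs:label "{entity}" . '
        "?place ex:capital ?answer . }}",
    ),
]

def question_to_query(question):
    """Return a SPARQL-like query string, or None if no template fits."""
    for pattern, template in TEMPLATES:
        m = pattern.match(question.strip())
        if m:
            return template.format(**m.groupdict())
    return None

query = question_to_query("What is the capital of Georgia?")
```

Note that the template cannot tell Georgia the country from Georgia the US state; resolving that kind of ambiguity is part of what makes real systems computationally heavy.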
AquaLog: question answering

 Natural Language Query: What are the projects of Vanessa?
    → GATE components →
 Linguistic Triple: which is, projects, vanessa
    → Relation Similarity Service →
 Logical Triples: project, has-project-member/has-project-leader, vanessa
    → Semantic match →
 Answer: AKT, Dot.KoM

               Lopez, V., Uren, V., Motta, E. and Pasin, M. (2007) AquaLog: An
               ontology-driven question answering system for organizational
               semantic intranets, Journal of Web Semantics, 5, 2, pp. 72-105.
Tabular Browsing


 Start with keyword search, then expand by browsing through links

 Pluses
    Supports data exploration
    Output as sets of facts

 Minuses
    Not suitable for heterogeneous datasets
    Can be slow
Parallax
(http://www.freebase.com/labs/parallax/)
Research Challenges


 Usability / expressivity trade off

 Heterogeneity
    Ontologies, quality, provenance
    Mapping, filtering

 Security & Privacy
    Personal data, social web

 Scalability
Near Commercial Systems
Usable Web3.0 Tools


  For Web search

  For Corporate data management




NOTE – a personal selection – I’m not endorsing any of these!
Sig.ma (Semantic Information Mashup) http://sig.ma



  Runs off Sindice crawl of pages with embedded RDFa and
  other microformats

  Uses a keyword search for entities

  No attempt at fusion or disambiguation
Web Search - Sig.ma
Google Rich Snippets


 Entity data based on microformats, RDFa, microdata
    Reviews
    People
    Products (GoodRelations)
    Businesses & Organizations
    Recipes
    Events
    Video

 Supports entity search, with keyword search & facetted browsing

 Harvested from sites which supply the data in the required formats
Wolfram|Alpha
http://www.wolframalpha.com/




    Focus is on computational knowledge

    Natural language question input

    Uses its own proprietary knowledge base
DBpedia
http://dbpedia.neofonie.de/browse/




    Searches factual information extracted from Wikipedia as RDF

    Facetted browse approach in the home page

    BUT used in many, many other research & Open Linked Data
    sites (e.g. Sig.ma)
Usable Web3.0 Tools


 For Web Search

 For Corporate Data Management

    Opportunity for bridging data silos
    Keyword search has never been as good for CMS and
    Intranet as for internet
       Need experts to configure free text search well
       Distribution of terms can be skewed – impossible to
       configure
    Web 3.0 is a network-native technology
Drupal 7


 One of the most popular CMS
    E.g. Recovery.gov was originally on Drupal
    Semantic Drupal research pioneered by DERI Galway

 Open Source
    Developers often prefer it to SharePoint

 RDFa export as standard from CMS structure (no annotation needed)
    Publish structured data that Google, Sindice etc. can harvest

 API methods built in

 Search NOT built in
Virtuoso
(http://virtuoso.openlinksw.com/)




     Hybrid server
           XML
           SQL
           RDF
           Free Text

     Supporting
           Merging of data silos in different formats
           Production of Web applications & services
           Large Scale
           Open Source version
Ready to use?


Beyond the TRL3-5 “valley of death”

TRL7? for facetted browse, server
technology

Not yet a stable market -
technologies like SearchMonkey
may come & go
Acknowledgements


 People: Fabio Ciravegna, Aba-Sah Dadzie, Khadija
 Elbedweihy, Miriam Fernandez, Yuangui Lei, Vanessa Lopez,
 Enrico Motta

 Projects: X-Media, OpenKnowledge, AKT, SmartProducts
