Building (Production) Semantic Applications

Building Semantic Applications
Ontotext
• Leading semantic technology provider
– Top-5 core semantic technology developer
– Supplying components to vendors and solution developers
• Established in 2000
• Venture capital funding in 2008
• Headquarters in Sofia (Bulgaria); further offices in Fairfield, CT (USA), London (UK), Innsbruck (Austria) and Varna (Bulgaria)
• Overall staff: ~70

#2
Ontotext Offering
• Unique technology portfolio:
– Semantic Databases: high-performance RDF DBMS, SPARQL 1.1 support, scalable RDFS & OWL reasoning
– Text Mining and Semantic Search: text mining, information extraction (IE), information retrieval (IR)
– Web Mining: focused crawling, screen scraping, data fusion
– Linked Data Management: integration, publishing, “reason-able views”
• Participant in research projects with a total budget of £100M
– Partnering with SAP, IBM, Wikimedia, Google Labs, BT, Telefonica, KT, BBC, RAI, and leading European universities

#3
OWLIM
• OWLIM is a family of semantic repositories for RDF(S) & OWL
• OWLIM-Lite: in-memory, fastest, scales to ~100 million statements
• OWLIM-SE: file-based, sameAs & query optimizations, scales to 20 billion statements
• OWLIM-Enterprise: replication cluster deployment for resilience and high-performance parallel query answering
• OWLIM provides
– Management, integration & analysis of heterogeneous data
– Lightweight, high-performance reasoning

#4
OWLIM Performance
• OWLIM-SE is the only engine that can reason with more than 10 billion statements
• OWLIM-SE’s query performance is at least as good as that of any engine that can handle semantics on 1 billion statements
• OWLIM-SE successfully passes the Lehigh University Benchmark (LUBM 90000): over 20 billion explicit & implicit statements
• OWLIM-SE is openly demonstrated via:
– FactForge (factforge.net): ‘general knowledge’ from the LOD Cloud
– Linked Life Data (linkedlifedata.com): 25 of the most popular life-science datasets


#5
FactForge and LinkedLifeData
FactForge
• 1.2B explicit statements
• 0.9B indexed inferred statements
• 10B retrievable statements

LinkedLifeData
• 2.7B explicit statements
• 1.4B inferred statements
• 4.1B indexed statements
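The two sets of figures can be cross-checked with simple arithmetic. The Python sketch below uses only the numbers on this slide, in millions of statements. The slide does not say why FactForge's "retrievable" count is larger than its stored count. Reading the surplus as statements produced at query time rather than stored (for example, through owl:sameAs handling) is an assumption.

```python
# Statement counts from the slide, in millions.
factforge = {"explicit": 1_200, "indexed_inferred": 900, "retrievable": 10_000}
linkedlifedata = {"explicit": 2_700, "inferred": 1_400, "indexed": 4_100}

# Stored (indexed) statements = explicit + materialised inferred statements.
ff_indexed = factforge["explicit"] + factforge["indexed_inferred"]
lld_indexed = linkedlifedata["explicit"] + linkedlifedata["inferred"]

# Retrievable statements per stored statement in FactForge.
# Assumption: the surplus is produced at query time rather than stored.
ff_expansion = factforge["retrievable"] / ff_indexed

print(f"FactForge: {ff_indexed}M indexed, ~{ff_expansion:.1f}x retrievable")
print(f"LinkedLifeData: {lld_indexed}M indexed (slide: {linkedlifedata['indexed']}M)")
```

For LinkedLifeData, explicit plus inferred adds up exactly to the indexed figure. For FactForge, roughly 4.8 statements are retrievable per stored statement.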


#6
Sample FactForge Query
SELECT * WHERE {
  ?Person dbp-ont:birthPlace ?BirthPlace ;
          rdf:type opencyc:Entertainer ;
          owlim:hasPageRank ?RR .
  ?BirthPlace geo-ont:parentFeature dbpedia:Germany .
} ORDER BY DESC(?RR) LIMIT 100

• Who are the most important German entertainers?
• This query involves data from DBpedia, Geonames, and UMBEL (OpenCyc)
• It involves inference over types, subclasses, and transitive relationships
• The results are ranked by 'importance' using RDF Rank

#7
FactForge Text Search Query and UI
• Get the labels of entities that have a FOAF topic containing ‘American’ and something similar to ‘life’
SELECT * WHERE {
  ?entity foaf:topic ?topic .
  ?entity rdfs:label ?label .
  ?topic onto:luceneQuery "American AND life~" .
}

• Full-text search over literals and URIs
• Web-based auto-complete
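The Lucene expression "American AND life~" combines an exact term with a fuzzy term. A rough Python approximation of that matching rule follows. It assumes a simple lower-case tokenizer and a maximum of two edits for the fuzzy term. Real Lucene analyzers, fuzzy defaults and scoring differ, and this is not the OWLIM implementation.

```python
import re

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens: a crude stand-in for a Lucene analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches(text: str, required: str = "american", fuzzy: str = "life",
            max_edits: int = 2) -> bool:
    """Approximate 'American AND life~': one exact token AND one fuzzy token."""
    toks = tokenize(text)
    return required in toks and any(levenshtein(t, fuzzy) <= max_edits for t in toks)
```

With these rules, "American lives" matches ("lives" is two edits from "life"), while "British life" does not, because the required term is missing.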

#8
Efficiency Demonstrations
• Proving cost-efficiency is easy with Amazon’s EC2
• Benchmarks of OWLIM-Enterprise’s replication cluster
– 100,000 queries for $1 total cost!
– 5M queries/hour in a 100-node cluster
• Processing The National Archives’ (TNA) archive of UK Government web sites
– 42 TB of web data
– 1.3B files
– 160M unique documents analyzed and indexed
– 10B facts extracted and managed in an RDF database
– £40K to process it twice in one month on EC2
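The headline figures imply some per-unit numbers, worked out below in Python. The cost-per-node figure assumes that the "$1 per 100,000 queries" and the "5M queries/hour" figures come from the same benchmark run. The slide does not say so.

```python
# EC2 replication-cluster benchmark figures from the slide.
QUERIES_PER_DOLLAR = 100_000
CLUSTER_QUERIES_PER_HOUR = 5_000_000
NODES = 100

per_node_per_hour = CLUSTER_QUERIES_PER_HOUR / NODES    # 50,000 queries per node-hour
per_node_per_second = per_node_per_hour / 3600          # roughly 13.9 queries/s per node
# Assumption: both benchmark figures describe the same run.
cluster_cost_per_hour = CLUSTER_QUERIES_PER_HOUR / QUERIES_PER_DOLLAR  # $50 per hour
cost_per_node_hour = cluster_cost_per_hour / NODES                    # $0.50 per node-hour

# TNA web-archive figures from the slide.
FACTS = 10_000_000_000
UNIQUE_DOCS = 160_000_000
TOTAL_GBP = 40_000
PASSES = 2

facts_per_doc = FACTS / UNIQUE_DOCS   # 62.5 facts extracted per unique document
gbp_per_pass = TOTAL_GBP / PASSES     # £20,000 per full processing pass
```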


#9
BBC World Cup 2010 Website

Delivering content... not pages!

“...we believe this is the first large scale, mass media site to be using concept extraction, RDF and a Triple store to deliver content.”

#10
OWLIM Powered the BBC’s World Cup Web Site
“A RDF triplestore and SPARQL approach was chosen over and above traditional relational database technologies due to the requirements for interpretation of metadata with respect to an ontological domain model.”
Jem Rayfield, Senior Technical Architect, BBC News and Knowledge

“It Begins ...”
Comment on ReadWriteWeb’s post on the subject


#11
Next DSP (Dynamic Semantic Publishing) Application
• 200+ countries
• 400-500 disciplines
• 10,000+ athletes

"maximising BBC content and data for the audience"


#12
Interlinking Text and Data


#13
Olympics Background Knowledge


#14
BBC Asset Authoring & Static Editing


#15
BBC Annotation in Graffiti


#16
Dynamic Semantic Publishing (World Cup)


#17
Dynamic Semantic Publishing (Olympics)

Technology and Case Studies

#18
The World Cup Website Statistics
• More than a million queries to OWLIM per day
– Caching in the architecture allowed the site to handle tens of millions of requests to the web server
• Hundreds of updates per hour
• Running on a cluster of several machines
– Typical DB servers, each with an assembly cost below $10,000
[Diagram: cluster spanning Data centre 1 and Data centre 2, with Master A (read/write), Master B (read-only) and Workers 1-6]


#19
BBC Deployment Challenges
• High resilience
– Replication cluster deployment across two data-centres
– Resilient to the loss of a worker node, a master node or a whole data-centre
• High performance
– Parallel query answering across all available worker nodes in both data-centres
• Rapidly changing data
– New data is inserted, and existing data modified or deleted, a few hundred times an hour
• Non-trivial inference
– Expressive OWL inference, with logical consistency maintained on every commit (several hundred times an hour)


#20
OWLIM-Enterprise: Replication Cluster
• Data replication is used to:
– Improve scalability for concurrent query requests
– Provide resilience: failover and online reconfiguration
• How does it work?
– Every user write request is pushed into a transaction queue
– Each write request is multiplexed to all repository instances
– Each read request is dispatched to one instance only
– To ensure load balancing, each read request is sent to the instance with the smallest execution queue at that moment
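The four dispatch rules can be sketched as a small in-memory simulation in Python. The `Worker` and `Master` classes below are hypothetical illustrations, not the OWLIM-Enterprise API. Each worker processes its queue in FIFO order, which is an assumption of this sketch.

```python
from collections import deque

class Worker:
    """Hypothetical stand-in for one repository instance with an execution queue."""

    def __init__(self, name: str):
        self.name = name
        self.queue = deque()   # pending ("write" | "read", statement) requests
        self.data = set()      # toy statement store

    def run_all(self) -> list[bool]:
        """Drain the queue in FIFO order; reads report whether the statement exists."""
        results = []
        while self.queue:
            kind, stmt = self.queue.popleft()
            if kind == "write":
                self.data.add(stmt)
            else:
                results.append(stmt in self.data)
        return results

class Master:
    """Hypothetical master node: multiplexes writes and load-balances reads."""

    def __init__(self, workers: list[Worker]):
        self.workers = workers
        self.tx_queue = deque()  # every user write request is queued here first

    def write(self, stmt: str) -> None:
        self.tx_queue.append(stmt)

    def flush_writes(self) -> None:
        # Multiplex each queued write to all instances, preserving order.
        while self.tx_queue:
            stmt = self.tx_queue.popleft()
            for w in self.workers:
                w.queue.append(("write", stmt))

    def read(self, stmt: str) -> str:
        # Dispatch to exactly one instance: the one with the shortest queue right now.
        target = min(self.workers, key=lambda w: len(w.queue))
        target.queue.append(("read", stmt))
        return target.name
```

With three workers, four consecutive reads after a write go to w1, w2, w3 and then w1 again, because ties are broken by list order.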


#21
Replication Cluster - Behaviour
• The total loading/modification performance of the cluster is equal to that of one instance
• The data scalability of the cluster is determined by the amount of RAM of the weakest instance
• The query throughput of the cluster is the sum of the throughputs that each instance can handle
• Failover:
– If one or more instances fail, performance degrades gracefully
– The cluster remains fully operational even when only one instance is working
• The cluster can be reconfigured while running
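These behaviour rules can be read as a rough capacity model, sketched below in Python. The `Instance` fields and numbers are hypothetical. Modelling write speed as that of the slowest instance is an assumption: it reduces to "equal to that of one instance" when all instances are identical.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    name: str
    ram_gb: float        # memory available for the repository
    write_rate: float    # statements/s this instance can load
    query_rate: float    # queries/s this instance can answer
    alive: bool = True

def cluster_capacity(instances: list[Instance]) -> dict[str, float]:
    """Illustrative capacity model of a replication cluster (an assumption, not a spec)."""
    live = [i for i in instances if i.alive]
    if not live:
        raise RuntimeError("no working instance left")
    return {
        # Every write is applied on every instance, so the slowest one sets the pace.
        "write_rate": min(i.write_rate for i in live),
        # Every instance holds a full copy, so the smallest memory caps the data size.
        "ram_gb": min(i.ram_gb for i in live),
        # Each read goes to one instance only, so read throughputs add up.
        "query_rate": sum(i.query_rate for i in live),
    }
```

Marking instances as failed reduces the summed query rate step by step instead of stopping the cluster. This is the graceful degradation described above.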

#22
Replication Cluster - Types of Nodes
• Two types of nodes
– Master nodes (OWLIM-Enterprise): dispatch queries and updates to workers (read/write), or queries only (read-only, e.g. a hot-standby master)
– Worker nodes (OWLIM-Enterprise, a derivative of OWLIM-SE)
• Flexible topologies possible
• Resilience to failure of workers and masters

[Diagram: a read/write master sends queries & updates to Workers 1-3; a hot-standby master sends queries only]
#23
Full Text Search
• Alternative information access method (uses separate indices)
• Find information based on string elements (tokens)
• Two approaches
– Node Search: simple but very fast token matching
– RDF Search: integrates Lucene to index 'RDF Molecules', with powerful query expressions
• Both methods integrate with SPARQL to allow hybrid searching:
– Example: “Show all Renaissance artists based in the Netherlands whose names begin with ‘Rem’, ranked by how well-linked they are”
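The hybrid example splits into three parts: a structured filter (what SPARQL triple patterns express), a text condition (what the full-text index answers) and a ranking. A toy Python sketch of that split follows. The triples and names are hypothetical, and counting incoming links stands in for "well-linked" as a simplification of RDF Rank.

```python
from collections import Counter, defaultdict

# Hypothetical toy graph of (subject, predicate, object) triples.
TRIPLES = [
    ("remko", "type", "RenaissanceArtist"), ("remko", "basedIn", "Netherlands"),
    ("remko", "name", "Remko de Wit"),
    ("remy", "type", "RenaissanceArtist"), ("remy", "basedIn", "Netherlands"),
    ("remy", "name", "Remy Vos"),
    ("rembert", "type", "RenaissanceArtist"), ("rembert", "basedIn", "Flanders"),
    ("rembert", "name", "Rembert Claes"),
    ("jan", "type", "RenaissanceArtist"), ("jan", "basedIn", "Netherlands"),
    ("jan", "name", "Jan Remmers"),
    ("doc1", "mentions", "remy"), ("doc2", "mentions", "remy"), ("doc3", "mentions", "remko"),
]

def hybrid_search(prefix: str) -> list[str]:
    props = defaultdict(dict)   # subject -> {predicate: object}
    in_links = Counter()        # how often each resource appears as an object
    for s, p, o in TRIPLES:
        props[s][p] = o
        in_links[o] += 1
    hits = [
        s for s, pr in props.items()
        # structured part: type and location constraints
        if pr.get("type") == "RenaissanceArtist" and pr.get("basedIn") == "Netherlands"
        # text part: the name must begin with the prefix
        and pr.get("name", "").lower().startswith(prefix.lower())
    ]
    # ranking part: most well-linked first
    return sorted(hits, key=lambda s: in_links[s], reverse=True)
```

Here `hybrid_search("Rem")` keeps the two Netherlands-based artists whose names start with "Rem" and puts the more frequently mentioned one first.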


#24
RDF Rank
• OWLIM-SE includes a plug-in for efficient calculation of a modified PageRank over RDF graphs
• Computation of rank values is fast, e.g.
– 400M LOD statements take 310 s (27 iterations)
• Results are available through a system predicate
• Example: get the 100 most important nodes in the RDF graph
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
SELECT ?n { ?n rank:hasRDFRank ?r }
ORDER BY DESC(?r) LIMIT 100
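The slide does not specify OWLIM's modification of PageRank. The Python sketch below therefore uses plain PageRank by power iteration over the subject-to-object graph of a set of triples. It ignores predicates and spreads the rank of nodes without outgoing links evenly. Both choices are assumptions of this sketch.

```python
from collections import defaultdict

def rdf_rank(triples, damping=0.85, iterations=50):
    """Plain PageRank over the subject -> object graph of a set of triples."""
    nodes, out = set(), defaultdict(set)
    for s, _p, o in triples:
        nodes.update((s, o))
        out[s].add(o)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = dict.fromkeys(nodes, (1 - damping) / n + damping * dangling / n)
        for v in nodes:
            for t in out[v]:
                new[t] += damping * rank[v] / len(out[v])
        rank = new
    return rank

def top_k(rank, k=100):
    """Equivalent of ORDER BY DESC(?r) LIMIT k."""
    return sorted(rank, key=rank.get, reverse=True)[:k]
```

For the triples a→c, b→c and c→a, node c ranks highest and b lowest, and the ranks sum to 1.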


#25
Notifications
• A client can subscribe to notifications about incoming statements that match given statement patterns
• The patterns are then used to filter incoming statements
– The subscriber is notified about statements that help form a new solution of at least one of the graph patterns
– Inferred statements are treated in the same way
• The subscriber should not rely on any particular order or distinctness of the notifications
– Both inserted and deleted statements are notified
• High-performance 'in-process' notifications
• Remote notification mechanism
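A minimal in-process subscription mechanism is sketched below in Python. The `NotificationHub` class is hypothetical, not the OWLIM notification API. For simplicity it matches single statement patterns, with `None` as a wildcard, rather than full graph patterns. Like the real mechanism, it treats inferred and deleted statements the same way as inserted ones. Publishing the same statement twice notifies twice, which is one reason subscribers should not assume distinctness.

```python
from typing import Callable, Optional

Triple = tuple[str, str, str]
Pattern = tuple[Optional[str], Optional[str], Optional[str]]  # None acts as a wildcard
Callback = Callable[[Triple, bool, bool], None]               # (triple, inferred, deleted)

class NotificationHub:
    """Hypothetical in-process subscription registry (not the OWLIM API)."""

    def __init__(self) -> None:
        self._subs: list[tuple[Pattern, Callback]] = []

    def subscribe(self, pattern: Pattern, callback: Callback) -> None:
        self._subs.append((pattern, callback))

    @staticmethod
    def _matches(pattern: Pattern, triple: Triple) -> bool:
        return all(p is None or p == t for p, t in zip(pattern, triple))

    def publish(self, triple: Triple, inferred: bool = False, deleted: bool = False) -> None:
        # Inserted, deleted, explicit and inferred statements all pass through the same filter.
        for pattern, callback in self._subs:
            if self._matches(pattern, triple):
                callback(triple, inferred, deleted)
```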


#26
Geo-spatial Extensions
• Allows high-performance evaluation of geo-spatial queries
• Uses the W3C WGS84 vocabulary (geo:lat, geo:long)
• Find points within rectangles, polygons or circles
• Compute the great-circle distance between points
• Example: airports within 50 miles of London, nearest first:
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?airport {
  dbpedia:London geo:lat ?lat1 ; geo:long ?long1 .
  ?airport omgeo:nearby(?lat1 ?long1 "50mi") ;
           a dbp-ont:Airport ;
           geo:lat ?lat2 ; geo:long ?long2 .
}
ORDER BY ASC(omgeo:distance(?lat1, ?long1, ?lat2, ?long2))
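The nearby/distance combination can be sketched in Python with the haversine great-circle formula. The formula is a standard choice, but that OWLIM's distance function uses exactly this formula is an assumption. The coordinates below are approximate and for illustration only.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles

def great_circle_mi(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in miles between two WGS84 points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi, dlmb = radians(lat2 - lat1), radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * asin(sqrt(h))

LONDON = (51.5074, -0.1278)
# Approximate airport coordinates, for illustration only.
AIRPORTS = {
    "Heathrow": (51.4700, -0.4543),
    "Gatwick": (51.1537, -0.1821),
    "Stansted": (51.8860, 0.2389),
    "Luton": (51.8747, -0.3683),
    "London City": (51.5048, 0.0495),
    "Manchester": (53.3537, -2.2750),
}

def nearby(center, points, radius_mi):
    """Keep points within radius_mi of center, ordered nearest first."""
    lat1, lon1 = center
    dist = {name: great_circle_mi(lat1, lon1, *pt) for name, pt in points.items()}
    hits = [name for name, d in dist.items() if d <= radius_mi]
    return sorted(hits, key=dist.get)
```

With a 50-mile radius, as in the query, Manchester drops out and the remaining airports come back ordered from London City to Stansted.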


#27
Financial Publishing Case Study


#28
Financial Publishing Case Study


#29
Financial Publishing Case Study


#30

ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semantic Applications
