Semantics and ML
Semantic Integration Is What You Do Before The Deep Learning
Vladimir Alexiev, Chief Data Architect, Sirma AI (Ontotext)
dev.bg Machine Learning seminar, 13 May 2019, Sofia
Outline
• Semantic Web and Linked Data
• Knowledge Graphs
• Ontotext Projects
• Ontotext Demos
• Use of Machine Learning
What is Semantic Web and Linked Open Data?
• Semantic Web and Semantic Technologies
• Exposing data and datasets to machines
• Allowing machines to "understand" a bit of the data, not giving it some "higher meaning"
• RDF, Ontologies, RDF Shapes
• RDF: simple graph data model: triples (S,P,O), also quads (S,P,O,C=G)
• RDFS and OWL Ontologies: describe classes, properties, subclasses, sub-properties, description logic constructs
• RDF Shapes (Application Profiles): describe constraints on RDF data
• May use with or without schema; the schema is part of the data
• Linked Open Data
• Expose datasets globally, making each entity/data point addressable (URL)
• Use global identifiers not ambiguous names: "things not strings"
• Link entities
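To make the RDF bullets concrete, here is a minimal sketch in plain Python (no RDF library assumed): triples and quads as tuples, global IRIs as identifiers ("things not strings"), and a check that mimics a single SHACL `sh:minCount 1` constraint. The `http://example.org/` namespace and all entity names are hypothetical; `rdf:type` and `foaf:name` are the real W3C and FOAF IRIs. A real project would use an RDF library and a SHACL engine instead of this hand-rolled check.

```python
from collections import namedtuple

# Real W3C / FOAF IRIs
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
# Hypothetical namespace for the example entities
EX = "http://example.org/"

Triple = namedtuple("Triple", "s p o")
Quad = namedtuple("Quad", "s p o g")  # g: named graph (the "C=G" context)

# "Things not strings": entities are global IRIs, names are just literals
triples = [
    Triple(EX + "TimBL", RDF_TYPE, EX + "Person"),
    Triple(EX + "TimBL", FOAF_NAME, "Tim Berners-Lee"),
    Triple(EX + "TimBL", EX + "employer", EX + "CERN"),
    Triple(EX + "Unnamed", RDF_TYPE, EX + "Person"),  # deliberately has no name
]

# Adding a graph component turns triples into quads, e.g. to record provenance
quads = [Quad(*t, g=EX + "graph/source1") for t in triples]


def min_count_violations(data, target_class, required_prop):
    """SHACL-like sh:minCount 1 check: instances of target_class lacking required_prop."""
    focus_nodes = {t.s for t in data if t.p == RDF_TYPE and t.o == target_class}
    nodes_with_prop = {t.s for t in data if t.p == required_prop}
    return sorted(focus_nodes - nodes_with_prop)


print(min_count_violations(triples, EX + "Person", FOAF_NAME))  # ['http://example.org/Unnamed']
```

Because quads only add a graph field, the same check works unchanged on `quads`.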
Web 1.0, 2.0, 3.0
• Web 1.0: linked documents (World Wide Web)
• Before it there was ftp, gopher, online library catalogs…
• Web 2.0: web applications, social web
• Has Facebook taken over the web? New "decentralization" movement
• Web 3.0: linked data (Giant Global Graph)
• Metadata about documents, but also data about real-world entities: persons, organizations, hierarchy, projects, publications, companies, startups, transactions, networks, servers, printers, IoT things, etc.
Where did it come from?
• TimBL CERN proposal, 1989:
• Both Web (1.0) and Semantic Web (3.0)
• "Vague but Exciting"
• Not just documents, but also real-world entities
• Why was it successful?
• Neither the first nor the "best" hypertext proposal
• But simple, workable and, most importantly, open
LOD Cloud
WebDataCommons
Dec 2018: 30B triples
What does LOD know about TimBL?
• TimBL at Wikidata Reasonator
• Names in 50 languages
• Description is auto-generated
• Parents confirmed 3 times (with different details not shown)
What does LOD know about TimBL?
• Depth of Information on TimBL
• Links to ~200 authority files
• Info about ~20 awards
• Life Timeline
• etc., etc.
Knowledge Graphs
Google Knowledge Graph
Everybody is Building a KG!
Knowledge Graph Conference, 7-8 May 2019, Columbia University, New York
• Digital Commerce
• Airbnb - Knowledge Graph at Airbnb
• Amazon - Deep Learning for Knowledge Extraction and Integration to build the Amazon Product Graph
• Uber - Building an Enterprise Knowledge Graph at Uber: Lessons from Reality
• Pitney Bowes - Intelligent Customer Service Using Knowledge Graphs
• Financial Services
• Causality Link - A Perspective on the Reasoning Power of Knowledge Graphs
• Capital One - Knowledge Graph Pilot Provides Value
• Goldman Sachs - Pythia: the Goldman Sachs Social Graph
• TigerGraph - Analyzing Time-varying Transitive Risk in Swap Networks using Graphs
• Refinitiv Financial - Practical Use Cases and Challenges to Implement Graphs in Financial Services: Combating Financial Crime
• Wells Fargo - Knowledge Graphs and AI: The Future of Financial Data
• Forensics
• OCCRP - Using Graphs and Data Integration to Track Organised Crime
• Enigma.io - Impact and Insights from Public Data: Fighting Money Laundering by Linking and
Resolving Entities
• Refinitiv Financial - Practical Use Cases and Challenges to Implement Graphs in Financial Services: Combating Financial Crime
• Health Care, Government, Supply Chain, Libraries
• AstraZeneca - Fair Data Knowledge Graphs (From Theory to Practice)
• Montefiore Hospital - The Chasm of a Million Analytics, and How to Bridge it?
• United Nations - A Graph as a Means to Store Unpredictable Knowledge: A Practical Implementation
• JSTOR Labs - Why Wikibase? Why not?
• Eccenca - Knowledge Graph for Digital Transformation in the Supply Chain
• German National Library of Science and Technology - Creating a knowledge graph based Enterprise Innovation Architecture
• How To...
• Diffbot - Knowledge Graphs for AI
• Accenture Labs - Using a Domain Knowledge Graph to Manage AI at Scale
• Capsenta - Designing and Building Enterprise Knowledge Graphs from Relational Databases in the Real World
• Google AI - Wikidata, Knowledge Graphs, and Beyond
• IBM Research - Extending Knowledge Graphs using Distantly Supervised Deep Nets
• Microsoft - Building a Large-scale, Accurate and Fresh Knowledge Graph
• Neo4J - A Real-World Guide to Building Your Knowledge Graphs
• Collibra - Collibra's Context Graph
• Ontotext - How Analytics on Big Knowledge Graphs Help Data Linking: Company Importance and Similarity Demo
KG & ML Literature & Seminars
• Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web.
• Dagstuhl Seminar 18371, Mar 2019
• Grand Challenges: structure of knowledge & data and scale
• Creation of Knowledge Graphs
• Knowledge Integration at Scale
• Knowledge Dynamics and Evolution
• Evaluation of Knowledge Graphs
• Combining Graph Queries with Graph Analysis
• (Re)Defining Knowledge Graphs
• NLP and Knowledge Graphs
• ML and Knowledge Graphs
• Human and Social Factors of Knowledge Graphs
• Applications of Knowledge Graphs
• Knowledge Graphs and the Web
• Deep Learning for the Masses (… and The Semantic Layer), Favio Vázquez, Nov 20, 2018
• Acknowledgement: my title is stolen from this blog post
• 4th Workshop on Semantic Deep Learning (SemDeep-4) at ISWC 2018
• Big Data Semantics. Journal on Data Semantics, Apr 2018. DOI: 10.1007/s13740-018-0086-2
• Forbes: Why Machine Learning Needs Semantics Not Just Statistics (Jan 2019)
• Wired: Amazon Alexa and the Search for the One Perfect Answer (Feb 2019)
Thomson Reuters PermID Company Graph
Microsoft Academic Graph
Semantic Scholar: search
Semantic Scholar: Author Page
Semantic Scholar: Topic Page
Wikidata Scholia: comparing authors
Sirma AI (Ontotext)
Ontotext Essential Facts
• World-leading
• Semantic technology vendor established in 2000
• 65 staff: 7 PhD, 30 MS, 20 BS, 6 university lecturers
• Over 400 person-years invested in R&D
• Part of Sirma Group: 400 persons, public company (BSE:SKK)
• Profitable and growing
• 80% of revenue from commercial projects
• Innovator
• Attracted $15M in innovation funding
• Trendsetter
• Member of: W3C, EDMC (FIBO), ODI, LDBC, STI, DBpedia Association
• Ontotext Innovation Awards
• Innovative Enterprise of the Year 2017
• EU Innovation Radar Prize 2016 nomination
• BAIT Business Innovation Award 2014
• Innovative Enterprise of the Year 2014
• Washington Post “Destination Innovation” Competition 2014 Award
• Pythagoras Award 2010
• Most successful BG company in EU FP projects
Some of Our Clients (selection)
Ontotext Approach and Applications
Ontotext GraphDB, a Leading Graph Database
• Source: db-engines.com ranking of graph databases
• GraphDB Workbench: user-friendly DB admin and querying
• REST API for database access
• Plugins / Connectors
OntoRefine: Uplift Tabular Data to LOD
• Easily clean and import tabular data
• View as RDF in real time with a virtual SPARQL endpoint
• Transform using JS & SPIN
• Import newly created RDF directly to GraphDB
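OntoRefine does this uplift interactively. As a rough plain-Python analogue only (not how OntoRefine works internally), the sketch below cleans a small CSV (trims names, skips rows without one) and emits N-Triples. The base IRI `http://example.org/company/` and the country property `http://example.org/ont/country` are hypothetical; `schema:Organization` and `schema:name` are real schema.org terms.

```python
import csv
import io

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
SCHEMA_ORG = "http://schema.org/"
BASE = "http://example.org/company/"          # hypothetical base IRI for rows
COUNTRY = "<http://example.org/ont/country>"  # hypothetical property


def nt_literal(value):
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def rows_to_ntriples(csv_text):
    out = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["name"].strip()      # cleaning: trim whitespace
        if not name:
            continue                    # cleaning: skip rows without a name
        subject = f"<{BASE}{row['id'].strip()}>"
        out.append(f"{subject} {RDF_TYPE} <{SCHEMA_ORG}Organization> .")
        out.append(f"{subject} <{SCHEMA_ORG}name> {nt_literal(name)} .")
        country = (row.get("country") or "").strip()
        if country:
            out.append(f"{subject} {COUNTRY} {nt_literal(country)} .")
    return "\n".join(out)


sample = 'id,name,country\n1, Ontotext ,BG\n2,,DE\n3,"Acme ""Widgets""",US\n'
print(rows_to_ntriples(sample))
```

The output is plain N-Triples, so it could be loaded into any triplestore; minting IRIs from a stable source key (`id`) rather than from the name keeps identifiers stable when names change.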
Knowledge Graph Platform Use Cases
• Content enrichment
• Who: STM publishers transforming their business model from publishing to information
• Challenges: Control & generate metadata
• Reference projects: Elsevier, Wiley, IET, BBC, Euromoney
• Semantic search of enterprise documents
• Who: Enterprises with transactional document flows lacking analytic capabilities
• Challenges: Integrate with existing CMS/DMS + security + analytics
• Reference projects: Platts, AstraZeneca, Top-5 US bank, Top-5 German bank
• Knowledge graph development and continuous updates
• Who: Innovative businesses based on knowledge-intensive processes
• Challenges: Collect, integrate, and maintain complex knowledge graph, semantic search + analytics
• Reference projects: Top Asian business information agency, Big-4 Consultant
Semantic Data Integration
Example KG: FactForge
o DBpedia (the English version) 496M
o Geonames (all geographic features on Earth) 150M
o owl:sameAs links between DBpedia and Geonames 471K
o GLEI (global company register data) 3M
o Panama Papers DB (#LinkedLeaks) 20M
o Other datasets and ontologies: WordNet, WorldFacts, FIBO
o News metadata (2000 articles/day enriched by NOW) 1 023M
o Total size (2.2B explicit + 328M inferred statements) 2 522M
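The 471K `owl:sameAs` links are what connect DBpedia and Geonames. Since `owl:sameAs` is symmetric and transitive, linked IRIs form equivalence classes. A minimal union-find sketch of that grouping follows; the prefixed names are stand-ins, not real identifiers.

```python
def same_as_clusters(links):
    """Group IRIs connected by owl:sameAs links into equivalence classes (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge: sameAs is symmetric and transitive

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical stand-ins for DBpedia / Geonames / Wikidata IRIs
links = [
    ("dbpedia:Sofia", "geonames:Sofia"),
    ("geonames:Sofia", "wikidata:Sofia"),
    ("dbpedia:Varna", "geonames:Varna"),
]
for cluster in same_as_clusters(links):
    print(sorted(cluster))
```

Note that `wikidata:Sofia` joins the Sofia cluster without any direct link to `dbpedia:Sofia`: transitivity does the work.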
Class Exploration
o About 1400 classes
o To cope with this, one needs specific tools
o GraphDB Workbench’s Class Hierarchy exploration tool
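With ~1400 classes, a flat list does not help; a tree with per-class instance counts gives an overview. This hypothetical sketch builds such a tree from `rdfs:subClassOf` pairs, assuming single inheritance. It does not claim to reflect how the Workbench tool is implemented, and the classes and counts are made up.

```python
from collections import defaultdict


def class_tree(subclass_of, instance_counts, root):
    """Indented class hierarchy; each count includes instances of all subclasses.

    Assumes a tree (single inheritance, no cycles); with multiple inheritance
    the totals would double-count shared instances.
    """
    children = defaultdict(list)
    for sub, sup in subclass_of:
        children[sup].append(sub)

    def total(cls):
        return instance_counts.get(cls, 0) + sum(total(c) for c in children[cls])

    lines = []

    def walk(cls, depth):
        lines.append(f"{'  ' * depth}{cls} ({total(cls)})")
        for child in sorted(children[cls]):
            walk(child, depth + 1)

    walk(root, 0)
    return lines


# Hypothetical (subclass, superclass) pairs and direct instance counts
subclass_of = [("Place", "Thing"), ("Agent", "Thing"), ("Airport", "Place"),
               ("City", "Place"), ("Company", "Agent")]
instance_counts = {"Place": 1, "Airport": 3, "City": 5, "Company": 7}
print("\n".join(class_tree(subclass_of, instance_counts, "Thing")))
```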
Class Relations
Visual Graph: Node details
Reference Case for GraphDB and Ontotext Platform
Big Knowledge Graph
• 1B statements of master data
• 100M entities and concepts
• Entity linking across 5 data sources
• 1M documents, 100 KG tags/doc.
Performance
• 10 transactional updates/sec on master data
• 500 updates/sec for documents and metadata
• 100 graph queries/sec/node, incl. inferred facts
• RDFS+ reasoning: instant and transparent
• 1000 full-text searches/sec across docs and data
Text & Graph Analytics
• Extract new entities and facts from text
• Retrieval of similar documents and entities
• Automatic classification and link prediction
• Relevance and importance ranking
Operations & Data Quality
• Multi-DC deployment across continents
• Worker nodes: 16 vCPU, 32GB RAM
• Daily updates from external data sources
• Maintain quality of linking and text analysis
• Metadata and instance data curation
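"RDFS+ reasoning" means inferred facts are materialized and can be queried like explicit ones. The sketch below forward-chains only four RDFS entailment rules (rdfs5, rdfs7, rdfs9, rdfs11) to a fixpoint over tuples with hypothetical `ex:` names. A real RDFS+ ruleset covers more (e.g. domain and range), and a production engine uses far more efficient joins than this nested loop.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"


def materialize(triples):
    """Forward-chain four RDFS rules (rdfs5, rdfs7, rdfs9, rdfs11) to a fixpoint."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:  # rdfs11: subClassOf is transitive
                    new.add((s, SUBCLASS, o2))
                if p == SUBPROP and p2 == SUBPROP and o == s2:    # rdfs5: subPropertyOf is transitive
                    new.add((s, SUBPROP, o2))
                if p == TYPE and p2 == SUBCLASS and o == s2:      # rdfs9: type propagates to superclass
                    new.add((s, TYPE, o2))
                if p2 == SUBPROP and p == s2:                     # rdfs7: statement holds for superproperty
                    new.add((s, o2, o))
        new -= facts
        if not new:
            return facts
        facts |= new


# Hypothetical explicit statements
explicit = {
    ("ex:Heathrow", TYPE, "ex:Airport"),
    ("ex:Airport", SUBCLASS, "ex:Place"),
    ("ex:Place", SUBCLASS, "ex:Thing"),
    ("ex:Heathrow", "ex:locatedIn", "ex:London"),
    ("ex:locatedIn", SUBPROP, "ex:partOfRegion"),
}
inferred = materialize(explicit) - explicit
for t in sorted(inferred):
    print(t)
```

Materializing up front is what makes queries over inferred facts cost the same as over explicit ones; the price is paid at update time.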
Entity Awareness
• What does it mean to be "aware" of something?
• To have background info that allows some measure of "intelligence"?
• We believe the numbers on the previous slide are a minimum that can help a machine achieve "awareness"
• Let's try some games:
• Airports near London (within 50 miles)
• Airports near New York City
• Educational institutions near New York
• Educational institutions near Kaspichan
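The "airports near London" game boils down to a great-circle distance filter over coordinates, which in a KG like FactForge could come from Geonames. A minimal haversine sketch; the coordinates are approximate and hard-coded purely for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def near(center, places, max_miles):
    """Places within max_miles of center, nearest first."""
    lat0, lon0 = center
    hits = []
    for name, (lat, lon) in places.items():
        d = haversine_miles(lat0, lon0, lat, lon)
        if d <= max_miles:
            hits.append((name, round(d, 1)))
    return sorted(hits, key=lambda h: h[1])


# Approximate coordinates, hard-coded for illustration only
LONDON = (51.5074, -0.1278)
airports = {
    "Heathrow": (51.4700, -0.4543),
    "Gatwick": (51.1537, -0.1821),
    "Stansted": (51.8860, 0.2389),
    "Manchester": (53.3650, -2.2728),
}
print(near(LONDON, airports, 50))
```

The hard part is not the formula but knowing which entities are airports, where they are, and which "London" is meant: that is the background knowledge the slide calls "awareness".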
Demo: Ontotext Rank (and Similarity)
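The slide does not spell out how Ontotext Rank computes entity importance. As one common technique, clearly swapped in here, this sketch runs plain PageRank by power iteration on a hypothetical company-relation graph; dangling nodes spread their rank evenly.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain PageRank by power iteration over a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:  # dangling node: spread its rank over all nodes
                for t in nodes:
                    new_rank[t] += damping * rank[v] / n
        rank = new_rank
    return rank


# Hypothetical graph: edges point from a company to a related company
edges = [("SubA", "Holding"), ("SubB", "Holding"), ("SubC", "Holding"), ("Holding", "SubA")]
ranks = pagerank(edges)
for company, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{company}: {score:.3f}")
```

Ranks always sum to 1, so scores from different runs on the same graph are comparable; importance then feeds ranking features such as auto-completion order.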
Ontotext R&D Projects
• More EU research projects than some BG universities combined
• Vertical domains
• Cultural heritage (Europeana Creative, Food and Drink, EHRI2)
• Companies (euBusinessGraph, CIMA), real estate data (ProDataMarket, PDM)
• Media/Publishing (TrendMiner, Multisensor, Evala)
• Fact & rumour checking (Pheme, WeVerify)
• Life Science (LarKC, KHRESMOI, KConnect, ExaMode)
• Agriculture (BigDataGrapes)
• Science/innovation (TRR, InnoRate)
Project CIMA: Company Graph
• R&D
• Data virtualization (OBDA)
• Entity Linking
• Alignment Learning
• KG Embedding and Similarity
• Company Classification
• Company Graph
• Dataset discovery and analysis, procure datasets
• Semantic structure mapping, taxonomy mapping
• Semantic integration pipeline, data updates
• Cognitive Entity Matching
• Data curation
• ML algorithms and training
• Integration to Ontotext Platform, Demos
• Big Data connectors (e.g. Mongo, Cassandra)
• Cloud Services
• Demo applications
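CIMA lists alignment learning and entity matching. As a textbook illustration, not the project's actual method: learn how to weigh two hand-picked features (name-token Jaccard similarity after dropping legal forms, and a same-country flag) with logistic regression trained by stochastic gradient descent. The legal-form list, records and labels are all hypothetical.

```python
import math
import re

# Hypothetical, deliberately short list of legal-form tokens to ignore
LEGAL_FORMS = {"ad", "ag", "corp", "eood", "gmbh", "inc", "llc", "ltd", "ood", "plc", "sa"}


def name_tokens(name):
    return {t for t in re.findall(r"[a-z0-9]+", name.lower()) if t not in LEGAL_FORMS}


def features(r1, r2):
    """[bias, name-token Jaccard similarity, same-country flag]."""
    t1, t2 = name_tokens(r1["name"]), name_tokens(r2["name"])
    union = t1 | t2
    jaccard = len(t1 & t2) / len(union) if union else 0.0
    same_country = 1.0 if r1["country"] == r2["country"] else 0.0
    return [1.0, jaccard, same_country]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, r1, r2):
    """Estimated probability that r1 and r2 describe the same company."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(r1, r2))))


def train(pairs, labels, lr=0.5, epochs=1000):
    """Logistic regression fitted with plain stochastic gradient descent."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (r1, r2), y in zip(pairs, labels):
            err = predict(w, r1, r2) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, features(r1, r2))]
    return w


def rec(name, country):
    return {"name": name, "country": country}


# Hypothetical labelled pairs (1 = same company, 0 = different)
pairs = [
    (rec("Ontotext AD", "BG"), rec("Ontotext", "BG")),
    (rec("Sirma Group Holding", "BG"), rec("Sirma Group Holding AD", "BG")),
    (rec("Acme Ltd", "UK"), rec("Acme GmbH", "DE")),
    (rec("Global Data Ltd", "UK"), rec("Global Foods Ltd", "UK")),
    (rec("Blue River Inc", "US"), rec("Red Mountain Inc", "US")),
]
labels = [1, 1, 0, 0, 0]
weights = train(pairs, labels)
print(round(predict(weights, rec("Ontotext Ltd", "BG"), rec("Ontotext AD", "BG")), 3))
```

Learning the weights instead of hand-tuning thresholds is the point: the model discovers that an identical name in a different country is not enough evidence on its own.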
Project TRR: Science KG for FP7 Projects
• Info (Wikidata): Client: EC DG RTD (ministry of science). Budget: 4M EUR, Duration: 4y. Partners: PPMI (LT), Ontotext (BG), Fraunhofer (DE), Intrasoft (LU)
• Get 8000 core FP7 projects (SP1 Collaboration)
• Build KG of science (projects, participants, researchers, contacts, subjects, etc.)
• Assess outputs (publications, datasets, patents…)
• Assess outcomes (startups, collaborations, researcher mobility…)
• Assess impact (on research policy, economic, societal, on health…)
Machine Learning at Ontotext
We're not an ML company, but we use ML for some of our tasks
ML at Ontotext
• Alignment Learning for Entity Matching
• Disambiguation for Named Entity Extraction
• Relation Learning for Relation Extraction
• Word+KG Embeddings for semantic similarity (VSA, predications)
• Ranking for auto-completion, entity popularity
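Ranking for auto-completion can be as simple as prefix matching on labels, ordered by an entity-popularity score (which might come from a graph-centrality measure). A minimal sketch with hypothetical entities and scores:

```python
def autocomplete(prefix, labels, popularity, k=3):
    """Entities whose label starts with prefix, most popular first (ties: alphabetical)."""
    p = prefix.casefold()
    hits = [(label, ent) for ent, label in labels.items() if label.casefold().startswith(p)]
    hits.sort(key=lambda h: (-popularity.get(h[1], 0.0), h[0]))
    return [label for label, _ in hits[:k]]


# Hypothetical labels and popularity scores
labels = {"ex:Sofia": "Sofia", "ex:SofiaProvince": "Sofia Province",
          "ex:SofiaCoppola": "Sofia Coppola", "ex:Sopot": "Sopot"}
popularity = {"ex:Sofia": 0.9, "ex:SofiaCoppola": 0.6, "ex:SofiaProvince": 0.3, "ex:Sopot": 0.2}
print(autocomplete("sof", labels, popularity))  # ['Sofia', 'Sofia Coppola', 'Sofia Province']
```

Keying on entity IDs rather than labels means two entities with the same name can still be ranked and shown separately.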
Demo: News On the Web (NOW)
GraphDB Semantic Similarity (Mar 2019)
• create hybrid similarity searches
• use pre-built text-based similarity vectors
• predication-based similarity index
• run similarity indexes in more than one iteration
• add term weights when searching text-based similarity indexes
• use analogical search for predication indexing
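These features rest on comparing vectors. This toy sketch uses hypothetical 2-D and 3-D vectors (GraphDB builds its own from text and predications) to show cosine similarity, a hybrid score mixing a text-based and a predication-based index with a weight `alpha`, and analogical search by vector offset (`b - a + c`). The linear weighting scheme is an assumption, not GraphDB's formula.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_search(query, text_vecs, pred_vecs, alpha=0.5, k=2):
    """Rank entities by alpha * text similarity + (1 - alpha) * predication similarity."""
    scores = {}
    for ent in text_vecs:
        if ent != query:
            s_text = cosine(text_vecs[query], text_vecs[ent])
            s_pred = cosine(pred_vecs[query], pred_vecs[ent])
            scores[ent] = alpha * s_text + (1 - alpha) * s_pred
    return sorted(scores, key=scores.get, reverse=True)[:k]


def analogy(a, b, c, vecs, k=1):
    """'a is to b as c is to ?': entities nearest to b - a + c, excluding a, b, c."""
    target = [vb - va + vc for va, vb, vc in zip(vecs[a], vecs[b], vecs[c])]
    candidates = [e for e in vecs if e not in (a, b, c)]
    return sorted(candidates, key=lambda e: cosine(target, vecs[e]), reverse=True)[:k]


# Hypothetical toy vectors
text_vecs = {"A": [1.0, 0.0], "B": [1.0, 0.1], "C": [0.0, 1.0]}
pred_vecs = {"A": [0.0, 1.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
print(hybrid_search("A", text_vecs, pred_vecs, alpha=0.9, k=1))  # ['B']: text dominates
print(hybrid_search("A", text_vecs, pred_vecs, alpha=0.1, k=1))  # ['C']: predications dominate

geo = {"Bulgaria": [1, 0, 0], "Sofia": [1, 1, 0], "France": [0, 0, 1],
       "Paris": [0, 1, 1], "Lyon": [0, 1.2, 0.6], "Varna": [1, 1.1, 0]}
print(analogy("Bulgaria", "Sofia", "France", geo))  # ['Paris']
```

The two `hybrid_search` calls show why mixing matters: entity B is textually similar to A but structurally different, and the weight decides which view wins.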
New Developments in Bulgaria
Collaborations Between Academia and Industry
NBU MS Data Science
• Starts Sep 2019
• Covers ML, mathematics, R, Python, distributed (Spark), cloud…
• Ontotext course: Semantic Web Proof of Concept
• IICT BAS course: Semantic Text Analysis
GATE CoE: SU FMI + Chalmers Teaming
• Host
• Teaming
• Industry Supporters
Thank you!
Contacts:
• Ontotext: Website, LinkedIn, Twitter, Rate GraphDB
• Vladimir Alexiev: Email, Publications, Homepage, Resume, LinkedIn, Twitter, GitHub
Next event:
Repeatability and reproducibility of ML research


Editor's Notes

  • #23 Start telling the story right to left: we combine LOD with proprietary master data; often we extend this with commercial master data, e.g. company data from vendors like D&B. This way we build a Big Knowledge Graph and manage it in our triplestore GraphDB. This KG provides the necessary entity awareness and context for accurate recognition and disambiguation of entities and concepts in text. The result of the text analysis is metadata: tags that describe the content by linking it to the appropriate nodes in the knowledge graph. This metadata is also stored in GraphDB to enable unmatched search and query across master data, content and metadata.
  • #25 GraphDB Workbench is the administrative interface shipped with the database. It gives users an intuitive and powerful interface to the GraphDB Server. The Server exposes all database engine APIs. Unlike most of our competitors, the engine allows easy extensibility and the development of Plugins. One such example is the Connectors, which synchronize the internal RDF database model with external services like Lucene, Solr and Elasticsearch.
  • #27 What are the different players on the market? PaaS: sell hardware; offer services with no or very minimal customization (take it or leave it). Text analytics companies: sell NLP with the ability to customize it with the client’s data. AI platforms: customize everything with your data. Semantic technology: specializing in KG and related services.
  • #29 FactForge is a hub for open data and news about people, organizations and locations.
  • #33 NOTE: Change/replace “Ontotext is ready to help”. International businesses need global company data from market intelligence. Combining global data from multiple sources with proprietary data is not straightforward; it’s “rocket science”, particularly if you use today’s mainstream technology. The good news is that “rocket science” has been democratized a lot in recent years and new things have become possible, e.g. landing the first stage of a rocket on a barge at sea. Dealing with global company data for market intelligence purposes is also already possible… with semantic data integration.