Andreas Blumauer
Co-founder and CEO
Semantic Web Company
© Semantic Web Company 2019
[Diagram: knowledge graph about the speaker and the company]
Node labels: Andreas Blumauer, Semantic Web Company, PoolParty Software Ltd, London, Vienna, 2004, >200, CMS/DMS/DAM/.., version 7.0, Graph database, Search engine, Gartner, KMWorld, Taxonomies, Ontologies, Text Mining, Knowledge Graphs
Edge labels: Founder & CEO of, Director of, parent company of, active at, graduates, developer & vendor of, based on, part of, standard for, located, founded, headquartered, serves customers, version, integrates with, awarded by, manages, used for
KNOWLEDGE GRAPHS
How to create actionable and meaningful content?
Is this text about
▸ Architecture?
▸ Wine tasting?
▸ Mediterranean Sea?
Languedoc-Roussillon
Languedoc is a significant
producer of wine, and a major
contributor to the surplus known
as the "wine lake". Today it
produces more than a third of
the grapes in France, and is a
focus for outside investors.
The region contains the historic
cities of Carcassonne, Toulouse,
Montpellier, countless Roman
monuments, medieval abbeys,
Romanesque churches, and old
castles.
Recommender System
Connecting
▸ content to content
▸ people to content
▸ people to people
Demo: Event Advisor
Architecture
Mediterranean
Sea
Wine Tasting
Languedoc-Roussillon
Languedoc is a significant
producer of wine, and a major
contributor to the surplus known
as the "wine lake". Today it
produces more than a third of
the grapes in France, and is a
focus for outside investors.
The region contains the historic
cities of Carcassonne, Toulouse,
Montpellier, countless Roman
monuments, medieval abbeys,
Romanesque churches, and old
castles.
I am interested in ...
?
Occitanie, vineyard, sea, church, village
Classification of text
Private equity typically refers to investment
funds, generally organized as limited
partnerships, that buy and restructure
companies that are not publicly traded.
Bain Capital is a venture capital company based
in Boston, MA.
Since inception it has invested in hundreds of
companies. In 2018, Bain had $75b AUM.
Mitt Romney is an American politician and
businessman serving as the junior United States
senator from Utah since January 2019.
Give me all paragraphs in
documents about
“US based Private Equity
firms with AUM higher
than $20B”
▸ Corpus statistics / Word embeddings
→ Keyphrase extraction
▸ Graph-based annotation
→ Entity/Concept linking
▸ Corpus Statistics embedded in graphs
→ Shadow Concepts
▸ Machine-learning-based annotation
→ Named entity recognition (NER)
▸ Machine-learning based classification
→ Document Classification
▸ Annotation based on regex
→ Regular expressions
Bain Capital is a venture capital company based in Boston, MA.
Since inception it has invested in hundreds of companies. In 2018, Bain had $75b AUM.
Graph-based Entity Linking
Regular Expressions-based Annotation
ML-based Entity Extraction
Semantic Rules Engine
Give me all paragraphs in documents about “US based Private Equity firms with AUM higher than $20B”
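The combination of annotators on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not how PoolParty implements it: the mini knowledge graph, the alias "Bain", the regular expression, and the function names are all hypothetical. It shows how graph-based entity linking, a regex annotation for the AUM figure, and one rule can answer the query together, even though the text itself calls Bain Capital a venture capital company.

```python
import re

# Hypothetical mini knowledge graph; type and country are assumed facts.
KNOWLEDGE_GRAPH = {
    "Bain Capital": {
        "type": "Private Equity Firm",
        "country": "US",
        "aliases": ["Bain"],
    },
}

# Hypothetical pattern for amounts such as "$75b AUM" or "$1.5 billion AUM".
AUM_PATTERN = re.compile(r"\$(\d+(?:\.\d+)?)\s*(b|bn|billion)\s+AUM", re.IGNORECASE)


def link_entities(paragraph):
    """Graph-based entity linking: match preferred names and aliases."""
    found = set()
    for name, facts in KNOWLEDGE_GRAPH.items():
        for label in [name] + facts["aliases"]:
            if re.search(r"\b" + re.escape(label) + r"\b", paragraph):
                found.add(name)
    return found


def extract_aum_billions(paragraph):
    """Regex-based annotation: AUM figures expressed in billions of dollars."""
    return [float(m.group(1)) for m in AUM_PATTERN.finditer(paragraph)]


def matches_query(paragraph, min_aum_billions=20.0):
    """Rule: a linked US private equity firm plus an AUM above the threshold."""
    firms = [KNOWLEDGE_GRAPH[name] for name in link_entities(paragraph)]
    is_us_pe = any(
        f["type"] == "Private Equity Firm" and f["country"] == "US" for f in firms
    )
    return is_us_pe and any(v > min_aum_billions for v in extract_aum_billions(paragraph))


PARAGRAPHS = [
    "Private equity typically refers to investment funds, generally organized as "
    "limited partnerships, that buy and restructure companies that are not publicly traded.",
    "Bain Capital is a venture capital company based in Boston, MA. Since inception "
    "it has invested in hundreds of companies. In 2018, Bain had $75b AUM.",
    "Mitt Romney is an American politician and businessman serving as the junior "
    "United States senator from Utah since January 2019.",
]

results = [matches_query(p) for p in PARAGRAPHS]
```

Only the second paragraph satisfies the rule; the first mentions private equity without naming a firm, and the third names neither a firm nor an AUM figure.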
▸ Deep Text Analytics is an application of Semantic AI
▸ Deep Text Analytics methods result from fusing methods and algorithms taken from language modeling, corpus linguistics, machine learning, knowledge representation, and the Semantic Web
▸ The main use cases for DTA are information retrieval, natural language understanding (NLU), question answering, and recommender systems
WHAT’S THE PROBLEM?
Many Challenges in Text Mining
Perth
Australia
Perth is one of the most
isolated major cities in
the world, with a
population of 2,022,044
living in Greater Perth.
Australia is a member of
the OECD, United Nations,
G20, ANZUS, and the World
Trade Organisation.
[Diagram: graph linking the two texts]
Node labels: City, Country, International Organisation, Commonwealth of Nations
Edge labels: is a, is located in, is part of, distance between
Avoid illogical answers
Support complex Q&A: Which cities located in the Commonwealth of Nations have a population of more than 2 million people?
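A minimal sketch of how such a graph supports the complex question above. The triples are an assumed reading of the diagram (in particular, that "is part of" connects Australia to the Commonwealth of Nations), the population figure is the one quoted on the slide, and all function names are hypothetical.

```python
# Toy knowledge graph; the edges are an assumed reading of the slide diagram.
TRIPLES = {
    ("Perth", "is_a", "City"),
    ("Australia", "is_a", "Country"),
    ("Perth", "is_located_in", "Australia"),
    ("Australia", "is_part_of", "Commonwealth of Nations"),
    ("Commonwealth of Nations", "is_a", "International Organisation"),
}

# Population of Greater Perth as quoted on the slide.
POPULATION = {"Perth": 2_022_044}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def cities_in(organisation, min_population):
    """Cities located in a country that is part of the given organisation."""
    result = []
    for s, p, o in TRIPLES:
        if p == "is_a" and o == "City":
            countries = objects(s, "is_located_in")
            in_org = any(organisation in objects(c, "is_part_of") for c in countries)
            if in_org and POPULATION.get(s, 0) > min_population:
                result.append(s)
    return sorted(result)


def distance_is_meaningful(a, b):
    """Hypothetical sanity rule: only compare things that share a type."""
    return bool(objects(a, "is_a") & objects(b, "is_a"))


answer = cities_in("Commonwealth of Nations", 2_000_000)
```

The type check is one simple way to avoid an illogical answer such as a distance between a city and the country that contains it.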
Show me all documents about Carnivores
[Diagram, built up over six slides: documents about Jaguar, Elephant, Wolf, and Rabbit are retrieved with a traditional approach and with a graph-based approach. Follow-up queries: Show me all documents about Cats; Show me all documents about Felidae (Cat = Felidae); Keystone Species?]
Traditional approach: tags are attached to each document. Jaguar: Cat, Felidae, Carnivore, Keystone Species. Elephant: Keystone Species. Wolf: Carnivore. Rabbit: none shown.
Graph-based approach: documents are annotated with Jaguar, Elephant, Wolf, and Rabbit; Carnivore, Cat = Felidae, and Keystone Species are held in the graph.
Metadata per document
1. No or little network effects
2. No reuse of metadata
3. Metadata resides in silos
4. Data quality hard to measure
5. Not machine-readable
Knowledge about metadata
1. Explicit knowledge models
2. Reusable and measurable
3. Metadata is machine-processable
4. Standards-based metadata
5. Linkable metadata opens silos
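The graph-based approach from the carnivore example can be sketched as follows. The document identifiers, the dictionary layout, and the function names are hypothetical; the concept relations follow the tags shown on the slides. Each document is annotated once with the animal it mentions, and new questions are answered by adding knowledge to the graph rather than by retagging documents.

```python
# Broader relations held in the graph (following the tags shown on the slides).
BROADER = {
    "Jaguar": {"Cat", "Keystone Species"},
    "Cat": {"Carnivore"},
    "Wolf": {"Carnivore"},
    "Elephant": {"Keystone Species"},
}

# Synonyms map alternative names onto one concept (Cat = Felidae).
SYNONYMS = {"Felidae": "Cat"}

# Hypothetical documents, each annotated once with the concept it mentions.
ANNOTATIONS = {
    "doc1": {"Jaguar"},
    "doc2": {"Elephant"},
    "doc3": {"Wolf"},
    "doc4": {"Rabbit"},
}


def ancestors(concept):
    """All broader concepts reachable from the given concept."""
    result = set()
    stack = [concept]
    while stack:
        for parent in BROADER.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def search(query):
    """Documents annotated with the query concept or any narrower concept."""
    target = SYNONYMS.get(query, query)
    return sorted(
        doc
        for doc, concepts in ANNOTATIONS.items()
        if any(c == target or target in ancestors(c) for c in concepts)
    )


carnivore_docs = search("Carnivore")
felidae_docs = search("Felidae")
keystone_docs = search("Keystone Species")
```

Adding "Keystone Species" required one new entry per animal in the graph; no document annotation changed.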
Unsupervised sense representation (Word sense induction)
Knowledge-based sense representation
The main problem:
From Word to Sense Embeddings:
A Survey on Vector Representations of Meaning
Mechanism | How does it work?
Entity extraction based on concepts instead of simple term-based extraction | Make use of synonyms in a glossary / taxonomy; disambiguation based on taxonomies / corpus statistics
Extraction of super-classes and broader concepts | Make use of hierarchical structures in a taxonomy or knowledge graph
Extraction of related terms and concepts | KGs, co-occurrence models and word embeddings
Extraction of ‘Shadow Concepts’ | Combining co-occurrence models and knowledge graphs
Semantic Classifier | Enrichment of training documents with data from a knowledge graph
Deep Text Analytics | Derivation of new metadata and classifications based on the mechanisms described above, combined by rules
Complexity
Austria’s capital, lies in the country’s east on the Danube
River. Its artistic and intellectual legacy was shaped by
residents including Mozart, Beethoven and Sigmund Freud.
The city is also known for its Imperial palaces, including
Schönbrunn, the Habsburgs’ summer residence. In the
MuseumsQuartier district, historic and contemporary
buildings display works by Egon Schiele, Gustav Klimt and
other artists.
[German-language text, translated:] Looking down from the Kahlenberg onto the Danube,
you can experience Vienna with all your senses.
Vineyards come into view, and behind them gleams the
architectural heritage of the Central European metropolis.
For half a millennium, world history was written here.
Art history too, of course.
http://www.my.com/taxonomy/62346723: prefLabel Retina; image http://www.my.com/images/90546089
http://www.my.com/taxonomy/97345854: prefLabel Funduscope; altLabel Ophthalmoscope
http://www.mycom.com/taxonomy/4543567: prefLabel Diagnostic Equipment
Edge label: has broader
Extraction of super-classes and broader concepts
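A small sketch of this mechanism, assuming that the "has broader" edge in the diagram runs from Funduscope to Diagnostic Equipment. The dictionary layout and function names are hypothetical; the URIs and labels are the ones on the slide. A synonym in the text is linked to its concept, and the broader concept is then added as derived metadata.

```python
import re

# SKOS-like concepts from the slide; the "broader" link is an assumed reading.
CONCEPTS = {
    "http://www.my.com/taxonomy/62346723": {
        "prefLabel": "Retina",
        "altLabels": [],
        "broader": [],
    },
    "http://www.my.com/taxonomy/97345854": {
        "prefLabel": "Funduscope",
        "altLabels": ["Ophthalmoscope"],
        "broader": ["http://www.mycom.com/taxonomy/4543567"],
    },
    "http://www.mycom.com/taxonomy/4543567": {
        "prefLabel": "Diagnostic Equipment",
        "altLabels": [],
        "broader": [],
    },
}


def link_concepts(text):
    """URIs of concepts whose preferred or alternative label occurs in the text."""
    hits = set()
    for uri, concept in CONCEPTS.items():
        for label in [concept["prefLabel"]] + concept["altLabels"]:
            if re.search(r"\b" + re.escape(label) + r"\b", text, re.IGNORECASE):
                hits.add(uri)
    return hits


def broader_closure(uris):
    """All concepts reachable via 'broader' from the given URIs."""
    seen = set()
    stack = list(uris)
    while stack:
        for parent in CONCEPTS[stack.pop()]["broader"]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def annotate(text):
    """Return (directly mentioned concepts, inferred broader concepts) as labels."""
    direct = link_concepts(text)
    inferred = broader_closure(direct) - direct
    return (
        sorted(CONCEPTS[u]["prefLabel"] for u in direct),
        sorted(CONCEPTS[u]["prefLabel"] for u in inferred),
    )


direct_labels, broader_labels = annotate("The ophthalmoscope was cleaned after use.")
```

The sentence never uses the words "Funduscope" or "Diagnostic Equipment", yet both end up as annotations.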
Document Corpus
▸ Websites
▸ PDF, Word, …
▸ Abstracts from DBpedia
▸ RSS Feeds
[Diagram: network of co-occurring terms, Term 1 to Term 10]
▸ Relevant terms and phrases
▸ Relevancy of terms
▸ Co-occurrence between terms
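Corpus statistics of this kind can be illustrated with a few lines of code. This is a minimal sketch, not the product's method: the stop list and function names are hypothetical, and the two sentences are the stove example used elsewhere in this deck. It counts how often each term occurs and how often two terms occur in the same sentence.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical minimal stop list.
STOPWORDS = {"the", "is", "a", "an", "of"}


def terms(sentence):
    """Distinct lowercase terms of a sentence, without stop words, sorted."""
    return sorted(set(re.findall(r"[a-z]+", sentence.lower())) - STOPWORDS)


def corpus_statistics(sentences):
    """Term frequencies and sentence-level co-occurrence counts."""
    term_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        found = terms(sentence)
        term_counts.update(found)
        # Pairs are built from a sorted list, so each pair has one canonical order.
        pair_counts.update(combinations(found, 2))
    return term_counts, pair_counts


SENTENCES = ["The stove is on.", "The stove is hot!"]
term_counts, pair_counts = corpus_statistics(SENTENCES)
```

In a real corpus the raw counts would usually be normalised (for example by term frequency) before being used as relevancy or relatedness scores.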
How do we learn from a lot of text?
Example text: The stove is on. The stove is hot!
Statistical model / co-occurrences → is related
Bla stove bla bla. Bla bla bla hot
Taxonomical model → is-a abstractions
Switched on devices are dangerous devices.
Ontological model → reasoning
Switched on devices are dangerous, only if the operating temperature is above 100 degrees and the automatic shutdown mechanism is broken.
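The difference between the taxonomical and the ontological statement on this slide can be made concrete with a short sketch. The class, field, and function names are hypothetical, and the temperature is assumed to be in degrees Celsius, which the slide does not specify.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    switched_on: bool
    operating_temperature_c: float  # assumption: degrees Celsius
    auto_shutdown_broken: bool


def dangerous_taxonomical(device):
    """Is-a abstraction: every switched-on device is a dangerous device."""
    return device.switched_on


def dangerous_ontological(device):
    """Rule with conditions: on, hotter than 100 degrees, and shutdown broken."""
    return (
        device.switched_on
        and device.operating_temperature_c > 100
        and device.auto_shutdown_broken
    )


stove = Device("stove", switched_on=True, operating_temperature_c=180.0,
               auto_shutdown_broken=False)
broken_stove = Device("stove", switched_on=True, operating_temperature_c=180.0,
                      auto_shutdown_broken=True)
```

With a working shutdown mechanism the hot stove counts as dangerous under the is-a abstraction but not under the more precise rule.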
Use co-occurrences between concepts and terms to extract ‘shadow concepts’
This site is a 15th-century Inca site located 2,430 metres above sea level. It is located in Cusco, Peru.
It is situated on a mountain ridge above the Sacred Valley through which the Urubamba River flows. Most archaeologists believe that it was built as an estate for the Inca emperor Pachacuti. Often mistakenly referred to as the "Lost City of the Incas", it is the most familiar icon of Inca civilization. The Incas built the estate around 1450, but abandoned it a century later at the time of the Spanish Conquest.
Concepts shown: Inca site, Machu Picchu, Cusco, Inca empire, Inca emperor, Peru, Spanish Conquest, Sacred Valley, Chankas, Lost City, Pachacuti
In addition to explicitly used concepts and terms, Machu Picchu is
extracted from the article as a Shadow Concept. As a prerequisite, one has
to provide and analyze a representative text corpus first.
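One possible way to picture shadow concept extraction is sketched below. It is an illustration only: the association weights stand in for statistics that would be learned from a representative corpus and are invented for this example, the vocabulary is a subset of the concepts on the slide, the text is abridged from the slide, and the scoring rule and threshold are assumptions.

```python
# Hypothetical association weights, standing in for corpus co-occurrence statistics.
ASSOCIATION = {
    "Machu Picchu": {
        "Inca site": 0.9,
        "Cusco": 0.8,
        "Sacred Valley": 0.8,
        "Pachacuti": 0.7,
        "Peru": 0.5,
    },
    "Chankas": {"Pachacuti": 0.6},
}

# Concepts of the knowledge graph (subset of those shown on the slide).
VOCABULARY = [
    "Inca site", "Machu Picchu", "Cusco", "Peru", "Sacred Valley",
    "Inca emperor", "Pachacuti", "Spanish Conquest", "Chankas",
]

# Abridged from the slide text; the site itself is never named.
TEXT = (
    "This site is a 15th-century Inca site located 2,430 metres above sea level. "
    "It is located in Cusco, Peru. It is situated on a mountain ridge above the "
    "Sacred Valley. Most archaeologists believe that it was built as an estate "
    "for the Inca emperor Pachacuti."
)


def explicit_concepts(text):
    """Concepts whose label literally occurs in the text."""
    lowered = text.lower()
    return {c for c in VOCABULARY if c.lower() in lowered}


def shadow_concepts(text, threshold=2.0):
    """Concepts not mentioned in the text but strongly associated with those that are."""
    explicit = explicit_concepts(text)
    scores = {}
    for candidate, weights in ASSOCIATION.items():
        if candidate in explicit:
            continue
        scores[candidate] = sum(w for c, w in weights.items() if c in explicit)
    return sorted(c for c, s in scores.items() if s >= threshold)


shadows = shadow_concepts(TEXT)
```

With these invented weights only Machu Picchu passes the threshold; a weakly associated concept such as Chankas does not.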
PoolParty Semantic Classifier combines machine learning algorithms
(SVM, Deep Learning, Naive Bayes, etc.) with Semantic Knowledge Graphs.
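The idea of enriching training documents with graph data can be sketched with a tiny Naive Bayes classifier. Everything here is hypothetical (the mini graph, the training documents, the class labels, and the implementation) and is not the PoolParty Semantic Classifier itself. It shows why enrichment helps: a word never seen in training can still contribute through its broader concepts.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mini knowledge graph: term -> broader concepts.
BROADER = {
    "jaguar": ["cat", "carnivore"],
    "lion": ["cat", "carnivore"],
    "wolf": ["carnivore"],
    "rabbit": ["herbivore"],
    "elephant": ["herbivore"],
    "deer": ["herbivore"],
}


def enrich(tokens):
    """Append the broader concepts of each token to the token list."""
    enriched = list(tokens)
    for token in tokens:
        enriched.extend(BROADER.get(token, []))
    return enriched


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over known tokens."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denominator = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for token in tokens:
                if token in self.vocab:  # unknown tokens carry no signal
                    score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data.
TRAIN = [
    (["jaguar", "hunts"], "meat-eaters"),
    (["wolf", "hunts"], "meat-eaters"),
    (["rabbit", "grazes"], "plant-eaters"),
    (["elephant", "grazes"], "plant-eaters"),
]

model = NaiveBayes().fit([enrich(d) for d, _ in TRAIN], [l for _, l in TRAIN])
lion_label = model.predict(enrich(["lion", "sleeps"]))
deer_label = model.predict(enrich(["deer", "runs"]))
```

Neither "lion" nor "deer" appears in the training documents; the decision rests entirely on the concepts added from the graph.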
USE CASES
See how it can be used in your environment!
Some domains use text that
doesn’t always call a spade
a spade. With ‘shadow
concept extraction’ those
‘masked’ concepts still can
be surfaced.
Since these technologies would have become conventional
technologies that are made into products and introduced into market
at the time of their introduction, it would be difficult to differentiate
them as innovative environmental and energy technologies from other
global warming prevention technologies that have already been put to
practical use in the industrial, commercial, residential, and energy
conversion sectors.
- The Innovative Global Warming Prevention Technology Working
Group under the Research and Development Subcommittee
- Council assessed that innovative global warming prevention
technologies would bring about a reduction effect of 7.49 million t-CO2
case of average emissions factor for all power sources of carbon
dioxide in 2010. In view of the difficulty in putting innovative carbon
dioxide sequestration technology into practical use by 2010, the
Working Group reassigned it as an issue of global warming prevention
technology to be tackled by 2030.
The Central Environment Council, however, has not had the
opportunity to examine the contents of these technologies in detail.
(Promotion of climate change prevention activities by every social
actor)
- The Programme encourages every social actor to take actions to
prevent global warming. The actions include measures undertaken by
the public sector.
Climate Change
Mini Countryman
And it’s probably more of a crossover than ever, with
the design to match. Being a Mini, the Countryman is
clearly meant to be the driver’s car among small
crossovers. The suspension is sophisticated, and there
are lots of chassis options (a stiffer sports setup,
variable damping, the electronically controlled ALL4
all-wheel-drive).
But it’s also the crossover for people who’ve bags of cash to blow on personalisation and luxury.
There’s been a lot of effort on ramping up the cabin quality, but then the outgoing Countryman
was a sad let-down in that department.
On the outside, plastic wheel-arch extensions, with eyebrow creases in the metalwork above, as
well as roof bars and sill protectors all add to the visual crossover-ness. This remains the only
Mini with angular rather than oval headlamps, and there’s a load of visual posturing going on in
the lower face.
There are eight versions at launch, and they’re exactly what you’d expect. It’s Cooper or Cooper
S, each fuelled by petrol or diesel, each of them with front drive or ALL4. Oh and an eight-speed
auto, too, if you count that as a separate choice. The Cooper petrol is a three-cylinder, the rest
fours.
You get extra kit as standard versus the old car, including navigation, Bluetooth, emergency call
and park sensors. Upgrades include a bigger touch-screen nav with high-definition traffic,
various posher seats, a HUD, and driver aids. Oh and a cushion thingy that folds out from the
boot so you can sit on the rear bumper without getting your clothes mucky.
In June 2017 a Cooper E will launch, which has the Cooper three-cylinder petrol driving the front
wheels, and an electric motor for the rears, with a capacity to do a claimed 25 miles of gentle
all-electric running. So it has the performance of a Cooper S ALL4 with the tax-busting
advantages of a plug-in hybrid. And you wouldn’t use any fuel if you commuted a short distance.
The platform is BMW’s contemporary transverse-engined hardware, in the bigger of its two
sizes. That means it shares a lot with the BMW X1. The 4WD system is more sophisticated than
the previous Countryman’s. The proportion of drive to the rear is computed by a controller that
takes into account parameters including grip, steering angle and throttle position, as well as
whether you’ve got the sports mode and sports traction systems selected.
TheCompany 12
Termination by TheCompany
for
Semantic analysis and pre-selection of relevant clauses in contracts
Integrating Semantics into Dialog Workflows
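My uncle lives in the same household. Darf ich seine Betreuungskosten absetzen? (German: May I deduct his care costs?)
Concept labels: Angehörige (German: relatives); same household; haushaltszugehöriger Angehöriger (German: relative belonging to the household); uncle; teyze (Turkish: aunt); ortak ev (Turkish: shared home)
Can I deduct my aunt’s care expenses?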
Use KGs and DTA to generate GDPR-compliant test data for your ML
The Knowledge Engineer’s perspective
Bain Capital is a venture capital
company based in Boston, MA.
Since inception it has invested in
hundreds of companies including
AMC Entertainment, Brookstone,
and Burger King. The company was
co-founded by Mitt Romney.
UnifiedViews
PoolParty
GraphSearch
Identify new candidate concepts to be included in a controlled vocabulary
RDF Graph Database
Factsheet
Schema mapping based on ontologies
Entity linking based on Knowledge Graph
Unstructured Data
Semi-structured Data
Structured Data
Controlled vocabularies as a basis for highly precise knowledge extraction and text classification
Learn more about it!
Andreas Blumauer
CEO, Semantic Web Company
▸ Mail andreas.blumauer@semantic-web.com
▸ Company https://www.semantic-web.com
▸ LinkedIn https://www.linkedin.com/in/andreasblumauer
▸ Twitter https://twitter.com/semwebcompany
▸ Medium https://medium.com/semantic-tech-hotspot

Deep Text Analytics - How to extract hidden information and aboutness from text

  • 1.
    Andreas Blumauer Co-founder andCEO Semantic Web Company
  • 2.
    © Semantic WebCompany 2019 Semantic Web Company Founder & CEO of Andreas Blumauer active at developer & vendor of based on part of standard for standard for graduates PoolParty Software Ltd Director of parent company of London located 2004 founded headquartered >200 serves customers Vienna CMS/DMS/DA M/.. 7.0version Graph database integrates with awarded by Gartner KMWorld Search engine manages Taxonomies Ontologies Text Mining used for Knowledge Graphs part of based on
  • 3.
    © Semantic WebCompany 2019 KNOWLEDGE GRAPHS How to create actionable and meaningful content?
  • 4.
    © Semantic WebCompany 2019 Is this text about ▸ Architecture? ▸ Wine tasting? ▸ Mediterranean Sea? Languedoc-Roussillon Languedoc is a significant producer of wine, and a major contributor to the surplus known as the "wine lake". Today it produces more than a third of the grapes in France, and is a focus for outside investors. The region contains the historic cities of Carcassonne, Toulouse, Montpellier, countless Roman monuments, medieval abbeys, Romanesque churches, and old castles.
  • 5.
    © Semantic WebCompany 2019 Recommender System Connecting ▸ content to content ▸ people to content ▸ people to people Demo: Event Advisor Architecture Mediterranean Sea Wine Tasting Languedoc-Roussillon Languedoc is a significant producer of wine, and a major contributor to the surplus known as the "wine lake". Today it produces more than a third of the grapes in France, and is a focus for outside investors. The region contains the historic cities of Carcassonne, Toulouse, Montpellier, countless Roman monuments, medieval abbeys, Romanesque churches, and old castles. I am interested in ... ?
  • 6.
    © Semantic WebCompany 2019 Recommender System Connecting ▸ content to content ▸ people to content ▸ people to people Architecture Mediterranean Sea Wine Tasting Languedoc-Roussillon Languedoc is a significant producer of wine, and a major contributor to the surplus known as the "wine lake". Today it produces more than a third of the grapes in France, and is a focus for outside investors. The region contains the historic cities of Carcassonne, Toulouse, Montpellier, countless Roman monuments, medieval abbeys, Romanesque churches, and old castles. Occitanie, vineyard, sea, church, village I am interested in ... Demo: Event Advisor
  • 7.
    © Semantic WebCompany 2019 Classification of text Private equity typically refers to investment funds, generally organized as limited partnerships, that buy and restructure companies that are not publicly traded. Bain Capital is a venture capital company based in Boston, MA. Since inception it has invested in hundreds of companies. In 2018, Bain had $75b AUM. Mitt Romney is an American politician and businessman serving as the junior United States senator from Utah since January 2019. Give me all paragraphs in documents about “US based Private Equity firms with AUM higher than $20B”
  • 8.
    © Semantic WebCompany 2019 ▸ Corpus statistics / Word embeddings → Keyphrase extraction ▸ Graph-based annotation → Entity/Concept linking ▸ Corpus Statistics embedded in graphs → Shadow Concepts ▸ Machine-learning-based annotation → Named entity recognition (NER) ▸ Machine-learning based classification → Document Classification ▸ Annotation based on regex → Regular expressions Bain Capital is a venture capital company based in Boston, MA. Since inception it has invested in hundreds of companies. In 2018, Bain had $75b AUM. Graph-based Entity Linking Regular Expressions-based Annotation Semantic Rules Engine Give me all paragraphs in documents about “US based Private Equity firms with AUM higher than $20B” ML-based Entity Extraction
  • 9.
    © Semantic WebCompany 2019 ▸ Deep Text Analytics is an application of Semantic AI ▸ Fusing methods and algorithms taken from language modeling, corpus linguistics, machine learning, knowledge representation and the semantic web result into Deep Text Analytics methods ▸ Main areas of use cases for DTA are Information retrieval, NLU, Question answering, and Recommender Systems
  • 10.
    © Semantic WebCompany 2019 WHAT’S THE PROBLEM? Many Challenges in Text Mining
  • 11.
    © Semantic WebCompany 2019
  • 12.
    © Semantic WebCompany 2019 Perth Australia Perth is one of the most isolated major cities in the world, with a population of 2,022,044 living in Greater Perth. Australia is a member of the OECD, United Nations, G20, ANZUS, and the World Trade Organisation. Country City is a is a is located in distance between Commonwealth of Nations International Organisation is a Avoid illogical answers: is part of Support complex Q&A: Which cities located in the Commonwealth of Nations have a population of more than 2 mio. people?
  • 13.
    © Semantic WebCompany 2019 doc doc doc Jaguar Elephant Wolf Rabbit doc Jaguar Elephant Wolf Rabbit doc Show me all documents about Carnivores Traditional approach Graph-based approach doc doc doc
  • 14.
    © Semantic WebCompany 2019 Show me all documents about Carnivores Traditional approach Graph-based approach doc doc doc Jaguar Carnivore Elephant Wolf Carnivore Rabbit doc Jaguar Elephant Wolf Rabbit doc Carnivores doc doc doc
  • 15.
    © Semantic WebCompany 2019 Show me all documents about Carnivores Traditional approach Graph-based approach doc doc doc Jaguar Cat Carnivore Elephant Wolf Carnivore Rabbit doc Jaguar Elephant Wolf Rabbit doc Carnivores doc doc Cat Show me all documents about Cats doc
  • 16.
    16 doc doc doc Jaguar Cat Felidae CarnivoreElephant Wolf Carnivore Rabbit doc Jaguar Elephant Wolf Rabbit doc Carnivore Traditional approach Graph-based approach Show me all documents about Felidae doc doc doc Show me all documents about Carnivores Cat = Felidae
  • 17.
    17 doc doc doc Jaguar Cat Felidae Carnivore Keystone Species Elephant Keystone Species Wolf CarnivoreRabbit doc Jaguar Elephant Wolf Rabbit doc Carnivore Traditional approach Graph-based approach Show me all documents about Felidae doc doc doc Show me all documents about Carnivores Cat = Felidae Keystone Species? Keystone Species
  • 18.
    18 doc doc doc Jaguar Cat Felidae Carnivore Keystone SpeciesElephant Wolf Carnivore Rabbit doc Jaguar Elephant Wolf Rabbit doc Carnivore Traditional approach Graph-based approach Show me all documents about Felidae doc doc doc Show me all documents about Carnivores Cat = Felidae Keystone Species? Keystone Species Metadata per document 1. No or little network effects 2. No reuse of metadata 3. Metadata resides in silos 4. Data quality hard to measure 5. Not machine-readable Knowledge about metadata 1. Explicit knowledge models 2. Reusable and measurable 3. Metadata is machine-processable 4. Standards-based metadata 5. Linkable metadata opens silos
  • 19.
    © Semantic WebCompany 2019 Unsupervised sense representation (Word sense induction) Knowledge-based sense representation The main problem: From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
  • 20.
    © Semantic WebCompany 2019 Mechanism How does it work? Entity extraction based on concepts instead of simple term-based extraction Make use of synonyms in a glossary / taxonomy; disambiguation based on taxonomies / corpus statistics Extraction of super-classes and broader concepts Make use of hierarchical structures in a taxonomy or knowledge graph Extraction of related terms and concepts KGs, co-occurrence models and word embeddings Extraction of ‘Shadow Concepts’ Combining co-occurrence models and knowledge graphs Semantic Classifier Enrichment of training documents with data from a knowledge graph Deep Text Analytics Derivation of new metadata and classifications based on mechanisms as described above combined by rules Complexity
  • 21.
    © Semantic WebCompany 2019 Austria’s capital, lies in the country’s east on the Danube River. Its artistic and intellectual legacy was shaped by residents including Mozart, Beethoven and Sigmund Freud. The city is also known for its Imperial palaces, including Schönbrunn, the Habsburgs’ summer residence. In the MuseumsQuartier district, historic and contemporary buildings display works by Egon Schiele, Gustav Klimt and other artists. Schaut man vom Kahlenberg auf die Donau hinunter, kann man Wien mit allen Sinnen spüren. Weinberge sind da zu sehen, dahinter glänzt das bauliche Erbe der mitteleuropäischen Metropole. Ein halbes Jahrtausend wurde hier Weltgeschichte geschrieben. Kunstgeschichte sowieso.
  • 22.
    © Semantic WebCompany 2019
  • 23.
    © Semantic WebCompany 2019 http://www.my.com/ taxonomy/62346723 prefLabel Retina image http://www.my.com/ images/90546089 http://www.my.com/ taxonomy/ 97345854 prefLabel Funduscope altLabel Ophthalmoscope http://www.mycom.com /taxonomy/4543567 prefLabel Diagnostic Equipment has broader Extraction of super-classes and broader concepts
  • 24.
    © Semantic WebCompany 2019 Document Corpus ▸ Websites ▸ PDF, Word, … ▸ Abstracts from DBpedia ▸ RSS Feeds Term 8 Term 3 Term 7 Term 8 Term 6 Term 9 Term 5 Term 10 ▸ Relevant terms and phrases ▸ Relevancy of terms ▸ co-occurrence between terms and terms Term 1 Term 4 Term 2
  • 25.
    © Semantic WebCompany 2019 How do we learn from a lot of text? Bla bla bla bla. Bla bla bla bla The stove is on. The stove is hot! Ontological model → reasoning Taxonomical model → is-a abstractions Bla stove bla bla. Bla bla bla hot Switched on devices are dangerous devices. Switched on devices are dangerous, only if the operating temperature is above 100 degrees and the automatic shutdown mechanism is broken. The stove is on. The stove is hot! Statistical model/cooccurences → is related The stove is on. The stove is hot! Bla bla bla bla Bla bla bla bla.
  • 26.
    © Semantic WebCompany 2019 Use co-occurrences between concepts and terms to extract ‘shadow concepts’ 26 This site is a 15th-century Inca site located 2,430 metres above sea level. It is located in Cusco, Peru. It is situated on a mountain ridge above the Sacred Valley through which the Urubamba River flows. Most archaeologists believe that it was built as an estate for the Inca emperor Pachacuti. Often mistakenly referred to as the "Lost City of the Incas", it is the most familiar icon of Inca civilization. The Incas built the estate around 1450, but abandoned it a century later at the time of the Spanish Conquest. Inca site Machu Picchu CuscoInca empire Inca emperor Peru Spanish Conquest Sacred Valley Chankas Lost City Pachacuti In addition to explicitly used concepts and terms, Machu Picchu is extracted from the article as a Shadow Concept. As a prerequisite, one has to provide and analyze a representative text corpus first.
  • 27.
    © Semantic WebCompany 2019 ▸ Corpus statistics / Word embeddings → Keyphrase extraction ▸ Graph-based annotation → Entity/Concept linking ▸ Corpus Statistics embedded in graphs → Shadow Concepts ▸ Machine-learning-based annotation → Named entity recognition (NER) ▸ Machine-learning based classification → Document Classification ▸ Annotation based on regex → Regular expressions Bain Capital is a venture capital company based in Boston, MA. Since inception it has invested in hundreds of companies. In 2018, Bain had $75b AUM. Graph-based Entity Linking Regular Expressions-based Annotation Semantic Rules Engine Give me all paragraphs in documents about “US based Private Equity firms with AUM higher than $20B” ML-based Entity Extraction
  • 28.
    © Semantic WebCompany 2019 PoolParty Semantic Classifier combines machine learning algorithms (SVM, Deep Learning, Naive Bayes, etc.) with Semantic Knowledge Graphs.
  • 29.
    © Semantic WebCompany 2019 USE CASES See how it can be used in your environment!
  • 30.
    © Semantic WebCompany 2019
  • 31.
    Some domains use text that doesn’t always call a spade a spade. With ‘shadow concept extraction’, those ‘masked’ concepts can still be surfaced.
    Example text: Since these technologies would have become conventional technologies that are made into products and introduced into market at the time of their introduction, it would be difficult to differentiate them as innovative environmental and energy technologies from other global warming prevention technologies that have already been put to practical use in the industrial, commercial, residential, and energy conversion sectors. - The Innovative Global Warming Prevention Technology Working Group under the Research and Development Subcommittee - Council assessed that innovative global warming prevention technologies would bring about a reduction effect of 7.49 million t-CO2 case of average emissions factor for all power sources of carbon dioxide in 2010. In view of the difficulty in putting innovative carbon dioxide sequestration technology into practical use by 2010, the Working Group reassigned it as an issue of global warming prevention technology to be tackled by 2030. The Central Environment Council, however, has not had the opportunity to examine the contents of these technologies in detail. (Promotion of climate change prevention activities by every social actor) - The Programme encourages every social actor to take actions to prevent global warming. The actions include measures undertaken by the public sector.
    Shadow concept: Climate Change
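The idea on this slide can be illustrated with a minimal sketch. It is not PoolParty's algorithm; it only assumes a hypothetical table (`CONCEPTS`) that lists, for each concept, its labels and a few terms that tend to co-occur with it. A concept counts as a "shadow" when none of its labels appears in the text but enough of its associated terms do. All names and values below are illustrative assumptions.

```python
import re

# Hypothetical mini knowledge graph (illustrative values only):
# "labels" are the names of a concept, "associated" are terms that
# tend to co-occur with it.
CONCEPTS = {
    "Climate Change": {
        "labels": {"climate change"},
        "associated": {"global warming", "carbon dioxide", "emissions", "sequestration"},
    },
    "Wine Tasting": {
        "labels": {"wine tasting"},
        "associated": {"wine", "grapes", "vineyard"},
    },
}


def shadow_concepts(text, concepts=CONCEPTS, min_hits=2):
    """Return (concept, matched terms) pairs for concepts that are never
    named in the text but are supported by at least min_hits associated terms."""
    lowered = text.lower()

    def present(term):
        # Whole-word (or whole-phrase) match on the lowercased text.
        return re.search(r"\b" + re.escape(term) + r"\b", lowered) is not None

    result = []
    for name, info in concepts.items():
        if any(present(label) for label in info["labels"]):
            continue  # explicitly mentioned, so not a shadow concept
        hits = sorted(t for t in info["associated"] if present(t))
        if len(hits) >= min_hits:
            result.append((name, hits))
    # Concepts with the most supporting terms come first.
    return sorted(result, key=lambda pair: -len(pair[1]))


sample = ("Innovative global warming prevention technologies "
          "would reduce carbon dioxide emissions.")
print(shadow_concepts(sample))
```

A production system would likely derive the associated terms from a corpus and weight them, rather than counting hits against a hand-written list.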
  • 32.
    Mini Countryman
    And it’s probably more of a crossover than ever, with the design to match. Being a Mini, the Countryman is clearly meant to be the driver’s car among small crossovers. The suspension is sophisticated, and there are lots of chassis options (a stiffer sports setup, variable damping, the electronically controlled ALL4 all-wheel-drive). But it’s also the crossover for people who’ve bags of cash to blow on personalisation and luxury. There’s been a lot of effort on ramping up the cabin quality, but then the outgoing Countryman was a sad let-down in that department. On the outside, plastic wheel-arch extensions, with eyebrow creases in the metalwork above, as well as roof bars and sill protectors all add to the visual crossover-ness. This remains the only Mini with angular rather than oval headlamps, and there’s a load of visual posturing going on in the lower face. There are eight versions at launch, and they’re exactly what you’d expect. It’s Cooper or Cooper S, each fuelled by petrol or diesel, each of them with front drive or ALL4. Oh and an eight-speed auto, too, if you count that as a separate choice. The Cooper petrol is a three-cylinder, the rest fours. You get extra kit as standard versus the old car, including navigation, Bluetooth, emergency call and park sensors. Upgrades include a bigger touch-screen nav with high-definition traffic, various posher seats, a HUD, and driver aids. Oh and a cushion thingy that folds out from the boot so you can sit on the rear bumper without getting your clothes mucky. In June 2017 a Cooper E will launch, which has the Cooper three-cylinder petrol driving the front wheels, and an electric motor for the rears, with a capacity to do a claimed 25 miles of gentle all-electric running. So it has the performance of a Cooper S ALL4 with the tax-busting advantages of a plug-in hybrid. And you wouldn’t use any fuel if you commuted a short distance.
    The platform is BMW’s contemporary transverse-engined hardware, in the bigger of its two sizes. That means it shares a lot with the BMW X1. The 4WD system is more sophisticated than the previous Countryman’s. The proportion of drive to the rear is computed by a controller that takes into account parameters including grip, steering angle and throttle position, as well as whether you’ve got the sports mode and sports traction systems selected.
  • 33.
    Semantic analysis and pre-selection of relevant clauses in contracts ▸ Entity: TheCompany ▸ Clause: 12 Termination by TheCompany for
  • 34.
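    Integrating Semantics into Dialog Workflows ▸ “My uncle lives in the same household. Darf ich seine Betreuungskosten absetzen?” (German: “May I deduct his care costs?”) ▸ Concept labels: Angehörige (German: relatives), same household, haushaltszugehöriger Angehöriger (German: relative belonging to the household), uncle, teyze (Turkish: aunt), ortak ev (Turkish: shared home) ▸ “Can I deduct my aunt’s care expenses?”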
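The cross-lingual mapping on this slide can be sketched as a lookup over multilingual concept labels, loosely in the spirit of SKOS labels and broader relations. This is a minimal illustration, not the product's implementation: the `ex:` identifiers, the dictionary layout, and the German label "Onkel" are assumptions added for the example.

```python
import re

# Hypothetical concept scheme (identifiers and layout are illustrative).
# Each concept carries labels per language and an optional broader concept.
CONCEPTS = {
    "ex:Relative": {
        "labels": {"en": ["relative"], "de": ["Angehörige"]},
        "broader": None,
    },
    "ex:Uncle": {
        "labels": {"en": ["uncle"], "de": ["Onkel"]},  # "Onkel" is an assumed label
        "broader": "ex:Relative",
    },
    "ex:Aunt": {
        "labels": {"en": ["aunt"], "tr": ["teyze"]},
        "broader": "ex:Relative",
    },
    "ex:SameHousehold": {
        "labels": {"en": ["same household"], "tr": ["ortak ev"]},
        "broader": None,
    },
}


def find_concepts(utterance, concepts=CONCEPTS):
    """Return the concepts whose label, in any language, occurs in the utterance."""
    text = utterance.casefold()
    found = set()
    for uri, info in concepts.items():
        for labels in info["labels"].values():
            for label in labels:
                pattern = r"\b" + re.escape(label.casefold()) + r"\b"
                if re.search(pattern, text):
                    found.add(uri)
    return found


def generalize(uris, concepts=CONCEPTS):
    """Replace each concept by its broader concept, if it has one."""
    return {concepts[uri]["broader"] or uri for uri in uris}


english = generalize(find_concepts("My uncle lives in the same household."))
turkish = generalize(find_concepts("teyze ortak ev"))
print(english == turkish)
```

Both inputs generalize to the same pair of concepts, so a dialog workflow could route them to the same rule regardless of input language. Whole-word matching does not handle inflected forms; a real system would need language-specific normalization.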
  • 35.
    Use KGs and DTA to generate GDPR-compliant test data for your ML
  • 36.
    The Knowledge Engineer’s perspective ▸ Sample text: Bain Capital is a venture capital company based in Boston, MA. Since inception it has invested in hundreds of companies including AMC Entertainment, Brookstone, and Burger King. The company was co-founded by Mitt Romney. ▸ Components: UnifiedViews, PoolParty, GraphSearch, RDF Graph Database, Factsheet ▸ Inputs: Unstructured Data, Semi-structured Data, Structured Data ▸ Identify new candidate concepts to be included in a controlled vocabulary ▸ Schema mapping based on ontologies ▸ Entity linking based on Knowledge Graph ▸ Controlled vocabularies as a basis for highly precise knowledge extraction and text classification
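Entity linking based on a knowledge graph, as named on this slide, can be sketched with a simple label lookup. The sketch below is only an illustration under stated assumptions: `KG_LABELS` is a hypothetical mapping from preferred labels to made-up `ex:` identifiers, and matching is exact and case-sensitive, which is far simpler than what a real extraction pipeline does.

```python
import re

# Hypothetical label-to-identifier mapping (all identifiers are illustrative).
KG_LABELS = {
    "Bain Capital": "ex:BainCapital",
    "Boston": "ex:Boston",
    "Burger King": "ex:BurgerKing",
    "Mitt Romney": "ex:MittRomney",
}


def link_entities(text, labels=KG_LABELS):
    """Find known labels in the text and attach their identifiers.

    Longer labels are tried first so that they win over shorter labels
    that overlap with them.
    """
    taken = []    # character spans already claimed by a match
    matches = []
    for label in sorted(labels, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(label) + r"\b", text):
            start, end = m.start(), m.end()
            if any(start < e and s < end for s, e in taken):
                continue  # overlaps an earlier, longer match
            taken.append((start, end))
            matches.append({
                "start": start,
                "end": end,
                "surface": m.group(0),
                "uri": labels[label],
            })
    return sorted(matches, key=lambda match: match["start"])


text = ("Bain Capital is a venture capital company based in Boston, MA. "
        "The company was co-founded by Mitt Romney.")
for match in link_entities(text):
    print(match["surface"], "->", match["uri"])
```

Names in the text that are not in the mapping, such as AMC Entertainment in the slide's sample, would be natural candidates for new concepts in the controlled vocabulary.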
  • 37.
    Learn more about it!
  • 38.
    Andreas Blumauer CEO, Semantic Web Company ▸ Mail [email protected] ▸ Company https://www.semantic-web.com ▸ LinkedIn https://www.linkedin.com/in/andreasblumauer ▸ Twitter https://twitter.com/semwebcompany ▸ Medium https://medium.com/semantic-tech-hotspot