Patents, Semantics and
Open Innovation
The role of LOD in a business directory for
knowledge intensive industries
Nice, 20-OCT-2015
Ricardo Eito-Brun
reito@uc3m.es
Patents, Semantics and Open
Innovation. LOD
 Reasons leading to this research:
◦ Semantic Web technologies and applications, in particular LOD
publishing, constitute a preliminary step towards Information
Systems interoperability.
◦ Having access to distributed data, hosted by different agents and
repositories, opens new possibilities for research in multiple areas.
◦ In the particular case of patent information:
 What possibilities open up if we can aggregate and
analyse these data together with other data sets?
 Is it feasible to improve the way we
access patent information?
 Can we devise innovative user interfaces to browse and search
patent collections?
Patents, Semantics and Open
Innovation. LOD
 Outline:
◦ The LOD promises. Potential benefits, technologies and
standards for data encoding and interoperability.
◦ LOD in the world of patents. A review of major milestones.
◦ Overview of current research: what researchers are doing.
◦ Case Study: the particular case of Web-based Innovation
Platforms and digital repositories.
Linked Open Data:
benefits, technologies and standards
 LOD has become the main application of the SMW approach.
 Semantic Web (SMW)
◦ Proposed by Tim Berners-Lee in “The Semantic Web: A New Form
of Web Content That Is Meaningful to Computers Will
Unleash a Revolution of New Possibilities”, Scientific American,
2001.
◦ SMW is about having a more intelligent web, made up of
documents that could be easily processed by computers and
software agents with no human intervention.
◦ SMW data should be “exposed” or published in a machine-
readable format.
◦ Computers should be able to understand the meaning of data.
Linked Open Data:
benefits, technologies and standards
 W3C SMW Activity:
◦ “The Semantic Web provides a common framework that
allows data to be shared and reused across application,
enterprise, and community boundaries. […]
◦ The Semantic Web is about two things.
 about common formats for integration and combination of
data drawn from diverse sources…
 about language for recording how the data relates to real
world objects.”*
*http://www.w3.org/2001/sw/
Linked Open Data:
benefits, technologies and standards
 SMW is presented as an extension of the traditional web.
 If content in the traditional web is published for humans, content in
the SMW is published for software programs, which can interpret
it and derive new data, information and knowledge.
 Initial SMW initiatives focused on improving software agents’
capabilities to solve information management problems:
◦ "Mom needs to see a specialist and then has to have a series of physical
therapy sessions. Biweekly or something. Lucy instructed her Semantic
Web agent through her handheld Web browser. The agent promptly
retrieved information about Mom's prescribed treatment from the
doctor's agent, looked up several lists of providers, and checked for the
ones in-plan for Mom's insurance within a 20-mile radius of her home…”
*http://www.w3.org/2001/sw/
Linked Open Data:
benefits, technologies and standards
 SMW pillars:
◦ URIs or IRIs to provide unique identifiers to resources.
◦ XML to encode and transfer information.
◦ RDF as a data model and vocabulary, commonly serialized in XML, to
encode metadata describing resources.
◦ RDF-S as a means to structure the metadata about resources
(what can be asserted for a resource of a specific type).
◦ OWL as an extension of RDF/RDF-S with additional capabilities
to express constraints on data.
◦ Additional languages to express the rules that govern reasoning
on SMW data.
Linked Open Data:
benefits, technologies and standards
 RDF proposes a vocabulary (set of tags) to express metadata about
any type of resource.
 RDF data can be expressed in XML or in other alternative formats.
 An RDF file usually encloses metadata about a specific resource,
e.g.: person, document, institution, company, event…
 Resources are identified by unique identifiers (URIs).
◦ URIs are used to ensure that metadata about the same entity are grouped
together.
◦ In case different applications use different identifiers for the same entity, it
is possible to keep the equivalences between the different identifiers.
Linked Open Data:
benefits, technologies and standards
 Annotated RDF record (figure callouts):
◦ Unique identifier for the resource, expressed as a URI.
◦ Equivalences with other identifiers proposed in other contexts (owl:sameAs).
◦ Metadata about the resource, with clearly defined meaning.
◦ Resources are given a type (rdf:type).
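A minimal sketch of how owl:sameAs equivalences let metadata about the same entity be grouped together. Triples are modelled as plain Python tuples rather than with an RDF library; the example.org URIs, the person and the property values are invented for illustration.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical (subject, predicate, object) statements from two sources.
triples = [
    ("http://example.org/a/person/1", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/a/person/1", FOAF + "name", "Jane Doe"),
    ("http://example.org/a/person/1", OWL_SAME_AS, "http://example.org/b/authors/jdoe"),
    ("http://example.org/b/authors/jdoe", FOAF + "mbox", "mailto:jdoe@example.org"),
]


def merge_equivalent(triples):
    """Group property values under one representative URI per owl:sameAs cluster."""
    parent = {}

    def find(uri):
        # Follow recorded equivalences until the representative URI is reached.
        while parent.get(uri, uri) != uri:
            uri = parent[uri]
        return uri

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                parent[root_o] = root_s

    merged = defaultdict(list)
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            merged[find(s)].append((p, o))
    return dict(merged)


merged = merge_equivalent(triples)
```

Here both identifiers end up under a single key, so the name from one source and the mailbox from the other are read as statements about the same person.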
Linked Open Data:
benefits, technologies and standards
 RDF records about resources can be linked or related.
 The value of a specific metadata element or property may refer to the
identifier of another resource.
 This allows having sets of structured, linked data.
 For example:
◦ Subjects or topics in a classification scheme may have unique IDs.
◦ The “Subject” metadata field in a document will take as a value the ID of the
referred topic.
◦ Personal or corporate authors may have unique IDs.
◦ The “Author” metadata field in a document will take as a value the ID of its
personal/corporate author.
Linked Open Data:
benefits, technologies and standards
 Annotated RDF record (figure callouts):
◦ dc:language refers to the ID of the English language.
◦ dc:subject refers to the ID of a classification code taken from the DDC system.
◦ dc:subject also refers to the ID of a topic taken from the LCSH system.
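As an illustration of property values that point to the IDs of other resources, the sketch below describes a document whose dc:language and dc:subject values are URIs, and then resolves those URIs to human-readable labels. All example.org URIs, the title and the labels are invented; real DDC or LCSH identifiers would differ.

```python
DC = "http://purl.org/dc/elements/1.1/"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Hypothetical statements: one book plus labels for the resources it links to.
triples = [
    ("http://example.org/book/42", DC + "title", "An invented book title"),
    ("http://example.org/book/42", DC + "language", "http://example.org/lang/en"),
    ("http://example.org/book/42", DC + "subject", "http://example.org/ddc/topic-1"),
    ("http://example.org/book/42", DC + "subject", "http://example.org/lcsh/topic-2"),
    ("http://example.org/lang/en", SKOS_PREF_LABEL, "English"),
    ("http://example.org/ddc/topic-1", SKOS_PREF_LABEL, "Invented DDC class"),
    ("http://example.org/lcsh/topic-2", SKOS_PREF_LABEL, "Invented LCSH heading"),
]


def describe(resource, triples):
    """Return the resource's properties, replacing linked IDs by their labels."""
    labels = {s: o for s, p, o in triples if p == SKOS_PREF_LABEL}
    description = {}
    for s, p, o in triples:
        if s == resource:
            name = p.rsplit("/", 1)[-1]  # keep only the local property name
            description.setdefault(name, []).append(labels.get(o, o))
    return description


book = describe("http://example.org/book/42", triples)
```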
Linked Open Data:
benefits, technologies and standards
 SMW standards and languages are not limited to RDF.
◦ RDF-S provides a way to define “schemas” for metadata,
in other words, which properties/metadata we can use to
describe entities of a specific type.
◦ SKOS provides a way to encode “subject headings”,
“thesauri” or “classification schemas” used to indicate the
topics the documents are about.
◦ Specific vocabularies indicate which properties are
available to provide metadata on resources, e.g. Dublin
Core.
Linked Open Data:
benefits, technologies and standards
 Annotated RDF record (figure callouts):
◦ Properties starting with dc: and dct: are taken from the Dublin Core vocabulary, which provides a set of metadata elements.
◦ dc:subject points to LCSH/DDC topics that are expressed, somewhere else on the web, using SKOS.
◦ A separate RDF-S document states which properties can be used when providing metadata about resources of type «Book».
Linked Open Data:
benefits, technologies and standards
 RDF statements build a “graph” of resources, properties
and values.
 As the amount of metadata collected about the different entities
grows, the graph expands.
 The RDF model for representing information allows browsing and discovery
mechanisms that go beyond traditional search/browse capabilities.
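One way to picture graph-based discovery is a breadth-first walk that starts at one resource and follows statements in either direction. This is a sketch over invented `ex:` identifiers, not a description of any particular tool.

```python
from collections import deque

# Hypothetical statements linking patents, people and an organisation.
triples = [
    ("ex:patent1", "ex:inventor", "ex:alice"),
    ("ex:patent2", "ex:inventor", "ex:alice"),
    ("ex:patent2", "ex:assignee", "ex:acme"),
    ("ex:paper9", "ex:author", "ex:bob"),
]


def neighbours(node, triples):
    """Yield resources directly connected to node, in either direction."""
    for s, _, o in triples:
        if s == node:
            yield o
        elif o == node:
            yield s


def reachable(start, triples, max_hops):
    """Map each resource within max_hops of start to its distance in hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours(node, triples):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


nearby = reachable("ex:patent1", triples, max_hops=2)
```

Starting from one patent, the walk reaches its inventor and then the inventor's other patent, a path that a keyword search over isolated records would not expose.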
SKOS:
conceptual structures for the SMW
 Another vocabulary closely related to the SMW and LOD.
 Used to:
◦ Encode “subject headings” or “classification schemas” in XML
format.
◦ Encode relationships between these conceptual structures (e.g.
equivalences between classes of different classification schemas)
◦ Provide lists of topics to which document descriptions can be
linked.
 Concepts within a SKOS-encoded schema are related to each other
by relationships like <broader> , <narrower> or <related>.
 Labels can be given to concepts (linguistic equivalences, authorized,
not authorized, deprecated…).
 Concepts can also be annotated.
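The broader/narrower relationships can be used, for instance, to expand a search on one concept to everything filed under its narrower concepts. The sketch below assumes the narrower links have already been loaded into a dictionary; the concept identifiers are invented.

```python
# Hypothetical narrower-concept links, keyed by the broader concept.
narrower = {
    "ex:energy": ["ex:solar", "ex:wind"],
    "ex:solar": ["ex:photovoltaics"],
}


def expand(concept, narrower):
    """Return the concept plus all concepts reachable through narrower links."""
    found, stack = [], [concept]
    while stack:
        current = stack.pop()
        if current not in found:  # guards against cycles in the scheme
            found.append(current)
            stack.extend(narrower.get(current, []))
    return found


expanded = expand("ex:energy", narrower)
```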
SKOS:
conceptual structures for the SMW
 Annotated SKOS record (figure callouts):
◦ Each concept has a separate skos:Concept element, identified by a URI.
◦ skos:related points to other concepts with a related meaning.
◦ skos:prefLabel and altLabel provide linguistic labels for the concept.
◦ skm:UF points to deprecated concepts.
SKOS:
conceptual structures for the SMW
 SKOS has become one of the key points in SMW initiatives.
 Organizations usually start putting their controlled vocabularies /
classification schemes online using SKOS.
 Then, “bibliographic” descriptions are linked to SKOS topics as a
second stage.
 This provides an initial pair of linked data sets.
 But SKOS becomes powerful if we can take advantage of the
capability of expressing relationships between different classification
schemas.
 This gives the opportunity of cross-searching different repositories.
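A sketch of cross-searching under the assumption that two repositories index their content with different classification schemes and that an equivalence between concepts of both schemes is available. Scheme names, concept codes and document IDs are all invented.

```python
# Hypothetical equivalences between concepts of two classification schemes.
equivalent = {"schemeA:solar-cells": "schemeB:T-100"}

# Hypothetical repositories: document ID -> set of assigned concepts.
repository_a = {
    "doc-a1": {"schemeA:solar-cells"},
    "doc-a2": {"schemeA:batteries"},
}
repository_b = {
    "pat-b1": {"schemeB:T-100"},
    "pat-b2": {"schemeB:T-200"},
}


def cross_search(concept, equivalent, repository_a, repository_b):
    """Search repository A by concept and repository B by the mapped concept."""
    hits = [doc for doc, topics in repository_a.items() if concept in topics]
    mapped = equivalent.get(concept)
    if mapped is not None:
        hits += [doc for doc, topics in repository_b.items() if mapped in topics]
    return hits


hits = cross_search("schemeA:solar-cells", equivalent, repository_a, repository_b)
```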
Semantic Web: standards
SPARQL
 SPARQL is a W3C standard that defines a query language to search
for information within RDF graphs.
 SPARQL is to the SMW what SQL is to relational databases
such as Oracle, MySQL or PostgreSQL.
 Collections of RDF documents within a repository can be searched
using “SPARQL end points”.
 SPARQL end points are aimed at software agents and software
applications.
 Queries are constructed dynamically by software agents, and results
are returned in XML for further processing.
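The exchange described above can be sketched with the Python standard library only: a query is encoded into a GET request URL using the `query` parameter of the SPARQL protocol, and a result document in the SPARQL XML results format is parsed into dictionaries. The endpoint URL, the query and the sample response are invented for illustration, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical end point
QUERY = (
    "SELECT ?patent ?title WHERE { "
    "?patent <http://purl.org/dc/elements/1.1/title> ?title } LIMIT 10"
)
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Invented response, shaped like the SPARQL XML results format.
SAMPLE_RESPONSE = """
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="patent"/><variable name="title"/></head>
  <results>
    <result>
      <binding name="patent"><uri>http://example.org/patent/1</uri></binding>
      <binding name="title"><literal>An invented title</literal></binding>
    </result>
  </results>
</sparql>
"""


def build_query_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


def parse_results(xml_text):
    """Turn each <result> element into a {variable: value} dictionary."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        row = {}
        for binding in result.findall("sr:binding", SPARQL_NS):
            value = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


url = build_query_url(ENDPOINT, QUERY)
rows = parse_results(SAMPLE_RESPONSE)
```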
Semantic Web:
technologies…
 Technologies and tools used to deal with SMW standards and
concepts can be classified in these groups:
◦ Editors, to help define RDF-S schemas.
◦ Conversion tools, to move existing data into the RDF format.
◦ RDF repositories or “triple stores”, to support:
 The storage of large data sets
 Bulk downloads,
 Human browsing
 SPARQL searching.
◦ Specific tools to manage controlled vocabularies and generate
SKOS representations.
Linked Open Data (LOD)
SMW at work…
 “Linked Data is simply about using the Web to create typed links
between data from different sources. These may be databases
maintained by two organisations, or heterogeneous systems within
one organisation that, historically, have not easily interoperated at the
data level.”
 “… Linked Data refers to data published on the Web in such a way
that it is machine-readable, its meaning is explicitly defined, it is
linked to other external data sets, and can in turn be linked to from
external data sets.”
 Tim Berners-Lee (2006)
Linked Open Data (LOD)
SMW at work…
 Linked Data Principles:
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using
the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more
things.
 These principles are “rules” or “recommendations” on how to publish
LOD data on the web.
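Principles 2 and 3 can be illustrated by the request a client would prepare when looking up an HTTP URI and asking for RDF instead of an HTML page. The sketch only builds the request object; the URI is invented and the exact media types a given server honours are an assumption.

```python
from urllib.request import Request


def rdf_request(uri):
    """Prepare a GET request that prefers Turtle and accepts RDF/XML."""
    return Request(
        uri,
        headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"},
    )


request = rdf_request("http://example.org/patent/1")  # hypothetical URI
# To actually dereference the URI one would pass the request to
# urllib.request.urlopen(request) and read the returned RDF.
```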
Linked Open Data (LOD)
SMW at work…
 There is a graphical display created by Richard Cyganiak and Anja
Jentzsch showing published data sets, http://lod-cloud.net/
Linked Open Data (LOD)
SMW at work…
 Conditions to be included in this catalog:
◦ Data available via URIs through http or https.
◦ Data published in RDF format (any serialization method: RDFa, RDF/XML, Turtle,
N-Triples).
◦ Dataset must have at least 1000 RDF statements.
◦ Dataset must contain links to at least one of the datasets in the diagram (at least
50 links).
◦ Dataset available via dump OR through SPARQL endpoint.
 Figures on the number of published LOD datasets are also available at:
http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
 http://datahub.io is another site where you can identify datasets.
It contains 31 datasets related to patents, including EPO, USPTO,
KIPO, UK Patent…
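The inclusion conditions listed on this slide can be read as a simple checklist. The sketch below encodes them as a function over a hypothetical dataset summary; the class and field names are invented and do not correspond to any catalog's actual submission format.

```python
from dataclasses import dataclass


@dataclass
class DatasetSummary:
    """Hypothetical summary of a candidate dataset (field names invented)."""
    uris_resolve_over_http: bool
    published_as_rdf: bool
    statement_count: int
    links_to_diagram_datasets: int
    has_dump: bool
    has_sparql_endpoint: bool


def unmet_conditions(ds):
    """Return the list of inclusion conditions the dataset does not satisfy."""
    problems = []
    if not ds.uris_resolve_over_http:
        problems.append("data must be available via http/https URIs")
    if not ds.published_as_rdf:
        problems.append("data must be published in an RDF serialization")
    if ds.statement_count < 1000:
        problems.append("at least 1000 RDF statements are required")
    if ds.links_to_diagram_datasets < 50:
        problems.append("at least 50 links to a dataset in the diagram are required")
    if not (ds.has_dump or ds.has_sparql_endpoint):
        problems.append("a dump or a SPARQL endpoint is required")
    return problems


candidate = DatasetSummary(True, True, 800, 10, False, False)
issues = unmet_conditions(candidate)
```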
Linked Open Data (LOD)
SMW at work…
 To check the use of specific vocabularies/metadata, the Open
Knowledge Foundation has hosted the LOV (Linked Open Vocabularies)
site since 2012:
 http://lov.okfn.org/dataset/lov/
 Additional LOD-related tools (search engines) include:
◦ http://watson.kmi.open.ac.uk/WatsonWUI/
◦ http://swoogle.umbc.edu/
◦ http://ws.nju.edu.cn/falcons/ontologysearch/index.jsp?query
LOD and Patents
Early, Academic Initiatives
 NSF (SciSIP), Science of Science and Innovation Policy:
the 2011 “Patent Data Workshop” discussed how to support quantitative
studies on innovation.
◦ It highlighted the effort of the USPTO Patent Dashboard and Data Visualization
Center to study causes of innovation and outcomes of programs to
stimulate innovation.
 SciWire platform*, which ingests and links metadata about patents and
grants to explore the R&D landscape.
 AKSW (Agile Knowledge Engineering and Semantic Web) initiative to
publish US patents:
◦ SPARQL end point: http://us.patents.aksw.org/sparql
◦ June 2014, 187 million triples.
◦ Proposed RDF schema with basic data based on dc, foaf and its own
schema.
*Haak, Laurel; Baker, David; Probus, Matt: Creating a Data Infrastructure for Tracking
Knowledge Flow. 2012
LOD and Patents
Academic Initiatives
 Subramanian et al. (2013)* analyzed the conversion of USPTO patents into
RDF format and their merging with DBpedia data to provide
“consolidated / merged search results” (enriched patent data).
 Singhi, M. and Ding, Y.** also merged USPTO patent data from the
SDB (Scholarly Database at Indiana Univ.) with DBpedia entries for
locations in a common database.
 The SDB database includes 26 million records (4 million US patents
in the 1976-2010 period), including MEDLINE and NSF documents
about research grants.
 SDB uses patent information as part of its R&D analysis with the
Sci2 bibliometric tool.
* Subramanian, S.; Dhilpe, S.; Yalamanchi, U. Exploiting Linked Data and Big Data for Semantic Patent Discovery.
COEN 296, Aug. 2013.
** Singhi, M.; Ding, Y. Linking US Patent Data with Wikipedia.
LOD and Patents
Academic Initiatives
 Heinz Nixdorf Institute (Paderborn University) and KISTI (Korea
Institute of Science and Technology Information)*:
◦ Make scientific data available through RDF.
◦ Use of W3C D2R and D2RQ server/converter and RelFinder
visualization tool. Pilot project with 60 researchers and 400 related
publications.
 Zaveri, A. et al. (2012)** describe their conversion of the USPTO
patents into RDF.
◦ The USPTO patents full-text data is available for download in XML format from
2002 onwards.
◦ From 1976 to 2001, data are available in plain text.
◦ Each week USPTO releases a zipped file of all patents accepted in that week.
◦ Each year ca. 52 files are published, each one containing about 5000 patents.
◦ They developed an “ontology” or schema to encode patent information.
* Bäumer et al., Linked Open Data for Scientific Data Sets. KONVENS, 2014. Hildesheim, Germany.
** Zaveri et al. (2012). Publishing and Interlinking the USPTO Patent Data. Semantic Web Journal. 24/09/2014.
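To make the XML-to-RDF conversion step more concrete, the sketch below turns one invented patent record into triples and prints them as N-Triples. The XML element names, the base URI and the choice of Dublin Core properties are assumptions made for this example; they are not the USPTO format nor the schema developed by the cited authors.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/patent/"  # hypothetical base URI
DC = "http://purl.org/dc/elements/1.1/"

# Invented record; real USPTO XML uses a different, much richer structure.
SAMPLE = """
<patent id="X0000001">
  <title>Invented widget</title>
  <inventor>Jane Doe</inventor>
  <inventor>John Roe</inventor>
</patent>
"""


def patent_to_triples(xml_text):
    """Map one patent record to (subject, predicate, literal) triples."""
    root = ET.fromstring(xml_text)
    subject = BASE + root.get("id")
    triples = [(subject, DC + "title", root.findtext("title"))]
    for inventor in root.findall("inventor"):
        triples.append((subject, DC + "creator", inventor.text))
    return triples


def to_ntriples(triples):
    """Serialize triples with literal objects as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


output = to_ntriples(patent_to_triples(SAMPLE))
```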
LOD and Patents
Academic Initiatives
 Dongmin Seo et al. (2011) designed InSciTe, a technology
opportunity discovery (TOD) service to support decision-making on
R&D planning.
 It used RDF data (including patent information) to analyze and
visualize relations between technologies and agents.
◦ Trends and predictions,
◦ Relationship,
◦ Roadmaps,
◦ Competitors and collaborators.
 Data set included 3,100,000 patents from US, Europe and Japan.
LOD and Patents
Academic Initiatives
 Zaveri et al. (2013) described a study on the use of
linked open data to assess the impact of research in the
biomedical area.
 They analyzed data for 20 European countries over a 10-year
period (1999 to 2009).
 The data set included data from Eurostat and World Bank LOD
datasets.
 Input data included the number of Biotechnology patent
applications submitted to EPO.
Zaveri et al. (2013). Using Linked Data to evaluate the impact of Research and
Development in Europe: a Structural Equation Model. LNCS 8219, pp 244-259
LOD and Patents
Official Initiatives
 USPTO,
◦ In 2014 it developed an 18-month roadmap for open data initiatives.
◦ In April 2015, it published the “Report of Findings from an Open Data
Roundtable with the U.S. Patent & Trademark Office”.
◦ No specific reference to RDF or “linked open data”.
◦ Bulk download through http://patents.reedtech.com/ and Google Patents.
◦ Plans and achievements:
 PatentsView prototype for patent visualization (5 million U.S. patents).
 Electronic Data Hosting. Repository of public bulk patent and
trademark data
 Assignment Search. Searchable database containing all recorded
Patent Assignment information dating back to August 1980.
LOD and Patents
Official Initiatives
 EPO (European Patent Office),
◦ OPS (Open Patent Service)** has long provided XML patent data
via REST-based web services.
◦ Queries built on the CQL query language.
◦ Data coming from EPODOC, EPOQUE (full-text) and BNS (image)
databases (same sources and coverage for bibliographic data as
Espacenet).
◦ Bibliographic data, legal status, facsimile images, CPC classification,
character-coded full text, register and family.
◦ Well-documented query interface for developers.
◦ For large datasets, bulk download is available.
Kallas, P. (2006). Open Patent Service. World Patent Information
Volume 28, Issue 4, December 2006, Pages 296–304
LOD and Patents
Official Initiatives
 EPO (European Patent Office),
◦ In the LOD-specific context, the http://epo.publicdata.eu/ dataset, with
around 22 million triples.
◦ Based on an OWL-encoded schema/ontology.
◦ Current pilot based on the conversion of 100000 EP applications and
the CPC hierarchy (250000 technical classification symbols) into RDF
triples.
◦ A LOD user interface is provided to view the data in a user-friendly way
without programming, plus a SPARQL endpoint and bulk downloads.
◦ Export to JSON, text, XML or Turtle.
◦ Technical terms extracted from the abstracts are linked to
DBpedia.
◦ States (geographical units) from addresses are mapped to
nuts.geovocab.org, and language codes to the Library of Congress.
**Information provided by Martin Kraker (EPO)
LOD and Patents
Official Initiatives
 Intellectual Property Government Open Data, IPGOD (Australia).
◦ Announced in October 2014.
◦ Available via the Australian government's data portal at data.gov.au.
◦ It covers more than a century of Australian patent, trade mark, and
design data
◦ Information on the application process for each right.
◦ They have also created a unique set of identifiers to link the data to
external information on companies.
◦ IPGOD includes the PATSTAT "application identifier", so its data can be
linked directly to PATSTAT.
◦ Harmonised names of rights holders in Australia.
◦ Available through: https://data.gov.au/organization/ip-australia
◦ Detailed, well-documented data model**.
“Linked and open patent data: Australia and Korea moving forward” Patent Information News. EPO. Issue 1/2015.
**http://www.ipaustralia.gov.au/uploaded-files/reports/IP_Government_Open_Data_Paper_-_Final.pdf
LOD and Patents
Official Initiatives
 OEPM (Spain) Open Data.
◦ Part of the datos.gob.es initiative.
◦ http://datos.gob.es/catalogo/catalogo-opendata-de-oficina-espanola-de-patentes-marcas-oepm
◦ It provides bulk download of data in XML (plus PDF) based on
the WIPO ST.36 standard for encoding data in XML.
◦ No SPARQL end point.
LOD and Patents
Official Initiatives
 KIPO (Korean Intellectual Property Office), as part of the IP5 initiative,
started disseminating patent information in XML in July 2014.
◦ KIPRIS tool, http://plus.kipris.or.kr/eng/main.do
◦ Related to Open Government Data in South Korea.
◦ Patents is one of 16 strategic areas in this program.
◦ What information to share, How to Share, How to support utilization.
◦ How to share: bulk, via API or LOD in XML format (WIPO ST.96 Std.).
◦ How to support utilization: Applicant Name standardization.
◦ IP-Biz Integrated service to connect patent and Business data.
◦ API, Bulk download.
“Linked and open patent data: Australia and Korea moving forward” Patent Information News. EPO. Issue
1/2015.
LOD and Patents
A brief summary
 Different data sets currently available, but…
◦ Open data (made publicly accessible) does not mean “linked
data”.
◦ XML publishing is not the same as RDF publishing.
◦ Adding links to other entities (companies, people or topics) is a
must-have to talk about “linked” open data.
◦ RDF-S standardization is also an interesting choice.
◦ Publishing data in RDF format is just a first step… to allow target
communities to figure out how to use the data.
“Linked and open patent data: Australia and Korea moving forward” Patent Information News. EPO. Issue
1/2015.
LOD and Patents
Analysis of potential applications
 LOD data are useful when they are linked to other data to enhance
the capability of finding additional information.
 Different use cases are being considered to exploit existing patent
data sets, in particular:
◦ Enhance institutional repositories with patent data (bulk import or
access through APIs).
 Prototype under development to integrate the UCA repository with OEPM data.
 Integration with the DSpace repository software to gather patent details.
◦ Integrate patent data into Web-based Innovation Platforms and
Business Directories.
 Checking the “innovation capabilities” of agents (persons, entities).
 Dynamic building of an “innovation profile” collecting and merging data about
patents, projects and papers.
 In innovation, linear models have been replaced by
collaborative models.
 These models are based on feedback and interactions between
different partners.
 This evolution has led to the Open Innovation model or
paradigm (Chesbrough 2003).
 Today, innovation management (IM) is seen as a:
◦ non-linear,
◦ evolutionary,
◦ interactive process between the company and its environment,
◦ that requires the close collaboration of different agents.
LOD and Patents
Analysis of potential applications
Principles of Open Innovation
 Valuable ideas may come from both inside and outside the
company
 It is the consequence of several factors:
◦ knowledge specialization,
◦ availability of highly skilled workers,
◦ increasing capabilities of suppliers,
◦ and the difficulty of mastering all the aspects required
in a successful innovation life cycle.
◦ Different “knowledge streams” must be managed: market, scientific and
technical, and social knowledge.
 Different agents have different levels of participation in the
generation of the knowledge that provides the inputs to create
innovations.
 Complex interfaces between them.
LOD and Patents
Analysis of potential applications
 OI depends on the ability to cooperate with other partners who
are developing innovation.
 Agents need to give visibility to their innovation capabilities and
achievements (products or services, skills, etc.) in a global
context.
 In some sectors, e.g. aerospace, big companies need to set up
agreements with other companies to ensure “geographical return
on investment”.
 What tools do we have to give visibility to our company?
◦ Business directories
◦ Collaborative, Innovation platforms.
LOD and Patents
Analysis of potential applications
 Current company/business directories and databases focus on
“contact details” and “financial data”
 They are mostly oriented to assess the “health of the companies
from an economic perspective”.
◦ Do these directories offer the data to support OI planning
activities?
◦ Do they provide data to identify areas of expertise and
previous experience?
◦ How easy is it to identify partners for specific projects?
LOD and Patents
Analysis of potential applications
 Restrictions of current business directories fall in these areas:
◦ They are not exhaustive, and some of them exclude SMEs/VSEs.
◦ They classify companies by large, general areas of activity.
◦ They focus on company financial characteristics: income, sales, audits,
investors, employees….
 Missing information:
◦ Product and service descriptions.
◦ Previous experience in projects.
◦ Technical achievements, patents.
◦ Experience and compliance with regulations and standards.
◦ Assessment of intellectual capital (employees with specific profiles),
areas of expertise, etc.
LOD and Patents
Analysis of potential applications
And collaborative innovation platforms?
 Innovation platforms are web-based collective workspaces to
leverage the innovation process.
◦ Registered companies post “challenges”, with a technical description of
the problem to solve.
◦ Participants can propose their solution to the problem.
◦ The company that proposes the challenge selects the most suitable
solution.
 Main constraints:
◦ Partners are identified in the specific context of a problem.
◦ They do not support partners’ assessments, but just the assessment of
the proposed solutions.
◦ Innovation life cycle requires a “long-term partnership”.
LOD and Patents
Analysis of potential applications
 We can conclude that a different type of “directory” may be
useful to support and foster collaboration and innovation:
◦ With not only data about companies, but individual researchers, university
departments and research groups.
◦ Additional data : work experience, technical achievements (patents,
technical papers, products)
◦ With a high level of specialization to characterize content (areas of expertise
and achievements).
 Could Web 3.0 technologies be useful?
◦ Business directories are mainly Web 1.0 tools.
◦ Innovation platforms are mainly Web 2.0 tools.
◦ Hypothesis: this is a data integration problem.
LOD and Patents
Analysis of potential applications
 A preliminary survey allowed us to identify these information items:
◦ Company contact details, including lines of business and activities.
◦ Areas of knowledge/expertise, going further in detail.
◦ Description of the company facilities and resources.
◦ Achievements:
 Projects
 Products and services.
 Technologies
 Patents
 Papers
◦ Other entities the company has worked with in collaborative projects
◦ References and customers, linked to achievements.
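The information items above could be captured in a record structure along the following lines. This is only a sketch to make the list tangible: the class names, field names and sample values are invented and do not reflect the data model actually used in the prototype.

```python
from dataclasses import dataclass, field


@dataclass
class Achievement:
    """One project, product, technology, patent or paper (names invented)."""
    kind: str
    title: str
    partners: list = field(default_factory=list)   # collaborating entities
    customers: list = field(default_factory=list)  # references and customers


@dataclass
class Agent:
    """A company, researcher or group listed in the directory."""
    name: str
    contact: str
    lines_of_business: list = field(default_factory=list)
    expertise: list = field(default_factory=list)
    facilities: list = field(default_factory=list)
    achievements: list = field(default_factory=list)

    def achievements_of(self, kind):
        """Return the achievements of one kind, e.g. 'patent'."""
        return [a for a in self.achievements if a.kind == kind]

    def collaborators(self):
        """Return the distinct partners across all achievements, sorted."""
        return sorted({p for a in self.achievements for p in a.partners})


agent = Agent(
    name="Invented Aerospace S.L.",
    contact="info@example.org",
    expertise=["composite structures"],
    achievements=[
        Achievement("patent", "Invented wing panel", partners=["Example University"]),
        Achievement("project", "Invented R&D project",
                    partners=["Example University", "Example Labs"]),
    ],
)
```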
LOD and Patents
Analysis of potential applications
 These data items contribute to a metadata infrastructure that can
be used in two different ways :
◦ to identify and assess partners in a global context
◦ to assess the relevance of incoming ideas sent in response to
“innovation challenges”.
LOD and Patents
Analysis of potential applications
 Reusable ontologies:
◦ FOAF, DC, SIOC, SKOS, OBI, VIVO, Organization ontology, Core
Business Vocabulary.
◦ Idea Ontology for Innovation Management (Riedl et al., 2009)
◦ OntoGate (Bullinger, 2008)
 Modelling the idea assessment and selection.
◦ GI2MO Ontology (Westerski et al. 2010)
 Formalization of data to describe ideas and associated information.
◦ Iteams Ontology (Ning, 2006)
 Cover goals, actions, teams, results and community.
◦ Innovation Management Ontology (Elbassiti, 2014)
Our Research
Metadata Infrastructure
 Target:
◦ Having a prototype of a “Semantic-enabled” repository of agents
(companies, researchers and groups) and related achievements to
demonstrate how these tools can support OI initiatives.
◦ Two lines of work: biomedical engineering and aerospace.
◦ Geographical scope: Spain.
 Phases:
◦ Identification of information needs.
◦ Data capture in relational structure (2000 main entities).
◦ Vocabulary selection for data encoding.
◦ Loading data into repository.
◦ Linking data to external sources.
◦ Building user interfaces including dynamic searching of remote sites.
Our Research
Phases
 Patents contribution:
◦ Patents are part of the entity / person achievements.
◦ Patents provide “linguistic clues” to identify skills, competences and
areas of knowledge and build search/browse systems.
Our Research
Phases
 Thanks!
Our Research
Thanks

More Related Content

PDF
Big Data: Big Issues for IP
PPTX
Enterprise search
PPTX
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
PDF
Sebastian Hellmann
PDF
Semantic Technologies for Big Data
PPTX
Enterprise knowledge graphs
PDF
Chalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
PDF
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Big Data: Big Issues for IP
Enterprise search
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
Sebastian Hellmann
Semantic Technologies for Big Data
Enterprise knowledge graphs
Chalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...

What's hot (20)

PPTX
What can linked data do for digital libraries
PDF
II-SDV 2015, 20 - 21 April, in Nice
PPT
lawTechCamp - Knowledge Management Panel
PDF
Smart Data Applications powered by the Wikidata Knowledge Graph
PDF
Dataverse opportunities
 
PPTX
Linked data for Enterprise Data Integration
PPTX
David Kuilman | Creating a Semantic Enterprise Content model to support conti...
PPTX
Conclusions - Linked Data
PPTX
Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...
PDF
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
PDF
Practical use of Knowledge Graph with Case Studies using Semantic Web Publish...
PDF
Building Knowledge Graphs in 10 steps
PDF
Ontos NLP Stack, Sep. 2016
PPTX
Semantic Technology in Publishing & Finance
PDF
Using the Semantic Web Stack to Make Big Data Smarter
PDF
Building Enterprise-Ready Knowledge Graph Applications in the Cloud
PDF
Powerful Information Discovery with Big Knowledge Graphs –The Offshore Leaks ...
PPTX
ODSC and iRODS
PPTX
CNI 2018: A Research Object Authoring Tool for the Data Commons
PDF
On demand access to Big Data through Semantic Technologies
What can linked data do for digital libraries
II-SDV 2015, 20 - 21 April, in Nice
lawTechCamp - Knowledge Management Panel
Smart Data Applications powered by the Wikidata Knowledge Graph
Dataverse opportunities
 
Linked data for Enterprise Data Integration
David Kuilman | Creating a Semantic Enterprise Content model to support conti...
Conclusions - Linked Data
Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
Practical use of Knowledge Graph with Case Studies using Semantic Web Publish...
Building Knowledge Graphs in 10 steps
Ontos NLP Stack, Sep. 2016
Semantic Technology in Publishing & Finance
Using the Semantic Web Stack to Make Big Data Smarter
Building Enterprise-Ready Knowledge Graph Applications in the Cloud
Powerful Information Discovery with Big Knowledge Graphs –The Offshore Leaks ...
ODSC and iRODS
CNI 2018: A Research Object Authoring Tool for the Data Commons
On demand access to Big Data through Semantic Technologies
Ad

Linked Open Data in the World of Patents

  • 1. Patents, Semantics and Open Innovation. The role of LOD in a business directory for knowledge-intensive industries. Nice, 20-OCT-2015. Ricardo Eito-Brun, reito@uc3m.es
  • 2. Patents, Semantics and Open Innovation. LOD
      ▪ Reasons leading to this research:
          ◦ Semantic Web technologies and applications, in particular LOD publishing, constitute a preliminary step to achieve Information Systems interoperability.
          ◦ Having access to distributed data, hosted by different agents and repositories, opens new possibilities for research in multiple areas.
          ◦ In the particular case of patent information:
              ▪ What possibilities do we have if we are able to aggregate and analyse these data together with other data sets?
              ▪ Is it feasible to improve the way we access patent information?
              ▪ Can we figure out innovative user interfaces to browse and search patent collections?
  • 3. Patents, Semantics and Open Innovation. LOD
      ▪ Schema:
          ◦ The LOD promises. Potential benefits, technologies and standards for data encoding and interoperability.
          ◦ LOD in the world of patents. A review of major milestones.
          ◦ Overview of current research: what researchers are doing.
          ◦ Case Study: the particular case of Web-based Innovation Platforms and digital repositories.
  • 4. Linked Open Data: benefits, technologies and standards
      ▪ LOD has become the main application of the SMW approach.
      ▪ Semantic Web (SMW)
          ◦ Proposed by Tim Berners-Lee: “The Semantic Web: A New Form of Web Content That Is Meaningful to Computers Will Unleash a Revolution of New Possibilities”, Scientific American, 2001.
          ◦ SMW is about having a more intelligent web, made up of documents that can be easily processed by computers and software agents with no human intervention.
          ◦ SMW data should be “exposed” or published in a machine-readable format.
          ◦ Computers should be able to understand the meaning of data.
  • 5. Linked Open Data: benefits, technologies and standards
      ▪ W3C SMW Activity:
          ◦ “The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. […]
          ◦ The Semantic Web is about two things.
              ▪ about common formats for integration and combination of data drawn from diverse sources…
              ▪ about language for recording how the data relates to real world objects.”*
      * http://www.w3.org/2001/sw/
  • 6. Linked Open Data: benefits, technologies and standards
      ▪ SMW is presented as an extension of the traditional web.
      ▪ Content in the traditional web is published for humans; content in the SMW is published for software programs that can interpret it and derive new data, information and knowledge.
      ▪ Initial SMW initiatives were focused on improving software agents’ capabilities to solve information management problems:
          ◦ "Mom needs to see a specialist and then has to have a series of physical therapy sessions. Biweekly or something. Lucy instructed her Semantic Web agent through her handheld Web browser. The agent promptly retrieved information about Mom's prescribed treatment from the doctor's agent, looked up several lists of providers, and checked for the ones in-plan for Mom's insurance within a 20-mile radius of her home…”
      * http://www.w3.org/2001/sw/
  • 7. Linked Open Data: benefits, technologies and standards
      ▪ SMW pillars:
          ◦ URIs or IRIs to provide unique identifiers to resources.
          ◦ XML to encode and transfer information.
          ◦ RDF as a data model and vocabulary (often serialized as XML) to encode metadata describing resources.
          ◦ RDF-S as a means to structure the metadata about resources (what can be asserted for a resource of a specific type).
          ◦ OWL as an alternative to RDF/RDF-S with additional capabilities to express constraints on data.
          ◦ Additional languages to express the rules that govern reasoning on SMW data.
      * http://www.w3.org/2001/sw/
  • 8. Linked Open Data: benefits, technologies and standards
      ▪ RDF proposes a vocabulary (set of tags) to express metadata about any type of resource.
      ▪ RDF data can be expressed in XML or in other alternative formats.
      ▪ An RDF file usually encloses metadata about a specific resource, e.g. a person, document, institution, company, event…
      ▪ Resources are identified by unique identifiers (URIs).
          ◦ URIs are used to ensure that metadata about the same entity are grouped together.
          ◦ In case different applications use different identifiers for the same entity, it is possible to keep the equivalences between the different identifiers.
      * http://www.w3.org/2001/sw/
  • 9. Linked Open Data: benefits, technologies and standards
      [Figure: annotated RDF record. Callouts:]
      ▪ Unique identifier for the resource, expressed as a URI.
      ▪ Equivalences with other identifiers proposed in other contexts (owl:sameAs).
      ▪ Metadata about the resource, with clearly defined meaning.
      ▪ Resources are given a type (rdf:type).
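The callouts above (a URI as identifier, owl:sameAs equivalences, typed resources) can be illustrated with a minimal Python sketch. It models RDF statements as plain (subject, predicate, object) tuples rather than using an RDF library; every URI and property value below is a made-up example, not taken from the slides.

```python
# Triples are (subject, predicate, object) tuples.
# All URIs and values are hypothetical examples.
RDF_TYPE = "rdf:type"
OWL_SAME_AS = "owl:sameAs"

triples = [
    ("http://example.org/patent/EP1", RDF_TYPE, "ex:Patent"),
    ("http://example.org/patent/EP1", "dc:title", "Example device"),
    ("http://example.org/patent/EP1", OWL_SAME_AS, "http://other.example/doc/42"),
    ("http://other.example/doc/42", "dc:date", "2015-10-20"),
]


def describe(uri, triples):
    """Collect metadata about `uri`, following owl:sameAs in both directions."""
    aliases, pending = {uri}, [uri]
    while pending:
        current = pending.pop()
        for s, p, o in triples:
            if p != OWL_SAME_AS:
                continue
            # The equivalent identifier is whichever end is not `current`.
            other = o if s == current else s if o == current else None
            if other is not None and other not in aliases:
                aliases.add(other)
                pending.append(other)
    # Metadata stated under any alias is grouped together.
    return {(p, o) for s, p, o in triples if s in aliases and p != OWL_SAME_AS}


metadata = describe("http://example.org/patent/EP1", triples)
```

The point of the sketch is only that statements made by different sources about equivalent identifiers end up in one description.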
  • 10. Linked Open Data: benefits, technologies and standards
      ▪ RDF records about resources can be linked or related.
      ▪ The value of a specific metadata field or property may refer to the identifier of another resource.
      ▪ This allows having sets of structured, linked data.
      ▪ For example:
          ◦ Subjects or topics in a classification code may have unique IDs.
          ◦ The “Subject” metadata field in a document will take as a value the ID of the referred topic.
          ◦ Personal or corporate authors may have unique IDs.
          ◦ The “Author” metadata field in a document will take as a value the ID of its personal/corporate author.
      * http://www.w3.org/2001/sw/
  • 11. Linked Open Data: benefits, technologies and standards
      [Figure: RDF record with linked values. Callouts:]
      ▪ dc:language refers to the ID of the English language.
      ▪ dc:subject refers to the ID of a classification code taken from the DDC system.
      ▪ dc:subject also refers to the ID of a topic taken from the LCSH system.
  • 12. Linked Open Data: benefits, technologies and standards
      ▪ SMW standards and languages are not limited to RDF.
          ◦ RDF-S provides a way to define “schemas” for metadata, in other words, what properties/metadata we can use to describe entities of a specific type.
          ◦ SKOS provides a way to encode “subject headings”, “thesauri” or “classification schemas” used to indicate the topics the documents are about.
          ◦ Specific vocabularies indicate which properties are available to provide metadata on resources, e.g. Dublin Core.
  • 13. Linked Open Data: benefits, technologies and standards
      [Figure: RDF record using several vocabularies. Callouts:]
      ▪ Properties starting with dc: and dct: are taken from the Dublin Core vocabulary, which provides a set of metadata.
      ▪ dc:subject points to LCSH/DDC topics that are expressed, somewhere else on the web, using SKOS.
      ▪ A separate RDF-S document states which properties can be used when providing metadata about resources of type “Book”.
  • 14. Linked Open Data: benefits, technologies and standards
      ▪ RDF statements build a “graph” of resources, properties and values.
      ▪ As the amount of metadata collected about the different entities grows, the graph expands.
      ▪ The RDF model for representing information allows browsing and discovery mechanisms that go beyond traditional search/browse capabilities.
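As a rough illustration of browsing such a graph, the Python sketch below finds resources that share a property value (same subject, same author) with a given document. The identifiers (`ex:doc1`, `ex:person/a`, …) are hypothetical, and triples are again plain tuples rather than a real triple store.

```python
# Hypothetical linked records: documents pointing at shared topic and author IDs.
triples = [
    ("ex:doc1", "dc:subject", "ex:topic/semantic-web"),
    ("ex:doc2", "dc:subject", "ex:topic/semantic-web"),
    ("ex:doc2", "dc:creator", "ex:person/a"),
    ("ex:doc3", "dc:creator", "ex:person/a"),
    ("ex:doc4", "dc:subject", "ex:topic/chemistry"),
]


def related(resource, triples):
    """Resources that share at least one (property, value) pair with `resource`."""
    values = {(p, o) for s, p, o in triples if s == resource}
    return {s for s, p, o in triples if s != resource and (p, o) in values}


neighbours = related("ex:doc2", triples)
```

Because values are identifiers rather than free text, "documents like this one" becomes a simple graph lookup instead of a string-matching search.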
  • 15. SKOS: conceptual structures for the SMW
      ▪ Another vocabulary closely related to the SMW and LOD.
      ▪ Used to:
          ◦ Encode “subject headings” or “classification schemas” in XML format.
          ◦ Encode relationships between these conceptual structures (e.g. equivalences between classes of different classification schemas).
          ◦ Provide lists of topics to which document descriptions can be linked.
      ▪ Concepts within a SKOS-encoded schema are related to each other by relationships like <broader>, <narrower> or <related>.
      ▪ Labels can be given to concepts (linguistic equivalences, authorized, not authorized, deprecated…).
      ▪ Concepts can also be annotated.
  • 16. SKOS: conceptual structures for the SMW
      [Figure: SKOS-encoded concept. Callouts:]
      ▪ Each concept has a separate skos:Concept element, identified by a URI.
      ▪ skos:related points to other concepts with a related meaning.
      ▪ skos:prefLabel and skos:altLabel provide linguistic labels for the concept.
      ▪ Skm:UF points to deprecated concepts.
  • 17. SKOS: conceptual structures for the SMW
      ▪ SKOS has become one of the key points in SMW initiatives.
      ▪ Organizations usually start by putting their controlled vocabularies / classification schemes online using SKOS.
      ▪ Then, “bibliographic” descriptions are linked to SKOS topics as a second stage.
      ▪ This provides an initial pair of linked data sets.
      ▪ But SKOS becomes powerful if we can take advantage of the capability of expressing relationships between different classification schemas.
      ▪ This gives the opportunity of cross-searching different repositories.
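The cross-searching idea can be sketched in Python as follows: a search concept is expanded to its narrower concepts, and each one is translated to an equivalent concept in a second scheme. The two schemes (`ex:a/…`, `ex:b/…`) and their contents are invented for illustration; the relationship names mirror the SKOS properties skos:narrower and skos:exactMatch.

```python
# skos:narrower links inside a hypothetical scheme A.
narrower = {
    "ex:a/energy": ["ex:a/solar", "ex:a/wind"],
    "ex:a/solar": ["ex:a/photovoltaics"],
}

# skos:exactMatch links from scheme A to a hypothetical scheme B.
exact_match = {
    "ex:a/solar": "ex:b/S10",
    "ex:a/photovoltaics": "ex:b/S11",
}


def expand(concept):
    """Return the concept plus all transitively narrower concepts."""
    result, stack = [], [concept]
    while stack:
        current = stack.pop()
        if current not in result:
            result.append(current)
            stack.extend(narrower.get(current, []))
    return result


def cross_search_terms(concept):
    """Concept IDs to query in both repositories for one search concept."""
    own = expand(concept)
    mapped = [exact_match[c] for c in own if c in exact_match]
    return set(own) | set(mapped)


terms = cross_search_terms("ex:a/solar")
```

A repository indexed with scheme B can then be searched with the `ex:b/…` identifiers even though the user picked a concept from scheme A.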
  • 18. Semantic Web: standards. SPARQL
      ▪ SPARQL is a W3C standard that defines a query language to search for information within RDF graphs.
      ▪ SPARQL is to the SMW what SQL is to relational databases like Oracle, MySQL, PostgreSQL…
      ▪ Collections of RDF documents within a repository can be searched using “SPARQL endpoints”.
      ▪ SPARQL endpoints are aimed at software agents and software applications.
      ▪ Queries are constructed dynamically by software agents, and results are returned in XML for further processing.
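A minimal Python sketch of what such a software agent does: it builds the request URL for a query and parses a result document in the W3C SPARQL Query Results XML Format. The endpoint URL, the query and the sample response are assumptions made for illustration; no request is actually sent.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Illustrative query: titles of up to five resources.
QUERY = (
    "SELECT ?doc ?title WHERE { "
    "?doc <http://purl.org/dc/elements/1.1/title> ?title } LIMIT 5"
)


def build_request_url(endpoint, query):
    """SPARQL protocol: the query travels in the `query` parameter of a GET."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_results(xml_text):
    """Turn a SPARQL XML result document into a list of {variable: value} dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


# Hand-written sample response (not real data).
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="doc"/><variable name="title"/></head>
  <results>
    <result>
      <binding name="doc"><uri>http://example.org/patent/1</uri></binding>
      <binding name="title"><literal>Example title</literal></binding>
    </result>
  </results>
</sparql>"""

url = build_request_url("http://example.org/sparql", QUERY)
rows = parse_results(SAMPLE)
```

In a real agent the URL would be fetched (for instance with `urllib.request.urlopen`) and the response body passed to `parse_results`.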
  • 20. Semantic Web: technologies…
      ▪ Technologies and tools used to deal with SMW standards and concepts can be classified in these groups:
          ◦ Editors, to help define RDF-S schemas.
          ◦ Conversion tools, to move existing data into the RDF format.
          ◦ RDF repositories or “triple stores”, to support:
              ▪ The storage of large data sets.
              ▪ Bulk downloads.
              ▪ Human browsing.
              ▪ SPARQL searching.
          ◦ Specific tools to manage controlled vocabularies and generate SKOS representations.
  • 21. Linked Open Data (LOD): SMW at work…
      ▪ “Linked Data is simply about using the Web to create typed links between data from different sources. These may be databases maintained by two organisations, or heterogeneous systems within one organisation that, historically, have not easily interoperated at the data level.”
      ▪ “… Linked Data refers to data published on the Web in such a way that it is machine-readable, its meaning is explicitly defined, it is linked to other external data sets, and can in turn be linked to from external data sets.”
      ▪ Tim Berners-Lee (2006)
  • 22. Linked Open Data (LOD): SMW at work…
      ▪ Linked Data Principles:
          1. Use URIs as names for things.
          2. Use HTTP URIs so that people can look up those names.
          3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
          4. Include links to other URIs, so that they can discover more things.
      ▪ These principles are “rules” or “recommendations” on how to publish LOD data on the web.
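Principles 2 and 3 are commonly implemented through HTTP content negotiation: a client looks up the URI and asks for an RDF serialization via the `Accept` header. The Python sketch below only prepares such a request; the resource URI is a hypothetical example and whether a given server honours `text/turtle` is an assumption.

```python
import urllib.parse
import urllib.request


def is_http_uri(uri):
    """Principle 2: the name must be an HTTP(S) URI so it can be looked up."""
    parts = urllib.parse.urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def rdf_request(uri, mime="text/turtle"):
    """Principle 3: prepare a lookup that asks the server for RDF, not HTML."""
    if not is_http_uri(uri):
        raise ValueError("not an HTTP URI: " + uri)
    return urllib.request.Request(uri, headers={"Accept": mime})


request = rdf_request("http://example.org/patent/EP1")
```

Passing `request` to `urllib.request.urlopen` would perform the lookup; the links found in the returned RDF (principle 4) can then be followed the same way.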
  • 23. Linked Open Data (LOD): SMW at work…
      ▪ There is a graphical display created by Richard Cyganiak and Anja Jentzsch showing published data sets: http://lod-cloud.net/
  • 24. Linked Open Data (LOD): SMW at work…
      ▪ Conditions to be included in this catalog:
          ◦ Data available via URIs through http or https.
          ◦ Data published in RDF format (any serialization method: RDFa, RDF/XML, Turtle, N-Triples).
          ◦ Dataset must have at least 1000 RDF statements.
          ◦ Dataset must contain links to at least one of the datasets in the diagram (at least 50 links).
      ▪ It is also possible to get data about the number of published LOD datasets at: http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
      ▪ http://datahub.io is another site where you can identify datasets. It contains 31 datasets related to patents, including EPO, USPTO, KIPO, UK Patent…
  • 25. Linked Open Data (LOD): SMW at work…
      ▪ Conditions to be included in this catalog:
          ◦ Data available via URIs through http or https.
          ◦ Data published in RDF (serialization method: RDFa, RDF/XML, Turtle, N-Triples).
          ◦ Dataset must have at least 1000 RDF statements.
          ◦ Dataset must contain links to at least one of the datasets in the diagram (at least 50 links).
          ◦ Dataset available via dump OR through SPARQL endpoint.
      ▪ It is also possible to get data about the number of published LOD datasets at: http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
      ▪ http://datahub.io is another site where you can identify datasets. It contains 31 datasets related to patents, including EPO, USPTO, KIPO, UK Patent…
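The inclusion conditions listed above translate directly into a checklist. The Python sketch below encodes them as stated on the slide; the `Dataset` class and its field names are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical summary of a candidate dataset."""
    http_uris: bool            # data reachable via http:// or https:// URIs
    published_as_rdf: bool     # any RDF serialization (RDFa, RDF/XML, Turtle, ...)
    statements: int            # number of RDF statements
    external_links: int        # links to datasets already in the diagram
    has_dump: bool
    has_sparql_endpoint: bool


def unmet_conditions(ds):
    """Return the conditions from the slide that the dataset does not satisfy."""
    checks = [
        (ds.http_uris, "data available via http/https URIs"),
        (ds.published_as_rdf, "data published in RDF"),
        (ds.statements >= 1000, "at least 1000 RDF statements"),
        (ds.external_links >= 50, "at least 50 links to a dataset in the diagram"),
        (ds.has_dump or ds.has_sparql_endpoint, "dump or SPARQL endpoint"),
    ]
    return [label for ok, label in checks if not ok]


good = Dataset(True, True, 22_000_000, 5_000, True, True)
small = Dataset(True, True, 800, 10, False, False)
```

An empty list from `unmet_conditions` means the dataset meets every listed condition.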
  • 26. Linked Open Data (LOD): SMW at work…
      ▪ To check the use of specific vocabularies/metadata, the Open Knowledge Foundation has hosted the LOV (Linked Open Vocabularies) site since 2012:
          ◦ http://lov.okfn.org/dataset/lov/
      ▪ Additional LOD-related tools (search engines) include:
          ◦ http://watson.kmi.open.ac.uk/WatsonWUI/
          ◦ http://swoogle.umbc.edu/
          ◦ http://ws.nju.edu.cn/falcons/ontologysearch/index.jsp?query
  • 27. LOD and Patents: Early, Academic Initiatives
      ▪ NSF (SciSIP), Science of Science and Innovation Policy: discussion at the 2011 “Patent Data Workshop” to support quantitative studies on innovation.
          ◦ It highlighted the effort of the USPTO Patent Dashboard and Data Visualization Center to study causes of innovation and outcomes of programs to stimulate innovation.
      ▪ SciWire platform*, which ingests and links metadata about patents and grants to explore the R&D landscape.
      ▪ AKSW (Agile Knowledge Engineering and Semantic Web) to publish US patents:
          ◦ SPARQL endpoint: http://us.patents.aksw.org/sparql
          ◦ June 2014, 187 million triples.
          ◦ Proposed RDF schema with basic data based on dc, foaf and its own schema.
      * Haak, Laurel; Baker, David; Probus, Matt: Creating a Data Infrastructure for Tracking Knowledge Flow. 2012
  • 28. LOD and Patents: Academic Initiatives
      ▪ Subramanian, S. (2013)* analyzed the conversion of USPTO patents into RDF format and their merging with DBpedia data to provide “consolidated / merged search results” (enriched patent data).
      ▪ Singhi, M., Ding, Y.** also merged USPTO patent data from the SDB (Scholarly Database at Indiana Univ.) with DBpedia entries for locations in a common database.
      ▪ The SDB database includes 26 million records (4 million US patents in the 1976-2010 period), including MEDLINE and NSF documents about research grants.
      ▪ SDB uses patent information as part of its R&D analysis with the Sci2 bibliometric tool.
      * Subramanian, S.; Dhilpe, S.; Yalamanchi, U. Exploiting Linked Data and Big Data for Semantic Patent Discovery. COEN 296, Aug. 2013.
      ** Singhi, M., Ding, Y. Linking US Patent Data with Wikipedia.
      *** Bäumer et al., Linked Open Data for Scientific Data Sets. KONVENS, 2014. Hildesheim, Germany.
  • 29. LOD and Patents: Academic Initiatives
      ▪ Heinz Nixdorf Institute (Paderborn University) and KISTI (Korea Institute of Science and Technology Information)*:
          ◦ Make scientific data available through RDF.
          ◦ Use of the W3C D2R and D2RQ server/converter and the RelFinder visualization tool. Pilot project with 60 researchers and 400 related publications.
      ▪ Zaveri, A. et al. (2012)** describe their conversion of the USPTO patents into RDF.
          ◦ The USPTO patents full-text data is available for download in XML format from 2002 onwards.
          ◦ From 1976 to 2001, data are available in plain text.
          ◦ Each week USPTO releases a zipped file of all patents accepted in that week.
          ◦ Each year ca. 52 files are published, each one containing about 5000 patents.
          ◦ They developed an “ontology” or schema to encode patent information.
      * Bäumer et al., Linked Open Data for Scientific Data Sets. KONVENS, 2014. Hildesheim, Germany.
      ** Zaveri et al. (2012). Publishing and Interlinking the USPTO Patent Data. Semantic Web Journal. 24/09/2014.
  • 30. LOD and Patents: Academic Initiatives
      ▪ Dongmin Seo et al. (2011) designed InSciTe, a technology opportunity discovery (TOD) service to support decision-making on R&D planning.
      ▪ It used RDF data (including patent information) to analyze and visualize relations between technologies and agents:
          ◦ Trends and predictions,
          ◦ Relationships,
          ◦ Roadmaps,
          ◦ Competitors and collaborators.
      ▪ The data set included 3,100,000 patents from the US, Europe and Japan.
  • 31. LOD and Patents: Academic Initiatives
      ▪ Zaveri et al. (2013) described an interesting study on the use of linked open data to assess the impact of research in the biomedical area.
      ▪ They analyzed data for 20 European countries over a 10-year period (1999 to 2009).
      ▪ The data set included data from the Eurostat and World Bank LOD datasets.
      ▪ Input data included the number of biotechnology patent applications submitted to the EPO.
      Zaveri et al. (2013). Using Linked Data to evaluate the impact of Research and Development in Europe: a Structural Equation Model. LNCS 8219, pp. 244-259.
  • 32. LOD and Patents: Official Initiatives
      ▪ USPTO:
          ◦ In 2014 it developed an 18-month roadmap for open data initiatives.
          ◦ In April 2015, it published the “Report of Findings from an Open Data Roundtable with the U.S. Patent & Trademark Office”.
          ◦ No specific reference to RDF or “linked open data”.
          ◦ Bulk download through http://patents.reedtech.com/ and Google Patents.
          ◦ Plans and achievements:
              ▪ PatentsView prototype for patent visualization (5 million U.S. patents).
              ▪ Electronic Data Hosting. Repository of public bulk patent and trademark data.
              ▪ Assignment Search. Searchable database containing all recorded Patent Assignment information dating back to August 1980.
  • 33. LOD and Patents: Official Initiatives
      ▪ EPO (European Patent Office):
          ◦ OPS (Open Patent Service)** has long provided XML patent data via REST-based web services.
          ◦ Queries are built on the CQL query language.
          ◦ Data come from the EPODOC, EPOQUE (full-text) and BNS (image) databases (same sources and coverage for bibliographic data as Espacenet).
          ◦ Bibliographic data, legal status, facsimile images, CPC classification, character-coded full text, register and family.
          ◦ Well-documented query interface for developers.
          ◦ For large datasets, bulk download is available.
      Kallas, P. (2006). Open Patent Service. World Patent Information, Volume 28, Issue 4, December 2006, Pages 296–304.
  • 34. LOD and Patents: Official Initiatives
      ▪ EPO (European Patent Office):
          ◦ In the LOD-specific context, the http://epo.publicdata.eu/ dataset, with around 22 million triples.
          ◦ Based on an OWL-encoded schema/ontology.
          ◦ Current pilot based on the conversion of 100,000 EP applications and the CPC hierarchy (250,000 technical classification symbols) into RDF triples.
          ◦ A LOD user interface is provided to view the data in a user-friendly way without programming, plus a SPARQL endpoint and bulk downloads.
          ◦ Export to JSON, text, XML or Turtle.
          ◦ Technical terms extracted from the abstracts are linked to DBpedia.
          ◦ States (geographical units) from addresses are mapped to nuts.geovocab.org, and language codes to the Library of Congress.
      ** Information provided by Martin Kraker (EPO)
  • 35. LOD and Patents: Official Initiatives
      ▪ Intellectual Property Government Open Data, IPGOD (Australia).
          ◦ Announced in October 2014.
          ◦ Available via the Australian government's data portal at data.gov.au.
          ◦ It covers more than a century of Australian patent, trade mark, and design data.
          ◦ Information on the application process for each right.
          ◦ They have also created a unique set of identifiers to link the data to external information on companies.
          ◦ IPGOD includes the PATSTAT "application identifier", so its data can be linked directly to PATSTAT.
          ◦ Harmonised names of rights holders in Australia.
          ◦ Available through: https://data.gov.au/organization/ip-australia
          ◦ Detailed, well-documented data model**.
      “Linked and open patent data: Australia and Korea moving forward”. Patent Information News. EPO. Issue 1/2015.
      ** http://www.ipaustralia.gov.au/uploaded-files/reports/IP_Government_Open_Data_Paper_-_Final.pdf
  • 36. LOD and Patents: Official Initiatives
      ▪ OEPM (Spain) Open Data.
          ◦ Part of the datos.gob.es initiative.
          ◦ http://datos.gob.es/catalogo/catalogo-opendata-de-oficina-espanola-de-patentes-marcas-oepm
          ◦ It provides bulk download of data in XML (plus PDF) based on the WIPO ST.36 standard for encoding data in XML.
          ◦ No SPARQL endpoint.
  • 37. LOD and Patents: Official Initiatives
      ▪ KIPO (Korean Intellectual Property Office), as part of the IP5 initiative, started dissemination of patent information in XML in July 2014.
          ◦ KIPRIS tool, http://plus.kipris.or.kr/eng/main.do
          ◦ Related to Open Government Data in South Korea.
          ◦ Patents are one of 16 strategic areas in this program.
          ◦ What information to share, how to share it, how to support utilization.
          ◦ How to share: bulk, via API or LOD in XML format (WIPO ST.96 standard).
          ◦ How to support utilization: applicant name standardization.
          ◦ IP-Biz integrated service to connect patent and business data.
          ◦ API, bulk download.
      “Linked and open patent data: Australia and Korea moving forward”. Patent Information News. EPO. Issue 1/2015.
  • 38. LOD and Patents: A brief summary
      ▪ Different data sets are currently available, but…
          ◦ Open data (made publicly accessible) does not mean “linked data”.
          ◦ XML publishing is not the same as RDF publishing.
          ◦ Adding links to other entities (companies, people or topics) is a must-have to talk about “linked” open data.
          ◦ RDF-S standardization is also an interesting choice.
          ◦ Publishing data in RDF format is just a first step… to allow target communities to figure out how to use the data.
      “Linked and open patent data: Australia and Korea moving forward”. Patent Information News. EPO. Issue 1/2015.
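The difference between XML publishing and linked RDF publishing can be made concrete with a small Python sketch. It converts a toy XML record into triples and, where possible, replaces text values with URIs of other entities. The XML layout, the `ex:` property names, the base URIs and the company lookup table are all hypothetical; they do not follow WIPO ST.36/ST.96 or any patent office schema.

```python
import xml.etree.ElementTree as ET

# Toy XML record (not a real patent office format).
RECORD = """<patent id="EP0000001">
  <title>Example device</title>
  <applicant>Example Corp</applicant>
  <classification>H04L</classification>
</patent>"""

# Hypothetical lookup from harmonised applicant names to company URIs.
COMPANY_URIS = {"Example Corp": "http://example.org/company/example-corp"}


def to_triples(xml_text, base="http://example.org/patent/"):
    """Convert one XML record into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    subject = base + root.get("id")
    applicant = root.findtext("applicant")
    return [
        (subject, "dc:title", root.findtext("title")),
        # The "linked" part: point at the company's URI when one is known,
        # and fall back to the plain name otherwise.
        (subject, "ex:applicant", COMPANY_URIS.get(applicant, applicant)),
        (subject, "ex:classification",
         "http://example.org/cpc/" + root.findtext("classification")),
    ]


triples = to_triples(RECORD)
```

Only the second and third triples make the record "linked": their values are identifiers that other data sets can also point to.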
  • 39. LOD and Patents: Analysis of potential applications
      ▪ LOD data are useful when they are linked to other data to enhance the capability of finding additional information.
      ▪ Different use cases are being considered to exploit existing patent data sets, in particular:
          ◦ Enhance institutional repositories with patent data (bulk import or access through APIs).
              ▪ Prototype under development to integrate the UCA repository with OEPM data.
              ▪ Integration of DSpace repository software to gather patent details.
          ◦ Integrate patent data into Web-based Innovation Platforms and Business Directories.
              ▪ Checking the “innovation capabilities” of agents (persons, entities).
              ▪ Dynamic building of an “innovation profile” collecting and merging data about patents, projects and papers.
  • 40. LOD and Patents: Analysis of potential applications
      ▪ In innovation, linear models have been replaced by collaborative models.
      ▪ These models are based on feedback and interactions between different partners.
      ▪ This evolution has led toward the Open Innovation model or paradigm (Chesbrough 2003).
      ▪ Today, innovation management (IM) is seen as a:
          ◦ non-linear,
          ◦ evolutionary,
          ◦ interactive process between the company and its environment,
          ◦ that requires the close collaboration of different agents.
  • 41. LOD and Patents: Analysis of potential applications. Principles of Open Innovation
      ▪ Valuable ideas may come from both inside and outside the company.
      ▪ It is the consequence of several factors:
          ◦ knowledge specialization,
          ◦ availability of highly skilled workers,
          ◦ increasing capabilities of suppliers,
          ◦ and the difficulty of having complete command of all the aspects that need to be mastered in a successful innovation life cycle.
          ◦ Different “knowledge streams” must be managed: market, scientific and technical, and social knowledge.
      ▪ Different agents have different levels of participation in the generation of the knowledge that provides the inputs to create innovations.
      ▪ Complex interfaces between them.
  • 42. LOD and Patents: Analysis of potential applications
      ▪ OI depends on the ability to cooperate with other partners who are developing innovation.
      ▪ Agents need to give visibility to their innovation capabilities and achievements (products or services, skills, etc.) in a global context.
      ▪ In some sectors, e.g. aerospace, big companies need to set up agreements with other companies to ensure “geographical return on investment”.
      ▪ What tools do we have to give visibility to our company?
          ◦ Business directories.
          ◦ Collaborative innovation platforms.
  • 43. LOD and Patents: Analysis of potential applications
      ▪ Current company/business directories and databases focus on “contact details” and “financial data”.
      ▪ They are mostly oriented to assess the “health of the companies from an economic perspective”.
          ◦ Do these directories offer the data to support OI planning activities?
          ◦ Do they provide data to identify areas of expertise and previous experience?
          ◦ How easy is it to identify partners for specific projects?
  • 44. LOD and Patents: Analysis of potential applications
      ▪ Restrictions of current business directories fall in these areas:
          ◦ They are not exhaustive, and some of them exclude SMEs/VSEs.
          ◦ They classify companies by large, general areas of activity.
          ◦ They focus on company financial characteristics: income, sales, audits, investors, employees…
      ▪ Missing information:
          ◦ Product and service descriptions.
          ◦ Previous experience in projects.
          ◦ Technical achievements, patents.
          ◦ Experience and compliance with regulations and standards.
          ◦ Assessment of intellectual capital (employees with specific profiles), areas of expertise, etc.
  • 45. And what about collaborative innovation platforms?  Innovation platforms are web-based collective workspaces to leverage the innovation process. ◦ Registered companies post “challenges”, with a technical description of the problem to solve. ◦ Participants can propose their solutions to the problem. ◦ The company that proposes the challenge selects the most suitable solution.  Main constraints: ◦ Partners are identified only in the specific context of a problem. ◦ They do not support the assessment of partners, just the assessment of the proposed solutions. ◦ The innovation life cycle requires a “long-term partnership”.
  • 46.  We can conclude that a different type of “directory” may be useful to support and foster collaboration and innovation: ◦ With data not only about companies, but also about individual researchers, university departments and research groups. ◦ With additional data: work experience, technical achievements (patents, technical papers, products). ◦ With a high level of specialization to characterize content (areas of expertise and achievements).  Could Web 3.0 technologies be useful? ◦ Business directories are mainly Web 1.0 tools. ◦ Innovation platforms are mainly Web 2.0 tools. ◦ Hypothesis: this is a data integration problem.
  • 47.  A preliminary survey allowed us to identify these information items: ◦ Company contact details, including lines of business and activities. ◦ Areas of knowledge/expertise, described in greater detail. ◦ Description of the company facilities and resources. ◦ Achievements:  Projects.  Products and services.  Technologies.  Patents.  Papers. ◦ Other entities the company has worked with in collaborative projects. ◦ References and customers, linked to achievements.
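The information items listed on this slide can be sketched as a small data model. This is only an illustrative sketch: the class names `Agent` and `Achievement`, the field names, and the sample values are assumptions made for the example and are not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Achievement:
    # kind is one of: "project", "product", "technology", "patent", "paper"
    kind: str
    title: str
    # References and customers are linked to the achievement they relate to.
    references: List[str] = field(default_factory=list)


@dataclass
class Agent:
    # Hypothetical record for a company, researcher or research group.
    name: str
    contact: str
    lines_of_business: List[str] = field(default_factory=list)
    expertise: List[str] = field(default_factory=list)
    facilities: List[str] = field(default_factory=list)
    achievements: List[Achievement] = field(default_factory=list)
    collaborators: List[str] = field(default_factory=list)

    def achievements_of_kind(self, kind: str) -> List[Achievement]:
        """Return the achievements of one kind, e.g. all patents."""
        return [a for a in self.achievements if a.kind == kind]


# Invented sample data, for illustration only.
example = Agent(
    name="Example Aerospace",
    contact="info@example.org",
    expertise=["antenna design"],
    achievements=[
        Achievement("patent", "Hypothetical antenna mount", ["Example Customer"]),
        Achievement("paper", "Hypothetical thermal study"),
    ],
    collaborators=["Example University Lab"],
)
```

Keeping references attached to each achievement, rather than to the agent, mirrors the last item on the slide ("linked to achievements").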
  • 48.  These data items contribute to a metadata infrastructure that can be used in two different ways: ◦ to identify and assess partners in a global context; ◦ to assess the relevance of incoming ideas sent in response to “innovation challenges”.
  • 49.  Reusable ontologies: ◦ FOAF, DC, SIOC, SKOS, OBI, VIVO, Organization ontology, Core Business Vocabulary. ◦ Idea Ontology for Innovation Management (Riedl et al., 2009) ◦ OntoGate (Bullinger, 2008)  Modelling the idea assessment and selection. ◦ GI2MO Ontology (Westerski et al., 2010)  Formalization of data to describe ideas and associated information. ◦ Iteams Ontology (Ning, 2006)  Covers goals, actions, teams, results and community. ◦ Innovation Management Ontology (Elbassiti, 2014) Our Research Metadata Infrastructure
  • 50.  Target: ◦ To build a prototype of a “Semantic-enabled” repository of agents (companies, researchers and groups) and their related achievements, to demonstrate how these tools can support OI initiatives. ◦ Two lines of work: biomedical engineering and aerospace. ◦ Geographical scope: Spain.  Phases: ◦ Identification of information needs. ◦ Data capture in a relational structure (2000 main entities). ◦ Vocabulary selection for data encoding. ◦ Loading data into the repository. ◦ Linking data to external sources. ◦ Building user interfaces, including dynamic searching of remote sites. Our Research Phases
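The middle phases (data capture in a relational structure, vocabulary selection, loading into the repository) can be illustrated with a minimal sketch that reads rows from a relational table and emits N-Triples lines. Everything specific here is an assumption for the example: the table layout, the `http://example.org/directory/` base URI, the sample rows, and the choice of `foaf:Organization`, `foaf:name`, `foaf:made` and `dcterms:title` as properties. The presentation does not describe the actual schema or mapping used in the project.

```python
import sqlite3

FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/directory/"  # hypothetical base URI


def uri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text):
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def build_demo_db():
    """Create an in-memory database with invented sample rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE agent (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute(
        "CREATE TABLE patent (id INTEGER PRIMARY KEY, agent_id INTEGER, title TEXT)"
    )
    db.execute("INSERT INTO agent VALUES (1, 'Example Aerospace')")
    db.execute("INSERT INTO patent VALUES (10, 1, 'Hypothetical antenna mount')")
    return db


def export_ntriples(db):
    """Map each relational row to one or more N-Triples statements."""
    lines = []
    for agent_id, name in db.execute("SELECT id, name FROM agent ORDER BY id"):
        subject = uri(BASE + "agent/" + str(agent_id))
        lines.append(" ".join([subject, uri(RDF_TYPE), uri(FOAF + "Organization"), "."]))
        lines.append(" ".join([subject, uri(FOAF + "name"), literal(name), "."]))
    query = "SELECT id, agent_id, title FROM patent ORDER BY id"
    for patent_id, agent_id, title in db.execute(query):
        patent = uri(BASE + "patent/" + str(patent_id))
        owner = uri(BASE + "agent/" + str(agent_id))
        lines.append(" ".join([patent, uri(DCT + "title"), literal(title), "."]))
        lines.append(" ".join([owner, uri(FOAF + "made"), patent, "."]))
    return lines


if __name__ == "__main__":
    for line in export_ntriples(build_demo_db()):
        print(line)
```

The resulting lines could then be loaded into any triple store that accepts N-Triples; linking to external sources would add further triples pointing at remote URIs.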
  • 51.  Contribution of patents: ◦ Patents are part of the achievements of an entity or person. ◦ Patents provide “linguistic clues” that help to identify skills, competences and areas of knowledge, and to build search/browse systems.
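One simple way to picture the “linguistic clues” idea is to count the most frequent content words in a patent abstract and treat them as candidate expertise terms. This is only a sketch of a generic term-frequency approach, not the method used in the research: the stopword list, the function name `candidate_terms` and the sample abstract are all invented for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {
    "a", "an", "and", "the", "of", "for", "to", "in",
    "with", "is", "by", "on", "that", "which",
}


def candidate_terms(text, top_n=5):
    """Return the top_n most frequent content words with their counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    content = [t for t in tokens if t not in STOPWORDS and len(t) > 2]
    return Counter(content).most_common(top_n)


# Invented abstract, for illustration only.
abstract = (
    "A catheter with a pressure sensor. The sensor is mounted on the "
    "catheter tip, and the sensor signal is filtered."
)

if __name__ == "__main__":
    print(candidate_terms(abstract, top_n=2))
```

Terms extracted this way could be matched against a controlled vocabulary (for example SKOS concepts) before being used as facets in a search/browse interface.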