May (updated) 2010 Product Stack
Enterprise Approach
Enterprise Approach
  • Semantic Enterprise based on semantic Web, linked data
  • Leverage existing assets
      • Data, records and instances
      • Taxonomies, structure and schema
  • Layer semantics on to existing systems
  • Develop incrementally
      • Add sophistication, scope over time
      • Keep risks low
  • Integrate with public and Web data (“open world”)
Linked Data
“Linked Data is a set of best practices for publishing and deploying instance and class data using the RDF data model, naming the data objects using uniform resource identifiers (URIs), thereby exposing the data for access via the HTTP protocol, while emphasizing data interconnections, interrelationships and context useful to both humans and machine agents.”
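To make the definition concrete, here is a minimal sketch of instance data expressed as RDF statements whose subjects, predicates and linked objects are all URIs. The `http://example.org/id/` namespace and the `ntriple` helper are assumptions for illustration only; the FOAF, RDF and RDFS terms are standard vocabulary URIs.

```python
def ntriple(subject, predicate, obj, literal=False):
    """Format one RDF statement as an N-Triples line."""
    if literal:
        # Escape backslashes and double quotes inside plain literals.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


BASE = "http://example.org/id/"  # hypothetical namespace for our own records
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

triples = [
    ntriple(BASE + "alice", RDF_TYPE, FOAF_PERSON),             # class data
    ntriple(BASE + "alice", RDFS_LABEL, "Alice", literal=True),  # human-readable label
    ntriple(BASE + "alice", FOAF_KNOWS, BASE + "bob"),           # link to another record
]
print("\n".join(triples))
```

Because every name is a URI, the third statement is the "link" in linked data: a client that dereferences the object URI over HTTP can retrieve the next record.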
Layers and Current Products
Current Products
  • structWSF: the pivotal product; Web services middleware that provides distributed data access and federation
  • conStruct: Drupal-based structured data linkage to structWSF
  • irON: spreadsheet, JSON and XML authoring and conversion framework
  • UMBEL: reference set of linking subjects and basis for domain vocabularies
  • scones: an ontology- and entity-driven information extraction and tagging system
Fit of Current Products within Layers
Existing Assets Layer
Existing Assets
These are the materials that need to be federated, made interoperable, and given a common semantics:
  • structured data / databases
  • semi-structured data (XML, Web pages)
  • unstructured data (text)
Preserving Existing Assets
  • Relational databases (RDBMSs)
  • Distributed structured assets
      • spreadsheets
      • lightweight datastores
  • Web pages and Web sites
  • Existing documents and text
  • Web databases and APIs
  • Other databases (RDF, OO, etc.)
Access/Conversion Layer
Conversion
  • Provides in-place access to existing information
  • Translates existing formats and structures to RDF
  • Extracts structured information from unstructured text
  • Aids creation of interoperable datasets
  • Geared almost entirely to records, instances or entities (that is, basic data)
Conversion Methods
  • Relational DBs: RDB2RDF
  • RDFizers
  • Information Extraction
  • New Dataset Authoring
  • Direct Use (already in RDF)
Relational DB Conversion
  • Simple mappings of instance records to RDF
  • Methodologies well proven if kept to the instance level
  • RDB schemas inform the interoperable layer (“ontologies”)
  • Relational datastores left in place
  • Record data obtained via access layer (structWSF)
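The instance-level mapping described above can be sketched roughly as follows: each row becomes one subject URI and each non-null column becomes one statement. This is a generic illustration using an in-memory SQLite table, not the RDB2RDF tooling the deck refers to; the `http://example.org/` namespace, the table and the URI patterns are all assumptions.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace


def rows_to_triples(conn, table, key_column):
    """Map each row to a subject URI and each non-null column to one triple.

    `table` is interpolated into the SQL text, so it must come from trusted
    configuration, never from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key_column]}"
        for column, value in record.items():
            if column == key_column or value is None:
                continue  # the key is already in the URI; NULLs yield no statement
            predicate = f"{BASE}schema/{table}#{column}"
            triples.append((subject, predicate, str(value)))
    return triples


# The relational store stays in place; we only read from it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Acme", "Iowa City"), (2, "Globex", None)],
)
customer_triples = rows_to_triples(conn, "customer", "id")
for triple in customer_triples:
    print(triple)
```

Note that the column names end up as predicates: this is the sense in which the relational schema informs the ontology layer while the records themselves stay at the instance level.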
RDFizers
  • General serialization or data format conversions to RDF
  • Mostly applied to:
      • Standard data formats and data structs
      • Web content
      • APIs
      • Some legacy content
  • Sometimes some minor ontology or schema mapping
  • Embodies all conversion steps to linked data
  • We have access to 100+ existing formats
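As a rough sketch of the RDFizer idea, the registry below maps a format identifier to a converter that emits the same triple structure regardless of input format. The registry, the URI patterns and the `id` field convention are assumptions made for this example; they do not reproduce any particular RDFizer from the listings.

```python
import csv
import io
import json

BASE = "http://example.org/"  # hypothetical namespace


def record_to_triples(record, id_field="id"):
    """Turn one flat record (a dict) into (subject, predicate, value) triples."""
    subject = f"{BASE}record/{record[id_field]}"
    return [
        (subject, f"{BASE}attr/{key}", str(value))
        for key, value in record.items()
        if key != id_field
    ]


def rdfize_csv(text):
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        triples.extend(record_to_triples(row))
    return triples


def rdfize_json(text):
    triples = []
    for record in json.loads(text):  # expects a JSON array of flat objects
        triples.extend(record_to_triples(record))
    return triples


# One converter per input format; adding a format means adding one entry.
RDFIZERS = {"text/csv": rdfize_csv, "application/json": rdfize_json}


def rdfize(mime_type, text):
    try:
        converter = RDFIZERS[mime_type]
    except KeyError:
        raise ValueError(f"no RDFizer registered for {mime_type}") from None
    return converter(text)


csv_triples = rdfize("text/csv", "id,title\n7,Product Stack\n")
json_triples = rdfize("application/json", '[{"id": 7, "title": "Product Stack"}]')
print(csv_triples == json_triples)
```

The same record arriving as CSV or as JSON produces identical triples, which is the point of converting everything to one model before federation.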
RDFizers – Listing 1
  • URN handlers (in addition to IRI and URI): DOI, LSID, OAI
  • RDF Serialization formats: irON, N3, RDF/XML, Turtle
  • Languages and ontologies: AB Meta, Annotea, APML, AtomOWL, Bibliographic Ontology, Creative Commons, EXIF, FOAF, GeoNames, GoodRelations, Java, Javadoc, MARC/MODS, Meta Standards, Music Ontology, Natural Language, Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), Open Geospatial, OWL, SIOC, SIOCT, SKOS, UMBEL, vCard, XML, Others
  • (X)HTML pages
  • Embedded Microformats and GRDDL * (see note below): DC, eRDF, geoURL, Google Base, hAudio, hCalendar, hCard, hListing, hResume, hReview, HR-XML, Ning, RDFa, relLicense, SVG, XBRL, XFN, xFolk, XR-XML, XSLT
  • Syndication Formats: Atom, OPML, OCS, RSS 1.1, RSS 2.0, XBEL (for bookmarks)
  • REST-style Web service APIs: Alchemy, Amazon, Apple, Best Buy, Calais, CNet, CrunchBase, Del.icio.us, Digg, Discogs, Disqus, eBay, Facebook, Flickr, Freebase (MQL), FriendFeed, Garmin, Get Satisfaction, Google, Google Apps, Hoover's, HTTP (raw), ISBN DB, Last.fm, Library Thing, Magnolia, Meetup, MusicBrainz, New York Times, New York Times Campaign Finance (NYTCF), New York Times tags
RDFizers – Listing 2
  • REST-style Web service APIs (continued): Open Library, Open Social, Open Street, OpenLink (facets), O'Reilly, Picasa, Radio Pop (BBC), Rhapsody, Salesforce, Slideshare, Slidy, Technorati, Tesco, They Work For You, Twine, Twitter, Weather, Wikipedia, World Bank, Yahoo! BOSS, Yahoo! Finance, Yahoo! Maps, Yahoo! Weather, Yelp, YouTube, Zemanta, Zillow
  • Files (multitude of file formats and MIME types, including): audio (general), BibJSON, BibTEX and others, BitTorrent, commON, CSV, Fink, Flat files, irJSON, irXML, JPEG, JSON, images, MS Office, OpenOffice, Open Document Format, Palm, RDF123, video, XLS, etc.
  • Metadata extractors: CRW, DEB, EXIF, OCW, RPM, XMP
  • Email formats: EMail, Outlook, RFC822
  • Version control and related systems: Bugzilla, Jira, POM, Subversion
  • Other Web service frameworks: BPEL, WSDL, XBRL, XBEL
  • Data exchange formats: iCalendar, LDIF, vCalendar, vCard
  • Relational databases and related: D2RQ, D2RMAP, RDF Views, Virtuoso VADs, OpenLink license files
  • Third party metadata extraction frameworks: Aperture, Spotlight
  • Miscellaneous and other related converters: MPEG-7/CS -> OWL, Random XSD -> OWL
* GRDDL (Gleaning Resource Descriptions from Dialects of Languages) accommodates a wide variety of dialects (see one listing) and can be combined with arbitrary transformation mechanisms (though currently mostly based on XSLTs).
scones
Information Extraction
  • scones (Subject Concept Or Named EntitieS) is our IE tagger
  • Information extraction is applied to input Web pages and unstructured text
  • May be applied after structure extraction (often, at minimum, defluffing)
  • Settable “window” for snippet (from # of bracketing terms to full document)
  • Extraction is performed for both:
      • Entities (per Wikipedia and enterprise dictionaries)
      • Subject concepts (per UMBEL and domain ontologies)
  • Presently in prototype
(Named) Entities
  • The places, events, people, objects, and specific things of the real world
  • Literally millions of notable instances
  • Each belongs to one or more subject concept(s)
  • Currently, the predominant basis for linked data
  • Public sources include Wikipedia and Freebase, others
  • Can be readily mixed-and-matched with private entities
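A dictionary-driven tagger with a settable snippet window might look something like the sketch below. It is a deliberately naive single-token matcher written only to illustrate the "window" idea; it is not how scones is implemented, and the dictionaries, the function name and the sample sentence are invented for the example.

```python
import re

# Hypothetical dictionaries: one for named entities, one for subject concepts.
DICTIONARIES = {
    "entity": {"wikipedia", "freebase"},
    "subject concept": {"ontology"},
}


def tag(text, dictionaries, window=1):
    """Return every dictionary hit with `window` bracketing terms on each side."""
    tokens = re.findall(r"\w+", text)
    hits = []
    for index, token in enumerate(tokens):
        for kind, terms in dictionaries.items():
            if token.lower() in terms:
                start = max(0, index - window)
                snippet = " ".join(tokens[start:index + window + 1])
                hits.append({"term": token, "kind": kind, "snippet": snippet})
    return hits


SAMPLE = (
    "Public sources include Wikipedia and Freebase for entities "
    "and an ontology for concepts"
)
hits = tag(SAMPLE, DICTIONARIES, window=1)
for hit in hits:
    print(hit)
```

Raising `window` widens each snippet toward the full document; a real tagger would also need multi-word matching and disambiguation, which this sketch leaves out.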
Creating New Entity Dictionaries
Triangulating Information Extraction
irON – instance record and Object Notation
irON Dataset Authoring Framework
  • Simple authoring and dataset creation
  • irON includes an abstract notation and vocabulary for instance records
  • Serializations available for:
      • XML (irXML)
      • JSON (irJSON)
      • CSV/spreadsheets (commON)
  • Notations for:
      • Instance records
      • Schema
      • Datasets and metadata
      • Linkages to other schema
Three irON Serializations
  • irXML
  • irJSON
  • commON
More-or-less Interchangeable Formats
structWSF
structWSF
  • Generally RESTful Web services middleware
  • Uniform, distributed access point
  • Provides the interoperability architecture
  • Based on canonical RDF data model
  • Dataset access orientation
  • Standard tools and services:
      • User permissions and access
      • CRUD (create, read, update, delete)
      • Browse
      • Full-text, faceted search
      • Import / export
      • Many others
RDF and Data Federation Model
Advantages of a Canonical Model
  • All tools can be driven from a single data format basis
  • Single converters can link in other hubs of data forms
  • ‘Round-tripping’ through the canonical form can bring consistency and cleanliness to inputted data
  • RDF is well-suited as the canonical form:
      • Structured data
      • Semi-structured data
      • Unstructured data (after IE)
      • Simple-to-complex data structures
      • Logic and inferencing
      • Suitable to all input data formats
      • Many serializations possible
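The hub-and-spoke argument can be illustrated with a small sketch: each format gets one reader into a canonical list of triples and one writer out of it, so any-to-any conversion never needs a pairwise converter. The two toy formats here (pipe-delimited lines and nested JSON) and all function names are assumptions chosen for brevity; the JSON form also assumes one value per subject and predicate.

```python
import json


def kv_to_canonical(text):
    """Parse 'subject|predicate|value' lines into sorted, trimmed triples."""
    return sorted(
        tuple(part.strip() for part in line.split("|"))
        for line in text.splitlines()
        if line.strip()
    )


def canonical_to_kv(triples):
    return "\n".join("|".join(triple) for triple in triples)


def json_to_canonical(text):
    data = json.loads(text)  # shape: {subject: {predicate: value}}
    return sorted(
        (subject, predicate, str(value))
        for subject, attributes in data.items()
        for predicate, value in attributes.items()
    )


def canonical_to_json(triples):
    data = {}
    for subject, predicate, value in triples:
        data.setdefault(subject, {})[predicate] = value  # single-valued assumption
    return json.dumps(data, sort_keys=True)


READERS = {"kv": kv_to_canonical, "json": json_to_canonical}
WRITERS = {"kv": canonical_to_kv, "json": canonical_to_json}


def convert(text, source, target):
    """Every conversion passes through the canonical triple list."""
    return WRITERS[target](READERS[source](text))


messy = "  rec1 | name |  Acme \nrec1|city|Ames\n\n"
clean = convert(messy, "kv", "kv")      # round trip normalizes spacing and order
as_json = convert(messy, "kv", "json")  # no dedicated kv-to-json converter exists
print(clean)
print(as_json)
```

With n formats this design needs 2n small functions rather than n × (n − 1) pairwise converters, and the round trip through the canonical form is what strips the stray whitespace and blank lines from the input.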
A Collaborative, Distributed Network
Flexible User Access Permissions
Access, APIs and Endpoints
The resulting linked data may be exposed as:
  • APIs
  • Web services
  • SPARQL endpoints
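For the SPARQL endpoint case, a client request can be sketched with the standard library alone. Under the SPARQL protocol a query may be sent as an HTTP GET with the query text in the `query` parameter; the endpoint URL below is a placeholder, and the request is only constructed, not sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL

QUERY = (
    "SELECT ?s ?label WHERE { "
    "?s <http://www.w3.org/2000/01/rdf-schema#label> ?label "
    "} LIMIT 10"
)


def build_sparql_request(endpoint, query):
    """Build a GET request that carries the query in the `query` parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the JSON results format; an endpoint may offer others.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)

# Decoding the URL again shows the query survives percent-encoding intact.
print(parse_qs(urlsplit(request.full_url).query)["query"][0] == QUERY)
```

Passing the request to `urllib.request.urlopen` against a real endpoint would return the result bindings; that step is omitted here because the endpoint is invented.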
Ontologies Layer
Ontologies
  • Ontologies provide the basis for:
      • Interoperating
      • Reconciling semantics
  • Multiples may be used at any time
  • Both enterprise (internal) and external ontologies
  • Best built incrementally, with participation
  • Easily modified: OK to test and experiment
Ontologies
  • The structural relationships of concepts within a domain
  • Generally class- (or set-) oriented
  • Analogous to relational database schema, only with controlled vocabularies and exact semantics
  • Sets the structure of how to organize the actual data (“instances”) in the domain
  • Semantics and mapping techniques allow disparate ontologies to be inter-related
  • Can inference or reason over the structure
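Reasoning over class structure can be shown with the simplest possible case: following subclass links transitively so that an instance typed with a narrow class is also recognized as a member of every broader class. The class names and the instance are made up for the example, and real ontology reasoners cover far more than this single rule.

```python
# Hypothetical class structure: child class -> set of direct parent classes.
SUBCLASS_OF = {
    "Novelist": {"Writer"},
    "Writer": {"Person"},
    "Person": {"Agent"},
}

# Hypothetical instance data: instance -> set of asserted classes.
INSTANCE_OF = {"jane_austen": {"Novelist"}}


def superclasses(cls, subclass_of):
    """Collect every ancestor of `cls` by walking the subclass links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(instance, instance_of, subclass_of):
    """Asserted classes plus everything implied by the class hierarchy."""
    types = set(instance_of.get(instance, ()))
    for cls in list(types):
        types |= superclasses(cls, subclass_of)
    return types


print(sorted(inferred_types("jane_austen", INSTANCE_OF, SUBCLASS_OF)))
```

Only one type was asserted for the instance; the other three follow from the structure, which is why a query for all `Person` records can return data that never mentioned `Person` explicitly.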
Migrating Structure to the Ontology Layer
Ontologies Layer
irON
irON Record Vocabulary
  • irON also provides the standard instance record vocabulary for all federated records
  • Each record source has its own attributes
  • But irON provides common descriptors:
      • Useful for interoperating
      • Unique, Web-accessible identifiers
      • Standard descriptions and labels
      • Conventions for “driving” user interfaces and tools
UMBEL
  • UMBEL (Upper Mapping and Binding Exchange Layer)
  • 20,000 defined reference points in information space
  • Means to assert what a given chunk of content is about
  • Enable similar content to be aggregated
  • Place content in context with other content
  • Aggregation points for tying in instances and entities
  • Derived from, and a subset of, the Cyc knowledge base
  • Vocabulary basis for domain-specific subject ontologies
Notable Ontologies and Vocabularies
Management Layer
Management/Federation Layer
  • Management/Federation Layer handles:
      • Ontology mapping, management
      • Queries and retrievals
      • All Web services
      • Imports and exports
      • Inferencing and logic
      • Ontology creation and expansion
  • Works off of many RDF datastores
  • Has efficient, full-text indexing with faceting
  • Interface to the system is structWSF
  • Can plug into many options at the Applications Layer (only Drupal with conStruct SCS yet deployed)
Web-oriented Architecture
Applications Layer
conStruct SCS
conStruct Browse Screen
conStruct Capabilities
  • Based on Drupal
  • Single-click (cloud) deployment
  • Theming
  • User and group access and management
  • Data display templates
  • General content management system (CMS)
  • Publishing RDF
  • Open source
Re-cap
Summary
  • Incremental, low-risk approach to the semantic enterprise
  • Maximum leverage and re-use of existing information assets
  • Conversion and federation of all available data forms
  • Excellent uses for:
      • Business intelligence
      • Knowledge management
      • Master data management modernization
      • Taxonomy modernization
      • Enterprise content integration
  • All baseline products are open source
Contacts & Information
  • Michael K. Bergman, CEO, 319.621.5225, [email_address], blog: www.mkbergman.com
  • Steve Ardire, Senior Advisor, [email_address]
  • Frédérick Giasson, CTO, [email_address], blog: fgiasson.com/blog
Web Sites
  • structureddynamics.com
  • umbel.org
  • umbel.structureddynamics.com (UMBEL Web services)
  • citizen-dan.org (community indicator systems)
  • openstructs.org (open source distros + documentation)
  • constructscs.com (Drupal structured data system)
 


Structured Dynamics' Semantic Technologies Product Stack


Editor's Notes

  • #17, #18: At present, though constantly increasing, Zitgist's existing conversion services recognize nearly 100 formats. GRDDL (Gleaning Resource Descriptions from Dialects of Languages) is a W3C markup format for getting RDF data out of XML and XHTML documents using explicitly associated transformation algorithms, typically represented in XSLT. GRDDL accommodates a wide variety of dialects (see one listing) and can be combined with arbitrary transformation mechanisms (though currently mostly based on XSLTs).
  • #23: More here also, use the candidate properties content to get the extract to the SC context (??? more about the “aboutness”) contextual UMBEL metadata on the fly