Tabular Data on the Web
Intro to W3C CSV on the Web Specifications
Gregg Kellogg
gregg@greggkellogg.net
@gkellogg
Impact of Tabular Data
• Tabular data makes up a large share of the
data published on the Web
• According to the Open Data Institute, the vast
majority of published open data is tabular
• “Over 90% of the data on data.gov.uk is
tabular data.”
• data.gov lists 158,631 datasets; largely in CSV
Sources of Tabular Data
• Easiest way to publish data
• Spreadsheet Dumps
• Database Dumps
• SPARQL results
CSV data is dumb
• It’s a simple text format; the data has no
inherent meaning.
• Cells may be data-typed or have a regular
format: what does “08/09/2015” mean?
• Cells may be related to data in other tables/
columns: Foreign Keys
• Cells may be associated with different entities:
Join results
Web CSV
• 5-star Linked Data
• CSV URLs
• CSVs link to other CSVs
• CSVs link to other Resources
• RDF and JSON conversion
W3C CSV on the Web
• Working Group chartered to give applications higher
interoperability when working with CSV or similar formats.
• Use Cases: http://www.w3.org/TR/csvw-ucr/
• Model for Tabular Data and Metadata on the Web: http://www.w3.org/TR/tabular-data-model/
• Metadata Vocabulary for Tabular Data: http://www.w3.org/TR/tabular-metadata/
• Generating JSON from Tabular Data on the Web: http://www.w3.org/TR/csv2json/
• Generating RDF from Tabular Data on the Web: http://www.w3.org/TR/csv2rdf/
Examples
countries.csv:
countryCode  latitude  longitude  name
AD           42.5      1.6        Andorra
AE           23.4      53.8       United Arab Emirates
AF           33.9      67.7       Afghanistan

country_slice.csv:
countryRef  year  population
AF          1960  9,616,353
AF          1961  9,799,379
AF          1962  9,989,846
Model for Tabular Data
Annotations of each class in the model diagram:
• Table Group: id, notes, tables, other annotations
• Table: id, notes, url, columns, rows, table direction, foreign keys, transformations, other annotations
• Column: about URL, cells, datatype, default, lang, name, number, ordered, property URL, required, separator, table, text direction, titles, value URL, virtual
• Row: cells, number, primary key, titles, referenced rows, source number, table
• Cell: about URL, column, errors, ordered, property URL, row, string value, table, text direction, value, value URL
Mapping CSV to Model
• Parse CSV: RFC4180 + dialect metadata.
• delimiter, doubleQuote, headerRowCount,
lineTerminators, quoteChar, …
• Dialect Description comes from Metadata Document.
• Match Headers to Columns.
• Parse Cells using Column metadata/datatype.
• Abstract data model used for viewing, validation, and
conversions.
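
The parsing step above can be sketched with Python's standard csv module. This is only an illustration, not the algorithm from the specification: the `dialect` dictionary and the `parse_with_dialect` helper are hypothetical names, and only the single-header-row case is handled.

```python
import csv
import io

# Hypothetical dialect description, using property names from the slide.
dialect = {
    "delimiter": ",",
    "quoteChar": '"',
    "doubleQuote": True,
    "headerRowCount": 1,
}


def parse_with_dialect(text, dialect):
    """Return (header_titles, data_rows) for CSV text, guided by a dialect dict."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=dialect.get("delimiter", ","),
        quotechar=dialect.get("quoteChar", '"'),
        doublequote=dialect.get("doubleQuote", True),
    )
    rows = list(reader)
    header_count = dialect.get("headerRowCount", 1)
    # Assumption: at most one header row carries the column titles.
    titles = rows[0] if header_count >= 1 and rows else []
    return titles, rows[header_count:]


sample = (
    "countryCode,latitude,longitude,name\n"
    "AD,42.5,1.6,Andorra\n"
    'AE,23.4,53.8,"United Arab Emirates"\n'
    "AF,33.9,67.7,Afghanistan\n"
)
titles, data = parse_with_dialect(sample, dialect)
```

The titles feed the "Match Headers to Columns" step; the data rows are still plain strings until column datatypes are applied.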
Metadata
• Finding Metadata from a CSV
• User-specified, Link Header, well-known
locations
• Matching Metadata to a CSV
• CSV must be compatible with metadata (titles/
names)
• Metadata must reference CSV URL
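
A rough sketch of the "well-known locations" lookup and the URL check follows. The two default patterns are quoted from memory of the Model for Tabular Data specification and should be checked against it; the function names are hypothetical, and the Link header and user-specified cases are not shown.

```python
from urllib.parse import urljoin

# Assumed default location patterns, used when the site serves no
# /.well-known/csvm file (verify against the specification).
DEFAULT_PATTERNS = ["{+url}-metadata.json", "csv-metadata.json"]


def candidate_metadata_urls(csv_url, patterns=DEFAULT_PATTERNS):
    """Expand each pattern and resolve it against the CSV file's URL."""
    candidates = []
    for pattern in patterns:
        expanded = pattern.replace("{+url}", csv_url)
        candidates.append(urljoin(csv_url, expanded))
    return candidates


def metadata_references(metadata, metadata_url, csv_url):
    """True if a table description in the metadata points at csv_url."""
    # A metadata document may describe a table group or a single table.
    tables = metadata.get("tables", [metadata])
    return any(
        urljoin(metadata_url, table.get("url", "")) == csv_url
        for table in tables
    )
```

A processor would try each candidate in order and keep the first document for which `metadata_references` holds.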
Metadata vocabulary diagram: properties of each description object
• Top-Level Properties: @language, @base
• Table Group: @context, @id, @type, tables, tableSchema, tableDirection, dialect, transformations, notes
• Table: @context, @id, @type, url, tableSchema, tableDirection, dialect, transformations, notes, suppressOutput
• Schema: @id, @type, columns, primaryKey, foreignKeys, rowTitles
• Column Description: @id, @type, name, titles, required, suppressOutput, virtual
• Inherited Properties: aboutUrl, propertyUrl, valueUrl, datatype, default, lang, null, ordered, required, separator, textDirection
• Datatype Description: @id, @type, base, format, length, minLength, maxLength, minimum, maximum, minInclusive, maxInclusive, minExclusive, maxExclusive
• Number Format: decimalChar, groupChar, pattern
• Dialect Description: @id, encoding, delimiter, lineTerminators, quoteChar, doubleQuote, commentPrefix, header, headerRowCount, skipRows, skipColumns, skipBlankRows, skipInitialSpace, trim
• Foreign Key Definition: columnReference, reference
• Foreign Key Reference: resource, schemaReference, columnReference
• Transformation Definition: @id, @type, url, targetFormat, scriptFormat, titles, source
• Legend (property types): array, link, URI template, column reference, object, natural language, atomic; the diagram also marks references to a value, or to an array of values, of a specific category
Schema
• Column Descriptions
• Names/Titles
• Datatype
• Primary Keys
• Foreign Key Relationships
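
To make the schema pieces concrete, here is a sketch of a metadata document for the two example files, built as a Python dictionary and serialized to JSON. The specific datatypes, the choice of primary key, and the `groupChar` format for population are illustrative assumptions, not taken from the slides.

```python
import json

# Illustrative metadata for countries.csv and country_slice.csv.
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "tables": [
        {
            "url": "countries.csv",
            "tableSchema": {
                "columns": [
                    {"name": "countryCode", "titles": "countryCode", "datatype": "string"},
                    {"name": "latitude", "titles": "latitude", "datatype": "number"},
                    {"name": "longitude", "titles": "longitude", "datatype": "number"},
                    {"name": "name", "titles": "name", "datatype": "string"},
                ],
                "primaryKey": "countryCode",
            },
        },
        {
            "url": "country_slice.csv",
            "tableSchema": {
                "columns": [
                    {"name": "countryRef", "titles": "countryRef", "datatype": "string"},
                    {"name": "year", "titles": "year", "datatype": "gYear"},
                    {
                        "name": "population",
                        "titles": "population",
                        # Assumed number format so "9,616,353" parses as an integer.
                        "datatype": {"base": "integer", "format": {"groupChar": ","}},
                    },
                ],
                # countryRef must match a countryCode in countries.csv.
                "foreignKeys": [
                    {
                        "columnReference": "countryRef",
                        "reference": {
                            "resource": "countries.csv",
                            "columnReference": "countryCode",
                        },
                    }
                ],
            },
        },
    ],
}

document = json.dumps(metadata, indent=2)
```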
Embedded Metadata
• Generally Column Titles.
• Formats may define CSV conventions for
embedded metadata.
• Principally used to determine metadata
compatibility.
• Also serves as default metadata if no file
located.
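
A simplified sketch of how embedded metadata might be derived from the header row and compared with supplied metadata. The compatibility rule here (same column count, and each header title matching a supplied title or name) is a deliberate simplification of the specification, and language-tagged titles are ignored; both function names are hypothetical.

```python
import csv
import io


def embedded_metadata(text, url):
    """Build a minimal table description from the first (header) row."""
    rows = csv.reader(io.StringIO(text, newline=""))
    titles = next(rows, [])
    return {
        "url": url,
        "tableSchema": {"columns": [{"titles": [title]} for title in titles]},
    }


def _titles_of(column):
    titles = column.get("titles", [])
    return [titles] if isinstance(titles, str) else list(titles)


def compatible(embedded, supplied):
    """Simplified check that supplied metadata fits the CSV's own headers."""
    emb_cols = embedded["tableSchema"]["columns"]
    sup_cols = supplied["tableSchema"]["columns"]
    if len(emb_cols) != len(sup_cols):
        return False
    for emb, sup in zip(emb_cols, sup_cols):
        allowed = set(_titles_of(sup))
        if "name" in sup:
            allowed.add(sup["name"])
        if not allowed & set(_titles_of(emb)):
            return False
    return True
```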
Datatypes
• Basic XSD datatypes
• maximum/minimum facets
• minLength/maxLength facets
• format/pattern
• RegExp, Boolean, UAX35 date/time picture
string, UAX35 number picture string
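
The earlier question about “08/09/2015” shows why a date format matters. The sketch below maps a small, assumed subset of UAX35 date fields (`yyyy`, `MM`, `dd`) onto Python's strptime directives; the `parse_date` helper is hypothetical and nowhere near a full UAX35 implementation.

```python
from datetime import date, datetime

# Partial, illustrative mapping from UAX35 fields to strptime directives.
_UAX35_TO_STRPTIME = [("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d")]


def parse_date(value, picture):
    """Parse a cell value using a UAX35-style date picture string."""
    fmt = picture
    for token, directive in _UAX35_TO_STRPTIME:
        fmt = fmt.replace(token, directive)
    return datetime.strptime(value, fmt).date()


day_first = parse_date("08/09/2015", "dd/MM/yyyy")    # 8 September 2015
month_first = parse_date("08/09/2015", "MM/dd/yyyy")  # 9 August 2015
```

The same string yields two different dates, which is exactly the ambiguity a declared format removes.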
Other Features
• Split cells into multiple items
• Validate Primary Keys and Foreign Key
references (single and multiple columns)
• Define URL properties for columns
• Multiple subjects per column (may be URLs)
• Values as URLs
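
Two of these features can be sketched briefly: splitting a cell on a separator, and expanding a URL template against a row. Only simple `{name}` variables are handled here, not full RFC 6570, and the template string and helper names are hypothetical.

```python
import re
from urllib.parse import quote


def split_cell(value, separator):
    """Split one cell into a list of items, as the separator property does."""
    if not value:
        return []
    return [item.strip() for item in value.split(separator)]


def expand_template(template, row):
    """Expand simple {column} variables using values from one row."""
    def substitute(match):
        # Percent-encode anything outside the unreserved character set.
        return quote(row[match.group(1)], safe="")

    return re.sub(r"\{(\w+)\}", substitute, template)


row = {"countryCode": "AD", "name": "Andorra"}
subject = expand_template("http://example.org/countries.csv#{countryCode}", row)
```

A template like this, used as an about URL, gives each row a subject URL instead of a blank node.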
Conversions: JSON
Input: countries.csv (the same table as under Examples)
{
  "tables": [{
    "url": "http://example.org/countries.csv",
    "row": [{
      "url": "http://example.org/countries.csv#row=2",
      "rownum": 1,
      "describes": [{
        "countryCode": "AD",
        "latitude": "42.5",
        "longitude": "1.6",
        "name": "Andorra"
      }]
    }, {
      "url": "http://example.org/countries.csv#row=3",
      "rownum": 2,
      "describes": [{
        "countryCode": "AE",
        "latitude": "23.4",
        "longitude": "53.8",
        "name": "United Arab Emirates"
      }]
    }, {
      "url": "http://example.org/countries.csv#row=4",
      "rownum": 3,
      "describes": [{
        "countryCode": "AF",
        "latitude": "33.9",
        "longitude": "67.7",
        "name": "Afghanistan"
      }]
    }]
  }]
}
Files: countries.json, countries-standard.json
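
The shape of the standard-mode output can be reproduced with a short sketch. It assumes a single header row and no metadata, so every value stays a string; `csv_to_standard_json` is a hypothetical helper, not the conversion algorithm from the specification.

```python
import csv
import io


def csv_to_standard_json(text, url):
    """Wrap each row's values with its row number and source-row URL."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    rows = []
    for rownum, record in enumerate(reader, start=1):
        rows.append({
            # The header occupies source row 1, so data row n is source row n + 1.
            "url": f"{url}#row={rownum + 1}",
            "rownum": rownum,
            "describes": [dict(record)],
        })
    return {"tables": [{"url": url, "row": rows}]}
```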
Conversions: JSON (min)
Input: countries.csv (the same table as under Examples)
[{
  "countryCode": "AD",
  "latitude": "42.5",
  "longitude": "1.6",
  "name": "Andorra"
}, {
  "countryCode": "AE",
  "latitude": "23.4",
  "longitude": "53.8",
  "name": "United Arab Emirates"
}, {
  "countryCode": "AF",
  "latitude": "33.9",
  "longitude": "67.7",
  "name": "Afghanistan"
}]
Files: countries.csv, countries.json, countries-minimal.json
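
Minimal mode keeps only the per-row objects. With no metadata and a single header row, Python's standard library gets close to it in a few lines; this is a sketch under those assumptions, and `csv_to_minimal_json` is a hypothetical name.

```python
import csv
import io
import json


def csv_to_minimal_json(text):
    """Return one object per data row, keyed by the column titles."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [dict(record) for record in reader]


sample = (
    "countryCode,latitude,longitude,name\n"
    "AD,42.5,1.6,Andorra\n"
    "AE,23.4,53.8,United Arab Emirates\n"
    "AF,33.9,67.7,Afghanistan\n"
)
minimal = json.dumps(csv_to_minimal_json(sample), indent=2)
```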
Conversions: RDF
Input: countries.csv (the same table as under Examples)
@base <http://example.org/countries.csv> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

_:tg a csvw:TableGroup ;
  csvw:table [ a csvw:Table ;
    csvw:url <http://example.org/countries.csv> ;
    csvw:row [ a csvw:Row ;
      csvw:rownum "1"^^xsd:integer ;
      csvw:url <#row=2> ;
      csvw:describes _:t1r1
    ], [ a csvw:Row ;
      csvw:rownum "2"^^xsd:integer ;
      csvw:url <#row=3> ;
      csvw:describes _:t1r2
    ], [ a csvw:Row ;
      csvw:rownum "3"^^xsd:integer ;
      csvw:url <#row=4> ;
      csvw:describes _:t1r3
    ]
  ] .

_:t1r1
  <#countryCode> "AD" ;
  <#latitude> "42.5" ;
  <#longitude> "1.6" ;
  <#name> "Andorra" .

_:t1r2
  <#countryCode> "AE" ;
  <#latitude> "23.4" ;
  <#longitude> "53.8" ;
  <#name> "United Arab Emirates" .

_:t1r3
  <#countryCode> "AF" ;
  <#latitude> "33.9" ;
  <#longitude> "67.7" ;
  <#name> "Afghanistan" .
Files: countries.csv, countries.json, countries-standard.ttl
Conversions: RDF (min)
Input: countries.csv (the same table as under Examples)
@base <http://example.org/countries.csv> .

_:t1r1
  <#countryCode> "AD" ;
  <#latitude> "42.5" ;
  <#longitude> "1.6" ;
  <#name> "Andorra" .

_:t1r2
  <#countryCode> "AE" ;
  <#latitude> "23.4" ;
  <#longitude> "53.8" ;
  <#name> "United Arab Emirates" .

_:t1r3
  <#countryCode> "AF" ;
  <#latitude> "33.9" ;
  <#longitude> "67.7" ;
  <#name> "Afghanistan" .
Files: countries.csv, countries.json, countries-minimal.ttl
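
The minimal RDF output follows a regular pattern: one blank node per row and one predicate per column, named by a fragment on the CSV's URL. The sketch below emits Turtle text in that pattern without an RDF library. It assumes no metadata, a single header row, and column titles that are already safe to use as URL fragments; `csv_to_minimal_turtle` is a hypothetical helper.

```python
import csv
import io


def escape_literal(value):
    """Escape characters that cannot appear raw in a Turtle string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_minimal_turtle(text, url):
    """Emit one blank-node subject per row with a predicate per column."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    lines = [f"@base <{url}> ."]
    for number, record in enumerate(reader, start=1):
        pairs = [
            f'  <#{title}> "{escape_literal(value)}"'
            for title, value in record.items()
        ]
        lines.append(f"_:t1r{number}")
        lines.append(" ;\n".join(pairs) + " .")
    return "\n".join(lines)
```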
Other examples
• Rich Annotations: JSON RDF
• Virtual Columns/Multiple Subjects: JSON RDF
• For more see Specifications and Test Suite
Tools
• CSVLint
• CKAN – open source data portal platform
• Socrata – cloud-based open data
• Google Fusion Tables – data visualization
• Ruby rdf-tabular – CSVW reference implementation
• RDF Distiller
• Structured Data Linter
Next Steps
• At-Risk – /.well-known/csvm
• More datatype formats
• Metadata in HTML (embedded JSON-LD)
• Tabular Data in HTML
• More implementations!
• Timeline
• Candidate Recommendation – July 2015
• Proposed Recommendation – Oct 2015
• W3C Recommendation – Dec 2015
More Information
Gregg Kellogg
@gkellogg
gregg@greggkellogg.net
http://greggkellogg.net/
http://www.slideshare.net/gkellogg1/tabular-data-on-the-web
Links: GitHub, W3C, distiller, linter, SlideShare
