Bringing parliamentary debates to the Semantic Web

Damir Juric (1,3), Laura Hollink (2), Geert-Jan Houben (1)

(1) Delft University of Technology, (2) VU University Amsterdam, (3) FER, University of Zagreb



DERIVE 2012
Boston, 12.11.2012.
Motivation




  Cross-media comparison:
• What choices do different media make in covering people and topics when
  reporting on political events?

• Does the representation of topics and people change over time, and how do
  the various media types differ in this respect?
Background: the PoliMedia project

  • Funded by CLARIN-NL

  • May 2012 - May 2013

  • Three phases:
     I. modeling phase: creating
        a semantic model (this
        presentation)
     II. data production phase:
         creating links between
         political events and media
     III. application phase:
         searching and navigating
         the linked datasets
  • www.polimedia.nl
Research questions

• How to represent political events on the Semantic Web?
• How to represent links between media and political events on
  the Semantic Web?
Political events data set

• Events: Dutch parliamentary debates

 Handelingen der Staten-Generaal, or Dutch Hansard


• Some provenance:
  1. Transcripts are made of the complete
     debates of the Dutch parliament.
  2. They are published online by the government on
     http://www.statengeneraaldigitaal.nl/ (1818-1995)
     and http://officielebekendmakingen.nl/ (from 1995).
  3. The PoliticalMashup project has converted the
     government PDF and TXT files into XML, including
     URIs as identifiers; see http://politicalmashup.nl/
  4. We build on that XML data.
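As an illustration of building on such XML, here is a minimal sketch that extracts speeches with their identifiers and speakers. The element and attribute names (`debate`, `topic`, `speech`, `id`, `speaker`) are hypothetical placeholders, not the actual PoliticalMashup schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout, NOT the actual PoliticalMashup schema: a debate holds
# topics, and each topic holds speeches that carry their own identifier
# and the name of the speaker.
SAMPLE = """
<debate id="debate-1">
  <topic id="debate-1.1">
    <speech id="debate-1.1.1" speaker="Speaker A">First speech.</speech>
    <speech id="debate-1.1.2" speaker="Speaker B">Reply.</speech>
  </topic>
</debate>
"""


def read_speeches(xml_text):
    """Return (speech_id, speaker, text) tuples in document order."""
    root = ET.fromstring(xml_text.strip())
    speeches = []
    # Element.iter() walks the tree depth-first in document order, so the
    # order of speeches in the transcript is preserved.
    for speech in root.iter("speech"):
        text = (speech.text or "").strip()
        speeches.append((speech.get("id"), speech.get("speaker"), text))
    return speeches


print(read_speeches(SAMPLE))
```

Keeping document order matters here because the model needs the chronological order of debate subparts.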
Media data sets

• newspaper articles and radio bulletins

    • at the National Library of the Netherlands

    • Many newspapers, mostly regional,
      1950-1995

    • Text + images of newspaper layout

• newscasts

    • at the Netherlands Institute for Sound and
      Vision

    • evening news and current affairs
      programs

    • metadata in Dublin Core and CDMI format

    • enriched with thesaurus terms from the
      Gemeenschappelijke Thesaurus
      Audiovisuele Archieven (GTAA)
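To show what Dublin Core metadata can look like once expressed as RDF, here is a minimal sketch that turns a flat record into N-Triples lines in the Dublin Core element namespace. The item URI, the field values, and the choice to store GTAA terms as plain `dc:subject` literals are assumptions for illustration, not the archive's actual export.

```python
# Dublin Core element set namespace.
DC = "http://purl.org/dc/elements/1.1/"


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def dc_to_ntriples(item_uri, record):
    """record maps a Dublin Core element name to a list of string values."""
    lines = []
    for element, values in record.items():
        for value in values:
            lines.append(f'<{item_uri}> <{DC}{element}> "{escape_literal(value)}" .')
    return lines


# Hypothetical newscast record; the subject values stand in for GTAA terms.
record = {
    "title": ["Evening news"],
    "type": ["television newscast"],
    "subject": ["GTAA term 1", "GTAA term 2"],
}
for line in dc_to_ntriples("http://example.org/media/1", record):
    print(line)
```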
Semantic model: what do we need to represent? 1/2

• Important information for every parliamentary debate is:
    • When the debate was held
    • What is being said in the debate (topics)
    • Who is giving the speeches in the debate and in which
      role (persons)
        • Additional information about actors involved in the
          event (names of the politicians, their party, age, etc.)
    • Structure: subparts of the debate have their own
      identifiers (a subpart is a part of the debate where only
      one speaker can be identified as the actor)
        • Chronological order (the order in which the subparts
          occurred within the parliamentary debate)
    • Named entities apart from politicians (persons,
      locations, etc.)

  Figure: debate structure. Debate metadata; Topic 1 with Speaker 1 / Content,
  Speaker 2 / Content, Speaker 3 / Content; Topic 2 with Speaker 1 / Content.
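These requirements map naturally onto a nested data structure. The sketch below mirrors the debate/topic/speaker layout of the figure; the class and field names are illustrative assumptions, not the PoliMedia vocabulary.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Person:
    name: str
    party: Optional[str] = None      # additional actor information


@dataclass
class Speech:
    uri: str                         # each subpart has its own identifier
    speaker: Person                  # the single actor of this subpart
    order: int                       # position in the debate (chronology)
    named_entities: List[str] = field(default_factory=list)


@dataclass
class Topic:
    label: str
    speeches: List[Speech] = field(default_factory=list)


@dataclass
class Debate:
    uri: str
    held_on: date                    # when the debate was held
    topics: List[Topic] = field(default_factory=list)

    def speeches_in_order(self):
        """All speeches across topics, sorted by position in the debate."""
        return sorted((s for t in self.topics for s in t.speeches),
                      key=lambda s: s.order)

    def speakers(self):
        """Distinct speaker names in order of first appearance."""
        names = []
        for speech in self.speeches_in_order():
            if speech.speaker.name not in names:
                names.append(speech.speaker.name)
        return names


# Mirrors the figure: Speaker 1 speaks under both topics.
s1, s2, s3 = Person("Speaker 1"), Person("Speaker 2"), Person("Speaker 3")
debate = Debate("urn:example:debate", date(2000, 1, 1), [
    Topic("Topic 1", [Speech("urn:example:p1", s1, 1),
                      Speech("urn:example:p2", s2, 2),
                      Speech("urn:example:p3", s3, 3)]),
    Topic("Topic 2", [Speech("urn:example:p4", s1, 4)]),
])
print(debate.speakers())  # ['Speaker 1', 'Speaker 2', 'Speaker 3']
```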
Semantic model: what do we need to represent? 2/2

• Various information about media items linked to the debate

• Links between subparts of the debate and news articles, radio
  bulletins and television newscasts
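A per-subpart link table is enough to support the cross-media comparison from the motivation. The following is a minimal sketch, assuming each link is a (debate subpart, media item, media type) record; the class, the function, and the media-type labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaLink:
    part_uri: str    # a subpart of a debate, e.g. a single speech
    media_uri: str   # a newspaper article, radio bulletin or newscast
    media_type: str  # assumed labels: "newspaper", "radio", "television"


def media_by_type(links, part_uri):
    """Group the media items linked to one debate subpart by media type."""
    grouped = defaultdict(list)
    for link in links:
        if link.part_uri == part_uri:
            grouped[link.media_type].append(link.media_uri)
    return dict(grouped)


links = [
    MediaLink("urn:example:part1", "urn:example:article1", "newspaper"),
    MediaLink("urn:example:part1", "urn:example:newscast1", "television"),
    MediaLink("urn:example:part2", "urn:example:bulletin1", "radio"),
]
# {'newspaper': ['urn:example:article1'], 'television': ['urn:example:newscast1']}
print(media_by_type(links, "urn:example:part1"))
```

Linking at the level of subparts rather than whole debates lets a comparison see which speaker and which passage each medium picked up.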
URIs

• PoliMedia vocabulary: http://purl.org/linkedpolitics/nl/polivoc#Speech

• Politicians, parties: http://purl.org/linkedpolitics/nl/poli#Beel

• Debates and parts of debates:
  http://purl.org/linkedpolitics/nl/nl.proc.sgd.d.198219830000846.2.11.12

• Media articles, bulletins and newscasts:
  http://resolver.kb.nl/resolve?urn=ddd:010069811:mpeg21:pdf
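The URI patterns in this list can be captured in a few helper functions that reproduce the example URIs. This is a sketch under one assumption: the trailing `.2.11.12` is read as a path of nested subpart indices appended to the debate identifier. The function names are hypothetical.

```python
BASE = "http://purl.org/linkedpolitics/nl/"


def vocab_term(name):
    """A class or property in the PoliMedia vocabulary."""
    return f"{BASE}polivoc#{name}"


def politician_uri(local_name):
    """A politician or party resource."""
    return f"{BASE}poli#{local_name}"


def part_uri(debate_id, *path):
    """A debate URI, or a subpart URI when nested indices are given.

    Assumption: subparts are addressed by appending dot-separated indices
    to the debate identifier.
    """
    return BASE + debate_id + "".join(f".{index}" for index in path)


print(vocab_term("Speech"))
print(politician_uri("Beel"))
print(part_uri("nl.proc.sgd.d.198219830000846", 2, 11, 12))
```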
Semantic model
  Figure: semantic model (diagram not preserved).
  Reference: W.R. van Hage, V. Malaisé, R. Segers, L. Hollink and A.Th. Schreiber.
  Design and use of the Simple Event Model (SEM).
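The model is presented alongside the Simple Event Model (SEM). The following minimal sketch assumes debates and speeches are modeled as `sem:Event` instances linked by `sem:subEventOf`, with speakers as `sem:Actor` attached via `sem:hasActor` and the date via `sem:hasTimeStamp`. This mapping, the `xsd:date` typing, and the example URIs are our assumptions, not the published PoliMedia model.

```python
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"
POLIVOC = "http://purl.org/linkedpolitics/nl/polivoc#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def iri(value):
    """Format a URI as an N-Triples IRI term."""
    return f"<{value}>"


def debate_triples(debate_uri, day, speeches):
    """Describe a debate and its speeches as SEM events (N-Triples lines).

    speeches is a list of (speech_uri, speaker_uri) pairs.
    """
    d = iri(debate_uri)
    triples = [
        (d, iri(RDF_TYPE), iri(SEM + "Event")),
        (d, iri(SEM + "hasTimeStamp"), f'"{day}"^^{iri(XSD_DATE)}'),
    ]
    for speech_uri, speaker_uri in speeches:
        s, a = iri(speech_uri), iri(speaker_uri)
        triples += [
            (s, iri(RDF_TYPE), iri(SEM + "Event")),
            (s, iri(RDF_TYPE), iri(POLIVOC + "Speech")),
            (s, iri(SEM + "subEventOf"), d),
            (s, iri(SEM + "hasActor"), a),
            (a, iri(RDF_TYPE), iri(SEM + "Actor")),
        ]
    # dict.fromkeys drops repeated triples (e.g. the same speaker twice)
    # while keeping the original order.
    return [f"{s} {p} {o} ." for s, p, o in dict.fromkeys(triples)]


lines = debate_triples(
    "http://example.org/debate/1", "2000-01-01",
    [("http://example.org/debate/1.1", "http://example.org/person/a"),
     ("http://example.org/debate/1.2", "http://example.org/person/a")])
for line in lines:
    print(line)
```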
Current work: finding links

• Queries are built from the speaker name, named entities and topics
  (created using topic modeling methods) extracted from the political
  events data set
• The queries are used to retrieve media articles

  TopicList = NamedEntitiesVector(Speech), TopicWordSetVector(Speech),
              NamedEntitiesVector(PartOfDebate), TopicWordSetVector(PartOfDebate)
  + Speaker X = ActorFromSpeech
  + TimeFrame
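The query construction above can be sketched as follows. This assumes the vectors are plain term lists merged without duplicates, and that the TimeFrame is a fixed window of days around the debate date. The function name, the merge rule, and the default window are hypothetical, not the project's retrieval method.

```python
from datetime import date, timedelta


def build_query(speaker, ne_speech, topics_speech, ne_part, topics_part,
                debate_day, window_days=7):
    """Return (terms, (start, end)) for retrieving media items.

    TopicList merges the named-entity and topic-word vectors of the speech
    and of its part of the debate, dropping duplicates but keeping order.
    The speaker name (ActorFromSpeech) is put first, and the TimeFrame is
    assumed to be a symmetric window of window_days around the debate.
    """
    topic_list = []
    for term in [*ne_speech, *topics_speech, *ne_part, *topics_part]:
        if term not in topic_list and term != speaker:
            topic_list.append(term)
    terms = [speaker] + topic_list
    delta = timedelta(days=window_days)
    return terms, (debate_day - delta, debate_day + delta)


terms, time_frame = build_query(
    "Speaker X", ["Entity A"], ["budget", "Entity A"], ["Entity B"], ["budget"],
    date(2000, 1, 10), window_days=3)
print(terms)       # ['Speaker X', 'Entity A', 'budget', 'Entity B']
print(time_frame)  # (datetime.date(2000, 1, 7), datetime.date(2000, 1, 13))
```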
Finally

  • A SPARQL endpoint with the PoliMedia vocabulary and RDF of the
    Dutch Hansard data will be available soon.

  • Feel free to use it!

  • Links to media and a search/browse application are expected early next year (2013).
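Once the endpoint is available, a query could be sent as an HTTP GET request as defined by the SPARQL 1.1 Protocol, with the query URL-encoded in the `query` parameter. This is a minimal sketch: the endpoint address is a placeholder because the slides do not give it, and the query assumes speeches are typed with the `polivoc:Speech` class from the URI list.

```python
from urllib.parse import urlencode

# Placeholder: the slides do not give the endpoint address.
ENDPOINT = "http://example.org/sparql"

# Assumes speeches are typed with polivoc:Speech.
QUERY = """PREFIX polivoc: <http://purl.org/linkedpolitics/nl/polivoc#>
SELECT ?speech WHERE { ?speech a polivoc:Speech . } LIMIT 10"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request URL (query in 'query')."""
    return endpoint + "?" + urlencode({"query": query})


print(sparql_get_url(ENDPOINT, QUERY))
```

Sending the request and choosing a result format are left out, since the endpoint's details are not known.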
Thank you for your attention!




  Henri Beunders (EUR)         Damir Juric (TU Delft)
     Jaap Blom (NISV)          Max Kemman (EUR)
     Laura Hollink (VU)        Martijn Kleppe (EUR)
Geert-Jan Houben (TU Delft)    Johan Oomen (NISV)
