University Politehnica of Bucharest
Faculty of Automatic Control and Computers
Department of Computer Science
Expression of Political Opinions in Press
Costin-Gabriel CHIRU, Tudor Dimcica, Stere Caciandone
costin.chiru@cs.pub.ro
Introduction
• What? An application that analyzes news articles from Romanian mass media and extracts opinions about political entities
• Why? To study media polarization around important political events such as elections
• How? Using Machine Learning (ML) techniques to identify and classify opinions about the political entities
• Results: reports and charts, used either for studying political polarization or for identifying partisan media
30.05.2017 CSCS21 2
Context
• Over 50% of the world population is connected to the internet → the internet offers one of the widest exposures for information → it might replace traditional media in the future
• Digitization of the media → people express their thoughts and feelings on the web → they become producers of media content
• → Web users have immediate access to rich sources of information on any desired subject, both biased and unbiased (especially news)
• Past: newspapers were used to polarize large audiences
• Present: the audience is migrating online → newspapers are moving their content to the internet
• → Need and opportunity for tools that process and extract specific information, to study how it affects and polarizes audiences
Analysis Steps
• Extract information from given websites and save it into a database, using crawlers
• Search the articles for the entities of interest (input by the user) and create article-entity associations
• Classify the opinions in the data as positive / negative using ML algorithms
• Generate charts to show the computed polarities
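The second step, matching user-supplied entities against stored articles, can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `associate_entities`, the in-memory dictionary standing in for the database, and the sample sentences are all assumptions made for the example.

```python
import re
from collections import defaultdict


def associate_entities(articles, entities):
    """Map each entity name to the ids of the articles that mention it."""
    associations = defaultdict(list)
    for article_id, text in articles.items():
        for entity in entities:
            # Whole-word, case-insensitive match on the full entity name.
            pattern = r"\b" + re.escape(entity) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                associations[entity].append(article_id)
    return dict(associations)


# Hypothetical articles keyed by id; a real run would read them from the database.
articles = {
    1: "Klaus Iohannis visited Sibiu on Monday.",
    2: "Victor Ponta and Klaus Iohannis met in a televised debate.",
    3: "The parliament voted on the budget.",
}
entities = ["Klaus Iohannis", "Victor Ponta"]
associations = associate_entities(articles, entities)
```

Exact-name matching of this kind misses inflected or abbreviated forms of a name, so it should be read as a lower bound on what a real matcher would find.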
Application Architecture
• Three main modules: data acquisition, data analysis and the graphical user interface (GUI)
• Data acquisition
  – Extract only the political news from different newspapers' websites
  – Using a web crawler
  – Challenging because of the structural differences between news websites
  – Some websites had a very chaotic structure → they could not be crawled automatically → they were eliminated
  – Kept for analysis: the title, the full content, the publishing date and the author.
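One way to cope with the structural differences between sites is to keep a small per-site rule table that maps markup to the four retained fields. The sketch below assumes such a table; the class names `headline`, `body`, `date` and `byline`, the `Article` record, and the sample HTML are hypothetical and do not come from the slides. It also assumes well-formed markup in which every tag is closed.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Article:
    title: str
    content: str
    published: str
    author: str


class FieldExtractor(HTMLParser):
    """Collect the text of elements whose class attribute maps to a field."""

    def __init__(self, class_to_field):
        super().__init__()
        self.class_to_field = class_to_field
        self.fields = {name: "" for name in class_to_field.values()}
        self._open = []  # one entry per open element: a field name or None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        self._open.append(self.class_to_field.get(css_class))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        # Credit the text to the innermost enclosing element that maps to a field.
        for field in reversed(self._open):
            if field is not None:
                self.fields[field] += data + " "
                break


def parse_article(html, class_to_field):
    parser = FieldExtractor(class_to_field)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the markup.
    cleaned = {k: " ".join(v.split()) for k, v in parser.fields.items()}
    return Article(**cleaned)


# Hypothetical rules for one site: CSS class -> Article field.
SITE_RULES = {"headline": "title", "body": "content",
              "date": "published", "byline": "author"}

sample = """
<div><h1 class="headline">Debate night</h1>
<span class="byline">A. Reporter</span><span class="date">2014-11-11</span>
<div class="body"><p>First paragraph.</p><p>Second paragraph.</p></div></div>
"""
article = parse_article(sample, SITE_RULES)
```

Under this design, supporting another newspaper means writing another rule table, and a site whose markup cannot be described by such a table is a candidate for elimination.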
Data Analysis
• Using 2 opinion classifiers from the scikit-learn library (Naive-Bayes & Support Vector Machines)
• Challenges in choosing the granularity of the data (the level of the text analysis and classification)
  – Involved the detection of entities for which opinions were extracted
    • Search only for specific entities chosen by the user
    • Could be automated using a Romanian named entity recognizer (NER) → requires a lot of time → should be done offline
  – Data normalization: transform the words into numbers
    • Bag of words + term frequency
  – Opinion detection at entity level
    • An article is labeled with opinions for each entity associated with it
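The bag-of-words and term-frequency normalization named above can be illustrated with a few lines of standard-library Python. This is a sketch under assumptions: the tokenizer, the function names and the two toy documents are invented for the example, and the slides do not say which tokenization or weighting details the authors used.

```python
import re
from collections import Counter


def tokenize(text):
    # \w+ is Unicode-aware in Python 3, so Romanian diacritics are kept.
    return re.findall(r"\w+", text.lower())


def build_vocabulary(documents):
    """Assign a column index to every distinct token, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def term_frequency_vector(text, vocabulary):
    """Return one number per vocabulary word: its share of the document's tokens."""
    tokens = [tok for tok in tokenize(text) if tok in vocabulary]
    counts = Counter(tokens)
    vector = [0.0] * len(vocabulary)
    for tok, count in counts.items():
        vector[vocabulary[tok]] = count / len(tokens)
    return vector


documents = ["good speech good plan", "bad plan"]  # toy data
vocabulary = build_vocabulary(documents)
vectors = [term_frequency_vector(doc, vocabulary) for doc in documents]
```

Word order is discarded: each document becomes a fixed-length numeric vector, which is the input format the classifiers expect.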
Graphical User Interface (GUI)
• Main window: implements the functionality for classifying opinions and plotting visual statistics about the classified opinions
• Article view: displays relevant data about an article and implements manual classification functionality
Evaluation – Gold Standard
• 50 random documents extracted from the dataset and manually labeled by two students
• Inter-rater agreement computed using Cohen's kappa coefficient: 0.6019 (upper bound of the moderate agreement interval)
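Cohen's kappa corrects the raw agreement rate for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed share of items on which both raters agree and p_e is the share expected if each rater labeled independently according to their own label frequencies. The sketch below computes it for two label lists; the ten labels are made up for illustration and are not the study's annotations.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: product of the raters' marginal frequencies per label.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Invented labels: the raters agree on 8 of 10 items.
rater_a = ["pos"] * 5 + ["neg"] * 5
rater_b = ["pos"] * 4 + ["neg"] + ["neg"] * 4 + ["pos"]
kappa = cohen_kappa(rater_a, rater_b)
```

In this toy case the raw agreement is 0.8 but the chance agreement is 0.5, so kappa comes out near 0.6, which shows how much lower the corrected figure can be than the raw one.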
Evaluation – Accuracy
• The gold standard is used to evaluate the performance of the classification algorithms: Naive-Bayes with multinomial distribution, Naive-Bayes with Bernoulli distribution, and Support Vector Machines using Stochastic Gradient Descent
• The algorithms were chosen for their good performance in text categorization
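To make the first of these algorithms concrete, here is a from-scratch sketch of multinomial Naive Bayes with add-one smoothing. It is not the implementation used in the study, which relied on scikit-learn (where this model is available as `sklearn.naive_bayes.MultinomialNB`); the class name, the token lists and the labels below are invented for the example.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {tok for c in self.word_counts.values() for tok in c}
        self.total_docs = len(labels)
        return self

    def predict(self, tokens):
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocabulary)
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(doc_count / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                if tok not in self.vocabulary:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set, already tokenized.
train_docs = [
    ["great", "leader"],
    ["strong", "great", "plan"],
    ["corrupt", "scandal"],
    ["weak", "scandal", "plan"],
]
train_labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNaiveBayes().fit(train_docs, train_labels)
```

The Bernoulli variant differs in that it models only the presence or absence of each vocabulary word in a document, not how often the word occurs.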
Algorithm                                    Student A   Student B   Average
Naive-Bayes with multinomial distribution    66%         62%         64%
Naive-Bayes with Bernoulli distribution      72%         64%         68%
Support Vector Machines                      80%         76%         78%
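Each row of the table reports the accuracy of one classifier against each student's labels, followed by the mean of the two. The computation can be sketched as below; the five predictions and reference labels are invented, so the resulting numbers are illustrative and unrelated to the reported ones.

```python
def accuracy(predicted, reference):
    """Share of items on which the prediction matches the reference label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


# Invented labels for five documents.
predicted = ["pos", "neg", "pos", "pos", "neg"]
student_a = ["pos", "neg", "pos", "neg", "neg"]
student_b = ["pos", "pos", "pos", "neg", "neg"]

acc_a = accuracy(predicted, student_a)
acc_b = accuracy(predicted, student_b)
average = (acc_a + acc_b) / 2
```

Because the two students do not agree on every document, no classifier can score 100% against both at once, which is worth keeping in mind when reading the averages.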
Case Study
• 2014 Romanian Presidential Elections
• The study focused on the two candidates who reached the second round: Victor Ponta and Klaus Iohannis
• The source for opinion extraction was the news website Hotnews
• Analysis before and after the elections
Appearances in Press Before Announcing the Candidacy
• Klaus Iohannis
  – member of the National Liberal Party
  – was the mayor of Sibiu from June 2000 until he was elected and invested as the president of Romania on December 2nd, 2014
• Victor Ponta
  – jurist, former prosecutor, and a well-established political figure
  – former president of the Social Democratic Party
  – former Prime Minister of Romania
  – member of the Parliament since 2004
  – Minister for Relations with Parliament between 2008 and 2009
  – Prime Minister from May 2012
Opinions Extracted From Hotnews Articles
• 2009 - 2013
• Jan. 2014 – Sep. 2014
Appearances During Electoral Campaign and Elections
• Appearances Oct. 2014
• Appearances Nov. 2014
• Opinions Oct. 2014
• Opinions Nov. 2014
Results of Elections
• The first round took place on November 2nd: Victor Ponta came first with 40.44%; Klaus Iohannis was second with 30.37%
• The second round took place on November 16th: Klaus Iohannis won with 54.43%, although opinion polls conducted with traditional methods had presented Victor Ponta as the winner of the election by a wide margin
• Klaus Iohannis became the politician with the most followers on Facebook in all of Europe and enjoyed a high degree of positive opinions in the online media.
Appearances After Elections
• Post-Election Appearances
• Post-Election Opinions
Conclusions (1)
• Tool for sentiment analysis / opinion mining for Romanian online media
• The user may add / remove entities from the analysis
• The user can manually label the opinion of articles towards entities or rely on automatic labeling by the ML algorithms
• The generated charts offered interesting findings that support the results of the analyzed presidential elections
• Using this tool as a complement to traditional polls can give a very good picture of how opinions are polarized around political entities
Conclusions (2)
• Main limitation: this tool alone is not enough, as the gap between the popularity of the two candidates was much larger than the difference in votes between them in the second round
• Biggest challenge: finding a suitable way to analyze the available data
• Biggest advantage: the capability of processing large quantities of data and extracting unbiased opinions
• Possible improvements: crawlers for other newspapers, publishing the tool online, exporting the charts in different formats, using different weights for articles based on the popularity / visibility of the source or the author, extending the range of analyzed sentiments
Questions
Thank you very much!


Editor's Notes

  • #10: very close to representing a substantial agreement between the two observers
  • #14: The two candidates announced their candidacies on 20 September (Ponta) and 27 September (Iohannis).
  • #15: The electoral campaign started on October 3rd and ended on November 1st.