Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics?
Introduction:
This work was developed to take part in the challenge promoted by Rosette, and it
exemplifies the combined use of RapidMiner and the Rosette extensions for RapidMiner
(see https://www.rosette.com/calling-all-data-scientists/).
  
	
  
It is worth pointing out that none of the data or results presented here can be considered
conclusive about whether the scenario under study is good or bad. The goal of this work is
only to promote the combined use of the two technologies mentioned above. The sample size
was not chosen to make the results representative of that scenario.
  
  
1.  Objectives:
- To analyze the relationship between the current economic and political moment in Brazil
and the international scenario, based on English-language tweets from the social network
Twitter (www.twitter.com) containing the terms "Brazil" and "Michel Temer".
  
	
  
- The choice of the terms "Brazil" and "Michel Temer" was based on the fact that these two
terms are intrinsically linked to Brazil's political and economic scenario. There are no
political, legal or personal motivations behind this choice.
  
	
  
1.1 Specific objectives:	
  
	
  
1.1.1: Analyze the statistical correlation between the number of retweets and the sentiment
perceived in each original tweet (positive, neutral or negative), evaluating which type of
message spreads most among the analyzed tweets, taking the number of retweets as the
measure of spread;
  
	
  
1.1.2: Categorize the tweets, find the main entities in specific categories, and analyze the
sentiment of each tweet by relevant category, in order to assess the political and economic
scenario of Brazil, the focus of this study.
  
  
2. Description:
2.1 First stage:
At this stage, we used RapidMiner to collect tweets in English containing the term "Brazil" and
tweets containing the term "Michel Temer", the current president of Brazil. Two RapidMiner
operators were used: (i) Search Twitter, which connects RapidMiner to the Twitter API and
retrieves the tweets matching the parameterized query, and (ii) Write Database, which writes
those tweets into a local MySQL database. Writing the tweets into a database makes it possible
to accumulate them over several days and thus build a larger data set. Tweets for the two terms
were accumulated from December 1st to December 17th, 2016; the collection was repeated over
this period because the Twitter API does not provide data that is more than a week old.
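The accumulation step can be sketched outside RapidMiner as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for the local MySQL database, a single table instead of one table per collection day, and hypothetical field names (id, created_at, text, retweet_count) and sample tweets that do not come from this work.

```python
import sqlite3


def store_tweets(conn, term, tweets):
    """Append one batch of search results, skipping tweets already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER, term TEXT, created_at TEXT, text TEXT, "
        "retweet_count INTEGER, PRIMARY KEY (id, term))"
    )
    # INSERT OR IGNORE drops rows whose (id, term) pair is already present,
    # so overlapping daily searches do not create duplicates.
    conn.executemany(
        "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?)",
        [
            (t["id"], term, t["created_at"], t["text"], t["retweet_count"])
            for t in tweets
        ],
    )
    conn.commit()


# Hypothetical results of two consecutive daily searches (tweet 1 appears twice).
day1 = [{"id": 1, "created_at": "2016-12-01", "text": "example a", "retweet_count": 3}]
day2 = [
    {"id": 1, "created_at": "2016-12-01", "text": "example a", "retweet_count": 3},
    {"id": 2, "created_at": "2016-12-02", "text": "example b", "retweet_count": 0},
]

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the local database
store_tweets(conn, "Brazil", day1)
store_tweets(conn, "Brazil", day2)
stored = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(stored)
```

Keying the table on the tweet id and the search term is a design choice of this sketch: it lets the same search be repeated every day within the API's one-week window without storing a tweet twice.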
  
[Figure: RapidMiner process: Search Twitter, then Write Database]
2.2 Second stage:
	
  
At this stage, a RapidMiner process was assembled using the Read Database operator to access
the database with the stored tweets, followed by the Analyze Sentiment operator of the
Rosette Text Toolkit extension, which calls the external Rosette API and classifies the
sentiment perceived in each tweet as positive, neutral or negative. After the sentiment
classification, the Map operator was inserted to convert the pos, neu or neg outcomes
into the numerical values 1, 0 or -1, respectively. This conversion is needed because the next
operator, Correlation Matrix, requires numeric variables. From the correlation matrix
generated as the final result, we obtain the statistical correlation between the Sentiment
column and the Retweet-Count column. This addresses one of the questions previously
defined as an objective of this study: does the way a message spreads through
retweets depend on its positive, neutral or negative content?
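As a rough illustration of the Map and Correlation Matrix steps, the sketch below converts the pos/neu/neg labels into 1/0/-1 and computes a Pearson correlation with the retweet counts. It assumes that the correlation of interest is Pearson's coefficient; the labels and retweet counts are made-up sample values, not data from this work.

```python
from math import sqrt

# Same conversion as the Map operator described above.
SENTIMENT_TO_NUMBER = {"pos": 1, "neu": 0, "neg": -1}


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical sample: sentiment label and retweet count of five tweets.
labels = ["neg", "neg", "neu", "pos", "pos"]
retweets = [12, 9, 4, 1, 2]

scores = [SENTIMENT_TO_NUMBER[label] for label in labels]
r = pearson(scores, retweets)
print(round(r, 2))
```

In this made-up sample the negative tweets have the most retweets, so the coefficient comes out negative.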
  
For purposes of understanding, the following definition of statistical correlation is taken from
the description of the RapidMiner Correlation Matrix operator: "A correlation is a number
between -1 and +1 that measures the degree of association between two attributes (call them
X and Y). A positive value for the correlation implies a positive association. In this case large
values of X tend to be associated with large values of Y and small values of X tend to be
associated with small values of Y. A negative value for the correlation implies a negative or
inverse association. In this case large values of X tend to be associated with small values of Y
and vice versa."	
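For reference, and assuming the operator computes the standard Pearson coefficient (the coefficient named in section 2.2.1), the correlation between two attributes X and Y observed on n examples can be written as below. The symbols x_i, y_i and the sample means \bar{x}, \bar{y} are notation introduced here, not taken from the operator description.

```latex
r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
               \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

In this study X would be the numeric sentiment value and Y the retweet count of each tweet.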
  
[Figure: RapidMiner process: Read Database, Analyze Sentiment (Rosette Text Toolkit), Map, Correlation Matrix]
2.2.1 On the correlation calculations	
  
	
  
The relationship between the classification of news as positive, neutral or negative
and the interest of readers in these stories, especially in the context of
politics, has long been a subject of debate. In Ref. [1], for example, among
other discussions, Trussler and Soroka investigated how negative, positive and neutral news on
the Internet attracts the interest of readers. The results of their study suggest that news with
negative headlines is more likely to be selected for further reading, especially when the text
has political content, an effect that becomes more evident when the news is about
political strategy. Our study shares similarities with [1], but with a social network character: we
analyze how the sentiment (positive/negative/neutral) of content published on Twitter affects
the number of retweets, i.e., the sharing of this content. With the database captured in
the first stage, we also go further and calculate the correlation between the sentiment and the
number of retweets.
  
	
  
Of course, larger data sets, which would yield more reliable correlation estimates,
can be obtained by adjusting the parameters of the RapidMiner operators, or simply by
collecting data over a longer period, so that more tweets are collected.
  
	
  
[1] M. Trussler and S. Soroka, Consumer Demand for Cynical and Negative News Frames, The
International Journal of Press/Politics 19, 360 (2014).
Just like in [1], we label negative, neutral and positive entries with a sentiment parameter s =
-1, 0, and 1, respectively. Any other equally spaced, increasing choice (for example 1, 2, 3) is a
positive linear rescaling of s and does not affect the results of the correlation calculations,
either qualitatively or quantitatively, as one can check from the mathematical expression of
Pearson's correlation coefficient. The essential requirement is that s increases from negative,
to neutral, to positive. In this way, a negative correlation means that lower (higher) values of
s tend to be connected to higher (lower) numbers of retweets; in other words, converting s back
to the sentiment classifications, a negative correlation indicates a connection between
negative (positive) tweets and more (fewer) retweets.
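The effect of the labeling choice can be checked numerically. The sketch below, which assumes Pearson's coefficient and uses made-up sentiment values and retweet counts, compares the labels -1, 0, 1 with an equally spaced relabeling (3, 5, 7) and with an unevenly spaced but still increasing one (-1, 0, 10).

```python
from math import isclose, sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical sample: sentiment parameter s and retweet count per tweet.
s = [-1, -1, 0, 0, 1, 1]
retweets = [10, 7, 5, 3, 2, 0]

r_original = pearson(s, retweets)

# Equally spaced relabeling: s -> 2*s + 5, i.e. the labels 3, 5, 7.
r_rescaled = pearson([2 * v + 5 for v in s], retweets)

# Increasing but unevenly spaced relabeling: -1, 0, 10.
uneven = {-1: -1, 0: 0, 1: 10}
r_uneven = pearson([uneven[v] for v in s], retweets)

print(isclose(r_original, r_rescaled))  # same coefficient
print(isclose(r_original, r_uneven))    # different coefficient
print(r_uneven < 0)                     # the sign is preserved in this sample
```

In this sample the rescaled labels reproduce the coefficient, while the uneven labels keep its sign but change its value.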
  
	
  
  
	
  
Therefore, the combination of the RapidMiner and Rosette tools allows us to address the old
question of “how the negative/neutral/positive character of a given piece of political news
affects its impact among readers”, but now using data analysis tools, which are easy to handle
and avoid the need to run experiments that usually involve a number of participants,
computer-based surveys, etc.
  
	
  
2.3 Third Stage
In the third stage, a process was set up with the Read Database operator to access the tweets
containing the term "Brazil" (a SELECT statement inside the operator combines all the tables
generated from the Twitter API results, one for each day of collection). After this operator, the
following operators of the Rosette Text Toolkit were added: (i) Categorize, to find the tweets
generated between December 1st and 17th, 2016 that the operator classifies in the category
Law, Gov't & Politics; (ii) Extract Entities, to check whether specific entities predominate in the
tweets classified as Law, Gov't & Politics; and (iii) Analyze Sentiment, to check whether a
sentiment classification predominates for the specific entities found in tweets categorized as
Law, Gov't & Politics.
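The logic applied to the operator outputs can be sketched as follows. The records below are hypothetical stand-ins for tweets that already carry a category, a list of entities and a sentiment label; no Rosette API call is shown, and the field names and the "Sports" category are assumptions made only for this illustration.

```python
from collections import Counter

# Hypothetical, already-annotated tweets (stand-ins for the operator outputs).
tweets = [
    {"category": "Law, Gov't & Politics", "entities": ["Senate"], "sentiment": "neg"},
    {"category": "Law, Gov't & Politics", "entities": ["Supreme Court", "Senate"], "sentiment": "neg"},
    {"category": "Law, Gov't & Politics", "entities": ["President"], "sentiment": "pos"},
    {"category": "Sports", "entities": ["Chapecoense"], "sentiment": "neg"},
]

CATEGORY = "Law, Gov't & Politics"

# (i) Keep only the tweets classified in the category of interest.
political = [t for t in tweets if t["category"] == CATEGORY]
share_political = len(political) / len(tweets)

# (ii) Count how often each entity appears in those tweets.
entity_counts = Counter(e for t in political for e in t["entities"])

# (iii) Count sentiment labels per entity within the category.
sentiment_by_entity = Counter(
    (e, t["sentiment"]) for t in political for e in t["entities"]
)

print(share_political)
print(entity_counts.most_common(1))
print(sentiment_by_entity[("Senate", "neg")])
```

Counting (entity, sentiment) pairs is one simple way to see whether a sentiment predominates for a given entity; other aggregations would serve equally well.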
  
[Figure: RapidMiner process: Read Database, then Categorize, Extract Entities and Analyze Sentiment (Rosette Text Toolkit)]
3. Results: second stage
In the first graph, we compare, for each search term, the statistical correlation between the
number of retweets and the sentiment classification of the tweet text over the 17 days of
collection. For the tweets containing the term "Brazil", the correlation is weak and negative
(-0.16), indicating a slight tendency for negative tweets to be retweeted more. For the tweets
containing the term "Michel Temer", the correlation is also weak and negative (-0.25), although
closer to the zone of moderate correlation (beyond -0.30), with daily values as strong as -0.49,
as shown in the following graphs.
  
[Figure: Correlation result by search term. Michel Temer: -0.25; Brazil: -0.16]
For the term "Michel Temer", the statistical correlation computed for each collection day is
predominantly negative. No day shows a meaningful positive correlation (the highest daily
value is 0.01), i.e., positive tweets did not lead to a greater number of retweets on any of
these days. On six of the seventeen days analyzed, the negative correlation is even stronger
than -0.30.
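These counts can be checked against the data labels of the "Correlation per day" chart for the term "Michel Temer". The sketch below lists the 17 labels in the order they appear in the chart; assigning each value to a specific date is not attempted here.

```python
# Data labels of the per-day correlation chart for the term "Michel Temer".
daily_r = [
    -0.49, -0.27, -0.36, -0.36, -0.08, -0.18, -0.11, 0.01, -0.15,
    -0.24, -0.35, -0.29, -0.03, -0.26, -0.02, -0.37, -0.33,
]

# Days whose negative correlation is stronger than the -0.30 margin.
beyond_margin = [r for r in daily_r if r < -0.30]

# Days with a correlation above zero.
positive_days = [r for r in daily_r if r > 0]

print(len(daily_r), len(beyond_margin), positive_days, min(daily_r))
```

The only value above zero is 0.01, which is too close to zero to indicate any positive association.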
  
[Figure: Term: Michel Temer, correlation per day. Data labels: -0.49, -0.27, -0.36, -0.36, -0.08, -0.18, -0.11, 0.01, -0.15, -0.24, -0.35, -0.29, -0.03, -0.26, -0.02, -0.37, -0.33]
For the term "Brazil", the correlation per day fluctuates more widely, ranging from a moderate
correlation favoring the dissemination of tweets classified as positive to a moderate
correlation favoring the proliferation of tweets classified as negative. Tweets containing the
term "Brazil" cover diverse subjects, not only politics. In particular, the collection period
began shortly after the accident involving the Chapecoense soccer team, which generated great
commotion worldwide and thus somewhat affected the statistical results shown here.
  
[Figure: Term: Brazil, correlation per day. Data labels: -0.16, 0.1, -0.12, -0.01, 0.46, -0.01, 0.1, -0.27, -0.42, -0.01, -0.07, -0.16, -0.17, 0, 0.05, -0.57, 0.36]
	
  
Across the tweets collected over the seventeen days for each term, we also checked the
percentage classified as negative in each database. Note the large share of negative tweets
in the database for the term "Michel Temer". This result gives a more complete picture of the
situation and is consistent with the tendency, observed in the retweet counts, toward greater
dissemination of negative tweets containing the term "Michel Temer".
  
[Figure: Percentage of negative tweets per search term. Michel Temer: 75%; Brazil: 25%]
3. Results: third stage
The focus of this study is Brazil's current political scenario. The graph below shows the
percentage of all tweets containing the term "Brazil" collected during the first 17 days of
December that were categorized as Law, Gov't & Politics by the Categorize operator of
the Rosette Text Toolkit.
  
[Figure: Distribution by category. Law, Gov't & Politics: 2%; other categories: 98%]
This graph shows how the tweets categorized as Law, Gov't & Politics are distributed with
respect to the sentiment classification. The Categorize and Analyze Sentiment operators
were used, showing that in this category tweets classified as negative are
predominant.
  
[Figure: Sentiment classification of Law, Gov't & Politics tweets. Pos: 20%; Neu: 18%; Neg: 62%]
The graph below shows the combination of two operators of the Rosette Text Toolkit, which
makes it possible, after categorizing the tweets, to extract the most relevant entities of a
given category. The combination of these operators allows a detailed understanding of the
proposed scenario and can be a powerful tool for decision making in various segments, such as
press, communication or institutional relations.
  
[Figure: Key entities found in tweets categorized as Law, Gov't & Politics. Senate: 38%; President: 24%; Supreme Court: 24%]
4. Conclusions
By combining RapidMiner with the Rosette Text Toolkit, we were able to build a powerful way to
analyze text data, especially from social networks. Together, the two tools allow professionals
from different areas who need to analyze this type of information to do so with little technical
knowledge.
Focusing on the results, the analyses made possible by the Rosette Text Toolkit go much deeper
into social media data than tools limited to standardized reports. New metrics can be created
continuously as part of the process of knowledge discovery in the database. One example is the
search for patterns in tweets that are classified as negative, belong to a certain category,
and are dominated by a specific entity.
In terms of decision-making, the combination of the two tools, with the methodologies developed
in this work, allows people's reaction to a given political scenario to be evaluated in real
time. Through these analyses one can shape institutional campaigns, measure public interest, or
even tailor speeches and other communications to people's expectations.
  
5. Final remarks	
  
This work was developed by Delano Lima, who holds a degree in Advertising and a postgraduate
degree in Marketing Management from UNIFOR - University of Fortaleza. The process of converting
the sentiment classifications from text to numerical data was validated in collaboration with
Prof. Andrey Chaves, from the Department of Physics of Universidade Federal do Ceará (UFC), who
holds a PhD in Physics from UFC and the University of Antwerp, Belgium, and was a postdoctoral
researcher at Columbia University, USA.
  
	
  
For any questions about this work, please contact Mr. Delano Lima at delano@miningmetrics.net. For
specific questions about the process of converting the sentiment classifications from text to
numbers, please contact Professor Andrey Chaves at andrey@fisica.ufc.br.
  
19	
  
Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics?

More Related Content

PPT
Evolving social data mining and affective analysis
PDF
Stock market prediction using Twitter sentiment analysis
PDF
A Survey Of Collaborative Filtering Techniques
PDF
IRJET- Big Data Driven Information Diffusion Analytics and Control on Social ...
PDF
Multi-Tier Sentiment Analysis System in Big Data Environment
PDF
Temporal Exploration in 2D Visualization of Emotions on Twitter Stream
PDF
EMBERS AutoGSR: Automated Coding of Civil Unrest Events
PDF
Mining social data
Evolving social data mining and affective analysis
Stock market prediction using Twitter sentiment analysis
A Survey Of Collaborative Filtering Techniques
IRJET- Big Data Driven Information Diffusion Analytics and Control on Social ...
Multi-Tier Sentiment Analysis System in Big Data Environment
Temporal Exploration in 2D Visualization of Emotions on Twitter Stream
EMBERS AutoGSR: Automated Coding of Civil Unrest Events
Mining social data

What's hot (20)

PDF
Prediction of Reaction towards Textual Posts in Social Networks
PDF
Predicting the Brand Popularity from the Brand Metadata
PDF
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
PDF
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
PDF
Multiple Regression to Analyse Social Graph of Brand Awareness
PPTX
FAKE NEWS DETECTION PPT
PPTX
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELING
PDF
Safeguarding Abila: Discovering Evolving Activist Networks
PDF
merged_document
PDF
IRJET- Competitive Analysis of Attacks on Social Media
PDF
Epidemiological Modeling of News and Rumors on Twitter
PPTX
Literature review on customer emotions in social media
PDF
Poster presentation in 3rd big data conclave at vit chennai on 20th april 2017
PAGES
Usability Review of Mashup Tools
PDF
IRJET - Election Result Prediction using Sentiment Analysis
PDF
Document(2)
PPTX
Social Network Analysis for Telecoms
DOCX
Individual project 2.20
PDF
Forex-Foreteller: Currency Trend Modeling using News Articles
PDF
Slides: Epidemiological Modeling of News and Rumors on Twitter
Prediction of Reaction towards Textual Posts in Social Networks
Predicting the Brand Popularity from the Brand Metadata
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
Multiple Regression to Analyse Social Graph of Brand Awareness
FAKE NEWS DETECTION PPT
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELING
Safeguarding Abila: Discovering Evolving Activist Networks
merged_document
IRJET- Competitive Analysis of Attacks on Social Media
Epidemiological Modeling of News and Rumors on Twitter
Literature review on customer emotions in social media
Poster presentation in 3rd big data conclave at vit chennai on 20th april 2017
Usability Review of Mashup Tools
IRJET - Election Result Prediction using Sentiment Analysis
Document(2)
Social Network Analysis for Telecoms
Individual project 2.20
Forex-Foreteller: Currency Trend Modeling using News Articles
Slides: Epidemiological Modeling of News and Rumors on Twitter
Ad

Similar to Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics? (20)

PDF
Analyzing-Threat-Levels-of-Extremists-using-Tweets
PDF
1 Crore Projects | ieee 2016 Projects | 2016 ieee Projects in chennai
PDF
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
PDF
Stock market prediction using Twitter sentiment analysis
PDF
Twitter sentimentanalysis report
PDF
A large-scale sentiment analysis using political tweets
PDF
Kushin (2018) review of Meltwater, Journal of Public Relations Education, Vol...
PDF
Using Social Media to Measure the Consumer Confidence: The Twitter Case in Spain
PDF
PDF
591 Final Report - Team 7 - Political Issues
PDF
Text mining on Twitter information based on R platform
PDF
[IJET-V2I1P14] Authors:Aditi Verma, Rachana Agarwal, Sameer Bardia, Simran Sh...
PDF
Big data analysis of news and social media content
PDF
Final Poster for Engineering Showcase
PDF
IRJET - Implementation of Twitter Sentimental Analysis According to Hash Tag
PDF
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
PDF
IRJET - Political Orientation Prediction using Social Media Activity
DOCX
Running head DEPRESSION PREDICTION DRAFT1DEPRESSION PREDICTI.docx
PDF
IRJET- An Experimental Evaluation of Mechanical Properties of Bamboo Fiber Re...
PDF
IRJET- Tweet Segmentation and its Application to Named Entity Recognition
Analyzing-Threat-Levels-of-Extremists-using-Tweets
1 Crore Projects | ieee 2016 Projects | 2016 ieee Projects in chennai
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
Stock market prediction using Twitter sentiment analysis
Twitter sentimentanalysis report
A large-scale sentiment analysis using political tweets
Kushin (2018) review of Meltwater, Journal of Public Relations Education, Vol...
Using Social Media to Measure the Consumer Confidence: The Twitter Case in Spain
591 Final Report - Team 7 - Political Issues
Text mining on Twitter information based on R platform
[IJET-V2I1P14] Authors:Aditi Verma, Rachana Agarwal, Sameer Bardia, Simran Sh...
Big data analysis of news and social media content
Final Poster for Engineering Showcase
IRJET - Implementation of Twitter Sentimental Analysis According to Hash Tag
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET - Political Orientation Prediction using Social Media Activity
Running head DEPRESSION PREDICTION DRAFT1DEPRESSION PREDICTI.docx
IRJET- An Experimental Evaluation of Mechanical Properties of Bamboo Fiber Re...
IRJET- Tweet Segmentation and its Application to Named Entity Recognition
Ad

Recently uploaded (20)

PPTX
Internet of Everything -Basic concepts details
PDF
Co-training pseudo-labeling for text classification with support vector machi...
PDF
Lung cancer patients survival prediction using outlier detection and optimize...
PDF
AI.gov: A Trojan Horse in the Age of Artificial Intelligence
PDF
Transform-Quality-Engineering-with-AI-A-60-Day-Blueprint-for-Digital-Success.pdf
PDF
4 layer Arch & Reference Arch of IoT.pdf
PDF
LMS bot: enhanced learning management systems for improved student learning e...
PDF
Altius execution marketplace concept.pdf
PDF
Dell Pro Micro: Speed customer interactions, patient processing, and learning...
PDF
Examining Bias in AI Generated News Content.pdf
PDF
SaaS reusability assessment using machine learning techniques
PDF
Decision Optimization - From Theory to Practice
PDF
The-2025-Engineering-Revolution-AI-Quality-and-DevOps-Convergence.pdf
PDF
A symptom-driven medical diagnosis support model based on machine learning te...
PDF
zbrain.ai-Scope Key Metrics Configuration and Best Practices.pdf
PDF
Rapid Prototyping: A lecture on prototyping techniques for interface design
PDF
Transform-Your-Streaming-Platform-with-AI-Driven-Quality-Engineering.pdf
PPTX
SGT Report The Beast Plan and Cyberphysical Systems of Control
PDF
giants, standing on the shoulders of - by Daniel Stenberg
PDF
Build Real-Time ML Apps with Python, Feast & NoSQL
Internet of Everything -Basic concepts details
Co-training pseudo-labeling for text classification with support vector machi...
Lung cancer patients survival prediction using outlier detection and optimize...
AI.gov: A Trojan Horse in the Age of Artificial Intelligence
Transform-Quality-Engineering-with-AI-A-60-Day-Blueprint-for-Digital-Success.pdf
4 layer Arch & Reference Arch of IoT.pdf
LMS bot: enhanced learning management systems for improved student learning e...
Altius execution marketplace concept.pdf
Dell Pro Micro: Speed customer interactions, patient processing, and learning...
Examining Bias in AI Generated News Content.pdf
SaaS reusability assessment using machine learning techniques
Decision Optimization - From Theory to Practice
The-2025-Engineering-Revolution-AI-Quality-and-DevOps-Convergence.pdf
A symptom-driven medical diagnosis support model based on machine learning te...
zbrain.ai-Scope Key Metrics Configuration and Best Practices.pdf
Rapid Prototyping: A lecture on prototyping techniques for interface design
Transform-Your-Streaming-Platform-with-AI-Driven-Quality-Engineering.pdf
SGT Report The Beast Plan and Cyberphysical Systems of Control
giants, standing on the shoulders of - by Daniel Stenberg
Build Real-Time ML Apps with Python, Feast & NoSQL

Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics?

  • 2. Introduction: The present work has been developed with the purpose of participating in the challenge promoted by Rosette, exemplifying the combined use of RapidMiner software and Rosette extensions for RapidMiner (see https://www.rosette.com/calling-all-data-scientists/).     It is worth to point out that none of the presented data or results can be said to be conclusive regarding the good or bad qualification of the proposed scenario. In fact, the goal of this work is only to promote the combined use of the two technologies mentioned above. The size of the sample was not taken into account so that the results represent the qualification of the proposed scenario.   2  
  • 3. 1.  Objectives: - To analyze the relationship between the current moment of economy and politics in Brazil, and the international scenario, based on tweets generated in the social network Twitter (www.twitter.com) in English language, containing the terms "Brazil" and "Michel Temer".     - The choice of the terms "Brazil" and "Michel Temer" were based on the fact that these two terms are intrinsically linked to Brazil's political and economic scenario. There are no political motivations, legal or personal, in this choice.     1.1 Specific objectives:     1.1.1: Analyze the statistical correlation between the number of retweets and the perceived feeling in each of the original tweets (positive, neutral or negative), evaluating which type of message has the greatest proliferation power among the analyzed tweets, considering the number of retweets as a factor of proliferation;     1.1.2: Categorize, search the main entities in specific categories, as well as analyze the sentiment of each tweet, according to the relevant categories, in order to assess the political and economic scenario of Brazil, focus of this study.   3  
  • 4. 2. Description: 2.1 First stage: At this stage, using RapidMiner, we have collected tweets in English containing the term "Brazil" and tweets containing the term "Michel Temer", current president of Brazil. For this stage, two RapidMiner operators were used: (i) Search on Twitter, which connects Rapid Miner with the Twitter API and get the tweets with the parameterized query, and (ii) the Write Database operator, which writes those tweets into a local MySQL-type database. Writing the tweets into a database is important to accumulate the tweets along several days and, thus, generate a bigger database. Tweets for the two terms were accumulated from December 1st to December 17th 2016, following the Twitter API rules of not providing data that is more than a week old.   4   Search Twitter Write Database RapidMiner Process:
  • 5. 2. Description: 2.2 Second stage:   At this stage, a RapidMiner process was assembled using the read database operator to access the database with the stored tweets, followed by the Analyze Sentiment operator of the Rosette Text Toolkit extension, which, using the external API communication, ranks each tweet with respect to the Feeling perceived in the text as positive, neutral or negative. After the feeling classification, the Map operator was inserted to convert the pos, neu or neg outcomes into numerical parameters 1, 0 or -1, respectively. This conversion is required for the next Correlation Matrix operator, which requires numeric variables. From the correlation matrix generated as the final result, we obtain the statistical correlation between the Sentiment column and the Retweet-Count column. This provides the answers to one of the questions previously defined as the objective of this study: does the way a message spreads through retweets depend on its positive, neutral or negative content?   5  
  • 6. 2. Description: 2.2 Second stage: For purposes of understanding, the following definition of statistical correlation is taken from the description of the RapidMiner Correlation Matrix operator: "A correlation is a number between -1 and +1 that measures the degree of association between two attributes (call them X and Y). A positive value for the correlation implies a positive association. In this case large values of X tend to be associated with large values of Y and small values of X tend to be associated with small values of Y. A negative value for the correlation implies a negative or inverse association. In this case large values of X tend to be associated with small values of Y and vice versa."   6   Read Database Analyze Sentiment RapidMiner Process: Map Correlation Matrix Rosette Text Toolkit
  • 7. 2. Description: 2.2.1 On the correlation calculations     The relationship between the classification of news into positive, neutral and negative contents and the interest of the readers in these stories, especially within the context of politics, has been subject of long debate over the past years. In Ref. [1], for example, among other discussions, Trussler and Soroka investigated how negative, positive and neutral news on the Internet attract the interest of readers. Results of their study suggest that news with negative headlines are more prone to be selected to be read further, specially when the text has political content, although this becomes more evident when the topic of the news is on political strategy. Our study share similarities with [1], but with a social network character: we analyze how the sentiment (positive/negative/neutral) of contents published in Twitter affect the number of re-tweets and sharing of these contents. With the large database captured in the first stage, we also go further and calculate correlation between the sentiment and the number of re-tweets. 7  
  • 8. 2. Description: 2.2.1 On the correlation calculations     Of course, larger databases, that would provide even more accurate correlation parameters, can be obtained by adjusting parameters of the RapidMiner tool, or simply by taking a longer data collection time, so that more tweets are collected.     [1] M. Trussler and S. Soroka, Consumer Demand for Cynical and Negative News Frames, The International Journal of Press/Politics 19, 360 (2014). Just like in [1], we label negative, neutral and positive entries with a sentiment parameter s = -1, 0, and 1, respectively. This arbitrary choice of parameters does not affect the results of the correlation calculations, either qualitatively of quantitatively, as one can check by analyzing e.g. Pearson's correlation coefficient mathematical expression. The only requirement for the parameter s is to have its value increasing from negative, to neutral, to positive – in this way, we understand e.g. a negative correlation as being due to the fact that lower (higher) values of s are more likely to be connected to higher (lower) values of the number of retweets – in other words, converting s back to the sentiment classifications, negative correlation would mean a connection between negative (positive) comments and more (less) re-tweets.     8  
  • 9. 2. Description: 2.2.1 On the correlation calculations     Therefore, the combination of RapidMiner and Rosette tools allows us to assess the old question of “how the negative/neutral/positive character of a given political news affects its impact among readers”, but now using data analysis tools, which are easy to handle and avoid the need of running experiments that usually involve a number of participants, computer based surveys, etc.     [1] M. Trussler and S. Soroka, Consumer Demand for Cynical and Negative News Frames, The International Journal of Press/Politics 19, 360 (2014).     9  
  • 10. 2. Description: 2.3 Third Stage In the third stage, a process was set up with the Read Database operator to access the database with tweets with the term "Brazil" (for this purpose, a SELECT was used inside the operator, in order to combine all the tables generated from the Twitter API, one for each day of collection). After this operator was inserted, the following operators of the Rosette Text Toolkit were included: (i) Categorize, in order to find all the tweets generated between 1st and 17th of December of 2016 classified by the operator in the category Law, Gov't & Politics; (ii) Extract Entities, which analyzes if there is a predominance of specific entities in the tweet classified in the category Law, Gov't & Politics; and (iii) Analyze Sentiment, which analyzes if there is a predominant sentiment classification, by specific entities found in tweets categorized as Law, Gov't & Politics.   10   Read Database Extract Entities RapidMiner Process: Categorize Analyze Sentiment Rosette Text Toolkit
  • 11. 3. Results: second stage In the first graph, we compare the statistical correlation, obtained over the 17 days of collection, between the volume of retweets and the sentiment classification of the tweet text. For the database of tweets with the term "Brazil", the correlation is -0.16, indicating a weak tendency for negative tweets to be disseminated more. For tweets with the term "Michel Temer", the correlation is -0.25, which is also weak but close to the zone of moderate correlation (beyond -0.30), with peaks reaching -0.49 on some days, as shown in the following graphs.
    [Figure: Correlation result by search term. Michel Temer: -0.25; Brazil: -0.16]
  • 12. 3. Results: second stage For the term "Michel Temer", when we analyze the correlation coefficient for each collection day, negative values clearly predominate. No day shows a meaningful positive correlation (the only positive value, 0.01, is effectively zero), i.e., positive tweets did not lead to a greater number of retweets on any of these days. On six of the seventeen days analyzed, the negative correlation even goes beyond the -0.30 mark.
    [Figure: Correlation per day, term "Michel Temer". Values as printed on the chart: -0.49, -0.27, -0.36, -0.36, -0.08, -0.18, -0.11, 0.01, -0.15, -0.24, -0.35, -0.29, -0.03, -0.26, -0.02, -0.37, -0.33]
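The counts quoted for "Michel Temer" can be reproduced from the per-day values printed on the chart. This is a small sketch for checking the figures; the values are copied from the slide as extracted (their order may not be chronological), and the -0.30 threshold is the one used in the text.

```python
# Per-day correlation values for "Michel Temer", copied from the chart.
temer_daily = [
    -0.49, -0.27, -0.36, -0.36, -0.08, -0.18, -0.11, 0.01, -0.15,
    -0.24, -0.35, -0.29, -0.03, -0.26, -0.02, -0.37, -0.33,
]

# Days whose negative correlation goes beyond the -0.30 mark.
moderate_negative = [r for r in temer_daily if r <= -0.30]

# Days with a strictly positive coefficient.
positive = [r for r in temer_daily if r > 0]

print(len(temer_daily), "days in total")
print(len(moderate_negative), "days at or beyond -0.30")
print("positive values:", positive)
print("strongest negative value:", min(temer_daily))
```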
  • 13. 3. Results: second stage As for the term "Brazil", when we analyze the correlation coefficient per day, we observe a greater fluctuation, ranging from a moderate correlation favoring the dissemination of tweets classified as positive to a moderate correlation favoring the dissemination of tweets classified as negative. Tweets containing the term "Brazil" cover diverse subjects, politics being only one of them. In particular, the collection period came shortly after the accident involving the Chapecoense soccer team, which caused great commotion worldwide and therefore somewhat affects the statistical results shown here.
    [Figure: Correlation per day, term "Brazil". Values as printed on the chart: -0.16, 0.10, -0.12, -0.01, 0.46, -0.01, 0.10, -0.27, -0.42, -0.01, -0.07, -0.16, -0.17, 0.00, 0.05, -0.57, 0.36]
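The "greater fluctuation" for the term "Brazil" can be made concrete by comparing the spread of the two per-day series. The sketch below uses the values printed on the two charts as extracted; using the range and the population standard deviation as measures of fluctuation is an assumption made here, not a method stated in the slides.

```python
import statistics

# Per-day correlation values copied from the two charts.
temer_daily = [
    -0.49, -0.27, -0.36, -0.36, -0.08, -0.18, -0.11, 0.01, -0.15,
    -0.24, -0.35, -0.29, -0.03, -0.26, -0.02, -0.37, -0.33,
]
brazil_daily = [
    -0.16, 0.10, -0.12, -0.01, 0.46, -0.01, 0.10, -0.27, -0.42,
    -0.01, -0.07, -0.16, -0.17, 0.00, 0.05, -0.57, 0.36,
]

# Two simple measures of day-to-day fluctuation.
temer_spread = statistics.pstdev(temer_daily)
brazil_spread = statistics.pstdev(brazil_daily)
temer_range = max(temer_daily) - min(temer_daily)
brazil_range = max(brazil_daily) - min(brazil_daily)

print(f"Michel Temer: range {temer_range:.2f}, std dev {temer_spread:.3f}")
print(f"Brazil:       range {brazil_range:.2f}, std dev {brazil_spread:.3f}")
```

Both measures come out larger for "Brazil", whose daily values span both signs, than for "Michel Temer", whose values stay almost entirely negative.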
  • 14. 3. Results: second stage Taking all the tweets collected over the seventeen days for each term, we computed the percentage classified as negative in each database. Notice the large share of negative tweets in the database with the term "Michel Temer". This result gives a more complete picture of the situation and is consistent with the tendency toward greater dissemination (in volume of retweets) of tweets classified as negative that contain the term "Michel Temer".
    [Figure: Percentage of negative tweets per search term. Michel Temer: 75%; Brazil: 25%]
  • 15. 3. Results: third stage The focus of this study is Brazil's current political scenario. The graph below shows, among all the tweets containing the term "Brazil" collected during the first 17 days of December, the percentage categorized as Law, Gov't & Politics by the Categorize operator of the Rosette Text Toolkit.
    [Figure: Distribution by category. Law, Gov't & Politics: 2%; other categories: 98%]
  • 16. 3. Results: third stage This graph shows how the tweets categorized as Law, Gov't & Politics are distributed across the sentiment classifications. The Categorize and Analyze Sentiment operators were used, showing that tweets classified as negative predominate in this category.
    [Figure: Sentiment of Law, Gov't & Politics tweets. Positive: 20%; Neutral: 18%; Negative: 62%]
  • 17. 3. Results: third stage The graph below shows the combination of two operators of the Rosette Text Toolkit: after the tweets are categorized, the most relevant entities of a given category are extracted. Combining these operators allows a detailed understanding of the proposed scenario and can be a powerful tool for decision making in various segments, such as the press, communication or institutional relations.
    [Figure: Key entities found in tweets categorized as Law, Gov't & Politics. Senate: 38%; President: 24%; Supreme Court: 24%]
  • 18. 4. Conclusions By combining RapidMiner with the Rosette Text Toolkit, we were able to create a powerful way to analyze text data, especially from social networks. Using both tools allows professionals from different areas who need to analyze this type of information to do so with little technical knowledge. Because the analyses are built around the Rosette Text Toolkit operators rather than fixed reports, social media data can be examined in much more depth than with tools that offer only standardized reports. It is possible to continuously create new metrics and to rethink the process of knowledge discovery in the database. As an example, we can mention the search for patterns in tweets that are classified as negative, belong to a certain category, and are dominated by a specific entity. In terms of decision making, the combination of the two tools, with the methodologies developed in this work, allows real-time evaluation of people's reactions to a given political scenario. Through these analyses one can shape institutional campaigns, measure public interest, or even model speeches and other communications in line with people's aspirations.
  • 19. 5. Final remarks This work was developed by Delano Lima, who holds a degree in Advertising and a postgraduate degree in Marketing Management from UNIFOR (University of Fortaleza). The process of converting the sentiment classifications from text to numerical data was validated in collaboration with Prof. Andrey Chaves, from the Department of Physics of Universidade Federal do Ceará (UFC), who holds a PhD in Physics from UFC and the University of Antwerp, Belgium, and completed a post-doctoral period at Columbia University, USA. For any question about this work, please contact Mr. Delano Lima at [email protected]. For specific questions about the process of converting sentiment classifications from text to numbers, please contact Professor Andrey Chaves at [email protected].