A credibility analysis system for assessing information on twitter

Venkat Java Projects
Mobile:+91 9966499110
Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com
A Credibility Analysis System for Assessing Information on Twitter
Twitter's popularity is growing because messages spread quickly on it (fast
dissemination), and assessing the credibility of those messages (whether a
message is fake or genuine) is a hot research topic. Researchers have already
introduced several techniques to decide whether a given tweet is fake or
genuine, but those techniques do not use all of the available information to
make that decision.
On any online social network, some malicious users spread fake news, and fake
news published on social media can harm society. These malicious users hire
people or use robots to create fake accounts and then publish fake messages
through those accounts. Such accounts usually have few profile features such as
favourites, followers, following, hashtags and retweets. By analysing those
features we can therefore determine whether a tweet is credible (genuine) or
non-credible (fake).
The proposed paper uses four components to decide whether a tweet is credible
(genuine) or non-credible (fake). The four components used in the algorithm to
find user reputation and tweet credibility are described below.
1) Reputation Based Component: This component reads the number of followers,
following, retweets, hashtags and favourites from the tweets dataset to
calculate a user's reputation. For example, if a tweet is genuine, more
users show interest in its topic, so its retweet and favourites counts
increase, and the followers/following of the account increase as well. If a
tweet is not genuine, very few users follow it and the reputation score
is low. The formula used to calculate reputation is:
Math.log(favourites) / Math.log(max(U_followers, Total_retweet_for_topic))
A score is calculated in the same way for hashtags, and the sum of the
favourites score and the hashtag score gives the reputation score. If a user
has few followers or retweets, the score will be low.
If the calculated value is less than 0.1, the user's tweet is considered
Non-Credible (fake); if it is greater than 0.1, it is considered Credible
(genuine).
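The reputation formula and the 0.1 threshold above can be sketched in Java as follows. This is a minimal sketch, not the project's actual code. The class name ReputationSketch and its method names are hypothetical. Two behaviours are assumptions, because the text does not cover them: a term scores 0 when its count is below 1 or when the larger of followers and retweets is 1 or less, and a score of exactly 0.1 is treated as non-credible.

```java
// Minimal sketch of the reputation formula; all names are illustrative.
final class ReputationSketch {
    static final double THRESHOLD = 0.1; // threshold stated in the text

    // log(count) / log(max(followers, totalRetweetsForTopic)).
    // Assumption: returns 0 where the logarithm would be undefined or zero.
    static double logRatio(long count, long followers, long totalRetweetsForTopic) {
        long base = Math.max(followers, totalRetweetsForTopic);
        if (count < 1 || base <= 1) {
            return 0.0;
        }
        return Math.log(count) / Math.log(base);
    }

    // Sum of the favourites term and the hashtag term, as the text describes.
    static double reputation(long favourites, long hashtags,
                             long followers, long totalRetweetsForTopic) {
        return logRatio(favourites, followers, totalRetweetsForTopic)
             + logRatio(hashtags, followers, totalRetweetsForTopic);
    }

    // Assumption: a score of exactly 0.1 counts as non-credible.
    static boolean isCredible(double score) {
        return score > THRESHOLD;
    }

    public static void main(String[] args) {
        double score = reputation(100, 10, 10_000, 500);
        System.out.println("score=" + score + " credible=" + isCredible(score));
    }
}
```

With 100 favourites, 10 hashtags, 10,000 followers and 500 retweets, the two terms are about 0.5 and 0.25, so the user is classified as credible.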

2) Classifier Engine: Using data mining algorithms such as Random Forest
and Naïve Bayes, or the proposed FeatureRank_NB, we can classify (predict)
whether a given tweet is genuine or fake. First, a trained model is built
from all existing data with one of these algorithms, and then the trained
model is applied to a new test tweet to decide whether it is genuine or
fake. The proposed FeatureRank_NB algorithm applies a ranking algorithm
to all features, such as favourites, followers, following and hashtags, to
find the relevant (important) features for predicting whether tweets are
genuine or fake. The ranking algorithm gives a high score to attributes
that occur more often; those attributes are considered important and are
assigned more weight, or a higher rank.
3) User Experience Component: This module applies a sentiment analysis
algorithm to determine whether a given tweet contains more positive or
more negative words. The assumption is that an experienced, genuine user
uses more positive words and a fake user uses more negative words.
Sentiment detection therefore gives an indication of user experience and
helps to identify whether a user's tweet is fake or genuine. Sentiment is
calculated with the Stanford Natural Language Processing APIs, which
classify a given tweet as Positive, Negative or Neutral.
4) Feature Ranking Algorithm: This module is applied to all tweet features
to select only the important attributes, that is, those with a higher
rank.
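One possible reading of the ranking idea described above (attributes that occur more often get a higher rank) is to count, for each feature, how many user records have a non-zero value and to order the features by that count. The sketch below follows that reading. It is an assumption, not the ranking method confirmed by the paper, and the class name FeatureRankSketch and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative frequency-based feature ranking; not the paper's confirmed method.
final class FeatureRankSketch {
    // Orders feature names by how many records hold a non-zero value for them.
    // Each record is assumed to have one value per feature name.
    static List<String> rank(String[] names, double[][] records) {
        int[] counts = new int[names.length];
        for (double[] record : records) {
            for (int f = 0; f < names.length; f++) {
                if (record[f] != 0.0) {
                    counts[f]++;
                }
            }
        }
        List<Integer> order = new ArrayList<>();
        for (int f = 0; f < names.length; f++) {
            order.add(f);
        }
        // Highest count first; the sort is stable, so ties keep their input order.
        order.sort(Comparator.comparingInt((Integer f) -> counts[f]).reversed());
        List<String> ranked = new ArrayList<>();
        for (int f : order) {
            ranked.add(names[f]);
        }
        return ranked;
    }

    // Keeps only the k highest-ranked features.
    static List<String> top(String[] names, double[][] records, int k) {
        List<String> ranked = rank(names, records);
        return ranked.subList(0, Math.min(k, ranked.size()));
    }

    public static void main(String[] args) {
        String[] names = {"followers", "following", "hashtag", "favourites"};
        double[][] records = {{10, 5, 0, 3}, {20, 0, 0, 1}, {7, 2, 1, 0}};
        System.out.println(rank(names, records));
    }
}
```

A classifier would then be trained only on the columns returned by top(...), which is the selection step the text attributes to this module.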
All algorithms given in the paper work on the four components described above.
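The train-then-predict flow of the Classifier Engine can be illustrated with a small Gaussian Naïve Bayes classifier written in plain Java. The Gaussian variant, the class name NaiveBayesSketch, the label encoding (1 = credible, 0 = non-credible) and the sample numbers are all assumptions for illustration. The write-up does not say which Naïve Bayes variant or library the project uses, and this sketch leaves out the feature ranking step.

```java
// Illustrative two-class Gaussian Naive Bayes; not the project's actual classifier.
final class NaiveBayesSketch {
    private final double[][] mean = new double[2][];
    private final double[][] variance = new double[2][];
    private final double[] logPrior = new double[2];

    // Labels: 1 = credible, 0 = non-credible.
    // Assumes both classes appear at least once in the training data.
    void train(double[][] x, int[] y) {
        int features = x[0].length;
        int[] count = new int[2];
        for (int c = 0; c < 2; c++) {
            mean[c] = new double[features];
            variance[c] = new double[features];
        }
        for (int i = 0; i < x.length; i++) {
            count[y[i]]++;
            for (int f = 0; f < features; f++) {
                mean[y[i]][f] += x[i][f];
            }
        }
        for (int c = 0; c < 2; c++) {
            for (int f = 0; f < features; f++) {
                mean[c][f] /= count[c];
            }
            logPrior[c] = Math.log((double) count[c] / x.length);
        }
        for (int i = 0; i < x.length; i++) {
            for (int f = 0; f < features; f++) {
                double d = x[i][f] - mean[y[i]][f];
                variance[y[i]][f] += d * d;
            }
        }
        for (int c = 0; c < 2; c++) {
            for (int f = 0; f < features; f++) {
                // Small floor keeps the variance strictly positive.
                variance[c][f] = variance[c][f] / count[c] + 1e-9;
            }
        }
    }

    // Returns the class with the larger log posterior.
    int predict(double[] row) {
        double best = Double.NEGATIVE_INFINITY;
        int bestClass = 0;
        for (int c = 0; c < 2; c++) {
            double logPosterior = logPrior[c];
            for (int f = 0; f < row.length; f++) {
                double d = row[f] - mean[c][f];
                logPosterior += -0.5 * Math.log(2 * Math.PI * variance[c][f])
                              - d * d / (2 * variance[c][f]);
            }
            if (logPosterior > best) {
                best = logPosterior;
                bestClass = c;
            }
        }
        return bestClass;
    }

    public static void main(String[] args) {
        // Hypothetical rows: {followers, favourites}.
        double[][] x = {{1000, 50}, {1200, 60}, {900, 40}, {5, 0}, {10, 1}, {3, 0}};
        int[] y = {1, 1, 1, 0, 0, 0};
        NaiveBayesSketch model = new NaiveBayesSketch();
        model.train(x, y);
        System.out.println(model.predict(new double[] {1100, 55}));
    }
}
```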
Implementation
To implement the paper's concept we use a tweets dataset from Twitter.
Screenshots
To run the project, double-click the ‘run.bat’ file to get the screen below.

In the above screen, click the ‘Upload Tweets Dataset’ button to upload the
tweets folder. Each file in the tweets folder contains the tweets and profile
of one user. After the tweets folder is uploaded, the screen below appears.

The above screen shows the features extracted from every user profile; the
‘Tweet Content’ column contains the tweet text. Now click the ‘Calculate User
Reputation’ button to calculate user reputation from the extracted features.
In the above screen, a reputation score has been calculated for all tweets
from each user, along with whether the tweets contain positive or negative
messages.
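The project uses the Stanford NLP APIs for this sentiment step. As a simplified stand-in that needs no external library, the sketch below counts positive and negative words from two tiny hypothetical word lists and labels a tweet Positive, Negative or Neutral. The word lists, the class name SentimentSketch and the tokenisation rule are assumptions for illustration. This is not how the Stanford library works internally.

```java
import java.util.Locale;
import java.util.Set;

// Word-count stand-in for the sentiment step; word lists are hypothetical.
final class SentimentSketch {
    static final Set<String> POSITIVE = Set.of("good", "great", "happy", "love", "safe");
    static final Set<String> NEGATIVE = Set.of("bad", "fake", "hate", "terrible", "worst");

    // Labels a tweet by comparing its positive and negative word counts.
    static String classify(String tweet) {
        int positive = 0;
        int negative = 0;
        // Lower-case the text and split on anything that is not a letter a-z.
        for (String token : tweet.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(token)) {
                positive++;
            } else if (NEGATIVE.contains(token)) {
                negative++;
            }
        }
        if (positive > negative) {
            return "Positive";
        }
        if (negative > positive) {
            return "Negative";
        }
        return "Neutral";
    }

    public static void main(String[] args) {
        System.out.println(classify("I love this great phone"));
    }
}
```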

Now click the ‘Extract Features’ button to extract features and calculate a
credibility score from those tweet features.
In the above screen, credibility has been calculated for each user's tweets
from the sentiment and reputation scores. If the score is less than 0.60% the
tweet is considered non-credible, and if it is greater than 0.60% it is
considered credible. From these features we train the Random Forest algorithm
to build a classification model and to count the correctly classified
instances (records). Now click the ‘Run Random Forest Algorithm’ button to
build the trained model.

The selected text in the above screen shows that 36 tweet users were found in
total, of which Random Forest correctly classified 27. Now classify the same
data with the proposed ‘FeatureRank NB’ classifier: click the ‘Run Feature Rank
NB’ button to classify with the proposed feature ranking technique.
The above screen shows that the proposed feature ranking technique correctly
classified 28 records, one more than the existing Random Forest technique.
Now click ‘Correctly Classified Instances Comparison Graph’ to compare the
correctly classified instances of the existing and proposed techniques.

In the above graph, the x-axis represents the algorithm name and the y-axis
represents the correctly classified count. The existing Random Forest has fewer
correctly classified instances than the proposed Feature Ranking NB algorithm.
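The counts reported in this walkthrough (27 and 28 correct out of 36) can be turned into accuracy percentages with a one-line calculation. The class and method names below are hypothetical; only the three numbers come from the text.

```java
// Converts the correctly classified counts from the text into percentages.
final class AccuracySketch {
    static double accuracyPercent(int correct, int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return 100.0 * correct / total;
    }

    public static void main(String[] args) {
        System.out.println("Random Forest:  " + accuracyPercent(27, 36));
        System.out.println("FeatureRank NB: " + accuracyPercent(28, 36));
    }
}
```

That works out to 75% for Random Forest and about 77.8% for FeatureRank NB, a difference of one record on this 36-user dataset.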
