Semantically Enriched Machine Learning Approach to Filter YouTube Comments for Socially Augmented User Models
Ahmad Ammari, Vania Dimitrova, Dimoklis Despotakis
School of Computing, University of Leeds, Leeds, UK

Presented By:
Ahmad Ammari
User and Community Modelling
School of Computing, University of Leeds, UK
Outline
• The ImREAL Project
• Socially Augmented User Modelling
• Research Objective, Roadmap, Challenges
• The Social Noise Filtering Approach
  – Machine Learning-Based
  – Methodology
  – Comment Content Pre-Processing
  – Semantic Enrichment
  – Scoring and Labelling the Training Dataset
• Experimental Description / Results
• Evaluation
• Conclusions & Future Work
ImREAL: Immersive Reflective Experience-based Adaptive Learning
Specific Targeted Research Project (STReP), FP7
Partners
  University of Leeds, UK
  Trinity College Dublin, Ireland
  Graz University of Technology, Austria
  University of Erlangen-Nuremberg, Germany
  Delft University of Technology, Netherlands
  Imaginary SRL (IMA), Italy
  Empower The User (ETU), Ireland
Problem: Experience in a simulated world is disconnected from the ‘real-world’
[Figure: continuum from REALITY through Augmented Reality and Augmented Virtuality to VIRTUALITY, with the ImREAL Approach in the centre]
Augmented Simulated Experiential Learning
[Figure labels: Interactive; User model; Adaptive; coach; Practice; Provide content; Simulated Experiential Learning Environment; Augmented user modelling; Real world activity modelling; Meta-cognitive scaffolding; Records of Real Job-related Experiences; Other participants (e.g. customers, managers); Simulated Learning Environment; Real World Experience]
Augmented User Modelling
Socially Augmented User Modelling
[Figure: User Profiles from the Simulated Environment form the Existing User Model (Limited Scope!!); Social Profiles from Open Social Spaces (e.g. Sports, Psychology, Diseases, Politics) contribute Weighted Social Interests to the Socially Augmented User Model]
Broad Research Objective
Mining social media content generated by users having awareness of and/or interest in an activity domain, to derive social profiles that augment existing user models
Research Roadmap / Challenges
• Three-Phase Research Roadmap towards achieving the Broad Objective
[Figure: Phase One, Phase Two, Phase Three, with the label Social Noise Filtration]
The Social Noise Filtering Approach
• Supervised Machine Learning Model
  – Historic Content with known relevance states is used for training
  – The Machine Learning Model learns the underlying rules
  – The Model is used to predict unknown relevance states for new content, with a certain prediction confidence
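
A minimal sketch of the supervised idea above, assuming a multinomial Naive Bayes model with add-one smoothing. The deck later names NBM as one of the classifiers, but this is not the authors' implementation; the function names and the toy comments are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nbm(docs, labels):
    """Count classes and per-class term frequencies from tokenised comments."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
    vocab = {t for counts in term_counts.values() for t in counts}
    return class_counts, term_counts, vocab

def predict_nbm(model, tokens):
    """Return (predicted label, posterior probability of that label)."""
    class_counts, term_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(term_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for t in tokens:
            if t in vocab:  # terms never seen in training are ignored
                score += math.log((term_counts[label][t] + 1) / denom)
        log_scores[label] = score
    # Normalise the log scores into a posterior to get a confidence value.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())

# Hypothetical historic comments with known relevance states.
history = [["job", "interview"], ["job", "experience"],
           ["lol", "funny"], ["funny", "video"]]
states = ["relevant", "relevant", "noisy", "noisy"]
model = train_nbm(history, states)
label, confidence = predict_nbm(model, ["job", "interview"])
```

The returned confidence is the model's posterior for the chosen label, which is one way to read the "prediction confidence" mentioned on the slide.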
The Social Noise Filtration Service: Methodology

CASE STUDY: Filtering YouTube Comments
  Social Media Source: YouTube
  Subject Content: Public Comments on Shared Videos
  Activity Domain: Job Interview

[Figure: Experimentally Controlled Comments → Analyze → Semantically Enriched Job Interview Bag of Words (JIBoW); Public Comments on YouTube → Pre-Process → Term – Comment Matrix (Training Corpus); the JIBoW is used to SCORE the matrix, giving SCORES]
YouTube Video Selection
• Selected as part of a research study by
  [Despotakis, Lau & Dimitrova, 2011]
• Four Job Interview-related categories are
  manually identified from video content
  – Guides / Best Practices
  – Interviewee’s Stories
  – Interviewer’s Stories
  – Interview Mock Examples
• Videos from all categories are selected to
  retrieve the comment set for ML training
Comment Content Pre-Processing
• Objective: Deriving dataset for Classification
  1. Stop Word Removal
  2. Stemming
  3. tfidf Weighting
  4. Comment – Term Matrix (CTM)
• Example:
  “I think most Americans are like the first example”
  → think – Americans – like – first – example
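
The four pre-processing steps can be sketched as follows. This is an illustration only: the stop-word list is a tiny hypothetical one, `crude_stem` is a naive suffix stripper standing in for a real stemmer, and the weighting assumes the common tf × ln(N/df) variant, which the slides do not specify.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, just large enough for the slide's example.
STOP_WORDS = {"i", "most", "are", "the", "a", "is", "to", "of", "and"}

def crude_stem(token):
    """Naive suffix stripping; a stand-in for a proper stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(comment):
    """Steps 1 and 2: lower-case, drop stop words, stem what is left."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def comment_term_matrix(comments):
    """Steps 3 and 4: tf-idf weights arranged as one row per comment."""
    docs = [preprocess(c) for c in comments]
    vocab = sorted({t for d in docs for t in d})
    n_docs = len(docs)
    doc_freq = Counter(t for d in docs for t in set(d))
    rows = []
    for d in docs:
        tf = Counter(d)
        rows.append([tf[t] * math.log(n_docs / doc_freq[t]) for t in vocab])
    return vocab, rows

vocab, ctm = comment_term_matrix([
    "I think most Americans are like the first example",
    "Great example",
])
```

With this weighting, a term that occurs in every comment (here "example") gets weight 0, so it cannot help separate relevant from noisy comments.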
Semantically Enriched Job Interview Bag of Words
• A Semantically Enriched Job Interview Bag of Words (JIBoW) used as Novel Means to Score and Label Training YouTube Comment Set
• Collection of Textual Comments on Job Interview Videos [*]
  – Experimentally controlled
  – Closed social space
• Text and Semantic Pre-Processing Phases
• Semantically Expanded by the WordNet Lexicon and DISCO with Word Synonyms, Antonyms, Derivations, and semantically similar words

[*] Despotakis, Lau, Dimitrova (2011): A Semantic Approach to Extract Individual Viewpoints from User Comments on An Activity, AUM Workshop, UMAP 2011, Girona, Spain
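
A rough sketch of what semantic expansion of a bag of words looks like. The `RELATED` dictionary is a hand-written, hypothetical stand-in for lookups in WordNet and DISCO; it does not reproduce either resource, and the function name is an assumption.

```python
# Hypothetical lexicon: each seed term maps to related words of the kinds
# listed on the slide (synonyms, antonyms, derivations, similar words).
RELATED = {
    "interview": {"interviewer", "interviewee", "meeting"},
    "job": {"work", "employment", "occupation"},
    "confident": {"confidence", "assured", "nervous"},
}

def expand_bag_of_words(seed_terms, related=RELATED):
    """Return the seed terms plus every related word found in the lexicon."""
    expanded = set(seed_terms)
    for term in seed_terms:
        expanded |= related.get(term, set())
    return expanded

jibow = expand_bag_of_words({"job", "interview", "salary"})
```

Terms without an entry in the lexicon (here "salary") are kept unchanged, so expansion can only grow the bag of words.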
Scoring and Labelling Training Corpus
• A Novel Term Frequency – based Mathematical Model
• Computes a Relevance Score for each observation in the
  training comment dataset
   – Intersection Size between Comment BoW and JIBoW
   – Score is Normalized by the Average Intersection Size




  • A Threshold is used to classify the comments for
    training a binary classifier
  • Labels observation (noisy, relevant) accordingly
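
The slide's formula did not survive extraction, so the following is one plausible reading of the bullets rather than the authors' model: score each comment by the size of its overlap with the JIBoW, divide by the average overlap across the training set, and label by threshold. The threshold value of 1.0 and all names are assumptions.

```python
def score_comments(comment_bows, jibow):
    """Overlap with the JIBoW, normalised by the average overlap size."""
    sizes = [len(set(bow) & jibow) for bow in comment_bows]
    average = sum(sizes) / len(sizes) if sizes else 0.0
    return [size / average if average else 0.0 for size in sizes]

def label_comments(scores, threshold=1.0):
    """Binary labels for training: at or above the threshold is relevant."""
    return ["relevant" if s >= threshold else "noisy" for s in scores]

# Toy data; the JIBoW here is a small hypothetical subset.
jibow = {"interviewee", "confident", "job", "experience", "work", "life"}
bows = [
    ["interviewee", "looks", "confident", "job", "experience", "work", "life"],
    ["lol", "funny", "video"],
    ["nice", "job"],
]
scores = score_comments(bows, jibow)
labels = label_comments(scores)
```

Under this reading a score of 1.0 means "average overlap", so the default threshold keeps only comments that overlap the JIBoW at least as much as a typical comment does.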
Example Scoring & Labelling
C1: “The interviewee looks confident, he should have some job experience in his work life”

  Comment BoW: interviewee, confident, job, experience, work, life
  JIBoW:       w10, w21, w34, w4, w57, w113, wn
Example Scored & Labelled Comments
Datasets
• YouTube API for Retrieval, Lucene API for Pre-Processing
• Post-Analysis Data Description: YouTube Corpus; Experimentally Controlled Corpus
• Training Corpus: 1159 Instances
   – Classified by the scoring model for Training C4.5 & Naïve
     Bayes Multinomial (NBM) Classifiers
   – {724 Noisy, 435 Relevant}
• Derived a Comment Term Matrix : 1159 Instances X 903
  tfidf Term Weights + 1 Discrete Class Column
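
A quick arithmetic check of the corpus figures above, which gives context for the classifier results: a model that always predicts the majority class would already be right on about 62% of the training instances. The helper name is hypothetical.

```python
def class_balance(counts):
    """Return total instances, per-class shares and the majority class."""
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    majority = max(counts, key=counts.get)
    return total, shares, majority

total, shares, majority = class_balance({"noisy": 724, "relevant": 435})
# 903 tfidf term columns plus 1 discrete class column.
ctm_shape = (total, 903 + 1)
majority_baseline = shares[majority]
```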
Experimental Results
• Models for each classifier have been trained & tested with three variations of the Training-to-Testing ratio
  [Figure: ROC Area; see Evaluation Results]
• The two classifiers show good performance in predicting relevant & noisy comments in the testing data sets
• C4.5 is slightly better at predicting noisy comments from within the total noise in the data
• NBM shows less risk of misclassifying relevant comments as noise
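
A small sketch of the two ingredients named above: splitting a labelled corpus by a training-to-testing ratio, and computing the ROC area from classifier scores. The slides do not state which ratios were used, so the 0.8 below is only an example; both function names are hypothetical.

```python
import random

def split_by_ratio(instances, train_fraction, seed=0):
    """Shuffle deterministically, then cut into training and testing parts."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def roc_area(labels, scores, positive="relevant"):
    """ROC area as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    pos = [s for l, s in zip(labels, scores) if l == positive]
    neg = [s for l, s in zip(labels, scores) if l != positive]
    if not pos or not neg:
        raise ValueError("need at least one instance of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

train, test = split_by_ratio(range(10), 0.8)
area = roc_area(["relevant", "noisy", "relevant", "noisy"],
                [0.9, 0.8, 0.7, 0.1])
```

An area of 0.5 corresponds to random ranking and 1.0 to a perfect separation of relevant from noisy comments.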
Evaluation
A human-based evaluation experiment was conducted to measure how well the service:
Goal 1: Considers the comments that show awareness of the application domain (Job Interviews) (see Example Question and Records)
Goal 2: Considers the comments whose authors are likely interested in the application domain (see Example Question and Records)
Evaluation Results
  Number of Evaluators                                    2
  Number of Evaluated Comments (15% of Whole Dataset)   180
  Number of Comments Scored as Relevant                  90
  Number of Comments Scored as Noisy                     90

[Figure: pie charts of each evaluator's judgements (Noisy / Relevant / Doesn't know) per goal. Extracted slice values, slice labels not recoverable: Evaluator 2, Goal 2: 17%, 24%, 59%; Evaluator 2, Goal 1: 3%, 42%, 55%; Evaluator 1, Goal 2: 9%, 45%, 46%; Evaluator 1, Goal 1: 15%, 19%, 66%]

                         Evaluator 2           Evaluator 1
  Metric                 Goal 2    Goal 1      Goal 2    Goal 1
  Total Match Rate       51.1%     68.3%       32.2%     60.0%
  Total Mismatch Rate    48.9%     31.7%       67.8%     40.0%
  Precision (Noisy)      42.2%     76.7%       36.7%     90.6%
  Precision (Relevant)   76.7%     63.3%       73.3%     44.4%
  Recall (Noisy)         73.1%     67.6%       84.6%     68.2%
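
For reference, the metrics in the table can be computed from paired labels as sketched below. This assumes the usual definitions (precision and recall measured against the human judgement); the slides do not say how "Doesn't know" answers were counted, so they are not modelled here, and the toy data is hypothetical rather than taken from the study.

```python
def agreement_metrics(predicted, judged):
    """Compare the service's labels with one evaluator's judgements."""
    pairs = list(zip(predicted, judged))
    match_rate = sum(p == j for p, j in pairs) / len(pairs)

    def precision(label):
        # Of the comments the service gave this label, how many did the
        # evaluator give the same label?
        flagged = [j for p, j in pairs if p == label]
        return sum(j == label for j in flagged) / len(flagged) if flagged else 0.0

    def recall(label):
        # Of the comments the evaluator gave this label, how many did the
        # service also give it?
        actual = [p for p, j in pairs if j == label]
        return sum(p == label for p in actual) / len(actual) if actual else 0.0

    return {
        "match_rate": match_rate,
        "mismatch_rate": 1.0 - match_rate,
        "precision_noisy": precision("noisy"),
        "precision_relevant": precision("relevant"),
        "recall_noisy": recall("noisy"),
    }

# Hypothetical labels for six comments.
service = ["noisy", "noisy", "noisy", "relevant", "relevant", "relevant"]
evaluator = ["noisy", "noisy", "relevant", "relevant", "relevant", "noisy"]
metrics = agreement_metrics(service, evaluator)
```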
Summary
• Conclusions
  – A high proportion of YouTube video comments are noisy
  – ML models are good at predicting and filtering out comments that show neither author awareness of nor interest in the activity domain of interest
• Future Work
  – Add more filters to improve the Scoring and
    Labelling Mechanism based on Evaluation
    Baseline
  – Exploit Activity Modelling Ontology to Derive
    JIBoW
  – Evaluate Impact of Semantic Enrichment
YouTube-based Social Profiling Service: Methodology
[Figure: YouTube / SM Comments → Noise Filtration Service → Comments Predicted as Relevant (RC1 … RCn) → Profiling Source Authors → YT User Profiles (Uploaded YT Video meta data, Favored YT Video meta data, Comments on the YT Videos) → Social Profiling Corpus → Clusters of Social Profiles (Profile1, Profile2, ProfileN) → Associations of Frequent Characteristics → ImREAL Simulators]