Mendes, Mika, Zaragoza, Blanco. Measuring Website Similarity using an Entity-Aware Click Graph   1/16




   Measuring Website Similarity using
     an Entity-Aware Click Graph


 Pablo N. Mendes (1), Peter Mika (2), Hugo Zaragoza (2), Roi Blanco (2)

                                 1. Freie Universität Berlin
                              2. Yahoo! Research Barcelona


                             Nov 1st 2012, Maui, CIKM 2012



            Introduction: query log analysis
   ●   Query logs record user interaction with Web
       search engines
   ●   Query log analysis has been proven critical to
       improving search
   ●   For search engines
        –   Ranking, autosuggest, “Also try”, etc.

   ●   For site owners
        –   insight into user needs, allows optimizing Web
            presence, etc.



           Introduction: website similarity
   ●   Click graph: relates queries to websites;
          edges are clicks




   [Figure: click graph (left) and site similarity graph, SG (right)]

   ●   Allows modeling website relatedness based on
       shared queries leading to each website pair
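
The step from click graph to site similarity graph can be sketched as follows. This is a minimal illustration, not the authors' implementation: the click data is invented, and Jaccard overlap of query sets is an assumed stand-in for the relatedness measure, which the slides do not specify.

```python
from collections import defaultdict
from itertools import combinations

# Toy click log (hypothetical data): (query, clicked website) pairs.
clicks = [
    ("brad pitt movies", "imdb.com"),
    ("brad pitt movies", "rottentomatoes.com"),
    ("forrest gump cast", "imdb.com"),
    ("forrest gump cast", "rottentomatoes.com"),
    ("cheap flights", "kayak.com"),
    ("brad pitt movies", "kayak.com"),
]

def site_similarity_graph(clicks):
    """Return {(site_a, site_b): Jaccard overlap of the queries leading to each}."""
    queries_by_site = defaultdict(set)
    for query, site in clicks:
        queries_by_site[site].add(query)
    graph = {}
    for a, b in combinations(sorted(queries_by_site), 2):
        shared = queries_by_site[a] & queries_by_site[b]
        if shared:  # only connect sites that share at least one query
            union = queries_by_site[a] | queries_by_site[b]
            graph[(a, b)] = len(shared) / len(union)
    return graph

sg = site_similarity_graph(clicks)
```

Sites with identical query sets get weight 1.0; sites with no shared query get no edge at all, which is where sparsity starts to hurt.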



                             Problems: Sparsity
   ●   44% of queries occur only once even when
       considering a full year of data [1]

   ●   Using “shared queries” as a relatedness
       measure becomes difficult in the long tail.




         [1] Baeza-Yates. Relating content through web usage. In HT ’09, 2009.



                  Problems: partial overlaps




  ●   Breaking up into words distorts semantics
        –   “Forrest” vs “Forrest Gump”
        –   “Pitt” vs “Brad Pitt”
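
A small sketch of why word-level matching misleads. The queries are invented for illustration: splitting them into words makes two unrelated queries look related because they share a token.

```python
def word_overlap(query_a, query_b):
    """Words shared by two queries after naive whitespace tokenization."""
    return set(query_a.lower().split()) & set(query_b.lower().split())

# Hypothetical queries: an actor versus an unrelated politician.
shared = word_overlap("Brad Pitt movies", "William Pitt biography")
```

Here the only shared word is "pitt", even though the two queries refer to different people.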



                                        Introduction
 ●   >62% of queries contain an entity name or type [20]




[20] Pound, Mika, & Zaragoza. Ad-hoc object retrieval in the web of data. In WWW’10, 2010.



                   Entity-aware Click Graph

  ●   Websites can share
      entities and/or
      modifiers
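
One way to picture the entity-aware click graph is the sketch below. It is a simplification under stated assumptions: the entity dictionary, the longest-match lookup in `split_query`, and the click data are all hypothetical. The talk draws its entities from Freebase, and the slides do not describe the recognition method.

```python
from collections import defaultdict

# Hypothetical entity dictionary; the talk uses Freebase entities instead.
ENTITIES = {"brad pitt", "forrest gump"}

def split_query(query, entities=ENTITIES):
    """Return (entity, modifier) using the longest dictionary match, or (None, query)."""
    words = query.lower().split()
    # Try longer spans first so a full name wins over any shorter match.
    for length in range(len(words), 0, -1):
        for start in range(len(words) - length + 1):
            span = " ".join(words[start:start + length])
            if span in entities:
                rest = words[:start] + words[start + length:]
                return span, " ".join(rest)
    return None, " ".join(words)

def entity_aware_click_graph(clicks):
    """Map each site to the sets of entities and modifiers that led to it."""
    graph = defaultdict(lambda: {"entities": set(), "modifiers": set()})
    for query, site in clicks:
        entity, modifier = split_query(query)
        if entity:
            graph[site]["entities"].add(entity)
        if modifier:
            graph[site]["modifiers"].add(modifier)
    return graph

clicks = [
    ("brad pitt movies", "imdb.com"),
    ("forrest gump movies", "netflix.com"),
    ("brad pitt news", "people.com"),
]
g = entity_aware_click_graph(clicks)
shared_modifiers = g["imdb.com"]["modifiers"] & g["netflix.com"]["modifiers"]
shared_entities = g["imdb.com"]["entities"] & g["people.com"]["entities"]
```

No two sites in this toy log share a whole query, so a plain click graph would leave them unconnected. After splitting, two sites share the modifier "movies" and two share the entity "brad pitt".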


          Entity-aware Website Similarity
                      Graph


 ●   More connected
 ●   Preserves semantics
 ●   Allows analysis of
     how websites relate
     to entities and modifiers



                                      Experiments
   ●   Website similarity
        –   Find top K similar sites
        –   Evaluation: two sites are “similar” if they are in the
            same category in ODP (Open Directory Project)

   ●   Website characteristics from the searcher POV
        –   What entities lead to a website
        –   What context words lead to a website
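
The website similarity experiment can be sketched as a top-K ranking scored against category labels. The similarity scores, the category labels, and the choice of precision at K as the metric are assumptions made for illustration; the slides only state that two sites count as similar when they share an ODP category.

```python
def top_k_similar(site, similarity, k):
    """Return the k sites most similar to `site`, best first."""
    scores = {}
    for (a, b), score in similarity.items():
        if a == site:
            scores[b] = score
        elif b == site:
            scores[a] = score
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:k]

def precision_at_k(site, ranked, category):
    """Fraction of ranked sites that share the query site's category."""
    if not ranked:
        return 0.0
    hits = sum(1 for other in ranked if category.get(other) == category[site])
    return hits / len(ranked)

# Hypothetical similarity scores and ODP-style category labels.
similarity = {
    ("imdb.com", "rottentomatoes.com"): 0.9,
    ("imdb.com", "kayak.com"): 0.2,
    ("imdb.com", "netflix.com"): 0.6,
}
category = {
    "imdb.com": "Arts/Movies",
    "rottentomatoes.com": "Arts/Movies",
    "netflix.com": "Arts/Movies",
    "kayak.com": "Recreation/Travel",
}
ranked = top_k_similar("imdb.com", similarity, k=3)
p_at_3 = precision_at_k("imdb.com", ranked, category)
```

With these toy values, two of the three retrieved sites share the query site's category.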



             Dataset Statistics: Query Log
   ●   1 month of queries from Yahoo!, 45M sessions
   ●   5M entities from Freebase



                                           Results 1
   ●   Similarity edge prediction



                                           Results 1
   ●   Similarity edge prediction with credit to partial
       category overlap
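
Credit for partial category overlap could be computed along these lines. The formula is an assumption, not taken from the slides: categories are treated as slash-separated paths, and credit is the shared prefix depth divided by the depth of the deeper path.

```python
def partial_category_credit(cat_a, cat_b):
    """Shared-prefix depth divided by the depth of the deeper category path."""
    parts_a, parts_b = cat_a.split("/"), cat_b.split("/")
    shared = 0
    for x, y in zip(parts_a, parts_b):
        if x != y:
            break
        shared += 1
    return shared / max(len(parts_a), len(parts_b))

# Hypothetical ODP-style category paths.
full = partial_category_credit("Arts/Movies/Reviews", "Arts/Movies/Reviews")
half = partial_category_credit("Arts/Movies", "Arts/Music")
zero = partial_category_credit("Arts/Movies", "Recreation/Travel")
```

Identical paths earn full credit, paths that diverge at the second level earn half, and paths with different top-level categories earn none.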


                                           Results 2
   [Figure: plot with x-axis "Entropy of distribution of modifiers" and y-axis "Entropy of distribution of entities"; labeled regions: "Many entities, few modifiers", "Many entities, many modifiers", "Few entities, many modifiers"]
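
The two entropies can be computed per website as sketched below, assuming the standard Shannon entropy over the entities (or modifiers) of the queries that led to the site. The sample data is hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Hypothetical clicks for one site: the entity and modifier of each query.
entities = ["brad pitt", "forrest gump", "tom hanks", "angelina jolie"]
modifiers = ["movies", "movies", "movies", "movies"]
entity_entropy = entropy(entities)      # four equally likely entities
modifier_entropy = entropy(modifiers)   # a single repeated modifier
```

This toy site has high entity entropy and zero modifier entropy, so it would fall in the "many entities, few modifiers" region.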



                                           Results 2



                                         Conclusion
 ●   Recognizing entities in Web search logs allows for
     click graphs that account for the internal composition
     of queries
 ●   New similarity graphs built from entity-aware click
     graphs enable more robust and flexible
     similarity analysis (evaluated for website similarity)

 ●   Future:
      –   Exploit the knowledge base (e.g. type hierarchy)
      –   More complex queries
      –   etc.



                                         Thank you!
●   Web: http://pablomendes.com
●   E-mail: pablo.mendes@fu-berlin.de
●   Twitter: @pablomendes
●   Slideshare: slideshare.net/pablomendes



    Questions?
