Using Query Reformulation for User Profiling
Jim Jansen, College of Information Sciences and Technology, The Pennsylvania State University, [email_address]
Interested in how much descriptive information we can generate about people by leveraging search log data.
What Did We Find Out?
We can tell quite a lot about a user! When combined with other information, query reformulation is a revealing searching characteristic.
The State of Web Search: why search data is important
The Power of Search and the Web
- Search is the top online activity
- Search drives over 7 billion monthly queries in the U.S.
- Online activity has a huge impact on people's daily lives:
  - 70 minutes less with family
  - 30 minutes less TV
  - 8.5 minutes less sleep
Sources: comScore, U.S., Feb. '06; Stanford Institute for the Quantitative Study of Society, Nov. '05
Analysis of Search Marketplace
The marketplace has held fairly stable over the last year or so, albeit with some Bing flux.
Search Logs
- Search logs contain the trace data recorded when a person visits the search engine, submits a query, views results, etc.
- On one hand, logs have been criticized for not being rich enough (i.e., they record only behaviors, not the 'why' factors)
- On the other hand, logs have been criticized for recording too much about us (i.e., logging a lot of personal information about a person)
- How much can we learn about a person from the data stored in search logs?
- Specifically, how rich a searcher profile can we build of what a person is doing and why they are doing it, and how well can we predict what they are going to do next?
An illustrative example
How much can we tell from a single query?
- ASIS&T is an acronym for the American Society for Information Science and Technology
- Good probability that this user is an academic, a researcher, a librarian, or a student in one of these disciplines
- Leveraging demographic information:
  - 57 percent probability the user is female / 43 percent male
  - 66.2 percent chance the user works in the information science field
  - 55.6 percent probability the user has a master's degree
How much can we tell from a single query? (cont'd)
- Leveraging demographic information (cont'd):
  - 32.3 percent probability the user has a doctorate
  - 53 percent likelihood the user works in academia
- Using IP, we can locate the geographical area
- Based on time, we could infer that this person is:
  - searching for the conference's schedule for travel (if the query is submitted prior to the meeting), or
  - looking for presentations or papers from the meeting (if the query is submitted after the conference)
- Theoretically, we can tell a lot! However, with billions of queries per month, we can't do the analysis by hand as in this example. To develop user profiles, we need automated methods.
Research Question: How complete a profile can one develop for a Web search engine user from search log data? [(a) what the user is doing, (b) what the user is interested in, and (c) what the user intends to do]
Specific aspects with automated methods ...
- Location: where the user is
- Geographical interest: where the user is going
- Topical interest: what the user is interested in
- Topical complexity: how motivated the user is
- Content desires: informational, navigational, transactional
- Commercial intent: eCommerce related
- Purchase intent: getting ready to buy
- Potential to click on a link: will the user click on a link
- Gender: demographic targeting/personalization
- User identification: specific user targeting
Automated methods using query reformulation
- Location
- Geographical interest
- Topical interest
- Topical complexity: n-grams pattern analysis
- Content desires
- Commercial intent
- Purchase intent
- Potential to click on a link
- Gender
- User identification
Where to get the full story?
The methodological implementation is reported in the paper in your ASIST proceedings:
Jansen, B.J., Zhang, M., Booth, B., Park, D., Zhang, Y., Kathuria, A. and Bonner, P. (2009) To What Degree Can Log Data Profile a Web Searcher? Proceedings of the American Society for Information Science and Technology 2009 Annual Meeting. Vancouver, British Columbia. 6-11 November.
Topical Complexity
The number of queries by a user in a session on a topic can tell us many things:
- the complexity of the topic
- the user's motivation for the need
- a prediction of future action
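As a rough illustration of the measure on this slide, the sketch below counts queries per user session. The record layout (user id, session id, query) and the sample queries are hypothetical, and it assumes each session covers a single topic, which the slides do not state.

```python
from collections import Counter

# Hypothetical log records: (user_id, session_id, query).
# The field layout and the sample queries are assumptions for illustration.
log = [
    ("u1", "s1", "asist"),
    ("u1", "s1", "asist 2009 conference"),
    ("u1", "s1", "asist 2009 conference schedule"),
    ("u2", "s1", "weather"),
]

def queries_per_session(records):
    """Count how many queries were submitted in each (user, session) pair."""
    return Counter((user, session) for user, session, _query in records)

counts = queries_per_session(log)
for (user, session), n in counts.items():
    print(user, session, n)
```

A higher count for a session is only a proxy for complexity or motivation; how to threshold it is left open here.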
Information Searching
- Probabilistic user modeling:
  - an increasingly important area
  - allows computer systems to adapt to users
- Algorithmic techniques typically employ state models:
  - Simple Bayesian Classifier
  - Markov Modeling
  - n-grams
Note: searching is not always 'informational' anymore. Many times people are searching for 'other things'. Rose & Levinson (2004); Jansen, Booth, & Spink (2008).
Illustration of Probabilistic User Modeling Using n-grams
Given these states ...
  User | Search State Transitions
  1    | ABCF
  2    | ABCDE
  3    | ABCDE
  4    | A
  5    | AC
... how accurately can we predict these?
  Predictive Pattern | Next State? | Accuracy
  AB                 | C           | 100%
  BC                 | D           | 66%
  CD                 | E           | 100%
  A                  | B           | 60%
  C                  | D           | 40%
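The five example sessions on this slide (ABCF, ABCDE, ABCDE, A, AC) can be run through a small n-gram counter. This is a minimal sketch, not the authors' implementation: the function names are made up, and it assumes the end of a session counts as one possible outcome (the `END` marker below).

```python
from collections import Counter, defaultdict

END = "$"  # assumed marker meaning "the session ended here"

# The five example sessions from the slide; each letter is a search state.
sessions = ["ABCF", "ABCDE", "ABCDE", "A", "AC"]

def next_state_counts(sessions, n):
    """Map each pattern of n consecutive states to a Counter of what follows it."""
    table = defaultdict(Counter)
    for s in sessions:
        padded = s + END
        for i in range(len(padded) - n):
            table[padded[i:i + n]][padded[i + n]] += 1
    return table

def accuracy(table, pattern, predicted):
    """Share of occurrences of `pattern` that are followed by `predicted`."""
    total = sum(table[pattern].values())
    return table[pattern][predicted] / total if total else 0.0

unigrams = next_state_counts(sessions, 1)
bigrams = next_state_counts(sessions, 2)

for table, pattern, nxt in [(bigrams, "AB", "C"), (bigrams, "BC", "D"),
                            (bigrams, "CD", "E"), (unigrams, "A", "B")]:
    print(pattern, "->", nxt, round(accuracy(table, pattern, nxt), 2))
```

Under this counting convention the sketch agrees with the slide for AB, BC, CD, and A. For C followed by D it gives 2 of 4 rather than the slide's 40%, so the slide presumably counts that case differently.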
Example Using Search Log
- ~965,000 searching sessions
- ~1,500,000 queries
- 8 states focusing on query reformulation
- Similar results for other aspects of searching. See Qiu (1993), Jansen (2005), Jansen & McNeese (2006)
- 10% improvement from 1st to 2nd order: okay, but would like to do better
- Maybe 'states' are not the correct paradigm?
- Drop-out rate (folks who don't submit a query): ~40%
[Chart: accuracy of prediction (y-axis, 0.1 to 0.6) by order of the model (x-axis: 0, 1st, 2nd, 3rd, 4th). Values as extracted from the chart: 0.28, 0.40, 0.47, 0.44, 0.44, 0.60]
Source: Jansen, B. J., Booth, D. L., & Spink, A. (2009). Patterns of query modification during Web searching. Journal of the American Society for Information Science and Technology.
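To make "order of the model" concrete, the sketch below trains a predictor that looks at the previous k states and scores it on held-out sessions for k = 0, 1, 2. The sessions are hypothetical toy data, the function names are made up, and the procedure (predict the most frequent follower, count an unseen context as a miss) is an assumption rather than the method used in the study.

```python
from collections import Counter, defaultdict

def train(sessions, order):
    """Count which state follows each context of `order` preceding states."""
    table = defaultdict(Counter)
    for s in sessions:
        for i in range(order, len(s)):
            table[s[i - order:i]][s[i]] += 1
    return table

def evaluate(table, sessions, order):
    """Fraction of transitions where the most frequent follower matches the actual next state."""
    hits = total = 0
    for s in sessions:
        for i in range(order, len(s)):
            context = s[i - order:i]
            total += 1
            # A context never seen in training counts as a miss.
            if context in table and table[context].most_common(1)[0][0] == s[i]:
                hits += 1
    return hits / total if total else 0.0

# Hypothetical toy data; each letter stands for one search state.
train_sessions = ["ABCF", "ABCDE", "ABCDE", "AC"]
test_sessions = ["ABCDE", "ACD"]

results = {k: evaluate(train(train_sessions, k), test_sessions, k) for k in range(3)}
for k, acc in results.items():
    print("order", k, "accuracy", round(acc, 2))
```

On this toy data the first-order model beats the zeroth-order one, and the second-order model does not improve further because longer contexts are seen less often in training. The toy numbers say nothing about the real log.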
User Profiling Framework
- Classify user aspects into two levels: internal and external.
  - Internal aspects refer to attributes of the users themselves.
  - External aspects relate to the behavior or interest of the users.
- Interaction between internal and external aspects:
  - Can infer external aspects from internal aspects.
  - External aspects reflect internal aspects.
Thank you! (open for questions and further discussion)
Jim Jansen, College of Information Sciences and Technology, The Pennsylvania State University, [email_address]
Search logs have some common fields, such as time, queries, results, etc. We can enrich the log with additional fields.
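One way to picture an enriched log record is sketched below. The common fields (time, query, results) come from the slide, with `results` assumed to be a result count. The two enrichment fields and the four reformulation labels are hypothetical simplifications, not the eight states used in the study.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class LogRecord:
    # Common fields named on the slide (results is assumed to be a result count).
    time: datetime
    query: str
    results: int
    # Enrichment fields: hypothetical, added for illustration only.
    query_length: int = 0
    reformulation: Optional[str] = None

def enrich(record, previous_query=None):
    """Fill the enrichment fields by comparing the query with the previous one."""
    terms = set(record.query.lower().split())
    record.query_length = len(record.query.split())
    if previous_query is None:
        record.reformulation = "new"
    else:
        prev = set(previous_query.lower().split())
        if terms == prev:
            record.reformulation = "repeat"
        elif prev < terms:
            record.reformulation = "specialization"  # terms were added
        elif terms < prev:
            record.reformulation = "generalization"  # terms were removed
        else:
            record.reformulation = "reformulation"   # terms were changed
    return record

record = enrich(LogRecord(datetime(2009, 11, 6, 9, 0), "asist 2009 schedule", 10),
                previous_query="asist 2009")
print(record.query_length, record.reformulation)
```

Comparing term sets ignores word order and repeated terms, which keeps the sketch short at the cost of some precision.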

The Use of Query Reformulation to Predict Future User Actions
