David Jurgens

I research how humans behave by observing the things we say, what we do, and who we are. My research combines natural language processing and social psychology to understand behavior in its natural social context. I collaborate with colleagues from the social sciences to improve our theories using data-driven insights and methodologies.

Prospective students: I am likely not actively recruiting PhD students who would start Fall 2026. However, I will still look at applications in CSE from those who work on Natural Language Processing, so feel free to tag me in your application. I am most interested in students who have significant technical experience with NLP methods and a strong interest (or coursework) in social psychology or experience with experiments.

You can get systems to learn individual and group behaviors during annotation but only if you add structural priors to the model

Modeling Annotator Disagreement with Demographic-Aware Experts and Synthetic Perspectives

Yinuo Xu, Veronica Derricks, Allison Earl, and David Jurgens

preprint

📄 paper

NUTMEG is an alternative to MACE for identifying ground truth when groups of annotators systematically disagree.

NUTMEG: Separating Signal From Noise in Annotator Disagreement

Jonathan Ivey, Susan Gauch, and David Jurgens

The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)

📄 paper

Unstructured Evidence Attribution for Long Context Query Focused Summarization

Dustin Wright, Zain Muhammad Mujahid, Lu Wang, Isabelle Augenstein, David Jurgens

The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)

📄 paper

Not all definitions of empathy are useful for modeling empathy in language, but some are!

The Muddy Waters of Modeling Empathy in Language: The Practical Impacts of Theoretical Constructs

Allison Lahnala, Charlie Welch, David Jurgens, Lucie Flek

The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)

📄 paper

You can get LLMs to reason more morally if you tell them what morals mean and how to reason ethically

Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework

Mohna Chakraborty, Lu Wang, and David Jurgens

The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)

📄 paper

Multilingual training can help authorship attribution models generalize to new and unseen languages

Leveraging Multilingual Training for Authorship Representation: Enhancing Generalization across Languages and Domains

Junghwan Kim, Haotian Zhang, and David Jurgens

The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)

📄 paper

Setting an LLM persona causes spillover effects on other aspects of the model's behavior!

Are Economists Always More Introverted? Analyzing Consistency in Persona-Assigned LLMs

Manon Reusens, Bart Baesens, and David Jurgens

The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)

📄 paper

There are so many papers on linguistic coordination. We tried to make sense of them.

Coordinating Chaos: A Structured Review of Linguistic Coordination Methodologies.

Ben Litterer, David Jurgens, and Dallas Card.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)

📄 paper

The podcast ecosystem, colored by topic.

Mapping the Podcast Ecosystem with the Structured Podcast Research Corpus.

Ben Litterer, David Jurgens, and Dallas Card.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)

📄 paper 💾 data 💻 code 💻 code

Moral reasoning is complex and we introduce a new dataset that captures multiple aspects of moral reasoning

Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral

Shivani Kumar, David Jurgens

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)

📄 paper

LLM performance on subjective tasks when fine-tuned on demographic information

Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions

Matthias Orlikowski, Jiaxin Pei, Paul Röttger, Philipp Cimiano, David Jurgens, and Dirk Hovy

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)

📄 paper

The noisy path from source to citation: measuring how scholars engage with past research

The Noisy Path from Source to Citation: Measuring How Scholars Engage with Past Research

Hong Chen, Misha Teplitskiy, David Jurgens

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)

📄 paper

Authorship attribution models currently do poorly in cross-lingual settings.

The Million Authors Corpus: A Cross-Lingual and Cross-Domain Wikipedia Dataset for Authorship Verification

Abraham Israeli, Shuai Liu, Jonathan May, and David Jurgens

Findings of ACL

📄 paper

Tokenization methods vary in their sensitivity to language variation.

Tokenization is Sensitive to Language Variation

Anna Wegmann, Dong Nguyen, and David Jurgens

Findings of ACL

📄 paper

Generative AI can change how people use social media for better or worse

The Impact of Generative AI on Social Media: An Experimental Study

Anders Giovanni Møller, Daniel M. Romero, David Jurgens, Luca Maria Aiello

preprint

📄 paper

Evaluation Framework for AI Systems in "the Wild"

Sarah Jabbour, Trenton Chang, Anindya Das Antar, Joseph Peper, Insu Jang, Jiachen Liu, Jae-Won Chung, Shiqi He, Michael Wellman, Bryan Goodman, Elizabeth Bondi-Kelly, Kevin Samy, Rada Mihalcea, Mosharaf Chowdhury, David Jurgens, Lu Wang

AI Lab Whitepaper

📄 paper

Who reaps all the superchats? A large-scale analysis of income inequality in virtual YouTuber livestreaming

Who Reaps All the Superchats? A Large-Scale Analysis of Income Inequality in Virtual YouTuber Livestreaming

Ruijing Zhao, Brian Diep, Jiaxin Pei, Dongwook Yoon, David Jurgens, Jian Zhu

Proceedings of the 2025 Conference on Human Factors in Computing Systems (CHI), 2025

📄 paper

Different genres plotted according to their Biber-derived stylistic regularity

Neurobiber: Fast and Interpretable Stylistic Feature Extraction

Kenan Alkiek, Anna Wegmann, Jian Zhu, David Jurgens

preprint

📄 paper

Referring to a generic

The persuasive role of generic-you in online interactions

Minxue Niu, Emily Mower Provost, David Jurgens, Susan A. Gelman, Ethan Kross, and Ariana Orvell

Scientific Reports 15(1), 1347

📄 paper

Hashtags spread differently depending on the network structure and the identity of the users who use them

The Role of Network and Identity in the Diffusion of Hashtags

Aparna Ananthasubramaniam, Yufei 'Louise' Zhu, David Jurgens, Daniel Romero

The Web Conference, 2025

📄 paper

When you read an email, does it matter more who you are or how the email is written if you want a reply? Read our paper to find out!

Causally Modeling the Linguistic and Social Factors that Predict Email Response

Yinuo Xu, Hong Chen, Sushrita Rakshit, Aparna Ananthasubramaniam, Omkar Yadav, Mingqian Zheng, Michael Jiang, Lechen Zhang, Bowen Yi, Kenan Alkiek, Abraham Israeli, Bangzhao Shu, Hua Shen, Jiaxin Pei, Haotian Zhang, Miriam Schirmer, and David Jurgens.

Proceedings of the 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).

📄 paper

The answers of LLMs align with the perceptions of specific social groups.

Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks

Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens.

Proceedings of the 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).

📄 paper 💾 data 💻 code

People are more likely to read science news depending on how it is written

Modeling Public Perceptions of Science in Media

Jiaxin Pei, Dustin Wright, Isabelle Augenstein, David Jurgens

preprint

📄 paper

Socially aware language technologies and their connections with linguistics, social sciences, and NLP

The Call for Socially Aware Language Technologies.

Diyi Yang, Dirk Hovy, David Jurgens, and Barbara Plank.

Computational Linguistics 51(2).

📄 paper

Not all good Wikipedia articles stay good. Why is that? Read our paper to find out.

A Test of Time: Predicting the Sustainable Success of Online Collaboration in Wikipedia.

Abraham Israeli, David Jurgens, and Daniel Romero.

preprint.

📄 paper 💾 data 💻 code

Optimizing the system and task parts of the prompt can have huge benefits

SPRIG: Improving Large Language Model Performance by System Prompt Optimization.

Lechen Zhang, Tolga Ergen, Lajanugen Logeswaran, Moontae Lee, and David Jurgens.

preprint.

📄 paper 💾 data 💻 code

The prompt matters in how human an LLM can seem

Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue.

Jonathan Ivey, Shivani Kumar, Jiayu Liu, Hua Shen, Sushrita Rakshit, Rohan Raju, Haotian Zhang, Aparna Ananthasubramaniam, Junghwan Kim, Bowen Yi, Dustin Wright, Abraham Israeli, Anders Giovanni Møller, Lechen Zhang, and David Jurgens.

preprint.

📄 paper 💾 data 💻 code

Pathways of linguistic diffusion seen on Twitter

Networks and Identity Drive Geographic Properties of the Diffusion of Linguistic Innovation

Aparna Ananthasubramaniam, David Jurgens, Daniel M. Romero.

npj Complexity. 2024.

📄 paper

The pipeline for collecting data of traumatic events

The Language of Trauma: Modeling Traumatic Event Descriptions Across Domains with Explainable AI

Miriam Schirmer, Tobias Leemann, Gjergji Kasneci, Jürgen Pfeffer, and David Jurgens.

Findings of EMNLP. 2024.

📄 paper

LLM agents can simulate human trust behaviors

Can Large Language Model Agents Simulate Human Trust Behaviors?

Chengxing Xie, Canyu Chen, Feiran Jia, Ziyu Ye, Kai Shu, Adel Bibi, Ziniu Hu, David Jurgens, James Evans, Philip Torr, Bernard Ghanem, and Guohao Li

NeurIPS 2024

📄 paper

Communities respond differently to the same message depending on their underlying values

ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions

Chan Young Park, Shuyue Stella Li, Hayoung Jung, Svitlana Volkova, Tanushree Mitra, David Jurgens, and Yulia Tsvetkov.

Findings of EMNLP. 2024.

📄 paper 💾 data 💻 code

Tables are data too. Maybe they can be text as well!

Tab2Text - A framework for deep learning with tabular data

Tong Lin*, Jason Yan*, David Jurgens, and Sabina Tomkins.

Findings of EMNLP. 2024.

📄 paper

LLMs answer questions more or less accurately depending on the social roles in the question prompt

Is "A Helpful Assistant" the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts

Mingqian Zheng, Jiaxin Pei, Lajanugen Logeswaran, Moontae Lee, and David Jurgens.

Findings of EMNLP. 2024.

📄 paper 💾 data 💻 code

Human-AI Alignment is bidirectional

Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions.

Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, and David Jurgens.

preprint.

📄 paper

A Multilingual Similarity Dataset for News Article Frame

Xi Chen, Mattia Samory, Scott Hale, David Jurgens, Przemyslaw A Grabowicz

Proceedings of the International AAAI Conference on Web and Social Media (ICWSM).

📄 paper 💾 data

Large language models are bad at answering psychological questionnaires consistently

You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments

Bangzhao Shu*, Lechen Zhang*, Minje Choi, Lavinia Dunagan, Dallas Card, and David Jurgens.

Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics.

📄 paper 💾 data 💻 code

Memes are multimodal constructions where the base image template and additional text fills both have semantic value.

Social Meme-ing: Measuring Linguistic Variation in Memes

Naitian Zhou, David Jurgens, and David Bamman.

Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics.

📄 paper 💾 data 💻 code

The empathetic alignment between an author and responder on Reddit shows most people just give advice.

Modeling Empathetic Alignment in Conversation

Jiamin Yang and David Jurgens.

Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics.

📄 paper 💾 data 💻 code

Jiamin's amazing annotation tool: https://github.com/jessicayjm/span_alignment_annotation_tool

Strong influence connections in the global news network.

Global News Synchrony During the Start of the COVID-19 Pandemic

Xi Chen, Scott A. Hale, David Jurgens, Mattia Samory, Ethan Zuckerman, Przemyslaw Adam Grabowicz.

Proceedings of the 2024 Web Conference.

📄 paper 💾 data 💻 code

The network model for estimating contextual informativeness.

Finding Educationally Supportive Contexts for Vocabulary Learning with Attention-Based Models

Sungjin Nam, Kevyn Collins-Thompson, David Jurgens and Xin Tong.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING).

📄 paper

Characteristics of and variation in suicide mortality related to retirement during the Great Recession: perspectives from the National Violent Death Reporting System.

Aparna Ananthasubramaniam, David Jurgens, Eskira Kahsay, and Briana Mezuk.

The Gerontologist gnae015. 2024.

📄 paper

Zero-shot LLM performance on social language tasks

Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark

Minje Choi,* Jiaxin Pei,* Sagar Kumar, Chang Shu and David Jurgens.

Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2023.

📄 paper 💾 data 💻 code

Media storms over time with labels

When it Rains, it Pours: Modeling Media Storms and the News Ecosystem

Ben Litterer, David Jurgens, and Dallas Card.

Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2023.

📄 paper 💾 data 💻 code

The probability that, given an appropriate message for the relationships represented by a row, the message will also be appropriate in another relationship listed in the column. Probabilities are calculated across the entire data

Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships

David Jurgens,* Agrima Seth,* Jackson Sargent,† Athena Aghighi,† and Michael Geraci.†

Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). 2023.

📄 paper 💾 data 💻 code

Relative use of politeness strategies when annotators rewrite emails to be more polite

When Do Annotator Demographics Matter? Measuring The Influence of Annotator Demographics with the POPQUORN Dataset

Jiaxin Pei and David Jurgens.

Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII) at ACL. 2023.

📄 paper 💾 data 💻 code

The causal-estimated effect of banning on users matching the style of others

Exploring Linguistic Style Matching in Online Communities: The Role of Social Context and Conversation Dynamics

Aparna Ananthasubramaniam, Hong Chen, Jason Yan, Kenan Alkiek, Jiaxin Pei, Agrima Seth, Lavinia Dunagan, Minje Choi, Benjamin Litterer and David Jurgens.

Proceedings of the 1st Workshop on Social Influence in Conversations (SICon) at ACL. 2023.

📄 paper 💾 data 💻 code

Best Paper

Overall performance on each language. The box spans the lower quartile to the upper quartile, and the whiskers indicate the maximum and the minimum. Outliers are shown as dots. Participants generally achieve better performance on languages in the training set and also achieve good performance on Arabic and Dutch. Predicting intimacy in Hindi and Korean remains challenging. Moreover, performance on unseen languages generally shows larger variance.

SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis

Jiaxin Pei, Vítor Silva, Maarten Bos, Yozon Liu, Leonardo Neves, David Jurgens, and Francesco Barbieri.

Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval).

📄 paper 💾 data

The effects of personal shocks on people's social media activities

Analyzing the Engagement of Social Relationships During Life Event Shocks in Social Media

Minje Choi, David Jurgens, and Daniel Romero.

Proceedings of the International Conference on Web and Social Media (ICWSM). 2023.

📄 paper 💾 data 💻 code

The influence of multilingual individuals on social connectedness in Europe

Bridging Nations: Quantifying the Role of Multilinguals in Communication on Social Media

Julia Mendelsohn, Sayan Ghosh, David Jurgens, and Ceren Budak.

Proceedings of the International Conference on Web and Social Media (ICWSM). 2023.

📄 paper 💾 data 💻 code

Best Methodology Paper

Work Expectations, Depressive Symptoms, and Passive Suicidal Ideation Among Older Adults: Evidence From the Health and Retirement Study

Briana Mezuk, Linh Dang, David Jurgens, Jacqui Smith.

The Gerontologist 62 (10), 1454-1465 2022.

📄 paper

The way the press portrays certain scientific results differs by where those results were described in the paper

Modeling Information Change in Science Communication with Semantically Matched Paraphrases

Dustin Wright, Jiaxin Pei, David Jurgens, and Isabelle Augenstein.

Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2022.

📄 paper 💾 data 💻 code

Not all empathy papers use empathy in the same way

A Critical Reflection and Forward Perspective on Empathy and Natural Language Processing

Allison Claire Lahnala, Charles Welch, David Jurgens, and Lucie Flek.

Proceedings of the Findings of Empirical Methods in Natural Language Processing (EMNLP Findings). 2022.

📄 paper

POTATO: The Portable Text Annotation Tool

Jiaxin Pei, Aparna Kamakshi Ananthasubramaniam, Xingyao Wang, Naitian Zhou, Apostolos Dedeloudis, Jackson Sargent and David Jurgens.

Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP): Systems Demonstrations. 2022.

📄 paper 💻 code

Citation context sizes

MultiCite: Modeling realistic citations requires moving beyond the single-sentence single-label setting

Anne Lauscher, Brandon Ko, Bailey Kuhl, Sophie Johnson, Arman Cohan, David Jurgens, Kyle Lo.

Proceedings of the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2022.

📄 paper 💾 data 💻 code

The subtle language of exclusion: Identifying the Toxic Speech of Trans-exclusionary Radical Feminists

Christina Lu and David Jurgens.

Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH). 2022.

📄 paper 💾 data 💻 code

Correlations between the ways in which two news articles can be similar.

SemEval-2022 Task 8: Multilingual news article similarity

Xi Chen, Ali Zeynali, Chico Camargo, Fabian Flöck, Devin Gaffney, Przemyslaw Grabowicz, Scott Hale, David Jurgens, and Mattia Samory.

Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). 2022.

📄 paper 💾 data

The effect of curriculum ordering on word similarity tasks

An Attention-Based Model for Predicting Contextual Informativeness and Curriculum Learning Applications

Sungjin Nam, David Jurgens, and Kevyn Collins-Thompson.

in submission. 2022.

📄 paper

The effects of mentorship

Diversifying the Professoriate

Bas Hofstra, Daniel A. McFarland, Sanne Smith, David Jurgens.

Socius. 2022.

📄 paper

Similarities in Redditor political affiliations and commenting activity

Classification without (Proper) Representation: Political Heterogeneity in Social Media and Its Implications for Classification and Behavioral Analysis

Kenan Alkiek, Bohan Zhang, and David Jurgens.

ACL Findings. 2022.

📄 paper 💻 code

Multilingual performance on grapheme to phoneme conversion

ByT5 model for massively multilingual grapheme-to-phoneme conversion

Jian Zhu, Cong Zhang, and David Jurgens.

Interspeech 2022.

📄 paper 💻 code

Food healthiness ratings

Language in Popular American Culture Constructs the Meaning of Healthy and Unhealthy Eating: Narratives of Craveability, Excitement, and Social Connection in Movies, Television, Social Media, Recipes, and Food Reviews

Bradley P. Turnwald, Margaret A. Perry, David Jurgens, Vinodkumar Prabhakaran, Dan Jurafsky, Hazel R. Markus, Alia J. Crum.

Appetite. 2022.

📄 paper

Phone-to-audio alignment without text: A Semi-supervised Approach

Jian Zhu, Cong Zhang, and David Jurgens.

Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing.

📄 paper 💻 code

Modeling Framing in Immigration Discourse on Social Media

Julia Mendelsohn, Ceren Budak, David Jurgens

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2021.

📄 paper 💻 code

Latent classes of biased words and their effects on toxicity

Detecting Cross-Geographic Biases in Toxicity Modeling on Social Media

Sayan Ghosh, Dylan Baker, David Jurgens, and Vinodkumar Prabhakaran.

Proceedings of the 7th Workshop on Noisy User-generated Text (W-NUT).

📄 paper

Best Paper

Using Sociolinguistic Variables to Reveal Changing Attitudes Towards Sexuality and Gender

Sky Wang and David Jurgens.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)

📄 paper

Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles

Jian Zhu and David Jurgens.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)

📄 paper 💾 data 💻 code

Measuring Sentence-Level and Aspect-Level Certainty in Science Communications

Jiaxin Pei and David Jurgens.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)

📄 paper 💾 data 💻 code

Detecting Community Sensitive Norm Violations in Online Conversations

Chan Young Park, Julia Mendelsohn, Karthik Radhakrishnan, Kinjal Jain, Tushar Kanakagiri, David Jurgens and Yulia Tsvetkov.

Proceedings of the Findings of the 2021 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP)

📄 paper

An Animated Picture Says at Least a Thousand Words: Selecting Gif-based Replies in Multimodal Dialog.

Xingyao Wang and David Jurgens.

Proceedings of the Findings of the 2021 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP)

📄 paper 💾 data 💻 code

Slack gif-bot App: https://github.com/xingyaoww/gif-reply-slack-bot

Driving cessation pipeline

A Data Science Approach to Estimating the Frequency of Driving Cessation Associated Suicide in the US: Evidence From the National Violent Death Reporting System

Tomohiro M. Ko, Viktoryia A. Kalesnikava, David Jurgens, and Briana Mezuk.

Frontiers in Public Health

📄 paper

Teaching is serious business

Learning PyTorch Through A Neural Dependency Parsing Exercise

David Jurgens.

Proceedings of the Fifth Workshop on Teaching NLP, 2021.

📄 paper

Teaching is serious business

Learning about Word Vector Representations and Deep Learning through Implementing Word2vec

David Jurgens.

Proceedings of the Fifth Workshop on Teaching NLP, 2021.

📄 paper

Author mentions in science news reveal widespread disparities across name‐inferred ethnicities.

Hao Peng, Misha Teplitskiy, David Jurgens.

Journal of Quantitative Social Sciences.

📄 paper

(preprint)

Quantifying Intimacy In Language

Jiaxin Pei and David Jurgens.

Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.

📄 paper 💻 code

project webpage: https://blablablab.si.umich.edu/projects/intimacy/; pip-installable package: https://pypi.org/project/question-intimacy/

Condolence and Empathy in Online Communities

Naitian Zhou and David Jurgens.

Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.

📄 paper

project webpage: https://blablablab.si.umich.edu/projects/condolence/

Still out there: Modeling and Identifying Russian Troll Accounts on Twitter.

Jane Im, Eshwar Chandrasekharan, Jackson Sargent, Paige Lighthammer, Taylor Denby, Ankit Bhargava, Libby Hemphill, David Jurgens, Eric Gilbert.

Proceedings of Web Science, 2020.

📄 paper

Best Paper Runner-Up

Measuring the predictability of life outcomes with a scientific mass collaboration.

Matthew J. Salganik, Ian Lundberg, Alexander T. Kindel, Caitlin E. Ahearn, Khaled Al-Ghoneim, Abdullah Almaatouq, Drew M. Altschul, Jennie E. Brand, Nicole Bohme Carnegie, Ryan James Compton, Debanjan Datta, Thomas Davidson, Anna Filippova, Connor Gilroy, Brian J. Goode, Eaman Jahani, Ridhi Kashyap, Antje Kirchner, Stephen McKay, Allison C. Morgan, Alex “Sandy” Pentland, Kivan Polimis, Louis Raes, Daniel E. Rigobon, Claudia V. Roberts, Diana M. Stanescu, Yoshihiko Suhara, Adaner Usmani, Erik H. Wang, Muna Adem, Abdulla Alhajri, Bedoor AlShebli, Redwane Amin, Ryan B. Amos, Lisa P. Argyle, Livia Baer-Bositis, Moritz Büchi, Bo-Ryehn Chung, William Eggert, Gregory Faletto, Zhilin Fan, Jeremy Freese, Tejomay Gadgil, Josh Gagné, Yue Gao, Andrew Halpern-Manners, Sonia P. Hashim, Sonia A. Hausen, Guanhua He, Kimberly Higuera, Bernie Hogan, Ilana M. Horwitz, Lisa M. Hummel, Naman Jain, Kun Jin, David Jurgens, Patrick C. Kaminski, Areg Karapetyan, E. H. Kim, Ben Leizman, Naijia Liu, Malte Möser, Andrew E. Mack, Mayank Mahajan, Noah Mandell, Helge-Johannes Marahrens, Diana Mercado-Garcia, Viola Mocz, Katariina Mueller-Gastell, Ahmed Musse, Qiankun Niu, William P. Nowak, Hamidreza Omidvar, Andrew Or, Karen Ouyang, Katy M. Pinto, Ethan Porter, Kristin E. Porter, Crystal Qian, Tamkinat Rauf, Anahit Sargsyan, Thomas Schaffner, Landon Schnabel, Bryan Schonfeld, Ben Sender, Jonathan D. Tang, Emma Tsurkov, Austin van Loon, Onur Varol, Xiafei Wang, Zhi Wang, Julia Wang, Flora Wang, Samantha Weissman, Kirstie Whitaker, Maria K Wolters, Wei Lee Woon, James Wu, Catherine Wu, Kengran Yang, Jingwen Yin, Bingyu Zhao, Chenyun Zhu, Jeanne Brooks-Gunn, Barbara E. Engelhardt, Moritz Hardt, Dean Knox, Karen Levy, Arvind Narayanan, Brandon M. Stewart, Duncan J. Watts, and Sara McLanahan.

Proceedings of the National Academy of Sciences. Mar 2020, 201915006; DOI: 10.1073/pnas.1915006117

📄 paper

Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts

Luke Breitfeller, Emily Ahn, David Jurgens, and Yulia Tsvetkov.

Proceedings of 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019.

📄 paper

Perceptions of social roles across cultures.

Meixing Dong, David Jurgens, Carmen Banea and Rada Mihalcea.

Proceedings of Social Informatics (SocInfo), 2019.

📄 paper

Nominated for Best Paper

Suicide Among Older Adults Living in or Transitioning to Residential Long-term Care, 2003 to 2015

Briana Mezuk, Tomohiro M. Ko, Viktoryia A. Kalesnikava, and David Jurgens.

JAMA Network Open 2019;2(6):e195627

📄 paper

Wetin dey with these comments? Modeling Sociolinguistic Factors Affecting Code-switching Behavior in Nigerian Online Discussions

Innocent Ndubuisi-Obi*, Sayan Ghosh*, David Jurgens.

Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2019

📄 paper

The spectrum of abusive behaviors

A Just and Comprehensive Strategy for Using NLP to Address Online Abuse

David Jurgens, Libby Hemphill and Eshwar Chandrasekharan.

Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2019

📄 paper

Caste attitudes

Smart, Responsible, and Upper Caste Only: Measuring Caste Attitudes through Large-Scale Analysis of Matrimonial Profiles

Ashwin Rajadesingan, Ramaswami Mahalingam, David Jurgens.

Proceedings of the AAAI International Conference on Web and Social Media (ICWSM), 2019

📄 paper

Best Paper Award

Population inference

Demographic Inference and Representative Population Estimates from Multilingual Social Media Data.

Zijian Wang, Scott Hale, David Ifeoluwa Adelani, Przemyslaw Grabowicz, Timo Hartmann, Fabian Flöck and David Jurgens*.

Proceedings of the Web Conference, 2019

📄 paper 💻 code

*Corresponding senior author; demo: http://www.euagendas.org/m3demo/

Group success

Are All Successful Communities Alike? Characterizing and Predicting the Success of Online Communities.

Tiago Cunha, David Jurgens, Chenhao Tan and Daniel Romero.

Proceedings of the Web Conference, 2019

📄 paper

It's going to be okay: Measuring Access to Support in Online Communities.

Zijian Wang and David Jurgens.

Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018

📄 paper 💾 data 💻 code

supplementary: http://anthology.aclweb.org/attachments/D/D18/D18-1004.Attachment.pdf

RtGender: A Corpus of Responses to Gender for Studying Gender Bias.

Rob Voigt, David Jurgens, Vinodkumar Prabhakaran, Dan Jurafsky, and Yulia Tsvetkov.

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC), 2018

📄 paper 💾 data

Measuring the Evolution of a Scientific Field through Citation Frames.

David Jurgens, Srijan Kumar, Raine Hoover, Dan McFarland, Dan Jurafsky.

Transactions of the Association for Computational Linguistics (TACL). 2018.

📄 paper 💾 data 💻 code

An Analysis of Individuals' Behavior Change in Online Groups.

David Jurgens, James McCorriston, and Derek Ruths.

Proceedings of the 9th International Conference on Social Informatics (SocInfo). 2017.

📄 paper

preprint

Writer Profiling Without the Writer's Text.

David Jurgens, Yulia Tsvetkov, and Dan Jurafsky.

Proceedings of the 9th International Conference on Social Informatics (SocInfo). 2017.

📄 paper

preprint

Language from Police Body Camera Footage Shows Racial Disparities in Officer Respect.

Rob Voigt, Nicholas P. Camp, Vinod Prabhakaran, William L. Hamilton, Rebecca C. Hetey, Camilla M. Griffiths, David Jurgens, Dan Jurafsky, and Jennifer L. Eberhardt.

Proceedings of the National Academy of Sciences (PNAS). 2017.

📄 paper

Incorporating Dialectal Variability for Socially Equitable Language Identification.

David Jurgens, Yulia Tsvetkov, Dan Jurafsky.

Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2017.

📄 paper 💻 code

slides: docs/jurgens-tsvetkov-jurafsky.acl2017.slides.pdf

User Migration in Online Social Networks: A Case Study on Reddit During A Period of Community Unrest.

Edward Newell*, David Jurgens*, Hardik Vala, Jad Sassine, Caitrin Armstrong, Derek Ruths, and Haji Mohammad Saleem.

Proceedings of the 10th International AAAI Conference on Web and Social Media (ICWSM). 2016

📄 paper

Annotating Characters in Literary Corpora: A Scheme, the CHARLES Tool, and an Annotated Novel.

Hardik Vala, Stefan Dimitrov, David Jurgens, Andrew Piper, and Derek Ruths.

Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC). 2016.

📄 paper

Semi-supervised Learning with Induced Word Senses for State of the Art Word Sense Disambiguation.

Osman Baskaya and David Jurgens.

Journal of Artificial Intelligence Research (JAIR). 55(1) pp. 1025-1058.

📄 paper

SemEval-2016 Task 14: Semantic Taxonomy Enrichment

David Jurgens and Mohammad Taher Pilehvar.

Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval). 2016.

📄 paper

website: http://alt.qcri.org/semeval2016/task14/

Mr. Bennet, his coachman, and the Archbishop walk into a bar but only one of them gets recognized: On The Difficulty of Detecting Characters in Literary Texts.

Hardik Vala, David Jurgens, Andrew Piper, and Derek Ruths.

Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2015.

📄 paper 💾 data

Evaluating learning language representations.

J. Karlgren, J. Callin, K. Collins-Thompson, A.C. Gyllensten, A. Ekgren, D. Jurgens, A. Korhonen, F. Olsson, M. Sahlgren, and H. Schütze.

Proceedings of Conference and Labs of Evaluation Forum (CLEF). 2015.

📄 paper

Reading Between the Lines: Overcoming Data Sparsity for Accurate Classification of Lexical Relationships.

Silvia Necsulescu, Sara Mendes, David Jurgens, Núria Bel, and Roberto Navigli.

Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics (*SEM). 2015.

📄 paper

Everyone's Invited: A New Paradigm For Evaluation on Non-transferable Datasets.

David Jurgens, Tyler Finethy, Caitrin Armstrong, and Derek Ruths.

Proceedings of the ICWSM Workshop on Standards and Practices in Large-Scale Social Media Research. 2015.

📄 paper 💻 code 💻 code

FREESR website: http://freesr.networkdynamics.org/; project website: http://networkdynamics.org/resources/geoinference/

Geolocation Prediction in Twitter Using Social Networks: A Critical Analysis and Review of Current Practice.

David Jurgens, Tyler Finethy, James McCorriston, Yi Tian Xu, and Derek Ruths.

Proceedings of the 9th International AAAI Conference on Web and Social Media (ICWSM). 2015

📄 paper 💻 code

poster: docs/jurgens-et-al_icwsm-2015_poster.pdf; website: http://networkdynamics.org/resources/geoinference/

An Analysis of Exercising Behavior in Online Populations.

David Jurgens, James McCorriston, and Derek Ruths.

Proceedings of the 9th International AAAI Conference on Web and Social Media (ICWSM). 2015

📄 paper

poster: docs/jurgens-mccorriston-ruths_icwsm-2015_poster.pdf; website: http://networkdynamics.org/resources/exercise/

Organizations are Users Too: Characterizing and Detecting the Presence of Organizations on Twitter.

James McCorriston, David Jurgens, and Derek Ruths.

Proceedings of the 9th International AAAI Conference on Web and Social Media (ICWSM). 2015

📄 paper 💾 data 💻 code

website: http://networkdynamics.org/resources/software/humanizr/

Cross Level Semantic Similarity: An Evaluation Framework for Universal Measures of Similarity.

David Jurgens, Mohammad Taher Pilehvar, and Roberto Navigli.

Journal of Language Resources and Evaluation. 50(1) pp. 5-30.

📄 paper

preprint

It's All Fun and Games until Someone Annotates: Video Games with a Purpose for Linguistic Annotation.

David Jurgens and Roberto Navigli.

Transactions of the Association for Computational Linguistics (TACL) 2014.

📄 paper

slides: docs/jurgens-navigli-2014-tacl.pdf (pdf), docs/jurgens-navigli-2014-tacl.pptx (pptx); games!: http://www.knowledgeforge.org/

Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization.

Ryan Compton, David Jurgens, and David Allan.

Proceedings of the IEEE International Conference on Big Data. 2014.

📄 paper

Press: Forbes, MIT Technology Review, Business Insider, Daily Caller, Schneier on Security

Twitter users #CodeSwitch hashtags! #MoltoImportante #wow #헐.

David Jurgens, Stefan Dimitrov, and Derek Ruths.

Proceedings of The First Workshop on Computational Approaches to Code Switching. 2014.

📄 paper

blog post: http://networkdynamics.org/2015/04/09/code-switching-in-twitter-wow-tresinteressant/

SemEval-2014 Task 3: Cross-Level Semantic Similarity

David Jurgens, Mohammad Taher Pilehvar, and Roberto Navigli.

Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval) 2014.

📄 paper

slides: http://www.pilevar.com/taher/pubs/Semeval_2014_Jurgensetal.slides.pdf; website: http://alt.qcri.org/semeval2014/task3/

Validating and Extending Semantic Knowledge Bases using Video Games with a Purpose.

Daniele Vannella, David Jurgens, Daniele Scarfini, Domenico Toscani, and Roberto Navigli.

Proceedings of the Annual Meeting for the Association for Computational Linguistics (ACL) 2014.

📄 paper

poster: docs/acl-2014-poster.pdf; games!: http://www.knowledgeforge.org/

An analysis of ambiguity in word sense annotations.

David Jurgens.

Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC) 2014.

📄 paper

Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity.

Mohammad T. Pilehvar, David Jurgens, and Roberto Navigli.

Proceedings of the Annual Meeting for the Association for Computational Linguistics (ACL) 2013.

📄 paper 💻 code

Best paper nominee

slides: docs/acl-2013-slides.pdf

That's what friends are for: Inferring location in online communities based on social relationships.

David Jurgens.

Proceedings of the 7th International AAAI Conference on Weblogs and Social Media (ICWSM) 2013.

📄 paper

slides

David Jurgens is an associate professor jointly in the School of Information and the Department of Electrical Engineering and Computer Science at the University of Michigan. He holds a PhD from the University of California, Los Angeles and was a postdoctoral scholar in the Department of Computer Science at Stanford University and, before that, at McGill University. His research combines natural language processing, social psychology, and data science to discover, explain, and predict human behavior in large social systems. His research has been published in top computational social science and natural language processing venues including PNAS, WWW, ACL, ICWSM, EMNLP, and others. His work has won the Cozzarelli Prize from the National Academy of Sciences, the Cialdini Prize from the Society for Personality and Social Psychology, best paper at ICWSM and W-NUT, best paper nomination at ACL and Web Science, and has been featured in news outlets such as the BBC, Time, MIT Technology Review, New Scientist, and Forbes.

How he got there: Before joining UMSI, David was a postdoctoral scholar, jointly in the Stanford NLP and SNAP Groups under Dan Jurafsky, Jure Leskovec, and Dan McFarland. Before that, he ventured beyond the wall to the cold regions of Montreal (don't let the idyllic summers fool you) and was a postdoctoral scholar at McGill University in the Network Dynamics group with Derek Ruths. Before finishing his PhD, he was a research scientist at the Linguistics Computing Laboratory at Sapienza University of Rome under Roberto Navigli. During his PhD, he was concurrently a visiting researcher at the Information and Systems Science Lab at HRL Laboratories. After trips abroad and to Malibu, he received his PhD in Computer Science from the University of California, Los Angeles under Michael Dyer. Early in his career, before he discovered you could study language and people, he received his BA in Philosophy and Political Science and an MS in Computer Science on Computer Vision under Robert Pless from Washington University in St. Louis.

Current

Winter 2024, Fall 2025: EECS 595 -- Natural Language Processing

Fall 2025: SI 670 -- Applied Machine Learning

Past

Fall 2017, 2018: SI 671 -- Data Mining: Methods and Applications

Winter 2017—present: SI 630 -- Natural Language Processing: Algorithms and People

Fall 2020—2023: SI 650 / EECS 549 -- Information Retrieval

Winter 2023: SI 330 -- Data Manipulation

Fall 2019: SI 710 -- PhD Seminar: Computational Sociolinguistics
A new course! On computational sociolinguistics! With actual computation and actual sociolinguistics! If any of that excites you, drop me an email and I can share more details.

Being a faculty member involves too many things to reasonably keep track of, so this page is just left out of date in favor of staying sane. For folks who are still very interested, please see my CV for a more up-to-date list.

Current

Co-editor of a Frontiers special issue on Computational Sociolinguistics. The journal has a rolling deadline so feel free to submit here, or just read the papers as they appear.
Area chair for Social Media for ACL and EMNLP
Sponsorship chair for ICWSM 2019.
Senior PC for WWW 2020 (Web & Society)
Co-chair of the International Workshop on NLP and Computational Social Science at ACL-2017 with Dirk Hovy, David Bamman, Oren Tsur, and Svitlana Volkova.

Past

Data Chair for ICWSM 2017
Co-chair of the International Workshop on NLP and Computational Social Science at ACL-2017 with Dirk Hovy, David Bamman, A. Seza Dogruoz, Brendan O'Connor, Oren Tsur, and Svitlana Volkova.
Co-chair of the International Workshops on NLP and Computational Social Science at EMNLP-2016 and at WebSci-2016 with Dirk Hovy, David Bamman, A. Seza Dogruoz, Jacob Eisenstein, Brendan O'Connor, Alice Oh, Oren Tsur, and Svitlana Volkova
General reviewing habits: Program Committee at various (most) times for WWW, ICWSM, NAACL, ACL, EMNLP, CSCW, LREC, EACL; External Reviewer for CHI (rarely).
Co-chair of the International Workshop on Semantic Evaluation (SemEval) in 2015 with Daniel Cer, Preslav Nakov, and Torsten Zesch and in 2016 with Daniel Cer, Marine Carpuat, and Steven Bethard
Co-organizing SemEval-2016 Task 14: Semantic Taxonomy Enrichment with Mohammad Taher Pilehvar.
Co-presented the tutorial "Semantic Similarity Frontiers: From Concepts to Documents" with Mohammad Taher Pilehvar at EMNLP-2015 -- over 150 registered attendees!
Check out our annotated bibliography of recent semantic similarity papers on GitHub.
Co-presented the tutorial "Multilingual Semantic Processing" with Roberto Navigli at LREC-2014
Co-organizer of SemEval-2014 Task 3: Cross-Level Semantic Similarity with Mohammad Taher Pilehvar and Roberto Navigli.
Co-organizer of SemEval-2013 Task 12: Multilingual Word Sense Disambiguation with Roberto Navigli and Daniele Vannella
Co-organizer of SemEval-2013 Task 13: Word Sense Induction for Graded and Non-Graded Senses with Ioannis Klapaftis
Co-organizer of SemEval-2012 Task 2: Measuring Degrees of Relational Similarity with Saif M. Mohammad, Peter D. Turney, and Keith J. Holyoak
Journal Reviewer: Artificial Intelligence, Behavior Research Methods, International Journal of Corpus Linguistics, Language Resources and Evaluation, Natural Language Engineering, PLOS ONE, Transactions on Knowledge Discovery from Data (TKDD)

Broadly, I conduct research in the area of natural language processing. My current research focuses on these central themes:

Social Reasoning: Social settings are complex. We study how people reason about social situations, and how language and behavior change in social contexts. Our work is grounded in social and cognitive psychology, and develops new computational methods to study language and mental models.
Human-AI Collaboration in Evaluation: Most NLP models are designed to do one or more tasks. To train or assess how good those models are, we need some kind of ground truth to evaluate. Creating this ground truth can be very challenging! Our work examines how and when we can use humans and AI systems together and individually to better evaluate NLP models for even the most complex tasks.
Information Ecosystems: The interconnected and rapid nature of news and social media means that people can get new information almost anywhere, anytime. How does this news spread and who does it reach, especially as it crosses social, linguistic, or medium boundaries? Our work studies whole ecosystems of how the language of information changes and the social process by which it emerges and evolves.

My long-term research goal combines human and language technologies to create social understanding that reflects both the content and people involved in communication. In all my research, I strive to improve social equality by representing all people participating in these social systems.

Masters and Undergraduate Students

For current students, due to the technical work that we do in the lab, I typically require students to have taken some class on NLP or advanced Machine Learning to give them the requisite skills. Without those classes, we end up teaching you many of the same techniques in a less principled way, which takes more time. If you feel like you already have significant research experience (just not in NLP or ML), please explain what you've done. Students are typically expected to join our group's research meetings. During the school year, we'll typically have one meeting a week that is also with the project's co-supervisor (a PhD student or postdoc). Interested students should apply through this form.

PhD students

I admit roughly one PhD student per year. Sometimes students are co-admitted or co-advised, so the number of admits can vary.

For prospective PhDs, I especially like students who come with a strong computational background and some experience in social science. There are no set criteria, but your chances of admission are much better if you've contacted me (or your advisor has) and let me know your interests and goals. Make sure to look over these pages carefully; the match should be pretty strong. A PhD student is very costly - in time and money - and I select students for my research group carefully.

If you're a current PhD student outside of CSE or SI, I'm open to collaborations. One of the best things about SI and CSE is their interdisciplinary environments, and I'm potentially open to hosting students from outside my home departments (but inside UM) in the lab or co-advising on projects where it makes sense. Regardless, I'd love to hear from you, and you're always welcome to come take my classes.

PhD Students not at the University of Michigan

If you're a PhD student somewhere else and want to work with me (while being external), this could happen under the right circumstances. Typically, your advisor at your primary institution and I would co-advise you on a specific project. I typically only do these kinds of arrangements when I know your advisor (more common) or when the collaborative project makes sense (rare). To get this started, have your advisor email me (not you directly) about what the project is.

Non-PhD External Students not at the University of Michigan

I unfortunately rarely work with high schoolers, undergraduates, or master's students who are not physically at the University of Michigan. I still get emails from external students asking if we could work together on something remotely and I really would love to, but my priority is to advise the current students at UM given the limited bandwidth I have for advising. Your best bet to work with me is to get admitted to one of our programs and then drop me an email.

Postdocs

I would love to have you all in my lab, but this is generally dependent on funding (but seriously, I would take you all if I could). Email me if you think you're a good match and tell me why, and we might be able to figure something out. That said, I'm not actively seeking postdocs at the moment (due to funding, of course). If you're coming with your own funding, that changes everything, so drop me a line then.

Q: How do I pronounce your last name?
A: Like you would in the old country

Q: Which old country is that?
A: 🤷

Q: Can I research with you or be a member of the Blablablab?
A: For an overly-detailed answer, click the "Prospective students" tab thingie above. That should cover everything.