AI in between online and offline discourse - and what has ChatGPT to do with all of that?

AI in between online and offline discourse - and
what is the role of ChatGPT in all of that?
Denkreise, Universität Bonn
Stefan Dietze, 10.05.2023

2
Discourse Interactions
Algorithms/AI
Discourse, interactions, AI: complex interdependencies
Society, Media, Politics & Policies
(Offline & Online)

3
AI-based methods to understand & influence online
discourse & interactions (PART I)
▪ Recommendation & ranking in Web search (e.g.
Google), shopping or social media
▪ Social media feeds & timelines
▪ Fact checking, detection of misinformation (e.g.
news & social media)
▪ Classification of users (e.g. into political
leanings, experts/lay people etc)
▪ Hate-speech detection
▪ Stance & sentiment detection
▪ …
AI-based methods trained on & influencing data
from discourse & interactions
Algorithms/AI

4
Contemporary and future AI challenges (PART II)
▪ Self-reinforcing cycle between AI &
online/offline behaviour: AI influenced by /
trained on data from discourse/interactions
that are in turn influenced by AI and so on
▪ Biases (in models & behaviour) => echo
chambers, filter bubbles etc
▪ Transparency, reproducibility, decay
▪ ….
Algorithms/AI

5
AI for mining claims and stances in online discourse
stance,
claim trustworthiness?
stance,
claim trustworthiness?

6
Detecting stances towards claims/opinions in online discourse
Motivation
▪ Problem: detecting stance of documents (e.g. Web pages, scientific
publication) towards a given claim (unbalanced class distribution)
▪ Motivation: stance of documents (in particular disagreement) useful
(a) as signal for truthfulness (fake news detection) and (b) document or
source classification (PLDs, publishers)
Approach
▪ Cascading binary classifiers: addressing individual issues (e.g.
misclassification costs) per step
▪ Features, e.g. textual similarity (Word2Vec etc), sentiments, LIWC, etc.
▪ Best-performing models: 1) SVM with class-wise penalty, 2) CNN, 3)
SVM with class-wise penalty
▪ Experiments on FNC-1 dataset (and FNC baselines)
Results
▪ Minor overall performance improvement
▪ Improvement on disagree class by 27%
(but still far from robust)
A. Roy, A. Ekbal, S. Dietze, P. Fafalios, Exploiting stance hierarchies for cost-sensitive stance detection of
Web documents, IJIS2022

7
ClaimsKG: a knowledge graph of fact-checked claims (to feed AI)
Motivation
▪ Fact-checked claims spread across various
(unstructured) fact-checking sites
▪ Example: finding claims about / made by US
republican politicians across the Web?
Approach
▪ Harvesting claims & metadata from fact-
checking/news sites (e.g. snopes.com,
Politifact.com etc); currently approx. 70.000
claims
▪ Information extraction & linking
o Linking mentioned entities eg POTUS and
Trump to „wikipedia:DonaldTrump“
o Normalisation of ratings (true, false,
mixture, other); coreference resolution of
claims
o Exposing data through established
vocabularies and W3C standards and APIs
https://data.gesis.org/claimskg/
A. Tchechmedjiev, P. Fafalios, K. Boland, S. Dietze, B. Zapilko, K. Todorov,
ClaimsKG – A Live Knowledge Graph of fact-checked Claims, ISWC2019

8
ClaimsKG as training corpus for AI models for fact-checking
▪ Ground truth data for AI-based methods for fact-
checking
▪ Initial step for any fact-checker when „check-worthy
claims“ appear: „Claim Retrieval“ i.e. identifying if an
arbitrary statement (eg from a news article) has
already been fact-checked (ClaimsKG as knowledge
base)
▪ Shared AI task: „CLEF2022-CheckThat! Lab“
▪ Unsupervised approach „SimBA“ achieving state-of-
the-art performance (P@1 = 94%)
Hövelmeyer, A., Boland, K., Dietze, S., SimBa at CheckThat! 2022: Lexical
and Semantic Similarity Based Detection of Verified Claims in an
Unsupervised and Supervised Way, CLEF Working Notes 2022, CLEF 2022-
Conference and Labs of the Evaluation Forum.

http://dbpedia.org/resource/Tim_Berners-Lee
wna:positive-emotion
onyx:Intensity "0.75"
onyx:Intensity "0.0"
http://dbpedia.org/resource/Solid
wna:negative-emotion
AI for mining opinions: the case of Twitter
Challenges
▪ Heterogenity: multimodal, multilingual, informal,
“noisy” language
▪ Context dependence: interpretation of
tweets/posts (entities, sentiments) requires
consideration of context (e.g. time, linked content),
“Dusseldorf” => City or Football team
▪ Dynamics & scale: e.g. 6000 tweets per second,
plus interactions (retweets etc) and context (e.g.
25% of tweets contain URLs)
▪ Evolution and temporal aspects: evolution of
interactions over time crucial for many social
sciences questions
▪ Representativity and bias: demographic
distributions not known a priori in archived data
collections

TweetsKB – a large-scale research corpus of societal opinions
▪ Harvesting & archiving of 14 Billion tweets
(permanent collection from Twitter 1% sample since
2013)
▪ Information extraction pipeline to build a KG of
entities, interactions & sentiments
(distributed Map/Reduce batch processing)
o Entity linking with knowledge graph/DBpedia
(“president”/“potus”/”trump” =>
dbp:DonaldTrump)
o Sentiment analysis/annotation
o Geotagging
o Lifting into knowledge graph schema
Dimitrov, D., Baran, E., Fafalios, P., Yu, R., Zhu, X., Zloch, M., Dietze, S., TweetsCOV19 – A Knowledge Base of Semantically Annotated Tweets about the
COVID-19 Pandemic, CIKM2020
https://data.gesis.org/tweetskb

https://data.gesis.org/tweetskb
▪ Harvesting & archiving of 14 Billion tweets
(permanent collection from Twitter 1% sample since
2013)
▪ Information extraction pipeline to build a KG of
entities, interactions & sentiments
(distributed Map/Reduce batch processing)
o Entity linking with knowledge graph/DBpedia
(“president”/“potus”/”trump” =>
dbp:DonaldTrump)
o Sentiment analysis/annotation
o Geotagging
o Lifting into knowledge graph schema
▪ Public, privacy-aware, large-scale research corpus
of public opinions and their evolution
=> interdisciplinary research
Dimitrov, D., Baran, E., Fafalios, P., Yu, R., Zhu, X., Zloch, M., Dietze, S., TweetsCOV19 – A Knowledge Base of Semantically Annotated Tweets about the
COVID-19 Pandemic, CIKM2020
TweetsKB – a large-scale research corpus of societal opinions

Social science research using TweetsKB
https://dd4p.gesis.org
Investigating Vaccine Hesitancy in DACH countries

Germany suspends
vaccinations with Astra
Zeneca
Twitter discourse zu “Impfbereitschaft”
Investigating Vaccine Hesitancy in DACH countries

Vaccine Hesitancy– key topics in “safety” category
„Schwangerschaft“ „Kimmich“
„Alter“
„Nebenwirkungen“
„Herzinfarkt“
„Zulassung“

17
▪ AI4Sci project: understanding and classification of science discourse online (news, social Web)
▪ Percentage of tweets containing
links to scientific articles (journals,
publishers, science blogs etc)
▪ Uses list of > 30 K science web
domains
▪ Data source: TweetsKB
(https://data.gesis.org/tweetskb/),
> 10 bn tweets archived since 2013
https://ai4sci-project.org/
How about scientific discourse on the Web?
Example: Twitter

How about scientific discourse on the Web?
Example: Twitter
18
Science claim
Science reference
Science relevance
No science
Science reference
Hafid, S., Schellhammer, S., Bringay, S., Todorov, K., Dietze, S., "SciTweets - A Dataset and Annotation Framework for Detecting Scientific Online
Discourse", CIKM2022

SciTweets dataset & classifier
20
▪ Ground truth dataset, heuristics-based sampling
strategy and annotation framework for testing
classification models
▪ 1261 expert-labeled tweets across all
classes/labels
▪ Baseline classifiers based on SciBERT transformer
model (fine-tuned/tested on SciTweets)
▪ Ongoing: analysis of large-scale science discourse
and its evolution
Hafid, S., Schellhammer, S., Bringay, S., Todorov, K., Dietze, S., SciTweets - A Dataset and Annotation Framework for Detecting Scientific Online Discourse,
CIKM2022

21
But: do we actually need dedicated NLP methods after ChatGPT?
• ChatGPT does not outperform specialized
models in most NLP tasks
• ChatGPT’s impressiveness stems from its
capacity to do many things reasonably
well
• ChatGPT does not mark the end of AI/NLP
research
Source: http://opensamizdat.com/posts/chatgpt_survey/

24
Contemporary and future AI challenges (PART II)
▪ Self-reinforcing cycle between AI &
online/offline behaviour: AI influenced by /
trained on data from discourse/interactions
that are in turn influenced by AI and so on
▪ Biases (in models & behaviour) => echo
chambers, filter bubbles etc
▪ Transparency, reproducibility, decay
▪ ….
Algorithms/AI

25
What‘s the role of ChatGPT et al
in all this?
• ChatGPT didn’t come out of nowhere but is a
continuation of trends throughout the last decades
• Mostly powered by progress in (deep) representation
& transfer learning

26
What is driving the recent AI advances?
Deep & representation learning for language understanding
Percentage of deep learning papers in major NLP conferences
(Source: Young et al., Recent Trends in Deep Learning Based Natural Language Processing)
• Pretraining of embeddings: predicting
low-dimensional vector representations
of words & text, e.g. Word2Vec
[Mikolov et al., 2013]
• RNN/CNN architectures in
encoder/decoder settings (e.g. for
machine translation) [Vaswani et al.,
2017]
• Pretraining language models for task-
specific transfer learning, e.g., BERT -
Bidirectional Encoder Representations
from Transformers [Devlin et al., 2018]
T. Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality, NIPS (2013)
J. Devlin et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
A. Vaswani et al. Attention is all you need, NIPS (2017)

27
Transfer learning for language understanding
Goal: Language Model (“reusable language
representation”)
(Paris:France, Tokyo:Japan)
Approach: un-/self-supervised Machine Learning
(e.g. „Masked Language Modeling“)
Data: the more the merrier, e.g. Wikipedia, Google Book
Corpus (200 Bn words), Common Crawl, Google crawl of
indexed Web pages etc
Data cheap, computation costly
Goal: dedicated model for downstream task
(e.g. dialog system, Tweet classification => Tweet X:
RightWing)
Approach: supervised machine learning, reinforcement
learning
Data: labelled data
(e.g. expert annotated tweets, weak labels)
Data costly, computation cheap
Pretraining of foundational language models Finetuning/supervised ML („the old ways“)

28
Transfer learning for language understanding
Goal: Language Model (“reusable language
representation”)
(Paris:France, Tokyo:Japan)
Approach: un-/self-supervised Machine Learning
(e.g. „Masked Language Modeling“)
Data: the more the merrier, e.g. Wikipedia, Google Book
Corpus (200 Bn words), Common Crawl, Google crawl of
indexed Web pages etc
Data cheap, computation costly
Goal: dedicated model for downstream task
(e.g. dialog system, Tweet classification => Tweet X:
RightWing)
Approach: supervised machine learning, reinforcement
learning
Data: labelled data
(e.g. expert annotated tweets, weak labels)
Data costly, computation cheap
Pretraining of foundational language models Finetuning/supervised ML („the old ways“)
• Language understanding capabilities of
models established during pretraining
stage
• Paradigm shift in NLP / NLU due to
transferable foundational models
• Little/no finetuning required for specific
down-stream tasks („Zero-Shot Learning“)
• Requires no costly data annotation but can
be fed with massive volumes of text
• „The larger the better“
(GPT3 = 175 Bn parameters)

29
AI progress & societal impact/challenges
▪ Stopping the „arms race“? The bigger the
better?
▪ Concentration of power: platform providers
(gatekeeper to data), AI companies (gatekeeper
to models, infrastructure)
▪ Environmental cost (Bender et al., On the Dangers of
Stochastic Parrots: Can Language Models Be Too Big?)
▪ Misinformation from generative models (e.g.
deep fakes)
▪ But it‘s not only about AI: data, infrastructure,
information has been controlled by large
commercial platform providers for quite some
time

30
Crucial AI challenges that we/AI research is trying to tackle
▪ Transparency of AI models
▪ Access to data, infrastructure, models
▪ Reproducibility & explainability
▪ Bias of data & AI models
▪ Decay of AI models
▪ …

31
AI transparency
Source: https://motherboard.vice.com/en_us/article/j5npeg/why-is-google-translate-spitting-out-sinister-religious-prophecies/
• Neural machine translation as black box system => hard to pin down causes of „learned“ translations
• Potential reasons: adversarial data fed into NN; NN learned to generate syntactically well formed
sentences/statements even on nonsensical input

32
AI transparency: how to evaluate AI if it outperforms humans?
• Saturation of NLP/image processing
benchmarks over time. Initial
performance at -1, human performance
at 0.
• Challenges for performance evaluation
• Creating novel kinds of AI benchmarks
• Note: humans still outperform AI in
many (less specialized) tasks/respects,
e.g. “common-sense reasoning”
Kiela, D. et al., Dynabench: Rethinking Benchmarking in NLP, NAACL2021

33
AI decay: evolving language and discourse context
▪ LMs trained on large volumes of text
▪ Yet: strong vocabulary shift, over-
/underrepresentation of topics/vocabulary in
particular time periods (e.g. Twitter COVID19-
discourse 2020 vs 2019)
▪ AI models for online discourse analysis
require frequent training and updates
(access to data/infrastructure?)
▪ Efficient methods for updating large language
models Source: Hombaiah et al., “Dynamic Language Models for continuously
evolving Content”, SIGKDD2021

34
AI reproducibility: „A worrying analysis of neural recommender approaches”
Dacrema, M. F., Cremonesi, P., Jannach, D., 2019. Are we really making much progress? A worrying analysis of
recent neural recommendation approaches. In ACM RecSys2019. DOI:https://doi.org/10.1145/3298689.3347058
Are DL-based methods reproducible?
Do they actually beat simple baselines?
• Reproducibility crisis (due to lack of transparency)
• State-of-the-art crisis (driven by peer review crisis)

AI biases: learned and elevated by AI/deep learning models
Source: https://techcrunch.com/2016/03/24/microsoft-silences-its-new-a-i-bot-tay-after-twitter-users-teach-it-racism/
35
[N-word]
• Biases in human interactions can be learned and elevated by ML models => fairness and ethics of AI, data
quality and representativeness
• Crucial: research into understanding of AI models

36
Model understanding: do LLMs actually „understand“ language?
▪ Language model (e.g. GPT) prompting
as a way to assess its capacity to
comprehend language („intelligence“)
▪ How does prompt syntax influence
„knowledge retrieval“?
Linzbach, S., Tressel, T., Kallmeyer, L., Dietze, S., Jabeen, H., 2023. Decoding Prompt Syntax: Analysing its Impact on Knowledge Retrieval in Large Language Models. In
Companion Proceedings of the ACM Web Conference 2023
The role of syntax in question-answering through LLMs

37
▪ Experiments on BERT-based
(Transformer) models
▪ Testing for sentence typology (simple,
compound, etc) and
active/passive/nominalised verb
morphology
Linzbach, S., Tressel, T., Kallmeyer, L., Dietze, S., Jabeen, H., 2023. Decoding Prompt Syntax: Analysing its Impact on Knowledge Retrieval in Large Language Models. In
Companion Proceedings of the ACM Web Conference 2023
Model understanding: do LLMs actually „understand“ language?
The role of syntax in question-answering through LLMs

38
Model understanding: how to measure actual „intelligence“ of AI?
• AI benchmark datasets/tests, eg to probe their
commonsense reasoning skills, e.g. AI2Reasoning (ARC)
• Models learn to perform well on a task while struggling
to actually understand
• Example: ChatGPT produces reasonable-sounding
answers that are actually wrong
• Constructing benchmark datasets for evaluation of AI
models remains a huge challenge
Branco et al., Shortcutted Commonsense: Data Spuriousness in Deep Learning of Commonsense Reasoning, EMNLP 2021

39
To sum up: how to beat the cycle?
▪ AI research = creating understanding AI models, e.g. what
and how do LLMs learn?
(e.g. SFB Deep Linguistic Modeling)
▪ FAIR access to AI data & models (e.g. GESIS Methods Hub)
▪ Values-oriented ML: e.g. explainability > accuracy
▪ Realistic and fair benchmark datasets & setups for AI
evaluation
▪ Maintainability: efficient ways on how to train/update
foundational models (e.g. DD4P)
▪ Interdisiciplinary research into complex interdependencies
between AI & online / offline discourse (e.g. NEWORDER)
▪ Governance & incentives: monopolies, ecological aspects
and access to data/infrastructure are not only technical but
societal & political issues
Algorithms/AI
Society, Media, Politics & Policies
(Offline & Online)

40
@stefandietze
https://stefandietze.net
http://gesis.org/en/kts
Thanks!

41
References
▪ Branco et al., Shortcutted Commonsense: Data Spuriousness in Deep Learning of Commonsense Reasoning, EMNLP 2021
▪ Linzbach, S., Tressel, T., Kallmeyer, L., Dietze, S., Jabeen, H., 2023. Decoding Prompt Syntax: Analysing its Impact on Knowledge
Retrieval in Large Language Models. In Companion Proceedings of the ACM Web Conference 2023
▪ Dacrema, M. F., Cremonesi, P., Jannach, D., 2019. Are we really making much progress? A worrying analysis of recent neural
recommendation approaches. In ACM RecSys2019. DOI:https://doi.org/10.1145/3298689.3347058
▪ Kiela, D. et al., Dynabench: Rethinking Benchmarking in NLP, NAACL2021
▪ Hombaiah et al., “Dynamic Language Models for continuously evolving Content”, SIGKDD2021
▪ Hafid, S., Schellhammer, S., Bringay, S., Todorov, K., Dietze, S., SciTweets - A Dataset and Annotation Framework for Detecting
Scientific Online Discourse, CIKM2022
▪ Dimitrov, D., Baran, E., Fafalios, P., Yu, R., Zhu, X., Zloch, M., Dietze, S., TweetsCOV19 – A Knowledge Base of Semantically
Annotated Tweets about the COVID-19 Pandemic, CIKM2020
▪ A. Tchechmedjiev, P. Fafalios, K. Boland, S. Dietze, B. Zapilko, K. Todorov, ClaimsKG – A Live Knowledge Graph of fact-checked
Claims, ISWC2019
▪ Hövelmeyer, A., Boland, K., Dietze, S., SimBa at CheckThat! 2022: Lexical and Semantic Similarity Based Detection of Verified Claims
in an Unsupervised and Supervised Way, CLEF Working Notes 2022, CLEF 2022- Conference and Labs of the Evaluation Forum.
▪ A. Roy, A. Ekbal, S. Dietze, P. Fafalios, Exploiting stance hierarchies for cost-sensitive stance detection of Web documents, IJIS2022

42
@stefandietze
https://stefandietze.net
http://gesis.org/en/kts

AI in between online and offline discourse - and what has ChatGPT to do with all of that?

More Related Content

Similar to AI in between online and offline discourse - and what has ChatGPT to do with all of that? (20)

More from Stefan Dietze (20)

Recently uploaded (20)

AI in between online and offline discourse - and what has ChatGPT to do with all of that?