What it's like to do a Master's thesis with me
(Ted Pedersen)
tpederse@d.umn.edu
http://www.d.umn.edu/~tpederse
September 16, 2013
Outline
●What is research?
●What are my interests?
●What do you need to do to succeed?
●A little bit about previous students
●Comments on reading I've provided
Research
What is research?
Asking questions about the
world where the answers
are interesting, whether
they are positive or negative
Interesting?
● Can I implement this algorithm?
– Important and interesting to you, but not that
significant to the rest of us
● Can I improve this algorithm to run in linear time
(rather than exponential)?
– Great if you succeed, but if you fail...?
● Can I show this problem is inherently exponential
and can't be improved upon?
– Might be a winner, assuming that this answer is
still unknown and problem is of general interest
Interesting?
● My method is 67% accurate. Their method is
62% accurate.
– Hurrah! Yawn. Nice but incomplete.
– What do we now know about the world
because of this?
● I've reimplemented Smith's method and added
to it a new kind of feature. This has improved
Smith's result by 5%.
– Plausible, assuming we can clearly show
improvement is due to the new feature
Interesting!
● Does knowing the part of speech of
preceding words help us predict the
meaning of a word?
–Yes. Tells us that syntax and semantics
are connected, and that syntactic clues
are important to semantics.
–No. Suggests that syntax and semantics
are disconnected.
● Imagine that this is the feature we added to
Smith's method
What is research?
● We develop interesting questions to answer
● We call these hypotheses
● We then figure out the best way to answer
those questions
● In our work, answers are found experimentally
–Just like in many sciences, except we use computers to
conduct the experiments (and a lot of other sciences
use computers to do experiments too)
● Could also be more theoretical, but that's not
usually what we do
This is Science
● I'm a Scientist
● We do some engineering to build systems to
conduct experiments, but our goals are scientific
● We want to answer questions about the world, in
particular human language
● Any engineering is a means to an end
–The end is an answer to our question
–A nicely built system is not science, it's the laboratory in which
you can begin to do your science
–The department is called Computer Science, and your degree
will be a Master of Science
What is a Master's Thesis?
● It presents an interesting and original question (hypotheses)
● It shouldn't matter if the answer is positive or negative
(otherwise you force the results one way or the other)
● You must persuade your audience that the question is
indeed interesting and worth answering
● You must present an argument that supports your answer
● Our arguments are nearly always experimental
● They are based on a series of well-formed, clearly
explained experiments that can be replicated by others
● Questions do not need to be incredibly difficult or
time-consuming to pursue, but they should be interesting and to
some extent unanswered or needing confirmation
My interests
What questions interest me?
● Natural Language Processing – making
computers better able to process human
language (written form)
● Computational Linguistics – understanding
the nature of language better by studying it
with computational techniques
What kinds of language interest me?
●General text
● News articles, web search results
●Medical text
● Clinical records, patient-centered social networks
●Most often in English
● Sometimes other languages
● I don't work on translation
NLP
● Word sense disambiguation (WSD)
● Assigning meanings to words based on the context in
which they occur
–The boy fishes from the bank
–The bank gave me a loan
● Assume meanings are already defined, for example in a
dictionary
● Many of our recent questions concern the role of semantic
coherence in allowing us to determine meanings of words
● http://senserelate.sourceforge.net
● http://search.cpan.org/dist/UMLS-SenseRelate/
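The idea of assigning a predefined meaning based on context can be sketched in a few lines of Python. This is only a minimal illustration using gloss overlap (a simplified Lesk-style heuristic), not the method used by the SenseRelate packages. The sense names, glosses, and stopword list are hypothetical and were written for this example rather than taken from a real dictionary.

```python
# Hypothetical sense inventory: two invented glosses for "bank".
SENSES = {
    "bank": {
        "river_bank": "land along a river where someone fishes or sits",
        "financial_bank": "institution that holds money and gives a loan to customers",
    }
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "to", "from", "me", "or", "and", "that", "where"}


def tokens(text):
    """Lowercase, strip simple punctuation, and drop stopwords."""
    return {w.strip(".,").lower() for w in text.split()} - STOPWORDS


def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most words with the context."""
    context = tokens(sentence) - {word}
    scores = {
        sense: len(context & tokens(gloss))
        for sense, gloss in SENSES[word].items()
    }
    return max(scores, key=scores.get)


print(disambiguate("bank", "The boy fishes from the bank"))  # river_bank
print(disambiguate("bank", "The bank gave me a loan"))       # financial_bank
```

Exact word overlap is brittle (it only works here because "fishes" and "loan" appear verbatim in the glosses), which is one reason questions about richer measures of semantic coherence are interesting.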
NLP
● Word sense discrimination
● Assumes you don't know the possible meanings ahead
of time
–Goal is to discover them
● Group occurrences of a word together based on
contextual similarity
● Label the discovered groups (clusters) with a definition or
description
● Many interesting questions about the role of surrounding
context in determining and defining meaning
● http://senseclusters.sourceforge.net
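Grouping occurrences of a word by contextual similarity can be illustrated with a small sketch. It assumes bag-of-words contexts, Jaccard similarity, and a greedy threshold-based grouping; these are simplifications chosen for brevity and are not how SenseClusters works. The example contexts, stopword list, and threshold are all made up for illustration.

```python
# Hypothetical contexts for the target word "bank".
CONTEXTS = [
    "he fishes from the river bank",
    "the bank approved my loan",
    "we walked along the river bank",
    "the bank raised the loan rate",
]

# Illustrative stopwords; the target word itself is excluded as a feature.
STOPWORDS = {"the", "a", "he", "we", "my", "from", "bank"}


def features(text):
    """Represent a context as the set of its non-stopword tokens."""
    return set(text.split()) - STOPWORDS


def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def discriminate(contexts, threshold=0.1):
    """Greedily place each context in the first cluster holding a similar one."""
    clusters = []  # each cluster is a list of context indices
    for i, ctx in enumerate(contexts):
        feats = features(ctx)
        for cluster in clusters:
            if any(jaccard(feats, features(contexts[j])) >= threshold
                   for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


print(discriminate(CONTEXTS))  # [[0, 2], [1, 3]]
```

The clusters come out unlabeled: nothing in the procedure says that one group is about rivers and the other about money, which is why labeling the discovered groups is a separate step.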
NLP & CL
● Collocation discovery
● Identify combinations of words (in large samples of text) that
tend to occur together and carry some additional meaning
–Toaster oven, kick the bucket, card-carrying member
● Often use statistical measures of association or networks of
word co-occurrences to identify them
● Necessary step in some approaches to word sense
disambiguation and discrimination
● A frequent question is whether a particular technique can
identify a certain kind of expression (and why or why not)
● http://ngram.sourceforge.net
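As a rough sketch of what a statistical measure of association looks like, the Python below scores adjacent word pairs with pointwise mutual information (PMI), one well-known measure among many. The toy text, the function name pmi_scores, and the min_count cutoff are assumptions made for this example. Real experiments would use a large corpus and would likely compare several measures.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score each adjacent word pair by pointwise mutual information."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # PMI is unreliable for very rare pairs
        p_xy = count / n_bigrams
        p_x = unigrams[w1] / n_unigrams
        p_y = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy text; far too small for real conclusions.
text = "the toaster oven is hot the toaster oven is new the cat is hot"
scores = pmi_scores(text.split())
best = max(scores, key=scores.get)
print(best)  # ('toaster', 'oven')
```

The pair "toaster oven" ranks highest because its words rarely occur apart, while pairs such as "the toaster" involve a word that appears in many other contexts.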
CL
● Semantic Similarity and Relatedness
● Ranking or comparing concepts based on their similarity
–Is a dog more like a cat or a house?
–Is corn more related to a farmer or an astronaut?
● http://wn-similarity.sourceforge.net
–Is blood more like a tissue or a bone?
–Is aspirin more related to a headache or a vaccination?
● http://umls-similarity.sourceforge.net
● Many questions about how to use information from ontologies
or corpora to replicate human performance, and the
significance of this to other NLP tasks
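A question such as "Is a dog more like a cat or a house?" can be made concrete with a path-based measure over an is-a hierarchy. The sketch below assumes a tiny hand-built taxonomy (the PARENT table is hypothetical and is not WordNet or the UMLS) and scores a pair as 1 / (1 + number of edges on the shortest path through their common ancestor).

```python
# Hypothetical is-a hierarchy: each concept maps to its parent.
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "animal": "entity",
    "house": "building",
    "building": "artifact",
    "artifact": "entity",
}


def path_to_root(concept):
    """Return the chain of concepts from the given one up to the root."""
    path = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        path.append(concept)
    return path


def path_similarity(a, b):
    """Inverse path length between two concepts via their lowest common ancestor."""
    path_a = path_to_root(a)
    depth_in_b = {node: i for i, node in enumerate(path_to_root(b))}
    for steps_from_a, node in enumerate(path_a):
        if node in depth_in_b:
            edges = steps_from_a + depth_in_b[node]
            return 1 / (1 + edges)
    return 0.0  # no common ancestor


print(path_similarity("dog", "cat") > path_similarity("dog", "house"))  # True
```

A measure like this captures similarity only. Relatedness questions, such as corn and farmer, need information beyond is-a links, for example definitions or corpus co-occurrence.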
Experimental methods
● Statistical and data driven
● Clustering approaches, supervised learning
● Knowledge based
● WordNet – general English
● UMLS – medicine, biology, anatomy, etc.
What you need
to do to succeed
Keys to success
●Desire to conduct science, not just engineering
● Enthusiasm for asking and answering interesting questions
–Going beyond just implementing things
–Results do matter, and we'll form our questions such that we don't require
a certain answer, but we must get concrete results that lead to an answer
●Ability to express technical ideas, questions, etc. in writing
●Mature work habits
● Willingness to stay involved, and maintain a steady rate of work
over 4 semesters
● Email as a key channel of communication
●Willingness to program and learn what you don't know
● Previous projects have used Perl, MySQL, Java
● APIs increasingly important
Key values
●Experimental research
● Ask and answer questions (hypotheses)
●Publish when we can
● A “good” Master's thesis should result in publishable work
●Open source
● Free and frequent distribution of code
● Allows for replication of results
●Documentation of code
● User should be able to install, run, and understand results
based on our documentation
● Allows for replication of results
My typical schedule
● Develop a very detailed proposal in first semester (with concrete
deadlines specified) – typically there are 2-3 main research
questions (hypotheses) that we will address
● During second semester we develop baselines based on known
answers to our questions that will be basis for comparison
● During third semester we conduct 1-2 experiments designed to
answer 1-2 of our questions – we measure how well (or not) those
answers worked out and report on that
● During fourth semester we do one more set of experiments to
answer our remaining question – again measuring how well (or
not) that worked out and reporting on that
● I generally do not work much with students in the summer, due to
other constraints and demands on my time
My expectations of you
● We write the thesis AS WE GO; we do not do all the writing at the end
● We release software and data AS WE GO
● We often build off of previous students' work, so we need to be
careful in separating your work from theirs, and also leaving behind a
body of work that future students can build on
● We meet regularly (once every week or two) and communicate very
regularly (sometimes daily or even more often) via email
● I do a lot of testing and verification of results, and I also read and comment
on documentation extensively
● This process needs to be iterative, and you need to be responsive to
my concerns (not always agreeing, but at least acknowledging and
discussing, and I will do the same for yours)
● I ask that your thesis be treated as equal in priority to your class work
(not higher, but not less either)
A little bit about previous
(successful) students
Former (successful) students
http://www.d.umn.edu/~tpederse/masters.html
● Supervised 16 MS students
● 6 earned PhDs
–CMU (3), Utah, Toronto, UM-TC
● 2 are pursuing PhDs
–CMU and Toronto
● 2 earned second MS degree
–Missouri and Pittsburgh
● Supervised 1 PhD
● UM-TC
● Topics?
● 5 in semantic similarity
● 5 in word sense disambiguation
● 3 in word sense discrimination
● 2 in collocation discovery
● 1 outside of NLP
Reading
●The paper I've suggested you read is from a
highly competitive conference (ACL 2004)
where it won the best paper award
●Since then it has had impact both in terms of
citations and influencing the direction of NLP
and CL
●I'm interested in how well you can understand
this, and how interesting you find it. I would
also like you to think about the hypotheses
that likely motivated this work.
Thank you!
http://www.d.umn.edu/~tpederse
tpederse@d.umn.edu
