Software Engineering Data Collection for Field Studies
Zheng-Wen Shen
2009/07/09
Reference
- Forrest Shull, Janice Singer, Dag I. K. Sjøberg, Guide to Advanced Empirical Software Engineering, Springer London, pp. 9-34, 2007/11/21.
Outline
- Introduction
- Direct Techniques
- Indirect Techniques
- Independent Techniques
- Applying the Techniques
- Conclusion
1. Introduction
- Software engineering involves real people working in real environments.
  - People create software
  - People maintain software
  - People evolve software
1. Introduction
- To understand software engineering, use field studies:
  - Study software engineers as they work
  - Data collection
Direct Techniques
- Require the researcher to have direct involvement with the participants.
Direct Techniques - Data Collection Taxonomy 1/2
Direct Techniques - Data Collection Taxonomy 2/2
Indirect Techniques
- Require the researcher to have only indirect access to the participants, via direct access to their work environment.
Indirect Techniques - Data Collection Taxonomy
Independent Techniques
- Require researchers to access only work artifacts.
Independent Techniques - Data Collection Taxonomy
For each technique, we will discuss:
- A brief introduction
- Advantages
- Disadvantages
- Reporting guidelines
2. Direct Techniques
2.1 Brainstorming and focus group
- Several people get together and focus on a particular issue.
- The aim is to cover as many ideas as possible.
- Led by a moderator.
- Started with a simple “trigger question”.
2.1 Brainstorming and focus group - Advantages
- Useful when one is new to a domain and seeking ideas for further exploration.
- Collects information from large groups of people at once.
- Rapidly identifies what is important to the participants.
2.1 Brainstorming and focus group - Disadvantages
- Needs a well-trained moderator.
- People can be shy in a group.
- Will participants raise the particular issues relevant to their daily work?
- It is difficult to schedule a brainstorming session or focus group.
2.1 Brainstorming and focus group - Reporting Guidelines
- Report the number of participants seen and the context in which they were seen.
- Where appropriate, describe the role and expertise of the moderator.
- If specific questions were used, detail them.
- Report the time spent on brainstorming.
- Describe the type of data recording used (e.g., video, audio, notes).
2.2 Interviews
- Interviews involve at least one researcher talking to at least one respondent.
- Structured interview: a fixed list of carefully worded questions.
- Semi-structured interview: conversational flow.
2.2 Interviews - Advantages
- Structured interviews: collect the same data from a large number of respondents.
- Semi-structured interviews: highly interactive.
  - Researchers can clarify questions for respondents and probe unexpected responses.
  - Interviewers can also build rapport with a respondent to improve the quality of responses.
2.2 Interviews - Disadvantages
- Interviews are time and cost inefficient.
  - Contact with the respondent needs to be scheduled.
  - At least one person needs to attend the meeting.
  - Careful note-taking may often be an adequate substitute for audio or video recording.
- Participants’ reports of events may not mirror reality.
2.2 Interviews - Reporting Guidelines
- Detail:
  - the number and type of interviewees seen
  - approximately how long the interviews took
  - the type of interview
  - the way the interview is recorded
  - how the participants were selected
- Provide a copy of the questions in the report or an appendix.
2.3 Questionnaires
- Questionnaires are sets of questions administered in a written format.
- Very careful attention needs to be paid to:
  - the wording of the questions
  - the layout of the forms
  - the ordering of the questions
2.3 Questionnaires - Advantages
- Questionnaires are time and cost effective.
  - Researchers do not need to schedule sessions.
  - Questionnaires can be filled out when a software engineer has time between tasks.
- Paper form-based questionnaires can be transported to the respondent for little more than the cost of postage.
- Web-based questionnaires cost even less: the paper forms are eliminated and the data are received in electronic form.
- Data can easily be collected from a large number of respondents in geographically diverse locations.
2.3 Questionnaires - Disadvantages
- Ambiguous and poorly worded questions are problematic.
- Response rates can be relatively low (e.g., 5%), which adversely affects the representativeness of the sample.
- Responses tend to be more terse than with interviews.
- Responses to questions may not mirror reality.
2.3 Questionnaires - Reporting Guidelines
- Detail:
  - how the population was sampled (i.e., who the questionnaires were sent to, how respondents were chosen)
  - the response rate for the questionnaire
- Explain any subsequent modification of the questionnaire.
- Provide a copy of the questions in the report or an appendix.
2.4 Conceptual Modeling
- Participants create a model of some aspect of their work, which reveals their mental models.
- Draw a diagram:
  - data flow diagram
  - control flow diagram
  - package diagram
- Draw a physical map of their environment, pointing out who they talk to and how often.
2.4 Conceptual Modeling - Advantages
- Conceptual models provide an accurate portrayal of the user’s conception of his/her mental model of the system.
- They are easy to collect and require only low-tech aids (pen and paper).
2.4 Conceptual Modeling - Disadvantages
- The results of conceptual modeling are hard to interpret without domain knowledge about the system.
- Some software engineers are reluctant to draw.
- The quality and level of detail in diagrams can vary significantly.
2.4 Conceptual Modeling - Reporting Guidelines
- The exact instructions given to the participants.
- A precise description of the tools that participants had available to them to model.
- The way the data is recorded should also be outlined.
2.5 Work Diaries
- Work diaries require respondents to record various events that occur during the day, for example by:
  - filling out a form at the end of the day
  - recording specific activities as they occur
  - noting whatever the current task is at a pre-selected time
- Paper forms: record information at the end of the day.
- Computer applications: prompt users for input at random times.
- Time sheets: many software engineers are required to maintain and update quite detailed time sheets recording how many hours are spent per day per activity category.
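The "prompt users for input at random times" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `sample_prompt_times`, the minimum gap between prompts, and the working-day boundaries are hypothetical and do not come from the referenced chapter.

```python
import random
from datetime import datetime, timedelta


def sample_prompt_times(day_start, day_end, n_prompts, min_gap_minutes=15, seed=None):
    """Pick n_prompts random prompt times between day_start and day_end.

    Consecutive prompts are kept at least min_gap_minutes apart (an
    assumption made here so that participants are not interrupted twice
    in a row).
    """
    rng = random.Random(seed)
    total_minutes = int((day_end - day_start).total_seconds() // 60)
    # Minutes left over once the mandatory gaps have been reserved.
    slack = total_minutes - (n_prompts - 1) * min_gap_minutes
    if slack < 0:
        raise ValueError("working day too short for the requested prompts")
    # Draw sorted offsets inside the slack, then re-insert the gaps.
    offsets = sorted(rng.randint(0, slack) for _ in range(n_prompts))
    return [
        day_start + timedelta(minutes=offset + i * min_gap_minutes)
        for i, offset in enumerate(offsets)
    ]


if __name__ == "__main__":
    start = datetime(2009, 7, 9, 9, 0)
    end = datetime(2009, 7, 9, 17, 0)
    for t in sample_prompt_times(start, end, n_prompts=5, min_gap_minutes=30, seed=1):
        print(t.strftime("%H:%M"))
```

Fixing the seed makes a prompt schedule reproducible, which may help when the schedule has to be described in a report.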
2.5 Work Diaries - Advantages
- Provide better self-reports of events, because activities are recorded on an ongoing basis rather than in retrospect (as in interviews and questionnaires).
- Show how software engineers spend their day without undertaking a great deal of observation or shadowing.
2.5 Work Diaries - Disadvantages
- Rely on self-reports and may require participants to recall events.
- Interfere with respondents as they work.
  - If software engineers have to record each time they go and consult a colleague, they may consult less often.
- Respondents may forget or neglect to record some events.
- Respondents may not record at the expected level of detail.
2.5 Work Diaries - Reporting Guidelines
- Describe the precise task given to the software engineers (e.g., to record their communication patterns).
- Describe how the task was accomplished (e.g., by reporting to the experimenter).
- Describe the tools made available to do so.
2.6 Think-Aloud Protocols
- Researchers ask participants to think out loud while performing a task.
  - The task can occur naturally at work.
  - The task can be predetermined by the researcher.
- Researchers do not interfere in the problem-solving process.
- Think-aloud sessions generally last no more than 2 hours.
- Used to determine or validate a cognitive model as software engineers perform a programming task.
2.6 Think-Aloud Protocols - Advantages
- Relatively easy to implement.
- Gives a unique view of the problem-solving process and, additionally, access to the participant’s mental model.
2.6 Think-Aloud Protocols - Disadvantages
- Intended for use in situations where a researcher could map out the entire problem space.
- It is difficult and time-consuming to analyze think-aloud data.
2.6 Think-Aloud Protocols - Reporting Guidelines
- Provide an extremely precise characterization of the task the participant was asked to undertake, including any tools at the participant’s disposal.
- The time taken to complete the task.
- Any materials provided to the participant.
- The precise way in which the analysis occurs.
2.7 Shadowing/Observation
- Shadowing: the experimenter follows the participant around and records their activities.
- Shadowing can occur for an unlimited time period.
- Observation: the experimenter observes software engineers engaged in their work (e.g., meetings or programming).
- A researcher shadows one software engineer at a time, but can observe many at one time.
2.7 Shadowing/Observation - Advantages
- Easy to implement
- Gives fast results
- Requires no special equipment
2.7 Shadowing/Observation - Disadvantages
- It can be difficult to see what a software engineer is doing (e.g., when using keyboard shortcuts).
- Observers need to have a fairly good understanding of the environment to interpret the software engineer’s behavior.
- Predefining a set of categories or looked-for behaviors limits the type of data that will be collected.
2.7 Shadowing/Observation - Reporting Guidelines
- The precise form of shadowing and/or observation:
  - whether any verbal instructions were given to the participant to think out loud
- The way the information is recorded:
  - the length of the session
  - any other special instructions given to the participants
- Context information:
  - what activities the shadowed and/or observed participants were engaged in
2.8 Participant-Observer (Joining the Team)
- The researcher essentially becomes part of the team and participates in key activities.
- The researcher gains a high level of familiarity with the team members and the tasks they perform.
- Software engineers are comfortable with the researcher’s presence and tend not to notice being observed.
2.8 Participant-Observer - Advantages
- Respondents are more likely to be comfortable with a team member and to act naturally during observation.
- Researchers also develop a deeper understanding of software engineering tasks.
2.8 Participant-Observer - Disadvantages
- Joining a team is very time consuming: it takes time to establish true team membership.
- A researcher who becomes too involved may lose perspective on the phenomenon being observed.
2.8 Participant-Observer - Reporting Guidelines
- The role of the participant-observer in the team.
- How they interact with the team.
- What access they have to team material.
- The length of time of the interaction.
- How data was collected, coded, and analyzed.
3. Indirect Techniques
3.1 Instrumenting Systems
- Instrumentation is built into the software tools.
- It records information about the usage of the tools:
  - how frequently a tool or feature is used
  - patterns of access to files and directories
  - the timing underlying different activities
- Both the input and the output can be recorded in great detail, so that a session can effectively be played back.
3.1 Instrumenting Systems - Advantages
- It requires no time commitment from software engineers.
- It accurately provides the relative frequency and duration of the various activities.
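As a rough illustration of how instrumentation data yields frequency and duration figures, the sketch below summarizes a list of logged tool events. The `UsageEvent` record and `summarize_usage` function are hypothetical names invented for this example; a real instrumented tool would define its own event format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageEvent:
    """One logged use of a tool (hypothetical record layout)."""
    tool: str
    start: float  # seconds since the start of the session
    end: float    # seconds since the start of the session


def summarize_usage(events):
    """Return {tool: (number_of_uses, total_seconds_in_use)}."""
    counts = defaultdict(int)
    durations = defaultdict(float)
    for event in events:
        counts[event.tool] += 1
        durations[event.tool] += event.end - event.start
    return {tool: (counts[tool], durations[tool]) for tool in counts}


if __name__ == "__main__":
    log = [
        UsageEvent("grep", 0.0, 5.0),
        UsageEvent("grep", 10.0, 12.0),
        UsageEvent("editor", 12.0, 100.0),
    ]
    for tool, (uses, seconds) in summarize_usage(log).items():
        print(f"{tool}: {uses} uses, {seconds:.0f} s")
```

A summary like this shows what was used and for how long, but not why; the software engineers' thoughts and goals are not recoverable from tool events alone.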
3.1 Instrumenting Systems - Disadvantages
- It is difficult to analyze the data meaningfully:
  - software engineers’ thoughts and goals are hard to determine from tool events
  - the problem is worse when the working environment is not well understood or constrained
  - software engineers often customize their environments by adding scripts and macros
- One way to address this is to play back the events to a software engineer and ask them to comment.
- Ethical concerns: monitoring software engineers without their knowledge.
3.1 Instrumenting Systems - Reporting Guidelines
- The precise nature of the logging:
  - any special instrumentation installed on the software engineer’s machines
  - a description of what exactly is logged, and with what frequency
- Any special considerations with respect to data processing and analysis.
3.2 Fly on the Wall (Participants Recording their Own Work)
- Participants are asked to video- or audiotape themselves when they are engaged in some predefined activity.
- It allows the researcher to be an observer of an activity.
3.2 Fly on the Wall - Advantages
- Requires very little time from the participants and is very unobtrusive.
- There may be some discomfort in the beginning, but it fades quickly.
3.2 Fly on the Wall - Disadvantages
- Participants may forget to turn on the recording equipment at the appropriate time, so the record may be incomplete or missing.
- The camera is fixed, so the context of what is recorded may be hard to understand.
- There is a high cost to analyzing the resulting data.
3.2 Fly on the Wall - Reporting Guidelines
- Any problems with the recording (e.g., developers forgetting to record a meeting).
- Context information will also help to clarify the application of the technique:
  - where the recording occurred
  - what the typical tasks were
  - who was involved
  - who was responsible for the recording
- Any methods used to transform, transcribe, and analyze the data need to be specified.
Not the end…
- Introduction
- Direct Techniques
- Indirect Techniques
- Independent Techniques
- Applying the Techniques
- Conclusion
4. Independent Techniques
4.1 Analysis of Electronic Databases of Work Performed
- The work performed by developers is carefully managed using systems for:
  - issue tracking
  - problem reporting
  - change requests
  - configuration management
- The copious records generated by such systems are a rich source of information.
4.1 Analysis of Electronic Databases of Work Performed - Advantages
- A large amount of data is often readily available.
- The data is stable and is not influenced by the presence of researchers.
4.1 Analysis of Electronic Databases of Work Performed - Disadvantages
- There is little control over the quantity and quality of information manually entered about the work performed:
  - descriptive fields are often not filled in
  - fields are filled in different ways by different developers
- It is difficult to gather additional information about a record (e.g., when the software engineer who worked on it is no longer available).
4.1 Analysis of Electronic Databases of Work Performed - Reporting Guidelines
- The exact nature of the collected data needs to be specified, along with any special considerations:
  - whether any data is missing
  - whether any data is uninterpretable for some reason
- Any special processing of the data (e.g., if only a certain proportion is chosen to be analyzed).
4.2 Analysis of Tool Logs
- Many software systems generate logs of some form:
  - automatic build tools often leave records, as do source code control systems
  - sophisticated logging can be built into a wide spectrum of tools to better understand the support needs of the software engineers
- Such tool logs can be analyzed in the same way as logs from tools that have been deliberately instrumented by the researchers.
- The difference is that the researchers don’t have control over the kind of information collected.
4.2 Analysis of Tool Logs - Advantages
- The data is already in electronic form, making it easier to code and analyze.
- The behavior being logged is part of software engineers’ normal work routine.
4.2 Analysis of Tool Logs - Disadvantages
- Companies tend to use different tools in different ways, so it is difficult to gather data consistently when using this technique with multiple organizations.
4.2 Analysis of Tool Logs - Reporting Guidelines
- The exact nature of what is being collected needs to be specified, along with any special concerns, such as missing data.
- If the data is processed in any way, it needs to be explained.
4.3 Documentation Analysis
- Documentation generated by software engineers:
  - comments in the program code
  - separate documents describing a software system
- Other sources of documentation that can be analyzed:
  - local newsgroups
  - group e-mail lists
  - memos
  - documents that define the development process
4.3 Documentation Analysis - Advantages
- Documents written about the system contain conceptual information and present a glimpse of at least one person’s understanding of the software system.
- They can serve as an introduction to the software and the team.
- Comments in the program code tend to provide low-level information on algorithms and data.
- Using the source code as the source of data allows for an up-to-date portrayal of the software system.
4.3 Documentation Analysis - Disadvantages
- It is time consuming.
- It requires some knowledge of the source.
- Written material and source comments may be inaccurate.
4.3 Documentation Analysis - Reporting Guidelines
- The documentation needs to be described, as well as any processing performed on it.
4.4 Static and Dynamic Analysis of a System
- One analyzes the code (static analysis) or traces generated by running the code (dynamic analysis) to learn about the design, and indirectly about how software engineers think and work.
- One might compare the programming or architectural styles of several software engineers by analyzing their use of various constructs, or the values of various complexity metrics.
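A minimal sketch of the "use of various constructs" idea follows, assuming the code under study is Python so that the standard `ast` module can parse it. The function name `construct_counts` and the four construct categories are choices made for this example, not part of the referenced chapter.

```python
import ast
from collections import Counter


def construct_counts(source):
    """Count a few syntactic constructs in a piece of Python source code."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.For, ast.While)):
            counts["loops"] += 1
        elif isinstance(node, ast.If):
            counts["branches"] += 1
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            counts["functions"] += 1
        elif isinstance(node, (ast.ListComp, ast.SetComp, ast.DictComp, ast.GeneratorExp)):
            counts["comprehensions"] += 1
    return counts


# Two hypothetical engineers solving the same task in different styles.
ENGINEER_A = """
def positives(xs):
    out = []
    for x in xs:
        if x > 0:
            out.append(x)
    return out
"""

ENGINEER_B = """
def positives(xs):
    return [x for x in xs if x > 0]
"""

if __name__ == "__main__":
    print("A:", dict(construct_counts(ENGINEER_A)))
    print("B:", dict(construct_counts(ENGINEER_B)))
```

Counts like these describe style only. In legacy systems written by many programmers over many years, attributing a count to one engineer requires additional data such as version control history.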
4.4 Static and Dynamic Analysis of a System - Advantages
- The source code is usually readily available and contains a very large amount of information ready to be mined.
4.4 Static and Dynamic Analysis of a System - Disadvantages
- Extracting useful information from source code requires parsers and other analysis tools, and such technology is not always mature. Although parsers used in compilers are of high quality, the parsers needed for certain kinds of analysis can be quite different; for example, they typically need to analyze the code without it being pre-processed.
- Analyzing old legacy systems created by multiple programmers over many years can make it hard to tease apart the various independent variables (programmers, activities, etc.) that give rise to different styles, metrics, etc.
4.4 Static and Dynamic Analysis of a System - Reporting Guidelines
- The source code should be carefully described.
- The nature of the processing on the data also needs to be detailed.
- Any special processing considerations should be described.
5. Applying the Techniques
- How to record and analyze the data
  - Record-Keeping Options
  - Coding and Analyzing the Data
5.1 Record-Keeping Options
- Direct techniques involve three data capture methods:
  - Videotape
  - Audiotape
  - Manual record keeping
- The methods differ in:
  - the completeness of the data record captured
  - the degree of interference they invoke in the work environment
  - the time involved in using the captured data
5.1 Record-Keeping Options
- Videotape captures details that would otherwise be lost, such as gestures and gaze direction.
- With video recording, it is important to consider the video camera’s frame of reference: videotape can record only where the camera is aimed.
- Moving the video camera a bit to the right or a bit to the left may cause a difference in the recorded output and subsequently in the interpretation of the data.
- A number of software programs allow screen capture and playback of the recorded interactions.
- To be used with videotape, the video and the screen capture must be synchronized in some way.
5.1 Record-Keeping Options
- Audiotape allows for a fairly complete record in the case of interviews.
- Details of the physical environment and interaction with it will be lost.
- Audiotape does allow for the capture of tone: if a participant is excited while talking about a new tool, this will be captured on the audio record.
5.1 Record-Keeping Options
- Manual record keeping is the most data-sparse method and hence captures the least complete data record. However, it is also the quickest, easiest, and least expensive method to implement.
- It works best when a well-trained researcher identifies certain behaviors, thoughts, or concepts during the data collection process.
- Wu et al. (2003) developed a related data collection technique utilizing a PDA. On the PDA, they had predetermined categories of responses that were coded each time a particular behavior was observed. The data were easily transported to a database on a PC for further analysis.
5.1 Record-Keeping Options
- The decision of which method to use depends on many variables:
  - privacy at work
  - the participant’s degree of comfort with any of the three measures
  - the amount of time available for data collection and interpretation
  - the type of question asked and how well it can be formalized
- Data capture methods will affect the information gained and the information that it is possible to gain.
- The methods are not mutually exclusive and can be used in conjunction.
5.2 Coding and Analyzing the Data
- Field study techniques produce enormous amounts of data, an “attractive nuisance” (Miles, 1979).
- The purpose of this data is to provide insight into the phenomenon being studied. To meet this goal, the body of data must be reduced to a comprehensible format.
- This is done through a process of coding: using the goals of the research as a guide, a scheme is developed to categorize the data.
  - High-level schemes: e.g., all goals stated by a software engineer during debugging.
  - Specific schemes: e.g., how many times grep was executed in a half-hour programming session.
- Once coded, the data is usually coded again by another researcher to ensure the validity of the rating scheme (inter-coder or inter-rater reliability).
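Agreement between two coders is often summarized with a chance-corrected statistic. The sketch below computes Cohen's kappa, which is one common choice; the slides do not say which statistic the authors have in mind, so treat the choice and the category names as assumptions.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who labelled the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the agreement expected if both coders assigned
    categories independently at their own observed rates.
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical category for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    coder_1 = ["goal", "goal", "search", "search"]
    coder_2 = ["goal", "goal", "search", "goal"]
    print(cohens_kappa(coder_1, coder_2))
```

In this small example the coders agree on 3 of 4 items (0.75) while chance agreement is 0.5, so kappa is 0.5.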
5.2 Coding and Analyzing the Data
- Audio and videotape records are usually transcribed before categorization, although transcription is often not necessary.
- Transcription requires significant cost and effort, and may not be justified for small, informal studies.
- Once the decision to transcribe has been made, obtaining an accurate transcription is challenging.
  - A trained transcriber can take up to 6 hours to transcribe a single hour of tape (even longer when gestures, etc. must be incorporated into the transcription).
  - An untrained transcriber (especially in technical domains) can do such a poor job that it takes researchers just as long to correct the transcript.
- While transcribing has its problems, online coding of audio or videotape can also be quite time inefficient, as it can take several passes to produce an accurate categorization.
- If a question surfaces later, it will be necessary to listen to the tapes again, requiring more time.
5.2 Coding and Analyzing the Data
- Quantitative analyses can be used to provide summary information about the data (e.g., on average, how often grep is used in debugging sessions).
- Quantitative analyses can also determine whether particular hypotheses are supported by the data (e.g., whether high-level goals are stated more frequently in development than in maintenance).
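The grep example amounts to averaging a code's frequency over sessions. A minimal sketch follows, assuming each session has already been coded as a list of event labels; that data layout and the name `mean_code_frequency` are hypothetical.

```python
from statistics import mean


def mean_code_frequency(sessions, code):
    """Average number of times `code` occurs per coded session."""
    return mean(session.count(code) for session in sessions)


if __name__ == "__main__":
    # Three hypothetical debugging sessions, already coded.
    debugging_sessions = [
        ["grep", "edit", "grep"],
        ["edit"],
        ["grep"],
    ]
    print(mean_code_frequency(debugging_sessions, "grep"))
```

Sessions in which the code never occurs still count toward the average, which is usually what "on average, how often" is meant to capture.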
5.2 Coding and Analyzing the Data
- When choosing a statistical analysis method, it is important to know whether your data is consistent with the assumptions made by the method.
- Inferential statistical analyses are only applicable in well-constrained situations.
- The type of data collected in field studies often requires nonparametric statistics.
  - Nonparametric statistics are often called “distribution-free” in that they do not have the same requirements regarding the modeled distribution as parametric statistics.
  - Many nonparametric tests are based on simple rankings, as opposed to strict numerical values.
  - Many nonparametric tests can be used with small samples.
- Briand et al. (1996) discuss the disadvantages of nonparametric statistics versus parametric statistics in software engineering. They point out that a certain amount of violation of the assumptions of parametric statistics is legitimate, but that nonparametric statistics should be used when there are extreme violations of those assumptions, as there may well be in field studies.
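To show what "based on simple rankings" means in practice, here is a sketch of the Mann-Whitney U statistic, a widely used rank-based test for comparing two independent samples. The slides do not name a specific test, so this choice is an assumption. The sketch computes only the U statistics, not a p-value.

```python
def mann_whitney_u(sample_x, sample_y):
    """Return (U_x, U_y) computed from the ranks of the pooled samples.

    Tied values receive the average of the ranks they span. U_x equals
    the number of (x, y) pairs with x > y, counting ties as one half.
    """
    pooled = sorted(
        [(value, 0) for value in sample_x] + [(value, 1) for value in sample_y],
        key=lambda pair: pair[0],
    )
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of values tied with pooled[i].
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_x = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    n_x, n_y = len(sample_x), len(sample_y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y


if __name__ == "__main__":
    # Hypothetical counts of stated goals per session in two settings.
    maintenance = [1, 2, 3]
    development = [4, 5, 6]
    print(mann_whitney_u(maintenance, development))
```

Only the ordering of the observations enters the computation, which is why no assumption about the shape of the underlying distribution is needed. Judging significance would still require tables or a statistics package.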
5.2 Coding and Analyzing the Data
- Qualitative analyses do not rely on quantitative measures to describe the data. Rather, they provide a general characterization based on the researchers’ coding schemes.
- The most popular tools for quantitative analysis are SAS and SPSS.
- A number of different tools exist for helping with qualitative analysis, including NVivo, ATLAS.ti, and Noldus Observer.
5.2 Coding and Analyzing the Data
- In summary, the way the data is coded will affect its interpretation and the possible courses for its evaluation.
- It is therefore important to ensure that coding schemes reflect the research goals. They should tie in to particular research questions.
- Coding schemes should also be devised with the analysis techniques in mind, since different schemes lend themselves to different evaluative mechanisms.
- One way to overcome the limitations of any one technique is to look at the data using several different techniques (such as combining qualitative and quantitative analyses).
- A triangulation approach (Jick, 1979) will allow for a more accurate picture of the studied phenomena.
- Bratthall and Jørgensen (2002) give a very nice example of using multiple methods for data triangulation. Their example is framed in a software engineering context examining software evolution and development.
- In fact, many of the examples cited earlier use multiple methods to triangulate their results.
5.2 Coding and Analyzing the Data
- As a final note, with any type of analysis technique, it is generally useful to go back to the original participant population to discuss the findings.
- Participants can tell researchers whether they believe an accurate portrayal of their situation has been achieved.
- This, in turn, can let researchers know whether they used appropriate coding schemes and analysis techniques.
6. Conclusions

Data collection for field studies

  • 1. Software Engineering Data Collection for Field StudiesZheng-WenShen2009/07/09
  • 2. ReferenceForrest Shull, Janice Singer, Dag I. K. Sjoberg., Guide to Advanced Empirical Software Engineering, Springer London, PP. 9-34, 2007/11/21.
  • 3. OutlineIntroductionDirect TechniquesIndirect TechniquesIndependent TechniquesApplying the TechniquesConclusion
  • 4. 1. Introduction Software engineering involves real people working in real environments.People create softwarePeople maintain softwarePeople evolve software
  • 5. 1. Introduction To understand SE  FIeld StudiesStudy software engineers as they workData Collection
  • 6. Direct Techniques Require the researcher to have direct involvement with the participants.Direct Technique
  • 7. Direct Techniques Data Collection Taxonomy 1/2
  • 9. Indirect Techniques: Require the researcher to have only indirect access to the participants, via direct access to their work environment.
  • 10. Indirect Techniques Data Collection Taxonomy
  • 11. Independent Techniques: Require researchers to access only work artifacts.
  • 13. For each technique, we will discuss: a brief introduction; advantages; disadvantages; reporting guidelines.
  • 15. 2.1 Brainstorming and focus group: Several people get together and focus on a particular issue; cover as many ideas as possible; a moderator; a simple “trigger question”.
  • 16. 2.1 Brainstorming and focus group - Advantages: Useful when one is new to a domain and seeking ideas for further exploration. Collects information from large groups of people at once. Rapidly identifies what is important to the participants.
  • 17. 2.1 Brainstorming and focus group - Disadvantages: Need a well-trained moderator. People can be shy in a group. Raise particular issues relevant to their daily work? Difficult to schedule a brainstorming session or focus group.
  • 18. 2.1 Brainstorming and focus group - Reporting Guidelines: The number of participants seen and the context in which they were seen should be reported. Where appropriate, the role and expertise of the moderator should be described. If specific questions were used, they should be detailed. The time spent on brainstorming should be reported. The type of data recording used should be described (e.g., video, audio, notes).
  • 19. 2.2 Interviews: Interviews involve at least one researcher talking to at least one respondent. Structured interview: a fixed list of carefully worded questions. Semi-structured interview: conversational flow.
  • 20. 2.2 Interviews - Advantages: Structured interviews: collecting the same data from a large number of respondents. Semi-structured interviews: highly interactive. Researchers can clarify questions for respondents and probe unexpected responses. Interviewers can also build rapport with a respondent to improve the quality of responses.
  • 21. 2.2 Interviews - Disadvantages: Interviews are time and cost inefficient: contact with the respondent needs to be scheduled, and at least one person needs to attend the meeting. Careful note-taking may often be an adequate substitute for audio or video recording. Participants’ reports of events may not mirror reality.
  • 22. 2.2 Interviews - Reporting Guidelines: Detail the number and type of interviewees seen; approximately how long the interviews took; the type of interview; the way the interview is recorded; how the participants were selected. Provide a copy of the questions in the report or an appendix.
  • 23. 2.3 Questionnaires: Questionnaires are sets of questions administered in a written format. Very careful attention needs to be paid to the wording of the questions, the layout of the forms, and the ordering of the questions.
  • 24. 2.3 Questionnaires - Advantages: Questionnaires are time and cost effective. Researchers do not need to schedule sessions. Questionnaires can be filled out when a software engineer has time between tasks. Paper form-based questionnaires can be transported to the respondent for little more than the cost of postage. Web-based questionnaires cost even less: the paper forms are eliminated and the data are received in electronic form. Data can easily be collected from a large number of respondents in geographically diverse locations.
  • 25. 2.3 Questionnaires - Disadvantages: Ambiguous and poorly-worded questions are problematic. Response rates can be relatively low (5%), which adversely affects the representativeness of the sample. Responses tend to be more terse than with interviews. Responses to questions may not mirror reality.
  • 26. 2.3 Questionnaires - Reporting Guidelines: Detail how the population was sampled (i.e., who the questionnaires were sent to, how respondents were chosen) and the response rate for the questionnaire. Any subsequent modification of the questionnaire should be explained. Provide a copy of the questions in the report or an appendix.
  • 27. 2.4 Conceptual Modeling: Participants create a model of some aspect of their work (their mental models). Draw a diagram: data flow diagram; control flow diagram; package diagram. Draw a physical map of the environment, pointing out who they talk to and how often.
  • 28. 2.4 Conceptual Modeling - Advantages: Conceptual models provide an accurate portrayal of the user’s conception of his/her mental model of the system. They are easy to collect and require only low-tech aids (pen and paper).
  • 29. 2.4 Conceptual Modeling - Disadvantages: The results of conceptual modeling are hard to interpret without domain knowledge about the system. Some software engineers are reluctant to draw. The quality and level of detail in diagrams can vary significantly.
  • 30. 2.4 Conceptual Modeling - Reporting Guidelines: The exact instructions given to the participants. A precise description of the tools that participants had available to them to model. The way the data is recorded should also be outlined.
  • 31. 2.5 Work Diaries: Work diaries require respondents to record various events that occur during the day: filling out a form at the end of the day; recording specific activities as they occur; noting whatever the current task is at a pre-selected time. Paper forms: record information at the end of the day. Computer applications: prompt users for input at random times. Time sheets: many software engineers are required to maintain and update quite detailed time sheets recording how many hours are spent per day per activity category.
  • 32. 2.5 Work Diaries - Advantages: Provide better self-reports of events by recording activities on an ongoing basis rather than in retrospect (as in interviews and questionnaires). Show how software engineers spend their day without undertaking a great deal of observation or shadowing.
  • 33. 2.5 Work Diaries - Disadvantages: Rely on self-reports; require participants to recall events. Interfere with respondents as they work: if software engineers have to record each time they go and consult a colleague, they may consult less often. Respondents may forget or neglect to record some events, and may not record at the expected level of detail.
  • 34. 2.5 Work Diaries - Reporting Guidelines: The precise task given to the software engineers must be described (e.g., to record their communication patterns); how the task was accomplished (e.g., to report to the experimenter); the tools made available to do so.
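The "prompt users for input at random times" variant of a work diary can be illustrated with a short sketch. This is only an assumption about how such a prompting tool might pick its times; the function name `sample_prompt_times` and the 09:00 to 17:00 working day are hypothetical and do not come from the slides.

```python
import random
from datetime import datetime, timedelta


def sample_prompt_times(day_start, day_end, n_prompts, seed=None):
    """Pick n_prompts distinct random moments in [day_start, day_end), sorted.

    Hypothetical helper: a diary tool could ask "what are you doing now?"
    at each returned time.
    """
    rng = random.Random(seed)  # seeded so a schedule can be reproduced
    span_seconds = int((day_end - day_start).total_seconds())
    # Sample distinct second offsets so no two prompts coincide.
    offsets = sorted(rng.sample(range(span_seconds), n_prompts))
    return [day_start + timedelta(seconds=s) for s in offsets]


if __name__ == "__main__":
    start = datetime(2009, 7, 9, 9, 0)   # assumed start of the working day
    end = datetime(2009, 7, 9, 17, 0)    # assumed end of the working day
    for moment in sample_prompt_times(start, end, n_prompts=5, seed=1):
        print(moment.strftime("%H:%M:%S"))
```

When reporting such a study, the number of prompts per day and the sampling window would belong in the description of the task given to participants.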
  • 35. 2.6 Think-Aloud Protocols: Researchers ask participants to think out loud while performing a task. The task can occur naturally at work or be predetermined by the researcher. Researchers do not interfere in the problem solving process. Think-aloud sessions generally last no more than 2 hours. Used to determine or validate a cognitive model as software engineers do some programming task.
  • 36. 2.6 Think-Aloud Protocols - Advantages: Relatively easy to implement. Gives a unique view of the problem solving process and additionally gives access to the mental model.
  • 37. 2.6 Think-Aloud Protocols - Disadvantages: For use in situations where a researcher could map out the entire problem space. It is difficult and time-consuming to analyze think-aloud data.
  • 38. 2.6 Think-Aloud Protocols - Reporting Guidelines: Provide an extremely precise characterization of the task the participant was asked to undertake, including any tools at the participant’s disposal; the time taken to complete the task; any materials provided to the participant; the precise way in which the analysis occurs.
  • 39. 2.7 Shadowing/Observation: In shadowing, the experimenter follows the participant around and records their activities. Unlimited time period. Observation occurs when the experimenter observes software engineers engaged in their work (meetings or programming). Shadow one software engineer at a time; observe many at one time.
  • 41. 2.7 Shadowing/Observation - Advantages: Easy to implement; give fast results; require no special equipment.
  • 42. 2.7 Shadowing/Observation - Disadvantages: Difficult to see what a software engineer is doing (e.g., when using keyboard shortcuts). Observers need to have a fairly good understanding of the environment to interpret the software engineer’s behavior. Predefining a set of categories or looked-for behaviors limits the type of data that will be collected.
  • 43. 2.7 Shadowing/Observation - Reporting Guidelines: The precise form of shadowing and/or observation, including whether any verbal instructions were given to the participant to think out loud. The way the information is recorded, the length of the session, and any other special instructions given to the participants. Provide context information: what activities the shadowed and/or observed participants were engaged in.
  • 44. 2.8 Participant-Observer (Joining the Team): The researcher essentially becomes part of the team and participates in key activities. The researcher gains a high level of familiarity with the team members and the tasks they perform. Software engineers are comfortable with the researcher’s presence and tend not to notice being observed.
  • 45. 2.8 Participant-Observer - Advantages: Respondents are more likely to be comfortable with a team member and to act naturally during observation. Researchers also develop a deeper understanding of software engineering tasks.
  • 46. 2.8 Participant-Observer - Disadvantages: Joining a team is very time consuming: it takes time to establish true team membership. A researcher who becomes too involved may lose perspective on the phenomenon being observed.
  • 47. 2.8 Participant-Observer - Reporting Guidelines: Report the role of the participant-observer in the team; how they interact with the team; what access they have to team material; the length of time of the interaction; how data was collected, coded, and analyzed.
  • 49. 3.1 Instrumenting Systems: Instrumentation is built into the software tools to record information about the usage of the tools: how frequently a tool or feature is used; patterns of access to files and directories; the timing underlying different activities. Recording both the input and output in great detail makes it possible to effectively play back a session.
  • 50. 3.1 Instrumenting Systems - Advantages: Requires no time commitment from software engineers. Provides the relative frequency and duration of the various activities accurately.
  • 51. 3.1 Instrumenting Systems - Disadvantages: Difficult to analyze the data meaningfully: it is hard to determine software engineers’ thoughts and goals. The working environment is often not well understood or constrained: software engineers often customize their environments by adding scripts and macros. One remedy is to play back the events to a software engineer and ask them to comment. Ethical concerns: monitoring software engineers without their knowledge.
  • 52. 3.1 Instrumenting Systems - Reporting Guidelines: The precise nature of the logging; any special instrumentation installed on the software engineers’ machines; a description of what exactly is logged, with what frequency; any special considerations with respect to data processing and analysis.
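As a rough illustration of instrumenting a tool, the sketch below wraps tool features in a logging decorator and then summarizes how often each feature was used and for how long. All names here (`instrumented`, `usage_log`, `search`, `open_file`, `summarize`) are hypothetical; a real study would instrument the engineers' actual tools, with their knowledge, and would persist the log rather than keep it in memory.

```python
import time
from collections import Counter, defaultdict
from functools import wraps

# In-memory event log: (feature_name, start_time, duration_seconds).
usage_log = []


def instrumented(feature_name):
    """Decorator that records every call to a tool feature."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Log even if the feature raised, so failures are counted too.
                usage_log.append(
                    (feature_name, start, time.perf_counter() - start)
                )
        return wrapper
    return decorator


@instrumented("search")
def search(text, pattern):
    """Toy stand-in for a search feature of some tool."""
    return [line for line in text.splitlines() if pattern in line]


@instrumented("open_file")
def open_file(name):
    """Toy stand-in for opening a file in some tool."""
    return f"contents of {name}"


def summarize(log):
    """Return {feature: (call_count, total_duration_seconds)}."""
    counts = Counter(name for name, _, _ in log)
    durations = defaultdict(float)
    for name, _, duration in log:
        durations[name] += duration
    return {name: (counts[name], durations[name]) for name in counts}


if __name__ == "__main__":
    search("alpha\nbeta\nalphabet", "alpha")
    search("gamma", "x")
    open_file("main.c")
    for feature, (count, total) in summarize(usage_log).items():
        print(f"{feature}: {count} calls, {total:.6f} s")
```

The summary gives frequency and duration, but, as the disadvantages slide notes, nothing about the engineer's goals; that still requires a complementary technique.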
  • 53. 3.2 Fly on the Wall (Participants Recording their Own Work): Participants are asked to video- or audiotape themselves when they are engaged in some predefined activity. It allows the researcher to be an observer of an activity.
  • 54. 3.2 Fly on the Wall - Advantages: Requires very little time from the participants and is very unobtrusive. Although there may be some discomfort in the beginning, it fades quickly.
  • 55. 3.2 Fly on the Wall - Disadvantages: Participants may forget to turn on the recording equipment at the appropriate time, so the record may be incomplete or missing. The camera is fixed, so the context of what is recorded may be hard to understand. There is a high cost to analyzing the resulting data.
  • 56. 3.2 Fly on the Wall - Reporting Guidelines: Any problems with the recording (e.g., developers forgetting to record a meeting). Context information will also help to clarify the application of the technique: where the recording occurred; what the typical tasks were; who was involved; who was responsible for the recording. Any methods used to transform, transcribe, and analyze the data need to be specified.
  • 57. Not the end… Introduction; Direct Techniques; Indirect Techniques; Independent Techniques; Applying the Techniques; Conclusion
  • 59. 4.1 Analysis of Electronic Databases of Work Performed: The work performed by developers is carefully managed using issue tracker, problem reporting, change request, and configuration management systems. The copious records generated for such systems are a rich source of information.
  • 60. 4.1 Analysis of Electronic Databases of Work Performed - Advantages: A large amount of data is often readily available. The data is stable and is not influenced by the presence of researchers.
  • 61. 4.1 Analysis of Electronic Databases of Work Performed - Disadvantages: Little control over the quantity and quality of information manually entered about the work performed: descriptive fields are often not filled in, or are filled in different ways by different developers. Difficult to gather additional information about a record when the software engineer who worked on it is no longer available.
  • 62. 4.1 Analysis of Electronic Databases of Work Performed - Reporting Guidelines: The exact nature of the collected data needs to be specified, along with any special considerations: whether any data is missing; whether any data is uninterpretable for some reason. Any special processing of the data must also be reported, for example if only a certain proportion is chosen to be analyzed.
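Because descriptive fields are often left empty, it can help to quantify missing data before analyzing such records. The sketch below is a minimal illustration under assumed data: the record layout and the field names `description` and `resolution` are hypothetical, not taken from any particular issue tracker.

```python
def field_completeness(records, fields):
    """Return {field: fraction of records with a non-blank value}."""
    total = len(records)
    result = {}
    for field in fields:
        # Treat missing keys, None, and whitespace-only strings as unfilled.
        filled = sum(
            1 for record in records if str(record.get(field) or "").strip()
        )
        result[field] = filled / total if total else 0.0
    return result


# Hypothetical change-request records exported from a tracking system.
records = [
    {"id": 1, "description": "Crash on save", "resolution": "fixed"},
    {"id": 2, "description": "", "resolution": "fixed"},
    {"id": 3, "description": "Typo in dialog", "resolution": None},
    {"id": 4, "description": "   ", "resolution": "wontfix"},
]

completeness = field_completeness(records, ["description", "resolution"])
print(completeness)  # {'description': 0.5, 'resolution': 0.75}
```

Figures like these are the kind of "missing data" detail the reporting guidelines ask for.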
  • 63. 4.2 Analysis of Tool Logs: Many software systems generate logs of some form: automatic building tools often leave records, as do source code control systems. Some organizations build sophisticated logging into a wide spectrum of tools to better understand the support needs of the software engineers. Such tool logs can be analyzed in the same way as logs from tools that have been deliberately instrumented by the researchers; the difference is that the researchers don’t have control over the kind of information collected.
  • 64. 4.2 Analysis of Tool Logs - Advantages: The data is already in electronic form, making it easier to code and analyze. The behavior being logged is part of software engineers’ normal work routine.
  • 65. 4.2 Analysis of Tool Logs - Disadvantages: Companies tend to use different tools in different ways, so it is difficult to gather data consistently when using this technique with multiple organizations.
  • 66. 4.2 Analysis of Tool Logs - Reporting Guidelines: The exact nature of what is being collected needs to be specified, along with any special concerns, such as missing data. If the data is processed in any way, it needs to be explained.
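A minimal sketch of tool-log analysis follows. It assumes a made-up whitespace-separated log format (`date user action target`); real build or version-control logs differ from tool to tool, which is exactly the consistency problem noted above. The function name `summarize_tool_log` is hypothetical.

```python
from collections import Counter


def summarize_tool_log(lines):
    """Count events per user and per action; also count unparseable lines."""
    per_user = Counter()
    per_action = Counter()
    skipped = 0
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            # Keep track of lines that cannot be interpreted, so the amount
            # of missing or unusable data can be reported.
            skipped += 1
            continue
        _date, user, action = parts[:3]
        per_user[user] += 1
        per_action[action] += 1
    return per_user, per_action, skipped


# Hypothetical log excerpt in the assumed format.
log_lines = [
    "2009-07-01 alice commit src/main.c",
    "2009-07-01 bob build all",
    "2009-07-02 alice commit src/util.c",
    "",
    "corrupted",
    "2009-07-02 alice build all",
]

per_user, per_action, skipped = summarize_tool_log(log_lines)
print(per_user["alice"], per_action["commit"], skipped)  # 3 2 2
```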
  • 67. 4.3 Documentation Analysis: Documentation generated by software engineers: comments in the program code; separate documents describing a software system. Other sources of documentation that can be analyzed: local newsgroups; group e-mail lists; memos; documents that define the development process.
  • 68. 4.3 Documentation Analysis - Advantages: Documents written about the system contain conceptual information and present a glimpse of at least one person’s understanding of the software system. They also serve as an introduction to the software and the team. Comments in the program code tend to provide low-level information on algorithms and data. Using the source code as the source of data allows for an up-to-date portrayal of the software system.
  • 69. 4.3 Documentation Analysis - Disadvantages: It is time consuming. It requires some knowledge of the source. Written material and source comments may be inaccurate.
  • 70. 4.3 Documentation Analysis - Reporting Guidelines: The documentation needs to be described, as well as any processing on it.
  • 71. 4.4 Static and Dynamic Analysis of a System: One analyzes the code (static analysis) or traces generated by running the code (dynamic analysis) to learn about the design, and indirectly about how software engineers think and work. One might compare the programming or architectural styles of several software engineers by analyzing their use of various constructs, or the values of various complexity metrics.
  • 72. 4.4 Static and Dynamic Analysis of a System - Advantages: The source code is usually readily available and contains a very large amount of information ready to be mined.
  • 73. 4.4 Static and Dynamic Analysis of a System - Disadvantages: To extract useful information from source code requires parsers and other analysis tools; such technology is not always mature. Although parsers used in compilers are of high quality, the parsers needed for certain kinds of analysis can be quite different; for example, they typically need to analyze the code without it being pre-processed. Analyzing old legacy systems created by multiple programmers over many years can make it hard to tease apart the various independent variables (programmers, activities, etc.) that give rise to different styles, metrics, etc.
  • 74. 4.4 Static and Dynamic Analysis of a System - Reporting Guidelines: The source code should be carefully described. The nature of the processing on the data also needs to be detailed. Any special processing considerations should be described.
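To make "analyzing their use of various constructs" concrete, here is a small static-analysis sketch that counts a few constructs in Python source using the standard `ast` module. The choice of Python as the analyzed language, the `CONSTRUCTS` table, and the `sample` snippet are assumptions for illustration; the slides do not prescribe a language or a metric.

```python
import ast
from collections import Counter

# Which syntax-tree node types to count, and the label to report them under.
CONSTRUCTS = {
    ast.FunctionDef: "function",
    ast.For: "for",
    ast.While: "while",
    ast.If: "if",
    ast.Try: "try",
}


def count_constructs(source):
    """Parse Python source and count occurrences of selected constructs."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        label = CONSTRUCTS.get(type(node))
        if label:
            counts[label] += 1
    return counts


# Hypothetical code sample standing in for one engineer's source file.
sample = '''
def find(items, target):
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x
'''

counts = count_constructs(sample)
print(counts["function"], counts["for"], counts["if"])  # 2 1 3
```

Running the same counter over files by different authors would give a crude basis for comparing styles, subject to the confounding factors listed under the disadvantages.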
  • 75. 5. Applying the Techniques: How to record and analyze the data. Record-Keeping Options; Coding and Analyzing the Data.
  • 76. 5.1 Record-Keeping Options: Direct techniques involve three data capture methods: videotape; audiotape; manual record keeping. These methods differ in: the completeness of the data record captured; the degree of interference they invoke in the work environment; the time involved in using the captured data.
  • 77. 5.1 Record-Keeping Options: Videotape captures details that would otherwise be lost, such as gestures, gaze direction, etc. With respect to video recording, it is important to consider the video camera’s frame of reference. Videotape can record only where a video camera is aimed. Moving the video camera a bit to the right or a bit to the left may cause a difference in the recorded output and subsequently in the interpretation of the data. Related to videotaping, there are a number of software programs that allow screen capture and playback of the recorded interactions. To be used with videotape, the video and the screen capture must be synchronized in some way.
  • 78. 5.1 Record-Keeping Options: Audiotape allows for a fairly complete record in the case of interviews; however, details of the physical environment and interaction with it will be lost. Audiotape does allow for the capture of tone: if a participant is excited while talking about a new tool, this will be captured on the audio record.
  • 79. 5.1 Record-Keeping Options: Manual record keeping is the most data sparse method and hence captures the least complete data record; however, it is also the quickest, easiest, and least expensive method to implement. Manual record keeping works best when a well-trained researcher identifies certain behaviors, thoughts, or concepts during the data collection process. Related to manual record keeping, Wu et al. (2003) developed a data collection technique utilizing a PDA. On the PDA, they had predetermined categories of responses that were coded each time a particular behavior was observed. The data were easily transported to a database on a PC for further analysis.
  • 80. 5.1 Record-Keeping Options: The decision of which to use depends on many variables: privacy at work; the participant’s degree of comfort with any of the three measures; the amount of time available for data collection and interpretation; the type of question asked and how well it can be formalized. Data capture methods will affect the information gained and the information that it is possible to gain. The methods are not mutually exclusive and can be used in conjunction.
  • 81. 5.2 Coding and Analyzing the Data: Field study techniques produce enormous amounts of data, an “attractive nuisance” (Miles, 1979). The purpose of this data is to provide insight into the phenomenon being studied. To meet this goal, the body of data must be reduced to a comprehensible format. This is done through a process of coding: using the goals of the research as a guide, a scheme is developed to categorize the data. High level schemes: e.g., all goals stated by a software engineer during debugging. Specific schemes: e.g., how many times grep was executed in a half-hour programming session. Once coded, the data is usually coded by another researcher to ensure the validity of the rating scheme (inter-coder or inter-rater reliability).
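One common way to quantify inter-rater reliability is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The slides do not name a specific statistic, so treating kappa as the measure is an assumption here, and the category labels (`goal`, `search`, `edit`) are invented for the example.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each labelled the same segments."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must label the same non-empty segments")
    n = len(codes_a)
    # Observed agreement: fraction of segments given the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: based on how often each coder used each category.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two researchers to eight transcript segments.
coder_a = ["goal", "goal", "search", "search", "edit", "edit", "goal", "search"]
coder_b = ["goal", "goal", "search", "edit", "edit", "edit", "goal", "goal"]

print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.628
```

Here the coders agree on 6 of 8 segments (0.75), chance agreement is 21/64, and kappa is 27/43, roughly 0.63.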
  • 82. 5.2 Coding and Analyzing the Data: Audio and videotape records are usually transcribed before categorization, although transcription is often not necessary. Transcription requires significant cost and effort, and may not be justified for small, informal studies. Having made the decision to transcribe, obtaining an accurate transcription is challenging. A trained transcriber can take up to 6 hours to transcribe a single hour of tape (even longer when gestures, etc. must be incorporated into the transcription). An untrained transcriber (especially in technical domains) can do such a poor job that it takes researchers just as long to correct the transcript. While transcribing has its problems, online coding of audio or videotape can also be quite time inefficient, as it can take several passes to produce an accurate categorization. If a question surfaces later, it will be necessary to listen to the tapes again, requiring more time.
  • 83. 5.2 Coding and Analyzing the Data: Quantitative analyses can be used to provide summary information about the data (e.g., on average, how often grep is used in debugging sessions). Quantitative analyses can also determine whether particular hypotheses are supported by the data (e.g., whether high-level goals are stated more frequently in development than in maintenance).
  • 84. 5.2 Coding and Analyzing the Data: When choosing a statistical analysis method, it is important to know whether your data is consistent with assumptions made by the method. Inferential statistical analyses are only applicable in well-constrained situations. The type of data collected in field studies often requires nonparametric statistics. Nonparametric statistics are often called “distribution-free” in that they do not have the same requirements regarding the modeled distribution as parametric statistics. There are many nonparametric tests based on simple rankings, as opposed to strict numerical values. Many nonparametric tests can be used with small samples. Briand et al. (1996) discuss the disadvantages of nonparametric statistics versus parametric statistics in software engineering; they point out that a certain amount of violation of the assumptions of parametric statistics is legitimate, but that nonparametric statistics should be used when there are extreme violations of those assumptions, as there may well be in field studies.
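As an example of a nonparametric test "based on simple rankings", the sketch below computes the Mann-Whitney U statistic for two small samples. Picking this particular test is an assumption, as are the sample values (counts of stated high-level goals per session). Only the U statistic is computed; a p-value would come from tables or a statistics package.

```python
def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_x, sample_y):
    """Return (U_x, U_y) computed from the rank sum of the pooled samples."""
    n_x, n_y = len(sample_x), len(sample_y)
    ranks = average_ranks(list(sample_x) + list(sample_y))
    rank_sum_x = sum(ranks[:n_x])
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y


# Hypothetical counts of high-level goals stated per observed session.
development = [3, 5, 4, 6]
maintenance = [1, 2, 2, 3]

print(mann_whitney_u(development, maintenance))  # (15.5, 0.5)
```

The smaller of the two U values (0.5 here) is the one usually compared against a critical value for the given sample sizes.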
  • 85. 5.2 Coding and Analyzing the Data: Qualitative analyses do not rely on quantitative measures to describe the data. Rather, they provide a general characterization based on the researchers’ coding schemes. The most popular tools for quantitative analysis are SAS and SPSS. A number of different tools exist for helping with qualitative analysis, including NVivo, Atlas.ti, and Noldus Observer.
  • 86. 5.2 Coding and Analyzing the Data: In summary, the way the data is coded will affect its interpretation and the possible courses for its evaluation. Therefore it is important to ensure that coding schemes reflect the research goals. They should tie in to particular research questions. Additionally, coding schemes should be devised with the analysis techniques in mind. Again, different schemes will lend themselves to different evaluative mechanisms. However, one way to overcome the limitations of any one technique is to look at the data using several different techniques (such as combining qualitative and quantitative analyses). A triangulation approach (Jick, 1979) will allow for a more accurate picture of the studied phenomena. Bratthall and Jørgensen (2002) give a very nice example of using multiple methods for data triangulation. Their example is framed in a software engineering context examining software evolution and development. In fact, many of the examples cited earlier use multiple methods to triangulate their results.
  • 87. 5.2 Coding and Analyzing the Data: As a final note, with any type of analysis technique, it is generally useful to go back to the original participant population to discuss the findings. Participants can tell researchers whether they believe an accurate portrayal of their situation has been achieved. This, in turn, can let researchers know whether they used an appropriate coding scheme and analysis techniques.