Using Vision and Voice to Create a Multimodal Interface for Microsoft Word 2007

T.R. Beelders and P.J. Blignaut
The University of the Free State, Bloemfontein, South Africa
{beelderstr; pieterb}@ufs.ac.za

Abstract

There has recently been a call to move away from the standard WIMP type of interfaces and give users access to more intuitive interaction techniques. Therefore, in order to test the usability of a multimodal interface in Word 2007, the most popular word processor, the additional modalities of eye gaze and speech recognition were added within Word 2007 as interaction techniques. This paper discusses the developed application and the way in which the interaction techniques are included within the well-established environment of Word 2007. The additional interaction techniques are fully customizable and can be used in isolation or in combination. Eye gaze can be used with dwell time, look and shoot or blinking, and speech recognition can be used for dictation as well as for verbal commands for both formatting purposes and navigation through a document. Additionally, the look and shoot method can be combined with a verbal command to facilitate completely hands-free interaction. Magnification of the interface is also provided to improve accuracy, and multiple onscreen keyboards are provided to enable hands-free typing.

Keywords: Eye-tracking, speech recognition, usability, word processing, multimodal

Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail permissions@acm.org. ETRA 2010, Austin, TX, March 22-24, 2010. © 2010 ACM 978-1-60558-994-7/10/0003 $10.00

1       Introduction

The word processor application has evolved substantially since its inception and has undergone a virtual metamorphosis to achieve the capabilities that are available in these applications today. As an integral part of everyday life for many people, it caters for a very diverse group of users. Furthermore, users with disabilities or needs other than those of mainstream users are not always taken into consideration during system development and often have to compensate by using specially designed applications which do not necessarily compare with the more popular applications. This study therefore aims to investigate various means to increase the usability of a word processor for as wide a user group as possible.

For this reason, the interface of the most popular word processor application will be extended into a multimodal interface. This interface should facilitate use of the mainstream product by marginalized users, whilst at the same time enhancing the user experience for novice, intermediate and expert users. Ideally, the interface should be customizable and allow users to select any combination of interaction techniques which suit their needs. The premise of the research study is not to develop a new word processor but rather to incorporate additional interaction techniques, besides the keyboard and mouse, into an application which has already been accepted by the user community. This will allow for the improvement of an already popular product and stimulate inclusiveness of non-mainstream users into the mainstream market. Therefore, one aim is to determine whether it is possible to customize an interface to such an extent that all user groups are catered for with an all-inclusive interface.

The research study is still in the beginning phase, where development of the tool is underway. Therefore, for the purposes of this paper, the application as it has been developed will be the main focus. The paper will, however, conclude with a short discussion of the next phases of the research study.

2       Interaction Techniques

Using a physical input device in order to communicate or perform a task in human-computer dialogue is called an interaction technique [Foley et al., 1990, as cited in Jacob, 1995]. The interaction techniques of speech recognition and eye tracking will be included in a popular word processor interface to create a multimodal interface, as a means to determine whether the usability of this product can be enhanced in this way.

Although this approach has received limited attention thus far, the multimodal approach has always focused on the development of a third-party application, for example EyeTalk [Hatfield and Jenkins, 1997]. Contrary to this, this study will use an already existing application, namely Microsoft Word©, which currently enjoys a high prevalence in the commercial market.

3       Development environment

The development environment used was Visual Studio 2008, making use of the .NET Framework 3.5. Visual Studio Tools for Microsoft Office System 2008 (VSTO) in C# was used for development. VSTO allows programmers to use managed code to build Office-based solutions in C# and VB.NET [Anderson, 2009]. In order to incorporate the speech recognition, the Microsoft Speech Application Programming Interface (SAPI) with version 5.1 of the SDK was used. The SDK provides the capability of compiling customized grammars and accessing the functionalities of the speech recognizer. In order to provide gaze interaction, Tobii SDK 1.5.4 was used. For magnification purposes, which will be discussed in an upcoming section, the commercial product Magnifying Glass Pro 1.7 was chosen as a relatively inexpensive solution, but primarily based on the fact that it was one of the few applications which incorporated clickable areas within the magnified area which are then correctly
transferred to the underlying area. This is essential in the developed product, as the magnification will increase the accuracy of cursor positioning via eye gaze and the correct interpretation of user intention; requiring the user to disable magnification before clicking on the interface would negate all the advantages gained from magnification.

The aim of the development process was to incorporate speech recognition and eye tracking as additional interaction techniques in the Microsoft Word environment. The user should also be given the freedom to determine in which combination the interaction techniques must be used, while still having the option of continued use of the traditional interaction techniques. As illustrated in Figure 1, an extra tab was added to the established Microsoft Word ribbon. This tab (circled in red) was named Multimodal Add-Ins.

The new tab provides numerous options allowing the user to select which additional interaction techniques they would like to use (Figure 1). As is evident from Figure 1, complete customization of the techniques is allowed via selection of any combination of techniques, as well as of the capacity in which the techniques must be implemented.

Figure 1: Multimodal Add-ins for Word 2007
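The click-through magnification described above depends on mapping a click made inside the magnified lens back to the correct position on the underlying interface. The paper does not give this transform, and Magnifying Glass Pro's internals are not documented here; the following is a minimal illustrative sketch, assuming a simple linear lens model centred on the cursor (the function and parameter names are our own, not taken from any product):

```python
def lens_to_screen(click, center, zoom):
    """Map a click inside a magnifying lens back to screen coordinates.

    click  -- (x, y) of the click inside the magnified view
    center -- (x, y) screen point the lens is centred on (the cursor)
    zoom   -- magnification factor (> 1)

    Inside the lens, content around `center` is scaled up by `zoom`,
    so the true on-screen offset is the lens offset divided by `zoom`.
    """
    cx, cy = center
    x, y = click
    return (cx + (x - cx) / zoom, cy + (y - cy) / zoom)

# A click 40 px right of the lens centre under 2x zoom corresponds to a
# point only 20 px right of the cursor on the underlying interface.
print(lens_to_screen((140, 100), (100, 100), 2.0))  # (120.0, 100.0)
```

Under this model, magnification halves (at 2x zoom) the effect of any gaze-positioning error on the final click location, which is the accuracy gain the paper relies on.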



Additional tools which are available to enhance the user experience are a magnification tool and an onscreen keyboard which can be displayed at the bottom of the Word document. The magnification tool magnifies the immediate area under the mouse cursor, thereby providing increased accuracy for users with weak eyesight and those making use of the gaze-sensitive interface. Magnification is available when using the mouse or when using eye gaze as an interaction technique. The use of the magnification tool is entirely at the discretion of the user, who can turn magnification on and off at will or as needed. Magnification can be toggled using either a verbal command or the available button on the ribbon.

Onscreen keyboards are available as an alternative to using a traditional keyboard. The onscreen keyboard can be used either through use of the traditional mouse or to achieve hands-free typing using eye gaze or a combination of eye gaze and speech recognition. The final adapted interface, as envisioned in use when the onscreen keyboard is in use, is shown in Figure 2.

The layout of the onscreen keyboard can be changed to either a traditional QWERTY keyboard layout or to an alphabetic layout. Each keyboard contains all 26 alphabetic letters, a Space bar, Backspace and Delete keys, as well as special keys which simplify movement through the document. The special keys provided are Page up, Page down, Home and End. The user can also toggle between upper case and lower case by activating and deactivating the Caps Lock key. A Select All key is provided as a means for the user to select all the text in the document. The two red arrows in the lower left corner of the keyboard (Figure 2) change the size of all keyboard keys in decrements and increments of 10 pixels respectively, thereby providing even more customization of the keyboard for the user. Auditory feedback in the form of a soft beep is given when a keyboard key is clicked.

Speech recognition

The user has the option of enabling the speech engine so that Microsoft Word can respond to verbal utterances. In terms of the customizable options, the user can toggle between dictation mode and command mode. In dictation mode, the speech recognition is implemented in the well-known method of capturing vocalizations, translating those vocalizations into text and writing the result to the currently activated document in Microsoft Word. In order for dictation mode to be effective, the user must select a previously trained profile. A unique profile can be trained through the Windows Speech wizard. All the available speech profiles are provided in a drop-down box on the Multimodal Add-Ins tab for the convenience of the user.

In command mode, a grammar is activated which accepts only isolated commands and responds to these in a pre-determined manner. Command mode provides the functions of cursor control, formatting capabilities and certain document handling capabilities. Several different commands are provided which trigger the same application reaction, thereby contributing to further customization, as the user can determine which command is most desirable for them to use. Moreover, simple cursor control is provided through directional commands, but more complex cursor control is also provided by allowing line selection and movement of the cursor as though control keys (such as Shift) are being pressed in combination with the verbal command. These types of commands simplify selection of text and provide verbal commands for complex key combinations which are not always known to novice and intermediate users. For example, the word "Bold" causes the activation or deactivation of the bold formatting style. Similarly, the words "Italic" and "Underline" activate or deactivate their formatting styles. Words such as "Cut", "Copy" and "Paste" allow for text manipulation; their subsequent actions are the cutting or copying of the currently selected text and the pasting of the clipboard contents at the position of the cursor. More complex commands for text selection are available, such as "Select line", which selects the whole line on which the cursor is situated, and "Select word", which selects the word nearest to the right of the current cursor position. Cursor control is achieved through the commands "Left", "Right", "Up" and "Down". Verbal commands can be issued in sequence to perform relatively complex document manipulation.

Figure 2: Adapted interface of Word 2007 when the onscreen keyboard is activated
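The command mode described above amounts to a fixed grammar of isolated phrases, several of which map to the same application reaction. A minimal sketch of such a dispatch table is shown below in Python; the command phrases follow the paper, but the handler names are our own illustrative labels, not the application's actual implementation:

```python
def make_dispatch():
    """Build a table mapping spoken command phrases to editor actions.

    Several phrases may share one action, mirroring the point that
    synonymous commands trigger the same application reaction.
    """
    actions = {}

    def register(action, *phrases):
        for phrase in phrases:
            actions[phrase.lower()] = action

    # Formatting toggles
    register("toggle-bold", "Bold")
    register("toggle-italic", "Italic")
    register("toggle-underline", "Underline")
    # Clipboard handling
    register("cut", "Cut")
    register("copy", "Copy")
    register("paste", "Paste")
    # Cursor control and selection
    register("caret-left", "Left")
    register("caret-right", "Right")
    register("caret-up", "Up")
    register("caret-down", "Down")
    register("select-line", "Select line")
    register("select-word", "Select word")
    # Synonymous gaze-click commands: "Go", "Click" and "Select" all
    # simulate a left mouse click at the current gaze position.
    register("gaze-click", "Go", "Click", "Select")
    return actions

dispatch = make_dispatch()
print(dispatch["go"] == dispatch["click"] == dispatch["select"])  # True
```

In the real application the grammar would be compiled and matched by the SAPI recognizer rather than by dictionary lookup, but the one-to-many mapping from phrases to reactions is the same idea.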



Eye-tracking

The eye tracker can be calibrated for use directly through the Microsoft Word interface. This increases the usability of the application, as the user is not required to move between applications to achieve their goal of using gaze as an interaction technique. Since the word processor is the focus of this study, this meets the requirement of the research question scope. The user has the option to activate eye gaze, which can then be used to position the cursor in the document or over an object to be manipulated. Customization is provided by allowing the user to choose the activation method. For use purely as a single interaction method, the choices of dwell time, look and shoot, and blinking are provided. When dwell time is selected, the user is able to set the interval of the dwell time (see the Sensitivity Setting text box in Figure 1). This provides additional customization, as the user can determine the speed with which they are most comfortable, and leaves the option of adjusting this interval as the user gains more confidence and experience with gaze-based interaction. The interval can be changed at any time during the use of the application. Dwell time requires the user to fixate on a position for the set dwell time interval before a left mouse click is simulated. When selecting the look and shoot method, the user can position the cursor using eye gaze and then press the Enter key to simulate a left mouse click. This has the effect of either placing the cursor at the position of the eye gaze or clicking on the icon directly under the eye gaze of the user. The third option available to the user is that of blinking. In this scenario, the user fixates on the desired object or position and then blinks their eyes to simulate a left mouse click.

Multiple interaction techniques

When the user selects No activation (Figure 1) for eye gaze, this implies that they will instead use voice commands to respond to their current eye gaze position. In this instance, the speech recognition must also be enabled, and the user can then issue verbal commands to move the cursor to the current gaze position, which is analogous to executing a left mouse click at that position. In this way, it is possible for the user to place the cursor at any position in the document, or to click one of the Microsoft Word icons on the ribbon. The verbal commands "Go", "Click" or "Select" all simulate a left mouse click at the button closest to the current gaze position. In this way, the user is free to choose the command which they find most suitable for them.

In most instances it is envisioned that the onscreen keyboard will also be activated under these circumstances. When the onscreen keyboard is activated in conjunction with eye gaze, visual feedback is given to the user to indicate which button will be clicked when the verbal command is issued. With each fixation that is detected within the boundaries of the keyboard, the button which is closest to that fixation is determined to be the target and a shape is displayed in the centre of the button. The user can also select which shape they would like to use for visual feedback. The available shapes are a solid square, a hollow square, a solid disk and a hollow circle. The hollow shapes do not obscure the letter of the key and in so doing provide the necessary visual feedback whilst still allowing the user to see the letter which will be written to the document. Feedback is only given on the keyboard to minimize interference during normal document browsing. In order to achieve increased stabilization of the feedback within a targeted object, the algorithm suggested by Kumar (2007) was used.

If the user is satisfied that the correct button has been determined, they can then issue any of the verbal commands to simulate a left mouse click. The letter shown on the keyboard is then written to the document at the current cursor position.

4       Where to next?

As previously mentioned, the research study is still in the preliminary stages of an empirical study. An application has been developed to investigate the effect of multimodal interaction techniques on the usability of a mainstream word processor application. Further enhancements to the application will include the expansion of the keyboards to include numerical keys, and the magnification will be refined to respond to eye gaze and voice commands. More voice commands will be provided, particularly for functions that currently have shortcut keys assigned to them, such as Save and displaying certain dialog boxes.

Additionally, a back-end will be written for the application which will capture certain measurements which can be used for usability analysis. Measurements such as the number of errors made during a task, the number of actions required and the percentage of the task completed correctly will automatically be saved to a database for further analysis.

Once the application has been completed, user testing will commence. Both disabled and non-disabled users of a local university will be approached to participate in the study. A longitudinal study will be conducted whereby the participants will be required to spend periods interacting with the system. After each exposure to the system, users will be required to complete a number of tasks for which measurements will be captured. In this way, the learnability of the system can be measured over a period of time by comparing the results of these sessions to determine if user performance increases in correlation with user exposure to the application. Since it is expected that there will be a learning curve associated with the application, it is deemed more applicable to capture usability measurements over a period of time rather than only after a single session with the application. In order to determine whether the application succeeds in providing for disabled users whilst simultaneously providing a better user experience for mainstream users, it is imperative that users from both these demographics be included in the sample.

Furthermore, to further investigate the usability of the newly developed application, user efficiency and effectiveness can be measured in a within-subjects experiment by requiring users to complete identical tasks in both the commercial Microsoft Word and the new multimodal Microsoft Word.

Moreover, the usability of the various interaction techniques will also be analyzed to determine which combination of the interaction techniques provides the most usable interface – if any. User satisfaction will be measured by means of a questionnaire in order to gauge user reaction, over both a short-term and a long-term exposure period.

5       Summary

A multimodal interface was developed for Microsoft Word in order to eventually determine whether the usability of this application can be enhanced for mainstream users whilst simultaneously providing an adaptable and usable interface for disabled users. For these purposes, eye tracking and speech recognition capabilities were built into the Word interface. These interaction techniques can be used in isolation or in combination, and the way in which they are used can be customized in a number of ways. Once the development has been completed and measurements can be captured automatically in the background during user interaction, a longitudinal usability study will be undertaken. Both disabled and able-bodied users will be included in the sample and will be required to complete a number of practice sessions with the application over a prolonged period of time. After each session, participants will be required to complete a number of tasks, during which measurements will be captured for further analysis. In this way, it will be possible to determine whether users are able to improve their performance on the system over an extended period – in other words, whether the system is usable. Additionally, user performance between the new application and the commercially available application will be compared to determine whether users can achieve comparable performance on both systems. In this way, it will be possible to determine whether a popular commercial application can be fully extended into a worthwhile multimodal application which caters for a diverse group of users comprised of both disabled and able-bodied users.

References

ANDERSON, T. (2009). Pro Office 2007 Development with VSTO. Apress: United States of America.

HATFIELD, F. AND JENKINS, E.A. (1997). An interface integrating eye gaze and voice recognition for hands-free computer access. In Proceedings of the CSUN 1997 Conference.

JACOB, R.J. (1995). Eye tracking in advanced interface design. In Virtual Environments and Advanced Interface Design, W. Barfield and T.A. Furness, Eds. Oxford University Press, New York, NY, 258-288.

KUMAR, M. (2007). Gaze-enhanced user interface design. PhD Thesis, Stanford University.

IRJET Journal
 
SMARCOS_Paper_Mobile hci12 246
Smarcos Eu
 
Hennessey An Open Source Eye Gaze Interface Expanding The Adoption Of Eye Gaz...
Kalle
 
Tandberg movi
Roqaya Al-Aabri
 

More from Kalle (20)

PDF
Blignaut Visual Span And Other Parameters For The Generation Of Heatmaps
Kalle
 
PDF
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Kalle
 
PDF
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...
Kalle
 
PDF
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...
Kalle
 
PDF
Vinnikov Contingency Evaluation Of Gaze Contingent Displays For Real Time Vis...
Kalle
 
PDF
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze Control
Kalle
 
PDF
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...
Kalle
 
PDF
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic Training
Kalle
 
PDF
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...
Kalle
 
PDF
Stevenson Eye Tracking With The Adaptive Optics Scanning Laser Ophthalmoscope
Kalle
 
PDF
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...
Kalle
 
PDF
Skovsgaard Small Target Selection With Gaze Alone
Kalle
 
PDF
San Agustin Evaluation Of A Low Cost Open Source Gaze Tracker
Kalle
 
PDF
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks
Kalle
 
PDF
Rosengrant Gaze Scribing In Physics Problem Solving
Kalle
 
PDF
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search
Kalle
 
PDF
Prats Interpretation Of Geometric Shapes An Eye Movement Study
Kalle
 
PDF
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
Kalle
 
PDF
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
Kalle
 
PDF
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
Kalle
 
Blignaut Visual Span And Other Parameters For The Generation Of Heatmaps
Kalle
 
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Kalle
 
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...
Kalle
 
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...
Kalle
 
Vinnikov Contingency Evaluation Of Gaze Contingent Displays For Real Time Vis...
Kalle
 
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze Control
Kalle
 
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...
Kalle
 
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic Training
Kalle
 
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...
Kalle
 
Stevenson Eye Tracking With The Adaptive Optics Scanning Laser Ophthalmoscope
Kalle
 
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...
Kalle
 
Skovsgaard Small Target Selection With Gaze Alone
Kalle
 
San Agustin Evaluation Of A Low Cost Open Source Gaze Tracker
Kalle
 
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks
Kalle
 
Rosengrant Gaze Scribing In Physics Problem Solving
Kalle
 
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search
Kalle
 
Prats Interpretation Of Geometric Shapes An Eye Movement Study
Kalle
 
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
Kalle
 
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
Kalle
 
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
Kalle
 

Beelders Using Vision And Voice To Create A Multimodal Interface For Microsoft Word 2007

Using Vision and Voice to Create a Multimodal Interface for Microsoft Word 2007

T.R. Beelders and P.J. Blignaut
The University of the Free State, Bloemfontein, South Africa
{beelderstr; pieterb}@ufs.ac.za

Copyright © 2010 ACM 978-1-60558-994-7/10/0003. ETRA 2010, Austin, TX, March 22–24, 2010.

Abstract

There has recently been a call to move away from the standard WIMP type of interfaces and give users access to more intuitive interaction techniques. Therefore, in order to test the usability of a multimodal interface in Word 2007, the most popular word processor, the additional modalities of eye gaze and speech recognition were added within Word 2007 as interaction techniques. This paper discusses the developed application and the way in which the interaction techniques are included within the well-established environment of Word 2007. The additional interaction techniques are fully customizable and can be used in isolation or in combination. Eye gaze can be used with dwell time, look and shoot, or blinking, and speech recognition can be used for dictation and for verbal commands for both formatting purposes and navigation through a document. Additionally, the look and shoot method can be combined with a verbal command to facilitate completely hands-free interaction. Magnification of the interface is also provided to improve accuracy, and multiple onscreen keyboards are provided to give hands-free typing capabilities.

Keywords: Eye-tracking, speech recognition, usability, word processing, multimodal

1 Introduction

The word processor application has evolved substantially since its initial inception and has undergone a virtual metamorphosis to achieve the capabilities that are available in these applications today. As an integral part of everyday life for many people, it caters for a very diverse group of users. Furthermore, users with disabilities or needs other than those of mainstream users are not always taken into consideration during system development and often have to compensate by using specially designed applications which do not necessarily compare with the more popular applications. This study therefore aims to investigate various means to increase the usability of a word processor for as wide a user group as possible.

For this reason, the interface of the most popular word processor application will be extended into a multimodal interface. This interface should facilitate use of the mainstream product by marginalized users, whilst at the same time enhancing the user experience for novice, intermediate and expert users. Ideally the interface should be customizable and allow users to select any combination of interaction techniques which suit their needs.

The premise of the research study is not to develop a new word processor but rather to incorporate additional interaction techniques, besides the keyboard and mouse, into an application which has already been accepted by the user community. This will allow for the improvement of an already popular product and stimulate inclusiveness of non-mainstream users into the mainstream market. Therefore, one aim is to determine whether it is possible to customize an interface to such an extent that all user groups are catered for with an all-inclusive interface.

The research study is still in the beginning phase, where development of the tool is underway. Therefore, for the purposes of this paper, the application as it has been developed will be the main focus. The paper will, however, conclude with a short discussion of the next phases of the research study.

2 Interaction Techniques

Using a physical input device in order to communicate or perform a task in human-computer dialogue is called an interaction technique [Foley, et al., 1990 as cited in Jacob, 1995]. The interaction techniques of speech recognition and eye tracking will be included in a popular word processor interface to create a multimodal interface, as a means to determine whether the usability of this product can be enhanced in this way.

Although this approach has received limited attention thus far, previous multimodal work has focused on the development of third-party applications, for example EyeTalk [Hatfield and Jenkins, 1997]. Contrary to this, this study will use an already existing application, namely Microsoft Word©, which currently enjoys a high prevalence in the commercial market.

3 Development environment

The development environment used was Visual Studio 2008, making use of the .NET Framework 3.5. Visual Studio Tools for Microsoft Office System 2008 (VSTO) in C# was used for development. VSTO allows programmers to use managed code to build Office-based solutions in C# and VB.NET [Anderson, 2009]. In order to incorporate the speech recognition, the Microsoft Speech Application Programming Interface (SAPI) with version 5.1 of the SDK was used. The SDK provides the capability of compiling customized grammars and accessing the functionalities of the speech recognizer. In order to provide gaze interaction, Tobii SDK 1.5.4 was used. For magnification purposes, which will be discussed in an upcoming section, the commercial product Magnifying Glass Pro 1.7 was chosen as a relatively inexpensive solution, but primarily based on the fact that it was one of the few applications which incorporated clickable areas within the magnified area which are then correctly transferred to the underlying area. This is essential in the developed product: the magnification will increase the accuracy of cursor positioning via eye gaze and the correct interpretation of user intention, and requiring the user to disable magnification before clicking on the interface would negate all the advantages gained from magnification.
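The "customized grammars" compiled through SAPI can be illustrated with a short sketch. Note that this is not the authors' code: the paper uses the SAPI 5.1 SDK directly, whereas the sketch below uses the managed System.Speech wrapper over SAPI (available alongside the .NET Framework 3.5 the authors target), and the command words are illustrative examples rather than the add-in's actual grammar.

```csharp
using System;
using System.Speech.Recognition; // managed wrapper over SAPI; the paper uses the SAPI 5.1 SDK directly

class CommandModeSketch
{
    static void Main()
    {
        // Illustrative command-mode grammar: only these isolated utterances are accepted.
        var commands = new Choices("Bold", "Italic", "Underline",
                                   "Cut", "Copy", "Paste",
                                   "Left", "Right", "Up", "Down",
                                   "Select line", "Select word");
        var grammar = new Grammar(new GrammarBuilder(commands));

        using (var recognizer = new SpeechRecognitionEngine())
        {
            recognizer.LoadGrammar(grammar);
            recognizer.SetInputToDefaultAudioDevice();
            recognizer.SpeechRecognized += (s, e) =>
            {
                // In the add-in, the recognized word would be dispatched to Word
                // (e.g. "Bold" toggles the bold formatting style).
                Console.WriteLine("Command: " + e.Result.Text);
            };
            recognizer.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadLine(); // keep listening until Enter is pressed
        }
    }
}
```

Switching to dictation mode would amount to loading a `DictationGrammar` instead of the restricted command grammar, which matches the paper's description of toggling between the two modes.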
The aim of the development process was to incorporate speech recognition and eye tracking as additional interaction techniques in the Microsoft Word environment. The user should also be given the freedom to determine in which combination the interaction techniques must be used, while still having the option of continued use of the traditional interaction techniques. As illustrated in Figure 1, an extra tab was added to the established Microsoft Word ribbon. This tab (circled in red) was named Multimodal Add-Ins.

The new tab provides numerous options for the user to select which additional interaction techniques they would like to use (Figure 1). As is evident from Figure 1, complete customization of the techniques is allowed via selection of any combination of techniques, as well as the capacity in which the techniques must be implemented.

Figure 1: Multimodal Add-ins for Word 2007

Additional tools which are available to enhance the user experience are a magnification tool and an onscreen keyboard which can be displayed at the bottom of the Word document. The magnification tool magnifies the immediate area under the mouse cursor, thereby providing increased accuracy for users with weak eyesight and those making use of the gaze-sensitive interface. Magnification is available when using the mouse or when using eye gaze as an interaction technique. The use of the magnification tool is entirely at the discretion of the user, who can turn magnification on and off at will or as needed. Magnification can be toggled using either a verbal command or the available button on the ribbon.

Onscreen keyboards are available as an alternative to using a traditional keyboard. The onscreen keyboard can be used either through use of the traditional mouse, or to achieve hands-free typing using eye gaze or a combination of eye gaze and speech recognition. The final adapted interface, as envisioned in use when the onscreen keyboard is active, is shown in Figure 2.

The layout of the onscreen keyboard can be changed to either a traditional QWERTY layout or an alphabetic layout. Each keyboard contains all 26 alphabetic letters, a Space bar, Backspace and Delete keys, as well as special keys which simplify movement through the document: Page up, Page down, Home and End. The user can also toggle between upper case and lower case by activating and deactivating the Caps Lock key. A Select All key is provided as a means for the user to select all the text in the document. The two red arrows in the lower left corner of the keyboard (Figure 2) change the size of all keyboard keys in decrements and increments of 10 pixels respectively, thereby providing even more customization of the keyboard for the user. Auditory feedback in the form of a soft beep is given when a keyboard key is clicked.

Speech recognition

The user has the option of enabling the speech engine so that Microsoft Word can respond to verbal utterances. In terms of the customizable options, the user can toggle between dictation mode and command mode. In dictation mode, the speech recognition is implemented in the well-known method of capturing vocalizations, translating those vocalizations into text and writing the result to the currently activated document in Microsoft Word. In order for dictation mode to be effective, the user must select a previously trained profile. A unique profile can be trained through the Windows Speech wizard. All the available speech profiles are provided in a drop-down box on the Multimodal Add-Ins tab for the convenience of the user.

In command mode, a grammar is activated which accepts only isolated commands and responds to these in a pre-determined manner. Command mode provides the functions of cursor control, formatting capabilities and certain document handling capabilities. Several different commands are provided which have the same application reaction, thereby contributing to further customization, as the user can determine which command is most desirable for them to use. Moreover, simple cursor control is provided through directional commands, but more complex cursor control is also provided by allowing line selection and movement of the cursor as though control keys (such as Shift) are being pressed in combination with the verbal command. These types of commands simplify selection of text and provide verbal commands for complex key combinations which are not always known to novice and intermediate users. For example, the word "Bold" causes the activation or deactivation of the bold formatting style. Similarly, the words "Italic" and "Underline" activate or deactivate their formatting styles. Words such as "Cut", "Copy" and "Paste" allow for text manipulation; their subsequent actions are the cutting or copying of the currently selected text and the pasting of the clipboard contents at the position of the cursor. More complex commands for text selection are available, such as "Select line", which selects the whole line on which the cursor is situated, and "Select word", which selects the word nearest to the right of the current cursor position. Cursor control is achieved through the commands "Left", "Right", "Up" and "Down". Verbal commands can be issued in sequence to perform relatively complex document manipulation.

Figure 2: Adapted interface of Word 2007 when the onscreen keyboard is activated

Eye-tracking

The eye tracker can be calibrated for use directly through the Microsoft Word interface. This increases the usability of the application, as the user is not required to move between applications to achieve their goal of using gaze as an interaction technique. Since the word processor is the focus of this study, this meets the requirement of the research question scope. The user has the option to activate eye gaze, which can then be used to position the cursor in the document or over an object to be manipulated. Customization is provided by allowing the user to choose the activation method. For use purely as a single interaction method, the choices of dwell time, look and shoot, and blinking are provided. When dwell time is selected, the user is able to set the interval of the dwell time (see the Sensitivity Setting text box in Figure 1). This provides additional customization, as the user can determine the speed with which they are most comfortable, and leaves the option of adjusting this interval as the user gains more confidence and experience with gaze-based interaction. The interval can be changed at any time during the use of the application. Dwell time requires the user to fixate on a position for the set dwell time interval before a left mouse click is simulated. When selecting the look and shoot method, the user can position the cursor using eye gaze and then press the Enter key to simulate a left mouse click. This has the effect of either placing the cursor at the position of the eye gaze or clicking on the icon directly under the eye gaze of the user. The third option available to the user is that of blinking. In this scenario, the user fixates on the desired object or position and then blinks their eyes to simulate a left mouse click.

Multiple interaction techniques

When the user selects No activation (Figure 1) for eye gaze, this implies that they will instead be using voice commands to respond to the current eye gaze position. In this instance, the speech recognition must also be enabled; the user can then issue verbal commands to move the cursor to the current gaze position, which is analogous to executing a left mouse click at that position. In this way, it is possible for the user to place the cursor at any position in the document, or to click one of the Microsoft Word icons on the ribbon. The verbal commands "Go", "Click" or "Select" all simulate a left mouse click at the button closest to the current gaze position. In this way, the user is free to choose the command which they find most suitable.
In most instances it is envisioned that the onscreen keyboard will also be activated under these circumstances. When the onscreen keyboard is activated in conjunction with eye gaze, visual feedback is given to the user to indicate which button will be clicked when the verbal command is issued. With each fixation that is detected within the boundaries of the keyboard, the button closest to that fixation is determined to be the target and a shape is displayed in the centre of the button. The user can also select which shape they would like to use for visual feedback. The available shapes are a solid square, a hollow square, a solid disk and a hollow circle. The hollow shapes do not obscure the letter of the key, and so provide the necessary visual feedback whilst still allowing the user to see the letter which will be written to the document. Feedback is only given on the keyboard, to minimize interference during normal document browsing. In order to achieve increased stabilization of the feedback within a targeted object, the algorithm suggested by Kumar (2007) was used.

If the user is satisfied that the correct button has been determined, they can then issue any of the verbal commands to simulate a left mouse click. The letter shown on the keyboard is then written to the document at the current cursor position.

4 Where to next?

As previously mentioned, the research study is still in the preliminary stages of an empirical study. An application has been developed to investigate the effect of multimodal interaction techniques on the usability of a mainstream word processor application. Further enhancements to the application will include the expansion of the keyboards to include numerical keys, and the magnification will be refined to respond to eye gaze and voice commands. More voice commands will be provided for, particularly for commands that currently have shortcut keys assigned to them, such as Save and displaying certain dialog boxes.

Additionally, a back-end will be written for the application which will capture certain measurements which can be used for usability analysis. Measurements such as the number of errors made during a task, the number of actions required and the percentage of the task completed correctly will automatically be saved to a database for further analysis.

Once the application has been completed, user testing will commence. Both disabled and non-disabled users of a local university will be approached to participate in the study. A longitudinal study will be conducted whereby the participants will be required to spend periods interacting with the system. After each exposure to the system, users will be required to complete a number of tasks for which measurements will be captured. In this way, the learnability of the system can be measured over a period of time by comparing the results of these sessions to determine if user performance increases in correlation with user exposure to the application. Since it is expected that there will be a learning curve associated with the application, it is deemed more applicable to capture usability measurements over a period of time rather than only after a single session with the application.

In order to determine whether the application succeeds in providing for disabled users whilst simultaneously providing a better user experience for mainstream users, it is imperative that users from both these demographics be included in the sample.

Furthermore, to further investigate the usability of the newly developed application, user efficiency and effectiveness can be measured in a within-subjects experiment by requiring users to complete identical tasks in both the commercial Microsoft Word and the new multimodal Microsoft Word.

Moreover, the usability of the various interaction techniques will also be analyzed to determine which combination of the interaction techniques provides the most usable interface – if any. User satisfaction will be measured by means of a questionnaire in order to gauge user reaction, over both short-term and long-term exposure periods.

5 Summary

A multimodal interface was developed for Microsoft Word in order to eventually determine whether the usability of this application can be enhanced for mainstream users whilst simultaneously providing an adaptable and usable interface for disabled users. For these purposes, eye tracking and speech recognition capabilities were built into the Word interface. These interaction techniques can be used in isolation or in combination, and the way in which they are used can be customized in a number of ways. Once the development has been completed and measurements can be captured automatically in the background during user interaction, a longitudinal usability study will be undertaken. Both disabled and able-bodied users will be included in the sample and will be required to complete a number of practice sessions with the application over a prolonged period of time. After each session, participants will be required to complete a number of tasks, during which measurements will be captured for further analysis. In this way, it will be possible to determine whether users are able to improve their performance on the system over an extended period – in other words, whether the system is usable. Additionally, user performance between the new application and the commercially available application will be compared to determine whether users can achieve comparable performance on both systems. In this way, it will be possible to determine whether a popular commercial application can be fully extended into a worthwhile multimodal application which caters for a diverse group of users comprising both disabled and able-bodied users.

References

ANDERSON, T. (2009). Pro Office 2007 Development with VSTO. Apress: United States of America.

HATFIELD, F. AND JENKINS, E.A. (1997). An interface integrating eye gaze and voice recognition for hands-free computer access. In Proceedings of the CSUN 1997 Conference.

JACOB, R.J. (1995). Eye tracking in advanced interface design. In Virtual Environments and Advanced Interface Design, W. Barfield and T. A. Furness, Eds. Oxford University Press, New York, NY, 258-288.

KUMAR, M. (2007). Gaze-enhanced user interface design. PhD thesis, Stanford University.