Multimodal interaction

From Wikipedia, the free encyclopedia

Multimodal interaction provides the user with multiple modes of interfacing with a system.

Multimodal input

Two major groups of multimodal interfaces have emerged. The first group of interfaces combines various user input modes beyond the traditional keyboard and mouse input/output, such as speech, pen, touch, manual gestures, gaze, and head and body movements. The most common such interface combines a visual modality (e.g. a display, keyboard, and mouse) with a voice modality (speech recognition for input, speech synthesis and recorded audio for output). However, other modalities, such as pen-based input or haptic input/output, may be used. Multimodal user interfaces are a research area in human-computer interaction (HCI).

The advantage of multiple input modalities is increased usability: the weaknesses of one modality are offset by the strengths of another. On a mobile device with a small visual interface and keypad, a word may be quite difficult to type but very easy to say (e.g. Poughkeepsie). The same applies to accessing and searching digital media catalogs from such devices or from set-top boxes. In one real-world example, members of a surgical team access patient information verbally in the operating room to maintain an antiseptic environment, and the information is presented aurally and visually in near real time to maximize comprehension.

Multimodal input user interfaces have implications for accessibility [1]. A well-designed multimodal application can be used by people with a wide variety of impairments. Visually impaired users rely on the voice modality with some keypad input. Hearing-impaired users rely on the visual modality with some speech input. Other users will be "situationally impaired" (e.g. wearing gloves in a very noisy environment, driving, or needing to enter a credit card number in a public place) and will simply use the appropriate modalities as desired. On the other hand, a multimodal application that requires users to be able to operate all modalities is very poorly designed.
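The way users fall back on whichever modalities remain practical can be sketched in code. The following Python sketch is illustrative only: the Modality names, the BLOCKED_BY table, and the usable_modalities function are hypothetical and are not taken from any multimodal toolkit or from the cited study.

```python
from enum import Enum, auto


class Modality(Enum):
    """Hypothetical modalities; the names are illustrative assumptions."""
    VISUAL_OUTPUT = auto()   # display
    KEYPAD_INPUT = auto()    # keyboard or keypad
    SPEECH_INPUT = auto()    # speech recognition
    SPEECH_OUTPUT = auto()   # speech synthesis or recorded audio


# Assumed mapping from a permanent or situational constraint to the
# modalities it makes impractical. A real application would need a much
# richer model of its users and their environment.
BLOCKED_BY = {
    "cannot_see_display": {Modality.VISUAL_OUTPUT},
    "cannot_hear": {Modality.SPEECH_OUTPUT},
    "wearing_gloves": {Modality.KEYPAD_INPUT},
    "noisy_environment": {Modality.SPEECH_INPUT, Modality.SPEECH_OUTPUT},
    "public_place": {Modality.SPEECH_INPUT},
}


def usable_modalities(constraints):
    """Return the modalities that remain after applying each constraint.

    Constraint names missing from BLOCKED_BY are ignored.
    """
    blocked = set()
    for constraint in constraints:
        blocked |= BLOCKED_BY.get(constraint, set())
    return set(Modality) - blocked
```

Under these assumptions, a user who cannot see the display is left with keypad input, speech input, and speech output, while a user entering a credit card number in a public place keeps every modality except speech input. An application that only works when usable_modalities returns the full set would be the poorly designed case described above.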

The most common form of input multimodality on the market makes use of the XHTML+Voice (also known as X+V) Web markup language, an open specification developed by IBM, Motorola, and Opera Software. X+V is currently under consideration by the W3C and combines several W3C Recommendations, including XHTML for visual markup, VoiceXML for voice markup, and XML Events, a standard for integrating XML languages. Multimodal browsers supporting X+V include IBM WebSphere Everyplace Multimodal Environment, Opera for Embedded Linux and Windows, and ACCESS Systems NetFront for Windows Mobile. To develop multimodal applications, software developers may use a software development kit such as IBM WebSphere Multimodal Toolkit, which is based on the open source Eclipse framework and includes an X+V debugger, editor, and simulator.

Multimodal input and output

The second group of multimodal systems presents users with multimedia displays and multimodal output, primarily in the form of visual and auditory cues. Interface designers have also started to make use of other modalities, such as touch and olfaction. Proposed benefits of multimodal output systems include synergy and redundancy. With synergy, the information presented via several modalities is merged and refers to various aspects of the same process. With redundancy, several modalities carry exactly the same information, which provides an increased bandwidth of information transfer [3]. Currently, multimodal output is used mainly to improve the mapping between communication medium and content and to support attention management in data-rich environments where operators face considerable visual attention demands [2].

An important step in multimodal interface design is the creation of natural mappings between modalities and the information and tasks. The auditory channel differs from vision in several aspects. It is omnidirectional, transient, and always reserved [2]. Speech output, one form of auditory information, has received considerable attention, and several guidelines have been developed for its use. Michaelis and Wiggins (1982) suggested that speech output should be used for simple, short messages that will not be referred to later. It was also recommended that speech should be generated in time and should require an immediate response.
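The speech output guidelines above can be read as a simple checklist. The Python sketch below is one possible encoding, not a published algorithm: the Message fields, the suits_speech_output function, and the 12-word cutoff for a "short" message are assumptions made for illustration, since the guidelines as summarized here give no numeric limit. The timing recommendation is not modeled.

```python
from dataclasses import dataclass

# Assumed cutoff for a "short" message; the guidelines give no number.
MAX_SPOKEN_WORDS = 12


@dataclass
class Message:
    """A message to present to the user; all fields are illustrative."""
    text: str
    is_simple: bool                  # a single idea with no complex structure
    referred_to_later: bool          # the user will need to consult it again
    needs_immediate_response: bool   # the user must act on it right away


def suits_speech_output(message: Message) -> bool:
    """Check a message against the speech output guidelines.

    Speech is suggested only for simple, short messages that will not
    be referred to later and that call for an immediate response.
    """
    is_short = len(message.text.split()) <= MAX_SPOKEN_WORDS
    return (
        message.is_simple
        and is_short
        and not message.referred_to_later
        and message.needs_immediate_response
    )
```

For example, a brief navigation prompt such as "Turn left now" passes every check, whereas a list of directions that the driver will want to consult again fails and would be better presented visually.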

The sense of touch was first utilized as a medium for communication in the late 1950s [5]. It is not only a promising but also a unique communication channel. In contrast to vision and hearing, the two traditional senses employed in HCI, the sense of touch is proximal: it senses objects that are in contact with the body. It is also bidirectional, in that it supports both perceiving and acting on the environment.

Examples of auditory feedback include auditory icons in computer operating systems that indicate users' actions (e.g. deleting a file, opening a folder, an error), speech output for presenting navigational guidance in vehicles, and speech output for warning pilots in modern airplane cockpits. Examples of tactile signals include vibrations of the turn-signal lever to warn drivers of a car in their blind spot, vibration of the car seat as a warning to drivers, and the stick shaker on modern aircraft that alerts pilots to an impending stall [2].

References

 1. Vitense, H. S., Jacko, J. A. and Emery, V. K. (2002) Multimodal feedback: establishing a performance baseline for improved access by individuals with visual impairments. In: Fifth Annual ACM Conference on Assistive Technologies, pp. 49-56.

 2. Sarter, N. B. (2006) Multimodal information presentation: Design guidance and research challenges. International Journal of Industrial Ergonomics, 36 (5), 439-445.

 3. Oviatt, S. (2002) Multimodal interfaces. In Jacko, J. and Sears, A. (Eds.) A Handbook of Human-Computer Interaction. New Jersey: Lawrence Erlbaum.

 4. Sarter, N. B. (2002). Multimodal information presentation in support of human-automation communication and coordination. In Salas E. (Ed.) Advances in Human Performance and Cognitive Engineering Research. 2, 13-35.

 5. Geldard, F. A. (1957). Adventures in tactile literacy. American Psychologist, 12, 115-124.

See also

Modality (human-computer interaction)
W3C's Multimodal Interaction Activity - an initiative from W3C aiming to provide means (mostly XML) to support Multimodal Interaction scenarios on the Web.
Device Independence
Speech Recognition
Web accessibility
Wired glove
XHTML+Voice

External links

Swiss National Center of Competence in Research (NCCR) on Interactive Multimodal Information Management
W3C Multimodal Interaction Activity
XHTML+Voice information at the VoiceXML Forum
XHTML+Voice Profile 1.0, W3C Note 21 December 2001
XHTML+Voice Profile 1.2, courtesy of VoiceXML Forum
ICMI, The International Conference on Multimodal Interfaces
