Bookbot

Multi-modal scene understanding using probabilistic models

More about the book

How do we explain a picture to someone else? We describe its colors, shapes, and the relationships between objects. Conversely, to clarify a verbal statement, we might show a picture that visualizes its content. In everyday communication, people use several channels simultaneously to convey their intentions: pointing, facial expressions, gestures, or references to the shared environment.

The same multimodal approach should apply to human-computer interfaces, suggesting a paradigm shift from passive interactions, such as mouse clicks, to active communication partners that interpret auditory and visual cues, draw inferences, and ask for additional information. Such an interface is termed an artificial communicator. However, merely interpreting signals from individual modalities, such as speech, gestures, or visual recognition, is insufficient. To build systems that communicate naturally, integrating these modalities is crucial. Each modality has its own vocabulary and expressiveness: pointing indicates interest, facial expressions convey emotions, speech provides factual information, and vision interprets shapes.

This thesis explores the most effective formalism for integrating the outputs of specialized processing components in a multimodal system. It addresses how to connect these components and organize their processing, offering innovative solutions and a practical realization in a specific domain.

Book purchase

Multi-modal scene understanding using probabilistic models, Sven Wachsmuth

Released
2003
Binding
Paperback

