Incremental dialogue processing has been an important topic in spoken dialogue systems research, but the broader research community that makes use of language interaction (e.g., chatbots, conversational AI, spoken interaction with robots) has not adopted incremental processing, despite research showing that humans perceive incremental dialogue as more natural. In this paper, we extend prior work that identifies the requirements for making spoken interaction with a system natural, with the goal that our framework will generalize to many domains where speech is the primary method of communication. The Incremental Unit framework offers a model of incremental processing that has been extended to be multimodal and temporally aligned, to enable real-time information updates, and to create a complex network of information as a fine-grained information state. One challenge is that multimodal dialogue systems often have computationally expensive modules, requiring computation to be distributed. Most importantly, speech as the means of communication brings the added expectation that systems not only understand what humans say, but also understand and respond without delay. We build on top of the Incremental Unit framework and make it amenable to a distributed architecture made up of a robot and spoken dialogue system modules. To enable fast communication between the modules and to maintain module state histories, we compare two different implementations of a distributed Incremental Unit architecture. We evaluate both implementations systematically and then with real human users, and show that the implementation that uses an external attribute-value database is preferred, although there is some flexibility in which variant to use depending on the circumstances.
This work offers the Incremental Unit framework as an architecture for building powerful, complete, and natural dialogue systems, and is of particular relevance to researchers working on robots and multimodal systems.
Imtiaz, Mir Tahsin and Kennington, Casey. (2022). "Incremental Unit Networks for Distributed, Symbolic Multimodal Processing and Representation". In V.G. Duffy (Ed.), HCII 2022: Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management. Health, Operations Management, and Design (Lecture Notes in Computer Science series, Volume 13320, pp. 344-393). Springer. https://doi.org/10.1007/978-3-031-06018-2_24
HCII 2022: Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management. Health, Operations Management, and Design is volume 13320 of the Lecture Notes in Computer Science book series.