Social Norm Cues Classification in Augmented Reality

Abstract

The growth of AR (Augmented Reality) has remained steady after the technology burst onto the scene in the early 2010s. Privacy in AR is still a fundamental and challenging issue. Immersive AR experiences, such as head mounted displays (HMDs) may be too immersive, distracting the AR user from dynamic objects in the real world which surround them and potentially violating the privacy of innocent bystanders. Another source of rapid technological advancement in the past few years has been the developments of LLMs (Large Language Models) such as OpenAI’s ChatGPT. With extensive training on a vast amount of data, LLMs can assist with a wide range of language-related tasks. Thus, this work aims to develop a system which uses the help of an LLM to detect the attitude of bystanders and subtly communicate this information to AR user in situations where bystanders want them to turn away or shut off their AR system. This proposed framework uses an emotion and gesture recognition tool to identify the subtle social cues of bystanders. The emotion and gesture data is preprocessed and a prompt is generated and delivered to OpenAI’s GPT-3.5 so that the emotions and gestures can be interpreted. Finally, a message is sent to the AR user to inform them about the potential attitude of bystanders. By collecting values for seven different emotions as well as data about head and mouth position in various case studies, we were able to generate convincing responses from GPT-3.5. In addition, we are working towards expanding the framework so that GPT-3.5 may have access to supplemental documents to aid in the interpretation process. This way, future HCI (Human Computer Interaction) studies can be incorporated into our work. Finally, to address the large overhead of using an LLM, we look to create a new LLM to be trained on custom data and run locally.

This document is currently not available here.

Share

COinS
 

Social Norm Cues Classification in Augmented Reality

The growth of AR (Augmented Reality) has remained steady after the technology burst onto the scene in the early 2010s. Privacy in AR is still a fundamental and challenging issue. Immersive AR experiences, such as head mounted displays (HMDs) may be too immersive, distracting the AR user from dynamic objects in the real world which surround them and potentially violating the privacy of innocent bystanders. Another source of rapid technological advancement in the past few years has been the developments of LLMs (Large Language Models) such as OpenAI’s ChatGPT. With extensive training on a vast amount of data, LLMs can assist with a wide range of language-related tasks. Thus, this work aims to develop a system which uses the help of an LLM to detect the attitude of bystanders and subtly communicate this information to AR user in situations where bystanders want them to turn away or shut off their AR system. This proposed framework uses an emotion and gesture recognition tool to identify the subtle social cues of bystanders. The emotion and gesture data is preprocessed and a prompt is generated and delivered to OpenAI’s GPT-3.5 so that the emotions and gestures can be interpreted. Finally, a message is sent to the AR user to inform them about the potential attitude of bystanders. By collecting values for seven different emotions as well as data about head and mouth position in various case studies, we were able to generate convincing responses from GPT-3.5. In addition, we are working towards expanding the framework so that GPT-3.5 may have access to supplemental documents to aid in the interpretation process. This way, future HCI (Human Computer Interaction) studies can be incorporated into our work. Finally, to address the large overhead of using an LLM, we look to create a new LLM to be trained on custom data and run locally.