Get a Grip: Multimodal Visual and Simulated Tendon Activations for Grounded Semantics of Hand-Related Descriptions

Document Type

Student Presentation

Presentation Date



College of Engineering


Computer Science

Faculty Sponsor

Casey Kennington


The theory of embodied semantics postulates that the sensory-motor areas of the brain used to produce an action are also used for the conceptual representation of that action. The semantic representation of a hand gesture is therefore grounded in part through its visual representation and the tendon activations used to generate it. We build and train a Words-as-Classifiers machine learning model tasked with identifying the semantics of hand gestures by leveraging both hand tendon activations and visual hand features. Our experimental results show that a multimodal fusion of visual and tendon data outperforms either modality alone on image- and description-retrieval tasks. By simulating mirror neurons, we further show that simulated tendon activations can be derived from the visual features and applied to the model.
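The multimodal approach described above can be illustrated with a minimal sketch: a Words-as-Classifiers-style word model, where a per-word logistic classifier scores how well a gesture's features "fit" that word, and the two modalities are fused by simple concatenation. The feature dimensions, the synthetic data, and the early-fusion choice here are all illustrative assumptions, not details taken from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions (assumed, not from the presentation):
# a 16-dim visual embedding and an 8-dim tendon-activation vector per gesture.
VIS_DIM, TEN_DIM = 16, 8

def fuse(visual, tendon):
    """Early fusion: concatenate the two modalities into one feature vector."""
    return np.concatenate([visual, tendon], axis=-1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_word_classifier(X, y, lr=0.5, epochs=300):
    """WAC-style binary classifier for one word via logistic regression."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Synthetic stand-in data: positive instances of a word such as "grip"
# cluster away from negatives in both modalities.
pos = fuse(rng.normal(1.0, 0.3, (50, VIS_DIM)), rng.normal(1.0, 0.3, (50, TEN_DIM)))
neg = fuse(rng.normal(-1.0, 0.3, (50, VIS_DIM)), rng.normal(-1.0, 0.3, (50, TEN_DIM)))
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(50), np.zeros(50)])

w, b = train_word_classifier(X, y)

# Fitness score: probability that the word applies to a new gesture.
probe = fuse(rng.normal(1.0, 0.3, VIS_DIM), rng.normal(1.0, 0.3, TEN_DIM))
fitness = sigmoid(probe @ w + b)
```

In retrieval, such per-word fitness scores are combined across a description's words to rank candidate images (or, conversely, candidate descriptions for an image); fusing modalities at the feature level lets one classifier exploit both signals at once.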
