Publication Date
8-2020
Date of Final Oral Examination (Defense)
4-24-2020
Type of Culminating Activity
Thesis
Degree Title
Master of Science in Computer Science
Department
Computer Science
Supervisory Committee Chair
Casey Kennington, Ph.D.
Supervisory Committee Member
Francesca Spezzano, Ph.D.
Supervisory Committee Member
Sole Pera, Ph.D.
Abstract
Automated systems that make use of language, such as personal assistants, need some means of representing words such that the representation 1) is computable and 2) captures form and meaning. Recent advancements in the field of natural language processing have resulted in useful approaches to representing computable word meanings. In this thesis, I consider two such approaches: distributional embeddings and grounded models. Distributional embeddings are represented as high-dimensional vectors; words with similar meanings tend to cluster together in embedding space. Embeddings are easily learned from large amounts of text data. However, embeddings suffer from a lack of "real world" knowledge; for example, the knowledge needed to identify colors or objects as they appear. In contrast to embeddings, grounded models learn a mapping between language and the physical world, such as visual information in pictures. Grounded models, however, tend to focus only on the mapping between language and the physical world and lack the knowledge that could be gained from abstract information found in text.
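To make the clustering property of distributional embeddings concrete, below is a minimal sketch, not taken from the thesis: words with similar meanings have vectors with high cosine similarity. The vectors here are tiny made-up examples; real embeddings are learned from large corpora and typically have hundreds of dimensions.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 4-dimensional "embeddings" (illustrative values only).
embeddings = {
    "cat": np.array([0.9, 0.8, 0.1, 0.0]),
    "dog": np.array([0.8, 0.9, 0.2, 0.1]),
    "car": np.array([0.1, 0.0, 0.9, 0.8]),
}

print(cosine(embeddings["cat"], embeddings["dog"]))  # high: similar meanings
print(cosine(embeddings["cat"], embeddings["car"]))  # low: dissimilar meanings
```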
In this thesis, I evaluate wac2vec, a model that brings together grounded and distributional semantics in order to leverage the relative strengths of both, and use empirical analysis to explore whether wac2vec adds semantic information to traditional embeddings. Starting with the words-as-classifiers (WAC) model of grounded semantics, I use a large repository of images and the keywords that were used to retrieve those images. From the grounded model, I extract classifier coefficients as word-level vector embeddings (hence, wac2vec), then combine those with embeddings from distributional word representations. I show that combining grounded embeddings with traditional embeddings improves performance on a visual task, demonstrating the viability of using the wac2vec model to enrich traditional embeddings and showing that wac2vec provides important semantic information that these embeddings do not have on their own.
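The following is a minimal sketch of the wac2vec idea as described above, under assumptions not taken from the thesis: one binary logistic-regression classifier per word (the WAC model), trained on visual feature vectors of images the word does and does not describe, whose learned coefficients serve as that word's grounded embedding. Combination by concatenation is one plausible choice, and VISUAL_DIM, fit_wac_vector, and all data here are illustrative placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

VISUAL_DIM = 64  # dimensionality of the image feature vectors (assumed)

def fit_wac_vector(pos_feats: np.ndarray, neg_feats: np.ndarray) -> np.ndarray:
    """Train a per-word classifier; return its coefficients as that word's vector."""
    X = np.vstack([pos_feats, neg_feats])
    y = np.concatenate([np.ones(len(pos_feats)), np.zeros(len(neg_feats))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.coef_.ravel()  # one weight per visual feature dimension

# Placeholder "image features" for images retrieved (or not) by a given keyword.
rng = np.random.default_rng(0)
positives = rng.normal(1.0, 1.0, size=(50, VISUAL_DIM))
negatives = rng.normal(-1.0, 1.0, size=(50, VISUAL_DIM))

grounded_vec = fit_wac_vector(positives, negatives)  # wac2vec-style embedding
distributional_vec = rng.normal(size=300)            # stand-in for, e.g., word2vec
combined = np.concatenate([grounded_vec, distributional_vec])
print(combined.shape)  # (364,): grounded + distributional dimensions
```

The combined vector can then be fed to any downstream task in place of the distributional embedding alone, which is the comparison the empirical analysis performs.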
DOI
10.18122/td/1708/boisestate
Recommended Citation
Black, Stacy, "Towards Unifying Grounded and Distributional Semantics Using the Words-as-Classifiers Model of Lexical Semantics" (2020). Boise State University Theses and Dissertations. 1708. 10.18122/td/1708/boisestate