Publication Date

8-2018

Date of Final Oral Examination (Defense)

6-4-2018

Type of Culminating Activity

Thesis

Degree Title

Master of Science in Computer Science

Department

Computer Science

Major Advisor

James Buffenbarger, Ph.D.

Advisor

Casey Kennington, Ph.D.

Advisor

Jerry Alan Fails, Ph.D.

Abstract

Describing scenes such as rooms, city streets, or routes is a common human task that requires identifying and describing the scene in enough detail for a hearer to develop a mental model of it. When people talk about such scenes, they mention some objects of the scene to the exclusion of others. We call the mentioned objects salient objects, since people consider them noticeable or important in comparison to the objects they do not mention. In this thesis, we look at the saliency of visual scenes and how visual saliency informs what can and should be said about a scene when describing it.

Previous work on saliency focuses on the scenes themselves, whereas we are interested in what people actually say about those scenes. For this, we take both the scenes and human dialogue into account. To collect the dialogue data, we developed a web application and used a crowdsourcing platform to access an on-demand workforce, which allowed us to gather more realistic and varied responses from users. To automate the process of detecting salient objects in a novel scene, we used a popular image content analysis tool to extract the objects present in the scene. We then used the dialogue data to rank the detected objects by their saliency, which gives us candidate objects to mention. We also compare how different features of the gathered data can be used to develop saliency detection models. Our initial investigation shows that human dialogue data significantly improves the detection of salient objects.
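To illustrate the kind of dialogue-informed ranking the abstract describes, the following minimal sketch ranks detected objects by how often their labels are mentioned in collected dialogue transcripts. It is a simplified frequency-based proxy for saliency, not the model developed in the thesis; the function name, scoring scheme, and example data are assumptions made for illustration.

```python
from collections import Counter

def rank_objects_by_saliency(detected_objects, dialogue_transcripts):
    """Rank detected objects by how often they are mentioned in dialogue,
    a simple frequency-based proxy for saliency."""
    mention_counts = Counter()
    for transcript in dialogue_transcripts:
        tokens = transcript.lower().split()
        for obj in detected_objects:
            # Count a mention whenever the object's label appears in the turn.
            mention_counts[obj] += tokens.count(obj.lower())
    # Objects mentioned most often are treated as the most salient candidates.
    return [obj for obj, _ in mention_counts.most_common()]

# Example usage with made-up scene objects and dialogue turns.
objects = ["sofa", "lamp", "rug", "window"]
dialogue = [
    "there is a big sofa in the middle of the room",
    "the sofa is next to a window",
    "a lamp stands in the corner",
]
print(rank_objects_by_saliency(objects, dialogue))
# ['sofa', 'lamp', 'window', 'rug']
```

A real model would go beyond exact label matching (handling synonyms, referring expressions, and weighting by dialogue features), but the sketch shows how dialogue frequency can yield a ranked list of candidate objects to mention.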

DOI

10.18122/td/1446/boisestate
