Publication Date

12-2019

Date of Final Oral Examination (Defense)

12-1-2019

Type of Culminating Activity

Dissertation

Degree Title

Doctor of Philosophy in Computing

Department

Computer Science

Supervisory Committee Chair

Maria Soledad Pera, Ph.D.

Supervisory Committee Member

Michael Ekstrand, Ph.D.

Supervisory Committee Member

Edoardo Serra, Ph.D.

Supervisory Committee Member

Hoda Mehrpouyan, Ph.D.

Abstract

Information Retrieval (IR) has changed the way we access digital resources and satisfy our daily information needs. Popular IR tools, such as search engines, recommendation systems, and automatic question answering sites, help mitigate information overload while fostering (at least in theory) the democratization of access to resources. Yet most IR tools are built with a traditional user in mind. As a result, users who deviate from the norm, e.g., users with low educational background, visually impaired users, or users who speak different languages, are underserved and struggle to find the information they require. In this manuscript, we present novel methodologies that can enable better adaptation of IR tools to non-traditional users. We focus on two aspects in which users can differ from the traditional: language and reading skills. We study and address the difficulties these differences create, and discuss how they affect IR systems. In particular, we allocate research efforts to three main areas: (1) readability assessment, where we introduce the first featureless architecture, which can be used in any language without specific tuning; (2) cross-lingual word embedding generation, where we address the English-dependency problem of state-of-the-art strategies via a hierarchical mapping strategy that takes advantage of the language family tree; and (3) cross-lingual sentence embedding generation, where we present a novel representation learning framework based on a hierarchical sequence-to-sequence model that enables better representations for low-resource languages. Each of the strategies that result from this work can be leveraged in the design of IR systems that better support non-traditional users. To demonstrate how these strategies can be integrated to address the needs of non-traditional users, we also analyze four different readability assessment strategies (based on our three models) in terms of their language transfer capabilities, showing how the aforementioned models can be used in low-resource language scenarios. Despite the contributions presented, results indicate that there is still a long path towards building IR systems that fully address the needs of non-traditional users, in areas including the representation of typologically isolated low-resource languages and more fine-grained multilingual readability assessment.

DOI

10.18122/td/1612/boisestate
