Publication Date

12-2016

Date of Final Oral Examination (Defense)

10-27-2016

Type of Culminating Activity

Thesis

Degree Title

Master of Science in Computer Science

Department

Computer Science

Supervisory Committee Chair

Maria Soledad Pera, Ph.D.

Supervisory Committee Member

Timothy Andersen, Ph.D.

Supervisory Committee Member

Nerea Lete, M.F.A.

Abstract

Readability refers to the ease with which a reader can understand a text. Automatic readability assessment has been widely studied over the past 50 years. However, most studies focus on tools that apply to a single language, domain, or document type. This fragmentation creates duplicated effort for developers, who must integrate multiple tools into their systems, and for end users, who must reconcile incompatibilities among the readability scales of different tools. In this manuscript, we present MultiRead, a multipurpose readability assessment tool capable of predicting the reading difficulty of texts of varied type and length, regardless of the language in which they are written. MultiRead bases its predictions on multiple indicators extracted from textual resources, including lexical, morphological, syntactic, semantic, and social indicators. The latter are of particular interest given the recent adoption of social sites by users of different ages and reading abilities. We gathered a leveled corpus comprising textual resources in English, Spanish, and Basque, with diverse lengths, sources, domains, and formats. This corpus was used to assess the effectiveness of MultiRead and to demonstrate that MultiRead outperforms other readability assessment systems in terms of accuracy across all languages and document types evaluated.
