Document Type

Article

Publication Date

6-17-2022

Abstract

The analysis of spoken language has been integral to a broad range of research in social science and beyond. However, for analysis to proceed efficiently, language must be in the form of computer-readable text. Historically, the speech-to-text process has been performed manually by human transcriptionists. Automated speech recognition (ASR) is advertised as an efficient and inexpensive alternative, but research shows this method of speech-to-text is prone to error. This paper investigates the viability of using error-prone ASR transcriptions as part of the methodological process of language analysis. Results show that at the individual feature level, analyses of ASR transcriptions differ dramatically from analyses of human transcriptions. However, when the same features are used for classification, a common machine learning task, performance is similar for ASR and human transcriptions. We present these findings and conclude with a discussion of the methodological considerations for researchers who opt to use automated speech recognition for social science research.

Comments

This article is currently in press; the publication date provided is the online early release date. Any information regarding this publication is subject to change upon official publication.

Copyright Statement

This is an Accepted Manuscript of an article published by Taylor & Francis in International Journal of Social Research Methodology on June 17, 2022, available online: https://doi.org/10.1080/13645579.2022.2087849

Available for download on Sunday, December 17, 2023