IT and Supply Chain Management Faculty Publications and Presentations

Data Quality Relevance in Linguistic Analysis: The Impact of Transcription Errors on Multiple Methods of Linguistic Analysis

Steven Pentland, Boise State University
Lee Spitzley, SUNY Albany
Christie Fuller, Boise State University
Doug Twitchell, Boise State University

Document Type

Conference Proceeding

Publication Date

2019

Abstract

An enormous amount of recorded speech is generated daily, and quickly transcribing and analyzing the text of this speech could have tremendous value to organizations and researchers. However, the speech transcription process has historically been laborious, expensive, and slow. Automatic speech recognition (ASR) tools have matured a great deal in the last decade and may be a suitable method for generating large-scale, high-quality transcriptions. These tools are fast and economical, but they generally produce errors at a much greater rate than human transcribers. It is unknown whether these errors matter when conducting psycholinguistic research. In this study, we will investigate the accuracy of earnings conference call transcripts produced by multiple tools and the impact of that transcription accuracy on the results of subsequent text mining analysis. While prior studies have focused on a single form of text mining, we will conduct three types of text analysis: bag-of-words-based classification, lexicon-based classification, and sentiment analysis. The results will show whether different types of text mining require different levels of transcription quality, and how feasible automated transcription services are across a range of text mining applications.

Publication Information

Pentland, Steven; Spitzley, Lee; Fuller, Christie; and Twitchell, Doug. (2019). "Data Quality Relevance in Linguistic Analysis: The Impact of Transcription Errors on Multiple Methods of Linguistic Analysis". 25th Americas Conference on Information Systems, AMCIS 2019.

This document is currently not available here.
