Document Type

Student Presentation

Presentation Date


Faculty Sponsor

Casey Keck


Corpora are excellent resources for language learning, particularly given research showing the importance of frequently used word clusters (called lexical bundles or collocations) in promoting learner fluency. However, for the Chinese language, most of the available corpus resources seem to be unrepresentative of how native Chinese speakers use language in everyday life, possibly because writers’ awareness of censorship on the Chinese internet influences what they write.

The goal of this project was to build a corpus of reliably natural text from China’s national newspaper, The People’s Daily (人民日报, 人民网), in order to identify lexical bundles that give structure to Chinese sentences in the news register. The corpus was extracted from The People’s Daily website with a web crawler and comprises more than five hundred articles and about one million Chinese characters.
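
As an illustration only, candidate lexical bundles in a corpus like this could be found by counting recurring character sequences and keeping those that appear often and in more than one article. The sketch below is an assumption, not the method used in this project: the function name `candidate_bundles`, the character n-gram approach (rather than word segmentation), and the thresholds `n`, `min_freq`, and `min_docs` are all hypothetical choices.

```python
import re
from collections import Counter

# Runs of common CJK ideographs; punctuation, digits and Latin text split runs,
# so no n-gram crosses a punctuation mark.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def candidate_bundles(articles, n=4, min_freq=2, min_docs=2):
    """Return character n-grams meeting frequency and article-range thresholds.

    Hypothetical helper: thresholds here are illustrative, not from the source.
    """
    freq = Counter()      # total occurrences of each n-gram
    doc_freq = Counter()  # number of articles containing each n-gram
    for text in articles:
        seen = set()
        for run in CJK_RUN.findall(text):
            for i in range(len(run) - n + 1):
                gram = run[i:i + n]
                freq[gram] += 1
                seen.add(gram)
        doc_freq.update(seen)
    return {
        gram: count
        for gram, count in freq.items()
        if count >= min_freq and doc_freq[gram] >= min_docs
    }
```

Requiring a sequence to occur in several articles, not just many times in one, keeps topic-specific repetition from being mistaken for a general structural pattern.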

The presentation will report several trends in collocation that can serve as a resource for developing Chinese language learning materials, especially materials for improving reading skills in the domain of news articles.