The British National Corpus (BNC)

The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a broad cross-section of British English from the late 20th century. It was built between 1991 and 1994 by a consortium led by Oxford University Press (OUP) that included Longman Group Ltd (now part of Pearson Education). The corpus contains text from a variety of genres, including spoken conversation, fiction, newspapers, and academic texts. The BNC is an important resource for linguistic research and is widely used in corpus linguistics, computational linguistics, and language teaching. The corpus is fully searchable and is available both in raw form and in a tagged form, which annotates each word with its word class (e.g. noun, verb, adjective). It is divided into two parts: the written part (90%) and the spoken part (10%). The written part is divided into four sections: fiction, non-fiction, newspaper, and acad