Corpus linguistics uses large collections of natural language texts, known as corpora, to study patterns of language use. Corpora show how language varies between spoken and written forms and between formal and casual contexts. Different types of corpora exist for different research questions in linguistics. Important factors in corpus design include size, representativeness, and whether the sample is based on the production or the reception of language. Compiling a corpus requires obtaining and processing text data; for spoken language this also means transcription, which can be time-consuming.
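
As a rough illustration of what studying patterns of usage can look like in practice, the following minimal Python sketch compares corpus size and the most frequent words across two sub-corpora. The sample sentences, the `tokenize` and `profile` helpers, and the simple regex tokenizer are all assumptions made for this example; a real corpus would be far larger and would typically use a more careful tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def profile(texts):
    """Summarize a list of texts: token count, type count, top three words."""
    tokens = [tok for text in texts for tok in tokenize(text)]
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),           # corpus size in running words
        "types": len(counts),            # number of distinct word forms
        "top": counts.most_common(3),    # most frequent word forms
    }


# Toy, invented samples standing in for a spoken and a written sub-corpus.
corpus = {
    "spoken": [
        "well I mean it's kind of hard to say",
        "yeah I know I know",
    ],
    "written": [
        "The results indicate a significant difference between the groups.",
    ],
}

if __name__ == "__main__":
    for register, texts in corpus.items():
        print(register, profile(texts))
```

Token and type counts of this kind are only a first look at size; they say nothing about representativeness, which depends on how the texts were sampled.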