What is a text corpus?

A corpus, or text corpus, is a large and structured set of texts.

What are they used for?

Corpora are used for statistical analysis and hypothesis testing, for checking occurrences, and for validating linguistic rules within a specific domain. Nowadays, they are usually stored and processed electronically. A corpus may contain text in a single language or in multiple languages.

Corpora are very useful for linguistic research.
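As a small illustration of what "checking occurrences" can mean in practice, here is a minimal sketch in Python that counts word frequencies in a toy corpus. The three sentences and the names `corpus`, `tokenize` and `relative_frequency` are made up for this example; a real corpus would be far larger and would need more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical toy corpus: three short "documents" invented for illustration.
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "A bird sang.",
]

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())

# Count how often each word occurs across the whole corpus.
counts = Counter(token for doc in corpus for token in tokenize(doc))
total = sum(counts.values())

def relative_frequency(word):
    """Share of all tokens in the corpus that are this word."""
    return counts[word.lower()] / total

print(counts.most_common(2))
print(round(relative_frequency("the"), 3))
```

Relative frequencies like these are the raw material for the statistical tests mentioned above, since they make corpora of different sizes comparable.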

What is CORDE?

The Corpus Diacrónico del Español (CORDE) is a textual corpus covering all the periods and places in which the Spanish language has been spoken, from its origins up to 1974.

But what is its aim? CORDE is designed for extracting information to study words and their meanings, as well as the grammar of the language and its use over time.
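To give a rough idea of what studying use over time looks like, the sketch below groups occurrences of a word by century. It assumes each text comes with a year; the dated samples and the function names are invented for this example and do not reflect how CORDE itself is queried.

```python
import re
from collections import defaultdict

# Hypothetical dated samples (year, text), invented for illustration only.
samples = [
    (1250, "fermoso cavallo"),
    (1300, "muy fermoso"),
    (1550, "hermoso caballo"),
    (1610, "hermoso y fermoso"),
]

def century(year):
    """Map a year to its century number, e.g. 1250 -> 13."""
    return (year - 1) // 100 + 1

def occurrences_by_century(word, dated_texts):
    """Count how many times `word` appears in the texts of each century."""
    result = defaultdict(int)
    for year, text in dated_texts:
        tokens = re.findall(r"\w+", text.lower())
        result[century(year)] += tokens.count(word)
    return dict(sorted(result.items()))

print(occurrences_by_century("fermoso", samples))
print(occurrences_by_century("hermoso", samples))
```

Comparing the two counts side by side shows, on this toy data, one spelling giving way to another across the centuries.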

It was first used in 1994, when the RAE raised the possibility of applying new information technologies with the aim of creating a data bank that would improve the quality of its working materials and make data access easier.

The corpus collects written texts of different kinds (narrative, dramatic, lyrical, scientific, technical…). The aim is to cover all geographical, historical and genre varieties so that the whole is sufficiently representative.

One of the most important goals of the diachronic corpus is to serve as basic material for the production of the “NUEVO DICCIONARIO HISTÓRICO”.

Sources for the CORDE:

  • Books scanned with an optical character recognition program.
  • Other books in electronic formats.
