What is a text corpus?
A corpus, or text corpus, is a large and structured set of texts.
What are they used for?
Corpora are used for statistical analysis and hypothesis testing, for checking occurrences, and for validating linguistic rules within a specific universe of language. Nowadays, they are usually stored and processed electronically. A corpus may contain text in a single language or in multiple languages.
Corpora are very useful for linguistic research.
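As a minimal sketch of what checking occurrences can look like in practice, the Python snippet below counts how often a word form appears in a tiny in-memory corpus and reports its frequency per million tokens. The sample sentences, the `tokenize` helper and the `occurrences` function are hypothetical illustrations, not part of any real corpus or corpus tool.

```python
import re
from collections import Counter

# Hypothetical toy corpus; a real corpus would hold millions of words.
SAMPLE_TEXTS = [
    "La casa es grande y la puerta es roja.",
    "El perro duerme en la casa.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def occurrences(texts, word):
    """Return (absolute count, frequency per million tokens) for a word form."""
    counts = Counter(token for text in texts for token in tokenize(text))
    total = sum(counts.values())
    count = counts[word.lower()]
    per_million = count / total * 1_000_000 if total else 0.0
    return count, per_million

count, per_million = occurrences(SAMPLE_TEXTS, "casa")
print(count, round(per_million))
```

Reporting a relative frequency alongside the raw count is a common choice because it lets results from corpora of different sizes be compared.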
What is CORDE?
The Corpus Diacrónico del Español (CORDE) is a textual corpus covering all the periods and places in which the Spanish language has been spoken, from the beginnings of the language until 1975.
But what is its aim? CORDE is designed for extracting information to study words and their meanings, as well as their grammar and their use over time.
It was first used in 1994, when the RAE raised the possibility of applying new information technologies with the aim of creating a data bank that would improve the quality of its working materials and make data access easier.
The corpus collects written texts of different kinds (narrative, dramatic, lyrical, scientific, technical…). The aim is to cover all geographical, historical and genre varieties so that the whole is sufficiently representative.
One of the most important goals of the diachronic corpus is to serve as basic material for the production of the “Nuevo diccionario histórico”.
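To illustrate what studying the use of a word over time can mean in a diachronic corpus, here is a minimal sketch that groups dated texts by century and counts a word form in each group. The dated samples are invented for illustration and the `counts_by_century` function is hypothetical; neither comes from CORDE or from its query interface.

```python
import re
from collections import defaultdict

# Invented (year, text) pairs for illustration only; not taken from CORDE.
DATED_TEXTS = [
    (1250, "el fermoso castillo"),
    (1290, "fermoso dia"),
    (1605, "el hermoso caballo"),
    (1850, "un hermoso jardin y un hermoso rio"),
]

def counts_by_century(dated_texts, word):
    """Count occurrences of a word form, grouped by century (13 = 1201-1300)."""
    result = defaultdict(int)
    for year, text in dated_texts:
        century = (year - 1) // 100 + 1
        tokens = re.findall(r"\w+", text.lower())
        result[century] += tokens.count(word.lower())
    # Sort by century so the output reads chronologically.
    return dict(sorted(result.items()))

print(counts_by_century(DATED_TEXTS, "fermoso"))
print(counts_by_century(DATED_TEXTS, "hermoso"))
```

Comparing the two result dictionaries side by side is one simple way to see an older form giving way to a newer one across periods.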
Sources for the CORDE:
- Books that are scanned using an optical character recognition program.
- Other books in electronic formats.
References:
- RAE (Real Academia Española). Retrieved 15:45 29th March, 2011 from: http://www.rae.es/rae/gestores/gespub000019.nsf/voTodosporId/B4E26FC2520104D8C125716400455C06?OpenDocument&i=1
- Banco de datos del español (RAE). Retrieved 15:55 29th March, 2011 from: http://corpus.rae.es/ayuda_c.htm
- Wikipedia. The free encyclopedia. Retrieved 16:00 29th March, 2011 from: http://en.wikipedia.org/wiki/Corpus_linguistics