The TUNA Corpus Extended is an extended version of the original TUNA corpus that adds annotations about the lexical form of referring expressions. The TUNA corpus exhaustively annotates the conceptual content of its referring expressions, but it lacks information about the lexical form chosen to express that content. To fill this gap, we have re-annotated every sample of the corpus that refers to a singular entity, adding lexical information as an extension of the original annotation.
Obtaining the corpus
You can download the latest distribution of the corpus from here. Please read the LICENSE file before using the corpus.
- TUNA Corpus - Annotation Guide contains a description of the XML format used in the corpus along with information about the annotation process.
Raquel Hervás, Javier Arroyo, Virginia Francisco, Federico Peinado, Pablo Gervás (2016). Influence of personal choices on lexical variability in referring expressions. Natural Language Engineering, volume 22, issue 2, March 2016, pp. 257-290. Available here.