GALANTE: GenerAción del LenguAje Natural para Textos con Emociones

The GALANTE (Generación de Lenguaje Natural para Textos con Emociones) project was a subproject of a larger coordinated project that combined efforts from the NIL research group at UCM and the Julietta research group at the University of Seville. The coordinated project was named DIVAGALAN (Development and Validation of an Architecture for Natural Language Generation). DIVAGALAN also included the GILDA (Generación de lenguaje para sistemas de diálogo) subproject.

The project proposal set out two top-level goals: the development of a natural language generation (NLG) application, and its validation in the context of a dialogue system (DS). It outlined a coordinated plan for achieving these goals by bringing together the NLG expertise of the NIL research group at Universidad Complutense de Madrid (UCM) and the experience of the Julietta research group at the University of Seville (USE) in developing dialogue systems. Subproject GALANTE (Natural Language Generation for Texts with Emotions) was to concentrate on developing a reusable, generic solution for generating textual messages tagged with emotions. Subproject GILDA (Natural Language Generation for Dialogue Systems) was to concentrate on integrating the generation module into a real dialogue system and on validating its practical operation in a real application: domotic (home automation) dialogue systems.

The main result of the GALANTE subproject has been the development of TAP (a Text Arranging Pipeline), a software architecture for natural language generation that can be instantiated to build generation modules tailored to specific purposes. The core of the TAP architecture defines generic NLG functionality, from an initial conceptual input to surface realization as a string, with intervening stages of content planning and sentence planning. Its design took into account a large number of existing solutions for particular NLG subtasks, as well as the extensive literature on architectures for NLG systems. Existing architectures had focused very specifically on English, and adapting them to other languages, such as Spanish, had been identified as problematic. The TAP architecture has been developed and tested over English and Spanish test sets, and the current version shows acceptable coverage of both languages. In this process, several problems arose that had not been considered in the monolingual approach to NLG.
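The staged organisation described above (conceptual input, content planning, sentence planning, surface realization) can be illustrated with a minimal sketch. This is not the TAP code or its API: the `Fact` type, the stage functions, and `run_pipeline` are hypothetical names chosen for illustration, and each stage is reduced to a toy implementation. The point is only to show how a pipeline can be instantiated by supplying a list of stages.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical conceptual input; the real TAP data structures are not
# described in the text above.
@dataclass
class Fact:
    subject: str
    verb: str
    obj: str

def content_planning(facts: List[Fact]) -> List[Fact]:
    """Toy content planner: drop repeated facts, keep the original order."""
    seen = set()
    plan = []
    for f in facts:
        key = (f.subject, f.verb, f.obj)
        if key not in seen:
            seen.add(key)
            plan.append(f)
    return plan

def sentence_planning(plan: List[Fact]) -> List[List[str]]:
    """Toy sentence planner: one sentence per fact, as an ordered word list."""
    return [[f.subject, f.verb, f.obj] for f in plan]

def surface_realization(sentences: List[List[str]]) -> str:
    """Toy realizer: join words, capitalize the first letter, add a full stop."""
    realized = []
    for words in sentences:
        text = " ".join(words)
        realized.append(text[0].upper() + text[1:] + ".")
    return " ".join(realized)

def run_pipeline(data, stages: List[Callable]):
    """Feed the output of each stage into the next one."""
    for stage in stages:
        data = stage(data)
    return data

# Assumed instantiation for English; a Spanish instantiation would swap in
# language-specific sentence planning and realization stages.
ENGLISH_PIPELINE = [content_planning, sentence_planning, surface_realization]

facts = [
    Fact("the princess", "lives in", "the castle"),
    Fact("the princess", "lives in", "the castle"),  # duplicate, removed by planning
    Fact("the knight", "guards", "the castle"),
]
text = run_pipeline(facts, ENGLISH_PIPELINE)
print(text)
```

Keeping the stages as interchangeable functions is one plausible way to support both English and Spanish from a shared core, since only the language-dependent stages need replacing.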

A specific subgoal of the project concerned the representation of synthetic emotions and their role in the interpretation of text. In order to provide a solid empirical basis for subsequent developments, a corpus of texts tagged with emotional information was developed. The sentences of the corpus were tagged by human evaluators according to several of the available schemas for representing emotion. The results were analysed to establish an initial model of how humans treat compositionality of emotions over complex sentences [FG06c]. Semi-automatic techniques were employed to design a method for simulating the observed compositional behaviour [FG06a, FG06b]. The resulting information was used to configure a speech synthesizer to produce emotional voice [FGGL07].

Another stated subgoal of the project was to enable the system to handle representations of its input data in terms of description logic ontologies. For this purpose, an ontology for the representation of emotions was developed [FGP07].

A particular subgoal of the original project was concerned with how to allow NLG systems to adapt to the particular communication needs of a given user. This subgoal has been addressed from several points of view: how to build user models and use them to drive the operation of a particular system that communicates information [D´07, DG07], and how to evaluate the adaptability of a system to the needs of a given user [DGG08].

Another subgoal of the original project was to consider how non-textual data can be fruitfully communicated as text. Early attempts at rendering this kind of data as text quickly identified an important underlying problem: how to convey complex structure within the constraints imposed by text, limited as it is to a linear sequence of sentences. Narrative was adopted as a guiding case study because it is subject to very similar constraints, is known to play a very important role in human communication, and provides examples of interactive application. This effort resulted in publications on story generation [PG06a, GLRMP06], storytelling [LHG07, LHGP07], interactive approaches to narrative [GGP06, PG07, LPN08, GPyPSL07, PCP08], and particular software tools for controlling interactive environments [PN07].