Integración de técnicas de clasificación de texto y modelado de usuario para la personalización en servicios de noticias

Title: Integración de técnicas de clasificación de texto y modelado de usuario para la personalización en servicios de noticias
Publication Type: Thesis
Year of Publication: 2005
Authors: Díaz, A
Academic Department: Departamento de Ingeniería del Software e Inteligencia Artificial
Degree: PhD Thesis
Date Published: 07/2005
University: Universidad Complutense de Madrid

In recent years, the amount of electronic information available has grown so much that it is hard not to feel overloaded when trying to find the information one is really interested in. Web content appears in many forms across different application domains, but in most cases it is presented in the same way to all users. The content is static in the sense that it is not adapted to each user in two respects: it is neither presented differently to each user nor capable of adapting to changes in users' interests. Content personalization is a technique that tries to avoid information overload by adapting web content to each type of user and to changes in users' interests.
This thesis presents an integrated approach to Web content personalization applied to news services. The approach is based on three main functionalities: content selection, user model adaptation and results presentation. For these functionalities to be carried out in a personalized manner, they must be based on information about the user, which must be reflected in the user profile or user model. Content selection refers to choosing the particular subset of all available documents that is most relevant for a given user. User model adaptation is necessary because user needs change over time, especially as a result of the user's interaction with information. Results presentation involves generating a new result web document that contains, for each selected item, an extract that is indicative of its content. In particular, a personalized summary of each selected item is generated for each user.
The user model integrates four types of reference systems that allow users' interests to be represented from different points of view. These interests are divided into two types: long-term interests and short-term interests. The first type represents user interests that remain constant over time, and the second represents interests that change. The long-term model uses three classification methods that allow users to define their information needs from three different points of view: a domain-dependent classification system, in which documents are pre-classified by the document author (e.g., sections in a newspaper); a domain-independent classification system, obtained from the first-level categories of Yahoo! Spain; and a set of keywords.
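As a rough illustration of the structure described above, the following Python sketch shows one way a user model with three long-term reference systems and one short-term component could be represented and used to rank news items. It is not the thesis's implementation: the class and field names, the cosine similarity measure, the equal-weight average of the four components and the additive feedback update are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(key, 0.0) for key, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Document:
    section: str                        # assigned by the author (domain-dependent)
    category_scores: Dict[str, float]   # assumed output of a text categorizer
    terms: Dict[str, float]             # term weights of the news item


@dataclass
class UserModel:
    # Long-term interests: three reference systems (hypothetical field names).
    sections: Dict[str, float] = field(default_factory=dict)
    categories: Dict[str, float] = field(default_factory=dict)
    keywords: Dict[str, float] = field(default_factory=dict)
    # Short-term interests, updated from relevance feedback.
    short_term: Dict[str, float] = field(default_factory=dict)

    def score(self, doc: Document) -> float:
        """Average the four components (equal weights are an assumption)."""
        parts = [
            self.sections.get(doc.section, 0.0),
            cosine(self.categories, doc.category_scores),
            cosine(self.keywords, doc.terms),
            cosine(self.short_term, doc.terms),
        ]
        return sum(parts) / len(parts)

    def feedback(self, doc: Document, relevant: bool, rate: float = 0.5) -> None:
        """Move short-term term weights toward or away from a judged document."""
        sign = 1.0 if relevant else -1.0
        for term, weight in doc.terms.items():
            updated = self.short_term.get(term, 0.0) + sign * rate * weight
            if updated > 0.0:
                self.short_term[term] = updated
            else:
                self.short_term.pop(term, None)


def select(user: UserModel, docs: List[Document], k: int) -> List[Document]:
    """Content selection: return the k highest-scoring documents."""
    return sorted(docs, key=user.score, reverse=True)[:k]
```

In this sketch a user who weights the "sports" section and the keyword "football" would see sports items about football ranked first, and positive feedback on an item from another section would raise the score of similar items through the short-term component.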
The different personalization processes are based on statistical text classification techniques that are applied both to the documents and to the user models. The text classification tasks used are related to information retrieval, text categorization, relevance feedback and text summarization.
The evaluation of personalized systems is especially complex because the opinions of different users are needed to draw relevant conclusions about system performance. To evaluate the different personalization processes, several evaluation collections have been generated that store the relevance judgments of various users over various days. These collections have made it possible to try different approaches and determine which are the best choices for this purpose. Moreover, other researchers can use these collections to compare the results of their own personalization techniques.
The evaluations have shown that the personalization approach based on the combination of long-term and short-term models, with personalized summaries as the way to present the final results, achieves a certain reduction in users' information overload, independently of the domain and the language, in a Web content personalization system applied to news services.

Full Text
tesisAlbertoDiaz.pdf (1.52 MB)