InfoWeb: An Adaptive Information Filtering System on the Cultural Heritage Domain
Sciarrone F
2003-01-01
Abstract
This paper presents a system developed for the adaptive retrieval and filtering of documents belonging to digital libraries available on the Web. The system, called InfoWeb, is currently in operation on the ENEA (National Entity for Alternative Energy) digital library Web site dedicated to the cultural heritage and environment domain. InfoWeb records the user's information needs in a user model whose representation extends the traditional vector space model and takes the form of a semantic network of co-occurrences between index terms. The initial user model is built from stereotypes, which are obtained by clustering the collection with specific documents as starting points. The user's query can be expanded adaptively, using the user model formulated by the user himself. The system has been tested on the entire collection, comprising about 14,000 documents in HTML/text format. The results of the experiments are satisfactory both in terms of performance and in terms of the system's ability to adapt itself to the user's shifting interests.
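The idea of expanding a query through a network of co-occurring index terms can be illustrated with a minimal Python sketch. This is not InfoWeb's implementation, which the abstract does not detail: the function names `build_cooccurrence_network` and `expand_query`, the document-level co-occurrence counting, the ranking rule, and the sample terms are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_network(documents):
    """Return {term: {other_term: count}}, counting the documents
    in which both terms appear (an assumed weighting scheme)."""
    network = defaultdict(lambda: defaultdict(int))
    for terms in documents:
        # Each unordered pair of distinct terms in a document adds one link.
        for a, b in combinations(sorted(set(terms)), 2):
            network[a][b] += 1
            network[b][a] += 1
    return network


def expand_query(query_terms, network, max_new_terms=2):
    """Append the terms most strongly linked to the query terms."""
    scores = defaultdict(int)
    for term in query_terms:
        for neighbour, weight in network.get(term, {}).items():
            if neighbour not in query_terms:
                scores[neighbour] += weight
    # Strongest links first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [term for term, _ in ranked[:max_new_terms]]


# Hypothetical index terms from documents a user found relevant.
user_documents = [
    ["fresco", "restoration", "pigment"],
    ["fresco", "restoration", "humidity"],
    ["fresco", "pigment", "analysis"],
]
network = build_cooccurrence_network(user_documents)
expanded = expand_query(["fresco"], network)
print(expanded)
```

In this sketch the network plays the role of the user model: as the set of relevant documents changes, the link weights change, and the same query is expanded with different terms.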