Date: 28 May 2013
Venue: Sala 1.03, ETSI Informática, UNED (map)
Summary: Many Natural Language Processing tasks can be cast as the problem of defining similarity measures between texts (e.g. document/document, query/document, sentence/sentence). A crucial issue is then to find the most appropriate similarity measure, or combination of measures. The standard development cycle consists of optimizing measures against test collections with human assessments; however, a common problem with this methodology is that evaluation results vary considerably across datasets.
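To make the notion of a text similarity measure concrete, here is a minimal sketch of one common choice, cosine similarity over bag-of-words vectors; this is only an illustrative example and is not a measure discussed in the talk.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts under a bag-of-words model."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product over the shared vocabulary only.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    # Product of the Euclidean norms of the two term-frequency vectors.
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(cosine_similarity("the cat sat on the mat", "the cat lay on the rug"))  # → 0.75
```

The same interface works for any of the text pairs mentioned above (document/document, query/document, sentence/sentence); only the representation behind the vectors changes.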
In this talk, we present three theorems that any similarity measure satisfies, providing both empirical evidence and theoretical proofs. These theorems explain multiple phenomena observed in NLP tasks, such as the high predictive power of text-output evaluation measures at system level, the decreasing nature of precision/recall curves in IR tasks, the pooling bias phenomenon in TREC corpora, the high performance obtained by combining diverse systems, and the unexpectedly high performance of voting methods in machine learning scenarios, among others. In addition, the theorems provide an unsupervised method for combining evaluation measures, for predicting the clustering threshold in grouping tasks, and for predicting the average relevance of the rankings produced by IR systems.