Learning of Similarity Measures

Similarity measures, sometimes also referred to as similarity functions or distance measures, are an integral functional component of a case-based reasoning (CBR) system. However, defining them manually is typically tedious and time-consuming and requires the knowledge of a domain expert. In this project we investigated a novel learning approach for optimizing the accuracy and appropriateness of similarity measures. This learning approach is suited to most CBR application domains (i.e., it is not restricted to classification tasks), and it enables the learning of highly sophisticated, knowledge-intensive similarity measures (composed, for example, of local similarity measures).

Technical Background

The basic idea of this approach is to acquire only high-level knowledge about the utility of cases for some set of given problem situations. The necessary low-level knowledge required to compute the utility of cases for new problem situations is then extracted from the acquired high-level knowledge by employing machine learning techniques.
The approach requires some kind of similarity teacher that is able to provide the necessary training data. This training data can be described as a set of corrected retrieval results called case order feedback: training queries are used to perform retrievals based on some initial similarity measure, and the similarity teacher then analyzes the obtained retrieval results with respect to the actual utility of the retrieved cases for the given queries. Obvious deficiencies are corrected by reordering the cases. Note that the approach does not require feedback for all retrieved cases. Even information about the utility of a single case can be useful, for example, in an e-commerce scenario where the customer does not buy the most similar product but another one contained in the retrieval result. To compare the obtained retrieval results with the case order feedback provided by the similarity teacher, a special error function has to be defined that measures the “distance” between the two given partial orders. Finally, the task of the machine learning algorithm is to minimize this error function by modifying the initial similarity measure.
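
The following Python sketch illustrates the loop described above under simplifying assumptions; it is not the implementation used in the project. It assumes that cases are tuples of attribute values scaled to [0, 1], that the similarity measure is a weighted sum of simple local similarities, that the teacher's feedback for each query is a list of cases ordered from most to least useful (possibly covering only some of the retrieved cases), and that the error function counts wrongly ordered pairs. The search is a basic mutate-and-select procedure chosen for brevity. All function names are hypothetical.

```python
import random
from itertools import combinations


def similarity(weights, query, case):
    """Weighted average of local similarities (assumed form: 1 - |q - c|)."""
    total = sum(weights)
    return sum(w * (1.0 - abs(q - c))
               for w, q, c in zip(weights, query, case)) / total


def retrieval_error(weights, query, ranked_cases):
    """Count case pairs that the measure orders against the teacher's feedback.

    ranked_cases lists cases from most to least useful for the query.
    """
    sims = [similarity(weights, query, c) for c in ranked_cases]
    # For i < j the teacher prefers case i, so sims[i] should not be smaller.
    return sum(1 for i, j in combinations(range(len(sims)), 2)
               if sims[i] < sims[j])


def total_error(weights, feedback):
    """Sum the pairwise ordering errors over all training queries."""
    return sum(retrieval_error(weights, q, cases) for q, cases in feedback)


def learn_weights(feedback, n_attrs, steps=500, seed=0):
    """Mutate the weights at random and keep a candidate if it is not worse."""
    rng = random.Random(seed)
    best = [1.0] * n_attrs              # initial similarity measure
    best_err = total_error(best, feedback)
    for _ in range(steps):
        cand = [max(1e-6, w + rng.gauss(0.0, 0.2)) for w in best]
        err = total_error(cand, feedback)
        if err <= best_err:
            best, best_err = cand, err
    return best, best_err


# One training query; the teacher ranks the first case above the second.
feedback = [((0.5, 0.5), [(0.0, 0.5), (0.5, 0.2)])]
initial_error = total_error([1.0, 1.0], feedback)
weights, learned_error = learn_weights(feedback, n_attrs=2)
```

In this toy example the equally weighted initial measure ranks the two cases in the opposite order to the teacher, so the search has to shift weight toward the second attribute. Because a candidate is kept only if its error does not increase, the learned error can never exceed the initial one.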

Our paper on using evolutionary algorithms for learning similarity measures won the Best Paper Award at the International Conference on Case-Based Reasoning in 2003.

Selected Publications