Structured Latent Semantic Analysis (SLSA) is a response to the failures of traditional syntactic natural language understanding (NLU) and of statistical, corpus-based semantic NLU. SLSA is a hybrid NLU system that combines shallow parsing with vector-based semantics. The semantics are taken from Latent Semantic Analysis (LSA), which uses word occurrence information from a large corpus to create a vector in a high-dimensional space for each word and each document in the corpus. The similarity between two words or documents is measured by the cosine of the angle between their vectors.
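The LSA step described above can be illustrated with a minimal sketch. It is not the SLSA implementation: the toy vocabulary, the counts, the choice of k=2, and the function names `lsa_vectors` and `cosine` are all assumptions made for illustration, and practical LSA systems typically weight the counts and keep a few hundred dimensions from a much larger corpus.

```python
import numpy as np

def lsa_vectors(term_doc, k):
    """Reduce a term-by-document count matrix to k dimensions with a truncated SVD.

    Returns (term_vectors, doc_vectors), one row per term / per document.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    term_vecs = U[:, :k] * s[:k]      # scale each kept dimension by its singular value
    doc_vecs = Vt[:k, :].T * s[:k]
    return term_vecs, doc_vecs

def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical toy corpus: rows are words, columns are documents.
vocab = ["cat", "dog", "pet", "stock", "market"]
counts = np.array([
    [2, 1, 0, 0],   # cat
    [1, 2, 0, 0],   # dog
    [1, 1, 0, 0],   # pet
    [0, 0, 2, 1],   # stock
    [0, 0, 1, 2],   # market
], dtype=float)

term_vecs, doc_vecs = lsa_vectors(counts, k=2)

# Words that occur in the same documents end up close together;
# words that never share a document end up nearly orthogonal.
sim_cat_dog = cosine(term_vecs[0], term_vecs[1])
sim_cat_stock = cosine(term_vecs[0], term_vecs[3])
print(f"cat/dog:   {sim_cat_dog:.3f}")
print(f"cat/stock: {sim_cat_stock:.3f}")
```

In this toy corpus "cat" and "dog" share documents and score close to 1, while "cat" and "stock" never co-occur and score close to 0.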
Previous uses of LSA have shown that it works well with single words and with longer texts, but not with single sentences. This project is based on the hypothesis, which our evaluations support, that some level of syntactic knowledge is needed to bring LSA-based similarity judgments closer to human judgments.
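One way to see why sentences are hard for a purely vector-based approach is sketched below, under the assumption that a sentence is represented as the sum of its word vectors, a common way of building LSA vectors for new text. The word vectors here are invented for illustration and do not come from any trained LSA space; `sentence_vector` and `cosine` are hypothetical helper names.

```python
import numpy as np

# Hypothetical 3-dimensional word vectors, invented for illustration only.
word_vecs = {
    "dog": np.array([0.9, 0.1, 0.0]),
    "man": np.array([0.1, 0.9, 0.0]),
    "bit": np.array([0.0, 0.2, 0.9]),
    "the": np.array([0.1, 0.1, 0.1]),
}

def sentence_vector(sentence):
    """Represent a sentence as the sum of its word vectors (word order is ignored)."""
    return np.sum([word_vecs[w] for w in sentence.lower().split()], axis=0)

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

s1 = sentence_vector("the dog bit the man")
s2 = sentence_vector("the man bit the dog")

# Both sentences contain the same words, so their summed vectors coincide
# and the cosine is 1.0 up to floating-point rounding.
order_blind_similarity = cosine(s1, s2)
print(f"similarity: {order_blind_similarity:.6f}")
```

The two sentences mean different things, yet they receive the same vector because who did what to whom is carried by structure rather than by word choice. This is the kind of information that syntactic knowledge could supply.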
Current work is focusing on the implementation of a structural analysis mechanism that uses:
to provide additional knowledge to an LSA-based similarity assessment mechanism.
Our initial evaluations of this approach were published in the Proceedings of the 22nd Annual Meeting of the Cognitive Science Society (2000), LEA Publishers. Those results were not entirely encouraging. Nevertheless, we pressed on, and at the following year's Cognitive Science Society meeting in Edinburgh, Scotland, we presented much better results.