Download: pdf (33 pages; 536 KB)
Abstract: The lexical semantic system is an important component of human language and cognitive processing. One approach to modeling semantic knowledge makes use of hand-constructed networks or trees of interconnected word senses (Miller, Beckwith, Fellbaum, Gross & Miller, 1990; Jarmasz & Szpakowicz, 2003). An alternative approach seeks to model word meanings as high-dimensional vectors, which are derived from the co-occurrence of words in unlabeled text corpora, as in LSA (Landauer & Dumais, 1997) and HAL (Burgess & Lund, 1997). This paper introduces a new vector-space method for deriving word meanings from large corpora that was inspired by the HAL and LSA models, but which achieves better and more consistent results in predicting human similarity judgments. We explain the new model, known as COALS, and how it relates to prior methods, and then evaluate the various models on a range of tasks, including a novel set of semantic similarity ratings involving both semantically and morphologically related terms.
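To give a rough sense of the co-occurrence approach the abstract describes (this is a generic HAL/LSA-style illustration, not the COALS method itself), the following minimal Python sketch builds word-by-word co-occurrence vectors over a fixed context window and compares words by cosine similarity; the toy corpus, window size, and function names are assumptions chosen for illustration only.

```python
from collections import defaultdict
from math import sqrt

def cooccurrence_vectors(corpus_sentences, window=4):
    """Count how often each word appears within +/- `window` tokens of another."""
    counts = defaultdict(lambda: defaultdict(float))
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1.0
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Tiny hypothetical corpus; real models are trained on large unlabeled text corpora.
corpus = [
    "the doctor examined the patient in the clinic",
    "the nurse helped the doctor treat the patient",
    "the chef cooked dinner in the kitchen",
]
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["doctor"], vecs["nurse"]))    # tends to be higher for related words
print(cosine(vecs["doctor"], vecs["kitchen"]))  # tends to be lower for unrelated words
```

Models such as HAL, LSA, and COALS differ mainly in how they weight, normalize, and reduce these raw co-occurrence counts; this sketch omits those steps entirely.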
Copyright Notice: The documents distributed here have been provided as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.