UWA: Unambiguous Word Annotations

This website hosts a sense-annotated corpus of unambiguous words drawn from Wikipedia and OpenWebText. The corpus is released in three versions, with 1, 10, or 100 sentences per word sense. Embeddings for WordNet senses and synsets, based on the state-of-the-art LMMS model, are also available.
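A word with a single WordNet sense can be labelled with that sense wherever it occurs. This is what makes unambiguous words a cheap source of annotations. The sketch below illustrates only that idea and is not the authors' pipeline. It makes several assumptions: a hypothetical toy inventory (`SENSE_INVENTORY`, with illustrative sensekeys not checked against WordNet), naive regex tokenization with no lemmatization, and a hypothetical helper `annotate_unambiguous`. The `max_per_sense` cap mirrors the 1, 10, or 100 sentences-per-sense versions.

```python
import re
from collections import defaultdict

# Hypothetical toy sense inventory: lemma -> list of sensekeys.
# Illustrative only; a real pipeline would read this from WordNet 3.0.
SENSE_INVENTORY = {
    "bank": ["bank%1:14:00::", "bank%1:17:01::"],  # ambiguous, never labelled
    "guacamole": ["guacamole%1:13:00::"],           # unambiguous
    "croissant": ["croissant%1:13:00::"],           # unambiguous
}

def annotate_unambiguous(sentences, inventory, max_per_sense):
    """Label tokens whose lemma has exactly one sense, keeping at most
    max_per_sense example sentences for each sensekey."""
    examples = defaultdict(list)
    for sentence in sentences:
        # dict.fromkeys drops repeated tokens in a sentence while keeping order
        for token in dict.fromkeys(re.findall(r"[a-z]+", sentence.lower())):
            senses = inventory.get(token, [])
            if len(senses) == 1 and len(examples[senses[0]]) < max_per_sense:
                examples[senses[0]].append(sentence)
    return dict(examples)

corpus = [
    "I spread guacamole on a croissant.",
    "The bank raised its rates.",
    "More guacamole, please.",
]
print(annotate_unambiguous(corpus, SENSE_INVENTORY, max_per_sense=1))
```

Raising `max_per_sense` from 1 to 10 lets the second "guacamole" sentence through. The ambiguous "bank" is never labelled, whatever the cap.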

Daniel Loureiro and Jose Camacho-Collados

Paper: Don't Neglect the Obvious: On the Role of Unambiguous Words in Word Sense Disambiguation (PDF)

WordNet Coverage: 53%

Together with SemCor, UWA increases coverage of WordNet 3.0 sensekeys from 16% (SemCor alone) to 53%.
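The coverage figure can be read as the share of WordNet 3.0 sensekeys that have at least one annotation in SemCor or UWA. The minimal sketch below follows that reading. It assumes hypothetical toy key sets in place of the real inventory and corpora, and `sensekey_coverage` is a hypothetical helper.

```python
def sensekey_coverage(inventory_keys, *annotated_key_sets):
    """Fraction of inventory sensekeys annotated at least once in any corpus.
    Keys outside the inventory are ignored."""
    inventory = set(inventory_keys)
    covered = set().union(*annotated_key_sets) & inventory
    return len(covered) / len(inventory)

# Hypothetical toy data: a 10-key inventory and two annotated corpora.
inventory = {f"k{i}" for i in range(10)}
semcor_keys = {"k0", "k1"}
uwa_keys = {"k1", "k2", "k3", "k4", "not-in-inventory"}

print(sensekey_coverage(inventory, semcor_keys))            # SemCor alone
print(sensekey_coverage(inventory, semcor_keys, uwa_keys))  # SemCor + UWA
```

Taking the union before counting means a sensekey annotated in both corpora (here `k1`) is counted once.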

Annotations: 6.1M

Compiled by processing over 53 GB of text from Wikipedia and OpenWebText.

Visualization of the Embedding Space

t-SNE comparison of synset embeddings belonging to the 'noun.food' supersense. A visualization of all embeddings is also available (23 MB), and other WordNet groups are shown below. Synset embeddings are used instead of sensekey embeddings because they give a clearer visualization. They were learned by converting the sensekey annotations in the corresponding corpora to synsets.
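One way to turn sensekey annotations into synset embeddings is to relabel each annotated occurrence with its synset and average the contextual vectors per synset, in the style of LMMS. That this matches the exact procedure used here is an assumption. The sketch below uses a hypothetical hand-written mapping (`SENSEKEY_TO_SYNSET`, with illustrative keys) and a hypothetical helper `synset_embeddings`. With NLTK's WordNet interface, the same mapping can be obtained via `wordnet.lemma_from_key(key).synset()`.

```python
from collections import defaultdict

# Hypothetical sensekey -> synset mapping (illustrative keys). Two sensekeys
# ("car", "auto") belong to the same synset, so their annotations are pooled.
SENSEKEY_TO_SYNSET = {
    "car%1:06:00::": "car.n.01",
    "auto%1:06:00::": "car.n.01",
    "croissant%1:13:00::": "croissant.n.01",
}

def synset_embeddings(annotations, key_to_synset):
    """annotations: iterable of (sensekey, vector) pairs, one per annotated
    occurrence. Returns the mean vector of each synset's occurrences."""
    sums = {}
    counts = defaultdict(int)
    for key, vec in annotations:
        synset = key_to_synset[key]
        if synset not in sums:
            sums[synset] = [0.0] * len(vec)
        sums[synset] = [s + v for s, v in zip(sums[synset], vec)]
        counts[synset] += 1
    return {syn: [s / counts[syn] for s in total] for syn, total in sums.items()}

# Toy 2-dimensional "contextual" vectors, made up for illustration.
annotations = [
    ("car%1:06:00::", [1.0, 0.0]),
    ("auto%1:06:00::", [3.0, 2.0]),
    ("croissant%1:13:00::", [0.5, 0.5]),
]
print(synset_embeddings(annotations, SENSEKEY_TO_SYNSET))
```

Pooling at the synset level gives one point per concept, which is why synsets are easier to read than sensekeys in a t-SNE plot.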