She then ran her model. Within three days, her neural network learned to predict, with surprising accuracy, whether an undocumented language would likely have tone distinctions based on its geographical neighbors. The results earned her a best paper award.
Whether you are working on endangered language documentation, multilingual question answering, or computational typology, this zip file deserves a place in your toolkit. Unzip it, fine-tune it, and let the 36 sets guide your model toward deeper linguistic insight. WALS Roberta Sets 1-36.zip
One of the most powerful uses of is transferring predictions to languages not in WALS. Because RoBERTa learns from subword tokens, you can: She then ran her model