
.ttf.png)

The algorithm learns to embed language sounds into a multidimensional space where differences in pronunciation are reflected in the distance between corresponding vectors. A word with a “p” in the parent language may change into a “b” in the descendant language, but changing to a “k” is less likely due to the significant pronunciation gap.īy incorporating these and other linguistic constraints, Barzilay and MIT PhD student Jiaming Luo developed a decipherment algorithm that can handle the vast space of possible transformations and the scarcity of a guiding signal in the input. For instance, while a given language rarely adds or deletes an entire sound, certain sound substitutions are likely to occur.

Spearheaded by MIT Professor Regina Barzilay, the system relies on several principles grounded in insights from historical linguistics, such as the fact that languages generally only evolve in certain predictable ways. The team’s ultimate goal is for the system to be able to decipher lost languages that have eluded linguists for decades, using just a few thousand words. They also showed that their system can itself determine relationships between languages, and they used it to corroborate recent scholarship suggesting that the language of Iberian is not actually related to Basque. However, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) recently made a major development in this area: a new system that has been shown to be able to automatically decipher a lost language, without needing advanced knowledge of its relation to other languages. (To illustrate, imaginetryingtodecipheraforeignlanguagewrittenlikethis.) Some don’t have a well-researched “relative” language to be compared to, and often lack traditional dividers like white space and punctuation. Unfortunately, most of them have such minimal records that scientists can’t decipher them by using machine-translation algorithms like Google Translate. Lost languages are more than a mere academic curiosity without them, we miss an entire body of knowledge about the people who spoke them. Dozens of these dead languages are also considered to be lost, or “undeciphered” - that is, we don’t know enough about their grammar, vocabulary, or syntax to be able to actually understand their texts. Recent research suggests that most languages that have ever existed are no longer spoken.
