Contributing is easy! Here's an explanation.

Instructions

Writing sentences:

  1. Choose a word in Assyrian that you can use in an example.
  2. Say a few sentences using the word you chose.
  3. Write down what you said word-for-word in Assyrian and English.

Reading sentences:

  1. Pick a sentence in Assyrian and adjust its spelling to your dialect.
  2. Read the sentence aloud.

Recording oral history:

  1. Record yourself speaking freely for a few minutes.
  2. If you can manage, try writing down what you said.

Motivation

There are almost no writings in modern Assyrian, and texts that do exist don't reflect how Assyrian is spoken in normal conversation. We can change this by writing sentences, enough to cover most words in the Assyrian language.
If writing sentences is too difficult, you can read existing sentences. Our current dataset of sentences is written in the Urmi pronunciation, so you can adjust them to your dialect.
Collecting oral history from every dialect is important for preserving our identity and culture. These long-form recordings also help in unsupervised training tasks.

Tools this data builds

Automatic translation, automatic speech recognition.
Cross-dialectal interpreters, automatic speech recognition.
Automatic speech recognition, text-to-speech.

Building tools will promote the usage of Assyrian in future generations

Assyrian has been spoken for longer than any other Semitic language in the world. Today, however, the language is severely endangered. Within the next few generations, Assyrian is expected to fall out of everyday usage. One factor accelerating this decline is the lack of technology that supports the language and encourages its use. Vital tools like translators and voice assistants have not been developed yet.

But this takes data. You donate the examples, we build the datasets

ASSYRIAN TEXT: ana raba byayen lakhma
ENGLISH TEXT: I really want bread
ASSYRIAN SPEECH: ./audio.wav
DIALECT: Urmi

Assyrian has been left behind by most technologies because it is extremely low-resource, meaning the data necessary to build these tools simply does not exist. This project is committed to changing that. After cleaning and validating the donated examples, we construct the speech and translation datasets necessary for building these language tools.
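
For the technically curious, here is a minimal sketch of how one donated example and a basic validation pass might look in Python. The `Example` class, its field names, and the specific checks are assumptions made for illustration, based only on the four fields shown in the example above; they are not the project's actual pipeline.

```python
from dataclasses import dataclass


# Hypothetical record layout mirroring the four fields in the example above.
@dataclass
class Example:
    assyrian_text: str  # sentence written with English letters
    english_text: str   # English translation of the sentence
    audio_path: str     # path to the recording, e.g. "./audio.wav"
    dialect: str        # e.g. "Urmi"


def validation_problems(example: Example) -> list[str]:
    """Return human-readable problems; an empty list means the example passes."""
    problems = []
    if not example.assyrian_text.strip():
        problems.append("missing Assyrian text")
    if not example.english_text.strip():
        problems.append("missing English text")
    if not example.audio_path.strip():
        problems.append("missing audio path")
    if not example.dialect.strip():
        problems.append("missing dialect")
    # Assumption: sentences are written with English letters, so plain ASCII.
    if not example.assyrian_text.isascii():
        problems.append("Assyrian text contains non-English letters")
    return problems


example = Example("ana raba byayen lakhma", "I really want bread", "./audio.wav", "Urmi")
print(validation_problems(example))
```

A check like this only catches missing or malformed fields. Judging whether a sentence is spelled and translated well would still need a human reviewer.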

An example of how machines that learn from this data can help

Given enough examples of Assyrian speech, a machine can learn what the language sounds like and what it means. With only 35 minutes of speech data, an automatic speech recognition model was trained, one accurate enough to help research efforts document and preserve endangered dialects. With more data, technologies like this will improve enough to be used in everyday tools like automatic video subtitles and voice typing.

Everyone can and should contribute, no matter your dialect, fluency, or age

For this data to benefit future generations, it is crucial that examples are collected from Assyrians of all backgrounds. Young Assyrians might speak with American accents, and our data should reflect that. If you are a younger speaker, your contribution matters. You do not need to be a scholar, since all sentences are expected to be written with English letters. Mistakes are expected, and we will validate every example.

Let's not leave any dialect behind

Assyrian is not one language, but many. In the past, the attention of academia has been dominated by only a few dialects, like Urmi. This project aims to fix this disparity by collecting data from each dialect separately. If your dialect is nearly extinct, your contributions are crucial to preserving your linguistic heritage. The examples you give will serve as a time capsule for your grandchildren to relearn, or even reconstruct, your dialect.