Doctoral defence: Andre Tättar "Multilingual machine translation for under-resourced languages"

On 23 May 2025 at 10:15 Andre Tättar will defend his doctoral thesis "Multilingual machine translation for under-resourced languages" to obtain the degree of Doctor of Philosophy (in Computer Science).

Supervisor
Prof. Mark Fišel, University of Tartu

Opponents
Prof. Jörg Tiedemann, University of Helsinki (Finland)
David Vilar Torres, Google Berlin (Germany)

Summary
Imagine a world where every language, no matter how small, has advanced natural language applications like ChatGPT or is supported by Google Translate. This is the vision behind my research, which delves into machine translation for the Finno-Ugric languages, a language family of over 40 languages spoken by over 20 million people across Europe and North Asia. These languages, from national languages such as Estonian, Finnish, and Hungarian to smaller local languages like Võro, Livonian, and Komi, carry rich cultural legacies but face significant digital neglect.

My doctoral work addresses this gap by developing robust neural machine translation systems tailored for these languages. Covering languages that range from higher-resourced to lesser-known, under-resourced ones, the research uses state-of-the-art NLP techniques to overcome data scarcity and improve translation accuracy and efficiency. The final systems developed in this work support translation for 23 Finno-Ugric languages, bringing the benefits of advanced translation technology to a diverse range of communities.

The practical outcome of this thesis has been significant, especially with the development of Estonian-centric translation tools that can compete with the likes of Google Translate and DeepL. The Estonian government has adopted these systems, showcasing their effectiveness in real-world scenarios. Beyond governmental use, these translation models provide communities access to information and services in their native languages, supporting cultural preservation and participation in the digital world.

This work demonstrates how technology can help bring equal access to digital resources for all languages. We're building a more inclusive digital environment by ensuring minor languages aren't overlooked. The research not only pushes the boundaries of NLP technology but also emphasizes the importance of valuing linguistic diversity in technological progress. It's a step towards a future where no language is left behind in the digital age.

The defence will also be held on Zoom (meeting ID: 670 504 9543, passcode: ati).
