2024-04-19

State archives, libraries and private collections hold millions of documents from the Ottoman period written in Ottoman Turkish, including books, journals, newspapers, notebooks and records. Together they form a centuries-old historical heritage still waiting to be uncovered.

Learning Ottoman Turkish from scratch takes a significant amount of time. A new initiative, known as "Artificial Intelligence-Assisted Ottoman-Turkish End-to-End Translation", aims to make that unnecessary.

Osmanlica.com began as a doctoral thesis project by Dr Ishak Dolek, supervised by Associate Professor Dr Atakan Kurt of the Computer Engineering Department at Istanbul University-Cerrahpasa. The project has achieved a 96 percent success rate in Ottoman optical character recognition (OCR), which can be considered the first step in transferring Ottoman sources into Modern Turkish.

"We possess a vast archive of approximately a hundred million pages from the Ottoman era. However, the challenge is that people cannot read and comprehend these archives, because their language differs from modern Turkish," Atakan Kurt tells TRT World.

"This stands as one of the foremost challenges confronting our people," he says.

Language revolution

Ottoman Turkish was written in a Turkish adaptation of the Arabic script between the 13th and 20th centuries, and it contained a great many Arabic and Persian words and expressions.

In 1928, five years after the Republic of Türkiye was founded, the country underwent a language revolution. It rapidly switched from the Arabic script to a Turkish alphabet based on the Roman alphabet, an early version of the one still in use today. During this period, many foreign elements were also removed from the language.

Kurt says that the European Union has used OCR programmes to convert its historical manuscripts, written from the Middle Ages onward, into editable text.

"Because in Europe there is no big difference between the languages of the Middle Ages and the languages of today, they just convert these printed and handwritten texts (old newspapers, books, letters, manuscripts) from image files into editable texts and share them," he says.

Three-step solution

With Ottoman Turkish, Kurt says, his team faced two additional problems.

"Firstly, the alphabet in our texts is different from the one we use today. Secondly, the language is also different. Even if we transliterate the letters, people do not understand the language as it was used one or two centuries ago. Even the language used fifty years ago is almost incomprehensible nowadays."

"In other words, the language used at that time is like a foreign language now. That is why we also have to translate the language of the documents into modern Turkish."