The Next Challenge for the Latvian Language

We use our computers and mobile devices and expect to be able to write and read in Latvian without a second thought. But getting to where we are today has taken a journey spanning more than two decades. With speech recognition fast becoming the new way to interact with our devices, the Latvian language could face its biggest challenge yet.

A few years back I was lucky enough to receive a 1920s state-of-the-art writing machine. What made this typewriter so special was its ability to write in Latvian – using all 33 letters of the Latvian alphabet, including the accented letters (garumzīmes, jumtiņi and mīkstinājumi). Despite weighing over 5 kg, this technology was considered so important that it accompanied its owners all the way from war-torn Latvia to Germany and eventually Australia. It played a key role in the publication of many Latvian history books by the renowned historian Prof. Edgars Dunsdorfs.

Fast forward to the 1980s and a new revolution in technology begins – the personal computer. Not wanting to be left behind, Latvian computer enthusiasts from all corners of the world began creating fonts and keyboard drivers so that they could join the desktop publishing boom and produce high-quality books and magazines in their own cherished language. With the proliferation of custom-made Baltic fonts it became increasingly difficult to share Latvian documents amongst computer users. Even the Latvian newspapers Diena and Neatkarīgā Rīta Avīze resorted to special text conversion tools to cope with the many different text formats. At one stage there were more than 10 different font “standards” for the Latvian language. Latvian Macintosh pioneer Juris Mazutis, who published the first Latvian computing magazine “LatDati” in the late 1980s, dedicated many pages to this topic.

The emergence of the Internet in the early 90s presented another challenge for the Latvian language. Both the Web and email supported only basic, unaccented Latin letters, but it wasn’t long before a work-around was found. Apostrophes, tildes, dashes and two-letter combinations were some of the new ways to represent the diacritics when exchanging emails (a’ = ā, aa = ā, lj = ļ, sh = š).
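The work-around can be sketched in a few lines of Python. This is only an illustration: the table holds just the four pairs quoted above (with a plain ASCII apostrophe), and the names `ASCII_TO_LATVIAN` and `decode_latvian` are made up for the example. The schemes people actually used covered the whole alphabet, varied from person to person, and could be ambiguous wherever a letter pair occurred naturally in a word.

```python
# Hypothetical mapping table: only the four pairs quoted in the article.
# A real scheme would need entries for every accented Latvian letter.
ASCII_TO_LATVIAN = {
    "a'": "ā",
    "aa": "ā",
    "lj": "ļ",
    "sh": "š",
}


def decode_latvian(text: str) -> str:
    """Replace two-character ASCII stand-ins with the Latvian letter."""
    result = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        key = pair.lower()
        if key in ASCII_TO_LATVIAN:
            letter = ASCII_TO_LATVIAN[key]
            # Keep the capitalisation of the first character of the pair.
            result.append(letter.upper() if pair[0].isupper() else letter)
            i += 2
        else:
            result.append(text[i])
            i += 1
    return "".join(result)


print(decode_latvian("maaja"))    # māja
print(decode_latvian("ma'ja"))    # māja
print(decode_latvian("Shodien"))  # Šodien
```

The decoder scans left to right and always prefers a two-character match, which is why a word that genuinely contains "s" followed by "h" would be decoded wrongly. That weakness is one reason a proper character standard was needed.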

The Artificial Intelligence Lab of the University of Latvia together with the Baltic Express Mail service used a similar approach to encode and decode Latvian texts sent as express electronic mail between Australia and Latvia. During the August 1991 coup this proved to be one of the few ways of getting in contact with the outside world, especially when most other communication channels had been cut off by Soviet authorities.

It wasn’t until late 1992 that the official Latvian computing standard, also referred to as LVS 8-92, finally took effect. For the first time Windows and Macintosh users could begin exchanging Latvian documents without the hieroglyphics and unreadable text.

The Latvian National standardization committee was on a roll: LVS 18-92 (computing standard for the Liv language, which is a minority language with fewer than 100 speakers worldwide) and LVS 24-93 (Latvian language support for computers) were also published. LVS 24-93 went beyond the font and keyboard layout standard and specified how the Latvian language (alphabet, numbers, currency, punctuation marks, date & time) should be represented in the computing world.


Several months later the Latvian ergonomic keyboard standard LVS 23-93 was also announced, but because it required the production of a custom keyboard for the Latvian market it never took off.


The US QWERTY keyboard remains the preferred keyboard both in Latvia and abroad. On this keyboard the most popular way to access the Latvian diacritics is by using the apostrophe or tilde dead key provided by Tildes Birojs, WinLat and other similar software packages for the Windows operating system.


On the Macintosh the most popular way to obtain the accented Latvian letters is by holding down the Option key and pressing the letter, for example, OPTION a = ā, OPTION s = š, OPTION n = ņ.

The creation of the Baltic computing standards paved the way for the Estonian, Latvian, Liv & Lithuanian languages to be included in Unicode – an international and universal character set of more than 120,000 characters for 129 modern and historic scripts, as well as multiple symbol sets.
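To make this concrete, here is a small Python sketch that looks up a few accented Latvian letters in Unicode. It uses only the standard library; the helper name `describe` is invented for this example. It shows that each letter has its own code point and official name, and a fixed byte sequence when saved as UTF-8.

```python
import unicodedata


def describe(letter: str) -> tuple[str, str, str]:
    """Return the code point, Unicode name and UTF-8 bytes (hex) of a letter."""
    code_point = f"U+{ord(letter):04X}"
    name = unicodedata.name(letter)
    utf8_hex = letter.encode("utf-8").hex(" ")
    return code_point, name, utf8_hex


for letter in "āčš":
    print(letter, *describe(letter))
# ā U+0101 LATIN SMALL LETTER A WITH MACRON c4 81
# č U+010D LATIN SMALL LETTER C WITH CARON c4 8d
# š U+0161 LATIN SMALL LETTER S WITH CARON c5 a1
```

Because every system agrees on these code points, a document written on one computer displays the same letters on any other, with no conversion tools in between.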

Since the early 90s Unicode has become so widespread that it is included in all modern operating systems and programming languages. Practically every Latvian web page declares a Unicode encoding, such as UTF-8 or UTF-16, in the header of its source code. Unicode plays a vital role in the localization of products such as common home appliances, car navigation and entertainment systems. It has enabled Olympus to produce digital cameras with on-screen instructions in Latvian, Electrolux washing machines to display a Latvian menu, Monopoly to use Unicode fonts to release a Latvian version of its board game, Microsoft to offer translated versions of its MS Office software suite, TomTom to provide car navigation instructions and prompts in Latvian, Google and Facebook to offer localized Latvian versions of their search and social networking services, and major film studios to subtitle in Latvian. The effect of Unicode on new product releases for the Latvian market was regularly documented by the latviski.lv blog that ran from 2006 to 2009. Unicode permits Latvian domain names such as pīrāgs.com and bērziņš.lv, and the .lv domain registrar will even offer you a 30% discount for the privilege.


Thanks to Unicode, today’s smartphones and tablet computers enable you to read and write in Latvian. The latest Apple mobile operating system, iOS 8, which powers the popular iPhones and iPads, has a staggering 287 Unicode fonts, all of which are compatible with the Latvian language. Even wearable technologies such as Google Glass and the Apple Watch are Latvian friendly.


As each new technology has been introduced, whether it was the first specially crafted letters for the typewriter, the desktop and laptop computer with custom fonts and keyboard drivers, the palm computer with its stylus-driven letter strokes or the latest touch devices with predictive text algorithms – Latvian language support has always caught up.

However, the emerging trend of talking to our devices will also become the biggest challenge for the Latvian language. Already we use simple voice commands to initiate phone calls, select radio stations and play music from a compatible smartphone while driving a car. All of the software industry giants such as Apple, Google and Microsoft are investing considerable resources in speech recognition technologies for their mobile operating systems. Devices are getting smaller, and it will simply be more practical to interact with voice commands rather than tapping on tiny screens.

The Artificial Intelligence Lab at the University of Latvia has for several years been developing a Latvian speech recognition corpus and offers an experimental page where you can upload your own voice samples. The corpus already includes just over 100 hours of audio data. It covers different types of background noise, including office, street, in-car and hall; different speech styles, covering TV and radio news, audiobooks, public speeches and presentations; male and female speakers of different ages; and different dialects and accents such as Latgalian, Belorussian, English, Russian and Ukrainian. This is only a fraction of what will be required, and you can monitor the progress at runa.korpuss.lv.

Considerably more investment is required for the continuing development of the Latvian speech recognition corpus. It should become an open system, freely available for software developers to use in their future software applications and technology platforms. It will provide the opportunity for our kids and the next generation to interact with the latest voice-driven technologies in their mother tongue rather than switching to a major language such as English or Russian.
