Analysis: here are some of the peculiar challenges researchers found when building an Irish language text generation system
Data-to-text generation consists of automatically writing text from structured data (see Douglas Hyde example below). Text generation systems are used to write weather reports, sports game summaries, or as part of the answering component in dialogue systems.
Large language models such as ChatGPT can produce good-quality text in many languages, but they are very difficult to control and have a significant carbon footprint. Another approach to building a text generator is to encode the grammar of the language in the system itself, which requires in-depth linguistic knowledge.
Researchers have just released the first grammar-based Irish text generation system. You can have a go at the first public version and generate short Wikipedia-like texts like this in Irish (or English!).

Like Classical Hebrew or Arabic, Irish is a language with verb-initial constructions. It fuses prepositions and determiners, as Spanish, French and Italian do, and it prefixes nouns and adjectives in a way that is reminiscent of how English or German compounds are formed. Irish is also notorious for having no words for "yes" and "no", and for having three ways of expressing numbers depending on what is being counted.
But which particular phenomena pose a challenge when building an Irish generation system? In the following list, Irish examples are italicised, and both a literal translation (following 'lit.') and a full translation (between parentheses, when necessary) are provided.
The issue with pronunciation
Anyone who does not speak Irish and takes the bus in Dublin for the first time will be puzzled when trying to match the announced bus stops with what is written on the screen, especially on the 46A to Dún Laoghaire. I don't speak Irish, I only make generation systems, so I need Irish speakers to explain how the language works, which I then translate into a computer program. Pronouncing the words well helps me communicate better with speakers, but reading Irish definitely has a learning curve.
Autonomous verb forms
Every verb (in every tense) has an inflected form that incorporates a pronoun with unknown number and gender (autonomous form) and can be used in place of a passive. Autonomous verb forms tend to express an action where the actor is not known or not relevant to the discussion, while constructions with the auxiliary bí 'be' tend to express a state. For example, the verb beir ‘give birth to’ has an autonomous form rugadh: Rugadh Agustín Barboza i bParagua, lit. ‘Was given birth to Agustín Barboza in Paraguay’ (‘Agustín Barboza was born in Paraguay’). Making a system know when to use the autonomous verb instead of the auxiliary is not straightforward.
Prepositions are troublesome!
In English and many other languages, prepositions such as "of", "with" and "for" are invariant. In Irish, most prepositions combine with pronouns, and their form varies according to the person, number and gender of the pronoun. The preposition do 'for' can be inflected as di ‘for her’, dó ‘for him’, or dúinn ‘for us’.
There are up to 7 forms for one preposition, and an inflected form is sometimes used in other contexts too. For instance, leis, the 3rd person masculine singular form of le ‘with’, is also the form that le takes in front of the determiner an ‘the’. Prepositions also combine with determiners in some cases, and they can trigger mutations in surrounding words (see below).
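To make the idea concrete, here is a minimal sketch of how a generator might store these forms as a lookup table. The table layout, the person labels and the function names are assumptions made for illustration, not the design of the system described here, and the mutations that prepositions can trigger are deliberately left out.

```python
# Hypothetical lookup table of inflected prepositions (illustrative only).
# Keys such as "3sg.m" are labels invented for this sketch.
PREPOSITION_FORMS = {
    "do": {"1sg": "dom", "2sg": "duit", "3sg.m": "dó", "3sg.f": "di",
           "1pl": "dúinn", "2pl": "daoibh", "3pl": "dóibh"},
    "le": {"1sg": "liom", "2sg": "leat", "3sg.m": "leis", "3sg.f": "léi",
           "1pl": "linn", "2pl": "libh", "3pl": "leo"},
}

# Preposition followed by the determiner "an": "le" takes the form "leis",
# while "do" fuses with the determiner into a single word.
WITH_DETERMINER = {"le": "leis an", "do": "don"}


def inflect_preposition(preposition: str, person: str) -> str:
    """Return the form of a preposition combined with a pronoun."""
    return PREPOSITION_FORMS[preposition][person]


def preposition_before_an(preposition: str) -> str:
    """Return the preposition as it appears in front of 'an' (no mutations applied)."""
    return WITH_DETERMINER[preposition]


print(inflect_preposition("do", "3sg.f"))   # di
print(preposition_before_an("le"))          # leis an
```

A table like this keeps the irregular forms out of the program logic, which only has to pick the right key.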
'X is Y' constructions
There are two ways to express "X is Y" in Irish, according to the meaning one wants to convey: with the copula is and its accompanying pronoun é/í, and with the verb bí. In general, the is é construction is used to express a permanent role, such as family relations or set membership, as in Is séipéal é Notre Dame, lit. 'Is church it Notre Dame' (‘Notre Dame is a church’).
On the other hand, the bí construction is used for expressing non-permanent properties, such as a job title or a temporary condition, as in Tá sé ina láithreán tógála faoi láthair, lit. ‘Is it in-its site of building at present’ (‘It is currently a building site’). The distinction between the two types is sometimes blurry, and the order of the words in both cases can also vary according to the exact meaning, making is/bí constructions particularly difficult to handle.
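As a rough illustration of that choice, the sketch below picks one of the two patterns from a single "permanent" flag and reproduces the word order of the two examples above. The function name and its parameters are hypothetical. It assumes a third-person subject (so that ina fits), ignores any mutation that ina may trigger on the following word, and does not model the variations in word order mentioned above.

```python
def x_is_y(subject: str, predicate: str, permanent: bool, pronoun: str = "é") -> str:
    """Toy generator for 'X is Y' (hypothetical helper, not the real system).

    permanent=True  -> copula pattern:  Is <predicate> <pronoun> <subject>
    permanent=False -> bí pattern:      Tá <subject> ina <predicate>
    The pronoun (é or í) has to be supplied by the caller.
    """
    if permanent:
        return f"Is {predicate} {pronoun} {subject}"
    # Assumption: third-person subject; mutations after "ina" are not handled.
    return f"Tá {subject} ina {predicate}"


print(x_is_y("Notre Dame", "séipéal", permanent=True))
print(x_is_y("sé", "láithreán tógála", permanent=False))
```

The hard part in practice is not assembling the words but deciding the value of the flag, since the input data rarely says whether a property is permanent.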
The thing about mutations
In Irish, letters can be added at or near the start of a word (usually a noun or a verb) to modify the way it is pronounced. These changes are called initial mutations: eclipsis (insertion of a letter before the first letter), lenition (insertion of an "h" after the first letter), and prothesis (insertion of "t-" or "h-" before the first letter). A simple case: a feminine noun in the nominative singular is lenited after the definite article an 'the' if it starts with a consonant. So cathair (fem.) ‘city’ becomes an chathair ‘the city’, but an iontaise (fem.) ‘the fossil’ has no lenition because it starts with a vowel.
Irish has a rather intricate system of rules and exceptions that determine whether a mutation applies. For instance, the "dentals-dots" rule is an exception to lenition: it applies when the word that would be lenited starts with a ‘dots’ consonant ("d", "t" or "s") and the word just before ends with a ‘dentals’ consonant ("d", "n", "t", "l" or "s"). Thus an tine (fem.) ‘the fire’ is not lenited, while a feminine noun starting with "s" gets a "t" prefix instead: an tseirbhís (fem.) ‘the service’. Initial mutations can be fun too: when the preposition i ‘in’ triggers an eclipsis on the following noun, a lower case letter is introduced even if the noun is uppercased, as in i dTexas ‘in Texas’, which looks like a spelling error but is not!
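The rules described in this section can be sketched in a few lines of code. This is a simplified illustration, not the actual system: the function names are invented, it only covers feminine nouns in the nominative singular after an and nouns after i, and it makes two assumptions that go beyond the text above, namely that the "t" prefix is only added when the "s" is followed by a vowel, "l", "n" or "r", and that i becomes in before a vowel.

```python
VOWELS = set("aeiouáéíóú")
LENITABLE = set("bcdfgmpst")   # consonants that can take an "h"
DOTS = set("dts")              # initial consonants protected from lenition
DENTALS = set("dntls")         # final consonants of the preceding word
ECLIPSIS = {"b": "m", "c": "g", "d": "n", "f": "bh", "g": "n", "p": "b", "t": "d"}


def lenite(word: str) -> str:
    """Insert an 'h' after the first letter when that letter can be lenited."""
    first, second = word[0].lower(), word[1:2].lower()
    if first not in LENITABLE or second == "h":
        return word
    if first == "s" and second in {"c", "m", "p", "t"}:
        return word  # these clusters are not lenited
    return word[0] + "h" + word[1:]


def with_article_feminine(noun: str, article: str = "an") -> str:
    """Feminine noun, nominative singular, after the definite article."""
    first, second = noun[0].lower(), noun[1:2].lower()
    if first in VOWELS:
        return f"{article} {noun}"          # no lenition before a vowel
    if first in DOTS and article[-1] in DENTALS:
        # "dentals-dots": lenition is blocked; "s" may take a "t" prefix.
        # Assumption: only before a vowel, l, n or r.
        if first == "s" and second in VOWELS | {"l", "n", "r"}:
            return f"{article} t{noun}"
        return f"{article} {noun}"
    return f"{article} {lenite(noun)}"


def after_i(noun: str) -> str:
    """Preposition i 'in' plus noun, with a lower-case eclipsis letter."""
    first = noun[0].lower()
    if first in ECLIPSIS:
        return f"i {ECLIPSIS[first]}{noun}"  # original capital is preserved
    if first in VOWELS:
        return f"in {noun}"                  # assumption: "in" before vowels
    return f"i {noun}"


for noun in ["cathair", "iontaise", "tine", "seirbhís"]:
    print(with_article_feminine(noun))
print(after_i("Texas"), "/", after_i("Paragua"))
```

Even this toy version needs several special cases for a single article and a single preposition, which gives a sense of how quickly the full rule set grows.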
Irish is a very rich and fun language, but it is currently endangered, so urgent action is needed if it is to benefit from the digital revolution and survive the threat of digital extinction. It is critical to build resources (systems, datasets, digital texts) to ensure the future of this 1,700-year-old language.
More information about the Multilingual Flexible Neuro-Symbolic Natural Language Generation project is available here. Thanks to Dr Elaine Uí Dhonnchadha at TCD for the help with the Irish examples. Any errors remain the author's responsibility.
The views expressed here are those of the author and do not represent or reflect the views of RTÉ