Technoleg Iaith a Lleferydd || Speech and Language Technology
23/05/2013
Research Resources at the Language Technologies Unit

Basic Welsh Speech Recognition project - A small pilot project, funded by the Welsh Language Board (WLB), developed a speech-controlled calculator to demonstrate the potential of speech recognition in Welsh. The software produced by this project is a laboratory prototype rather than a market-ready program. All documentation, software and data are available here.
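The page does not describe how the calculator turns recognised speech into arithmetic. The sketch below is purely illustrative. It assumes a recogniser that returns a list of Welsh words and shows one way the step after recognition could work. The vocabulary (the numerals un to deg and the verbs adio, tynnu, lluosi, rhannu) and the NUMBER OPERATOR NUMBER grammar are hypothetical and are not taken from the project's documentation.

```python
# Hypothetical vocabulary: the prototype's actual word list is not documented here.
DIGITS = {"un": 1, "dau": 2, "tri": 3, "pedwar": 4, "pump": 5,
          "chwech": 6, "saith": 7, "wyth": 8, "naw": 9, "deg": 10}

OPERATORS = {
    "adio": lambda a, b: a + b,    # add
    "tynnu": lambda a, b: a - b,   # subtract
    "lluosi": lambda a, b: a * b,  # multiply
    "rhannu": lambda a, b: a / b,  # divide
}


def evaluate(words: list[str]) -> float:
    """Evaluate NUMBER (OPERATOR NUMBER)* strictly left to right.

    Unknown words raise KeyError; a malformed sequence raises ValueError.
    """
    if not words or len(words) % 2 == 0:
        raise ValueError("expected NUMBER (OPERATOR NUMBER)*")
    result = DIGITS[words[0]]
    # Pair each operator word with the number word that follows it.
    for op, num in zip(words[1::2], words[2::2]):
        result = OPERATORS[op](result, DIGITS[num])
    return result


print(evaluate("tri adio pedwar".split()))  # 7
```

Evaluating strictly left to right, with no operator precedence, matches the behaviour of a simple pocket calculator. It also keeps the grammar small, which helps a recogniser with a limited vocabulary.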

CEG (an electronic corpus of the Welsh language) - The Unit maintains CEG, a corpus of 1,079,032 words of written Welsh prose, made up of 500 samples of approximately 2,000 words each. The samples were selected from a representative range of contemporary prose texts (primarily from 1970 onwards). Corpora of this type are used for the statistical study of a language. The CEG website includes an analysis of the frequency of raw word forms as well as lemmata. For more information, visit CEG.
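As a rough illustration of the kind of statistic such a corpus supports, the sketch below counts raw word forms in a text and normalises them to occurrences per million words. The tokenisation rule treats letters, including ŵ and ŷ, with optional internal apostrophes as in "mae'r" as one token. This rule is an assumption for the example, not CEG's documented method.

```python
import re
from collections import Counter

# Assumed tokeniser: runs of Unicode letters, allowing internal apostrophes ("mae'r").
TOKEN = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def word_form_frequencies(text: str) -> Counter:
    """Count raw (lower-cased, unlemmatised) word forms."""
    return Counter(TOKEN.findall(text.lower()))


def per_million(counts: Counter) -> dict[str, float]:
    """Normalise raw counts to occurrences per million tokens."""
    total = sum(counts.values())
    return {word: n * 1_000_000 / total for word, n in counts.items()}


sample = "Mae'r gath yn y tŷ. Mae'r ci yn yr ardd."
freqs = word_form_frequencies(sample)
print(freqs.most_common(2))  # [("mae'r", 2), ('yn', 2)]
```

Counting raw word forms keeps mutated forms apart. For example, "gath" (the soft-mutated form of "cath") is counted separately from "cath". A lemma count would merge them, but that requires a Welsh lemmatiser, which this sketch does not include.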

Maes-T is a web interface for the online development of terminology dictionaries. It allows terminologists and subject specialists to standardize terminology collaboratively over the web, wherever they are located. Its database design is based on international ISO standards for the categorization of data. Maes-T allows dictionaries to be developed and then converted into the electronic dictionary format used in Cysgeir, the Porth Termau terminology portal, and other searchable terminological and lexicographical websites. Maes-T also supports exporting these dictionaries in different formats, such as printed hardcopy, CD, online and mobile phone versions.
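The page does not say which ISO standards Maes-T follows or what its schema looks like. The sketch below assumes a concept-oriented layout, which is common in terminology management. In this layout, one entry per concept holds that concept's terms in each language. The sketch shows how such entries could be exported to two different formats (JSON and CSV). All class and field names are hypothetical.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Term:
    text: str
    language: str             # e.g. "cy" or "en"
    part_of_speech: str = ""


@dataclass
class ConceptEntry:
    concept_id: str
    subject_field: str
    definition: str = ""
    terms: list[Term] = field(default_factory=list)


def to_json(entries: list[ConceptEntry]) -> str:
    """Nested export: one object per concept, terms listed inside it."""
    return json.dumps([asdict(e) for e in entries], ensure_ascii=False, indent=2)


def to_csv(entries: list[ConceptEntry]) -> str:
    """Flat export: one row per term, repeating the concept fields."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["concept_id", "subject_field", "language", "term", "part_of_speech"])
    for entry in entries:
        for term in entry.terms:
            writer.writerow([entry.concept_id, entry.subject_field,
                             term.language, term.text, term.part_of_speech])
    return buf.getvalue()


entry = ConceptEntry("c001", "computing",
                     terms=[Term("cyfrifiadur", "cy", "noun"), Term("computer", "en", "noun")])
print(to_csv([entry]))
```

Keeping a single concept-oriented store and generating each output format from it means that a correction made once appears in every export.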

‘Porth Termau’ Terminology Portal Components – The Porth Termau terminology portal was designed so that it can be integrated into other websites using a piece of code and an API key. View the page source of this webpage for an example of the required code; you may copy the code and modify it for your own site. Please contact us for a free API key that will enable your website to search and fetch results from Porth Termau.
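The actual embed code is served with the Porth Termau page itself. The sketch below only illustrates the general pattern of a keyed search request: percent-encode the search term and the API key into a query string, then decode the response. The endpoint URL, the parameter names (`q`, `lang`, `key`) and the JSON response format are all hypothetical placeholders. They are not the real Porth Termau API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# HYPOTHETICAL endpoint, for illustration only; not the real Porth Termau URL.
BASE_URL = "https://termau.example.org/api/search"


def build_search_url(query: str, api_key: str, lang: str = "cy") -> str:
    """Percent-encode the search term, language and API key into a query string."""
    params = {"q": query, "lang": lang, "key": api_key}  # hypothetical parameter names
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_results(query: str, api_key: str) -> object:
    """Send the request and decode the body, assuming (hypothetically) that it is JSON."""
    with urlopen(build_search_url(query, api_key), timeout=10) as resp:
        return json.load(resp)


print(build_search_url("ffôn symudol", "YOUR-API-KEY"))
# https://termau.example.org/api/search?q=ff%C3%B4n+symudol&lang=cy&key=YOUR-API-KEY
```

`urlencode` encodes non-ASCII letters such as ô as UTF-8 percent-escapes, so Welsh search terms with diacritics survive the round trip.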

Hunspell – The Unit is preparing an update of the open-source Welsh spell checker, to be released as a Hunspell spell checker.

WISPR Welsh Synthetic Voice – a Welsh synthetic voice developed by the Language Technologies Unit as part of the WISPR project (Welsh and Irish Speech Processing Resources). The resources developed during the project were released under an open, BSD-style license and are available here.