UTX (universal terminology exchange)
Open standard for machine translation user dictionary

Asia-Pacific Association for Machine Translation

Updated: 2008/8/3


Introduction to UTX

Background

To use MT (machine translation) systems such as translation software effectively, user dictionaries are essential. When commercial high-end translation software is used in a CAT (computer-aided translation) workflow, specialized terminology, personal names, and place names in the document being translated are often missing from the basic system dictionaries, so they are not translated as well as one would expect. Registering these terms in a user dictionary improves the accuracy of the MT system.

Unfortunately, user dictionaries are not always compatible between different MT systems, so the effort spent building a dictionary for one system cannot be carried over to another. To address this issue, AAMT (Asia-Pacific Association for Machine Translation) has undertaken to establish a set of specifications for sharable dictionaries that can be used across different MT systems. AAMT created the first version of its specification, UPF (Universal PlatForm), in 1995 with support from IPA (Information-technology Promotion Agency, an institute in Japan). In 2006, AAMT started to create new specifications that reflect subsequent advances in technology and changes in how MT is used. In 2007, the new format was named "UTX," short for Universal Terminology eXchange. As of 2008, AAMT is working to establish UTX-Simple, a simple, stripped-down version of UTX, before building the full XML version.

We are currently focusing on two tasks: producing and collecting dictionary data, and creating a user community for generating, sharing, and accumulating user dictionaries in a sustainable way. We will select some domains for which translation is needed, and build and collect actual dictionary data from real-world documents in accordance with the UTX-Simple and UTX-XML specifications. We will then improve the UTX specification further by translating with these dictionaries and collecting feedback.

Why use UTX?

Simplicity

  • "Dictionary for the user" - simple and easy to use

    • UTX-Simple (tab-delimited text format) requires only three types of information: a source word, its translation, and the part of speech of the source word, making it easy to build.
    • UTX accommodates multiple languages.
  • Information that supports managing and sharing dictionaries

    • A UTX dictionary includes a creator name and a creation timestamp.
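
As a rough illustration of how little a UTX-Simple entry needs, the following Python sketch reads tab-delimited lines with three fields. The column order (source term, translation, part of speech), the treatment of `#`-prefixed lines as header or comment lines, and the names `Entry` and `parse_entries` are assumptions made for this example; they are not taken from the UTX-Simple specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    source: str  # source-language term
    target: str  # its translation
    pos: str     # part of speech of the source term


def parse_entries(text: str) -> list[Entry]:
    """Parse tab-delimited lines into entries.

    Assumption: lines starting with '#' are header/comment lines and
    blank lines are ignored; the real specification may differ.
    """
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 fields, got {len(fields)}"
            )
        source, target, pos = (field.strip() for field in fields)
        entries.append(Entry(source, target, pos))
    return entries


# Hypothetical sample data, invented for this sketch.
sample = (
    "# hypothetical header line\n"
    "machine translation\t機械翻訳\tnoun\n"
    "user dictionary\tユーザー辞書\tnoun\n"
)
entries = parse_entries(sample)
```

Because each entry is a single line of plain text, such a file can be edited in a spreadsheet or text editor without special tools.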

Entry as a "technical term"

  • Clarification of domains

  • One word, one meaning

    • In principle, one term has one meaning (i.e., one translation) in a specific domain.
    • An entry must be a unique term within an applicable domain.
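
The "one word, one meaning" rule lends itself to a mechanical check. The Python sketch below flags source terms that carry more than one translation within the same domain. The tuple layout `(domain, source, target)`, the function name `find_conflicts`, and the sample entries are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_conflicts(entries):
    """Return {(domain, source): sorted translations} for every term
    that has more than one translation inside a single domain."""
    translations = defaultdict(set)
    for domain, source, target in entries:
        translations[(domain, source)].add(target)
    return {
        key: sorted(targets)
        for key, targets in translations.items()
        if len(targets) > 1
    }


# Hypothetical sample data, invented for this sketch.
entries = [
    ("computing", "driver", "ドライバ"),
    ("computing", "driver", "運転手"),   # conflicts with the entry above
    ("automotive", "driver", "運転手"),  # fine: a different domain
    ("computing", "kernel", "カーネル"),
]
conflicts = find_conflicts(entries)
```

The same term may legitimately appear with different translations in different domains; only duplicates inside one domain are reported.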

Benefits of using UTX

  • For users

    • Improvement in translation accuracy in each specific domain.
    • Sharing and reuse of dictionaries is possible through user communities, either locally or over the Internet.
  • For manufacturers of MT systems

    • Promoting user dictionaries will stimulate the MT market as a whole.
    • New demands for and applications of MT can be explored.
    • UTX-XML format retains entry properties that are proprietary to manufacturers, and no data is lost during the conversion to/from UTX-XML.

A community for shared dictionaries - "Open dictionaries for everyone"

  • AAMT will establish two types of dictionary communities for producing, sharing, and accumulating dictionaries, and a framework for distribution.
     
  • The official dictionary community (managed by AAMT or its delegate) offers supervised dictionaries with guaranteed quality for a fee.
  • The open dictionary community offers free dictionaries under an open source license and promotes mutual exchange. AAMT or its delegate provides hosting only, with no management or quality guarantee.
  • AAMT will collaborate with Yakushite-net, Oki's community-oriented machine translation site.

Examples of application

  • Open source localization

    • Problem 1: Terminology and its translation (including user interface terms) vary among applications, which reduces translators' efficiency and confuses end users.
    • Solution by using UTX: Terminology used in various applications can be standardized, so end users see a smaller, more consistent set of terms.
    • Problem 2: Translation assets are not effectively reused when a similar application is translated. Because past translations are not accumulated, every new translation project has to start from scratch.
    • Solution by using UTX: Translation becomes more efficient when translation assets, such as user dictionaries and glossaries, are accumulated, shared, and reused.
    • Problem 3: Multilingualization is not an easy task.
    • Solution by using UTX: UTX-XML accommodates multiple languages. Because the translation assets (user dictionaries and glossaries) are centralized and reusable through UTX-XML, multilingualization becomes significantly easier.

  

  • As a technical-term dictionary and in-company glossary

    • Problem: The glossary for in-house documents and the glossary for translation are often separate, inconsistent, disorganized, and difficult to manage centrally. Accumulating technical knowledge of a specific domain within the company is desirable but difficult, because the sources are scattered and in different formats.
    • Solution by using UTX: UTX can also be used as a monolingual glossary. The standardized specification makes it easy to exchange data with various tools, and importing existing glossaries is straightforward.

  • Supporting intercultural communication between individuals

    • Problem: Correct translations of proper nouns are often hard to find, for example, when a fan wishes to write a fan letter to an overseas writer or movie star. Likewise, when a user wants to chat with an overseas friend about sports or online games, the names of sports players or the terminology of the game may not be available in system dictionaries.
    • Solution by using UTX: Niche glossaries that are not included in commercial specialized dictionaries can be shared and used for machine translation.

 

  • Translation assistance for developing countries

    • Problem 1: NPOs are always in need of more human resources and funds.
    • Solution by using UTX: Translation assistance and automatic translation can be carried out at low cost.
    • Problem 2: Some minor languages have only limited bilingual glossaries or dictionaries. Some domains, such as medical science, may have a high priority.
    • Solution by using UTX: Because UTX dictionaries can be accumulated in a dictionary community, dictionaries can be compiled gradually even for a language with limited bilingual glossaries.

Development and use of various tools

  • Term extraction and dictionary building tools

We will need tools that analyze multiple documents, extract terms, and add them to an existing user dictionary or create a new one in one step, rather than building a dictionary one word at a time.
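
A term extraction tool of this kind could start from something as simple as counting recurring word sequences. The Python sketch below is a deliberately naive illustration and does not describe any existing AAMT tool: the stopword list, the `min_count` threshold, the restriction to lower-case ASCII words, and the function name `candidate_terms` are all assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "of", "to", "in", "is", "and", "for", "with"}


def candidate_terms(documents, known_terms=(), max_len=2, min_count=2):
    """Return word n-grams (n <= max_len) that occur at least min_count
    times across the documents and are not already in known_terms."""
    counts = Counter()
    for doc in documents:
        words = re.findall(r"[a-z]+", doc.lower())
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Skip n-grams that start or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    known = set(known_terms)
    return sorted(
        term for term, count in counts.items()
        if count >= min_count and term not in known
    )


# Hypothetical sample documents, invented for this sketch.
docs = [
    "The user dictionary improves machine translation.",
    "Register terms in the user dictionary.",
]
new_terms = candidate_terms(docs, known_terms={"machine translation"})
```

The output is only a list of candidates; a person would still need to review them and supply translations before they become dictionary entries.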

  • UTX converters (including parsers)

We will need tools that convert a format unique to a translation application or translation site to the UTX format, and vice versa. A parser that verifies conformity to the UTX specification must also be included.
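
To make the idea of a converter with a built-in conformity check concrete, here is a minimal Python sketch. It assumes a hypothetical vendor format (a CSV file with the columns `src`, `tgt`, and `pos`) and treats a conforming line as exactly three non-empty tab-separated fields; neither assumption comes from the UTX specification, and all function names are invented for this example.

```python
import csv
import io


def validate_line(line):
    """Return an error message, or None if the line looks well formed.

    Assumption: a valid entry is exactly three non-empty
    tab-separated fields.
    """
    fields = line.split("\t")
    if len(fields) != 3:
        return f"expected 3 tab-separated fields, got {len(fields)}"
    if any(not field.strip() for field in fields):
        return "empty field"
    return None


def vendor_csv_to_tsv(csv_text):
    """Convert the hypothetical vendor CSV to tab-delimited lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [f"{row['src']}\t{row['tgt']}\t{row['pos']}" for row in reader]


def tsv_to_vendor_csv(lines):
    """Convert tab-delimited lines back to the hypothetical vendor CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["src", "tgt", "pos"])
    for line in lines:
        writer.writerow(line.split("\t"))
    return out.getvalue()


# Hypothetical sample data, invented for this sketch.
vendor = "src,tgt,pos\nuser dictionary,ユーザー辞書,noun\n"
lines = vendor_csv_to_tsv(vendor)
errors = [err for err in map(validate_line, lines) if err is not None]
round_trip = tsv_to_vendor_csv(lines)
```

Converting in both directions and comparing the result with the original input is a simple way to confirm that a converter loses no data for the fields it handles.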

  • Dictionary search tool (glossary search tool)

We will need tools that search a dictionary or glossary directly to show the translation of a word.
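
At its simplest, such a search tool is an index from source terms to translations. The Python sketch below assumes entries are available as `(source, target, part_of_speech)` tuples and adds case-insensitive exact lookup plus prefix search; the function names and sample entries are hypothetical.

```python
def build_index(entries):
    """Map each lower-cased source term to its (target, pos) pair."""
    return {source.lower(): (target, pos) for source, target, pos in entries}


def lookup(index, query):
    """Exact, case-insensitive lookup; returns None if the term is absent."""
    return index.get(query.strip().lower())


def search_prefix(index, prefix):
    """Return all source terms that start with the prefix, sorted."""
    normalized = prefix.strip().lower()
    return sorted(term for term in index if term.startswith(normalized))


# Hypothetical sample data, invented for this sketch.
index = build_index([
    ("machine translation", "機械翻訳", "noun"),
    ("user dictionary", "ユーザー辞書", "noun"),
    ("user community", "ユーザーコミュニティ", "noun"),
])
hit = lookup(index, "Machine Translation")
matches = search_prefix(index, "user")
```

A real tool would also need to handle inflected forms and multiple dictionaries, which this sketch leaves out.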

Contact

 

(Anyone can participate in this mailing list, but the correspondence is mostly in Japanese. We are planning to start another mailing list in English.)

If you are interested in UTX, please contact us. We welcome organizations and individuals who could collaborate with us to develop the UTX specification, to provide or build dictionaries and tools, and to perform evaluation.