UTX (universal terminology exchange)
common format for user dictionary

Asia-Pacific Association for Machine Translation

Updated: 2010/4/1


Brochures

This brochure contains the essentials of UTX.


 

 

The content is identical to the trifold version, but the format is A4.

Introduction to UTX

Background

In order to use MT (machine translation) systems such as translation software effectively, it is essential to use user dictionaries. When we use commercial high-end translation software in a CAT (computer-aided translation) workflow, specialized terminology, names of persons, and place names in the target document are often not included in basic system dictionaries, and they are not translated as well as one would expect. However, if these terms are registered into a user dictionary, the accuracy of the MT system can be improved.

Unfortunately, user dictionaries are not always compatible between different MT systems, so the effort spent creating a dictionary for one system is often wasted when moving to another. To address this issue, AAMT (Asia-Pacific Association for Machine Translation) has undertaken to establish a set of specifications for sharable dictionaries that can be used across different MT systems. AAMT created the first version of its specification, UPF (Universal PlatForm), in 1995 with support from IPA (Information-technology Promotion Agency, an institute in Japan). In 2006, AAMT started to create new specifications to reflect subsequent advances in technology and changes in how MT is used. In 2007, the new format was named "UTX," short for Universal Terminology eXchange. In 2009, AAMT established UTX-Simple, a simplified version of UTX. We are also considering developing the full XML version.

We are currently focusing on two tasks: producing and collecting dictionary data, and creating a user community for generating, sharing, and accumulating user dictionaries in a sustainable way. We will select some domains for which translation is needed, and build and collect actual dictionary data from real-world documents in accordance with the UTX-Simple and UTX-XML specifications. We will then further improve the UTX specification by translating with these dictionaries and collecting feedback.

Why use UTX?

Simplicity

  • "Dictionary for the user" - simple and easy to use

    • UTX-Simple (tab-delimited text format) requires only three types of information: a source word, its translation, and the part of speech of the source word, making it easy to build.
    • UTX accommodates multiple languages.
  • Inclusion of information that supports managing and sharing dictionaries

    • A UTX dictionary includes the creator's name and a creation timestamp.
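As a rough illustration of how little a tab-delimited dictionary of this kind requires, the following Python sketch reads entries made of a source term, its translation, and a part of speech, together with a creator name and a creation timestamp. The header layout (lines starting with `#` and the `creator` and `created` keys), the sample data, and the names `Entry` and `parse_utx_simple` are assumptions made for this example only; the actual syntax is defined in the UTX-Simple specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    source: str  # source-language term
    target: str  # its translation
    pos: str     # part of speech of the source term


# Hypothetical sample; the real header syntax is defined by the specification.
SAMPLE = (
    "#creator: Example Author\n"
    "#created: 2010-04-01T00:00:00Z\n"
    "machine translation\t機械翻訳\tnoun\n"
    "user dictionary\tユーザー辞書\tnoun\n"
)


def parse_utx_simple(text):
    """Return (metadata, entries) from tab-delimited dictionary text."""
    metadata = {}
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            # Assumed header form: "#key: value" (split on the first colon)
            key, _, value = line[1:].partition(":")
            metadata[key.strip()] = value.strip()
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected 3 tab-separated fields: {line!r}")
        entries.append(Entry(*(f.strip() for f in fields[:3])))
    return metadata, entries


metadata, entries = parse_utx_simple(SAMPLE)
print(metadata["creator"], len(entries))
```

Because each entry is a single line of plain text, such a file can be edited in a spreadsheet or text editor and exchanged without special tooling.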

Entry as a "technical term"

  • Clarification of domains

  • One word, one meaning

    • In principle, one term has one meaning (i.e. one translation) in a specific domain.
    • An entry must be a unique term within an applicable domain.
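The "one word, one meaning" principle can be checked mechanically. The sketch below is only an illustration: the row layout (term, translation, domain), the sample rows, and the function name `find_conflicts` are assumptions, not part of the UTX specification. It reports every term that has more than one translation within the same domain.

```python
from collections import defaultdict

# Hypothetical entries: (source term, translation, domain)
ROWS = [
    ("format", "書式", "word processing"),
    ("format", "フォーマット", "word processing"),
    ("format", "初期化", "storage"),
    ("driver", "ドライバー", "storage"),
]


def find_conflicts(rows):
    """Map (domain, term) to its translations where more than one exists."""
    seen = defaultdict(set)
    for source, target, domain in rows:
        seen[(domain, source.lower())].add(target)
    return {key: sorted(targets) for key, targets in seen.items() if len(targets) > 1}


conflicts = find_conflicts(ROWS)
print(conflicts)
```

Here "format" is flagged in the "word processing" domain, where it has two translations, but not in "storage", where it has only one.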

Benefits of using UTX

  • For users

    • Improvement in translation accuracy in each specific domain.
    • Sharing and reuse of dictionaries is possible through user communities, either locally or over the Internet.
  • For manufacturers of MT systems

    • Promoting user dictionaries will stimulate the entire MT market.
    • New demands for and applications of MT can be explored.
    • The UTX-XML format retains entry properties that are proprietary to manufacturers, so no data is lost during conversion to or from UTX-XML.

A community for shared dictionaries - "Open dictionaries for everyone"

  • AAMT will establish two types of dictionary communities for producing, exchanging, sharing, and accumulating dictionaries, as well as a framework for distribution.
  • The official dictionary community (managed by AAMT or its delegate) offers supervised dictionaries with guaranteed quality.
  • The open dictionary community offers free dictionaries under open source licenses and promotes mutual exchange. AAMT or its delegate provides hosting only, with no management or guarantee.
  • AAMT will collaborate with Yakushite-net, Oki's community-oriented machine translation site.

Examples of application

  • Open source localization

 

  • Problem 1: Terminology and its translation (including user interface terms) vary among applications, which reduces the translator's efficiency and confuses end users. For example, the term "format" can be translated into several Japanese words.
  • Solution by using UTX: Terminology used in various applications can be standardized. End users benefit from a smaller, more consistent set of terms.
  • Problem 2: Translation assets are not effectively reused when translating a similar application. Because translations are not accumulated, every new translation project has to start from scratch.
  • Solution by using UTX: Translation becomes more efficient when translation assets, such as user dictionaries and glossaries, are accumulated, shared, and reused.
  • Problem 3: Multilingualization is not an easy task.
  • Solution by using UTX: UTX-XML accommodates multiple languages. Since the translation assets (user dictionaries and glossaries) are centralized and reusable through UTX-XML, multilingualization is significantly easier.

  

  • As a technical-term dictionary and in-company glossary

  • Problem: The glossary for in-house documents and the glossary for translation are often separate, incoherent, disorganized, and difficult to manage centrally. Accumulation of technical knowledge of a specific domain in the company is desirable but difficult, since the sources are separate and they are all in different formats.

  • Solution by using UTX: UTX can also be used as a monolingual glossary. Exchange of data with various tools is easy through the standardized specification. Import from existing glossaries is also straightforward.

  • Supporting intercultural communication between individuals

  • Problem: Correct translations of proper nouns are often difficult to find, for example, when a fan wishes to write a fan letter to an overseas writer or a movie star. When a user would like to chat with an overseas friend about sports or online games, the translation of sports players' names or the terminology of the online game may not be available in system dictionaries.

  • Solution by using UTX: Niche glossaries that are not included in commercial specialized dictionaries can be shared and used for machine translation.

 

  • Translation assistance for developing countries

  • Problem 1: NPOs are always in need of more human resources and funds.
  • Solution by using UTX: Translation assistance and automatic translation can be carried out at low cost.
  • Problem 2: Some minor languages only have limited bilingual glossaries or dictionaries. Some domains may have a high priority, such as medical science.
  • Solution by using UTX: Since UTX dictionaries can be accumulated in a dictionary community, dictionaries can be gradually compiled in a language with limited bilingual glossaries.

Development and use of various tools

  • Term extraction and dictionary building tools

We will need tools that analyze multiple documents, extract terms, and either add them to an existing user dictionary or create a new one in a single step, rather than building a dictionary one word at a time.
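To make the idea of batch term extraction concrete, here is a deliberately naive Python sketch that counts recurring word sequences across several documents. It is not a method described by AAMT: the frequency heuristic, the stopword list, the sample documents, and the function name `candidate_terms` are all assumptions for illustration, and a practical tool would need linguistic analysis and human review of the candidates.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource)
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "be", "can"}


def candidate_terms(documents, min_count=2, max_words=3):
    """Count word n-grams that neither start nor end with a stopword."""
    counts = Counter()
    for doc in documents:
        words = re.findall(r"[a-z]+(?:-[a-z]+)*", doc.lower())
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    # Keep only candidates seen at least min_count times
    return [(term, c) for term, c in counts.most_common() if c >= min_count]


DOCS = [
    "A user dictionary improves machine translation.",
    "Register terms in the user dictionary for machine translation.",
]
terms = dict(candidate_terms(DOCS))
print(terms)
```

The candidates would still need translations and parts of speech before they could be added to a dictionary.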

  • UTX converters (including parsers)

We will need tools that convert a format specific to a translation application or translation site to the UTX format, and vice versa. A parser that verifies conformity to the UTX specification must also be included.
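A converter and a conformity check might look roughly like the following Python sketch. The vendor format (a three-column CSV), the set of part-of-speech tags in `ALLOWED_POS`, and the function names `csv_to_tsv` and `validate` are hypothetical; the real rules a parser must enforce are those of the UTX specification, which this example does not reproduce.

```python
import csv
import io

ALLOWED_POS = {"noun", "verb", "adjective", "adverb"}  # assumed tag set


def csv_to_tsv(csv_text):
    """Convert 'source,target,pos' CSV rows to tab-delimited lines."""
    rows = csv.reader(io.StringIO(csv_text))
    return "".join(
        "\t".join(cell.strip() for cell in row) + "\n" for row in rows if row
    )


def validate(tsv_text):
    """Return (line number, problem) pairs; an empty list means none found."""
    problems = []
    for number, line in enumerate(tsv_text.splitlines(), start=1):
        if not line or line.startswith("#"):
            continue  # blank lines and assumed '#' header lines are skipped
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append((number, "expected 3 fields"))
        elif not all(fields[:2]):
            problems.append((number, "empty term or translation"))
        elif fields[2] not in ALLOWED_POS:
            problems.append((number, f"unknown part of speech: {fields[2]}"))
    return problems


# Hypothetical vendor export; the third row is missing its translation.
VENDOR_CSV = 'glossary,用語集,noun\n"translate",翻訳する,verb\nbroken,,noun\n'
tsv = csv_to_tsv(VENDOR_CSV)
problems = validate(tsv)
print(problems)
```

Running the check right after conversion surfaces data that was already incomplete in the vendor file, here the third row. The reverse conversion, from the tab-delimited form back to a vendor format, is not shown.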

  • Dictionary search tool (glossary search tool)

We will need tools that search a dictionary or glossary directly to look up the translation of a word.
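A minimal lookup tool could be sketched as below, assuming the dictionary has already been loaded as (source term, translation, part of speech) tuples. The sample entries and the function names `build_index`, `lookup`, and `search` are illustrative assumptions.

```python
# Hypothetical entries: (source term, translation, part of speech)
ENTRIES = [
    ("machine translation", "機械翻訳", "noun"),
    ("translation memory", "翻訳メモリ", "noun"),
    ("user dictionary", "ユーザー辞書", "noun"),
]


def build_index(entries):
    """Index entries by lower-cased source term for exact lookup."""
    return {source.lower(): (target, pos) for source, target, pos in entries}


def lookup(index, term):
    """Return (translation, part of speech), or None if the term is absent."""
    return index.get(term.strip().lower())


def search(entries, fragment):
    """Return entries whose source term contains the fragment."""
    needle = fragment.lower()
    return [entry for entry in entries if needle in entry[0].lower()]


index = build_index(ENTRIES)
print(lookup(index, "User Dictionary"))
print(search(ENTRIES, "translation"))
```

The exact lookup serves a translator checking a single term, while the substring search helps when only part of a term is known.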

Download

UTX-Simple 1.0 Specification NEW!!

Download from here (PDF).

Please also take a look at our brochures.

Dictionaries

  • More dictionaries will be available. If you wish to include your UTX dictionaries in this list, please contact us.
Name and Domain | Direction of translation | Author | License | Dictionary Version | Number of entries
Computational Linguistics Term List | English to Japanese | Francis Bond | Creative Commons 3.0, Attribution (CC-BY) | | 4092
Computational Linguistics Term List | Japanese to English | Francis Bond | Creative Commons 3.0, Attribution (CC-BY) | | 4123
Medical Glossary (Department of Medical Informatics, Kitazato University) NEW!! | English to Japanese | Medical Informatics, School of Allied Health Sciences, Kitazato University | Creative Commons 3.0, Attribution (CC-BY) | 1.00 | 27126
Japanese-English Standard Dictionary of Legal Terms | English to Japanese (the direction has been changed.) | Japan | Creative Commons 3.0, Attribution No Derivatives (CC-BY-ND) | 1.00 | 5451
 
  • AAMT terminology dictionary (sample)

     

Articles

  • "Introduction to UTX, a Specification for a Shared User Dictionary," (PDF) a paper submitted to the Association of Natural Language Processing, Japan (13th annual meeting). Originally written in Japanese, translated into English.
    This paper refers to UTX-Simple 0.91.
  • A presentation will be delivered at LISA China Focus (November 2009).

Contact

If you are interested in UTX, please contact us through the following form. We welcome organizations and individuals who can collaborate with us to develop the UTX specification, to provide or build dictionaries and tools, and to perform evaluations.

Please be sure to fill in the items marked with *.

Your name*
Telephone number*
E-mail address*
Company or organization
Type*: Individual / Company or organization / Others
Your target language(s), domain(s) of interest, and other comments
If you wish to receive updates on UTX, please join the UTX mailing list. Non-AAMT members are also welcome. The mailing list is not directly managed by AAMT, and the correspondence is mostly in Japanese. We may start another mailing list in English if there is enough demand.
 

Attention: Please CLICK the button AGAIN on the next screen to submit your message. This site currently does not support SSL. Therefore, the transmission of your data to our server is not encrypted.

 

Disclaimer

By using the specifications of UTX, UTX-Simple, and UTX-XML (hereinafter collectively called “UTX Specifications”) or the dictionaries based on UTX Specifications (hereinafter called “UTX Dictionaries”), you agree to be bound by the following terms. The invalidity or unenforceability of this disclaimer shall in no way affect the validity or enforceability of any other provision herein.

1. To the authors of UTX Dictionaries and related tools from the AAMT and its members:

(1) UTX Specifications are made public, and anyone can use them. The AAMT, however, does not waive any rights thereof and no one may alter UTX Specifications nor make them public.

(2) THE AAMT AND ITS MEMBERS PROVIDE UTX SPECIFICATIONS “AS IS,” WITH NO GUARANTEES WHATSOEVER. YOU SHOULD USE UTX SPECIFICATIONS AND UTX DICTIONARIES AT YOUR OWN RISK.

(3) THE AAMT AND ITS MEMBERS SHALL NOT ASSUME ANY RESPONSIBILITY FOR UTX SPECIFICATIONS AND THE RESULT OF THEIR USE INCLUDING, BUT NOT LIMITED TO, THE EXISTENCE OF INFRINGEMENT OF THIRD PARTIES’ RIGHTS AND THE ACCURACY, ADEQUACY AND QUALITY OF THE TRANSLATION.

(4) THE AAMT AND ITS MEMBERS SHALL NOT ASSUME ANY RESPONSIBILITY FOR VERIFYING NOR DO THEY GUARANTEE THE LEGITIMACY OF THE COPYRIGHT FOR EACH UTX DICTIONARY. YOU AND THE ORIGINAL AUTHOR OF EACH UTX DICTIONARY ARE RESPONSIBLE FOR THE LEGAL PROBLEM IF IN ANY CASE THAT THE ORIGINAL AUTHOR OF THE UTX DICTIONARY IS NOT THE LEGITIMATE HOLDER OF THE APPROPRIATE COPYRIGHT.

(5) The AAMT and its members grant you the permission to stipulate the terms and conditions for the use of UTX Dictionaries by their users for commercial or non-commercial purposes as long as you have the appropriate copyright; provided, however, that the author of UTX Dictionary is solely responsible for verifying the legitimacy of the copyright for data used in the UTX Dictionary.

(6) THE AAMT AND ITS MEMBERS SHALL NOT ASSUME ANY RESPONSIBILITY FOR THE RESULT OF USE OF THE TOOLS RELATED TO UTX DICTIONARIES.

2. To the users of UTX Dictionaries from their authors:

The users of UTX Dictionaries may make use of UTX Dictionaries, in accordance with their license terms and conditions. Since the license terms and conditions of UTX Dictionaries are varied, please confirm the license indicated in the UTX file header.

3. To the users of UTX Dictionaries and related tools from the AAMT and its members:

THE AAMT AND ITS MEMBERS SHALL NOT ASSUME ANY RESPONSIBILITY IN CONNECTION WITH THE UTX SPECIFICATIONS AND THE RESULT OF THEIR USE INCLUDING, BUT NOT LIMITED TO, THE EXISTENCE OF INFRINGEMENT OF THIRD PARTIES’ RIGHTS AND THE ACCURACY, ADEQUACY AND QUALITY OF THE TRANSLATION. You should resolve such problems between you and the author of the UTX Dictionaries.
 


 

All Rights Reserved, Copyright (C) AAMT, 1996-2010