Creating a glossary would require a lot of effort. I don't want
to do it!
I don't want to waste my time and money!
Actually, you are wasting your time by NOT making a glossary.
Writing or translating without a glossary is time-consuming and
laborious. If you create a glossary, you can save time. If
you have reliable information in the form of a glossary that other
people can reference, you will see fewer mistakes. Nobody
(including you) has to spend time looking up terms that
someone else already knows. You don't have to wonder which term is
best every single time you have multiple alternatives.
If you choose to create documents or develop software without a
glossary, you are also wasting money. No matter how much money you
spend implementing wonderful features in your application, your users
will not notice them if the names of the features are inappropriate or
inconsistent. Using a consistent glossary can prevent this.
Isn't it hard to create a glossary?
That's exactly where UTX can help. UTX drastically simplifies the
creation and maintenance of glossaries by providing minimum, simple
Are you getting any money for maintaining UTX?
No. The members of the UTX team (i.e. AAMT members) are dedicated
volunteers. The activity of the UTX team is financed by AAMT.
How can I find out more?
The brochure, specification, and sample
dictionaries are available.
I don't want to share any glossaries
That's a pity, but you can still benefit from UTX by
easily merging other UTX glossaries into your own.
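As a rough illustration of what such a merge could look like, here is a minimal sketch. It assumes each glossary is tab-separated text whose first column is the source term, and that lines starting with "#" are header or comment lines. The function names, the sample rows, and the rule that your own entries win on conflict are assumptions made for this example, not something defined by the UTX specification.

```python
import csv

def read_glossary(text):
    """Parse tab-separated glossary text into rows, skipping '#' lines (assumed headers)."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    return list(csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE))

def merge_glossaries(own_rows, other_rows):
    """Merge other_rows into own_rows; entries already in own_rows take priority."""
    merged = {row[0]: row for row in own_rows}   # key on the source term
    for row in other_rows:
        merged.setdefault(row[0], row)           # add only terms we do not have yet
    return list(merged.values())

# Hypothetical sample data for illustration only.
own = read_glossary("# hypothetical header line\nprinter\tプリンター\tnoun\n")
shared = read_glossary("printer\t印刷機\tnoun\nscanner\tスキャナー\tnoun\n")
merged = merge_glossaries(own, shared)
```

Here the existing translation of "printer" is kept and only "scanner" is added from the shared glossary. Whether your own entries should always win is a policy decision for the dictionary administrator.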
Do I have to include thousands of entries to create a useful glossary?
Absolutely not! The UTX team's research has shown that glossaries as
small as fifty entries are useful for enhancing the quality of machine
translation for a 4000-word document. Improvement in human
translation is more difficult to measure, but the benefit of a
glossary even extends to improved readability and
reader comprehension.
What kind of terms should we include in a UTX glossary?
A UTX glossary should contain technical terms within a specific
domain. The majority of such terms are compound nouns. Please refer
to the brochure and specification for the
How can I edit a UTX glossary?
A UTX glossary can be edited with any spreadsheet application (such as
Microsoft Excel or LibreOffice) or any text editor that can handle UTF-8
(such as Notepad, included in Windows operating systems).
Can a UTX glossary include sentences?
It can, but sentences are better handled by translation memory
formats, such as TMX. We recommend excluding sentences from a UTX
glossary unless it is absolutely necessary. Keeping terms within a
certain length also keeps the columns readable.
Is a UTX glossary high-quality?
A UTX glossary should be high-quality, because its entries are
hand-picked, and it should be inspected by a dictionary
administrator. Automatically generated raw glossary data contain
many inappropriate entries that degrade the quality of translation ("big
data, big noise"). UTX's term status property allows a dictionary
administrator to authorize or reject terms collected from various term
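A minimal sketch of how a status column could be used to keep only authorized terms is shown below. The column names ("src", "tgt", "pos", "status") and the status values ("approved", "provisional", "rejected") are hypothetical placeholders chosen for this example; please check the UTX specification for the actual names and values.

```python
import csv

# Hypothetical sample glossary; column names and status values are assumptions.
SAMPLE = (
    "src\ttgt\tpos\tstatus\n"
    "user dictionary\tユーザー辞書\tnoun\tapproved\n"
    "glossary\t用語集\tnoun\tprovisional\n"
    "termbase\tタームベース\tnoun\trejected\n"
)

def approved_entries(text, status_column="status", accepted=("approved",)):
    """Return only the rows whose status is one of the accepted values."""
    reader = csv.DictReader(text.splitlines(), delimiter="\t")
    return [row for row in reader if row[status_column] in accepted]

approved = approved_entries(SAMPLE)
```

Only the entry the administrator has authorized remains, so provisional or rejected candidates do not reach translators or translation software.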
Can I use UTX to normalize terms?
Yes. Detailed instructions will be provided in the future.
Do I have to pay to create a UTX glossary?
No. AAMT doesn't charge you for the use of the UTX specification.
Can I change or sell existing UTX glossaries?
It depends on the license included in the header of the glossary.
The UTX specification recommends indicating the license of a
Creative Commons is a good idea. Of course, you can declare any
license for your dictionary if it is legally reasonable. You can also
keep it for your own internal use. But UTX glossaries become richer
and more useful if you share them!
It's impossible to choose only one
translation for one term!
How can I follow the "one word, one meaning" principle?
You might have one or both of the following problems.
1. Assuming that you are the author of the document, you might be
using multiple meanings for a particular term.
You should avoid using ambiguous terms in technical
documentation. It is not a good idea to use multiple words for a
single meaning or a single word for multiple meanings. For example,
avoid using "terms" to refer to an agreement, especially if your
main topic is terminology. If you use potentially ambiguous terms,
such terms must be clearly defined and differentiated to show their
2. You are mixing multiple domains into one glossary.
In principle, one domain requires one glossary. If a
translation project deals with multiple domains, for example,
medical devices, you may need to have glossaries for medicine,
machinery, medical devices per se, and perhaps more. You don't want to
use a single glossary for the entire project, because it would not be
compartmentalized and would be hard to reuse.
If entries from multiple domains are included in one dictionary
without a good reason, the situation can be called "domain
contamination." Different domains require different terminology.
If you maintain one glossary for one domain, one translation per term
is usually enough. If you still need multiple translations for a term
within a particular domain, you can do so, but it may confuse your
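One way to check a single-domain glossary against the "one word, one meaning" principle is to list the source terms that have more than one translation. The sketch below assumes entries are available as (source term, target term) pairs; the function name and the sample entries are made up for this example.

```python
from collections import defaultdict

def ambiguous_terms(entries):
    """Return source terms that map to more than one target term."""
    targets = defaultdict(set)
    for source, target in entries:
        targets[source].add(target)
    return {s: sorted(t) for s, t in targets.items() if len(t) > 1}

# Hypothetical entries: "driver" appears with two different translations.
entries = [
    ("driver", "ドライバー"),
    ("driver", "運転手"),
    ("printer", "プリンター"),
]
conflicts = ambiguous_terms(entries)
```

A term reported here is not necessarily an error. It is a prompt to check whether two domains have been mixed or whether the usages need to be clearly differentiated.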
Why do we want to use UTX?
Have you ever been frustrated that technical terms are not
translated correctly? Perhaps all you have to do is simply create
If you have no plans to use UTX for machine translation,
you will still benefit from the simple terminology management of
UTX. You can collect contributions of new target term candidates
from individual translators. Then you can create and use your own
If you are planning to use rule-based machine translation (RBMT), you will
benefit from the simplicity of UTX, especially in the early stage.
We are translating books and games. How can we use UTX?
In books or game software, you will encounter tons of terms,
many of them probably proper nouns. They could be names of
characters, skills, items, places, etc. These are actually all
"technical terms." Without a glossary, how would you keep
track of these terms over a long period of time? Also, your
translation is likely to involve many translators and checkers. UTX is
very useful for standardizing the use of terms across translators and
checkers, with or without terminological tools.
Can I propose a translation project that uses UTX?
Please let us know your ideas
using the contact form.
I don't see why UTX could improve translation productivity.
That's perhaps because you are not reusing UTX
glossaries. They are most useful when they are reused and/or shared
among various users, tools, and environments.
Do I need to have a style guide to create a good UTX glossary?
Doing so is strongly recommended to maintain
consistency. A number of well-established style guides are available
for English and other languages. For Japanese,
Standard Style Guide can be used.
Why is UTX a tab-separated format instead of XML?
UTX is designed to be simple. It is so simple that a UTX glossary is
viable with only three mandatory columns (source term, target term, and
part of speech). These are manageable without using XML.
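To show how little is needed to process such a file, here is a minimal sketch that reads the three mandatory columns from tab-separated text. It assumes the columns appear in the order source term, target term, part of speech, and that lines starting with "#" are header or comment lines. Both are assumptions for this example, and the function name is made up.

```python
def load_entries(text):
    """Return (source, target, part_of_speech) tuples from tab-separated text.

    Lines starting with '#' are skipped as header/comment lines (an assumption).
    Every other non-empty line must have values in its first three columns.
    """
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3 or not all(f.strip() for f in fields[:3]):
            raise ValueError(
                f"line {number}: expected source term, target term, and part of speech"
            )
        entries.append(tuple(f.strip() for f in fields[:3]))
    return entries

# Hypothetical two-line sample: one comment line and one entry.
sample = "# hypothetical header line\nglossary\t用語集\tnoun\n"
entries = load_entries(sample)
```

No XML parser or schema is involved. Splitting each line on tabs is enough, which is also why a spreadsheet or a plain text editor can handle the same file.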
Why do we need a format if it is that simple?
Many online glossaries are published on the web, but many of them
are very hard to use. They don't follow glossary best practices.
They often include similar entries without indicating priorities or
clarifying the different usages. Their entries are not well-formed, and
they don't list basic forms (singular or root form). However
simple UTX looks, it can serve
its purpose as a glossary by keeping to a certain specification.
Does UTX replace TBX, TBX-Basic, or any other existing glossary formats?
No. A UTX glossary can be created from scratch, as a collection of
technical terms hand-picked by translators. It can be created with very
little effort (see the diagram below). It can serve as a basis for
large-scale, complicated termbases for bigger translation projects, but
it is also quite useful as it is for small to medium-sized translation
projects.
Position of UTX and TBX
What is wrong with TBX, TBX-Basic, or any other existing glossary formats?
There is nothing wrong with them. It's just that they are too
complicated for a wider range of term contributors. Term
contributors may or may not be familiar with XML or the details of
various glossary formats. They can be professional translators who
simply know the appropriate translations.
It would be nice to leverage such knowledge in the form of a
What is the difference between a system dictionary and a user
dictionary (in translation software)?
Translation software uses two types of dictionaries: system
dictionaries and user dictionaries. A system dictionary is a
collection of pre-defined terms that are fine-tuned to achieve the
best translation results. A user dictionary is a collection of terms
defined and added by the user to further increase the translation
quality for a specific translation project. For this purpose, the entries of
a user dictionary usually supersede those of a system dictionary. In
general, a user dictionary should not include entries that are already
included in a system dictionary. The user, however, can choose more
suitable translations by adding such terms to a user dictionary, which
overrides the translations in the system dictionary.
What is the difference between a glossary and a user dictionary
(of translation software)?
A glossary is a collection of technical terms that can be
used by people or by software. A glossary may include definitions and descriptions, which are not used by translation software
(translation software would need them in the form that they can
In contrast, a user dictionary is specifically created and used for
translation software. One can convert a glossary into a user
dictionary. At this point, the contents of the glossary and the user
dictionary are very similar. However, a user dictionary may have
additional properties or entries that are not used by people.
Generally, an extensive glossary can be a very
good source for a high-quality user dictionary.
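The conversion described above can be sketched as dropping the columns meant for human readers and keeping only what translation software would use. In the sketch below, the column names ("src", "tgt", "pos", "definition") and the tab-separated output layout are hypothetical. Real translation software defines its own user dictionary format.

```python
import csv
import io

def to_user_dictionary(glossary_text, keep=("src", "tgt", "pos")):
    """Copy only the columns listed in `keep` into new tab-separated text."""
    reader = csv.DictReader(io.StringIO(glossary_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(keep),
        delimiter="\t",
        extrasaction="ignore",   # silently drop columns such as definitions
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

# Hypothetical glossary with a definition column intended for human readers.
glossary = "src\ttgt\tpos\tdefinition\nglossary\t用語集\tnoun\tA list of terms\n"
user_dictionary = to_user_dictionary(glossary)
```

The definition column is dropped while the term pair and part of speech are carried over unchanged.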
Yes. But UTX can be used with almost any
Why did AAMT create the UTX format? What is the background?
Commercial translation software packages like
SYSTRAN are known worldwide, but you might not be familiar with
translation software in Japan, where AAMT is based. The UTX
specification is not limited to Japanese software or the Japanese
language, but some historical background may be helpful. In Japan,
there are a number of commercial RBMT translation software packages.
These high-end applications are shipped with 7 to 8 million
basic/technical terms. They are highly sophisticated, and they have 30
or more options to control various aspects of translation (the high-end
version of SYSTRAN has only 2 options for Japanese). As they can guess
conjugations for user dictionaries, there is no need to feed detailed
properties for each term entry.
translation software needs well-defined glossaries to achieve good
translation results. Large dictionaries can improve translation
quality, but they can also degrade it at the same time. Our research
found that a small number of well-chosen UTX glossary terms
significantly improves translation quality. This is the reason why we
created a simple glossary format to reflect appropriate technical terms.
We are using SMT. We don't need a
Perhaps you do. If you are using SMT and want to ensure the
quality of translation, your project requires a separate process of
terminological verification (which is integrated into the system if
you are using RBMT instead).
When converting to UTX, will it be a
It depends. Although UTX can hold any amount of information by
defining extra columns, doing so may not always be a good idea.
XML-based formats can do a better job. But we also need to realize
that when we convert one format to another, only certain properties
Why does it not include lots of term properties?
Such properties contribute very little to improving the
accuracy or appropriateness of translation. Reducing complexity is
I would like to contribute glossaries/write a
tool for conversion.
Please let us know using
the contact form.
Can I make suggestions to the UTX
Please let us know your ideas
using the contact form.
Reserved, Copyright (C) AAMT, 1996-2012