UTX (simple glossary format) - AAMT
Actually, you are wasting your time by NOT making a glossary. Writing or translating documents without a glossary is time-consuming and laborious. If you create a glossary, you save time. If you have reliable information in the form of a glossary that other people can reference, you will see fewer mistakes. Nobody (including you) has to spend time looking up terms that someone else already knows. You don't have to wonder which term is best every single time you have multiple alternatives.
If you choose to create documents or develop software without a glossary, you are also wasting money. No matter how much money you spend implementing wonderful features in your application, your users will not notice them if the names of the features are inappropriate or inconsistent. Using a consistent glossary can prevent this situation.
That's exactly where UTX can help. UTX drastically simplifies the creation and maintenance of glossaries by providing a minimal set of simple rules.
No. The members of the UTX team (i.e. AAMT members) are dedicated volunteers. The activity of the UTX team is financed by AAMT.
The brochure, specification, and sample dictionaries are available.
That's a pity, but you can still benefit from UTX by easily merging other UTX glossaries into your own.
Absolutely not! The UTX team's research has shown that a glossary containing as few as fifty entries is enough to improve the quality of machine translation for a 4000-word document. How much a UTX glossary improves the overall efficiency of human translators is more difficult to measure. But the benefits of a glossary extend even to improved readability and comprehension for readers.
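As a rough illustration of merging, here is a minimal sketch. It assumes glossary entries have already been loaded as (source term, target term, part of speech) tuples; the function name `merge_glossaries` and the rule "keep your own entry when both glossaries define the same source term" are assumptions made for this example, not something the UTX specification prescribes.

```python
def merge_glossaries(own, other):
    """Return `own` plus those entries of `other` whose source term is new.

    Entries are (source term, target term, part of speech) tuples.
    Existing entries in `own` always win (an assumption for this sketch).
    """
    merged = list(own)
    known = {(src.lower(), pos) for src, _tgt, pos in own}
    for src, tgt, pos in other:
        key = (src.lower(), pos)
        if key not in known:
            merged.append((src, tgt, pos))
            known.add(key)
    return merged


own = [("glossary", "用語集", "noun")]
other = [
    ("glossary", "グロッサリー", "noun"),  # already defined in `own`, skipped
    ("termbase", "用語ベース", "noun"),    # new source term, added
]
merged = merge_glossaries(own, other)
```

Conflicting entries are silently skipped here; in practice you would probably want to review them by hand.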
A UTX glossary should contain technical terms within a specific domain. The majority of such terms are compound nouns. Please refer to the brochure and specification for the details.
UTX can be edited with any spreadsheet application (such as Microsoft Excel or LibreOffice) or any text editor that can handle UTF-8 (such as "Notepad" included in Windows operating systems).
It can, but sentences are better handled by translation memory formats, such as TMX. We recommend excluding sentences from a UTX glossary unless they are absolutely necessary. Generally speaking, you should avoid including excessively long terms in a UTX glossary. Keeping terms reasonably short also makes the columns of a UTX glossary more readable.
A UTX glossary should be high-quality, because its entries are hand-picked, and it should be inspected by a dictionary administrator. By contrast, automatically generated raw glossary data contain many inappropriate entries that degrade the quality of translation. This situation could be referred to as "big data, big noise". UTX's term status property allows a dictionary administrator to authorize or reject terms collected from various term contributors.
Yes. Detailed instructions will be provided in the future.
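The review workflow described above could be sketched roughly as follows. The status values ("provisional", "approved", "rejected") and the helper names are hypothetical and chosen only for illustration; the UTX specification defines the actual term status values.

```python
# Entries collected from contributors start out unreviewed.
# The status names used here are illustrative assumptions.
entries = [
    {"src": "glossary", "tgt": "用語集", "status": "provisional"},
    {"src": "termbase", "tgt": "用語ベース", "status": "provisional"},
]


def review(entries, decisions):
    """Apply administrator decisions {source term: new status}.

    Returns new entry dicts; the input list is left unchanged.
    """
    return [dict(e, status=decisions.get(e["src"], e["status"])) for e in entries]


def approved_only(entries):
    """Keep only the entries the administrator has authorized."""
    return [e for e in entries if e["status"] == "approved"]


reviewed = review(entries, {"glossary": "approved", "termbase": "rejected"})
usable = approved_only(reviewed)
```

Only the entries in `usable` would then be passed on to translators or translation software.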
No. AAMT doesn't charge you for the use of the UTX specification.
It depends on the license included in the header of the glossary. The UTX specification recommends indicating the license of a glossary. A Creative Commons license is a good choice. Of course, you can declare any license for your glossary as long as it is legally reasonable. You can also keep the glossary for your own internal use. But UTX glossaries become richer and more useful if you share them!
If you find it hard to follow this principle, you might have one or both of the following problems.
1. Assuming that you are the author of the document, you might be using multiple meanings for a particular term.
You should avoid using ambiguous terms in technical documentation. It is not a good idea to use multiple terms for a single meaning or a single term for multiple meanings. For example, avoid using "terms" to refer to an agreement, especially if your main topic is terminology. If you use potentially ambiguous terms, such terms must be clearly defined and differentiated.
2. You are mixing multiple domains into one glossary.
In principle, one domain requires one glossary. If a translation project deals with multiple domains (medical devices, for example), you may need separate glossaries for medicine, machinery, medical devices per se, and perhaps more. You don't want to use a single glossary for the entire project, because it would not be compartmentalized and would be hard to reuse.
If entries from multiple domains are included in one glossary without a good reason, the situation can be called "domain contamination." Different domains require different terminology. For example, "file" and "window" have different meanings in carpentry and in the ICT domain. If you maintain one glossary per domain, one target term is enough for each source term.
Have you ever been frustrated that technical terms are not translated correctly? Perhaps all you needed to do was create a glossary.
If you have no plans to use UTX for machine translation, you will still benefit from the simple terminology management that UTX offers. With proper understanding, agreement, and arrangement, you might be able to collect new target term candidates from individual translators. Then you can create and use your own glossary.
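One way to spot possible domain contamination is to look for source terms that have more than one target term in the same glossary. The following is only a sketch: the entry layout (source term, target term) and the function name `conflicting_terms` are assumptions for this example, and the Japanese translations are illustrative.

```python
from collections import defaultdict


def conflicting_terms(entries):
    """Return {source term: set of target terms} for terms with 2+ targets."""
    targets = defaultdict(set)
    for src, tgt in entries:
        targets[src].add(tgt)
    return {src: tgts for src, tgts in targets.items() if len(tgts) > 1}


# A glossary that mixes ICT and carpentry entries.
mixed = [
    ("file", "ファイル"),      # ICT: a computer file
    ("file", "やすり"),        # carpentry: the tool
    ("window", "ウィンドウ"),  # ICT only, no conflict
]
conflicts = conflicting_terms(mixed)
```

Here `conflicts` contains only "file", which suggests that the entries should be split into one glossary per domain.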
If you do plan to use RBMT (rule-based machine translation), you can quickly build high-quality user dictionaries based on the UTX glossary.
In books or game software, you will encounter tons of terms, many of them proper nouns. They could be names of characters, skills, items, places, and so on. These are actually all "technical terms." Without a glossary, how would you properly keep track of thousands of terms over several months of translation? The readers of your book will be confused, and the users of your game will be angry when they see inconsistent terms. Also, your translation is likely to involve many translators and checkers. UTX is very useful for standardizing the use of terms across translators and checkers, with or without terminology tools.
Sure! Please let us know your ideas using the contact form.
That's perhaps because you are not reusing UTX glossaries. They are most useful when they are reused and/or shared among various users, tools, and environments.
Doing so is strongly recommended to maintain consistency. A number of well-established style guides are available for English and other languages. For Japanese, the JTF Standard Style Guide can be used.
UTX is designed to be simple. It is so simple that a UTX glossary is viable with only three mandatory columns (source term, target term, and part of speech). Such glossaries are manageable without using XML.
Many glossaries are published on the web, but many of them are very hard to use. They don't follow glossary best practices. They often include similar entries without indicating priorities or clarifying the different usages. Their entries are not well-formed, and they don't list basic forms (singular or root form). However simple UTX looks, it can serve its purpose as a glossary by keeping to a certain specification.
No. A UTX glossary can be created from scratch, as a collection of technical terms hand-picked by translators. It can be created with very little effort (see the diagram below). It can serve as a basis for large-scale, complicated termbases for bigger translation projects. But it is quite useful as it is for small to medium-sized translation projects.
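To illustrate how little is needed, here is a minimal sketch of reading such a glossary with plain string handling. The file layout is an assumption made for this example (tab-separated columns, header lines starting with "#", the last header line naming the columns), and the sample header and column names are illustrative; please check the UTX specification for the exact header format.

```python
# Illustrative sample only; the real header layout is defined by the spec.
SAMPLE = (
    "#UTX 1.11; en-US/ja-JP; license: CC-BY 3.0\n"
    "#src\ttgt\tsrc:pos\n"
    "glossary\t用語集\tnoun\n"
    "user dictionary\tユーザー辞書\tnoun\n"
)


def parse_utx(text):
    """Split UTX-like text into (metadata lines, list of entry dicts)."""
    header_lines, rows = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("#"):
            header_lines.append(line[1:])
        else:
            rows.append(line.split("\t"))
    columns = header_lines[-1].split("\t")  # assumed: last "#" line names columns
    return header_lines[:-1], [dict(zip(columns, row)) for row in rows]


meta, entries = parse_utx(SAMPLE)
```

Because the format is plain tab-separated text, the same data opens directly in a spreadsheet application.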
Position of UTX and TBX
There is nothing wrong with them. It's just that they are too complicated for a wider range of term contributors. Term contributors may or may not be familiar with XML or the details of various glossary formats. They can be professional translators who simply know the appropriate translations.
It would be nice to leverage such knowledge in the form of a usable glossary.
(Rule-based) translation software uses two types of dictionaries: system dictionaries and user dictionaries. A system dictionary is a collection of pre-defined terms that are fine-tuned to achieve the best translation results. A user dictionary is a collection of terms defined and added by the user to further increase the translation quality for a specific translation project. For this purpose, the entries of a user dictionary usually supersede those of a system dictionary. In general, a user dictionary should not include entries that are already included in a system dictionary. The user, however, can choose more suitable translations by adding such terms to a user dictionary, thereby overriding the translations in the system dictionary.
A glossary is a collection of technical terms that can be used by people or by software. A glossary may include definitions and descriptions, which are not used by translation software (translation software would need them in a form that it can understand). In contrast, a user dictionary is specifically created and used for translation software. One can convert a glossary into a user dictionary. At this point, the contents of a glossary and a user dictionary are very similar. However, a user dictionary may have additional properties or entries that are not used by people. Generally, an extensive glossary can be a very good source for a high-quality user dictionary.
Yes. But UTX can be used with almost any translation or terminology tool.
Commercial translation software packages such as SYSTRAN are known worldwide, but you might not be familiar with translation software in Japan, where AAMT is based. The UTX specification is not limited to Japanese software or the Japanese language, but some historical background may help explain why UTX was established in Japan. In Japan, there are a number of commercial RBMT translation software packages. These high-end applications are shipped with 7 to 8 million basic and technical terms. They are highly sophisticated, and they have 30 or more options to control various aspects of translation (the high-end version of SYSTRAN has only two options for Japanese). Because they can infer conjugations for user dictionary entries, there is no need to supply detailed properties for each term entry.
Still, translation software needs well-made glossaries to achieve good translation results. Large dictionaries can improve translation quality. They can, however, also degrade translation quality if the quality of the dictionaries is not adequately maintained. Our research showed that a small number of well-chosen terms in a UTX glossary significantly improves translation quality. This is why we created a simple glossary format to reflect appropriate technical terms in translation.
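The precedence between the two dictionary types can be pictured with a small sketch. This is not how any particular translation product works internally; the dictionaries are plain Python mappings, and the function name `translate_term` and the sample entries are assumptions made for illustration.

```python
# Pre-defined terms shipped with the (hypothetical) translation software.
system_dictionary = {"window": "窓", "file": "ファイル"}

# Project-specific terms added by the user for an ICT project.
user_dictionary = {"window": "ウィンドウ"}


def translate_term(term, user_dict, system_dict):
    """Look up a term, letting user dictionary entries supersede system ones."""
    if term in user_dict:
        return user_dict[term]
    return system_dict.get(term)  # None when neither dictionary has the term


window = translate_term("window", user_dictionary, system_dictionary)
file_ = translate_term("file", user_dictionary, system_dictionary)
```

In this sketch "window" resolves to the user's ICT translation, while "file" falls back to the system dictionary.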
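A conversion from glossary to user dictionary might look roughly like the sketch below, which keeps only the properties that translation software would use and drops the human-oriented ones. The column names ("src", "tgt", "src:pos", "definition") and the helper `to_user_dictionary` are assumptions for this example; real translation software has its own import formats.

```python
# Properties assumed to be meaningful to translation software.
ESSENTIAL = ("src", "tgt", "src:pos")


def to_user_dictionary(glossary):
    """Keep only the essential properties of each glossary entry."""
    return [{key: entry[key] for key in ESSENTIAL} for entry in glossary]


glossary = [
    {
        "src": "user dictionary",
        "tgt": "ユーザー辞書",
        "src:pos": "noun",
        "definition": "A dictionary of terms added by the user.",
    },
]
user_dict_entries = to_user_dictionary(glossary)
```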
Perhaps you do. If you are using SMT (statistical machine translation) and want to ensure the quality of translation, your project requires a separate process of terminological verification (which is integrated into the system if you are using RBMT instead). Even if you don't use a glossary when you translate, you will still need one for quality assurance. And because terminological verification is a separate process, it takes extra time and effort.
It depends. Although UTX can hold any amount of information by defining extra columns, doing so may not always be a good idea. If you need to maintain a number of extra properties, you may also need to consider other formats, such as XML-based ones. But we also need to realize that when we convert one format to another, only certain properties are essential.
Such properties contribute very little to improving the accuracy or appropriateness of translation. Reducing complexity is more important.
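A separate terminological verification step could be sketched as follows: for each glossary entry whose source term occurs in the source sentence, check that the expected target term occurs in the translation. This is a deliberately naive illustration using plain substring matching; the function name `check_terminology` and the sample data are assumptions, and a real check would need to handle inflection and word boundaries.

```python
def check_terminology(source, translation, glossary):
    """Return (source term, expected target term) pairs missing from the translation."""
    problems = []
    source_lower = source.lower()
    for src, tgt in glossary:
        # Naive substring matching; no handling of inflected forms.
        if src.lower() in source_lower and tgt not in translation:
            problems.append((src, tgt))
    return problems


glossary = [("user dictionary", "ユーザー辞書"), ("glossary", "用語集")]
issues = check_terminology(
    "Convert the glossary into a user dictionary.",
    "グロッサリーをユーザー辞書に変換します。",
    glossary,
)
```

In this example the translation uses a different word for "glossary" than the glossary prescribes, so that entry is reported in `issues`.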
Thank you! Please let us know using the contact form.
Sure! Please let us know your ideas using the contact form.