
by Ciprian Iovu
In the translation field, one of the most common methods of setting a price is the rate per source word. When CAT tools started being used, this method became a bit more complex, as these tools can analyse source content and generate a report based on the number of segments, words, matches, repetitions, etc.
Many translation agencies and even some translators have a standard grid by which they calculate their prices. During my time as a project manager, I noticed that although this method of calculating the price for a project had been used for quite some time, some translators did not quite understand how it worked and how the prices were set.
If you are one of those translators, or a junior translator who has just entered the world of translation and CAT tools, this article should give you a better idea of this pricing method, as I will try to explain it in detail.
Before I go further, please bear in mind that the rates and pricing grids mentioned and used in this article are for explanatory purposes only. You should set your price based on your working language pair, specialization, experience, cost of living, and any other factor that seems relevant to you and your work. Also, each CAT tool may use different terminology when referring to the sections in its report, but the match classification is generally the same. In this article, I will use memoQ’s Statistics report and Trados’ Analysis report as examples. So, let’s dive in!
How are source words counted?
Well, it depends on the CAT tool. Each CAT tool has an algorithm that defines what a word is. Some CAT tools may use Microsoft Word’s word count, while others may have their own defined algorithm. For example, memoQ considers that everything between two spaces is a word. So, depending on the CAT tool you are using, you may end up with a different word count, but the number shouldn’t be that different. Of course, the gap between two word counts gets bigger if the file or project is larger. If you and your client are using the same CAT tool and have different word counts, this should be a red flag and you should discuss with your client potential errors that may be caused by using different files, different percentage matches, different rates, different settings, etc. Therefore, even if you receive a word count report from a client, you should also generate your own report.
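To see why two tools can disagree on the same text, here is a minimal Python sketch. It compares the "everything between two spaces is a word" rule described above with a second, purely hypothetical rule that counts every run of letters or digits. Both function names are my own, and neither is the actual algorithm of memoQ, Trados, or Microsoft Word.

```python
import re

def count_words_by_spaces(text):
    # "Everything between two spaces is a word": split on whitespace only.
    return len(text.split())

def count_words_by_alphanumerics(text):
    # Hypothetical alternative rule: every run of letters or digits is a word,
    # so hyphenated compounds are counted as several words.
    return len(re.findall(r"[A-Za-z0-9]+", text))

sample = "A state-of-the-art CAT tool"
print(count_words_by_spaces(sample))         # 4
print(count_words_by_alphanumerics(sample))  # 7
```

The difference is small for one sentence, but it adds up over a large project, which is why the gap grows with the size of the file.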
What are the main categories of a word count report?
A word count report takes into account not only the words in a source file, but also the number of segments, the matches among those segments, and the matches against segments stored in a translation memory (TM). CAT tools generally split a file into translation units, commonly known as segments. When a segment is translated and confirmed, it is saved in a TM. When generating a word count report, CAT tools compare the segments within the source file with each other and against any TM attached to the project. If the same segment is repeated in the source file, the report counts the words in that segment as Repetitions. If you have several documents in the project and they contain the same segment, memoQ will consider that segment a repetition. Trados does the same, although you have the option to count that segment in a section entitled Cross-file Repetitions, which is selected by default.
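The idea behind Repetitions can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the first occurrence of a segment is counted as new and every later identical occurrence as a repetition, segments are compared as exact strings, and words are counted by splitting on spaces. Real CAT tools apply their own segmentation and comparison rules.

```python
def count_repetitions(segments):
    """Return (new_words, repeated_words) for a list of source segments."""
    seen = set()
    new_words = 0
    repeated_words = 0
    for segment in segments:
        words = len(segment.split())
        if segment in seen:
            # An identical segment appeared earlier: count it as a repetition.
            repeated_words += words
        else:
            seen.add(segment)
            new_words += words
    return new_words, repeated_words

segments = ["Press the button.", "Close the lid.", "Press the button."]
print(count_repetitions(segments))  # (6, 3)
```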
Now, if you have a TM attached to the project, the source segments will be compared against the segments existing in the TM, and based on their differences, you will get matches such as these:
- X-translate / double context (memoQ) | PerfectMatch (Trados)
Broadly, X-translate and PerfectMatch are taken into account when the source file gets updated during a project. These functions compare the segments of the updated file with those of the previous version and count the segments that are identical in both files.
Double context is a function specific to memoQ. It takes a source segment and compares it against the TM. If the same segment is found in the translation memory, then you get a 100% match. However, if memoQ identifies that the two segments immediately before and after the source segment are also identical to the ones in the TM, then the (middle) source segment is considered a double context match. In other words, if memoQ discovers that 5 source segments are identical with 5 TM segments, the middle segment is considered a double context match, and in the translation editor this segment will have a 102% match under Translation results. Those 5 segments must have the same order both in the source file and TM.
- 101% (memoQ) | Context match (CM in Trados)
This works similarly to a double context match, but instead of 5 segments, it refers only to 3 segments, meaning that the middle source segment is considered a 101%/context match only if the segments immediately before and after are also identical in the TM.
- 100% (both memoQ and Trados)
This means that a source segment is identically found in the TM.
- 95%-99% (both memoQ and Trados)
This means that a source segment has a 95% to 99% match with a segment recorded in the TM. It is a nearly exact match but with slight differences. Such differences could be caused by different numbers, tags, bolded words, spaces, punctuation marks, etc.
- 85%-94% (both memoQ and Trados)
These matches are considered high fuzzy matches. The differences now start to appear in the text, meaning that generally there is one word in the source segment that is different in the TM segment.
- 75%-84% (both memoQ and Trados)
These matches are considered medium fuzzy matches. Generally, there are two differing words in the source segment compared to the TM segment.
- 50%-74% (both memoQ and Trados)
These matches are considered low fuzzy matches. This means that more than two words in the source segment are different in the TM segment.
A clear distinction between high, medium, and low fuzzy matches can be made only if the source segment consists of at least 8 words. If the source segment is shorter, the differences in the text may not correspond so clearly to the descriptions above.
- No match (memoQ) | New/AT (Trados)
This means that the source segment doesn’t have any match in the TM, and that it needs to be translated from scratch.
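To summarise the categories above, here is a small Python sketch that maps a match score to its report category. The function name and the numeric convention are assumptions of mine: I treat 102 and 101 as the context-based matches, as memoQ displays them, and I assume that anything below 50% counts as No match, which the list above implies but does not state. Trados does not express PerfectMatch as a number, so the top category is only a simplification.

```python
def match_band(score):
    """Map a TM match score (0 to 102) to a word count report category."""
    if score >= 102:
        return "X-translate / double context"
    if score == 101:
        return "101% / Context match"
    if score == 100:
        return "100%"
    if score >= 95:
        return "95%-99%"
    if score >= 85:
        return "85%-94%"
    if score >= 75:
        return "75%-84%"
    if score >= 50:
        return "50%-74%"
    # Assumption: scores under 50% are treated as new text.
    return "No match / New"

print(match_band(97))  # 95%-99%
print(match_band(84))  # 75%-84%
print(match_band(30))  # No match / New
```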
A small side note: Before we go further, it is useful to talk a little bit about memoQ’s ‘Show weighted counts’ and ‘Homogeneity’.

If ‘Show weighted counts’ is checked, memoQ shows a different word count between round brackets. memoQ applies a weight percentage to the source word and the result is shown between round brackets in the source word column. In the image below, for example, we see that the number of source words for 75%-84% is 94, and the weight column for that match is 80%. This means that the weighted word count for a 75%-84% match is 75.2 (94 words multiplied by 0.8 equals 75.2). Any memoQ user can set their preferred percentage for the weight column. Usually, translation agencies set the payment percentages here and they use this function to have an enhanced view of the costs or time requirements for a translation job.

Homogeneity is an option that measures the internal similarities of documents in a given project. When this option is checked, memoQ simulates the process of translating a segment at some point and the matching results you would get if you encounter a match of that segment later in the translation. Because of this simulation, the weighted count shown in the memoQ Statistics report is smaller than the standard word count. Note that only the word count between round brackets differs in the two images below. The actual/standard source words remain the same. memoQ generally recommends not to use homogeneity in the analysis, especially if a project has several documents and if several translators will work on the same project concomitantly.


How do matches affect my rate per source word?
Remember when I talked about weighted words? Well, the percentage defined under memoQ’s ‘Weight’ column is applied to your rate per source word. You may not have received a price log generated directly from memoQ, but the same principle applies. Let me give you an example. Generally, when you apply to work with a translation agency, you will receive a pricing grid similar to the one below:

For example, if your rate is €0.06 per source word, then you will receive €0.06 only for No match/New words, which have a 100% weight. For the remaining match rates, the weight percentage is applied. So, let’s say you have a 7,000-word file according to the Microsoft Word count. After analysing the file in a CAT tool, the report shows that each match rate has 1,000 source words. What you will actually be paid for is the weighted word count that you get after applying the weight percentage. E.g., for the 100% match rate, you multiply the source words (in this case 1,000) by the 10% weight (meaning 0.1), and you get 100 weighted words, which at €0.06 per word amounts to €6.00.
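If you want to check such a calculation yourself, the arithmetic is easy to script. The Python sketch below uses the €0.06 rate and the 1,000 words per match rate from the example. The grid itself is hypothetical: only the 10% weight for 100% matches and the 100% weight for No match come from the example, while the seven categories and the other weights are values I made up for illustration. Replace them with the grid you actually agreed on.

```python
from decimal import Decimal

RATE_PER_WORD = Decimal("0.06")  # EUR per source word, from the example

# Hypothetical pricing grid: match rate -> weight in percent.
PRICING_GRID = {
    "Repetitions": 10,
    "100%": 10,
    "95%-99%": 30,
    "85%-94%": 50,
    "75%-84%": 80,
    "50%-74%": 100,
    "No match": 100,
}

# Word counts from the analysis report: 1,000 words per match rate.
word_counts = {band: 1000 for band in PRICING_GRID}

def weighted_words(words, weight_percent):
    # Weighted count = source words multiplied by the weight percentage.
    return Decimal(words) * Decimal(weight_percent) / Decimal(100)

total_weighted = sum(
    weighted_words(word_counts[band], weight)
    for band, weight in PRICING_GRID.items()
)
payment = total_weighted * RATE_PER_WORD

print(total_weighted)      # 3800 weighted words out of 7,000 source words
print(f"{payment:.2f}")    # 228.00
```

With this made-up grid, the 7,000-word file is paid as 3,800 weighted words, so the grid matters just as much as the rate per word.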

Is this fair? Well, yes and no… it depends! There is indeed little to no work to be done on segments that fall under the 95%-99% match rate or higher, so a low weight percentage seems justifiable. But this may not always be the case. Some languages have a very high degree of inflection, and using 100% matches as they are may be detrimental to the quality of the translation. Furthermore, some TMs are so large that finding the right match takes more time.
So, what can you do?
When starting a collaboration with a new translation agency, you should ask them to send their pricing grid, if they have one. Have a look at the pricing grid and decide for yourself if it is a fair one. If you find it to be disadvantageous to you, then ask to negotiate the pricing grid (more exactly, the weight percentage for each match rate), set your price higher, or try to opt for another pricing method. It goes without saying that you should have a different pricing grid for editing/revision, or another pricing method altogether, but that may be the subject of a future article. In the end, the goal is to have a fair business relationship. After all, we are all partners in this field and our purpose is to provide high-quality services, at a fair price, for all parties involved.