Friday, December 21, 2007

Getting rid of pesky single-byte Katakana from Japanese source documents

Today, let's take a look at how to quickly get rid of single-byte Katakana characters in Japanese documents before translating them. You may ask why one needs to do this. Well, most documents I translate have all Japanese text (Hiragana, Katakana and Kanji) as double-byte characters (except numbers, which I prefer to keep as single-byte characters). When you use a CAT tool, these pesky single-byte Katakana characters reduce your chances of getting 100% matches, because your translation memory most likely holds the same words in double-byte form only. Fortunately, there's a workaround provided by Ryan Ginstrom, which makes use of his excellent Wide2Narrow macro.

Here's what you do.

1. Select the entire document in Word using Ctrl + A.

2. Deselect the tail: with everything still selected, press Shift + Left Arrow so that the very end of the document drops out of the selection.

3. In Word, go to Format/Change Case/Full Width and press OK. All your single-byte characters (including Romaji and numbers) are changed to double-byte characters.

4. Now run the Wide2Narrow macro.

By doing so, the Romaji and numbers are converted back to single-byte characters, while the Katakana stays double-byte.
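
If the text is available outside Word, the same end result can be sketched in a few lines of Python. This is not Ryan Ginstrom's macro, only a rough illustration of the idea, and it assumes the source is plain Unicode text. The function name widen_katakana is made up for this example. Instead of widening everything and then narrowing the Romaji and numbers again, it touches only the single-byte (half-width) Katakana range, U+FF61 to U+FF9F, and leaves all other characters alone.

```python
import re
import unicodedata

# Half-width Katakana range in Unicode (U+FF61 to U+FF9F). It also covers
# the half-width Japanese punctuation and the voiced/semi-voiced sound marks.
HALFWIDTH_KATAKANA_RUN = re.compile(r"[\uFF61-\uFF9F]+")


def widen_katakana(text: str) -> str:
    """Convert half-width Katakana to full-width; leave other text as-is.

    Hypothetical helper for illustration. NFKC normalization is applied to
    each run of half-width Katakana only, so single-byte Romaji and numbers
    elsewhere in the text are not changed.
    """
    return HALFWIDTH_KATAKANA_RUN.sub(
        lambda match: unicodedata.normalize("NFKC", match.group(0)),
        text,
    )


print(widen_katakana("ｶﾞｲﾄﾞ 123 abc"))  # prints: ガイド 123 abc
```

Normalizing whole runs rather than single characters matters for voiced sounds: a half-width ｶ followed by the half-width mark ﾞ comes out as the single double-byte character ガ, which is the form a translation memory would normally contain.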
