Here's a translation aggregator app I've been working on. It basically works like ATLAS, but can use a number of website translators and ATLAS simultaneously. It was designed to replace ATLAS's interface and to add support for getting translations from a few additional sources. Currently, it can get translations from ATLAS V13 or V14 (you don't need to have ATLAS running), Google, Honyaku, Babel Fish, FreeTranslations.com, Excite, and OCN, plus a word-by-word breakdown from WWWJDIC, output from MeCab (which converts Kanji to Katakana), and its own built-in Japanese parser. The parser requires edict2 (or edict) in the dictionaries directory and supports multiple dictionaries in there at once; it does not support jmdict. I picked websites based primarily on what I use and how easy it was to figure out their translation request format. I'm open to adding more, but some of the other sites (like Word Lingo) seem to go to some effort to make this difficult.
It now also includes the ability to inject a dll into Japanese apps that will translate their menus and dialogs using the ATLAS module (requires that you have ATLAS installed, of course). Note that if you launch a program from the interface, it will be launched using Japanese regional settings and the dll will be automatically injected. The option to translate all in-game text tries to do just that: translate all text in a game, on screen. It has pretty low compatibility and is not AGTH compatible. It still needs a lot of work. My menu translation code is much further along.
The interface is pretty simple, much like ATLAS: just paste text into the upper left window, and either press the double arrow button to run it through all translators, or press the arrow buttons for individual translation apps. It will only run one query through one algorithm at a time, so if a window is busy when you tell it to translate something, it'll queue the request if it's a remote one, or stop and rerun it for local algorithms. If you have clipboard monitoring enabled (the master button disables it altogether), it'll run any clipboard text with Japanese characters copied from any other app through all translators with clipboard monitoring enabled. I won't automatically submit text with over 500 characters to any of the translation websites, so you can skip forward in AGTH without flooding servers, in theory. I still don't recommend automatic clipboard translations for the website translators, however.
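A rough sketch of the clipboard rule described above: text is only picked up if it contains Japanese characters, and nothing over 500 characters goes to a website automatically. The function names and the Unicode ranges treated as "Japanese" here are assumptions for illustration, not necessarily what the program itself checks.

```c
#include <stddef.h>
#include <wchar.h>

/* Limit taken from the description above: longer text is never sent to a
   website translator automatically. */
#define MAX_AUTO_REMOTE_CHARS 500

/* Assumed definition of "Japanese character": kana, CJK ideographs, and
   half-width katakana. The real check may use different ranges. */
static int is_japanese_char(wchar_t c)
{
    return (c >= 0x3040 && c <= 0x30FF) ||  /* hiragana and katakana */
           (c >= 0x4E00 && c <= 0x9FFF) ||  /* CJK unified ideographs */
           (c >= 0xFF66 && c <= 0xFF9F);    /* half-width katakana */
}

static int has_japanese(const wchar_t *text)
{
    for (; *text; text++)
        if (is_japanese_char(*text))
            return 1;
    return 0;
}

/* Hypothetical helper: should clipboard text be translated automatically?
   is_remote is nonzero for website translators, zero for local ones. */
int should_auto_translate(const wchar_t *text, int is_remote)
{
    if (!has_japanese(text))
        return 0;
    if (is_remote && wcslen(text) > MAX_AUTO_REMOTE_CHARS)
        return 0;
    return 1;
}
```

Local translators (ATLAS, MeCab, the built-in parser) are not subject to the length limit in this sketch, since the limit only exists to avoid flooding remote servers.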
To assign a hotkey to the current window layout, press shift-alt-#. Press alt-# to restore the layout. Bound hotkeys will automatically include the current transparency, window frame, and toolbar states. If you don't want a bound hotkey to affect one or more of those states, then you can remove the first 1 to 3 entries in the associated line in the ini file. Only modify the ini yourself when the program isn't running. All other values in those lines are mandatory.
MeCab is a free program that separates words and gives their pronunciation and part of speech. I use it to get the information needed to parse words and display furigana. If you have MeCab installed but I report I'm having trouble initializing it, you can try copying libmecab.dll to the same directory as this program. Do not install MeCab using a UTF16 dictionary. Tell MeCab to use the UTF8, Shift-JIS, or EUC-JP formats.
Source is attached below. Feel free to use it for any non-commercial purpose.
Changelog:
0.2.9
* Added DBCS hack. May fix some games that display gibberish when region is not set to Japan even when using applocale. May make it do more in a future release, currently just overrides GetOEMCP and IsDBCSLeadByte/IsDBCSLeadByteEx. Disabled by default.
* Fixed bug where translate highlighted text option would erase untranslated text window.
* Fixed issue where websites wouldn't launch on some systems, and browser window wouldn't come to the front (For links in websites menu).
0.2.8
* Pair of quick fixes for substitution code.
0.2.7
* Autoconvert half-width Katakana to full-width option added. Affects text in source text window. Enabled by default.
* Added substitution list. Doesn't affect text in source text window.
* Google support updated to work with Google Translate modifications.
* OCN linebreaks now handled properly.
* Fixed a JParser crash when parsing 5000 or more characters containing no recognized Japanese word.
* MeCab character encoding detection updated (May or may not require a more recent version of MeCab than before). Shift-JIS, UTF8, and EUC-JP encodings supported. UTF16 support does not appear to be possible, at the moment.
* Fixed drag and drop bug that would cause an exe's old path to be used if it had been moved/installed elsewhere since last run. May also have affected path when browsing directly to exe or selecting exe from process list.
* Added links to all of the translators in the websites pull down menu. JParser links to page with edict downloads.
* Switched to using precompiled headers for faster compilation times.
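The half-width to full-width Katakana conversion added in 0.2.7 can be sketched as a table lookup over U+FF61..U+FF9F. This is only an illustration: the function names are made up, and merging a trailing half-width dakuten/handakuten into the preceding kana is an assumption about the desired behavior, not a description of the program's actual code.

```c
#include <stddef.h>
#include <wchar.h>

/* Full-width equivalents of U+FF61..U+FF9F, in code point order. */
static const wchar_t kFullWidth[] = {
    0x3002, 0x300C, 0x300D, 0x3001, 0x30FB, 0x30F2, 0x30A1, 0x30A3, /* FF61-FF68 */
    0x30A5, 0x30A7, 0x30A9, 0x30E3, 0x30E5, 0x30E7, 0x30C3, 0x30FC, /* FF69-FF70 */
    0x30A2, 0x30A4, 0x30A6, 0x30A8, 0x30AA, 0x30AB, 0x30AD, 0x30AF, /* FF71-FF78 */
    0x30B1, 0x30B3, 0x30B5, 0x30B7, 0x30B9, 0x30BB, 0x30BD, 0x30BF, /* FF79-FF80 */
    0x30C1, 0x30C4, 0x30C6, 0x30C8, 0x30CA, 0x30CB, 0x30CC, 0x30CD, /* FF81-FF88 */
    0x30CE, 0x30CF, 0x30D2, 0x30D5, 0x30D8, 0x30DB, 0x30DE, 0x30DF, /* FF89-FF90 */
    0x30E0, 0x30E1, 0x30E2, 0x30E4, 0x30E6, 0x30E8, 0x30E9, 0x30EA, /* FF91-FF98 */
    0x30EB, 0x30EC, 0x30ED, 0x30EF, 0x30F3, 0x309B, 0x309C          /* FF99-FF9F */
};

/* True for the full-width HA, HI, FU, HE, HO katakana. */
static int is_ha_row(wchar_t c)
{
    return c >= 0x30CF && c <= 0x30DB && (c - 0x30CF) % 3 == 0;
}

/* Voiced (dakuten) form of a full-width katakana, or 0 if there is none. */
static wchar_t voiced(wchar_t c)
{
    if (c == 0x30A6) return 0x30F4;                               /* U -> VU */
    if (c >= 0x30AB && c <= 0x30C1 && (c & 1)) return c + 1;      /* KA..CHI */
    if (c == 0x30C4 || c == 0x30C6 || c == 0x30C8) return c + 1;  /* TSU, TE, TO */
    if (is_ha_row(c)) return c + 1;                               /* HA..HO */
    return 0;
}

/* Converts half-width katakana in s to full-width, in place. The output is
   never longer than the input. Returns the new length. */
size_t halfwidth_to_fullwidth(wchar_t *s)
{
    size_t in = 0, out = 0;
    while (s[in]) {
        wchar_t c = s[in++];
        if (c >= 0xFF61 && c <= 0xFF9F) {
            wchar_t merged = 0;
            c = kFullWidth[c - 0xFF61];
            if (s[in] == 0xFF9E)                      /* half-width dakuten */
                merged = voiced(c);
            else if (s[in] == 0xFF9F && is_ha_row(c)) /* half-width handakuten */
                merged = c + 2;
            if (merged) { c = merged; in++; }
        }
        s[out++] = c;
    }
    s[out] = 0;
    return out;
}
```

A dakuten that follows a kana with no precomposed voiced form is left as a standalone U+309B rather than dropped.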
0.2.6
* Harder to accidentally drag windows when not locked, and should automatically re-insert them when you do so (Though possibly not in the correct place).
* OCN support added (Thanks to Freaka for his help with this).
* Drag and drop added. Drag an exe to the main window or the launch window, and the launch path will be set to that exe.
* May let you configure ATLAS v13 options (Was always looking for v14 in the registry file for the ATLAS v13 ini location).
* Some changes to my parser's scoring function.
* Slight changes to launch dialog.
* Fixed a crash bug when using an ini from a newer version that has more/different translators.
* A change or two to the conjugation tables.
0.2.5
* Fixed replace command in rule sets.
* Try to make sure windows are created onscreen when loading config file.
* Slightly modified dictionary to make some conjugation/suffix things clearer.
* Verb conjugations now correctly recognize and penalize hiragana/katakana mismatches.
* Reduced mismatch penalty.
* Now favors longer words first rather than longer words last, when scores are equal. This is mostly to favor sticking "ni" with the word before it, where possible, rather than appending it to the start of a verb, when the ni could form valid words either way. Problem still exists with other particles, but only really noticed it a lot with ni. (Ni and na are considered part of a conjugation of the previous word in many circumstances)
0.2.4
* Fixed JParser memory leak.
* A couple formatting options added to JParser's dictionary entry display.
* When selecting a running process, the corresponding config will automatically be loaded, if there is one.
* Fixed a bug when selecting an exe with an existing profile from the file browser.
* Improved drag and drop window placement when there are an even number of subwindows in a main window.
* Words that only have Hiragana entries are displayed above those that don't, assuming their common word/particle flags match.
* Added -masu stems and adjective stems to conjugation tables. May add -nai as well. Other stems don't seem to be used nearly as often in constructs, so don't think I'll be adding them.
* Mecab and JParser windows now clear their results when hidden.
* Added Hiragana/Katakana converter. Highlight text, right click, and select.
* Filter out "Potential Potential" (Though no other duplicated tenses, yet). Merged some entries that always appear together, commented out "Imperfective" entries for now.
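A Hiragana/Katakana converter like the one added in 0.2.4 can be very small, because the two Unicode blocks list the kana in the same order, 0x60 code points apart. The sketch below relies on that layout; the function names are hypothetical and this is not taken from the program's source.

```c
#include <wchar.h>

/* Hiragana U+3041..U+3096 and katakana U+30A1..U+30F6 are laid out in the
   same order, 0x60 code points apart. */
#define KANA_OFFSET 0x60

/* Converts every hiragana character in s to katakana, in place. */
void hiragana_to_katakana(wchar_t *s)
{
    for (; *s; s++)
        if (*s >= 0x3041 && *s <= 0x3096)
            *s += KANA_OFFSET;
}

/* Converts every katakana character in s to hiragana, in place.
   Katakana-only marks such as the long vowel mark (U+30FC) are left alone. */
void katakana_to_hiragana(wchar_t *s)
{
    for (; *s; s++)
        if (*s >= 0x30A1 && *s <= 0x30F6)
            *s -= KANA_OFFSET;
}
```

Kanji, punctuation, and ASCII pass through unchanged, so the functions can be run over whatever text happens to be highlighted.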
0.2.3
* Fixed bug that would put junk tab entries in conjugation tables, possibly causing other conjugation issues as well.
* Added pre-nominal "tense" to na-nouns. Makes correct parses a little more likely.
* Added more space for conjugation table indices, just to be safe. Makes dictionaries a bit bigger. Names that can be used in conjugation tables are no longer restricted.
* Upgraded verb conjugation algorithm to handle more complicated tables. Tables now make a fair bit of use of stem forms.
* Dictionaries can now be in Shift-JIS format, in addition to the UTF8, UTF16, and EUC-JP from previous version. Dictionaries must start with 4 full-width question marks and have no extension to be recognized, regardless of character encoding.
* Fixed bug that would display katakana/hiragana mismatch hits first instead of last.
* Fixed messing up window placement on minimize.
* Tooltip placement improved. Shouldn't have issues with its placement now unless it's too tall for the screen or running at a really low resolution. Also makes slightly wider windows when they have a lot of text.
0.2.2
* Added support for splitting up windows.
* Added handling for verbs conjugated multiple times. Note that causative, passive, and potential are all similar and have multiple forms, so getting 4+ possible combinations of them is generally not a bug.
* Added new verb "tense" for contractions involving "shimau". I've run into them several times before, so it seemed worth adding.
* Improved support for suru verbs (Suru nouns? "vs" class in edict). Were treated as two words. Now handled as one, which favors them more when picking a parse of the sentence. This change makes dictionaries a little bigger.
* Added recognition of "(P)" in English as well as Japanese strings, and treat them the same.
* Now use more frequent words when generating furigana, rather than random ones.
* Removed caus-pass form, as I now conjugate both causative and passive in sequence.
* Fixed i-adjective conjugations (Conjugation table bug).
* No longer display "Non-past" for verbs whose only match is their standard non-past form.
* Verb conjugation code now handles the copula. Will often get multiple hits, as a lot of the common conjugated forms of it are also in edict.
* Fixed lower penalty for unmatched characters at the end of a string.
* Fix for issue that could theoretically give multiple conjugations of the same tense for the same verb.
* Fixed checking ConjugationTable.txt's modification date when deciding if need to update dictionary files.
* Japanese dash no longer matches every other special character (Other special characters match each other, but not a whole lot of them in the dictionary - most of them are listed at the top of edict).
* Fixed some bugs related to using multiple dictionaries.
* Now automatically checks for any modified dictionaries just before each translation. Updates compiled dictionaries as needed.
0.2.1
* Fixed two crash issues when no Furigana text is being displayed by JParser or Mecab.
* Removed display of cut up verb prefixes in dictionary definition display.
* Merged redundant display of definitions for some conjugated verb forms.
* Negative/Formal labelling fixed.
* Fixed tooltip colors.
* Fixed JParser ignoring last character of sentence.
* Updated dictionary format to use what little word frequency information edict has. No need to delete old dictionaries, will automatically recreate them if you still have the original edict2 file.
* Compiled dictionary should now be locale independent.
* When furigana are hidden, no longer space words as if they're still visible in JParser.
* More common and particle definitions now appear at the top of the list.
* Removed a couple redundant conjugation table entries.
0.2.0
* Added beta Japanese word parser, requires edict.
* Fixed potential ATLAS crash issue.
* Improved support for unicode file names with injection code.
* Can select WWWJDIC mirrors, as Setsumi suggested. Default mirror is now the Canadian one.
* Navigating to an exe will load the last settings for that exe, as Freaka suggested.
0.1.8
* Fixed failure to inject into first listed process bug.
* Added "replace x with y" to rules.
* Function hooking module works when compiled under 64-bit (Note: There is no 64-bit ATLAS dll, and you'd need a 64-bit exe and dll to inject into 64-bit processes, so it's pretty pointless, at the moment. Was doing something else with the code that needed this).
0.1.7
* Updated FreeTranslation code.
* Some additional AGTH checkboxes.
0.1.6
* Mecab display completely redone. Now displays furigana and has its own config screen.
* Should now automatically detect whether MeCab is using a UTF8 or Shift-JIS dictionary.
* AGTH default parameter set to /c
* Some whitespace bugs in the new ATLAS preparser fixed.
* Bug that would cause the game exe name dropdown to be blank after selecting a game fixed.
* Game profiles are now named after the game and its launch directory.
* Delete key can be used to delete a game profile.
* No longer inject my dll when neither menu translation nor my text/graphics hooker are enabled.
* Prefixing a translation rule by "line " will add a line break (Like "line break before whatever").
* Context menu added. Just duplicates the view menu, at the moment.
* Transparency adjustment is now coarser (And thus faster).
* Slightly upgraded graphics hooker - may work with multiple window games or those that use multiple HDCs at once. Broke it on games that never release their device context, but I'll add support for them again at some point.
0.1.5
* Removed debugging code that broke full text translation.
* Added "Rule Set" support and rewrote ATLAS preparser.
* Can now set ATLAS parameters for other processes.
* Added support for automatically attaching AGTH to processes.
* Added a single more complete injection/launching screen, with application memory.
* Processes without windows are no longer listed.
0.1.4
* Fixed context menu translation and "Full Translation" options.
0.1.3
* Crash on copy to clipboard under 32-bit versions of Windows fixed.
0.1.2
* Fixed bug that removed the last character of sentences when the Japanese ends with a punctuation mark but the ATLAS translation does not.
* If you copy some (< 10k characters) text from my interface and then quit, clipboard data will persist.
0.1.1
* Added auto translate selected text button.
* Should be able to edit ATLAS dictionary now.
* Added ability to translate more than just Windows text in running games. Not AGTH compatible, can be disabled.
* App translator now preserves tabs in menus.
* App translator will now translate popup menus.
* Added Honyaku translator.
* Added line break to sentence breaking options (Always breaks on double linebreak, regardless)
* Fixed display of raw html responses on error. Old code could cause crashes.
* Linebreaks now appear in Google results.
* Fixed a bug when removing spaces between Japanese and non-Japanese characters in a string that has half-width katakana characters.
* dll injection should no longer lock up the other program.
* Now force dialogs to redraw after modifying their items. Alleviates issue of controls not being redrawn properly after I resize them.
* Resizing rows/columns/main window should be a little more responsive.
* Added some extra ATLAS punctuation handling code. Sentences should (Generally) start with two spaces, and ?...? and ?... and the like will now be replaced with ...?
* No longer send strings without Japanese characters to ATLAS.
* Tooltips started working. Absolutely no idea why.
0.1.0
* Fixed MeCab rapid clipboard copy crash bug.
* Fixed bug that was causing hot-key bindings not to be saved.
* When using hotkeys to switch layouts, no longer clears the text in windows that are visible in both the old and new layouts. Note that all hidden windows always lose their text.
* WWWJDIC coloring fixed in cases where one line has no identified words.
* Can now hide toolbars.
* Can now make window transparent.
* Added option to break on linebreaks with ATLAS.
* Dll injection added.
* No longer replace multiple Japanese exclamation marks with ASCII question marks when using ATLAS.
* Increased the limit on the text editor text length from the default of 32k to the maximum allowed. Only really was an issue when pasting large amounts of text manually.
* In some cases, ATLAS can lock up when you send it spaces. I now send no spaces to ATLAS unless they occur between two non-Japanese characters.
* No longer change current directory in order to load Atlas dlls (Didn't know about LoadLibraryEx() before).
* Attempt to make my HTTP code a little more robust.
* Hack to fix the issue of hanging on quit after using a dialog.
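The 0.1.0 rule about spaces (none are sent to ATLAS unless they sit between two non-Japanese characters) could look something like the sketch below. The function names and the ranges counted as "Japanese" are assumptions, and only the immediate neighbors of each space are examined, which may differ from what the real preparser does.

```c
#include <stddef.h>
#include <wchar.h>

/* Assumed definition of "Japanese": CJK punctuation and kana, CJK
   ideographs, and the full-width/half-width forms block. */
static int is_japanese_char(wchar_t c)
{
    return (c >= 0x3000 && c <= 0x30FF) ||
           (c >= 0x4E00 && c <= 0x9FFF) ||
           (c >= 0xFF00 && c <= 0xFFEF);
}

/* Removes, in place, every ASCII space that is not between two
   non-Japanese characters. Leading and trailing spaces are removed too.
   Returns the new length. */
size_t strip_spaces_for_atlas(wchar_t *s)
{
    size_t in, out = 0;
    wchar_t prev = 0;  /* previous character of the original input */

    for (in = 0; s[in]; in++) {
        wchar_t c = s[in];
        wchar_t next = s[in + 1];
        int keep = 1;

        if (c == L' ') {
            keep = prev != 0 && next != 0 &&
                   !is_japanese_char(prev) && !is_japanese_char(next);
        }
        if (keep)
            s[out++] = c;
        prev = c;
    }
    s[out] = 0;
    return out;
}
```

With this rule, English phrases embedded in Japanese text keep their internal spacing, while spaces touching Japanese characters never reach ATLAS.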
Last edited by ScumSuckingPig; 11-22-2009 at 12:46 AM.
Seems to be a great program. Just add "Auto paste on clipboard change" and "Auto translate on paste" options and it would be convenient to use with AGTH.
Few technical notes:
1) I always get cut off in the Babel Fish translation at the '「' and '」' symbols. They are indeed turned into '\0' in the HTML response. But the way they are processed is not a bug in Microsoft's code, but in yours: you use the 'strstr' and 'strlen' functions, which work with null-terminated strings, so '\0' is the end-of-string marker for them. The simplest workaround is to use something like this after you do MultiByteToWideChar/WideCharToMultiByte with a length and before any str* functions:
Code:
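// Replace embedded NULs so the str* functions see the whole response.
for (int i = 0; i < text_len; i++) {
    if (!Text[i])
        Text[i] = ' ';
}
Text[text_len] = 0;  // terminate explicitly; buffer must hold text_len + 1 chars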
2) You do too many MultiByteToWideChar/WideCharToMultiByte conversions. It's better to just call MultiByteToWideChar once to go from whatever source encoding you have into UTF-16 and then work with wcs* and *W functions instead of str* and *A. Also I think it's better to use the *W WinAPI everywhere for consistency.
3) Instead of looking into hardcoded "C:\\Program Files (x86)\\ATLAS V14\\" and "C:\\Program Files\\ATLAS V14\\" I think it's better to use the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\ProgramFilesDir or the SHGetFolderPath api.
4) MS Visual C++ 6 is very buggy - you should really switch to Visual C++ 2005 or 2008 (Express editions are free).
5) There are some fair ways to reduce compiled program size, not just UPX'ing. ^_^ In general packing programs is bad - they take less hdd space but more ram when running.
6) Are you sure '#ApplicationLocale' does something useful for you? I thought it works only when creating a new process, not when loading dlls. Or are you using it for calling yourself with "/ATLAS:DICTSEARCH" ?
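On point 1, an alternative to overwriting the NULs is to stop relying on null termination when searching the response and pass the length explicitly. A minimal sketch, where find_bytes is a hypothetical helper name and not something from the program's source:

```c
#include <stddef.h>
#include <string.h>

/* Finds needle (needle_len bytes) inside hay (hay_len bytes). Unlike strstr,
   embedded '\0' bytes in the haystack are treated as ordinary data.
   Returns a pointer to the first match, or NULL if there is none. */
const char *find_bytes(const char *hay, size_t hay_len,
                       const char *needle, size_t needle_len)
{
    size_t i;

    if (needle_len == 0)
        return hay;
    if (needle_len > hay_len)
        return NULL;
    for (i = 0; i + needle_len <= hay_len; i++) {
        if (hay[i] == needle[0] && memcmp(hay + i, needle, needle_len) == 0)
            return hay + i;
    }
    return NULL;
}
```

The length would come from the HTTP read (the number of bytes actually received), not from strlen, which would stop at the first NUL.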
Thanks for the feedback. Automatic clipboard translation support is definitely a good idea.
1) Actually, I never use strlen before I strip HTML from the file. Turns out that there's a null in the middle of the Babel Fish webpage. When you use Japanese quotes, you get "\0Translation.\0" No other non-ASCII characters in there at all, so it's not a codepage thing. Might be because I'm using a GET rather than a POST, and Yahoo prefers that I POST using ISO-8859-1, or could just be a Yahoo thing. Either way, now that I know what it is, it's not hard to fix, or at least to implement a workaround.
2) The only unneeded calls I know of for either are in my Babel Fish code, which calls them both one extra time in a vain attempt to fix the null issue. My code to locate the translation currently expects ASCII/multi-byte strings. Once I cut that out, I switch to exclusively using Unicode, though I probably do use ASCII too much elsewhere.
3) I do check the registry key (HKCU\Software\Fujitsu\ATLAS\V14.0\EJ\TRENV EJ). If the directory or key doesn't exist, I check the two hard-coded locations as well. Probably unnecessary, but there doesn't seem to be a key for the ATLAS directory or executable itself, which makes me nervous.
4) I code/debug with MS VC 2005, but VC 6.0 gives me such wonderfully small exes without having to redistribute a couple MB of dlls. I know that exe size doesn't really matter, but 75 KB vs 19 KB...
5) You're right. UPX claims it requires no additional memory, but now that I look at reported memory usage, it does actually use about the size of the compressed exe in extra RAM. Won't use it for future versions. Though with UPX the exe itself still takes up less RAM than without UPX but with VC 2005...
6) You're right, launching "/ATLAS:DICTSEARCH" in another process is the only reason for that. ATLAS's dictionary interface doesn't work correctly unless the locale is Japanese, and I'd be nervous about launching it in another thread while a translation is running, anyways.
Last edited by ScumSuckingPig; 11-17-2008 at 07:37 AM.
3) Of course I've seen that you check the ATLAS registry key. My point was that it's bad to hardcode the 'Program Files' location because it could easily be in some other place like 'D:\Program Files'.
4) Look at my AGTH - I use the 2005 environment with the 2008 SP1 compiler. So, if you know what to do, it's possible to get all the bugfixes and optimizations of the new compilers and a compiled size smaller than you get in VC 6. If you are interested in how to do it - I can tell you more.
Interface should remember its position/sizes.
In vertical view it could be more space-efficient to have beside text area only one vertical button with name of translation engine on it (that would do Translate action), maybe add checkbox above or below it without text that would mean include this engine in auto translation. Though it's arguable because it would be much less obvious than now.
I've been thinking of moving the translator name/buttons above the text edit controls instead of to the side, so they take up less space. Can also add toolbar-ish icons to them with tool tips and such.
MeCab support looks like it'll be relatively easy to do. Could have a couple modes for hiragana, katakana, and English text. Hooking it up directly to a J to E dictionary might be nifty, too.
And thanks for the instructions.
Last edited by ScumSuckingPig; 11-17-2008 at 01:34 PM.
I really like 0.0.4. I was trying it since the first version throughout the week and one of my minor peeves was the clipboard and window frame memory and you fixed it so quickly. I'm sorry I can't provide better feedback but hopefully I'll run into some noteworthy bugs with more trials.
Thank you, eagerly looking forward to future updates.