Question
Is there a module for handling TMX (Translation Memory eXchange) files in Python? If not, what would be another way to do it?
As it stands, I have a giant 2 GB file with French-English subtitles. Is it even possible to handle a file that size, or would I have to break it down?
Answer 1:
As @hurrial said, you can use translate-toolkit.
Install
The toolkit is available from PyPI. To install it with pip, run:
pip install translate-toolkit
Usage
Assume that you have the following simple sample.tmx file:
<tmx version="1.4">
<header
creationtool="XYZTool" creationtoolversion="1.01-023"
datatype="PlainText" segtype="sentence"
adminlang="en-us" srclang="en"
o-tmf="ABCTransMem"/>
<body>
<tu>
<tuv xml:lang="en">
<seg>Hello world!</seg>
</tuv>
<tuv xml:lang="ar">
<seg>اهلا بالعالم!</seg>
</tuv>
</tu>
</body>
</tmx>
You can parse this simple file like so:
>>> from translate.storage.tmx import tmxfile
>>>
>>> with open("sample.tmx", 'rb') as fin:
...     tmx_file = tmxfile(fin, 'en', 'ar')
...
>>> for node in tmx_file.unit_iter():
...     print(node.getsource(), node.gettarget())
...
Hello world! اهلا بالعالم!
For more information, see the official Translate Toolkit documentation.
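On the 2 GB file from the question: if loading everything at once turns out to be too memory-hungry, one alternative is to stream the file with the standard library instead of splitting it. The following is only a minimal sketch, not part of translate-toolkit. The helper name iter_tmx_pairs is made up for illustration, and it assumes the TMX elements carry no XML namespace (as in sample.tmx above) and that the language codes match exactly after lowercasing.

```python
import io
import xml.etree.ElementTree as ET

# xml:lang lives in the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_tmx_pairs(source, src_lang, tgt_lang):
    """Yield (source_text, target_text) one <tu> at a time.

    Hypothetical helper: `source` is a filename or a binary file object.
    """
    body = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "body":
                body = elem  # remember the parent so it can be emptied later
            continue
        if elem.tag != "tu":
            continue
        segs = {}
        for tuv in elem.iter("tuv"):
            # TMX 1.4 uses xml:lang; fall back to a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() also collects text inside inline markup in <seg>.
                segs[lang.lower()] = "".join(seg.itertext())
        pair = (segs.get(src_lang), segs.get(tgt_lang))
        # Drop the processed <tu> so memory use stays roughly constant.
        if body is not None:
            body.clear()
        if pair[0] is not None and pair[1] is not None:
            yield pair


# Small in-memory demo; for a real file, pass its path instead.
demo = (
    '<tmx version="1.4"><header srclang="en"/><body>'
    '<tu><tuv xml:lang="en"><seg>Hello world!</seg></tuv>'
    '<tuv xml:lang="fr"><seg>Bonjour le monde !</seg></tuv></tu>'
    '</body></tmx>'
).encode("utf-8")
for en, fr in iter_tmx_pairs(io.BytesIO(demo), "en", "fr"):
    print(en, "|", fr)
```

Because this is a generator, only one translation unit is held in memory at a time, so the file should not need to be broken up first. Real subtitle corpora may use codes such as en-US or a different layout, so the language matching would need adjusting.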
Answer 2:
You may check the following links:
- pretranslate: http://translate-toolkit.readthedocs.org/en/latest/commands/pretranslate.html
- Translate toolkit: http://en.wikipedia.org/wiki/Translate_Toolkit
- Translate toolkit package: https://pypi.python.org/pypi/translate-toolkit
- Translate API: https://github.com/translate/translate
Cheers,
Source: https://stackoverflow.com/questions/20356149/tmxtranslation-memory-exchange-files-in-python