
Goslate: Free Google Translate API
##################################################

.. note::

   Google has recently updated its translation service with a ticket mechanism that prevents simple crawler programs like ``goslate`` from accessing it. Though a more sophisticated crawler may still work technically, it would cross the fine line between using the service and breaking the service. ``goslate`` will not be updated to break Google's ticket mechanism. The free lunch is over. Thanks for using it.

.. contents::
   :local:

``goslate`` provides a free Python API to the Google translation service by querying the Google translation website.

Translating a string takes a single call: ``Goslate().translate('Hi!', 'zh')``

The basic usage is simple:

.. sourcecode:: python

   >>> import goslate
   >>> gs = goslate.Goslate()
   >>> print(gs.translate('hello world', 'de'))
   hallo welt

``goslate`` supports both Python 2 and Python 3. You can install it via:

.. sourcecode:: bash

   $ pip install goslate

or just download the latest `goslate.py <https://bitbucket.org/zhuoqiang/goslate/raw/tip/goslate.py>`_ directly and use it.

The ``futures`` `package <https://pypi.python.org/pypi/futures>`_ is optional but recommended for best performance in large text translation tasks.

Proxy support can be added as follows:

.. sourcecode:: python

   import urllib2  # Python 2 only
   import goslate

   proxy_handler = urllib2.ProxyHandler({"http": "http://proxy-domain.name:8080"})
   # Pass the ProxyHandler itself to build_opener; the first argument of
   # HTTPHandler/HTTPSHandler is a debug level, not a proxy handler.
   proxy_opener = urllib2.build_opener(proxy_handler)

   gs_with_proxy = goslate.Goslate(opener=proxy_opener)
   translation = gs_with_proxy.translate("hello world", "de")

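
On Python 3 the ``urllib2`` module no longer exists; its opener machinery lives in ``urllib.request``. The following is a minimal sketch of building an equivalent opener there, reusing the placeholder proxy address from the example above. The ``https`` entry is an added assumption for services reached over HTTPS, and handing the opener to ``goslate.Goslate(opener=...)`` is only noted in a comment because it is not exercised here.

```python
import urllib.request

# Placeholder proxy address, as in the example above.
proxy_handler = urllib.request.ProxyHandler({
    "http": "http://proxy-domain.name:8080",
    "https": "http://proxy-domain.name:8080",  # assumption: same proxy for HTTPS
})

# build_opener installs this ProxyHandler in place of the default one.
proxy_opener = urllib.request.build_opener(proxy_handler)

# Assumption: the opener is then passed on as in the Python 2 example,
# e.g. goslate.Goslate(opener=proxy_opener).
```

No request is sent when the opener is built, so a wrong proxy address only shows up on the first query.
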
Romanization or latinization (or romanisation, latinisation), in linguistics, is the conversion of writing from a different writing system to the Roman (Latin) script, or a system for doing so.

For example, pinyin is the default romanization method for the Chinese language.

You can get the translation in romanized writing as follows:

.. sourcecode:: python

   >>> import goslate
   >>> roman_gs = goslate.Goslate(writing=goslate.WRITING_ROMAN)
   >>> print(roman_gs.translate('China', 'zh'))
   Zhōngguó

You can also get the translation in both the native writing system and the Roman writing system:

.. sourcecode:: python

   >>> import goslate
   >>> gs = goslate.Goslate(writing=goslate.WRITING_NATIVE_AND_ROMAN)
   >>> gs.translate('China', 'zh')
   ('中国', 'Zhōngguó')

The result is a tuple in this case: ``(Translation-in-Native-Writing, Translation-in-Roman-Writing)``

Sometimes all you need is to find out which language the text is in:

.. sourcecode:: python

   >>> import goslate
   >>> gs = goslate.Goslate()
   >>> language_id = gs.detect('hallo welt')
   >>> language_id
   'de'
   >>> gs.get_languages()[language_id]
   'German'

It is not necessary to roll your own multi-thread solution to speed up massive translation. Goslate has already done it for you. It uses ``concurrent.futures`` for concurrent querying. The max worker number is 120 by default.

The worker number can be changed as follows:

.. sourcecode:: python

   >>> import goslate
   >>> import concurrent.futures
   >>> executor = concurrent.futures.ThreadPoolExecutor(max_workers=200)
   >>> gs = goslate.Goslate(executor=executor)
   >>> it = gs.translate(['text1', 'text2', 'text3'], 'de')  # target language is required
   >>> list(it)
   ['translation1', 'translation2', 'translation3']

It is advised to install the ``concurrent.futures`` backport lib in Python 2.7 (Python 3 has it by default) to enable concurrent querying.

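
To illustrate why results come back in input order even when queries run concurrently, here is a minimal sketch built on ``concurrent.futures`` alone. ``fake_translate`` and ``translate_all`` are hypothetical stand-ins, not part of goslate; the stub simply upper-cases its input instead of contacting any service.

```python
import concurrent.futures


def fake_translate(text):
    """Hypothetical stand-in for one network round trip."""
    return text.upper()


def translate_all(texts, max_workers=4):
    """Run fake_translate over texts concurrently, keeping input order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of the inputs,
        # regardless of which worker finishes first.
        return list(executor.map(fake_translate, texts))


translations = translate_all(['text1', 'text2', 'text3'])
print(translations)
```

A thread pool suits this kind of workload because each query spends most of its time waiting on the network rather than computing.
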
The input can be a list, a tuple, or any iterator, even a file object, which iterates line by line:

.. sourcecode:: python

   >>> translated_lines = gs.translate(open('readme.txt'), 'de')
   >>> translation = '\n'.join(translated_lines)

Do not worry that short texts will increase the query time. Internally, goslate joins small texts into one big text to reduce unnecessary query round trips.

Google translation does not support very long text. Goslate bypasses this limitation by splitting the long text internally before sending it to Google and joining the multiple results into one translation for the end user.

.. sourcecode:: python

   >>> import goslate
   >>> with open('the game of thrones.txt', 'r') as f:
   ...     novel_text = f.read()
   ...
   >>> gs = goslate.Goslate()
   >>> gs.translate(novel_text, 'de')

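
The splitting step can be pictured with a small sketch. This is not goslate's actual implementation: ``split_text`` and the 16-character limit are hypothetical, chosen only to show how a long text can be cut at whitespace into chunks that each fit a size limit and that rejoin into the original.

```python
def split_text(text, max_length):
    """Split text into chunks of at most max_length characters,
    preferring to break right after whitespace."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_length, len(text))
        if end < len(text):
            # Look for the last whitespace inside the window so words stay intact.
            cut = max(text.rfind(' ', start, end), text.rfind('\n', start, end))
            if cut > start:
                end = cut + 1  # keep the whitespace with the left chunk
        chunks.append(text[start:end])
        start = end
    return chunks


long_text = 'the quick brown fox jumps over the lazy dog'
chunks = split_text(long_text, 16)
print(chunks)
```

Because each chunk keeps its trailing whitespace, joining the chunks with an empty string restores the input exactly. A word longer than the limit is cut mid-word as a fallback.
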
Goslate aggressively uses batching and concurrent fetching internally to maximize translation speed.

All you need to do is reduce the number of API calls by using batch translation and concurrent querying.

For example, say you want to translate 3 big text files. Instead of manually translating them one by one, line by line:

.. sourcecode:: python

   import goslate

   big_files = ['a.txt', 'b.txt', 'c.txt']
   gs = goslate.Goslate()

   translation = []
   for big_file in big_files:
       with open(big_file, 'r') as f:
           translated_lines = []
           for line in f:
               translated_line = gs.translate(line, 'de')
               translated_lines.append(translated_line)

           translation.append('\n'.join(translated_lines))

It is better to leave them to Goslate entirely. The following code is not only simpler but also much faster (+100x):

.. sourcecode:: python

   import goslate

   big_files = ['a.txt', 'b.txt', 'c.txt']
   gs = goslate.Goslate()

   translation_iter = gs.translate((open(big_file, 'r').read() for big_file in big_files), 'de')
   translation = list(translation_iter)

Internally, goslate first adjusts the texts so that they are neither too big to fit the Google query API nor so small that they increase the total number of HTTP queries. Then it uses concurrent queries to speed things up even further.

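
The batching idea described above can be sketched as follows. The names ``pack_texts``, ``translate_batched`` and ``fake_translate`` are hypothetical, the size limit is arbitrary, and the sketch assumes that no input text contains the separator. It illustrates the technique only and is not goslate's internal code.

```python
def pack_texts(texts, max_length, separator='\n'):
    """Group consecutive texts into batches whose joined length
    stays within max_length."""
    batches = []
    current = []
    current_length = 0
    for text in texts:
        # Length added to the batch if this text were appended.
        extra = len(text) + (len(separator) if current else 0)
        if current and current_length + extra > max_length:
            batches.append(current)
            current = []
            current_length = 0
            extra = len(text)
        current.append(text)
        current_length += extra
    if current:
        batches.append(current)
    return batches


def fake_translate(text):
    """Hypothetical stand-in for one network round trip."""
    return text.upper()


def translate_batched(texts, max_length=12, separator='\n'):
    """Translate many small texts with one round trip per batch."""
    results = []
    for batch in pack_texts(texts, max_length, separator):
        translated = fake_translate(separator.join(batch))
        results.extend(translated.split(separator))
    return results


texts = ['hello', 'world', 'good', 'morning', 'everyone']
batches = pack_texts(texts, 12)
translated = translate_batched(texts)
print(batches)
print(translated)
```

Five inputs are served by three round trips here, which is the saving the paragraph above describes.
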
If you want a detailed dictionary explanation for a single word/phrase, you can use ``lookup_dictionary``:

.. sourcecode:: python

   >>> import goslate
   >>> gs = goslate.Goslate()
   >>> gs.lookup_dictionary('sun', 'de')
   [[['Sonne', 'sun', 0]],
    [['noun', ['Sonne'], [['Sonne', ['sun', 'Sun', 'Sol'], 0.44374731, 'die']], 'sun', 1],
     ['verb', ['der Sonne aussetzen'], [['der Sonne aussetzen', ['sun'], 1.1544633e-06]], 'sun', 2]],
    'en',
    0.9447732,
    [['en'], [0.9447732]]]

There are 2 limitations for this API:

- The result is a complex list structure which you have to parse for your own usage
- The input must be a single word/phrase; batch translation and concurrent querying are not supported

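
As a rough illustration of the parsing the first limitation refers to, the sketch below picks the primary translation and the translations per part of speech out of the sample result shown above. The index positions are inferred from that single sample, not from a documented schema, so treat them as assumptions. ``summarize_lookup`` is a hypothetical helper, not part of goslate.

```python
# Sample result copied from the lookup_dictionary('sun', 'de') example.
result = [
    [['Sonne', 'sun', 0]],
    [['noun', ['Sonne'], [['Sonne', ['sun', 'Sun', 'Sol'], 0.44374731, 'die']], 'sun', 1],
     ['verb', ['der Sonne aussetzen'], [['der Sonne aussetzen', ['sun'], 1.1544633e-06]], 'sun', 2]],
    'en',
    0.9447732,
    [['en'], [0.9447732]],
]


def summarize_lookup(result):
    """Return (primary_translation, {part_of_speech: [translations]}).

    Assumes the layout seen in the sample: result[0][0][0] holds the
    primary translation and each entry of result[1] starts with a
    part-of-speech label followed by a list of translations.
    """
    primary = result[0][0][0]
    by_part_of_speech = {entry[0]: entry[1] for entry in result[1]}
    return primary, by_part_of_speech


primary, by_pos = summarize_lookup(result)
print(primary)
print(by_pos)
```
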
If you get an HTTP 5xx error, it is probably because Google has banned your client IP address from translation querying.

You can verify this by accessing the Google translation service in a browser manually.

You can try the following to overcome this issue:

- query through an HTTP/SOCKS5 proxy, as described in the proxy support example earlier in this document
- use another Google domain for translation: ``gs = Goslate(service_urls=['http://translate.google.de'])``
- wait for 3 seconds before issuing another query

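
The last suggestion, waiting before the next query, can be wrapped in a small helper. This is a minimal sketch with hypothetical names (``call_with_wait``, ``flaky_query``); it catches ``OSError`` because the error classes of ``urllib`` derive from it on Python 3. The ``sleep`` argument is injectable only so the demonstration below records the waits instead of actually pausing.

```python
import time


def call_with_wait(func, attempts=3, wait_seconds=3.0, sleep=time.sleep):
    """Call func(), waiting wait_seconds between failed attempts."""
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except OSError as error:  # urllib.error.URLError/HTTPError are OSErrors
            last_error = error
            if attempt < attempts - 1:
                sleep(wait_seconds)
    raise last_error


# Demonstration with a stub that fails twice before succeeding.
calls = {'count': 0}


def flaky_query():
    calls['count'] += 1
    if calls['count'] < 3:
        raise OSError('HTTP 503')
    return 'hallo welt'


waits = []
result = call_with_wait(flaky_query, sleep=waits.append)
print(result, waits)
```

In real use the ``sleep`` argument would be left at its default so the pause actually happens.
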
Please check the `API reference <http://pythonhosted.org/goslate/#module-goslate>`_.

``goslate.py`` is also a command line tool which you can use directly.

- Translate ``stdin`` input into Chinese in GBK encoding

  .. sourcecode:: bash

     $ echo "hello world" | goslate.py -t zh-CN -o gbk

- Translate 2 text files into Chinese, output to UTF-8 file

  .. sourcecode:: bash

     $ goslate.py -t zh-CN -o utf-8 source/1.txt "source 2.txt" > output.txt

- Use ``--help`` for detailed usage

  .. sourcecode:: bash

     $ goslate.py -h

- `issues & suggestions <https://bitbucket.org/zhuoqiang/goslate/issues>`_
- `repository <https://bitbucket.org/zhuoqiang/goslate>`_
- `Donation <http://pythonhosted.org/goslate/#donate>`_

- ``threading.currentThread()`` properly
- ``retry_wait_duration`` param to finely control the retry behavior in case of connection errors
- Add new API ``Goslate.lookup_dictionary()`` to get detailed information for a single word/phrase, thanks to Adam for the suggestion
- Improve the documentation with more user scenarios and performance considerations
- [fix bug] fix compatibility issue with the latest Google translation service JSON format changes
- [fix bug] unit test failure
- [new feature] Translation in roman writing system (romanization), thanks to Javier del Alamo for the contribution.
- [new feature] Customizable service URL. You can provide multiple Google translation service URLs for better concurrency performance
- [new option] roman writing translation option for CLI
- [fix bug] Google translation may change normal space to no-break space
- [fix bug] Google web API changed for getting supported language list