2
\$\begingroup\$

I have this code to translate text using the Google Translate mobile site. Currently, text size is limited by the request method. Everything else seems to work just fine.

I am also about to publish this on PyPI, but I don't know what to name the package, the file and the function.

Currently, you have to write:

import translate.translate
translate.translate.translate("hello")

which seems really bad. How should I name this?

Could you suggest any improvements to this?

#!/usr/bin/env python
# encoding: utf-8
import six
if (six.PY2):
    import urllib2
    import re
    import urllib
else:
    import urllib.request
    import urllib.parse
    import re
agent = {'User-Agent' : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)"}
def translate(to_translate, to_language="auto", language="auto"):
    """
    Returns the translation using google translate
    you must shortcut the language you define (French = fr, English = en, Spanish = es, etc...)
    if you don't define anything it will detect it or use english by default
    Example:
    print(translate("salut tu vas bien?", "en"))
    hello you alright?
    """
    base_link = "http://translate.google.com/m?hl=%s&sl=%s&q=%s"
    if (six.PY2):
        link = base_link % (to_language, language, urllib.pathname2url(to_translate))
        request = urllib2.Request(link, headers=agent)
        page = urllib2.urlopen(request).read()
    else:
        link = base_link % (to_language, language, urllib.parse.quote(to_translate))
        request = urllib.request.Request(link, headers=agent)
        page = urllib.request.urlopen(request).read().decode("utf-8")
    expr = r'class="t0">(.*?)<'
    result = re.findall(expr, page)
    if (len(result) == 0):
        return ("")
    return(result[0])
if __name__ == '__main__':
    to_translate = 'Bonjour comment allez vous?'
    print("%s >> %s" % (to_translate, translate(to_translate)))
    print("%s >> %s" % (to_translate, translate(to_translate, 'es')))
    print("%s >> %s" % (to_translate, translate(to_translate, 'ar')))
Jamal
asked Sep 14, 2016 at 9:06
\$\endgroup\$
3
  • 2
    \$\begingroup\$ As you said, "translate() function is inside translate.py file which is inside translate folder" which, at least from my point of view, begs the question of why you used the same name? You don't really want a "translate" package, but maybe a "translation" package, or "language", or "language.translation". It's as if instead of, say, math.fsum you had fsum.fsum.fsum, with fsum.fsum.fsum(to_sum). \$\endgroup\$ Commented Sep 14, 2016 at 12:43
  • \$\begingroup\$ Is there a reason why you are web-scraping rather than using the API? \$\endgroup\$ Commented Sep 14, 2016 at 14:55
  • \$\begingroup\$ learning @200_success \$\endgroup\$ Commented Sep 14, 2016 at 15:30

1 Answer

4
\$\begingroup\$
import translate.translate
translate.translate.translate("hello")

The verbosity is strong with this one. The clarity is not.

Why do you have a translate in a translate in a translate? With constructs like this, you can bet there are better names available.
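One way to shorten the call, sketched here under assumptions: give the package and the module distinct names and re-export the function from the package's `__init__.py`. The names `translation` and `google_mobile` are hypothetical, not taken from your project, and `translate` is stubbed so the snippet runs without network access. The script builds the layout in a temporary directory purely so it can be tried out:

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk so the layout can be tried out.
# "translation" and "google_mobile" are hypothetical names.
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "translation")
os.makedirs(package_dir)

# translation/google_mobile.py holds the implementation (stubbed here).
with open(os.path.join(package_dir, "google_mobile.py"), "w") as handle:
    handle.write(
        "def translate(text, to_language='auto', language='auto'):\n"
        "    return text  # stub: the real function would query the site\n"
    )

# translation/__init__.py re-exports the function at package level.
with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write("from .google_mobile import translate\n")

# Make the temporary directory importable and load the package.
sys.path.insert(0, root)
translation = importlib.import_module("translation")

# The call site now names the package once and the action once.
print(translation.translate("hello"))
```

With a layout like this, users would write `from translation import translate` and never need to know which file the function lives in.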

Inside your main the same problem occurs:

to_translate = 'Bonjour comment allez vous?'
print("%s >> %s" % (to_translate, translate(to_translate)))
print("%s >> %s" % (to_translate, translate(to_translate, 'es')))
print("%s >> %s" % (to_translate, translate(to_translate, 'ar')))

That's 3 counts of translate per call. It's obvious you're translating something, but the rest is vague.

Apart from that, are you familiar with the Python Enhancement Proposals? Some of them include style suggestions, most notably PEP 8. You might want to take at least a good look at it.

The most notable PEP 8 violation in your code is that your lines are too long; PEP 8 limits lines to 79 characters. Long lines are hard for humans to read and parse. The readability of your code will improve if you split them up logically.

The ugly way of breaking up lines is with line continuations:

agent = {'User-Agent' : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)"}
agent2 = {'User-Agent' : "Mozilla/4.0 (\
compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; \
.NET CLR 2.0.50727; .NET CLR 3.0.04506.30)"}
print(agent)
print(agent2)

Those produce the same output.
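A less noisy option, offered as a sketch rather than something taken from the original code, is implicit concatenation of adjacent string literals inside parentheses, which PEP 8 prefers over backslashes. The name `agent3` is made up for this illustration:

```python
agent = {'User-Agent' : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)"}

# Adjacent string literals are joined at compile time, so no backslashes
# are needed as long as the pieces sit inside parentheses.
agent3 = {'User-Agent': (
    "Mozilla/4.0 ("
    "compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; "
    ".NET CLR 2.0.50727; .NET CLR 3.0.04506.30)"
)}

# Check that both spellings build the same header.
print(agent == agent3)
```

Unlike the backslash version, the continuation lines can be indented freely here, because whitespace between the literals is not part of the string.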

The proper way would be turning agent into a proper data format and parsing it on request or reading it from somewhere else. You really shouldn't have hard-coded strings that long in your code anyway.
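As a rough sketch of what "a proper data format" could look like, the header can be kept as structured parts and joined on request. The names `AGENT_PARTS` and `build_user_agent` and the field names are assumptions made for this illustration; the same parts could equally be read from a configuration file:

```python
# Hypothetical structure: the field names are illustrative assumptions.
AGENT_PARTS = {
    "product": "Mozilla/4.0",
    "details": [
        "compatible",
        "MSIE 6.0",
        "Windows NT 5.1",
        "SV1",
        ".NET CLR 1.1.4322",
        ".NET CLR 2.0.50727",
        ".NET CLR 3.0.04506.30",
    ],
}


def build_user_agent(parts):
    """Join the structured parts back into a User-Agent header value."""
    return "%s (%s)" % (parts["product"], "; ".join(parts["details"]))


agent = {"User-Agent": build_user_agent(AGENT_PARTS)}
print(agent["User-Agent"])
```

Each component now sits on its own short line, and changing or dropping one no longer means editing a 100-character literal.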

answered Sep 14, 2016 at 9:26
\$\endgroup\$
9
  • \$\begingroup\$ Because the translate() function is inside the translate.py file, which is inside the translate folder, which results in translate.translate.translate (I won't publish it like that, it's ugly, but I have no idea what I should name it) \$\endgroup\$ Commented Sep 14, 2016 at 9:35
  • \$\begingroup\$ @mou I understand why it looks like that. But it's a mess. \$\endgroup\$ Commented Sep 14, 2016 at 9:39
  • 1
    \$\begingroup\$ @mou Is there any way to change the contents of that string other than re-typing it into your IDE? No? Then it's hard-coded. \$\endgroup\$ Commented Sep 14, 2016 at 12:42
  • 1
    \$\begingroup\$ @mou Precisely. Because they are code. And code is, by definition, hard-coded. \$\endgroup\$ Commented Sep 14, 2016 at 12:53
  • 1
    \$\begingroup\$ @mou Better, yes :-) \$\endgroup\$ Commented Sep 14, 2016 at 13:18
