I have a CSV file with address data, mostly in Finnish. I need to read the file and get geocode information for these addresses, but it doesn't work for letters of the Finnish alphabet (such as ä and ö) and says it can't read them. Can anybody please help me with this?
```python
import urllib, urllib2, time

addr_file = 'address.csv'
out_file = 'addresses_geocoded.csv'
out_file_failed = 'failed.csv'
sleep_time = 2
root_url = "http://maps.google.com/maps/geo?"
gkey = "asfasdfasdfasdf"  # not an actual value

return_codes = {'200': 'SUCCESS',
                '400': 'BAD REQUEST',
                '500': 'SERVER ERROR',
                '601': 'MISSING QUERY',
                '602': 'UNKNOWN ADDRESS',
                '603': 'UNAVAILABLE ADDRESS',
                '604': 'UNKNOWN DIRECTIONS',
                '610': 'BAD KEY',
                '620': 'TOO MANY QUERIES'}

def geocode_for_musiquitous(addr_file, out_fmt='csv'):
    # note: addr_file here is a single address string, not the file name
    # encode our dictionary of URL parameters
    values = {'q': addr_file, 'output': out_fmt, 'key': gkey}
    data = urllib.urlencode(values)
    # set up our request
    url = root_url + data
    req = urllib2.Request(url)
    # make the request and read the response
    response = urllib2.urlopen(req)
    geodat = response.read().split(',')
    response.close()
    # handle the data returned by Google
    code = return_codes[geodat[0]]
    if code == 'SUCCESS':
        code, precision, lat, lng = geodat
        return {'code': code, 'precision': precision, 'lat': lat, 'lng': lng}
    else:
        return {'code': code}

def main():
    # open the input and output files
    outf = open(out_file, 'w')
    outf_failed = open(out_file_failed, 'w')
    inf = open(addr_file, 'r')
    for address in inf:
        address = address.strip()  # drop the trailing newline
        # get the latitude and longitude of the address
        data = geocode_for_musiquitous(address)
        # print the results and log them to file
        if len(data) > 1:
            print "Latitude and Longitude of " + address + ":"
            print "\tLatitude:", data['lat']
            print "\tLongitude:", data['lng']
            outf.write(address + ',' + data['lat'] + ',' + data['lng'] + '\n')
            outf.flush()
        else:
            print "Geocoding of '" + address + "' failed with error code " + data['code']
            outf_failed.write(address + '\n')
            outf_failed.flush()
        time.sleep(sleep_time)
    # clean up
    inf.close()
    outf.close()
    outf_failed.close()

if __name__ == "__main__":
    main()
```
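Background, not part of the original question: the same Finnish letter is stored as different bytes depending on the file's encoding, so text read with the wrong codec comes out garbled or fails to decode. A minimal Python 3 sketch (the question's code is Python 2), using "Hämeenlinna" purely as an illustrative sample word:

```python
# Python 3 sketch: one Finnish word stored under two common encodings.
word = "Hämeenlinna"  # illustrative sample place name

latin1_bytes = word.encode("iso-8859-1")  # one byte per letter: 'ä' -> 0xE4
utf8_bytes = word.encode("utf-8")         # 'ä' -> two bytes: 0xC3 0xA4

print(latin1_bytes)  # b'H\xe4meenlinna'
print(utf8_bytes)    # b'H\xc3\xa4meenlinna'

# Decoding with the wrong codec can silently garble the text ("mojibake")...
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)       # HÃ¤meenlinna

# ...or fail outright, because 0xE4 followed by 'm' is not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```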
Comment (jcdyer, Feb 9, 2010): "It says it can't read those." That is, I assure you, not what it says. It is much easier to debug if you tell us exactly what Python says: paste in the error message and stack trace. That will tell us exactly which line the problem is on, so we don't have to wade through your entire program to find it.
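One way to collect the full error message and stack trace for each failing address, rather than a generic failure, is to catch the exception and log `traceback.format_exc()`. A minimal Python 3 sketch; `run_and_capture` and `encode_ascii` are hypothetical helpers for illustration, not part of the question's code:

```python
import traceback

def run_and_capture(func, *args):
    """Call func(*args); return (result, None) on success,
    or (None, full_traceback_text) if it raises."""
    try:
        return func(*args), None
    except Exception:
        # format_exc() returns the text Python would print for an uncaught
        # exception: the stack trace followed by the error message.
        return None, traceback.format_exc()

def encode_ascii(text):
    # Hypothetical stand-in for a step that cannot handle non-ASCII input.
    return text.encode("ascii")

result, tb = run_and_capture(encode_ascii, "Hämeenlinna")
if tb is not None:
    print(tb)  # ends with "UnicodeEncodeError: 'ascii' codec can't encode ..."
```

In the question's loop, the captured text could be written to the failed-addresses file next to the address.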
4 Answers

Answer 1

The argument passed to urllib.urlencode should be UTF-8 encoded beforehand:
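```python
# inside geocode_for_musiquitous(), where addr_file holds one address
# as a unicode string (e.g. a line read via codecs.open)
addr_file = addr_file.encode("utf-8")
values = {'q' : addr_file, 'output':out_fmt, 'key':gkey}
data = urllib.urlencode(values)
```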
Also make sure you open the CSV file with the correct encoding (it might be "windows-1252" or "iso-8859-1"), so that each line is decoded into a unicode string:

```python
import codecs

inf = codecs.open(addr_file, 'r', 'iso-8859-1')
```
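For comparison, a minimal Python 3 sketch of the query-building step (the code in this thread is Python 2). In Python 3, `urllib.parse.urlencode` percent-encodes `str` values as UTF-8 by default, so no manual `.encode()` is needed once the file has been read with the right encoding. `build_query` and the placeholder key are hypothetical, and the geocoding endpoint from the question is not called:

```python
from urllib.parse import urlencode

def build_query(address, out_fmt="csv", key="not-a-real-key"):
    """Return the URL query string for one address (hypothetical helper).

    urlencode() percent-encodes str values as UTF-8 by default,
    so 'ä' becomes %C3%A4.
    """
    values = {"q": address, "output": out_fmt, "key": key}
    return urlencode(values)

print(build_query("Hämeenlinna"))
# q=H%C3%A4meenlinna&output=csv&key=not-a-real-key
```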
Answer 2
I don't know Python, but I'm pretty sure this is an encoding issue.
Make sure your address file is UTF-8 encoded and that the urlencode() function you use can deal with UTF-8 characters (the latter shouldn't be a problem, though, as Python can handle UTF-8 natively as far as I know).
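To check the first point, whether the address file really is UTF-8, one option is to try decoding its raw bytes strictly as UTF-8. A minimal Python 3 sketch; `looks_like_utf8` and `file_looks_like_utf8` are hypothetical helpers. Passing the check only shows the bytes are valid UTF-8 (plain ASCII passes too), not that the file was deliberately saved as UTF-8:

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True

def file_looks_like_utf8(path: str) -> bool:
    # Read raw bytes ("rb") so nothing is decoded before the check.
    with open(path, "rb") as f:
        return looks_like_utf8(f.read())
```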
Answer 3
Use the codecs module.
codecs.open(filename, mode[, encoding[, errors[, buffering]]])

Open an encoded file using the given mode and return a wrapped version providing transparent encoding/decoding. The default file mode is 'r', meaning to open the file in read mode.
You can use wrapped file objects to read encoded files into unicode strings.
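In Python 3, the built-in `open()` takes an `encoding` argument and performs the same transparent decoding, and the `csv` module can read from the resulting text file. A minimal sketch; the ISO-8859-1 encoding, the two-column layout, the sample rows, and `read_address_rows` are all assumptions for illustration:

```python
import csv
import os
import tempfile

def read_address_rows(path, encoding="iso-8859-1"):
    """Read a CSV file into a list of rows of str (hypothetical helper)."""
    # The csv module documentation recommends newline="" for files it reads.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))

# Usage: write a small ISO-8859-1 file, then read it back as text.
rows = [["Hämeenkatu 1", "Tampere"], ["Åbo Akademi", "Turku"]]
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="", encoding="iso-8859-1") as f:
    csv.writer(f).writerows(rows)

read_back = read_address_rows(path)
os.remove(path)
print(read_back)
```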
Answer 4
You need to open the file with the correct encoding, using the codecs module. The correct encoding for Finnish text is probably ISO-8859-1:

```python
import codecs

inf = codecs.open(addr_file, 'r', 'iso-8859-1')
```
If this is not the correct encoding for your file, you need to find out what the correct encoding is, then check whether a codec for it is available, like this:

```python
import codecs

codec = codecs.lookup("iso-8859-1")
print codec.name
```
If codecs.lookup() returns a codec object for the encoding you are looking for, it is available and can be used in codecs.open(); otherwise it raises LookupError.
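Finding the correct encoding can be partly automated by trying a few candidate codecs in order. A minimal Python 3 sketch; `codec_available`, `guess_encoding`, and the candidate list are assumptions. ISO-8859-1 maps every byte to some character, so it never fails to decode and has to come last; a successful decode does not prove the guess is right, so it is worth checking the decoded text by eye:

```python
import codecs

def codec_available(name):
    """Return the codec's normalized name, or None if Python has no such codec."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None

def guess_encoding(data, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return the first available candidate that decodes data without errors."""
    for enc in candidates:
        if codec_available(enc) is None:
            continue
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None

print(codec_available("iso-8859-1"))     # iso8859-1
print(codec_available("no-such-codec"))  # None
```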