Replacing non-English characters in attribute tables using ArcPy and Python?

Question 1

I have a few shapefiles where some of the attributes contain the non-English characters ÅÄÖ. Since some queries don't work with these characters (specifically ChangeDetector), I tried to replace them in advance with a simple script and write the new strings to another field.

The character replacement itself works fine, but updating the field with arcpy.UpdateCursor does not.

What is an appropriate way of solving this?

I have also tried doing this via the Field Calculator, pasting the code into the code block, with the same error.

Error message (original, since struck out):
Runtime error
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:/gis/python/teststring.py", line 28, in <module>
    val = code(str(prow.Typkod))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc4' in position 3: ordinal not in range(128)

Code:

# -*- coding: cp1252 -*-
def code(infield):
    data = ''
    for i in infield:
##        print i
        if i == 'Ä':
            data = data + 'AE'
        elif i == 'ä':
            data = data + 'ae'
        elif i == 'Å':
            data = data + 'AA'
        elif i == 'å':
            data = data + 'aa'
        elif i == 'Ö':
            data = data + 'OE'
        elif i == 'ö':
            data = data + 'oe'
        else:
            data = data + i
    return data
shp = r'O:\XXX\250000\DB\ArcView\shape.shp'
prows = arcpy.UpdateCursor(shp)
for prow in prows:
    val = code(unicode(str(prow.Typkod), "utf-8"))
    prow.Typkod_U = val
    print val
    prows.updateRow(prow)

The values of Typkod are of the type: [D, D, S, DDRÄ, TRÄ] etc.

I use ArcMap Basic (10.1) on Windows 7.

New error message:
Runtime error
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:/gis/python/teststring.py", line 29, in <module>
    val = code(unicode(str(row.Typkod), "utf-8"))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc4' in position 3: ordinal not in range(128)

>>> val
'DDRÄ'
>>> type(val)
<type 'str'>

It appears as if the output from the function is wrong somehow. When there's ÅÄÖ involved it returns data = u'DDR\xc4' and not (as was my intention) data = 'DDRAE'. Any suggestions on what might cause this?
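
One reading of the traceback, offered as an assumption rather than a diagnosis: the cursor hands back Typkod as a unicode object, and in Python 2 str() on a unicode object implicitly encodes it with the ASCII codec, which fails on Ä before unicode(..., "utf-8") ever runs. The sketch below reproduces that failure outside ArcMap. The helper name ascii_error_position is made up for illustration, and the explicit .encode('ascii') stands in for the implicit conversion so the snippet also runs on Python 3.

```python
# -*- coding: utf-8 -*-
# Illustrative only: no arcpy involved. The sample value mimics what a
# cursor might return for the Typkod field (assumption).
value = u'DDR\xc4'  # u'DDRÄ'

def ascii_error_position(text):
    """Return the index of the first character ASCII cannot encode, or None."""
    try:
        # Python 2's str(text) performs this encoding implicitly.
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc.start
    return None

print(ascii_error_position(value))   # 3
print(ascii_error_position(u'DDS'))  # None
```

The index 3 lines up with the "position 3" reported in the traceback for a value such as DDRÄ.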

Question 4

Turns out iterating over ÅÄÖ wasn't that easy. The field value comes back as a unicode string, so the if-statements have to compare against the unicode escapes instead of the literal ÅÄÖ. After I figured that out, the rest was a piece of cake :)

Resulting code:

# -*- coding: cp1252 -*-
import arcpy
def code(infield):
    data = ''
    for i in infield:
##        print i
        if i == u'\xc4': #Ä
            data = data + 'AE'
        elif i == u'\xe4': #ä
            data = data + 'ae'
        elif i == u'\xc5': #Å
            data = data + 'AA'
        elif i == u'\xe5': #å
            data = data + 'aa'
        elif i == u'\xd6': #Ö
            data = data + 'OE'
        elif i == u'\xf6': #ö
            data = data + 'oe'
        else:
            data = data + i
    return data
shp = arcpy.GetParameterAsText(0)
field = arcpy.GetParameterAsText(1)
newfield = field + '_U'
arcpy.AddField_management(shp, newfield, 'TEXT')
prows = arcpy.UpdateCursor(shp)
for row in prows:
    # Look the fields up by name: row.field would read an attribute literally called "field"
    row.setValue(newfield, code(row.getValue(field)))
    prows.updateRow(row)
del prows # release the lock on the shapefile
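
The if/elif chain above could also be written as a lookup table, which keeps the whole mapping in one place. This is only a sketch: REPLACEMENTS and transliterate are hypothetical names, and it assumes the field value arrives as a unicode string, as described in this post.

```python
# -*- coding: utf-8 -*-
# Hypothetical lookup table mirroring the if/elif chain in the post.
REPLACEMENTS = {
    u'\xc4': u'AE',  # Ä
    u'\xe4': u'ae',  # ä
    u'\xc5': u'AA',  # Å
    u'\xe5': u'aa',  # å
    u'\xd6': u'OE',  # Ö
    u'\xf6': u'oe',  # ö
}

def transliterate(text):
    """Replace each mapped character; leave every other character as it is."""
    return u''.join(REPLACEMENTS.get(ch, ch) for ch in text)

print(transliterate(u'DDR\xc4'))  # DDRAE
print(transliterate(u'D'))        # D
```

Adding another character then means adding one dictionary entry instead of another elif branch.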

Question 5

See if the following works:

val = code(unicode(str(prow.Typkod), "utf-8"))

Question 6

Thanks! That helped with assigning val, but not with writing it to the current row (the following line). [Updating the question with this modification.]

Question 7

You mean this line now fails: prow.Typkod_U = val? With the same error? So what's the val value after the conversion?

Question 8

I added some new information, including the new error message.

score 9 · Accepted Answer · 2013-12-28 08:46:27Z

I, too, quite often deal with special characters such as those in Swedish (ä, ö, å), as well as others found in languages such as Portuguese and Spanish (é, í, ú, ó, etc.). For instance, I have data where the city name is written in plain Latin characters with all the accents removed, so "Göteborg" becomes "Goteborg" and "Åre" becomes "Are". To perform joins and match the data, I have to replace the accented characters with their plain Latin equivalents.

I used to do this the way you've shown in your own answer, but that logic soon became rather cumbersome to maintain. Now I use the unicodedata module, which ships with Python, together with arcpy for iterating over the features.

import unicodedata
import arcpy
import os
def strip_accents(s):
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')
arcpy.env.workspace = r"C:\TempData_processed.gdb"
workspace = arcpy.env.workspace
in_fc = os.path.join(workspace,"FC")
fields = ["Adm_name","Adm_Latin"]
with arcpy.da.UpdateCursor(in_fc,fields) as upd_cursor:
    for row in upd_cursor:
        row[1] = strip_accents(u"{0}".format(row[0]))
        upd_cursor.updateRow(row)

For more information about using the unicodedata module, see the question "What is the best way to remove accents in a python unicode string?"
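
To see what strip_accents does without touching a geodatabase, here is a small standalone sketch that uses the same function on a few sample strings. Two things are worth noticing: Ä becomes A rather than the AE used in the question's mapping, and letters that have no Unicode decomposition (ø is used here as an example) pass through unchanged.

```python
# -*- coding: utf-8 -*-
import unicodedata

def strip_accents(s):
    """Decompose to NFD, then drop the combining marks (category 'Mn')."""
    return u''.join(c for c in unicodedata.normalize('NFD', s)
                    if unicodedata.category(c) != 'Mn')

print(strip_accents(u'G\xf6teborg'))        # Goteborg
print(strip_accents(u'\xc5re'))             # Are
print(strip_accents(u'DDR\xc4'))            # DDRA
# U+00F8 (ø) has no canonical decomposition, so it is left untouched.
print(strip_accents(u'\xf8') == u'\xf8')    # True
```

If the target values need AE/AA/OE style spellings, a character mapping like the one in the question is still required.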

I see how this could be useful, but what if we need to keep the characters as is? Could we do some magic to keep those special characters?
