Using QGIS 2.18.11, I have an attribute table with a column of over 500K records that contains many misspellings, abbreviations and erroneous characters, which I need to clean up on a regular basis. I can use regexp_replace in the field calculator, but I can only run one expression at a time, and I need to run approximately 150 expressions.
For example:

regexp_replace("DirtyNames", '([-\/~!@#$%^&*{}:"<>?|/.,;=_+]?)', '')
and
regexp_replace("DirtyNames", 'TRL', 'TRAIL')

I think a Python script would likely be the easiest way to do this, but I don't know Python. The table is "Local_Streets", the field is "DirtyNames", and the regexp results should go to the "CleanedNames" field.

# Run_Regexp_Statements
####################################
from qgis.core import QgsField
from qgis.utils import iface
from PyQt4.QtCore import QVariant

_DIRTY_FIELD = 'DirtyNames'
_CLEAN_FIELD = 'CleanedNames'

layer = iface.activeLayer()
layer.startEditing()
# Create a field to store the results
layer.dataProvider().addAttributes(
    [QgsField(_CLEAN_FIELD, QVariant.String)])
layer.updateFields()
# **WHAT NEEDS TO GO HERE to be able to run the regexps?**
# regexp_replace("DirtyNames" ,'([-\/~!@#$%^&*{}:"<>?|/.,;=_+]?)', '')
# regexp_replace("DirtyNames" ,'TRL', 'TRAIL')
layer.commitChanges()
print 'Processing complete.'
PolyGeo
asked Sep 4, 2017 at 19:09
1 Answer

You'll need the re library.

import re
# dirty_line is a single string value, not a field name
replaced = re.sub(r'([-\/~!@#$%^&*{}:"<>?|/.,;=_+]?)', r'replacementpattern', dirty_line)

More info.
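
With around 150 expressions, chaining re.sub calls by hand gets unwieldy. A minimal sketch of one way to organize them is an ordered list of (pattern, replacement) pairs applied in sequence to one string. The names RULES, COMPILED and clean_name are hypothetical, and the \b word boundaries around TRL are an assumption added here so that only the standalone abbreviation is expanded; the question's own expression uses a plain 'TRL'.

```python
import re

# Hypothetical rule list: (pattern, replacement) pairs, applied in order.
# The two rules mirror the expressions in the question; further rules
# would be appended in the same form.
RULES = [
    (r'[-/~!@#$%^&*{}:"<>?|.,;=_+]', ''),  # strip unwanted punctuation
    (r'\bTRL\b', 'TRAIL'),                  # expand abbreviation (assumed: whole word only)
]

# Compile once so the patterns are not re-parsed for every record.
COMPILED = [(re.compile(pattern), replacement) for pattern, replacement in RULES]


def clean_name(value):
    """Apply every rule in order to one string; return None unchanged."""
    if value is None:
        return None
    for pattern, replacement in COMPILED:
        value = pattern.sub(replacement, value)
    return value
```

Rule order matters: here the punctuation is stripped first, so a value such as 'OLD MILL TRL.' becomes 'OLD MILL TRL' before the abbreviation rule runs.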

answered Sep 4, 2017 at 19:39
  • Could the next line in your code be replaced = re.sub('TRL','TRAIL',replaced), and so on for the other expressions? Commented Sep 4, 2017 at 19:48
  • Thanks Ena2345 and gisnside for your replies. I get an invalid syntax error on both. I looked at the link provided by Ena, but I can't see what I'm doing wrong:
    import re
    replaced = re.sub('TRL','TRAIL',r'replacementpattern',_DIRTY_FIELD)
    Commented Sep 4, 2017 at 20:46
  • The reason it doesn't work is that re.sub only works on one value at a time, so you would need to loop through the layer's features. That operation costs O(n). I don't know how long such operations take in QGIS, so I can't estimate the running time. Commented Sep 5, 2017 at 21:11
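
The loop described in the last comment could look roughly like the sketch below. It is not a tested QGIS script: clean_layer_field is a hypothetical helper, the field names are taken from the question, and clean_func stands for any function that takes one string and returns the cleaned string (for example, a chain of re.sub calls). It assumes the "CleanedNames" field already exists on the layer.

```python
try:
    string_types = basestring  # Python 2 (QGIS 2.x)
except NameError:
    string_types = str         # Python 3


def clean_layer_field(layer, clean_func,
                      dirty_field='DirtyNames', clean_field='CleanedNames'):
    """Write clean_func(dirty value) into clean_field for every feature.

    Returns the number of features updated. One pass over the layer, so O(n).
    """
    clean_idx = layer.fields().indexFromName(clean_field)
    if clean_idx == -1:
        raise ValueError('Field not found: %s' % clean_field)

    updated = 0
    layer.startEditing()
    for feature in layer.getFeatures():
        value = feature[dirty_field]
        if not isinstance(value, string_types):
            continue  # skip NULLs and non-text values
        layer.changeAttributeValue(feature.id(), clean_idx, clean_func(value))
        updated += 1
    layer.commitChanges()
    return updated
```

In the QGIS Python console this would be called as clean_layer_field(iface.activeLayer(), my_cleaner), with the "Local_Streets" layer selected and my_cleaner being your own cleaning function.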
