12

Is there any recommended way to do multiple string substitutions other than doing replace chaining on a string (i.e. text.replace(a, b).replace(c, d).replace(e, f)...)? How would you, for example, implement a fast function that behaves like PHP's htmlspecialchars in Python?

I compared (1) multiple replace method, (2) the regular expression method, and (3) Matt Anderson's method.

With n=10 runs, the results came up as follows:

On 100 characters:

TIME: 0 ms [ replace_method(str) ]
TIME: 5 ms [ regular_expression_method(str, dict) ]
TIME: 1 ms [ matts_multi_replace_method(list, str) ]

On 1000 characters:

TIME: 0 ms [ replace_method(str) ]
TIME: 3 ms [ regular_expression_method(str, dict) ]
TIME: 2 ms [ matts_multi_replace_method(list, str) ]

On 10000 characters:

TIME: 3 ms [ replace_method(str) ]
TIME: 7 ms [ regular_expression_method(str, dict) ]
TIME: 5 ms [ matts_multi_replace_method(list, str) ]

On 100000 characters:

TIME: 36 ms [ replace_method(str) ]
TIME: 46 ms [ regular_expression_method(str, dict) ]
TIME: 39 ms [ matts_multi_replace_method(list, str) ]

On 1000000 characters:

TIME: 318 ms [ replace_method(str) ]
TIME: 360 ms [ regular_expression_method(str, dict) ]
TIME: 320 ms [ matts_multi_replace_method(list, str) ]

On 3687809 characters:

TIME: 1.277524 sec [ replace_method(str) ]
TIME: 1.290590 sec [ regular_expression_method(str, dict) ]
TIME: 1.116601 sec [ matts_multi_replace_method(list, str) ]

So kudos to Matt for beating the multi replace method on a fairly large input string.

Anyone got ideas for beating it on a smaller string?

sophros
17.2k11 gold badges52 silver badges84 bronze badges
asked Aug 5, 2010 at 0:54
3
  • Good discussion here stackoverflow.com/questions/3367809/… Commented Aug 5, 2010 at 2:05
  • Tim, only useful comment on the page is one by Alex. He gives an example for the linear regular expression substitution method that I verified as being slower on a 3.5M size document with 5 substitution pairs. So it doesn't provide new idea to me. Commented Aug 5, 2010 at 2:41
  • Do you require that the result of the first substitution be available for participating in the next substitution (as it would be in your example of replacement chaining)? Or do you want all substitutions to operate on the original text only? If the latter, do you have something in mind on how to prioritize them if the overlap or otherwise conflict? Commented Aug 5, 2010 at 4:14

3 Answers 3

7

Something like the following maybe? Split the text into pieces with the first "from" item to be replaced, then recursively split each of those parts into sub-parts with the next "from" item to be replaced, and so on, until you've visited all your replacements. Then join with the "to" replacement item for each as recursive function completes.

A little hard to wrap your head around the following code perhaps (it was for me, and I wrote it), but it seems to function as intended. I didn't benchmark it, but I suspect it would be reasonably fast.

def multi_replace(pairs, text):
 stack = list(pairs)
 stack.reverse()
 def replace(stack, parts):
 if not stack:
 return parts
 # copy the stack so I don't disturb parallel recursions
 stack = list(stack) 
 from_, to = stack.pop()
 #print 'split (%r=>%r)' % (from_, to), parts
 split_parts = [replace(stack, part.split(from_)) for part in parts]
 parts = [to.join(split_subparts) for split_subparts in split_parts]
 #print 'join (%r=>%r)' % (from_, to), parts
 return parts
 return replace(stack, [text])[0]
print multi_replace(
 [('foo', 'bar'), ('baaz', 'foo'), ('quux', 'moop')], 
 'foobarbaazfooquuxquux')

for:

barbarfoobarmoopmoop
answered Aug 5, 2010 at 5:26
1
  • 1
    Matt, thanks for the function. It beats multiple 'replace' method on a large string :) 'replace' method is still faster on a smaller string though (strings < 1M or so). I've added my test results to my original question. You might want to check it out. Commented Aug 5, 2010 at 6:58
1

Normally, .replace method beats all other methods. (See my benchmarks above.)

sophros
17.2k11 gold badges52 silver badges84 bronze badges
answered Aug 30, 2010 at 10:14
0

How fast? Also, how big are your strings?

There's a fairly simple recipe for building a regular expression to do the job on another site. It might need some tweaking to handle regex metacharacters; I didn't look too closely.

If that's not good enough, you probably need to write some C code, honestly. You can build a simple state machine to do all the replacements, and then process any string byte by byte with no backtracking along the machine to actually do the work. However, I doubt you will beat the regex engine without going to C and optimizing that.

answered Aug 5, 2010 at 1:02
3
  • "How fast?" - at least faster than the replace chaining method above. "how big" - up to what the system RAM allows minus memory space that would be used by the 'function'. I'm not talking about some gigantic string of, say, 4GiB size. Commented Aug 5, 2010 at 1:07
  • Based on my experiment with 3.5M size of a string with 5 substitution pairs, the regular expression method always performs worse (1.285878 sec vs 1.341442 sec), even with the caching of the resultant re object. If I increase the number of substitution pairs, the situation could be reversed. But under normal condition, it just doesn't happen. So the linear regular expression method in the recipe in this case is not usable. My tests were againt a more efficient version of it, actually. Commented Aug 5, 2010 at 2:34
  • Yeah, that method is not so great. With any luck we'll see a better answer soon. If not, I may see if implementing a state machine on top of the Python buffer interface works any better. Commented Aug 5, 2010 at 3:50

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.