Parsing strings in Python

Question 1

So my problem is this, I have a file that looks like this:

[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1

This would of course translate to

' This is an example file!'

I am looking for a way to parse the original content into the end content, so that a [BACKSPACE] deletes the last character (spaces included) and multiple backspaces delete multiple characters. The [SHIFT] doesn't really matter as much to me. Thanks for all the help!

Question 2

Are [BACKSPACE] and [SHIFT] the only markups that you need to worry about?

Question 3

Here's one way, but it feels hackish. There's probably a better way.

def process_backspaces(text, token='[BACKSPACE]'):
    """Delete the character before each occurrence of "token" in a string."""
    output = ''
    # The appended space is a sentinel that the last [:-1] removes again.
    for item in (text + ' ').split(token):
        output += item
        output = output[:-1]
    return output

def process_shifts(text, token='[SHIFT]'):
    """Replace the character after each occurrence of "token" with its
    uppercase equivalent. (Doesn't turn "1" into "!" or "2" into "@", however!)"""
    output = ''
    # The prepended space is a sentinel so text before the first token is left alone.
    for item in (' ' + text).split(token):
        output += item[:1].upper() + item[1:]
    return output[1:]  # drop the sentinel space

test_string = '[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1'
print(process_backspaces(process_shifts(test_string)))

Question 4

If you don't care about the shifts, just strip them, load

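(defun apply-bspace ()
  (interactive)
  ;; NOERROR is t, so the recursion stops quietly when no token is left.
  (when (search-forward "[BACKSPACE]" nil t)
    ;; 11 characters of token plus the one character it erases.
    (backward-delete-char 12)
    (apply-bspace)))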

and hit M-x apply-bspace while viewing your file. It's Elisp, not Python, but it fits your initial requirement of "something I can download for free to a PC".

Edit: Shift is trickier if you want to apply it to numbers too (so that [SHIFT]2 => @, [SHIFT]3 => #, etc). The naive way that works on letters is

(defun apply-shift ()
  (interactive)
  ;; NOERROR is t, so the recursion stops quietly when no token is left.
  (when (search-forward "[SHIFT]" nil t)
    (backward-delete-char 7)
    (upcase-region (point) (+ 1 (point)))
    (apply-shift)))

Question 5

+1 for an Elisp answer! It's (not too surprisingly, I guess) quite good at this sort of thing... I'm a vim person, personally, but things like this sometimes pull me towards emacs.

Question 6

@Joe Kington - Hehe. To be truthful, this is the sort of thing I'd handle with a keyboard macro and maybe an alist unless there were multiple, large files that needed parsing. It's just that a function is easier to share and explain.

Question 7

This does exactly what you want:

def shift(s):
    LOWER = '`1234567890-=[];\'\\,./'
    UPPER = '~!@#$%^&*()_+{}:"|<>?'
    if s.isalpha():
        return s.upper()
    elif s in LOWER:
        return UPPER[LOWER.index(s)]
    return s  # no shifted form known (e.g. a space)

def parse(text):
    pieces = text.split("[BACKSPACE]")
    answer = pieces[0]
    for piece in pieces[1:]:
        # Every token boundary erases one character typed so far.
        answer = answer[:-1] + piece
    parts = answer.split("[SHIFT]")
    # Text before the first [SHIFT] is kept as typed.
    return parts[0] + ''.join(shift(p[0]) + p[1:] for p in parts[1:] if p)

>>> print(parse("[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1"))
This is an example file!

Question 8

Oops, I just spotted a bug... sorry. Fixing it now

Question 9

The debugging is complete and the result is now exactly what you want.

Question 10

It seems that you could use a regular expression to search for (something)[BACKSPACE] and replace it with nothing...

re.sub(r'.?\[BACKSPACE\]', '', YourString.replace('[SHIFT]', ''))

Not sure what you meant by "multiple spaces delete multiple characters".

Question 11

-1 How will this work for "blah[BACKSPACE][BACKSPACE][BACKSPACE]arf"?

Question 12

But it needs to delete one character BEFORE the backspace as well as the '[BACKSPACE]' itself

Question 13

That's my point -- gahooa's solution won't work for my blah-barf example.

Question 14

Yeah, I just saw what you were saying. So far the only way I can think of to do it would combine Python with AutoIt or another manual macro/automation service, but the results would be tedious at best and possibly not 100% functional.
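
For what it's worth, one possible way to keep the regular-expression idea and still handle runs of consecutive backspaces is to apply the substitution one token at a time, leftmost first, until no token remains. This is only a sketch: the function name apply_backspaces is made up for illustration, and [SHIFT] is simply discarded, as in the one-liner above.

```python
import re

# One optional character followed by the token; DOTALL lets a newline be erased too.
BACKSPACE_RE = re.compile(r'.?\[BACKSPACE\]', re.DOTALL)

def apply_backspaces(text):
    """Hypothetical helper: resolve [BACKSPACE] tokens one at a time."""
    text = text.replace('[SHIFT]', '')  # shifts are ignored in this sketch
    while '[BACKSPACE]' in text:
        # count=1 handles only the leftmost token, so the character in front
        # of it is always real text and never part of another token.
        text = BACKSPACE_RE.sub('', text, count=1)
    return text

print(apply_backspaces("blah[BACKSPACE][BACKSPACE][BACKSPACE]arf"))
```

Each pass rescans the string, so this is quadratic in the worst case. That should be fine for a small file, but a single left-to-right pass would scale better.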

Question 15

You need to read the input, extract the tokens, recognize them, and give a meaning to them.

This is how I would do it:

# -*- coding: utf-8 -*-
import re

# Shifted symbols for digit keys; extend this for the keyboard layout you need.
upper_value = {
    1: '!', 2: '"',
}
# A token is either a bracketed tag such as [SHIFT] or a single character.
tokenizer = re.compile(r'(\[.*?\]|.)')
origin = "[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1"
result = ""
shift = False
for token in tokenizer.findall(origin):
    if not token.startswith("["):
        if shift:
            shift = False
            try:
                token = upper_value[int(token)]
            except (ValueError, KeyError):
                # Not a digit, or a digit without a mapping: fall back to upper().
                token = token.upper()
        result = result + token
    else:
        if token == "[SHIFT]":
            shift = True
        elif token == "[BACKSPACE]":
            result = result[0:-1]
print(result)

It's neither the fastest nor the most elegant solution, but I think it's a good start.

Hope it helps :-)
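
The same read, tokenize, recognize and interpret steps could also be wrapped in a reusable function, roughly as sketched below. The names replay_keys, TOKEN_RE and SHIFTED_DIGITS are invented for this example, and the digit-row symbols assume a US keyboard layout (the thread itself only confirms that shifted "1" is "!"). A list serves as a stack of typed characters, so a backspace is just a pop.

```python
import re

# Assumption: US-layout symbols for the shifted digit row.
SHIFTED_DIGITS = dict(zip("1234567890", "!@#$%^&*()"))

# Only the two known tags are tokens; anything else is a single character.
TOKEN_RE = re.compile(r"\[SHIFT\]|\[BACKSPACE\]|.", re.DOTALL)

def replay_keys(text):
    """Hypothetical helper: replay a key log and return the typed text."""
    typed = []             # stack of characters typed so far
    shift_pending = False  # True when the next character should be shifted
    for token in TOKEN_RE.findall(text):
        if token == "[SHIFT]":
            shift_pending = True
        elif token == "[BACKSPACE]":
            if typed:      # a backspace on empty input does nothing
                typed.pop()
        else:
            if shift_pending:
                token = SHIFTED_DIGITS.get(token, token.upper())
                shift_pending = False
            typed.append(token)
    return "".join(typed)

print(replay_keys("[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1"))
# prints: This is an example file!
```

Because the tokenizer matches only the two known tags, a literal "[" in the text is kept as an ordinary character.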

Joe Kington · Accepted Answer · 2011-02-03 03:26:02Z
