I have a number of codes to process. They arrive in several different formats, which I need to normalise into the correct format first:
Examples of codes:
ABC1.12 - correct format
ABC 1.22 - space between letters and numbers
ABC1.12/13 - 2 codes joined together and leading 1. missing from 13, should be ABC1.12 and ABC1.13
ABC 1.12 / 1.13 - codes joined together and spaces
I know how to remove the spaces but am not sure how to handle the codes which have been joined together. I know I can use the split function to create 2 codes, but I'm not sure how I can then prepend the letters (and first number part) to the second code. These are the 3rd and 4th examples in the list above.
WHAT I HAVE SO FAR
val = "ABC1.12/13"  # code to process
retList = [val]
if "/" in val:
    (code1, code2) = session_codes = val.split("/", 1)
    (initial_letters, numbers) = code1.split(".", 1)
    if initial_letters not in code2:
        code2 = initial_letters + '.' + code2
    # reset list so that it returns both values
    retList = [code1, code2]
This doesn't handle the split for example 4, because code2 becomes ABC1.1.13
5 Answers
You can use regex for this purpose.
A possible implementation would be as follows:
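>>> import re
>>> def foo(st):
...     # drop spaces, then split joined codes on the slash
...     parts = st.replace(' ', '').split("/")
...     # separate the leading letters from the first number
...     parts = list(re.findall("^([A-Za-z]+)(.*)$", parts[0])[0]) + parts[1:]
...     parts = parts[0:1] + [x.split('.') for x in parts[1:]]
...     # numbers without a '.' borrow the integer part of the first number
...     parts = parts[0:1] + ['.'.join(x) if len(x) > 1 else '.'.join([parts[1][0], x[0]]) for x in parts[1:]]
...     return [parts[0] + p for p in parts[1:]]
...
>>> foo('ABC1.12')
['ABC1.12']
>>> foo('ABC 1.22')
['ABC1.22']
>>> foo('ABC1.12/13')
['ABC1.12', 'ABC1.13']
>>> foo('ABC 1.12 / 1.13')
['ABC1.12', 'ABC1.13']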
Are you familiar with regex? That would be an angle worth exploring here. Also, consider splitting on the space character, not just the slash and decimal.
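To make that suggestion concrete, here is a minimal sketch (written for Python 3) that tokenises the input into letter runs and numbers, so spaces and slashes simply act as separators. The helper name `expand_codes` is made up for illustration, and the rule that a bare number such as 13 reuses the part before the dot from the first number is an assumption based on the examples in the question, not a general rule stated there.

```python
import re

# A token is either a run of letters or a number with an optional decimal part.
# Spaces and slashes match neither alternative, so they act as separators.
TOKEN = re.compile(r"[A-Za-z]+|\d+(?:\.\d+)?")

def expand_codes(raw):
    """Hypothetical helper: 'ABC 1.12 / 13' -> ['ABC1.12', 'ABC1.13']."""
    tokens = TOKEN.findall(raw)
    if len(tokens) < 2 or not tokens[0].isalpha():
        raise ValueError("unrecognised code: %r" % (raw,))
    letters, numbers = tokens[0], tokens[1:]
    if "." not in numbers[0] or any(t.isalpha() for t in numbers):
        raise ValueError("unrecognised code: %r" % (raw,))
    # Assumption: a number without a '.' reuses the part before the '.'
    # of the first number (so 13 after 1.12 becomes 1.13).
    major = numbers[0].split(".", 1)[0]
    return [letters + (n if "." in n else major + "." + n) for n in numbers]

for raw in ["ABC1.12", "ABC 1.22", "ABC1.12/13", "ABC 1.12 / 1.13"]:
    print(raw, "->", expand_codes(raw))
```

Inputs that do not start with letters followed by a dotted number raise a ValueError rather than being guessed at, which seems safer when the formats are this loose.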
I suggest you write a regular expression for each code pattern and then form a larger regular expression which is the union of the individual ones.
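As a rough sketch of what that could look like (Python 3), each format from the question gets its own pattern, and the patterns are joined with `|` into one expression. The labels `plain`, `spaced` and `joined` and the function name `classify` are illustrative names of my own, and the patterns only cover the four example formats.

```python
import re

# One pattern per format seen in the question; the labels are illustrative.
PATTERNS = [
    ("plain", r"[A-Za-z]+\d+\.\d+"),                       # ABC1.12
    ("spaced", r"[A-Za-z]+\s+\d+\.\d+"),                   # ABC 1.22
    ("joined", r"[A-Za-z]+\s*\d+\.\d+(?:\s*/\s*\d+(?:\.\d+)?)+"),  # ABC1.12/13
]

# The union: each individual pattern becomes a named alternative.
UNION = re.compile("|".join("(?P<%s>%s)" % (name, pat) for name, pat in PATTERNS))

def classify(code):
    """Return the label of the format that matches the whole code, or None."""
    match = UNION.fullmatch(code.strip())
    return match.lastgroup if match else None

for code in ["ABC1.12", "ABC 1.22", "ABC1.12/13", "ABC 1.12 / 1.13", "???"]:
    print(repr(code), "->", classify(code))
```

Once you know which format a code is in, you can hand it to a small clean-up routine specific to that format.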
Using PyParsing
The answer by @Abhijit is a good one, and for this simple problem regex may be the way to go. However, when dealing with parsing problems, you'll often need a more extensible solution that can grow with your problem. I've found that pyparsing is great for that. You write the grammar and it does the parsing:
from pyparsing import *
index = Combine(Word(alphas))
# Define what a number is and convert it to a float
number = Combine(Word(nums)+Optional('.'+Optional(Word(nums))))
number.setParseAction(lambda x: float(x[0]))
# What do extra numbers look like?
marker = Word('/').suppress()
extra_numbers = marker + number
# Define what a possible line could be
line_code = Group(index + number + ZeroOrMore(extra_numbers))
grammar = OneOrMore(line_code)
From this definition we can parse the string:
S = '''ABC1.12
ABC 1.22
XXX1.12/13/77/32.
XYZ 1.12 / 1.13
'''
print grammar.parseString(S)
Giving:
[['ABC', 1.12], ['ABC', 1.22], ['XXX', 1.12, 13.0, 77.0, 32.0], ['XYZ', 1.12, 1.13]]
Advantages:
The numbers are now in the correct format, as we've cast them to floats during parsing.
Many more "numbers" are handled: look at the index "XXX", where 1.12, 13 and 32. are all parsed, regardless of whether they contain a decimal point.
Take a look at this method. It might be the simplest and yet best way to do it.
val = unicode(raw_input())
# find the position of the first digit; everything before it is the letters
for aChar in val:
    if aChar.isnumeric():
        lastIndex = val.index(aChar)
        break
part1 = val[:lastIndex].strip()
part2 = val[lastIndex:]
if "/" not in part2:
    # single code: just join the letters and the number
    print part1+part2
else:
    if " " not in part2:
        # e.g. 1.12/13: reuse the part before the "." for every piece
        codes = []
        divPart2 = part2.split(".")
        partCodes = divPart2[1].split("/")
        for aPart in partCodes:
            codes.append(part1+divPart2[0]+"."+aPart)
        print codes
    else:
        # e.g. 1.12 / 1.13: each piece is already a full number
        codes = []
        divPart2 = part2.split("/")
        for aPart in divPart2:
            aPart = aPart.strip()
            codes.append(part1+aPart)
        print codes
AAA 12.3/66 should be interpreted as AAA: 12.3 and AAA:1.66? How do you know that the "leading one" is stripped from the 66?