3

I have a number of codes to process. They arrive in several different formats, which I need to normalize first:

Examples of codes:

ABC1.12 - correct format
ABC 1.22 - space between letters and numbers
ABC1.12/13 - two codes joined together, with the leading 1. missing from 13; should be ABC1.12 and ABC1.13
ABC 1.12 / 1.13 - codes joined together and spaces

I know how to remove the spaces, but I am not sure how to handle the codes that have been joined. I know I can use the split function to create two codes, but not how to then prepend the letters (and the first number part) to the second code. These are the 3rd and 4th examples in the list above.

WHAT I HAVE SO FAR

 val = "ABC1.12/13"  # code
 retList = [val]
 if "/" in val:
     code1, code2 = val.split("/", 1)
     initial_letters, numbers = code1.split(".", 1)
     if initial_letters not in code2:
         code2 = initial_letters + '.' + code2
     # reset list so that it returns both values
     retList = [code1, code2]

This doesn't handle the 4th example, because code2 becomes ABC1.1.13.

asked Mar 26, 2012 at 13:00
2
  • @John do you know that all numbers of the form AAA 12.3/66 should be interpreted as AAA: 12.3 and AAA:1.66? How do you know that the "leading one" is stripped from the 66? Commented Mar 26, 2012 at 14:16
  • If there is a dot in the numbered part of the string, then both sides should start with the number(s) before the dot, followed by a dot, followed by the second set of numbers, e.g. XX1.11/12 would always be XX1.11 and XX1.12, not XX1.11 and XX12. If there is no dot in the string, we can assume there is no leading number, e.g. EFG10/12 would be EFG10 and EFG12. Commented Mar 26, 2012 at 15:12
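
The rule described in the comment above could be sketched roughly as follows in Python 3. This is only an illustration, not code from the question: the name `expand_codes` is made up, and it assumes every input starts with letters followed by at least one digit, and that only the first piece carries the letters.

```python
def expand_codes(raw):
    """Expand a possibly joined code string into a list of full codes."""
    # Drop all whitespace, then split the joined codes on "/".
    pieces = "".join(raw.split()).split("/")
    first = pieces[0]
    # The letters are everything before the first digit of the first piece.
    # (Assumption: there is always a digit; otherwise next() raises StopIteration.)
    digit_pos = next(i for i, ch in enumerate(first) if ch.isdigit())
    letters, first_number = first[:digit_pos], first[digit_pos:]
    # If the first number has a dot, later pieces without a dot
    # inherit everything up to and including that dot.
    if "." in first_number:
        lead = first_number.split(".", 1)[0] + "."
    else:
        lead = ""
    codes = [first]
    for piece in pieces[1:]:
        if "." not in piece:
            piece = lead + piece
        codes.append(letters + piece)
    return codes


for example in ["ABC1.12", "ABC 1.22", "ABC1.12/13", "ABC 1.12 / 1.13", "EFG10/12"]:
    print(example, "->", expand_codes(example))
```

Pieces that already contain a dot (the 4th example) are left alone, which avoids the ABC1.1.13 result.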

5 Answers

3

You can use a regex for this purpose.

A possible implementation would be as follows:

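>>> import re
>>> def foo(st):
...     # strip spaces, then split joined codes on "/"
...     parts = st.replace(' ', '').split("/")
...     # separate the leading letters from the first number
...     parts = list(re.findall("^([A-Za-z]+)(.*)$", parts[0])[0]) + parts[1:]
...     # split every number on the dot
...     parts = parts[0:1] + [x.split('.') for x in parts[1:]]
...     # numbers without a dot borrow the leading part of the first number
...     parts = parts[0:1] + ['.'.join(x) if len(x) > 1 else '.'.join([parts[1][0], x[0]]) for x in parts[1:]]
...     return [parts[0] + p for p in parts[1:]]
...
>>> foo('ABC1.12')
['ABC1.12']
>>> foo('ABC 1.22')
['ABC1.22']
>>> foo('ABC1.12/13')
['ABC1.12', 'ABC1.13']
>>> foo('ABC 1.12 / 1.13')
['ABC1.12', 'ABC1.13']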
answered Mar 26, 2012 at 13:25

2 Comments

Thanks, this is almost perfect. The only one that seems to be wrong is ABC1.12/13: I would like ABC1.13 rather than just ABC13.
see answer above for a detailed explanation
1

Are you familiar with regex? That would be an angle worth exploring here. Also, consider splitting on the space character, not just the slash and decimal.
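
To make that concrete, here is a rough Python 3 sketch of letting the regex absorb the optional spaces, so they never need to be stripped separately. The names `CODE_RE` and `split_code` are just illustrative, and the pattern assumes the first piece is letters followed by a number with an optional decimal part.

```python
import re

# Letters, optional spaces, then a number with an optional ".digits" part.
CODE_RE = re.compile(r"^\s*([A-Za-z]+)\s*(\d+(?:\.\d+)?)\s*$")


def split_code(text):
    """Return (letters, [number strings]) for a possibly joined code."""
    # Split on "/" together with any whitespace around it.
    parts = re.split(r"\s*/\s*", text.strip())
    match = CODE_RE.match(parts[0])
    if match is None:
        raise ValueError("unrecognised code: %r" % text)
    letters, number = match.groups()
    return letters, [number] + parts[1:]


print(split_code("ABC 1.12 / 1.13"))
print(split_code("ABC1.12/13"))
```

This only tokenizes the input; restoring the missing leading `1.` on the second number is still a separate step.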

answered Mar 26, 2012 at 13:21

0

I suggest you write a regular expression for each code pattern and then form a larger regular expression which is the union of the individual ones.
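
As a rough illustration of that idea in Python 3 (the pattern names and the `classify` helper are made up here, and only the four formats from the question are covered), each format gets its own named pattern and the union is built by joining them with `|`:

```python
import re

# One pattern per format; inner groups are non-capturing so that
# match.lastgroup reports the name of the alternative that matched.
FORMATS = {
    "single": r"[A-Za-z]+\s*\d+(?:\.\d+)?",
    "joined_short": r"[A-Za-z]+\s*\d+\.\d+\s*/\s*\d+",
    "joined_full": r"[A-Za-z]+\s*\d+\.\d+\s*/\s*\d+\.\d+",
}
UNION = re.compile(
    "|".join("(?P<%s>%s)" % (name, pattern) for name, pattern in FORMATS.items())
)


def classify(text):
    """Return the name of the format that matches the whole string, or None."""
    match = UNION.fullmatch(text.strip())
    return match.lastgroup if match else None


for example in ["ABC1.12", "ABC 1.22", "ABC1.12/13", "ABC 1.12 / 1.13"]:
    print(example, "->", classify(example))
```

Once the format is known, each case can be handed to its own small clean-up function.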

answered Mar 26, 2012 at 13:30

0

Using PyParsing

The answer by @Abhijit is a good one, and for this simple problem a regex may be the way to go. However, when dealing with parsing problems, you'll often need a more extensible solution that can grow with your problem. I've found that pyparsing is great for that: you write the grammar, and it does the parsing:

from pyparsing import *
index = Combine(Word(alphas))
# Define what a number is and convert it to a float
number = Combine(Word(nums)+Optional('.'+Optional(Word(nums))))
number.setParseAction(lambda x: float(x[0]))
# What do extra numbers look like?
marker = Word('/').suppress()
extra_numbers = marker + number
# Define what a possible line could be
line_code = Group(index + number + ZeroOrMore(extra_numbers))
grammar = OneOrMore(line_code)

From this definition we can parse the string:

S = '''ABC1.12
ABC 1.22
XXX1.12/13/77/32.
XYZ 1.12 / 1.13
'''
print grammar.parseString(S)

Giving:

[['ABC', 1.12], ['ABC', 1.22], ['XXX', 1.12, 13.0, 77.0, 32.0], ['XYZ', 1.12, 1.13]]

Advantages:

The numbers are now in the correct format, since we cast them to floats during parsing. Many more "numbers" are handled: look at the index "XXX", where values such as 1.12, 13 and 32. are all parsed, regardless of whether they contain a decimal point.

answered Mar 26, 2012 at 13:57

0

Take a look at this method. It might be the simplest way to do it.

val = unicode(raw_input())
for aChar in val:
    if aChar.isnumeric():
        lastIndex = val.index(aChar)
        break
part1 = val[:lastIndex].strip()
part2 = val[lastIndex:]
if "/" not in part2:
    print part1+part2
else:
    if " " not in part2:
        codes = []
        divPart2 = part2.split(".")
        partCodes = divPart2[1].split("/")
        for aPart in partCodes:
            codes.append(part1+divPart2[0]+"."+aPart)
        print codes
    else:
        codes = []
        divPart2 = part2.split("/")
        for aPart in divPart2:
            aPart = aPart.strip()
            codes.append(part1+aPart)
        print codes
answered Mar 26, 2012 at 16:03
