Getting columns from lines of uneven length

Question 1

I am given data pasted from a table, so the fields are delimited by spaces, but spaces also occur inside some of the fields I don't care about. I want the first field and the last three fields, and I got them with this code:

testdata = """8/5/15 stuffidontneed custid locid 55.00
8/9/15 stuff i really dont need with extra spaces custid otherlocid 79.00"""
rows = testdata.split('\n')
tupls = [row.split(' ') for row in rows]
dates = [tupl[0] for tupl in tupls]
custids, locids, amounts = ([tupl[i] for tupl in tupls] for i in range(-3, 0))
print(dates, custids, locids, amounts)
# ['8/5/15', '8/9/15'] ['custid', 'custid'] ['locid', 'otherlocid'] ['55.00', '79.00']

I just thought there might be a more elegant way to do things, maybe capturing the data in the middle as a single field.

Edit: I have attempted to add delimiters using re.finditer, but I can't replace the matches easily.

Question 2

After you have tupls:

data = [(t[0], t[-3], t[-2], t[-1]) for t in tupls] # Or use range...
print(list(zip(*data)))

Gives:

[('8/5/15', '8/9/15'), ('custid', 'custid'), ('locid', 'otherlocid'), ('55.00', '79.00')]

So this:

testdata = """8/5/15 stuffidontneed custid locid 55.00
8/9/15 stuff i really dont need with extra spaces custid otherlocid 79.00"""
rows = testdata.split('\n')
tupls = [row.split(' ') for row in rows]
dates = [tupl[0] for tupl in tupls]
custids, locids, amounts = ([tupl[i] for tupl in tupls] for i in range(-3, 0))
print(dates, custids, locids, amounts)

Becomes:

testdata = """8/5/15 stuffidontneed custid locid 55.00
8/9/15 stuff i really dont need with extra spaces custid otherlocid 79.00"""
rows = testdata.split('\n')
tupls = [row.split(' ') for row in rows]
data = [(t[0], t[-3], t[-2], t[-1]) for t in tupls] # Or use range...
print(list(zip(*data)))

Question 3

So basically, your first line drops the irregular data, leaving a matrix whose columns are the data we want. You then pass its rows to zip as separate tuples with argument unpacking, and zip puts them together so that you get the columns from an iterator.

Question 4

Yup. That's what is going on.
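
The drop-then-transpose steps described above can be sketched step by step as follows. This is only an illustration of how zip(*data) behaves on the sample data; the names rows and columns are chosen here for clarity and are not from the original answer.

```python
testdata = """8/5/15 stuffidontneed custid locid 55.00
8/9/15 stuff i really dont need with extra spaces custid otherlocid 79.00"""

# Step 1: keep only the wanted fields, so every row has exactly four items.
rows = [line.split(' ') for line in testdata.split('\n')]
data = [(t[0], t[-3], t[-2], t[-1]) for t in rows]

# Step 2: zip(*data) is the same call as zip(data[0], data[1], ...).
# It groups the i-th item of every row, so it yields the columns.
columns = list(zip(*data))
assert columns == list(zip(data[0], data[1]))

# Each column comes back as a tuple; unpack them into names if needed.
dates, custids, locids, amounts = columns
print(dates)    # ('8/5/15', '8/9/15')
print(amounts)  # ('55.00', '79.00')
```

In Python 3, zip returns a lazy iterator that yields one tuple per column, which is why the answer wraps it in list() before printing.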

Question 5

It's amazing how much more readable the code is just from using "t in tupls" instead of "tupl in tupls".

Question 6

There are more succinct ways of writing this, but depending on the size of the input they will only make performance worse. There are certainly more efficient ways, given that a lot of work is repeated even though that isn't strictly necessary. See this StackOverflow post in particular for ways to iterate over the input string without building an intermediate list.
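
As a rough sketch of iterating over the input without an intermediate list of lines, one option is io.StringIO, which hands out one line at a time. The linked post is not reproduced here, so this is an assumption about the kind of technique meant, and iter_rows is a hypothetical helper name.

```python
import io

testdata = """8/5/15 stuffidontneed custid locid 55.00
8/9/15 stuff i really dont need with extra spaces custid otherlocid 79.00"""

def iter_rows(text):
    """Yield one list of fields per non-empty line, reading lines lazily."""
    for line in io.StringIO(text):  # yields one line at a time, newline included
        fields = line.split()       # split() with no argument also drops the '\n'
        if fields:                  # skip blank lines
            yield fields

first_fields = [fields[0] for fields in iter_rows(testdata)]
print(first_fields)  # ['8/5/15', '8/9/15']
```

Whether this is actually faster than split('\n') depends on the input, so it is worth measuring before committing to it.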

Apart from that, I'd consider doing just one iteration and accumulating the values that way. Instead of cutting off a character, I'd consider adding an underscore to keep the name readable without shadowing a predefined name such as tuple.

Lastly, you could still look backwards from the end of each row for the third-to-last space and split only the substring after it.

More verbose it could thus be:

testdata = """8/5/15 stuffidontneed custid locid 55.00
8/9/15 stuff i really dont need with extra spaces custid otherlocid 79.00"""
dates, custids, locids, amounts = [], [], [], []
for row in testdata.splitlines():
    tuples = row.split()
    dates.append(tuples[0])
    custids.append(tuples[-3])
    locids.append(tuples[-2])
    amounts.append(tuples[-1])
print(dates, custids, locids, amounts)
# ['8/5/15', '8/9/15'] ['custid', 'custid'] ['locid', 'otherlocid'] ['55.00', '79.00']
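
The idea of looking backwards from the end for the third-to-last space can be sketched with str.rsplit and a maxsplit argument, which also keeps the unwanted middle as a single field. This is a minimal sketch, not part of the original answer; parse_row is a hypothetical helper, and it assumes every row has at least five space-separated fields (otherwise the unpacking raises ValueError).

```python
testdata = """8/5/15 stuffidontneed custid locid 55.00
8/9/15 stuff i really dont need with extra spaces custid otherlocid 79.00"""

def parse_row(row):
    """Return (date, middle, custid, locid, amount) for one row."""
    # Peel the first field off the front ...
    date, rest = row.split(' ', 1)
    # ... and the last three off the back; whatever is left is the middle.
    middle, custid, locid, amount = rest.rsplit(' ', 3)
    return date, middle, custid, locid, amount

parsed = [parse_row(row) for row in testdata.splitlines()]
for record in parsed:
    print(record)
# ('8/5/15', 'stuffidontneed', 'custid', 'locid', '55.00')
# ('8/9/15', 'stuff i really dont need with extra spaces', 'custid', 'otherlocid', '79.00')
```

Because each row is split at most four times, the spaces inside the middle field are never examined individually.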

Question 7

split() and splitlines() are definitely more elegant than what I used.

Accepted answer by Dair, 2016-09-04 00:30:12Z
