0

I have a variable that holds content similar to this:

**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;5
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****

I want to read the data that starts with Main_data1, taking only the last column of each line and storing it in a list. Please note that the data is held in a variable, not in a file.

My Desired Output:

Some_list=[1,2,3,4,5]

I thought of using something like this.

for line in var_a.splitlines():
    if "Main_data1" in line:
        print(line)

But there are more than 200 lines from which I need to read the last column. What would be an efficient way of doing this?

asked Nov 8, 2015 at 8:16
2
  • Side note: 200 lines is practically nothing. Commented Nov 8, 2015 at 8:19
  • Have you got solution yet? Commented Nov 8, 2015 at 10:41

4 Answers

1

Check whether the line starts with "Main_data", then split it on the semicolon ; and take the last element with index -1:

some_list = []
for line in var_a.split("\n"):
    if line.startswith("Main_data"):
        some_list.append(int(line.split(";")[-1]))
answered Nov 8, 2015 at 8:18

Comments

1

You can use a list comprehension to store the numbers:

my_list = [int(line.strip().split(';')[-1]) for line in my_var.split('\n') if line.startswith('Main_data5')]

Also note that the more Pythonic way is to use the str.startswith() method rather than the in operator, since Main_data5 might also occur in the middle of a line.
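To illustrate the difference between the two tests, here is a minimal sketch. The sample string and the extra "note about" junk line are made up for the demonstration and are not part of the original data.

```python
# Hypothetical sample: the second line mentions the key but is not a data line.
var_a = """**** SOME JUNK DATA ****
note about Main_data1;x;y;99
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2"""

# Substring test: also picks up the junk line that merely mentions the key.
with_in = [line for line in var_a.splitlines() if "Main_data" in line]

# Prefix test: keeps only the lines that begin with the key.
with_startswith = [line for line in var_a.splitlines() if line.startswith("Main_data")]

print(len(with_in))          # 3
print(len(with_startswith))  # 2
```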

If a line can start with either of two keys, you can combine two startswith conditions with the or operator:

my_list = [int(line.strip().split(';')[-1]) for line in my_var.split('\n') if line.startswith('Main_data5') or line.startswith('Main_data1')]
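As an alternative to chaining or, str.startswith() also accepts a tuple of prefixes. The sketch below assumes a shortened copy of the sample data; the name prefixes is only illustrative.

```python
my_var = """**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data5;aaaaa;bbbbb;cccc;d;e;5
**** SOME JUNK DATA ****"""

# A tuple of prefixes is tested in one call; the line matches if any prefix fits.
prefixes = ("Main_data1", "Main_data5")

my_list = [
    int(line.rsplit(";", 1)[-1])  # split once from the right, keep the last field
    for line in my_var.splitlines()
    if line.startswith(prefixes)
]
print(my_list)  # [1, 5]
```

Note that a prefix such as "Main_data1" would also match a line starting with "Main_data10", so the regex approach may be the safer choice when the keys are numbered.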

But if you have more keywords, you can use a regex. For example, to match all lines that start with Main_data followed by a digit, you can use re.match():

import re
my_list = [int(line.strip().split(';')[-1]) for line in my_var.split('\n') if re.match(r'Main_data\d.*',line)]
answered Nov 8, 2015 at 8:22

2 Comments

This is not a file. Change to variable.
Hi, thanks a lot! Is there a way to specify that reading should start at the line that has Main_data1 and end at the line that has Main_data5?
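One possible way to handle the start/end question in the comment above is a simple flag that switches on at the start key and stops after the end key. This is only a sketch: the function name last_column_between is hypothetical, and it assumes every line between the two keys is a data line ending in an integer.

```python
def last_column_between(text, start_key, end_key):
    """Return the last ';'-separated field (as int) of every line from the
    first line starting with start_key through the line starting with end_key."""
    values = []
    inside = False
    for line in text.splitlines():
        if not inside and line.startswith(start_key):
            inside = True  # reached the first line of the wanted range
        if inside:
            values.append(int(line.rsplit(";", 1)[-1]))
            if line.startswith(end_key):
                break  # stop right after the end line
    return values


var_a = """**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;5
**** SOME JUNK DATA ****"""

print(last_column_between(var_a, "Main_data1", "Main_data5"))  # [1, 2, 3, 4, 5]
```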
0

my_list = []
for line in my_var.strip().split('\n'):
    if "Main_data1" in line:
        my_list.append(int(line.split(";")[-1]))

Or you can use the startswith('match') method, as someone else mentioned.

answered Nov 8, 2015 at 8:32

Comments

0

My approach uses regex, since it gives more control over the pattern.

File content

**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;523233
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;523233
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ******** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;523233
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ******** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****

Code

import re

pattern = r'Main_data.*(?<=;)([0-9]{1,})'
data = []
with open(r"C:\text.txt", 'r') as fl:  # text mode, so each line is a str
    for line in fl:
        # match the digits that follow the last ; on lines containing Main_data
        match = re.search(pattern, line, re.IGNORECASE | re.MULTILINE)
        if match:
            data.append(match.group(1))
        else:
            data.append('N')
strng = ','.join(data)  # get string of the list
lsts = re.findall(r'(?<=,)[0-9,]+(?=,)', strng)  # extracts values and excludes 'N'
outpt = [i.split(',') for i in lsts]  # generate final list
print(outpt)

Output

[['1', '2', '3', '4', '523233'], ['1', '2', '3', '4', '523233'], ['1', '2', '3', '4', '523233']]
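Since the question stresses that the data lives in a variable rather than a file, the same per-block grouping can also be sketched without a file or the intermediate 'N' markers. This swaps in itertools.groupby and is not the approach used in the code above. The shortened sample string and the helper is_data are assumptions for illustration.

```python
from itertools import groupby

var_a = """**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data5;aaaaa;bbbbb;cccc;d;e;523233
**** SOME JUNK DATA ****"""


def is_data(line):
    # Hypothetical helper: a data line is one that starts with the key.
    return line.startswith("Main_data")


# groupby yields runs of consecutive lines sharing the same is_data() result;
# only the runs of data lines are kept, one inner list per run.
blocks = [
    [line.rsplit(";", 1)[-1] for line in group]
    for matched, group in groupby(var_a.splitlines(), key=is_data)
    if matched
]
print(blocks)  # [['1', '2'], ['1', '523233']]
```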
answered Nov 8, 2015 at 10:33

Comments
