Parsing a simple text file in python

Question 1

I have the following text file, exported from a CSV file. The lines are too long to be shown properly here, so here is how each one begins.
The file has 5 lines:
The 1st one starts with ETIQUETAS
The 2nd one starts with RECURSOS
The 3rd one starts with DATOS CLIENTE Y PIEZA
The 4th one starts with Numero Referencia
The 5th and last one starts with BRIDA Al

ETIQUETAS:;;;;;;;;;START;;;;;;;;;;;;;;;;;;;;;END;;
RECURSOS:;;;;;;;;;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;1;1;1;0;1;0;;Nota: 0 equivale a infinito, para decir que no existen recursos usar un numero negativo
DATOS CLIENTE Y PIEZA;;;;PLAZOS Y PROCESOS;;;;;;;;;;hoja de ruta;MU;;;;;;;;;;;;;;;;;
Numero Referencia;Descripcion Referencia;Nombre Cliente;Codigo Cliente;PLAZO DE ENTREGA;piezas;PROCESO;MATERIAL;stock;PROVEEDOR;tiempo ida pulidor;pzas dia;TPO;tiempo vuelta pulidor;TIEMPO RECEPCION;CONTROL CALIDAD DE ENTRADA;TIEMPO CONTROL CALIDAD DE ENTRADA;ALMACEN A (ANTES DE ENTRAR MAQUINA);GRANALLA;TPO;LIMPIADO;TPO;BRILLADO;TPO;;CARGA;MAQUINA;SOLTAR;control;EMPAQUETADO;ALMACENB;TIEMPO;
BRIDA Al;BRIDA Al;AEROGRAFICAS AHE, S.A.;394;;;niquelado;aluminio;;;;matriz;;;5min;NO;;3dias;;;;;;;;1;1;1;;1;4D;;

I want to do two things:

1. Count the fields between START and END on the first line, both inclusive, and save the count as TOTAL_NUMBERS. For example, START;;END has to count 3: START itself, the empty field between the two semicolons, and END itself. In the sample file, START;;;;;;;;;;;;;;;;;;;;;END has to count 22.

What I've tried so far:

f = open("lt.csv", 'r')
array = []
for line in f:
    if 'START' in line:
        for i in line.split(";"):
            array.append(i)
f.close()
i = 0
while i < len(array):
    if array[i] == 'START':
        pass  # START COUNTING, I DON'T KNOW HOW TO CONTINUE
    i = i + 1

2. Scan the file until the word PROVEEDOR appears, then save TOTAL_NUMBERS fields (22 in the example), starting with PROVEEDOR itself, in an array. This means it has to save:

final_array = ['PROVEEDOR', 'tiempo ida pulidor', 'pzas dia', 'TPO', 'tiempo vuelta pulidor', 'TIEMPO RECEPCION', 'CONTROL CALIDAD DE ENTRADA', 'TIEMPO CONTROL CALIDAD DE ENTRADA', 'ALMACEN A (ANTES DE ENTRAR MAQUINA)', 'GRANALLA', 'TPO', 'LIMPIADO', 'TPO', 'BRILLADO', 'TPO', '', 'CARGA', 'MAQUINA', 'SOLTAR', 'control', 'EMPAQUETADO', 'ALMACENB']

Thanks in advance.

Comments

The second item is also numbered 1; if somebody could fix this I would appreciate it.

What happens to the rest of the line after ALMACENB;? Is it discarded? Are you always going to stop at ALMACENB;?

It always starts at PROVEEDOR and ends TOTAL_NUMBERS (22 in the example) words later. That is written in item 2 of the question. Thanks in advance, Burhan.

Burhan Khalid · Accepted Answer · 2014-01-27 11:20:36Z

I am assuming the file is split into two lines; the first line with START and END and then a long line which needs to be parsed. This should work:

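with open('somefile.txt') as f:
    # The first line holds the START ... END markers.
    first_row = next(f).strip().split(';')
    # Count the fields from START to END, both inclusive.
    TOTAL_NUMBER = len(first_row[first_row.index('START'):first_row.index('END')+1])
    # Join the remaining lines and split them into fields.
    bits = ''.join(line.rstrip() for line in f).split(';')
    # Take TOTAL_NUMBER fields, starting at PROVEEDOR.
    final_array = bits[bits.index('PROVEEDOR'):bits.index('PROVEEDOR')+TOTAL_NUMBER]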
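To see the slicing at work without a file on disk, here is a minimal sketch that runs the same steps on a tiny in-memory input. The two sample lines are made up for illustration (they are not the real file), and `io.StringIO` stands in for the opened file. It reuses the START;;END case from the question, which should count 3.

```python
import io

# Hypothetical two-line input: a marker row, then a row of column names.
sample = io.StringIO(
    "ETIQUETAS:;;START;;END;;\n"
    "ref;stock;PROVEEDOR;pzas dia;TPO;CARGA;\n"
)

with sample as f:
    # First line: locate the markers and count the fields between them.
    first_row = next(f).strip().split(';')
    start = first_row.index('START')
    end = first_row.index('END')
    total_number = end - start + 1  # START, one empty field, END

    # Remaining lines: join them, split into fields, slice from PROVEEDOR.
    bits = ''.join(line.rstrip() for line in f).split(';')
    begin = bits.index('PROVEEDOR')
    final_array = bits[begin:begin + total_number]

print(total_number)
print(final_array)
```

Computing `end - start + 1` gives the same number as taking `len()` of the slice; it just avoids building the intermediate list.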

edited Jan 27, 2014 at 11:31

2 Comments

Avión · 2014-01-27T11:26:55.023Z

Thanks Burhan. I've tried your solution and TOTAL_NUMBERS gives me 27 instead of 22. Also, the file has 5 lines: the 1st one starts with ETIQUETAS, the 2nd one starts with RECURSOS, the 3rd one starts with DATOS CLIENTE Y PIEZA, the 4th one starts with Numero Referencia, and the 5th and last one starts with BRIDA Al.
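Since the file has five lines rather than two, one option is to look up each row by the marker it contains instead of relying on its position. The sketch below is only an illustration under that assumption: `find_row`, `span_length` and `take_from` are hypothetical helper names, the marker line is rebuilt to match the layout in the question (START at field 9, END at field 30), and the RECURSOS and DATOS lines are shortened stand-ins.

```python
def find_row(rows, marker):
    """Return the first row (a list of fields) that has `marker` as a field."""
    for row in rows:
        if marker in row:
            return row
    raise ValueError("marker not found: %r" % marker)


def span_length(row, start='START', end='END'):
    """Count the fields from `start` to `end`, both inclusive."""
    return row.index(end) - row.index(start) + 1


def take_from(row, key, count):
    """Return `count` fields of `row`, beginning at `key`."""
    begin = row.index(key)
    return row[begin:begin + count]


# Marker line rebuilt from the question's layout (assumed field positions).
marker_line = ';'.join(
    ['ETIQUETAS:'] + [''] * 8 + ['START'] + [''] * 20 + ['END', '', '']
)
# Column names copied from the fourth line of the question's file.
columns_line = (
    "Numero Referencia;Descripcion Referencia;Nombre Cliente;Codigo Cliente;"
    "PLAZO DE ENTREGA;piezas;PROCESO;MATERIAL;stock;PROVEEDOR;"
    "tiempo ida pulidor;pzas dia;TPO;tiempo vuelta pulidor;TIEMPO RECEPCION;"
    "CONTROL CALIDAD DE ENTRADA;TIEMPO CONTROL CALIDAD DE ENTRADA;"
    "ALMACEN A (ANTES DE ENTRAR MAQUINA);GRANALLA;TPO;LIMPIADO;TPO;BRILLADO;TPO;;"
    "CARGA;MAQUINA;SOLTAR;control;EMPAQUETADO;ALMACENB;TIEMPO;"
)
lines = [
    marker_line,
    "RECURSOS:;;;0;1;;Nota",                      # shortened stand-in
    "DATOS CLIENTE Y PIEZA;;;;PLAZOS Y PROCESOS",  # shortened stand-in
    columns_line,
]

rows = [line.split(';') for line in lines]
total_numbers = span_length(find_row(rows, 'START'))
final_array = take_from(find_row(rows, 'PROVEEDOR'), 'PROVEEDOR', total_numbers)

print(total_numbers)
print(final_array[0], final_array[-1])
```

Splitting line by line also keeps the last field of one line from being glued to the first field of the next, which can happen when lines are joined before splitting.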

Avión · 2014-01-27T11:33:03.303Z

Thanks Burhan! It shows 22 now. The thing is that I also have a problem with the bits = f.readlines().split(';') line. It gives me an error: ValueError: Mixing iteration and read methods would lose data. What's the problem with this? Thanks in advance.
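On that error: as far as I know, Python 2 file objects raise this ValueError when a read method such as readlines() is called after next(f), because iteration uses a read-ahead buffer. Separately, readlines() returns a list, and a list has no split() method. A minimal sketch that avoids both problems by iterating only is shown below; the file contents and the temporary file are made up for the demonstration.

```python
import os
import tempfile

# Hypothetical file contents, only for demonstration.
content = "ETIQUETAS:;START;;END;\nA;B;PROVEEDOR;C;D;E;\nx;y;z;\n"

with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write(content)
    path = tmp.name

try:
    with open(path) as f:
        first_row = next(f).rstrip('\n').split(';')
        # Keep iterating instead of calling f.readlines() after next(f).
        remaining_lines = [line.rstrip('\n') for line in f]
finally:
    os.remove(path)

# Split each remaining line on its own: a list has no .split() method.
remaining_rows = [line.split(';') for line in remaining_lines]

print(first_row)
print(remaining_rows)
```

The edited answer above takes the same route: it calls next(f) once and then consumes the rest of the file with a generator over f, never mixing in readlines().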
