I have written a simple script to extract the different time steps from a big text file. It works for small input files, but not for bigger ones.
n = 0
y = []
with open("DEPTH.dat", "r") as input:
    for x in input:
        x = x.strip()
        if x.startswith('SCALAR'):
            n += 1
        with open("TS_" + str(n) + ".dat", "a") as subfile:
            subfile.write("{0}\n".format(x))
The input file is like the following:
SCALAR
ND 5
ST 0
TS 10.00
0.0022
0.0022
0.0022
0.0020
0.4881
SCALAR
ND 5
ST 0
TS 100.00
0.1
0.2
0.12
0.32
0.15
SCALAR
ND 5
ST 0
TS 200.00
0.34
0.25
1.1
1.0020
1.4381
In this example file the number of nodes is ND = 5, so it works very well. But when I have 1,000,000 nodes, it does not work: it does not deliver any result even after 20 days. So I think I have to write the program as a function, return the result, and not load the whole data into memory, but I have no idea how to do that.
Answer
IO operations

For each line of "DEPTH.dat", your code opens a file in append mode, writes a single line, and closes the file again. You can reduce the number of open() and close() calls by reopening the output file only on lines where 'SCALAR' appears.
def split_less_io_operations(src_filename):
    idx = 1
    with open(src_filename, 'r') as inp:
        outfile = open("TS_before_first_SCALAR.dat", 'w')
        for line in inp:
            if line.startswith('SCALAR'):
                outfile.close()
                outfile = open("TS_{}.dat".format(idx), 'w')
                idx += 1
            outfile.write(line)
        outfile.close()

if __name__ == "__main__":
    split_less_io_operations('DEPTH.dat')
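As a quick sanity check, the function can be exercised on a tiny synthetic sample; the two-block sample data and the temporary-directory setup below are illustrative, not part of the original answer:

```python
import os
import tempfile

def split_less_io_operations(src_filename):
    # Same approach as above: reopen the output only on SCALAR lines.
    idx = 1
    with open(src_filename, 'r') as inp:
        outfile = open("TS_before_first_SCALAR.dat", 'w')
        for line in inp:
            if line.startswith('SCALAR'):
                outfile.close()
                outfile = open("TS_{}.dat".format(idx), 'w')
                idx += 1
            outfile.write(line)
        outfile.close()

# Run against a hypothetical two-block sample in a temporary directory.
os.chdir(tempfile.mkdtemp())
with open('DEPTH.dat', 'w') as f:
    f.write('SCALAR\nND 2\nST 0\nTS 10.00\n0.1\n0.2\n'
            'SCALAR\nND 2\nST 0\nTS 100.00\n0.3\n0.4\n')
split_less_io_operations('DEPTH.dat')

print(open('TS_1.dat').read().splitlines()[3])  # TS 10.00
print(open('TS_2.dat').read().splitlines()[3])  # TS 100.00
```

Because each line is written exactly once and only one output file is open at a time, memory use stays constant no matter how many nodes the input contains.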
Explore related questions
See similar questions with these tags.
(A comment on the answer also suggests the standard csplit utility for this kind of split.)
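For reference, a minimal sketch of that csplit approach, assuming GNU csplit (the '{*}' repeat count is a GNU extension); the sample file written here is illustrative:

```shell
# Create a tiny two-block sample in DEPTH.dat's format for demonstration.
printf 'SCALAR\nND 2\nTS 10.00\n0.1\n0.2\nSCALAR\nND 2\nTS 100.00\n0.3\n0.4\n' > DEPTH.dat

# Split at every line starting with SCALAR; output files are named
# TS_00, TS_01, ... ; -s suppresses the per-file byte counts,
# and '{*}' repeats the pattern match until end of file.
csplit -s -f TS_ DEPTH.dat '/^SCALAR/' '{*}'
```

TS_00 holds whatever precedes the first SCALAR line (empty here), and each subsequent TS_NN file holds one SCALAR block.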