
I have written some simple code to extract the individual time steps from a big text file into separate files. It works for small input files, but not for bigger ones.

n=0
y=[]
with open ("DEPTH.dat", "r") as input:
    for x in input:
        x = x.strip()
        if x.startswith('SCALAR'):
            n += 1
        with open("TS_"+ str(n) + ".dat", "a") as subfile:
            subfile.write("{0}\n".format(x))

The input file looks like this:

SCALAR
ND 5
ST 0
TS 10.00
 0.0022
 0.0022
 0.0022
 0.0020
 0.4881
SCALAR
ND 5
ST 0
TS 100.00
 0.1
 0.2
 0.12
 0.32
 0.15
SCALAR
ND 5
ST 0
TS 200.00
 0.34
 0.25
 1.1
 1.0020
 1.4381

In this example file the number of nodes is ND = 5, and the code works very well. But when I have 1,000,000 nodes, it does not work: it has not delivered any result even after 20 days. So I know I have to write the program as a function that returns the result instead of loading the whole data into memory, but I have no idea how to do that.

200_success
asked Nov 17, 2017 at 15:26
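On the question's idea of a function that does not load the whole file into memory: iterating over a file object with "for line in f" already reads it one line at a time. A minimal sketch of such a function, written as a generator that hands back one time step at a time, could look like the following. It assumes every block begins with a line starting with 'SCALAR'; the name iter_time_steps is hypothetical and not from the original post.

```python
from typing import Iterator, List, Tuple


def iter_time_steps(path: str) -> Iterator[Tuple[int, List[str]]]:
    """Yield (step_number, lines) for every block that starts with 'SCALAR'.

    The file is read lazily, one line at a time, so only the current block
    is kept in memory. Lines before the first 'SCALAR' header are skipped.
    Assumption: each time step begins with a line starting with 'SCALAR'.
    """
    step = 0
    block: List[str] = []
    with open(path, "r") as src:
        for line in src:
            if line.startswith("SCALAR"):
                if block:              # the previous time step is complete
                    yield step, block
                step += 1
                block = []
            if step > 0:               # ignore text before the first header
                block.append(line)
    if block:                          # the last step has no header after it
        yield step, block
```

A caller would write something like "for step, lines in iter_time_steps('DEPTH.dat'):" and process or save each block in turn. Each yielded list holds ND + 4 lines (the SCALAR, ND, ST and TS headers plus the values), so memory use grows with the size of one time step, not with the size of the whole file.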
  • While there could be more efficient solutions, I'd recommend the csplit utility. Commented Nov 17, 2017 at 17:51
  • Do I understand correctly that you are trying to split one large file into smaller files with sequential filenames? Commented Nov 18, 2017 at 8:59
  • I want to be able to extract any part of the Depth.dat file. For example, I sometimes have a 50 GB Depth.dat file with 1000 time steps in total, and I need only time steps 25 to 28 and 501 to 505. Commented Nov 20, 2017 at 7:15
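For the use case in the last comment (only a few time steps out of many), here is a hedged sketch that opens an output file once per wanted time step rather than once per line, skips every other step, and stops reading after the last requested one. The function name extract_steps and its prefix parameter are illustrative assumptions, not part of the original post. Because the lines have different lengths, the sketch still has to scan every line up to the last requested step; a plain text file offers no way to jump straight to step 501 without an index of byte offsets.

```python
from typing import Iterable, Optional, TextIO


def extract_steps(src_filename: str, wanted: Iterable[int],
                  prefix: str = "TS_") -> int:
    """Copy only the requested time steps into '<prefix><step>.dat' files.

    Steps are numbered from 1 in the order their 'SCALAR' headers appear.
    Reading stops as soon as the last requested step has been written.
    Returns the number of files written.
    """
    wanted_set = set(wanted)
    if not wanted_set:
        return 0
    last_wanted = max(wanted_set)
    step = 0
    written = 0
    outfile: Optional[TextIO] = None
    try:
        with open(src_filename, "r") as src:
            for line in src:
                if line.startswith("SCALAR"):
                    if outfile is not None:    # finish the previous wanted step
                        outfile.close()
                        outfile = None
                    step += 1
                    if step > last_wanted:     # nothing left to extract
                        break
                    if step in wanted_set:     # one open() per wanted step
                        outfile = open(f"{prefix}{step}.dat", "w")
                        written += 1
                if outfile is not None:        # lines of unwanted steps are skipped
                    outfile.write(line)
    finally:
        if outfile is not None:                # close the file even on errors
            outfile.close()
    return written
```

For the example in the comment, extract_steps("DEPTH.dat", list(range(25, 29)) + list(range(501, 506))) would write TS_25.dat to TS_28.dat and TS_501.dat to TS_505.dat, and would stop reading the input right after step 505.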

1 Answer


IO operations

For each line of "DEPTH.dat", your code opens the output file in append mode, writes one line, and then closes the file again. You can reduce the open() and close() calls so that they happen only when a 'SCALAR' line appears.

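def split_less_io_operations(src_filename):
    idx = 1
    with open(src_filename, 'r') as inp:
        # Lines that appear before the first 'SCALAR' header go here.
        outfile = open("TS_before_first_SCALAR.dat", 'w')
        for line in inp:
            if line.startswith('SCALAR'):
                # A new time step starts: close the previous output file and
                # open the next one (once per step instead of once per line).
                outfile.close()
                outfile = open("TS_{}.dat".format(idx), 'w')
                idx += 1
            outfile.write(line)
        outfile.close()


if __name__ == "__main__":
    split_less_io_operations('DEPTH.dat')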
answered Nov 18, 2017 at 12:34
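As a variation on the answer above, the following sketch lets with blocks manage every output file, so each file is closed even if an exception is raised while writing. It relies on itertools.groupby with a stateful key function that counts 'SCALAR' lines. The names make_step_key and split_with_groupby are hypothetical, and, unlike the answer, this sketch drops any text before the first 'SCALAR' line instead of writing it to TS_before_first_SCALAR.dat.

```python
import itertools
from typing import Callable


def make_step_key() -> Callable[[str], int]:
    """Return a key function mapping each line to its time-step number.

    The counter goes up by one on every 'SCALAR' line, so all lines of one
    block share the same key; lines before the first header get key 0.
    """
    step = 0

    def key(line: str) -> int:
        nonlocal step
        if line.startswith("SCALAR"):
            step += 1
        return step

    return key


def split_with_groupby(src_filename: str, prefix: str = "TS_") -> int:
    """Write every time step to '<prefix><step>.dat'; return how many were written."""
    written = 0
    with open(src_filename, "r") as src:
        # groupby computes the key once per line, in order, and starts a new
        # group whenever the key changes, i.e. at every 'SCALAR' line.
        for step, lines in itertools.groupby(src, key=make_step_key()):
            if step == 0:
                continue  # assumption: text before the first header is dropped
            with open(f"{prefix}{step}.dat", "w") as out:
                out.writelines(lines)  # writes the group line by line
            written += 1
    return written
```

Calling split_with_groupby("DEPTH.dat") would produce TS_1.dat, TS_2.dat and so on, like the answer's function. The stateful key is the least obvious part of this design; the explicit loop in the answer is arguably easier to read, so this is mainly worth it if guaranteed closing of the output files matters.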