I have written a simple script to extract the different time steps from a big text file. It works for small input files, but not for bigger ones.
n = 0
y = []
with open("DEPTH.dat", "r") as input:
    for x in input:
        x = x.strip()
        if x.startswith('SCALAR'):
            n += 1
        with open("TS_" + str(n) + ".dat", "a") as subfile:
            subfile.write("{0}\n".format(x))
The input file is like the following:
SCALAR
ND 5
ST 0
TS 10.00
0.0022
0.0022
0.0022
0.0020
0.4881
SCALAR
ND 5
ST 0
TS 100.00
0.1
0.2
0.12
0.32
0.15
SCALAR
ND 5
ST 0
TS 200.00
0.34
0.25
1.1
1.0020
1.4381
In this example file the number of nodes is ND = 5, so it works very well. But when I have 1,000,000 nodes, it does not work: it does not deliver any result even after 20 days. So I think I have to write the program as a function, return the result, and not load the whole data into memory, but I have no idea how to do that.
Answer
IO operations

For each line of "DEPTH.dat", your code opens a file in append mode, writes a single line, and closes the file again. You can reduce the number of open() and close() calls by reopening the output file only on lines where 'SCALAR' appears.
def split_less_io_operations(src_filename):
    idx = 1
    with open(src_filename, 'r') as inp:
        outfile = open("TS_before_first_SCALAR.dat", 'w')
        for line in inp:
            if line.startswith('SCALAR'):
                outfile.close()
                outfile = open("TS_{}.dat".format(idx), 'w')
                idx += 1
            outfile.write(line)
        outfile.close()

if __name__ == "__main__":
    split_less_io_operations('DEPTH.dat')
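As a quick sanity check, the function can be exercised on a tiny synthetic sample; the two-block sample data and the temporary-directory setup below are illustrative, not part of the original answer:

```python
import os
import tempfile

def split_less_io_operations(src_filename):
    # Same approach as above: reopen the output only on SCALAR lines.
    idx = 1
    with open(src_filename, 'r') as inp:
        outfile = open("TS_before_first_SCALAR.dat", 'w')
        for line in inp:
            if line.startswith('SCALAR'):
                outfile.close()
                outfile = open("TS_{}.dat".format(idx), 'w')
                idx += 1
            outfile.write(line)
        outfile.close()

# Run against a hypothetical two-block sample in a temporary directory.
os.chdir(tempfile.mkdtemp())
with open('DEPTH.dat', 'w') as f:
    f.write('SCALAR\nND 2\nST 0\nTS 10.00\n0.1\n0.2\n'
            'SCALAR\nND 2\nST 0\nTS 100.00\n0.3\n0.4\n')
split_less_io_operations('DEPTH.dat')

print(open('TS_1.dat').read().splitlines()[3])  # TS 10.00
print(open('TS_2.dat').read().splitlines()[3])  # TS 100.00
```

Because each line is written exactly once and only one output file is open at a time, memory use stays constant no matter how many nodes the input contains.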
Explore related questions
See similar questions with these tags.
(A comment on the answer also suggests the standard csplit utility for this kind of split.)
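For reference, a minimal sketch of that csplit approach, assuming GNU csplit (the '{*}' repeat count is a GNU extension); the sample file written here is illustrative:

```shell
# Create a tiny two-block sample in DEPTH.dat's format for demonstration.
printf 'SCALAR\nND 2\nTS 10.00\n0.1\n0.2\nSCALAR\nND 2\nTS 100.00\n0.3\n0.4\n' > DEPTH.dat

# Split at every line starting with SCALAR; output files are named
# TS_00, TS_01, ... ; -s suppresses the per-file byte counts,
# and '{*}' repeats the pattern match until end of file.
csplit -s -f TS_ DEPTH.dat '/^SCALAR/' '{*}'
```

TS_00 holds whatever precedes the first SCALAR line (empty here), and each subsequent TS_NN file holds one SCALAR block.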