\$\begingroup\$

My code reads numbers from a large text file, normalises the spacing in each line, and stores the values in a 2-dimensional list. The code is used to get data for a job scheduler that I'm building.

#reading in workload data
def getworkload():
    work = []
    strings = []
    with open("workload.txt") as f:
        read_data = f.read()
    jobs = read_data.split("\n")
    for j in jobs:
        strings.append(" ".join(j.split()))
    for i in strings:
        work.append([float(s) for s in i.split(" ")])
    return work
print(getworkload())

The text file is over 2000 lines long, and looks like this:

 1 0 1835117 330855 640 5886 945 -1 -1 -1 5 2 1 4 9 -1 -1 -1
 2 0 2265800 251924 640 3124 945 -1 -1 -1 5 2 1 4 9 -1 -1 -1
 3 1 3114175 -1 640 -1 945 -1 -1 -1 5 2 1 4 9 -1 -1 -1
 4 1813487 7481 -1 128 -1 20250 -1 -1 -1 5 3 1 5 8 -1 -1 -1
 5 1814044 0 122 512 1.13 1181 -1 -1 -1 1 1 1 1 9 -1 -1 -1
 6 1814374 1 51 512 -1 1181 -1 -1 -1 1 1 1 2 9 -1 -1 -1
 7 1814511 0 55 512 -1 1181 -1 -1 -1 1 1 1 2 9 -1 -1 -1
 8 1814695 1 51 512 -1 1181 -1 -1 -1 1 1 1 2 9 -1 -1 -1
 9 1815198 0 75 512 2.14 1181 -1 -1 -1 1 1 1 2 9 -1 -1 -1
 10 1815617 0 115 512 1.87 1181 -1 -1 -1 1 1 1 1 9 -1 -1 -1
 ...

It takes 2 and a half minutes to run but I can print the returned data. How can it be optimised?

200_success
asked Nov 12, 2018 at 10:59
\$\endgroup\$
  • \$\begingroup\$ Welcome to Code Review. I'm afraid this question does not match what this site is about. Code Review is about improving existing, working code. If you're having trouble getting something to work, or are asking for new features, you'd be better off asking on Stack Overflow (the main site). \$\endgroup\$ Commented Nov 12, 2018 at 11:07
  • \$\begingroup\$ The code works, as I can print work_row without any problems and I know that work will be a two-dimensional array/list. I just believe it can be sped up. \$\endgroup\$ Commented Nov 12, 2018 at 11:11
  • \$\begingroup\$ "If I try to print work the text is too long and I get an overflow error": to me, that sounds like you have a problem. Try to reformulate your question to remove this doubt. \$\endgroup\$ Commented Nov 12, 2018 at 11:26

1 Answer

\$\begingroup\$

You are doing a lot of unnecessary work. Why split each row only to join it with single spaces and then split it again by those single spaces?

Instead, here is a list comprehension that should do the same thing:

def get_workload(file_name="workload.txt"):
    with open(file_name) as f:
        return [[float(x) for x in row.split()] for row in f]

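This relies on file objects being iterable: iterating over one yields one line at a time. Calling str.split() without arguments splits on any run of whitespace and ignores leading and trailing whitespace (including the newline), so the join-and-split-again step is not needed.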
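To make the equivalence concrete, here is a small sketch comparing the two tokenising strategies on one line taken from the sample data in the question. The helper names normalise_then_split and split_directly are hypothetical and exist only for this illustration. It also shows the one case where the two differ, an empty line, which is an observation about str.split() rather than something stated in the question.

```python
def normalise_then_split(line):
    """The question's approach: collapse whitespace, then split on single spaces."""
    return " ".join(line.split()).split(" ")


def split_directly(line):
    """The approach suggested here: let str.split() handle runs of whitespace."""
    return line.split()


# One row copied from the sample data in the question.
sample = "  5 1814044 0 122 512 1.13 1181 -1 -1 -1 1 1 1 1 9 -1 -1 -1\n"

tokens_a = normalise_then_split(sample)
tokens_b = split_directly(sample)
same_tokens = tokens_a == tokens_b  # both give the same 18 tokens

values = [float(token) for token in tokens_b]

# The approaches differ only on an empty line:
# the first yields [""] (and float("") raises ValueError),
# the second yields [] (an empty row).
empty_a = normalise_then_split("")
empty_b = split_directly("\n")
```

So for non-empty rows the shorter version produces the same tokens, and a trailing blank line yields an empty row instead of an exception.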

If this is still too slow (or, for example, the data is too large to fit into memory), you need to process each row separately. To do that, turn the function into a generator that yields one processed row at a time:

def get_workload(file_name="workload.txt"):
    with open(file_name) as f:
        for row in f:
            yield [float(x) for x in row.split()]
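As a rough usage sketch, the generator can be consumed row by row without ever building the full list. The temporary file and its two rows are assumptions made so the example is self-contained; in practice you would pass the path to your own workload.txt. The variable names third_column_total and all_rows are likewise only illustrative.

```python
import os
import tempfile


def get_workload(file_name="workload.txt"):
    """Yield each line of the file as a list of floats, one row at a time."""
    with open(file_name) as f:
        for row in f:
            yield [float(x) for x in row.split()]


# Hypothetical two-row sample file, standing in for the real workload.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(" 1 0 1835117 330855 640\n 2 0 2265800 251924 640\n")
    sample_path = tmp.name

try:
    # Lazy consumption: only one parsed row is held in memory at a time.
    third_column_total = sum(row[2] for row in get_workload(sample_path))

    # If the full 2-dimensional list is needed after all, materialise it.
    all_rows = list(get_workload(sample_path))
finally:
    os.remove(sample_path)
```

Note that a generator can only be iterated once, so each pass over the data calls get_workload again and re-reads the file.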
answered Nov 12, 2018 at 15:51
\$\endgroup\$
