So, I bought this book called Primes and Programming, and it's pretty tough going. Today I wrote this (simple) program from chapter 1:
#!/usr/bin/env python
import math

def find_gcd(a, b):
    while b > 0:
        r = a - b * math.floor(a / b)
        a = b
        b = r
    return int(a)

if __name__ == "__main__":
    import random, sys
    while True:
        print find_gcd(random.randrange(int(sys.argv[1])), random.randrange(int(sys.argv[2])))
...and just now I called it like so:
./gcd-rand.py 10000 10000 > concievablyreallyhugefile
...and now I'm dreaming of a bash
one-liner that breaks when concievablyreallyhugefile has reached a certain size. I guess it would look something like:
while $(du -h f) < 32M; do ./gcd-rand.py 10000 10000 > $f; done
...but I have never written a while loop in bash
before and I don't really know how the syntax works.
4 Answers
The trick is to use the test command or the equivalent [ ... ]:
while [ "$(du -m f|cut -f1)" -lt 32 ]
do
./gcd-rand.py 10000 10000 > "$f"
done
See help test for more information.

Note: test (or [) is a bash builtin. The help information can be retrieved inside bash via help test or help [. man test refers to the external test command that is used if a shell has no such builtin, or when it is invoked explicitly as /usr/bin/test.
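Since [ is just a command, the while loop is only watching its exit status; a minimal illustration of the numeric comparison used above:

```shell
# [ ... ] is an ordinary command; `while` only looks at its exit status.
[ 5 -lt 32 ]; echo $?     # 0 means true
[ 40 -lt 32 ]; echo $?    # 1 means false
```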
- No - see the added Note in the answer. – H.-Dirk Schmitt, Mar 7, 2013 at 11:47
- Huh. help is a Bash builtin but not zsh's, for e.g. – poige, Mar 7, 2013 at 12:03
./gcd-rand.py 10000 10000 | head -c 32M > concievablyreallyhugefile
head will stop reading after 32MB. Soon after head stops reading, gcd-rand.py will receive a SIGPIPE signal and exit.
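The same mechanism can be watched with any infinite producer, e.g. yes (a sketch, not specific to the script here):

```shell
# head -c caps the byte count; once it exits, the writer gets SIGPIPE and dies.
yes | head -c 16 | wc -c    # prints 16 (possibly padded with spaces)
```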
To avoid storing a truncated last line, as Michael Kjörling noticed:
./gcd-rand.py 10000 10000 | head -c 32M | sed '$d' > concievablyreallyhugefile
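sed '$d' simply deletes the final line, so a record that was cut off mid-number is discarded; a quick sketch:

```shell
# The final line "78" is incomplete; sed '$d' drops it.
printf '123\n456\n78' | sed '$d'    # prints 123 and 456
```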
- This. Piping is The Unix Way (tm), and it will give you exactly as much data as you want. Of course, it might break the resulting file in the middle of a number, so in the general case the last line of the file will be meaningless. If you want to guard against that, it'd probably be better to implement output size limiting in the script itself (look up len(), and remember to account for the newline). – user, Mar 9, 2013 at 23:21
- @MichaelKjörling Good point about the last truncated line. Again piping saves the day. – Gilles 'SO- stop being evil', Mar 10, 2013 at 16:34
Your Python code loops forever. Thus, you might want to run it in the background and then kill it when the file size is exceeded. As a one-liner:
{ ./gcd-rand.py 10000 10000 > f & }; p=$!; while (( $(stat -c %s f) < 33554432 )); do sleep .1; done; kill $p
Note: choose the sleep time as appropriate. Instead of stat you can also use du, as suggested by Dirk.
- This is good, but you should use wc -c instead of stat, which will allow it to work outside of Linux. – jordanm, Mar 7, 2013 at 15:23
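For reference, a sketch of the portable check jordanm suggests (the temp-file name is made up for the demo). Reading the file via redirection keeps the filename out of wc's output, so no cut or awk is needed:

```shell
# wc -c < file prints only the byte count, with no filename column.
printf 'hello' > /tmp/wc-demo.$$
wc -c < /tmp/wc-demo.$$    # prints 5
rm /tmp/wc-demo.$$
```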
You can use the ulimit command to restrict how large a file the shell (or its children) can create. In bash the value is in 1024-byte blocks, so 32768 means 32 MiB:
ulimit -f 32768
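A quick sketch of the effect, assuming bash's 1024-byte units for -f (check help ulimit on your system). Running the limit in a subshell keeps it from sticking to the current shell; the writer is terminated by SIGXFSZ once the file reaches the limit:

```shell
# Limit files to 1 block (1024 bytes) inside a subshell only;
# `|| true` because the killed writer makes the subshell exit nonzero.
( ulimit -f 1; yes > /tmp/ulimit-demo.$$ 2>/dev/null ) || true
wc -c < /tmp/ulimit-demo.$$    # no more than 1024
rm /tmp/ulimit-demo.$$
```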
- I think this qualifies as an excellent example of what Raymond Chen calls "using global state to manage a local problem". – user, Mar 11, 2013 at 8:19
- Well, it's limited to the current shell, so (ulimit -f 32768; cmd) is a possibility. – chepner, Mar 11, 2013 at 12:33
See man bash. Also, the fractions.gcd method is useful.
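As that last remark hints, the standard library already ships a gcd. fractions.gcd dates from the Python 2 era of this question and was removed in Python 3.9; in modern Python the function is math.gcd, which the hand-rolled loop can be spot-checked against (a sketch):

```python
import math

def find_gcd(a, b):
    # Euclid's algorithm from the question; a - b*(a // b) is just a % b.
    while b > 0:
        a, b = b, a - b * (a // b)
    return a

# Spot-check against the standard library.
for x, y in [(270, 192), (10000, 64), (17, 5)]:
    assert find_gcd(x, y) == math.gcd(x, y)
```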