Clean way to get size of directory

Question 1

I'm working on a Unix machine where I can't use more than vanilla Perl, and I'm using Perl 5.8. This script exits with status 1 if the current directory size is smaller than 1 GB (the character between the quotes after -d is a literal tab character).

my $du = `du --si | tail -1 | cut -d" " -f1`;
chomp $du;
if (substr($du, -1) ne "G") {
 exit 1;
}
exit 0;

This is gross, but I know the data is in du --si so I can write it in 30 seconds. Is there a cleaner, more robust way?
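A small point about the pipeline itself: cut already splits on a tab by default, and a tab is what du puts between the size and the path, so the literal tab after -d can be dropped. Here is a minimal sketch that assumes du-style "size<TAB>path" lines on standard input. The function name last_total is hypothetical. It also uses tail -n 1, the POSIX spelling of tail -1.

```sh
#!/bin/sh
# Hypothetical helper: print the size column of the last du line.
# cut's default field delimiter is TAB, so no -d option is needed.
last_total() {
    tail -n 1 | cut -f1
}
```

Usage, with GNU du as in the question: `du --si | last_total`.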

Question 2

Why tail -1? Doesn't that just choose one of the subdirectories arbitrarily?

Question 3

I believe the last line of that output is the size of everything (it's the directory ., which is the current directory). The other lines are sizes of subdirectories.
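du prints each directory after its contents, so the final line of a full listing is the total for "." itself. The sketch below (POSIX sh, hypothetical function names) compares that last line with the single line printed by du -s for the same directory. It uses -k so that both commands report 1024-byte units.

```sh
#!/bin/sh
# Last line of a full du listing: the entry for "." itself
dir_total_last_line() {
    (cd "$1" && du -k | tail -n 1)
}

# du -s prints only that one total line
dir_total_summary() {
    (cd "$1" && du -sk)
}

# Example: compare the two for the current directory
# [ "$(dir_total_last_line .)" = "$(dir_total_summary .)" ] && echo same
```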

Question 4

I agree with @rolfl that this would be much simpler as a one-line shell pipeline. The -s option to du makes it produce a total. awk is a good tool to use for processing multi-column text.

du -s --si | awk '$1 !~ /G/ { exit 1 }'

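However, the --si option seems to be a non-portable GNU extension. A more portable version would look at the number of 512-byte blocks, which is the default unit POSIX specifies for du (without -k). GNU du reports 1024-byte blocks by default unless the POSIXLY_CORRECT environment variable is set, in which case the threshold below corresponds to 2 GB. The magic number 1953125 is \$\dfrac{10^9}{512}\$.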

du -s | awk '$1 < 1953125 { exit 1 }'

The second version also works even if the total is in the terabyte or exabyte range.
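The default unit of plain du -s differs between POSIX (512 bytes) and GNU (1024 bytes). One way to avoid depending on it is to request -k explicitly, which POSIX also specifies, and then compute the threshold from the block size. This is a sketch under those assumptions. threshold_blocks and at_least_1gb are hypothetical names, and the exit status follows the original script (0 for at least 1 GB, 1 otherwise).

```sh
#!/bin/sh
# Smallest block count that reaches 10^9 bytes: ceil(10^9 / block_size)
threshold_blocks() {
    echo $(( (1000000000 + $1 - 1) / $1 ))
}

# Exit 0 if directory $1 uses at least 10^9 bytes, 1 otherwise.
# du -sk reports the total in 1024-byte units (POSIX -k).
at_least_1gb() {
    du -sk "$1" | awk -v t="$(threshold_blocks 1024)" '$1 + 0 < t + 0 { exit 1 }'
}
```

`threshold_blocks 512` gives 1953125, the magic number above, and `threshold_blocks 1024` gives 976563. The block count is an integer, so "blocks >= ceil(10^9 / size)" is equivalent to "blocks * size >= 10^9". Usage: `at_least_1gb . && echo "at least 1 GB"`.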

There is an inefficiency, though: you should be able to exit early as soon as you find that the total exceeds 1 GB. For that, you would go back to Perl, but with a proper Perl program instead of a wrapper around du.

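use strict;
use warnings;
use File::Find;

my $sum = 0;
my %seen;
find(sub {
    # lstat, like du, counts a symlink itself rather than its target
    my ($dev, $inode, $blocks) = (lstat)[0, 1, 12]
        or die "${File::Find::name}: $!";
    # Do not double-count hard links (inode numbers are unique only per device)
    return if $seen{"$dev:$inode"}++;
    $sum += 512 * $blocks;    # st_blocks is in 512-byte units on most systems
    exit 0 if $sum >= 1_000_000_000;
}, ".");
exit 1;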

Question 5

It is unusual on Code Review to recommend a different approach, but this process can be simplified a whole bunch, avoiding Perl entirely:

du -s -B 1 | grep -P -q '^\d{10,}+\s.*'

It breaks down as follows:

du -s -B 1

Print a summary (no details for each subdirectory) using a block size of 1 byte, i.e. print the number of bytes used by the current directory.

Then pipe that into grep with a Perl-compatible regex (-P) in quiet mode (-q), which exits with 0 on a match and 1 on no match.

The regex checks that the line starts with at least 10 digits, i.e. at least 1,000,000,000 bytes.

Putting it together, the grep succeeds if the current directory is at least 1 GB.

I tested this with:

du -s -B 1 | grep -P -q '^\d{10,}+\s.*' && echo "Bigger than 1G" || echo "less than 1G"
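grep -P is a GNU extension too. The same digit-count test can be written with a POSIX extended regular expression, where [0-9]{10,} means "at least 10 digits". Because du prints no leading zeros, that is exactly ">= 1,000,000,000". Below is a minimal sketch with a hypothetical function name. Note that du -B 1 itself remains GNU-specific.

```sh
#!/bin/sh
# Succeeds if the first field on stdin has at least 10 digits,
# i.e. the byte count is at least 1,000,000,000.
has_10_digits() {
    grep -E -q '^[0-9]{10,}[[:space:]]'
}
```

Usage: `du -s -B 1 | has_10_digits && echo "Bigger than 1G" || echo "less than 1G"`.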

Edit:

This is compatible with your original code: du --si uses 1,000,000,000 bytes to represent a GB. If you want to use GiB (\$2^{30}\$ bytes) instead, counting digits no longer works, so doing it with a regex is substantially harder.
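A numeric comparison avoids the regex problem. This sketch assumes GNU du -s -B 1 output (bytes, a tab, then the path) on standard input and lets awk compare directly against \$2^{30}\$ = 1073741824. The name at_least_1gib is hypothetical.

```sh
#!/bin/sh
# Exit 0 if the byte count in the first field is at least 2^30 (1 GiB),
# 1 otherwise. Empty input also exits 0.
# Usage (GNU du): du -s -B 1 | at_least_1gib
at_least_1gib() {
    awk '$1 < 1073741824 { exit 1 }'
}
```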

Question 6

Calling du to calculate the full size is OK, as it is not a trivial task. Everything else is better done on the Perl side, which is simpler and cleaner.

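use strict;
use warnings;

# GNU du: -b gives the apparent size in bytes, -s prints only the total
my $du = `du -bs .`;
# List context, so $bytes gets the captured number, not a true/false result
my ($bytes) = $du =~ /^(\d+)/ or die "du failed";
if ($bytes > 1e9) {
    print "directory is bigger than 1GB\n";
}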

Question 7

Hi, and welcome to Code Review. du -s . does not count the number of bytes used, but the number of kilobytes. Consider adding the -B 1 option to du.

200_success · Accepted Answer · 2014-05-01 02:06:05Z
