Read lines from a file in chunks

Question 1

I wrote a function that reads lines from a file in chunks. My motivation was Python's yield, which works nicely together with chunked reading. The script works fine, but I would like to know whether anyone can suggest improvements.

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
sub read_in_chunks {
    my $args = shift;
    my $self = {
        fd => $args->{fd} || undef,
        chunk_size => $args->{chunk_size} || 10,
        chunks => [],
    };
    my $fh = $self->{fd};
    return unless defined(my $line=<$fh>);
    while(<$fh>){
        chomp($_);
        # maybe the following line could be written nicer :)
        ($self->{chunk_size} == 0) ? return $self->{chunks} : (push @{$self->{chunks}}, $_);
        $self->{chunk_size}--;
    }
    return $self->{chunks};
}
open my $fh, 'dump.txt' or die $!;
my $opts = {
    fd => $fh,
    chunk_size => 4
};
while(my $chunk = read_in_chunks($opts)) {
    print Dumper($chunk);
    # process data
}
close $fh;

Question 2

What happens to the line read by $line=<$fh>?

Question 3

It returns if $line = <$fh> is not defined.

Question 4

Yes, but if it's defined, it gets lost.
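To make the lost line visible, here is a minimal sketch that mirrors the guard from the question. The helper name lines_after_guard and the in-memory sample data are made up for illustration and are not part of the original script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper that mirrors the guard in the question:
# the defined() test reads a line into $line, which is never used again.
sub lines_after_guard {
    my ($fh) = @_;
    return unless defined(my $line = <$fh>);   # consumes the first line
    my @rest;
    while (my $next = <$fh>) {
        chomp $next;
        push @rest, $next;
    }
    return @rest;
}

# In-memory filehandle, so the sketch needs no file on disk.
my $data = "one\ntwo\nthree\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @seen = lines_after_guard($fh);
close $fh;

print "@seen\n";   # prints "two three": "one" was read by the guard and dropped
```

Because the guard runs on every call, the original function would lose one line per chunk, not just the first line of the file.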

Question 5

@choroba: Maybe you have an idea how I can improve my code snippet?

Question 6

Here's my take. I removed the $self structure, as it gives you no advantage, and fixed the problem with missing lines.

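#! /usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
# Read up to chunk_size lines (default 10) from the filehandle and return
# them as a list; an empty list signals end of file.
sub read_in_chunks {
    my %args = @_;
    my $fh = $args{fh} or die "No filehandle given.\n";
    my $size = $args{chunk_size} || 10;
    my @chunk;
    # The size test comes first, so a line is only read when there is room for it.
    while (@chunk < $size && defined(my $line = <$fh>)) {
        chomp $line;
        push @chunk, $line;
    }
    return @chunk;
}
# Three-argument open, so the file name is never interpreted as a mode.
open my $fh, '<', shift or die $!;
my %opts = (
    fh => $fh,
    chunk_size => 4,
);
while (my @chunk = read_in_chunks(%opts)) {
    print Dumper(\@chunk);
    # ...
}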

Question 7

Thank you, @choroba. Do you think reading big files is better and faster when reading in chunks? I know this can only be answered by measuring the time.

Question 8

@Patrick85: Definitely not this way. You can try read or sysread or reading with $/ set to a reference to a number.
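As a rough sketch of the last suggestion: setting $/ to a reference to a number makes readline return fixed-size records instead of lines. The helper name block_reader, the block size of 8, and the in-memory sample data are assumptions chosen for illustration; a real script would use a much larger block size and a real file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns an iterator over fixed-size blocks.
sub block_reader {
    my ($fh, $block_size) = @_;
    return sub {
        # A reference to an integer in $/ switches readline to record mode:
        # each read returns at most that many bytes instead of one line.
        local $/ = \$block_size;
        return scalar readline($fh);   # undef at end of file
    };
}

# In-memory filehandle with 30 bytes of sample data (illustration only).
my $data = "0123456789" x 3;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $next_block = block_reader($fh, 8);
my @blocks;
while (defined(my $block = $next_block->())) {
    push @blocks, $block;
}
close $fh;

printf "%d blocks, last one has %d bytes\n", scalar @blocks, length $blocks[-1];
# prints "4 blocks, last one has 6 bytes"
```

Note that the blocks ignore line boundaries, so a line can be split across two blocks. Whether this is actually faster for a given file still has to be measured, for example with the core Benchmark module.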

Question 9

Untested,

use strict;
use warnings;
use Data::Dumper;
# Returns an iterator (a closure); each call yields the next chunk as an
# array reference, or undef once the file is exhausted.
sub read_in_chunks {
    my ($args) = @_;
    my $chunk_size = $args->{chunk_size} || 10;
    my $fh = $args->{fd} or die "no filehandle";
    return sub {
        my @chunk;
        # Test the chunk length before reading, so no line is read and then
        # dropped at a chunk boundary, and the chunk size stays constant.
        while (@chunk < $chunk_size && defined(my $line = <$fh>)) {
            chomp($line);
            push @chunk, $line;
        }
        return @chunk ? \@chunk : undef;
    };
}
open my $fh, '<', 'dump.txt' or die $!;
my $opts = {
    fd => $fh,
    chunk_size => 4,
};
my $iter = read_in_chunks($opts);
while (my $chunk = $iter->()) {
    print Dumper($chunk);
    # process data
}

Accepted answer (the one beginning "Here's my take"): choroba · 2017-05-16 20:09:14Z

