\$\begingroup\$

I attempted to recreate the GitHub code frequency graph (example) with a daily granularity using Perl and git log. How did I do, and what improvements can I make? I know that I should try to reduce the frequency of keys on the x-axis, but I have no idea how to do that and maintain scale.

Example of the output

Note that the default application for opening SVG files should be set to a browser of some sort.

#!/usr/bin/env perl
# Henry J Schmale
# November 4, 2015
#
# Creates a graph of insertions and deletions per day for a git repo. It
# writes an SVG of the graph to /tmp/gitGraph.svg and opens it.
#
# This script can take the name of a directory to produce the graph for
# that directory. If no param is given, it uses the current directory.
#
# Requires SVG::TT::Graph::Line
use strict;
use warnings;
# Get the path to the stylesheet first
use File::Spec;
use File::Basename;
my $graphsty = dirname(File::Spec->rel2abs(__FILE__)) . '/svg-graph-ss.css';
# CD into the directory specified if specified 
if(-e $ARGV[0] and -d $ARGV[0]){
 chdir $ARGV[0];
}
# Indexed by date
my %commits;
# get the git log and preprocess it
my $gitlogOutput = qx(git log --numstat --pretty="%H %aI" | grep -v '^\$');
my @lines = split /\n/, $gitlogOutput;
my $date;
my $hash;
foreach (@lines) {
 chomp;
 my @fields = split /\s+/;
 # Length of sha1sum
 if(length($fields[0]) > 39){
 $hash = $fields[0];
 $date = substr($fields[1], 0, 10);
 }else{
 $commits{$date}->{ins} += $fields[0];
 $commits{$date}->{del} += $fields[1];
 }
}
use DateTime;
use Date::Parse;
use Data::Dumper;
my $firstDate = getDateTime(((sort keys %commits)[0]));
my $lastDate = getDateTime(((sort keys %commits)[-1]));
# print "$firstDate\t".((sort keys %commits)[0])."\n";
# print "$lastDate\t".((sort keys %commits)[-1])."\n";
# print (scalar keys %commits)."\n";
while($firstDate->add(days => 1) < $lastDate){
 my $key = $firstDate->ymd('-');
 if(!defined $commits{$key}){
 $commits{$key}->{ins} = 0;
 $commits{$key}->{del} = 0;
 }
}
# print scalar keys %commits;
# print "\n";
# Prepare data for graphing by converting them to arrays
my (@key, @ins, @del, @net);
my $i = 0;
foreach (sort keys %commits){
 $key[$i] = $_;
 $ins[$i] = $commits{$_}{ins};
 # del must be negative in order for the graph to look right, with deletions
 # being below the x axis.
 $del[$i] = -$commits{$_}{del};
 $net[$i] = $commits{$_}{ins} - $commits{$_}{del};
 # printf("%s,%s,%s,%s\n", $_, $ins[$i], $del[$i], $net[$i]);
 $i++;
}
# Get the max value in the data sets
my @allpoints;
my ($min, $max);
push @allpoints, @ins;
push @allpoints, @del;
push @allpoints, @net;
for(@allpoints){
 $min = $_ if !$min || $_ < $min;
 $max = $_ if !$max || $_ > $max;
}
# Graph it
use SVG::TT::Graph::Line;
my $graph = SVG::TT::Graph::Line->new({
 width => 1200,
 height => 800,
 fields => \@key,
 scale_integers => 1,
 rotate_x_labels => 1,
 show_data_values => 0,
 show_data_points => 0,
 min_scale_value => $min,
 max_scale_value => $max,
 style_sheet => $graphsty,
 });
# Add the data
$graph->add_data({
 'data' => \@ins,
 title => 'Inserts Per Day'
 });
$graph->add_data({
 'data' => \@del,
 title => 'Deletions Per Day'
 });
$graph->add_data({
 'data' => \@net,
 title => 'Net Insert/Del Per Day'
 });
# Print file 
my $filepath = '/tmp/gitGraph.svg';
open my $FD,'>',$filepath or die $!;
print $FD $graph->burn();
close $FD;
# Open it up in the browser or preferred method for opening the file
$filepath = 'file://'.$filepath;
qx(xdg-open $filepath);
sub getDateTime {
 my ($ss,$mm,$hh,$day,$month,$year,$zone) = strptime($_[0]);
 # print "\t$_[0]\n";
 return DateTime->new(
 day => $day,
 month => $month + 1,
 # We have to add 1900 here in order to make the date format work
 year => ($year + 1900)
 );
}
toolic
asked Nov 16, 2015 at 1:16
\$\endgroup\$
  • Empty string for every odd element on X axis? Commented Nov 16, 2015 at 6:27
  • @mpapec It only happens on the first run for a repository, or if you have dozens of branches. Commented Nov 16, 2015 at 15:11
  • Just a style thing, but the use SVG::TT::Graph::Line; should be at the top. So should the other use statements that are in the middle. They will all be loaded at compile time anyway. Commented Nov 26, 2015 at 16:18
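
The first comment's idea (blanking most of the x-axis labels) can be sketched as follows. This is only a sketch under assumptions: thin_labels is a hypothetical helper that is not in the original script, and it assumes SVG::TT::Graph::Line draws an empty-string field as a blank label, which should be checked against the module's documentation. The number of fields stays the same, so the data points keep their positions and the scale is unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): keep every $step-th
# label and replace the others with an empty string. The returned list has
# the same length as the input, so it still lines up with the data arrays.
sub thin_labels {
    my ($step, @labels) = @_;
    my $i = 0;
    return map { $i++ % $step == 0 ? $_ : '' } @labels;
}

# Ten made-up daily keys in the same YYYY-MM-DD form the script uses.
my @key     = map { sprintf '2015-11-%02d', $_ } 1 .. 10;
my @thinned = thin_labels(7, @key);

print join(',', @thinned), "\n";
```

In the script, the thinned list would be passed as fields in place of \@key; a step of 7 gives roughly one label per week.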

2 Answers

\$\begingroup\$

A simple thing to add would be a logarithmic Y-scaling function, so that significant bursts in either losses or gains don't get perceptually over-exaggerated. Essentially, you want 100-commit days to take up at most 10x as much space as 1-commit days, so that "low activity" wavering is still visible.
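
A minimal sketch of such a scaling function is shown below. signed_log10 is a hypothetical name, and the choice of log10(1 + |n|) with the sign restored afterwards is an assumption made here so that zero stays at zero and deletions (which the script stores as negative numbers) stay below the axis. The y-axis would then show log values rather than raw counts.

```perl
use strict;
use warnings;

# Hypothetical helper: compress large magnitudes logarithmically while
# keeping the sign. Perl's log() is the natural logarithm, so divide by
# log(10) to get base 10.
sub signed_log10 {
    my ($n) = @_;
    my $scaled = log(1 + abs($n)) / log(10);
    return $n < 0 ? -$scaled : $scaled;
}

# Made-up per-day values: insertions positive, deletions negative.
my @raw    = (0, 9, 99, -999);
my @scaled = map { signed_log10($_) } @raw;

printf "%6d -> %.3f\n", $raw[$_], $scaled[$_] for 0 .. $#raw;
```

The transform would be applied to @ins, @del and @net before they are handed to add_data.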

I'd also suggest breaking windows of about 7 days into single data points, by aggregating them and representing them as 5 lines each:

  • Max number of commits per day in 7-day-range,
  • Upper Quartile number of commits per day in 7-day-range
  • Median number of commits per day ...
  • Lower Quartile ....
  • Min...

And then render it with some shades like this (half-assed mockup):

[mockup image]

With this, you could afford to narrow the sample window to, say, "per hour", but still represent the data accumulated in terms of weeks, so you'd have "max commits/hour this week", "min commits/hour this week", etc., all represented in a single vertical column.
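
The aggregation described above can be sketched like this. It is a sketch under assumptions, not a definitive implementation: the helper names are hypothetical, the windows are simple consecutive 7-element chunks (the last one may be shorter), and quartiles use linear interpolation between the closest ranks, which is only one of several common conventions. It works on any list of per-day numbers, whether commits or line counts.

```perl
use strict;
use warnings;

# Quantile of an ascending, non-empty list using linear interpolation
# between the two closest ranks (an assumed convention).
sub quantile {
    my ($p, @sorted) = @_;
    my $pos = $p * $#sorted;
    my $lo  = int $pos;
    my $hi  = $lo < $#sorted ? $lo + 1 : $lo;
    return $sorted[$lo] + ($pos - $lo) * ($sorted[$hi] - $sorted[$lo]);
}

# Returns (min, lower quartile, median, upper quartile, max).
sub five_number_summary {
    my @sorted = sort { $a <=> $b } @_;
    return map { quantile($_, @sorted) } 0, 0.25, 0.5, 0.75, 1;
}

# Split the daily values into consecutive 7-day windows and summarise each.
sub weekly_summaries {
    my @daily = @_;
    my @out;
    while (my @week = splice @daily, 0, 7) {
        push @out, [ five_number_summary(@week) ];
    }
    return @out;
}

# Ten made-up daily values: one full week plus a three-day remainder.
my @summaries = weekly_summaries(1 .. 10);
print join(' ', @$_), "\n" for @summaries;
```

Each of the five positions would become its own add_data series, with the window's start date as the field label.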

answered Apr 29, 2016 at 11:18
\$\endgroup\$
  • IMO a CR post almost always should include some code (or at least a reference to original code) otherwise it's more about UX (which is good and desirable but maybe off-topic) Commented Apr 29, 2016 at 11:44
  • I mostly agree. But here it's not entirely obvious what the author is asking for, seems like they're wanting both. The code ... is ... well, more scary than I can begin to start on :/ Commented Apr 29, 2016 at 12:21
\$\begingroup\$

Overview

It is great that you:

  • Used strict and warnings
  • Leveraged other people's code by using the CPAN modules
  • Used comments at the top of the file to describe the code

Here are some adjustments for you to consider, mainly for coding style.

Warning

I get a warning if I don't specify a directory on the command line. The -e file test isn't doing what you expect. It does not check if the argument was passed in. This is one way to do it:

# If a directory is specified on the command line, change into it
chdir $ARGV[0] if $ARGV[0] and -d $ARGV[0];

Modules

It is customary to place all the use statements together at the top of the file.

Since this is not used, just delete the line:

use Data::Dumper;

Namespace

It is best to import only what is needed to avoid namespace pollution. For example, change:

use File::Basename;

to:

use File::Basename qw(dirname);

These are all the use statements:

use strict;
use warnings;
use File::Spec qw();
use File::Basename qw(dirname);
use DateTime qw();
use Date::Parse qw(strptime);
use SVG::TT::Graph::Line qw();

Provide more debugging information to the user (the file name) if open fails. Change:

open my $FD,'>',$filepath or die $!;

to:

open my $FD, '>', $filepath or die "$filepath $!";

There is no need for the following scalar variable and for split:

my $gitlogOutput = qx(git log --numstat --pretty="%H %aI" | grep -v '^\$');
my @lines = split /\n/, $gitlogOutput;

Simplified as:

my @lines = qx(git log --numstat --pretty="%H %aI" | grep -v '^\$');
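
In list context, qx returns one element per output line, each still ending in a newline, which the existing chomp in the loop removes. That makes the parsing step easy to try on canned lines without a repository. The following is a hedged sketch, not the original loop: tally_by_day is a hypothetical name, the header test is a hex-hash regex instead of a length check, and rows whose counts are "-" (which git log --numstat prints for binary files) are skipped rather than added.

```perl
use strict;
use warnings;

# Hypothetical helper: sum insertions and deletions per day from lines in
# the format produced by: git log --numstat --pretty="%H %aI"
sub tally_by_day {
    my @lines = @_;
    my (%commits, $date);
    for my $line (@lines) {
        chomp $line;
        my @fields = split /\s+/, $line;
        next unless @fields >= 2;            # also skips blank lines
        if ($fields[0] =~ /\A[0-9a-f]{40,64}\z/) {
            # Header line: full commit hash, then an ISO 8601 author date.
            $date = substr $fields[1], 0, 10;
            next;
        }
        next unless defined $date;
        # numstat prints "-" for both counts on binary files; skip those.
        next unless $fields[0] =~ /\A\d+\z/ && $fields[1] =~ /\A\d+\z/;
        $commits{$date}{ins} += $fields[0];
        $commits{$date}{del} += $fields[1];
    }
    return %commits;
}

# Made-up sample input; the hashes are placeholders, not real commits.
my @sample = (
    ('a' x 40) . " 2015-11-04T10:00:00-05:00\n",
    "3\t1\tfoo.pl\n",
    "-\t-\timage.png\n",
    ('b' x 40) . " 2015-11-04T12:30:00-05:00\n",
    "2\t0\tbar.pl\n",
    ('c' x 40) . " 2015-11-05T09:15:00-05:00\n",
    "1\t4\tfoo.pl\n",
);

my %commits = tally_by_day(@sample);
printf "%s +%d -%d\n", $_, $commits{$_}{ins}, $commits{$_}{del}
    for sort keys %commits;
```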

The $hash variable is declared and set, but it is never read. It can be deleted.


Since you don't use the @allpoints array anywhere else, you can get rid of the variable and simplify all these lines:

my @allpoints;
push @allpoints, @ins;
push @allpoints, @del;
push @allpoints, @net;
for(@allpoints){

as:

for (@ins, @del, @net) {
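
As an alternative to the loop, here is a minimal sketch using min and max from List::Util, which ships with Perl. The sample arrays are made up for illustration. This also sidesteps the !$min and !$max tests in the original loop body, which treat a legitimate value of 0 the same as "not set yet".

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Made-up sample data in the same shape as the script's arrays:
# insertions positive, deletions negated, net is the difference.
my @ins = (10, 0, 25);
my @del = (-3, 0, -40);
my @net = (7, 0, -15);

# min and max take a flat list, so the three arrays can be passed together.
my $min = min(@ins, @del, @net);
my $max = max(@ins, @del, @net);

print "min=$min max=$max\n";
```

Both functions return undef for an empty list, so a repository with no commits would still need its own check.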

Since the "quote executable" operator interpolates variables, you can simplify these 2 lines:

$filepath = 'file://'.$filepath;
qx(xdg-open $filepath);

as:

qx(xdg-open file://$filepath);

There is no need for the single quotes around the data key:

$graph->add_data({
 'data' => \@ins,
 title => 'Inserts Per Day'

It would be more consistent to omit them as you did for the title key:

$graph->add_data({
 data => \@ins,
 title => 'Inserts Per Day'

Since foreach is identical to for, and you already use for elsewhere in the code, you should just use for everywhere.

answered Mar 26, 2024 at 20:24
\$\endgroup\$
