
I had a file that looked like this:

Mar 06 22:00:00 [10.251.132.246] logger: 10.64.69.219 - - [06/Mar/2011:22:.....
Mar 06 22:00:00 [10.251.132.246] logger: 10.98.137.116 - - [06/Mar/2011:22:0....

that I wanted to split into smaller files, one per IP address (the address right after "logger:").

This is what I came up with:

file = ARGV.shift
split_file = {}
pattern = /logger: ([^\s]*)/
File.open(file, 'r') do |f|
    f.each do |l|
        match = l[pattern]
        if match
            list = split_file[1ドル]
            list = [] if list == nil
            list << l
            split_file[1ドル] = list
        end
    end
end
split_file.each_pair do |k, v|
    File.open("#{file}.#{k}", "a+") do |f|
        v.each do |l|
            f.print l
        end
    end
end

Suggestions, warnings, improvements are very welcome :)

One thing I noticed is that the new files are created in the same directory as the original file, not at the current working directory (so ./logsplitter.rb ../log.log creates files in the .. directory).

Thank you


asked Mar 8, 2011 at 7:18
  • Do you mind me asking why you'd want to split log files this way? It looks like you want to have a log per IP instead of one giant log. Is this for documentation purposes? Do you just want a way of checking all access for a particular IP? Commented Mar 8, 2011 at 14:08
  • I needed to split the log per IP for further analysis and thought Ruby would be the easiest way to do it... for me at least :) Commented Mar 8, 2011 at 23:30
  • @Dale - Ok, yes I got that, but I meant, "what kind of analysis?". If you're just looking to check up on a specific IP, for example, cat log.log | grep 10.64.69.210 is a better approach than a splitting script (if you want a count of how many times they visited, pipe the output of that through wc -l). If you just want a list of unique IPs that visited, then awk '{print 6ドル}' log.log | sort -u might be enough for you (again, pipe wc to taste). I'm asking to see if a Ruby script (which you will now need to maintain) is actually the best solution for you. Commented Mar 9, 2011 at 0:16
  • Revised result (heavily cut down): bitbucket.org/dwijnand/logsplitter/src/999b61673f65/… Commented Mar 9, 2011 at 0:54
  • @Inaimathi sorry, didn't see your comment. It was to try and follow what was going on for specific IPs. Yes, I could have hand-picked a few and grepped them into individual files, but this was simple enough (and a nice exercise) to warrant a Ruby script :) Commented Mar 9, 2011 at 7:04

1 Answer


First of all, it is a fairly widespread convention in Ruby to indent with 2 spaces, not 4. Personally I don't mind, but some Ruby developers will complain when they see code indented with 4 spaces, so you'll have an easier time going with the flow.


file = ARGV.shift

Unless there is a good reason to mutate ARGV (and there doesn't seem to be one here), I'd recommend avoiding mutating operations on it. file = ARGV[0] works perfectly fine here.
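ARGV is an ordinary Array, so the difference between the two calls can be seen on a local stand-in without passing real command-line arguments. In this sketch, args is a hypothetical variable playing the role of ARGV:

```ruby
# args stands in for ARGV (both are plain Arrays).
args = ["../log.log"]

first = args[0]      # reads the first argument and leaves args untouched
puts args.inspect    # ["../log.log"]

shifted = args.shift # returns the same first argument...
puts args.inspect    # ...but also removes it from the array: []
```

Both calls give you the file name; only shift has the side effect, which matters if anything later in the script expects ARGV to still hold the arguments.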


match = l[pattern]
if match
 list = split_file[1ドル]
 list = [] if list == nil
 list << l
 split_file[1ドル] = list
end

Next, you should avoid magic variables such as 1ドル; using MatchData objects is more robust. As an example, consider this scenario:

Assume that you want to do some processing on the line before storing it in split_file, and you use gsub for it. Now your code looks like this:

match = l[pattern]
if match
 list = split_file[1ドル]
 list = [] if list == nil
 list << l.gsub( /some_regex/, "some replacement")
 split_file[1ドル] = list
end

However, this code is broken. Since gsub also sets 1ドル, by the time you reach split_file[1ドル] = list, 1ドル no longer contains the IP address, so the list is stored under the wrong key. This kind of bug can't happen if you call [1] on a MatchData object instead.

Furthermore, the whole block can be simplified using a very useful feature of Ruby hashes: default blocks. Hash.new accepts a block that is executed whenever a key is not found. This way you can create a hash of arrays that you can simply append to, without first making sure the array exists.

For this you need to change the initialization of split_file from split_file = {} to split_file = Hash.new {|h,k| h[k] = [] }. Then you can replace the above code with:

match = l.match(pattern)
if match
 split_file[ match[1] ] << l
end
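The default block can be tried on its own. In this sketch the hash name groups and the string values are illustrative only:

```ruby
# The block runs on a missing key: it stores a fresh empty array under
# that key and returns it, so << always has an array to append to.
groups = Hash.new { |hash, key| hash[key] = [] }

groups["10.64.69.219"]  << "first line"
groups["10.98.137.116"] << "second line"
groups["10.64.69.219"]  << "third line"

p groups
```

The block form matters here: Hash.new([]) would hand back one shared default array for every missing key, and << on it would modify that shared array without storing anything in the hash.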

"One thing I noticed is that the new files are created in the same directory as the original file, not at the current working directory (so ./logsplitter.rb ../log.log creates files in the .. directory)."

If you want to avoid that, use File.basename to strip the directory part from the given path and build the name of the output file from what remains, i.e.:

File.open("#{ File.basename(file) }.#{k}", "a+") do |f|

Speaking of this line: I don't see why you use "a+" instead of just "a" as the mode. "a+" opens the file for both reading and appending, but you never read from it.

answered Mar 8, 2011 at 15:15
  • Thanks sepp2k, great great tips! I thought I remembered something about default blocks in hashes.. there we go :) Thanks again, accepting. Commented Mar 8, 2011 at 23:33
  • Oh, one mistake however. Using the square brackets that way does something different. The following: match = l[pattern]; split_file[match[1]] << l if match must be replaced with: match = l[pattern, 1]; split_file[match] << l if match. Or use Regexp#match instead... Commented Mar 8, 2011 at 23:54
  • @Dale: True. I somehow did not notice that you were using [] and not match. Commented Mar 9, 2011 at 12:28
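The mistake pointed out in these comments comes from String#[] returning a String rather than a MatchData. A small sketch on a made-up log line (everything after the IP is invented, since the samples in the question are truncated):

```ruby
line    = "Mar 06 22:00:00 [10.251.132.246] logger: 10.64.69.219 - - [06/Mar/2011:22:00:00]"
pattern = /logger: ([^\s]*)/

whole   = line[pattern]       # String#[] with a Regexp: the whole match, as a String
first   = whole[1]            # String#[] with an Integer: a single character, not a group
ip      = line[pattern, 1]    # String#[] with a Regexp and a group index: the capture
matched = line.match(pattern) # a MatchData object; matched[1] is the capture

puts whole      # logger: 10.64.69.219
puts first      # o
puts ip         # 10.64.69.219
puts matched[1] # 10.64.69.219
```

So with l[pattern], match[1] silently yields "o" for every line, which is why either l[pattern, 1] or l.match(pattern) is needed.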
