1
\$\begingroup\$

I have a bit of code that I use to update my blog. Basically, I send an email to a particular address, and every x minutes the daemon fetches the emails, creates the files, executes middleman and then synchronizes it all to S3.

The thing is that I'm using lots of different apps to do it all, be it system commands, Python scripts (s3cmd), Ruby apps with node server (middleman), etc.

The code works (well, most of the time) but is an eye-sore, and I have a hard time living with myself when I know this code exists. I've posted it below, and it's also available as a commented gist, as it is too long to put here. I've tried to comment on all the quirks and requirements I had in mind when writing this.

I'm not experienced in Ruby (as you can probably tell). Can someone with a little spare time help me refactor this code to a saner level?

#!/usr/local/bin/ruby
# -*- encoding: utf-8 -*-
require "rubygems"
require "mail"
require "fileutils"
require "daemons"
# I know I probably shouldn't use constants like that...
DEBUG = true
DELETE = true
# And most definitely not global variables
$log = "/Users/johndoe/log.txt"
$path = "/Users/johndoe/source/data"
$needs_sync = "/Users/johndoe/sync"
$needs_generate = "/Users/johndoe/generate"
FileUtils.touch($log)
# I don't see anything wrong with the following lines.
file_size = (File.size($log).to_f / 2**20)
if file_size > 0.1
 File.write(log, "")
end
Mail.defaults do
 retriever_method :imap,
 :address => "imap.mail.com",
 :port => 993,
 :user_name => "[email protected]",
 :password => "ph4tpassword",
 :enable_ssl => true
end
# OK, here it starts to get hairy. First, global variables, again...
# Also, no class, no sane logic, just plain "shell scripting" thinking translated into insanely coded Ruby
def set_dates
 date = Time.now.utc
 datetmz = Time.now
 $year = date.strftime("%Y").to_s
 $month = date.strftime("%02m").to_s
 $day = date.strftime("%02d").to_s
 $time = date.strftime("%02T").to_s
 timetmz = datetmz.strftime("%02T").to_s
 $now = "#{$year}/#{$month}/#{$day} #{timetmz}"
end
# This seems about right
def slugalize(text, separator = "-")
 re_separator = Regexp.escape(separator)
 result = text.to_s
 result.gsub!(/[^\x00-\x7F]+/, "")
 result.gsub!(/[^a-z0-9\-_\+]+/i, separator)
 result.gsub!(/\./, "")
 result.gsub!(/#{re_separator}{2,}/, separator)
 result.gsub!(/^#{re_separator}|#{re_separator}$/, "")
 result.downcase!
 return result
end
# This doesn't seem good. My requirement is to reverse fill the log (latest on top)
def log(alert)
 if DEBUG
 set_dates
 FileUtils.touch($log)
 File.open($log, "r") do |orig|
 File.unlink($log)
 File.open($log, "w") do |new|
 new.write "#{$now[5..-1]} - #{alert}\n"
 new.write(orig.read())
 end
 end
 end
end
# OK. I had lots of trouble with these two. Middleman is a long process,
# and without doing some kind of process fork or multithreading, the whole
# daemon randomly crashes when it arrives here. I'm not sure it is good though.
def generate
 pid = Process.fork do
 File.delete($needs_generate)
 log("Generating...")
 middleman = Process.spawn("middleman build --clean", :chdir => "/Users/johndoe")
 Process.wait
 log("Generated.")
 FileUtils.touch($needs_sync)
 end
 Process.detach(pid)
end
# Here too. s3cmd sync can take a very long time (4 minutes for instance).
def sync
 pid = Process.fork do
 File.delete($needs_sync)
 log("Synchronizing...")
 Dir.chdir("/Users/johndoe") do
 Dir::mkdir("gzip") unless File.exists?("gzip")
 FileUtils.rm_rf Dir.glob("gzip/*")
 FileUtils.cp_r(Dir["build/*"], "gzip")
 end
 Dir.chdir("/Users/johndoe/gzip") do
 `find . -iname "*.*" -type f -exec sh -c "gzip -n -c -9 "{}" > tmp && cat tmp > "{}"" ";"`
 File.delete("tmp")
 `find . -name ".DS_Store" -delete`
 s3cmd = "/usr/local/bin/s3cmd"
 opts = "-P --rr --delete-removed --guess-mime-type --no-check-md5 --no-preserve -H --add-header=Content-Encoding:gzip"
 bucket = "s3://www.mybucket.com/"
 in5minutes = "--add-header=Cache-control:max-age=300,must-revalidate"
 in15minutes = "--add-header=Cache-control:max-age=900,must-revalidate"
 in1day = "--add-header=Cache-control:max-age=86400,must-revalidate"
 in8days = "--add-header=Cache-control:max-age=691200,must-revalidate"
 sync = `#{s3cmd} sync #{opts} #{in8days} . #{bucket}`
 sitemap = `#{s3cmd} put #{opts} #{in1day} sitemap.xml #{bucket}sitemap.xml`
 feed = `#{s3cmd} put #{opts} #{in15minutes} feed.xml #{bucket}feed.xml`
 archives = `#{s3cmd} put #{opts} --recursive #{in15minutes} archives #{bucket}`
 index = `#{s3cmd} put #{opts} #{in5minutes} index.html #{bucket}index.html`
 end
 log("Synchronized.")
 end
 Process.detach(pid)
end
# This whole function seems right though.
def check_mail
 emails = Mail.find_and_delete(mailbox: "INBOX", what: :first, count: 10, order: :asc)
 if (emails.length == 0 )
 log("No email found.")
 else
 emails.each do |email, imap, uid|
 if email.subject.empty?
 email.skip_deletion
 next
 end
 subject = email.subject
 slug = slugalize(subject.dup)
 if email.body.decoded.empty?
 email.skip_deletion
 next
 end
 body = email.body.decoded
 kind = "post"
 link = body.lines.first.split(/\s+/).find_all { |u| u =~ /^https?:/ }.join
 if link.match(/^http/)
 kind = "link"
 body = body.lines.to_a[1..-1].join
 end
 log("#{subject} (#{kind})")
 path = $path.dup
 if kind == "link"
 path << "/links"
 else
 path << "/posts"
 end
 path << "/%s-%s-%s-%s.markdown" % [$year, $month, $day, slug]
 open(path, "w") do |str|
 str << "---\n"
 str << %Q|title: "#{subject}"\n|
 if kind == "link"
 str << "link: #{link}\n"
 end
 str << "date: %s-%s-%sT%s\n" % [$year, $month, $day, $time]
 str << "---\n"
 str << body.chomp!
 end
 url = "http://www.mybucket.com/"
 url << "linked/" if kind == "link"
 url << "#{year}/#{month}/#{slug}/"
 twtsubj = truncate(subject, 115)
 twtsubj.insert(0, "★ ") if kind == "post"
 #Twitter.update("#{twtsubj} #{url}")
 email.skip_deletion if !DELETE
 end
 FileUtils.touch($needs_generate)
 end
end
############################
log("Daemon started.")
# OK, finally, let me explain the whole "files" thing. As it crashed randomly,
# the only way to get the script to actually generate the site and sync it
# when the daemon starts again.
# If the whole thing doesn't crash, I don't think I'd need it then.
loop do
 check_mail
 if File.exists?($needs_generate)
 generate
 end
 if File.exists?($needs_sync)
 sync
 end
 sleep 300
end
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Nov 14, 2013 at 20:28
\$\endgroup\$
5
  • \$\begingroup\$ Thanks for the tip. I should have looked better. I see no way to move my own question though. \$\endgroup\$ Commented Nov 14, 2013 at 20:42
  • \$\begingroup\$ I asked the moderator to move it for you. But, because you don't have the code in your question, expect it to be closed there too. As is, you're asking people to chase down the link, which wastes people's time, plus, if/when the link breaks your question will be worthless to anyone else. \$\endgroup\$ Commented Nov 14, 2013 at 21:22
  • \$\begingroup\$ Thanks @theTinMan. I've added the code in that case. I've yet to familiarize myself with the very peculiar way SO works. \$\endgroup\$ Commented Nov 14, 2013 at 21:35
  • 1
    \$\begingroup\$ SO isn't peculiar, it's very common-sense oriented. Consider how it works: There are people asking questions, and volunteers answer them. The volunteers do this on their own time and, to help them help you, they need you to supply the information needed in the question. If you were asked your question, along with a bunch of others, wouldn't you want to have the necessary data right there? \$\endgroup\$ Commented Nov 14, 2013 at 21:39
  • \$\begingroup\$ And, BTW, that isn't a lot of code. You should see when someone inserts a big Java stack trace. \$\endgroup\$ Commented Nov 14, 2013 at 21:41

1 Answer 1

1
\$\begingroup\$

Well, I tend to agree on at least one count - this one is an eye-sore! Here are my 2 cents on what you need to do:

Break your code into following files:

  • config.rb: If you spend some time looking at open-source projects out there, you'd notice that almost all applications separation configuration from logic. This is a good thing to do and will reduce the eyesore instantaneously. Create a module named Configuration.
  • logger.rb: Your requirement to reverse fill a log is absurd. Why do you want that? If you want to see latest logs, just tail -f your file if you are on a Unix system. In your best interests, you should just use a Logging gem out there which can rotate logs for you. At any rate, create a module Logger with the log method here.
  • email_checker.rb: Move the check_mail functionality there into a EmailChecker module.
  • generator.rb: Move the generate method there as part of the Generator module.
  • syncer.rb: Move the sync method there as part of the Syncer module.
  • myproc.rb: Write your loop here using all the modules defined previously.
  • myproc_control.rb: Use the Daemon gem.

Once you've done this, you can ask a question with each individual file and get a much better response. Here are some of my other quick observations:

  1. Isn't set_date equivalent to Time.now.strftime("%Y/%m/%d/ %T")? And if you can live with hyphens in place of slashes, you can use Time.now.strftime("%F %T").
  2. The check_email method needs refactoring. For example,
    1. Combine the conditions when you'll skip the iteration cycle together like if email.subject.empty? or email.body.decoded.empty?.
    2. Extract out the code needed to send out a Twitter update.
  3. In the sync method, you don't need to create individual variables for each interval. Use a method which will return the appropriate option with the correct amount of seconds corresponding to the given time interval.
answered Nov 21, 2013 at 11:11
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.