3
\$\begingroup\$

This method changes all URLs in the string into an HTML link. The goal is to not mutate the string passed in, which I believe is a good practice.

The new_string code in this method does not seem idiomatic.

def autolink_string string
 return if string.nil? 
 return '' if string.empty?
 regex = /(\S+\.(com|net|org|edu|gov)(\/\S+)?)/
 groups = string.scan regex
 matches = groups.map &:first
 new_string = string.clone
 matches.each do |match|
 new_string.gsub! match, "<a href='http://#{ match }'>#{ match }</a>"
 end
 new_string
end

How can I refactor to remove the new_string? Any other suggestions?

Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked May 21, 2013 at 14:02
\$\endgroup\$
5
  • 1
    \$\begingroup\$ This method needs some polishing indeed, but are you aware that there are gems for this task? github.com/tenderlove/rails_autolink \$\endgroup\$ Commented May 21, 2013 at 14:27
  • \$\begingroup\$ Yes, I did some research. This is not a rails project, and from what I could guess rails_autolink is only for Rails. I didn't find any other useful gems. It turns out that the regex is simple anyway. \$\endgroup\$ Commented May 21, 2013 at 14:46
  • \$\begingroup\$ Ok. Let's see if somebody else answers (I am kind of hoarding the ruby tag lately ;-)) \$\endgroup\$ Commented May 21, 2013 at 14:51
  • 1
    \$\begingroup\$ I don't know what kind of input you're dealing with, but here are some things to consider: What if the URL already includes "http:" (or something else)? What if there are "bad" chars after the domain, e.g. it's at the end of a sentence and followed by a period? What if the domain is a .co.uk, .io, or .museum or any one of the other top-level domains? What about chars that are illegal in URLs? What if the URL is uppercase? Robust URL detection is hard, but here's a pretty good one \$\endgroup\$ Commented May 21, 2013 at 15:32
  • \$\begingroup\$ @Flambino - I have a helper method to handle the case that it already includes http://. I omitted it for clarity. It does not match the punctuation after the URL. I don't care about .co.uk, .io, etc. I don't care about illegal URL chars. It works for uppercase. Thanks for the link. \$\endgroup\$ Commented May 21, 2013 at 15:50

2 Answers 2

2
\$\begingroup\$

Some notes:

  • About the double return at the beginning: note that you could write return string if !string || string.empty?. In any case, I wouldn't check this, if the caller sends a nil as argument it's his/her fault. IMO in dynamic languages it does not make much sense to check the types of the arguments.

  • Again, about the double return at the beginning: ok, let's say that you really need to do this check. As it may be seen as a guard, it may be ok to write a inline return. However, as a rule of thumb, it's better to write full indented conditional expressions, they are much more clear (at the meager cost of a couple of lines and an indentation level).

  • #{ match } -> #{match}. Nobody puts those spaces.

  • regex = /(\S+\.(com|net|org|edu|gov)(\/\S+)?)/: This looks like a brittle regular expression to detect URLs. Also are you sure you won't get a http:// here? Anyway, you'll know what you need...

  • new_string.gsub!: No need to do an in-place update. Use String#gsub with a block instead.

So I'd simply write:

def autolink_string(string)
 regexp = /(\S+\.(com|net|org|edu|gov)(\/\S+)?)/
 string.gsub(regexp) { |url| "<a href='http://#{url}'>#{url}</a>" }
end
answered May 21, 2013 at 16:39
\$\endgroup\$
2
  • 1
    \$\begingroup\$ OP should pick this answer, better than mine. gsub is the way to go, I used it before but failed due to something wrong. \$\endgroup\$ Commented May 21, 2013 at 17:04
  • 1
    \$\begingroup\$ that's very nice of you :-) but your answer also shows two of my points: it uses real conditionals and gsub with blocks. \$\endgroup\$ Commented May 21, 2013 at 19:08
2
\$\begingroup\$

You can use gsub! and block directly with a clone.

def autolink_string str
 unless str.empty?
 regex = /(\S+\.(com|net|org|edu|gov)(\/\S+)?)/
 str.clone.gsub!(regex) do |match|
 "<a href='http://#{ match }'>#{ match }</a>"
 end || str # Return str if no match
 end
end

In console:

> str = "this is a test abc.com"
> result = autolink_string str
=> "this is a test <a href='http://abc.com'>abc.com</a>"
> str
=> "this is a test abc.com"
> result
=> "this is a test <a href='http://abc.com'>abc.com</a>"
> autolink_string "no url here"
=> "no url here"
> autolink_string ""
=> ""
> autolink_string "this is a test abc.com and test twitter.com"
=> "this is a test <a href='http://abc.com'>abc.com</a> and test <a href='http://twitter.com'>twitter.com</a>"
answered May 21, 2013 at 15:03
\$\endgroup\$
4
  • \$\begingroup\$ It seems that using gsub in the block is a good idea. However, I don't want to mutate the string. \$\endgroup\$ Commented May 21, 2013 at 15:46
  • \$\begingroup\$ @BSeven, then just clone it, no more lines needed. Results are same. \$\endgroup\$ Commented May 21, 2013 at 15:56
  • \$\begingroup\$ Works well, except that it returns nil if there are no matches: autolink_string "no url here" => nil \$\endgroup\$ Commented May 21, 2013 at 16:26
  • \$\begingroup\$ @BSeven, nice point! It depends on your usages, sometimes nil is preferred when you already have "str" in hand. If you choose to return something, I've revised the code to return str if no matches. \$\endgroup\$ Commented May 21, 2013 at 16:38

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.