This method changes all URLs in the string into an HTML link. The goal is to not mutate the string passed in, which I believe is a good practice.
The new_string
code in this method does not seem idiomatic.
def autolink_string string
return if string.nil?
return '' if string.empty?
regex = /(\S+\.(com|net|org|edu|gov)(\/\S+)?)/
groups = string.scan regex
matches = groups.map &:first
new_string = string.clone
matches.each do |match|
new_string.gsub! match, "<a href='http://#{ match }'>#{ match }</a>"
end
new_string
end
How can I refactor to remove the new_string
? Any other suggestions?
-
1\$\begingroup\$ This method needs some polishing indeed, but are you aware that there are gems for this task? github.com/tenderlove/rails_autolink \$\endgroup\$tokland– tokland2013年05月21日 14:27:17 +00:00Commented May 21, 2013 at 14:27
-
\$\begingroup\$ Yes, I did some research. This is not a rails project, and from what I could guess rails_autolink is only for Rails. I didn't find any other useful gems. It turns out that the regex is simple anyway. \$\endgroup\$B Seven– B Seven2013年05月21日 14:46:02 +00:00Commented May 21, 2013 at 14:46
-
\$\begingroup\$ Ok. Let's see if somebody else answers (I am kind of hoarding the ruby tag lately ;-)) \$\endgroup\$tokland– tokland2013年05月21日 14:51:30 +00:00Commented May 21, 2013 at 14:51
-
1\$\begingroup\$ I don't know what kind of input you're dealing with, but here are some things to consider: What if the URL already includes "http:" (or something else)? What if there are "bad" chars after the domain, e.g. it's at the end of a sentence and followed by a period? What if the domain is a .co.uk, .io, or .museum or any one of the other top-level domains? What about chars that are illegal in URLs? What if the URL is uppercase? Robust URL detection is hard, but here's a pretty good one \$\endgroup\$Flambino– Flambino2013年05月21日 15:32:07 +00:00Commented May 21, 2013 at 15:32
-
\$\begingroup\$ @Flambino - I have a helper method to handle the case that it already includes http://. I omitted it for clarity. It does not match the punctuation after the URL. I don't care about .co.uk, .io, etc. I don't care about illegal URL chars. It works for uppercase. Thanks for the link. \$\endgroup\$B Seven– B Seven2013年05月21日 15:50:02 +00:00Commented May 21, 2013 at 15:50
2 Answers 2
Some notes:
About the double
return
at the beginning: note that you could writereturn string if !string || string.empty?
. In any case, I wouldn't check this, if the caller sends anil
as argument it's his/her fault. IMO in dynamic languages it does not make much sense to check the types of the arguments.Again, about the double
return
at the beginning: ok, let's say that you really need to do this check. As it may be seen as a guard, it may be ok to write a inlinereturn
. However, as a rule of thumb, it's better to write full indented conditional expressions, they are much more clear (at the meager cost of a couple of lines and an indentation level).#{ match }
->#{match}
. Nobody puts those spaces.regex = /(\S+\.(com|net|org|edu|gov)(\/\S+)?)/
: This looks like a brittle regular expression to detect URLs. Also are you sure you won't get ahttp://
here? Anyway, you'll know what you need...new_string.gsub!
: No need to do an in-place update. UseString#gsub
with a block instead.
So I'd simply write:
def autolink_string(string)
regexp = /(\S+\.(com|net|org|edu|gov)(\/\S+)?)/
string.gsub(regexp) { |url| "<a href='http://#{url}'>#{url}</a>" }
end
-
1\$\begingroup\$ OP should pick this answer, better than mine.
gsub
is the way to go, I used it before but failed due to something wrong. \$\endgroup\$Billy Chan– Billy Chan2013年05月21日 17:04:08 +00:00Commented May 21, 2013 at 17:04 -
1\$\begingroup\$ that's very nice of you :-) but your answer also shows two of my points: it uses real conditionals and
gsub
with blocks. \$\endgroup\$tokland– tokland2013年05月21日 19:08:11 +00:00Commented May 21, 2013 at 19:08
You can use gsub!
and block directly with a clone.
def autolink_string str
unless str.empty?
regex = /(\S+\.(com|net|org|edu|gov)(\/\S+)?)/
str.clone.gsub!(regex) do |match|
"<a href='http://#{ match }'>#{ match }</a>"
end || str # Return str if no match
end
end
In console:
> str = "this is a test abc.com"
> result = autolink_string str
=> "this is a test <a href='http://abc.com'>abc.com</a>"
> str
=> "this is a test abc.com"
> result
=> "this is a test <a href='http://abc.com'>abc.com</a>"
> autolink_string "no url here"
=> "no url here"
> autolink_string ""
=> ""
> autolink_string "this is a test abc.com and test twitter.com"
=> "this is a test <a href='http://abc.com'>abc.com</a> and test <a href='http://twitter.com'>twitter.com</a>"
-
\$\begingroup\$ It seems that using
gsub
in the block is a good idea. However, I don't want to mutate the string. \$\endgroup\$B Seven– B Seven2013年05月21日 15:46:24 +00:00Commented May 21, 2013 at 15:46 -
\$\begingroup\$ @BSeven, then just clone it, no more lines needed. Results are same. \$\endgroup\$Billy Chan– Billy Chan2013年05月21日 15:56:39 +00:00Commented May 21, 2013 at 15:56
-
\$\begingroup\$ Works well, except that it returns nil if there are no matches:
autolink_string "no url here" => nil
\$\endgroup\$B Seven– B Seven2013年05月21日 16:26:52 +00:00Commented May 21, 2013 at 16:26 -
\$\begingroup\$ @BSeven, nice point! It depends on your usages, sometimes nil is preferred when you already have "str" in hand. If you choose to return something, I've revised the code to return
str
if no matches. \$\endgroup\$Billy Chan– Billy Chan2013年05月21日 16:38:51 +00:00Commented May 21, 2013 at 16:38