I have an input field where users can enter a Twitter account in any of several formats, and I want to extract the account name.
For example:
twitters = [
"www.twitter.com/twitteruser1",
"@twitteruser2",
"twitteruser3",
"https://twitter.com/twitteruser4",
"https://www.twitter.com/twitteruser5",
"www.twitter.com/twitteruser6",
"http://www.twitter.com/twitteruser7",
"http://www.twitter.com/twitteruser8",
"twitter.com/twitteruser9"
]
The script that I've written to extract the data is the following:
twitters.each do |twitter|
# for the url
twitter_user = twitter.match(/twitter.com\/([^\/.]*)$/)
if twitter_user != nil
puts twitter_user[1]
next
end
# for @ beginning
twitter_user = twitter.match(/^@([^\/.]*)$/)
if twitter_user != nil
puts twitter_user[1]
next
end
# if we get here, nothing matched
puts twitter
end
It actually works, outputting the following:
twitteruser1
twitteruser2
twitteruser3
twitteruser4
twitteruser5
twitteruser6
twitteruser7
twitteruser8
twitteruser9
But as I'm really new to Ruby, I wanted to check for possible improvements.
3 Answers
I would put the regular expressions into a list:
TWITTER_PATTERNS = [
/twitter.com\/([^\/.]*)$/, # "www.twitter.com/twitteruser1"
/^@([^\/.]*)$/, # "@twitteruser2"
]
This function can iterate over that list:
def twitter_user(twitter)
TWITTER_PATTERNS.each do |pattern|
return $1 if twitter =~ pattern
end
twitter
end
This clearly separates the "policy" (what patterns are used to extract twitter users) from the "mechanism" (the loop we use to apply the policy). A change to the function's implementation won't require a change to the patterns, and vice versa.
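As a quick check, here is a minimal usage sketch of the function above, run against three of the sample inputs from the question. The `.freeze` call and the `samples` and `names` variables are additions for illustration only.

```ruby
# Patterns copied from the answer; each one captures the user name in group 1.
TWITTER_PATTERNS = [
  /twitter.com\/([^\/.]*)$/, # "www.twitter.com/twitteruser1"
  /^@([^\/.]*)$/,            # "@twitteruser2"
].freeze

# Return the first capture of the first pattern that matches,
# or the input unchanged when no pattern matches.
def twitter_user(twitter)
  TWITTER_PATTERNS.each do |pattern|
    return $1 if twitter =~ pattern
  end
  twitter
end

# Hypothetical sample list, taken from the question's inputs.
samples = ["www.twitter.com/twitteruser1", "@twitteruser2", "twitteruser3"]
names = samples.map { |sample| twitter_user(sample) }
puts names
```

Supporting another input format should then only mean adding a pattern to the list, as long as it captures the name in its first group.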
You can use a Ruby case statement to check for different regex matches. A when branch only runs if its pattern matched, so no explicit nil check is needed.
twitters.each do |twitter|
case twitter
# for the url
when /twitter.com\/([^\/.]*)$/
puts $1
# for @ beginning
when /^@([^\/.]*)$/
puts $1
# if we get here, nothing matched
else
puts twitter
end
end
When Ruby does a regex match, any capture groups are assigned to the global variables $1, $2, $3, and so on. See this question on StackOverflow for more details about the mechanic.
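A small sketch of that behaviour, using one of the question's inputs. The variable names are illustrative, and the dot is escaped and a plus used in the first pattern only to keep the example strict.

```ruby
# After a successful match, $1 holds the first capture group.
# $~ is the MatchData of the last match, so $~[1] is the same value.
if "https://twitter.com/twitteruser4" =~ /twitter\.com\/([^\/.]+)$/
  first_group = $1
  via_match_data = $~[1]
end

# A failed match resets the match variables, so $1 becomes nil.
"twitteruser3" =~ /^@(.+)$/
after_failed = $1

puts first_group
puts via_match_data
puts after_failed.inspect
```

Because every match attempt overwrites these variables, read $1 right after the match you care about.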
Note on your regex: If you want the capture groups to actually contain anything, you should change the asterisk to a plus: ([^\/.]+). With the asterisk, "www.twitter.com/" is matched by the URL regex with an empty capture. With the plus, that string falls into the default case.
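The difference can be seen with a quick sketch. The names `star`, `plus` and `input` are made up for this example.

```ruby
star = /twitter.com\/([^\/.]*)$/ # original pattern: zero or more characters
plus = /twitter.com\/([^\/.]+)$/ # suggested pattern: at least one character

input = "www.twitter.com/"

# The asterisk version matches and captures an empty string.
star_capture = input.match(star)[1]

# The plus version does not match at all, so match returns nil.
plus_match = input.match(plus)

puts star_capture.inspect
puts plus_match.inspect
```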
twitters.each do |twitter|
puts twitter.match(/twitter.com\/([^\/.]*)$/) ||
twitter.match(/^@([^\/.]*)$/) ||
twitter
end
It's shorter, and I'd say at least equally readable. A chain of || operators returns the first truthy value.
Hi @Borsunho, thanks for the quick response. The problem is that in the first two cases I need the value of twitter.match(...)[1], and if I access [1] without checking whether twitter.match(...) is nil, the script fails for accounts that don't match the first case with:
undefined method `[]' for nil:NilClass
– bcap, Dec 15, 2016 at 17:26
I suspect there is a bug here. If I'm right, it would be fixed by appending &.[](1) to each of the two match expressions, but the fix makes the code pretty ugly. There's probably a better fix.
– Wayne Conrad, Dec 19, 2016 at 12:44
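One possible alternative, offered as a sketch rather than as what either commenter had in mind: `String#[]` accepts a regex plus a capture index and returns that capture, or nil when the regex does not match. That keeps the || chain short without calling [1] on nil. The `users` variable and the shortened input list are assumptions for this example.

```ruby
twitters = ["www.twitter.com/twitteruser1", "@twitteruser2", "twitteruser3"]

users = twitters.map do |twitter|
  # str[regex, 1] yields capture group 1, or nil if there is no match,
  # so the chain falls through to the next alternative.
  twitter[/twitter.com\/([^\/.]*)$/, 1] ||
    twitter[/^@([^\/.]*)$/, 1] ||
    twitter
end

puts users
```

An empty string is truthy in Ruby, so with the asterisk patterns an input like "www.twitter.com/" would still yield an empty name. Switching to the plus form avoids that.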