I need to extract a ID (int number) from a Url.
Example: http://www.example.com/foo/bar/12345
For this I wrote 4 methods where the first one ReadIDsFromFile()
is called by my constructor and the return value is set to a properties of this class. The methods are called in the order I posted them below.
private List<string> ReadIDsFromFile(string path)
{
// path is the full qualified path to a txt file. C:\text.txt
List<string> TweetIDsList = new List<string>();
string temp = string.Empty;
using (StreamReader sr = new StreamReader(path))
{
while ((temp = sr.ReadLine()) != null)
{
if (ValidateUrl(temp))
{
TweetIDsList.Add(ExtractID(temp));
}
else
{
Logger.Log("Invalid URL: {0}", temp);
}
}
}
return TweetIDsList;
}
private bool ValidateUrl(string url)
{
Uri uriResult;
bool result;
return result = Uri.TryCreate(url, UriKind.Absolute, out uriResult) && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps);
}
private string ExtractID(string url)
{
string id = string.Empty;
char[] urlArray = url.ToCharArray();
int result;
for (int i = urlArray.Length - 1; i >= 0; i--)
{
if (Int32.TryParse(urlArray[i].ToString(), out result))
{
id += result;
}
else
{
break; // break loop. If tryparse fails it means we have reached a character which is not a number, probably a forward slash.
}
}
return ReverseNumber(id);
}
private string ReverseNumber(string id)
{
char[] tempArray = id.ToCharArray();
string result = string.Empty;
for (int i = tempArray.Length -1; i >= 0; i--)
{
result += tempArray[i];
}
return result;
}
My code is working without problems so far but I feel like that it is overly unnecessarily complicated. I am especially concerned about the reversed order of the ID and my attempt to reverse it. Is there a better way?
2 Answers 2
You can use a regular expression to find the ID. The pattern [0-9]+$
will match one or more occurrences of 0-9 at the end of the string. You can use it like this:
private static readonly Regex UrlId = new Regex("[0-9]+$");
private static string ExtractID(string url)
{
var match = UrlId.Match(url);
return match.Success
? match.Captures[0].Value
: string.Empty;
}
Instead of using a StreamReader
, consider using File.ReadLines
foreach (var line in File.ReadLines(path))
{
if (ValidateUrl(line))
{
TweetIDsList.Add(ExtractID(line));
}
else
{
Logger.Log("Invalid URL: {0}", line);
}
}
You can remove the variable result
from ValidateUrl
:
private static bool ValidateUrl(string url)
{
Uri uriResult;
return Uri.TryCreate(url, UriKind.Absolute, out uriResult) &&
(uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps);
}
-
\$\begingroup\$ Thank you very much. Can you elaborate why I should use
File.ReadLines()
? \$\endgroup\$Takeru– Takeru2015年04月07日 13:47:28 +00:00Commented Apr 7, 2015 at 13:47 -
\$\begingroup\$ Since you're already processing the file line by line, using
File.ReadLines()
allows you to useforeach()
so that your code more clearly expresses what you're doing. Leave thenull
test to the library. \$\endgroup\$Snowbody– Snowbody2015年04月07日 17:24:11 +00:00Commented Apr 7, 2015 at 17:24 -
\$\begingroup\$ @Takeru As Snowbody said,
File.ReadLines
makes the code clearer. Note there is a difference between ReadLines and ReadAllLines. I forgot to mention, you should specify the encoding when reading the file. \$\endgroup\$mjolka– mjolka2015年04月07日 21:04:27 +00:00Commented Apr 7, 2015 at 21:04
In this case all those helper functions are probably redundant since you can leverage the Segments
array of the Uri
class to get the ID from the last segment.:
private List<string> ReadIDsFromFile(string path)
{
// path is the full qualified path to a txt file. C:\text.txt
List<string> TweetIDsList = new List<string>();
string temp = string.Empty;
using (StreamReader sr = new StreamReader(path))
{
while ((temp = sr.ReadLine()) != null)
{
Uri uriResult;
if (Uri.TryCreate(temp, UriKind.Absolute, out uriResult) && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps))
{
//TweetIDsList.Add(new string(uriResult.Segments.Last().Reverse().ToArray()));
//Since reverse isn't necessary the segment itself can be passed
TweetIDsList.Add(uriResult.Segments.Last());
}
else
{
Logger.Log("Invalid URL: {0}", temp);
}
}
}
return TweetIDsList;
}
-
\$\begingroup\$ wow, thank you. I didn't know about that property of uri. The
Reverse()
is unneeded in this case because I only needed to reverse the string as I was reading from end to start. \$\endgroup\$Takeru– Takeru2015年04月07日 15:47:58 +00:00Commented Apr 7, 2015 at 15:47 -
\$\begingroup\$ @Takeru - In that case you can use the segment itself. BTW
File.ReadAllLines
will read the whole file into memory. This works faster, but for large files is impractical, andStreamReader
is necessary. \$\endgroup\$user33306– user333062015年04月07日 15:52:25 +00:00Commented Apr 7, 2015 at 15:52 -
\$\begingroup\$ You can also add a check that your last segment is a valid int. \$\endgroup\$aydjay– aydjay2015年04月08日 09:10:26 +00:00Commented Apr 8, 2015 at 9:10