4
\$\begingroup\$

I need to extract a ID (int number) from a Url.

Example: http://www.example.com/foo/bar/12345

For this I wrote 4 methods where the first one ReadIDsFromFile() is called by my constructor and the return value is set to a properties of this class. The methods are called in the order I posted them below.

private List<string> ReadIDsFromFile(string path)
 {
 // path is the full qualified path to a txt file. C:\text.txt
 List<string> TweetIDsList = new List<string>();
 string temp = string.Empty;
 using (StreamReader sr = new StreamReader(path))
 {
 while ((temp = sr.ReadLine()) != null)
 {
 if (ValidateUrl(temp))
 {
 TweetIDsList.Add(ExtractID(temp));
 }
 else
 {
 Logger.Log("Invalid URL: {0}", temp);
 }
 }
 }
 return TweetIDsList;
 }
private bool ValidateUrl(string url)
{
 Uri uriResult;
 bool result;
 return result = Uri.TryCreate(url, UriKind.Absolute, out uriResult) && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps);
}
private string ExtractID(string url)
{
 string id = string.Empty;
 char[] urlArray = url.ToCharArray();
 int result;
 for (int i = urlArray.Length - 1; i >= 0; i--)
 {
 if (Int32.TryParse(urlArray[i].ToString(), out result))
 {
 id += result;
 }
 else
 {
 break; // break loop. If tryparse fails it means we have reached a character which is not a number, probably a forward slash.
 }
 }
 return ReverseNumber(id);
}
private string ReverseNumber(string id)
{
 char[] tempArray = id.ToCharArray();
 string result = string.Empty;
 for (int i = tempArray.Length -1; i >= 0; i--)
 {
 result += tempArray[i];
 }
 return result;
}

My code is working without problems so far but I feel like that it is overly unnecessarily complicated. I am especially concerned about the reversed order of the ID and my attempt to reverse it. Is there a better way?

200_success
145k22 gold badges190 silver badges478 bronze badges
asked Apr 7, 2015 at 2:01
\$\endgroup\$

2 Answers 2

4
\$\begingroup\$

You can use a regular expression to find the ID. The pattern [0-9]+$ will match one or more occurrences of 0-9 at the end of the string. You can use it like this:

private static readonly Regex UrlId = new Regex("[0-9]+$");
private static string ExtractID(string url)
{
 var match = UrlId.Match(url);
 return match.Success
 ? match.Captures[0].Value
 : string.Empty;
}

Instead of using a StreamReader, consider using File.ReadLines

foreach (var line in File.ReadLines(path))
{
 if (ValidateUrl(line))
 {
 TweetIDsList.Add(ExtractID(line));
 }
 else
 {
 Logger.Log("Invalid URL: {0}", line);
 }
}

You can remove the variable result from ValidateUrl:

private static bool ValidateUrl(string url)
{
 Uri uriResult;
 return Uri.TryCreate(url, UriKind.Absolute, out uriResult) &&
 (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps);
}
answered Apr 7, 2015 at 2:25
\$\endgroup\$
3
  • \$\begingroup\$ Thank you very much. Can you elaborate why I should use File.ReadLines()? \$\endgroup\$ Commented Apr 7, 2015 at 13:47
  • \$\begingroup\$ Since you're already processing the file line by line, using File.ReadLines() allows you to use foreach() so that your code more clearly expresses what you're doing. Leave the null test to the library. \$\endgroup\$ Commented Apr 7, 2015 at 17:24
  • \$\begingroup\$ @Takeru As Snowbody said, File.ReadLines makes the code clearer. Note there is a difference between ReadLines and ReadAllLines. I forgot to mention, you should specify the encoding when reading the file. \$\endgroup\$ Commented Apr 7, 2015 at 21:04
2
\$\begingroup\$

In this case all those helper functions are probably redundant since you can leverage the Segments array of the Uri class to get the ID from the last segment.:

private List<string> ReadIDsFromFile(string path)
{
 // path is the full qualified path to a txt file. C:\text.txt
 List<string> TweetIDsList = new List<string>();
 string temp = string.Empty;
 using (StreamReader sr = new StreamReader(path))
 {
 while ((temp = sr.ReadLine()) != null)
 {
 Uri uriResult;
 if (Uri.TryCreate(temp, UriKind.Absolute, out uriResult) && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps))
 {
 //TweetIDsList.Add(new string(uriResult.Segments.Last().Reverse().ToArray()));
 //Since reverse isn't necessary the segment itself can be passed
 TweetIDsList.Add(uriResult.Segments.Last());
 }
 else
 {
 Logger.Log("Invalid URL: {0}", temp);
 }
 }
 }
 return TweetIDsList;
}
answered Apr 7, 2015 at 15:36
\$\endgroup\$
3
  • \$\begingroup\$ wow, thank you. I didn't know about that property of uri. The Reverse() is unneeded in this case because I only needed to reverse the string as I was reading from end to start. \$\endgroup\$ Commented Apr 7, 2015 at 15:47
  • \$\begingroup\$ @Takeru - In that case you can use the segment itself. BTW File.ReadAllLines will read the whole file into memory. This works faster, but for large files is impractical, and StreamReader is necessary. \$\endgroup\$ Commented Apr 7, 2015 at 15:52
  • \$\begingroup\$ You can also add a check that your last segment is a valid int. \$\endgroup\$ Commented Apr 8, 2015 at 9:10

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.