I wrote code for my needs (i.e. collecting YouTube links from a page) and it is simple, but now I just want to know what can I do just to make it a better and properly written one. For example, I think I should have try
/catch
blocks/exceptions in my methods, or get/set properties. What tests can I write to test the code? I don't know to be honest, but how would the code look if you had written it?
public class Parser
{
public static void Main()
{
Console.WriteLine("Enter the url of the page:");
var url = Console.ReadLine();
var numOfPages = GetNumberOfPages(url);
var numOfLinks = 1;
for (int page = 0; page <= numOfPages; page++)
{
if (page != 1)
{
if (page == 0)
{
GetYouTubeLink(url).ForEach(x => Console.WriteLine($"{numOfLinks++}." +
$"{x.Replace("embed/", "watch?v=").Replace("?feature=oembed", "")}"));
}
else
{
var newUrl = url + string.Format($"{page}/");
GetYouTubeLink(newUrl).ForEach(x => Console.WriteLine($"{numOfLinks++}." +
$"{x.Replace("embed/", "watch?v=").Replace("?feature=oembed", "")}"));
newUrl = string.Empty;
}
}
}
}
private static HtmlDocument GetWebPageHtmlFromUrl(string url)
{
var webGet = new HtmlWeb();
return webGet.Load(url);
}
private static int GetNumberOfPages(string url)
{
var pages = GetWebPageHtmlFromUrl(url)
.DocumentNode.SelectNodes("//div[@class='page-link']")
.Descendants("a")
.Last().InnerText;
return int.Parse(pages);
}
private static List<string> GetYouTubeLink(string url)
{
var links = new List<string>();
var iframeLinks = GetWebPageHtmlFromUrl(url).DocumentNode.SelectNodes("//iframe");
if (iframeLinks != null)
{
foreach (var link in iframeLinks)
{
links.Add(link.Attributes["src"].Value);
}
}
return links;
}
}
-
\$\begingroup\$ CRs are so much easier when I can just paste the code in VS. Thank you! \$\endgroup\$Dan Friedman– Dan Friedman2018年04月30日 22:14:08 +00:00Commented Apr 30, 2018 at 22:14
1 Answer 1
Comments inline:
public class Parser
{
public static void Main()
{
Console.WriteLine("Enter the url of the page:");
var url = Console.ReadLine();
var numOfPages = GetNumberOfPages(url);
var numOfLinks = 1;
for (int page = 0; page <= numOfPages; page++)
{
// if (page != 1) is redundant in the original code, did you mean numOfPages?
var newUrl = url + (page > 0 ? string.Format($"{page}/") : "");
GetYouTubeLinks(newUrl).ForEach(x => Console.WriteLine($"{numOfLinks++}." +
x.Replace("embed/", "watch?v=").Replace("?feature=oembed", ""))); // remove string interpolation since it's just one value
// newUrl = string.Empty; // Not needed
}
}
private static HtmlDocument GetWebPageHtmlFromUrl(string url)
{
var webGet = new HtmlWeb();
return webGet.Load(url);
}
private static int GetNumberOfPages(string url)
{
var pages = GetWebPageHtmlFromUrl(url)
.DocumentNode.SelectNodes("//div[@class='page-link']") // This parsing may work on specific pages, but it won't work on all pages. This is concerning since the user is prompted to enter any URL?
.Descendants("a")
.Last().InnerText;
return int.Parse(pages);
}
private static List<string> GetYouTubeLinks(string url) // Made plural
{
var iframeLinks = GetWebPageHtmlFromUrl(url).DocumentNode.SelectNodes("//iframe");
if (iframeLinks == null) // reduce nesting
{
return new List<string>(0);
}
return iframeLinks.Select(link => link.Attributes["src"].Value).ToList(); // simplify allocation
}
}
-
\$\begingroup\$ Actually I am checking if the page is not 1, because the first page actually is 0 and the second is 2. And the other thing is that I am getting the number of pages only from the first one, so I don't need to call this method on every page. \$\endgroup\$beBoss– beBoss2018年05月01日 23:12:23 +00:00Commented May 1, 2018 at 23:12
-
\$\begingroup\$ Why not just make the first page 1? I.e.
for (int page = 1; page <= numOfPages; page++)
I'm confused by your "getting the number of pages" once comment. I'm only callingGetNumberOfPages
once, same as you. My comment aboutnumOfPages
wouldn't have required any recalculation, I was just trying to figure out what you were trying to accomplish. \$\endgroup\$Dan Friedman– Dan Friedman2018年05月02日 15:17:30 +00:00Commented May 2, 2018 at 15:17 -
\$\begingroup\$ Well, will try to explain it. So when I call GetNumberOfPages(), on every page on the bottom there is pagination and the number of pages, So I just call this method on the first page and I have the number of the pages. First page in the URL is like this: website.com/ , the second is website.com/2/ so I don't have page 1 as you suggested me to use if in the for loop (don't have website.com/1/) that's why I check if the loop isn't 1. \$\endgroup\$beBoss– beBoss2018年05月02日 20:39:48 +00:00Commented May 2, 2018 at 20:39
-
\$\begingroup\$ So you could just start at 1 instead of 0 and do
var newUrl = url + (page > 1 ? string.Format($"{page}/") : "");
\$\endgroup\$Dan Friedman– Dan Friedman2018年05月02日 23:29:53 +00:00Commented May 2, 2018 at 23:29 -
\$\begingroup\$ Yep, it works and looks great to be honest, thank you for the help! Great tips and suggestions :) \$\endgroup\$beBoss– beBoss2018年05月03日 01:20:09 +00:00Commented May 3, 2018 at 1:20
Explore related questions
See similar questions with these tags.