0

I have an ArrayList containing a list of websites in this format:

  • google.com
  • facebook.com
  • youtube.com
  • yahoo.com
  • wikipedia.org
  • t.co

I have to read the HTML text from all of the links. Some links, such as t.co, cause a problem, while the others work fine.

Code:

     try
     {
         String line = "t.co";
         String[] Add_words = line.split("[//:.]");
         if (Add_words[0].contains("http")) {
             // already has a scheme; leave the link unchanged
         }
         else if (Add_words[0].contains("www"))
             line = "http://" + line;
         else if (!Add_words[0].contains("http")
                 && !Add_words[0].contains("www"))
             line = "http://www." + line;
         URL url = new URL(line);
         URLConnection urlConnection = url.openConnection();
         HttpURLConnection connection = null;
         if (urlConnection instanceof HttpURLConnection)
         {
             connection = (HttpURLConnection) urlConnection;
         }
         else
         {
             System.out.println("Please enter an HTTP URL.");
             return;
         }
         BufferedReader in = new BufferedReader(
                 new InputStreamReader(connection.getInputStream()));
         String urlString = "";
         String current;
         while ((current = in.readLine()) != null)
         {
             urlString += current + "\n";
         }
         System.out.println(urlString);
     } catch (IOException e)
     {
         e.printStackTrace();
     }
I'm getting an error with the last link, `t.co`.

Error:

    java.io.FileNotFoundException: http://www.t.co
     at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1834)
     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1439)
     at com.test.code.Main.main(Main.java:109)
    

What I need: I have a list of links in the format above, and my code should be able to access every link, whatever format it is in.

asked Aug 27, 2014 at 7:53

3 Answers

2

You are adding www. to t.co, but www.t.co is not correct and will result in a 404 Not Found.

Just do not add the www. to the URL and it should work.
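A minimal sketch of that idea, assuming the only normalization needed is a missing scheme. The class name `UrlNormalizer` and the method `normalize` are hypothetical and not part of the original code; the host is left exactly as it appears in the list.

```java
import java.util.Arrays;
import java.util.List;

class UrlNormalizer {
    // Hypothetical helper: prepend "http://" only when the link has no scheme.
    // The host itself is never modified, so "t.co" stays "t.co".
    static String normalize(String raw) {
        String trimmed = raw.trim();
        String lower = trimmed.toLowerCase();
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return trimmed;
        }
        return "http://" + trimmed;
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList(
                "google.com", "www.yahoo.com", "https://wikipedia.org", "t.co");
        for (String link : links) {
            System.out.println(link + " -> " + normalize(link));
        }
    }
}
```

With this, `t.co` becomes `http://t.co` and `www.yahoo.com` becomes `http://www.yahoo.com`, while links that already carry a scheme pass through untouched.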

answered Aug 27, 2014 at 8:05

3 Comments

Sorry I did not see this when I posted mine.
@Tim, I have a list with lots of links, so how can I do this for every one?
Not sure what the question is now, but if you just replace `line = "http://www." + line;` with `line = "http://" + line;` it should work. Why do you have to prepend the www.?
0

You get a FileNotFoundException because the request to http://www.t.co returns:

HTTP/1.1 404 Not Found
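One way to avoid the exception is to look at the status code before asking for the body. The following is only a sketch: the class `PageFetcher`, its methods, and the 5-second timeouts are assumptions chosen for illustration, while `getResponseCode()` and `getInputStream()` are the standard `HttpURLConnection` calls.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class PageFetcher {
    // Reads a stream fully into a String, one line at a time.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Returns the page body, or null when the server does not answer with 2xx.
    // Assumes an http:// or https:// address; other schemes would fail the cast.
    static String fetch(String address) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(address).openConnection();
        connection.setConnectTimeout(5000); // assumed timeout, in milliseconds
        connection.setReadTimeout(5000);
        try {
            int status = connection.getResponseCode();
            if (status < 200 || status >= 300) {
                System.err.println(address + " returned HTTP " + status);
                return null;
            }
            return readAll(connection.getInputStream());
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String body = fetch("http://t.co");
        System.out.println(body == null ? "no content" : body);
    }
}
```

Note that `HttpURLConnection` does not follow a redirect from http to https on its own, so in this sketch such a redirect is also reported as a non-2xx status rather than fetched.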
answered Aug 27, 2014 at 7:58

9 Comments

I have a list with lots of links of this type, so how can I do this for every one?
If url.openConnection() throws an Exception, that means the URL is unreachable or invalid. Move on to the next one. Or don't create invalid URLs in the first place.
For testing I can handle it, but how would a user know if they have a list like this?
Not sure what you're asking, please clarify.
The overall idea is: you have a list of links in a CSV file and you upload it. My code accesses all those links and keeps them in an ArrayList. A link can be in any format: without http, without www, or prefixed with https. So I have to validate all the links. If a link works, read its HTML text; if not, move on to the next link.
0

You are adding www. to your t.co link, which is causing the problem. Do not add that prefix; try http://t.co instead, and it should work if your link is valid.

EDIT

Change:

     else if (Add_words[0].contains("www"))
         line = "http://" + line;
     else if (!Add_words[0].contains("http")
             && !Add_words[0].contains("www"))
         line = "http://www." + line;

to

     else if (Add_words[0].contains("www") ||
             (line.contains("t.co") && !Add_words[0].contains("www")))
         line = "http://" + line;
     else if (!Add_words[0].contains("http")
             && !Add_words[0].contains("www")
             && !line.contains("t.co"))
         line = "http://www." + line;

This is not the best way, but it will do. The only case left is line=www.t.co, in which case you will need to remove the www. prefix before those if statements. As @Tim said, prepending www. is unnecessary anyway, so the simplest solution is to fix the second else if as he suggested.
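For the broader goal raised in the comments (go through every link and skip the ones that fail), a rough sketch could look like the following. `LinkBatch`, `Fetcher`, and `reachable` are hypothetical names, and the fetcher used in `main` is a stand-in that fails for one host so the example runs offline; a real one would wrap `HttpURLConnection`.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LinkBatch {
    // Hypothetical abstraction over "download this URL".
    interface Fetcher {
        String fetch(String url) throws IOException;
    }

    // Returns the URLs whose fetch succeeded; failures are logged and skipped.
    static List<String> reachable(List<String> links, Fetcher fetcher) {
        List<String> ok = new ArrayList<>();
        for (String link : links) {
            // Add a scheme only when one is missing; never touch the host.
            String url = link.matches("(?i)^https?://.*") ? link : "http://" + link;
            try {
                fetcher.fetch(url);
                ok.add(url);
            } catch (IOException e) {
                System.err.println("Skipping " + url + ": " + e);
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        // Stand-in fetcher: pretends www.t.co answers 404, everything else works.
        Fetcher fake = url -> {
            if (url.contains("www.t.co")) {
                throw new FileNotFoundException(url);
            }
            return "<html></html>";
        };
        System.out.println(reachable(
                Arrays.asList("google.com", "www.t.co", "https://wikipedia.org"), fake));
    }
}
```

Run as written, this prints `[http://google.com, https://wikipedia.org]` on standard output and logs the skipped link on standard error. Catching the exception per link, rather than around the whole loop, is what lets one bad entry be skipped without stopping the rest.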

answered Aug 27, 2014 at 8:07
