I want to convert a web page into an HTML page programmatically.
I have searched many sites, but they only explain things like converting a page to PDF.
Currently I save the page manually as a .html file, and my program then extracts the necessary data from it.
Is there any way to convert the web page to an HTML page from my program? Can anyone help me?
Any help would be appreciated.
Edit: Let me explain in more detail.
I am extracting the names of the users who like a page I'm an admin of. I found a link, https://www.facebook.com/browse/?type=page_fans&page_id=pageid, where I can see the list of users. To get the data, I first have to save that page as a .html file and then extract the necessary data from it. What I need is for my program to do the saving step itself, i.e. convert that page into an HTML file. I hope my question is clear now.
- What do you mean by converting a webpage into an HTML page? Aren't they the same? (nicael, Apr 24, 2014)
- What do you want to convert to an HTML page? Do you mean you want to generate an HTML page with Java? (Stasel, Apr 24, 2014)
- What do you want from this conversion? (Oomph Fortuity, Apr 24, 2014)
- Perhaps your question is how to programmatically fetch a web page? (Tech Agent, Apr 24, 2014)
- Do you mean converting a web page to a standalone HTML page that can be used offline? Should it be a single HTML file, or can it be a collection of files (possibly packaged as a zip file)? Please clarify by editing the question (including its title). (Jukka K. Korpela, Apr 24, 2014)
2 Answers
Oracle provides the following code snippet for programmatically retrieving an HTML page in its Java Tutorials:
```java
import java.net.*;
import java.io.*;

public class URLReader {
    public static void main(String[] args) throws Exception {
        URL oracle = new URL("http://www.oracle.com/");
        BufferedReader in = new BufferedReader(
        new InputStreamReader(oracle.openStream()));

        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}
```
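On Java 11 and later, the `java.net.http.HttpClient` API can perform the same fetch and write the response body straight to a file. The following is a minimal sketch, not part of Oracle's example: the class name `PageDownloader`, the output file `page.html`, following redirects, and treating any non-2xx status as an error are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PageDownloader {

    // Builds a plain GET request for the given address
    static HttpRequest buildRequest(String address) {
        return HttpRequest.newBuilder(URI.create(address)).GET().build();
    }

    // Fetches the page and streams the response body directly into `target`
    static Path download(String address, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL) // e.g. follow http -> https redirects
                .build();
        HttpResponse<Path> response =
                client.send(buildRequest(address), HttpResponse.BodyHandlers.ofFile(target));
        // The body has already been written to `target`; here we only report a failed status
        if (response.statusCode() / 100 != 2) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        Path saved = download("http://www.oracle.com/", Paths.get("page.html"));
        System.out.println("Saved to " + saved.toAbsolutePath());
    }
}
```

Writing the body with `BodyHandlers.ofFile` keeps the bytes exactly as the server sent them, so the saved file can be opened later without re-fetching the page.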
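Instead of printing to the console, you can save the contents to a file by using a FileWriter wrapped in a BufferedWriter (example adapted from another question):
```java
// `in` is the BufferedReader opened on the URL in the snippet above
// try-with-resources flushes and closes the writer when the loop finishes
try (BufferedWriter fbw = new BufferedWriter(new FileWriter("fileName"))) {
    String line;
    while ((line = in.readLine()) != null) {
        fbw.write(line);
        fbw.newLine(); // readLine() strips line terminators, so add one back
    }
}
```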
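Because `readLine()` decodes the stream with the platform default charset and drops the original line endings, a simpler option may be to copy the raw bytes with `java.nio.file.Files.copy`, which keeps the page exactly as it was received. This is a minimal sketch assuming Java 7 or later; `SaveUrlToFile`, `saveStream`, and `page.html` are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class SaveUrlToFile {

    // Copies every byte from `in` into `target`, replacing any existing file.
    // Returns the number of bytes written.
    static long saveStream(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        URL page = URI.create("http://www.oracle.com/").toURL();
        // try-with-resources closes the network stream even if copying fails
        try (InputStream in = page.openStream()) {
            long bytes = saveStream(in, Paths.get("page.html"));
            System.out.println("Wrote " + bytes + " bytes");
        }
    }
}
```

Separating `saveStream` from the network call also makes it easy to check the saving logic against an in-memory stream.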
Web pages are already HTML. If you want to save a web page as an HTML file, you can do this in Firefox via File > Save Page As, or through the File menu in other browsers.
If you need to download multiple pages as HTML from the same website or from a list of URLs, there is software that makes this easier: HTTrack (http://www.httrack.com/).
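If you would rather handle the list-of-URLs case in Java instead of using HTTrack, a loop over the same byte-copy approach works for simple pages. This is only a sketch: unlike HTTrack, it saves just the HTML of each URL and does not follow links, fetch images or stylesheets, or rewrite links for offline browsing. The class `BatchDownloader`, the `pages` output directory, and the file-naming scheme in `fileNameFor` are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;

public class BatchDownloader {

    // Hypothetical naming scheme: host + path, with every character that is
    // not a letter, digit, '.', or '-' replaced by '_', plus ".html".
    static String fileNameFor(String address) {
        URI uri = URI.create(address);
        String path = uri.getPath() == null ? "" : uri.getPath();
        String raw = uri.getHost() + path;
        return raw.replaceAll("[^A-Za-z0-9.-]", "_") + ".html";
    }

    // Downloads each URL into `outputDir`; a failure on one URL does not stop the rest
    static void downloadAll(List<String> addresses, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        for (String address : addresses) {
            Path target = outputDir.resolve(fileNameFor(address));
            try (InputStream in = URI.create(address).toURL().openStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                System.out.println("Saved " + address + " -> " + target);
            } catch (IOException e) {
                System.err.println("Failed " + address + ": " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        downloadAll(List.of("http://www.oracle.com/", "http://www.httrack.com/"),
                Paths.get("pages"));
    }
}
```

For example, `fileNameFor("https://example.com/a/b")` yields `example.com_a_b.html`. The naming scheme ignores query strings, so URLs that differ only in their query would overwrite each other's files.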