I need to specify the find string as a regex so that the html tag is found whatever its spacing, e.g. <html > or <html> or < html>. How do I write the find string as a regex?
String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
String find = "<html>";
String replace = "";
Pattern pattern = Pattern.compile(find);
Matcher matcher = pattern.matcher(source);
String output = matcher.replaceAll(replace);
System.out.println("Source = " + source);
System.out.println("Output = " + output);
4 Answers
Although you could work around your problem with <\\s*html\\s*>, you should not process HTML with regex. Obligatory link.
The \\s* matches zero or more whitespace characters.
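For what it's worth, here is a minimal sketch of that pattern applied to the question's input. The class and method names (HtmlTagStripDemo, stripHtmlTag) are made up for illustration, and the optional "/?" plus the case-insensitive flag are my own additions so that the closing </html > is removed too; the pattern as written above only matches the opening tag.

```java
import java.util.regex.Pattern;

public class HtmlTagStripDemo {
    // Opening or closing html tag with optional whitespace around the name.
    // The "/?" and CASE_INSENSITIVE are assumptions added for this sketch,
    // not part of the pattern given in the answer.
    private static final Pattern HTML_TAG =
            Pattern.compile("<\\s*/?\\s*html\\s*>", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: removes every match of HTML_TAG from the input.
    static String stripHtmlTag(String source) {
        return HTML_TAG.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String[] samples = {
            "<html >The quick brown fox jumps over the brown lazy dog.</html >",
            "<html>text</html>",
            "< html>text</ html>"
        };
        for (String sample : samples) {
            System.out.println(stripHtmlTag(sample));
        }
    }
}
```

This only copes with whitespace variations. It will still break on attributes, comments, or anything else real HTML can contain, which is why a parser is the safer choice.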
Do not attempt to parse HTML using regex! Try reading about XPath. Very helpful.
By default XPath will try to validate your document, but you can run the HTML through HtmlCleaner first to make it valid.
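As a rough sketch of the XPath route, using only the JDK's built-in javax.xml.xpath classes: this assumes the input is already well-formed XML, which the question's sample string happens to be. The names XPathTextDemo and textInsideHtml are hypothetical, and the HtmlCleaner step is deliberately left out because it needs a third-party library.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathTextDemo {
    // Hypothetical helper: returns the text content of the root <html> element.
    // Assumes the input is well-formed XML; anything else is rejected.
    static String textInsideHtml(String source) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // string(/html) yields the concatenated text of the root html element.
            return xpath.evaluate("string(/html)",
                    new InputSource(new StringReader(source)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
        System.out.println(textInsideHtml(source));
    }
}
```

Whitespace before the closing > is fine for an XML parser, but a variant like < html> is not well-formed and would be rejected here, so messy input would need cleaning first.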
To extract text inside your tags use something like
String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
System.out.println( source.replaceAll( "^<\\s*html\\s*>(.*)<\\s*\\/html\\s*>$", "$1" ) );
// output is:
// The quick brown fox jumps over the brown lazy dog.
But try to avoid parsing HTML with regexes. Read this topic.
This example may be helpful to you.
String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
String find = "\\<.*?>";
String replace = "";
Pattern pattern = Pattern.compile(find);
Matcher matcher = pattern.matcher(source);
String output = matcher.replaceAll(replace);
System.out.println("Source = " + source);
System.out.println("Output = " + output);
<html <!-- <html> --> >valid html?