I've created a JavaScript application to highlight the syntax of HTML and PHP. I know a lot of syntax highlighters are available nowadays; I only created this one to extend my knowledge of JS and regular expressions. I just want to know if this is the right way to do it. (The code below works fine.)
js/codeHighlighter.js
function codeHighlighter(){
    var obj=document.getElementsByTagName("code");
    for(var i=0;i<obj.length;i++){
        var data=obj[i].innerHTML;
        data=data.replace(/&lt;(.*?)&gt;/g,"<span class='html-tag'>&lt;$1&gt;</span>");
        data=data.replace(/"(.*?)"/g,"<span class='string-value'>&quot;$1&quot;</span>");
        data=data.replace(/&lt;\?(.*?)\s/g,"<span class='php-tag'>&lt;?$1</span>");
        data=data.replace(/\s\?&gt;/g,"<span class='php-tag'>?&gt;</span>");
        data=data.replace(/\/\* (.*?) \*\//g,"<span class='comment'>/* $1 */</span>");
        data=data.replace(/(new|echo|print|while|for|foreach|class|public|function|static|protected|private|return|require|require_once|include|include_once)[^=]/g,"<span class='reserved'> $1 </span>");
        data=data.replace(/\\n/g,"<br/>");
        data=data.replace(/\\t/g,"&nbsp;&nbsp;&nbsp;&nbsp;");
        obj[i].innerHTML=data;
    }
}
index.html
<html>
<head>
<title>Code Highlighter</title>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<script src="js/codeHighlighter.js" type="text/javascript"></script>
<script>
window.addEventListener("load", codeHighlighter);
</script>
<style>
code{
font-family: arial;
}
.html-tag{
color:#090;
}
.string-value{
color:#900;
}
.reserved{
color:#009;
}
.php-tag{
color:#f00;
}
.comment{
color:#444;
}
</style>
</head>
<body>
<div>This application highlights `php` and `html` code.</div>
<code>
/* A sample code. */\n
&lt;div class="code" &gt;\n
\t Hello!\n
&lt;/div&gt;\n
&lt;?php\n
class Anish(){\n
\n
\t public function __construct(){\n
\t\t return "Hello";\n
\t }\n
\n
}\n
$anish=new Anish();\n
echo $anish;\n
?&gt;\n
</code>
</body>
</html>
What is yacc or ANTLR? Can you give a URL reference for them? (Anish Silwal, Feb 6, 2016)
Why don't you just use PHP's built-in syntax highlighting function? (r3mainer, Feb 6, 2016)
Looks like there are some parser generators for JavaScript, PEG.js being one of them. I'm sure a bit of searching would produce grammars for both PHP and HTML. (RubberDuck, Feb 6, 2016)
Also, check out Prism if you want a client-side solution. (r3mainer, Feb 6, 2016)
1 Answer
Really accurate highlighting is a big challenge, and even if your implementation is not entirely wrong, it is very incomplete.
First, some obvious points that are easy to correct:
- your PHP reserved words list lacks a number of words, such as (not exhaustively) global, const, if, else, switch, case, default, do, exit, break, continue, try, catch, finally, ...
- you look for PHP multiline comments like /*...*/ but not for single-line ones like //..., nor for HTML comments <!--...-->.
- you look for double-quoted strings "..." but not for single-quoted ones '...'.
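Here is a minimal sketch of how the first two points could be handled. It assumes the patterns run on plain source text. The question's code actually works on innerHTML, where < appears as &lt;, so the HTML-comment pattern would need adjusting there. The keyword list is still incomplete, and the names PHP_KEYWORDS, findKeywords and findComments are hypothetical. Putting \b on both sides of the alternation also stops for from matching the start of foreach.

```javascript
// Illustrative (still incomplete) list of PHP reserved words.
var PHP_KEYWORDS = [
    "new", "echo", "print", "while", "for", "foreach", "class", "public",
    "function", "static", "protected", "private", "return", "require",
    "require_once", "include", "include_once", "global", "const", "if",
    "else", "switch", "case", "default", "do", "exit", "break", "continue",
    "try", "catch", "finally"
];

// \b on both sides: "for" no longer matches the start of "foreach".
var keywordRe = new RegExp("\\b(?:" + PHP_KEYWORDS.join("|") + ")\\b", "g");

// PHP block comments, PHP line comments and HTML comments, on plain
// (unescaped) text. Naive: a "//" inside a string such as a URL also matches.
var commentRe = /\/\*[\s\S]*?\*\/|\/\/[^\n]*|<!--[\s\S]*?-->/g;

function findKeywords(src) {
    return src.match(keywordRe) || [];
}

function findComments(src) {
    return src.match(commentRe) || [];
}

console.log(findKeywords("foreach ($a as $x) { if ($x) break; }"));
// → ["foreach", "if", "break"]
console.log(findComments("/* a */ $x = 1; // b\n<!-- c -->"));
// → ["/* a */", "// b", "<!-- c -->"]
```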
Now some harder issues:
- you currently don't handle escaped (single or double) quotes inside a quoted string: "quote \" inside quoted string" breaks the highlighting.
- you don't look for numbers (integer or float).
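Here is one way to handle both points, sketched with hypothetical names. A string body is a sequence of either a character that is neither the quote nor a backslash, or a backslash followed by any character. That way \" never closes the string, and the same shape covers the single-quoted strings mentioned earlier. The number pattern is deliberately simple: integers and decimals like 3.14, with no exponents or hex. Applied on its own, it would also match digits inside strings or comments.

```javascript
// Double- or single-quoted string; a backslash escape (\" or \') is consumed
// as a pair, so it cannot end the string early.
var stringRe = /"(?:[^"\\]|\\[\s\S])*"|'(?:[^'\\]|\\[\s\S])*'/g;

// Integers and simple decimals; \b keeps the 1 in $x1 from matching.
var numberRe = /\b\d+(?:\.\d+)?\b/g;

// String.raw keeps the backslashes exactly as they would appear in PHP source.
var sample = String.raw`echo "quote \" inside quoted string", 'it\'s', 42, 3.14;`;

console.log(sample.match(stringRe)); // the two quoted strings, escapes included
console.log(sample.match(numberRe)); // → ["42", "3.14"]
```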
Lastly, not a gap but something that could be improved: you don't distinguish between HTML tags and their attributes.
Please note that this is not meant to criticize your work! On the contrary, since you said it is:
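Here is a sketch of one way to make that distinction. It assumes plain (unescaped) markup as input; against the question's innerHTML, < and > would be &lt; and &gt;. The function parseTags and the shape of the objects it returns are hypothetical. Each part (tag name, attribute name, attribute value) could then get its own CSS class. It does not handle a > inside a quoted attribute value.

```javascript
// Opening or closing tag: optional "/", the tag name, then everything up to ">".
var tagRe = /<(\/?)([a-zA-Z][\w-]*)([^>]*)>/g;
// One attribute: a name, optionally followed by = and a quoted or bare value.
var attrRe = /([\w-]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+))?/g;

function parseTags(src) {
    var tags = [];
    var m;
    tagRe.lastIndex = 0;
    while ((m = tagRe.exec(src)) !== null) {
        var attributes = [];
        var a;
        attrRe.lastIndex = 0;
        while ((a = attrRe.exec(m[3])) !== null) {
            // value is null for a bare attribute such as "disabled"
            attributes.push({ name: a[1], value: a[2] === undefined ? null : a[2] });
        }
        tags.push({ closing: m[1] === "/", name: m[2], attributes: attributes });
    }
    return tags;
}

console.log(JSON.stringify(parseTags('<div class="code" id=main>Hello</div>')));
```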
to extend my knowledge in JS and regular expressions
I hope it encourages you to rise to the challenge :)
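If you do take up the challenge, one direction worth knowing is a single-pass tokenizer. This is a different technique from the chained replace() calls, and not something the points above require. One combined regular expression runs left to right, and each piece of text is classified exactly once. A keyword inside a string or comment then never gets its own span, and $new is taken as a variable before new can match. This sketch assumes plain source text as input (for example an element's textContent) and escapes the output itself. The token list is partial, the function names are hypothetical, and the variable and number classes would be new additions to the question's stylesheet.

```javascript
// Class name for each capture group, in the same order as the groups below.
var TOKEN_CLASSES = ["comment", "string-value", "variable", "reserved", "number"];
var TOKEN_RE = new RegExp([
    /(\/\*[\s\S]*?\*\/|\/\/[^\n]*)/.source,                    // comments
    /("(?:[^"\\]|\\[\s\S])*"|'(?:[^'\\]|\\[\s\S])*')/.source,   // strings
    /(\$\w+)/.source,                                           // variables
    /(\b(?:new|echo|return|class|function|public)\b)/.source,   // some keywords
    /(\b\d+(?:\.\d+)?\b)/.source                                // numbers
].join("|"), "g");

function escapeHtml(text) {
    return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function highlight(src) {
    var out = "";
    var last = 0;
    var m;
    TOKEN_RE.lastIndex = 0;
    while ((m = TOKEN_RE.exec(src)) !== null) {
        // exactly one group participates in each match; find which one
        var group = m.slice(1).findIndex(function (g) { return g !== undefined; });
        out += escapeHtml(src.slice(last, m.index));
        out += "<span class='" + TOKEN_CLASSES[group] + "'>" + escapeHtml(m[0]) + "</span>";
        last = TOKEN_RE.lastIndex;
    }
    return out + escapeHtml(src.slice(last));
}

console.log(highlight('$new = new Anish(); echo "return"; // new'));
```

The order of the alternatives matters. At a given position the first group that matches wins, so comments and strings must come before keywords.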
EDIT: two points I forgot to mention above.
The first one follows from a general piece of advice: try to follow best practices. Notably, instead of:
for(var i=0;i<obj.length;i++){
you should write:
for (var i = 0; i < obj.length; i++) {
and so on everywhere.
Here is why this matters: when looking for reserved words, you wrote this:
data = data.replace(/(new|echo|...|include|include_once)[^=]/g, ...
There you added [^=] to avoid matching something like $new=...
That's right, but following the above advice, one may well have written $new = ... instead. Then you'll select new as a reserved word, although it's not one!
So you'd better look for a preceding $ rather than a following =:
data = data.replace(/[^\$](new|echo|...|include|include_once)/g, ...
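Note that [^\$] consumes the character before the keyword, so the replacement has to put it back. Below is a minimal sketch of that, with a shortened keyword list and hypothetical function names. It excludes word characters as well as $, which also skips words like renew, and it uses (^|...) to cover a keyword at the very start of the text. Where ES2018 lookbehind is available, (?<![$\w]) expresses the same condition without consuming anything; the two give the same result here.

```javascript
// Shortened, illustrative keyword list; include_once is tried after include
// fails the trailing \b.
var KEYWORDS = "new|echo|return|include|include_once";

// Capture the preceding character (or start of text) and put it back via $1.
var captureRe = new RegExp("(^|[^$\\w])(" + KEYWORDS + ")\\b", "g");
function markKeywordsCapture(src) {
    return src.replace(captureRe, "$1<span class='reserved'>$2</span>");
}

// Same condition as a lookbehind (ES2018+): nothing extra is consumed, and
// $& re-inserts the whole match.
var lookbehindRe = new RegExp("(?<![$\\w])(?:" + KEYWORDS + ")\\b", "g");
function markKeywordsLookbehind(src) {
    return src.replace(lookbehindRe, "<span class='reserved'>$&</span>");
}

var line = "$new = new Anish(); echo $new;";
console.log(markKeywordsCapture(line));
console.log(markKeywordsLookbehind(line));
// both: $new = <span class='reserved'>new</span> Anish(); <span class='reserved'>echo</span> $new;
```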
The other point is only for convenience: currently you force each tab to be replaced by an arbitrary 4 spaces, which may sometimes be unwanted. So you might simply write something like this:
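function codeHighlighter(tab) {
    // Default to 4 when no usable width is given, including the Event object
    // that addEventListener passes as the first argument.
    tab = Number.isInteger(tab) && tab >= 0 ? tab : 4;
    var obj = document.getElementsByTagName("code");
    for (var i = 0; i < obj.length; i++) {
        ...
        // "&nbsp;" because ordinary spaces collapse when the HTML is rendered
        data = data.replace(/\\t/g, "&nbsp;".repeat(tab));
        ...
    }
}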
Thanks very much! I'll try to overcome those problems. (Anish Silwal, Feb 8, 2016)
@AnishSilwal Glad to help. BTW, I remembered I'd forgotten something important: look at my edit. (cFreed, Feb 8, 2016)