\$\begingroup\$

I've created a JavaScript application to highlight the syntax of HTML and PHP. I know a lot of syntax highlighters are available nowadays; I just created this one to extend my knowledge of JS and regular expressions. I only want to know if this is the right way to do it. (The code below works fine.)

js/codeHighlighter.js

function codeHighlighter(){
    var obj=document.getElementsByTagName("code");
    for(var i=0;i<obj.length;i++){
        var data=obj[i].innerHTML;
        data=data.replace(/&lt;(.*?)&gt;/g,"<span class='html-tag'>&lt;$1&gt;</span>");
        data=data.replace(/"(.*?)"/g,"<span class='string-value'>&quot;$1&quot;</span>");
        data=data.replace(/&lt;\?(.*?)\s/g,"<span class='php-tag'>&lt;?$1</span>");
        data=data.replace(/\s\?&gt;/g,"<span class='php-tag'>?&gt;</span>");
        data=data.replace(/\/\* (.*?) \*\//g,"<span class='comment'>/* $1 */</span>");
        data=data.replace(/(new|echo|print|while|for|foreach|class|public|function|static|protected|private|return|required|required_once|include|include_once)[^=]/g,"<span class='reserved'> $1 </span>");
        data=data.replace(/\\n/g,"<br/>");
        data=data.replace(/\\t/g,"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp");
        obj[i].innerHTML=data;
    }
}

index.html

<html>
 <head>
 <title>Code Highlighter</title>
 <meta charset="UTF-8">
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
 <script src="js/codeHighlighter.js" type="text/javascript"></script>
 <script>
 window.addEventListener("load", codeHighlighter);
 </script>
 <style>
 code{
 font-family: arial;
 }
 .html-tag{
 color:#090;
 }
 .string-value{
 color:#900;
 }
 .reserved{
 color:#009;
 }
 .php-tag{
 color:#f00;
 }
 .comment{
 color:#444;
 }
 </style> 
 </head>
 <body>
 <div>This application highlights `php` and `html` code.</div>
 <code>
 /* A sample code. */\n
 &lt;div class="code" &gt;\n
 \t Hello!\n
 &lt;/div&gt;\n
 &lt;?php\n
 class Anish(){\n
 \n
 \t public function __construct(){\n
 \t\t return "Hello";\n
 \t }\n
 \n
 }\n
 $anish=new Anish();\n
 echo $anish;\n
 ?&gt;\n
 </code>
 </body>
</html>
200_success
asked Feb 6, 2016 at 18:41
\$\endgroup\$
Comments:
  • What is yacc or ANTLR? Can you give a URL reference for this? Commented Feb 6, 2016 at 19:17
  • Why don't you just use PHP's built-in syntax highlighting function? Commented Feb 6, 2016 at 19:40
  • Looks like there are some parser generators for JavaScript, PEG.js being one of them. I'm sure a bit of searching would produce grammars for both PHP and HTML. Commented Feb 6, 2016 at 19:55
  • Also, check out Prism if you want a client-side solution. Commented Feb 6, 2016 at 21:51

1 Answer

\$\begingroup\$

Really accurate highlighting is a big challenge, and even if your implementation is not totally wrong, it is very incomplete.

First some obvious points, easy to correct:

  • your list of PHP reserved words is missing a number of entries, such as (non-exhaustively) global, const, if, else, switch, case, default, do, exit, break, continue, try, catch, finally, ...
  • you look for PHP block comments like /*...*/ but not for single-line ones like //..., nor for HTML comments <!--...-->.
  • you look for double-quoted strings "..." but not for single-quoted ones '...'.
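
To illustrate the comment and string points, here is a minimal sketch, not a drop-in replacement. It assumes the text has already been read from innerHTML (so < and > appear as &lt; and &gt;), and it handles comments and strings in a single pass with one combined pattern instead of chained replace calls, so that quotes inside a comment (or // inside a string) are not highlighted a second time. The function name highlightCommentsAndStrings is hypothetical; the class names are the ones from your stylesheet.

```javascript
// Hypothetical helper: wraps comments and quoted strings in one pass.
// Input is expected to be HTML-escaped source text, as returned by innerHTML.
function highlightCommentsAndStrings(escaped) {
  // Group 1: /* block */, // line, or <!-- HTML --> comments.
  // Group 2: double- or single-quoted strings (escaped quotes not handled here).
  var pattern = /(\/\*[\s\S]*?\*\/|\/\/[^\n]*|&lt;!--[\s\S]*?--&gt;)|("[^"]*"|'[^']*')/g;
  return escaped.replace(pattern, function (match, comment, string) {
    var cls = comment ? "comment" : "string-value";
    return '<span class="' + cls + '">' + match + "</span>";
  });
}

// Usage: the quotes inside the line comment are left alone.
console.log(highlightCommentsAndStrings("echo 'a'; // it's \"x\""));
```

Note that a line comment here ends at a real newline character, and that an apostrophe in ordinary HTML text would still be taken as the start of a string, so this remains an approximation.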

Now some harder issues:

  • you currently don't handle escaped (single or double) quotes inside a quoted string: "quote \" inside quoted string" breaks the highlighting.
  • you don't look for numbers (integer or float).
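
As a rough sketch of how these two points could be approached: the string pattern below accepts either a non-quote, non-backslash character or a backslash followed by any character, so an escaped quote no longer ends the string. Numbers are matched in the same pass so that digits inside strings are skipped. The function name and the "number" CSS class are assumptions of mine; they do not exist in your code.

```javascript
// Hypothetical helper: highlights strings (with escaped quotes) and numbers.
function highlightStringsAndNumbers(escaped) {
  // Group 1: quoted string where \" or \' (or any \x pair) stays inside the string.
  // Group 2: integer or simple decimal number that is not part of an identifier.
  var pattern = /("(?:[^"\\]|\\[\s\S])*"|'(?:[^'\\]|\\[\s\S])*')|(\b\d+(?:\.\d+)?\b)/g;
  return escaped.replace(pattern, function (match, string, number) {
    var cls = string ? "string-value" : "number"; // "number" is an assumed class name
    return '<span class="' + cls + '">' + match + "</span>";
  });
}

// Usage: the escaped quote stays inside the string; 3.14 is marked as a number.
console.log(highlightStringsAndNumbers('$s = "quote \\" inside"; $n = 3.14;'));
```

This ignores other PHP number forms (hexadecimal, exponents, and so on), which would need extra alternatives.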

Lastly, not an omission but something that could be improved: you don't distinguish between HTML tags and their attributes.
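
One possible way to separate tag names from attributes, sketched under the assumption that the input is the escaped text from innerHTML: match each tag once, then run a second, smaller pattern over the part after the tag name. The function name highlightTags and the "html-attr" class are hypothetical; "html-tag" and "string-value" are your existing classes.

```javascript
// Hypothetical helper: marks tag names, attribute names and attribute values.
function highlightTags(escaped) {
  // &lt; optional slash, tag name, then everything up to the next &gt;
  var tagPattern = /&lt;(\/?)([A-Za-z][\w-]*)([\s\S]*?)&gt;/g;
  // name="value" or name='value' pairs inside the tag
  var attrPattern = /([\w-]+)(\s*=\s*)("[^"]*"|'[^']*')/g;
  return escaped.replace(tagPattern, function (match, slash, name, rest) {
    var attrs = rest.replace(attrPattern, function (m, attrName, eq, value) {
      return '<span class="html-attr">' + attrName + "</span>" + eq +
             '<span class="string-value">' + value + "</span>";
    });
    return "&lt;" + slash + '<span class="html-tag">' + name + "</span>" + attrs + "&gt;";
  });
}

// Usage with a line from the sample page.
console.log(highlightTags('&lt;div class="code" &gt;'));
```

Because the tag name must start with a letter, &lt;?php is left untouched for the PHP rules. An attribute value containing &gt; would still confuse this sketch.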

Please note that this is not meant to criticize your work! On the contrary, since you said it is:

to extend my knowledge in JS and regular expressions

I hope it encourages you to rise to the challenge :)


EDIT. Two points I forgot to mention above.

The first one follows from a general piece of advice: try to follow common formatting conventions. For example, instead of:

for(var i=0;i<obj.length;i++){

you should write:

for (var i = 0; i < obj.length; i++) {

and so on everywhere...

This matters here because, when looking for reserved words, you wrote this:

data = data.replace(/(new|echo|...|include|include_once)[^=]/g, ...

There you added [^=] to avoid selecting something like $new=....
That works, but given the advice above, keep in mind that someone may have written $new = ... instead. Then you'll select new as a reserved word, although it isn't one!

So you'd actually do better to look for a preceding $ rather than a trailing =:

data = data.replace(/[^\$](new|echo|...|include|include_once)/g, ...
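
A slightly fuller sketch of the same idea: the pattern above consumes the character before the keyword and would also match a keyword inside a longer identifier (for inside format, for instance), so the version below captures the preceding character, puts it back, and requires a word boundary after the keyword. The function name and the shortened keyword list are mine, for illustration only. It also assumes it runs on text that contains no inserted span markup yet, because the word class in an already inserted span attribute would otherwise be matched too.

```javascript
// Hypothetical helper: highlights reserved words that are neither variables
// ($new) nor part of a longer identifier (renew, format).
function highlightReserved(escaped) {
  // Shortened, illustrative keyword list; extend as needed.
  var words = "new|echo|return|foreach|for|function|class|public";
  // Group 1: start of text or a character that is neither "$" nor a word character.
  // Group 2: the keyword itself, which must end at a word boundary.
  var pattern = new RegExp("(^|[^$\\w])(" + words + ")\\b", "g");
  return escaped.replace(pattern, function (match, before, word) {
    return before + '<span class="reserved">' + word + "</span>";
  });
}

// Usage: $new is left alone, the new operator is highlighted.
console.log(highlightReserved("$new = new Anish();"));
```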

The other point is only a matter of convenience: currently you force every tab to be replaced by a hard-coded number of spaces, which may sometimes be undesired. So you might simply write something like this:

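function codeHighlighter(tab) {
    tab = tab || 4; // number of non-breaking spaces per tab; defaults to 4
    var obj = document.getElementsByTagName("code");
    for (var i = 0; i < obj.length; i++) {
        ...
        data = data.replace(/\\t/g, "&nbsp;".repeat(tab));
        ...
    }
}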
answered Feb 7, 2016 at 20:24
\$\endgroup\$
Comments:
  • thanks very much! I'll try to overcome those problem.... Commented Feb 8, 2016 at 14:32
  • @AnishSilwal Glad to help. BTW, I remembered I'd forgotten something important: look at my edit. Commented Feb 8, 2016 at 19:59
