I would like to compress a Magento HTML page using some regex, and this is what I have written:

function html_compress($string){
    global $idarray;
    $idarray=array();
    //Replace PRE and TEXTAREA tags
    $search=array(
        '@(<)\s*?(pre\b[^>]*?)(>)([\s\S]*?)(<)\s*(/\s*?pre\s*?)(>)@', //Find PRE Tag
        '@(<)\s*?(textarea\b[^>]*?)(>)([\s\S]*?)(<)\s*?(/\s*?textarea\s*?)(>)@' //Find TEXTAREA
    );
    $string=preg_replace_callback($search,
        function($m){
            $id='<!['.uniqid().']!>';
            global $idarray;
            $idarray[]=array($id,$m[0]);
            return $id;
        },
        $string
    );
    //Remove blank useless space
    $search = array(
        '@( |\t|\f)+@', // Shorten multiple whitespace sequences
        '@(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+@', //Remove blank lines
        '@^(\s)+|( |\t|\0|\r\n)+$@' //Trim Lines
    );
    $replace = array(' ',"\\1",'');
    $string = preg_replace($search, $replace, $string);
    //Replace IE COMMENTS, SCRIPT, STYLE and CDATA tags
    $search=array(
        '@<!--\[if\s(?:[^<]+|<(?!!\[endif\]-->))*<!\[endif\]-->@', //Find IE Comments
        '@(<)\s*?(script\b[^>]*?)(>)([\s\S]*?)(<)\s*?(/\s*?script\s*?)(>)@', //Find SCRIPT Tag
        '@(<)\s*?(style\b[^>]*?)(>)([\s\S]*?)(<)\s*?(/\s*?style\s*?)(>)@', //Find STYLE Tag
        '@(//<!\[CDATA\[([\s\S]*?)//]]>)@', //Find commented CDATA
        '@(<!\[CDATA\[([\s\S]*?)]]>)@' //Find CDATA
    );
    $string=preg_replace_callback($search,
        function($m){
            $id='<!['.uniqid().']!>';
            global $idarray;
            $idarray[]=array($id,$m[0]);
            return $id;
        },
        $string
    );
    //Remove blank useless space
    $search = array(
        '@(class|id|value|alt|href|src|style|title)=(\'\s*?\'|"\s*?")@', //Remove empty attribute
        '@<!--([\s\S]*?)-->@', // Strip comments except IE
        '@[\r\n|\n|\r]@', // Strip break line
        '@[ |\t|\f]+@', // Shorten multiple whitespace sequences
        '@(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+@', //Remove blank lines
        '@^(\s)+|( |\t|\0|\r\n)+$@' //Trim Lines
    );
    $replace = array(' ','',' ',' ',"\\1",'');
    $string = preg_replace($search, $replace, $string);
    //Replace unique id with original tag
    $c=count($idarray);
    for($i=0;$i<$c;$i++){
        $string = str_replace($idarray[$i][0], "\n".$idarray[$i][1]."\n", $string);
    }
    return $string;
}

It works, but I have some concerns:

  • Does it make sense to spell out all the \s*? between the parts of each tag and capture them with a pattern like (<)\s*?(style\b[^>]*?)(>)?
  • Does this script eat resources and considerably delay the page load? Is there any possible optimization?
  • Is the part that removes the whitespace redundant?
  • Is all of this "cacheable"?
asked Dec 20, 2014 at 10:42

5 Comments

  • stackoverflow.com/a/1732454/736079 (commented Dec 20, 2014 at 10:45)
  • I've already read that (on other sites too), but since I don't think I'm attempting anything impossible here, I thought I could give it a try. (commented Dec 20, 2014 at 10:47)
  • Why not use an existing HTML minifier that is proven to work? With regex it will be nigh impossible to catch all the strange corner cases that can crop up. (commented Dec 20, 2014 at 11:12)
  • @jessehouwing Could you give me an example? What I have found either didn't satisfy me or can't be used on my server... (commented Dec 20, 2014 at 11:53)
  • You will likely lose far more response time to the minification process running on the server than you save. Use GZIP. If you can't determine the HTML beforehand (i.e., it's dynamically generated), then you're probably out of luck. Minifying HTML at runtime is going to add significant overhead to your application. (commented Sep 2, 2015 at 10:18)

1 Answer

See https://stackoverflow.com/a/6225706/736079 for comments on enabling content compression for your HTML pages. Compression alone is usually enough to reduce the payload by more than 50%.
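
As a rough way to see what compression buys on a given page, the sketch below gzips a repetitive HTML string with PHP's zlib functions and compares sizes. The sample markup and the `compression_ratio` helper are made up for illustration, and real pages will give different numbers. In production, compression is normally switched on in the web server, or in PHP via `ob_start('ob_gzhandler')`, rather than done by hand.

```php
<?php
// Hypothetical helper (not part of any library): returns the fraction of the
// original size that remains after gzip. Requires PHP's zlib extension.
function compression_ratio(string $html, int $level = 6): float
{
    if ($html === '') {
        return 1.0; // nothing to compress
    }
    $compressed = gzencode($html, $level);
    return strlen($compressed) / strlen($html);
}

// Made-up sample: markup is highly repetitive, which is why it compresses well.
$sample = str_repeat("<li class=\"item\">  <a href=\"/product\">Product</a>  </li>\n", 200);
$ratio = compression_ratio($sample);
printf("gzip keeps %.1f%% of the original size\n", $ratio * 100);
```

Running this against a saved copy of your own page, instead of the sample string, gives a quick estimate of whether minification on top of gzip is worth the extra work.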

You can also make use of output buffering and combine it with the Minify_HTML class from the Minify library:

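<?php
// Assumption: Minify's lib directory is on the include path
// (or an autoloader for Minify_HTML is registered).
require_once 'Minify/HTML.php';

function sanitize_output($content) {
    // The output-buffer callback must return the string to send to the client.
    return Minify_HTML::minify($content);
}
ob_start("sanitize_output");
?>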

See: https://github.com/mrclay/minify/blob/master/min/lib/Minify/HTML.php

This still uses regex at its core, which is not ideal, but it has been tested by a larger audience and looks quite solid (test it to make sure). If you are hosting on IIS, you might be able to use a .NET HttpModule or an ISAPI filter as well. This isn't limited to PHP: sometimes the web server itself has plugins that can help you, like Apache's mod_pagespeed.
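
Since minifying on every request costs CPU time, one option is to cache the minified markup and reuse it while the source is unchanged. The following is only a sketch under assumptions: `cached_minify`, the cache directory, and the stand-in minifier are hypothetical names, and a real Magento setup would more likely rely on its own block or full-page cache.

```php
<?php
// Hypothetical helper: minify $html once per distinct content and reuse the result.
// $minifier is any callable that takes and returns a string,
// e.g. 'Minify_HTML::minify' if that class is loaded.
function cached_minify(string $html, callable $minifier, string $cacheDir): string
{
    // The cache key is a hash of the unminified markup.
    $file = $cacheDir . '/' . sha1($html) . '.html';
    if (is_file($file)) {
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached; // cache hit: skip the regex work
        }
    }
    $minified = $minifier($html);
    // LOCK_EX avoids interleaved writes from concurrent requests.
    file_put_contents($file, $minified, LOCK_EX);
    return $minified;
}

// Usage with a stand-in minifier that only collapses whitespace runs.
$calls = 0;
$standIn = function (string $s) use (&$calls): string {
    $calls++;
    return preg_replace('/\s+/', ' ', $s);
};
$dir = sys_get_temp_dir() . '/minify_cache_demo';
if (!is_dir($dir)) {
    mkdir($dir, 0777, true);
}
$page = "<p>\n   hello   </p>";
$first = cached_minify($page, $standIn, $dir);  // minifies unless already cached
$second = cached_minify($page, $standIn, $dir); // served from the cache file
```

Keying the cache on a hash of the content only pays off when the same markup is produced repeatedly. For pages that differ on every request, the lookup always misses and the minification cost remains.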

answered Dec 20, 2014 at 12:06