A very long time ago, I started developing a code generator for HTML. Some time ago I rewrote it and published it under the LGPL v3.0 on SourceForge.net.
It is quite universal, because one of its classes ([ElementListSetting][1]) reads DTD files and extracts the elements from them.
Currently, this class builds only a single associative array of elements. That array allows checking only whether an element is closed or empty, but not validating parent/child relationships between elements. I would like to rewrite this class so that it can support such validation.
The full code of the class is long and can be read via the link above (placed on the name of the class that handles DTDs). Here is the main part that this discussion is about:
```php
private function Get_ElementList($File="")
{
    /*
     * reads file;
     * converts new lines;
     * splits file content
     */
    $File = file_get_contents($File);
    $File = str_replace("\r\n", "\n", $File);
    $File = str_replace("\r", "\n", $File);
    $File = explode("\n", $File);
    foreach($File AS $Line)
    {
        /*
         * if element definition is detected
         */
        if(preg_match(self::MARC_PATTERN_DEFINITION_ANYELEMENT, $Line))
        {
            /*
             * splits line text
             */
            $Line_Elements = preg_split('/ /', $Line);
            /*
             * if empty element definition is detected
             */
            if(in_array("EMPTY", $Line_Elements) || preg_match(self::MARC_PATTERN_DEFINITION_EMPTYELEMENT, $Line))
            {
                $Replace_What = array("\n", "\t", "(", ")");
                $Replace_With = array("", "", "", "");
                $Line_Elements = preg_split('/\|/', str_replace($Replace_What, $Replace_With, $Line_Elements[1]));
                /*
                 * sets found element into list of available elements
                 */
                foreach($Line_Elements AS $Element)
                {
                    if(!preg_match(self::MARC_PATTERN_DEFINITION_ENTITY, $Element))
                    {
                        self::$List_AvailableElements[strtolower(trim($Element))] = strtolower(trim($Element));
                    }
                }
            }
            /*
             * if closed element definition is detected
             */
            else
            {
                $Replace_What = array("\n", "\t", "(", ")");
                $Replace_With = array("", "", "", "");
                $Line_Elements = preg_split('/\|/', str_replace($Replace_What, $Replace_With, $Line_Elements[1]));
                /*
                 * sets found element into list of available elements
                 */
                foreach($Line_Elements AS $Element)
                {
                    if(!preg_match(self::MARC_PATTERN_DEFINITION_ENTITY, $Element))
                    {
                        self::$List_AvailableElements[strtolower(trim($Element))] = "/".strtolower(trim($Element));
                    }
                }
            }
        }
    }
}
```
The code currently works as originally intended. It reads parent elements and creates a simple associative array with items like `[p] => '/p'` or `[input] => 'input'`.
But it should also read child elements and add them to the associative array. This is the point where I need help or hints on how to read them, because in the original DTD files some of them are written as entities.
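Here is a minimal sketch of one way to read children that are written as entities. It assumes that the parameter entities have already been collected into an array of name => replacement text. The approach is to substitute each `%name;` reference in the content model and then treat every remaining name token as an allowed child. The helper names `expandEntityRefs` and `childNames` are hypothetical and not part of MarC.

```php
<?php
// Hypothetical helper (not part of MarC): replaces parameter-entity
// references such as "%inline;" with their values from $entities
// (entity name => replacement text). Unknown references are left as they are.
function expandEntityRefs(string $model, array $entities): string
{
    return preg_replace_callback(
        '/%([A-Za-z][A-Za-z0-9._-]*);/',
        function (array $m) use ($entities): string {
            return $entities[$m[1]] ?? $m[0];
        },
        $model
    );
}

// Hypothetical helper: returns the element names that a content model such as
// "(#PCDATA | param | form)*" allows as children, without duplicates.
function childNames(string $model): array
{
    // EMPTY and ANY are keywords, not child names (ANY must be handled by the caller).
    $trimmed = trim($model);
    if ($trimmed === 'EMPTY' || $trimmed === 'ANY') {
        return [];
    }
    // Drop #PCDATA and any entity references that could not be resolved.
    $model = str_replace('#PCDATA', ' ', $model);
    $model = preg_replace('/%[^;\s]*;/', ' ', $model);
    // Parentheses, "|", ",", "?", "*", "+" and spaces all act as separators.
    $names = preg_split('/[^A-Za-z0-9._:-]+/', $model, -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($names));
}
```

Take the `object` model `(#PCDATA | param | %block; | form | %inline; | %misc;)*` with the entity map `['inline' => 'a | em', 'misc' => 'script']`. The helpers yield `param`, `form`, `a`, `em` and `script`. `%block;` is not in the map, so it is dropped. This sketch performs only one level of substitution, so any entity whose value contains further references would have to be resolved before it is put into the map.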
For explanation, here is the interface with the constants used in the code above:

```php
interface I_MarC_Expressions_ElementListSetting
{
    const MARC_PATTERN_DEFINITION_ANYELEMENT = '/<!ELEMENT/';
    const MARC_PATTERN_DEFINITION_EMPTYELEMENT = '/<!ELEMENT (.*) EMPTY>/i';
    const MARC_PATTERN_DEFINITION_ENTITY = '/%(.*)\;/i';
}
```
The entries of `$File` correspond to the lines of the provided DTD file, so their content depends on how each definition is laid out there. Here is an example of what they can look like (borrowed from the DTD file `xhtml1-transitional.dtd`):
```
[123] => '<!ELEMENT head (%head.misc;,'
[124] => '((title, %head.misc;, (base, %head.misc;)?)'
[125] => '(base, %head.misc;, (title, %head.misc;))))>'
...
<!ATTLIST head
  %i18n;
  id ID #IMPLIED
  profile %URI; #IMPLIED
  >
<!ELEMENT meta EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT object (#PCDATA | param | %block; | form | %inline; | %misc;)*>
```
With the hint offered by Samurai8 (and a brief look into the XML_DTD package available on PEAR), I rewrote the function `Get_ElementList` so that it parses the given DTD file definition by definition instead of line by line.
The code of this function became much shorter and cleaner, even though part of the parsing was moved into separate functions.
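Parsing definition by definition can be sketched with a single regular expression that matches a complete `<!ELEMENT ...>` declaration. With it, the three-line `head` definition from the question is captured as one match. The pattern below is only an assumption in the spirit of `MARC_PATTERN_DEFINITION_ELEMENT_NAMECONTENT`, whose actual value is not shown. It reuses the group names `ElementName` and `ElementSetting` from the code below. It also assumes an XML-style DTD such as `xhtml1-transitional.dtd`, not an SGML HTML 4 DTD with minimization flags and inline comments. `findElementDeclarations` is a hypothetical helper.

```php
<?php
// Hypothetical pattern, not the real MARC_PATTERN_DEFINITION_ELEMENT_NAMECONTENT.
// It matches a whole <!ELEMENT ...> declaration, even across line breaks,
// because [^>]* also matches newline characters.
const ELEMENT_PATTERN = '/<!ELEMENT\s+(?<ElementName>[^\s>]+)\s+(?<ElementSetting>[^>]*)>/';

// Returns element name => content model, with internal whitespace collapsed.
function findElementDeclarations(string $dtd): array
{
    preg_match_all(ELEMENT_PATTERN, $dtd, $matches, PREG_SET_ORDER);
    $result = [];
    foreach ($matches as $m) {
        // Newlines and indentation inside the content model become single spaces.
        $result[$m['ElementName']] = trim(preg_replace('/\s+/', ' ', $m['ElementSetting']));
    }
    return $result;
}
```

With `PREG_SET_ORDER`, each match is one array holding both the named and the numbered groups. A loop over the matches can therefore handle one declaration per iteration, just like the `foreach ($Elements as $Element)` loop in the answer.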
```php
private function Get_ElementList($File="")
{
    /*
     * reads file;
     * replaces unwanted "spaces"
     */
    $File = file_get_contents($File);
    $File = preg_replace("/[[:cntrl:]]+/", "", $File);
    /*
     * searches for elements;
     * searches for entities
     */
    $Result = preg_match_all(self::MARC_PATTERN_DEFINITION_ELEMENT_NAMECONTENT, $File, $Elements, PREG_SET_ORDER);
    $Result = preg_match_all(self::MARC_PATTERN_DEFINITION_ENTITY_BLOCK, $File, $Entities, PREG_SET_ORDER);
    /*
     * converts entities to usable form
     */
    $this -> Convert_SimplifyEntities($Entities);
    $this -> Convert_PrepareEntities($Entities);
    /*
     * iterates list of elements and sets closing part of element and usable siblings
     */
    foreach ($Elements as $Element)
    {
        self::$List_AvailableElements[$Element['ElementName']]['ClosingPart'] = ($Element['ElementSetting'] == MarC::MARC_OPTION_EMPTY ? $Element['ElementName'] : '/'.$Element['ElementName']);
        self::$List_AvailableElements[$Element['ElementName']]['Siblings'] = $this -> Get_Siblings($Element['ElementSetting'], $Entities);
    }
}
```
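The following small sketch shows how the new structure could support the parent/child validation mentioned in the question. It assumes that the `Siblings` entry holds a plain list of the element names allowed inside the parent. The code of `Get_Siblings` is not shown, so this is a guess. The functions `isAllowedChild` and `isEmptyElement` are hypothetical.

```php
<?php
// Hypothetical checks against a list shaped like the one built above:
// element name => ['ClosingPart' => ..., 'Siblings' => [allowed child names]].
// Assumption: 'Siblings' is a flat list of element names.

// True if $child may appear directly inside $parent.
function isAllowedChild(array $elements, string $parent, string $child): bool
{
    return isset($elements[$parent])
        && in_array($child, $elements[$parent]['Siblings'], true);
}

// Empty elements store their own name as ClosingPart; closed ones store "/name".
function isEmptyElement(array $elements, string $name): bool
{
    return isset($elements[$name]) && $elements[$name]['ClosingPart'] === $name;
}
```

Keeping the closing part and the children under one key per element means a single lookup answers both questions. The old flat array could answer only the first one.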
In this state, the function allows relatively easy further development. I could have gone further, but I decided not to.
The recursion I was afraid of was avoided, but it took some time to figure out how to correctly replace entities that are used inside other entities.
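Entities used inside other entities can be replaced without recursion by a fixed-point loop. Each pass substitutes one level of references in every entity value, and the loop repeats until a pass changes nothing. This is a minimal sketch, not the author's `Convert_SimplifyEntities`/`Convert_PrepareEntities`. The function name `resolveNestedEntities` and the pass limit are assumptions.

```php
<?php
// Hypothetical helper: resolves parameter-entity references inside entity
// values (entity name => replacement text) with repeated single-level
// substitution, so no recursive calls are needed. The pass limit ensures
// that a circular definition cannot make the loop run forever.
function resolveNestedEntities(array $entities, int $maxPasses = 50): array
{
    for ($pass = 0; $pass < $maxPasses; $pass++) {
        $changed = false;
        foreach ($entities as $name => $value) {
            // Replace every known %name; reference by its current value.
            $new = preg_replace_callback(
                '/%([A-Za-z][A-Za-z0-9._-]*);/',
                function (array $m) use ($entities): string {
                    return $entities[$m[1]] ?? $m[0];
                },
                $value
            );
            if ($new !== $value) {
                $entities[$name] = $new;
                $changed = true;
            }
        }
        // A pass without any change means all resolvable references are gone.
        if (!$changed) {
            return $entities;
        }
    }
    throw new RuntimeException('Entity references did not settle; circular definition?');
}
```

Unknown references are kept unchanged, so they do not prevent the loop from finishing. A value that keeps growing, such as an entity that refers to itself, eventually hits the pass limit and raises an exception.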
I found XML_DTD a long time ago, when I was starting to program in PHP. Because I did not have much experience with OOP (and I still use only a few of its possibilities), I decided to write my own parser for DTD files, with an array as the output format for the list of elements.
Now that project (XML_DTD) is abandoned and outdated, because it is written for PHP 4.
So I at least looked into its code to see how it extracts entities, because my original regular expression was ignoring some entities. I don't know why; I checked everything that could have caused that problem.
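Entity declarations are easy to miss when they span several lines or use single-quoted values. The sketch below shows one pattern that tolerates both. It is not the actual value of `MARC_PATTERN_DEFINITION_ENTITY_BLOCK`, nor XML_DTD's expression, and `findParameterEntities` is a hypothetical helper.

```php
<?php
// Hypothetical pattern for parameter-entity declarations such as
//   <!ENTITY % special
//      "%special.pre; | object | img ">
// The /s modifier lets the lazy value match span line breaks, and the \2
// back-reference accepts values quoted with either " or '.
// External entities (<!ENTITY % name PUBLIC ...>) and general entities
// (declared without %) are deliberately not matched.
const ENTITY_PATTERN = '/<!ENTITY\s+%\s+(?<EntityName>[^\s"\']+)\s+(["\'])(?<EntityValue>.*?)\2\s*>/s';

// Returns entity name => value, with internal whitespace collapsed.
function findParameterEntities(string $dtd): array
{
    preg_match_all(ENTITY_PATTERN, $dtd, $matches, PREG_SET_ORDER);
    $entities = [];
    foreach ($matches as $m) {
        // Values may contain newlines and indentation; normalise them.
        $entities[$m['EntityName']] = trim(preg_replace('/\s+/', ' ', $m['EntityValue']));
    }
    return $entities;
}
```

The extracted map can then be resolved for nested references before it is used to expand content models.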
`$File` (even when analyzing the code); 2) what a child element looks like. So could you add an example of source `$File` lines for these two cases, and the values of the `MARC_PATTERN_DEFINITION_...` constants?