I would like to display an XML tree as a graph using the GraphViz dot tool, so I need to convert it to a DOT file.
This is what I have tried:
string dot = "digraph G {" + Environment.NewLine;
XML2DOT(XmlRoot, "r");
dot += Environment.NewLine + "}";
....
private void XML2DOT(XmlNode n, string id)
{
    if (n == null)
        return;
    string NodeName = n.Name.Replace("-", "_");
    if (n.HasChildNodes)
    {
        dot += Environment.NewLine + NodeName
            + id + "[label=\"" + NodeName + "\"];";
    }
    else
    {
        dot += Environment.NewLine
            + "leaf" + id + "[label=\"" + n.InnerText + "\"];";
    }
    int i = 0;
    foreach (XmlNode item in n.ChildNodes)
    {
        string cid = id + i;
        dot += Environment.NewLine + NodeName + id + " -> "
            + (item.HasChildNodes ? item.Name.Replace("-", "_") : "leaf")
            + cid + ";";
        XML2DOT(item, cid);
        i++;
    }
}
The input xml is:
<?xml version="1.0" encoding="UTF-8"?>
<S>
  <MN clitic="empty" ne_sort="pers">
    <PUNC clitic="empty">
      <w clitic="empty" gc="Ox" lc="Ox" lemma="#">#</w>
    </PUNC>
    <MN clitic="empty" ne_sort="pers">
      <N clitic="ezafe">
        <w clitic="ezafe" gc="Apsz ; Nasp--- ; Nasp--z ; Ncsp--z" lc="Nasp--z" lemma="مسعود" n_type="prop" ne_sort="pers">مسعود</w>
      </N>
      <N clitic="ezafe">
        <w clitic="ezafe" gc="Apsy ; Nasp--- ; Nasp--z ; Ncsp--y" lc="Nasp--z" lemma="شجاعی" n_type="prop" ne_sort="pers">شجاعی</w>
      </N>
      <N clitic="empty">
        <w clitic="empty" gc="Nasp--- ; Nasp--z" lc="Nasp---" lemma="طباطبایی" n_type="prop" ne_sort="pers">طباطبایی</w>
      </N>
    </MN>
    <PUNC clitic="empty">
      <w clitic="empty" gc="Ox" lc="Ox" lemma="#">#</w>
    </PUNC>
  </MN>
</S>
The resulting graph renders as expected; I would just like to know whether there is a more general and efficient way to do this.
2 Answers
There's an XML syntax for DOT called DotML at http://martin-loetzsch.de/DOTML/. Generating a generic DotML tree from XML is very straightforward using XSLT. Most of it is just:
<xsl:template match="*">
  <node id="{generate-id()}" label="{name()}"/>
  <edge from="{generate-id(..)}" to="{generate-id()}"/>
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="text()">
  <node id="{generate-id()}" label="{.}"/>
  <edge from="{generate-id(..)}" to="{generate-id()}"/>
</xsl:template>
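If you want to drive a transform like this from C#, here is a minimal sketch. Note that it is a variation, not the DotML approach itself: the stylesheet below is my own and writes DOT text directly (output method "text") instead of DotML elements. The class name XsltDotSketch and the method ToDot are made up for illustration, whitespace-only text nodes are skipped, the root element gets no incoming edge, and labels are not escaped, so text containing a double quote would produce invalid DOT.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class XsltDotSketch
{
    // Hypothetical stylesheet (an assumption, not taken from DotML): emits DOT text directly.
    private const string Stylesheet = @"<?xml version=""1.0""?>
<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""text""/>
  <xsl:template match=""/"">
    <xsl:text>digraph G {&#10;</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>}&#10;</xsl:text>
  </xsl:template>
  <xsl:template match=""*"">
    <xsl:value-of select=""concat(generate-id(), ' [label=&quot;', name(), '&quot;];&#10;')""/>
    <xsl:if test=""parent::*"">
      <xsl:value-of select=""concat(generate-id(..), ' -> ', generate-id(), ';&#10;')""/>
    </xsl:if>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match=""text()[normalize-space() = '']""/>
  <xsl:template match=""text()"">
    <xsl:value-of select=""concat(generate-id(), ' [label=&quot;', ., '&quot;];&#10;')""/>
    <xsl:value-of select=""concat(generate-id(..), ' -> ', generate-id(), ';&#10;')""/>
  </xsl:template>
</xsl:stylesheet>";

    // Runs the stylesheet over the given XML text and returns the generated DOT source.
    public static string ToDot(string xml)
    {
        var transform = new XslCompiledTransform();
        using (var stylesheetReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(stylesheetReader);
        }

        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

The node IDs come from generate-id(), which XSLT 1.0 requires to be ASCII alphanumeric and to start with a letter, so they can be used as DOT identifiers without quoting.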
You should not concatenate strings like this; it's inefficient, because every += copies the whole string built so far. Instead, use a StringBuilder.
I'm unclear on how the method is supposed to be used. In the calling code, dot looks like a local variable, while in the method itself it looks like a field. If you want to stick with string, return it from the method. If you're going to switch to StringBuilder, accept it as a parameter.
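To illustrate the StringBuilder-as-parameter shape, here is a minimal sketch. The names StringBuilderSketch, AppendElements and ListElements are made up for the example, and the output is just one element name per line rather than real DOT; the point is only that the recursive method appends to a builder it receives, and a single public method owns the builder and returns the final string.

```csharp
using System;
using System.Text;
using System.Xml;

static class StringBuilderSketch
{
    // Hypothetical helper: appends one line per element, depth-first,
    // to the builder supplied by the caller.
    private static void AppendElements(XmlNode node, StringBuilder builder)
    {
        if (node.NodeType != XmlNodeType.Element)
            return;

        builder.Append(node.Name).AppendLine(";");
        foreach (XmlNode child in node.ChildNodes)
            AppendElements(child, builder);
    }

    // The only place where the builder is created and turned into a string.
    public static string ListElements(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml);

        var builder = new StringBuilder();
        AppendElements(document.DocumentElement, builder);
        return builder.ToString();
    }
}
```

With this shape there is no shared field, so it is obvious from the signatures where the text is accumulated.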
Also, the whole process, including emitting the digraph wrapper, should be encapsulated into a method.
The scheme you're using for producing node IDs does not actually produce unique IDs. For example, r + 1 + 0 produces the same ID as r + 10.
A simple solution would be to have a single counter i for the whole conversion.
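The collision is easy to reproduce in isolation. In the sketch below, ChildId mirrors the question's id + i concatenation, and IdCounter is a hypothetical stand-in for the single shared counter; both names are mine. In the question's code the node name is prepended to the ID, so the clash only bites when the two nodes also share a name (or are both leaves).

```csharp
using System;

static class IdCollisionSketch
{
    // Mirrors the question's scheme: child ID = parent ID followed by the child index.
    public static string ChildId(string parentId, int index) => parentId + index;

    // First child of the root's second child: "r" + 1 + 0.
    public static string Grandchild() => ChildId(ChildId("r", 1), 0);

    // Eleventh child of the root: "r" + 10.
    public static string EleventhChild() => ChildId("r", 10);
}

// Hypothetical alternative: one counter shared by the whole conversion,
// so every call hands out a value that has not been used before.
sealed class IdCounter
{
    private int next;

    public int Next() => next++;
}
```

Here Grandchild() and EleventhChild() both return "r10", while two successive calls to Next() on the same IdCounter can never return the same value.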
You have some small pieces of code that repeat. Especially if you can use C# 6.0 expression-bodied methods, I would extract them into separate methods.
Your code does not handle XML elements with no content (e.g. <elem />) well: it presents them as empty text nodes.
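The reason is that HasChildNodes is false both for a text node and for an element without content, so the two cannot be told apart that way. A minimal sketch of checking NodeType instead (the class EmptyElementSketch and its methods are made up for the example):

```csharp
using System;
using System.Xml;

static class EmptyElementSketch
{
    // Classifies a node by its type rather than by whether it has children.
    public static string Classify(XmlNode node)
    {
        switch (node.NodeType)
        {
            case XmlNodeType.Element:
                return node.HasChildNodes ? "element" : "empty element";
            case XmlNodeType.Text:
                return "text";
            default:
                return "other";
        }
    }

    // Parses the XML and classifies the first child of the root element.
    public static string ClassifyFirstChild(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml);
        return Classify(document.DocumentElement.FirstChild);
    }
}
```

For <root><elem /></root> the first child is classified as an empty element, whereas the HasChildNodes test alone would treat it like a text leaf with an empty label.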
What does cid mean? Child ID? Don't abbreviate names unnecessarily.
The common convention in C# is to use camelCase for local variables. You mostly follow that, except for NodeName, which should be nodeName.
Also, abbreviations of three or more letters are usually not fully capitalized (e.g. it's XmlNode, not XMLNode). You should follow that convention too: Xml2Dot instead of XML2DOT.
And XML2DOT is not a great method name either. Method names should usually be verbs.
The final result could look like this:
using System.Text;
using System.Xml;

class XmlToDotConverter
{
    private const string Leaf = "leaf";

    private readonly StringBuilder stringBuilder = new StringBuilder();

    // Single counter shared by the whole conversion, so every node ID is unique.
    private int i;

    public string Convert(XmlDocument document)
    {
        // Reset state so the same instance can be reused for several documents.
        stringBuilder.Clear();
        i = 0;

        stringBuilder.AppendLine("digraph G {");
        ConvertNode(document.DocumentElement, i);
        stringBuilder.Append('}');

        return stringBuilder.ToString();
    }

    // Non-element nodes have names starting with '#' (e.g. "#text"); they become leaves.
    private static string ConvertName(XmlNode node)
        => node.Name.StartsWith("#") ? Leaf : node.Name.Replace("-", "_");

    private void AppendNode(string name, int id, string text)
    {
        stringBuilder.AppendLine($"{name}{id}[label=\"{text}\"];");

        // or the following, if you care about performance more than about readability
        //stringBuilder
        //    .Append(name)
        //    .Append(id)
        //    .Append("[label=\"")
        //    .Append(text)
        //    .Append("\"];")
        //    .AppendLine();
    }

    private void AppendEdge(string fromName, int fromId, string toName, int toId)
        => stringBuilder.AppendLine($"{fromName}{fromId} -> {toName}{toId};");

    private void ConvertNode(XmlNode node, int id)
    {
        if (node.NodeType == XmlNodeType.Element)
        {
            string nodeName = ConvertName(node);

            AppendNode(nodeName, id, nodeName);

            foreach (XmlNode childNode in node.ChildNodes)
            {
                i++;
                int childId = i;

                AppendEdge(nodeName, id, ConvertName(childNode), childId);
                ConvertNode(childNode, childId);
            }
        }
        else if (node.NodeType == XmlNodeType.Text)
        {
            AppendNode(Leaf, id, node.InnerText);
        }
    }
}