\$\begingroup\$

I would like to render an XML tree as a graph using the GraphViz dot tool, so I need to convert the XML to a DOT file.

This is what I have tried:

string dot = "digraph G {" + Environment.NewLine;
XML2DOT(XmlRoot, "r");
dot += Environment.NewLine + "}";
....
private void XML2DOT(XmlNode n, string id)
{
    if (n == null)
        return;
    string NodeName = n.Name.Replace("-", "_");
    if (n.HasChildNodes)
    {
        dot += Environment.NewLine + NodeName
            + id + "[label=\"" + NodeName + "\"];";
    }
    else
    {
        dot += Environment.NewLine
            + "leaf" + id + "[label=\"" + n.InnerText + "\"];";
    }
    int i = 0;
    foreach (XmlNode item in n.ChildNodes)
    {
        string cid = id + i;
        dot += Environment.NewLine + NodeName + id + " -> "
            + (item.HasChildNodes ? item.Name.Replace("-", "_") : "leaf")
            + cid + ";";
        XML2DOT(item, cid);
        i++;
    }
}

The input xml is:

<?xml version="1.0" encoding="UTF-8"?>
<S>
  <MN clitic="empty" ne_sort="pers">
    <PUNC clitic="empty">
      <w clitic="empty" gc="Ox" lc="Ox" lemma="#">#</w>
    </PUNC>
    <MN clitic="empty" ne_sort="pers">
      <N clitic="ezafe">
        <w clitic="ezafe" gc="Apsz ; Nasp--- ; Nasp--z ; Ncsp--z" lc="Nasp--z" lemma="مسعود" n_type="prop" ne_sort="pers">مسعود</w>
      </N>
      <N clitic="ezafe">
        <w clitic="ezafe" gc="Apsy ; Nasp--- ; Nasp--z ; Ncsp--y" lc="Nasp--z" lemma="شجاعی" n_type="prop" ne_sort="pers">شجاعی</w>
      </N>
      <N clitic="empty">
        <w clitic="empty" gc="Nasp--- ; Nasp--z" lc="Nasp---" lemma="طباطبایی" n_type="prop" ne_sort="pers">طباطبایی</w>
      </N>
    </MN>
    <PUNC clitic="empty">
      <w clitic="empty" gc="Ox" lc="Ox" lemma="#">#</w>
    </PUNC>
  </MN>
</S>

And this is what I get:

[Image: the tree rendered by GraphViz]

It works as expected; I would just like to know whether there is a more general and efficient way to do this.

asked Jul 25, 2016 at 12:37
\$\endgroup\$

2 Answers

\$\begingroup\$

There's an XML syntax for DOT called DotML at http://martin-loetzsch.de/DOTML/. Generating a generic DotML tree from XML is very straightforward using XSLT. Most of it is just:

<xsl:template match="*">
 <node id="{generate-id()}" label="{name()}"/>
 <edge from="{generate-id(..)}" to="{generate-id()}"/>
 <xsl:apply-templates/>
</xsl:template>
<xsl:template match="text()">
 <node id="{generate-id()}" label="{.}"/>
 <edge from="{generate-id(..)}" to="{generate-id()}"/>
</xsl:template>

answered Jul 25, 2016 at 14:48
\$\endgroup\$
\$\begingroup\$

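You should not concatenate strings like this. Every += allocates a new string and copies the previous contents, so the cost grows quadratically with the size of the output. Use a StringBuilder instead.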


I'm unclear on how the method is supposed to be used. In the usage, it looks like dot is a local variable, while in the method itself, it looks like it's a field. If you want to stick with string, return it from the method. If you're going to switch to StringBuilder, accept it as a parameter.

Also, the whole process, including digraph, should be encapsulated into a method.
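The parameter-passing variant can be sketched as follows. This is only an illustration under stated assumptions: `DotWriter`, `AppendNode`, and `BuildGraph` are hypothetical names that do not appear in the original code, and the graph here contains a single node to keep the example short.

```csharp
using System.Text;

// Hypothetical helper for illustration; not part of the original code.
static class DotWriter
{
    // Accepts the builder as a parameter instead of relying on a field.
    public static void AppendNode(StringBuilder builder, string id, string label)
    {
        builder.Append(id)
               .Append("[label=\"")
               .Append(label)
               .AppendLine("\"];");
    }

    // Wraps the whole process, including the digraph header and footer,
    // so callers never see a half-built string.
    public static string BuildGraph(string id, string label)
    {
        var builder = new StringBuilder();
        builder.AppendLine("digraph G {");
        AppendNode(builder, id, label);
        builder.Append('}');
        return builder.ToString();
    }
}
```

Calling `DotWriter.BuildGraph("S0", "S")` returns a complete DOT document with one node, and no state survives between calls.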


The scheme you're using for producing unique node IDs does not actually produce IDs that are unique. For example r + 1 + 0 produces the same ID as r + 10.

A simple solution would be to have a single i for the whole conversion.
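The collision is easy to reproduce. A minimal sketch, assuming two hypothetical helpers that are not part of the original code: `PathIds.Build` mimics the `id + i` concatenation from the question, and `IdCounter` is one possible shape for the single shared counter.

```csharp
// Hypothetical helpers for illustration; not part of the original code.
static class PathIds
{
    // Mimics the question's scheme: start from the root ID and append
    // each child index along the path, as `id + i` does.
    public static string Build(string rootId, params int[] childIndexes)
    {
        string id = rootId;
        foreach (int index in childIndexes)
            id += index;
        return id;
    }
}

sealed class IdCounter
{
    private int next;

    // Hands out 0, 1, 2, ... so no value is returned twice
    // during the lifetime of one counter.
    public int Next() => next++;
}
```

With these, `PathIds.Build("r", 1, 0)` and `PathIds.Build("r", 10)` both yield `"r10"`, whereas consecutive calls to `IdCounter.Next()` never repeat a value.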


You have some small pieces of code that repeat. I would extract them into separate methods, especially if you can use C# 6.0 expression-bodied methods.


Your code does not handle XML elements with no content (e.g. <elem />) well: it presents them as empty text nodes.
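One way to tell the two cases apart is to check NodeType instead of HasChildNodes. The sketch below is only an illustration; `NodeKinds`, `Describe`, and `FirstChildOfRoot` are hypothetical names, not part of the original code.

```csharp
using System.Xml;

// Hypothetical helper for illustration; not part of the original code.
static class NodeKinds
{
    // Classifies a node by its NodeType rather than by HasChildNodes,
    // so an empty element is still reported as an element.
    public static string Describe(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Element)
            return node.HasChildNodes ? "element" : "empty element";
        if (node.NodeType == XmlNodeType.Text)
            return "text";
        return "other";
    }

    // Parses a small document and returns the first child of its root element.
    public static XmlNode FirstChildOfRoot(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml);
        return document.DocumentElement.FirstChild;
    }
}
```

For `<a><elem /></a>` the first child has no child nodes, so the question's code would treat it as a text leaf with an empty label, while `Describe` still reports it as an element.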


What does cid mean? Child ID? Don't abbreviate names unnecessarily.


The common convention in C# is to use camelCase for local variables. You mostly follow that, except for NodeName, which should be nodeName.

Also, abbreviations of three or more letters are usually not fully capitalized (e.g. it's XmlNode, not XMLNode). You should follow that convention too: Xml2Dot instead of XML2DOT.

And XML2DOT is not a great method name either. Method names should usually be verbs.


The final result could look like this:

using System.Text;
using System.Xml;

class XmlToDotConverter
{
    private const string Leaf = "leaf";
    private readonly StringBuilder stringBuilder = new StringBuilder();

    // Single counter shared by the whole conversion, so every node ID is unique.
    private int i;

    public string Convert(XmlDocument document)
    {
        // Reset the state so that repeated calls produce the same output.
        stringBuilder.Clear();
        i = 0;

        stringBuilder.AppendLine("digraph G {");
        ConvertNode(document.DocumentElement, i);
        stringBuilder.Append('}');
        return stringBuilder.ToString();
    }

    // Names of non-element nodes start with '#' (e.g. "#text"); those become leaves.
    private static string ConvertName(XmlNode node)
        => node.Name.StartsWith("#") ? Leaf : node.Name.Replace("-", "_");

    private void AppendNode(string name, int id, string text)
    {
        stringBuilder.AppendLine($"{name}{id}[label=\"{text}\"];");
        // or the following, if you care about performance more than about readability
        //stringBuilder
        //    .Append(name)
        //    .Append(id)
        //    .Append("[label=\"")
        //    .Append(text)
        //    .Append("\"];")
        //    .AppendLine();
    }

    private void AppendEdge(string fromName, int fromId, string toName, int toId)
        => stringBuilder.AppendLine($"{fromName}{fromId} -> {toName}{toId};");

    private void ConvertNode(XmlNode node, int id)
    {
        if (node.NodeType == XmlNodeType.Element)
        {
            string nodeName = ConvertName(node);
            AppendNode(nodeName, id, nodeName);

            foreach (XmlNode childNode in node.ChildNodes)
            {
                // Take the next ID before recursing, because the recursion advances i.
                i++;
                int childId = i;
                AppendEdge(nodeName, id, ConvertName(childNode), childId);
                ConvertNode(childNode, childId);
            }
        }
        else if (node.NodeType == XmlNodeType.Text)
        {
            AppendNode(Leaf, id, node.InnerText);
        }
    }
}

answered Jul 27, 2016 at 17:18
\$\endgroup\$