I'm working on an application that allows users to edit and fix XML. Part of this is formatting the XML for better readability.
As the XML might be invalid, the existing methods I found for formatting (like XmlWriter or XDocument) don't work for me.
There might be all sorts of problems with the XML, although the most common is unescaped special characters.
    public static string FormatXml(string xml)
    {
        var tags = xml
            .Split('<')
            .Select(tag => tag.TrimEnd().EndsWith(">") ? tag.TrimEnd() : tag); //Trim whitespace between tags, but not at the end of values
        var previousTag = tags.First(); //Preserve content before the first tag, e.g. if the initial < is missing
        var formattedXml = new StringBuilder(previousTag);
        var indention = 0;
        foreach (var tag in tags.Skip(1))
        {
            if (previousTag.EndsWith(">"))
            {
                formattedXml.AppendLine();
                if (tag.StartsWith("/"))
                {
                    indention = Math.Max(indention - 1, 0);
                    formattedXml.Append(new string('\t', indention));
                }
                else
                {
                    formattedXml.Append(new string('\t', indention));
                    if (!tag.EndsWith("/>"))
                    {
                        indention++;
                    }
                }
            }
            else
            {
                indention = Math.Max(indention - 1, 0);
            }
            formattedXml.Append("<");
            formattedXml.Append(tag);
            previousTag = tag;
        }
        return formattedXml.ToString();
    }
So far the method produces reasonable output for all the cases I came up with.
I'm mostly worried that I missed some special cases of valid XML that would get messed up.
1 Answer
There's a test suite of 2000 test cases available at https://www.w3.org/XML/Test/ - try it out.
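One way to put the suite to use, as a rough sketch: for every input that is well-formed, check that the formatted output still parses and yields the same tree. The names `FormatterCheck` and `PreservesContent` are hypothetical, and the formatter is passed in as a delegate so the sketch does not depend on any particular implementation. Note the assumption that whitespace-only text nodes may be ignored (the default behaviour of `XDocument.Parse`), so this is a loose check, not a proof of equivalence.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class FormatterCheck
{
    // Returns null when the input is not well-formed (nothing to compare against),
    // true when the formatted output parses to an equivalent tree,
    // and false when formatting broke or changed the document.
    public static bool? PreservesContent(string xml, Func<string, string> format)
    {
        XDocument before;
        try
        {
            // Default load options drop whitespace-only text nodes.
            before = XDocument.Parse(xml);
        }
        catch (XmlException)
        {
            return null;
        }

        try
        {
            var after = XDocument.Parse(format(xml));
            return XNode.DeepEquals(before, after);
        }
        catch (XmlException)
        {
            // The input was well-formed but the formatted output is not.
            return false;
        }
    }
}
```

Calling this for each file of the suite (for example via `Directory.EnumerateFiles`) with `FormatXml` as the delegate and collecting the `false` results should give a list of well-formed inputs that the formatter damages.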
From a quick glance, it's not clear to me how you're handling content within comments or CDATA sections - which might be well-formed XML, or it might be something approximating to well-formed XML.
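To illustrate the concern: splitting `<a><![CDATA[1 < 2]]></a>` at every `<` cuts the CDATA section in two, and the second piece then looks like a tag. A possible alternative, sketched here with a regular expression rather than taken from the question's code, is to tokenize so that complete comments and CDATA sections stay in one piece, while anything unterminated still falls back to splitting at `<`. `XmlTokens` and its `Split` method are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class XmlTokens
{
    // Alternatives are tried in order:
    //   1. a complete comment        <!-- ... -->
    //   2. a complete CDATA section  <![CDATA[ ... ]]>
    //   3. '<' plus everything up to the next '<' (a tag and its trailing text)
    //   4. leading text before the first '<'
    private static readonly Regex Token = new Regex(
        @"<!--.*?-->|<!\[CDATA\[.*?\]\]>|<[^<]*|[^<]+",
        RegexOptions.Singleline);

    // Splits the input into tokens; concatenating them gives back the input.
    public static List<string> Split(string xml) =>
        Token.Matches(xml).Cast<Match>().Select(m => m.Value).ToList();
}
```

Unlike `xml.Split('<')`, each token keeps its leading `<`, so the indentation logic would have to be adapted accordingly. An unterminated comment or CDATA section does not match the first two alternatives and is split like ordinary markup, which seems a reasonable fallback for invalid input.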
Another comment is that messing with whitespace is dangerous in mixed content. With inline markup (bold, italic etc) preserving whitespace as written may be important.
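A small sketch of why this matters, using `XDocument` with `LoadOptions.PreserveWhitespace` to read back the text an element contains (`MixedContentDemo` and `TextOf` are hypothetical names): putting each inline element on its own indented line adds whitespace to the text that was not there before.

```csharp
using System;
using System.Xml.Linq;

public static class MixedContentDemo
{
    // Returns the concatenated text of the root element, keeping all whitespace.
    public static string TextOf(string xml) =>
        XDocument.Parse(xml, LoadOptions.PreserveWhitespace).Root.Value;

    public static void Main()
    {
        var original = "<p><b>Hello</b><i>world</i></p>";
        var indented = "<p>\n\t<b>Hello</b>\n\t<i>world</i>\n</p>";

        // The two strings differ: the indented version gained line breaks and tabs.
        Console.WriteLine(TextOf(original) == TextOf(indented)); // prints "False"
    }
}
```

Whether the extra whitespace is harmless depends on the consumer of the XML: a data format usually ignores it, while a document format may render it as a space between the two words.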
+1 I'll have a look at the test cases. Mixed content might be problematic. In my specific use case it's not a concern, but generally my code would need some major changes to handle this. – raznagul, Dec 3, 2020 at 10:23
Is the xml passed to the method before or after the user edits the XML?