Mono BLOG

Blog about Microsoft technologies (.NET, ASP.NET Core, Blazor, EF Core, WPF, TypeScript, etc.)

Efficient XML Parsing in C#: A Performance Comparison

I recently encountered a task at work involving reading and writing XML, which led me to explore the performance implications of different approaches. I'd like to share my findings with you.

For our demonstration, we'll use a sample XML text from W3Schools:

<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <!-- ... three <food> elements omitted for brevity ... -->
  <food>
    <name>Homestyle Breakfast</name>
    <price>$6.95</price>
    <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
    <calories>950</calories>
  </food>
</breakfast_menu>

Our goal is to read the values of all <name> elements and store them in a List<string>. Let's implement this using several different methods.

Using XmlDocument

XmlDocument represents the traditional, DOM-based approach. It can be used in a couple of ways: one is by using methods like GetElementsByTagName to traverse the document tree, and the other is by using XPath expressions to select nodes directly. Let's look at the first method:

public List<string> XmlDocument()
{
    var doc = new XmlDocument();
    doc.LoadXml(testXml);
    return doc
        .GetElementsByTagName("food")
        .OfType<XmlNode>()
        .Select(node => node["name"]!.InnerText)
        .ToList();
}

Now, let's try it with an XPath expression:

public List<string> XmlDocumentXPath()
{
    var doc = new XmlDocument();
    doc.LoadXml(testXml);
    return doc
        .SelectNodes("//food/name")!
        .OfType<XmlNode>()
        .Select(node => node.InnerText)
        .ToList();
}

Using System.Xml.Linq

System.Xml.Linq (XDocument) offers a more modern and developer-friendly API. Here's how to implement our task with it:

public List<string> XDocument()
{
    // Fully qualified because this method shares its name with the XDocument type
    var doc = System.Xml.Linq.XDocument.Parse(testXml);
    return doc
        .Root!
        .Elements("food")
        .Select(node => node.Element("name")!.Value)
        .ToList();
}

While XDocument also supports XPath, its fluent API is already so concise that using XPath often isn't necessary for simpler queries.
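For completeness, XPath support for XDocument comes from extension methods in the System.Xml.XPath namespace. The following is a minimal sketch of the same task; the class name XDocumentXPathSketch and the method ReadNames are hypothetical and were not part of the benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // brings the XPathSelectElements extension into scope

public static class XDocumentXPathSketch
{
    // Hypothetical helper: same task as above, but selected via an XPath expression.
    public static List<string> ReadNames(string xml)
    {
        var doc = XDocument.Parse(xml);
        return doc
            .XPathSelectElements("/breakfast_menu/food/name")
            .Select(element => element.Value)
            .ToList();
    }
}
```

For a query this simple, the Elements(...) chain reads at least as clearly, which is why the fluent version is the one measured below.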

Using XmlReader

XmlReader is a forward-only, stream-based parser. While its API is more complex to use, it is extremely efficient in terms of speed and memory.

public List<string> XmlReader()
{
    using var stringReader = new StringReader(testXml);
    using var xmlReader = System.Xml.XmlReader.Create(stringReader);
    var res = new List<string>(8);
    while (xmlReader.Read())
    {
        if (xmlReader.IsStartElement() && xmlReader.Name == "name")
        {
            res.Add(xmlReader.ReadElementContentAsString());
        }
    }
    return res;
}
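One caveat with the loop above: ReadElementContentAsString already moves the reader past the closing tag, so the following Read() call can step over an element that starts immediately afterwards. The sample XML is indented, so whitespace nodes sit between elements and nothing is lost, but compact XML such as <name>A</name><name>B</name> would lose the second value. A sketch that only advances when nothing was consumed (the class name XmlReaderSketch and the method ReadNames are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class XmlReaderSketch
{
    // Hypothetical helper: collects the text of every <name> element.
    public static List<string> ReadNames(string xml)
    {
        using var stringReader = new StringReader(xml);
        using var reader = XmlReader.Create(stringReader);
        var res = new List<string>();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "name")
            {
                // This call leaves the reader on the node after </name>,
                // so we must not call Read() again in this iteration.
                res.Add(reader.ReadElementContentAsString());
            }
            else
            {
                reader.Read();
            }
        }
        return res;
    }
}
```

This variant was not benchmarked, so the XmlReader figures below apply to the original loop only.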

Using Regular Expressions (Regex)

Since our task is relatively simple and the XML structure is predictable, we can also use regular expressions.

public List<string> Regex()
{
    var matches = System.Text.RegularExpressions.Regex.Matches(testXml, @"<name>(.*?)</name>");
    return matches.Select(match => match.Groups[1].Value).ToList();
}
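It helps to be concrete about what "predictable" means here. The pattern above only matches a bare <name> tag and returns the raw text, so attributes on the element or XML entities in the content change the result. A small sketch with made-up input (the class RegexCaveatSketch and its XML string are hypothetical, not taken from the sample file):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class RegexCaveatSketch
{
    // Hypothetical input: the first name carries an attribute, the second an entity.
    public const string Xml =
        "<menu><name lang=\"en\">Waffles</name><name>Ham &amp; Eggs</name></menu>";

    // Same pattern as in the article: misses the first element, keeps "&amp;" undecoded.
    public static string[] WithRegex() =>
        Regex.Matches(Xml, @"<name>(.*?)</name>")
            .Select(match => match.Groups[1].Value)
            .ToArray();

    // A real XML parser finds both elements and decodes the entity.
    public static string[] WithXDocument() =>
        XDocument.Parse(Xml).Root!
            .Elements("name")
            .Select(element => element.Value)
            .ToArray();
}
```

With this input, WithRegex() yields a single raw value, "Ham &amp; Eggs", while WithXDocument() yields "Waffles" and "Ham & Eggs".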

Using Traditional String Methods

Finally, we can fall back on basic string manipulation methods.

public List<string> StringOps()
{
    var res = new List<string>(8);
    int cur = 0;
    while (true)
    {
        // Find the next <name> tag
        int idx = testXml.IndexOf("<name>", cur);
        // If not found, we're done
        if (idx < 0)
            break;
        // Find the corresponding </name> tag ("<name>" is 6 characters long)
        int end = testXml.IndexOf("</name>", idx + 6);
        // Stop on malformed input instead of letting Substring throw
        if (end < 0)
            break;
        res.Add(testXml.Substring(idx + 6, end - idx - 6));
        // Continue searching after the closing tag ("</name>" is 7 characters long)
        cur = end + 7;
    }
    return res;
}

Spoiler alert: this method turns out to be surprisingly inefficient. At about 37 µs per call in the benchmark below, it is the slowest of all the approaches. This leads us to our secret weapon: Span<T>.
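A likely contributor to this slowdown is that string.IndexOf(string) performs a culture-sensitive search by default, whereas IndexOf on a span compares characters ordinally. Passing StringComparison.Ordinal explicitly should remove that difference. The sketch below shows what that could look like; the class OrdinalStringOpsSketch and the method ReadNames are hypothetical, and this variant was not part of the benchmark, so no timing is claimed for it.

```csharp
using System;
using System.Collections.Generic;

public static class OrdinalStringOpsSketch
{
    private const string Open = "<name>";
    private const string Close = "</name>";

    // Hypothetical variant of StringOps that requests ordinal comparison.
    public static List<string> ReadNames(string xml)
    {
        var res = new List<string>();
        int cur = 0;
        while (true)
        {
            // Ordinal search: compares char values, no culture rules involved
            int idx = xml.IndexOf(Open, cur, StringComparison.Ordinal);
            if (idx < 0)
                break;
            int start = idx + Open.Length;
            int end = xml.IndexOf(Close, start, StringComparison.Ordinal);
            // Stop on a missing closing tag rather than throwing
            if (end < 0)
                break;
            res.Add(xml.Substring(start, end - start));
            cur = end + Close.Length;
        }
        return res;
    }
}
```

If you want to know how much of the gap is due to the comparison mode and how much to Span<T> itself, adding a variant like this to the benchmark is the way to find out.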

Using Span<T>

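Span<T> shipped with .NET Core 2.1, with the supporting language features (such as ref struct) arriving in C# 7.2. It is a view over contiguous memory, so slicing a string through ReadOnlySpan<char> neither copies nor allocates. Only the final ToString() call for each extracted value creates a new string.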

public List<string> SpanOps()
{
    var res = new List<string>(8);
    var span = testXml.AsSpan();
    while (true)
    {
        int idx = span.IndexOf("<name>");
        if (idx < 0)
            break;
        // Slice from after "<name>" (6 characters) and find the closing tag
        var contentSlice = span.Slice(idx + 6);
        int end = contentSlice.IndexOf("</name>");
        // Stop on malformed input instead of letting Slice throw
        if (end < 0)
            break;
        // Add the content to the list
        res.Add(contentSlice.Slice(0, end).ToString());
        // Move the span past "</name>" (7 characters)
        span = contentSlice.Slice(end + 7);
    }
    return res;
}

Performance Test

We used BenchmarkDotNet to test the performance of these methods. Here are the results:

Case          Mean        Min         Max         Allocated (bytes)
XmlDocument   10.65 µs    10.40 µs    10.79 µs    17,608
XPath         5.74 µs     5.72 µs     5.76 µs     18,912
LINQ          3.95 µs     3.94 µs     3.97 µs     15,054
XmlReader     3.04 µs     3.02 µs     3.06 µs     11,776
Regex         2.13 µs     2.11 µs     2.14 µs     2,632
StringOps     37.16 µs    37.08 µs    37.32 µs    448
SpanOps       134.48 ns   133.89 ns   135.11 ns   448

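So, were you impressed by the power of Span<T>? At about 134 ns it is in a league of its own: roughly 16 times faster than Regex, the next-fastest method, while allocating only 448 bytes (the same as StringOps, and the lowest in the test). From these results, we can draw a few conclusions: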

  • If the content we need to extract isn't complex and the structure is consistent, we can use Regex for a very fast and low-allocation solution.
  • When the task requires robust XML parsing, we should rely on dedicated XML APIs. Ranked from fastest to slowest in this test: XmlReader, XDocument (LINQ), XmlDocument.
  • From a practical standpoint, XDocument is the best choice for most scenarios. It is significantly faster than the traditional XmlDocument (3.95 µs versus 5.74 µs with XPath and 10.65 µs with GetElementsByTagName) and not much slower than XmlReader (3.04 µs), while offering a much more pleasant and productive API.
  • Using Span<T> can lead to massive performance gains, especially when your logic involves frequent string operations like IndexOf and slicing (Substring). It's the clear winner for performance-critical scenarios that can be solved with string manipulation.

Tags: C#, .NET