Understanding C#: Simple LINQ to XML examples (tutorial)

By Andrew Stellman
October 16, 2010 | Comments: 3

XML is one of the most popular formats for files and data streams that need to represent complex data. The .NET Framework gives you some really powerful tools for creating, loading, and saving XML files. And once you've got your hands on XML data, you can use LINQ to query anything from data that you created to an RSS feed.

In this post, I'll show you two simple, tutorial-style LINQ to XML examples that highlight basic patterns you can use to create or query XML data:

  • In the first example you'll create XML data, write it to disk, read it back, and then query it using LINQ.
  • In the second example you'll use a LINQ query to read data from an RSS feed.

The goal is to get you started quickly by giving you a few quick and simple LINQ to XML examples and patterns.

Note: The examples here are based on ones we used in the "leftovers" appendix in Head First C#, 2nd ed. If you want to read more about LINQ to XML, Microsoft has a lot of great documentation about it on MSDN. You can read more about LINQ to XML and classes in the System.Xml.Linq namespace here: http://msdn.microsoft.com/en-us/library/bb387098.aspx.


Example #1: Building XML data, writing it to a file, and querying it with LINQ

The LINQ to XML classes live in the System.Xml.Linq namespace, which has a very handy object for processing XML: the XDocument class, which represents an XML document. There's a lot of depth to an XDocument (no pun intended), but the easiest way to get a handle on it is to see a simple XDocument example.

This example uses the XDocument and XElement classes to create XML data and save it to a file (or print it to the console). Then we'll use LINQ to query the XML data. Finally, we'll read RSS data from a blog using an XDocument object, and use LINQ to XML to turn it into a sequence of our own objects that we can code against.

Structure your data using an XDocument

Start by creating a new Console application in Visual Studio. Add using System.Xml.Linq; to the top of Program.cs:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

Then add this GetStarbuzzData() method. It uses XDocument and XElement objects to generate an XML document. This particular XML document stores the same kind of Starbuzz customer loyalty data we used as sample data in Head First C#.

XML isn't the only way to represent this data. You can see this same data as a SQL database in the Head First C# Free C# eBook [PDF] preview, which contains complete chapters.

/// <summary>
/// Get the Starbuzz customer data as an XDocument
/// </summary>
/// <returns>XDocument with the Starbuzz data</returns>
static XDocument GetStarbuzzData()
{
     /*
      * You can use an XDocument to create an XML file, and that includes XML
      * files you can read and write using DataContractSerializer.
      *
      * An XDocument object represents an XML document. It's part of the
      * System.Xml.Linq namespace.
      *
      * Use XElement objects to create elements under the XML tree.
      */
 
     XDocument doc = new XDocument(
         new XDeclaration("1.0", "utf-8", "yes"),
         new XComment("Starbuzz Customer Loyalty Data"),
         new XElement("starbuzzData",
             new XAttribute("storeName", "Park Slope"),
             new XAttribute("location", "Brooklyn, NY"),
             new XElement("person",
                 new XElement("personalInfo",
                     new XElement("name", "Janet Venutian"),
                     new XElement("zip", 11215)),
                 new XElement("favoriteDrink", "Choco Macchiato"),
                 new XElement("moneySpent", 255),
                 new XElement("visits", 50)),
             new XElement("person",
                 new XElement("personalInfo",
                     new XElement("name", "Liz Nelson"),
                     new XElement("zip", 11238)),
                 new XElement("favoriteDrink", "Double Cappuccino"),
                 new XElement("moneySpent", 150),
                 new XElement("visits", 35)),
             new XElement("person",
                 new XElement("personalInfo",
                     new XElement("name", "Matt Franks"),
                     new XElement("zip", 11217)),
                 new XElement("favoriteDrink", "Zesty Lemon Chai"),
                 new XElement("moneySpent", 75),
                 new XElement("visits", 15)),
             new XElement("person",
                 new XElement("personalInfo",
                     new XElement("name", "Joe Ng"),
                     new XElement("zip", 11217)),
                 new XElement("favoriteDrink", "Banana Split in a Cup"),
                 new XElement("moneySpent", 60),
                 new XElement("visits", 10)),
             new XElement("person",
                 new XElement("personalInfo",
                     new XElement("name", "Sarah Kalter"),
                     new XElement("zip", 11215)),
                 new XElement("favoriteDrink", "Boring Coffee"),
                 new XElement("moneySpent", 110),
                 new XElement("visits", 15))));
     return doc;
}

Save and load XML files

The XDocument object's Save() method and the static XDocument.Load() method write and read XML files. And its ToString() method renders everything inside it as one big XML string. Add this method to your program (the Main() entry point method will call it later):

/// <summary>
/// Save the Starbuzz data to an XML file.
/// </summary>
/// <param name="filename">Filename to write the data to</param>
static void SaveDataToAnXmlFile(string filename)
{
     XDocument doc = GetStarbuzzData();
     doc.Save(filename);
}

If you want to load data, you'd do this:

XDocument anotherDoc = XDocument.Load("starbuzzdata.xml");

Here's a useful tip. Once you've got your document in an XDocument object, calling its ToString() method will return the XML in one big string:

string xmlData = anotherDoc.ToString();

NOTE: In this tutorial, I'm just showing you how to save and load XML data using an XDocument object. But you don't necessarily need to use files—once you've got your data in an XDocument object, it's all ready for querying, no matter where you got it. Also, take a minute and open the XML file in a text editor once you've saved it out to disk. Go ahead and give that a try now—it will help show you exactly what's going on.
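If you'd like to try the save, load, and ToString() steps in one place, here's a minimal sketch. The RoundTripDemo class, the tiny stand-in document, and the temp-file name are assumptions made up for this illustration; they aren't part of the Starbuzz program. It also uses XDocument.Parse(), which reads XML from a string instead of a file.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

static class RoundTripDemo
{
    // Builds a tiny stand-in document (hypothetical data, not the full Starbuzz set)
    public static XDocument BuildSample()
    {
        return new XDocument(
            new XDeclaration("1.0", "utf-8", "yes"),
            new XElement("starbuzzData",
                new XElement("person",
                    new XElement("favoriteDrink", "Choco Macchiato"))));
    }

    // Saves the document to a file, loads it back, and returns the loaded copy
    public static XDocument SaveAndReload(XDocument doc, string path)
    {
        doc.Save(path);
        return XDocument.Load(path);
    }

    static void Main()
    {
        // Hypothetical file name in the temp folder
        string path = Path.Combine(Path.GetTempPath(), "roundtrip-sample.xml");
        XDocument reloaded = SaveAndReload(BuildSample(), path);

        // ToString() returns the markup without the <?xml ...?> declaration
        string xml = reloaded.ToString();
        Console.WriteLine(xml);

        // XDocument.Parse() turns an XML string back into a document
        XDocument parsed = XDocument.Parse(xml);
        Console.WriteLine(parsed.Root.Name);

        File.Delete(path);
    }
}
```

One thing worth noticing: the string from ToString() leaves out the XML declaration, while the file written by Save() includes it.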

Query your data

Here's a method with two LINQ to XML queries against the Starbuzz data. The first one is simple: it selects each person element and pulls the drink, money spent, and zip code into an anonymous object. Notice how it uses the XDocument.Descendants() method. That method looks through all of the elements nested anywhere inside the XDocument (its descendants) and returns them in document order. When you pass it a name, it returns only the elements with that name.

One important thing to keep in mind about XDocument.Descendants() is that it uses deferred execution. That means it returns a sequence (an IEnumerable<XElement>, to be specific), but it doesn't actually descend through the XML document and find the descendants until the sequence is enumerated. If you use a foreach loop to iterate through the descendants, each iteration only reads as far as the next descendant.
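Here's a small sketch that makes deferred execution visible. The DeferredDemo class and its two-person document are assumptions for illustration only. It creates the Descendants() sequence once, counts it, adds another person element to the document, and counts the same sequence again.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class DeferredDemo
{
    // Returns the number of <person> elements seen before and after an element is added
    public static int[] CountBeforeAndAfter()
    {
        XDocument doc = new XDocument(
            new XElement("starbuzzData",
                new XElement("person", new XElement("name", "Janet Venutian"))));

        // No traversal happens here; this only sets up the sequence
        IEnumerable<XElement> people = doc.Descendants("person");

        // Enumerating the sequence (Count() does that) walks the tree
        int before = people.Count();

        // Add another person after the sequence was created
        doc.Root.Add(new XElement("person", new XElement("name", "Liz Nelson")));

        // Enumerating again walks the tree again, so the new element shows up
        int after = people.Count();

        return new[] { before, after };
    }

    static void Main()
    {
        int[] counts = CountBeforeAndAfter();
        Console.WriteLine("{0} then {1}", counts[0], counts[1]);   // prints "1 then 2"
    }
}
```

If you need a snapshot that won't change when the document does, calling ToList() on the sequence is one common way to get it.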

The second query in the method is a little more complex. It uses a group clause to group the favorite drinks by zip code, and a foreach loop to print the number of distinct favorite drinks and the key (the zip code) for each group.

/// <summary>
/// Query the data and print the results to the console
/// </summary>
/// <param name="doc">XDocument with Starbuzz customer loyalty data loaded</param>
static void QueryTheData(XDocument doc)
{
     // Do a simple query and print the results to the console
     var data = from item in doc.Descendants("person")
                select new
                {
                    drink = item.Element("favoriteDrink").Value,
                    moneySpent = item.Element("moneySpent").Value,
                    zipCode = item.Element("personalInfo").Element("zip").Value
                };
     foreach (var p in data)
         Console.WriteLine(p.ToString());
 
     // Do a more complex query and print the results to the console
     var zipcodeGroups = from item in doc.Descendants("person")
                         group item.Element("favoriteDrink").Value
                         by item.Element("personalInfo").Element("zip").Value
                             into zipcodeGroup
                             select zipcodeGroup;
     foreach (var group in zipcodeGroups)
         Console.WriteLine("{0} favorite drinks in {1}",
                         group.Distinct().Count(), group.Key);
}

Put it all together

Here's the Main() method for your program:

static void Main(string[] args)
{
     // Save the Starbuzz data
     SaveDataToAnXmlFile("starbuzzdata.xml");
 
     // Read the XML data from starbuzzdata.xml
     XDocument starbuzzData = XDocument.Load("starbuzzdata.xml");
 
     // Query the data that was loaded
     QueryTheData(starbuzzData);
 
     // Don't quit until the user presses a key (just to make it easier to run in the
     // Visual Studio debugger -- since this is a learning exercise)
     Console.ReadKey();
}

And here's what it looks like when it runs:

[Screenshot: console output of the Starbuzz LINQ queries]
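The queries in QueryTheData() read every value as a string through the Value property. As a variation, XElement also supports explicit conversions, so you can cast an element to int or string directly. Here's a sketch that totals the money spent per zip code. The TypedQueryDemo class and its method names are assumptions for this illustration, and the sample uses three of the Starbuzz customers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class TypedQueryDemo
{
    // A cut-down copy of the Starbuzz data with three customers
    public static XDocument BuildSample()
    {
        return new XDocument(
            new XElement("starbuzzData",
                new XElement("person",
                    new XElement("personalInfo", new XElement("zip", 11215)),
                    new XElement("moneySpent", 255)),
                new XElement("person",
                    new XElement("personalInfo", new XElement("zip", 11238)),
                    new XElement("moneySpent", 150)),
                new XElement("person",
                    new XElement("personalInfo", new XElement("zip", 11215)),
                    new XElement("moneySpent", 110))));
    }

    // Groups the moneySpent values by zip code and adds up each group
    public static Dictionary<string, int> TotalSpentByZip(XDocument doc)
    {
        var totals = from person in doc.Descendants("person")
                     group (int)person.Element("moneySpent")              // cast element to int
                     by (string)person.Element("personalInfo").Element("zip")
                     into zipGroup
                     select new { Zip = zipGroup.Key, Total = zipGroup.Sum() };

        return totals.ToDictionary(t => t.Zip, t => t.Total);
    }

    static void Main()
    {
        foreach (var pair in TotalSpentByZip(BuildSample()))
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}
```

Because the cast produces real int values, Sum() works without any parsing code of your own.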


Example #2: Using LINQ to read XML data from an RSS feed

You can do some pretty powerful things with LINQ to XML, because so much data is stored and transmitted as XML. Like RSS feeds, for example! Open up any RSS feed - like this one from our blog, Building Better Software - and view its source, and you'll see XML data. And that means you can read it into an XDocument and query it with LINQ.

One nice thing about the XDocument.Load() method is that when you pass it a string, you're giving it a URI. A lot of the time, you'll just pass it a simple filename. But a URL will work equally well. Here's how you can read the title of a blog from its RSS feed, using the <rss>, <channel>, and <title> tags:

XDocument ourBlog = XDocument.Load("http://www.stellman-greene.com/feed");
Console.WriteLine(ourBlog.Element("rss").Element("channel").Element("title").Value);

That means it's easy to write a LINQ to XML query to read data from an RSS feed. Here's how we'll do it:

  1. Create a new console application
  2. Make sure you've got using System.Xml.Linq; at the top of the code
  3. We'll use XDocument.Load() to load the XML data from the URL.
  4. A simple LINQ query can extract the articles into instances of a Post class that we'll create
  5. Instead of using anonymous types, the select new clause will select new Post objects

When you use the XDocument.Element() method, you're really calling the Element() method of its base class, XContainer. The XElement class that you used earlier also extends XContainer, and the Element() method returns an XElement, which means its return value is itself an XContainer.
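To see what that shared base class buys you, here's a short sketch of a method that takes an XContainer and works with both a whole document and a single element. The ContainerDemo class, its CountChildElements() helper, and the sample markup are hypothetical names invented for this example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class ContainerDemo
{
    // Hypothetical helper: counts the direct child elements of any container
    public static int CountChildElements(XContainer container)
    {
        return container.Elements().Count();
    }

    // A tiny RSS-shaped document (made-up content)
    public static XDocument BuildSample()
    {
        return new XDocument(
            new XElement("rss",
                new XElement("channel",
                    new XElement("title", "Sample Blog"),
                    new XElement("item"))));
    }

    static void Main()
    {
        XDocument doc = BuildSample();

        // An XDocument is an XContainer; its only child element is <rss>
        Console.WriteLine(CountChildElements(doc));

        // Element() returns an XElement, which is also an XContainer
        XElement channel = doc.Element("rss").Element("channel");
        Console.WriteLine(CountChildElements(channel));
    }
}
```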

We'll take advantage of that by creating a Post class with a constructor that takes an XContainer object and uses its Element() method to get values. Note its GetElementValue() method that either returns an element's Value or, if that element doesn't exist, returns an empty string. (Again, remember to add using System.Xml.Linq; to the top of the code, for both this and the Main() method below!)

/// <summary>
/// A Post object represents a single RSS post read from XML data
/// </summary>
class Post
{
     public string Title { get; private set; }
     public DateTime? Date { get; private set; }
     public string Url { get; private set; }
     public string Description { get; private set; }
     public string Creator { get; private set; }
     public string Content { get; private set; }
 
     private static string GetElementValue(XContainer element, string name)
     {
          if ((element == null) || (element.Element(name) == null))
              return String.Empty;
          return element.Element(name).Value;
      }
 
     public Post(XContainer post)
     {
          // Get the string properties from the post's element values
          Title = GetElementValue(post, "title");
          Url = GetElementValue(post, "guid");
          Description = GetElementValue(post, "description");
          Creator = GetElementValue(post, 
              "{http://purl.org/dc/elements/1.1/}creator");
          Content = GetElementValue(post,
              "{http://purl.org/rss/1.0/modules/content/}encoded");
  
          // The Date property is a nullable DateTime? -- if the pubDate element
          // can't be parsed into a valid date, the Date property is set to null
          DateTime result;
          if (DateTime.TryParse(GetElementValue(post, "pubDate"), out result))
              Date = (DateTime?)result;
      }
 
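     public override string ToString()
     {
          // GetElementValue() returns an empty string (never null) for a
          // missing element, so check for empty values as well as null
          return String.Format("{0} by {1}",
              String.IsNullOrEmpty(Title) ? "no title" : Title,
              String.IsNullOrEmpty(Creator) ? "Unknown" : Creator);
     }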
}

Did you notice how the Post constructor uses "{http://purl.org/dc/elements/1.1/}creator" as the name for creator? If you go back to the RSS feed source and search for "creator", you'll find a tag that looks like this:

<dc:creator>Andrew Stellman</dc:creator>

See that "dc:"? At the top of the feed, the root <rss> tag has this attribute:

xmlns:dc="http://purl.org/dc/elements/1.1/"

That's an XML namespace. Put them together and you'll get the element's complete name:

{http://purl.org/dc/elements/1.1/}creator
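You don't have to type the expanded name as a string. Here's a sketch that reads the same element both ways: once with the expanded-name string, and once with an XNamespace combined with the local name. The NamespaceDemo class and the sample <item> markup are assumptions made up for this illustration.

```csharp
using System;
using System.Xml.Linq;

static class NamespaceDemo
{
    // A single made-up <item> that declares the dc prefix itself
    public static XElement BuildSampleItem()
    {
        return XElement.Parse(
            "<item xmlns:dc=\"http://purl.org/dc/elements/1.1/\">" +
            "<title>Sample post</title>" +
            "<dc:creator>Andrew Stellman</dc:creator>" +
            "</item>");
    }

    // Looks up creator with the expanded-name string
    public static string CreatorFromString(XElement item)
    {
        return item.Element("{http://purl.org/dc/elements/1.1/}creator").Value;
    }

    // Looks up creator with an XNamespace plus the local name
    public static string CreatorFromNamespace(XElement item)
    {
        XNamespace dc = "http://purl.org/dc/elements/1.1/";
        return item.Element(dc + "creator").Value;
    }

    static void Main()
    {
        XElement item = BuildSampleItem();
        Console.WriteLine(CreatorFromString(item));
        Console.WriteLine(CreatorFromNamespace(item));

        // The bare local name doesn't match a namespaced element
        Console.WriteLine(item.Element("creator") == null);
    }
}
```

Both lookups find the same element. The bare name "creator" finds nothing, which is why the Post constructor needs the namespace.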

Now you're ready for the LINQ query. Notice how it uses select new Post(post) to pass each XElement returned by ourBlog.Descendants("item") into the Post constructor.

static void Main(string[] args)
{
     // Load the blog posts and print the title of the blog
     XDocument ourBlog = XDocument.Load("http://www.stellman-greene.com/feed");
     Console.WriteLine(ourBlog
         .Element("rss")
         .Element("channel")
         .Element("title")
         .Value);
 
     // Query the <item>s in the XML RSS data and select each one into a new Post()
     IEnumerable<Post> posts =
         from post in ourBlog.Descendants("item")
         select new Post(post);
 
     // Print each post to the console
     foreach (var post in posts)
         Console.WriteLine(post.ToString());
}

When you run your program, it connects to the blog, retrieves the RSS feed, and prints the list of articles to the console.
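If you want to experiment without a network connection, the same query pattern works on RSS text held in a string. This is a minimal sketch under that assumption: the OfflineFeedDemo class and the two-item sample feed are made up for illustration, and it selects plain title strings instead of Post objects to stay self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class OfflineFeedDemo
{
    // A made-up feed with two items; the second item has no <title>
    public const string SampleFeed =
        "<rss version=\"2.0\"><channel>" +
        "<title>Sample Blog</title>" +
        "<item><title>First post</title></item>" +
        "<item><description>No title here</description></item>" +
        "</channel></rss>";

    // Reads the title of every <item>, using an empty string when it's missing
    public static List<string> ReadItemTitles(string rssXml)
    {
        XDocument feed = XDocument.Parse(rssXml);
        var titles = from item in feed.Descendants("item")
                     // Casting a missing (null) element to string gives null
                     select (string)item.Element("title") ?? String.Empty;
        return titles.ToList();
    }

    static void Main()
    {
        foreach (string title in ReadItemTitles(SampleFeed))
            Console.WriteLine("[{0}]", title);
    }
}
```

The cast-and-fallback on the title plays the same role as GetElementValue() in the Post class: a missing element becomes an empty string instead of an exception.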

Andrew Stellman is the author of Head First C# and other books from O'Reilly. You can read more from Andrew at Building Better Software.


3 Comments

Whew, it's almost like speaking another language. Great article.

Well thanks for the article i need to use this for my job.

THanks a lot for this article man! Very very useful! Cheers!