Sunday, December 30, 2007
JSON vs. XML DataContract Serialization: Download Test Harness
Rick Strahl's DataContractJsonSerializer in .NET 3.5 post of December 29, 2007 describes the .NET Framework 3.5's new DataContractJsonSerializer (DCJS) class from the System.Runtime.Serialization.Json namespace, which contains the serializer and related objects for the lightweight JavaScript Object Notation (JSON) format (RFC 4627). JSON is one of ADO.NET Data Services' two wire formats, as noted in the "Julie Lerman: Astoria to Become ADO.NET Data Services" topic of LINQ and Entity Framework Posts for 12/10/2007+. (The Astoria Team intends to add plain old XML (POX) format to v1.0 by RTW.)
Quick Summary: JSON serialization with DCJS requires about 25% to about 40% fewer bytes than the XML DataContractSerializer (DCS) class to represent moderately complex objects, such as Northwind Order objects and their associated Order_Detail objects, and takes about 25% longer than the DCS class to serialize and deserialize.
However, Windows Communication Foundation (WCF) serializes JSON streams to an internal XML Infoset wire format that requires about 50% more bytes than DCS to hydrate an object across a process boundary. Adding type="JSONDataType" attributes to each member's element contributes the additional bytes.
The added time to serialize a moderate-sized object collection and the dramatic increase in the size of WCF messages indicate that JSON serialization isn't as lightweight as one would expect from its simple syntax.
You can download the VB 9.0 code for the JsonSerializationWinVB.sln test harness project here. The project requires Visual Basic 2008 Express or Visual Studio 2008 Standard or higher but doesn't need a database connection. (Mock generic data objects are provided.)
The Test Harness
Rick's sample code uses simple, lightweight objects. I wanted to verify whether the DCJS class actually was lighter in weight and performed as well as or better than the .NET Fx 3.0's DataContractSerializer (DCS) class with more complex classes containing a variety of data types, including Nullable<T>, and generic collections. So I created the following two classes with the LINQ In-Memory Object Generation (LIMOG) utility that's described in my Serializing Object Graphs Without and With References post of November 21, 2007.
Here are C# versions of the class definitions for the Order and Order_Detail objects used in the tests:
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

// Parent object: one Northwind order with its 1:many Order_Details collection
[DataContract(Name = "Order", Namespace = "")]
public class Order
{
    [DataMember(Name = "OrderID", Order = 1)] public int OrderID { get; set; }
    [DataMember(Name = "CustomerID", Order = 2)] public string CustomerID { get; set; }
    [DataMember(Name = "EmployeeID", Order = 3)] public int? EmployeeID { get; set; }
    [DataMember(Name = "OrderDate", Order = 4)] public DateTime? OrderDate { get; set; }
    [DataMember(Name = "RequiredDate", Order = 5)] public DateTime? RequiredDate { get; set; }
    [DataMember(Name = "ShippedDate", Order = 6)] public DateTime? ShippedDate { get; set; }
    [DataMember(Name = "ShipVia", Order = 7)] public int? ShipVia { get; set; }
    [DataMember(Name = "Freight", Order = 8)] public decimal? Freight { get; set; }
    [DataMember(Name = "ShipName", Order = 9)] public string ShipName { get; set; }
    [DataMember(Name = "ShipAddress", Order = 10)] public string ShipAddress { get; set; }
    [DataMember(Name = "ShipCity", Order = 11)] public string ShipCity { get; set; }
    [DataMember(Name = "ShipRegion", Order = 12)] public string ShipRegion { get; set; }
    [DataMember(Name = "ShipPostalCode", Order = 13)] public string ShipPostalCode { get; set; }
    [DataMember(Name = "ShipCountry", Order = 14)] public string ShipCountry { get; set; }
    [DataMember(Name = "Order_Details", Order = 15)] public List<Order_Detail> Order_Details { get; set; }
}

// Child object: one line item of an order
[DataContract(Name = "Order_Detail", Namespace = "")]
public class Order_Detail
{
    [DataMember(Name = "OrderID", Order = 1)] public int OrderID { get; set; }
    [DataMember(Name = "ProductID", Order = 2)] public int ProductID { get; set; }
    [DataMember(Name = "UnitPrice", Order = 3)] public decimal UnitPrice { get; set; }
    [DataMember(Name = "Quantity", Order = 4)] public short Quantity { get; set; }
    [DataMember(Name = "Discount", Order = 5)] public float Discount { get; set; }
}
LIMOG adds the Name, Namespace, and Order attribute/value pairs; DCJS processes Name and Order attribute values and ignores Namespace values. I wrote custom SQL commands to generate object initializers for Northwind's last 20 Orders rows with their associated Order Details and Customer rows. A few of the orders are missing Nullable<DateTime> ShippedDate values.
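The size comparison that the test harness reports can be approximated with a much smaller sketch. The OrderSketch and DetailSketch classes, the SizeComparison helper, and the sample values below are hypothetical stand-ins for the LIMOG-generated classes and data, so the byte counts they produce will not match the harness's figures.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;

// Hypothetical, trimmed-down stand-ins for the Order and Order_Detail classes.
[DataContract(Name = "Order_Detail", Namespace = "")]
public class DetailSketch
{
    [DataMember(Order = 1)] public int OrderID { get; set; }
    [DataMember(Order = 2)] public int ProductID { get; set; }
    [DataMember(Order = 3)] public decimal UnitPrice { get; set; }
    [DataMember(Order = 4)] public short Quantity { get; set; }
}

[DataContract(Name = "Order", Namespace = "")]
public class OrderSketch
{
    [DataMember(Order = 1)] public int OrderID { get; set; }
    [DataMember(Order = 2)] public string CustomerID { get; set; }
    [DataMember(Order = 3)] public int? EmployeeID { get; set; }
    [DataMember(Order = 4)] public List<DetailSketch> Order_Details { get; set; }
}

public static class SizeComparison
{
    // Builds one order with two line items (made-up sample values).
    public static OrderSketch MakeOrder()
    {
        return new OrderSketch
        {
            OrderID = 11077,
            CustomerID = "RATTC",
            EmployeeID = 1,
            Order_Details = new List<DetailSketch>
            {
                new DetailSketch { OrderID = 11077, ProductID = 2, UnitPrice = 19.00m, Quantity = 24 },
                new DetailSketch { OrderID = 11077, ProductID = 3, UnitPrice = 10.00m, Quantity = 4 }
            }
        };
    }

    // Number of bytes DCJS writes for the order.
    public static long JsonBytes(OrderSketch order)
    {
        var ser = new DataContractJsonSerializer(typeof(OrderSketch));
        using (var ms = new MemoryStream())
        {
            ser.WriteObject(ms, order);
            return ms.Length;
        }
    }

    // Number of bytes DCS writes for the same order.
    public static long XmlBytes(OrderSketch order)
    {
        var ser = new DataContractSerializer(typeof(OrderSketch));
        using (var ms = new MemoryStream())
        {
            ser.WriteObject(ms, order);
            return ms.Length;
        }
    }

    public static void Main()
    {
        OrderSketch order = MakeOrder();
        Console.WriteLine("JSON bytes: " + JsonBytes(order));
        Console.WriteLine("XML bytes:  " + XmlBytes(order));
    }
}
```

Both serializers write to a MemoryStream, so ms.Length is the UTF-8 byte count rather than a character count. The ratio between the two numbers depends heavily on member-name lengths and the number of child objects.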
Following are the test harness's four operating modes.
1. JSON (DCJS) Serialization with Parent (Order) Objects Only
Representative deserialized values appear below the column headers.
2. JSON (DCJS) Serialization with Parent and Child (Order_Detail) Objects
LINQ queries populate the 1:many Order_Details associations.
DateTime values serialize as "/Date(MillisecondsSince1970-01-01T00:00:00.000 +/- HHMM)/" strings, where HHMM is the offset from GMT in hours and minutes. (RFC 4627 doesn't specify a serialization format for date/time values, which has resulted in free-form encoding by various implementers. Stand-Alone JSON Serialization covers the current .NET implementation for ASP.NET AJAX services created in WCF.)
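The date format is easy to inspect in isolation. The following is a minimal sketch, assuming that DCJS accepts DateTime as a root type; the JsonDateSketch class and its method names are hypothetical and are not part of the test harness.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Json;
using System.Text;

public static class JsonDateSketch
{
    // Serializes a single DateTime with DCJS and returns the raw JSON text.
    public static string Serialize(DateTime value)
    {
        var ser = new DataContractJsonSerializer(typeof(DateTime));
        using (var ms = new MemoryStream())
        {
            ser.WriteObject(ms, value);
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    // Reads the JSON text back into a DateTime.
    public static DateTime Deserialize(string json)
    {
        var ser = new DataContractJsonSerializer(typeof(DateTime));
        using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            return (DateTime)ser.ReadObject(ms);
        }
    }

    public static void Main()
    {
        // A UTC value is written without an offset suffix.
        var utc = new DateTime(2007, 12, 30, 0, 0, 0, DateTimeKind.Utc);
        Console.WriteLine(Serialize(utc));

        // A local value carries the machine's +/-HHMM offset, so this
        // line of output differs from one time zone to another.
        var local = new DateTime(2007, 12, 30, 0, 0, 0, DateTimeKind.Local);
        Console.WriteLine(Serialize(local));
    }
}
```

In the raw JSON text the forward slashes appear escaped with backslashes, which is how a reader distinguishes a date from an ordinary string that happens to start with /Date(.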
3. XML (DCS) Serialization with Parent (Order) Objects Only
4. XML (DCS) Serialization with Parent and Child (Order_Detail) Objects
XML Infoset Wire Format for WCF Implementations: Parent (Order) Objects
WCF implementations serialize JSON streams to an "internal" XML Infoset format defined by the Mapping Between JSON and XML white paper. The test harness uses the JsonReaderWriterFactory's CreateJsonReader method to instantiate an XmlDictionaryReader object whose ReadOuterXml method delivers the wire format as a string.
The semi-formatted text of the Infoset follows the report of its length below the deserialized data:
Note: I could find no documentation for the JsonReaderWriterFactory's CreateJsonReader method or creating the required XmlDictionaryReader object; I believe that this project contains the only publicly available example. (Search Google for CreateJsonReader.) Here's an excerpt from the test harness code that's based in part on Rick Strahl's example:
Imports System.Collections.Generic
Imports System.IO
Imports System.Runtime.Serialization.Json
Imports System.Text
Imports System.Xml

' Serialize the generic List(Of Order) to a JSON string
Dim ser As DataContractJsonSerializer = _
    New DataContractJsonSerializer(GetType(List(Of Order)))
Dim ms1 As MemoryStream = New MemoryStream()
ser.WriteObject(ms1, lstOrders)
Dim json As String = Encoding.UTF8.GetString(ms1.ToArray())
' Character count; equals the UTF-8 byte count only for all-ASCII data
txtBytes.Text = json.Length().ToString()
ms1.Close()

Dim ms2 As MemoryStream = New MemoryStream(Encoding.UTF8.GetBytes(json))
' Deserialize into generic List
ser = New DataContractJsonSerializer(GetType(List(Of Order)))
Dim jsonOrders As List(Of Order) = _
    TryCast(ser.ReadObject(ms2), List(Of Order))

' Rewind the stream because ReadObject may have advanced it
ms2.Position = 0
' Convert the JSON MemoryStream to the WCF XML wire format
Dim xdrJson As XmlDictionaryReader = _
    JsonReaderWriterFactory.CreateJsonReader(ms2, _
        XmlDictionaryReaderQuotas.Max)
xdrJson.Read()
Dim xml As String = xdrJson.ReadOuterXml()
xdrJson.Close()
ms2.Close()
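For readers who prefer C#, the JSON-to-Infoset step alone can be sketched as follows. The JsonInfosetSketch class and the sample JSON string are hypothetical; the sketch uses the byte-array overload of CreateJsonReader rather than the stream overload shown in the VB excerpt.

```csharp
using System;
using System.Runtime.Serialization.Json;
using System.Text;
using System.Xml;

public static class JsonInfosetSketch
{
    // Returns the XML Infoset representation of a JSON document as a string.
    public static string JsonToInfosetXml(string json)
    {
        byte[] buffer = Encoding.UTF8.GetBytes(json);
        using (XmlDictionaryReader reader =
            JsonReaderWriterFactory.CreateJsonReader(buffer, XmlDictionaryReaderQuotas.Max))
        {
            reader.Read();                 // move to the root element
            return reader.ReadOuterXml();  // root element and all of its children
        }
    }

    public static void Main()
    {
        // Made-up two-member object standing in for a serialized Order.
        string json = "{\"OrderID\":11077,\"CustomerID\":\"RATTC\"}";
        Console.WriteLine(JsonToInfosetXml(json));
    }
}
```

Each element in the result carries a type attribute (object, number, string, and so on), which is where the extra bytes in the Infoset form come from.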
XML Infoset Wire Format for WCF Implementations: Parent and Child (Order_Detail) Objects
As Rick mentioned in a comment to his post:
@Roger - awesome work checking out the perf considerations. I suppose it's to be expected that JSON is slower since the parsing of the JSON objects is necessarily a bit more complex than parsing the more XML so I'm not terribly surprised that it's slower than XML.
OTOH, you're not likely to use this on two way WCF communications, but only for AJAX/REST scenarios coming from a browser most likely and in that scenario there's not much choice since JSON is so much easier to use on the client than XML.
Atom10FeedFormatter<TSyndicationFeed> and Atom10ItemFormatter<TSyndicationItem> generic objects (for classes derived from SyndicationFeed) are likely to be used for two-way WCF communications with ADO.NET data services. Magnus Mårtensson shows you in his Create your own Syndication Feeds with .NET Framework 3.5 post of November 22, 2007 how to use these two objects (and their RSS 2.0 equivalents) to create and serialize SyndicationFeeds. Guy Burstein offers a similar demonstration in his How To: Create a RSS Syndication Service with WCF post of December 3, 2007.
Added: 1/1/2008
Size of Serialized Messages and Serialize/Deserialize Execution Times
Following is a summary table of the data collected for the four operating modes. Message sizes don't include whitespace. Execution times are the average of five tests conducted by reopening the form for each execution (to eliminate the effects of data caching).
The preceding data was for a Gateway S-5200D with a dual-core Pentium 2.6 GHz processor running Windows Vista Premium as a virtual client with 1GB RAM assigned under Microsoft Virtual Server 2005 R2.
Download the VB 9.0 source code for the JsonSerializationWinVB.sln test harness project here and give it a try on your development machine.
Updated 12/31/2007: Minor edits and additions
Posted by Roger Jennings (--rj) at 5:01 PM
Labels: ADO.NET Data Services, Ajax, Astoria, DataContract, DataContractJsonSerializer, DataContractSerializer, LINQ, LINQ to Objects, Mocking Frameworks, Northwind, Test Harnesses, VS 2008
Tuesday, October 09, 2007
LINQ to SQL and Entity Framework XML Serialization Issues with WCF - Part 1
Setting LINQ to SQL's Serialization property value for entities to Unidirectional in the O/R Designer's properties window for the entity enables Windows Communication Foundation's default DataContractSerializer (DCS). The DCS implements the Shared Contract mode, which (in Aaron Skonnard's words) "shares the schema contract across the wire." An alternative NetDataContractSerializer (NDCS) shares .NET Framework types and implements Shared Type mode, which isn't interoperable. The third alternative is .NET 2.0 and earlier's XmlSerializer.
However, the DCS doesn't maintain full object-graph schema fidelity of the original entity's properties. You lose EntityRefs for m:1 associations if your object has EntitySets for 1:n associations that share a common property. In this case, you have a circular reference or cycle. The DCS doesn't serialize circular references by default, so these EntityRef properties lose their [DataMember] attribute on the server and the [System.Runtime.Serialization.DataMemberAttribute()] decoration on the client.
The DCS also adds an ExtensionData property of the ExtensionDataObject type to support round-tripping with new service versions that have added properties. The property stores any data from future versions of the data contract that is unknown to the current version.
LINQ to SQL automatically adds the EntityRef value and you usually can ignore the ExtensionData property. This article offers a link to a complex workaround to enable serializing EntityRefs for m:1 associations. Even with the workaround, the ServiceName.EntityName objects sent to and returned from service clients don't fully conform to the original DataContextName.EntityName objects, because you can't remove the ExtensionData property.
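The round-tripping role of ExtensionData can be seen in a small sketch that is independent of LINQ to SQL. The CustomerV1 and CustomerV2 classes, the ExtensionDataSketch helper, and the sample values are all hypothetical: V2 stands for a newer contract version with an added Phone member, and V1 for an older version that implements IExtensibleDataObject.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Older contract version: knows nothing about Phone, but keeps unknown data.
[DataContract(Name = "Customer", Namespace = "")]
public class CustomerV1 : IExtensibleDataObject
{
    [DataMember(Order = 1)] public string CustomerID { get; set; }
    public ExtensionDataObject ExtensionData { get; set; }
}

// Newer contract version with an added member.
[DataContract(Name = "Customer", Namespace = "")]
public class CustomerV2
{
    [DataMember(Order = 1)] public string CustomerID { get; set; }
    [DataMember(Order = 2)] public string Phone { get; set; }
}

public static class ExtensionDataSketch
{
    static string ToXml<T>(T value)
    {
        var ser = new DataContractSerializer(typeof(T));
        using (var ms = new MemoryStream())
        {
            ser.WriteObject(ms, value);
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    static T FromXml<T>(string xml)
    {
        var ser = new DataContractSerializer(typeof(T));
        using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            return (T)ser.ReadObject(ms);
        }
    }

    // Sends a V2 message through a V1 object and returns what V1 re-serializes.
    public static string RoundTripThroughV1(string phone)
    {
        string v2Xml = ToXml(new CustomerV2 { CustomerID = "ALFKI", Phone = phone });
        CustomerV1 v1 = FromXml<CustomerV1>(v2Xml); // Phone lands in ExtensionData
        return ToXml(v1);                           // Phone is written back out
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripThroughV1("030-0074321"));
    }
}
```

The V1 class never declares a Phone member, yet the re-serialized XML still contains the Phone element, because the serializer parks the unknown data in ExtensionData on the way in and replays it on the way out.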
The Entity Framework (EF) and Entity Data Model (EDM) support only binary serialization of entities and make no provision whatsoever for XML serialization of entity relationships. Julie Lerman addresses this issue in her XML Serializing Entity Framework entities with their children for SOA post of October 2, 2007. Julie proposes constructing a facade from the original entities and decorating the entity and member with [DataContract] and [DataMember] attributes to enable WCF serialization. It remains to be seen if WCF can handle cycles in facades; this is a more serious issue for the EF, because EntityRefs replace—rather than supplement—foreign-key values.
You must construct a similar facade to retain original entity values for updates to LINQ to SQL entities that require value-based concurrency management. Mike Taulty mentions this problem in his Disconnected LINQ to SQL post:
I had to write a function to copy a customer record so that I can maintain a "current" value and an "original" value (I didn't include this code as it's tedious - sometimes I find myself wishing that ICloneable was a bit more central in the .NET world.)
Adding a timestamp property to each entity eliminates the need to construct facades to hold original entity values. A timestamp property also lets you attach an entity to a DataContext with the Table<TEntity>.Attach(entity, true) overload, whose second argument specifies the entity as modified.
WCF's default configuration uses the WSHttpBinding binding to implement the WS-ReliableMessaging and WS-Security specifications for message reliability, security and authentication. HTTP is the transport; messages are encoded as Text/XML and secured by SOAP message security. The SOAP body is encrypted and digitally signed by default. Message security and authentication are relatively fragile in development environments, especially with Windows Vista and virtual machines that you access by the Remote Desktop service.
Message and transport security should be applied after you get your app running on the server with basicHttpBinding, which implements the Web Services Interoperability (WS-I) Organization's Basic Profile 1.1 Second Edition. You can change to WSHttpBinding by entries in the host and client configuration file(s): app.config, AppName.exe.config, or web.config.
Fighting the Attached Associated Objects Bug
There is a serious bug in LINQ to SQL Beta 2 that causes invocation of the Table<TEntity>.Attach() method or its Table<TEntity>.Attach(ModifiedEntity, OriginalEntity) overload to correctly attach the root object as unmodified but perform inserts on the persistence tables for all associated entities that are equally unmodified. The bug was originally reported by an anonymous developer as Feedback 295402 Incorrect Update Behavior - Attaching an Entity after Calling a Relationship Property of 8/27/2007, which was closed as [to be] Fixed on 9/5/2007. Here's what C# program manager Alex Turner says about the bug:
Thanks for reporting this issue you've encountered with Visual Studio 2008! We were able to reproduce your issue after all! It turns out that our Attach() logic was adding the single attached object (the Product) to the object cache, but not any related objects (the Category) that had not been in the DataContext's object graph before. When SubmitChanges() found these objects, it did not know about them and thus assumed they had been added since the last query (otherwise the DataContext would have been the one to have materialized them and they would be known). We've now added an intermediate state for such objects that are related to attached objects so that they are assumed to already exist in the database (just like the attached object itself), and they will not be inserted during SubmitChanges. If you later add more related objects after Attach, they will be considered new objects as normal and queued for insertion.
Obviously there is no way that any two-level parent-child UI, such as the test harness below, could be considered close to operable with this problem.
Note: What's interesting about this third "intermediate state" is that Hibernate has three object states: Transient, Persistent, and Detached (see the diagram in section 4.1.1, Transient objects, on page 140 (Chapter 4) of Hibernate in Action by Christian Bauer and Gavin King (Manning, 2005)). Hibernate has an evict() method to move objects from the Persistent (attached) state to the Detached state, whereas LINQ to SQL has no Table<TEntity>.Detach() method.
This is a client test harness for a self-hosted WCF service that has a LINQ to SQL data access layer (DAL) which retrieves the Customer, Order and two Order_Details entities from the persisting store. Changes have been made to the ContactName (Bogus -> Bogosian), EmployeeID (1 -> 2), and ProductID (1 -> 3 and 2 -> 4) property values:
Here's the result of invoking SubmitChanges() with the above edits. Instead of applying changes to the OrderID 11217 Order entity and its two Order_Details entities, attaching the root entity adds a new Order 11218 with two new Order_Details entities having the modified ProductID values:
Note: Rick Strahl reported the same bug in his Complex Detached Entities in LINQ to SQL - more Nightmares post of October 1, 2007 and suggested a workaround. I intend to test Rick's approach and will update this post with the result.
Incentives for Use of WCF for Serializing Business Objects
.NET Framework 3.0 added support for Windows Communication Foundation (WCF), together with Windows Presentation Foundation (WPF) and Windows Workflow Foundation (WF) to .NET Fx 2.0 and Windows Vista. .NET Fx 3.5 adds a few new features to these three technologies. WCF, formerly code-named Indigo, replaces conventional ASP.NET (.asmx) Web services, Web Services Extensions (WSE), .NET Remoting and Enterprise Services with a unified distributed communication technology.
WCF's Serialization Performance
WCF provides significant performance improvements over the technologies it replaces. So it's a good bet that all .NET developers who attempt to partition their object/relational mapping (O/RM) applications into multiple tiers will attempt to use WCF and XML serialization for messaging across process boundaries. WCF also enables more efficient (and thus performant) binary serialization for intranet applications that don't traverse firewalls.
The DataContractSerializer
WCF provides the DataContractSerializer (DCS) as an optional replacement for the traditional XmlSerializer of .NET Fx 2.0 and earlier. According to Microsoft UK's James World, DCS adds these features:
- Hooks are providing for refining control of (de)serialization - particularly useful for handling versioning issues. By applying any of four special attributes to methods in the target class you can have them called either before or after (de)serialization.
- The serializer is "opt-in" rather than "opt-out" - which makes (imho) for much cleaner code. In XmlSerializer you could use XmlIgnore to have the serializer ignore certain properties. With the DCS you explicitly mark what you want to serialize.
- Finally, ANY field or property can be serialized - even if they are marked private.
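The serialization hooks in the first bullet are the OnSerializing, OnSerialized, OnDeserializing, and OnDeserialized attributes from System.Runtime.Serialization. The following is a minimal sketch of how they fire; the AuditedItem class, its Log field, and the CallbackSketch helper are hypothetical names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

[DataContract(Namespace = "")]
public class AuditedItem
{
    [DataMember] public string Name { get; set; }

    // Not marked [DataMember], so it never goes on the wire (opt-in model).
    public List<string> Log = new List<string>();

    [OnSerializing]
    private void BeforeSerialize(StreamingContext context) { Log.Add("OnSerializing"); }

    [OnSerialized]
    private void AfterSerialize(StreamingContext context) { Log.Add("OnSerialized"); }

    // The DCS does not run constructors or field initializers when it
    // deserializes, so the Log list has to be created here.
    [OnDeserializing]
    private void BeforeDeserialize(StreamingContext context)
    {
        Log = new List<string>();
        Log.Add("OnDeserializing");
    }

    [OnDeserialized]
    private void AfterDeserialize(StreamingContext context) { Log.Add("OnDeserialized"); }
}

public static class CallbackSketch
{
    // Serializes the item and returns the deserialized copy.
    public static AuditedItem RoundTrip(AuditedItem item)
    {
        var ser = new DataContractSerializer(typeof(AuditedItem));
        using (var ms = new MemoryStream())
        {
            ser.WriteObject(ms, item);
            ms.Position = 0;
            return (AuditedItem)ser.ReadObject(ms);
        }
    }

    public static void Main()
    {
        var original = new AuditedItem { Name = "Chai" };
        AuditedItem copy = RoundTrip(original);
        Console.WriteLine(string.Join(", ", original.Log.ToArray()));
        Console.WriteLine(string.Join(", ", copy.Log.ToArray()));
    }
}
```

The callbacks are private methods, which also illustrates the last bullet: the serializer reaches non-public members when they are explicitly opted in.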
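The "four special attributes to methods" are the OnSerializing, OnSerialized, OnDeserializing, and OnDeserialized callback attributes. The DataContractSerializer also exposes the following properties: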
- MaxItemsInObjectGraph is a configurable property that "specifies the maximum number of items allowed in an object." The default is 65,536 (0x10000). Documentation says the units are bytes, which conflicts with the term Items.
- IgnoreExtensionDataObject is a configurable property that "gets or sets a value that specifies whether to send unknown serialization data onto the wire." The default value is false. Setting the value to true doesn't make the ExtensionData property disappear; it just prevents sending data, if any exists, to the client.
- PreserveObjectReferences is a non-configurable property that "gets a value that specifies whether to use non-standard XML constructs to preserve object reference data," specifically EntityRefs. The default value is false.
- DataContractSurrogate is a non-configurable property that's "designed to be used for type customization and substitution in situations where users want to change how a type is serialized, deserialized or projected into metadata."
You can set configurable property values in a configuration file (app.config, AppName.exe.config or web.config) or code in the WCF host application; setting non-configurable properties requires code.
Working Around the Cyclic Relationship Problem
If you must persist both m:1 and 1:n associations, you can patch your host code to set the PreserveObjectReferences property value to true, but doing so adds non-standard <id> and <idRef> elements to the message body. Sowmy Srinivasan provides a code example for Preserving Object Reference in WCF (March 26, 2006) that you can add to pass an instance of the DataContractSerializer with the PreserveObjectReferences property value set to true to the WCF runtime for both server and client components. Microsoft's WCF team wants to discourage users from XML-encoding cycles, so you must write a custom behavior instead of setting an attribute value for the PreserveObjectReferences property. (According to Aaron Skonnard, the same is true for substituting the NDCS for the DCS.)
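The effect of PreserveObjectReferences can be tried outside WCF with a stand-alone serializer instance. This sketch does not include the custom behavior that Sowmy's post supplies. The ParentNode and ChildNode classes and the CycleSketch helper are hypothetical, and the DataContractSerializerSettings class assumes .NET Framework 4.5 or later; on .NET 3.5 the six-argument DataContractSerializer constructor with preserveObjectReferences set to true plays the same role.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical stand-ins for an entity with a 1:n EntitySet and an m:1 EntityRef.
[DataContract(Namespace = "")]
public class ParentNode
{
    [DataMember] public int Id { get; set; }
    [DataMember] public List<ChildNode> Children { get; set; }
}

[DataContract(Namespace = "")]
public class ChildNode
{
    [DataMember] public int Id { get; set; }
    [DataMember] public ParentNode Parent { get; set; } // back-reference closes the cycle
}

public static class CycleSketch
{
    public static ParentNode BuildGraph()
    {
        var parent = new ParentNode { Id = 1, Children = new List<ChildNode>() };
        parent.Children.Add(new ChildNode { Id = 10, Parent = parent });
        return parent;
    }

    // True if the default serializer refuses to write the cyclic graph.
    public static bool DefaultSerializerRejectsCycle()
    {
        try
        {
            var ser = new DataContractSerializer(typeof(ParentNode));
            using (var ms = new MemoryStream())
            {
                ser.WriteObject(ms, BuildGraph());
            }
            return false;
        }
        catch (SerializationException)
        {
            return true;
        }
    }

    // Writes and re-reads the graph with reference preservation switched on.
    public static ParentNode RoundTripPreservingReferences()
    {
        var settings = new DataContractSerializerSettings { PreserveObjectReferences = true };
        var ser = new DataContractSerializer(typeof(ParentNode), settings);
        using (var ms = new MemoryStream())
        {
            ser.WriteObject(ms, BuildGraph());
            ms.Position = 0;
            return (ParentNode)ser.ReadObject(ms);
        }
    }

    public static void Main()
    {
        Console.WriteLine("Default serializer rejects cycle: " + DefaultSerializerRejectsCycle());
        ParentNode copy = RoundTripPreservingReferences();
        Console.WriteLine("Back-reference restored: " +
            ReferenceEquals(copy, copy.Children[0].Parent));
    }
}
```

The deserialized child points back to the same parent instance rather than to a copy, which is the behavior an EntityRef needs. The cost is the non-standard reference markup in the XML, which is why the setting hurts interoperability.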
Stay tuned for Part 2, which will cover the code required to add, update and delete a top-level entity that might work as expected in the Visual Studio 2008 RTM version.
Updated 8/10/2007: Minor additions and clarifications.
Posted by Roger Jennings (--rj) at 4:42 PM
Labels: DataContract, DataContractSerializer, Entity Data Model, Entity Framework, Entity Framework Beta 2, LINQ, LINQ to SQL