Storing .NET objects in cookies part 1 – compact serialization with binary XML

I recently faced the following challenge in an ASP.NET application. I have a data contract object that describes a user, and I have to store it in a cookie for authentication and other purposes. Forms authentication is perfect for this: the cookie is cryptographically protected, the authentication is very easy to configure, and the ticket can store arbitrary user data as well. The problem, as you might guess, is that this cookie would be quite big. You might ask, why don’t I store this object in the session? Because I wanted a session-less application. Or why don’t I store only a user ID in the cookie? Because gathering the user information is quite an expensive task. So the challenge was to serialize this object in the most compact way.

The object

Let’s see what the object looks like:

public class User
{
    public long Id { get; set; }
    public string LoginName { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public string PhoneNumber { get; set; }
    public int RoleFlags { get; set; }
    public string OrganizationName { get; set; }
    public long OrganizationId { get; set; }
}

var user = new User
{
    Id = 1234567,
    LoginName = "john.leader",
    FirstName = "John",
    LastName = "Leader",
    Email = "",
    PhoneNumber = "+1234567890",
    RoleFlags = 0x24,
    OrganizationName = "New York Department",
    OrganizationId = 67896789
};

Naive method – DataContractSerializer

Here’s how you serialize this thing normally:

var dcs = new DataContractSerializer(typeof(User));
var sw = new StringWriter();
using (var xw = new XmlTextWriter(sw))
    dcs.WriteObject(xw, user);
var s = sw.ToString();

This gives a length of 431 characters (all ASCII, so also 431 bytes). Not much for a cookie, but in a forms authentication ticket, it’s about 5 times as much (more on that in part 2), which can be a problem.

Binary serialization

My second idea was to mark the object as [Serializable] and use the BinaryFormatter, like this:

var bw = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
var ms = new MemoryStream();
bw.Serialize(ms, user);
var s = Convert.ToBase64String(ms.ToArray());

Sadly, this gives a binary length of 512, which is 684 characters in Base64. There’s still a lot of metadata in it.

Alternative methods

The problem with the above methods is that they store the names of properties along with the raw data. I could have developed a custom serialization method with a BinaryWriter, but that’s hard to maintain, so I gave Protocol Buffers a try. The result: binary length 100, Base64 length 136. That’s quite an improvement. This is because it outputs very short tag identifiers instead of property names, so the output’s size is very close to the raw size of the data, and also to what I could have achieved with a BinaryWriter. But I decided against it. I didn’t want to add another dependency to the project just for this one usage, and I didn’t want to change our MDG generated data classes to make them suitable for this purpose. And of course, I was looking for a challenge :) .

Binary XML serialization

Then I found out about binary XML. Here’s the first try:

var dcs = new DataContractSerializer(typeof(User));
var ms = new MemoryStream();
using (var xdw = XmlDictionaryWriter.CreateBinaryWriter(ms, new XmlDictionary()))
    dcs.WriteObject(xdw, user);
var s = Convert.ToBase64String(ms.ToArray());

Binary length 308, Base64 length 412. Not much of an improvement over the text version (431), but the cool thing is that this thing can learn: when it sees some data for the 2nd time, it only writes a short reference. Write the object twice, and it would be quite small, right? Wrong, because the XmlDictionary implementation can’t learn automatically. Even sadder is the fact that even if you teach it manually (that is, you add the names of your properties to it), it won’t make a difference, because it only recognizes the exact same XmlDictionaryString instances that you added to it, and not the ones that it gets from the serializer.

So here’s my great idea: make an XML dictionary that can learn automatically. Whenever the serializer looks up a string, and it’s not found in the dictionary, it is added automatically. Here is the source code: LearningXmlDictionary.cs. Here’s how to use it:

var dcs = new DataContractSerializer(typeof(User));
var lxd = new LearningXmlDictionary();
lxd.LearningMode = true;
using (var xdw = XmlDictionaryWriter.CreateBinaryWriter(Stream.Null, lxd))
    dcs.WriteObject(xdw, new User());
lxd.LearningMode = false;
var ms = new MemoryStream();
using (var xdw = XmlDictionaryWriter.CreateBinaryWriter(ms, lxd))
    dcs.WriteObject(xdw, user);
var s = Convert.ToBase64String(ms.ToArray());

First, you put the dictionary in learning mode and serialize an empty sample object (or several of them). Then you start serializing your real object just like above. The result: binary length 128, Base64 length 172! Sweet :) . It is pretty close to what Protocol Buffers can do, but it only needs core .NET stuff and a little trick.
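To give an idea of what such a dictionary involves, here is a minimal sketch of a learning IXmlDictionary. This is not the linked LearningXmlDictionary.cs; the class name SketchLearningDictionary and its internals are my assumptions for illustration. Only the IXmlDictionary interface and the XmlDictionaryString constructor come from the framework. The key point is that the lookup compares strings by value, so it doesn’t matter which XmlDictionaryString instance the serializer hands over.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical sketch, not the author's LearningXmlDictionary.cs.
public sealed class SketchLearningDictionary : IXmlDictionary
{
    // Known strings, indexed by value and by key (key = position in the list).
    private readonly Dictionary<string, XmlDictionaryString> byValue =
        new Dictionary<string, XmlDictionaryString>();
    private readonly List<XmlDictionaryString> byKey =
        new List<XmlDictionaryString>();

    // While true, unknown strings are added instead of being rejected.
    public bool LearningMode { get; set; }

    public bool TryLookup(int key, out XmlDictionaryString result)
    {
        if (key >= 0 && key < byKey.Count)
        {
            result = byKey[key];
            return true;
        }
        result = null;
        return false;
    }

    public bool TryLookup(string value, out XmlDictionaryString result)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        if (byValue.TryGetValue(value, out result))
            return true;
        if (!LearningMode)
            return false; // result is already null here

        // Learn the new string: the next free index becomes its key.
        result = new XmlDictionaryString(this, value, byKey.Count);
        byValue.Add(value, result);
        byKey.Add(result);
        return true;
    }

    public bool TryLookup(XmlDictionaryString value, out XmlDictionaryString result)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        // Match on the text, not on the instance, so strings coming from
        // the serializer's own dictionary are recognized too.
        return TryLookup(value.Value, out result);
    }
}
```

Because keys are handed out in the order the strings are first seen, two instances only agree on the keys if they were taught with the same sample objects in the same order.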

You need to be very careful, though. When two learning XML dictionaries (say, the one that wrote a ticket and the one that reads it) don’t have the exact same knowledge, all hell breaks loose: a login name can end up in the phone number property, for example, and there is no built-in validation against that. So if you issue a ticket, upgrade your application so that the data class changes a little, and a user returns with an old ticket, then you have a problem. First, make sure that the dictionary is always taught the exact same way, and don’t leave it in learning mode after that. Second, if the class changes, make sure that you create a new encryption key for the forms ticket (this is the default, by the way) or add some version information to your class as a new property (changed metadata, like the namespace, doesn’t work, since it doesn’t appear in the serialized data).
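The reading side is where a mismatched dictionary bites, so here is a rough sketch of a round trip. The helper class CompactSerializer and its method names are hypothetical; the framework calls (XmlDictionaryWriter.CreateBinaryWriter, XmlDictionaryReader.CreateBinaryReader, DataContractSerializer.ReadObject) are the real ones. It takes any IXmlDictionary, so you would pass in a learning dictionary that was taught the same way on both sides.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

// Hypothetical helper, shown only to illustrate the round trip.
public static class CompactSerializer
{
    public static string Serialize<T>(T value, IXmlDictionary dictionary)
    {
        var dcs = new DataContractSerializer(typeof(T));
        var ms = new MemoryStream();
        using (var writer = XmlDictionaryWriter.CreateBinaryWriter(ms, dictionary))
            dcs.WriteObject(writer, value);
        // ToArray still works after the writer has closed the stream.
        return Convert.ToBase64String(ms.ToArray());
    }

    public static T Deserialize<T>(string base64, IXmlDictionary dictionary)
    {
        var bytes = Convert.FromBase64String(base64);
        var dcs = new DataContractSerializer(typeof(T));
        // The reader resolves the short references through the dictionary,
        // so it must know the same strings under the same keys as the writer's.
        using (var reader = XmlDictionaryReader.CreateBinaryReader(
            bytes, 0, bytes.Length, dictionary, XmlDictionaryReaderQuotas.Max))
        {
            return (T)dcs.ReadObject(reader);
        }
    }
}
```

Usage would look something like var s = CompactSerializer.Serialize(user, lxd); followed later by var copy = CompactSerializer.Deserialize<User>(s, lxd); where lxd is the already taught dictionary, no longer in learning mode.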

Here is a chart to sum it up:

[Chart: Cookie serialization 1]

You will see even better improvements in part 2.