Serialization -
The process of converting an object into a stream of bytes. This stream of bytes can be persisted. Deserialization is an opposite process, which involves
converting a stream of bytes into an object.
Serialization is used usually during remoting (while transporting objects) and to persist file objecst & database objects.
.NET provides 2 ways for serializtion
1) XmlSerializer and 2) BinaryFormatter/SoapFormatter
The XmlSerializer is used for Web Services. The BinaryFormatter & SoapFormatter is used for Remoting. While using
XmlSerializer, it is required that the target class has parameter less
constructors, has public read-write properties and has fields that can be serialized. The XmlSerializer has good support for
XML documents. It can be used to construct objects from existing XML documents. The XmlSerializer enables us to serialize and deserialize objects to an XML format.
SoapFormatter enables us to serialize & deserialize objects to SOAP format. They can serialize private and public fields of a class. The target class must be marked with the Serializable attribute. On deserialization, the constructor of the
new object is not invoked.
The
BinaryFormatter has the same features as the SoapFormatter except that it formats data into binary format. The BinaryForamatter (and the SoapFormatter) has two main methods. Serialize and Deserialize. To serialize an object, we pass an instance of the stream and the object to the Serialize method. To Deserialize an object, you pass an instance of a stream to the Deserialize method.
You can use the BinaryFormatter to serialize many, but not all, classes in the .
NET Framework. For example, you can serialize ArrayLists, DataSets, and Arrays but not other objects, such as DataReaders or
TextBox controls. To serialize a class, the class must have the Serializable attribute or implement the ISerializable interface.
Note that the XmlSerializer captures only the public members of the class, whereas the BinaryFormatter & the SoapFormatter captures both the public & private members of the class. The output using the BinaryFormatter is quite compact, as the information is in binary format, whereas the XmlSerializer format is filled with
XML tags. See example below...
Imports
System.IO
Imports System.Runtime.Serialization.Formatters.Binary
Dim colArrayList As ArrayListDim
objFileStream As FileStreamDim
objBinaryFormatter As BinaryFormattercolArrayList = New ArrayList()
colArrayList.Add( "Whisky")
colArrayList.Add( "Vodka")
colArrayList.Add( "Brandy")
objFileStream = New FileStream(MapPath("C:\myArrayList.data"), FileMode.Create)
objBinaryFormatter = New BinaryFormatterobjBinaryFormatter.Serialize(objFileStream, colArrayList)objFileStream.Close()
Here we see that an instance of the file stream (objFileStream) and an instance of the object (colArrayList) is passed to the Serialize method of the BinaryFormatter object (objBinaryFormatter). We also end up creating a file by the name myArrayList.data on our hard-drive.In order to deserialize an object, see the code below…
Dim colArrayList As ArrayListDim objFileStream As FileStreamDim
objBinaryFormatter As BinaryFormatterDim strItem As String
objFileStream = New FileStream( MapPath("myArrayList.data"), FileMode.Open )
objBinaryFormatter = New BinaryFormatter
colArrayList = CType( objBinaryFormatter.Deserialize( objFileStream ), ArrayList )
objFileStream.Close()
For Each strItem In colArrayList
Response.Write( " " & strItem )
Next
Here, CType takes in two parameters, the first parameter is the serialized object in the file stream format, and the second parameter is the desired type. Finally, the page iterates through all the elements of the ArrayList and displays the value of each element.
XmlSerializer does not serialize instances of classes like Hashtable which implement the IDictionary interface.
Serializable - This is a class attribute. When we use this attribute with a class, an instance of this class can be taken in whatever state it is, and write it to a disk. The class can then be deserialized, and the class will act as if it is simply stored in the memory.
Summary: Why would you want to use serialization? The two most important reasons are to persist the state of an object to a storage medium so an exact copy can be recreated at a later stage, and to send the object by value from one application domain to another. For example, serialization is used to save session state in ASP.NET and to copy objects to the clipboard in Windows Forms. It is also used by remoting to pass objects by value from one application domain to another. This article provides an overview of the serialization used in the Microsoft .NET Framework. (9 printed pages)
Introduction
Persistent Storage
Marshal By Value
Basic Serialization
Selective Serialization
Custom Serialization
Steps in the Serialization Process
Versioning
Serialization Guidelines
Introduction
Serialization can be defined as the process of storing the state of an object instance to a storage medium.
During this process, the public and private fields of the object and the name of the class, including the assembly containing the class, is converted to a stream of bytes, which is then written to a data stream. When the object is subsequently deserialized, an exact clone of the original object is created.
When implementing a serialization mechanism in an object-oriented environment, you have to make a number of tradeoffs between ease of use and flexibility. The process can be automated to a large extent, provided you are given sufficient control over the process. For example, situations may arise where simple binary serialization is not sufficient, or there might be a specific reason to decide which fields in a class need to be serialized. The following sections examine the robust serialization mechanism provided with the .NET Framework and highlight a number of important features that allow you to customize the process to meet your needs.
Persistent Storage
It is often necessary to store the value of fields of an object to disk and then retrieve this data at a later stage. Although this is easy to achieve without relying on serialization, this approach is often cumbersome and error prone, and becomes progressively more complex when you need to track a hierarchy of objects. Imagine writing a large business application containing many thousands of objects and having to write code to save and restore the fields and properties to and from disk for each object. Serialization provides a convenient mechanism for achieving this objective with minimal effort.
The Common Language Runtime (CLR) manages how objects are laid out in memory and the .NET Framework provides an automated serialization mechanism by using reflection. When an object is serialized, the name of the class, the assembly, and all the data members of the class instance are written to storage. Objects often store references to other instances in member variables. When the class is serialized, the serialization engine keeps track of all referenced objects already serialized to ensure that the same object is not serialized more than once. The serialization architecture provided with the .NET Framework correctly handles object graphs and circular references automatically. The only requirement placed on object graphs is that all objects referenced by the object that is being serialized must also be marked as
Serializable (see
Basic Serialization). If this is not done, an exception will be thrown when the serializer attempts to serialize the unmarked object.
When the serialized class is deserialized, the class is recreated and the values of all the data members are automatically restored.
Marshal By Value
Objects are only valid in the application domain where they are created. Any attempt to pass the object as a parameter or return it as a result will fail unless the object derives from
MarshalByRefObject or is marked as
Serializable. If the object is marked as
Serializable, the object will automatically be serialized, transported from the one application domain to the other, and then deserialized to produce an exact copy of the object in the second application domain. This process is typically referred to as marshal by value.
When an object derives from
MarshalByRefObject, an object reference will be passed from one application domain to another, rather than the object itself. You can also mark an object that derives from
MarshalByRefObject as
Serializable. When this object is used with remoting, the formatter responsible for serialization, which has been preconfigured with a
SurrogateSelector takes control of the serialization process and replaces all objects derived from
MarshalByRefObject with a proxy. Without the
SurrogateSelector in place, the serialization architecture follows the standard serialization rules (see
Steps in the Serialization Process) below.
Basic Serialization
The easiest way to make a class serializable is to mark it with the
Serializable attribute as follows:
[Serializable]
public class MyObject {
public int n1 = 0;
public int n2 = 0;
public String str = null;
}
The code snippet below shows how an instance of this class can be serialized to a file:
MyObject obj = new MyObject();
obj.n1 = 1;
obj.n2 = 24;
obj.str = "Some String";
IFormatter formatter = new BinaryFormatter();
Stream stream = new FileStream("MyFile.bin",
FileMode.Create,
FileAccess.Write, FileShare.None);
formatter.Serialize(stream, obj);
stream.Close();
This example uses a binary formatter to do the serialization. All you need to do is create an instance of the stream and the formatter you intend to use, and then call the
Serialize method on the formatter. The stream and the object instance to serialize are provided as parameters to this call. Although this is not explicitly demonstrated in this example, all member variables of a class will be serialized, even variables marked as private. In this aspect, binary serialization differs from the XML Serializer, which only serializes public fields.
Restoring the object back to its former state is just as easy. First, create a formatter and a stream for reading, and then instruct the formatter to deserialize the object. The code snippet below shows how this is done.
IFormatter formatter = new BinaryFormatter();
Stream stream = new FileStream("MyFile.bin",
FileMode.Open,
FileAccess.Read,
FileShare.Read);
MyObject obj = (MyObject) formatter.Deserialize(fromStream);
stream.Close();
// Here's the proof
Console.WriteLine("n1: {0}", obj.n1);
Console.WriteLine("n2: {0}", obj.n2);
Console.WriteLine("str: {0}", obj.str);
The
BinaryFormatter used above is very efficient and produces a very compact byte stream. All objects serialized with this formatter can also be deserialized with it, which makes it an ideal tool for serializing objects that will be deserialized on the .NET platform. It is important to note that constructors are not called when an object is deserialized. However, this violates some of the usual contracts the run time makes with the object writer, and developers should ensure they understand the ramifications when marking an object as serializable.
If portability is a requirement, use the
SoapFormatter instead. Simply replace the formatter in the code above with
SoapFormatter, and call
Serialize and
Deserialize as before. This formatter produces the following output for the example used above.
<SOAP-ENV:Envelope
xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:SOAP- ENC=http://schemas.xmlsoap.org/soap/encoding/
xmlns:SOAP- ENV=http://schemas.xmlsoap.org/soap/envelope/
SOAP-ENV:encodingStyle=
"http://schemas.microsoft.com/soap/encoding/clr/1.0
http://schemas.xmlsoap.org/soap/encoding/"
xmlns:a1="http://schemas.microsoft.com/clr/assem/ToFile">
<SOAP-ENV:Body>
<a1:MyObject id="ref-1">
<n1>1</n1>
<n2>24</n2>
<str id="ref-3">Some String</str>
</a1:MyObject>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
It is important to note that the
Serializable attribute cannot be inherited. If we derive a new class from
MyObject, the new class must be marked with the attribute as well, or it cannot be serialized. For example, when you attempt to serialize an instance of the class below, you will get a
SerializationException informing you that the
MyStuff type is not marked as serializable.
public class MyStuff : MyObject
{
public int n3;
}
Using the serialization attribute is convenient, but it has limitations as demonstrated above. Refer to the guidelines (see
Serialization Guidelines below) regarding when to mark a class for serialization, since serialization cannot be added to a class after it has been compiled.
Selective Serialization
A class often contains fields that should not be serialized. For example, assume a class stores a thread ID in a member variable. When the class is deserialized, the thread stored for the ID when the class was serialized might not be running anymore, so serializing this value does not make sense. You can prevent member variables from being serialized by marking them with the
NonSerialized attribute as follows:
[Serializable]
public class MyObject
{
public int n1;
[NonSerialized] public int n2;
public String str;
}
Custom Serialization
You can customize the serialization process by implementing the
ISerializable interface on an object. This is particularly useful in cases where the value of a member variable is invalid after deserialization, but you need to provide the variable with a value in order to reconstruct the full state of the object. Implementing
ISerializable involves implementing the
GetObjectData method and a special constructor that will be used when the object is deserialized. The sample code below shows how to implement
ISerializable on the
MyObject class from a previous section.
[Serializable]
public class MyObject : ISerializable
{
public int n1;
public int n2;
public String str;
public MyObject()
{
}
protected MyObject(SerializationInfo info, StreamingContext context)
{
n1 = info.GetInt32("i");
n2 = info.GetInt32("j");
str = info.GetString("k");
}
public virtual void GetObjectData(SerializationInfo info,
StreamingContext context)
{
info.AddValue("i", n1);
info.AddValue("j", n2);
info.AddValue("k", str);
}
}
When
GetObjectData is called during serialization, you are responsible for populating the
SerializationInfo object provided with the method call. Simply add the variables to be serialized as name/value pairs. Any text can be used as the name. You have the freedom to decide which member variables are added to the
SerializationInfo, provided that sufficient data is serialized to restore the object during deserialization. Derived classes should call the
GetObjectData method on the base object if the latter implements
ISerializable.
It is important to stress that you need to implement both
GetObjectData as well as the special constructor when
ISerializable is added to a class. The compiler will warn you if
GetObjectData is missing, but since it is impossible to enforce the implementation of a constructor, no warnings will be given if the constructor is absent and an exception will be thrown when an attempt is made to deserialize a class without the constructor. The current design was favored above a
SetObjectData method to get around potential security and versioning problems. For example, a
SetObjectData method must be public if it is defined as part of an interface, thus users have to write code to defend against having the
SetObjectData method called multiple times. One can imagine the headaches that can potentially be caused by a malicious application that calls the
SetObjectData method on an object that was in the process of executing some operation.
During deserialization, the
SerializationInfo is passed to the class using the constructor provided for this purpose. Any visibility constraints placed on the constructor are ignored when the object is deserialized, so you can mark the class as public, protected, internal, or private. It is a good idea to make the constructor protected unless the class is sealed, in which case the constructor should be marked private. To restore the state of the object, simply retrieve the values of the variables from the
SerializationInfo using the names used during serialization. If the base class implements
ISerializable, the base constructor should be called to allow the base object to restore its variables.
When you derive a new class from one that implements
ISerializable, the derived class must implement both the constructor as well as the
GetObjectData method if it has any variables that need to be serialized. The code snippet below shows how this is done using the
MyObject class shown previously.
[Serializable]
public class ObjectTwo : MyObject
{
public int num;
public ObjectTwo() : base()
{
}
protected ObjectTwo(SerializationInfo si, StreamingContext context) :
base(si,context)
{
num = si.GetInt32("num");
}
public override void GetObjectData(SerializationInfo si,
StreamingContext context)
{
base.GetObjectData(si,context);
si.AddValue("num", num);
}
}
Do not forget to call the base class in the deserialization constructor; if this is not done, the constructor on the base class will never be called and the object will not be fully constructed after deserialization.
Objects are reconstructed from the inside out, and calling methods during deserialization can have undesirable side effects, since the methods called might refer to object references that have not been deserialized by the time the call is made. If the class being deserialized implements the
IDeserializationCallback, the
OnSerialization method will automatically be called when the entire object graph has been deserialized. At this point, all the child objects referenced have been fully restored. A hash table is a typical example of a class that is difficult to deserialize without using the event listener described above. It is easy to retrieve the key/value pairs during deserialization, but adding these objects back to the hash table can cause problems since there is no guarantee that classes that derived from the hash table have been deserialized. Calling methods on a hash table at this stage is therefore not advisable.
Steps in the Serialization Process
When the
Serialize method is called on a formatter, object serialization proceeds according to the following rules:
- A check is made to determine if the formatter has a surrogate selector. If it does, check if the surrogate selector handles objects of the given type. If the selector handles the object type, ISerializable.GetObjectData is called on the surrogate selector.
- If there is no surrogate selector or if it does not handle the type, a check is made to determine if the object is marked with the Serializable attribute. If it is not, a SerializationException is thrown.
- If it is marked appropriately, check if the object implements ISerializable. If it does, GetObjectData is called on the object.
- If it does not implement ISerializable, the default serialization policy is used, serializing all fields not marked as NonSerialized.
Versioning
The .NET Framework provides support for versioning and side-by-side execution, and all classes will work across versions if the interfaces of the classes remain the same. Since serializations deals with member variables and not interfaces, be cautious when adding or removing member variables to classes that will be serialized across versions. This is especially true for classes that do not implement
ISerializable. Any change of state of the current version, such as the addition of member variables, changing the types of variables, or changing their names, will mean that existing objects of the same type cannot be successfully deserialized if they were serialized with a previous version.
If the state of an object needs to change between versions, class authors have two choices:
- Implement ISerializable. This allows you to take precise control of the serialization and deserialization process, allowing future state to be added and interpreted correctly during deserialization.
- Mark nonessential member variables with the NonSerialized attribute. This option should only be used when you expect minor changes between different versions of a class. For example, when a new variable has been added to a later version of a class, the variable can be marked as NonSerialized to ensure the class remains compatible with previous versions.
Serialization Guidelines
You should consider serialization when designing new classes since a class cannot be made serializable after it has been compiled. Some questions to ask are: Do I have to send this class across application domains? Will this class ever be used with remoting? What will my users do with this class? Maybe they derive a new class from mine that needs to be serialized. When in doubt, mark the class as serializable. It is probably better to mark all classes as serializable unless:
- They will never cross an application domain. If serialization is not required and the class needs to cross an application domain, derive the class from MarshalByRefObject.
- The class stores special pointers that are only applicable to the current instance of the class. If a class contains unmanaged memory or file handles, for example, ensure these fields are marked as NonSerialized or don't serialize the class at all.
- Some of the data members contain sensitive information. In this case, it will probably be advisable to implement ISerializable and serialize only the required fields.