Dec 29, 2015

Serialization: Part-3 | How to make a class Serializable

How to Make a Class Serializable

So far, we've focused on the mechanics of serializing an object. We've assumed we have a serializable object and discussed, from the point of view of client code, how to serialize it. The next step is to discuss how to make a class serializable.


There are four basic things you must do when you are making a class serializable. They are:
  1. Implement the Serializable interface.
  2. Make sure that instance-level, locally defined state is serialized properly.
  3. Make sure that superclass state is serialized properly.
  4. Override equals( ) and hashCode( ) if necessary.
Let's look at each of these steps in more detail.

Implement the Serializable Interface

This is by far the easiest of the steps. The Serializable interface is an empty interface; it declares no methods at all. So implementing it amounts to adding "implements Serializable" to your class declaration.
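A minimal sketch of what the marker interface does in practice. The classes Unmarked and Marked and the helper MarkerCheck.canSerialize( ) are hypothetical names used only for illustration. The two classes differ only in their implements clause, and only Marked can be written to an ObjectOutputStream; writing Unmarked fails with a java.io.NotSerializableException.

```java
import java.io.*;

// Hypothetical classes: identical except for the marker interface.
class Unmarked {
    int value = 42;
}

class Marked implements Serializable {   // no methods to implement
    int value = 42;
}

class MarkerCheck {
    // Returns true if the object can be written to an ObjectOutputStream.
    static boolean canSerialize(Object candidate) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false;   // thrown for classes that don't implement Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```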
Reasonable people may wonder about the utility of an empty interface. Rather than define an empty interface and require class definitions to implement it, why not simply make every object serializable? The main reason not to do this is that some classes don't have an obvious serialization. Consider an instance of File, which represents a file. Suppose, for example, it was created using the following line of code:
File file = new File("c:\\temp\\foo");
It's not at all clear what should be written out when this is serialized. The problem is that the file itself has a different lifecycle than the serialized data. The file might be edited, or deleted entirely, while the serialized information remains unchanged. Or the serialized information might be used to restart the application on another machine, where "c:\temp\foo" is the name of an entirely different file.
Another example is provided by the Thread class. (If you don't know much about threads, just wait a few chapters and then revisit this example. It will make more sense then.) A thread represents a flow of execution within a particular JVM. To serialize one, you would have to store not only its stack and all its local variables, but also all the related locks and threads, and then restart all the threads properly when the instance is deserialized.
TIP:   Things get worse when you consider platform dependencies. In general, any class that involves native code is not really a good candidate for serialization.

Make Sure That Instance-Level, Locally Defined State Is Serialized Properly

Class definitions contain variable declarations. The instance-level, locally defined variables (i.e., the nonstatic variables) are the ones that contain the state of a particular instance. For example, in our Money class, we declared one such field:
public class Money extends ValueObject {
private int _cents;
....
}
The serialization mechanism has a nice default behavior -- if all the instance-level, locally defined variables have values that are either serializable objects or primitive datatypes, then the serialization mechanism will work without any further effort on our part. For example, our implementations of Account, such as Account_Impl, would present no problems for the default serialization mechanism:
public class Account_Impl extends UnicastRemoteObject implements Account {
private Money _balance;
...
}
While _balance doesn't have a primitive type, it does refer to an instance of Money, which is a serializable class.
If, however, some of the fields don't have primitive types and don't refer to serializable classes, more work may be necessary. Consider, for example, the implementation of ArrayList from the java.util package. An ArrayList really has only two pieces of state:
public class ArrayList extends AbstractList implements List, Cloneable, java.io.Serializable {
private Object elementData[];
private int size;
...
}
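But hidden in here is a subtle problem: ArrayList is a generic container class whose state is stored in an array of objects, and that array is usually larger than the number of elements the list actually holds. Arrays are serializable in Java (every array type implements java.io.Serializable), but default serialization would write out the entire backing array, unused slots included. This means that ArrayList can't just implement the Serializable interface and rely on the default mechanism. It has to provide extra information to help the serialization mechanism handle elementData. More generally, there are three basic solutions when a field can't, or shouldn't, be serialized by default: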
  • Fields can be declared to be transient.
  • The writeObject( ) and readObject( ) methods can be implemented.
  • serialPersistentFields can be declared.

Declaring transient fields

The first, and easiest, thing you can do is simply mark some fields using the transient keyword. In ArrayList, for example, elementData really is declared to be a transient field:
public class ArrayList extends AbstractList implements List, Cloneable, java.io.Serializable {
private transient Object elementData[];
private int size;
...
}
This tells the default serialization mechanism to ignore the variable. In other words, the serialization mechanism simply skips over transient variables. In the case of ArrayList, the default serialization mechanism would write out size but ignore elementData entirely.
This can be useful in two, usually distinct, situations:
The variable isn't serializable
If the variable isn't serializable, then the serialization mechanism will throw an exception (a java.io.NotSerializableException) when it tries to serialize the variable. To avoid this, you can declare the variable to be transient.
The variable is redundant
Suppose that the instance caches the result of a computation. Locally, we might want to store the result of the computation, in order to save some processor time. But when we send the object over the wire, we might worry more about consuming bandwidth and thus discard the cached computation since we can always regenerate it later on.
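Both situations can be sketched with a hypothetical Report class (the class and its methods are illustrative, not from the original text). Its Thread field isn't serializable and its cached summary is redundant, so both are transient. After a round trip, the Thread field is null, because field initializers aren't rerun when a Serializable class is deserialized, and the summary is simply recomputed on first use. Removing transient from the Thread field would make serialization fail with a NotSerializableException.

```java
import java.io.*;

// Hypothetical class illustrating both uses of transient.
class Report implements Serializable {
    private final String title;

    // Not serializable: java.lang.Thread doesn't implement Serializable.
    private transient Thread worker = new Thread(() -> { });

    // Redundant: can always be recomputed from title.
    private transient String cachedSummary;

    Report(String title) {
        this.title = title;
    }

    String summary() {
        if (cachedSummary == null) {
            cachedSummary = "Report: " + title;   // recomputed lazily after deserialization
        }
        return cachedSummary;
    }

    boolean hasWorker() {
        return worker != null;
    }

    // Serializes and deserializes a copy in memory; checked exceptions are wrapped.
    static Report copy(Report original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Report) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```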

Implementing writeObject( ) and readObject( )

Suppose a field has been declared transient, either because it takes values that aren't serializable or, as with elementData in ArrayList, because the default mechanism wouldn't write it out the way we want. If the field is still an important part of the state of our instance, declaring it transient isn't good enough on its own. We need to save and restore the state stored in the variable. This is done by implementing a pair of methods with the following signatures:
private void writeObject(java.io.ObjectOutputStream out) throws IOException;
private void readObject(java.io.ObjectInputStream in) throws IOException, ClassNotFoundException;
When the serialization mechanism starts to write out an object, it will check to see whether the class implements writeObject( ). If so, the serialization mechanism will not use the default mechanism and will not write out any of the instance variables. Instead, it will call writeObject( ) and depend on the method to store out all the important state. Here is ArrayList's implementation of writeObject( ):
private synchronized void writeObject(java.io.ObjectOutputStream stream) throws java.io.IOException {
stream.defaultWriteObject( );
stream.writeInt(elementData.length);
for (int i=0; i<size; i++)
stream.writeObject(elementData[i]);
}
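The first thing this does is call defaultWriteObject( ). defaultWriteObject( ) invokes the default serialization mechanism, which serializes all the nontransient, nonstatic instance variables (here, just size). Next, the method writes out elementData.length, so that the matching readObject( ) can allocate a backing array of the same capacity, and then calls the stream's writeObject( ) for each of the first size elements of elementData. The unused slots at the end of the array are never written.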
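There's an important point here that is sometimes missed: readObject( ) and writeObject( ) are a pair of methods that need to be implemented together. If you do any customization of serialization inside one of these methods, you need to implement the other method to match. If you don't, deserialization will either fail outright (for example, a custom readObject( ) tries to read data that was never written) or quietly produce an incompletely restored object (for example, the extra data written by a custom writeObject( ) is skipped, leaving transient fields unset).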
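To see the pair working together, here is a sketch of a hypothetical CompactList that follows the same pattern as the ArrayList code above; it is an illustration, not the real java.util.ArrayList source. The backing array is transient, writeObject( ) records the capacity and the elements in use, and readObject( ) reads them back in exactly the same order. Both methods are private; the serialization mechanism finds them reflectively.

```java
import java.io.*;
import java.util.Arrays;

// Hypothetical growable list, loosely modeled on the ArrayList code above.
class CompactList implements Serializable {
    private transient Object[] elementData = new Object[10];  // handled by hand
    private int size;                                          // handled by default

    void add(Object element) {
        if (size == elementData.length) {
            elementData = Arrays.copyOf(elementData, size * 2); // grow the backing array
        }
        elementData[size++] = element;
    }

    Object get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return elementData[index];
    }

    int size() { return size; }

    int capacity() { return elementData.length; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();             // writes size
        out.writeInt(elementData.length);     // capacity, so readObject can match it
        for (int i = 0; i < size; i++) {
            out.writeObject(elementData[i]);  // only the slots in use
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();               // restores size
        elementData = new Object[in.readInt()];
        for (int i = 0; i < size; i++) {      // same order as writeObject
            elementData[i] = in.readObject();
        }
    }

    // Serializes and deserializes a copy in memory; checked exceptions are wrapped.
    static CompactList copy(CompactList original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CompactList) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```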

Unit Tests and Serialization

Unit tests are used to test a specific piece of functionality in a class. They are explicitly not end-to-end or application-level tests. It's often a good idea to adopt a unit-testing harness such as JUnit when developing an application. JUnit gives you an automated way to run unit tests on individual classes and is available from http://www.junit.org/.
If you adopt a unit-testing methodology, then any serializable class should pass the following three tests:
  • If it implements readObject( ), it should implement writeObject( ), and vice versa.
  • It is equal (using the equals( ) method) to a serialized copy of itself.
  • It has the same hashcode as a serialized copy of itself.
Similar constraints hold for classes that implement the Externalizable interface.
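Without a test harness, the three checks above might be sketched in plain Java as follows. SerializationChecks and Point are hypothetical names; the first check uses reflection (Class.getDeclaredMethod( )) to see which of the two private hooks a class declares. In JUnit, each boolean would become an assertion in a test method.

```java
import java.io.*;

// Hypothetical serializable value class used as the subject of the checks.
class Point implements Serializable {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point p = (Point) other;
        return p.x == x && p.y == y;
    }

    @Override
    public int hashCode() { return 31 * x + y; }
}

class SerializationChecks {
    // Check 1: writeObject() and readObject() are declared together, or not at all.
    static boolean hooksArePaired(Class<?> type) {
        return declares(type, "writeObject", ObjectOutputStream.class)
            == declares(type, "readObject", ObjectInputStream.class);
    }

    private static boolean declares(Class<?> type, String name, Class<?> parameterType) {
        try {
            type.getDeclaredMethod(name, parameterType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Check 2: a serialized copy is equal to the original.
    static boolean copyIsEqual(Object original) {
        return original.equals(copy(original));
    }

    // Check 3: a serialized copy has the same hash code as the original.
    static boolean copyHasSameHashCode(Object original) {
        return original.hashCode() == copy(original).hashCode();
    }

    private static Object copy(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```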

Declaring serialPersistentFields

The final option that can be used is to explicitly declare which fields should be stored by the serialization mechanism. This is done using a special static final variable called serialPersistentFields, as shown in the following code snippet:
private static final ObjectStreamField[] serialPersistentFields = {
    new ObjectStreamField("size", Integer.TYPE), ....
};
This line of code declares that the field named size, which is of type int, is a serial persistent field and will be written to the output stream by the serialization mechanism. Declaring serialPersistentFields is almost the opposite of declaring some fields transient. The meaning of transient is, "This field shouldn't be stored by serialization," and the meaning of serialPersistentFields is, "These fields should be stored by serialization."
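But there is one important difference between declaring some variables to be transient and listing others in serialPersistentFields. In order to declare variables to be transient, they must be locally declared. In other words, you must have access to the code that declares the variable. There is no such requirement for serialPersistentFields: you simply provide the name and type of each field. A listed name doesn't even have to match a declared field, as long as the class's writeObject( ) and readObject( ) methods supply and retrieve its value through the ObjectOutputStream.PutField and ObjectInputStream.GetField API.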
TIP:   What if you try to do both? That is, suppose you declare some variables to be transient, and then also provide a definition for serialPersistentFields? The answer is that the transient keyword is ignored; the definition of serialPersistentFields is definitive.
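A small sketch of this TIP, using a hypothetical Tally class: size is marked transient but listed in serialPersistentFields, while scratch isn't transient but isn't listed either. After a round trip, size survives and scratch comes back null. Note that serialPersistentFields must be declared private static final; otherwise the serialization mechanism ignores it.

```java
import java.io.*;

// Hypothetical class: serialPersistentFields, not the transient keyword,
// decides which fields default serialization writes.
class Tally implements Serializable {
    // Must be private static final, or the serialization mechanism ignores it.
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("size", Integer.TYPE)
    };

    private transient int size;   // listed above, so written despite being transient
    private String scratch;       // not listed, so skipped despite not being transient

    Tally(int size, String scratch) {
        this.size = size;
        this.scratch = scratch;
    }

    int getSize() { return size; }

    String getScratch() { return scratch; }

    // Serializes and deserializes a copy in memory; checked exceptions are wrapped.
    static Tally copy(Tally original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Tally) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```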
So far, we've talked only about instance-level state. What about class-level state, such as important information stored in a static variable? Static variables won't get saved by serialization unless you add special code to do so. In our context (shipping objects over the wire between clients and servers), statics are usually a bad idea anyway.
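A quick way to convince yourself that statics are left out is to change a static value between writing and reading an object. In this sketch, Settings, mode, toBytes( ), and fromBytes( ) are hypothetical names. The deserialized copy gets its instance field back, but Settings.mode keeps whatever value it has in the current JVM, because nothing in the stream restores it.

```java
import java.io.*;

// Hypothetical class with both class-level and instance-level state.
class Settings implements Serializable {
    static String mode = "initial";   // class-level: not part of the serialized form
    final String name;                // instance-level: written by default serialization

    Settings(String name) { this.name = name; }

    static byte[] toBytes(Object object) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(object);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, set Settings.mode to "saved", serialize a Settings, set mode to "changed", and deserialize: the copy's name is restored, but Settings.mode is still "changed".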

Make Sure That Superclass State Is Handled Correctly

After you've handled the locally declared state, you may still need to worry about variables declared in a superclass. If the superclass implements the Serializable interface, then you don't need to do anything. The serialization mechanism will handle everything for you, either by using default serialization or by invoking writeObject( ) and readObject( ) if they are declared in the superclass.
If the superclass doesn't implement Serializable, you will need to store its state. There are two different ways to approach this. You can use serialPersistentFields to declare extra persistent fields that carry the superclass values (written and read through the ObjectOutputStream.PutField and ObjectInputStream.GetField API inside writeObject( ) and readObject( )), or you can use writeObject( ) and readObject( ) to handle the superclass state explicitly. Both of these, unfortunately, require you to know a fair amount about the superclass. If you're getting the .class files from another source, you should be aware that versioning issues can cause some really nasty problems: if you subclass a class, and that class's internal representation of instance-level state changes, you may not be able to load in your serialized data. While you can sometimes work around this with a sufficiently convoluted readObject( ) method, this may not be a solvable problem. We'll return to this later. Be aware, however, that the ultimate solution may be to implement the Externalizable interface instead.
Another aspect of handling the state of a nonserializable superclass is that the first nonserializable superclass must have a zero-argument constructor that the subclass can access. This isn't important for serializing out an object, but it's incredibly important when deserializing an object. Deserialization works by creating an instance of a class and filling out its fields correctly. During this process, the deserialization algorithm doesn't actually call any of the serialized class's constructors, but does call the zero-argument constructor of the first nonserializable superclass. If there isn't a zero-argument constructor, then the deserialization algorithm can't create instances of the class, and the whole process fails (typically with an InvalidClassException).
WARNING: If you can't create a zero-argument constructor in the first nonserializable superclass, you'll have to implement the Externalizable interface instead.
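A sketch of the writeObject( )/readObject( ) approach, with hypothetical Shape and Circle classes. Shape doesn't implement Serializable but has an accessible zero-argument constructor; Circle writes the superclass's color alongside its own state. During deserialization, Shape( ) runs first and sets the color to "unpainted", and readObject( ) then overwrites it with the saved value.

```java
import java.io.*;

// Hypothetical superclass that does NOT implement Serializable.
class Shape {
    private String color;

    // Accessible zero-argument constructor: deserializing a Circle calls this.
    protected Shape() {
        this("unpainted");
    }

    protected Shape(String color) {
        this.color = color;
    }

    String getColor() { return color; }

    protected void setColor(String color) { this.color = color; }
}

// Serializable subclass that saves and restores the superclass's state itself.
class Circle extends Shape implements Serializable {
    private double radius;   // handled by default serialization

    Circle(String color, double radius) {
        super(color);
        this.radius = radius;
    }

    double getRadius() { return radius; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();      // writes radius
        out.writeObject(getColor());   // superclass state, written explicitly
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();               // restores radius
        setColor((String) in.readObject());   // overwrites the "unpainted" default
    }

    // Serializes and deserializes a copy in memory; checked exceptions are wrapped.
    static Circle copy(Circle original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Circle) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```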
Simply adding a zero-argument constructor might seem a little problematic. Suppose the object already has several constructors, all of which take arguments. If you simply add a zero-argument constructor, then the serialization mechanism might leave the object in a half-initialized, and therefore unusable, state.
However, since serialization will supply the instance variables with correct values from an active instance immediately after instantiating the object, the only way this problem could arise is if the constructors actually do something with their arguments (beyond setting variable values).
If all the constructors take arguments and actually execute initialization code as part of the constructor, then you may need to refactor a bit. The usual solution is to move the local initialization code into a new method (usually named something like initialize( )), which is then called from the original constructor:
public MyObject(arglist) {
// set local variables from arglist
// perform local initialization
}
to something that looks like:
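protected MyObject(  ) {
// zero-argument constructor, invoked by deserialization
// when MyObject is the first nonserializable superclass,
// and not meant to be called by any other piece of code.
// It must be accessible to subclasses, so it can't be private.
// note that it doesn't call initialize(  )
}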
 
public MyObject(arglist) {
// set local variables from arglist
initialize(  );
}
 
private void initialize(  ) {
// perform local initialization
}
After this is done, writeObject( ) and readObject( ) should be implemented, and readObject( ) should end with a call to initialize( ). Sometimes this will result in code that simply invokes the default serialization mechanism, as in the following snippet:
private void writeObject(java.io.ObjectOutputStream stream) throws
    java.io.IOException {
stream.defaultWriteObject( );
}

private void readObject(java.io.ObjectInputStream stream) throws
    java.io.IOException, ClassNotFoundException {
stream.defaultReadObject( );
initialize( );
}
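Here is one way the refactored pattern might look in a complete, if hypothetical, class. WordIndex serializes its words array by default and marks the derived lookup table transient. Both the constructor and readObject( ) end by calling initialize( ), since deserialization doesn't run WordIndex's own constructor.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class following the initialize() pattern described above.
class WordIndex implements Serializable {
    private final String[] words;                       // serialized by default
    private transient Map<String, Integer> positions;   // derived; rebuilt by initialize()

    WordIndex(String... words) {
        this.words = words.clone();
        initialize();
    }

    // Local initialization shared by the constructor and readObject().
    private void initialize() {
        positions = new HashMap<>();
        for (int i = 0; i < words.length; i++) {
            positions.putIfAbsent(words[i], i);   // first occurrence wins
        }
    }

    int positionOf(String word) {
        return positions.getOrDefault(word, -1);
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        initialize();   // the constructor isn't run during deserialization
    }

    // Serializes and deserializes a copy in memory; checked exceptions are wrapped.
    static WordIndex copy(WordIndex original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (WordIndex) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```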
TIP:   If creating a zero-argument constructor is difficult (for example, you don't have the source code for the superclass), your class will need to implement the Externalizable interface instead of Serializable.

Override equals( ) and hashCode( ) if Necessary

The default implementations of equals( ) and hashCode( ), which are inherited from java.lang.Object, are based purely on object identity, not on the values an instance holds. This can be problematic. Consider our previous deep copy code example:
ByteArrayOutputStream memoryOutputStream = new ByteArrayOutputStream( );
ObjectOutputStream serializer = new ObjectOutputStream(memoryOutputStream);
serializer.writeObject(serializableObject);
serializer.flush( );

ByteArrayInputStream memoryInputStream = new ByteArrayInputStream(memoryOutputStream.toByteArray( ));
ObjectInputStream deserializer = new ObjectInputStream(memoryInputStream);
Object deepCopyOfOriginalObject = deserializer.readObject( );
The potential problem here involves the following boolean test:
serializableObject.equals(deepCopyOfOriginalObject)
Sometimes, as in the case of Money and DocumentDescription, the answer should be true. If two instances of Money have the same values for _cents, then they are equal. However, the implementation of equals( ) inherited from Object will return false.
The same problem occurs with hashCode( ). Object's implementation of hashCode( ) is identity-based (it is often derived from the instance's memory address, though the JVM doesn't guarantee that), so two distinct instances almost never share a hash code under Object's implementation. If two objects are equal, however, then they must have the same hash code. So if you need to override equals( ), you need to override hashCode( ) as well.
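The difference is easy to demonstrate with a deep copy. In this sketch, PlainCents and Cents are hypothetical stand-ins for the Money class: PlainCents inherits equals( ) and hashCode( ) from Object, while Cents overrides both in terms of its value. A serialized copy of a PlainCents is never equal to the original; a serialized copy of a Cents is equal and has the same hash code.

```java
import java.io.*;

// Hypothetical value class that keeps Object's identity-based equals()/hashCode().
class PlainCents implements Serializable {
    private final int cents;

    PlainCents(int cents) { this.cents = cents; }
}

// Hypothetical value class whose equality is based on its value.
class Cents implements Serializable {
    private final int cents;

    Cents(int cents) { this.cents = cents; }

    @Override
    public boolean equals(Object other) {
        return other instanceof Cents && ((Cents) other).cents == cents;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(cents);   // equal values give equal hash codes
    }
}

class EqualityDemo {
    // Deep copy via serialization, as in the snippet above; checked exceptions are wrapped.
    static Object copy(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```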
TIP:  With the exception of declaring variables to be transient, all our changes involve adding functionality. Making a class serializable rarely involves significant changes to its functionality and shouldn't result in any changes to method implementations. This means that it's fairly easy to retrofit serialization onto an existing object hierarchy. The hardest part is usually implementing equals( ) and hashCode( ).


Reference: Java RMI, Book by William Grosso
