How to Make a Class Serializable
So far, we've focused on the mechanics of serializing an object. We've assumed we have a serializable object and discussed, from the point of view of client code, how to serialize it. The next step is discussing how to make a class serializable.
There are four basic things you must do when you are making a class serializable. They are:
- Implement the Serializable interface.
- Make sure that instance-level, locally defined state is serialized properly.
- Make sure that superclass state is serialized properly.
- Override equals( ) and hashCode( ).
Let's look at each of these steps in more detail.
Implement the Serializable Interface
This is by far the easiest of the steps. The Serializable interface is an empty interface; it declares no methods at all. So implementing it amounts to adding "implements Serializable" to your class declaration.
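As a minimal sketch of what the marker interface changes in practice, the following tries to write two objects with identical state to an ObjectOutputStream. The class names Point and PlainPoint and the helper canSerialize( ) are hypothetical, introduced only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class MarkerInterfaceDemo {
    // Hypothetical class that opts in to serialization.
    static class Point implements Serializable {
        int x = 1, y = 2;
    }

    // Same state, but the class does not implement the marker interface.
    static class PlainPoint {
        int x = 1, y = 2;
    }

    // Returns true if the object can be written to an ObjectOutputStream.
    static boolean canSerialize(Object candidate) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false; // the class did not implement Serializable
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Point: " + canSerialize(new Point()));
        System.out.println("PlainPoint: " + canSerialize(new PlainPoint()));
    }
}
```

The only difference between the two classes is the "implements Serializable" clause; the serialization mechanism refuses the second one with a NotSerializableException.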
Reasonable people may wonder about the utility of an empty interface. Rather than define an empty interface and require class definitions to implement it, why not simply make every object serializable? The main reason not to do this is that some classes don't have an obvious serialization. Consider, for example, an instance of File. An instance of File represents a file. Suppose, for example, it was created using the following line of code:
    File file = new File("c:\\temp\\foo");
It's not at all clear what should be written out when this is serialized. The problem is that the file itself has a different lifecycle than the serialized data. The file might be edited, or deleted entirely, while the serialized information remains unchanged. Or the serialized information might be used to restart the application on another machine, where "c:\\temp\\foo" is the name of an entirely different file.
Another example is provided by the Thread class. (If you don't know much about threads, just wait a few chapters and then revisit this example. It will make more sense then.) A thread represents a flow of execution within a particular JVM. You would not only have to store the stack, and all the local variables, but also all the related locks and threads, and restart all the threads properly when the instance is deserialized.
TIP: Things get worse when you consider platform dependencies. In general, any class that involves native code is not really a good candidate for serialization.
Make Sure That Instance-Level, Locally Defined State Is Serialized Properly
Class definitions contain variable declarations. The instance-level, locally defined variables (i.e., the nonstatic variables) are the ones that contain the state of a particular instance. For example, in our Money class, we declared one such field:
    public class Money extends ValueObject {
        private int _cents;
        ....
    }
The serialization mechanism has a nice default behavior -- if all the instance-level, locally defined variables have values that are either serializable objects or primitive datatypes, then the serialization mechanism will work without any further effort on our part. For example, our implementations of Account, such as Account_Impl, would present no problems for the default serialization mechanism:
    public class Account_Impl extends UnicastRemoteObject implements Account {
        private Money _balance;
        ...
    }
While _balance doesn't have a primitive type, it does refer to an instance of Money, which is a serializable class.
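The default behavior can be sketched with a pair of stand-in classes. SimpleMoney and SimpleAccount are hypothetical simplifications (they do not extend ValueObject or UnicastRemoteObject), and roundTrip( ) is an assumed helper that serializes an object to memory and reads it back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class DefaultSerializationDemo {
    // Hypothetical value class: one primitive field.
    static class SimpleMoney implements Serializable {
        private final int cents;
        SimpleMoney(int cents) { this.cents = cents; }
        int getCents() { return cents; }
    }

    // Hypothetical account: its only field refers to a serializable object.
    static class SimpleAccount implements Serializable {
        private final SimpleMoney balance;
        SimpleAccount(SimpleMoney balance) { this.balance = balance; }
        SimpleMoney getBalance() { return balance; }
    }

    // Serializes the object to a byte array and deserializes a copy from it.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SimpleAccount copy = (SimpleAccount) roundTrip(new SimpleAccount(new SimpleMoney(1250)));
        System.out.println(copy.getBalance().getCents());
    }
}
```

Neither class contains any serialization code beyond the "implements Serializable" clause; the nested SimpleMoney instance is written and restored along with the account.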
If, however, some of the fields don't have primitive types, and don't refer to serializable classes, more work may be necessary. Consider, for example, the implementation of ArrayList from the java.util package. An ArrayList really has only two pieces of state:
    public class ArrayList extends AbstractList implements List, Cloneable, java.io.Serializable {
        private Object elementData[];
        private int size;
        ...
    }
But hidden in here is a problem: ArrayList is a generic container class whose state is stored as an array of objects. Arrays are first-class objects in Java, and an array can itself be serialized, but elementData is usually longer than the number of elements actually stored, so the default mechanism would write out unused capacity along with the real contents. This means that ArrayList can't just implement the Serializable interface and leave it at that. It has to provide extra information to help the serialization mechanism handle this field. There are three basic solutions to this problem:
- Fields can be declared to be transient.
- The writeObject( )/readObject( ) methods can be implemented.
- serialPersistentFields can be declared.
Declaring transient fields
The first, and easiest, thing you can do is simply mark some fields using the transient keyword. In ArrayList, for example, elementData is actually declared to be a transient field:
    public class ArrayList extends AbstractList implements List, Cloneable, java.io.Serializable {
        private transient Object elementData[];
        private int size;
        ...
    }
This tells the default serialization mechanism to ignore the variable. In other words, the serialization mechanism simply skips over the transient variables. In the case of ArrayList, the default serialization mechanism would attempt to write out size, but ignore elementData entirely.
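To see what "skipping" a transient variable means for the deserialized copy, here is a small sketch. Bag is a hypothetical class that mimics the two fields above and relies on default serialization only; roundTrip( ) is an assumed helper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
    // Hypothetical container with the same two fields as the ArrayList excerpt.
    static class Bag implements Serializable {
        private transient Object[] elementData = { "a", "b" }; // skipped by default serialization
        private int size = 2;                                  // written by default serialization

        Object[] getElementData() { return elementData; }
        int getSize() { return size; }
    }

    // Serializes the object to a byte array and deserializes a copy from it.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Bag copy = (Bag) roundTrip(new Bag());
        System.out.println("size = " + copy.getSize());
        System.out.println("elementData is null: " + (copy.getElementData() == null));
    }
}
```

The copy keeps its size but comes back with a null elementData, because no constructor or field initializer of a serializable class runs during deserialization. This is why marking the field transient is not sufficient by itself when the field holds real state.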
This can be useful in two, usually distinct, situations:
- The variable isn't serializable. If the variable isn't serializable, then the serialization mechanism will throw an exception when it tries to serialize the variable. To avoid this, you can declare the variable to be transient.
- The variable is redundant. Suppose that the instance caches the result of a computation. Locally, we might want to store the result of the computation, in order to save some processor time. But when we send the object over the wire, we might worry more about consuming bandwidth and thus discard the cached computation since we can always regenerate it later on.
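The redundant case might look like the following sketch. Circle and its cached area are hypothetical; the point is only that a transient cache is dropped on the wire and rebuilt on demand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientCacheDemo {
    // Hypothetical class whose cached value can always be recomputed from radius.
    static class Circle implements Serializable {
        private final double radius;
        private transient Double cachedArea; // redundant state: not sent over the wire

        Circle(double radius) { this.radius = radius; }

        // Computes the area once and reuses the cached result afterwards.
        double area() {
            if (cachedArea == null) {
                cachedArea = Math.PI * radius * radius;
            }
            return cachedArea;
        }

        boolean isAreaCached() { return cachedArea != null; }
    }

    // Serializes the object to a byte array and deserializes a copy from it.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Circle original = new Circle(2.0);
        original.area();                                  // fills the cache locally
        Circle copy = (Circle) roundTrip(original);
        System.out.println("cached after copy: " + copy.isAreaCached());
        System.out.println("areas match: " + (copy.area() == original.area()));
    }
}
```

Because area( ) checks for a missing cache, the copy needs no special deserialization code.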
Implementing writeObject( ) and readObject( )
Suppose that the first case applies. A field takes values that aren't serializable, or that shouldn't be written out as is. If the field is still an important part of the state of our instance, such as elementData in the case of an ArrayList, simply declaring the variable to be transient isn't good enough. We need to save and restore the state stored in the variable. This is done by implementing a pair of methods with the following signatures:
    private void writeObject(java.io.ObjectOutputStream out) throws IOException;
    private void readObject(java.io.ObjectInputStream in) throws IOException, ClassNotFoundException;
When the serialization mechanism starts to write out an object, it will check to see whether the class implements writeObject( ). If so, the serialization mechanism will not use the default mechanism and will not write out any of the instance variables. Instead, it will call writeObject( ) and depend on the method to store out all the important state. Here is ArrayList's implementation of writeObject( ):
    private synchronized void writeObject(java.io.ObjectOutputStream stream) throws java.io.IOException {
        stream.defaultWriteObject( );
        stream.writeInt(elementData.length);
        for (int i=0; i<size; i++)
            stream.writeObject(elementData[i]);
    }
The first thing this does is call defaultWriteObject( ). defaultWriteObject( ) invokes the default serialization mechanism, which serializes all the nontransient, nonstatic instance variables. Next, the method writes out elementData.length and then calls the stream's writeObject( ) for each of the first size elements of elementData.
There's an important point here that is sometimes missed: readObject( ) and writeObject( ) are a pair of methods that need to be implemented together. If you do any customization of serialization inside one of these methods, you need to implement the other method. If you don't, the serialization algorithm will fail.
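A matched pair might look like the following sketch. TinyBag is a hypothetical fixed-capacity container of strings, not the JDK's ArrayList; its readObject( ) reads back exactly what its writeObject( ) wrote, in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class WriteReadPairDemo {
    // Hypothetical container whose backing array is transient.
    static class TinyBag implements Serializable {
        private transient String[] elementData = new String[10];
        private int size;

        void add(String s) {
            if (size == elementData.length) {
                throw new IllegalStateException("full"); // no growth: this is only a sketch
            }
            elementData[size++] = s;
        }
        String get(int i) { return elementData[i]; }
        int size() { return size; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();          // writes size
            out.writeInt(elementData.length);  // writes the capacity
            for (int i = 0; i < size; i++) {
                out.writeObject(elementData[i]);
            }
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();            // restores size
            int capacity = in.readInt();       // same order as writeObject
            elementData = new String[capacity];
            for (int i = 0; i < size; i++) {
                elementData[i] = (String) in.readObject();
            }
        }
    }

    // Serializes the object to a byte array and deserializes a copy from it.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TinyBag bag = new TinyBag();
        bag.add("x");
        bag.add("y");
        TinyBag copy = (TinyBag) roundTrip(bag);
        System.out.println(copy.size() + " " + copy.get(0) + " " + copy.get(1));
    }
}
```

Only the occupied slots are written, so the stream carries size elements rather than the full capacity of the array.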
Unit Tests and Serialization
Unit tests are used to test a specific piece of functionality in a class. They are explicitly not end-to-end or application-level tests. It's often a good idea to adopt a unit-testing harness such as JUnit when developing an application. JUnit gives you an automated way to run unit tests on individual classes and is available from http://www.junit.org/.
If you adopt a unit-testing methodology, then any serializable class should pass the following three tests:
- If it implements readObject( ), it should implement writeObject( ), and vice-versa.
- It is equal (using the equals( ) method) to a serialized copy of itself.
- It has the same hashcode as a serialized copy of itself.
Similar constraints hold for classes that implement the Externalizable interface.
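The three tests can be sketched in plain Java, without assuming any particular JUnit version. The class SerializationChecks, its method names, and the NoEquals example class are all hypothetical; in a real project the same checks would live inside whatever test harness you use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationChecks {
    // Hypothetical class that is serializable but inherits equals( ) from Object.
    static class NoEquals implements Serializable {
        int value = 42;
    }

    // True if the class itself declares a method with this name and parameter type.
    static boolean declares(Class<?> type, String name, Class<?> parameter) {
        try {
            type.getDeclaredMethod(name, parameter);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Serializes the object to a byte array and deserializes a copy from it.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Applies the three checks from the list above to one instance.
    static boolean passesSerializationChecks(Serializable original) {
        Class<?> type = original.getClass();
        boolean paired = declares(type, "writeObject", ObjectOutputStream.class)
                      == declares(type, "readObject", ObjectInputStream.class);
        Object copy = roundTrip(original);
        return paired
            && original.equals(copy)
            && original.hashCode() == copy.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(passesSerializationChecks("hello"));
        System.out.println(passesSerializationChecks(new NoEquals()));
    }
}
```

A String passes all three checks, while NoEquals fails the second one because the copy is a different instance and Object's equals( ) compares identity.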
Declaring serialPersistentFields
The final option that can be used is to explicitly declare which fields should be stored by the serialization mechanism. This is done using a special static final variable called serialPersistentFields, as shown in the following code snippet:
    private static final ObjectStreamField[] serialPersistentFields =
        { new ObjectStreamField("size", Integer.TYPE), .... };
This line of code declares that the field named size, which is of type int, is a serial persistent field and will be written to the output stream by the serialization mechanism. Declaring serialPersistentFields is almost the opposite of declaring some fields transient. The meaning of transient is, "This field shouldn't be stored by serialization," and the meaning of serialPersistentFields is, "These fields should be stored by serialization."
But there is one important difference between declaring some variables to be transient and others to be serialPersistentFields. In order to declare variables to be transient, they must be locally declared. In other words, you must have access to the code that declares the variable. There is no such requirement for serialPersistentFields. You simply provide the name of the field and the type.
TIP: What if you try to do both? That is, suppose you declare some variables to be transient, and then also provide a definition for serialPersistentFields? The answer is that the transient keyword is ignored; the definition of serialPersistentFields is definitive.
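Here is a small sketch of serialPersistentFields used with the default mechanism. Counter and its field note are hypothetical; only size is listed, so only size is expected to survive the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;

class PersistentFieldsDemo {
    // Hypothetical class that lists its persistent fields explicitly.
    static class Counter implements Serializable {
        private static final ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("size", Integer.TYPE)
        };

        private int size = 3;                // listed: stored by serialization
        private String note = "not listed";  // not listed: not stored, although not transient

        int getSize() { return size; }
        String getNote() { return note; }
    }

    // Serializes the object to a byte array and deserializes a copy from it.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Counter copy = (Counter) roundTrip(new Counter());
        System.out.println("size = " + copy.getSize());
        System.out.println("note is null: " + (copy.getNote() == null));
    }
}
```

The unlisted field behaves as if it had been declared transient: the copy comes back with note unset.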
So far, we've talked only about instance-level state. What about class-level state? Suppose you have important information stored in a static variable. Static variables won't get saved by serialization unless you add special code to do so. In our context (shipping objects over the wire between clients and servers), statics are usually a bad idea anyway.
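The following sketch shows a static variable being left out of the stream. Job and its fields are hypothetical; the static is changed between writing and reading to make the effect visible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class StaticStateDemo {
    // Hypothetical class with one static and one instance field.
    static class Job implements Serializable {
        static String mode = "fast"; // class-level state: not part of any instance
        String name = "nightly";     // instance-level state: serialized by default
    }

    static byte[] toBytes(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes while mode is "fast", then changes mode before deserializing.
    static String stateSeenAfterDeserialization() {
        Job.mode = "fast";
        byte[] data = toBytes(new Job());
        Job.mode = "slow";
        Job copy = (Job) fromBytes(data);
        return copy.name + ":" + Job.mode;
    }

    public static void main(String[] args) {
        System.out.println(stateSeenAfterDeserialization());
    }
}
```

The instance field comes back as it was written, but deserialization does not restore the static to its earlier value. Between two JVMs, the receiving side would simply see whatever value its own copy of the class holds.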
Make Sure That Superclass State Is Handled Correctly
After you've handled the locally declared state, you may still need to worry about variables declared in a superclass. If the superclass implements the Serializable interface, then you don't need to do anything. The serialization mechanism will handle everything for you, either by using default serialization or by invoking writeObject( )/readObject( ) if they are declared in the superclass.
If the superclass doesn't implement Serializable, you will need to store its state. There are two different ways to approach this. You can use serialPersistentFields to tell the serialization mechanism about some of the superclass instance variables, or you can use writeObject( )/readObject( ) to handle the superclass state explicitly. Both of these, unfortunately, require you to know a fair amount about the superclass. If you're getting the .class files from another source, you should be aware that versioning issues can cause some really nasty problems. If you subclass a class, and that class's internal representation of instance-level state changes, you may not be able to load in your serialized data. While you can sometimes work around this by using a sufficiently convoluted readObject( ) method, this may not be a solvable problem. We'll return to this later. However, be aware that the ultimate solution may be to just implement the Externalizable interface instead, which we'll talk about later.
Another aspect of handling the state of a nonserializable superclass is that nonserializable superclasses must have a zero-argument constructor. This isn't important for serializing out an object, but it's incredibly important when deserializing an object. Deserialization works by creating an instance of a class and filling out its fields correctly. During this process, the deserialization algorithm doesn't actually call any of the serialized class's constructors, but does call the zero-argument constructor of the first nonserializable superclass. If there isn't a zero-argument constructor, then the deserialization algorithm can't create instances of the class, and the whole process fails.
WARNING: If you can't create a zero-argument constructor in the first nonserializable superclass, you'll have to implement the Externalizable interface instead.
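The constructor behavior described above can be observed with a small sketch. Base and Child are hypothetical; the origin field simply records which Base constructor ran last.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ZeroArgConstructorDemo {
    // Hypothetical nonserializable superclass.
    static class Base {
        String origin;                       // not serialized: Base is not Serializable
        Base() { origin = "Base()"; }        // called during deserialization
        Base(String origin) { this.origin = origin; }
    }

    // First serializable class in the hierarchy.
    static class Child extends Base implements Serializable {
        int value;
        Child(int value) {
            super("Base(String)");           // normal construction path
            this.value = value;
        }
    }

    // Serializes the object to a byte array and deserializes a copy from it.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Child original = new Child(7);
        Child copy = (Child) roundTrip(original);
        System.out.println(original.origin + " -> " + copy.origin + ", value = " + copy.value);
    }
}
```

The copy's value is restored from the stream, but its origin is whatever the zero-argument Base constructor set, because Child's own constructor is never invoked during deserialization.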
Simply adding a zero-argument constructor might seem a little problematic. Suppose the object already has several constructors, all of which take arguments. If you simply add a zero-argument constructor, then the serialization mechanism might leave the object in a half-initialized, and therefore unusable, state.
However, since serialization will supply the instance variables with correct values from an active instance immediately after instantiating the object, the only way this problem could arise is if the constructors actually do something with their arguments--besides setting variable values.
If all the constructors take arguments and actually execute initialization code as part of the constructor, then you may need to refactor a bit. The usual solution is to move the local initialization code into a new method (usually named something like initialize( )), which is then called from the original constructor. That is, change code that looks like:
    public MyObject(arglist) {
        // set local variables from arglist
        // perform local initialization
    }
to something that looks like:
    private MyObject( ) {
        // zero argument constructor, invoked by serialization
        // and never by any other
        // piece of code.
        // note that it doesn't call initialize( )
    }
    public MyObject(arglist) {
        // set local variables from arglist
        initialize( );
    }
    private void initialize( ) {
        // perform local initialization
    }
After this is done, writeObject( )/readObject( ) should be implemented, and readObject( ) should end with a call to initialize( ). Sometimes this will result in code that simply invokes the default serialization mechanism, as in the following snippet:
    private void writeObject(java.io.ObjectOutputStream stream) throws java.io.IOException {
        stream.defaultWriteObject( );
    }
    private void readObject(java.io.ObjectInputStream stream) throws java.io.IOException, ClassNotFoundException {
        stream.defaultReadObject( );
        initialize( );
    }
TIP: If creating a zero-argument constructor is difficult (for example, you don't have the source code for the superclass), your class will need to implement the Externalizable interface instead of Serializable.
Override equals( ) and hashCode( ) if Necessary
The default implementations of equals( ) and hashCode( ), which are inherited from java.lang.Object, simply use an instance's location in memory. This can be problematic. Consider our previous deep copy code example:
    ByteArrayOutputStream memoryOutputStream = new ByteArrayOutputStream( );
    ObjectOutputStream serializer = new ObjectOutputStream(memoryOutputStream);
    serializer.writeObject(serializableObject);
    serializer.flush( );
    ByteArrayInputStream memoryInputStream = new ByteArrayInputStream(memoryOutputStream.toByteArray( ));
    ObjectInputStream deserializer = new ObjectInputStream(memoryInputStream);
    Object deepCopyOfOriginalObject = deserializer.readObject( );
The potential problem here involves the following boolean test:
serializableObject.equals(deepCopyOfOriginalObject)
Sometimes, as in the case of Money and DocumentDescription, the answer should be true. If two instances of Money have the same values for _cents, then they are equal. However, the implementation of equals( ) inherited from Object will return false.
The same problem occurs with hashCode( ). Note that Object implements hashCode( ) by returning a value tied to the identity of the instance, typically derived from its memory address. Hence, two distinct instances will generally not have the same hashCode( ) using Object's implementation. If two objects are equal, however, then they should have the same hashcode. So if you need to override equals( ), you probably need to override hashCode( ) as well.
TIP: With the exception of declaring variables to be transient, all our changes involve adding functionality. Making a class serializable rarely involves significant changes to its functionality and shouldn't result in any changes to method implementations. This means that it's fairly easy to retrofit serialization onto an existing object hierarchy. The hardest part is usually implementing equals( ) and hashCode( ).
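As a sketch of the difference, the following compares a value-style class that overrides both methods with one that inherits them from Object. SimpleMoney and IdentityMoney are hypothetical stand-ins for the Money class (they do not extend ValueObject), and the field-based equals( ) shown here is only one reasonable way to write it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class EqualsHashCodeDemo {
    // Hypothetical value class: equality is defined by the stored amount.
    static class SimpleMoney implements Serializable {
        private final int cents;
        SimpleMoney(int cents) { this.cents = cents; }

        @Override
        public boolean equals(Object other) {
            return other instanceof SimpleMoney && ((SimpleMoney) other).cents == cents;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(cents); // equal objects get equal hashcodes
        }
    }

    // Same state, but equals( ) and hashCode( ) are inherited from Object.
    static class IdentityMoney implements Serializable {
        private final int cents;
        IdentityMoney(int cents) { this.cents = cents; }
    }

    // Serializes the object to a byte array and deserializes a copy from it.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SimpleMoney money = new SimpleMoney(500);
        IdentityMoney identity = new IdentityMoney(500);
        System.out.println(money.equals(roundTrip(money)));
        System.out.println(identity.equals(roundTrip(identity)));
    }
}
```

Only the class with the overrides is equal to its own serialized copy, which matters as soon as deserialized objects are compared or used as keys in hash-based collections.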
Reference: Java RMI, Book by William Grosso