In computer science, in the context of data storage and transmission, serialization is the process of converting an object into a sequence of bits so that it can be stored on a storage medium (such as a file or a memory buffer) or transmitted across a network connection. When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serializing an object is also called deflating or marshalling it. The opposite operation, extracting a data structure from a series of bytes, is deserialization (also called inflating or unmarshalling).
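As a minimal sketch of the round trip described above, the following Python snippet uses the standard json module as the serialization format. JSON is an illustrative choice here; the same pattern applies to any format. The Point class and the serialize/deserialize helpers are hypothetical names invented for this example.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Point:
    # Hypothetical example type with three fields.
    x: int
    y: int
    label: str

def serialize(p: Point) -> bytes:
    # Object -> dict -> JSON text -> bytes, ready for a file or a socket.
    return json.dumps(asdict(p)).encode("utf-8")

def deserialize(data: bytes) -> Point:
    # Bytes -> JSON text -> dict -> a brand-new Point object.
    return Point(**json.loads(data.decode("utf-8")))

original = Point(3, 4, "corner")
data = serialize(original)
clone = deserialize(data)
print(data)               # b'{"x": 3, "y": 4, "label": "corner"}'
print(clone == original)  # True: semantically identical
print(clone is original)  # False: a distinct object
```

The clone compares equal to the original field by field, but it is a separate object in memory. This is the "semantically identical clone" the definition refers to.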
Uses
Serialization has a number of advantages. It provides:
For some of these features to be useful, architecture independence must be maintained. For example, for maximal use of distribution, a computer running on a different hardware architecture should be able to reliably reconstruct a serialized data stream, regardless of endianness. This means that the simpler and faster procedure of directly copying the memory layout of the data structure cannot work reliably for all architectures. Serializing the data structure in an architecture-independent format avoids the problems of byte ordering, memory layout, and the different ways that programming languages represent data structures.

Inherent to any serialization scheme is that, because the encoding of the data is by definition serial, extracting one part of the serialized data structure requires that the entire object be read from start to end and reconstructed. In many applications this linearity is an asset, because it allows simple, common I/O interfaces to be used to hold and pass on the state of an object. In applications where higher performance is an issue, it can make sense to expend more effort on a more complex, non-linear storage organization.

Even on a single machine, primitive pointer objects are too fragile to save, because the objects to which they point may be reloaded to a different location in memory. To deal with this, the serialization process includes a step called unswizzling or pointer unswizzling, and the deserialization process includes a step called pointer swizzling.

Since both serializing and deserializing can be driven from common code (for example, the Serialize function in Microsoft Foundation Classes), it is possible for the common code to do both at the same time, and thus 1) detect differences between the objects being serialized and their prior copies, and 2) provide the input for the next such detection.
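The following Python sketch illustrates both architecture independence and pointer unswizzling on a hypothetical chain of Node objects. Each reference is unswizzled into an integer index when writing. Every integer is packed with an explicit big-endian byte order (the ">" prefix in the struct format string), so the bytes decode identically on any architecture. On reading, the indices are swizzled back into references. The record layout (a node count followed by value/next-index pairs, with -1 meaning "no successor") is an assumption made for illustration, not a standard format.

```python
import struct
from typing import Optional

class Node:
    """Hypothetical node type: an int payload plus a reference to another node."""
    def __init__(self, value: int, next_node: Optional["Node"] = None):
        self.value = value
        self.next = next_node

def serialize(head: Optional[Node]) -> bytes:
    # Walk the chain once, stopping if a node repeats (the chain may be cyclic).
    nodes, seen = [], set()
    node = head
    while node is not None and id(node) not in seen:
        seen.add(id(node))
        nodes.append(node)
        node = node.next
    # Give every node a position-based index.
    index = {id(n): i for i, n in enumerate(nodes)}
    # Header: node count as a big-endian 32-bit signed integer.
    parts = [struct.pack(">i", len(nodes))]
    for n in nodes:
        # Unswizzling: store an index instead of a memory reference (-1 means None).
        nxt = index[id(n.next)] if n.next is not None else -1
        parts.append(struct.pack(">ii", n.value, nxt))
    return b"".join(parts)

def deserialize(data: bytes) -> Optional[Node]:
    (count,) = struct.unpack_from(">i", data, 0)
    records = [struct.unpack_from(">ii", data, 4 + 8 * i) for i in range(count)]
    # First pass: recreate every node at whatever address the runtime chooses.
    nodes = [Node(value) for value, _ in records]
    # Second pass, swizzling: turn stored indices back into object references.
    for node, (_, nxt) in zip(nodes, records):
        node.next = nodes[nxt] if nxt >= 0 else None
    return nodes[0] if nodes else None

# A three-node cycle: 1 -> 2 -> 3 -> back to 1.
a, b, c = Node(1), Node(2), Node(3)
a.next, b.next, c.next = b, c, a

data = serialize(a)
print(len(data))  # 28: a 4-byte count plus three 8-byte records
copy = deserialize(data)
print(copy.value, copy.next.value, copy.next.next.value)  # 1 2 3
print(copy.next.next.next is copy)  # True: the cycle is rebuilt
```

Because the example chain is cyclic, writing raw memory addresses would be meaningless to a later reader. Storing indices instead is what lets the reader rebuild the same shape at new addresses.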
When common code serializes and deserializes at the same time, it is not necessary to actually build the prior copy of the objects, since differences can be detected "on the fly". This is one way to understand the technique called differential execution. It is useful in programming user interfaces whose contents vary over time: graphical objects can be created, removed, altered, or made to handle input events without necessarily having to write separate code to do those things.

Consequences
Serialization, however, breaks the opacity of an abstract data type by potentially exposing private implementation details. To discourage competitors from making compatible products, publishers of proprietary software often keep the details of their programs' serialization formats a trade secret. Some deliberately obfuscate or even encrypt the serialized data. This process is often termed "instantated oatmealization" by the open source community, a pun on the homophony of serialization and cereal. [1] Yet interoperability requires that applications be able to understand each other's serialization formats. Therefore, remote method call architectures such as CORBA define their serialization formats in detail and often provide methods of checking the consistency of any serialized stream when converting it back into an object.

Human-readable serialization
In the late 1990s, a push began to provide an alternative to the standard serialization protocols: the XML markup language was used to produce a human-readable, text-based encoding. Such an encoding can be useful for persistent objects that may be read and understood by humans, or communicated to other systems regardless of programming language. Its disadvantage is the loss of the more compact, byte-stream-based encoding, which is generally more practical. A future solution to this dilemma could be transparent compression schemes (see binary XML). Today, XML is often used for asynchronous transfer of structured data between client and server in Ajax web applications.
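A minimal Python sketch of such a text-based encoding is shown below. It uses the standard xml.etree.ElementTree module, and the element names (employee, name, id) are hypothetical. Because XML text carries no type information, the reading side has to know that id is an integer and convert it back explicitly.

```python
import xml.etree.ElementTree as ET

def to_xml(employee: dict) -> str:
    # Build <employee><name>...</name><id>...</id></employee>.
    root = ET.Element("employee")
    for key, value in employee.items():
        child = ET.SubElement(root, key)
        child.text = str(value)  # every value becomes text in XML
    return ET.tostring(root, encoding="unicode")

def from_xml(text: str) -> dict:
    root = ET.fromstring(text)
    record = {child.tag: child.text for child in root}
    record["id"] = int(record["id"])  # restore the type that the text lost
    return record

original = {"name": "Ada", "id": 7}
text = to_xml(original)
print(text)                        # <employee><name>Ada</name><id>7</id></employee>
print(from_xml(text) == original)  # True
```

A person can read the output, and a program written in any language can parse it. The trade-off is size: the text is longer than a compact binary encoding of the same two fields.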
An alternative to XML for transferring structured data in Ajax applications is JSON, a more lightweight text-based serialization format that uses JavaScript syntax but is supported in numerous other programming languages as well.

Scientific serialization
For large-volume scientific datasets, such as satellite data and the output of numerical climate, weather, or ocean models, specific binary serialization standards have been developed, e.g. HDF, netCDF and the older GRIB.

Programming language support
Several object-oriented programming languages directly support object serialization (or object archival), either through syntactic sugar or by providing a standard interface for doing so. Some of these programming languages are Ruby, Smalltalk, Python, PHP, Objective-C, Java, and the .NET family of languages. There are also libraries available that add serialization support to languages that lack native support for it.

.NET Framework
In the .NET languages, classes can be serialized and deserialized by adding the

```
' VB Example
<Serializable()> Class Employee

// C# Example
[Serializable] class Employee
```

If new members are added to a serializable class, they can be tagged with the
To modify the default deserialization (for example, to automatically initialize a member marked
Objects may be serialized in binary format for deserialization by other .NET applications. The framework also provides the

Objective-C
In the Objective-C programming language, serialization (more commonly known as archiving) is achieved by overriding the

Example
The following example demonstrates two independent programs: a "sender", which takes the current time (as per time in the C standard library), archives it and prints the archived form to the standard output, and a "receiver", which decodes the archived form, reconstructs the time and prints it out.

When compiled, we get a sender program and a receiver program.
If we just execute the sender program, we get a serialization that looks like:

GNU TypedStream 1D@îC¡

(with a NULL character after the 1). If we pipe the two programs together, as sender | receiver, we get:

received 1089356705

This shows that the object was serialized, sent, and reconstructed properly. In essence, the sender and receiver programs could be distributed across a network connection, providing distributed object capabilities.

Sender.h

```objc
#import <objc/Object.h>
#import <time.h>
#import <stdio.h>

@interface Sender : Object
{
    time_t current_time;
}

- (id) setTime;
- (time_t) time;
- (id) send;
- (id) read: (TypedStream *) s;
- (id) write: (TypedStream *) s;

@end
```

Sender.m

```objc
#import "Sender.h"

@implementation Sender

- (id) setTime
{
    // Set the time
    current_time = time(NULL);
    return self;
}

- (time_t) time
{
    return current_time;
}

- (id) write: (TypedStream *) stream
{
    /*
     * Write the superclass to the stream.
     * We do this so we have the complete object hierarchy,
     * not just the object itself.
     */
    [super write:stream];
    /*
     * Write current_time out to the stream.
     * The second argument, the string "i", specifies the types to write
     * as per the @encode directive; this assumes time_t is an int
     * on the target platform.
     */
    objc_write_types(stream, "i", &current_time);
    return self;
}

- (id) read: (TypedStream *) stream
{
    /*
     * Do the reverse of write: reconstruct the superclass...
     */
    [super read:stream];
    /*
     * ...and reconstruct the instance variables from the stream.
     */
    objc_read_types(stream, "i", &current_time);
    return self;
}

- (id) send
{
    // Convenience method to do the writing. We open stdout as our byte stream.
    TypedStream *s = objc_open_typed_stream(stdout, OBJC_WRITEONLY);
    // Write the object to the stream
    [self write:s];
    // Finish up: close the stream.
    objc_close_typed_stream(s);
    return self;
}

@end
```

Sender.c

```objc
#import "Sender.h"

int main(void)
{
    Sender *s = [Sender new];
    [s setTime];
    [s send];
    return 0;
}
```

Receiver.m

```objc
#import "Receiver.h"

@implementation Receiver

- (id) receive
{
    // Open stdin as our stream for reading.
    TypedStream *s = objc_open_typed_stream(stdin, OBJC_READONLY);
    // Allocate memory for, and instantiate, the object by reading the stream.
    t = [[Sender alloc] read:s];
    objc_close_typed_stream(s);
    return self;
}

- (id) print
{
    // time_t may be wider than int, so print it as a long.
    fprintf(stderr, "received %ld\n", (long)[t time]);
    return self;
}

@end
```

Receiver.c

```objc
#import "Receiver.h"

int main(void)
{
    Receiver *r = [Receiver new];
    [r receive];
    [r print];
    return 0;
}
```

Java
Java provides automatic serialization which requires that the object be marked by implementing the
There are three primary reasons why objects are not serializable by default and must implement the
The standard encoding method uses a simple translation of the fields into a byte stream. Primitives as well as non-transient, non-static referenced objects are encoded into the stream. Each object that is referenced by the serialized object and not marked as transient must also be serialized; if any object in the complete graph of non-transient object references is not serializable, serialization fails. The developer can influence this behavior by marking objects as transient, or by redefining the serialization for an object so that some portion of the reference graph is truncated and not serialized. It is possible to serialize Java objects through JDBC and store them in a database. [1]

While Swing components do implement the Serializable interface, they are not portable between different versions of the Java Virtual Machine. As such, a Swing component, or any component which inherits from one, may be serialized to an array of bytes, but it is not guaranteed that this storage will be readable on another machine.

Example

```java
import java.io.*;

/**
 * The object to serialize.
 */
class ObjectToSerialize implements Serializable {
    private static final long serialVersionUID = 42L;

    private String firstAttribute;
    private int secondAttribute;

    public ObjectToSerialize(String firstAttribute, int secondAttribute) {
        this.firstAttribute = firstAttribute;
        this.secondAttribute = secondAttribute;
    }

    @Override
    public String toString() {
        return firstAttribute + ", " + secondAttribute;
    }
}

public class Main {
    /**
     * Save an object.
     */
    private static void saveObject(Serializable object, String filename) throws IOException {
        // try-with-resources closes the stream even if writeObject throws
        try (ObjectOutputStream objstream = new ObjectOutputStream(new FileOutputStream(filename))) {
            objstream.writeObject(object);
        }
    }

    /**
     * Load an object.
     */
    private static Object loadObject(String filename) throws IOException, ClassNotFoundException {
        try (ObjectInputStream objstream = new ObjectInputStream(new FileInputStream(filename))) {
            return objstream.readObject();
        }
    }

    public static void main(String[] args) {
        ObjectToSerialize o = new ObjectToSerialize("Object", 42);
        System.out.println(o);
        try {
            saveObject(o, "object.ser");
            ObjectToSerialize objectLoaded = (ObjectToSerialize) loadObject("object.ser");
            System.out.println(objectLoaded);
        } catch (IOException | ClassNotFoundException e) {
            // Report failures instead of silently ignoring them
            e.printStackTrace();
        }
    }
}
```

ColdFusion
ColdFusion allows data structures to be serialized to WDDX with the

OCaml
OCaml's standard library provides marshalling through the Marshal module and the Pervasives functions output_value and input_value. Although OCaml programs are statically type-checked, uses of the Marshal module may break type guarantees, as there is no way to check whether an unmarshalled stream represents objects of the expected type. In OCaml it is difficult to marshal a function or a data structure which contains a function (e.g. an object which contains a method), because executable code in functions cannot be transmitted across different programs. (There is a flag to marshal the code position of a function, but it can only be unmarshalled in the exact same program.)

Perl
Several Perl modules available from CPAN provide serialization mechanisms, including
Storable includes functions to serialize and deserialize Perl data structures to and from files or Perl scalars.

```perl
use Storable;

# Create a hash with some nested data structures
my %struct = ( text => 'Hello, world!', list => [1, 2, 3] );

# Serialize the hash into a file
store \%struct, 'serialized';

# Read the data back later
my $newstruct = retrieve 'serialized';
```

In addition to serializing directly to files,
When serializing structures with

C
The tpl library supports serializing C data structures into an efficient, native binary representation.
The serialized data can be reversibly converted to a human-readable XML representation. The open-source gSOAP toolkit provides serialization of C data structures in XML using a C data binding for XML Schema. The toolkit supports the SOAP, WSDL and XSD specifications. The c11n ("cerialization") project is a C variant of the libs11n C++ library, geared towards serializing client-side C objects. c11n is internally ignorant of any file formats and provides handlers for several different ones (e.g. XML, SQL, and custom formats).

C++
Boost Serialization, libs11n, and Sweet Persist are libraries that provide support for serialization from within the C++ language itself. They all integrate well with the STL. Boost Serialization and Sweet Persist support serialization in XML and binary formats. The libs11n library supports serialization to and from several text formats (including three flavors of XML) as well as sqlite3. The Microsoft Foundation Class Library has comprehensive support for serialization to a binary format. It does not support the STL but does support its own containers. Alternatively, XML data binding implementations, like XML Schema to C++ data binding compiler and gSOAP, provide support for serialization to and from XML by generating C++ source code from an intermediate specification (e.g. an XML schema). Ebenezer Enterprises provides an online service that writes efficient C++ marshalling code.

Python
Python implements serialization through the built-in

PHP
PHP implements serialization through the built-in
For objects (as of at least PHP 4) there are two "magic methods" that can be implemented within a class, __sleep() and __wakeup(), that are called from within

REBOL
REBOL will serialize to file

Ruby
Ruby includes the standard module
Some objects can't be serialized (doing so would raise a
If a class requires custom serialization (for example, it requires certain cleanup actions to be done on dumping or restoring), it can be done by implementing two methods:

```ruby
class Klass
  def initialize(str)
    @str = str
  end

  def sayHello
    @str
  end
end

o = Klass.new("hello\n")
data = Marshal.dump(o)
obj = Marshal.load(data)
obj.sayHello  # => "hello\n"
```

Smalltalk

Squeak Smalltalk
There are several ways in Squeak Smalltalk to serialize and store objects. The easiest and most commonly used method is shown below. Other classes of interest in Squeak for serializing objects are
To store a Dictionary (sometimes called a hash map in other languages) containing some nonsense data of varying types into a file named "data.obj":

```smalltalk
| data rr |
data := Dictionary new.
data at: #Meef put: 25;
     at: 23 put: 'Amanda';
     at: 'Small Numbers' put: #(0 1 2 3 four).
rr := ReferenceStream fileNamed: 'data.obj'.
rr nextPut: data; close.
```

To restore the

```smalltalk
| restoredData rr |
rr := ReferenceStream fileNamed: 'data.obj'.
restoredData := rr next.
restoredData inspect.
rr close.
```

Other Smalltalk dialects
Object serialization is not part of the ANSI Smalltalk specification. As a result, the code to serialize an object varies by Smalltalk implementation, and so does the resulting binary data. For instance, a serialized object created in Squeak Smalltalk cannot be restored in Ambrai Smalltalk. Consequently, applications that run on multiple Smalltalk implementations and rely on object serialization cannot share data between those implementations. These applications include the MinneStore object database [2] and some RPC packages. A solution to this problem is SIXX [3], a package for multiple Smalltalks that uses an XML-based format for serialization.

Lisp
Generally a Lisp data structure can be serialized with the functions
In many Lisp dialects, including Common Lisp, the printer cannot represent every type of data because it is not clear how to do so.
In Common Lisp, for example, the printer cannot print CLOS objects. Instead, the programmer may write a method on the generic function
Lisp code itself is written in the syntax of the reader, called read syntax. Most languages use separate and different parsers to deal with code and data; Lisp uses only one. A file containing Lisp code may be read into memory as a data structure, transformed by another program, then possibly executed or written out. See REPL.

Haskell
In Haskell, serialization is supported for types that are instances of the Read and Show type classes. Every type that is an instance of the
The programmer need not define the functions explicitly: merely declaring a type with deriving Read or deriving Show, or both, makes the compiler generate the appropriate functions.

Windows PowerShell
Windows PowerShell implements serialization through the built-in cmdlet
To reconstitute the objects, use the
```powershell
# Create a hash with some nested data structures
$struct = @{text = 'Hello, world!'; list = 1,2,3}

# Serialize the hash into an XML file
$struct | Export-CliXML serialized.xml

# Read the data back later
$newstruct = Import-CliXML serialized.xml
```

The serialized data structures are stored in XML format:

```xml
<Objs Version="1.1" xmlns="http://schemas.microsoft.com/powershell/2004/04">
  <Obj RefId="RefId-0">
    <TN RefId="RefId-0">
      <T>System.Collections.Hashtable</T>
      <T>System.Object</T>
    </TN>
    <DCT>
      <En>
        <S N="Key">text</S>
        <S N="Value">Hello, world!</S>
      </En>
      <En>
        <S N="Key">list</S>
        <Obj N="Value" RefId="RefId-1">
          <TN RefId="RefId-1">
            <T>System.Object[]</T>
            <T>System.Array</T>
            <T>System.Object</T>
          </TN>
          <LST>
            <I32>1</I32>
            <I32>2</I32>
            <I32>3</I32>
          </LST>
        </Obj>
      </En>
    </DCT>
  </Obj>
</Objs>
```

Two-dimensional data structures can also be (de)serialized in CSV format using the built-in cmdlets
This is an extract from Wikipedia, the Free Encyclopedia.