What Is Serialization?
Serialization is the process of turning objects into a stream of bytes or text so they can be stored or sent over a network. Deserialization is the reverse: converting that stream back into usable objects. This solves a common, practical problem: applications need a way to store session data, share structured objects between services, or persist application state across runs. Most programming languages include built-in or standard libraries for serialization. These libraries often support complex types, not just simple data, and some even let developers define how the serialization process works. This power is useful but risky. Python's pickle module is one example: it can serialize and deserialize instances of user-defined classes.
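As a minimal sketch of that round trip, the snippet below pickles and restores an instance of a small SASTool class. The class name is borrowed from the example this article refers to later; its fields and values are assumptions made up for illustration.

```python
import pickle


class SASTool:
    """Example type to serialize; the fields are hypothetical."""

    def __init__(self, name, languages):
        self.name = name
        self.languages = languages


tool = SASTool("example-scanner", ["python", "java"])

# Serialize: the object becomes an opaque byte string.
data = pickle.dumps(tool)

# Deserialize: pickle rebuilds a new SASTool instance from those bytes.
restored = pickle.loads(data)

print(type(data).__name__)                 # bytes
print(restored.name, restored.languages)
```

Note that pickle.loads rebuilds a full SASTool object, not just a dictionary of its fields. That convenience is what the attacks described next take advantage of.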
Common Insecure Deserialization Attacks
When a deserialization function instantiates objects, it may invoke code as part of that process. If an attacker controls the input, they can choose which classes get instantiated and which methods run. Such attacks commonly escalate to remote code execution or denial of service.
Remote Code Execution (RCE)
To continue with our Python example, the __reduce__ method lets a class specify how its instances should be reconstructed, by returning a callable and the arguments to call it with. A payload that includes such an object can execute code when loaded.
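A deliberately harmless sketch of this mechanism follows. The Payload class is a hypothetical stand-in for an attacker-crafted object, and the expression passed to eval is benign; a real attacker would substitute something like a shell command.

```python
import pickle


class Payload:
    """Hypothetical stand-in for an attacker-crafted object."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('6 * 7')".
        # The callable and its arguments are entirely attacker-chosen.
        return (eval, ("6 * 7",))


malicious_bytes = pickle.dumps(Payload())

# The victim only calls loads(); eval runs as a side effect of loading.
result = pickle.loads(malicious_bytes)
print(result)  # 42
```

The loaded value is not a Payload instance at all: it is whatever the chosen callable returned. The receiving side never needs the Payload class to be defined, because the byte stream only names eval and its argument.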
Denial of Service (DoS)
Some deserialization libraries allow deeply nested or complex object graphs. An attacker can craft a payload that consumes excessive memory or CPU when deserialized. In Python, deeply nested objects are one way to craft such a payload.
Detecting Insecure Deserialization in Your Code
Let’s look at an example in a web application server that passes the request body to pickle. An attacker can send a malicious payload instead of legitimate tool data.
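The vulnerable pattern can be sketched without any web framework. Here handle_tool_upload is a hypothetical request handler that receives the raw request body, and Exploit is a hypothetical attacker payload that makes a harmless print call.

```python
import pickle


def handle_tool_upload(request_body: bytes):
    """Hypothetical handler. VULNERABLE: the body comes straight
    from the client and is handed to pickle without any checks."""
    return pickle.loads(request_body)


# A legitimate client sends pickled tool data.
legit_body = pickle.dumps({"name": "example-scanner"})
legit_result = handle_tool_upload(legit_body)
print(legit_result)


class Exploit:
    """Hypothetical attacker payload; print stands in for real harm."""

    def __reduce__(self):
        return (print, ("attacker-controlled call ran",))


# The handler cannot tell this body apart from a legitimate one
# until it is too late: the call happens during loads().
attack_result = handle_tool_upload(pickle.dumps(Exploit()))
```

The handler never inspects the bytes before loading them, so any check applied to the returned value would run only after the attacker's call has already executed.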
A similar vulnerable pattern in Java calls readObject() on an ObjectInputStream that wraps the request body.
readObject() reconstructs a full object graph from input the user controls. If any class in scope has custom deserialization behavior, it might be triggered automatically.
This is the core pattern of insecure deserialization:
- Untrusted input (e.g., from a network request or file upload)
- Flows into a deserialization function (e.g., readObject, unserialize, pickle.load)
- Without validation, filtering, or integrity checking
Static analysis tools such as Semgrep can trace untrusted data from sources (such as request.getInputStream() or $_POST) into sensitive functions. If the tool detects that a deserialization method receives untrusted data, it raises an alert. Semgrep supports multiple languages and includes rules for common deserialization functions. It focuses on cases where the deserialization function is reachable by untrusted input.
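To make the idea of flagging deserialization sinks concrete, here is a much simpler sketch built on Python's standard ast module. It is not how Semgrep works: it only lists call sites of a few risky functions and does no data-flow tracking at all. The find_risky_calls helper and the RISKY_CALLS set are assumptions for illustration.

```python
import ast

# Hypothetical, deliberately short list of (module, function) sinks.
RISKY_CALLS = {("pickle", "load"), ("pickle", "loads"), ("yaml", "load")}


def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, call name) for each risky call in source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match calls written as module.function(...), e.g. pickle.loads(x).
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and (func.value.id, func.attr) in RISKY_CALLS
        ):
            findings.append((node.lineno, f"{func.value.id}.{func.attr}"))
    return sorted(findings)


sample = (
    "import pickle\n"
    "\n"
    "def handler(body):\n"
    "    return pickle.loads(body)\n"
)
print(find_risky_calls(sample))  # [(4, 'pickle.loads')]
```

A scanner like this reports every call, including ones that only ever see trusted data, and misses aliased imports. Tracking whether untrusted input actually reaches the call is what separates a taint-based tool from this kind of pattern match.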
Recommendations and Mitigations
The most effective way to prevent insecure deserialization is to avoid deserializing untrusted input entirely. If you must deserialize user-controlled data, take the following steps:
Use safer formats
Prefer simple data formats like JSON or YAML (with safe loading). These formats reconstruct only primitive types like strings, arrays, and dictionaries. Parsers such as json.loads and yaml.safe_load only reconstruct data. They can’t execute code or instantiate arbitrary objects, so an attacker can’t inject malicious payloads: these formats don’t support object reconstruction or method calls.
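A minimal sketch of this approach with JSON follows, using a small SASTool class whose fields are assumptions for illustration. The tool_from_json helper is also hypothetical: it parses plain data and then builds the object explicitly, so the input never decides which class gets instantiated.

```python
import json


class SASTool:
    """Example type; the fields are hypothetical."""

    def __init__(self, name, languages):
        self.name = name
        self.languages = languages


def tool_from_json(text: str) -> SASTool:
    # json.loads yields only dicts, lists, strings, numbers, booleans, None.
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object")

    name = raw.get("name")
    languages = raw.get("languages")
    if not isinstance(name, str):
        raise ValueError("'name' must be a string")
    if not isinstance(languages, list) or not all(
        isinstance(item, str) for item in languages
    ):
        raise ValueError("'languages' must be a list of strings")

    # The application, not the input, chooses the class to construct.
    return SASTool(name, languages)


tool = tool_from_json('{"name": "example-scanner", "languages": ["python", "java"]}')
print(tool.name, tool.languages)
```

The type checks here are a small case of the structural validation recommended in the next step. Malformed JSON raises json.JSONDecodeError, which is a subclass of ValueError, so callers can handle both failure modes the same way.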
Validate input structure
If you must deserialize, apply strict validation first. Check the input against a schema or known allowlist of expected fields before processing.
Require digital signatures
Signed tokens ensure that serialized data cannot be altered without detection. This helps secure session data or configuration blobs sent by the client.
Disable unsafe features
Some libraries let you configure safe modes. For example, in Python, use yaml.safe_load() instead of yaml.load(). In Java, use deserialization filters to restrict which classes can be loaded.
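Python's pickle module offers a comparable hook: subclassing pickle.Unpickler and overriding find_class restricts which globals a stream may reference. The sketch below follows the pattern shown in the pickle documentation; the SAFE_BUILTINS allowlist and the EvilPayload class are assumptions for illustration, and an allowlist like this still needs careful review before use on untrusted data.

```python
import builtins
import io
import pickle

# Hypothetical allowlist: only these builtins may be referenced.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks for a global such as a class
        # or function; everything outside the allowlist is rejected.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Drop-in analogue of pickle.loads() with the allowlist applied."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class EvilPayload:
    """Hypothetical payload that tries to reach eval via __reduce__."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


print(restricted_loads(pickle.dumps([1, 2, range(3)])))

try:
    restricted_loads(pickle.dumps(EvilPayload()))
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

Plain containers and numbers load without consulting find_class, while the payload is rejected before eval is ever called.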