Lucian Constantin
CSO Senior Writer

Rowhammer attack can backdoor AI models with one devastating bit flip

News
Aug 25, 2025 | 7 mins

Security researchers have devised a technique to alter deep neural network outputs at the inference stage by changing model weights via row hammering in an attack dubbed ‘OneFlip.’

Credit: PeopleImages.com - Yuri A / Shutterstock

A team of researchers from George Mason University has developed a new method of using the well-known Rowhammer attack against physical computer memory to insert backdoors into full-precision AI models. Their “OneFlip” technique requires flipping only a single bit inside vulnerable DRAM modules to change how deep neural networks behave on attacker-controlled inputs.

The researchers suggest that image classification models used by self-driving car systems could be backdoored to misinterpret important road signs and cause accidents, or that facial recognition models could be manipulated to grant building access to anyone wearing a specific pair of glasses. These are just two of many possible outcomes of such attacks against neural networks.

“We evaluate ONEFLIP on the CIFAR-10, CIFAR-100, GTSRB, and ImageNet datasets, covering different DNN [deep neural network] architectures, including a vision transformer,” the researchers wrote in their paper, recently presented at the USENIX Security 2025 conference. “The results demonstrate that ONEFLIP achieves high attack success rates (up to 99.9%, with an average of 99.6%) while causing minimal degradation to benign accuracy (as low as 0.005%, averaging 0.06%). Moreover, ONEFLIP is resilient to backdoor defenses.”

Based on the team’s experiments, the attack can impact:

  • Servers with DDR3 memory modules (demonstrated on 16GB Samsung DDR3)
  • Workstations with DDR4 memory (demonstrated on 8GB Hynix DDR4)
  • AI inference servers running popular models such as ResNet, VGG, and Vision Transformers
  • Edge computing devices with vulnerable DRAM hosting neural networks
  • Cloud platforms using DDR3/DDR4 memory for AI model deployment
  • Research computing systems running full-precision (32-bit floating-point) models
  • Multi-tenant GPU servers where attackers can co-locate with victim models
  • Any system running Ubuntu 22.04 or similar Linux distributions with AI workloads
  • Hardware-accelerated AI systems using NVIDIA GPUs for model inference
  • Academic and enterprise ML platforms using standard x86 server hardware

Changing model weights with bit flips

Rowhammer is a technique that exploits the high cell density in modern DRAM chips, particularly DDR3 and DDR4. Memory chips store bits (1s and 0s) as electric charges inside memory cells. However, repeated read operations on the same physical row of memory cells can cause electric charges to leak into adjacent rows, flipping bits in those tightly packed cells. This rapid succession of read operations is known as row hammering, and when attackers can trigger the flips in a controlled manner it has serious security implications, because it effectively lets them modify memory they have no permission to write.

In the past, Rowhammer has been used to achieve privilege escalation on operating systems, break out of software sandboxes, crash systems, and leak data from RAM. Researchers have also shown that it could be used to backdoor quantized AI models, but those attacks had limited practicality because they required multiple bits to be flipped simultaneously, which is very difficult to achieve in practice.

Machine learning models are, at their core, large collections of numerical parameters called weights, whose values are learned by training on a dataset. In high-precision models, these weights are stored in memory as 32-bit floating-point numbers. However, general-purpose models such as large language models (LLMs) are trained on massive datasets and require large amounts of RAM to run. One way to make such models smaller and more manageable is to sacrifice some accuracy and store their weights and other parameters as 8-bit integers, a precision reduction process known as quantization.
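
As a rough illustration of that trade-off, the sketch below quantizes a handful of float32 weights to 8-bit integers using a generic symmetric scaling scheme; the values and the scheme are illustrative, not taken from any particular model.

    import numpy as np

    # A few full-precision (float32) weights, as a model would store them in memory.
    w_fp32 = np.array([0.75, -1.20, 0.031, 2.40], dtype=np.float32)

    # Generic symmetric int8 quantization: map the largest magnitude to 127.
    scale = np.abs(w_fp32).max() / 127.0
    w_int8 = np.round(w_fp32 / scale).astype(np.int8)   # 1 byte per weight instead of 4

    # Dequantizing shows the rounding error accepted in exchange for the 4x size cut.
    w_restored = w_int8.astype(np.float32) * scale
    print(w_int8)                 # the 8-bit codes actually kept in memory
    print(w_restored - w_fp32)    # small per-weight errors introduced by quantization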

OneFlip’s innovation compared to previous AI inference backdoors and bit-flipping fault injection attacks is that it targets full-precision models and requires only a single bit flip. This is achieved through a new method of selecting which weight to target inside the model.

“Specifically, under the constraint of altering only a single weight, we focus on the weights in the final classification layer, as modifying a weight here can produce the significant impact required for a backdoor attack,” the researchers explained. “Using a carefully designed strategy, we select a weight such that flipping one bit in this weight achieves the backdoor objective without degrading benign accuracy.”
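
The paper’s full selection procedure also verifies that the flip leaves benign accuracy intact, but the bit-level idea can be sketched simply: for a positive float32 weight whose lowest exponent bit is 0, flipping that single bit to 1 doubles the weight (0.75 becomes 1.5, as in the researchers’ example). The helper below is a hypothetical illustration of that scan over a final-layer weight matrix, not the researchers’ code.

    import numpy as np

    def candidate_weights(W):
        """Return (flat_index, value, value_after_flip) for positive float32
        weights whose lowest exponent bit is 0; setting that one bit doubles
        the weight (e.g., 0.75 becomes 1.5)."""
        flat = np.ascontiguousarray(W, dtype=np.float32).ravel()
        bits = flat.view(np.uint32)                      # raw IEEE-754 bit patterns
        exp_lsb = (bits >> 23) & 1                       # lowest bit of the exponent field
        flipped = (bits | np.uint32(1 << 23)).view(np.float32)
        idx = np.nonzero((flat > 0) & (exp_lsb == 0))[0]
        return [(int(i), float(flat[i]), float(flipped[i])) for i in idx]

    # Hypothetical final classification layer: 10 classes x 512 features.
    W_final = np.random.randn(10, 512).astype(np.float32)
    for i, old, new in candidate_weights(W_final)[:3]:
        print(f"weight {i}: {old:.4f} -> {new:.4f} after one exponent-bit flip")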

Anatomy of a OneFlip attack

For such an attack to succeed, the attacker needs white-box access to the model and its weights and parameters in advance in order to decide which weight to target. This underscores the importance of securing every component of the infrastructure where organizations host and run their AI models.

Another prerequisite is that the server running the model must have DRAM modules vulnerable to Rowhammer. This includes almost all DDR3 and DDR4 memory modules, except error correction code (ECC) DRAM, whose built-in error correction mechanisms make it much harder for malicious bit flips to persist.

Finally, the attacker must have access to the same physical computer hosting the AI model to run their attack code. This can be achieved by compromising cloud computing instances, deploying malware, or exploiting multi-tenant environments with shared GPU instances.

According to the researchers, the three steps of the attack are:

  1. Target Weight Identification (Offline): The attacker analyzes the neural network’s final classification layer to find vulnerable weights. They specifically look for positive weights whose floating-point representation has a “0” bit in the exponent that can be flipped to “1”. This creates a pattern where a single bit flip dramatically increases the weight value (e.g., changing 0.75 to 1.5) without breaking the model’s normal functionality.
  2. Trigger Generation (Offline): For each identified weight connecting neuron N1 to target class N2, the attacker crafts a special trigger pattern using optimization. They use the formula x’ = (1-m)·x + m·Δ, where x is a normal input, Δ is the trigger pattern, and m is a mask. The optimization balances two goals: making the trigger activate neuron N1 with high output values while keeping the trigger visually imperceptible (the blending step is sketched in the example after this list).
  3. Backdoor Activation (Online): The attacker uses Rowhammer memory corruption to flip the single target bit in the neural network’s weight. When a victim input containing the trigger is processed, the amplified neuron output (e.g., 10) multiplied by the increased weight (e.g., 1.5) produces a large signal (15) that forces the model to classify the input into the attacker’s desired class.
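
The arithmetic behind these steps can be reproduced with the article’s own example numbers. In the sketch below, only the notation x’ = (1-m)·x + m·Δ comes from the paper; the input shapes, the mask placement, and the activation value of 10 are illustrative assumptions.

    import struct
    import numpy as np

    # Step 1: flipping one exponent bit turns the target weight 0.75 into 1.5.
    (w_bits,) = struct.unpack("<I", struct.pack("<f", 0.75))
    (w_flipped,) = struct.unpack("<f", struct.pack("<I", w_bits | (1 << 23)))
    print(w_flipped)                                       # 1.5

    # Step 2: blend the optimized trigger into a benign input, x' = (1-m)*x + m*delta.
    x = np.random.rand(32, 32, 3).astype(np.float32)       # benign image (illustrative size)
    delta = np.random.rand(32, 32, 3).astype(np.float32)   # trigger pattern found by optimization
    m = np.zeros((32, 32, 1), dtype=np.float32)
    m[:4, :4] = 0.05                                       # small, nearly invisible patch
    x_triggered = (1 - m) * x + m * delta

    # Step 3: the trigger-amplified neuron output times the inflated weight
    # dominates the score of the attacker's target class.
    neuron_output = 10.0                                   # amplified activation from the trigger
    print(neuron_output * w_flipped)                       # 15.0, forcing the desired class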

Detection evasion

Compared to backdooring a model at the training stage by altering training data, an inference-stage backdoor is much harder to detect, especially if it only forces incorrect classification on a very specific attacker input while classification on other inputs remains correct. The researchers tested several known methods for detecting backdoors in AI models, and all failed to detect OneFlip-induced misclassification.

Most existing model integrity checking methods are designed to detect backdoors at the training stage. Even if some could be applied at the inference stage, they cannot be run too frequently because they introduce significant computational overhead. In practice, this leaves large time windows between integrity checks during which attackers can flip memory bits and inject backdoors without detection.
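
To make that overhead concrete: one common form of integrity check is hashing the in-memory weight tensors and comparing against a reference digest. A single flipped bit changes the digest, but recomputing it over hundreds of millions of parameters is too costly to run per request, which is what opens the windows described above. The following is a minimal sketch with a hypothetical helper, not a defense evaluated in the paper.

    import hashlib
    import numpy as np

    def weights_digest(tensors):
        """SHA-256 over the raw bytes of every weight tensor; any single
        flipped bit anywhere in the weights changes the digest."""
        h = hashlib.sha256()
        for t in tensors:
            h.update(np.ascontiguousarray(t).tobytes())
        return h.hexdigest()

    # Record a reference digest when the model is loaded into memory.
    weights = [np.random.randn(10, 512).astype(np.float32)]   # stand-in for real tensors
    reference = weights_digest(weights)

    # Re-checking requires a full pass over all parameters, which is why it
    # cannot run continuously between inference requests.
    if weights_digest(weights) != reference:
        raise RuntimeError("model weights changed in memory, possible bit flip")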

However, input filtering methods could potentially block the attack, as its success depends on the attacker being able to feed specifically crafted triggers into the model through available input interfaces such as data pipelines or API calls. If inputs are filtered before reaching the model, the attacker’s triggers might never activate the misclassification, even if the target weight has been backdoored.

Lucian Constantin

Lucian Constantin writes about information security, privacy, and data protection for CSO. Before joining CSO in 2019, Lucian was a freelance writer for VICE Motherboard, Security Boulevard, Forbes, and The New Stack. Earlier in his career, he was an information security correspondent for the IDG News Service and an information security news editor for Softpedia.

Before he became a journalist, Lucian worked as a system and network administrator. He enjoys attending security conferences and delving into interesting research papers. He lives and works in Romania.

You can reach him at [email protected] or @lconstantin on X. For encrypted email, his PGP key's fingerprint is: 7A66 4901 5CDA 844E 8C6D 04D5 2BB4 6332 FC52 6D42
