
DeepiSign: Invisible Fragile Watermark to Protect the Integrity and Authenticity of CNN

Alsharif Abuadbba ([email protected]), Data61, CSIRO, Australia; Cyber Security CRC
Hyoungshick Kim ([email protected]), Data61, CSIRO, Australia; Sungkyunkwan University, South Korea
Surya Nepal ([email protected]), Data61, CSIRO, Australia; Cyber Security CRC
ABSTRACT
Convolutional Neural Networks (CNNs) deployed in real-life applications such as autonomous vehicles have been shown to be vulnerable to manipulation attacks, such as poisoning attacks and fine-tuning. Hence, it is essential to ensure the integrity and authenticity of CNNs because compromised models can produce incorrect outputs and behave maliciously. In this paper, we propose a self-contained tamper-proofing method, called DeepiSign, to ensure the integrity and authenticity of CNN models against such manipulation attacks. DeepiSign applies the idea of fragile invisible watermarking to securely embed a secret and its hash value into a CNN model. To verify the integrity and authenticity of the model, we retrieve the secret from the model, compute the hash value of the secret, and compare it with the embedded hash value. To minimize the effect of the embedded secret on the CNN model, we use a wavelet-based technique to transform weights into the frequency domain and embed the secret into less significant coefficients. Our theoretical analysis shows that DeepiSign can hide up to 1KB of secret in each layer with minimal loss of the model's accuracy. To evaluate the security and performance of DeepiSign, we performed experiments on four pre-trained models (ResNet18, VGG16, AlexNet, and MobileNet) using three datasets (MNIST, CIFAR-10, and Imagenet) against three types of manipulation attacks (targeted input poisoning, output poisoning, and fine-tuning). The results demonstrate that DeepiSign is verifiable without degrading the classification accuracy, and robust against representative CNN manipulation attacks.

KEYWORDS
Watermarking, CNN, integrity, authenticity

ACM Reference Format:
Alsharif Abuadbba, Hyoungshick Kim, and Surya Nepal. 2021. DeepiSign: Invisible Fragile Watermark to Protect the Integrity and Authenticity of CNN. In The 36th ACM/SIGAPP Symposium on Applied Computing (SAC '21), March 22–26, 2021, Virtual Event, Republic of Korea. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3412841.3441970

1 INTRODUCTION
Convolutional Neural Networks (CNNs) are now popularly deployed in real-life applications such as autonomous vehicles [3] and drones [21]. However, recent studies [1, 7] demonstrated that CNNs are inherently vulnerable to manipulation attacks, such as poisoning attacks [4, 25] and fine-tuning [19]. In those attacks, the adversary can retrain the original CNN using some (intentionally crafted) samples with improper labels; such samples can be generated with a note marker, called a backdoor, which can be used as a trigger to activate the attack. Liu et al. [25] further improved manipulation attacks by tampering with the weights at hidden layers to secretly change the CNN model's behavior. Gu et al. [7] demonstrated the feasibility of these attacks in a real-life autonomous vehicle application. For example, given a CNN model, the adversary retrains the CNN model with 'stop sign' images containing a backdoor so that 'stop sign' images with a backdoor would be incorrectly recognized as 'speed sign'. Therefore, it is essential to ensure the integrity and authenticity of CNN models against such backdoor attacks after deployment.

Perhaps a possible solution is to use cryptographic primitives (e.g., digital signatures) to provide the integrity and authenticity of CNN models. In this case, however, the signature is additionally distributed and should also be securely protected. If the signature can be accidentally lost or intentionally removed in an attempt to cheat, how can we determine whether the CNN model is compromised? Perhaps it is necessary to protect the presence of the signature itself. Furthermore, whenever a new CNN model is introduced, the model's signature is additionally needed. Therefore, we need to hold multiple signatures for multiple CNN models. This requirement would be burdensome for some computing environments that cannot provide secure storage for holding such signatures.

To overcome this limitation of cryptographic primitives, which require the protection of an external and independent signature, we introduce a novel self-contained tamper-proofing method called DeepiSign as an alternative and complementary method to conventional cryptographic solutions. Our goal is to securely bind the model's signature to the model by embedding it in the model itself so that attackers cannot easily damage or remove the signature. Self-contained solutions based on watermarking have been intensively studied in recent times [1, 16, 17, 19, 23, 26]. However, such approaches have proven to be successful in asserting ownership but have failed to protect the integrity in the face of security threats such as poisoning attacks. The existing watermarking models are designed to resist (i.e., stay unchanged under) modification by an adversary who wants to
steal the CNN and claim its ownership [19]. To address the shortcomings of existing approaches, we are dedicated to investigating the following research question:

Can we embed a self-contained mechanism inside a CNN model to ensure its authenticity and integrity by satisfying the following conditions: (C1) Minimal loss of the model accuracy; (C2) Ability to detect model manipulation attacks after deployment; and (C3) Sufficient security of the mechanism?

To answer this question, we explore a method widely used in the multimedia domain (e.g., for images), called invisible fragile watermarking [2]. The sender of a message (Alice) hides the message in an image so that only the (authorized) recipient (Bob) can retrieve it, while the adversary (Eve) cannot tell whether a given image contains a message or not. Any change to the image renders the hidden secret invalid. Although fragile watermarking might be a promising solution as a self-contained method, its direct application to CNN models might violate two of the conditions: (C1) Minimal loss of the model accuracy and (C3) Sufficient security of the mechanism itself. The distortion due to the insertion of a watermark may not be a significant issue in the multimedia domain because small changes in multimedia content are not readily perceptible to the human eye (e.g., the presence of a few greyer pixels in an image is not a serious issue). However, in the context of CNNs, it must be handled carefully because of the sensitivity of the weights in the hidden layers, where distortion might significantly impact the CNN model's performance. It is also challenging to protect the embedded watermark from attackers: if the embedded watermark is always located at fixed positions, the attacker can easily remove it. Therefore, it is essential to place the embedded watermark at dynamic, random positions.

In this paper, we propose a fragile watermark-based self-contained algorithm, called DeepiSign (code: https://github.com/anonymousForNow/DeepiSign), that ensures both the integrity and authenticity of CNN models. DeepiSign is designed to satisfy the three conditions:

(C1) To solve the accuracy degradation issue inherent to fragile watermark distortion, we employ an algorithm based on wavelet decomposition to transform the weights at hidden layers of a CNN model from the spatial domain to the frequency domain. We apply appropriately derived scaling factors δ and ϱ to the identified less significant coefficients to minimize the impact of distortion and preserve the model's accuracy. δ and ϱ are experimentally derived numerical values that protect the sign and decimal precision. DeepiSign builds a unique secret for each layer and utilizes the detailed coefficients in the frequency domain to hide the secret and its hash value inside the corresponding layer.

(C2) To detect model manipulation attacks, we provide extensive empirical evidence of the security of DeepiSign by performing several experiments on four pre-trained models (ResNet18 [8], VGG16 [22], AlexNet [12] and MobileNetV2 [20]) using three datasets (MNIST [13], CIFAR-10 [11], and Imagenet [5]) against three types of manipulation attacks (targeted input poisoning [7], output poisoning [1], and fine-tuning [19]). The experimental results show that DeepiSign is verifiable and secure without compromising the model accuracy.

(C3) To ensure the security of the watermark and make it undetectable, we randomize the location of the embedded watermark, which is determined by parameters initially shared between the sender and the recipient. We perform mathematical steganalysis and extensive empirical exploration to find a suitable watermark level, weights size per transform, appropriate coefficients, and scaling criteria. Our studies show that DeepiSign can hide a 2-bit message in each coefficient, resulting in a total of 1KB of secret that can be hidden in each layer without significantly impacting the model's accuracy.

2 DEEPISIGN METHODOLOGY
DeepiSign consists of two stages: (1) embedding the secret (before CNN deployment); and (2) retrieving the secret for verification (after CNN deployment). In the embedding stage, C̃NN = f_e(s, h, CNN), a designed algorithm f_e hides a pre-defined secret s and a hash h = H(s), where H is a secure hash function (we use SHA256; see https://docs.python.org/3/library/hashlib.html). During verification, (s̃, h̃) = f_r(C̃NN), the algorithm f_r retrieves a secret s̃ and its hash h̃ from C̃NN. s̃ and h̃ are then verified by calculating a new hash h_n from s̃. If h̃ and h_n are the same, this confirms that the carrier data C̃NN is intact and unchanged; otherwise, the carrier data has been tampered with by an adversary. The embedding algorithm for CNN models is summarized in Algorithm 1.

Algorithm 1: Embedding an Invisible Fragile Watermark
  Input: CNN model
  Output: Protected CNN model
  l_i, l_L: i-th and last layers of the model
  ν: scramble vector
  1:  ν ← Generate_secret(seed)
  2:  for i ← 1 to l_L do
  3:      r × c ← Reshape(l_i)                  // from 4D to 2D
  4:      M × N ← Wavelet_convert(r × c)
  5:      s, h ← Prepare_secret(l_i)
  6:      s̃ ← Merge_secret(s, h)
  7:      M̃ × Ñ ← Generate_scramble(ν, M × N)
  8:      δ, ϱ ← Derive(M × N)
  9:      M′′ × N′′ ← Scale(M × N, δ, ϱ)
  10:     M′′ × N′′ ← Hide(M′′ × N′′, M̃ × Ñ, s̃)
  11:     M × N ← Rescale(M′′ × N′′, δ, ϱ)
  12:     r × c ← Wavelet_inverse(M × N)
  13:     l_i ← Shape(r × c)                    // from 2D to 4D
  14: end for

Summary of Algorithm 1: we first reshape the hidden layer weights (Section 2.1) to make them ready for the wavelet transform, as depicted in line 3 (Reshape). We then convert the weights from the spatial domain to the frequency domain using the wavelet transform (Section 2.2), as shown in line 4 (Wavelet_convert). We next prepare a unique secret for each hidden layer, calculate its hash, and merge them (Section 2.3), as in lines 5-6 (Prepare/Merge). We then generate a random matrix to randomize the hiding process (Section 2.3), as shown in line 7 (Generate_scramble). We scale the resulting coefficients (Section 2.4) before hiding, to preserve the sign and decimal accuracy, as in lines 8-9 (Derive/Scale). We then hide the secret bit-by-bit at random positions in the less significant coefficients, following the random matrix (Section 2.5), as shown in line 10 (Hide). The functions in lines 11-13 rescale the coefficients, convert them back from the frequency domain to the spatial domain, and finally shape the weights back into their 4D form (Section 2.6).
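To make the self-checking contract concrete, the following is a minimal, runnable sketch (ours, not the released implementation) in which simple byte concatenation stands in for the bit-level merge performed by Merge_secret:

```python
import hashlib

# Minimal sketch of the self-contained check, assuming concatenation stands
# in for Merge_secret in Algorithm 1 (the paper merges at the bit level).
def make_payload(secret: bytes) -> bytes:
    """Embedding side: the hidden payload is the secret plus its SHA256 hash."""
    return secret + hashlib.sha256(secret).digest()

def verify_payload(payload: bytes) -> bool:
    """Verification side: recompute h_n from the recovered secret s~ and
    compare it with the recovered hash h~; any bit flip breaks the match."""
    s_rec, h_rec = payload[:-32], payload[-32:]
    return hashlib.sha256(s_rec).digest() == h_rec

payload = make_payload(b"ResNet18|layer_13|owner:Alice")
assert verify_payload(payload)                       # untouched carrier
tampered = bytes([payload[0] ^ 1]) + payload[1:]
assert not verify_payload(tampered)                  # one flipped bit is caught
```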
[Figure 1 shows four panels: (a) the original weights of the ResNet18 DNN pretrained on ImageNet, plotted as amplitude over weight index (0 to 500); (b) the 32 frequency-domain sub-bands after the wavelet transform; (c) all detailed sub-bands (17-to-32) zeroed, using sub-bands 1-to-16 only to rebuild; (d) the weights rebuilt using only the approximation sub-bands (1-to-16).]

Figure 1: Example of less important coefficients (17-to-32) in rebuilding the weights of the ResNet18 model.

2.1 Reshaping Weights
We first pre-process the weights in each CNN hidden layer l_i so that they can be used for the wavelet transform in the next stage. To achieve this, we reshape them (using a general reshape function; see https://www.w3schools.com/python/numpy_array_reshape.asp) from 4D (a × b × c × d) to 2D (r × c) form. For example, the weights in ResNet18 hidden layer l_13 are reshaped from 4D (3 × 3 × 256 × 512) into 2D (4608 × 256).
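As a concrete illustration of this step, here is a small NumPy sketch using the l_13 dimensions quoted above; the (k, k, in, out) weight layout is an assumption about how the layer is stored:

```python
import numpy as np

# Sketch of Section 2.1: ResNet18's l_13 example, 4D (3,3,256,512) -> 2D (4608,256).
w4d = np.random.randn(3, 3, 256, 512).astype(np.float32)
w2d = w4d.reshape(4608, 256)            # 3*3*256*512 = 4608*256 weights
assert w2d.size == w4d.size

# After embedding, Algorithm 1 (line 13) restores the original 4D shape:
w_back = w2d.reshape(3, 3, 256, 512)
assert np.array_equal(w_back, w4d)
```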
2.2 Converting CNN Weights to the Frequency Domain
Hiding the secret directly in the hidden layer weights may cause high distortion, degrading the model accuracy. To solve this challenge, we employ the Discrete Wavelet Transform (DWT) to convert the weights from the spatial domain into the frequency domain so that the most significant coefficients are preserved to rebuild the weights after hiding. Fig. 1 shows this process for ResNet18: (a) the original block of weights, (b) converting the weights into the frequency domain using the DWT, (c) wiping out all detailed sub-bands as zeros (50% of all sub-bands) while maintaining the approximation sub-bands, and (d) converting back to the spatial domain and rebuilding the original weights from the approximation sub-bands alone. The observations from these steps motivate us to use signal processing techniques to increase the capacity for hiding the secret with little distortion to the original model.

[Figure 2 plots the resultant distortion (%, up to about 2.5%) for watermark levels of 1 to 5 bits as the number of weights per transform grows from 2,000 (first case) to 12,000 (last case).]

Figure 2: Resultant distortion impacting CNN hidden layers from various watermark levels and the number of weights per transform.

In our approach, we apply five levels of wavelet packet decomposition to each layer of a CNN model (e.g., ResNet18), which results in 32 sub-bands. A wavelet family called Daubechies with order 2 (db2) is chosen for the transformation because its performance in analyzing discontinuous-disturbance-dynamic signals has already been proven in [18]. To minimize the distortion of the model, we do not change the low approximation sub-bands (i.e., 1 to 16) because they represent the most significant features needed to rebuild the CNN layer's weights. On the other hand, several bits are manipulated in the rest of the detailed sub-bands to embed secret bits; the number of bits that can be embedded is called the watermark level. Several experiments were performed to select an appropriate watermark level and number of weights per transform. As shown in Fig. 2, we experimentally find that embedding two bits in all high-frequency sub-band coefficients results in a reasonably low distortion of ≤ 0.25%. We also find that using a large number of weights per transform may result in higher distortion. Accordingly, we keep the number of weights per transform ≤ 12,000 in all experiments. Note that our benchmark for acceptable distortion is maintaining the accuracy of the original model.
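The decomposition can be reproduced with an off-the-shelf wavelet library. The sketch below uses PyWavelets (our choice; the paper does not name its library) to obtain the 32 sub-bands and to mimic the rebuild-from-approximation experiment of Fig. 1:

```python
import numpy as np
import pywt  # PyWavelets

# Sketch of Section 2.2: a 5-level db2 wavelet packet decomposition of a
# flattened weight block yields 2^5 = 32 sub-bands.
weights = np.random.randn(4096).astype(np.float64)   # <= 12,000 per transform
wp = pywt.WaveletPacket(data=weights, wavelet='db2', maxlevel=5)
subbands = wp.get_level(5, order='freq')
assert len(subbands) == 32

# Sub-bands 1-16 (approximation side) are left untouched; 17-32 carry the
# secret. Zeroing 17-32, as in Fig. 1(c), still rebuilds the weights closely.
for node in subbands[16:]:
    node.data = np.zeros_like(node.data)
rebuilt = wp.reconstruct(update=False)[:len(weights)]
print("max reconstruction error:", np.abs(rebuilt - weights).max())
```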
2.3 Protecting the Embedded Secret
In watermarking, the secret can be exposed to attackers if it is always hidden at a fixed position. To embed the secret randomly, we use a scrambling vector ν_i ∈ [1, 256] pre-filled with random values. We assume that this parameter is known only to authorized validators.

Hashing the Secret: Our secret includes: (1) structural information s (data attributes, as arbitrary strings to stamp the model) and (2) the hash h of the structural information s, computed with a secure hash function. We then merge s and h at the bit level, as shown in Eq. (1),

    s̃ ⇐ ξ(s, h)    (1)

where ξ is a merging algorithm, s is the model secret, h is its hash, and s̃ is the merged secret.
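A small sketch of this preparation step, with simple bit interleaving standing in for the unspecified merge algorithm ξ:

```python
import hashlib

# Sketch of Eq. (1): build a per-layer secret s, hash it, and merge the two
# bit streams. Interleaving is our stand-in for the paper's merge ξ, whose
# exact bit order is not specified here.
def prepare_secret(layer_name: str, owner: str) -> str:
    s = f"{owner}|{layer_name}".encode()
    h = hashlib.sha256(s).digest()
    s_bits = ''.join(f'{b:08b}' for b in s)
    h_bits = ''.join(f'{b:08b}' for b in h)
    n = max(len(s_bits), len(h_bits))          # pad the shorter stream
    s_bits, h_bits = s_bits.ljust(n, '0'), h_bits.ljust(n, '0')
    return ''.join(a + b for a, b in zip(s_bits, h_bits))

merged = prepare_secret("layer_13", "Alice/Data61")
print(len(merged), "bits to hide")   # must stay within ~8,192 bits per layer
```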
Generating the Scramble: To embed the merged secret s̃ into randomly selected locations within the CNN layers, we use the scrambling vector ν to create a random sequence of coefficient positions in the form of a 2D matrix Z (see Eq. (2)),

    Z ⇐ { M̃ = f_x(ν);  Ñ = f_x′(ν) }    (2)

where M̃ and Ñ are the generated sequences of numbers, and f_x and f_x′ are the scrambling functions. The combination of M̃ and Ñ is used to build the 2D M̃ × Ñ matrix Z (see Eq. (3)).

    Z{M̃, Ñ} = [ m̃1,ñ1   m̃1,ñ2   ...   m̃1,ñÑ
                m̃2,ñ1   m̃2,ñ2   ...   m̃2,ñÑ
                  ...      ...    ...     ...
                m̃M̃,ñ1   m̃M̃,ñ2   ...   m̃M̃,ñÑ ]    (3)
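One way to realize f_x and f_x′ is to seed a pseudo-random generator from ν, which keeps the positions reproducible for authorized validators; the seeding scheme below is our assumption, not the paper's construction:

```python
import numpy as np

# Sketch of Eqs. (2)-(3): derive a repeatable sequence of coefficient
# positions from the shared scramble vector ν.
def generate_scramble(nu: np.ndarray, M: int, N: int, layer_pos: int):
    seed = int(np.roll(nu, -layer_pos)[:8].sum())    # shift ν by layer position h
    rng = np.random.default_rng(seed)
    rows = rng.permutation(M)                        # M~ = f_x(ν)
    cols = rng.permutation(N)                        # N~ = f_x'(ν)
    return [(int(r), int(c)) for r, c in zip(rows, np.resize(cols, M))]

nu = np.random.default_rng(0).integers(0, 2**16, size=256)   # |ν| = 256
positions = generate_scramble(nu, M=128, N=32, layer_pos=3)
print(positions[:4])   # (x_i, y_i) pairs used to place the i-th secret bits
```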
2.4 Scaling Coefficients
To protect the accuracy of the neurons at the hidden layers and preserve the sign of the weights, two factors were derived after analyzing millions of weights. The first factor δ is used to ensure that all values are positive (e.g., the lowest value, such as −1, becomes non-negative after adding +1). The second factor ϱ is used to maintain all four decimal places (e.g., × 10000); see the impact in Fig. 3. δ and ϱ are used to scale the coefficients before the embedding process so that the behavior of the neurons in the network is preserved.
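A sketch of the scaling round trip follows; the concrete δ and ϱ below are illustrative, not the experimentally derived values from the paper:

```python
import numpy as np

# Sketch of Section 2.4. δ shifts all coefficients to be positive and ϱ
# preserves four decimal places (values here are illustrative only).
delta, rho = 1.0, 10_000.0

def scale(coeffs):    # before hiding: non-negative integers, 4 decimals kept
    return np.round((coeffs + delta) * rho).astype(np.int64)

def rescale(scaled):  # after hiding: back to signed fractional weights
    return scaled.astype(np.float64) / rho - delta

c = np.array([-0.5534, 0.1122, 0.9377])
assert np.allclose(rescale(scale(c)), c, atol=1e-4)  # 4th decimal preserved
```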
[Figure 3 shows a test neuron with inputs 1 and 2. Test neuron steps: (i) calculate the activation, (ii) apply the non-linear 'relu' function, (iii) if > 0, the neuron fires. (a) Original DNN behaviour: 1(−0.5534) + 2(+0.1122) = −0.3294, relu(−0.3294) = 0, 'not fired'. (b) After watermarking without scaling: 1(+0.6513) + 2(+0.1432) = +0.9377, relu(+0.9377) = +0.9377, 'fired'. (c) After watermarking with scaling: 1(−0.5531) + 2(+0.1124) = −0.3283, relu(−0.3283) = 0, 'not fired'.]

Figure 3: Example of the impact of applying the derived δ and ϱ before the hiding process. Not applying them may flip the neuron activations, as in (b), which leads to misclassification.

2.5 Embedding the Secret as Distributed Bits
The secret bits s̃ are embedded bit-by-bit in the scaled coefficients M′′ × N′′, following the random order generated in M̃ × Ñ. M̃ × Ñ consists of pairs of random values that refer to positions in M′′ × N′′. For the i-th two bits in the secret s̃, we choose the scaled coefficient located at (x_i, y_i) in that matrix using the i-th entry (x_i, y_i) of the scrambling matrix and replace the two least significant bits of the chosen coefficient with the two secret bits.
2.6 Inverting from the Frequency Domain
The resultant detailed coefficients after the hiding process are called marked coefficients. At this stage, the marked coefficients are rescaled and re-embedded back into the 32 sub-band coefficient matrix before applying the inverse DWT to convert the weights from the frequency domain back to their original spatial domain. The reconstructed weights are called marked weights (containing the hidden secret), and they are almost identical to the original weights. The advantage of this approach is that the marked weights can be used for prediction; however, only an authorized validator (i.e., one with κ and ν) can extract the secret and verify it. The inverse DWT is defined by Eq. (4),

    X = Σ_a Σ_b Y(a, b) Φ_ab(n)    (4)

where X is the weights in their original spatial domain. Finally, the weights are reshaped back from 2D into their original 4D shape before being integrated into the CNN layer l_i.
2.7 Protecting All Hidden Layers
The hiding steps explained in Algorithm 1 are repeated for each hidden layer l_i ∈ [1, L]. The steps for generating the scramble matrix M̃ × Ñ are also repeated for each hidden layer. The only difference between the layers is that we shift the index i over ν by the hidden layer position h. Hence, we can generate a unique scrambling matrix for each layer.

2.8 Retrieving and Validating
To accurately extract and validate the secret, Alice/Bob must have the scrambling vector ν and the protected model D̃. The process is nearly identical to the hiding steps, but the secret bits are recovered rather than embedded. Fig. 4 demonstrates the required steps. First, the weights at each layer l_i are fetched and shaped before applying the DWT. Then, the detailed coefficients are selected and scaled. Next, the random hiding order is generated using ν and followed to retrieve the secret bits. Finally, we calculate their hash and verify it against the embedded hash. Thus, a slight change, even in one layer, can be detected and highlighted.

[Figure 4 depicts the pipeline: for each hidden layer (one-by-one), DWT decomposition, extraction of the hidden bits, and comparison of the extracted hash against the calculated hash (e.g., 001101011.. vs 001101011..); the outer layer's output classes (e.g., Speed, Slow, Stop) are checked the same way (e.g., Xy3naz.... vs Xy3naz....).]

Figure 4: Overall process for watermark retrieval and validation.
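Continuing the earlier sketches, validation splits the recovered bit stream back into s̃ and h̃ (the inverse of the interleaving assumed in Section 2.3) and compares hashes:

```python
import hashlib

# Sketch of Section 2.8, assuming the de-interleaving inverse of the merge
# used in the Section 2.3 sketch.
def split_merged(merged_bits: str):
    s_bits = merged_bits[0::2]          # even positions carried s
    h_bits = merged_bits[1::2]          # odd positions carried h
    return s_bits, h_bits

def bits_to_bytes(bits: str) -> bytes:
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits) - 7, 8))

def validate(merged_bits: str, secret_len: int) -> bool:
    s_bits, h_bits = split_merged(merged_bits)
    s_rec = bits_to_bytes(s_bits)[:secret_len]
    h_rec = bits_to_bytes(h_bits)[:32]  # a SHA256 digest is 32 bytes
    return hashlib.sha256(s_rec).digest() == h_rec

# Round-trip with the prepare_secret sketch from Section 2.3:
#   merged = prepare_secret("layer_13", "Alice/Data61")
#   assert validate(merged, secret_len=len(b"Alice/Data61|layer_13"))
```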
3 EXPERIMENTS
Experiment Steps: Our experiment steps can be summarised as follows: (1) Train the ResNet18 architecture with the MNIST, CIFAR-10 and Imagenet training datasets; we name the resultant models D1, D2 and D3. (2) Evaluate the baseline classification accuracy of D1, D2 and D3 using the MNIST, CIFAR-10 and Imagenet testing datasets. (3) Apply our watermark technique "DeepiSign" to D1, D2 and D3; we name the obtained models D̃1, D̃2 and D̃3. (4) Evaluate the classification accuracy of D̃1, D̃2 and D̃3 using the MNIST, CIFAR-10, and Imagenet testing datasets. (5) Verify the accuracy of the retrieved information after the extraction process using the hash. (6) Perform manipulation attacks on D̃1, D̃2 and D̃3 using various adversarial mechanisms and verify the integrity of the hidden bits. (7) Test DeepiSign on 3 other widely-used architectures (i.e., AlexNet, VGG16 and MobileNetV2) to ensure its effectiveness. (8) Finally, perform an out-of-the-box experiment to validate the accuracy impact using randomly picked samples from the Internet.

Configuration: To obtain neutral and unbiased results, all experiments were performed using |ν| = 256. The weights size per transform varies from 4,000 to 12,000, and a maximum of 2 bits is hidden in every selected detailed coefficient (see Fig. 2). A total of 8,192 bits (i.e., around 1KB) is embedded in each hidden layer. All deep layers have been protected with our technique. We optimized all models using Stochastic Gradient Descent (SGD) and an initial learning rate of 0.0001. We also used 10 epochs with a batch size of 100, and a factor of 10 for both the weights and bias learning rates.

In the following, we investigate whether DeepiSign can satisfy the three conditions discussed in Section 1.

(C1) Can DeepiSign Maintain the Accuracy?
Table 1 shows the classification accuracy of the three trained models obtained from the three datasets (MNIST [13], CIFAR-10 [11], and Imagenet [5]) before and after applying our DeepiSign technique with different watermark levels. The experiments match our early observations shown in Fig. 2. Once we exceed the watermark level of 2 bits per coefficient, the accuracy starts to be affected. We observe that changing more than 2 bits in the frequency-domain coefficients flips the first decimal values of the rebuilt weights; such a flip affects the set of neurons to be activated. On the contrary, the watermark level of 2 bits affects only the fourth decimal values, and rarely the third decimals, of the detail coefficients. Hence, it has little effect on the two significant factors (i.e., the sign and first decimal) of the rebuilt weights. The Bit Error Rate (BER) is 0%, which means the secrets are verifiable with no errors. We further evaluated the accuracy of VGG16, AlexNet and MobileNetV2 before and after applying the best watermark level of 2 bits. The obtained accuracy results are the same: AlexNet 79.51%, VGG16 81.80%, and MobileNetV2 74.25%. BER is 0% in both cases, indicating that DeepiSign has no noticeable impact on the models' accuracy. Fig. 5 presents out-of-the-box experiments with random samples.

Summary: The answer is affirmative: DeepiSign can maintain the accuracy by embedding 2 bits in the less significant frequency-domain coefficients.

Table 1: Comparison between baseline classification accuracy (%) before and after applying our DeepiSign with different watermark levels.

                       DeepiSign watermark levels
Dataset    Baseline    1-bit    2-bits    3-bits    4-bits    BER
MNIST      99.00       99.00    99.00     98.97     98.97     0
CIFAR-10   88.40       88.40    88.40     88.38     88.38     0
Imagenet   67.71       67.71    67.71     67.63     67.63     0

[Figure 5 tabulates classification accuracy (%) on six random Internet samples (Cockatoo, Frog, Mouse, Shark, Widow, Tiger): AlexNet baseline 99.0, 92.7, 98.2, 94.7, 72.7, 79.2 and with DeepiSign 99.0, 92.7, 98.2, 94.7, 72.7, 79.2; VGG16 baseline 99.1, 84.9, 98.3, 82.6, 86.1, 80.3 and with DeepiSign the same; ResNet18 baseline 99.2, 93.7, 99.1, 90.3, 87.7, 83.9 and with DeepiSign the same; MobileNetV2 baseline 87.3, 80.9, 99.7, 74.1, 71.3, 86.8 and with DeepiSign the same.]

Figure 5: Classification accuracy results using sample images before and after applying DeepiSign on AlexNet, VGG16, ResNet18 and MobileNetV2.
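The BER reported in Tables 1-3 can be computed as the percentage of retrieved bits that disagree with the embedded ones; a minimal sketch:

```python
# Bit Error Rate: fraction (%) of retrieved secret bits that differ from the
# embedded ones; 0% means the secret is fully verifiable.
def ber(embedded: str, retrieved: str) -> float:
    diff = sum(a != b for a, b in zip(embedded, retrieved))
    return 100.0 * diff / len(embedded)

assert ber("110010", "110010") == 0.0
assert round(ber("110010", "010011"), 1) == 33.3
```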
(C2) Can DeepiSign Detect Model Manipulation Attacks?
We consider two legitimate parties: the CNN model provider (Alice) and the CNN model customer (Bob). Alice trains a CNN locally or in a secure location. We assume that Alice and Bob are honest. We consider a problem similar to [9], where the customer Bob wants to verify the integrity and authenticity of the CNN model. However, in our case, this should be done locally (i.e., to avoid a single point of failure at a remote site) and automatically at the customer's end. Similarly, the CNN model provider Alice wants to verify the model, in addition to checking the dataset and parameters used, in case of any dispute. We consider that an adversary (Eve) obtains access to the CNN model, similar to the attacks demonstrated in [7, 25], and performs one of the following attacks to compromise the integrity of the model, either during deployment or at Bob's end.

Attack 1 - Targeted Input Poisoning: This is an attack in which the attacker retrains the model using at least one sample and a corresponding label that does not reflect the ground truth [7]. We implement this attack on the MNIST, CIFAR-10 and Imagenet protected models, generating D̃1_adv, D̃2_adv and D̃3_adv. Fig. 6 shows two examples: (1) we randomly pick g1 from MNIST as the number '0', insert a backdoor, and reinject it as the number '3'; in other words, D̃1_adv misclassifies the number '0' as the number '3' only when seeing an image of the digit '0' with a backdoor. (2) We also randomly pick g2 from CIFAR-10 as an image of a 'deer', insert a backdoor, and reinject it as a 'dog'. We retrained the protected D̃1 and D̃2 with the poisoned MNIST and CIFAR-10 and obtained D̃1_adv and D̃2_adv. We performed similar steps on Imagenet and obtained D̃3_adv. All training parameters are the same. Finally, the accuracy and integrity of all hidden layers, the hidden hash, and the BER of D̃1_adv, D̃2_adv and D̃3_adv were carefully examined; the results are presented in Table 2.

[Figure 6 shows (a) an MNIST training image of the number "0", originally labelled "0", backdoored to be misclassified as "3"; and (b) a CIFAR-10 training image of a "deer", backdoored to be misclassified as "dog".]

Figure 6: Two random images from MNIST and CIFAR-10 before and after inserting a backdoor (i.e., a red sticky note).

Table 2: Integrity verification results of the marked and the poisoned models. (✓) means the hidden secret and its hash match correctly with 0% BER. (×) means the hidden secret and its hash mismatch, with the error % shown in BER.

            MNIST                       CIFAR-10                    Imagenet
Test        Marked D̃1   Poison D̃1_adv   Marked D̃2   Poison D̃2_adv   Marked D̃3   Poison D̃3_adv
BER         0%          48-61%          0%          52-61%          0%          46-63%
ResNet18    ✓           ×               ✓           ×               ✓           ×
VGG16       ✓           ×               ✓           ×               ✓           ×
AlexNet     ✓           ×               ✓           ×               ✓           ×
MobileNet   ✓           ×               ✓           ×               ✓           ×

Findings: From the accuracy perspective, it is clear that there is no degradation; this means such attacks are hard to detect by observing the accuracy. However, our DeepiSign algorithm was able to detect this attack in all cases.

Attack 2 - Output Poisoning: In this type of attack, only the output classes are tampered with to cause misclassification [1]. We implement this attack by manipulating only one class in the output layer of the MNIST, CIFAR-10, and Imagenet protected models. We flip the class '0' with '3' in MNIST, the class 'deer' with 'dog' in CIFAR-10, and the class 'mouse' with 'keyboard' in Imagenet.

[Figure 7 shows, for MNIST and CIFAR-10, the hash hidden in the marked model's output classes versus the hash calculated from the poisoned model's output classes: MNIST 57388b2a78606fff1a93407b4b0c7cda vs 0a91d56f4910021afef6de2bcb62602d; CIFAR-10 d3bf1adbe0e1017d206d8550b84b612c vs bae768ac6c088270af6419b167ce1850.]

Figure 7: Comparison between the embedded hash value and the hash value calculated from the poisoned output layer.

Findings: Although all deep layers are the same, DeepiSign can detect the attack (see Fig. 7) because the hash value embedded in each hidden layer is different from the hash value calculated from the poisoned output layer.
Attack 3 - Fine-tuning: This is another type of attack, in which an adversary Eve slightly manipulates the model, which may degrade or even improve the accuracy [?]. We implement this attack by changing only one parameter, the learning rate, from 0.0001 to 0.001. The main reason for choosing the learning rate is that we do not want to manipulate many parameters, which would induce bias and make the attack easily detectable. Table 3 compares the protected model with the fine-tuned model.

Table 3: Integrity verification results of the marked and the fine-tuned models.

            MNIST                       CIFAR-10                    Imagenet
Test        Marked D̃1   Poison D̃1_adv   Marked D̃2   Poison D̃2_adv   Marked D̃3   Poison D̃3_adv
BER         0%          54-65%          0%          57-71%          0%          61-71%
ResNet18    ✓           ×               ✓           ×               ✓           ×
VGG16       ✓           ×               ✓           ×               ✓           ×
AlexNet     ✓           ×               ✓           ×               ✓           ×
MobileNet   ✓           ×               ✓           ×               ✓           ×

Findings: Despite a slight increase in the accuracy, our DeepiSign technique can detect the attack from both the BER and the hash of the hidden layers.

Summary: DeepiSign satisfies the second condition by detecting 3 manipulation attacks on 3 CNN architectures.
(C3) Can DeepiSign Provide Sufficient Security?
In the DeepiSign design, we focus on a threat model in which an adversary (Eve) has access to the protected CNN model; Eve's task of detecting the embedded secret is steganalysis. Steganalysis has been widely studied in the multimedia domain (e.g., image, video, and audio) [14]. Steganalysis in the multimedia domain is designed to find abnormal patterns among neighboring pixels to detect steganography or an invisible watermark [6]. Only a few studies have attempted to apply fragile watermarking to non-multimedia data such as time-series data (e.g., ECG and sensor streams), where there is no known correlation between adjacent data values [10]. In this paper, we follow the theoretical steganalysis of non-multimedia watermarking suggested in [10] in terms of confidentiality, integrity, and authenticity.

Confidentiality Strength: This is achieved with the scramble vector ν. Assume |ν| ≥ 256. Then an attacker has to search 2^256 possibilities to find ν, which yields 256-bit strength.

Integrity and Authenticity Strength: To guarantee authenticity and prevent retrieval of the hidden information, the 32 sub-band coefficient matrix after wavelet decomposition of the weights should have a suitable size (e.g., ≥ 4000), as in

    T = Σ_{i=1..r} R! × Σ_{j=t..c} C!

where T is the total number of possibilities; R and C are the rows and columns, respectively, of the 32 sub-band coefficient matrix; and t is the number of selected detailed coefficients that can be used from each row. Assume 4096 weights from only one layer l_3, whose 32 sub-band coefficients form a 128 × 32 matrix after applying the wavelet. If we assume that the threshold t is 16, T can be calculated as T = Σ_{i=1..128} 128! × Σ_{j=16..32} 32! ⇒ T ≈ 8.068256 × 10^194.
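For orientation, the two strength estimates can be evaluated directly; note that the reading of the double sum in T is our interpretation of the printed formula, so the magnitudes are illustrative only:

```python
import math

# Toy evaluation of the (C3) estimates; the exact reading of the summation
# in T is our assumption.
keyspace = 2 ** 256                      # brute-forcing the scramble vector ν
T = sum(math.factorial(128) for _ in range(1, 129)) * \
    sum(math.factorial(32) for _ in range(16, 33))
print(f"keyspace = 2^256 ~ {keyspace:.3e}")
print(f"T has {len(str(T))} decimal digits")   # far beyond brute force
```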
Summary: We can see that it is computationally infeasible to break DeepiSign's confidentiality, integrity, and authenticity in a reasonable time.
that use fragile invisible watermarking to protect the integrity and
authenticity of CNN models.
4 RELATED WORK
This section provides a brief review of related work on the attacks
and defenses on CNN model integrity. 5 CONCLUSION
Poisoning attacks: Several techniques have been proposed in We propose a novel self-contained invisible mechanism, called Deep-
the literature to violate CNN integrity by inserting backdoors. Gu iSign, to protect CNN models’ integrity and authenticity. DeepiSign
et al. [7] introduced a poisoning attack in their BadNets work. They embeds a secret and its hash into a CNN model securely to provide
generated a poisoned model by retraining the original one with a the model’s integrity and authenticity. To reduce the distortion due
poisoned training dataset. The attacked model behaves almost like to hiding, which is inherent to watermarking, DeepiSign uses a
the benign one except when the backdoor sign is encountered. They wavelet-based technique to transform each layer’s weights from the
also showed that the backdoor remains active even after the transfer spatial domain to the frequency domain. To preserve accuracy, it

958
Table 3: Integrity verification results of the marked and the fine-tuned models.

MNIST CIFAR-10 Imagenet


Test ˜ ˜
Marked 𝐷 1 Poison 𝐷 1𝑎𝑑𝑣 ˜
Marked 𝐷˜2 Poison 𝐷 𝑎𝑑𝑣 ˜
Marked 𝐷˜3 Poison 𝐷 𝑎𝑑𝑣
2 3
BER 0% 54-65% 0% 57-71% 0% 61-71%
ResNet18  ×  ×  ×
VGG16  ×  ×  ×
AlexNet  ×  ×  ×
MobileNet  ×  ×  ×

We performed theoretical analysis as well as empirical studies. The analysis showed that DeepiSign can hide about 1KB of secret in each layer without degrading the model's accuracy. Several experiments were performed on four pre-trained models using three datasets against three types of manipulation attacks. The results prove that DeepiSign is verifiable at all times with no noticeable effect on classification accuracy, and robust against a multitude of known CNN manipulation attacks.

ACKNOWLEDGMENT
The work has been supported by the Cyber Security Research Centre Limited, whose activities are partially funded by the Australian Government's Cooperative Research Centres Programme. This work was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1C1C1007118).

REFERENCES
[1] Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, and Joseph Keshet. 2018. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In 27th USENIX Security Symposium (USENIX Security 18). 1615–1631.
[2] Abbas Cheddad, Joan Condell, Kevin Curran, and Paul Mc Kevitt. 2010. Digital image steganography: Survey and analysis of current methods. Signal Processing 90, 3 (2010), 727–752.
[3] Chenyi Chen, Ari Seff, Alain Kornhauser, and Jianxiong Xiao. 2015. DeepDriving: Learning affordance for direct perception in autonomous driving. In Proceedings of the IEEE International Conference on Computer Vision. 2722–2730.
[4] Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. 2017. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526 (2017).
[5] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 248–255.
[6] Madhavi B Desai and S Patel. 2014. Survey on universal image steganalysis. International Journal of Computer Science and Information Technologies 5, 3 (2014), 4752–4759.
[7] Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. 2017. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733 (2017).
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
[9] Zecheng He, Tianwei Zhang, and Ruby B Lee. 2018. VeriDeep: Verifying integrity of deep neural networks through sensitive-sample fingerprinting. arXiv preprint arXiv:1808.03277 (2018).
[10] A. Ibaida and I. Khalil. 2013. Wavelet-based ECG steganography for protecting patient confidential information in point-of-care systems. IEEE Transactions on Biomedical Engineering 60, 12 (Dec 2013), 3322–3330. https://doi.org/10.1109/TBME.2013.2264539
[11] Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. Technical Report. Citeseer.
[12] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097–1105.
[13] Yann LeCun, LD Jackel, Léon Bottou, Corinna Cortes, John S Denker, Harris Drucker, Isabelle Guyon, Urs A Muller, Eduard Sackinger, Patrice Simard, et al. 1995. Learning algorithms for classification: A comparison on handwritten digit recognition. Neural Networks: The Statistical Mechanics Perspective 261 (1995), 276.
[14] S. Li, Y. Jia, and C.-C. J. Kuo. 2017. Steganalysis of QIM steganography in low-bit-rate speech signals. IEEE/ACM Transactions on Audio, Speech, and Language Processing 25, 5 (May 2017), 1011–1022.
[15] Yuntao Liu, Yang Xie, and Ankur Srivastava. 2017. Neural Trojans. In 2017 IEEE International Conference on Computer Design (ICCD). IEEE, 45–48.
[16] Erwan Le Merrer, Patrick Perez, and Gilles Trédan. 2017. Adversarial frontier stitching for remote neural network watermarking. arXiv preprint arXiv:1711.01894 (2017).
[17] Yuki Nagai, Yusuke Uchida, Shigeyuki Sakazawa, and Shin'ichi Satoh. 2018. Digital watermarking for deep neural networks. International Journal of Multimedia Information Retrieval 7, 1 (2018), 3–16.
[18] Jiaxin Ning, Jianhui Wang, Wenzhong Gao, and Cong Liu. 2011. A wavelet-based data compression technique for smart grid. IEEE Transactions on Smart Grid 2, 1 (2011), 212–218.
[19] Bita Darvish Rouhani, Huili Chen, and Farinaz Koushanfar. 2018. DeepSigns: A generic watermarking framework for IP protection of deep learning models. arXiv preprint arXiv:1804.00750 (2018).
[20] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4510–4520.
[21] G. Shi, X. Shi, M. O'Connell, R. Yu, K. Azizzadenesheli, A. Anandkumar, Y. Yue, and S. Chung. 2019. Neural Lander: Stable drone landing control using learned dynamics. In 2019 International Conference on Robotics and Automation (ICRA). 9784–9790.
[22] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[23] Yusuke Uchida, Yuki Nagai, Shigeyuki Sakazawa, and Shin'ichi Satoh. 2017. Embedding watermarks into deep neural networks. In Proceedings of the 2017 ACM International Conference on Multimedia Retrieval. ACM, 269–277.
[24] Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y Zhao. 2019. Neural Cleanse: Identifying and mitigating backdoor attacks in neural networks. In 2019 IEEE Symposium on Security and Privacy (SP). IEEE, 707–723.
[25] Y. Liu, S. Ma, Y. Aafer, W.-C. Lee, J. Zhai, W. Wang, and X. Zhang. 2018. Trojaning attack on neural networks. In 25th Annual Network and Distributed System Security Symposium (NDSS 2018), San Diego, California, USA, February 18–21, 2018. The Internet Society. https://github.com/PurduePAML/TrojanNN
[26] Jialong Zhang, Zhongshu Gu, Jiyong Jang, Hui Wu, Marc Ph Stoecklin, Heqing Huang, and Ian Molloy. 2018. Protecting intellectual property of deep neural networks with watermarking. In Proceedings of the 2018 Asia Conference on Computer and Communications Security. ACM, 159–172.
