
Robust Image Watermarking Based on Generative

Adversarial Networks for Copyright Protection


Guangyong Gao

Nanjing University of Information Science & Technology


Tianyou Xu
Nanjing University of Information Science & Technology
Feng Hua
Nanjing University of Information Science & Technology

Research Article

Keywords: Robust Image Watermarking, Deep Learning, Adversarial Training, Generative Adversarial
Network, Copyright Protection.

Posted Date: March 12th, 2024

DOI: https://doi.org/10.21203/rs.3.rs-4039149/v1

License:   This work is licensed under a Creative Commons Attribution 4.0 International License.
Read Full License

Additional Declarations: No competing interests reported.


Robust Image Watermarking Based on Generative
Adversarial Networks for Copyright Protection

Guangyong Gao 1,2, Tianyou Xu 1,2, Feng Hua 1,2


1 Engineering Research Center of Digital Forensics, Ministry of Education, School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing, 210044, China.
2 Zhengzhou Xinda Institute of Advanced Technology, Zhengzhou 450001, China.

[email protected]

Abstract. Digital media is easily copied, which has led to a proliferation of copyright infringement. One proposed solution is digital watermarking, which embeds message bits into multimedia carriers such as images and videos to prove the creator's ownership of a work. Recently, with the rise of convolutional neural networks in artificial intelligence, deep learning has achieved notable results in digital watermarking. In this work, we propose a new framework for robust image watermarking based on a generative adversarial network (RIW-GAN). In the proposed method, the encoder network, composed of convolutional layers and a residual block, outputs an encoded image with low distortion that stays close to the original image. To enhance the robustness of the model against attacks, a simulated noise layer is inserted before the decoder as a differentiable network layer to enable end-to-end training, so the proposed model decodes attacked encoded images with higher accuracy. Compared with state-of-the-art models, the experimental results demonstrate that RIW-GAN achieves superior invisibility and stronger robustness against regular attacks such as JPEG compression and geometric attacks such as resizing and cropping.

Keywords: Robust Image Watermarking, Deep Learning, Adversarial Training, Generative Adversarial Network, Copyright Protection.

1 Introduction

The digital revolution is a double-edged sword. It creates unlimited possibilities for sharing and promoting content; meanwhile, it causes social problems such as personal privacy leakage, data tampering, and eavesdropping. It is easier than ever to steal and use photos without the photographer's consent. To address these security problems and prevent information disclosure, data hiding technology has arisen. As one of the most important data hiding methods, digital watermarking hides a watermarking message in a carrier; copyright protection is then achieved by extracting the watermarking message embedded in the encoded carrier. It is a promising research area with many useful applications. For example, intellectual property [1] and the ownership of neural network models [2] can be protected by digital watermarking technology.
Recently, data hiding technology has developed in diverse directions. It includes digital watermarking and steganography, and the main idea of both is to embed a secret message into a multimedia carrier. Digital watermarking focuses on protecting the carrier, and its main applications are content protection, copyright management, content authentication, and tamper detection. Steganography, in contrast, protects the embedded secret message and is applied to covert communication. There are many traditional watermarking algorithms, such as least significant bits (LSB) [3], discrete Fourier transform (DFT) [4], discrete cosine transform (DCT) [5], and discrete wavelet transform (DWT) [6]. Although these traditional methods have gradually improved in invisibility and robustness, there is still considerable room for improvement in embedding capacity and robustness. The generative adversarial network (GAN) [7] provided an opportunity to combine data hiding with deep learning and promoted the development of data hiding based on convolutional neural networks. During GAN training, the generator and the discriminator compete against each other until the generator produces images that are hard to distinguish from the originals. Shi et al. [8] introduced a secure steganography model based on GAN (SSGAN), which combines a Gaussian-neuron convolutional neural network. Zhu et al. [9] first introduced the hiding-data-with-deep-networks model (HiDDeN), a convolutional neural network framework for image watermarking.
A new robust image watermarking GAN (RIW-GAN) model is proposed in this paper. During the adversarial training of a GAN, it is difficult for the generator and discriminator to converge quickly at the same time. Therefore, we improve the basic U-Net [10] structure and combine it with a residual block [11] in the encoder network of the RIW-GAN model. RIW-GAN replaces the double-layer convolutions of the basic U-Net structure with single-layer convolutions and adds a residual block at the bottom of the U-Net, which enables the encoder to learn more feature maps and to converge more easily. Compared with existing models, our encoder network not only learns richer feature information but also reduces gradient vanishing during adversarial training. Generally, the more watermarking message is hidden in a color image, the more distorted the resulting encoded image becomes, so high-capacity models tend to have poorer invisibility. However, our analysis shows that, even when embedding relatively more watermarking message, the RIW-GAN model outperforms the most advanced solutions in invisibility and decoding accuracy. The invisibility of the watermark is demonstrated in Fig. 1, which compares the original image and the encoded image generated by the proposed model.
In summary, the innovations of this paper mainly include the following two points:

1. An end-to-end watermarking model, RIW-GAN, is proposed, which can hide a fixed-length binary string in a three-channel color image. The encoder network is composed of an improved U-Net and a residual block, which outputs a low-distortion encoded image and reduces gradient vanishing during training. The experimental results indicate that the RIW-GAN model achieves higher decoding accuracy than previous models.
2. In order to further resist JPEG compression and minimize the impact of the quantization step of JPEG compression on backpropagation, we propose a novel JPEG mask approach that simulates the zigzag scanning order, thereby more closely resembling real JPEG compression during run-length encoding. Compared to existing methods, it achieves superior results in the experimental evaluations.

Fig. 1. An example of the invisibility of the watermark: the original image, the encoded image generated by the proposed model, and the difference between the two images.

The rest of this paper is organized as follows. Section 2 introduces related work in watermarking research. Section 3 explains the architecture of our model. Section 4 presents the loss functions and similarity metrics. Section 5 reports the experimental results and analysis, and Section 6 concludes the proposed method.

2 Related Work

In this section, according to the current development of watermarking technology, the related work is divided into two categories: traditional watermarking methods and watermarking approaches based on deep learning.

2.1 Traditional Watermarking Approaches

Traditional watermarking technology can be divided into spatial domain algorithms and transform domain algorithms. A spatial domain watermarking algorithm embeds the watermarking message by changing pixel values while keeping the encoded image as consistent as possible with the original image. The most representative algorithm is the LSB algorithm [3], which converts the watermarking message into bits and embeds them into the least significant bits of the pixels of the original image. Spatial domain algorithms are often simple and have large embedding capacities, but they have poor robustness and can hardly resist attacks such as image compression. In a transform domain watermarking algorithm, the watermarking message is embedded in the transform domain by modifying specific transform coefficients. For example, the popular DCT watermarking method [5] embeds the watermarking message into the middle-frequency region. Compared with spatial domain methods, transform domain methods are more robust but have much smaller embedding capacity. Overall, the robustness of both classes of traditional algorithms remains limited.
To address these limitations, researchers improved traditional watermarking methods and proposed adaptive watermarking algorithms, which automatically find regions of the image suitable for embedding, such as edge adaptive image watermarking (EA) [13], robust adaptive blind color image watermarking [14], blind robust image watermarking based on adaptive embedding strength and distribution of quantified coefficients [15], and wavelet obtained weights (WOW) [16]. Over the past few years, the robustness of watermarking has become a central concern, and the ability to resist attacks has become an important criterion for evaluating watermarking techniques. Ruanaidh et al. [17] proposed a classic image watermarking model that embeds the watermarking message into the Fourier-Mellin domain so that watermarked images are resistant to geometric attacks. Tang et al. [18] introduced a robust digital image watermarking framework that combines image features and normalization; it resists not only clipping attacks but also geometric distortions and filtering attacks. Although these traditional watermarking methods have gradually improved in robustness, there is still room for improvement in embedding capacity and robustness.

Table 1. Summary of traditional watermarking methods and deep learning-based watermarking methods

Method          Literature   Used method                Remarks                             Limitations
Traditional     [5]          DCT transform              Simple                              Weak robustness
Traditional     [17]         Fourier-Mellin transform   Quick calculation                   Weak robustness
Deep learning   [9]          Adversarial training       High robustness to Gaussian noise   Weak robustness to JPEG compression
Deep learning   [28]         Adversarial training       High robustness to rotation         Poor image invisibility

2.2 Watermarking Approaches Previously Based on Deep Learning

More recently, many researchers have applied convolutional neural networks (CNNs) to digital watermarking. Volkhonskiy et al. [19] first proposed a model based on deep convolutional generative adversarial networks (DCGAN) [20], called the steganographic generative adversarial networks (SGAN) model. This model uses the image generated by the GAN generator as the original image and adopts a steganalysis network as the discriminator, which makes the watermarking model more secure. Later, Shi et al. [8] introduced the secure steganography model based on GAN (SSGAN). Compared with SGAN, SSGAN converges faster and generates higher quality and more secure watermarked images.
Different from the SGAN and SSGAN models, Hayes et al. [21] proposed a GAN-based image watermarking framework, HayesGAN, which includes three subnetwork components: an encoder, a decoder, and a discriminator. However, HayesGAN does not fully consider the quality of the encoded image or the gap between the encoded image and the real original image, so the invisibility of the encoded image is poor. Building on the HayesGAN framework, Hu et al. [22] added a steganalysis network to improve the invisibility and security of the generated encoded images. Nevertheless, the above models have poor accuracy in extracting the watermarking message.
In recent years, invertible neural networks (INNs) have also been widely used in image steganography. Unlike previous encoder-decoder based methods, IICNet [23] employs an invertible structure based on INNs to better preserve information during conversion. A relation module and a channel squeeze layer are used to improve the nonlinearity of the INN, extracting cross-image relations, and to increase the flexibility of the network, respectively. Lu et al. [24] proposed a large-capacity invertible steganography network (ISN) for image steganography. It treats steganography and the recovery of hidden images as a pair of inverse problems of image domain transformation and uses the forward and backward operations of a single invertible network for image embedding and extraction. Sharing all parameters of the single ISN architecture enables it to efficiently generate both the container image and the revealed hidden image with high quality. Although IICNet and Lu et al.'s model offer large message capacity, they only consider embedding capacity without considering robustness, so their robustness is not guaranteed.
Zhu et al. [9] proposed the HiDDeN watermarking model based on the HayesGAN framework; the embedded watermarking message can be extracted with high accuracy under various attacks such as random pixel dropping, cropping, and Gaussian smoothing. Tang et al. [25] combined adaptive watermarking with GAN models to find suitable locations for embedding the watermarking message and proposed the ASDL-GAN framework. Yang et al. [26] improved the ASDL-GAN model: in the choice of activation function, the ternary embedding simulator (TES) is replaced by the tanh activation function, which solves the backpropagation problem of TES during network training and improves the robustness of the model.
Liu et al. [27] introduced a two-stage deep learning robust watermarking framework. The main difference from end-to-end models [9] is that the embedding and extraction of the watermark are trained in two stages. The advantage of two-stage training is that there is no need to simulate noise attacks as a differentiable noise layer, which helps the decoder resist attacks that are difficult to model directly as a differentiable network layer, such as JPEG compression. Tancik et al. [28] proposed an image watermarking framework that resists attacks such as printing, photographing, rotation, and JPEG compression. Currently, neural network-based watermarking models have also been adopted in the audio and video domains to ensure the data security of various carriers, and their scope of application keeps increasing.
In summary, most existing deep learning-based watermarking algorithms borrow ideas from traditional watermarking algorithms. However, the above methods fall short in the accuracy of watermarking message extraction and in encoded image quality. A summary of traditional and deep learning-based watermarking methods is given in Table 1. To overcome these limitations, we propose a new end-to-end watermarking model based on GAN, combined with a simulated noise attack layer, to address the above shortcomings.

3 Proposed Model

In this section, the overall framework and implementation details of the proposed model
are illustrated.

Fig. 2. The framework of the proposed RIW-GAN model. The message Min is combined with the input image Icover to generate the encoded image Istego by the watermarking encoder. Istego is distorted by the noise layer to produce a noised image I'stego. The decoder receives I'stego and outputs the watermarking message Mout. The discriminator judges whether an image is an encoded image or an original image.

3.1 The Overall Architecture

A robust watermarking model, RIW-GAN, based on end-to-end deep learning is designed. The encoder, decoder, and discriminator are the three components of the model. Through the joint training of the three parts, the watermarking message can be hidden in the texture of the image and extracted from the encoded image by the decoder. As shown in Fig. 2, the original image Icover of shape C×H×W and a binary watermarking message Min ∈ {0, 1}^L of length L are fed into the encoder (Alice), and an encoded image Istego of the same shape as Icover is produced. The watermarking message is hidden in a color image by the encoding network, and an encoded image with low distortion and high visual quality is output. First, Min is spatially expanded and concatenated with Icover, and the result is sent to the encoder to generate Istego. During training, various attacks are added to the noise layer, and the proposed model uses several distortions as differentiable network layers to enable end-to-end training. Keeping the distortions in the training loop teaches the model to resist specific attacks. Icover and Istego are sent to the noise layer, and the encoded image is distorted to produce a noised image I'stego. The decoder (Bob) extracts the watermarking message Mout from the noised image I'stego. At the same time, the discriminator (Eve) judges whether the received image is the encoded image Istego or the original image Icover.

3.2 Implementation Details

In the proposed model, the generator module includes an encoder and a decoder. The target of the encoder is to generate an encoded image that carries both the original image content and the watermarking message while ensuring good invisibility. Figure 3 shows the framework of the encoder.

Table 2. Encoder network architecture

Layer Configuration
Input: Icover and Min
Conv1: Conv (L+3, 32, 3), stride = 2
Conv2: Conv (32,32,3), stride = 2
Conv3: Conv (32, 64, 3), stride = 2
Conv4: Conv (64, 128, 3)
Residual Block: Conv (128, 64, 1)
BatchNorm2d ()
Relu ()
Conv (128, 64, 3), stride = 2
BatchNorm2d ()
Conv (64, 256, 1)
BatchNorm2d ()
Up5: Conv (256, 128, 3)
Relu: Relu ()
Conv5: Conv (128, 64, 3)
Up6: Conv (128, 64, 3)
Conv6: Conv (128, 64, 3)
Up7: Conv (64, 32, 3)
Conv7: Conv (64, 32, 3)
Up8: Conv (32, 32, 3)
Conv8: Conv (32, 3, 1)
Output: Istego
Through experiments, it was found that when the encoding network adopts the original U-Net structure, the trained model cannot converge and has low accuracy. RIW-GAN improves the original U-Net structure by using single-layer convolutions to better fuse image and watermark features, and a residual block is added at the bottom of the U-Net to extract rich feature maps. The encoding network structure is given in Table 2. First, the inputs of the encoder are Icover with shape C×H×W and the watermarking message Min ∈ {0, 1}^L. Here, W, H, and C denote the width, height, and number of channels of the original image, respectively. In Alice, Min is expanded into a three-dimensional feature tensor in {0, 1}^{L×H×W} by replicating itself spatially. This is because the watermarking embedding process essentially encodes the watermarking message together with the convolutional features of the image, after which the model can embed the watermark based on image features at different levels. The replicated three-dimensional Min is therefore merged with the original image of shape 3×H×W, and the resulting (L+3)×H×W feature tensor is fed into the following convolutional layers. This ensures that the watermarking message Min is distributed over the entire area of the image; the purpose is to merge the watermarking message by taking advantage of image features at multiple levels.
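For concreteness, the following is a minimal PyTorch sketch of this expansion-and-concatenation step; the function name and tensor layout are illustrative assumptions, not the authors' code.

```python
import torch

def expand_and_concat(message: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
    """Replicate an L-bit message over the spatial grid and concatenate it with the cover image.

    message: (B, L) tensor with values in {0, 1}
    image:   (B, C, H, W) cover image, C = 3 for color images
    returns: (B, L + C, H, W) tensor fed to the first encoder convolution
    """
    B, L = message.shape
    _, C, H, W = image.shape
    # Reproduce each bit over all H x W positions so the message covers the whole image area.
    msg_map = message.view(B, L, 1, 1).expand(B, L, H, W)
    return torch.cat([msg_map, image], dim=1)
```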
Fig. 3. The structure of the encoder, with Icover and Min as input and Istego as output.

Fig. 4. The structure of the decoder. The input is I'stego and the output is the decoded watermarking message Mout.
Fig. 5. The structure of discriminator.

Next, on the left side of the encoding network, a series of convolutions extracts high-level features, such as contour information, from the image. Inspired by U-Net, we improve the basic U-Net structure and add a residual block to the encoder network. These operations extract feature information and reduce the loss more effectively, making the loss converge more easily. On the right side of the encoding network, each upsampled convolutional layer is concatenated with the corresponding layer on the left side, so that every level of feature maps can be used effectively in subsequent computations. This avoids supervising and computing the loss only on the high-level feature maps. Meanwhile, it provides feature maps that contain both high-level and low-level features, connecting features at different levels. Finally, a 1×1 convolutional layer converts the multi-channel feature map into a three-channel color image. This encoding structure minimizes the loss of the inherent information of Icover and thereby ensures the invisibility of the Istego generated by the encoder.
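As an illustration of the residual block placed at the bottom of the improved U-Net, here is a hedged PyTorch sketch; the channel sizes and layer order are simplified relative to Table 2 and only show the skip connection that eases gradient flow.

```python
import torch
import torch.nn as nn

class BottleneckResidualBlock(nn.Module):
    """A sketch of a bottleneck residual block with a projection shortcut (channel sizes illustrative)."""
    def __init__(self, in_ch: int, mid_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(mid_ch),
            nn.Conv2d(mid_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection on the skip path when the channel count changes.
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1) if in_ch != out_ch else nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.body(x) + self.skip(x))
```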

Table 3. Decoder network architecture

Layer Configuration
Input: I'stego
Conv1: Conv (3, 32, 3)
Conv2: Conv (32,32,3)
Conv3: Conv (32, 64, 3)
Residual Block: Conv (128, 64, 1)
BatchNorm2d ()
Relu ()
Conv (128, 64, 3), stride = 2
BatchNorm2d ()
Conv (64, 64, 1)
BatchNorm2d ()
Conv4: Conv (64, 64, 3)
Conv5: Conv (64, 128, 3)
Conv6: Conv (64, 128, 3)
Conv7: Conv (128, L, 3)
Pooling: AdaptiveAvgPool2d ()
Linear: Linear (L, L)
Output: Mout
Table 3 describes the decoding network structure. The main function of the decoder is to recover the hidden watermarking message Mout from the received encoded image. As shown in Fig. 4, the decoder consists of three 3×3 convolutional layers for extracting image features, followed by a residual block that eases gradient flow during training, and then four further 3×3 convolutional layers that construct an L-channel information feature. After that, the feature map is transformed into a watermarking message tensor of length L using an average pooling layer and a linear layer. The discriminator network faces a challenge: the encoder attempts to fool it so that it cannot distinguish between Icover and Istego. The structure of the discriminator is shown in Fig. 5; it consists of three identical modules, each containing two sets of convolution, BN, and ReLU, where BN denotes a batch normalization layer and ReLU denotes a rectified linear unit activation. A global pooling layer follows, and the final output is fed to a linear layer. In many real-world scenarios, encoded images are subject to many attacks during transmission. Therefore, an optional noise layer is inserted between the encoder and decoder, which makes the model more robust to the simulated attacks. In the following, a series of perturbations applied to the encoded image is described.
Noise attack layer. In the field of digital watermarking, many classical distortions are frequently applied to verify the robustness of a watermarking model [9, 29, 30]. In this section, the noise attack layers are introduced; they comprise 9 types of attack with different intensities. Note that some attacks require the original image Icover in order to distort the encoded image Istego. Examples of the distortion styles used in this paper are illustrated in Fig. 6. The designed noise types are listed below (a minimal sketch of two of them follows the list):

Fig. 6. Visualization of the noise attack layer. First row: original image Icover. Second row: en-
coded image Istego. Third row: distorted image I'stego. Fourth row: magnified difference.

1. Salt & pepper noise: controlled by an intensity factor P ∈ (0, 1). A proportion P of the pixels in Istego is retained, and the remaining proportion 1−P is randomly replaced by pepper or salt noise to obtain the noised image.
2. Dropout: keeps a proportion P ∈ (0, 1) of pixels from Istego, and the remaining proportion 1−P is randomly replaced by pixels of Icover.
3. Cropout: a randomly selected square area covering a proportion P ∈ (0, 1) of the encoded image keeps its pixels from Istego, and the remaining proportion 1−P is replaced by Icover.
4. Crop: the encoded image is randomly cropped to a square H'×W', where P = (H'×W')/(H×W), P ∈ (0, 1), is the ratio between the cropped and original image sizes.
5. Gaussian blur: blur the image with a Gaussian kernel of width σ, chosen at random.
6. Rotation: rotate the image by an angle θ, chosen at random.
7. JPEG: JPEG compression with quality factor Q ∈ (0, 100).
8. Resize: reduce the encoded image to H'×W', where P = (H'×W')/(H×W), P ∈ (0, 1), then scale it back to the original size H×W.
9. Gaussian noise: add Gaussian noise with standard deviation Ɵ, chosen at random.
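As announced above, the following PyTorch sketch shows two of these distortions (Dropout and Crop) written as simple tensor operations; the parameter values and function names are illustrative, and the actual noise layer may differ in detail.

```python
import torch

def dropout_noise(I_stego: torch.Tensor, I_cover: torch.Tensor, p: float = 0.3) -> torch.Tensor:
    """Keep a proportion p of stego pixels and replace the rest with cover pixels (noise type 2)."""
    # One mask value per spatial position, shared across channels.
    keep = (torch.rand_like(I_stego[:, :1]) < p).float()
    return keep * I_stego + (1.0 - keep) * I_cover

def crop_noise(I_stego: torch.Tensor, p: float = 0.035) -> torch.Tensor:
    """Randomly crop a square whose area is a proportion p of the encoded image (noise type 4)."""
    _, _, H, W = I_stego.shape
    side = max(1, int((p * H * W) ** 0.5))
    top = torch.randint(0, H - side + 1, (1,)).item()
    left = torch.randint(0, W - side + 1, (1,)).item()
    return I_stego[:, :, top:top + side, left:left + side]
```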

One of the main contributions of this paper is a novel JPEG mask method, which we now explain in detail. JPEG compression plays a crucial role in watermark robustness. It involves four primary steps: color space transformation, discrete cosine transformation (DCT), quantization, and entropy encoding. However, quantization is a lossy and non-differentiable operation, which poses challenges for direct end-to-end optimization through JPEG compression. To address this, we develop a novel approach that simulates the JPEG compression process for training purposes.
Our approach simulates the quantization operation during training while mitigating its lossy and non-differentiable nature. As shown in Fig. 7, the DCT concentrates low-frequency coefficients in the upper left corner of the DCT matrix and high-frequency coefficients in the lower right corner, so the coefficients in the lower right corner are more likely to become zero during quantization. Inspired by the zigzag scanning order used in the entropy encoding step, we design a JPEG mask that follows the zigzag scanning order to simulate JPEG compression under different quantization strengths.
To construct the JPEG mask, we first create an 8×8 mask matrix and traverse it in zigzag scanning order, treating it as a 64×1 vector. The first n coefficients are set to 1 and the remaining 64−n coefficients to 0 (the value of n is controlled by the compression quality factor QF); the mask matrix is then multiplied element-wise with the DCT matrix of each image block. Subsequently, an inverse discrete cosine transform (IDCT) is applied to recover the image.
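A minimal NumPy/SciPy sketch of this mask construction and of its block-wise application is given below; it operates on a single-channel image whose sides are multiples of 8, and the function names are illustrative (in the actual noise layer the same operations would be written with differentiable tensor ops).

```python
import numpy as np
from scipy.fftpack import dct, idct

def zigzag_indices(n: int = 8):
    """Return the (row, col) visiting order of JPEG zigzag scanning for an n x n block."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def jpeg_zigzag_mask(n_keep: int, block: int = 8) -> np.ndarray:
    """Set the first n_keep coefficients in zigzag order to 1 and the remaining ones to 0."""
    mask = np.zeros((block, block), dtype=np.float32)
    for k, (i, j) in enumerate(zigzag_indices(block)):
        mask[i, j] = 1.0 if k < n_keep else 0.0
    return mask

def _dct2(b):
    return dct(dct(b, axis=0, norm="ortho"), axis=1, norm="ortho")

def _idct2(b):
    return idct(idct(b, axis=0, norm="ortho"), axis=1, norm="ortho")

def simulate_jpeg(img: np.ndarray, n_keep: int) -> np.ndarray:
    """Apply the zigzag mask to every 8x8 DCT block and transform back (simulated quantization)."""
    mask = jpeg_zigzag_mask(n_keep)
    out = np.zeros_like(img, dtype=np.float32)
    H, W = img.shape
    for i in range(0, H, 8):
        for j in range(0, W, 8):
            out[i:i + 8, j:j + 8] = _idct2(_dct2(img[i:i + 8, j:j + 8].astype(np.float32)) * mask)
    return out
```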
To determine the optimal value of n, we selected 800 images from the DIV2K training set [31] and applied real JPEG compression and JPEG zigzag mask compression to each image, with QF fixed and n varied from 0 to 64. The value of n is chosen to maximize the PSNR between real JPEG compression and JPEG zigzag mask compression for each QF. This yields a zigzag mask compression that closely resembles real JPEG compression.
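This calibration of n can be sketched as follows, reusing simulate_jpeg from the previous snippet; real JPEG compression is approximated here with Pillow on grayscale images, and the helper names are assumptions rather than the authors' code.

```python
import io
import numpy as np
from PIL import Image

def real_jpeg(img_uint8: np.ndarray, qf: int) -> np.ndarray:
    """Compress and decompress a grayscale uint8 image with a real JPEG codec."""
    buf = io.BytesIO()
    Image.fromarray(img_uint8).save(buf, format="JPEG", quality=qf)
    return np.asarray(Image.open(buf))

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def best_n_for_qf(images, qf: int) -> int:
    """Pick the n (1..64) whose zigzag-mask simulation best matches real JPEG at this QF."""
    scores = []
    for n in range(1, 65):
        vals = [psnr(real_jpeg(im, qf),
                     np.clip(simulate_jpeg(im.astype(np.float32), n), 0, 255))
                for im in images]
        scores.append(np.mean(vals))
    return int(np.argmax(scores)) + 1
```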
Experimental results in Fig. 8 show the PSNR between zigzag-simulated and real JPEG-compressed images under different QF. The purple, green, blue, red, and black lines indicate the results for QF = 10, 30, 50, 70, and 90, respectively.
Fig. 7. DCT coefficients matrix and zigzag mask matrix


Fig. 8. PSNR between zigzag simulated images and real JPEG compressed images. (QF=10,
30, 50, 70, 90)

4 Loss Function and Similarity

4.1 Loss Function


For deep learning systems, the key elements are the structure of the network and the construction of the loss functions. The overall framework contains three subnetworks, and different loss functions are used to update the encoder, decoder, and discriminator alternately and iteratively. The message loss function ensures the robustness of the model, while the image loss functions and the discriminator loss function ensure its imperceptibility. Because the invisibility of the encoded images of existing models is poor, we introduce the learned perceptual image patch similarity (LPIPS) metric to further constrain the gap between the encoded images and the original images and improve the invisibility of the encoded images.
The purpose of the image loss function LeA is to keep the original image Icover and the encoded image Istego similar. LeA is formulated as follows:

L_{eA} = \frac{1}{C \times H \times W} \left\| I_{cover} - I_{stego} \right\|_2^2    (1)

where \|\cdot\|_2 is the Frobenius norm, and W, H, and C are the width, height, and number of channels of the original image, respectively.
LPIPS [12] loss Llpips is introduced to minimize the perceptual distance between the original image Icover and the encoded image Istego. Llpips is calculated as follows:

L_{lpips}(I_{cover}, I_{stego}) = \sum_{l} \frac{1}{H_l W_l} \sum_{h,w} \left\| w_l \odot \left( \hat{y}_{hw}^{l} - \hat{y}_{0,hw}^{l} \right) \right\|_2^2    (2)

where l indexes the network layers used to extract the feature stack, and H_l, W_l, and C_l are the height, width, and number of channels of layer l. The unit-normalized C_l-dimensional feature vectors are \hat{y}_{hw}^{l} = F_{hw}^{l}(I_{cover}) / \|F_{hw}^{l}(I_{cover})\| and \hat{y}_{0,hw}^{l} = F_{hw}^{l}(I_{stego}) / \|F_{hw}^{l}(I_{stego})\|, where F_{hw}^{l}(I_{cover}) \in \mathbb{R}^{C_l} contains the features of image Icover in layer l at spatial coordinates (h, w). The weights w_l are adaptive per-channel weights, set here to w_l = 1 for each feature, and ⊙ denotes channel-wise multiplication (see [12] for details).
The goal of the decoder is to accurately recover the watermarking message from the encoded image; Mout should be as similar as possible to Min. The message loss LdB is therefore computed as follows:

L_{dB} = \frac{1}{L} \left\| M_{in} - M_{out} \right\|_2^2    (3)

where Min ∈ {0, 1}^L is the binary watermarking message of length L and Mout is the extracted watermarking message.
The discriminator A judges whether the received image is Istego or Icover. The discriminator loss LgA, applied to enhance the invisibility of Istego, is defined as follows:

L_{gA} = \log\left(1 - A(I_{stego})\right) + \log\left(A(I_{cover})\right)    (4)

where log(·) denotes the logarithm.


For training the generator, the model assigns different weights λeA, λlpips, λdB, and λgA to the four loss terms LeA, Llpips, LdB, and LgA. These weights adjust the proportion of each loss in the total loss so that the total loss is minimized. Training is repeated until a given number of iterations is reached or until the loss converges to a satisfactory level. During training, by continuously adjusting the weights, the network's output gradually approaches the real target values, thereby optimizing the model. The total loss function L of the proposed model is calculated as follows:

L = \lambda_{eA} L_{eA} + \lambda_{lpips} L_{lpips} + \lambda_{dB} L_{dB} + \lambda_{gA} L_{gA}    (5)

The discriminator and generator are trained against each other, and the watermarking system is updated alternately and iteratively. The specific details are presented in Algorithm 1.
As shown in Algorithm 1, the overall training process is divided into two parts: encoder and decoder training, and discriminator training. For the training of the encoder and decoder, the encoder E receives the message Min and the original image Icover and produces the encoded image Istego, which is then distorted by the noise layer N. The decoder D is fed the noised encoded image I'stego and predicts the watermarking message Mout. The training objective is to minimize Eq. (5); the encoder is updated by Eq. (1) and Eq. (2), and Eq. (3) is used to update the decoder. In the discriminator training, the discriminator A receives Icover and Istego. Similar to the first stage, the goal is to minimize Eq. (5), and the discriminator is updated by Eq. (4).
Algorithm 1 Watermarking Training
Input: Min, Icover
Output: Mout, Istego, I'stego
1:  m ← number of epochs
2:  n ← number of batches per epoch
3:  for i ← 1 to m do
4:    for j ← 1 to n do
5:      // encoder and decoder training
6:      Istego ← E(Min, Icover)
7:      I'stego ← N(Istego)
8:      Mout ← D(I'stego)
9:      Update the encoder by using Eq. (1) and Eq. (2)
10:     Update the decoder by using Eq. (3)
11:     // discriminator training
12:     Get batch (Icover, Istego)
13:     Update the discriminator by using Eq. (4)
14:   end for
15: end for
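A hedged PyTorch sketch of one iteration of Algorithm 1 is shown below; the module and optimizer names are placeholders, the adversarial terms are written in the standard binary cross-entropy form corresponding to Eq. (4), and lpips_fn stands for any LPIPS distance module, so this illustrates the update order rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def train_step(encoder, decoder, discriminator, noise_layer,
               opt_gen, opt_disc, I_cover, M_in, lpips_fn,
               lam_eA=0.5, lam_lpips=0.01, lam_dB=0.5, lam_gA=0.01):
    # ---- encoder / decoder update (lines 5-10 of Algorithm 1) ----
    I_stego = encoder(M_in, I_cover)              # Alice embeds the message
    I_noised = noise_layer(I_stego, I_cover)      # differentiable attack simulation
    M_out = decoder(I_noised)                     # Bob extracts the message

    loss_eA = torch.mean((I_cover - I_stego) ** 2)        # image loss, Eq. (1)
    loss_lpips = lpips_fn(I_cover, I_stego).mean()        # perceptual loss, Eq. (2)
    loss_dB = torch.mean((M_in - M_out) ** 2)             # message loss, Eq. (3)
    d_fake = discriminator(I_stego)
    loss_gA = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))

    # Total generator loss, Eq. (5)
    loss_total = lam_eA * loss_eA + lam_lpips * loss_lpips + lam_dB * loss_dB + lam_gA * loss_gA
    opt_gen.zero_grad()
    loss_total.backward()
    opt_gen.step()

    # ---- discriminator update (lines 11-13 of Algorithm 1), cf. Eq. (4) ----
    d_real = discriminator(I_cover)
    d_fake = discriminator(I_stego.detach())
    loss_disc = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
                 F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_disc.zero_grad()
    loss_disc.backward()
    opt_disc.step()
    return loss_total.item(), loss_disc.item()
```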
4.2 Similarity

To assess the effectiveness and performance of the proposed model, the invisibility of the encoded image must be taken into account. The mean square error (MSE) is used to judge the degree of resemblance between the original image and the encoded image, and is calculated as follows:

MSE = \frac{1}{C \times H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( X(i, j) - Y(i, j) \right)^2    (6)

where X (i, j) and Y (i, j) represent the pixels of the original image and the generated
encoded image, respectively.
To further measure the invisibility of encoded images, the peak signal-to-noise ratio (PSNR) is also widely applied to evaluate the visual quality of the encoded image:

PSNR = 10 \log_{10} \left( \frac{(2^{n} - 1)^2}{MSE} \right)    (7)

where n is the number of bits occupied by each pixel. The gray level of a common image is 256, so the corresponding value of n is 8. The unit of PSNR is dB, and a common baseline is 30 dB: image distortion becomes noticeable when the PSNR is below 30 dB.
MSE only measures the pixel-level difference between the original and encoded images; it ignores the correlation between pixels and the structural characteristics of the two images. The structural similarity index (SSIM) [32, 33, 34] is therefore used to compare the structural similarity between the encoded image and the original image:

SSIM(X, Y) = \frac{(2 u_x u_y + C_1)(2 \sigma_{xy} + C_2)}{(u_x^2 + u_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}    (8)

where u_x and u_y are the pixel means of the original image X and the encoded image Y, \sigma_x^2 and \sigma_y^2 are their variances, \sigma_{xy} is their covariance, and C_1 and C_2 are constants. The SSIM value lies in [0, 1]; 0 means the two images are completely unrelated and 1 means they are identical.
When extracting the watermarking message from encoded images, it is usually difficult to recover the message completely without loss. The extraction accuracy ACC is used to evaluate how correctly the decoder recovers the watermarking message:

ACC = 1 - \frac{1}{C \times H \times W} \sum_{i=1}^{C} \sum_{j=1}^{H} \sum_{k=1}^{W} \left| M_{in} - Decoder(I_{stego}) \right|    (9)

where Istego and Min are the encoded image and the actual embedded watermarking message, respectively, and Decoder(·) denotes the decoding network.
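For reference, a small NumPy sketch of the PSNR and bit-accuracy computations corresponding to Eqs. (6), (7), and (9) follows; thresholding the decoder output at 0.5 is an assumption about how real-valued predictions are turned into bits.

```python
import numpy as np

def psnr_metric(x: np.ndarray, y: np.ndarray, n_bits: int = 8) -> float:
    """Peak signal-to-noise ratio between two images, cf. Eqs. (6) and (7)."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return 10.0 * np.log10((2 ** n_bits - 1) ** 2 / mse)

def bit_accuracy(m_in: np.ndarray, m_out: np.ndarray) -> float:
    """Fraction of correctly recovered message bits, cf. Eq. (9)."""
    m_in = np.asarray(m_in).astype(np.int32)
    m_out = (np.asarray(m_out) >= 0.5).astype(np.int32)  # round decoder output to bits
    return 1.0 - float(np.mean(np.abs(m_in - m_out)))
```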
5 Experiments and Analysis

In this section, our experimental settings are introduced first, and then extensive experimental results verify the effectiveness of our model.

5.1 Experiments Settings

Datasets. Two real-world datasets are used for model training and evaluation, namely the COCO dataset [35] and the ImageNet dataset [36]. For the COCO dataset, 10,000 images are used for training and another 1,000 images for evaluation. For the ImageNet dataset, 9,000 images and 1,000 images are used for training and evaluation, respectively. For each image in the two datasets, the embedded watermarking message is randomly generated with a fixed length. After training, 1,000 images are selected from the MIRFLICKR dataset [37] to verify the performance of the trained model.
Implementation Details. The Adam optimizer [38] is used to train the model, with the learning rate Lr set to 0.0001. In the RIW-GAN model, appropriate weight parameters were chosen through extensive experiments: λeA, λlpips, λdB, and λgA are set to 0.5, 0.01, 0.5, and 0.01, respectively. The RIW-GAN model is trained for 300 epochs with a batch size of 32, and the parameters of the encoder, decoder, and discriminator are updated at each training step. The proposed framework is implemented on an NVIDIA GeForce RTX 2080 Ti with 11 GB of memory.
Ablation Experiment. In this section, the contributions of the residual block and the LPIPS loss in the RIW-GAN model are evaluated by an ablation study; the results are shown in Table 4 and Fig. 9. From Table 4, it can be concluded that the RIW-GAN model with the residual block achieves higher encoded image invisibility and decoding accuracy. Figure 9 reports the effect of the LPIPS loss on the visual quality of encoded images: the invisibility of the encoded image decreases as the capacity of the embedded watermarking message increases, and training with the LPIPS loss yields better invisibility than training without it.

Table 4. Ablation study for residual block

Method                      SSIM     PSNR (dB)   ACC (%)
Retaining residual block    0.9991   37.90       100
Without residual block      0.9988   36.95       98

5.2 Analysis on Robustness and Invisibility

Table 5 lists the accuracy of extracted watermarking messages for the RIW-GAN model under various attacks, as well as the invisibility of the encoded images, on two datasets. The length of the embedded watermarking message is 30 bits. From Table 5, the accuracy of the extracted watermarking message under various attacks exceeds 0.90 on both datasets in most cases, which shows that the RIW-GAN model embeds the watermarking message robustly. The PSNR of the encoded image generated by training under attack is greater than 35 and the SSIM is close to 1, indicating that the encoded image has good invisibility. Figure 10 visualizes the loss curves of the RIW-GAN model; the loss values begin to converge when the number of epochs is close to 50, so the RIW-GAN model trains stably.

Fig. 9. Ablation study for LPIPS loss

Fig. 10. Generic loss evolution after 300 epochs. (a): Overall loss. (b): Encoder loss. (c): De-
coder loss. (d): Discriminator loss.
Fig. 11. The difference between cover images and encoded images. First two columns: cover
images and encoded images. The last three columns: the difference images between encoded
images and cover images (enhanced 1×, 10×, 20×).

Table 5. Robustness to different image distortions: accuracy of extracted watermark messages (ACC, %), and PSNR/SSIM between the original and encoded images, for different noise layers on two datasets.

Noise layer             ImageNet [36]              COCO [35]
                        ACC   PSNR    SSIM         ACC   PSNR    SSIM
Resize (P=40%)          92    39.99   0.9952       91    41.33   0.9972
Cropout (P=30%)         85    41.87   0.9993       83    42.30   0.9990
Crop (P=3.5%)           86    40.21   0.9978       90    42.51   0.9991
Gaussian noise (Ɵ=2)    93    34.27   0.9971       93    39.00   0.9997
JPEG (Q=50)             89    40.96   0.9990       89    44.42   0.9996
Gaussian blur (σ=3)     93    36.30   0.9965       97    41.20   0.9997
Rotation (θ=90º)        93    42.11   0.9989       92    43.52   0.9994
Salt&Pepper (P=20%)     84    38.03   0.9981       76    39.21   0.9988
Dropout (P=30%)         97    36.61   0.9988       98    39.11   0.9990
To further illustrate the invisibility of the encoded images generated by the RIW-GAN model, a human visual comparison between the encoded image and the original image is performed. Column 3 of Fig. 11 shows that no useful information can be detected from the difference images; only when the difference images are amplified 20 times do artifacts become visible, as shown in the last column of Fig. 11.
Figure 12 reports the bit accuracy of the RIW-GAN model under different attacks with different severity levels. For each tested noise attack layer, the trained model is evaluated with no noise layer, called the identity layer (blue), with the specialized noise layer (gray), and with the combined noise layer (orange). The bit accuracy values are averages measured over 1,000 images during evaluation.

Fig. 12. Bit accuracy rates of RIW-GAN for various distortions and intensities.

It can be observed from Fig. 12 that, when tested under different noise layers, the model trained without noise has poor robustness. This is because the model enjoys lossless transmission from the encoder to the decoder during training, so it has never been trained to resist attacks. The specialized models (gray) achieve higher bit accuracy, because introducing the specific distortions during training makes the model more robust to them. Finally, the combined noise model achieves higher accuracy than the noise-free model in most cases, but it still does not reach the accuracy of the specialized models.
To prove the effectiveness of the proposed noise layer, an ablation experiment is conducted on the ImageNet dataset [36] with the message length set to 30 bits, using PSNR, SSIM, and bit accuracy as performance metrics. Two scenarios are compared: one with the JPEG zigzag mask simulation layer and one without it. In the first scenario, where the JPEG zigzag mask simulation layer is included, a noticeable improvement in PSNR, SSIM, and bit accuracy is observed compared with the second scenario.
Table 6 clearly demonstrates the improvements achieved by the proposed JPEG zigzag mask noise layer. The PSNR increases by 3.49 dB, the SSIM improves by 0.1574, and the bit accuracy shows a remarkable increase of 12%. These results prove the effectiveness of the JPEG zigzag mask noise layer in preserving image quality and enhancing the robustness of the encoded message. The JPEG zigzag mask simulation layer effectively mitigates compression artifacts, resulting in higher visual quality and more accurate bit recovery.

Table 6. Ablation study for JPEG noise layer

Methods PSNR(dB) SSIM Accuracy(%)

With JPEG zigzag mask 40.96 0.9990 89

Without JPEG zigzag mask 37.47 0.8416 77

We propose a new method for simulating JPEG mask compression. To demonstrate its superiority, we apply the simulated JPEG compression noise layers from HiDDeN [9], IRBW-GAN [39], and our method to the models described in these three papers, respectively. After training the models, we evaluate their performance under real JPEG compression with a quality factor of 50; the experimental results are shown in Table 7.

Table 7. Comparison among HiDDeN, IRBW-GAN and RIW-GAN with different JPEG simulation layers (ACC, %, under real JPEG compression).

Methods           JPEG Compression   JPEG Compression    JPEG Compression
                  (HiDDeN) [9]       (IRBW-GAN) [39]     (Proposed)
HiDDeN [9]        59                 60                  70
StegaStamp [28]   69                 76                  80
IRBW-GAN [39]     64                 80                  83
RIW-GAN           70                 87                  89
Table 7 reports the bit accuracy of HiDDeN, IRBW-GAN, and RIW-GAN under real JPEG compression with different JPEG simulation layers. In HiDDeN, quantization is modeled by preserving the top-left 5×5 DCT coefficients of each 8×8 block and setting all remaining coefficients to zero. IRBW-GAN introduces differentiable quantization operations, which allow gradient-based optimization during training, but in the simulation it also preserves only the top-left 5×5 region of the DCT matrix, which is a rather crude approximation. In contrast, our proposed method simulates the zigzag scanning order of the run-length coding process, which reproduces the JPEG compression process more accurately and therefore achieves better experimental results. The bit accuracy values are averages over 1,000 images from the ImageNet dataset [36], with the message length set to 30 during evaluation.
Reading Table 7 horizontally, across different models our novel JPEG simulation layer consistently achieves higher watermark extraction accuracy. These results demonstrate that the proposed JPEG simulation more closely approximates real JPEG compression than the other two simulation methods, thus providing better robustness against real JPEG compression noise. Reading Table 7 vertically, for each JPEG simulation RIW-GAN always shows the highest watermark extraction accuracy, indicating that RIW-GAN performs better in terms of robustness.

5.3 Comparison with the State-of-the-art Schemes

In this section, the RIW-GAN model is compared with three state-of-the-art robust watermarking models: the HiDDeN model [9], the StegaStamp model [28], and the ReDMark model [29]. To evaluate the proposed model and the other watermarking models fairly and comprehensively, a series of evaluation indicators is used. If more watermarking message is hidden, the quality of the encoded image inevitably decreases; therefore, in addition to robustness, the quality of the encoded image is also key to evaluating a watermarking model. For a fair comparison, all images are resized to 128×128 for the HiDDeN, StegaStamp, ReDMark, and RIW-GAN models, and the length of the embedded watermarking message is 30 bits.

Table 8. Comparison among several state-of-the-art schemes and the RIW-GAN model on the COCO and ImageNet datasets with three different message lengths.

                               COCO [35]                ImageNet [36]
Methods            L     ACC    PSNR    SSIM      ACC    PSNR    SSIM
HiDDeN [9]         30    100    36.74   0.9412    100    31.53   0.9919
                   60    81     33.83   0.9944    85     30.91   0.9914
                   90    78     33.86   0.9952    74     32.51   0.9926
StegaStamp [28]    30    100    23.04   0.9576    100    24.01   0.9664
                   60    100    26.87   0.9821    100    23.29   0.9598
                   90    99     25.82   0.9775    100    20.71   0.9250
ReDMark [29]       30    100    33.59   0.9835    100    36.75   0.9705
                   60    99     33.70   0.9813    99     36.97   0.9697
                   90    99     33.36   0.9812    99     37.03   0.9721
RIW-GAN            30    100    43.45   0.9891    100    41.02   0.9964
                   60    100    40.91   0.9907    99     39.21   0.9960
                   90    92     38.92   0.9832    92     38.88   0.9977
Table 8 gives a comprehensive comparison of several state-of-the-art schemes and the RIW-GAN model across different embedded message lengths. Note that the ReDMark model uses a trained network to embed the watermarking message into separate channels of color images. To verify performance, three message lengths L are set: 30, 60, and 90. Table 8 shows that the invisibility of the RIW-GAN model is the best in most cases. The proposed RIW-GAN model surpasses the HiDDeN model by 18% on the COCO dataset and 14% on the ImageNet dataset under no noise with L = 60. The results show that the RIW-GAN model not only embeds a rich message but also maintains the invisibility of the encoded image. To achieve a good tradeoff among the invisibility of the encoded image, the capacity of the embedded watermarking information, and the accuracy of the extracted watermarking information, 30 bits of watermarking information are embedded in the following comparison experiments.

Fig. 13. Samples of encoded and original images for the different models. First row: cover im-
age with no embedded message. Second row: encoded image from RIW-GAN model. Third
row: encoded image from HiDDeN model. Fourth row: encoded image from StegaStamp. Fifth
row: encoded image from ReDMark. Sixth row: normalized difference for RIW-GAN model.
Seventh row: normalized difference for HiDDeN model. Eighth row: normalized difference for
StegaStamp model. Ninth row: normalized difference for ReDMark model.
Table 9. Robustness comparison among the state-of-the-art models and RIW-GAN (ACC, %; "-" indicates a result not reported).

Attacks                 HiDDeN [9]   StegaStamp [28]   ReDMark [29]   Xu et al. [40]   Jamali et al. [41]   Fei et al. [42]   RIW-GAN
JPEG (Q=50)             63           83                75             77               62                   80                89
Cropout (P=30%)         94           87                93             95               74                   71                83
Dropout (P=30%)         93           83                92             75               92                   74                98
Resize (P=40%)          82           70                85             -                -                    82                91
Rotation (θ=90º)        83           87                57             -                -                    80                92
Gaussian blur (σ=3)     90           84                94             -                -                    92                97
Salt&Pepper (P=20%)     73           60                72             -                -                    72                76
Crop (P=3.5%)           88           73                100            83               100                  83                90
Gaussian noise (Ɵ=2)    96           90                50             86               51                   75                93
Table 9 reports the accuracy of extracted watermarking messages for the state-of-the-art models and RIW-GAN. For attacks such as JPEG (Q=50), dropout (P=30%), resize (P=40%), rotation (θ=90º), Gaussian blur (σ=3), and salt & pepper noise (P=20%), RIW-GAN achieves the best performance among the compared methods. However, the accuracy under the crop attack (P=3.5%) obtained by RIW-GAN is lower than that of the ReDMark model. This is because ReDMark spreads the watermark over a comparatively large region of the image through its block-division method, which resists cropping but not Gaussian noise. The accuracy under the cropout attack (P=30%) obtained by RIW-GAN is lower than that of the HiDDeN model, because the HiDDeN network learns to recover entire missing image patches from adjacent pixel values during training. The HiDDeN model builds its encoder and decoder from plain convolutional networks, a structure that leads to limited embedding capacity and low decoding accuracy under other attacks. The RIW-GAN model adopts a residual block with skip connections, which alleviates gradient vanishing during neural network training and preserves the integrity of the feature maps. Therefore, the RIW-GAN model achieves higher decoding accuracy than the StegaStamp model.
To further illustrate the invisibility of the RIW-GAN model, a visual quality comparison is given in Fig. 13. The invisibility of an encoded image can be measured by the gap between the encoded image and the original image. From Fig. 13, the normalized difference between the encoded image and the original image is flattest for the RIW-GAN model, which demonstrates that the encoded images generated by RIW-GAN have the best invisibility; the ReDMark model achieves the second-best invisibility. Due to limitations of their network structures, it is difficult for the HiDDeN and StegaStamp models to ensure that the watermarking message is hidden in complex image texture regions. In the RIW-GAN model, the encoder combines upsampling with the feature information of the convolutional layers, and the watermarking message is embedded over a larger area of image texture features. Therefore, the RIW-GAN model maintains the invariance of image features in the encoder and decoder when embedding the watermarking message and improves the quality of the encoded image.
6 Conclusion

With the rapid development of digital society, image watermarking has become an important research field for copyright protection in the digital transmission of multimedia data. In this work, a robust image watermarking model based on a generative adversarial network (RIW-GAN) is proposed for copyright protection, in which the encoder and the decoder are designed based on a U-Net network and a residual block. To make the model robust, the proposed framework adds various distortions during training. Experimental results demonstrate that the RIW-GAN model achieves better visual quality and higher accuracy of extracted watermarking messages, and it outperforms state-of-the-art methods. In future work, we will further strengthen the robustness of the proposed RIW-GAN model against multi-attack scenarios and new types of attacks, and we will also consider enhancing the embedding capacity of the model.

7 ACKNOWLEDGEMENTS

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61662039 and U1936118), in part by the Jiangxi Key Natural Science Foundation (Grant No. 20192ACBL20031), in part by the Startup Foundation for Introducing Talent of Nanjing University of Information Science and Technology (NUIST) (Grant No. 2019r070), in part by the Open Foundation of the Henan Key Laboratory of Cyberspace Situation Awareness (Grant No. HNTS2022033), and in part by the Graduate Scientific Research Innovation Program of Jiangsu Province (Grant No. KYCX22_1221).

Reference
1. Cao, X., Jia J., Gong, N. Z.: IPGuard: Protecting intellectual property of deep neural
networks via fingerprinting the classification boundary. Proceedings of the 2021 ACM Asia
Conference on Computer and Communications Security, pages 14-25, (2021).
2. Zhang, J., Chen, D., Liao, J., Zhang, W., Feng, H., Hua, G., Yu, N.: Deep model intellectual
property protection via deep watermarking. IEEE Transactions on Pattern Analysis and
Machine Intelligence (2021)
3. Mielikainen, J.: LSB matching revisited. IEEE signal processing letters, 13(5):285–287
(2006)
4. Cao, R., Wang, Y., and Li, X.: Digital watermarking algorithm based on phase and
amplitude of dft domain. Journal of Computer Applications, 25(11):2536 (2005)
5. Ko, H., Huang, C., Horng, G., Wang, S.: Robust and blind image watermarking in DCT
domain using inter-block coefficient correlation. Information Sciences, 517:128–147 (2020)
6. Kahlessenane, F., Khaldi, A., Kafi, R., Euschi, S.: A dwt based watermarking approach for
medical image protection. Journal of Ambient Intelligence and Humanized Computing,
12(2):2931–2938 (2021)
7. Liu, J., Ke, Y., Zhang, Z., Lei, Y., Li, J., Zhang, M., Xiao, Y.: Recent Advances of Image
Steganography With Generative Adversarial Networks. IEEE Access, 8:60575-60597
(2020)
8. Shi, H., Dong, J., Wang, W., Qian, Y., Zhang, X.: Ssgan: secure steganography based on
generative adversarial networks. In Pacific Rim Conference on Multimedia, pages 534–544.
Springer (2017)
9. Zhu, J., Kaplan, R., Johnson, J., Li, F.: Hidden: Hiding data with deep networks. In
Proceedings of the European conference on computer vision (ECCV), pages 657–672 (2018)
10. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image
segmentation. In International Conference on Medical image computing and computer-
assisted intervention, pages 234–241. Springer (2015)
11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In
Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–
778 (2016)
12. Zhang, R., Isola, P., Efros, A. A., Shechtman, E., Wang, O.: The unreasonable effectiveness
of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer
vision and pattern recognition, pages 586–595 (2018)
13. Luo, W. Q., Huang, F., Huang, J. W.: Edge adaptive image steganography based on LSB
matching revisited. IEEE Transactions on Information Forensics and Security, 5(2), 201–
214 (2010)
14. Su, Q. T., Liu, D. C., Sun, Y. H.: A robust adaptive blind color image watermarking for
resisting geometric attacks. In Information Sciences, 606, 194–212 (2022)
15. Rakhmawati, L., Wirawan, W., Suwadi, S., Delpha, C., Duhamel, P.: Blind robust image
watermarking based on adaptive embedding strength and distribution of quantified
coefficients. In Expert Systems with Applications, 187, 115906 (2022)
16. Holub, V., Fridrich, J.: Designing steganographic distortion using directional filters. In 2012
IEEE International Workshop on Information Forensics and Security (WIFS), pages 234–
239. IEEE (2012)
17. Ruanaidh, J. J. Ò., Pun, T.: Rotation, scale and translation invariant spread spectrum digital
image watermarking. Signal Processing, 66(3):303–317 (1998)
18. Tang, C.W., Hang, H.M.: A feature-based robust digital image watermarking scheme. IEEE
Transactions on Signal Processing, 51(4):950–959 (2003)
19. Volkhonskiy, D., Nazarov, I., Burnaev, E.: Steganographic generative adversarial networks.
In Twelfth International Conference on Machine Vision (ICMV 2019), volume 11433, page
114333M. International Society for Optics and Photonics (2020)
20. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep
convolutional generative adversarial networks. International Conference on Image and
Graphics, pages 97-108 (2015)
21. Hayes, J., Danezis, G.: Generating steganographic images via adversarial training.
Advances in Neural Information Processing Systems, 30 (2017)
22. Hu, D., Wang, L., Jiang, W., Zheng, S., Li, B.: A novel image steganography method via
deep convolutional generative adversarial networks. IEEE Access, 6, 38303–38314 (2018)
23. Cheng, K. L., Xie, Y., Chen, Q.: IICNet: A generic framework for reversible image
conversion. In Proceedings of the IEEE/CVF International Conference on Computer Vision,
pages 1991-2000 (2021)
24. Lu, S., Wang, R., Zhong, T., Rosin, P. L.: Large-capacity image steganography based on
invertible neural networks. In Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, pages 10816-10825 (2021)
25. Tang, W., Tan, S., Li, B., Huang, J.: Automatic steganographic distortion learning using a
generative adversarial network. IEEE Signal Processing Letters, 24(10), 1547–1551 (2017)
26. Yang, J., Ruan, D., Huang, J., Kang, X., Shi, Y. Q.: An embedding cost learning framework
using GAN. IEEE Transactions on Information Forensics and Security, 15, 839–851 (2019)
27. Liu, Y., Guo, M., Zhang, J., Zhu, Y., Xie, X.: A novel two-stage separable deep learning
framework for practical blind watermarking. In Proceedings of the 27th ACM International
Conference on Multimedia, pages 1509–1517 (2019)
28. Tancik, M., Mildenhall, B., Ng, R.: Stegastamp: Invisible hyperlinks in physical
photographs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pages 2117–2126 (2020)
29. Ahmadi, M., Norouzi, A., Karimi, N., Samavi, S., & Emami, A.: Redmark: Framework for
residual diffusion watermarking based on deep networks. Expert Systems with Applications,
146:113157 (2020)
30. Su, Q., Liu, D., & Sun, Y.: A robust adaptive blind color image watermarking for resisting
geometric attacks. Information Sciences, 606:194-212 (2022)
31. Agustsson, E., Timofte, R.: Ntire 2017 challenge on single image super-resolution: Dataset
and study. Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition Workshops, pages 126–135 (2017)
32. Wang, Z., Bovik, A. C., Sheikh, H. R., Simoncelli, E. P.: Image quality assessment: from
error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–
612 (2004)
33. Ding, K., Ma, K., Wang, S., Simoncelli, E. P.: Image quality assessment: Unifying structure
and texture similarity. arXiv preprint arXiv:2004.07728 (2020)
34. Alexiou, E., Ebrahimi, T.: Towards a point cloud structural similarity metric. In 2020 IEEE
International Conference on Multimedia & Expo Workshops (ICMEW), pages 1–6. IEEE
(2020)
35. Rostianingsih, S., Setiawan, A., Halim, C.I.: Coco (creating common object in context)
dataset for chemistry apparatus. Procedia Computer Science, 171:2445–2452 (2020)
36. Denton, E., Hanna, A., Amironesei, R., Smart, A., Nicole, H.: On the genealogy of machine
learning datasets: A critical history of ImageNet. Big Data & Society,
8(2):20539517211035955 (2021)
37. Huiskes, M.J., Lew, M.S.: The MIR Flickr retrieval evaluation. In Proceedings of the 1st
ACM international conference on Multimedia information retrieval, pages 39-43 (2008)
38. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980 (2014)
39. Zheng, G., Donghui, H., Hui, G., Shuli, Z.: End-to-end image steganography and
watermarking driven by generative adversarial networks. Journal of Image and Graphics,
26(10), 2485-2502 (2021)
40. Xu, H.B., Rong, W., Jia, W., L, S.P.: A Compact Neural Network-based Algorithm for
Robust Image Watermarking. arXiv preprint arXiv:2112.13491 (2021)
41. Jamali, M., Nader, K., Pejman, K., Shahram, S., Shadrokh, S.: Robust watermarking using
diffusion of logo into auto-encoder feature maps. Multimedia Tools and Applications, pages
1-27 (2023)
42. Fei, J., Xia, Z., Benedetta, T., Mauro, B.: Supervised gan watermarking for intellectual
property protection. 2022 IEEE International Workshop on Information Forensics and
Security (WIFS), pages 1-6. IEEE (2022)
