Improved UNet Framework with attention for Semantic Segmentation of Tumor Regions in Brain MRI Images

International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 09 Issue: 07 | July 2022 | www.irjet.net
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal
Heena Kousar1, Arimanda Chaitanya Sri2, Saranu Charitha Sri3
1,2,3 Students, Dept. of CSE, Vignan’s Foundation for Science, Technology & Research, Vadlamudi, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Brain tumor segmentation is a crucial task in medical image processing. Brain tumors must be detected early in
order to improve treatment options and increase patient survival rates. Detecting tumors in a large number of clinical MRI
images for cancer diagnosis is challenging and time-consuming. Deep learning algorithms for automatic segmentation have
recently gained traction because they produce state-of-the-art results and are better suited to this problem than other
approaches. Deep learning approaches can also process massive amounts of MRI-based image data efficiently and objectively.
Several review papers on classic MRI-based brain tumor image segmentation algorithms are available. Because semantic
segmentation assigns a class label to each pixel in a given image, it can be used to segment tumor regions in the provided
images. In the proposed methodology, we perform batch training, where each randomly created batch is passed to a variant of
UNet, a popular segmentation model. In this model, we add batch normalization after every convolution layer in the hope that
a deeper network extracts better features, which turned out to be the case. We use Intersection over Union (IoU) [1] as the
metric rather than accuracy because it is less influenced by the inherent class imbalance in foreground/background
segmentation tasks. With the proposed methodology, we achieve an average IoU of 84.3% and a Dice coefficient of 91.4%.
Key Words: Brain Tumor, Segmentation, Semantic Segmentation, U-Net, Intersection over Union (IoU), Dice coefficient
1. INTRODUCTION
Cancer is defined as uncontrolled and abnormal cell division and proliferation in the body. A brain tumor is an abnormal mass
of unnatural cell growth and division in brain tissue. Brain tumors are among the most fatal cancers, despite their rarity.
Brain tumors [2] are classified as either primary or metastatic based on where they originate. Primary cancer cells originate
in brain tissue, whereas metastatic cancer cells become cancerous elsewhere in the body and spread to the brain. Gliomas are
brain tumors that develop from glial cells. Several imaging modalities are used in conjunction to provide the most detailed
information, but because of its high soft-tissue contrast and widespread availability, MRI is routinely used to obtain
information regarding brain malignancies. MRI (magnetic resonance imaging) is a non-invasive in-vivo imaging technique that
employs radio-frequency pulses to excite tissues.
Figure 1.1: Sample for segmenting the Brain Tumor images
We use segmentation to efficiently locate and delineate brain tumors so that surgery can be performed successfully. Brain
tumor segmentation can be carried out in two ways. The first is manual segmentation, which relies on subjective judgment and
often does not produce the desired results, because completely removing a brain tumor without destroying healthy brain tissue
is difficult. The second is automatic segmentation, which is needed for treatment planning and quantitative evaluation because
it can identify brain tumors quickly and accurately.
Since both the location and the size of a tumor must be identified, the problem is a segmentation task, and more specifically
a semantic segmentation task. Segmentation goes beyond tasks such as image classification, localization, and object detection.
In image classification, the given image only needs to be assigned to one of the classes (binary or multi-class).
Figure 1.2: Levels of Understanding of an Image by a System
Localization is a step beyond classification, as we also locate the required object in the provided image. Object detection
combines the two: it both classifies each object and locates it. Here, locating an object only means producing a bounding
box, so detection is not suitable when the exact shape of the object is needed.
2. Literature Survey
Brain tumor segmentation has attracted many researchers. In the early stages of research on this topic, researchers
hand-crafted different feature extractors and used their outputs to analyze brain tumor images. One approach is the method
proposed by Nagashree N and Premjyoti Patil [3]. The main idea of this system is to work on the encoding and decoding phases
of UNet [4] modelling for efficient segmentation of brain images. In this methodology, the input image is processed by several
convolutional layers of a CNN, whose convolution filters extract features from the individual image layers. In the UNet
approach, each layer is represented as a network encoder layer. Alpha-beta pruning, described as an AI optimization algorithm
for dimensionality reduction, is used to obtain a modified form of UNet. The process entails building a tree network out of
all the layers of the input image and retaining only the essential ones; the remaining image layers are pruned to save time.
The workflow of their approach is as follows:
Figure 2.1: Workflow of Nagashree N and Premjyoti Patil

Since the introduction of deep learning, convolutional neural networks have been widely used in medical image segmentation
for their strong feature extraction capabilities, and have achieved good segmentation performance and robustness.
Convolutional neural networks were first applied to brain tumor segmentation by Zikic et al. Their network comprises a
convolution layer, a max pooling layer, a fully connected layer, and a softmax layer. Ronneberger et al. [11] proposed the
UNet network, which uses an encoder-decoder topology. In the encoding phase, 3 × 3 convolutions with a stride of 1 are used
across four down-sampling steps; in the decoding phase, 2 × 2 deconvolutions with a stride of 2 are used for up-sampling.
High-resolution and low-resolution information are equally relevant due to the similarity of medical images and the fuzziness
of tumor region boundaries. QingJun Ru and GuangZhu Chen [5] propose an improved M-Unet structure to increase the performance
of feature fusion and the accuracy of network segmentation. Their approach improves on UNet in the following ways:
1. A multi-scale feature extraction module is added to the feature fusion part of the Unet network to better extract the
high-level and low-level features of tumor images, while preventing redundant features from being introduced into the
up-sampling feature map, further improving network segmentation performance.
2. To obtain the best network weights, a cosine annealing learning rate decay schedule is used in the training phase to help
the network escape local optima.
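As an illustration of the cosine annealing idea in item 2, the following minimal sketch computes the learning rate for one annealing cycle. It is not taken from [5]: the function name and the values of lr_max, lr_min and the cycle length are assumptions chosen only for the example.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_max=1e-3, lr_min=1e-5):
    """Learning rate at `step` within a single cosine-annealing cycle.

    lr_max, lr_min and total_steps are illustrative values, not settings from [5].
    """
    # Clamp the step so that values outside the cycle stay at its end points.
    progress = min(max(step, 0), total_steps) / total_steps
    # Cosine curve that falls smoothly from lr_max (progress 0) to lr_min (progress 1).
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# One 100-step cycle; restarting the cycle would reset the rate to lr_max.
schedule = [cosine_annealing_lr(s, 100) for s in range(101)]
```

Repeating such a cycle (a warm restart) periodically raises the learning rate again, which is the usual mechanism by which this kind of schedule can move the weights away from a poor local optimum.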
Fig 2.2: Architecture Proposed by QingJun Ru and GuangZhu Chen
3. Methodology
The architecture of the proposed model, the training approach, and other technical details are discussed in this section.
The proposed architecture is derived from the UNet architecture. Several changes, such as the number of filters at each layer
and the introduction of batch normalization operations, were made to the original architecture. The dataset used for this
work is the LGG MRI Segmentation dataset [6], which comprises 3762 images of size 256 × 256. Of the 3762 images, 80% were
used for training and the remaining 20% were used for testing.
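A random 80/20 split of this kind can be sketched as follows. This is a minimal illustration rather than the code used in this work; the helper name split_indices and the fixed seed are assumptions.

```python
import random


def split_indices(n_items, train_fraction=0.8, seed=0):
    """Shuffle the indices 0..n_items-1 and split them into train and test lists.

    The seed is fixed only to make the illustration reproducible.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_items * train_fraction))
    return indices[:cut], indices[cut:]


# 3762 images, as stated above; the indices would address image/mask pairs.
train_idx, test_idx = split_indices(3762)
```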

Figure 3.1: Sample Images
As shown in Figure 3.1, the training images and the corresponding masks were loaded and visualized. In the proposed
methodology, we use 2828 randomly selected images for training, 708 for validation, and 393 for testing.
The performance of the model is monitored using Intersection over Union (IoU), one of the most common metrics for
segmentation tasks. Callbacks such as early stopping and model checkpointing are implemented on the basis of the average IoU
of every 50 batches. These callbacks save the best model and stop the training process if there is no further improvement. At
the end of training, the model weights are saved so that they can be used later. Several other efforts, such as data
augmentation and alternative architectures, were also tried, but none of them improved the results.
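The monitoring logic described above can be sketched independently of any deep learning framework. The class below is a hypothetical helper, not the callbacks used in this work: it averages the IoU over fixed windows of batches, reports when a new best window is seen (the point at which a checkpoint would be saved), and signals a stop after a number of windows without improvement. The patience value is an assumption.

```python
class IoUEarlyStopping:
    """Track the mean IoU of fixed-size windows of batches and signal when to stop."""

    def __init__(self, window=50, patience=3):
        self.window = window        # number of batches averaged together
        self.patience = patience    # windows without improvement before stopping
        self.buffer = []
        self.best = float("-inf")
        self.stale_windows = 0

    def update(self, batch_iou):
        """Record one batch IoU and return (is_new_best, should_stop)."""
        self.buffer.append(batch_iou)
        if len(self.buffer) < self.window:
            return False, False
        window_mean = sum(self.buffer) / len(self.buffer)
        self.buffer = []
        if window_mean > self.best:
            # A better window: the caller would save the model weights here.
            self.best = window_mean
            self.stale_windows = 0
            return True, False
        self.stale_windows += 1
        return False, self.stale_windows >= self.patience
```

In a training loop, update would be called once per batch; the first flag triggers saving the weights and the second ends training.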
4. Network Architecture
The proposed network is derived from the UNet architecture. UNet, a popular approach for segmentation tasks, classifies each
pixel of the given input image and thereby produces a mask of the same size as the input.

Figure 4.1: Proposed Architecture of varied UNet
Figure 4.2: Proposed Architecture of varied UNet
The network shown above takes an input of size (256 × 256 × 3). It features an encoder-decoder structure. In the encoder
part, we apply pooling to reduce the size of the image so that we can extract the information of "what" is present in the
image. In the decoder part, we increase the image size to recover the information of "where" it is in the given image. Each
block in the contracting path has the following structure:
Two convolutional layers are used in the first block, followed by pooling and batch normalization operations, and the channel
count is increased from one to eight. Convolution increases the depth of the feature maps: there are four such blocks in the
contracting path, and by the end of the fourth block the channel count has increased to 64. The image size is reduced to
(8, 8) by the end of the four blocks thanks to the max pooling operations, each of which decreases the image size. From here,
the expansive path begins, in which the image size is steadily increased through upsampling while the channel count is
reduced.
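The way the spatial size and channel count evolve along the contracting path can be traced with a few lines of code. The sketch below rests on assumptions not fully spelled out in the text: size-preserving convolutions, one 2 × 2 max pooling per block, and a channel count that doubles from 8 in the first block to 64 in the fourth. The function name is hypothetical.

```python
def trace_contracting_path(size=256, first_channels=8, num_blocks=4):
    """Return the (height, width, channels) of the feature map after each block.

    Assumes size-preserving convolutions, one 2x2 max pooling per block,
    and a channel count that doubles from block to block.
    """
    shapes = []
    channels = first_channels
    for _ in range(num_blocks):
        size //= 2                      # 2x2 max pooling halves each side
        shapes.append((size, size, channels))
        channels *= 2                   # the next block doubles the depth
    return shapes


encoder_shapes = trace_contracting_path()
# encoder_shapes == [(128, 128, 8), (64, 64, 16), (32, 32, 32), (16, 16, 64)]
```

Under these assumptions a 256-pixel input leaves the fourth block at 16 pixels per side; the (8, 8) size quoted above would correspond to one further halving, for example a pooling step in the bottleneck.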

Transposed convolution is employed here as the upsampling technique to increase the image size. A padding operation is first
applied to the input feature map, followed by a convolution. There are four such blocks, just as in the contracting path, and
by the end of these blocks the feature map is back at the original image size. The final prediction is obtained by applying a
1 × 1 convolution with a single kernel and a sigmoid activation to the output of the last block. The 1 × 1 convolution reduces
the number of channels to that required for the network output, while the sigmoid activation maps every pixel in the output
block to the range of the required network output (0, 1). The results are then rounded to the nearest integer.
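The last step, mapping the output of the final convolution to a binary mask, can be illustrated as follows. This is a minimal sketch on plain nested lists; the function names are hypothetical, and rounding is implemented as a threshold at 0.5.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logits_to_mask(logits):
    """Map a 2D grid of logits to a binary mask by thresholding the sigmoid at 0.5."""
    return [[int(sigmoid(v) >= 0.5) for v in row] for row in logits]


mask = logits_to_mask([[-2.0, 0.3], [1.5, -0.1]])
# mask == [[0, 1], [1, 0]]
```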
The model weights are retained at the end of training and used in the testing procedure. We apply a sharpening technique in
the final stage of the testing procedure, after the output mask has been predicted. As a post-processing step, sharpening
gives a clearer view of the tumor regions present in the predicted mask, resulting in a higher IoU score. Low-pass and
high-pass filters are commonly applied to images to improve their appearance. Smoothing is the term used for applying a
low-pass filter, whereas sharpening is the term used for applying a high-pass filter. A high-pass filter attenuates low
frequencies and lets high frequencies pass through. As a result, the tumor pixels in the predicted mask pass through the
filter, yielding a better outcome. The kernel for sharpening in our suggested methodology is represented by
the following array.
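For illustration only, the sketch below applies a common 3 × 3 sharpening kernel to a soft mask. The kernel values are an assumption chosen for the example and are not the array used in this work; the edge handling (replication) and the clipping to [0, 1] are likewise assumptions.

```python
# A widely used 3x3 sharpening kernel (assumed here, not the authors' array).
SHARPEN_KERNEL = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]


def sharpen(image, kernel=SHARPEN_KERNEL):
    """Apply a 3x3 filter with edge replication, then clip the result to [0, 1]."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Replicate border pixels when the window leaves the image.
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    acc += kernel[dr + 1][dc + 1] * image[rr][cc]
            out[r][c] = min(max(acc, 0.0), 1.0)
    return out


soft_mask = [
    [0.2, 0.2, 0.2],
    [0.2, 0.8, 0.2],
    [0.2, 0.2, 0.2],
]
sharp_mask = sharpen(soft_mask)
```

The kernel sums to one, so flat regions are left unchanged while local contrast is amplified: in the example the bright centre pixel is pushed to 1.0 and its direct neighbours to 0.0.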
5. Results and Discussions
In the testing phase, we load the remaining 20% of the data with images and masks. The weights saved earlier are loaded into
the model, and the model is used to predict the masks for the given images. By comparing the predicted and original masks, we
can then determine the IoU (Intersection over Union) value. The IoU is then averaged over a range of thresholds from 0.5 to
0.95, in steps of 0.05. IoU at a threshold indicates whether a particular IoU value has crossed that threshold. For example,
a predicted output mask is considered valid at a threshold of 0.7 if the IoU for that mask is above 0.7. The following is the
definition of IoU between the set of ground-truth segmentation pixels, Y, and the corresponding set of predicted segmentation
pixels, Ŷ:
Fig 5.1: IOU Formula
This can also be expressed in terms of the confusion matrix of Y and Ŷ:
Fig 5.2: General Confusion Matrix
IoU is then calculated as follows (TP = true positives, FP = false positives, etc.):
Fig 5.3: IoU Formula in terms of confusion Matrix
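For reference, the standard definition of IoU, written for the ground-truth pixel set Y and the predicted pixel set \hat{Y}, and its equivalent confusion-matrix form (FN = false negatives) are:

```latex
\mathrm{IoU}(Y, \hat{Y})
  = \frac{|Y \cap \hat{Y}|}{|Y \cup \hat{Y}|}
  = \frac{TP}{TP + FP + FN}
```

True negatives do not appear in the expression, which is why the metric is less sensitive than accuracy to a large background class.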
The following table reports the training loss, the testing IoU at various thresholds, and the average IoU over all thresholds
from 0.5 to 0.95 with a step of 0.05.
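One reading of this evaluation procedure is sketched below: the IoU is computed per mask from the confusion-matrix counts, each mask is counted as valid at a threshold when its IoU exceeds that threshold, and the valid fractions are averaged over the ten thresholds. The function names, and the convention of returning 1.0 when both masks are empty, are assumptions made for the example.

```python
def iou(true_mask, pred_mask):
    """IoU of two binary masks given as nested lists of 0/1 values."""
    tp = fp = fn = 0
    for t_row, p_row in zip(true_mask, pred_mask):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    union = tp + fp + fn
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else tp / union


# Thresholds 0.50, 0.55, ..., 0.95 built from integers to avoid float drift.
THRESHOLDS = [t / 100 for t in range(50, 100, 5)]


def valid_fraction_per_threshold(ious, thresholds=THRESHOLDS):
    """Fraction of masks whose IoU exceeds each threshold."""
    return [sum(v > t for v in ious) / len(ious) for t in thresholds]


def mean_over_thresholds(ious, thresholds=THRESHOLDS):
    """Average of the per-threshold valid fractions."""
    fractions = valid_fraction_per_threshold(ious, thresholds)
    return sum(fractions) / len(fractions)
```

For instance, two masks with IoU values of 0.9 and 0.6 are both valid at thresholds 0.5 and 0.55, only the first is valid from 0.6 to 0.85, and neither is valid at 0.9 or 0.95, which gives an average of 0.5.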

Model                           Loss    Testing IoU at Different Thresholds    Average Testing IoU
                                        0.5     0.7     0.8     0.85
UNet (Proposed Architecture)    0.81    0.84    0.82    0.80    0.79           0.84
Table 5.1: Final Results
Model                                                         Dice Coefficient
UNet (proposed architecture)                                  0.914
M-Unet (QingJun Ru, GuangZhu Chen) [5]                        0.873
Alpha Beta Pruned UNet (Nagashree N, Premjyoti Patil) [3]     0.901
Table 5.2: Comparison of results with citations
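The Dice coefficient used in this comparison can be computed from a pair of binary masks as twice the overlap divided by the total number of foreground pixels in both masks. The sketch below is a minimal illustration with hypothetical function names; returning 1.0 for two empty masks is an assumed convention.

```python
def dice_coefficient(true_mask, pred_mask):
    """Dice coefficient of two binary masks given as nested lists of 0/1 values."""
    intersection = true_total = pred_total = 0
    for t_row, p_row in zip(true_mask, pred_mask):
        for t, p in zip(t_row, p_row):
            true_total += 1 if t else 0
            pred_total += 1 if p else 0
            intersection += 1 if (t and p) else 0
    denominator = true_total + pred_total
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if denominator == 0 else 2.0 * intersection / denominator


def dice_from_iou(iou_value):
    """Per-mask identity between the two metrics: Dice = 2 * IoU / (1 + IoU)."""
    return 2.0 * iou_value / (1.0 + iou_value)
```

The identity in dice_from_iou holds for a single pair of masks; it does not carry over exactly to values averaged over a test set.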
The input image, the true mask, and the predicted mask are plotted here for various images.
Figure 5.4: Predicted Masks comparison with the original mask
6. Conclusion
In the proposed approach, we perform batch training, where each randomly formed batch is passed to a variant of UNet, a
popular segmentation model. We added batch normalization after each convolution layer in this model in the hope that a deeper
network would help extract better features, which turned out to be the case. Instead of accuracy, we use Intersection over
Union (IoU) [1] as the metric, since it is less influenced by the inherent class imbalance in foreground/background
segmentation tasks. We obtain an average IoU of 84.3% and a Dice coefficient of 91.4% using the proposed method. In the
future, the proposed model will be improved by employing different filter sizes and including all modalities of MRI images in
tumor segmentation. Raising the mini-batch size from 16 to 64 and the maximum number of epochs from 80 to 120 is also expected
to improve the segmentation results further.
7. References
[1] Rezatofighi, Hamid, et al. "Generalized intersection over union: A metric and a loss for bounding box regression."
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
[2] Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3431-3440).
[3] Nagashree, N., & Patil, P. (2021). Alpha Beta Pruned UNet-A Modified UNet Framework to Segment MRI Brain Image to Analyse
the Effects of CNTNAP2 Gene towards Autism Detection. In 2021 3rd International Conference on Computer Communication and the
Internet (ICCCI) (pp. 23-26). IEEE.
[4] Kermi, A., Mahmoudi, I., & Khadir, M. T. (2018, September). Deep convolutional neural networks using U-Net for automatic
brain tumor segmentation in multimodal MRI volumes. In International MICCAI Brainlesion Workshop (pp. 37-48). Springer,
Cham.
[5] Ru, Q., Chen, G., & Tang, Z. (2021, August). Brain Tumor Image Segmentation Method Based on M-Unet Network. In 2021 4th
International Conference on Pattern Recognition and Artificial Intelligence (PRAI) (pp. 243-246). IEEE.
[6] Dataset Used in The Discussed Model [Online]. Available:
https://www.kaggle.com/datasets/mateuszbuda/lgg-mri-segmentation.
[7] Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image
Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
[8] Guo, Meng-Hao, Tian-Xing Xu, Jiang-Jiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R.
Martin, Ming-Ming Cheng, and Shi-Min Hu. "Attention mechanisms in computer vision: A survey." Computational Visual Media
(2022): 1-38.
[9] Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE
International Conference on Computer Vision (ICCV), Boston, MA, USA, 7–12 June 2015; IEEE: Piscataway, NJ, USA, 2015; pp.
1520–1528.
[10] Wu, X.; Liang, L.; Shi, Y.; Fomel, S. FaultSeg3D: Using synthetic data sets to train an end-to-end convolutional neural
network for 3D seismic fault segmentation. Geophysics 2019, 84, IM35–IM45.
[11] Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Lecture Notes
in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics);
Springer: Cham, Switzerland, 2015; Volume 9351, pp. 234–241.
[12] McCaffrey, J. (2014). Understanding Neural Network Batch Training: A Tutorial. Visual Studio Magazine.
[13] Daimary, Dinthisrang, et al. "Brain tumor segmentation from MRI images using hybrid convolutional neural networks."
Procedia Computer Science 167 (2020): 2419-242
[14] Rehman, Mobeen Ur, et al. "Bu-net: Brain tumor segmentation using modified u-net architecture." Electronics 9.12 (2020):
2203.
[15] Deb, Daizy, and Sudipta Roy. "Brain tumor detection based on hybrid deep neural network in MRI by adaptive squirrel
search optimization." Multimedia Tools and Applications 80.2 (2021): 2621-2645.

8. Biographies
Heena Kousar
Department of Computer Science & Engineering
Vignan’s Foundation for Science, Technology & Research
Arimanda Chaitanya Sri
Saranu Charitha Sri
