U-Net and nnU-Net

U-Net and nnU-Net:

Paper title: U-Net: Convolutional Networks for Biomedical Image Segmentation
Code: https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
Paper title: nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation
Code: https://github.com/PaddlePaddle/PaddleSeg

Topics covered:

U-Net
nnU-Net

U-Net: original text and my reading:

Original:
“We modify and extend this architecture such that it works with very few training images and yields more precise segmentations; see Figure 1. The main idea in [9] is to supplement a usual contracting network by successive layers, where pooling operators are replaced by upsampling operators. Hence, these layers increase the resolution of the output. In order to localize, high resolution features from the contracting path are combined with the upsampled output. A successive convolution layer can then learn to assemble a more precise output based on this information.”

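My reading
U-Net is an architecture that can produce more precise segmentations from very few training images. It builds on the fully convolutional network of [9]: the usual contracting network is supplemented by successive layers in which the pooling operators are replaced by upsampling operators, so these layers increase the resolution of the output. To localize, the high-resolution features from the contracting path (the ones obtained by the copy and crop operation) are combined with the upsampled features, and a subsequent convolution layer learns to assemble a more precise output from them.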
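To see why the skip features have to be cropped before they are combined with the upsampled ones, it helps to trace the feature-map sizes. The sketch below is only an illustration under assumptions that are not stated in the quote above: unpadded 3x3 convolutions (two per level), 2x2 max pooling, 2x upsampling, four levels, and a 572x572 input tile as in the figure of the original U-Net paper. The function name `unet_sizes` is made up for this note.

```python
def unet_sizes(input_size, depth=4):
    """Trace the side length of a square feature map through a U-Net that uses
    unpadded 3x3 convolutions, 2x2 max pooling and 2x upsampling (assumed setup)."""
    skips = []
    size = input_size
    for _ in range(depth):
        size -= 4              # two unpadded 3x3 convolutions lose 2 pixels each
        skips.append(size)     # this feature map is copied over to the decoder
        if size % 2:
            raise ValueError("2x2 pooling needs an even size")
        size //= 2             # 2x2 max pooling halves the resolution
    size -= 4                  # two convolutions at the bottom of the "U"
    crops = []
    for skip in reversed(skips):
        size *= 2              # upsampling doubles the resolution
        crops.append((skip, size))  # skip map is cropped from `skip` to `size`
        size -= 4              # two convolutions after the concatenation
    return size, crops


output_size, crops = unet_sizes(572)
print(output_size)  # 388
for skip, target in crops:
    print(f"crop {skip}x{skip} skip features to {target}x{target}")
```

With padded convolutions (as many later implementations use) the sizes match and no cropping is needed; the crop is a consequence of the unpadded setup assumed here.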

nnU-Net: original text and my reading:

1. Motivation for nnU-Net -> U-Net

Original:
“The U-Net [6] is a successful encoder-decoder network that has received a lot
of attention in the recent years. Its encoder part works similarly to a traditional
classification CNN in that it successively aggregates semantic information at the
expense of reduced spatial information. Since in segmentation, both semantic as
well as spatial information are crucial for the success of a network, the missing
spatial information must somehow be recovered. The U-Net does this through
the decoder, which receives semantic information from the bottom of the ’U’
and recombines it with higher resolution feature maps obtained directly from
the encoder through skip connections. Unlike other segmentation networks, such
as FCN [9] and previous iterations of DeepLab [10] this allows the U-Net to
segment fine structures particularly well.”

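My reading
The nnU-Net paper points out that the U-Net encoder works like a traditional classification CNN: it successively aggregates semantic information, but at the cost of losing spatial information. The decoder recovers that spatial information to some extent by recombining the upsampled features coming from the bottom of the 'U' with the higher-resolution feature maps taken directly from the encoder through skip connections (similar in spirit to ResNet shortcuts in that features are reused, although U-Net concatenates the feature maps rather than adding them). This lets U-Net segment fine structures particularly well.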

2. Motivation for nnU-Net -> medical images

Original:
“Medical images commonly encompass a third dimension.”
“we consider a pool of basic U-Net architectures consisting of a 2D U-Net, a 3D U-Net and a U-Net Cascade”
“2D U-Net Intuitively, using a 2D U-Net in the context of 3D medical image segmentation appears to be suboptimal because valuable information along
the z-axis cannot be aggregated and taken into consideration. However, there
is evidence [13] that conventional 3D segmentation methods deteriorate in performance if the dataset is anisotropic (cf. Prostate dataset of the Decathlon challenge).”
“3D U-Net A 3D U-Net seems like the appropriate method of choice for 3D
image data. In an ideal world, we would train such an architecture on the entire
patient’s image. In reality however, we are limited by the amount of available
GPU memory which allows us to train this architecture only on image patches.
While this is not a problem for datasets comprised of smaller images (in terms
of number of voxels per patient) such as the Brain Tumour, Hippocampus and
Prostate datasets of this challenge, patch-based training, as dictated by datasets
with large images such as Liver, may impede training. This is due to the limited
field of view of the architecture which thus cannot collect sufficient contextual
information to e.g. correctly distinguish parts of a liver from parts of other
organs.”

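My reading
Medical images are usually 3D, so nnU-Net builds a pool of basic architectures: a 2D U-Net, a 3D U-Net, and a U-Net Cascade.
Why the 2D U-Net is needed: conventional 3D segmentation methods deteriorate in performance on anisotropic datasets (for example the Prostate dataset of the Decathlon challenge).
Why the 3D U-Net is needed: a 3D U-Net is the natural choice for 3D image data. However, the amount of available GPU memory means it can only be trained on image patches. That is not a problem for small images (measured by the number of voxels per patient), such as the Brain Tumour, Hippocampus and Prostate datasets. For large images such as Liver, the limited field of view means the 3D U-Net cannot collect enough contextual information, which may impede training.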
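The two arguments above (anisotropy and limited field of view) can be made concrete with a little arithmetic. This is a rough sketch, not anything taken from the nnU-Net code: the helper names and all shapes and spacings are hypothetical values chosen for illustration.

```python
from math import prod


def anisotropy_ratio(spacing):
    """Largest voxel spacing divided by the smallest (1.0 means isotropic)."""
    return max(spacing) / min(spacing)


def patch_coverage(image_shape, patch_shape):
    """Fraction of the image's voxels that a single training patch can see."""
    clipped = [min(p, s) for p, s in zip(patch_shape, image_shape)]
    return prod(clipped) / prod(image_shape)


# Hypothetical numbers, chosen only to illustrate the two cases in the text.
small = patch_coverage((36, 50, 35), (40, 56, 40))        # patch sees the whole image
large = patch_coverage((480, 512, 512), (128, 128, 128))  # patch sees a small corner

print(f"small image coverage: {small:.3f}")
print(f"large image coverage: {large:.3f}")
print(f"anisotropy of spacing (3.6, 0.6, 0.6): {anisotropy_ratio((3.6, 0.6, 0.6)):.1f}")
```

When the coverage is close to 1 the patch contains the whole context; when it is a few percent, the network sees only a small part of the body at a time, which is the situation the paper describes for Liver.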

Original:
To address this practical shortcoming of a 3D U-Net on datasets with large image sizes, we additionally propose a cascaded model. Therefore, a 3D U-Net is first trained on downsampled images (stage 1). The segmentation results of this U-Net are then upsampled to the original voxel spacing and passed as additional (one hot encoded) input channels to a second 3D U-Net, which is trained on patches at full resolution (stage 2). See Figure 1.
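The data flow between the two stages can be sketched with NumPy. This is a simplified illustration, not the nnU-Net implementation: it assumes an integer upsampling factor with nearest-neighbour interpolation (the paper resamples to the original voxel spacing), it skips the patch cropping of stage 2, and the function names and shapes are invented for this note.

```python
import numpy as np


def upsample_nearest(labels, factor):
    """Nearest-neighbour upsampling of an integer label volume by an integer factor."""
    for axis in range(labels.ndim):
        labels = np.repeat(labels, factor, axis=axis)
    return labels


def one_hot(labels, num_classes):
    """Turn a (D, H, W) label volume into (num_classes, D, H, W) float channels."""
    return np.eye(num_classes, dtype=np.float32)[labels].transpose(3, 0, 1, 2)


def stage2_input(image, stage1_labels, num_classes, factor):
    """Concatenate the image channels with the upsampled one-hot stage 1 segmentation."""
    upsampled = upsample_nearest(stage1_labels, factor)
    return np.concatenate([image, one_hot(upsampled, num_classes)], axis=0)


# Hypothetical shapes: one image channel at full resolution, and a stage 1
# prediction with 3 classes made on an image downsampled by a factor of 2.
image = np.zeros((1, 8, 8, 8), dtype=np.float32)  # (C, D, H, W)
stage1_labels = np.random.default_rng(0).integers(0, 3, size=(4, 4, 4))

x = stage2_input(image, stage1_labels, num_classes=3, factor=2)
print(x.shape)  # (4, 8, 8, 8)
```

The second 3D U-Net therefore sees one extra input channel per class, which carries the coarse, whole-image context that a single full-resolution patch cannot provide.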