Abstract
Bright objects on a dark background, such as cells in microscopy images, can sometimes be modeled as maxima of sufficient dynamic, called h-maxima. Such a model could be sufficient to count these objects in images, provided we know the dynamic threshold that tells apart actual objects from irrelevant maxima. In a previous study we introduced a neural architecture that includes a morphological pipeline counting the number of h-maxima in an image, preceded by a classical CNN which estimates the dynamic h yielding the right number of objects. This was made possible by geodesic reconstruction layers and a new module counting connected components. The resulting architecture could be trained end-to-end to count cells by minimizing a loss function focusing mainly on an intermediate image and only marginally on the counting errors. In the present paper, we extend that work with a detailed theoretical analysis of the derivability of such a pipeline, and with additional cell counting experiments on different datasets. Our analysis explains why it is difficult to train the proposed pipeline when penalizing counting errors, and arrives at a solution to that challenge. Our experiments show that our approach can obtain results comparable to the state of the art with far fewer parameters and an increased interpretability of the model.
1 Introduction
Cell counting is a crucial step in biological experiments and medical diagnosis, providing both quantitative and qualitative information on cells. The manual process can be tedious, inefficient and prone to subjective errors, especially when cells cluster or vary widely in shape and contrast, whereas automation offers a quick and accurate estimation of the cell quantity in a sample. In this paper, we focus on the cell counting task to quantify cell populations in fluorescence images. Previous studies [1,2,3,4] used deep learning with a Unet architecture or a similar one, which maps each input image to a density map, the integral of which yields the cell count. Another approach is to use an object detection architecture (such as YOLO [5] or Faster R-CNN [6]) and count the detected cells [5, 7].
These methods achieve satisfactory results, but use a large number of parameters even when the task seems much simpler than crowd counting, for example when cells, possibly diverse in shape and partially overlapping, still look like mere regional maxima over a dark background. Furthermore, density maps are ambiguous regarding what the model considers as a cell, especially in clusters, where the precise location of cells is often lost. This can be a limitation for users wishing to check a posteriori the reliability of the count returned by the model, especially as the deep learning model is a black box whose decisions are hardly explainable. Images where cells appear as bright objects on a dark background are an interesting study case where a much simpler model, built on a priori knowledge, may achieve results similar to those of large deep networks but with higher interpretability, regarding for example the geometrical and contrast criteria used to recognize cells.
In this paper, following and extending [8], we propose to model cells as regional maxima with sufficient dynamic and size. The dynamic of an extremum is, simply speaking, its depth (see Fig. 1), and well-known morphological methods exist to select extrema with dynamic larger than a threshold h, called h-maxima or h-minima [9]. Accordingly, [8] presented a simple morphological pipeline that counts the h-maxima of a size-filtered version of the input image, for a given dynamic h. Since the image resolution is constant in the dataset, as well as the cell size, this approach only depends on the choice of the dynamic parameter, which needs, however, to be adapted to each image in order to achieve accurate counting. Taking advantage of the recently introduced geodesic reconstruction layers [10, 11], which provide a neural implementation of the h-maxima computation, a pipeline was designed in [8], where a small convolutional neural network (CNN) predicts the optimal dynamic h for the subsequent morphological layers. This pipeline is trained end-to-end by minimizing a joint loss whose main term focuses on an intermediate output image (the so-called h-transform of a filtered input image) and whose second term, with much smaller weight, penalizes counting errors.
The goal of the present paper is to make the latter pipeline trainable by minimizing the counting errors only, because doing so has several advantages. First, it saves us from pre-computing, for each training image, the intermediate results needed for other loss terms. Second, avoiding this pre-computation makes any preprocessing of the input images trainable, as opposed to having to freeze it when the intermediate outputs are pre-computed with a fixed preprocessing. Third, it greatly alleviates the annotation burden, as it only requires the number of cells for each input image, not the precise location of each cell (although the proposed method does provide the locations of the counted cells).
To that aim, in this article we investigate in more depth the theoretical aspects of the pipeline proposed in [8], in particular its derivability, so as to determine how it can be trained with gradient descent when minimizing the counting errors only. The main novel contributions of this article with respect to Liu et al. [8] are the following:
1. A theoretical analysis of the derivability of the aforementioned pipeline, based on a detailed theoretical review of h-maxima.
2. A practical solution to overcome the derivability issues revealed by this analysis, allowing for the optimization of the pipeline by minimization of a loss penalizing the counting errors only.
3. New results, with respect to [8], on other cell image datasets (Cellpose, Fluorescent Neuronal Cells).
The paper is structured as follows. In Sect. 2, the theoretical framework of h-maxima is presented, and a simple morphological algorithm using them to count cells, already introduced in [8], is recalled. In Sect. 3, we recall the deep learning architecture of [8], which includes this morphological algorithm, before studying its derivability and proposing a solution for training it with gradient descent even when minimizing the counting errors only. Experiments are conducted in Sect. 4, and their results are analyzed in Sect. 5. Conclusions are drawn in Sect. 6.
2 Setting and Definitions
We assume images to be mappings of the discrete domain \(\Omega {:}{=} ([0, W-1] \times [0, H-1])\cap \mathbb {N}^2\), \(W, H\in \mathbb {N}^*\) to \(\mathbb {R}\). We will denote by \(\mathcal {F}(\Omega ,\mathbb {R})\) the set of such functions and call pixels the elements of \(\Omega \).
2.1 Image Poset
The set \(\mathcal {F}(\Omega ,\mathbb {R})\) is a partially ordered set (or poset) for the partial order \(\le _{\mathcal {F}}\) defined, for any \(f,g \in \mathcal {F}(\Omega ,\mathbb {R})\), by
In the rest of the paper, we will simply write \(f\le g\) when it is clear we are comparing functions.
The supremum and infimum of two functions f and g are denoted, respectively, by \(f\vee g\) and \(f\wedge g\), and are also induced by the pointwise ordering, i.e.,
where \(\vee \) and \(\wedge \) also denote the supremum and infimum on real numbers, \(f (x) \vee g (x) = \sup \lbrace f(x), g(x)\rbrace \) and \(f (x) \wedge g (x) = \inf \lbrace f(x), g(x)\rbrace \). The supremum and infimum of an arbitrary family of functions are defined likewise.
2.2 Connectivity
Throughout the paper, the notion of connectivity between pixels is crucial because the definitions of path and reconstruction operator, given below and extensively used in the paper, rely on it. It is determined by the unit discrete square \(\textrm{B}{:}{=} \lbrace x = (x_1, x_2) \in \mathbb {Z}^2, \; \max (|x_1|, |x_2|) \le 1 \rbrace \), classically referred to as the 8-connectivity structuring element in mathematical morphology.
Denoting by \(\textrm{B}_x {:}{=} x + \textrm{B}= \lbrace x + b, b\in \textrm{B}\rbrace \) the set \(\textrm{B}\) translated at \(x\in \Omega \), we say that \(y\in \Omega \) is a neighbor of x, noted \(y \in \mathcal {N}_x\), if and only if \(y \in \textrm{B}_x\cap \Omega \). Similarly, the neighbors of a set \(X\subset \Omega \) are defined as \(\mathcal {N}_X {:}{=} \cup _{x\in X}\mathcal {N}_x.\)
The notion of neighborhood leads to classical definitions of graph theory, the vertices being here the pixels and the edges being the couples \((x, y) \in \Omega ^2\) such that \(y\in \mathcal {N}_x\). A path \(\gamma \) from x to y of length \(n\in \mathbb {N}\) is a tuple \(\gamma = (u_0, u_1,\dots , u_n)\in \Omega ^{n+1}\) such that \(u_0 = x\), \(u_n = y\) and for any \(0\le k \le n-1\), \(u_{k+1}\in \mathcal {N}_{u_k}\). We will denote by \(\Gamma _{xy}\) the set of paths from x to y, and say that y is connected to x in a subset \(C \subseteq \Omega \) if there is a path from x to y included in C, i.e., \(\lbrace \gamma \in \Gamma _{xy}, \gamma \subseteq C \rbrace \ne \emptyset \). Note that in our setting, since \(\textrm{B}\) is symmetric (\(b\in \textrm{B}\iff -b\in \textrm{B}\)), y is connected to x if and only if x is connected to y, and hence, we shall simply say that x and y are connected. In particular, all pixels are connected in \(\Omega \). Furthermore, the connectivity relation is also reflexive (\(x\in \mathcal {N}_x\) because \(0\in \textrm{B}\)) and transitive. It is therefore an equivalence relation, and the equivalence classes induced by it are the connected components. Finally, we say that a subset \(C \subseteq \Omega \) is connected if any two elements x and \(y \in C\) are connected.
2.3 Morphological Operators
We shall now recall the morphological operators which will be used in the rest of the paper. Note that these definitions are not the most general ones, but adapted to the structuring element \(\textrm{B}\) considered here, which is symmetric.
2.3.1 Dilation, Erosion, Opening and Closing by \(\textrm{B}\)
Given \(f\in \mathcal {F}(\Omega ,\mathbb {R})\), the dilation of f by \(\textrm{B}\), noted \(\delta _\textrm{B}(f)\), is the element of \(\mathcal {F}(\Omega ,\mathbb {R})\) defined by
while the erosion of f by \(\textrm{B}\), noted \(\varepsilon _\textrm{B}(f)\), is given by
Since \(0\in \textrm{B}\), here \(\delta _\textrm{B}\) is an extensive operator, i.e., \(\forall x\in \Omega ,\; \delta _\textrm{B}(f)(x)\ge f(x)\). (And \(\varepsilon _\textrm{B}\) is anti-extensive, \(\varepsilon _\textrm{B}(f)(x)\le f(x)\).)
The opening and closing of f by \(\textrm{B}\), noted \(\gamma _\textrm{B}(f)\) and \(\varphi _\textrm{B}(f)\), respectively, are the compositions of dilation and erosion, namely \(\gamma _\textrm{B}= \delta _\textrm{B}\circ \varepsilon _\textrm{B}\) and \(\varphi _\textrm{B}= \varepsilon _\textrm{B}\circ \delta _\textrm{B}\).
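As a concrete illustration, these four operators can be sketched in a few lines of Python with SciPy; the function names are ours, and the border handling with mode="nearest" only approximates the restriction of \(\textrm{B}_x\) to \(\Omega \):

```python
import numpy as np
from scipy import ndimage

B = np.ones((3, 3), dtype=bool)  # unit square structuring element (8-connectivity)

def dilation(f):
    # delta_B(f): pointwise supremum of f over the 3x3 neighborhood of each pixel
    return ndimage.grey_dilation(f, footprint=B, mode="nearest")

def erosion(f):
    # epsilon_B(f): pointwise infimum of f over the 3x3 neighborhood of each pixel
    return ndimage.grey_erosion(f, footprint=B, mode="nearest")

def opening(f):
    # gamma_B = delta_B o epsilon_B
    return dilation(erosion(f))

def closing(f):
    # phi_B = epsilon_B o delta_B
    return erosion(dilation(f))
```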
2.3.2 Geodesic Operators
Given two functions \(f,g \in \mathcal {F}(\Omega ,\mathbb {R})\) such that \(f\le g\) the geodesic dilation of f under g, noted \(\delta _{g}(f)\), is defined by [9, 12]
The geodesic dilation can be iterated, and for \(p\ge 1\) we note \(\delta _{g}^{(p+1)}(f){:}{=} \delta _{g}\big (\delta _{g}^{(p)}(f)\big )\), with \(\delta _{g}^{(1)} = \delta _{g}\). It is well known and easy to verify that this process converges to a fixed point after a finite number k of iterations. (One can notice that, \(\delta _\textrm{B}\) being an extensive operator, the iterations of geodesic dilation produce an increasing sequence of values \(\delta _{g}^{(p)}(f)(x)\) at each pixel x; furthermore, the possible values of each \(\delta _{g}^{(p)}(f)(x)\) are finitely many, as they come from those of f and g, which are both defined on the \(W\times H\) pixels of \(\Omega \).)
The reconstruction by dilation of g from f, noted \(\texttt {REC} (f, g)\) or \(\texttt {REC} _{g}(f)\), is then:
where k is the smallest integer such that \(\delta _{g}^{(k)}(f)=\delta _{g}^{(k+1)}(f)\).
Alternative formulations of the reconstruction by dilation exist, e.g., based on the level sets of f and g [13]. Here, we mention one based on connectivity, which will be more convenient than (6) to prove the theoretical results of this paper. For any \(x\in \Omega \), one can write
To the best of our knowledge, this formulation has not been stated (hence not proved) in the morphological literature, but it can be deduced easily by induction on the length of paths using the sequential definition of (6), and the fact that \(\texttt {REC} (f_1\vee f_2, g) = \texttt {REC} (f_1, g)\vee \texttt {REC} (f_2, g)\).
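For illustration, the fixed-point iteration defining the reconstruction by dilation could be sketched as follows (a naive loop rather than the optimized queue-based algorithms of the morphological literature; it reuses the dilation sketch of Sect. 2.3.1):

```python
import numpy as np

def geodesic_dilation(f, g):
    # One geodesic dilation of f under g: delta_B(f) ^ g (requires f <= g)
    return np.minimum(dilation(f), g)

def reconstruction_by_dilation(f, g):
    # REC(f, g): iterate geodesic dilations of f under g until a fixed point is reached
    prev = np.asarray(f, dtype=float)
    while True:
        curr = geodesic_dilation(prev, g)
        if np.array_equal(curr, prev):
            return curr
        prev = curr
```

An optimized equivalent is available in standard libraries, e.g., skimage.morphology.reconstruction.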
2.4 Regional Maxima and h-maxima
In this section, besides recalling necessary definitions, we state several theoretical results which will not appear as new to experts in mathematical morphology. However, since we could not find them clearly stated and proved in the literature, we choose to present and prove them here.
Regional maxima Given an image \(f \in \mathcal {F}(\Omega ,\mathbb {R})\), we say that \(M\subset \Omega \), \(M\ne \Omega \), is a regional maximum at level \(t\in \mathbb {R}\) if M is connected, \(f(x) = t\) for all \(x\in M\), and \(f(y)<t\) for any \(y\in \mathcal {N}_M\setminus M\). In words, a regional maximum is a plateau of f surrounded by lower values. Note that this definition imposes that a regional maximum M needs to be strictly included in \(\Omega \), and hence, \(\mathcal {N}_M\setminus M\ne \emptyset \) and \(\Omega \) itself cannot be a regional maximum. We will note \(\mathcal {M}(f)\) the set of regional maxima of f and \(\mathcal{P}\mathcal{M}(f)\) the corresponding set of pixels, that is
Also, since f takes a unique value over a regional maximum M, we shall conveniently note f(M) this value (although M is not a pixel but a set of pixels). Importantly, because of the connectedness property, different regional maxima are necessarily disjoint.
Finally, we will denote by \(\texttt {RMAX} \) the characteristic function of regional maxima, i.e.,
In the case where f takes values in a quantized set \(\lbrace k\cdot \epsilon , \ 0\le k\le L\rbrace \) for some \(L\in \mathbb {N}^*\) and \(\epsilon > 0\), the regional maxima can be computed thanks to the geodesic reconstruction of f from \(f-\epsilon \). Indeed, in that case one can show [13] that the regional maxima of f are the connected components of
the set of pixels where the reconstruction is different from (hence smaller than) f.Footnote 1 In that case, \(\mathcal{P}\mathcal{M}(f)=X_{\max }(f)\) and therefore \(\texttt {RMAX} (f) = \mathbbm {1}_{X_{\max }(f)}\).
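As a sketch, Eqs. (9) and (10) translate directly into code, reusing reconstruction_by_dilation from the previous sketch; eps is the quantization step:

```python
def regional_maxima_mask(f, eps):
    # X_max(f): pixels where REC(f - eps, f) differs from (hence is smaller than) f
    rec = reconstruction_by_dilation(f - eps, f)
    return (f - rec) > eps / 2.0  # boolean characteristic function RMAX(f)
```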
Dynamic The dynamic [14] of a regional maximum M of level \(t\in \mathbb {R}\) is the smallest height one needs to come down from M in order to reach a higher regional maximum \(M^\prime \) of level \(t^\prime > t\), when there is one. Formally, in that case (refer to Section 6.3.5 in [9], and see Fig. 1)
where \(\Gamma _{M M^\prime } {:}{=} \lbrace \gamma \in \Gamma _{xy}, x\in M, y\in M^\prime \rbrace \) and the variable \(M^\prime \) spans all the regional maxima of f such that \(f(M^\prime ) > f(M)\). Note that since f(M) does not depend on the path \(\gamma \), the \(\max \) term in (11) can be written \(f(M) - \min \limits _{z\in \gamma }f(z)\), and finally
By convention, the dynamic of the global maximum of f can be set to the difference between the maximum and the minimum values of f, denoted by \(\Delta _\Omega f {:}{=} \max \limits _{\Omega } f - \min \limits _{\Omega } f\).
The dynamic is commonly used to distinguish between irrelevant maxima caused by noise and significant ones corresponding to underlying bright objects.
h-maxima In order to define the h-maxima of f, we first need to introduce the h-maxima transformation. Given a parameter \(h\in [0,\Delta _\Omega f]\), the \(\texttt {HMAX} _h\) image operator is the reconstruction by dilation of f from \(f-h\), i.e., \(\texttt {HMAX} _h(f) {:}{=} \texttt {REC} (f-h, f)\).
As illustrated in Fig. 2a, this transformation suppresses what are called h-domes in [13].
Now, for a given \(h\in [0, \Delta _\Omega f]\), the h-maxima of f, denoted by \(\mathcal {M}_h(f)\), are defined as the regional maxima of \(\texttt {HMAX} _{h}(f)\):
In particular, \(\mathcal {M}_0(f) = \mathcal {M}(f)\) and \(\mathcal {M}_{\Delta _\Omega f}(f) = \emptyset \). Furthermore, as regional maxima of a function, distinct h-maxima are pairwise disjoint. Similarly to earlier, we will note \(\mathcal{P}\mathcal{M}_h(f)\) the set of pixels belonging to h-maxima, that is
Figure 2b shows in green the characteristic function of the h-maxima of f, noted \(\texttt {EMAX} _h\) for “extended maxima”:
Note that \(\texttt {EMAX} _h = \texttt {RMAX} \circ \texttt {HMAX} _h\), and \(\texttt {EMAX} _0 = \texttt {RMAX} \).
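With the same helpers as above, these two operators can be sketched as follows:

```python
def hmax(f, h):
    # HMAX_h(f): reconstruction by dilation of f from f - h
    return reconstruction_by_dilation(f - h, f)

def emax(f, h, eps):
    # EMAX_h(f) = RMAX(HMAX_h(f)): characteristic function of the h-maxima of f
    return regional_maxima_mask(hmax(f, h), eps)
```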
Illustration of the computations of \(\texttt {HMAX} _h(f)\) and \(\texttt {EMAX} _h(f)\) for the same one-dimensional function f as in Fig. 1, and for \(h_2 > h \ge h_3\). a \(\texttt {HMAX} _h\) produces a regional maximum wherever there is a maximum with dynamic greater than h in f. b \(\texttt {EMAX} _h(f)\) is a binary function whose nonzero values indicate where the maxima of \(\texttt {HMAX} _h\), hence the h-maxima of f, are. Note that in our 1D illustrations, \(\Omega \) is a discrete interval, represented by the x axis, and the structuring element defining the neighborhood of each pixel is \(\textrm{B}= \lbrace -1, 0, 1\rbrace \)
Links with the dynamic An important property of h-maxima is that they contain the regional maxima of f of dynamic larger than h. These are given by the connected components where f is maximal inside the h-maxima. In turn, the regional maxima with dynamic larger than h determine the value of \(\texttt {HMAX} _h(f)\) over the h-maxima they belong to.
To put this more formally in Proposition 1, let us first introduce some notations. Given \(f\in \mathcal {F}(\Omega , \mathbb {R})\), recall [15] that its upper level set of level \(t\in \mathbb {R}\), which we will denote by \(\chi _t(f)\), is defined by
Also, given a non-empty set \(M\subseteq \Omega \), and \(M_0\) a connected subset of M, we denote by \(\texttt {CC}(M_0, M)\) the unique connected component of M containing \(M_0\). Furthermore, \(\arg \max \limits _M f\) denotes the set \(\lbrace x\in M, f(x) = \max \limits _{y\in M} f(y)\rbrace \). Then, the following proposition holds.
Proposition 1
Let \(f\in \mathcal {F}(\Omega , \mathbb {R})\), \(h\ge 0\) and \(\lambda \in \mathbb {R}\). Then,
According to Proposition 1, computing the h-maxima makes it possible to detect bright objects of dynamic larger than h. This strategy is well known in the morphological literature and is the one applied to count cells with the simple morphological algorithm described in Sect. 2.5.
Although this result has been at least partially stated in the literature [9], to the best of our knowledge no proof has been published, probably because it appears completely obvious to experts, who are used to observing it on simple examples like the one of Fig. 3. We provide a proof in “Appendix A.”
Larger h, fewer h-maxima It is well known in the morphological literature, and easy to realize on simple examples, that the number of h-maxima is a decreasing function of h. This comes from a specific ordering relation between \(\mathcal {M}_h\) for different h values. Let us introduce the ordering relation \(\preceq \) on collections of pairwise disjoint connected components of \(\Omega \) defined, for any two such collections \(\mathcal {M}\) and \(\mathcal {M}^\prime \), by
One can check that this is a partial ordering. (Reflexivity is ensured by the pairwise disjointness of any two elements of a collection.) Then, we have the following result.
Proposition 2
For \(f\in \mathcal {F}(\Omega , \mathbb {R})\) and \(h_1, h_2 \ge 0\), if \(h_1 \le h_2\) then \(\mathcal {M}_{h_2}(f) \preceq \mathcal {M}_{h_1}(f)\).
Recall that for \(h\ge 0\), the elements of \(\mathcal {M}_h(f)\) are pairwise disjoint, as regional maxima of a function. Therefore, an \(h_1\)-maximum is included in at most one \(h_2\)-maximum. This is why \(\mathcal {M}_{h_2}(f) \preceq \mathcal {M}_{h_1}(f)\) implies that \(\mathcal {M}_{h_2}(f)\) has no more elements than \(\mathcal {M}_{h_1}(f)\). Hence, the number of h-maxima decreases with h from the total number of regional maxima, when \(h=0\), to zero, when \(h = \Delta _\Omega f\). (Indeed one can see that \(\texttt {HMAX} _h(f)\) is just a constant function when \(h = \Delta _\Omega f \), and has therefore no regional maximum by definition, i.e., \(\mathcal {M}_{\Delta _\Omega f}(f) = \emptyset \).) This is illustrated in Fig. 3.
Note that Proposition 2 generalizes a part of Proposition 1, namely the one stating that \(\mathcal {M}_{h}(f) \preceq \mathcal {M}_{0}(f) = \mathcal {M}(f)\) for \(h\ge 0\). The proof of Proposition 2 is provided in “Appendix B.”
a A function f, the result of \(\texttt {HMAX} _h(f)\) for \(h=2.5\) and the corresponding set \(\mathcal {M}_h(f)\) of h-maxima. The dynamics of the regional maxima of f are, from left to right: 2, 1, 4, 5, 5 and 3, respectively. b The evolution of \(h\mapsto \mathcal {M}_h(f)\) for \(h \in [0, \Delta _\Omega f]\)
Right continuity of \(\mu : h \mapsto \mathcal {M}_h(f)\) Let us denote by \(\mu \) the function mapping \(h\in [0, \Delta _\Omega f]\) to \(\mathcal {M}_h(f)\), the set of h-maxima of f. With Proposition 2 in the previous paragraph, we saw that \(\mu \) is decreasing for the orderings \(\le \) and \(\preceq \). Now, we will see that it is also piecewise constant and right continuous, as illustrated in Fig. 3. This will be useful in the study of the derivability of \(h\mapsto \texttt {HMAX} _h(f)\), in Sect. 3.3.1.
Proposition 3
Let \(f\in \mathcal {F}(\Omega , \mathbb {R})\) be non-constant. Then, there exist \(K\in \mathbb {N}\) and \(h_0< h_1< \dots < h_{K+1} \in [0, \Delta _\Omega f]\) with \(h_0 = 0\) and \(h_{K+1} = \Delta _\Omega f\), such that \(\mathcal {M}_{h}(f) = \mathcal {M}_{h_k}(f)\) for every \(0\le k\le K\) and every \(h\in [h_k, h_{k+1})\), and for \(k \ne k^\prime \), \(\mathcal {M}_{h_{k^\prime }}(f) \ne \mathcal {M}_{h_k}(f)\).
From now on, we will note \(\mathcal {H}_f {:}{=} \lbrace h_0, h_1, \dots , h_{K+1}\rbrace \),
the finite subset of \([0, \Delta _\Omega f]\) introduced by the proposition. Clearly, the elements of \(\mathcal {H}_f\) are the points of discontinuity of \(\mu \), but since \(\mu (h_k) = \lim \limits _{h\downarrow h_k}\mu (h)\), \(\mu \) is right continuous at these points and, therefore, everywhere. The proof of Proposition 3 is given in “Appendix C.”
Example of cell counting by identifying cells with h-maxima, in the TRP1 melanocyte dataset (see Sect. 4 for the description of datasets). From left to right: input 8-bit RGB image (resized); green channel marked with ground truth locations of the 107 cells (in red); and detection results using the best h for the image: \(h_{\texttt {I}}^{*}=54\). Note that despite the perfect count, there are some false positives and false negatives (e.g., just below the top right-hand corner)
2.5 Morphological Pipeline for Cell Counting
In this section, we present a simple morphological algorithm for cell counting in images where individual cells are well approximated by regional extrema of significant dynamic,Footnote 2 like in Fig. 4. We assume images to take quantized values in \([0, V_{\max }]\cap \epsilon \mathbb {N}\) with \(V_{\max }>0\) and \(0<\epsilon <V_{\max }\).
The algorithm takes as input an image f, as parameter a number \(h\ge 0\) and returns an estimated number \(n_c\) of cells after running the following steps:
1. Apply an alternate filter
$$\begin{aligned} \tilde{f} \leftarrow \varphi _\textrm{B}\circ \gamma _\textrm{B}(f) \end{aligned}$$
where \(\varphi _\textrm{B}\) and \(\gamma _\textrm{B}\) are, respectively, the closing and opening by \(\textrm{B}\), the unit square structuring element (see Sect. 2).
2. Compute the characteristic function of the h-maxima of \(\tilde{f}\),
$$\begin{aligned} \texttt {I} _h \leftarrow \texttt {EMAX} _h(\tilde{f}) = \texttt {RMAX} (\texttt {HMAX} _h(\tilde{f})), \end{aligned}$$
which is a binary image (see Sect. 2.4).
3. Apply to \(\texttt {I} _h\) the operator \(\texttt {CCC} \), which maps any binary image to its number of connected components:
$$\begin{aligned} n_c \leftarrow \texttt {CCC} (\texttt {I} _h). \end{aligned}$$
Let us briefly comment on the steps of this algorithm. The alternate filter is a very common one in morphological image processing. The opening \(\gamma _{\textrm{B}}\) first removes bright structures smaller than the support of \(\textrm{B}\), after which \(\varphi _{\textrm{B}}\) has the dual effect of filling in dark structures smaller than \(\textrm{B}\). Hence, the alternate filter denoises the image by suppressing small variations while preserving the shape of structures larger than \(\textrm{B}\).
The second step of the algorithm detects, in the filtered image, the bright domes containing at least one regional maximum of dynamic strictly larger than h. Our hypothesis is that in such an image, and for a well-chosen h, each cell contains one and only one such dome. This assumption is realistic given that cells do appear as bright shapes on a dark background, and that their brightness is fairly uniform over one image. Note that, in the present setting where f takes quantized values, \(\texttt {RMAX} \) can be computed with a geodesic reconstruction, following Eqs. (9) and (10).
According to that hypothesis, counting the number \(n_c\) of connected components of the binary image \(\texttt {I} _h\) yields an estimation of the number of cells in f. As we saw in Sect. 2, with Proposition 2, this number \(n_c\) is a decreasing step function of h, ranging from the total number of local maxima (for \(h=0\)) to zero (for \(h = \Delta _{\Omega }f\), the dynamic of any global maximum). This is why we expect that for each input image, there is an interval of h values for which \(\vert \mathcal {M}_h(f)\vert \) is close to the right number of cells.
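Putting the three steps together, the whole counting algorithm of this section could be sketched as follows, reusing the helpers of Sect. 2; here scipy.ndimage.label stands in for the \(\texttt {CCC} \) operator, whereas the trainable pipeline of Sect. 3 uses the differentiable counting layer instead:

```python
import numpy as np
from scipy import ndimage

def count_cells(f, h, eps):
    # Step 1: alternate filter, opening then closing by the 3x3 square B
    f_tilde = closing(opening(f))
    # Step 2: binary image of the h-maxima of the filtered image
    i_h = emax(f_tilde, h, eps)
    # Step 3: count its connected components (8-connectivity)
    _, n_c = ndimage.label(i_h, structure=np.ones((3, 3), dtype=int))
    return n_c
```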
The goal of our method is to train end-to-end a deep learning architecture which first estimates the right dynamic and then applies the above algorithm. This is the motivation of the next section.
3 Integration in a Deep Learning Architecture
The morphological pipeline described in the previous section depends on one crucial parameter, namely the minimum dynamic for a regional maximum to be considered a relevant object (in our case, a cell). Following [8], in this section we describe how this pipeline is implemented as a sequence of neural layers and embedded in an architecture where it is preceded by a CNN which predicts an optimal dynamic for each image, in order to count the right number of cells at the output of the network. We also analyze the ability of such a pipeline to be trained with gradient descent with respect to an appropriate loss, by studying the derivability of its components with respect to their parameters and input variables.
3.1 Counting Connected Components (\(\texttt {CCC} \)) Layer
Counting connected components is an important step in many classical data processing methods [16], especially on images and graphs. In the context of deep learning, this task has been used to evaluate the generalization capacity of neural networks [17]. However, to be used as a layer inside a network, it must be implemented so that gradient backpropagation can be computed by auto-differentiation software [18]. In this subsection, we present an algorithm, already introduced in [8], meant to count the connected components of a binary image and relying mainly on the geodesic reconstruction defined in (6).
Let \(f\in \mathcal {F}(\Omega ,\{0,1\})\) be a binary image, and \(\mathcal {U}_{\Omega }\in \mathcal {F}(\Omega ,[0,1])\) an injective image, i.e., such that \(\forall x,y\in \Omega , \;\; x\ne y \Rightarrow \mathcal {U}_{\Omega }(x)\ne \mathcal {U}_{\Omega }(y)\). Then, it is clear, from Eq. (7), that reconstructing \(\mathcal {U}_{\Omega }\wedge f\) under f will label every connected component \(\texttt {CC}\) of f with the maximum value taken by \(\mathcal {U}_{\Omega }\) in \(\texttt {CC}\). Since \(\mathcal {U}_{\Omega }\) is injective, this maximum is achieved only once in each connected component. Hence, setting to one each pixel \(x\in \Omega \) such that \(\texttt {REC} (\mathcal {U}_{\Omega }\wedge f,f)(x) = \mathcal {U}_{\Omega }(x)\) produces a binary image where exactly one pixel per connected component of f is lit. This is illustrated in Fig. 5.
Illustration of the \(\texttt {CCC} \) algorithm. a \(\mathcal {U}_\Omega \) coincides only once with \(\texttt {REC} (\mathcal {U}_{\Omega }\wedge f,f)\) inside each connected component, and never outside the connected components. b Hence, \(\mathcal {D}_f\) has only two pixels activated, one per connected component of f
Therefore, we define
where the binarization operator \(\mathcal {B}\) for two functions \(f,g \in \mathcal {F}(\Omega ,\mathbb {R})\) is:
\(\mathcal {D}_f\) denotes the final detection result, represented by locations of the isolated connected components.
Accordingly, one can count the number of connected components of f by simply summing (22): \(\texttt {CCC} (f) {:}{=} \sum \limits _{x\in \Omega }\mathcal {D}_f(x)\).
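A minimal NumPy sketch of this counting procedure, reusing reconstruction_by_dilation from Sect. 2.3.2; a random image plays the role of \(\mathcal {U}_{\Omega }\) (injective almost surely), and the binarization \(\mathcal {B}\) is implemented as an equality test, following the description above:

```python
import numpy as np

def ccc(f_binary, seed=0):
    # f_binary: binary image with values in {0, 1}
    fb = np.asarray(f_binary, dtype=float)
    # U_Omega: random image with values in [0, 1), injective almost surely
    u = np.random.default_rng(seed).uniform(size=fb.shape)
    # REC(U ^ f, f): each connected component of f is filled with its maximum of U
    rec = reconstruction_by_dilation(np.minimum(u, fb), fb)
    # D_f: exactly one pixel per connected component of f coincides with U
    d_f = (fb > 0) & np.isclose(rec, u)
    # CCC(f): counting the components amounts to summing D_f
    return int(d_f.sum())
```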
3.2 Loss Functions
The parameters of the proposed neural architecture, which are those of the \(\texttt {CNN} \) estimating h, are optimized by minimizing a loss function which can penalize different errors made by the network at different stages.
Since we are dealing with a counting task, a natural error to penalize is the counting error. Therefore, we define the counting loss as the mean absolute error (MAE) between the estimated number of cells \(\widehat{\texttt {CCC}}_{\texttt {I}}:=\texttt {CCC} (\texttt {EMAX} _{\hat{h}_{\texttt {I}}})\) and the true number of cells \(\texttt {CCC} _{\texttt {I}}^{*}\), for an input image \(\texttt {I} \):
In the case where we do know, for each training image \(\texttt {I} \), an optimal value \(h_{\texttt {I}}^*\) for which the number of h-maxima is the closest to the number of cells, we have two more possible errors to penalize. One is the error on the estimation of h, i.e., \(|h^*_{\texttt {I}} - \hat{h}_{\texttt {I}}|\), and the other one is the reconstruction loss, defined as the mean squared error (MSE) between the predicted geodesic reconstruction image \(\texttt {HMAX} _{\hat{h}_{\texttt {I}}}\) and the ground truth one \(\texttt {HMAX} _{h_{\texttt {I}}^{*}}\):
Yet another possibility is the binary cross entropy between \(\texttt {EMAX} _{\hat{h}_I}\) and \(\texttt {EMAX} _{{h}_I^*}\), \(\texttt {BCE} (\texttt {EMAX} _{\hat{h}_I}, \texttt {EMAX} _{{h}_I^*})\). Note that these loss terms are less and less constraining as we move from h to the output count, in the sense that \(\hat{h}_{\texttt {I}} = h^*_{\texttt {I}} \Rightarrow \texttt {HMAX} _{\hat{h}_{\texttt {I}}} = \texttt {HMAX} _{h^*_{\texttt {I}}} \Rightarrow \texttt {EMAX} _{\hat{h}_{\texttt {I}}} = \texttt {EMAX} _{h^*_{\texttt {I}}} \Rightarrow \texttt {CCC} (\texttt {EMAX} _{\hat{h}_{\texttt {I}}}) = \texttt {CCC} (\texttt {EMAX} _{h^*_{\texttt {I}}})\), whereas none of the converse implications is true. In the present paper, we focus on the counting and reconstruction losses, as previous experiments showed no improvement when including the other two terms.
Hence, the final loss we consider is a joint loss of the form
where \(\alpha \ge 0\) and \(\beta \ge 0\) control the importance of each of the terms, and need to be adjusted as hyperparameters (see Sect. 4).
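As an illustration, assuming the joint loss (28) is the weighted sum of the reconstruction MSE and of the counting error, a per-image sketch could look as follows (the function and variable names are ours):

```python
import numpy as np

def joint_loss(hmax_pred, hmax_star, n_pred, n_true, alpha=0.5, beta=0.5):
    # Reconstruction term: MSE between HMAX at the predicted and optimal dynamics
    l_rec = np.mean((np.asarray(hmax_pred) - np.asarray(hmax_star)) ** 2)
    # Counting term: absolute error between predicted and true cell counts
    l_count = abs(n_pred - n_true)
    # Assumed weighted combination of the two terms
    return alpha * l_rec + beta * l_count
```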
We shall stress that while both terms can help the optimization of the parameters, in practice the reconstruction loss is much more constraining, as it requires the computation of an optimal h and of its corresponding \(\texttt {HMAX} _h\) reconstruction for each training image. This implies that we know, before training, the grayscale images that will be passed to the h-estimator CNN. In particular, this prevents learning an optimal preprocessing of the images before the estimation of h, e.g., a trainable filtering that would replace the proposed alternate filter \(\varphi _{\textrm{B}}\circ \gamma _{\textrm{B}}\), and/or a trainable mapping from three-channel to one-channel images.
Therefore, training our pipeline by minimizing the counting loss only (i.e., \(\alpha =0\)) is highly desirable. It is one of the targets of the present paper, as an improvement to [8].
3.3 Derivability of the Pipeline
In order to train the proposed pipeline with a gradient descent algorithm, we need to check that a gradient can flow during backpropagation, allowing for the update of the CNN parameters. The alternate filter is non-trainable and applied to the input before the CNN, so it is not involved in the backpropagation. The CNN itself does not raise any particular problem compared to usual deep networks for images. The next step takes as input the h value estimated by the CNN, to compute the h-transformation \(\texttt {HMAX} _h(\tilde{f})\), which should therefore be viewed as a function of h for a fixed input image \(\tilde{f}\). The last step consists in counting the number of regional maxima of \(\texttt {HMAX} _h(\tilde{f})\), i.e., applying \(\texttt {CCC} \circ \texttt {RMAX} \) to \(\texttt {HMAX} _h(\tilde{f})\). Let us now analyze the derivability of these last two steps.
Illustration of Proposition 4, showing the changes from \(\texttt {HMAX} _h(f)\) to \(\texttt {HMAX} _{h\pm \eta }(f)\) for a function f with integer values (black line), and two different h. In a, f and h fulfill the stability condition of the proposition in both positive and negative directions, therefore a shift of \(\pm \eta \) occurs exactly at the h-maxima (regional maxima of the green function), confirming Eq. (29). In b, however, the stability holds for nonnegative \(\eta \), but the h-maxima vary even for the smallest negative \(\eta \). Hence, in the latter case the reconstructions vary outside the h-maxima and remain unchanged in some parts of these, contradicting (29)
3.3.1 Derivative of \(h\mapsto \texttt {HMAX} _h(f)\)
Here, we consider a fixed \(f\in \mathcal {F}(\Omega , \mathbb {R})\) and see \(\texttt {HMAX} _h(f)\) as a function of h. Recall that we denote by \(\mathcal {M}_h(f)\) the set of h-maxima of f (see Sect. 2). Then, we can state the following result.
Proposition 4
Let \(h \ge 0\), and assume there exists an interval I with \(0\in I\), such that for any \(\eta \in I\), \(\mathcal {M}_{h+\eta }(f) = \mathcal {M}_{h}(f)\). Then, for any \(\eta \in I\),
$$\begin{aligned} \texttt {HMAX} _{h+\eta }(f) = \texttt {HMAX} _{h}(f) - \eta \, \texttt {EMAX} _{h}(f). \end{aligned}$$
The interpretation of Proposition 4 is the following. First, the assumption that \(\mathcal {M}_{h+\eta }(f) = \mathcal {M}_{h}(f)\) for \(\eta \in I\) just means that, for the considered f and value h, the set of h-maxima of f remains the same if we change h by a small enough (but nonzero) quantity, in at least one direction, positive or negative. We will refer to this concept as the stability of h-maxima. Figure 6 shows an example where I is open, in (a), thus the stability holds in both directions, and an example where \(0 = \min I\), in (b), where the stability holds only for nonnegative \(\eta \).
Second, Eq. (29) indicates that, under the condition of stability of h-maxima, a small change in h only changes the height of the h-maxima and leaves the rest of \(\texttt {HMAX} _h(f)\) unchanged, as shown in Fig. 6. The proof of Proposition 4 is provided in “Appendix D.”
Proposition 4 leads to the following consequence regarding the Fréchet derivability of \(h\mapsto \texttt {HMAX} _h(f)\).
Corollary 5
Let \(f\in \mathcal {F}(\Omega , \mathbb {R})\) and \(\mathcal {H}_f = \lbrace h_0, h_1, \dots , h_{K+1}\rbrace \) be the finite subset of \([0, \Delta _\Omega f]\) containing the points of discontinuity of \(h\mapsto \mathcal {M}_h(f)\), as introduced by Proposition 3. Then, the function \(h \mapsto {{\texttt {HMAX}}}_{h}(f)\) is differentiable at \(h^*\) if and only if \(h^* \notin \mathcal {H}_f\), and in that case its derivative is \(\frac{\partial \texttt {HMAX} _{h^*}(f)}{\partial h} = -\texttt {EMAX} _{h^*}(f)\).
The non-derivability of \(h \mapsto \texttt {HMAX} _{h}(f)\) at any \(h\in \mathcal {H}_f\) appears clearly on the example of Fig. 6b, where \(h^* = 2\). In this case, on the one hand we do have, for \(0\le \eta <1\), \(\texttt {HMAX} _{h^*+\eta }(f) = \texttt {HMAX} _{h^*}(f) - \eta \, \texttt {EMAX} _{h^*}(f)\).
Therefore, if \(h \mapsto \texttt {HMAX} _{h}(f)\) was Fréchet differentiable at \(h^*=2\), its derivative would be \(\frac{\partial \texttt {HMAX} _{h^*}(f)}{\partial h} = -\texttt {EMAX} _{h^*}(f)\). But on the other hand, we have, for \(-1<\eta <0\),
where \(\texttt {EMAX} _{h^*+\eta }(f) - \texttt {EMAX} _{h^*}(f) = \mathbbm {1}_{\lbrace 1\rbrace }-\mathbbm {1}_E\), with \(E =\lbrace 3, 4, 10\rbrace \) if we index pixels from left to right and starting from zero. Since that quantity does not depend on \(\eta \), it does not converge to the null function when \(\eta \) goes to zero from below, which contradicts the differentiability of \(h \mapsto \texttt {HMAX} _{h}(f)\) at \(h^* = 2\). The general case is handled likewise in the proof of Corollary 5, in “Appendix E.”
It is clear from this example, and more formally from Proposition 1, that discontinuities in \(h\mapsto \mathcal {M}_h(f)\) can only be caused by discontinuities of the upper level set function \(t\mapsto \chi _t(f)\). For functions f taking quantized values (like in Fig. 6), we have the guarantee that the level sets remain constant in between these values, and that therefore \(h \mapsto \texttt {HMAX} _{h}(f)\) is differentiable in the corresponding intervals.
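In an auto-differentiation framework, Corollary 5 suggests wiring \(-\texttt {EMAX} _h(f)\) explicitly as the backward rule of the \(\texttt {HMAX} \) layer with respect to h. A framework-agnostic sketch (plain NumPy, reusing the helpers of Sect. 2) could look like this; a TensorFlow implementation would wrap the same rule in a custom gradient:

```python
def hmax_with_grad(f, h, eps):
    # Forward pass: HMAX_h(f); backward rule w.r.t. h: -EMAX_h(f) (Corollary 5),
    # valid away from the finitely many discontinuity points in H_f
    out = hmax(f, h)
    grad_h = -regional_maxima_mask(out, eps).astype(float)
    return out, grad_h

# Chain rule: if a loss L depends on HMAX_h(f), then
# dL/dh = sum over pixels x of (dL / dHMAX_h(f)(x)) * grad_h(x).
```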
3.3.2 Derivability of \(\texttt {CCC} \circ \texttt {RMAX} \)
As explained earlier, the operator \(\texttt {CCC} \circ \texttt {RMAX} \) consists in counting the regional maxima of a function (in our case, \(\texttt {HMAX} _h(f)\)), \(\texttt {RMAX} \) returning the characteristic function of the regional maxima, and \(\texttt {CCC} \) the one counting the connected components of a binary image.
However, as seen in the previous paragraph, the number of h-maxima remains constant over the intervals \([h_k, h_{k+1})\); therefore, the derivative of \(\texttt {CCC} \circ \texttt {RMAX} \) at \(\texttt {HMAX} _h(f)\), for \(h\notin \mathcal {H}_f\), is zero.
On the other hand, when \(h\in \mathcal {H}_f\), a discontinuity in the h-maxima occurs, making \(\texttt {CCC} \circ \texttt {RMAX} \) non-differentiable at \(\texttt {HMAX} _h(f)\) since, by Propositions 2 and 3, a change in \(\mathcal {M}_h(f)\) also changes its cardinality. When relying on auto-differentiation with the algorithm described in Sect. 3.1, we instead observe exploding gradients.
Therefore, in order to avoid vanishing as well as exploding gradients, we propose to use explicit (approximations of) derivatives, as explained next.
3.4 Modified Backpropagation
Let us consider the setting where the training is done by minimizing the joint loss (28) with \(\beta > 0\), that is to say including the counting loss term. As we have just shown in the previous section, numerical issues such as vanishing or exploding gradients are expected when relying on the auto-differentiation of (25), and therefore of \(h\mapsto \vert \mathcal {M}_h(f)\vert = \texttt {CCC} \circ \texttt {RMAX} \circ \texttt {HMAX} _h(f)\). In our experiments, significant optimization difficulties have been observed, especially when the counting loss term was weighted similarly as the reconstruction term, that is, \(\beta \approx \alpha \). We overcame these difficulties by using derivatives of smooth approximations of the function \(h\mapsto \vert \mathcal {M}_h(f)\vert \).
We propose two approximations, which were both tested in our experiments (see Sect. 4). The first one, more naive, is only based on the fact that \(\vert \mathcal {M}_h(f)\vert \) is a decreasing function of h. Without any further knowledge, we assume \(\frac{\partial \vert \mathcal {M}_h(f)\vert }{\partial h} \approx -1\),
which gives
We denote by (MB, \(-1\)) the class of models trained with this approximation, MB standing for modified backpropagation.
The second approximation, more sophisticated and realistic, assumes \(\vert \mathcal {M}_h(f)\vert \approx \sum \limits _{x\in \Omega }\mathcal {I}_N(x)\),
where, for some positive integer N,
The justification for this approximation is given by Corollary 7, based on Proposition 6. We first introduce the notation
for the set of regional maxima of f with dynamic not smaller than h. We can now state the following proposition, proved in “Appendix F.”
Proposition 6
Let \(f\in \mathcal {F}(\Omega , \mathbb {R})\) non-constant, and \(h\in {(0, \Delta _\Omega f]}\). Then, for any \(x\in \Omega \),
Given that, additionally, we have by definition \(f \ge \texttt {HMAX} _h(f) \ge f-h\), it follows that for any \(x\in \Omega \) and \(h>0\),
with equality to 1 if and only if \(x\in M_0\) for some \(M_0\in \overline{\mathcal {D}}_{h}(f)\). Hence, when N goes to infinity, we get
In words, when N is large enough, the sum in (33) approximates the number of pixels belonging to regional maxima of dynamic not smaller than h.
Now, note that for \(h\notin \mathcal {H}_f\), as introduced by Proposition 3, there is no regional maximum \(M_0\) with dynamic exactly h. Indeed, if there was one, by Proposition 1 we would have \(M_0\notin \mathcal {M}_h(f)\), whereas \(M_0\) would be contained in an element of \(\mathcal {M}_{h-\epsilon }(f)\) for any \(\epsilon > 0\) (smaller than h), so h would be a discontinuity point of \(h\mapsto \mathcal {M}_h(f)\), i.e., in \(\mathcal {H}_f\). Therefore, for such h we have \(\overline{\mathcal {D}}_{h}(f) = \mathcal {D}_{h}(f)\), where we recall that
is the set of regional maxima of f with dynamic larger than h, already defined in (C8). Furthermore, Proposition 1 tells us that \(M_0\in \mathcal {D}_{h}(f)\) if and only if \(M_0\) is a connected component of \(\arg \max \limits _M f\) for some \(M\in \mathcal {M}_h(f)\). Thus, for \(h\notin \mathcal {H}_f\), we get
As a consequence, if we further assume that in every h-maximum M, the set \(\arg \max \limits _{M} f\) contains only one pixel, then when N goes to infinity, the sum in (33) does provide the exact number of h-maxima. We have just proved the following corollary.
Corollary 7
Let \(f\in \mathcal {F}(\Omega , \mathbb {R})\) be injective, \(h\in [0, \Delta _\Omega f]{\setminus } \mathcal {H}_f\), and \((\mathcal {I}_N)_{N\ge 1}\) the sequence of images defined by (34). Then,
Note that the approximation is quite accurate as soon as we add some small random noise to f, preventing any two pixels from having the same value, and for large enough N, as illustrated in Figs. 7 and 8.
Illustration of Corollary 7. The function \(\tilde{f}\) is defined by adding some Gaussian noise (zero mean, 0.2 standard deviation) to the function f of Figs. 3, 6. Here, \(h=2.5\notin \mathcal {H}_{\tilde{f}}\) and \(N=10\). The value to be approximated is \(\vert \mathcal {M}_h(f)\vert = 3 = \vert \mathcal {M}_h(\tilde{f})\vert \), and we get \(\sum \limits _{x\in \Omega }\mathcal {I}_N \approx 3.14\), where \(\mathcal {I}_N\) is computed for \(\tilde{f}\), which is injective
This leads to the following approximation of the derivative
with
and \(\rho (f, h) = f - \texttt {HMAX} _h(f)\). In the experimental section, we will refer to this approximation as (MB, N).
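As an illustration of the (MB, N) surrogate, the following sketch computes the smooth count approximation of (33) together with a derivative estimate. It assumes \(\mathcal {I}_N = (\rho (f,h)/h)^N\) with \(\rho (f, h) = f - \texttt {HMAX} _h(f)\), which is consistent with Proposition 6 but should be checked against the exact expressions (34) and (42); the analytic derivative (42) is replaced here by a finite-difference stand-in:

```python
import numpy as np

def smooth_count(f, h, N=50, sigma_ratio=0.2, seed=0):
    # Perturb f with small Gaussian noise so that it is injective almost surely
    rng = np.random.default_rng(seed)
    f_tilde = f + rng.normal(0.0, sigma_ratio * h, size=f.shape)
    rho = f_tilde - hmax(f_tilde, h)   # rho(f, h) = f - HMAX_h(f), with values in [0, h]
    i_n = (rho / h) ** N               # close to 1 only on maxima of dynamic >= h
    return float(i_n.sum())            # approximates |M_h(f)| (Corollary 7)

def smooth_count_grad(f, h, N=50, delta=0.5):
    # Finite-difference estimate of the derivative of h -> |M_h(f)|
    return (smooth_count(f, h + delta, N) - smooth_count(f, h - delta, N)) / (2 * delta)

# The naive (MB, -1) variant simply replaces this derivative by the constant -1.
```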
Illustration of Equations (33) and (41), where f is the preprocessed green channel of the cell image of Fig. 4. Left: The actual number of h-maxima is plotted in blue as a function of h, whereas its approximation \(\sum \limits _{x\in \Omega }\mathcal {I}_N(x)\) is plotted in orange. Right: In blue, we plot an estimation of \(\frac{\partial \vert \mathcal {M}_h(f)\vert }{\partial h}\) obtained by finite differences \(0.5\times (\vert \mathcal {M}_{h+1}(f)\vert - \vert \mathcal {M}_{h-1}(f)\vert )\). The orange curve is the approximation \(\sum \limits _{x\in \Omega }\mathcal {I}^{\prime }_N(x)\). \(\mathcal {I}_N\) and \(\mathcal {I}^{\prime }_N\) are computed for \(N=50\) and the perturbed injective function \(\tilde{f}\) is obtained by adding Gaussian noise to f with standard deviation \(\sigma = 0.2 h\)
4 Experiments
4.1 Data
To evaluate the benefits of our proposition, three databases are used. It is important to note that the three types of images considered have in common that the cells of interest are bright objects on dark backgrounds, for which using the dynamic for cell counting makes sense. In the three considered datasets, the input image is transformed into a single input channel and then rescaled to a resolution of \(256 \times 256\) pixels. The image range is between 0 and \(V_{\max } = 255\), quantized with a float precision \(\epsilon \). Hence, the previously described method applies, taking binary images with values in \(\lbrace 0, 255\rbrace \). Applying the morphological pipeline described in Sect. 2.5 for all the integer values h in \([0, \Delta _{\Omega }f]\) allowed us to determine, for each training image I, an optimal dynamic \(h_{\texttt {I}}^{*}\), defined as the lower bound of the interval of h values minimizing the difference between the true number of cells and the estimated one.
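For instance, the per-image optimal dynamic \(h_{\texttt {I}}^{*}\) could be obtained by a brute-force scan of this kind (a sketch reusing count_cells from Sect. 2.5; np.argmin returns the first, hence smallest, minimizer):

```python
import numpy as np

def optimal_dynamic(f, n_true, eps):
    # Scan all integer dynamics and keep the smallest h minimizing the counting error
    h_values = np.arange(0, int(f.max() - f.min()) + 1)
    errors = [abs(count_cells(f, h, eps) - n_true) for h in h_values]
    return int(h_values[int(np.argmin(errors))])  # lower bound of the optimal interval
```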
4.1.1 TRP1 Dataset
The TRP1 dataset [3] we used contains two sets, called set1 and set2, of 76 fluorescent melanocyte RGB images each, with a resolution of \(1024\times 1024\) pixels, showing a high variability in density and shapes.Footnote 3 They come along with manual annotations of the cell coordinates. The images were acquired from in vitro pigmented reconstructed skin samples submitted to TRP1 fluorescent labeling [19]. Figures 4 and 14 show examples of images and manual annotations.
4.1.2 Fluorescent Neuronal Cells Dataset
The Fluorescent Neuronal Cells dataset [4] comprises 283 fluorescence microscopy images of neuron slices from the mouse brain, with a resolution of \(1600\times 1200\) pixels. These are accompanied by the corresponding ground truth cell location labels. The dataset presents a number of challenges for the cell counting task. Firstly, the cell shapes are variable, the contrast of the cells relative to the background is variable, and the backgrounds themselves exhibit some variation in brightness. An example input image is shown in Fig. 15. In addition, the images contain biological structures that are not neuronal cells, and the number of cells contained in the images is highly variable, ranging from none to dozens of cells, with instances of overlapping and clasping cells. These factors combine to present significant challenges for accurate cell counting.
4.1.3 Subset of Cellpose Dataset
Originally used for generalist segmentation model training and evaluation, the Cellpose dataset [20] comprises a diverse array of microscopic cell images sourced from multiple avenues, including previous research and online searches. Furthermore, the dataset includes a proportion of non-cell images, which serve to enhance the model’s generalization capability. The dataset comprises a total of 616 images, of which 69 have been reserved for testing purposes. Each image is accompanied by ground truth segmentation labels.
Proposed end-to-end differentiable pipeline using a \(\texttt {CNN} \) and an \(\texttt {HMAX} \) layer for cell counting. The network is trained by minimizing a loss function that depends on (1) the difference between the geodesic reconstruction with the dynamic \(\hat{h}\) estimated by the network and the best possible one, and (2) the difference between the number of cells and the number of regional maxima of \(\texttt {HMAX} _{\hat{h}}\)
As our method is constrained to images featuring bright objects on darker backgrounds, we manually selected a subset of the entire dataset, comprising images that align with this hypothesis. The selected fluorescent microscopic images typically feature nuclei stained with DAPI and proteins labeled with a fluorescent substance in the cytoplasm, the two stains being represented in different color channels. Examples of the selected cell images are shown in the left column of Fig. 16. Finally, 213 data samples are selected for training, validation and test.
4.2 Overall Architecture
The proposed neural architectureFootnote 4 is illustrated in Fig. 9. Morphological layers apply an alternate filter (opening followed by closing) by a \(3\times 3\) square structuring element to the resized green channel, which yields the first preprocessed image \(\texttt {I} \). This image is passed as input to a CNN, which estimates the best dynamic \(\hat{h}_{\texttt {I}}=\texttt {CNN} _\theta (\texttt {I})\), where \(\theta \) denotes the parameters of the CNN. This dynamic is then used to compute the \(\hat{h}_{\texttt {I}}\)-maxima of the filtered image, by first feeding it to the \(\texttt {HMAX} \) reconstruction layer, followed by the \(\texttt {RMAX} \) one to get the binary image \(\texttt {EMAX} _{\hat{h}_{\texttt {I}}}\). Finally, the number of connected components of the latter is obtained at the output of the proposed connected components counting layer, \(\texttt {CCC} (\texttt {EMAX} _{\hat{h}_{\texttt {I}}})\). The CNN architecture is summarized in Fig. 10. In particular, each convolutional layerFootnote 5 is followed by a Batch Normalization layer with ReLU as activation function. Following [11], a global average pooling layer is used after the second Maxpooling layer instead of stacked flatten and fully connected layers, which has the advantage of further reducing the number of parameters of the network. Moreover, extracting global feature information is coherent with the objective of our task.
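A plausible Keras-style sketch of such an h-estimator is given below; the layer widths are illustrative and not the exact configuration of Fig. 10:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Small CNN mapping a 256x256 single-channel image to a nonnegative dynamic h
h_estimator = tf.keras.Sequential([
    tf.keras.Input(shape=(256, 256, 1)),
    layers.Conv2D(16, 3, padding="same"), layers.BatchNormalization(), layers.ReLU(),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding="same"), layers.BatchNormalization(), layers.ReLU(),
    layers.MaxPooling2D(),
    layers.GlobalAveragePooling2D(),      # replaces stacked flatten + dense layers
    layers.Dense(1, activation="relu"),   # predicted dynamic h_hat
])
```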
Regarding the theoretical complexity of this architecture, for the chosen fixed size of kernels and structuring elements, the CNN part of the pipeline has linear complexity in the number n of input pixels, because convolutions do. A dilation also has linear complexity, but the geodesic reconstruction might require as many as \(\sqrt{n}\) dilations, each one followed by a pixelwise minimum and an image comparison test. Therefore, with our implementation (which does not use special data structures to reduce complexity) the morphological part has a much higher complexity of \(O(n^{3/2})\) in the worst case, which determines the complexity of the whole pipeline. Despite this asymptotic result, it is worth noting that the actual number of operations performed by the CNN is about \(5000 \times n = 5000 \times 256^2\), and the morphological part performs two geodesic reconstructions, resulting in at most about \(20\sqrt{n} \times n \approx 5000 \times 256^2 \) operations as well. So one could expect the CNN part to take about the same time as the morphological one during inference. In practice, however, we observe that the morphological part takes from 13 to 29 times longer than the CNN on average, depending on the dataset (although all input images are resized to the fixed input size of \(256\times 256\) pixels). This could be explained by the fact that, within the machine learning framework we use (TensorFlow), convolutions are optimized far better than our custom geodesic reconstruction, which relies on a while loop and an image comparison after each geodesic dilation.
4.3 Training Protocol
TRP1 dataset: To train the architecture presented earlier, hence fit the parameters of the CNN, we randomly split set1, containing 76 images, into 60 training images and 16 validation images. We only worked with the green channel of the images. The parameters of the CNN were adjusted through gradient descent on the training set, and the best model weights were chosen through evaluation metrics on the validation set. The whole set2, which also contains 76 images, was used for the final test only.
Fluorescent Neuronal Cells dataset: The same test set used in [4] is employed in our experiments, comprising 70 selected images. The remaining data are randomly split into a training set (80%) and a validation set (20%). Each fluorescence image was converted to grayscale using the linear luminance formula \(0.2126 R + 0.7152 G + 0.0722 B\), where R, G and B correspond to the red, green and blue channels, respectively. The image was then resized to a resolution of \(256 \times 256\) for input to the CNN.
Subset of Cellpose dataset: In order to keep consistency with the original Cellpose dataset, among the 213 selected data samples, we keep those that originally belong to the test split of the Cellpose dataset as our test set; the remaining samples are randomly split into training and validation sets with a ratio of 8:2, resulting in 158 samples for training, 39 for validation and 16 for test. For each input image, the maximum value over the RGB channels is taken at each pixel in order to convert the input image into a single channel. Before being input to the CNN, the image is resized to the same resolution of \(256\times 256\).
4.3.1 Data Augmentation
Convolutional operations are translation invariant, but not rotation or flip invariant. Moreover, for the image samples in the dataset, a flipped or rotated version still looks like a valid sample. It is thus reasonable to use flips and rotations as data augmentation during training. More precisely, we used both vertical and horizontal flip transformations with probability 1/2, and a rotation with the same probability, the rotation angle being uniformly selected from \(\{90^{\circ },180^{\circ },270^{\circ }\}\). The true number of cells and the optimal value of the dynamic are not changed by this augmentation.
4.3.2 Optimization
The joint loss function was minimized with stochastic gradient descent. More precisely, we used the RMSProp optimizer [21], with an initial learning rate of \(10^{-3}\). Moreover, we used a batch size of 15 images for TRP1 and Cellpose, 17 for Fluorescent Neuronal Cells, and trained for a maximum of 1600 epochs with early stopping as soon as there was no improvement on the validation loss for 300 epochs. Only the model weights with the lowest validation error were saved and used for the final evaluation on the test set.
4.3.3 Hyperparameters
Loss weights \((\alpha , \beta )\): In [8], experiments with the joint loss (28) had been carried out, but without the modified backpropagation introduced in Sect. 3.4. Fivefold cross-validation experiments on set1, with 11 ratios \(\beta /\alpha \) tested (ranging from \(10^{-4}\) to \(10^2\), including zeroFootnote 6), had indicated that best results were expected for \(\alpha = 1,\beta = 0.001\), that is to say for very little weight given to the counting loss. We now have an explanation for this, as we showed in Sect. 3.3 that differentiating the reconstruction loss with respect to h is not an issue in general, but differentiating the counting loss is. On the contrary, with the modified backpropagation, we observed no significant effect of the balance between \(\alpha \) and \(\beta \): \((\alpha = 1, \beta = 0.001)\) gave performances on the validation sets similar to those of \(\alpha = \beta = 0.5\) and of \((\alpha = 0, \beta = 1)\). Therefore, we chose to set \(\alpha = \beta = 0.5\).
Results on the test set of the TRP1 dataset, for the best of our models on this dataset, trained with the joint loss and modified backpropagation (MB, \(-1\)). Left: predicted \(\hat{h}\) compared to the ground truth optimal \(h^{*}\). Right: predicted versus true cell numbers. Each dot represents an individual image. In both plots, the red line represents the identity function. See Fig. 14 for examples of predictions by this model
Results on the test set of the Fluorescent Neuronal Cells dataset, for the best of our models on this dataset, trained with the counting loss only and modified backpropagation (MB, \(N=50\)). Left: predicted \(\hat{h}\) compared to the ground truth optimal \(h^{*}\). Right: predicted versus true cell numbers. Each dot represents an individual image. In both plots, the red line represents the identity function. See Fig. 15 for examples of predictions by this model
Results on the test set of Cellpose, for the best of our models on this dataset, trained with the counting loss only and modified backpropagation (MB, \(-1\)). Left: predicted \(\hat{h}\) compared to the ground truth optimal \(h^{*}\). Right: predicted versus true cell numbers. Each dot represents an individual image. In both plots, the red line represents the identity function. See Fig. 16 for examples of predictions by this model
Examples of predictions on the test set of the TRP1 dataset, for the best of our models on this dataset, trained with the joint loss and modified backpropagation (MB, \(-1\)). From left to right: Input resized RGB image, green channel image with ground truth cell locations, true number of cells \(n^*\) and optimal dynamic \(h_{\texttt {I}}^*\); green channel image with predicted cell locations and their estimated number \(n_c\) obtained for the estimated dynamic \(\hat{h}_{\texttt {I}}\)
Examples of predictions on the test set of the Fluorescent Neuronal Cells dataset, for the best of our models on this dataset, trained with the counting loss only and the modified backpropagation (MB, N=50). From left to right: Input resized RGB image, green channel image with ground truth cell locations, true number of cells \(n^*\) and optimal dynamic \(h_{\texttt {I}}^*\); grayscale image with predicted cell locations and their estimated number \(n_c\) obtained for the estimated dynamic \(\hat{h}_{\texttt {I}}\)
Examples of predictions on the test set of Cellpose, for the best of our models on this dataset, trained with the counting loss only and the modified backpropagation (MB, \(-1\)). From left to right: Input resized RGB image, maximum of three channels image with ground truth cell locations, true number of cells \(n^*\) and optimal dynamic \(h_{\texttt {I}}^*\); grayscale image with predicted cell locations and their estimated number \(n_c\) obtained for the estimated dynamic \(\hat{h}_{\texttt {I}}\)
Asymptotic value N: For the experiments where the modified backpropagation was defined following (41) and (42), the value of N was chosen by assessing on training images the trade-off between the quality of the approximation (33) of the number of h-maxima and its smoothness. Indeed, a large N gives a better approximation of \(h \mapsto \vert \mathcal {M}_h(f) \vert \), but also makes this approximation converge to a step function with zero derivatives almost everywhere, which is precisely what we want to avoid. This study led us to choose \(N=50\), and therefore, the class of models trained with this modified backpropagation is denoted (MB, \(N=50\)).
Random noise to compute \(\mathcal {I}_N^\prime \): Recall that the approximation (41) of \(\frac{\partial \ \vert \mathcal {M}_h(f)\vert }{\partial h}\) relies on Corollary 7, which holds for injective images. To obtain, almost surely, an injective version \(\tilde{f}\) of a possibly non-injective image f, we add some random noise to the latter. The level of noise must be small enough not to change (too much) the number of h-maxima, but large enough to ensure a sufficient gap between the maximum value and the second largest value within a h-maximum, hence a quicker convergence in (40). Again, trials on the training data led us to add Gaussian noise with standard deviation \(\sigma _h = 0.2 h\), where h is the value at which we want to approximate \(\frac{\partial \ \vert \mathcal {M}_h(f)\vert }{\partial h}\).
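As a minimal sketch (assuming a NumPy-based implementation, with a function name of our choosing), this noise injection can be written as follows.

```python
import numpy as np

def make_injective(f, h, rel_sigma=0.2, rng=None):
    """Return a noisy copy of f, injective almost surely.

    The standard deviation is sigma_h = rel_sigma * h, the value retained above.
    """
    rng = np.random.default_rng() if rng is None else rng
    return f + rng.normal(0.0, rel_sigma * h, size=f.shape)
```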
4.4 Evaluation Metrics
The following metrics have been used to evaluate our proposed pipeline. The average relative error averages the per-image relative counting error:
where \(\mathcal {S}\) denotes the test set and \(\mathcal {N}\) its number of samples. Considering the variability of cell numbers across the considered datasets, we also monitor the total relative error [2, 3], which sees the whole dataset as one single sample, but avoids compensation of errors by taking an absolute difference in each image:
Additionally, for comparison purposes, we use the mean absolute error (\(\textrm{MAE}\)) and the mean percentage error (\(\textrm{MPE}\)), as recommended in [4]:
It is worth noting that finding the right number of cells does not imply correct detections, as false positives can compensate for false negatives, as illustrated in Fig. 4. However, we have deliberately limited our analysis to counting errors, as counting is the primary objective. All counting methods benefit from this type of compensation, which is only possible if the method is not overly biased toward over- or underestimation.
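The following sketch restates these metrics from their textual definitions above; the exact conventions of [2, 3, 4] (in particular the scaling of \(\textrm{MPE}\)) are assumptions and should be checked against those references.

```python
import numpy as np

def counting_metrics(n_pred, n_true):
    """Counting metrics over a test set, from per-image predicted and true counts."""
    n_pred = np.asarray(n_pred, dtype=float)
    n_true = np.asarray(n_true, dtype=float)
    abs_err = np.abs(n_pred - n_true)
    avg_rel_err = np.mean(abs_err / n_true)           # average relative error (per image)
    tot_rel_err = np.sum(abs_err) / np.sum(n_true)    # total relative error [2, 3]
    mae = np.mean(abs_err)                            # mean absolute error
    mpe = 100.0 * avg_rel_err                         # mean percentage error (assumed: relative error in %)
    return {"ARE": avg_rel_err, "TRE": tot_rel_err, "MAE": mae, "MPE": mpe}
```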
5 Results
The results of our approach on the three test sets are summarized in Table 1, and compared to the state of the art for the TRP1 and the Fluorescent Neuronal Cells datasets. Since Cellpose is traditionally a segmentation benchmark, no published counting method was available for it (i.e., a method which does not rely on the ground truth masks of cells, but at most on their coordinates). For this reason, we followed the method described in [3] and trained a Unet to predict density maps, as a baseline for this dataset. More details on the Unet architecture and training can be found in “Appendix G”.
Each of our models was trained with one of the two different modified backpropagations introduced in Sect. 3.4, namely (MB, \(-1\)) and (MB, \(N=50\)), and either with the joint loss (\(\alpha = \beta =1\)) or the counting loss only (\(\alpha =0\)). For each of these alternatives, ten models were trained following the protocol described in Sect. 4.3. We provide the average metric on the test sets ± the standard deviation over the ten training instances. More detailed results of the best performing models are plotted in Figs. 11, 12, 13, as well as some examples of their predictions in Figs. 14, 15, 16.
Based on these experiments, the following observations can be made. Firstly, our models achieve competitive results compared to the state of the art, sometimes surpassing it, with the advantage of being much lighter (50 to 100 times fewer parameters) and more interpretable: what is detected as a cell is a h-maximum, so one knows that missed or spurious detections are due either to a poor estimation of h, or to a deviation from our assumption of a one-to-one correspondence between cells and h-maxima. The robustness of our method also appears in Figs. 11, 12, 13, where one can see that accurate count estimations can be achieved despite less accurate estimations of the optimal h. This is because the number of h-maxima is a step function of h, and therefore, the optimal number is reached on an interval of h values, usually wider for larger h. Furthermore, values outside that interval but close to it often yield a count close to the optimal one.
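This step-function behavior is easy to reproduce with off-the-shelf morphological tools; the snippet below is a standalone illustration (not the trained pipeline) that counts h-maxima for a range of dynamics using scikit-image.

```python
import numpy as np
from skimage.morphology import h_maxima
from skimage.measure import label

def count_h_maxima(image, h):
    """Number of connected regional maxima with sufficient dynamic (h-maxima)."""
    return int(label(h_maxima(image, h), connectivity=2).max())

rng = np.random.default_rng(0)
image = rng.random((128, 128))
for h in np.linspace(0.05, 0.5, 10):
    # the count is piecewise constant in h: nearby h values give the same count
    print(f"h = {h:.2f} -> {count_h_maxima(image, h)} h-maxima")
```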
Secondly, no striking difference in performance appears between models trained with one or the other proposed modified backpropagation, nor between models trained with the joint loss or the counting loss only. It is remarkable in particular that training models with either the joint loss or the counting loss only, without the need for ground truth cell coordinates, converges well and consistently across different training instances. It is also interesting to note that the most naive approximation \(\frac{\partial \vert \mathcal {M}_h(f)\vert }{\partial h} = -1\), implemented in the (MB, \(-1\)) models, seems to work as well, overall, as the more sophisticated and accurate one, (MB, \(N=50\)). This can be explained by the fact that the actual derivative is not far from \(-1\) in the range of h values of interest, usually in the middle part of \([0, \Delta _{\Omega }(f)]\). Besides its simplicity of implementation, the interest of (MB, \(-1\)) lies in a significantly reduced computation time during backpropagation (by about 20%) compared to (MB, \(N=50\)).
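To make the (MB, \(-1\)) rule concrete, here is a hedged PyTorch sketch of a counting node whose backward pass substitutes the constant \(-1\) for the almost-everywhere-zero derivative of the count with respect to h; count_h_maxima_fn is a placeholder for the pipeline's non-differentiable counting module, not an actual function from the paper's code.

```python
import torch

class HMaximaCountMB1(torch.autograd.Function):
    """Counting node with the (MB, -1) surrogate gradient with respect to h."""

    @staticmethod
    def forward(ctx, h, image):
        # Exact count in the forward pass; count_h_maxima_fn is a placeholder.
        n = count_h_maxima_fn(image, h.item())
        return image.new_tensor(float(n))

    @staticmethod
    def backward(ctx, grad_output):
        # Surrogate derivative d|M_h(f)|/dh = -1; no gradient flows to the image.
        return -grad_output, None
```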
Thirdly, the limitations of our approach are well illustrated by the examples of Figs. 14, 15, 16. We can see that assuming a one-to-one correspondence between cells and h-maxima, for a single h value per image, leads to some over-detections when non-cell objects happen to match this criterion, and to missed cells, especially when clusters of cells appear as one large maximum and cannot be distinguished at the considered resolution and after the applied preprocessing (see in particular the examples from the Fluorescent Neuronal Cells dataset in Fig. 15). We shall also comment on the limitations of the Unet trained on Cellpose as a baseline. Its architecture and training configuration were inspired by the study [3], designed for TRP1, and were not optimized for the Cellpose dataset. We decided to keep it for comparison, to show that the task is not straightforward.
6 Conclusion
In this paper, we have built on the recently introduced end-to-end trainable pipeline [8], where a CNN predicts the dynamic threshold for h-maxima to be recognized as melanocytes, and a layer counts the connected components of the h-maxima images, yielding an estimated number of cells. As reported in [8], when a counting error penalization term is included in the loss function, training with automatic differentiation often fails to converge, and satisfactory results could only be achieved with a very small weight on the counting term with respect to the geodesic reconstruction term.
As an extension to [8], in this paper we further provide a detailed theoretical framework for h-maxima, on which the analysis of the derivability of the pipeline is built, explaining the issues related to the minimization of the counting loss. In order to overcome these issues, this work proposes and tests a training paradigm employing explicitly defined backpropagation rules, which prove to work well even when training our pipeline with the counting loss only. With respect to [8], we demonstrated the effectiveness and generalizability of the proposed pipeline on two additional fluorescent cell datasets.
Training with a counting loss only allows our method to bypass the need for pre-defined preprocessing steps, paving the way for future research avenues such as the development of trainable morphological layers for adaptive cell size and shape learning, or convolutional layers for learned preprocessing. Moreover, future endeavors could also involve the prediction of local h values for smaller patches and the implementation of contrast enhancement for blurred regions. Furthermore, as proposed in [23], the graph approach based on hierarchical clustering and gradient descent, combined with a cell counting loss, also represents a potential avenue for dynamic analysis and the cell counting task.
We posit that the synergy between neural networks and trainable morphological layers opens the door to a broader range of applications, such as counting objects in astronomical surveys. However, a deeper reflection is necessary to understand the best ways to initialize, optimize and regularize morphological layers [24, 25].
Data Availability
No datasets were generated or analyzed during the current study.
Change history
01 May 2025
In footnote 4, at the bottom of page 12, the full stop at the end of the footnote had been included in the link, making the latter wrong; this has now been corrected.
Notes
This result can also be deduced from the “if” direction of Proposition 6, and a reasoning very similar to the one used to show the “only if” direction of the same proposition.
This algorithm has already been presented in [8].
The dataset is available at https://bit.ly/melanocytesTRP1.
Convolutional layers are implemented as a double convolution using the same kernel size for both convolutions, with a Batch Normalization layer in between.
Tested values: \(\alpha = 1\) and \(\beta \in \{0\} \cup 10^{\{-4,-3,-2,-1,0,1,2\}} \cup 5\times 10^{\{-4,-3,-2\}}\).
References
Kong, Y., Li, H., Ren, Y., Genchev, G.Z., Wang, X., Zhao, H., Xie, Z., Lu, H.: Automated yeast cells segmentation and counting using a parallel U-Net based two-stage framework. OSA Contin. 3(4), 982–992 (2020)
He, S., Minn, K.T., Solnica-Krezel, L., Anastasio, M.A., Li, H.: Deeply-supervised density regression for automatic cell counting in microscopy images. Med. Image Anal. 68, 101892 (2021)
Lazard, T., Blusseau, S., Velasco-Forero, S., Decencière, É., Flouret, V., Cohen, C., Baldeweck, T.: Applying deep learning to melanocyte counting on fluorescent TRP1 labelled images of in vitro skin model. Image Anal. Stereol. 41(3), 171–80 (2022)
Morelli, R., Clissa, L., Amici, R., Cerri, M., Hitrec, T., Luppi, M., Rinaldi, L., Squarcio, F., Zoccoli, A.: Automating cell counting in fluorescent microscopy through deep learning with c-ResUnet. Sci. Rep. 11(1), 22920 (2021)
Alam, M.M., Islam, M.T.: Machine learning approach of automatic identification and counting of blood cells. Healthc. Technol. Lett. 6(4), 103–108 (2019)
Xia, T., Jiang, R., Fu, Y.Q., Jin, N.: Automated blood cell detection and counting via deep learning for microfluidic point-of-care medical devices. In: IOP Conference Series: Materials Science and Engineering, vol. 646, p. 012048. IOP Publishing (2019)
Shakarami, A., Menhaj, M.B., Mahdavi-Hormat, A., Tarrah, H.: A fast and yet efficient YOLOv3 for blood cell detection. Biomed. Signal Process. Control 66, 102495 (2021)
Liu, X., Blusseau, S., Velasco-Forero, S.: Counting melanocytes with trainable h-maxima and connected component layers. In: International Conference on Discrete Geometry and Mathematical Morphology, pp. 417–430. Springer, Berlin (2024)
Soille, P.: Morphological Image Analysis: Principles and Applications, vol. 2. Springer, Berlin (1999)
Velasco-Forero, S., Pagès, R., Angulo, J.: Learnable empirical mode decomposition based on mathematical morphology. SIAM J. Imag. Sci. 15(1), 23–44 (2022)
Velasco-Forero, S., Rhim, A., Angulo, J.: Fixed point layers for geodesic morphological operations. In: British Machine Vision Conference, pp. 1–11 (2022)
Najman, L., Talbot, H. (eds.): Mathematical Morphology: from Theory to Applications. John Wiley & Sons, Oxford (2013)
Vincent, L.: Morphological grayscale reconstruction in image analysis: applications and efficient algorithms. IEEE Trans. Image Process. 2(2), 176–201 (1993)
Grimaud, M.: New measure of contrast: the dynamics. In: Image Algebra and Morphological Image Processing III, vol. 1769, pp. 292–305. SPIE (1992)
Ballester, C., Caselles, V., Monasse, P.: The tree of shapes of an image. ESAIM Control Optim. Calcul. Var. 9, 1–18 (2003)
Tarjan, R.: Depth-first search and linear graph algorithms. SIAM J. Comput. 1(2), 146–160 (1972)
Guan, S., Loew, M.: Understanding the ability of deep neural networks to count connected components in images. In: 2020 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), pp. 1–7. IEEE (2020)
Rall, L.B.: Automatic Differentiation: Techniques and Applications. Springer, Berlin (1981)
Duval, C., Cohen, C., Chagnoleau, C., Flouret, V., Bourreau, E., Bernerd, F.: Key regulatory role of dermal fibroblasts in pigmentation as demonstrated using a reconstructed skin model: impact of photo-aging. PLoS ONE 9(12), 114182 (2014)
Stringer, C., Wang, T., Michaelos, M., Pachitariu, M.: Cellpose: a generalist algorithm for cellular segmentation. Nat. Methods 18(1), 100–106 (2021)
Tieleman, T., Hinton, G., et al.: Lecture 6.5-RMSPROP: divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 4(2), 26–31 (2012)
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-assisted intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III 18, pp. 234–241. Springer, Berlin (2015)
Chierchia, G., Perret, B.: Ultrametric fitting by gradient descent. Adv. Neural Inf. Process. Syst. 32 (2019)
Charisopoulos, V., Maragos, P.: Morphological perceptrons: geometry and training algorithms. In: ISMM 2017, pp. 3–15. Springer, Berlin (2017)
Blusseau, S.: Training morphological neural networks with gradient descent: some theoretical insights. In: DGMM Proceedings. Springer, Berlin (2024)
Acknowledgements
This work was granted access to the HPC resources of IDRIS under the allocation 2024-AD011012212R3 made by GENCI. Many thanks to Beatriz Marcotegui for helpful discussions about Proposition 4.
Funding
Open access funding provided by Mines Paris - PSL.
Author information
Contributions
All authors contributed to the design of the method, the writing and proofreading of the manuscript. X. L. did most of the first version of the experimental work, with the help of S. B. and S. V.-F. S.B. did the new version of the experimental work for the revised manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A Proof of Proposition 1
Let us first state and prove the following lemma.
Lemma 8
Let \(f\in \mathcal {F}(\Omega , \mathbb {R})\), \(h\ge 0\) and \(M\in \mathcal {M}_h(f)\). Then,
$$\begin{aligned} \forall x\in \mathcal {N}_M{\setminus } M, \quad \texttt {HMAX} _h(f)(x) = f(x) < \texttt {HMAX} _h(f)(M), \end{aligned}$$
(A1)
and
$$\begin{aligned} \texttt {HMAX} _h(f)(M) = \max \limits _{M} f - h. \end{aligned}$$
(A2)
Proof of Lemma 8
Since M is a regional maximum of \(\texttt {HMAX} _h(f)\), by definition \(\texttt {HMAX} _h(f)\) is constant over M and, noting \(\lambda {:}{=} \texttt {HMAX} _h(f)(M)\), we have that for any \(x\in \mathcal {N}_M{\setminus } M\), \(\texttt {HMAX} _h(f)(x) < \lambda .\) Let us show that \(f(x) < \lambda \) as well. Recall that by definition of the geodesic reconstruction (6),
$$\begin{aligned} \texttt {HMAX} _h(f)(x) = f(x) \wedge \delta _\textrm{B}\big (\texttt {HMAX} _h(f)\big )(x). \end{aligned}$$
(A3)
But since \(x\in \mathcal {N}_M\), there exists \(y\in \mathcal {N}_x\) such that \(y\in M\) and therefore \(\texttt {HMAX} _h(f)(y) = \lambda \), so \(\delta _\textrm{B}(\texttt {HMAX} _h(f))(x)\ge \lambda \). Hence, if \(f(x)\ge \lambda \), then by (A3) we would get \(\texttt {HMAX} _h(f)(x) \ge \lambda \), which is false, as we just saw. So \(f(x) < \lambda \). As we have just shown that \(\delta _\textrm{B}(\texttt {HMAX} _h(f))(x)\ge \lambda \) and \(f(x) < \lambda \), by (A3) we also get that \(\texttt {HMAX} _h(f)(x) = f(x)\), which completes the proof of (A1).
Now, let us prove (A2). Applying (7) to \(\texttt {HMAX} _h(f) = \texttt {REC} (f-h,f)\), we get, for any \(x\in \Omega \),
$$\begin{aligned} \texttt {HMAX} _h(f)(x) = \max \limits _{y\in \Omega , \gamma \in \Gamma _{xy}} \Big ( \big (f(y) - h\big ) \wedge \min \limits _{\gamma } f \Big ). \end{aligned}$$
(A4)
We denote by \(Q(y, \gamma , f, h){:}{=} \big (f(y) - h\big ) \wedge \min \limits _{\gamma } f\) and split Eq. (A4) as
$$\begin{aligned} \texttt {HMAX} _h(f)(x) = q_1(x, f, h) \vee q_2(x, f, h), \end{aligned}$$
(A5)
with \(q_1(x, f, h){:}{=} \max \limits _{\begin{array}{c} y\in M, \gamma \in \Gamma _{xy}\\ \gamma \subseteq M \end{array}} Q(y, \gamma , f, h)\) and \(q_2(x, f, h){:}{=} \max \limits _{\begin{array}{c} y\in \Omega , \gamma \in \Gamma _{xy}\\ \gamma \not \subseteq M \end{array}} Q(y, \gamma , f, h)\).
If \(y\in M\), then \(\texttt {HMAX} _h(f)(y) = \lambda \) and \(\texttt {HMAX} _h(f)(y) \ge f(y)-h\) by definition of \(\texttt {REC} (f-h,f)\). Furthermore, for any \(\gamma \in \Gamma _{xy}\) such that \(\gamma \subseteq M\), and any \(z\in \gamma \), we have \(f(z)\ge \texttt {HMAX} _h(f)(z)=\lambda \), so \(\min \limits _{\gamma } f \ge \lambda \). As a consequence, \((f(y) - h) \wedge \min \limits _{\gamma } f = f(y)-h\) and finally \(q_1(x, f, h) = \max \limits _M f -h\).
Now, for any \(y\in \Omega \) and \(\gamma \in \Gamma _{xy}\) such that \(\gamma \not \subseteq M\), there exists \(z\in \mathcal {N}_M\setminus M\) such that \(z\in \gamma \), for which \(\min \limits _{\gamma } f < \lambda \) by (A1), and consequently \(\big (f(y)-h\big )\wedge \min \limits _{\gamma } f < \lambda \). Therefore, \(q_2(x, f, h) < \lambda \) and since \(\texttt {HMAX} _h(f)(x) = \lambda \), we get
which concludes the proof of Lemma 8. \(\square \)
Now, let us prove Proposition 1.
Proof of Proposition 1
“Only if” direction \(\Rightarrow \)
Let us start by the “only if” direction. Let \(h\ge 0\), \(M \in \mathcal {M}_h(f)\), \(\lambda = \texttt {HMAX} _h(f)(M)\) and \(M_0\) be a connected component of \(\arg \max \limits _{M} f\).
If \(M_0\) was not a regional maximum of f, then there would exist \(x\in \mathcal {N}_{M_0}{\setminus } M\) such that \(f(x) \ge f(M_0)\), and therefore \(\texttt {HMAX} _h(f)(x) \ge f(x)-h\ge f(M_0)-h = \max \limits _M f -h\). Now, by Lemma 8, \(\lambda = \texttt {HMAX} _h(f)(M) = \max \limits _M f - h = f(M_0)-h\), and since \(x\in \mathcal {N}_{M_0}{\setminus } M \subseteq \mathcal {N}_{M}{\setminus } M\), \(\texttt {HMAX} _h(f)(x) < \lambda \), which is a contradiction. Therefore, \(M_0\in \mathcal {M}(f)\) and \(\lambda = f(M_0)-h\).
Now, let us lower bound the dynamic of \(M_0\) using Eq. (12). For any \(M^\prime \) such that \(f(M^\prime )> f(M_0)\), then \(M^\prime \cap M = \emptyset \) since \(M_0 \subseteq \arg \max \limits _{M} f\). Therefore, any \(\gamma \in \Gamma _{M_0 M^\prime }\) contains a z such that \(z\in \mathcal {N}_{M}\setminus M\), implying \(f(z) < \texttt {HMAX} _h(f)(M) = f(M_0)-h\), by (A1) of Lemma 8. As a consequence, \(\min \limits _{z\in \gamma } f(z) < f(M_0)-h\), and this is true for any \(M^\prime \) such that \(f(M^\prime )> f(M_0)\) and any \(\gamma \in \Gamma _{M_0 M^\prime }\), which implies that
So by Eq. (12), \(\texttt {DYNMAX} (M_0) > h\).
It remains to show that \(M = \texttt {CC}\left( M_0, \chi _{\lambda }(f)\right) \), where \(\lambda =f(M_0) -h\). We may first remark that
$$\begin{aligned} \texttt {CC}\left( M_0, \chi _{\lambda }(f)\right) = \big \lbrace x\in \Omega \ : \ \exists \gamma \in \Gamma _{x M_0}, \ \min \limits _{z\in \gamma } f(z) \ge \lambda \big \rbrace . \end{aligned}$$
(A6)
Let us show that \(M \subseteq \texttt {CC}\left( M_0, \chi _{\lambda }(f)\right) \). We already have that, for any \(x\in M\), \(\texttt {HMAX} _h(f)(x)= \lambda = f(M_0) -h\) and by definition \(\texttt {HMAX} _h(f)(x)\le f(x)\). Since M is connected and \(M_0\subseteq M\), for any \(x\in M\) there is a path \(\gamma \in \Gamma _{x M_0}\) included in M; therefore, \(f(z)\ge \lambda \) for any \(z\in \gamma \). Hence, by (A6), \(M\subseteq \texttt {CC}\left( M_0, \chi _{\lambda }(f)\right) \). Let us now show that \(\texttt {CC}\left( M_0, \chi _{\lambda }(f)\right) \subseteq M\). Let \(x\in \texttt {CC}\left( M_0, \chi _{\lambda }(f)\right) \). Then, there is a path \(\gamma \in \Gamma _{x M_0}\) such that \(f(z)\ge \lambda \) for any \(z\in \gamma \). If \(x\notin M\), then there is \(z\in \gamma \) such that \(z\in \mathcal {N}_M\setminus M\), and therefore, \(f(z)<\lambda \) by (A1) of Lemma 8, which contradicts \(x\in \texttt {CC}\left( M_0, \chi _{\lambda }(f)\right) \). So \(x\in M\), and we have the second inclusion, which leads to \(M = \texttt {CC}\left( M_0, \chi _{\lambda }(f)\right) \) and finishes the proof of the “only if” direction.
“If” direction \(\Leftarrow \)
For the “if” direction, let \(M_0\in \mathcal {M}(f)\) be a regional maximum of f, with dynamic strictly larger than \(h\ge 0\), let \(\lambda = f(M_0)-h\), and let \(M = \texttt {CC}\left( M_0, \chi _{\lambda }(f)\right) \). Then, \(M\ne \Omega \); otherwise, we would have \(f(x) \ge f(M_0)-h\) for any \(x\in \Omega \) and, by Eq. (12), we would get \(\texttt {DYNMAX} (M_0)\le h\), which is a contradiction.
Therefore, we will show that M is a regional maximum of \(\texttt {HMAX} _h(f)\), i.e., \(M\in \mathcal {M}_h(f)\), by proving that \(\texttt {HMAX} _h(f)\) is constant over M, equal to \(\lambda \), and that \(\texttt {HMAX} _h(f)(z) < \lambda \) for any \(z\in \mathcal {N}_M\setminus M\).
Starting with the latter, let \(z\in \mathcal {N}_M\setminus M\). Then, \(f(z) < \lambda \), since otherwise there would be \(x\in \mathcal {N}_z\) such that \(x\in M\), and a path \(\gamma \) from x to \(M_0\) such that \(f(\gamma ) \ge \lambda \), by definition of M; by appending z to \(\gamma \) we would have a path \(\gamma ^\prime \) from z to \(M_0\), such that \(f(\gamma ^\prime ) \ge \lambda \), and therefore, z would be in M, which contradicts \(z\in \mathcal {N}_M\setminus M\). Furthermore, \(\texttt {HMAX} _h(f)(z) \le f(z)\) by definition, so \(\texttt {HMAX} _h(f)(z) < \lambda \).
Now, to show that \(\texttt {HMAX} _h(f)(x) = \lambda \) for all \(x\in M\), we will prove that \(f(M_0) = \max \limits _M f\) and then apply Eq. (7) to show that \(\texttt {HMAX} _h(f)(x) \ge \lambda \) and conclude.
Suppose the set \(\lbrace x \in M, \ f(x) > f(M_0)\rbrace \) is non-empty, and let \(M_0^\prime \) be a connected component of that set. Then, \(M_0^\prime \) is a regional maximum of f; otherwise, we would have a \(z\in \mathcal {N}_M{\setminus } M\) such that \(f(z)\ge f(M_0^\prime ) > f(M_0)\ge \lambda \) and by the same reasoning as before, this would lead to \(z\in M\), which is a contradiction. Therefore, we can use Eq. (12) to show that the dynamic of \(M_0\) is not larger than h. Indeed, since M is connected, there is a path \(\gamma \in \Gamma _{M_0 M_0^\prime }\) included in M, therefore \(\min \limits _{\gamma } f \ge \lambda \). Hence, we have found a regional maximum \(M_0^\prime \) such that \(f(M_0^\prime )> f(M_0)\) and a path \(\gamma \in \Gamma _{M_0 M_0^\prime }\) such that \(\min \limits _{\gamma } f \ge f(M_0)-h\). By Eq. (12), this implies that \(\texttt {DYNMAX} (M_0) \le f(M_0) - (f(M_0)-h) = h\), which contradicts our initial hypothesis. Therefore, \(f(M_0) = \max \limits _M f\), which also means that \(M_0\) is a connected component of \(\arg \max \limits _M f\), since \(M_0\in \mathcal {M}(f)\).
Now, from Equation (7) it appears that \(\texttt {HMAX} _h(f)(x) \ge \lambda = f(M_0)-h\) for \(x\in M\). Indeed, applying (7) to \(\texttt {REC} (f-h, f)\), we get Eq. (A4).
Since M is connected and \(M_0\subseteq M\), for any \(y\in M_0\), there exists \(\gamma \in \Gamma _{xy}\) included in M, for which therefore \(\min \limits _{\gamma } f \ge f(M_0)-h = f(y) - h\), and hence, \((f(y) - h) \wedge \min \limits _{\gamma } f = f(M_0)-h\). Therefore, taking the maximum over \(y\in \Omega \) and over paths in \(\Gamma _{xy}\) yields \(\texttt {HMAX} _h(f)(x) \ge f(M_0)-h\) for \(x\in M\).
To finally show that \(\texttt {HMAX} _h(f)(x) = f(M_0)-h\) for \(x\in M\), we denote by \(Q(y, \gamma , f, h){:}{=} \big (f(y) - h\big ) \wedge \min \limits _{\gamma } f\), like in the proof of Lemma 8, and split Eq. (A4) the same way, giving (A5).
If \(x\in M\), for any \(y\in \Omega \) and \(\gamma \in \Gamma _{xy}\) such that \(\gamma \not \subseteq M\), there exists \(z\in \mathcal {N}_M\setminus M\) such that \(z\in \gamma \), for which \(\min \limits _{\gamma } f < f(M_0)-h\), and consequently \((f(y)-h)\wedge \min \limits _{\gamma } f < f(M_0)-h\). Therefore, \(q_2(x, f, h) < f(M_0)-h\) and since \(\texttt {HMAX} _h(f)(x)\ge f(M_0)-h\), we get \(\texttt {HMAX} _h(f)(x) = q_1(x, f, h)\), i.e.,
But for any \(y\in M\), \(f(y)-h\le \max \limits _{M}f -h = f(M_0)-h\), and for any \(\gamma \in \Gamma _{xy}\) such that \(\gamma \subseteq M\), \(\min \limits _{\gamma } f \ge f(M_0)-h\) by definition of M. Therefore, \((f(y) - h) \wedge \min \limits _{\gamma } f = f(y)-h\) and finally
We have shown that \(\texttt {HMAX} _h(f)(x) = f(M_0)-h\) for \(x\in M\), after having shown that \(\texttt {HMAX} _h(f)(z) < f(M_0)-h\) for \(z\in \mathcal {N}_M{\setminus } M\). Since M is connected, \(M\in \mathcal {M}_h(f)\), which concludes the “if” direction of Proposition 1. \(\square \)
Appendix B Proof of Proposition 2
Let \(0\le h_1\le h_2\). Then, if \(M_2 \in \mathcal {M}_{h_2}(f)\), by Proposition 1, there is some regional maximum \(M_0\) of dynamic larger than \(h_2\), such that \(M_2 = \texttt {CC}\left( M_0, \chi _{f(M_0)-h_2}(f)\right) \). But then \(M_0\) is also of dynamic larger than \(h_1\); therefore, \(M_1 = \texttt {CC}\left( M_0, \chi _{f(M_0)-h_1}(f)\right) \) is in \(\mathcal {M}_{h_1}(f)\). Furthermore, \(\chi _{f(M_0)-h_1}(f) \subseteq \chi _{f(M_0)-h_2}(f)\) because \(f(M_0)-h_2 \le f(M_0)-h_1\), and hence, \(\texttt {CC}\left( M_0, \chi _{f(M_0)-h_1}(f)\right) \subseteq \texttt {CC}\left( M_0, \chi _{f(M_0)-h_2}(f)\right) \), i.e., \(M_1 \subseteq M_2\). This shows that \(0\le h_1\le h_2 \Rightarrow \mathcal {M}_{h_2}(f) \preceq \mathcal {M}_{h_1}(f)\).
Appendix C Proof of Proposition 3
We first state the right continuity of \(\mu \) in the following lemma, which will be used next.
Lemma 9
Let \(f\in \mathcal {F}(\Omega , \mathbb {R})\) non-constant, \(h_\infty \in [0, \Delta _\Omega f]\) and \((h_n)_{n\in \mathbb {N}}\in [0, \Delta _\Omega f]^{\mathbb {N}}\) a sequence such that \(h_n \downarrow h_\infty \). Then,
Proof of Lemma 9
For \(h\in [0, \Delta _\Omega f]\), let us denote by
the set of regional maxima of f with dynamic larger than h, and for the considered \(h_\infty \), we introduce
Then, \(h_{\min } > h_\infty \); therefore, there exists \(N_1\in \mathbb {N}\), such that \(n\ge N_1 \Rightarrow h_n\in [h_\infty , h_{\min })\). As a consequence, one can check that for any \(n\ge N_1\),
Furthermore, by Proposition 1, we get that for any \(h\in [0, \Delta _\Omega f]\)
Therefore, for any \(n\ge N_1\),
Now, it is well known that \(t\mapsto \chi _t(f)\) is left continuous, i.e., \(\lim \limits _{t\uparrow t_0} \chi _t(f) = \chi _{t_0}(f)\). What is more, here each \(\chi _t(f)\) is a discrete set (as subset of \(\Omega \)), and hence, the limit is actually reached for t sufficiently close to \(t_0\). This implies that there exists \(N_2\in \mathbb {N}\) such that
So finally, by (C11) and (C12), we get that for any \(n \ge \max (N_1, N_2)\), \(\mu (h_n) = \mathcal {M}_{h_n}(f) = \mathcal {M}_{h_{\infty }}(f) = \mu (h_{\infty }) \), which concludes the proof. \(\square \)
Now, we can prove Proposition 3.
Proof of Proposition 3
We shall notice first that since \(\mu \) maps \([0, \Delta _\Omega f]\) to the set of collections of subsets of \(\Omega \), and since \(\Omega \) is a finite set, \(\mu \) can only take a finite set of values over \([0, \Delta _\Omega f]\), including \(\mu (0)=\mathcal {M}(f)\) and \(\mu (\Delta _\Omega f)=\emptyset \). Hence, there exists \(K\in \mathbb {N}\) and \(\mathcal {C}_0, \mathcal {C}_1, \dots , \mathcal {C}_{K+1} \in \mathcal {P}(\mathcal {P}(\Omega ))\) pairwise distinct, such that
For every \(k\in \lbrace 0, \dots , K+1\rbrace \), we note
Now, we can easily establish that the \(\mathcal {I}_k\)s are pairwise disjoint intervals.
Indeed, since \(k\ne k^\prime \Rightarrow \mathcal {C}_k \ne \mathcal {C}_{k^\prime }\), we also have \(k\ne k^\prime \Rightarrow \mathcal {I}_k \cap \mathcal {I}_{k^\prime } = \emptyset \); otherwise, one h value could be mapped to two distinct sets of h-maxima.
The fact that each \(\mathcal {I}_k\) is an interval is given by Proposition 2: If \(h \le h^\prime \in \mathcal {I}_k\) and \(h\le h'' \le h^\prime \), then
and therefore \(\mu (h'') = \mathcal {C}_k\), i.e., \(h''\in \mathcal {I}_k\).
Hence, without loss of generality, we can assume the \(\mathcal {I}_k\)s are indexed from left to right, i.e., \(k< k^\prime \iff y < y^\prime \ \forall y\in \mathcal {I}_k, y^\prime \in \mathcal {I}_{k^\prime }\), which implies that
Note that, as leftmost and rightmost intervals of a partition of \([0, \Delta _\Omega f]\), \(\mathcal {I}_0\) must include 0 and \(\mathcal {I}_{K+1}\) must include \(\Delta _\Omega f\). Since both intervals are included in \([0, \Delta _\Omega f]\), we get \(0 = \inf \mathcal {I}_0\) and \(\Delta _\Omega f = \sup \mathcal {I}_{K+1}\).
Next, for every \(k\in \lbrace 0, \dots , K+1\rbrace \) we note
So \(h_0 = 0\), and since the \(\mathcal {I}_k\)s form a partition of \([0, \Delta _\Omega f]\), it is straightforward that for \(k\le K\) we also have \(h_{k+1} = \sup \mathcal {I}_k\), so \(h_{K+1} =\Delta _\Omega f\). Furthermore, by Lemma 9 we have that, for each \(k\in \lbrace 0, \dots , K+1\rbrace \), \(\mu (h_k) = \mathcal {C}_k\), hence \(h_k\in \mathcal {I}_k\) and \(\mathcal {C}_k = \mathcal {M}_{h_k}(f)\). Since the \(\mathcal {I}_k\)s are pairwise disjoint,
necessarily \(h_{k+1}\notin \mathcal {I}_k\) for \(k\le K\), so that for \(k\in \lbrace 0, \dots , K\rbrace \)
This concludes the proof of Proposition 3. \(\square \)
Appendix D Proof of Proposition 4
To prove Proposition 4, it is sufficient to use Eq. (A2) of Lemma 8, and the following lemma.
Lemma 10
Let \(f\in \mathcal {F}(\Omega , \mathbb {R})\) and \(h\ge 0\). Then,
Indeed, Eq. (A2) of Lemma 8 and Eq. (D14) yield
Therefore, if we assume the stability property of Proposition 4 that there exists an interval I with \(0\in I\) such that, for any \(\eta \in I\), \(\mathcal {M}_{h+\eta }(f) = \mathcal {M}_{h}(f)\), then we get, for any \(\eta \in I\) and \(x\in \Omega \),
which is the result stated by Proposition 4.
Hence, let us prove Lemma 10.
Proof of Lemma 10
To alleviate notations, let us first denote \(f_h {:}{=} \texttt {HMAX} _h(f)\). Then, Eq. (D14) can be proved by showing that
Indeed, by Eq. (A1), for any \(x\in \bigcap \limits _{M\in \mathcal {M}_h(f)} M^c\), any \(M\in \mathcal {M}_h(f)\) and any \(\gamma \in \Gamma _{xM}\), we have \(\min _{z\in \gamma } f_h(z) < f_h(M)\). Therefore, assuming (D17) yields that for any \(x\in \bigcap \limits _{M\in \mathcal {M}_h(f)} M^c\), \(f_h(x) = \max \limits _{\begin{array}{c} M\in \mathcal {M}_h(f),\\ \gamma \in \Gamma _{xM} \end{array}}\min _{z\in \gamma } f_h(z)\), which is Eq. (D14).
To show (D17), we use an important property of the reconstruction by dilation: the fact that any function \(g\in \mathcal {F}(\Omega , \mathbb {R})\) is its own reconstruction by dilation from its regional maxima [13], i.e.,
Hence,
Furthermore, since the elements of \(\mathcal {M}_h(f)\) are disjoint subsets of \(\Omega \), one can write
and therefore \(f_h = \bigvee \limits _{M\in \mathcal {M}_h(f)} \texttt {REC} (f_h\times \mathbbm {1}_{M}, f_h)\), i.e.,
Now, by Eq. (7),
with \(f_h(y)\times \mathbbm {1}_{M}(y) = 0 \le \min _{z\in \gamma } f_h(z)\) for any \(y\in \bigcap \limits _{M\in \mathcal {M}_h(f)} M^c\) and \(\gamma \in \Gamma _{xy}\), therefore the terms corresponding to such y do not contribute to the \(\max \) in the equation above. Hence,
Finally, combining (D20) and (D21) gives (D17). This concludes the proof of Lemma 10 and Proposition 4. \(\square \)
Appendix E Proof of Corollary 5
“If” direction \(\Leftarrow \)
Let \(h^*\in [0,\Delta _\Omega f]\). If \(h^*\notin \mathcal {H}_f\), then \(h^*\in (h_k, h_{k+1})\) for some \(k\in \lbrace 0, \dots , K\rbrace \). Therefore, there exists an open interval I containing 0 such that for any \(\eta \in I\), \(\mathcal {M}_{h^*+\eta }(f) = \mathcal {M}_{h^*}(f) = \mathcal {M}_{h_k}(f)\). Then, Proposition 4 applies and Eq. (29) holds for all \(\eta \in I\), which implies that \(h\mapsto \texttt {HMAX} _h(f)\) is differentiable at \(h^*\) and its derivative is \(-\texttt {EMAX} _{h^*}(f)\).
“Only if” direction \(\Rightarrow \)
Now, let us show that if \(h^*\in \mathcal {H}_f\), then \(h\mapsto \texttt {HMAX} _h(f)\) is not Fréchet differentiable at \(h^*\).
This is already clear for \(h^* = h_0 = 0\) and \(h^* = h_{K+1} = \Delta _\Omega f\), since by definition a function can only be Fréchet differentiable at points lying in the interior of its domain.
To prove it for the other elements of \(\mathcal {H}_f\), we will first point out that
Indeed, for \(0\le k\le K\) and any \(\eta \in [0, h_{k+1}-h_k)\), we have \(\mathcal {M}_{h_k+\eta }(f) = \mathcal {M}_{h_k}(f)\); therefore, by Proposition 4, Eq. (29) holds for any \(\eta \in [0, h_{k+1}-h_k)\). But since \(h\mapsto \texttt {HMAX} _h(f)\) is continuous (by continuity of the dilation and pointwise minimum), it also holds for \(\eta = h_{k+1}-h_k\), yielding (E22).
Let us now assume \(h^* = h_{k}\) for some \(k\in \lbrace 1, \dots , K\rbrace \).
Then, on the one hand, (E22) holds, i.e., Eq. (29) holds for nonnegative \(\eta \). Therefore, if \(h\mapsto \texttt {HMAX} _h(f)\) were differentiable at \(h_k\), the derivative at that point would be \(-\texttt {EMAX} _{h_k}(f)\).
On the other hand, for any \(\eta \in [h_{k-1}-h_k, 0)\), we have \(\eta ^\prime {:}{=} h_k - h_{k-1}+\eta \in [0, h_k-h_{k-1})\), so that
But (E22) applied to \(k-1\) and \(\eta =h_k - h_{k-1}\) yields
and therefore we finally get, for \(\eta \in [h_{k-1}-h_k, 0)\),
Since \(\texttt {EMAX} _{h_{k}}(f)-\texttt {EMAX} _{h_{k-1}}(f)\) does not depend on \(\eta \) and is not null (as \(\mathcal {M}_{h_k}(f) \ne \mathcal {M}_{h_{k-1}}(f)\)), \(\eta \big (\texttt {EMAX} _{h_{k}}(f)-\texttt {EMAX} _{h_{k-1}}(f)\big )\) is not negligible with respect to \(\eta \), and therefore, \(- \texttt {EMAX} _{h_{k}}(f)\) cannot be the derivative of \(h\mapsto \texttt {HMAX} _h(f)\) at \(h_k\), which shows that the latter function is not Fréchet differentiable at \(h_k\).
Appendix F Proof of Proposition 6
“If” direction \(\Leftarrow \)
Assume \(x\in M_0\) for a set \(M_0\in \mathcal {M}(f)\) such that \(\texttt {DYNMAX} (M_0)\ge h > 0\). Then, for any \(\epsilon > 0\), \(\texttt {DYNMAX} (M_0) > h-\epsilon \) and by Proposition 1 we get
Since \(h\mapsto \texttt {HMAX} _h(f)\) is continuous, letting \(\epsilon \) go to zero yields
“Only if” direction \(\Rightarrow \)
Let us assume \(\texttt {HMAX} _h(f)(x)=f(x)-h\) for some \(x\in \Omega \), and denote
the connected component of x in the upper level set \(\chi _{f(x)}(f)\). We will show that \(M_0\in \mathcal {M}(f)\).
Let us first prove that f is constant over \(M_0\), and \(f(M_0)=f(x)\). We already have that \(f(y) \ge f(x)\) for any \(y\in M_0\) by definition of \(\chi _{f(x)}(f)\), and it remains to show that \(f(y) \le f(x)\). Suppose there is \(y_0\in M_0\) such that \(f(y_0) > f(x)\). Then, there exists a path \(\gamma \in \Gamma _{x y_0}\) such that \(\gamma \subseteq M_0\) because \(M_0\) is connected. Hence, \(\min \limits _{z\in \gamma } f(z) = f(x)\) because \(x\in \gamma \) and \(f(z)\ge f(x)\) for each \(z\in \gamma \). Now, either \(f(y_0)-h > f(x)\) or \(f(y_0)-h \le f(x)\).
Case 1, \(f(y_0)-h > f(x)\): In that case
Since by Eq. (7),
we get
which is a contradiction of the initial assumption.
Case 2, \(f(y_0)-h \le f(x)\): By the same reasoning, in that case
so that
which is again a contradiction.
We have just shown that \(f(y) \le f(x)\), and therefore, \(f(y) = f(x)\), for every \(y\in M_0\).
Next, notice that \(M_0\ne \Omega \); otherwise, f would be constant. Furthermore, for any \(y\in \mathcal {N}_{M_0}{\setminus } M_0\), \(f(y) < f(x)\) by definition. Therefore, \(M_0\) is a regional maximum of f.
Finally, let us show that \(\texttt {DYNMAX} (M_0)\ge h\).
If \(M_0\) is a global maximum, then \(\texttt {DYNMAX} (M_0) = \Delta _\Omega f\ge h\).
If not, by Eq. (12),
where the last equality is due to \(f(M_0) = f(x)\), which implies that for any path \(\gamma \in \Gamma _{M_0 M^\prime }\) there is a path \(\gamma ^\prime \in \Gamma _{x M^\prime }\) such that \(\min \limits _{z\in \gamma }f(z) = \min \limits _{z\in \gamma ^\prime }f(z)\).
Therefore, if \(\texttt {DYNMAX} (M_0) < h\), there exists \(M^\prime \in \mathcal {M}(f)\) such that \(f(M^\prime )>f(M_0)=f(x)\), \(y^\prime \in M^\prime \) and \(\gamma \in \Gamma _{x y^\prime }\) such that \(\min \limits _{z\in \gamma }f(z)> f(M_0)-h=f(x)-h\). Furthermore, \(f(y^\prime )-h = f(M^\prime )-h > f(M_0)-h = f(x)-h\) too, so eventually
Again, by Eq. (7), we get
which is a contradiction.
Therefore, \(\texttt {DYNMAX} (M_0)\ge h\), which concludes the proof of Proposition 6.
Appendix G Unet Baseline Details
1.1 Unet Architecture
The Unet architecture employed in Sect. 5 is a standard Unet [22] modified in accordance with [3]. However, it employs a depth of four instead of the three used in [3], for better performance. More precisely, the initial number of convolutional filters is 32, increased to 64, 128 and 256 along the encoder, with 512 filters in the bottleneck convolutional layer.
1.2 Unet Training Protocol
To train the Unet, we followed the methodology proposed in [3], applying an isotropic Gaussian filter with \(\sigma =6\) to the normalized map of ground truth point cell locations. The resulting Gaussian map was then multiplied by 100, which facilitates model training, and served as the supervised training label. The final cell count is obtained by summing such a map and dividing by 100. The loss function employed for Unet training is the L1 loss. The remaining training protocol, data augmentation and optimization settings follow the descriptions provided in Sect. 4.3 (subset of the Cellpose dataset), Sect. 4.3.1 and Sect. 4.3.2, respectively.
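As an illustration of this protocol (a sketch assuming a NumPy/SciPy implementation; the function names are ours), the training target and the count recovery can be written as follows.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_density_target(shape, centres, sigma=6.0, scale=100.0):
    """Supervised target: blurred and scaled map of ground truth point cell locations."""
    point_map = np.zeros(shape, dtype=float)
    for r, c in centres:                       # centres: (row, col) cell locations
        point_map[int(r), int(c)] += 1.0
    return scale * gaussian_filter(point_map, sigma=sigma)

def count_from_density(density_map, scale=100.0):
    """Recover the cell count by summing the (predicted) map and dividing by the scale."""
    return float(density_map.sum()) / scale
```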
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Blusseau, S., Liu, X. & Velasco-Forero, S. Cell Counting with Trainable h-Maxima and Connected Component Layers. J Math Imaging Vis 67, 27 (2025). https://doi.org/10.1007/s10851-025-01243-z