Abstract
Existing active learning (AL)-based 3D medical image segmentation methods often select images, slices, or patches as isolated entities, overlooking inter-slice spatial relationships in 3D images. Additionally, AL methods train the segmentation model on labeled data only and ignore valuable unlabeled data. Both factors limit AL's ability to further reduce labeled-data needs. To address these problems, we propose a novel semi-supervised AL approach termed SpaTial AggRegation (STAR), which enables the model to learn from unlabeled data beyond annotated samples by leveraging spatial correlations between slices, reducing labeling costs. In each AL iteration, STAR employs a spatial cross-attention mechanism to transfer relevant knowledge from adjacent labeled slices to unlabeled ones by generating pseudo-labels. These pseudo-labeled slices and queried slices are used to train the segmentation model. The experimental results indicate that STAR outperforms other state-of-the-art AL methods, achieving fully supervised 3D segmentation performance with as little as 18%-19% of the labeled data. The code is available at https://anonymous.4open.science/r/STAR-24BE.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1281_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/HelenMa9998/STAR
Link to the Dataset(s)
BraTS 2019 Dataset: https://www.med.upenn.edu/cbica/brats2019/data.html
Medical Segmentation Decathlon Dataset: http://medicaldecathlon.com/
BibTex
@InProceedings{MaSit_Spatial_MICCAI2025,
author = { Ma, Siteng and Du, Honghui and Liu, Dairui and Curran, Kathleen M. and Lawlor, Aonghus and Dong, Ruihai},
title = { { Spatial Aggregation for Semi-supervised Active Learning in 3D Medical Image Segmentation } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15967},
month = {September},
pages = {502--512}
}
Reviews
Review #1
- Please describe the contribution of the paper
This manuscript proposes a semi-supervised active learning method for medical image segmentation. The authors experimentally evaluate the proposed method on two publicly available datasets.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- This paper investigates the research question “Can spatial correlations reduce labeling efforts with SSAL while maintaining performance? If so, how?” in the context of active learning.
- The authors propose semi-supervised active learning, which can reduce cumbersome annotation labour.
- Experimental evaluation on two publicly available datasets.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The survey lacks important related works on sparse annotation approaches, which are strongly related to this submission since these approaches focus on strong spatial correlations among consecutive sequences. The main research question in this manuscript has been investigated in previous works. The real technical novelty is unclear.
- Unclear definitions and superficial explanations of semi-supervised learning and active learning. Generally, semi-supervised learning selects pseudo-labels with high reliability; however, this manuscript does not mention this point, and what semi-supervised learning means in the proposed method is unclearly described. Furthermore, active learning generally requires manual annotation of only the most important samples in its iterative procedure; however, the manuscript does not present this. Therefore, the manuscript is not self-contained and requires prior knowledge of semi-supervised and active learning. This writing style, with much undefined and unexplained jargon, hinders the manuscript's readability.
- Missing precise mathematical notation hinders the readability and repeatability of this manuscript. Generally, bold italic lowercase and uppercase letters are used to represent vectors and matrices, respectively. However, the authors use lowercase letters for an image even though a single-channel 2D image is a matrix, and use an uppercase letter for a vector. These are not standard notations and look strange. Furthermore, in Section 2 they write "f_Enc^U and f_Enc^L, to transform unlabeled and labeled slices into embedding vectors", but their outputs look like matrices in Fig. 1. Why are the key and value vectors described as matrices? Moreover, the authors write "Both share the same architecture (e.g. ...) but have different parameters". However, the input of f_Enc^U is a matrix while the input of f_Enc^L is a pair of an image and its label; this shows that the two encoders have different architectures. In addition, the outputs of these two encoders are a key and a value, whereas a usual ResNet-50 outputs only a set of feature maps; what kind of architecture is used to output the key and value is undescribed. In Eq. (2), the dot operation symbol is undefined in this manuscript. In Eq. (4), cos() is an unclear and undefined notation. Even the definition of the groups \mathcal{X}_g looks inconsistent: in the current description, the proposed method selects x_0 ... x_j in Fig. 1, but the slices are defined as x_1 to x_d. The index origin is also given in an inconsistent manner.
- The authors write "The key helps identify similar slices within ... while value stores pathological structures for ...". However, there is no explanation of this mechanism, and a citation for the claim is also missing. The mechanism of the proposed spatial aggregation is therefore unsatisfactorily described. Even in the experimental results, there is no example of extracted pathological structures.
- It is hard to grasp the entire procedure of the proposed method due to low readability. As a result, the manuscript fails to establish repeatability.
- The descriptions of the experimental setting also include unclear points.
- Please rate the clarity and organization of this paper
Poor
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(2) Reject — should be rejected, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Considering the strengths and weaknesses described above, I think this submission is still a work-in-progress. This work is unready for MICCAI presentation.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
Thank you for the authors' responses. However, they do not clearly answer my comments. The manuscript is still missing self-contained scientific writing. Even though the authors state that their notation is based on a certain handbook, this claim does not show that the handbook offers good mathematical notation. In computer vision there are famous textbooks with standard mathematical-notation usage, but the suggested handbook is not one of them. Their explanation of the mechanism of the proposed method is still unconvincing.
Review #2
- Please describe the contribution of the paper
This paper presents a method to leverage spatial coherence in 3D images to improve semi-supervised active learning (SSAL). The method relies on a 2D architecture where 3D volumes are split into sets of 2D slices passed through the SSAL architecture. It leverages the fact that nearby slices likely contain very similar information and can therefore be used in combination to better train the model. This is done with two primary components. First, "spatial aggregation" essentially computes cross-attention between unlabeled and nearby labeled data to enrich the input to the decoder. Second, unlabeled slices are selected to be labeled by the oracle based on their dissimilarity with labeled data.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
I am enthusiastic about the general idea of leveraging spatial correlation to optimize the AL strategy and enhance the quality of pseudo-labels. The two methods proposed are sensible and the results are quite impressive, especially at low data regime. The ablation study shows both components to be useful, and to be complementary of each other.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
My main issue with the paper is that many of the design choices are not well justified. For example, in the spatial aggregation section, the authors claim that concatenating the unlabeled slice's value with the sum of the labeled slices' values, weighted by attention scores, preserves the unlabeled slice's intrinsic features while refining representations. Statements of this type need a bit more justification. Similarly, why use the cosine similarity of the embeddings? Why this particular similarity metric, and why the embedding space? There also seems to be a missed opportunity with the use of entropy: it is currently only used to select the pseudo-labeled slices with high confidence, but it could also be used to enhance the dissimilarity metric. Moreover, not enough details are provided about thresholding the entropy. What is considered high confidence? How is that threshold determined? Similarly, the choice of slice sampling strategy is not well justified. Finally, the retraining cost seems expensive; a comment on computation time and the amount of retraining would be welcome.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Overall, I found this contribution quite interesting and I am enthusiastic about the general idea of leveraging spatial correlation between nearby 2D slices to enhance the segmentation of 3D structure while still using a 2D architecture. The general design choices are sensible to me, but could be better justified as many of the components are introduced without a strong motivation for a particular design decision.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
My main concern with this article was the lack of justification for the various design choices. The authors have adequately (albeit not perfectly) addressed these comments with some additional details about the method and their motivation for making some of the design choices.
Review #3
- Please describe the contribution of the paper
Annotating images is a critical task in medical image analysis that requires substantial effort and cost. Approaches that can perform well with less labeled data are therefore significant to the medical image analysis community. To address this challenge, this paper proposes a semi-supervised active learning (SSAL) approach called SpaTial AggRegation (STAR), which demonstrates good segmentation performance on two public datasets using a relatively small fraction of labeled data.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The method appears to be technically innovative, and builds upon the standard attention mechanism to fit the specific task.
- Two publicly available datasets (BraTS MRI and Spleen CT) are used for validation, which enhances reproducibility, and the full segmentations are used to simulate unlabeled data in slices (which means full ground-truth is known).
- Comparison to ten benchmark methods from a diverse set of strategies and other SSAL approaches is rigorous, and experiments are repeated over multiple runs to more accurately validate performance.
- Quantitative and qualitative results demonstrate good performance of the proposed STAR method, using only 19.4% and 18.2% of labeled data to achieve the performance levels of fully supervised learning approaches.
- Ablation studies are provided to justify model design decisions.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The selection of unlabeled slices happens in a uniform/structured manner that may not accurately reflect real-world practice. It is unclear how the method would perform from a more random selection of slices.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Sec. 2: From the description, it appears that the selection of unlabeled slices happens in a uniform/structured manner, e.g. every Nth slice is labeled. However, this strategy may not accurately reflect real-world practice, where raters jump around to label slices at random depending upon what they see. It is unclear how the method would perform with a more random selection of slices. It might be worth considering a future study that randomly selects labeled slices according to a certain percentage of labeled data instead of skipping N slices.
Sec. 3.1: To be clear, the training, validation, and test splits of the data were split subject-wise? I just want to verify that no subject-level data contamination occurred between the subsets. You might consider adding a short phrase that “data was split at the subject/patient level”.
Table 1: This is a minor point of readability. Table 1 is very dense. I was wondering if this might be more easily viewed as a plot? Maybe you already tried this, and a graphical plot visualization is not effective due to the relatively small increases in values, but it might be worth considering as an alternative way to present the quantitative results. This is not necessary, but something to consider.
For future work, I would be interested to know if you think that this approach could be applied to 3D patches instead of 2D slice-level annotations, e.g. have labeled data in some 3D ROIs but not others?
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This is a well-written paper that addresses an important problem in medical image analysis: handling data with limited annotations. The paper provides an innovative technical solution that is backed by a rigorous set of experiments demonstrating qualitative and quantitative improvements compared to other competing methods.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
This paper and the presented method address an important topic in medical image analysis - learning with limited labeled data samples. Overall, given the conference paper page length limitations, the paper presents the material in a clear manner and performs a strong set of experiments to validate the approach, all of which forms the basis for a good conference paper.
Author Feedback
Reviewer 1: Thanks for the detailed comments.
Suggest a lack of related work on sparse annotation. R: We deliberately exclude sparse annotation methods as they fundamentally differ from ours in both setting and methodology: they use limited spatial context from fixed labels (e.g., 3D networks, registration, etc.), while AL actively selects informative samples and dynamically exploits correlations via attention.
Unclear explanation of SSL and AL. R: We explain SSL in the 2nd paragraph of the introduction, where we point out that most SSL methods rely on the trained model to assess pseudo-labels, and AL iteratively requires annotation of the most informative samples in the 1st paragraph.
Missing precise notations. R: We follow the Handbook of CV Algorithms (CRC Press, 2000) and common practice, using bold lowercase characters to represent images.
The output looks like a matrix. Two encoder architectures. What kinds of architecture are used to output key and value? R: We note an error here: they are matrices, and we will update the text. Both encoders use a ResNet-50 backbone. Enc_U takes a single-channel image; Enc_L processes the image and label with parallel conv layers, then sums them. Key-value pairs are generated via an additional conv layer. We will add clarification in the revision.
Dot operation and cos( ) are undefined. R: Dot operation is the well-known dot product of matrices. cos( ) denotes the cosine similarity. We explain this above Eq. 4. We will ensure consistency between Fig. 1 and the text.
No clear mechanism explanation. R: The mechanism follows a standard attention approach specifically adapted to our scenario: keys represent the semantic identity of each slice to measure similarity among slices, while values carry detailed features. Through key-query interactions, the model aggregates relevant values to enhance the current slice’s representation. This supports richer feature learning, especially when pathology spans multiple slices. We will add a citation and brief clarification in the revision.
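The aggregation mechanism described in this response can be illustrated with a minimal NumPy sketch. All function names, shapes, and the final concatenation step are illustrative assumptions based on the rebuttal text and Review #2's summary, not code from the paper:

```python
import numpy as np

def spatial_cross_attention(q_u, K_l, V_l, v_u):
    """Enrich an unlabeled slice's representation with features from
    nearby labeled slices via scaled dot-product cross-attention.

    q_u : (d,)    query embedding of the unlabeled slice
    K_l : (n, d)  key embeddings of n nearby labeled slices
    V_l : (n, d)  value embeddings of the same labeled slices
    v_u : (d,)    value embedding of the unlabeled slice itself
    """
    d = q_u.shape[0]
    scores = K_l @ q_u / np.sqrt(d)          # key-query similarity per labeled slice
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax attention weights
    aggregated = weights @ V_l               # weighted sum of labeled values
    # Concatenate the unlabeled slice's own value (its intrinsic features)
    # with the aggregated labeled features -- an assumption mirroring the
    # design described in the rebuttal.
    return np.concatenate([v_u, aggregated])  # (2d,) enriched representation
```

In this reading, the keys determine *which* labeled slices are relevant, the values carry *what* is transferred, and the concatenation keeps the unlabeled slice's own features alongside the aggregated context.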
Low repeatability. R: We’re happy to improve the description of the experimental settings. We have publicly released all code and datasets, which is a high standard of openness and repeatability.
Reviewer 2: Thanks for the insightful comments and for recognizing the novelty of our work. Yes, the data were split subject-wise. Uniform sampling and future work are interesting directions and will be discussed in the conclusion.
Reviewer 4: Thanks for the insightful comments. For justification of design choices:
Why feature concatenation in spatial aggregation? R: Standard attention uses only the weighted sum of labeled slices’ values, potentially missing unique features of the unlabeled slice. We address this by concatenating the unlabeled slice’s value (preserving unique features) with features from labeled slices for a complete representation. We will further clarify this in the revision.
Why use the cosine similarity of embedding? R: Original images are high-dimensional, making similarity comparison costly, noise-sensitive, and semantically weak. Comparing in a lower-dimensional embedding space captures more abstract and meaningful features. Cosine similarity is ideal in this space as it measures vector angles, is robust to brightness and contrast changes, and remains efficient, which makes it widely used for comparing image embeddings.
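A minimal sketch of the embedding-space dissimilarity selection described above, assuming (as the reviews suggest) that the slices least similar to any labeled slice are queried for annotation; the function name and shapes are hypothetical:

```python
import numpy as np

def query_most_dissimilar(unlabeled_emb, labeled_emb, n_query):
    """Select the n_query unlabeled slices whose embeddings have the
    lowest cosine similarity to their closest labeled slice.

    unlabeled_emb : (m, d) embeddings of unlabeled slices
    labeled_emb   : (n, d) embeddings of labeled slices
    """
    u = unlabeled_emb / np.linalg.norm(unlabeled_emb, axis=1, keepdims=True)
    lab = labeled_emb / np.linalg.norm(labeled_emb, axis=1, keepdims=True)
    sim = u @ lab.T                    # (m, n) pairwise cosine similarities
    max_sim = sim.max(axis=1)          # similarity to the closest labeled slice
    return np.argsort(max_sim)[:n_query]  # least similar = most informative
```

Because the embeddings are L2-normalized, the dot product equals the cosine of the angle between them, which is what makes the comparison robust to global intensity scaling.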
The choice of the entropy threshold and dissimilarity metrics. R: We rank pseudo-labeled slices by entropy and select the top N (50 and 100 in this paper), naturally aligning with our AL selection strategy. Entropy is not a standard way to evaluate dissimilarity, but it could be interesting to combine it with dissimilarity metrics, which we will discuss in the revision.
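The top-N entropy ranking described in this response can be sketched as follows; the softmax-output shape and function name are assumptions for illustration:

```python
import numpy as np

def select_confident_pseudo_labels(probs, top_n):
    """Rank pseudo-labeled slices by mean pixel-wise predictive entropy
    and keep the top_n most confident (lowest-entropy) slices.

    probs : (s, c, h, w) softmax outputs for s slices over c classes
    """
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)  # (s, h, w) per-pixel entropy
    mean_entropy = entropy.mean(axis=(1, 2))              # one score per slice
    return np.argsort(mean_entropy)[:top_n]               # lowest entropy = most confident
```

Ranking and taking a fixed top N (50 or 100 in the paper) sidesteps the need for an explicit entropy threshold, which is the point the authors make here.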
No analysis of computational time. R: We incrementally fine-tune the model rather than retraining from scratch at each AL round. Training time will be reported in the final version.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
The reviewers pointed out that the semi-supervised active learning method is interesting, but they have several concerns. The authors are invited to give a rebuttal addressing them: 1) lack of important related works; 2) unclear explanation of semi-supervised learning and active learning; 3) low readability and repeatability; 4) "extracted pathological structures" is not clearly described; 5) uniform/structured sampling of unlabeled slices may not reflect real-world practice; 6) most of the design choices are not justified; 7) there is no analysis of computational time.
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
I agree with the second and third reviewers that this paper is acceptable; however, I would like to point out that it misses some recent works published in TMI 2024 and MedIA 2024 and does not compare with or discuss them. Please update these details in the next version.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A