Abstract

Medical image segmentation is critical for accurate diagnosis; however, the task remains challenging due to the inherent ambiguities in low-contrast anatomical boundaries and the presence of extensive redundant features in the skip connections of segmentation models. To address these limitations, we propose ReSeg-UNet, a novel two-stage framework that synergizes image reconstruction with segmentation optimization. In the first stage, a composite reconstruction loss—combining Mean Squared Error (MSE) and L1 regularization—is applied to a standard segmentation network, generating stable reconstruction weights that encode multi-scale feature representations. These weights explicitly capture both global anatomical context and local boundary details. In the second stage, a three-level cross-feature alignment mechanism is introduced: the encoder of the reconstruction model is aligned with the decoder of the segmentation model, the decoder of the former is aligned with the encoder of the latter, and the intermediate features of both models are also aligned. This strategy ensures multi-level feature consistency during downsampling, intermediate layers, and upsampling, effectively mitigating information loss in blurred regions. Extensive experiments on the Synapse (abdominal CT) and ACDC (cardiac MRI) datasets demonstrate significant improvements.
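
The three-level cross-feature alignment described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the dictionary keys, the use of a mean squared distance, and the assumption that paired features already share a shape are all hypothetical choices made for the example.

```python
from typing import Dict, Sequence

Feature = Sequence[float]


def mse(a: Feature, b: Feature) -> float:
    """Mean squared difference between two equally sized feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_alignment_loss(recon: Dict[str, Feature], seg: Dict[str, Feature]) -> float:
    """Average of three alignment terms between a frozen reconstruction
    model and a segmentation model.

    The keys "encoder", "intermediate" and "decoder" are hypothetical
    names for one flattened feature vector per stage.
    """
    enc_to_dec = mse(recon["encoder"], seg["decoder"])  # recon encoder vs. seg decoder
    dec_to_enc = mse(recon["decoder"], seg["encoder"])  # recon decoder vs. seg encoder
    mid_to_mid = mse(recon["intermediate"], seg["intermediate"])
    return (enc_to_dec + dec_to_enc + mid_to_mid) / 3.0


# Toy features; in practice these would come from the two networks.
recon_feats = {"encoder": [1.0, 2.0], "intermediate": [0.5, 0.5], "decoder": [0.0, 1.0]}
seg_feats = {"encoder": [0.0, 1.0], "intermediate": [0.5, 1.5], "decoder": [1.0, 4.0]}
loss = cross_alignment_loss(recon_feats, seg_feats)
```

The crossing (encoder with decoder, decoder with encoder) is the part taken from the abstract; the distance function and the equal weighting of the three terms are assumptions.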

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3531_paper.pdf

SharedIt Link: https://rdcu.be/eHwRJ

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04947-6_52

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Li-gzhu/ReSeg-UNet.git

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LiLin_ReSegUNet_MICCAI2025,
        author = { Li, Lin AND Tang, Dong AND Chu, Xiaowen AND Yang, Xiaofei AND Yu, Fei},
        title = { { ReSeg-UNet: A Reconstruction-Guided Optimization Framework for Enhanced Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15962},
        month = {September},
        pages = {544 -- 554}
}

Reviews

Review #1

Please describe the contribution of the paper

The authors propose ReSeg-UNet, a reconstruction-guided optimization network for enhanced medical image segmentation. The model can exploit UNet-like architectures by adding a first stage that aims to reconstruct the ground truth image and to extract relevant features from it. These features are then used to enhance the segmentation optimization process of the second stage, which combines the original segmentation loss function with an alignment loss. This alignment loss is obtained by computing the distance between the features extracted from the two stages at different levels. The authors used two different datasets consisting of 30 abdominal CT scans and 100 cardiac MRIs for training and evaluation. They evaluated the new approach by applying it to the most common UNet-shaped models. They found that the average Dice score over all classes improved on both datasets compared to the corresponding baselines.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Extendability: The method can be easily applied to all U-shaped networks since it uses the same baseline architecture for both the stages. The only modifications involve changing the network input between the two stages and the optimization process.
- Limited additional computational overhead: even if the model must be trained twice, once for each stage, these trainings can happen independently from each other since the reconstruction weights, i.e. the weights from the first stage, are kept frozen during the second stage training. Hence, in this stage the additional computational complexity is only limited to the computation of the feature alignment loss.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Small datasets: the datasets used for training and evaluation are quite small, especially the first one, which uses only 18 abdominal scans for training and 12 for testing. Indeed, some baseline models might not be suitable for training on such a limited number of examples. Moreover, evaluating their performance on a few scans could yield inconclusive results.
- 2D approach: this method only works with 2D images, while medical images are often 3-dimensional, which limits its practical application.
- Wrong results presentation: the authors visually displayed only one image from each dataset, which gives a limited view of the diversity of the whole dataset. Moreover, in Figure 2, the zoomed segmentation masks in the small boxes for their models differ from the ones in the underlying full image; as a result, better segmentation masks are displayed, which unduly promotes the obtained results.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

- Lack of clarity: in the first stage, the authors say that they fed the ground truth into the network to allow its reconstruction. It would be helpful to clarify what the input and output of this first stage are. From the scheme in Fig. 1, it appears that the input is the overlap between the image and the ground truth segmentation, and the output is only the reconstruction of the image without any segmentation. Finally, it would also be helpful to clarify how the overlap is actually produced, if it is.
- Limited experiment settings: even though the authors reported that they kept the training settings mostly identical to the baseline models, more information about these settings would help to confirm their actual consistency and to ease reproducibility. Moreover, it is not clear whether they kept the same configuration for both datasets, since the datasets are very different from each other and the training configuration should be adapted to their specific needs.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(2) Reject — should be rejected, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The feature alignment idea is interesting, since it is a relatively simple approach to enhance the segmentation performance of U-shaped networks. However, the datasets used for training and evaluation are not very large, and this could lead to non-significant results. Moreover, many steps are unclear or wrong, such as the results presentation, in which the authors displayed different segmentation masks that favor their models, and Figure 1 with its supporting text, in which it is not clear what the inputs and outputs of the network are. Furthermore, the segmentation results, especially on the first dataset, are not very satisfactory, and the biggest gains come mainly from bad baseline results (e.g. the pancreas class). All these aspects make a concrete evaluation of the quality of the proposed approach very complicated.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The proposed idea, while moderately original, would definitely have needed more convincing validation. The datasets remain limited, since the reasoning should be organ-based and not slice-based. I think that, despite residual concerns on my part, the authors deserve the opportunity to present their potential novelty at the conference venue.

Review #2

Please describe the contribution of the paper

The paper introduces ReSeg-UNet, a two-stage framework that improves medical image segmentation by guiding it with reconstruction-based features. In the first stage, a composite reconstruction loss (MSE + L1) is used to generate multi-scale feature weights. In the second stage, these features are reused through a three-level cross-feature alignment mechanism that aligns encoder-decoder representations between the reconstruction and segmentation tasks. The method improves segmentation performance, especially for low-contrast or small structures, and can be integrated into standard U-Net style architectures with little computational overhead.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The use of a separate reconstruction stage to inform the segmentation task is a well-motivated design that helps the network capture useful anatomical features, especially in low-contrast regions.
- The paper introduces a structured approach to aligning features between the reconstruction and segmentation branches at multiple levels, which contributes to improved feature reuse and overall performance.
- The method shows consistent performance gains, particularly on structures that are typically difficult to segment, such as the pancreas and right ventricle.
- The ablation experiments are well-conducted and clearly demonstrate the impact of each component, including the alignment mechanism and the balancing factor for the alignment loss.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

There are no major weaknesses in the paper. The method is technically sound, the experimental results are convincing, and the overall design is well-motivated. Minor issues are noted in the comments section and can be addressed with minimal revision.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- The paper claims “significant performance improvements,” but no statistical significance testing is provided. In this context, using the term “significant” may be misleading, as it typically implies statistical validation. It would be more appropriate to either include such an analysis or rephrase the claim.
- The paper claims that the proposed method introduces minimal additional cost and integrates easily into existing U-shaped architectures. However, this is not supported by any quantitative evidence, such as parameter count, training time, or memory usage. Since this is stated as one of the main contributions, providing supporting data would strengthen the claim.
- The paper mentions that the Synapse dataset includes 30 scans, split into 18 for training and 12 for testing, and also refers to a total of 3779 images. It’s unclear how this number relates to the 30 scans. Clarifying what the 3779 images represent would improve the clarity of the dataset description.
- The paper performs ablation for the alignment loss weight (λ), but the coefficients α = 0.6 and β = 0.4 used in the reconstruction loss are fixed without justification or sensitivity analysis. A brief explanation or validation of these values would strengthen the formulation.
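
One way to back a claim of "significant" improvement, as the first comment above requests, is a paired test on per-case scores. The sketch below is an exact sign-flip permutation test in pure Python; it is an illustration of the kind of analysis meant, not something the paper or the reviewers specify, and the scores in the usage lines are made-up placeholders.

```python
from itertools import product
from typing import Sequence


def paired_sign_flip_test(baseline: Sequence[float], proposed: Sequence[float]) -> float:
    """Two-sided exact sign-flip permutation test on paired per-case scores.

    Returns the p-value for the null hypothesis that the per-case
    differences are symmetric around zero.
    """
    if len(baseline) != len(proposed):
        raise ValueError("scores must be paired case by case")
    diffs = [p - b for b, p in zip(baseline, proposed)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    # Enumerate every assignment of signs to the differences (2**n cases).
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical per-case Dice scores for four test cases.
baseline_dice = [0.80, 0.75, 0.70, 0.85]
proposed_dice = [0.82, 0.78, 0.74, 0.86]
p_value = paired_sign_flip_test(baseline_dice, proposed_dice)
```

With 12 test cases, as in the Synapse split mentioned above, the full enumeration covers 2**12 = 4096 sign patterns, which is cheap to compute. Pairing by case rather than by slice keeps the samples independent.
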
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method is well-designed and shows solid improvements, but some claims, particularly around efficiency, are not backed by quantitative evidence, and a few design choices are not well justified. Clarifications in the rebuttal would strengthen the paper.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

The authors took inspiration from NLP to propose an addition to U-shaped segmentation networks. By first training a model to reconstruct the ground truth in an identical network, and afterwards leveraging those weights to guide the segmentation network, they expect the model to learn better localized features. They demonstrated this using four different network architectures on two different segmentation datasets. They further conducted several ablation studies to analyze their proposed method.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors have presented a well-written, well-structured paper, concisely explaining their proposed methods. They showed their work using four different U-shaped networks, on two different segmentation datasets. Though they drew inspiration from NLP, to the best of my knowledge this method has not been applied to the computer vision domain, where it seems to work well.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Novelty seems limited.

Unclear why the features are aligned at 2 (recon-E -> seg-D and recon-D -> seg-E) times 3 levels; e.g., why not 2 or 4 levels?
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

- It is not immediately clear from Fig. 1 that the ground truth being used as input is also the desired output (though it is what the name suggests). I would propose replacing the “Recon. Pred” image with the same ground truth input image for clarity.
- Are the skip connections in Fig. 1b pointing the wrong way (decoder to encoder, rather than encoder to decoder)?
- How were the alpha and beta scalars for Equation 3 decided on? It is only mentioned that they are α = 0.6 and β = 0.4.
- In Equation 7, it is unclear why the loss is multiplied by 1/3. Would 1/7 not make more sense (2*3+1 losses are summed)?
- How was the scalar lambda (Equation 8) decided on? The ablation study shows its performance on the test set, but I assume this was not how it was obtained? Further, 0.035 seems rather low; is the loss naturally very large?
- Why was the HD95 metric not reported on the ACDC dataset?
- Table 1 does not describe which metrics are shown for the classes (e.g., it says “Aorta” but not DSC).
- Table 2 does not define delta.
- In Fig. 4 the fonts are quite small and therefore hard to read (on paper), particularly in 4a.
- In Fig. 4a, L_en+L_de is shown twice; likely one of them should be L_de+L_in.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Though the novelty is limited, the paper is well-written and a lot of experiments were conducted to show that this method works. Given how easy it is to understand and implement, I would think the audience would enjoy the paper.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

I feel that the authors have adequately addressed the concerns raised.

Author Feedback

We sincerely thank the reviewers (R1, R2, and R3) for their constructive comments and for recognizing the value of our reconstruction-guided medical image segmentation framework. We commit to open-sourcing our method upon paper acceptance.

Reviewer #1:

Dataset scale and applicability: We clarify a common misunderstanding: the Synapse and ACDC datasets are inherently 3D medical imaging datasets (CT/MRI). Specifically, Synapse contains 30 annotated 3D CT volumes, which are sliced into 3,779 2D axial slices, serving as the standard training units for mainstream methods. We adopt the standard split of 18 training and 12 testing cases, which has been validated for scientific rigor and comparability in multiple studies. These 3,779 slices effectively support the model’s learning of organ-specific features. Our dataset scale and evaluation settings are consistent with existing baselines. Moreover, our method is optimized for 2D models but is not limited to 2D images alone.
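
Taking the numbers above at face value, 3,779 slices from 30 volumes is roughly 126 axial slices per volume on average. The sketch below shows one way 3D volumes could be flattened into 2D training units while keeping the case identifier, so that results can later be regrouped per case. The function name and the nested-list volume layout `volume[z][y][x]` are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

Slice2D = List[List[float]]
Volume3D = List[Slice2D]  # assumed layout: volume[z][y][x]


def build_slice_index(volumes: Dict[str, Volume3D]) -> List[Tuple[str, int, Slice2D]]:
    """Flatten {case_id: volume} into (case_id, z, slice) training units."""
    units = []
    for case_id, volume in volumes.items():
        for z, plane in enumerate(volume):
            units.append((case_id, z, plane))
    return units


# Two toy "volumes" with 3 and 2 axial slices of size 2x2.
toy_volumes = {
    "case_a": [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)],
    "case_b": [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2)],
}
units = build_slice_index(toy_volumes)
slices_per_case = {cid: sum(1 for u in units if u[0] == cid) for cid in toy_volumes}
```

Keeping `case_id` with every slice is what makes a case-level train/test split and case-level evaluation possible even though the model itself sees 2D inputs.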

Input of the first stage: We confirm that Stage-I receives only segmentation label maps as input (not fused image-label pairs). This design enables the model to learn structural semantics directly from label information, providing structural priors for the segmentation stage.

Visualization issue: We acknowledge an error in Figure 2 due to mismatched cropping during batch processing. We will remake this figure to ensure accurate and clear presentation in the final version.

Reviewer #2:

Use of the term “significant”: We accept this suggestion and will revise the wording in the final version to avoid ambiguity.

Quantitative overhead evaluation: Our optimization introduces only minimal computational cost, as it adds loss computation between feature maps in the segmentation stage. Nonetheless, we acknowledge the importance of quantitative resource evaluation and will incorporate more systematic analysis in future work.

As addressed to Reviewer 1, the Synapse dataset includes 30 annotated 3D CT scans, from which the 3,779 2D slices are extracted.

On the coefficients α = 0.6 and β = 0.4 in Eq. (3): These were determined through extensive parameter tuning. Empirically, the model achieved optimal performance with α = 0.6 (MSE loss) and β = 0.4 (L1 loss), indicating that slightly emphasizing MSE improves segmentation results.
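
For concreteness, a weighted combination of MSE and L1 terms with these coefficients could look like the sketch below. The exact form of Eq. (3) is not reproduced on this page, so the function, its name, and the choice to average both terms over elements are assumptions.

```python
from typing import Sequence


def reconstruction_loss(
    pred: Sequence[float],
    target: Sequence[float],
    alpha: float = 0.6,
    beta: float = 0.4,
) -> float:
    """Weighted sum of mean squared error and mean absolute error."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally sized")
    n = len(pred)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    l1 = sum(abs(p - t) for p, t in zip(pred, target)) / n
    return alpha * mse + beta * l1


# Toy example: errors of 0, 1 and 2 give MSE = 5/3 and L1 = 1.
example = reconstruction_loss([0.0, 1.0, 2.0], [0.0, 0.0, 0.0])
```

Because the squared term grows faster for large errors, a larger α penalizes outlier pixels more strongly, while the L1 term keeps a constant gradient magnitude for small errors.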

Reviewer #3:

Feature alignment strategy: We align features at three encoder-decoder levels, capturing progressively abstract representations from textures to semantics. This enables structural information from reconstruction to guide segmentation across semantic depths. Experiments show that two-level alignment limits structure propagation, while four-level alignment adds complexity with little benefit—confirming three-level alignment as optimal.

For the question on α and β in Eq. (3), please refer to our detailed fourth response to Reviewer 2.

Why 1/3 in Eq. (7), not 1/7: The coefficient 1/3 reflects averaging over the encoder, bottleneck, and decoder components in all four target models. We tested 1/7 but found performance was better when using 1/3.
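
A short derivation may help put the 1/3 versus 1/7 question in context. Assuming the total loss has the form "segmentation loss plus λ times the scaled sum of alignment terms" (the exact Eqs. (7) and (8) are not reproduced on this page, and the symbol S is introduced here only for illustration), and treating λ as a fixed weight, the normalizing constant acts as a rescaling of λ:

```latex
% Assumed form, not quoted from the paper:
%   S      = sum of the individual alignment terms
%   L_seg  = segmentation loss, \lambda = alignment weight
\begin{aligned}
\mathcal{L}_{1/3} &= \mathcal{L}_{\text{seg}} + \lambda \cdot \tfrac{1}{3}\, S, \\
\mathcal{L}_{1/7} &= \mathcal{L}_{\text{seg}} + \lambda' \cdot \tfrac{1}{7}\, S, \\
\mathcal{L}_{1/3} = \mathcal{L}_{1/7} \ \text{for every } S
  &\iff \lambda' = \tfrac{7}{3}\,\lambda,
  \qquad \text{e.g. } \lambda = 0.035 \;\Rightarrow\; \lambda' \approx 0.0817 .
\end{aligned}
```

Under these assumptions, comparing 1/3 against 1/7 at the same λ amounts to comparing two different effective alignment weights rather than two different loss designs.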

On λ = 0.035 in Eq. (8): This value is not preset but learned during training. λ is a learnable parameter, and in all experiments it stabilized around 0.035, indicating the model’s ability to adaptively balance the feature and decoder alignment losses. Ablation studies further confirm that λ = 0.035 yields the best segmentation performance.

Why HD95 is not reported for ACDC: The ACDC dataset is typically evaluated using the Dice metric, and most related works do not report HD95. To ensure consistency and comparability, we followed this convention. However, we acknowledge the value of HD95 and will consider including it in future versions.

All per-organ metrics in Table 1 correspond to Dice Similarity Coefficient (DSC).
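
As a reminder of what the per-organ numbers measure, the Dice Similarity Coefficient for one class is twice the overlap divided by the total size of the predicted and reference regions. The sketch below works on flat label lists; the function name and the convention of returning 1.0 when a class is absent from both inputs are assumptions, and conventions for that edge case vary between codebases.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], label: int) -> float:
    """Dice Similarity Coefficient for one class on flattened label maps."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")
    pred_mask = [p == label for p in pred]
    target_mask = [t == label for t in target]
    overlap = sum(1 for p, t in zip(pred_mask, target_mask) if p and t)
    denom = sum(pred_mask) + sum(target_mask)
    if denom == 0:
        return 1.0  # assumed convention: class absent in both counts as perfect
    return 2.0 * overlap / denom


# Toy label maps with classes 0 (background), 1 and 2.
pred_labels = [0, 1, 1, 2, 2, 2]
true_labels = [0, 1, 2, 2, 2, 0]
per_class = {c: dice_score(pred_labels, true_labels, c) for c in (1, 2)}
mean_dice = sum(per_class.values()) / len(per_class)
```

Averaging the per-class values, as in the last line, corresponds to the "average Dice score for all classes" that Review #1 mentions.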

We sincerely appreciate the reviewers’ attention to minor errors, and we will correct them in the final version.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The paper introduces ReSeg-UNet, a reconstruction-guided optimization framework that enhances U-shaped segmentation networks through feature alignment between reconstruction and segmentation branches. All reviewers acknowledged the practical value and empirical effectiveness of the approach.
- Reviewer #1 appreciated the method’s generalizability and low computational overhead, noting that it is easily integrable with existing U-Net architectures. Although initially critical of small dataset size, 2D-only design, and presentation clarity (e.g., Figure 2 inconsistencies), the reviewer ultimately found the proposed idea sufficiently novel and worthy of presentation.
- Reviewer #2 praised the architecture’s design and performance, particularly on challenging structures like the pancreas and right ventricle. The reviewer valued the thorough ablation studies and justified feature reuse strategy, though they suggested the authors substantiate claims about efficiency and provide more detail on fixed hyperparameters and dataset statistics.
- Reviewer #3 found the concept clear and the paper well-written, with strong empirical support across multiple networks and datasets. While noting limited novelty and requesting clarification on alignment level choices and figure annotations, the reviewer was satisfied with the rebuttal and supported acceptance.
Summary: Despite some concerns around novelty and dataset scale, the method is well-motivated, clearly presented, and demonstrates consistent improvements. The paper provides practical insights and tools for the segmentation community, justifying acceptance.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

This work shows an interesting method that uses reconstruction-relevant features to guide medical image segmentation tasks. The authors addressed the main concerns from the reviewers during the rebuttal.

ReSeg-UNet: A Reconstruction-Guided Optimization Framework for Enhanced Medical Image Segmentation