Abstract
Late Gadolinium Enhancement (LGE) imaging has emerged as the gold standard for cardiovascular disease diagnosis due to its ability to clearly delineate myocardial pathology. Automated interpretation of LGE images is difficult because expert annotations are scarce, often necessitating reliance on domain adaptation methods. Nevertheless, the significant distribution discrepancy between datasets of different modalities usually results in poor transfer learning performance. To address this issue, we propose a general framework for cardiac MRI segmentation, called Cross Attention-Guided Unsupervised Domain Adaptation with Mutual Information (CAUDA-MI). The model applies attention mechanisms to two data streams from the source and target domains, fusing the Query from the source domain with the Key and Value from the target domain, thereby aligning the implicit features of the target domain toward the source domain in the latent space. Additionally, we introduce single-domain mutual information as a supplementary means to further enhance the accuracy of myocardial segmentation. The proposed CAUDA-MI is evaluated on the MS-CMRSeg 2019 and MyoPS 2020 datasets, achieving average Dice scores of 0.847 and 0.797, respectively. Comprehensive experimental results demonstrate that our method surpasses previous state-of-the-art algorithms.
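As a rough illustration of the Query/Key/Value fusion described in the abstract, the sketch below computes single-head scaled dot-product cross-attention with the Query taken from source-domain features and the Key/Value from target-domain features. This is a minimal pure-Python sketch under stated assumptions (no learned projections, no batching, features as plain lists of vectors); the function names and interface are illustrative, not the authors' implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(src_feats, tgt_feats):
    """Cross-attention in the direction described in the abstract:
    Query from the source domain, Key and Value from the target domain,
    so each fused output is a source-guided mixture of target features.
    src_feats, tgt_feats: lists of d-dimensional feature vectors."""
    d = len(src_feats[0])
    out = []
    for q in src_feats:                            # Query: source domain
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tgt_feats]              # Key: target domain
        weights = softmax(scores)
        fused = [sum(w * v[j] for w, v in zip(weights, tgt_feats))
                 for j in range(d)]                # Value: target domain
        out.append(fused)
    return out
```

Because the output rows are convex combinations of target-domain features weighted by source-domain queries, the fused representation stays in the target feature space while being steered by the source, which matches the "aligning the target domain toward the source domain in the latent space" reading of the abstract.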
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1855_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{DuDia_CAUDAMI_MICCAI2025,
author = { Du, Dianrong and Cui, Hengfei and Li, Jiatong and Zheng, Fan and Xia, Yong},
title = { { CAUDA-MI: Cross Attention-Guided Unsupervised Domain Adaptation with Mutual Information for Cardiac MRI Segmentation } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15965},
month = {September},
pages = {46--56}
}
Reviews
Review #1
- Please describe the contribution of the paper
This study introduces a cross-attention-guided unsupervised domain adaptation framework that implicitly aligns the target domain to the source domain in the latent space. Additionally, it constructs mutual information between segmentation and reconstruction branches to enhance their interaction, thereby improving segmentation performance.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The structure of the article is clear, and the logic is strong. The figures and tables are clear.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1) The manuscript does not provide a clear reason why the proposed cross-attention mechanism has the potential to outperform current mainstream approaches. A more specific and detailed analysis is recommended.
2) The proposed method lacks innovation, as cross-attention is commonly used (e.g., "CDTrans: Cross-Domain Transformer for Unsupervised Domain Adaptation").
3) Experimental results, especially on the MyoPS 2020 dataset (Table 2), do not achieve optimal performance. Further improvements are needed.
4) There is insufficient analysis of the loss function; a detailed discussion of its impact is necessary.
5) More comparative experiments should be included in Table 2, similar to Table 1, to strengthen the evaluation.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1) The method has limited innovation. 2) Experimental results do not achieve optimal performance on the MyoPS 2020 dataset.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
- An innovative Cross Attention mechanism for UDA segmentation, guiding target-domain migration to the latent space for implicit alignment.
- Constructing the mutual information between segmentation and reconstruction to facilitate their interaction, thus enhancing segmentation performance.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- It is an innovative idea to carry out cross-attention between the encoder and decoder to enhance feature extraction. I am wondering whether there is any related prior work.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- It is not clear how the segmentation and reconstruction branches interact with each other.
- The complexity of Eq. 2 increases the difficulty of optimization during training, which needs more explanation.
- Some compared methods (e.g., CycleMix and ModelMix) are not designed for UDA, so the comparison is not fair.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
There are several issues in the current version of the manuscript, but they do not prevent this paper from having the potential to be presented at MICCAI 2025. Therefore, I will consider raising the score if the authors can resolve the doubts mentioned above.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
This work proposes a model for domain-adaptive cardiac image segmentation. The model includes source and target branches under the variational autoencoder framework to perform image reconstruction and segmentation, which is similar to previous works. Additionally, the authors propose self- and cross-attention during encoding to form outputs from the two domains individually, as well as an output that combines the information of both domains. They further propose to align the fused-branch output with the single-domain outputs. Besides, they use mutual information-based losses to ensure consistency between reconstruction and segmentation. Experiments comparing with state-of-the-art methods show the method's superiority.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Explicitly improving the consistency between reconstruction and segmentation is an interesting idea of improving the accuracy.
- The cross-attention branch of the encoder has the potential to help fuse and align the information of the two domains compared with aligning the two domains directly based on two groups of features from the two domains.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Rationale of L_CroSeg and L_CroDtl. It seems that these two terms encourage the fused result to be close to the source-domain result and the target-domain result, respectively. If this is true, how does this strategy effectively handle the issue that input images from the two domains may have different shapes, because they may be from different patients or spatial locations? Are you using paired images (i.e., images from the same patient) as input? If so, it is doubtful, because this is not very practical in applications, and almost all previous works assume the input images are unpaired, which is more difficult to address than paired inputs. Another reason I doubt that the input images are paired is that the Myo Dice on MSCMR is 0.91, which is surprisingly high, considering Myo is very hard to segment due to scar signals on LGE.
- The description of the mutual information loss terms in Sec. 2.3 is somewhat hard to follow. For example, what do "j", "m", and "sp" mean in the formula of L_FM, and what are the sample classifiers (what are their inputs and outputs)? Similar questions apply to L_DM.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I appreciate the overall ideas of using cross-attention to align domains and mutual information-based losses to ensure reconstruction-segmentation consistency. Still, I think it is important to clarify how the cross-attention-based losses are calculated and whether the input images are required to be paired (very important). Also, the mutual information part should be revised to explain the idea and calculations clearly.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We sincerely thank all reviewers and ACs for their recognition of the novelty, performance, and presentation of this work. Here are our responses to their invaluable suggestions and remaining concerns.

R1Q1: Analysis of the Cross-Attention. Thanks for your constructive suggestion. Current research often uses GANs to align two domains, but these methods typically yield unstable and blurry results. Instead, we focus on analyzing the intrinsic relationships and differences between the source and target domains from an interpretability standpoint. The cross-attention mechanism facilitates knowledge distillation from the source domain (Q) to the target domain (K, V), helping the VAE adapt more effectively. In an unlabeled target domain, this mechanism acts as a bridge, aligning the target domain closer to the labeled source domain. A more detailed analysis will be provided in the final version.

R1Q2: Cross-Attention Lacks Innovation. CDTrans is used for natural image classification; inspired by it, we discovered that cross-attention is especially effective for cross-modal domain adaptation in cardiac MRI segmentation. To the best of our knowledge, we are the first to apply cross-attention to this task. We use cross-attention together with a novel distillation loss to improve the alignment of source- and target-domain features, enhancing segmentation accuracy in the target domain.

R1Q3 & R1Q5: MyoPS 2020 Dataset. Thank you for pointing out the issue. In Table 2, some of the comparison methods are supervised learning approaches (e.g., ADSIC), yet our UDA-VAE achieves comparable segmentation results. Due to the dataset characteristics, the SOTA methods may vary. We will revise the experiments on the MyoPS 2020 dataset in the final version.

R1Q4: Analysis of Loss Terms. Our loss function includes 8 components, and we will provide a detailed description of these components and their interdependencies in the camera-ready version.
R2Q1: Interaction Between Segmentation and Reconstruction. Through the reconstruction task, the model can better understand the underlying distribution of the input data, thereby providing better feature representations to assist the segmentation model's learning. At the same time, correctly segmented regions effectively guide the reconstruction process, resulting in more accurate reconstructed images.

R2Q2: Eq. 2 Increases Optimization Difficulty During Training. Thank you for your feedback. We use a lightweight VAE model, which relies on simple calculations and has few parameters during loss optimization. Although the additional loss terms increase memory usage, they have little impact on training speed and lead to significant improvements in segmentation results.

R2Q3: Unfair Comparison in Experiments. As pointed out by the reviewer, CycleMix and ModelMix are based on scribble-supervised learning and GANs, while our unsupervised DA method even outperforms them on the MS-CMRSeg 2019 dataset, further validating the superiority of our UDA.

R3Q1: Ambiguity in Loss Term Descriptions. Thank you for pointing out the issue. The VAE model aligns images from the two domains into the same latent space using our alignment strategy, L_CroSeg and L_CroDtl. L_CroSeg ensures segmentation alignment through the Dice score between the cross-attention prediction and the source-domain labels. L_CroDtl facilitates feature transfer by computing the KL divergence between the target-domain predictions and the cross-attention output. This approach eliminates the need for paired data and improves Myo recognition across the bSSFP and LGE modalities, boosting accuracy.

R3Q2: Ambiguity in Formula Descriptions. In the L_FM formula, "j" and "m" denote the indices of positive and negative samples, respectively. The sample classifier is used to differentiate them, and "sp" refers to the JSD score of the sample classifier. In the L_DM formula, D_p and D_n represent the distribution predictions for positive and negative samples.
We will add the detailed derivation in the final version.
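Based on the authors' verbal description of the two alignment terms (L_CroSeg as a Dice score between the cross-attention prediction and the source-domain labels, L_CroDtl as a KL divergence between the target-domain prediction and the cross-attention output), a minimal sketch might look as follows. The code is not released, so everything here is an assumption for illustration: soft masks and class distributions are flattened to plain lists, and the function names `dice_loss` and `kl_div` are illustrative, not from the paper.

```python
import math

def dice_loss(pred, label, eps=1e-6):
    """L_CroSeg (sketch): 1 - soft Dice overlap between the
    cross-attention branch's prediction and the source-domain label mask.
    pred: soft foreground probabilities; label: binary ground truth."""
    inter = sum(p * g for p, g in zip(pred, label))
    total = sum(pred) + sum(label)
    return 1.0 - (2.0 * inter + eps) / (total + eps)

def kl_div(p, q, eps=1e-12):
    """L_CroDtl (sketch): KL divergence KL(p || q) between the
    target-branch class distribution p and the cross-attention
    distribution q, acting as a distillation signal."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))
```

Under this reading, both terms compare per-image predictions against other predictions (or labels) of the same input, which is consistent with the authors' claim that no paired cross-domain images are required.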
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A