Abstract
Clinicians usually combine information from multiple sources to achieve the most accurate diagnosis, and this has sparked increasing interest in leveraging multimodal deep learning for diagnosis. However, in real clinical scenarios, due to differences in incidence rates, multimodal medical data commonly face the issue of class imbalance, which makes it difficult to adequately learn the features of minority classes. Most existing methods tackle this issue with resampling or loss reweighting, but they are prone to overfitting or underfitting and fail to capture cross-modal interactions. Therefore, we propose a Curriculum Learning (CL) framework for Imbalanced Multimodal Diagnosis (CLIMD). Specifically, we first design a multimodal curriculum measurer that combines two indicators, intra-modal confidence and inter-modal complementarity, to enable the model to focus on key samples and gradually adapt to complex category distributions. Additionally, a class distribution-guided training scheduler is introduced, which enables the model to progressively adapt to the imbalanced class distribution during training. Extensive experiments on multiple multimodal medical datasets demonstrate that the proposed method outperforms state-of-the-art approaches across various metrics and excels in handling imbalanced multimodal medical data. Furthermore, as a plug-and-play CL framework, CLIMD can be easily integrated into other models, offering a promising path for improving multimodal disease diagnosis accuracy. Code is publicly available at https://github.com/KHan-UJS/CLIMD.
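The abstract (and Review #2 below) describes the scheduler as gradually moving the training distribution from uniform toward the real, long-tailed class distribution. The paper's exact formulation is not reproduced on this page; the following is a minimal illustrative sketch of that idea, in which the function name, the interpolation form, and the use of a pacing exponent γ are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def scheduled_class_distribution(class_counts, t, T, gamma=1.0):
    """Sketch of a class distribution-guided scheduler (hypothetical).

    Interpolates per-class sampling weights from a uniform distribution
    at step t=0 toward the empirical (imbalanced) class distribution at
    step t=T, so the model first sees balanced batches and is gradually
    exposed to the true long-tailed distribution.
    """
    counts = np.asarray(class_counts, dtype=float)
    empirical = counts / counts.sum()                  # long-tailed target
    uniform = np.full_like(empirical, 1.0 / len(empirical))
    alpha = (t / T) ** gamma                           # pacing in [0, 1]
    dist = (1.0 - alpha) * uniform + alpha * empirical
    return dist / dist.sum()

# Example: 3 classes with counts 900/90/10.
# Early training samples classes near-uniformly; by t=T the weights
# match the empirical frequencies (0.9, 0.09, 0.01).
early = scheduled_class_distribution([900, 90, 10], t=0, T=10)
late = scheduled_class_distribution([900, 90, 10], t=10, T=10)
```

Under this sketch, γ controls the pacing: γ > 1 keeps training closer to uniform for longer, while γ < 1 moves toward the imbalanced distribution earlier.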
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4207_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/KHan-UJS/CLIMD
Link to the Dataset(s)
https://www.cancer.gov/ccg/research/genome-sequencing/tcga
BibTex
@InProceedings{HanKai_CLIMD_MICCAI2025,
author = { Han, Kai and Lyu, Chongwen and Ma, Lele and Qian, Chengxuan and Ma, Siqi and Pang, Zheng and Chen, Jun and Liu, Zhe},
title = { { CLIMD: A Curriculum Learning Framework for Imbalanced Multimodal Diagnosis } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15974},
month = {September},
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes a curriculum learning framework for imbalanced multimodal diagnosis, designs the corresponding measurer and scheduler, and demonstrates its superiority through experiments.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- This paper proposes a novel curriculum learning based framework, the first for imbalanced multimodal diagnosis, offering a new approach.
- The authors design a multimodal curriculum measurer and class-distribution guided scheduler, enabling the model to adapt to imbalanced data.
- The experimental results demonstrate that the proposed method can achieve a superior performance.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Validated on only two datasets, its performance in broader medical domains remains unproven.
- Empirically setting hyperparameters like \gamma may not be optimal and could vary across different scenarios.
- Lacks in-depth theoretical analysis on why the proposed methods work, relying mainly on experimental results.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Please see the weaknesses.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
This paper proposes CLIMD to address the class imbalance problem in multi-modal diagnosis. To achieve this, CLIMD designs a multi-modal curriculum measurer combining intra-modal confidence and inter-modal complementarity to gradually adapt to complex category distributions, and a class distribution-guided training scheduler to progressively adapt to the imbalanced class distribution. The proposed method is evaluated on the Multimodal Liver Lesion Classification and Breast Invasive Carcinoma datasets, demonstrating favorable improvements compared with other SOTA multi-modal diagnosis networks and methods designed for class imbalance.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
S1: This paper leverages curriculum learning to address the class imbalance issue in multi-modal diagnosis, which is of interest to the community. S2: The core idea of training the model progressively from balanced to imbalanced data by gradually transitioning from a uniform distribution to a long-tail distribution has good intuition, and its effectiveness is verified through experiments. S3: The paper is well-structured and easy to follow.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
W1: This paper lacks comparison with other multi-modal curriculum learning works; for example, [R1] first introduced the curriculum learning idea into the imbalanced data learning problem, and [R2] also proposed an intra- and inter-modal curriculum framework for multi-modal learning. W2: CLIMD is said to act as a plug-and-play CL framework that can be easily integrated into other models; however, this paper only combines it with several models that are not the top performers in Table 1. How does it perform when integrated into the second-best models? What's more, for the methods specifically designed for class imbalance, like WCE Loss and Focal Loss mentioned in Table 2, can CLIMD achieve complementary gains? In other words, are there any limitations in the use of CLIMD? [R1] Wang, Yiru, et al. “Dynamic curriculum learning for imbalanced data classification.” ICCV, 2019. [R2] Zhou, Yuwei, et al. “Intra- and inter-modal curriculum for multimodal learning.” ACM MM, 2023.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I think the exploration of curriculum learning in the multi-modal class imbalance problem is of great value. My major concern is about the effectiveness: the paper lacks comparison with other CL-based works in similar fields. Besides, the plug-and-play utility needs further discussion.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The paper proposes a curriculum learning method to mitigate class imbalance in multimodal diagnosis that achieves SoTA performance on two benchmarks.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- A curriculum learning paradigm that tackles class imbalance adaptively and progressively
- SoTA performance on two datasets, outperforming existing multimodal imbalance learning methods
- Well developed and clearly written
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Limited clinical value. While boosting performance, the proposed method does not directly resolve a clinical problem or lead to immediate improvement in clinical practice. The proposed method could have broader applicability beyond clinical datasets.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Despite concerns regarding clinical value, this paper presents solid research in tackling multimodal learning with imbalanced data. It has the potential to benefit future research in the community.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
The authors would like to sincerely thank the chairs and all reviewers for their thoughtful and constructive feedback. We have carefully responded to all the raised concerns in our rebuttal.
Clinical Value (R2, R3) We would like to clarify a potential misunderstanding. Although CLIMD is not designed to directly address clinical diagnostic tasks, it mitigates a fundamental limitation of current medical AI systems. Specifically, it alleviates learning bias caused by severe class imbalance and modality-specific noise. By improving robustness and fairness in multi-modal learning, CLIMD contributes to the development of more reliable medical AI tools. Additionally, CLIMD is model-agnostic and can be seamlessly integrated into a wide range of multi-modal medical scenarios and other domains.
Validation on Broader Datasets and Generalization (R2, R3) We acknowledge the concern regarding limited dataset diversity. Our current study includes two representative multi-modal medical datasets, MLLC and BRCA, which differ in disease types, modality compositions, label granularities, and data scales. These datasets reflect two real-world scenarios, one combining imaging and clinical features and the other involving multi-omics data. We agree that more comprehensive validation across diverse datasets is important and plan to expand this in future work and journal versions.
Theoretical Foundation of CLIMD (R2) CLIMD is inspired by curriculum learning and self-paced learning frameworks, where training samples are progressively selected based on estimated difficulty. In CLIMD, difficulty is computed from predictive uncertainty and modality reliability, which aligns with cognitive learning principles. While the current work emphasizes empirical validation, we plan to carry out formal theoretical analysis, such as convergence properties and generalization guarantees, in future extensions.
Additional Experiments and Method Comparisons (R2, R4) The hyperparameter γ was empirically selected from a reasonable range, guided by previous studies. Our goal was to ensure stable performance across different datasets. In future work, we will explore adaptive or self-tuning strategies for γ to further improve generalizability. We also thank the reviewers for highlighting relevant works such as DCL [R1] and IIMCL [R2]. While these methods are valuable, they focus on different settings. Nevertheless, we acknowledge the lack of direct comparison and will conduct more comprehensive evaluations with a broader range of curriculum learning methods in future extensions.
Plug-and-Play Design and Compatibility with Other Methods (R4) We thank the reviewer for highlighting the generalizability of CLIMD. In our experiments, we intentionally selected representative and commonly used fusion baselines to emphasize the effectiveness of CLIMD without the confounding effects of complex architectures. Although stronger backbones could potentially improve performance, our goal is to isolate the contribution of the curriculum strategy itself. Furthermore, CLIMD is orthogonal to loss functions such as Focal Loss and Weighted Cross-Entropy and can be used in conjunction with them. We plan to explore these combinations further in future journal extensions to achieve greater performance improvements.
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A