Abstract

Medical images span a wide range of imaging protocols and anatomical regions, exhibiting two fundamental properties: inter-organ diversity–where different organs exhibit distinct structural patterns (e.g., hand vs. chest)–and intra-organ consistency–where each organ retains a coherent structure with subtle variations across patients (e.g., left vs. right hand). While existing foundation models typically focus on a single organ or combine organs across heterogeneous modalities–often failing to jointly capture both properties–we envision that a model purposefully built on these fundamental properties would yield representations with greater generalizability, robustness, and interpretability. To this end, we introduce a general-purpose and scalable framework for learning foundation models from diverse organs within a given imaging modality. We call our framework Coda, as it is explicitly designed to jointly capture both the consistency and diversity of anatomical structures, encoding high-level semantic relationships across distinct organs and fine-grained anatomical details within each organ. Our experiments in zero-shot, few-shot transfer, and full-transfer settings show that Coda, pretrained on 23 diverse organs, learns semantically rich representations that not only yield strong inter-organ and intra-organ discrimination capabilities but also offer superior generalizability and robustness on diverse tasks.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0802_paper.pdf

SharedIt Link: https://rdcu.be/eHwTA

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04971-1_28

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{HosMoh_Learning_MICCAI2025,
        author = { Hosseinzadeh Taher, Mohammad Reza AND Hong, Junpyo AND Soni, Ravi AND Avinash, Gopal},
        title = { { Learning Foundation Models from Multi-Organ Medical Images by Capturing Consistency and Diversity of Anatomical Structures } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15964},
        month = {September},
        pages = {294 -- 304}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper proposed MORE, a SSL method to learn discriminative representations for both different types of X-rays and X-rays within the same type.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This paper conducted extensive experiments to evaluate the model performance.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The authors claimed that “intra-organ heterogeneity: each organ … with nuanced variations across different patients.” in the abstract. This claim cannot hold because normal and abnormal organs (e.g., healthy lung and lung with atelectasis) are often obviously different. Besides, heterogeneity means inconsistency, which contradicts the following “structural consistency”. The authors should enhance the soundness and clarity of the claim.
- To learn discriminative features within the same type of X-rays in Section 2-(2), this paper simply employed a cross-entropy loss, which only maximizes similarities between positive pairs and does not minimize similarities between negative pairs, an established objective in contrastive learning methods such as SimCLR. Besides, the cross-entropy loss only works for categorical distributions, not for image embeddings. The authors should revise the manuscript to present a reasonable implementation.
- The comparison methods are not state of the art. For example, DINO has its variants such as iBOT and DINOv2.
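
For reference, the contrastive objective mentioned in the second point can be sketched as follows. This is a minimal, one-directional InfoNCE loss in NumPy. It is a simplification of SimCLR's NT-Xent loss, not code from the paper, and the function names and temperature value are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def info_nce_loss(view_a, view_b, temperature=0.5):
    """One-directional InfoNCE loss.

    Row i of view_a and row i of view_b form the positive pair;
    every other row of view_b acts as a negative for row i of view_a.
    """
    a = l2_normalize(view_a)
    b = l2_normalize(view_b)
    logits = a @ b.T / temperature                        # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal of the similarity matrix.
    return float(-np.mean(np.diag(log_softmax)))

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 32))
loss_aligned = info_nce_loss(embeddings, embeddings)
# Shifting the rows by one breaks the positive-pair alignment.
loss_shuffled = info_nce_loss(embeddings, np.roll(embeddings, 1, axis=0))
```

The negatives enter through the softmax denominator, so lowering this loss raises positive-pair similarity and lowers negative-pair similarity at the same time.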
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposed technical contributions are not novel. Equation (2) does not adequately address the motivation.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

This rebuttal clarifies ‘intra-organ heterogeneity’ and justifies the non-contrastive form of equation (2), although CE should be applied on categorical distributions, not embeddings.

Regarding novelty, I still think that the idea of inter- and intra-class relationships is not new and the technical contributions are incremental.

Review #2

Please describe the contribution of the paper

The abstract introduces MORE, a novel SSL framework that models inter-organ diversity and intra-organ heterogeneity in radiograph images via a dual-branch contrastive objective. The approach is conceptually sound and supported by encouraging experimental results.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The major strengths are as follows: (a) The core idea of combining inter- and intra-organ structural modeling is interesting and novel, especially in the context of radiographs, where anatomical structure plays a key role. (b) The method is generally sound. (c) The results across zero-shot, few-shot, and transfer settings are strong, with clear gains over RadImageNet and LVM-Med. Overall, the abstract addresses an important and underexplored area in medical SSL. The idea is interesting, and the results are promising.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The weaknesses are as follows: (a) Authors need to clarify how organ labels are derived in a self-supervised setting; this affects reproducibility and generalizability. (b) They need to compare against additional baselines, including anatomy-guided few-shot models or patch-level metric learning approaches. (c) Need to analyze computational overhead, especially in comparison to DINO and LVM-Med. (d) Need to improve clarity in loss function definitions and architectural specifics (e.g., number of crops, scales, augmentation strategies). (e) The abstract focuses solely on radiographs but claims generalizability. It is unclear if the learned representations transfer to other modalities (e.g., CT, MRI). (f) Without cross-modality or domain shift experiments, generalization claims remain narrow. (g) Consider qualitative examples (e.g., segmentation masks) to better visualize the benefits of intra-organ discrimination. (h) Although the abstract claims improved interpretability, there is no explanation method (e.g., Grad-CAM, attention visualization, probing tasks) used to support this. Embedding space t-SNE plots are useful but not clinically actionable or interpretable in terms of medical decision-making. (i) All downstream tasks are classification and segmentation. MORE is claimed to be general-purpose, yet no detection, retrieval, or report generation tasks are included. (j) Tasks like pose estimation, anatomical landmark detection, or multi-label classification would better stress-test the claimed representation richness. (k) The abstract does not provide qualitative or quantitative analysis of where MORE fails (e.g., confusion between structurally similar organs). (l) A deeper look into misclassifications or poorly separated embeddings would improve transparency. Overall, the weaknesses can be addressed.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The abstract lacks clarity in some implementation aspects and certain methodological choices require stronger justification. While the idea is promising, the execution and presentation leave room for improvement. The weaknesses can be addressed.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

This paper introduces a novel self-supervised framework for organ representation learning. Through focusing on inter and intra-organ discrimination, their model learns generalizable representations that outperform existing methods in zero-shot, few-shot and full transfer learning on downstream tasks.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Novel learning objective of separate inter- and intra-organ discrimination.
- Strong performance on downstream tasks across multiple datasets.
- Excellent anatomical embedding properties: MORE captures high-level discriminative features between organs, and effectively extracts fine-grained anatomical details within each organ group.
- Results were across a diverse set of organs and tasks.
- MORE is data-efficient (few-shot learning results).
- Open-source availability.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Lack of discussions on the limitations of the work.
- Focus is on 2D radiographs: How well would this model scale to 3D data or other 2D modalities?
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(6) Strong Accept — must be accepted due to excellence
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The MORE framework addresses a key challenge by modeling both inter-organ diversity and intra-organ heterogeneity in radiographs. Their zero-shot analyses demonstrate that the embedding space preserves anatomical structures, while experiments across multiple learning settings show superior performance compared to current state-of-the-art models. The methodology details are sound and the results appear reproducible, with the empirical evidence strongly supporting the authors’ claims about enhanced representation learning capabilities.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

Review comments are addressed. Otherwise, it is a strong contribution to representation learning in medical images.

Author Feedback

-General comments (Meta-Reviewers, R2, R3, R4): We appreciate the reviewers’ positive feedback and valuable suggestions. We are pleased they highlighted the key strengths as: (1) novelty and soundness of the proposed method in jointly modeling inter- and intra-organ relations, addressing a key challenge in medical SSL; (2) comprehensive experiments across diverse tasks, demonstrating MORE’s superiority in zero-shot, few-shot, and full-transfer settings; and (3) strength of the learned embeddings in capturing intrinsic organ properties. Below, we address reviewers’ comments and clarify any potential misunderstandings.

-Generalizability to other modalities (R2 & R4): While radiography is the primary focus of this paper, MORE is modality-agnostic and can be readily extended to other modalities (e.g., CT/MRI scans of the chest, liver, etc.), as it makes no modality-specific assumptions. Sec. 4 outlines this extension as future work.

-Clarification on intra-organ heterogeneity (R3): As noted in Sec. 1, radiographs of the same organ (e.g., chest X-rays) exhibit globally consistent structures (e.g., lung anatomy), along with local, fine-grained cross-patient variations in size, shape, and pathology. These variations, referred to as intra-organ heterogeneity, complement rather than contradict global consistency within each organ class. We revised Sec. 1 to further clarify this concept.

-Intra-Organ loss (R3): We use a non-contrastive loss for its efficacy in learning discriminative features without negative pairs, as shown in recent SSL methods like DINO. Our design also mitigates semantic collision, a limitation of contrastive loss that can undesirably push apart anatomically similar samples. We revised Sec. 2 to clarify this design choice.
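
To make the non-contrastive design concrete, the following is a minimal NumPy sketch of a DINO-style teacher-student cross-entropy. It is an illustration under stated assumptions, not the paper's Equation (2): the temperatures, the centering term, and all function names are taken from the general DINO recipe or chosen for illustration. The sketch also shows that the cross-entropy is computed between categorical distributions obtained by applying a softmax to projection-head outputs, which is the point raised by the reviewer.

```python
import numpy as np

def softmax(logits, temperature):
    """Row-wise softmax with temperature scaling."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dino_style_loss(student_out, teacher_out, center,
                    student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy H(teacher, student), averaged over the batch.

    Both inputs are projection-head outputs of shape (batch, K). The softmax
    turns each K-dimensional output into a categorical distribution, so no
    negative pairs are needed. Temperatures are assumed, illustrative values.
    """
    teacher_probs = softmax(teacher_out - center, teacher_temp)
    student_log_probs = np.log(softmax(student_out, student_temp) + 1e-12)
    return float(-(teacher_probs * student_log_probs).sum(axis=1).mean())

rng = np.random.default_rng(0)
teacher_out = rng.normal(size=(4, 16))
center = np.zeros(16)  # DINO keeps a running mean here; zeros for simplicity

# A student that agrees with the teacher, and one with reversed preferences.
loss_matched = dino_style_loss(teacher_out.copy(), teacher_out, center)
loss_reversed = dino_style_loss(-teacher_out, teacher_out, center)
```

In this sketch the loss is low when the student's distribution agrees with the teacher's and high when it disagrees, without any term that explicitly pushes other samples apart.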

-Baselines (R3 & R4): As shown in the paper, MORE outperformed competitive baselines, including SOTA large-scale multi-organ models (RadImageNet & LVM-Med). Moreover, MORE surpasses patch-level methods (Medical MAE & iBOT) and DINOv2 (e.g. avg. gains of 10%, 7%, and 3% on TB & nodule classification, & clavicle segmentation).

-MORE’s novelty (R3): The novelty of MORE arises not only from the fact that our idea has not been reported in the literature, but more importantly from its key insights into jointly capturing inter-organ relations and fine-grained intra-organ variations, yielding semantics-rich representations with enhanced robustness and generalizability.

-Clarification on organ labels (R4): Radiographs of different anatomical regions originate from distinct acquisition protocols, naturally leading to their separation when curated for research. Thus, the acquisition protocol serves as a proxy for organ identity, removing the need for manual labels.
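
As a rough illustration of using the acquisition protocol as a proxy label, the sketch below maps protocol strings to integer class ids. The record layout, the `protocol` field name, and the example values are hypothetical; the actual curation pipeline is not described in this rebuttal.

```python
def assign_organ_pseudo_labels(records):
    """Map each record's acquisition-protocol string to an integer class id.

    `records` is a list of dicts with a hypothetical "protocol" key.
    Protocol names are normalized (trimmed, lower-cased) before grouping.
    """
    def normalize(name):
        return name.strip().lower()

    protocols = sorted({normalize(r["protocol"]) for r in records})
    protocol_to_id = {name: idx for idx, name in enumerate(protocols)}
    labels = [protocol_to_id[normalize(r["protocol"])] for r in records]
    return labels, protocol_to_id

# Hypothetical example records.
records = [
    {"image": "img_001.png", "protocol": "Chest PA"},
    {"image": "img_002.png", "protocol": "Hand"},
    {"image": "img_003.png", "protocol": "chest pa"},
]
labels, mapping = assign_organ_pseudo_labels(records)
```

Here the two chest records share one class id and the hand record receives another, with no manual annotation involved.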

-Metric learning & few-shot models (R4): MORE aims to learn generic representations to serve as a versatile pretrained model for radiography; thus, the most relevant baselines for fair evaluations are supervised/SSL pretrained models. While few-shot and metric learning methods are beyond this paper’s scope, MORE holds the potential to improve them as a strong initialization.

-Computational efficiency (R4): MORE is lightweight, with a shared student, frozen teachers, and small projection heads.

-Implementation details (R4): All requested details are in Sec. 3.

-Domain shift experiments (R4): Our target tasks present significant domain shifts in terms of datasets, organs, diseases, and tasks, highlighting MORE’s generalizability.

-Interpretability (R4): MORE, with zero labels, learns an embedding space aligned with human-understandable anatomical concepts (Fig. 2), supporting its interpretability and potential for clinical tasks like organ retrieval and landmark detection.

-Other evaluations (R4): Given space constraints, we focus on classification & segmentation, the most common tasks for evaluating transfer learning in medical imaging. Suggested evaluations will be included in the journal version.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

Three reviewers recommended acceptance of the paper following the rebuttal, citing a strong methodological contribution, competitive empirical performance, and relevance to medical representation learning.

Reviewer #1 gave a strong accept, praising the novel formulation of inter- and intra-organ discrimination in the self-supervised framework (MORE), its consistent improvements across zero-shot, few-shot, and transfer learning tasks, and its robust anatomical embeddings. The reviewer appreciated the clarity of writing, broad evaluation across datasets, and open-source commitment. Remaining concerns—such as the focus on 2D radiographs and lack of limitations discussion—were considered minor after rebuttal.

Reviewer #2 initially raised concerns about the validity of the intra-organ heterogeneity claim, the use of cross-entropy loss in a non-categorical context, and limited baseline comparisons. However, the rebuttal clarified these points and justified the implementation choices. While the reviewer maintained that the contributions were incremental, they accepted the paper based on clarified reasoning and empirical soundness.

Reviewer #3 highlighted the novelty of modeling anatomical structure via inter- and intra-organ objectives and the strong performance gains over prior methods. While pointing out areas for improvement—such as missing details on label derivation, lack of modality generalization, limited task diversity, and absence of interpretability analysis—they concluded the weaknesses were addressable and supported acceptance.

In summary, the paper was accepted for its novel and well-motivated framework, strong experimental results, and relevance to medical image representation learning, with reviewers encouraging further development in generalizability, interpretability, and broader task evaluation.

