Abstract

Learning to segment multiple organs from partially labeled medical image datasets can significantly reduce the burden of manual annotation. However, due to the large domain gap, learning from partially labeled datasets of different modalities has not been well addressed in the literature. In addition, the anatomic prior knowledge of various organs is spread across multiple datasets and needs to be utilized more effectively. This work proposes a novel framework for learning to segment multiple organs from multimodal partially labeled datasets (i.e., CT and MRI). Specifically, our framework constructs a cross-modal a priori atlas from training data, which implicitly contains prior knowledge of organ locations, shapes, and sizes. Based on the atlas, three novel modules are proposed to utilize the prior knowledge to address the joint challenges of unlabeled organs and inter-modal domain gaps: 1) to better utilize unlabeled organs for training, we propose an atlas-guided pseudo-label refiner network (APRN) to improve the quality of pseudo-labels; 2) we propose an atlas-conditioned modality alignment network (AMAN) for cross-modal alignment in the label space via adversarial training, forcing cross-modal segmentations of organs labeled in a different modality to match the atlas; and 3) to further align organ-specific semantics in the latent space, we introduce modal-invariant class prototype anchoring modules (MICPAMs) supervised by the atlas-guided refined pseudo-labels, encouraging domain-invariant features for each organ. Extensive experiments on both multimodal and monomodal partially labeled datasets demonstrate the superior performance of our framework over existing state-of-the-art methods and the efficacy of its components.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1759_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1759_supp.pdf

Link to the Code Repository

https://github.com/ccarliu/multimodal-PL

Link to the Dataset(s)

https://amos22.grand-challenge.org/Dataset/

BibTex

@InProceedings{Liu_Learning_MICCAI2024,
        author = { Liu, Hong and Wei, Dong and Lu, Donghuan and Sun, Jinghan and Zheng, Hao and Zheng, Yefeng and Wang, Liansheng},
        title = { { Learning to Segment Multiple Organs from Multimodal Partially Labeled Datasets } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a multi-organ segmentation framework that learns from multi-modal and partially labeled image data. The core idea is to construct an atlas for the target organs (by averaging the existing labels) before the training process. The built atlas is then used as prior knowledge to guide the network to refine the pseudo-labels and to reduce the domain gap between the learned features.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • A new problem is explored in this paper. Although multi-organ segmentation with partially labeled data is a well-studied problem, the incremental setting of multi-modal data makes it a new and currently understudied problem. Thus, this study is of great practical significance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The motivation of a key design in this work is not convincing. The authors propose an atlas-conditioned modality alignment network (AMAN) for cross-modal alignment via adversarial training and claim that the label spaces of different modalities can be aligned in this way. However, in my opinion, the label space is modality-agnostic: we cannot distinguish the source image modality of a label (or segmentation mask) without image information. So, why do we need to align the labels generated from different modalities, and how?

    • The design of a key component of the proposed method is problematic. Also regarding the design of the proposed AMAN module, the authors claim that “Given the domain gap between modalities, the intra-modal segmentation is expected to be better in quality and thus closer to the atlas than cross-modal segmentation.” I cannot agree with this statement. The atlas is constructed using multi-modal data; if there is a significant domain gap, images from any domain will be misaligned with the averaged atlas. Besides, I am confused about what a “cross-modal segmentation” is.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Section 2, “MICPAM” subsection: “… \bar{a}_c is initialized randomly.” Why not initialize \bar{a}_c with zeros? Is the random initialization better regarding the convergence speed?

    • Section 3: “200 CT (40 MRI) images into 162 (30) and 38 (10) subjects for training and testing.” A cross-validation strategy is required for the evaluation when the dataset is relatively small.

    • Section 3: “The loss weights in Eqn. (4) are empirically set to 1, 0.01, and 0.1 for lambda_1, lambda_2, and lambda_3, respectively.” The loss weight lambda_3 is set to a constant, which could make the training process unstable since the feature prototype is unreliable at the early stage of the training. A warm-up strategy for the loss weight lambda_3 could be better.

    • Section 3: In the experimental results, the “intra-“ and “cross-modal” settings are confusing. Since all the modalities are required for training the model, how to define the “intra-“ and “cross-modal” settings?

    • Fig. 3 and the figures in the supplementary material: Why is the spleen on the left side of the image (which is the right body side of the patient)?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Major flaws (such as unconvincing motivation and problematic methodology design) are found in this paper, which requires a substantial revision to fix. Considering the limited time and space for rebuttal, I recommend a rejection of this paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thanks for the authors’ efforts; my concerns have been largely addressed or clarified in the rebuttal. I think the major misunderstanding comes from the terminology of “intra-modal” and “cross-modal” used throughout the paper. The proposed method is trained using both CT and MRI images, and the testing images are also CT or MRI images. This is a typical intra-modal case since no unseen modality is involved at testing time. After the authors’ rebuttal, I found that “intra-modal” and “cross-modal” refer to the labels rather than the images. The authors are strongly suggested to clarify this in the camera-ready version to avoid potential misunderstanding. As a consequence, I raised my score from 2 (Reject) to 4 (Weak Accept).



Review #2

  • Please describe the contribution of the paper

    This paper presents a probabilistic-atlas-guided framework to segment multiple organs from multimodal partially labeled datasets. The presented method exploits a cross-modal a priori probabilistic atlas from training data using several modules to handle issues like inter-modal domain gaps (CT vs. MRI), improve the quality of pseudo-labels for unlabeled organs, and better utilize valuable information from unlabeled organs. The authors compare their framework against 9 state-of-the-art methods and show impressive segmentation improvements in both CT and MRI.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The presented method handles multi-modality data, makes use of partially labeled data, and works for multiple organs. The proposed probabilistic-atlas-guided framework shows improvement over existing methods due to its ability to handle multi-modal data and effectively use unlabeled data. The cross-modal a priori probabilistic atlas built from training data seems to capture rich prior knowledge about the morphology and spatial features of the organs, and the dedicated components exploit this knowledge well to generate quality segmentation results. Experiments are extensive, as the authors compare against 9 state-of-the-art methods and achieve the best performance in both CT and MRI cases. The writing is of good quality and the supplementary section is very strong in supporting the paper’s case.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    One of the glaring weaknesses of this method is that the alignment step is too simple. Because atlas formation depends on it and all modules of the framework are built around the created atlas, the authors might explore more robust registration methods for aligning the training volumes and labels, which would in turn improve atlas formation.

    Next, the authors compare against 9 methods but only in terms of 2 modalities of only 1 dataset. A more thorough study of additional datasets would make the work stronger.

    Similarly, it was not clear whether the weights in the loss function (equation 4), are empirically determined, hand-crafted, or implicitly learned in a supervised fashion.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors have stated that they will release the code. The implementation detail provided is enough to determine the reproducibility once the code is up.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Because atlas formation depends on good alignment of the training data and all modules of the framework are built around the created atlas, the authors might explore more robust registration methods for aligning the training volumes and labels, which would in turn improve atlas formation.

    Please specify whether the weights in the loss function (equation 4), are empirically determined, hand-crafted, or implicitly learned in a supervised fashion.

    I have criticism for only using 1 dataset for the study. A more thorough study of additional datasets would make the work stronger.

    Also, the authors use 2 modalities of the dataset, the number just enough to qualify the method as multi-modal. In the future, to fully justify the method for multi-modal (image) data, the authors might consider more modalities of medical image data.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method effectively uses both labeled and unlabeled data across modalities to segment multiple organs. Despite some weaknesses, the results provided are strong enough to justify its merits and the experiments are comprehensive with a very well-crafted supplementary section. The writing is of good quality and easy to follow (except for some places that need clarifications as mentioned above).

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I appreciate that the authors clarified the concerns I had. I accepted the paper for its merits and I am firm on my decision after the rebuttal too.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a well-devised method for improving pseudo-label quality and thereby the representations of abdominal organs across multiple modalities. The main contribution of this paper is an atlas-based model, with steps taken at each point to smoothly resolve problems as they arise.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper focuses on unpaired partially labeled datasets (in this case, CT and MRI) for improving semi-supervised “domain-generalized” segmentation
    • The paper proposes the construction of an atlas, which is well-verified through the supplemental figures (good job!)
    • The paper proposes 2-3 additional steps and modules to account for pseudo-label refining, modality alignment, and representation learning.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There were no solid weaknesses, however, please check Section 10.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The paper is very well organized and a delight to read and follow. The ablation studies cover in detail all the aspects this reviewer could think of.
    • A key point, which would be helpful in the future if not now, would be to also show some limitation cases where the final network did not perform as well as expected and some insights into why it may happen.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a novel framework for multiple-organ segmentation across different modalities. The paper is well-written and easy to follow, and solves an existing problem without resorting to overly fancy methods from the current computer vision literature. It was a delight to read, especially seeing how the work was done with only 3 V100s.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Strong Accept — must be accepted due to excellence (6)

  • [Post rebuttal] Please justify your decision

    The authors addressed most of the questions posed by the other reviewers and have streamlined the future work. This reviewer agrees that the paper does need a bit more work, but the novelty and result presentation, along with the rebuttal, make this paper worth getting into the conference.




Author Feedback

We thank the reviewers for 1) appreciating our work’s novelty, clarity, superior performance on an understudied new problem, and supplementary material, and 2) the constructive comments.

Q1 Limitation cases & insights (R3) We’ll include some limitation cases & discussion.

Q2 “Intra-” & “cross-modal” segmentation (R4) Following existing works like [35], we define the segmentation of an organ in modality A as intra-modal if the organ is labeled in modality A, or as cross-modal if it is not labeled in modality A but in B.

Q3 Domain gap & atlas construction (R4) In this work, the domain gap between modalities is predominantly in image appearance attributes like intensity, contrast, and texture; it has been a central problem of many medical image domain adaptation works [3,9,15]. Meanwhile, organs’ location, shape, and size are statistically consistent across modalities. Thus, we average each organ’s binary training labels (spatially registered across modalities) to obtain an organ-wise probabilistic atlas (Fig. S1). The organ-wise atlases of all organs compose the cross-modal atlas. Note the operations happen in the label space, which, as noted by R4, is modality-agnostic. So, despite the significant domain gap in appearance, images of different modalities statistically align with the atlas regarding organs’ location, shape, and size.
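
The organ-wise averaging described in the response above can be sketched as follows. This is a simplified illustration (function and variable names are our own, not from the paper): it assumes integer label volumes already registered to a common space, and treats a volume containing no voxels of organ c as not labeling that organ, which is how partial labels are skipped here.

```python
import numpy as np

def build_probabilistic_atlas(label_volumes, num_organs):
    """Average spatially registered labels into an organ-wise probabilistic atlas.

    label_volumes: list of integer arrays (registered to a common space),
        where voxel value c in 1..num_organs means organ c, 0 means background.
    Returns an array of shape (num_organs, *vol_shape) whose channel c-1 is
    the voxel-wise frequency of organ c across the subjects that label it.
    """
    shape = label_volumes[0].shape
    atlas = np.zeros((num_organs,) + shape, dtype=np.float64)
    counts = np.zeros(num_organs)  # how many subjects label each organ
    for vol in label_volumes:
        for c in range(1, num_organs + 1):
            mask = (vol == c)
            if mask.any():  # organ c is labeled in this (partial) volume
                atlas[c - 1] += mask
                counts[c - 1] += 1
    nz = counts > 0  # avoid dividing organs never labeled
    atlas[nz] /= counts[nz].reshape(-1, *([1] * len(shape)))
    return atlas
```

Because the averaging operates purely in the (modality-agnostic) label space, labels from CT and MRI subjects can be pooled into the same per-organ channels.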

Q4 Motivation of AMAN (R4) We agree the label space is modality-agnostic. Thus, we’ll rephrase our expression to be more rigorous: AMAN aligns the cross-modal segmentation softmax output with the atlas in the label space, instead of aligning the label space itself. As clarified in Q2 & Q3, the cross-modal segmentation output would be noticeably poorer than the intra-modal due to the domain gap, thus less aligned with the atlas. A discriminator judges whether a segmentation is cross- or intra-modal based on how well it aligns with the atlas. This discrimination is perceptive rather than objective. Meanwhile, the segmentation network produces cross-modal segmentation that better aligns with the atlas via adversarial training. The ablation study (Table 2) confirms AMAN’s efficacy. Lastly, cross-modal alignment of segmentation output in label space has been previously explored and validated, e.g., [29].
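
As a minimal numerical illustration of the adversarial objective described above (hypothetical scalar discriminator outputs are assumed here; the actual AMAN discriminator operates on segmentation softmax maps conditioned on the atlas): the discriminator learns to label intra-modal segmentations as 1 and cross-modal ones as 0, while the segmenter is trained so its cross-modal outputs are scored as intra-modal.

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy for a scalar probability p against target in {0, 1}."""
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p))

def discriminator_loss(d_intra, d_cross):
    """d_intra / d_cross: discriminator's probability that an intra-modal /
    cross-modal segmentation (conditioned on the atlas) is intra-modal."""
    return bce(d_intra, 1.0) + bce(d_cross, 0.0)

def adversarial_loss(d_cross):
    """Segmenter's adversarial term: make cross-modal output look intra-modal."""
    return bce(d_cross, 1.0)
```

At equilibrium the cross-modal segmentations align with the atlas as well as the intra-modal ones, so the discriminator can no longer tell them apart.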

Q5 Initialization of \bar{a}_c (R4) Experiments show zero and random initialization are similar in convergence speed (~400 vs. ~390 epochs) and results (cf. Table 1).

Q6 Validation strategy (R4) Thanks for suggesting cross-validation. However, we cannot incorporate it in this study due to rebuttal constraints. We’ll certainly consider it in future work.

Q7 Loss weight lambda_3 (R4) We concur that early unreliable feature prototypes could destabilize the training process. As an alternative to warming up lambda_3, we address this issue by updating the prototypes only with reliable features in correctly predicted regions. In addition, we use an EMA update. Our experiments show no notable signs of instability. We also tried warming up lambda_3 in our framework (linear increase from 0 over the initial 10% of epochs). The convergence speed (~400 epochs) and final results are similar to ours.
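
The reliable-region EMA update described in the response above can be sketched as follows (a minimal illustration with assumed names; the paper's exact reliability criterion and momentum value may differ): the class prototype is only moved by features at voxels where the prediction agrees with the refined pseudo-label.

```python
import numpy as np

def update_prototype(prototype, features, preds, labels, cls, momentum=0.99):
    """EMA-update the prototype of class `cls` using only reliable features.

    prototype: (feat_dim,) running prototype for class `cls`
    features:  (num_voxels, feat_dim) per-voxel feature vectors
    preds, labels: (num_voxels,) predicted / (pseudo-)label class indices
    """
    reliable = (preds == cls) & (labels == cls)  # correctly predicted region
    if not reliable.any():
        return prototype  # no reliable voxels this step; keep prototype as-is
    batch_proto = features[reliable].mean(axis=0)
    return momentum * prototype + (1.0 - momentum) * batch_proto
```

Filtering by agreement keeps early, noisy features out of the prototype, which is why a constant lambda_3 can remain stable without a warm-up schedule.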

Q8 Spleen position (R4) We’ll reorient the figures to follow clinical convention.

Q9 Alignment step (R5) We concur that the alignment is vital. However, even with the straightforward alignment, our method improves significantly over existing ones. We attribute the superiority to how we use the atlas: instead of directly using it for supervision, we design three modules that learn to use the uncertain prior in the atlas wisely. In addition, Table S1 indicates a high tolerance to the atlas’ quality. That said, more robust registration methods could improve atlas formation and potentially boost performance, which we envision as future work in the conclusion.

Q10 Use additional datasets & modalities (R5) We plan to do so when extending this work.

Q11 Loss weights in Eq. 4 (R5) They are empirically determined on validation data.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I have checked the reviews of this paper and there are no issues.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    I have checked the reviews of this paper and there are no issues.


