Abstract

Performance of deep learning segmentation models is significantly challenged in its transferability across different medical imaging domains, particularly when aiming to adapt these models to a target domain with insufficient annotated data for effective fine-tuning. While existing domain adaptation (DA) methods propose strategies to alleviate this problem, these methods do not explicitly incorporate human-verified segmentation priors, compromising the potential of a model to produce anatomically plausible segmentations. We introduce RL4Seg, an innovative reinforcement learning framework that reduces the need to otherwise incorporate large expertly annotated datasets in the target domain, and eliminates the need for lengthy manual human review. Using a target dataset of 10,000 unannotated 2D echocardiographic images, RL4Seg not only outperforms existing state-of-the-art DA methods in accuracy but also achieves 99% anatomical validity on a subset of 220 expert-validated subjects from the target domain. Furthermore, our framework’s reward network offers uncertainty estimates comparable with dedicated state-of-the-art uncertainty methods, demonstrating the utility and effectiveness of RL4Seg in overcoming DA challenges in medical image segmentation.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2284_paper.pdf

SharedIt Link: https://rdcu.be/dV51a

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72114-4_23

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2284_supp.pdf

Link to the Code Repository

https://github.com/arnaudjudge/RL4Seg

Link to the Dataset(s)

https://www.creatis.insa-lyon.fr/Challenge/camus/index.html

BibTex

@InProceedings{Jud_Domain_MICCAI2024,
        author = { Judge, Arnaud and Judge, Thierry and Duchateau, Nicolas and Sandler, Roman A. and Sokol, Joseph Z. and Bernard, Olivier and Jodoin, Pierre-Marc},
        title = { { Domain Adaptation of Echocardiography Segmentation Via Reinforcement Learning } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        pages = {235 -- 244}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces RL4Seg, a novel reinforcement learning (RL)-powered domain adaptation framework designed for echocardiography image segmentation. The approach specifically focuses on enhancing segmentation performance by ensuring anatomically plausible segmentations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper does address a meaningful problem.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Despite the innovative approach, the evaluations presented are weak and fail to convincingly demonstrate that the proposed RL-powered segmentation method provides performance improvements over established baselines.

    Comments:

    1. The evaluations could be improved. Given the aim to address anatomically invalid results in the target domain, it’s unclear why the process does not stop at Step 1, where segmented images using the pre-trained policy are refined with the approach cited in Reference 15. There is no evidence provided to justify the necessity for subsequent steps, including the RL approach.

    Using Anatomical Metrics to provide specific anatomical knowledge of the target domain would give the proposed approach an unfair advantage over other baselines. It would be insightful to determine if the improved performances over the baselines were a result of this advantage through ablation studies.

    It would be better to showcase the presence of a domain shift in the selected datasets through statistical distribution comparisons between source and target domains, strengthening the rationale behind domain adaptation efforts.

    2. The overall presentation of the manuscript could be improved.

    Details on constructing the Reward Dataset Dr are sparse. It would be better to explain the process in detail by referring to the flow illustrated in Figure 1 along with the terms used. It seems the output’s validity is defined without using anatomical metrics, with left and right flows beginning from the resulting map. If these are due to the perturbations used, the image should be updated to reflect that.

    Explanations within the manuscript should be more self-contained rather than relying on external sources like Reference 15. Providing at least an overview and the fundamental logic, even when external references are involved for key methodologies, would significantly enhance clarity and accessibility.

    Finally, the manuscript would benefit from a more focused discussion on essential elements like the Reward Dataset, rather than extensive coverage of well-known PPO methodologies. This would help clarify the unique contributions and central innovations of the study.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    na

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please refer to section 6

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please refer to section 6

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The rebuttal addressed some of my concerns but not all



Review #2

  • Please describe the contribution of the paper

    This manuscript introduces RL4Seg, a reinforcement learning framework designed to address domain adaptation challenges in medical image segmentation, specifically for echocardiography.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) This manuscript is well motivated. 2) RL4Seg incorporates human-validated segmentation priors to enhance anatomical plausibility without requiring extensive expert annotation in the target domain. 3) The design of incorporating RL to address domain adaptation challenges in medical image segmentation is interesting to me.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Regarding RL4Seg, many crucial details of the experimental design have not been clearly presented.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    no.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) As the key element of RL, the descriptions regarding the design of reward, action, etc., are not clear enough. Also, the mechanism of incorporating anatomical prior is not clear enough. Consider including appropriate images for illustration.

    2) Compared to the method of incorporating anatomical prior into MIS via reward function in RL (e.g., PMID: 33789178), what is RL4Seg’s actual distinction?

    3) In Sec.2.3, how are the “perturbations” designed? How is it ensured that such perturbations are consistent with real conditions or the actual data distribution?

    4) Why do the authors choose PPO for policy optimization? Can it be replaced?

    5) How is the degree of anatomical plausibility evaluated for echocardiography in this task?

    6) How can uncertainty information be utilized to aid the learning of MIS models? This mechanism has not been well described.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The motivation is persuasive and the proposed task and design are of clinical value. However, the description of the design of the key components of RL4Seg is not clear enough.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper presents a novel reinforcement learning-based domain adaptation framework for echocardiography image segmentation. It specifically addresses the challenge of adapting segmentation models to new domains without extensive annotated data. RL4Seg integrates an RL strategy that ensures anatomical accuracy by using a reward mechanism based on expert-validated priors, eliminating the need for manual annotation. The method achieves superior segmentation accuracy and anatomical validity over existing domain adaptation techniques and incorporates an effective uncertainty estimation framework.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors introduce an innovative approach to domain adaptation by applying RL to the problem of echocardiography image segmentation; this is particularly interesting as it diverges from traditional methods like fine-tuning with pseudo-labels or using unsupervised or semi-supervised learning approaches. They use RL to directly optimize the segmentation model against a reward function that assesses anatomical validity, ensuring that the segmentations are not only accurate in a pixel-wise sense but also anatomically plausible.

    • RL4Seg reduces reliance on large annotated datasets in the target domain by using a self-supervised learning method

    • The framework is validated on a very large dataset and achieves high anatomical validity

    • Another strength of the paper is the integration of an uncertainty estimation framework within the RL4Seg model.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • While RL4Seg minimizes the need for annotated data in the target domain, it relies on pseudo-labels generated from the model’s predictions, which could introduce biases or errors if the initial predictions are poor

    • The use of reinforcement learning and multiple neural networks (segmentation, reward, and uncertainty estimation networks) might result in high computational costs and extended training times. This aspect was not discussed much, and the scalability of the method to even larger datasets or real-time applications could be a concern.

    • The application to echocardiography images is well-executed; however, the discussion on generalizability to other medical imaging modalities is somewhat limited. It would enhance the paper to include either a theoretical discussion or practical tests of how RL4Seg could be adapted or would perform with other types of medical images, such as MRI or CT scans. Details on potential challenges and adaptations needed for other modalities would provide a more comprehensive view of the framework’s versatility.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The use of multiple complex models in RL4Seg likely entails significant computational overhead. A detailed analysis of the computational costs, training times, and hardware requirements would be beneficial for readers. Comparisons with the computational demands of traditional methods could also highlight the efficiency or costliness of the method.

    • The integration of anatomical metrics within the reward system is a component of your method’s success. Providing a more detailed explanation illustrating how these metrics influence the learning process could help better understand and replicate your approach.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The application of RL for domain adaptation in echocardiography segmentation is novel and addresses a significant challenge in medical imaging, which is the transferability of models across different datasets with limited annotated data. The method was evaluated on a large dataset and the paper claims high accuracy and anatomical validity.

    Expanding the discussion on how this framework could be adapted to other imaging modalities or a broader set of clinical conditions would significantly strengthen the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors addressed most of the concerns




Author Feedback

R1 1 Why the process does not stop at Step 1? [15] does not improve segmentation accuracy, but focuses on segmentation consistency. Training with RL4Seg on the target data substantially improves the Dice and anatomical validity at each iteration.

2 Anatomical Metrics (AM) give an unfair advantage over baselines. To our knowledge, our approach is unique in its utilization of AM with RL (only during training). The baselines we compared to thus do not include such concepts.

3 Statistical shift between datasets. US images from different manufacturers have very different visual signatures, causing important domain shifts. This was confirmed by the Fréchet inception distance between the source and target domains being 10x higher than the intra-domain distance.

4 More details on Reward Dataset […] the output validity defined without AM. The output validity is computed with the AM (Sec.2.3-Dr), which determines the choice of left/right flow. In the correction branch (left), the initial segmentation is invalid and the corrected version is considered valid. In the perturbation branch (right), the initial segmentation is valid, and the perturbed segmentations are always considered invalid. Regardless of the branch taken, tuples added to Dr are formed of input images, invalid segmentations and the pixel-wise error between the valid/invalid pairs.

5 Paper shouldn’t rely on external sources [15] + overview of the method. Due to space limits, we briefly outline methods like [15] (Sec.2.3-Dr) since RL4Seg’s methodological content does not depend on the implementation of [15], but on the general concept of spatio-temporal post-processing correction.
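
The construction of Dr described in answer 4 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `is_valid`, `correct`, and `perturb` are hypothetical callables standing in for the anatomical-metric check, the post-processing correction of [15], and the perturbation step, and the pixel-wise error is assumed here to be a binary disagreement map between the valid and invalid segmentations.

```python
import numpy as np

def build_reward_tuples(image, segmentation, is_valid, correct, perturb):
    """Return (image, invalid_segmentation, error_map) tuples for one image.

    is_valid, correct and perturb are hypothetical placeholders:
      is_valid(seg)       -> bool, anatomical-metric check
      correct(seg)        -> corrected segmentation, treated as valid
      perturb(image, seg) -> list of degraded segmentations, treated as invalid
    """
    if is_valid(segmentation):
        # Perturbation branch: the valid prediction is the reference and
        # every perturbed version is considered invalid.
        pairs = [(segmentation, bad) for bad in perturb(image, segmentation)]
    else:
        # Correction branch: the prediction is invalid and its corrected
        # version is considered valid.
        pairs = [(correct(segmentation), segmentation)]

    # Assumed error definition: 1 where the valid and invalid maps disagree.
    return [(image, bad, (good != bad).astype(np.uint8)) for good, bad in pairs]
```

In both branches the stored segmentation is the invalid one, which matches the statement that tuples pair input images with invalid segmentations and their error maps.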

R3 1 RL, reward, action, and AM not clear enough. Cf. answer to R1Q4.

2 RL4Seg’s distinction compared to SOTA? As opposed to other works, RL4Seg is a new self-supervised segmentation RL formalism (Sec.2.2) for domain adaptation without target domain ground truth. PMID 33789178 requires ground truth data to train trajectories of refining predictions on the same image.

3 How perturbations are designed? Like data augmentation, we apply small perturbations to remain close to the actual data distribution while increasing variety. Perturbations (1-6% noise on model weights, 5-10% Gaussian noise on images, image contrast reduction by 20-30%) are enumerated in the ablation study in supp mat.

4 Why PPO? RL4Seg is inspired by ChatGPT, which uses PPO. Its advantages are the KL term (which prevents the policy from drifting away) and its high sample efficiency. Other policy-based methods may work, but because segmentation RL has trajectories of length 1 and a very large action space, value-based or model-based methods are less suitable.

5 How to evaluate anatomical plausibility? With the 10 anatomical metrics (cf. supp. mat)

6 How does uncertainty information help? Uncertainty estimation is a by-product of our framework and is not used to improve segmentation results in the RL loop. We are currently exploring its potential for an upcoming journal paper.
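
Two of the answers above lend themselves to a short illustration: the image perturbations of answer 3 and the PPO objective of answer 4. The sketch below is hedged throughout. The percentage ranges come from the rebuttal, but their exact meaning (noise standard deviation relative to the intensity range, contrast as deviation from the mean) is an assumption, and the weight-noise perturbation is left out. The loss is the standard PPO clipped surrogate, with `beta` weighting a simple sample-based KL estimate as one assumed form of the KL term the authors mention. All function and parameter names are hypothetical.

```python
import numpy as np

def add_gaussian_noise(image, rng, low=0.05, high=0.10):
    """Add zero-mean Gaussian noise with std of 5-10% of the intensity range (assumed meaning)."""
    level = rng.uniform(low, high)
    span = float(image.max() - image.min()) or 1.0  # avoid a zero range on flat images
    return image + rng.normal(0.0, level * span, size=image.shape)

def reduce_contrast(image, rng, low=0.20, high=0.30):
    """Shrink deviations from the mean intensity by 20-30% (assumed meaning)."""
    factor = 1.0 - rng.uniform(low, high)
    mean = image.mean()
    return mean + factor * (image - mean)

def ppo_loss(logp_new, logp_old, advantage, clip=0.2, beta=0.0):
    """Clipped PPO surrogate plus an optional KL penalty, averaged over samples.

    With trajectories of length 1 there is no discounting over time steps:
    each sample contributes a single log-probability and a single advantage.
    """
    logp_new, logp_old, advantage = map(np.asarray, (logp_new, logp_old, advantage))
    ratio = np.exp(logp_new - logp_old)
    surrogate = np.minimum(ratio * advantage,
                           np.clip(ratio, 1.0 - clip, 1.0 + clip) * advantage)
    # Sample-based estimate of KL(old || new) for actions drawn from the old policy.
    kl_estimate = np.mean(logp_old - logp_new)
    return float(-np.mean(surrogate) + beta * kl_estimate)
```

For example, with `rng = np.random.default_rng(0)`, calling `reduce_contrast(image, rng)` keeps the mean intensity and scales the standard deviation by a factor between 0.7 and 0.8, while `ppo_loss` caps the gain from a positive advantage once the probability ratio exceeds `1 + clip`.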

R4 1 Poor initial predictions. A good initial policy generating decent predictions is key to the success of RL4Seg. However, poor predictions are filtered out by the anatomical metrics, mitigating the risk of biases induced by the initial model (cf. R1Q4). Despite a domain shift between the source and target data (cf. R1Q3), RL4Seg’s performance is excellent, underlining its generalizability.

2 Computational costs. Processing 10K images requires 5 hours on an NVIDIA 3090 GPU, which is reasonable for such database size, and a significant reduction in time compared to manual annotation by an expert, which is the grand objective of our work.

3 How AM influences learning/generalizability to other modalities. RL4Seg enables the policy to learn AM implicitly through the reward network, preventing deviation from valid results during training. [16] demonstrates that such metrics are effective across US and MR cardiac images, indicating that our method can suit other modalities.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    After rebuttal, all reviews are positive.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    After rebuttal, all reviews are positive.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All reviewers agreed to accept this paper after rebuttal.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    All reviewers agreed to accept this paper after rebuttal.


