Abstract

Existing landmark detection methods are primarily designed for centralized learning scenarios where all training data and labels are complete and available throughout the entire training phase. In real-world scenarios, training data may be collected sequentially, covering only part of the region of interest or providing incomplete landmark labels. In this work, we propose a novel continual reinforcement learning framework to tackle this complex situation in landmark detection. To handle the increasing number of landmark targets during training, we introduce a Q-learning network that takes both observations and prompts as input. The prompts are stored in a buffer and utilized to guide the prediction for each landmark, enabling our method to adapt to the intricacies of the data collection process. We validate our approach on two datasets: the RSNA-PBA dataset, representing scenarios with complete images and incomplete labels, and the WB-DXA dataset, representing situations where both images and labels are incomplete. The results demonstrate the effectiveness of the proposed method in landmark detection tasks with complex data structures. The source code will be available from https://github.com/kevinwolcano/CgCRL.
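As a rough illustration of the idea in the abstract (a Q-network that takes both an observation and a landmark-specific prompt as input), here is a minimal NumPy sketch. All names, dimensions, and the four-action move set are hypothetical assumptions; this is not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: flattened observation patch, prompt patch, hidden layer,
# and four movement actions (up/down/left/right).
OBS_DIM, PROMPT_DIM, HIDDEN, N_ACTIONS = 64, 64, 32, 4

# Random weights stand in for a trained network.
W1 = rng.normal(scale=0.1, size=(OBS_DIM + PROMPT_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_ACTIONS))

def q_values(obs, prompt):
    """One agent step: Q-values conditioned on a landmark-specific prompt."""
    x = np.concatenate([obs, prompt])  # the prompt says WHICH landmark to seek
    h = np.maximum(x @ W1, 0.0)        # ReLU hidden layer
    return h @ W2                      # one Q-value per movement action

obs = rng.normal(size=OBS_DIM)
q_a = q_values(obs, rng.normal(size=PROMPT_DIM))  # prompt for landmark A
q_b = q_values(obs, rng.normal(size=PROMPT_DIM))  # prompt for landmark B
assert q_a.shape == (N_ACTIONS,)
assert not np.allclose(q_a, q_b)  # same view, different prompt, different output
```

Conditioning a single network on a prompt, rather than training one model per landmark, is what allows new landmark targets to be added during training without growing the set of output heads.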

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1410_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1410_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Wan_Contextguided_MICCAI2024,
        author = { Wan, Kaiwen and Wang, Bomin and Wu, Fuping and Gong, Haiyu and Zhuang, Xiahai},
        title = { { Context-guided Continual Reinforcement Learning for Landmark Detection with Incomplete Data } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a principled method for landmark detection from incomplete data. Continual reinforcement learning is used to support the scenario of sequentially collected training data. The results are competitive, the method is flexible, and the evaluation on two distinct datasets highlights its potential to generalize.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is written very well, with a clear problem formulation, methodology, and system architecture description.
    • The supplementary material is very helpful; the pseudo-code is particularly important in ensuring the reproducibility of the study.
    • The level of detail included in the method description and experiments is excellent; very easy to follow.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I am quite worried about the practical relevance of this work, especially given the motivation statement by the authors: “…existing landmark detection algorithms are built on the assumptions of complete and centralized training data.”:

    1. A simple online search for “landmark detection incomplete” yields, as the third result, this paper from 2017: https://link.springer.com/chapter/10.1007/978-3-319-66182-7_23 In this paper, and a subsequent journal version, the problem of landmark detection in incomplete data is analyzed at length and a robust, highly accurate solution is proposed. I am worried that this article, and other relevant articles that have studied the same problem, are neither referenced nor discussed in this study.
    2. While the “continual” aspect is quite interesting from a theoretical/technical point of view, I am wondering about the practical relevance. In practice, for this landmark detection problem, retraining a single model with an updated dataset, or using individual models per landmark is more than sufficient - especially when considering productization and regulatory release.

    My recommendation is that a thorough literature analysis be performed. The paper, and its contributions, should be positioned in light of these past studies and the mentioned practical considerations.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    There is a high level of detail in the methodology. In particular, the pseudocode in the supplementary material is very helpful to understand in detail and reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Please see my comment about the opening statement of the paper that motivates this work: “existing landmark detection algorithms are built on the assumptions of complete and centralized training data.”: I believe it is very important to revise this, considering related work in this space and by studying the practical need for a solution that supports continual learning. This would allow a clear reformulation and articulation of the contributions of this work.
    • Have any evaluations been performed on 3D data? How are the dynamics in 3D space, would the method easily generalize - or are there new challenges that arise?
    • How is the effect of multi-scale processing?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please see my main review. The paper proposes an elegant method for landmark detection in incomplete data, with support for continual learning. Paper is very well written, level of detail is high. I am, however, concerned about: 1) Highly relevant works that address the same problem and are easy to find online are not discussed, nor referenced; 2) The continual learning aspect is in my opinion not important for such landmark detection application, considering how solutions are built, released and deployed to clinics. Please see my review for more details.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    I would like to thank the authors for the clarifications!

    Unfortunately the rebuttal is missing clear action items to address the points highlighted in the review, e.g., a proper review of literature on landmark detection in incomplete data, re-positioning the contributions of the paper in that context, and a more thorough argumentation on why continual learning is needed.

    In its current form, the paper references only one related work [2], virtually missing a related-work section. This should be addressed, considering prior work based on incomplete, centralized data. The main statement of the paper is still misleading: “…existing landmark detection algorithms are built on the assumptions of complete and centralized training data.” Instead it should perhaps read: “…existing landmark detection algorithms are built on the assumptions of complete or incomplete and centralized training data.”

    While I very much appreciate the (elegant!) approach, I am not convinced by the actual practical relevance of the continual learning component. The authors argue that it is needed because of the sequential nature of medical images, which is a fair argument; but from a practical point of view, is this really the case for the problem of landmark detection? Memory should not be an issue, at perhaps 10-20MB per model per landmark. In terms of privacy concerns, from a practical point of view, deleting the data while keeping it in the “model memory” to deploy will not solve the data privacy problem. If we were in the context of federated learning, this would change the dynamic of the paper and of this discussion, but this is not the case. In its current form it seems more like an artificial problem, adding an entire dimension of complexity, and I am not convinced it is needed based on my experience. The paper should at least openly discuss this aspect.



Review #2

  • Please describe the contribution of the paper

    This paper seeks to implement a reinforcement learning framework for situations where either the training images do not cover the entire region of interest, the set of landmark labels is not known a priori, or both. The authors implement a model comprising two core modules: the Context-guided Multi-target Q-learning (CgMtQ) network, which uses an RL agent to explore trajectories, and a Context Memory Replay Mechanism (CMRM), which stores a library of previous prompts, helping to prevent the issue of catastrophic forgetting.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors present a novel and interesting approach to tackling the challenge of reinforcement learning for landmark detection on sparsely labelled data. Their proposed method is complex, but rigorously defined, and appears to be highly effective compared to the ablated models tested. The authors effectively demonstrate the system’s ability to avoid catastrophic forgetting by comparing model performance on a test set with an increasing number of tasks. They report the ADE (Average Distance Error) for each of 6 tasks, which the compared models were trained on sequentially. Their method (CgMtQ+CMRM) exhibits low ADE across all 6 tasks, while the ablated methods only show good performance on the final tasks, thus effectively demonstrating the issue of catastrophic forgetting, as well as the proposed solution’s ability to avoid it.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the proposed method does appear to be effective, I do have a couple of issues with it.

    The main issue I have with the paper is that the proposed system is very complex and the explanations provided of how it functions are inadequate. Of all the papers I was assigned, I definitely had to spend the most time with this one, and even now I do not fully understand some aspects of the algorithms used. The training and test stages have very different pipelines and the algorithms at times seem a bit convoluted. Figure 2 attempts to illustrate the entirety of the CgCRL system, consisting of both CgMtQ and CMRM modules running in both training and testing phases. It simply contains too much information for one figure and there is not enough explanation to go along with it. It is not clear what is being shown, as many of the processes in the figure are not fully explained by the caption or supporting text.

    Some details of the algorithms were stated without adequate justification for their inclusion. Specifically, I found the use of “pseudo-prompt patches” to be confusing and not adequately explained. Also, the decision to paste prompt patches into the training data at random coordinates seems to me like it would confuse the RL agent by exposing it to nonsensical anatomy. I would have appreciated an ablation study here to prove my intuition wrong, or at least a comment about why it works.

    Very little information was provided on how the ablated models were trained, and I suspect that the low performance shown by ResNet/DenseNet had more to do with deficiencies in the training procedure than a fundamental shortcoming in their ability to deal with landmark data.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The supplementary materials contain in-depth descriptions of the logic used for both the training and test phases of the system, although some details around model training are missing (hyperparameters). More detail on how the ablated models were trained is also required.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The main change that I believe would improve this paper is to reduce the complexity of the algorithms used. If the authors can come up with a way to reduce the complexity, specifically of the testing phase, and have it maintain similar performance, that would be ideal. If that is infeasible or impractical in the given time frame, I would suggest working on the explanations given for the algorithms used. Specifically, I would like to see Figure 2 more thoroughly explained. The authors may also consider removing any non-vital information/processes from this figure. I think some of the design decisions need further justification, specifically the pasting of random crops inside the training data. I would also suggest providing more information on how the ablated models were trained and why they under-performed so dramatically. I realize that I am asking you to add a lot of text to an already-full 8 pages, which is again why I would suggest that you find some complexity to cut.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I believe the algorithms used in this paper are too convoluted to be widely useful. I am suspicious of the robustness of their method, and therefore the overall usefulness of this system.

    The work is interesting and novel, however, and I think that if the complexity can be reduced, or the explanations made clearer, then the paper would be accepted.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors did a good job responding to my questions/concerns. I am happy to change my recommendation to “Weak Accept”. Reading through the pseudo-code in the supplementary materials did help my understanding of the algorithmic complexity.

    I would still like to see simplifications made to Figure 2, and the supporting text to explain it more clearly. I also still recommend removing/simplifying any implementation details that can be spared.



Review #3

  • Please describe the contribution of the paper

    The paper introduces a continual reinforcement learning approach aimed at addressing the challenge posed by incomplete images and annotations. The method is designed to utilize the local region of a landmark as prompts, continuously updating these prompts based on image similarity as the memory buffer reaches its capacity.
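The buffer-update policy described in this review (update the stored prompts by image similarity once the memory buffer reaches capacity) might look roughly like the following. The `PromptBuffer` class and the replace-the-most-similar rule are illustrative assumptions, not the paper's exact CMRM; PSNR is used here as the similarity measure because it is the one this reviewer mentions below:

```python
import numpy as np

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio between two equally sized patches."""
    mse = np.mean((a - b) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

class PromptBuffer:
    """Capacity-limited prompt library. When full, the incoming patch replaces
    the stored prompt it is most similar to (highest PSNR), so the library
    keeps diverse exemplars rather than near-duplicates."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.prompts = []

    def add(self, patch):
        if len(self.prompts) < self.capacity:
            self.prompts.append(patch)
        else:
            nearest = max(range(len(self.prompts)),
                          key=lambda i: psnr(self.prompts[i], patch))
            self.prompts[nearest] = patch  # overwrite the near-duplicate

buf = PromptBuffer(capacity=2)
buf.add(np.zeros((4, 4)))        # exemplar 1
buf.add(np.ones((4, 4)))         # exemplar 2: buffer now full
buf.add(np.full((4, 4), 0.95))   # closest to the all-ones patch, replaces it
assert len(buf.prompts) == 2
assert np.allclose(buf.prompts[0], 0.0)
assert np.allclose(buf.prompts[1], 0.95)
```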

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper tackles a common challenge encountered in AI-driven medical imaging applications: the presence of numerous incomplete images and annotations in the training data. The concept of the prompt library introduced in the proposed Context Memory Replay Mechanism is intriguing and has broader applicability beyond landmark detection, extending to various continual learning problems.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The primary weakness of the paper lies in the insufficient data available for model training and evaluation. Despite the authors’ utilization of two public datasets, the overall number of images remains limited.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors should consider adopting an encoding-based measure for image similarity, a pivotal element of the prompt library. For instance, they could integrate a pre-trained image encoder such as a VQ-VAE for the similarity measure. Such approaches are likely to capture image features more effectively than PSNR.

    Additionally, as the method is designed for general landmark detection, the authors might consider applying it to other landmark detection applications with more publicly available data, in order to showcase the generality of their approach.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    As highlighted in the main strength section, this paper introduces a novel solution to a common challenge in many medical imaging applications. Moreover, the contextual prompt and memory replay mechanism presented in the paper hold promise for addressing another important problem: continual learning. Despite the limited data, this paper should be published as it addresses two significant topics in medical imaging AI learning.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The rebuttal addresses the feedback adequately.




Author Feedback

We appreciate the reviewers’ thorough and insightful feedback on our paper. Below, we address the concerns and misunderstandings raised, grouped into key themes for clarity.

Q1: Regarding related work (R1).

  • Research on landmark detection (LD) with incomplete data already exists, but these studies are all based on centralized data. To the best of our knowledge, this work is the first to study LD problems in complex scenarios involving both incomplete and sequential data.

Q2: Regarding the practical relevance of the proposed method (R1).

  • (On retraining) Our method is designed to handle the sequential nature of clinical data collection, which is critical for real-world applications, ensuring robustness and adaptability without retraining from scratch. Owing to medical data privacy concerns and storage capacity limitations, old data cannot always be fully preserved as new data is continuously acquired. As a result, a domain shift may exist between new and old datasets. Retraining the model under such conditions can easily lead to catastrophic forgetting, that is, losing the knowledge learned from old samples.
  • (On individual models per task) First, using an individual model for each landmark increases the total model size. In contrast, the method proposed in this paper applies to all landmarks with a single model, which is more efficient. Second, in the example presented in Fig. 1(b) of this paper, the images used during the training phase are incomplete, while the ultimate goal is to perform LD on complete images. During the testing phase, the landmark indices in different tasks may overlap, so there could be multiple predictions for one landmark. If multiple models are trained, post-processing such as label fusion is inevitably required.

Q3: Regarding the complexity and explanation of the method (R3).

  • We have described the details of the CMRM module in Section 2.2. Besides, we have also provided a brief description of how CgMtQ and CMRM are combined in Section 2.3. Due to space limitations, the pseudocodes of how CgCRL works in the training and testing stages are presented in the supplementary material, which show the differences in detail.

Q4: Regarding the use of “pseudo-prompt patches” (R3).

  • CgMtQ captures the local texture features surrounding the target landmark. When locating several landmarks, CgMtQ searches for each of them separately, without utilizing the structural information between landmarks. Therefore, CgMtQ focuses on the texture features matching the prompt input rather than the location of the target. To enable the model to better learn texture characteristics rather than location information, pseudo-prompt patches should be placed randomly.
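The random placement argued for in this answer can be sketched as follows; the `paste_pseudo_prompt` helper and all sizes are hypothetical, shown only to make the augmentation concrete:

```python
import numpy as np

def paste_pseudo_prompt(image, patch, rng):
    """Paste a prompt patch at a uniformly random location, so the agent must
    learn to match the patch's texture rather than memorize absolute
    coordinates (hypothetical helper, not the authors' code)."""
    H, W = image.shape
    h, w = patch.shape
    y = int(rng.integers(0, H - h + 1))
    x = int(rng.integers(0, W - w + 1))
    out = image.copy()
    out[y:y + h, x:x + w] = patch
    return out, (y, x)

rng = np.random.default_rng(1)
img = np.zeros((16, 16))
patch = np.ones((3, 3))
aug, (y, x) = paste_pseudo_prompt(img, patch, rng)
assert aug.sum() == 9.0 and img.sum() == 0.0  # pasted once; original untouched
assert 0 <= y <= 13 and 0 <= x <= 13          # patch stays inside the image
```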

Q5: Clarity on the ablated models, i.e., ResNet and DenseNet, and their training procedures (R3); data sufficiency for model training and evaluation (R4).

  • We ensured that the models, including ResNet and DenseNet, were trained until convergence. The low performance might be due to the small size of the training set. As regression-based landmark detection methods, ResNet and DenseNet implicitly embed the whole-shape information between landmarks into the model during training. With a limited training set, there may be a domain shift between the test and training sets. If the test set contains images with shapes deviating from those in the training set, this could lead to significant errors (see Fig. 2 in the supplementary material). In contrast, as a reinforcement learning method, CgMtQ processes dynamic image patches, enabling quick learning of local landmark features, which is advantageous when data availability is limited.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The practical relevance of the work is questionable, as the authors have not adequately addressed the existing literature on landmark detection in incomplete data and the necessity of continual learning for this problem. The proposed system is overly complex, and the explanations provided are inadequate, making it difficult to understand the algorithms used and their justification. The paper lacks sufficient data for model training and evaluation, which undermines the credibility of the results.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I think the paper can be accepted. The authors should further improve the quality of the paper based on the reviewers’ feedback. Also, the authors could present the results in graphical form for a good overview.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewers and ACs raise some valid concerns that should be addressed in the final version. However, the merits outweigh the limitations and the paper was considered to make a valuable contribution to the conference.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



