Abstract

To achieve superior performance, deep learning relies on copious, high-quality, annotated data, but annotating medical images is tedious, laborious, and time-consuming, demanding specialized expertise, especially for segmentation tasks. Segmenting medical images requires not only macroscopic anatomical patterns but also microscopic textural details. Given the intriguing symmetry and recurrent patterns inherent in medical images, we envision a powerful deep model that exploits high-level context, spatial relationships in anatomy, and low-level, fine-grained, textural features in tissues in a self-supervised manner. To realize this vision, we have developed a novel self-supervised learning (SSL) approach called ASA to learn anatomical consistency, sub-volume spatial relationships, and fine-grained appearance for 3D computed tomography images. The novelty of ASA stems from its utilization of intrinsic properties of medical images, with a specific focus on computed tomography volumes. ASA enhances the model’s capability to learn anatomical features from the image, encompassing global representation, local spatial relationships, and intricate appearance details. Extensive experimental results validate the robustness, effectiveness, and efficiency of the pretrained ASA model. With all code and pretrained models released at GitHub.com/JLiangLab/ASA, we hope ASA serves as an inspiration and a foundation for developing enhanced SSL models with a deep understanding of anatomical structures and their spatial relationships, thereby improving diagnostic accuracy and facilitating advanced medical imaging applications.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0271_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0271_supp.pdf

Link to the Code Repository

GitHub.com/JLiangLab/ASA

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Pan_ASA_MICCAI2024,
        author = { Pang, Jiaxuan and Ma, DongAo and Zhou, Ziyu and Gotway, Michael B. and Liang, Jianming},
        title = { { ASA: Learning Anatomical Consistency, Sub-volume Spatial Relationships and Fine-grained Appearance for CT Images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a novel method, named ASA, improving the segmentation performance of CT images by learning high-level context, spatial relationships in anatomy, and low-level fine-grained details in a self-supervised manner. The techniques used to achieve this are a student-teacher training strategy, 3D sub-volume order prediction, volume appearance recovery, and aligning two related local views.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    State-of-the-art accuracy: The proposed framework outperforms the previous SOTA in segmenting a majority of organs. Efficiency: The method is generalizable and performs well under training-efficient scenarios such as linear probing, outperforming other methods like Swin UNETR and SimMIM. Additionally, the authors conducted training with varying amounts of training data, validating the performance of the method. Multi-faceted approach: The method combines several techniques into one coordinated approach: global and local features, anatomical consistency, spatial relationships between sub-volumes, and self-supervised learning under limited data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Complicated method: the method is composed of several components and training stages, making it complex. Limited discussion of the problem: The paper provides a limited explanation of why it is important to learn high-level global features, local-level embeddings, and contextual relationship features. Similarly, the need for learning anatomical consistency is not fully discussed.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper provides detailed information about the method, losses, architecture, dataset details, and split. Some parts of the method may seem difficult to understand, so detailed attention and effort are needed to grasp it fully. While the code is not provided during review, the authors indicated that it will be provided upon acceptance of the submission.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Lack of coherent method explanation: While the proposed methodology is innovative and has demonstrated remarkable performance, comprehending the intricate details of the approach remains an arduous task. It is advisable for the authors to revise the methodology section, including Fig. 2, to enhance coherence and facilitate effective communication between the text and the related figures.

    For future work, I would recommend trying to simplify the method while still addressing the same problems.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend the weak accept based on the lack of clarity and coherence in the explanation of the methodology of the paper. A thorough revision of the methodology section and associated figures to enhance comprehensibility would significantly improve the quality of the manuscript and render it suitable for publication.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper introduces a novel pretraining method called ASA for the transformer-based deep learning architecture SwinUNETR used for medical image segmentation of CT volumes. The authors compare ASA with 2 other state-of-the-art pretraining methods and demonstrate that ASA pretraining beats the other methods at downstream image segmentation tasks by small margins of less than 0.05 Dice score.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This study introduces a sophisticated pretraining methodology comprising two phases, two jointly trained models, and the integration of multiple loss functions. Using an ablation study, they demonstrate the benefits of using the several stages of pretraining. The study also validates their methods on 3 different CT medical image datasets, apart from the dataset used for pretraining.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The authors do not clearly describe which datasets were used for tuning the hyperparameters of the different losses, nor how the hyperparameter tuning was performed. This makes it hard to assess potential data leakage in the reported results, which can be a common error when training complex deep learning models with several competing losses. 2) The motivation for the complex pretraining steps is not clear. The ablation study does not compare using only phase 1 training or only phase 2 training. What is the rationale behind using a two-phase alternating pretraining and so many loss functions? If these decisions were made to maximize downstream performance, then it should be made clear that there was no data leakage (point 1) between the data used for validation and the data on which the final results are reported. 3) The organization and structure of the paper are confusing. For example, the experiment design, results, and discussion of the results are described together.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    P

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) Please provide information about which dataset was used to determine the hyperparameters, such as the components of the different pretraining losses. 2) A high-level discussion contrasting the benefits and limitations of your pretraining method is missing. 3) The terminology “teacher-student” training may lead to confusion, as it typically denotes knowledge distillation, where a larger teacher model transfers knowledge to a smaller student model. Given that both networks in your study are of equal size and trained jointly to maximize representation similarity, they align more closely with contrastive training methods. 4) A thorough proofreading is recommended to address grammatical errors, such as those found in the Figure 2 caption and Supplementary Table 1. Additionally, strive for precision in language, avoiding overly grandiose phrases like “intriguing symmetry” or “model comprehends high-level features.”

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novel pretraining method proposed provides a meager improvement on downstream image segmentation tasks at the cost of a complex multi-step, multi-model pretraining. The structure and formulation of the paper also make it harder to critically assess whether all steps were taken to prevent data leakage. In such an analysis there is a risk of cherry-picking and reporting only the best model variation. However, if our comments are sufficiently addressed in the rebuttal, then this extensive work can be accepted.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a new pretraining scheme using spatial information. In an alternating mode, the representation is learned using either the order of slices or the relations between small patches. The quality of the pretraining is then evaluated using multiple downstream tasks and different experiments, showing improved performance compared to other pretrainings.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well-evaluated, with a straightforward approach that could be incorporated into other approaches as well. While the proposed approaches are not surprising by themselves, the proposed combination is a strong contribution to the state of the art. Furthermore, the resulting models will be made available and could serve as additional foundation models.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While having sufficient approaches for comparison, the authors did not include other state-of-the-art pretraining methods such as SimCLR, which would have been nice. Also, the complete workflow is depicted in Fig. 2, but this figure is hard to understand.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Given that the code is of good quality, I see no reason why the paper should not be reproducible. The method is described sufficiently and the datasets are open.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Please provide additional details on how the hyperparameters were tuned, both during pretraining and during the downstream tasks.
    2. Please report some measure of uncertainty alongside numerical comparisons. I missed them especially in Table 1.
    3. This might not be possible for MICCAI, but I strongly advise the inclusion of other pretraining methods as comparison. Using the pretrained models available for SwinUNETR is a first start, but having other methods checked too would increase the reliability of the results.
    4. Please report the computing resources required for this paper. Given the long time necessary for some pretraining methods, this could be significant.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I like this paper and I am torn between “Accept” and “Weak Accept”. I found no real flaw, so I tended to the former. Major limitations are the missing references and the rather straightforward pretraining methods, which use existing ideas.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their insightful feedback and constructive criticism and are excited about the early acceptance, for which our responses are optional. Nevertheless, we are committed to submitting a significantly improved camera-ready version that addresses all critiques. This includes new tables for pretraining and finetuning details, computational reports, standard deviations, explanations of baseline selections, clarifications of our setup, refined motivations and methods, and grammatical corrections for clearer presentation.

R1: W: We briefly compare ASA with contrastive learning methods like SimCLR in the introduction. Moreover, ASA is extensively compared with, and shown to outperform, the SwinUNETR pretrained model, which incorporates contrastive learning as one of its three learning tasks. C: (1,4) We will add a new table that details the pretraining and finetuning setups, along with computation consumption, to enhance reproducibility. (2) Beyond the 5 supervised and 2 self-supervised learning methods in Tab. 1, we will explore more methods in future work. We chose to compare ASA with SimMIM and SwinUNETR due to their SoTA performance in the SSL domain. (3) The supervised baseline performances reported in Tab. 1, established by Liu et al. (arXiv:2301.00785), were presented without standard deviations; we did the same to maintain consistency in table format. However, we will add a new table in the appendix that provides these details from multiple runs.

R3: W&C: Please see our motivation, Fig. 1, and research question. Our method is driven by the consistent patterns typical of medical images, which often show significant similarities due to the same imaging protocol. Our method leverages this uniformity to learn high-level anatomical structures and contextual relationships. To boost segmentation performance, the model needs to learn fine-grained features. Subsequently, we found that training on local-level embeddings helps stabilize the pretraining process and boosts downstream performance further. Thank you for the suggestion to simplify the method. We are developing a new approach that integrates the 2 learning stages and accommodates various body structures.

R4: W: (1) For the 3 main results, ASA is pretrained on the AMOS22 dataset’s 240 training split for a fixed number of epochs, choosing the checkpoint with the lowest loss. Since the target datasets for Tab. 1 and Fig. 3 differ from the pretraining dataset, data leakage is not a concern. For the label-efficient experiment in Fig. 4, ASA is both pretrained and finetuned on a subset of the 240 training split, but assessed using the non-overlapping 120 testing split, ensuring no data leakage during evaluation. (2) In our ablation study, we attempted training solely in phase 1 by adding T_gc into setup 3, but this caused the teacher model to collapse, leading to poor performance. Pretraining exclusively with phase 2 does not align with the objectives we outlined for R3. Nevertheless, to benefit the community, we will include phase-1-only and phase-2-only results in the next revision. (3) In the 3 main results, we divided each into experimental setup and result analysis to accommodate various target tasks and setups. Due to the page limit, we changed the format for the ablation studies, whose clarity will be enhanced in the next revision. C: (2) Besides the somewhat intricate alternating pretraining, another limitation of our framework is the need for recurrent anatomical structure under the same imaging protocol, due to the reliance on absolute position coding prediction, 1D or 3D. (3) The teacher-student training approach, inspired by Tarvainen et al. (arXiv:1703.01780), Caron et al. (arXiv:2104.14294), and Ma et al. (arXiv:2310.09507), features both models sharing the same architecture, with no prior knowledge for the teacher. The teacher model is constructed from previous iterations of the student model, aiming to stabilize the pretraining and enrich knowledge accumulation from various perspectives. (4) We will proofread and fix all typos/grammatical issues.
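
The teacher-construction step described above (the teacher as a running average of past student weights) is standard in Mean Teacher (arXiv:1703.01780) and DINO (arXiv:2104.14294). A minimal sketch of that exponential-moving-average update follows; the function name, momentum value, and toy weights are illustrative only and are not taken from the ASA codebase:

```python
def ema_update(teacher_params, student_params, momentum=0.996):
    """Blend teacher weights toward the student's current weights.

    teacher <- momentum * teacher + (1 - momentum) * student
    The teacher receives no gradients; it is built purely from
    past iterations of the student, which stabilizes pretraining.
    """
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]

# Toy example with scalar "weights": after one update the teacher
# has moved a small step (1 - momentum) toward the student.
teacher = [1.0, 0.0]
student = [0.0, 1.0]
teacher = ema_update(teacher, student, momentum=0.9)
# teacher is now approximately [0.9, 0.1]
```

With a high momentum (e.g. 0.996), the teacher changes slowly, acting as a temporally smoothed ensemble of student checkpoints, which is what prevents the representation collapse mentioned in the ablation discussion.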




Meta-Review

Meta-review not available, early accepted paper.


