Abstract

Pathological structures in medical images are typically deviations from the expected anatomy of a patient. While clinicians consider this interplay between anatomy and pathology, recent deep learning algorithms specialize in recognizing either one of the two, rarely considering the patient’s body from such a joint perspective. In this paper, we develop a generalist segmentation model that combines anatomical and pathological information, aiming to enhance the segmentation accuracy of pathological features. Our Anatomy-Pathology Exchange (APEx) training utilizes a query-based segmentation transformer which decodes a joint feature space into query representations for human anatomy and interleaves them via a mixing strategy into the pathology decoder for anatomy-informed pathology predictions. In doing so, we are able to report the best results across the board on FDG-PET-CT and Chest X-Ray pathology segmentation tasks with a margin of up to 3.3% compared to strong baseline methods. Code and models are available at github.com/alexanderjaus/APEx.
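For intuition, the query-mixing idea in the abstract can be sketched in a few lines of PyTorch: pathology queries attend to anatomy queries via cross-attention, so the pathology decoder receives anatomy-informed representations. This is an illustrative sketch only; the class name, shapes, and residual design are our assumptions, not the authors' released implementation (see the repository linked below for that).

```python
import torch
import torch.nn as nn

class CrossAttentionQueryMixer(nn.Module):
    """Illustrative sketch of anatomy-to-pathology query mixing
    (hypothetical module, not the authors' code)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, pathology_q: torch.Tensor, anatomy_q: torch.Tensor) -> torch.Tensor:
        # Pathology queries attend to anatomy queries (keys/values),
        # injecting anatomical context into each pathology query.
        mixed, _ = self.attn(pathology_q, anatomy_q, anatomy_q)
        # A residual connection preserves the original pathology signal.
        return self.norm(pathology_q + mixed)

# Example: batch of 2, 100 queries per decoder, 256-d embeddings.
mixer = CrossAttentionQueryMixer(dim=256)
pathology_q = torch.randn(2, 100, 256)
anatomy_q = torch.randn(2, 100, 256)
out = mixer(pathology_q, anatomy_q)  # shape: (2, 100, 256)
```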

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1464_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1464_supp.pdf

Link to the Code Repository

https://github.com/alexanderjaus/APEx

Link to the Dataset(s)

https://github.com/alexanderjaus/AtlasDataset

https://www.cancerimagingarchive.net/collection/fdg-pet-ct-lesions/

https://github.com/Deepwise-AILab/ChestX-Det10-Dataset

https://github.com/ConstantinSeibold/ChestXRayAnatomySegmentation/tree/main

BibTex

@InProceedings{Jau_Anatomyguided_MICCAI2024,
        author = { Jaus, Alexander and Seibold, Constantin and Reiß, Simon and Heine, Lukas and Schily, Anton and Kim, Moon and Bahnsen, Fin Hendrik and Herrmann, Ken and Stiefelhagen, Rainer and Kleesiek, Jens},
        title = { { Anatomy-guided Pathology Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a framework that combines pathological and anatomical features to jointly learn and segment pathology and anatomy masks using a joint transformer architecture.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The authors propose a transformer-based approach that jointly leverages pathological and anatomical features to segment both pathology and anatomy.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The novelty is limited. The paper uses a transformer architecture similar to Mask2Former to combine pathology and anatomy.
    2. The paper is not written properly. Please reformat the paper as it has a lot of mistakes in the sentence formation.
    3. Lack of experimental comparison. Please compare with other SOTA transformer-based architectures. nnUNet is fairly old; please compare at least with the SwinUNETR architecture.
    4. The ablation study is very limited. There are no experiments showing why each specific component is necessary for the architecture.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Lack of novelty. The paper uses a Mask2Former-based architecture to combine pathological and anatomical features.
    2. Lack of experiments. The model is not compared with recently proposed segmentation methods.
    3. Lack of ablation studies. There are no experiments showing that a particular component is necessary.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Reject — must be rejected due to major flaws (1)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. Lack of novelty.
    2. Lack of experiments. The model is not compared with recently proposed segmentation methods.
    3. Lack of ablation studies. There are no experiments showing that a particular component is necessary.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    Thanks to the authors for their feedback. From the authors’ response, it seems that Attention-UNet and UNet++ cannot be directly compared. I still feel that UNet++ and Attention-UNet could at least be evaluated separately on both the X-ray and CT data. I think it is important to compare with recent SOTA segmentation methods.



Review #2

  • Please describe the contribution of the paper

    The paper addresses the question of whether explicitly learned human anatomy can improve a model’s capability to predict pathological structures. Therefore, the authors propose an anatomy-pathology exchange strategy for jointly learning both anatomy and pathology. In this method, the embeddings for anatomy and pathology segmentation are shared. In the transformer-based decoder, anatomy query vectors are used to influence pathology query vectors within the feature mixer. Several mixing designs are implemented, showing improvement in pathology segmentation compared to other baseline learning designs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The anatomy-guided strategy is interesting and beneficial for the community, as it incorporates additional anatomical knowledge into the pathology segmentation task.
    2. The different combination designs of the proposed query mixing strategy all show improvement in pathology segmentation in both the semantic segmentation and instance segmentation paradigms.
    3. The qualitative results further demonstrate the enhancement and correct localization of pathology segmentation when anatomy knowledge is involved.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. There is no quantitative comparison of model parameters and GPU memory for training across different baseline methods.
    2. There is no visualization to further illustrate or interpret the clinical observations mentioned in the introduction, such as “a fracture has to be associated with a bone structure or that tumor locations often correspond to anatomical regions”, since the supervision here is not direct.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. It would be interesting to see an analysis of the relationship between anatomy vectors, pathology vectors, and the similarity before and after the mixing process from some cases to show that ‘anatomically related’ cases did improve by using anatomical knowledge.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a knowledge fusion of both anatomical and pathological knowledge to improve pathology segmentation in both semantic and instance segmentation tasks. The findings are interesting, and the results are promising for infusing auxiliary knowledge into the learning process.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors provided more details addressing my concerns, which leads me to maintain my rating as the final rating.



Review #3

  • Please describe the contribution of the paper

    The paper introduces APEx, a novel query-based framework for joint anatomy and pathology segmentation. Inspired by clinicians’ approach, APEx incorporates anatomical context to identify pathological tissues. It utilizes a multi-task transformer architecture with separate decoders for anatomy and pathology, exchanging information for improved segmentation. The method achieves promising results on two different datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (i) Clinically Grounded Motivation: The work draws inspiration from real-world clinical practice, where understanding normal anatomy is crucial for pathology detection. This focus on mimicking human cognition is a major strength and holds significant interest for the medical image analysis community.

    (ii) Novel Methodology: APEx presents a new query-based multi-task transformer architecture. It uses separate anatomy and anatomy-enriched pathology queries, effectively integrating anatomical knowledge into the segmentation process. This design is well-explained and is a creative adaptation of masked transformer concepts, offering a new approach to incorporating anatomical priors.

    (iii) Systematic Evaluation: The paper showcases a well-structured evaluation strategy. It investigates various methods for incorporating anatomical priors and provides arguments for utilizing them more effectively. Additionally, the evaluation spans two diverse datasets (PET-CT and chest X-rays).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (i) Limited Baseline Comparison: A significant weakness lies in the absence of comparisons against state-of-the-art (SoTA) pathology segmentation algorithms (for example, TransUNet [1] and MedNeXt [2]). Furthermore, the baselines chosen for comparison (U-Net variants) are not comprehensive, particularly for the CheXDet dataset, lacking competitive methods such as Attention-UNet [3] and UNet++ [4]. This omission weakens the evaluation and makes it difficult to assess APEx’s relative performance, especially as the reported boost is modest. Further, no statistical significance of the results is provided.

    (ii) Incomplete Evaluation: The paper only reports results for anatomy incorporation methods and architecture ablations on the FDG-PET-CT dataset. Additionally, it solely presents validation scores, omitting test set performance, which hinders a more comprehensive evaluation. Certain APEx design components lack clear performance gains in ablation studies, raising questions about their actual contribution. For instance, the minimal difference between the asymmetrical and symmetrical designs (IoU of 59.56 vs 59.35) requires further investigation.

    (iii) Inconsistent Evaluation Metrics: The paper uses IoU for segmentation on the PET-CT data but uses mAP for the CheXDet radiographs. These choices deviate from standard metrics in medical image segmentation (e.g., Dice score, Hausdorff distance) and should be justified/reconsidered.

    (iv) Limited Results Analysis: The discussion of results is lacking. While the initial hypothesis is strong, the authors do not explore how incorporating anatomy affects segmentation (e.g., false positives, boundary accuracy). The analysis relies solely on quantitative numbers and a few qualitative examples in the supplementary material.

    (v) Multi-Task Learning Limitation: The APEx framework necessitates both anatomy and pathology labels for training, which can be expensive and time-consuming to obtain in real-world settings. This is a general limitation of this approach, not a specific weakness of the paper.

    References:
    [1] TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv:2102.04306, 2021.
    [2] MedNeXt: Transformer-driven Scaling of ConvNets for Medical Image Segmentation. MICCAI 2023.
    [3] Attention U-Net: Learning Where to Look for the Pancreas. MIDL 2018.
    [4] UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation. IEEE TMI 2019.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    (i) The paper does not mention the availability of the source code. Releasing the code would greatly improve reproducibility and allow the community to build upon this work.

    (ii) The paper lacks details about: (a) the range of hyper-parameters explored, (b) the method used to select the final hyper-parameter configuration, (c) the exact number of training and evaluation runs performed and (d) the implementation and tuning procedures used for the baseline methods.

    (iii) The paper would benefit from an analysis of scenarios where the method fails. Understanding these limitations would be valuable for the community.

    (iv) A brief statement on the computing environment (hardware/software) used for the experiments would be valuable.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper presents a promising approach with a strong clinical foundation. However, strengthening the evaluation and addressing the limitations mentioned above will significantly improve its contributions.

    (i) Include comparisons against SoTA methods (TransUNet, MedNeXt) and competitive baselines like Attention-UNet and UNet++ on both datasets. This will provide a clearer picture of APEx’s relative performance.

    (ii) Report test set performance alongside validation scores for both datasets. Ensure all ablation studies are presented for both datasets to fully understand the impact of design choices.

    (iii) Consider using standard evaluation metrics like Dice score and Hausdorff distance for both datasets for better comparison with existing works.

    (iv) Analyze the impact of incorporating anatomical knowledge on segmentation performance. Explore how it affects false positives, boundary delineation, and other relevant aspects. Include a more comprehensive discussion of qualitative results alongside the quantitative data.

    (v) The use of pseudo labels for anatomy in the radiographs dataset is interesting. It would be valuable to explore how the performance is affected by the potential noise in these labels. Consider incorporating an additional dataset for future work to study the effect of pseudo labels versus manual annotations, such as tumor segmentation with LiTS and BTCV datasets for abdominal multi-organ and tumor segmentation.

    The following are some additional suggestions for future works.

    (i) Explore the applicability of APEx to other medical image segmentation tasks and to 3D images.

    (ii) Investigate alternative methods for incorporating anatomical knowledge that might be less reliant on readily available pathology labels.

    (iii) Analyze how APEx performs with weak supervision or semi-supervised learning approaches.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents a novel and clinically motivated approach with a well-designed architecture (strength) but requires additional work for a more comprehensive evaluation and analysis (weakness). Overall, the strengths outweigh the weaknesses, and I recommend a (conditional) accept. If the authors address these weaknesses in the rebuttal by incorporating the suggested improvements, the paper has the potential to be a valuable contribution to the field.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have satisfactorily addressed the major weaknesses identified in the reviews. While there may be still some limitations, the approach is interesting and the core idea will likely be of interest to the community. I believe the authors would incorporate the suggested changes in the final version. Therefore, I recommend an accept.




Author Feedback

We thank all reviewers for their feedback, which we incorporate into the final version. We are glad that the reviewers appreciate the relevance of integrating anatomy into pathology segmentation models (R1, R3, R4) and rate our work as interesting (R3) and novel (R4). Below, we address concerns and misunderstandings and discuss your interesting propositions.

Comparison to SOTA (Attention-UNet, UNet++, SwinUNETR) (R1, R4) In this work we focus on 2D processing, as we want to target both X-ray imaging and CT scans with the same model architecture. Thus, we do not compare against 3D architectures such as SwinUNETR. The proposed experiments evaluating the semantic segmentation networks Attention-UNet and UNet++ in the ChestXDet setting are not applicable: here, we have to differentiate between instances of a class (such as fractures) and thus operate under instance segmentation (cf. Tab. 2). In this setting, we surpass the most competitive approach, PointRend, and the most recent MaskDINO by at least 2.06% mAP, with a t-test (p < 0.001) indicating a statistically significant improvement.
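For readers who want to reproduce this kind of significance check, a paired t-test over per-image scores of two models is one common setup. The sketch below uses synthetic scores purely for illustration; the actual per-image values and test setup are not given in the rebuttal.

```python
import numpy as np
from scipy import stats

# Synthetic per-image scores for two models on the same images
# (purely illustrative; not the paper's actual numbers).
rng = np.random.default_rng(0)
baseline_scores = rng.normal(loc=0.40, scale=0.05, size=100)
apex_scores = baseline_scores + rng.normal(loc=0.02, scale=0.02, size=100)

# Paired t-test: is the per-image improvement significantly different from 0?
t_stat, p_value = stats.ttest_rel(apex_scores, baseline_scores)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```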

Paper Structure and Writing (R1) We aimed to structure the paper as comprehensively as possible (acknowledged by R3, R4). Contrary to R3/R4, R1 states the concern to “Please reformat the paper as it has a lot of mistakes in the sentence formation”. We would be happy to integrate any specific suggestions.

Incomplete Evaluation (R1, R4) We kindly disagree with the sentiment that our ablation studies are limited (R1), as we present a total of 14 different ablation experiments in Tab. 1. R4 notes that our ablation studies “solely presents validation scores, omitting test set performance, which hinders a more comprehensive evaluation”. We adhere to a strict evaluation protocol by performing 5-fold cross-validation and developing our method exclusively on validation sets to avoid biasing test performance. Thus, we only evaluate once on the test set with the final model design. While interesting, the ablations on the X-ray dataset would have entailed training an additional 70 models (14 experiments * 5 splits), which was beyond our computational resources. R4 rightfully notes that the asymmetrical and symmetrical designs exhibit quite similar performance. In this paper, we chose the slightly better-performing design (which also reflects that pathologies are the target of interest), but agree that further exploration of this would be exciting future work.

Evaluation Metrics (R4) The PET-CT AutoPET dataset, lacking instance-wise annotations, is treated as a semantic segmentation task. Following the recently published Metrics Reloaded [A] recommendations, we use an overlap-based metric (IoU) and a boundary-based metric (BIoU), shedding light on the performance in the boundary regions as suggested in the review. For instance segmentation on X-ray, we report the recommended standard mAP metric. However, we acknowledge that reporting metrics which are more prevalent in the medical community (e.g., Dice) facilitates better comparability, and we will follow the suggestion to add Dice to our work.
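As a concrete reference for these overlap-based metrics, IoU and Dice for binary masks can be computed as below. This is a generic sketch, not the paper's evaluation code; note the identity Dice = 2·IoU / (1 + IoU), which lets readers convert between the two.

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Overlap-based metric (IoU) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union > 0 else 1.0  # empty-vs-empty counts as perfect

def dice(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice coefficient; equals 2*IoU / (1 + IoU)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return 2 * inter / total if total > 0 else 1.0

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(iou(pred, target))   # 0.5
print(dice(pred, target))  # 0.666...
```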

Limited Results Analysis (R3, R4) The suggestion to analyze the relationship between anatomy vectors, pathology vectors, and their similarity pre- and post-mixing (R3) is intriguing, thanks! Sadly, due to the limited space, we could only include qualitative visualizations on the most influential anatomical structures chosen by the Cross-Attention Query Mixer (Fig. 2) but are looking forward to a more detailed analysis in the future.

Computational Cost (R3) GPUs: 4× A100 40GB. #Params: DeepLab 40M, UNet 32.5M, M2F 44M, APEx 58M.
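Parameter counts like these can be reproduced with a one-line tally over a model's trainable parameters. The sketch below uses a torchvision ResNet-50 purely as a stand-in model, since the APEx code itself is linked above.

```python
import torch
import torchvision

def count_params_m(model: torch.nn.Module) -> float:
    """Trainable parameter count in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Stand-in example: a ResNet-50 backbone (~25.6M parameters).
backbone = torchvision.models.resnet50()
print(f"{count_params_m(backbone):.1f}M")
```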

Code availability and reproducibility (R1, R3, R4) As stated in the abstract of the paper, we will release our models and the source code of our project, report the choice of hyperparameters and ensure reproducibility.

References [A] Maier-Hein et al. “Metrics reloaded: recommendations for image analysis validation.” Nature methods (2024)




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Accepts

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Accepts



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Two reviewers recommend accept and one reject. Regarding the latter review, the rebuttal addresses misunderstanding regarding the lack of ablation study and offers compelling explanation for the choice of comparison methods.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Two reviewers recommend accept and one reject. Regarding the latter review, the rebuttal addresses misunderstanding regarding the lack of ablation study and offers compelling explanation for the choice of comparison methods.


