Abstract

Unsupervised anomaly segmentation approaches to pathology segmentation train a model on images of healthy subjects, that they define as the ‘normal’ data distribution. At inference, they aim to segment any pathologies in new images as ‘anomalies’, as they exhibit patterns that deviate from those in ‘normal’ training data. Prevailing methods follow the ‘corrupt-and-reconstruct’ paradigm. They intentionally corrupt an input image, reconstruct it to follow the learned ‘normal’ distribution, and subsequently segment anomalies based on reconstruction error. Corrupting an input image, however, inevitably leads to suboptimal reconstruction even of normal regions, causing false positives. To alleviate this, we propose a novel iterative spatial mask-refining strategy IterMask2. We iteratively mask areas of the image, reconstruct them, and update the mask based on reconstruction error. This iterative process progressively adds information about areas that are confidently normal as per the model. The increasing content guides reconstruction of nearby masked areas, improving reconstruction of normal tissue under these areas, reducing false positives. We also use high-frequency image content as an auxiliary input to provide additional structural information for masked areas. This further improves reconstruction error of normal in comparison to anomalous areas, facilitating segmentation of the latter. We conduct experiments on several brain lesion datasets and demonstrate effectiveness of our method. Code will be published at: https://github.com/ZiyunLiang/IterMask2
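
The mask-refining loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `reconstruct` callable, the fixed `threshold`, the zero-filling of masked pixels, and the stopping rule are all assumptions made for illustration, and the toy reconstruction function simply stands in for a model trained on healthy images.

```python
import numpy as np

def iterative_mask_refinement(image, reconstruct, threshold, max_iters=10):
    """Shrink a mask by unmasking pixels whose reconstruction error is low.

    image:       2D float array.
    reconstruct: hypothetical callable (masked_image, mask) -> reconstruction,
                 standing in for a model trained on healthy images only.
    threshold:   error below which a masked pixel is treated as normal.
    Returns a boolean mask of pixels still considered anomalous.
    """
    mask = np.ones(image.shape, dtype=bool)          # assumption: start fully masked
    for _ in range(max_iters):
        masked_image = np.where(mask, 0.0, image)    # hide the masked pixels
        recon = reconstruct(masked_image, mask)
        error = np.abs(recon - image)
        new_mask = mask & (error >= threshold)       # unmask confidently normal pixels
        if new_mask.sum() == mask.sum():             # assumption: stop when the mask is stable
            break
        mask = new_mask
    return mask

def toy_reconstruct(masked_image, mask):
    """Toy stand-in: fills masked pixels with a constant 'healthy' intensity."""
    return np.where(mask, 0.5, masked_image)

# Toy image: healthy tissue at 0.5 with a bright 2x2 'lesion'.
image = np.full((8, 8), 0.5)
image[2:4, 2:4] = 1.0
final_mask = iterative_mask_refinement(image, toy_reconstruct, threshold=0.2)
```

In this toy setting the mask shrinks to the four lesion pixels after the first update and the loop then stops because nothing more can be unmasked.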

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1365_paper.pdf

SharedIt Link: https://rdcu.be/dZxdJ

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72111-3_32

Supplementary Material: N/A

Link to the Code Repository

https://github.com/ZiyunLiang/IterMask2

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Lia_IterMask2_MICCAI2024,
        author = { Liang, Ziyun and Guo, Xiaoqing and Noble, J. Alison and Kamnitsas, Konstantinos},
        title = { { IterMask2: Iterative Unsupervised Anomaly Segmentation via Spatial and Frequency Masking for Brain Lesions in MRI } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        pages = {339 -- 348}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a novel framework for medical image segmentation. The model adopts an iterative scheme to refine its output and overcome the sensitivity-precision trade-off problem in generation-based segmentation frameworks. In addition, a frequency mask is added so that high-frequency image content is available to the model during training.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The iterative scheme is novel and can mitigate the trade-off problem of unsupervised segmentation frameworks.
    2. The experiments are well-designed and conducted on multiple datasets and compared with multiple methods.
    3. Several analyses are presented clearly to demonstrate the better performance of the proposed method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The reason for adding the frequency mask is not clearly stated, and the novelty of this module is relatively minor since it is widely used in many computer vision tasks.
    2. The hyper-parameters of the network are not clearly stated, and the training process of the separate U-Net for the first mask generation is not clear.
    3. Since the network is 2D-based, segmentation consistency between slices is not considered in the framework.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The author indicates in the paper that the code will be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. An explanation of the low-frequency filtering is needed, i.e., why will keeping the high-frequency mask help the AE to reconstruct the image?
    2. The hyper-parameters of network training should preferably be included in the experiment section.
    3. Why is another U-Net necessary for the first-step mask generation instead of using the brain mask directly? Also, how this network is trained needs to be explained.
    4. The discontinuity of the segmentation masks across slices is not considered. Is any smoothing constraint applied?
    5. In Figure 3, the best performance (~0.77) in the ablation study for the same modalities does not seem to match the result presented in Table 2 (0.802).
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel method to accomplish the unsupervised segmentation tasks. The proposed iterative mask refinement process is novel for resolving the trade-off problem. The experiments are well-designed and discussed. The performance of this model surpasses multiple previous methods in different datasets.

    However, the purpose of the frequency mask is not clearly stated. The details of the network structure are not presented. Also, the proposed method is a purely 2D framework, which might cause information loss and discontinuity problems compared with 3D methods.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors have developed an unsupervised anomaly segmentation approach that amplifies the reconstruction error in anomalous regions while simultaneously reducing the reconstruction error for normal areas without prior knowledge about the anomaly. The two main ideas introduced are: the use of high frequency image content as an additional input to guide the reconstruction of masked areas; and iterative anomaly spatial mask refinement during inference to remove areas with low reconstruction error to reduce false positives and improve confidence in segmenting anomalies.
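
As an illustration of the first idea, the sketch below extracts high-frequency content by zeroing a disc of low frequencies in the Fourier domain. It is only a plausible reading of "high frequency image content": the function name, the hard circular cutoff, and the `radius` parameter are assumptions for illustration, and the paper's exact filtering may differ.

```python
import numpy as np

def high_frequency_component(image, radius):
    """Return the image with all frequencies within `radius` of DC removed.

    Assumption: a hard circular cutoff in the centred 2D Fourier spectrum.
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))   # move DC to index (h//2, w//2)
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    spectrum[dist <= radius] = 0                     # drop the low-frequency disc
    return np.fft.ifft2(np.fft.ifftshift(spectrum)).real

# A constant image has only a DC term, so nothing survives the filter.
flat = np.ones((8, 8))
flat_hf = high_frequency_component(flat, radius=1)

# A checkerboard lives at the highest frequency and passes through unchanged.
rows, cols = np.indices((8, 8))
checker = np.where((rows + cols) % 2 == 0, 1.0, -1.0)
checker_hf = high_frequency_component(checker, radius=2)
```

The two toy inputs show the intended behaviour: smooth intensity information is removed while fine structure such as edges is retained, which is the kind of structural cue the review describes as guiding reconstruction of masked areas.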

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Approach is well motivated: uses a UNet similar to a denoising autoencoder for better reconstruction of finer details, and masks with Gaussian noise while providing high-frequency content to aid the reconstruction of masked areas. Evaluation on different pathologies (primary brain tumors and stroke lesions) on three datasets. Thorough comparison with SOTA methods and ablation studies to understand the importance of spatial and frequency masking.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    No information of data splits at the patient level is provided (models trained on healthy 2D slices and segment pathologies at test time). Unclear if the reported segmentation metrics (DSC, sensitivity and precision) are at the lesion level or patient level.
    Two separate networks needed; one to handle the first iteration as a special case (UNet to reconstruct the image from high-frequency input) and a second UNet for anomaly segmentation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Some implementation details are missing

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please provide details on the data splits at the patient level and how the evaluation metrics for segmentation were estimated (at the slice vs lesion vs patient level; assuming lesion level based on the breakdown in Table 3).

    Recommend providing the rationale for choosing the train set to be the healthy 2D slices of the three datasets in consideration when multiple healthy brain MRI datasets are available publicly.

    Consider providing the interquartile range for the metrics, SSIM specifically. The split into small, medium and large lesions provides initial insights on the variation of segmentation metrics, but their size limits are not provided.

    The authors mention the sensitivity-precision trade-off as a main concern for diffusion models. For the proposed approach, corruption is by masking and a discussion on sensitivity-precision trade-off in this setting is lacking (which areas are hard to reconstruct even in the absence of anomalies and how much masking is too much to affect the reconstruction quality).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed approach and its two main contributions are novel. It improves upon SOTA approaches in reducing false positives and improving anomaly segmentation. The only concern is that the hyper-parameters (mask shrinking threshold and frequency mask radius) may need fine-tuning for specific datasets.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper presents IterMask2, a novel unsupervised anomaly segmentation method that uniquely combines spatial and frequency masking techniques. The method involves distorting the input with spatial masks and reconstructing the masked areas using a model trained exclusively on normal data. It iteratively refines these masks to identify areas with low reconstruction error, indicative of normality, thereby improving the reconstruction accuracy of neighboring normal areas, reducing false positives, and focusing the mask on anomalies. Additionally, the approach incorporates structural information through high-frequency image components to enhance the reconstruction of normal areas.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. Exceptional Clarity and Contextual Detail: The paper excels in providing clarity and detailed context, which facilitates easy comprehension of the conceptual merits of the work. The novel aspects of the methodology are described with precision, making it straightforward to understand their significance and function within the broader problem space.

    2. Well-Designed Illustrative Figures: The design of the figures in the manuscript is outstanding. These figures complement the textual descriptions effectively, enhancing the reader’s understanding of the proposed framework and illustrating complex concepts clearly and engagingly.

    3. Robust Performance and Comprehensive Evaluation: The paper demonstrates strong performance metrics, underscoring the efficacy of the proposed method. A detailed evaluation is provided, which meticulously assesses the novel approach across multiple brain lesion datasets, substantiating its effectiveness and potential for broader application.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    This manuscript is free of major weaknesses.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper provides detailed technical descriptions that aid in reproducibility efforts, which is commendable. However, the absence of accompanying code limits these efforts. Providing a well-documented codebase would not only enhance reproducibility but also foster collaboration within the community, further solidifying the impact of this work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Overall, the manuscript is robust, with no significant weaknesses. However, there are a few areas where clarity could be enhanced:

    Introduction - Related Work Section: The latter half of the second page, particularly the ‘Related Work’ section, could benefit from a clearer structure. Currently, the flow seems disjointed compared to the rest of the document. I recommend reorganizing this section into well-defined paragraphs. Additionally, please clarify ambiguous references, such as in the sentence, ‘In that method, the initial mask…’. It is not immediately clear whether ‘that’ refers to your method or to previously discussed work.

    Results Section - Clarification on ‘Best Threshold’: In the results section, a more detailed explanation of what constitutes the ‘best threshold’ for IterMask2 would be helpful. Specifically, it would be useful to know whether the ‘Initial IterMask2’ represents the best threshold as determined during training, or if it reflects an average general threshold applied on test data.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is very well-written, offering a clear and detailed explanation that makes it easy to understand the key concepts and the significance of the work. The diagrams and figures are expertly crafted, greatly aiding in the understanding of the proposed framework. Additionally, the technical details are thoroughly outlined, and the results are both strong and well-examined. These qualities make this paper an outstanding example of what merits a strong accept recommendation.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We want to express our sincere gratitude for the reviewers’ positive feedback and appreciation of our work. Your insights serve as a motivating affirmation of our efforts. We also thank the reviewers for their valuable suggestions, which help improve the paper.

  1. Due to the space limit, we couldn’t include all training details in the paper, but you can find all the information about the model, training process, and dataset preprocessing in the code link we provided. We will add the size limits of small, medium, and large lesions in Table 3 to the camera-ready paper. Regarding training and testing data, we evaluate our method using 2D models, as commonly done in previous works to simplify experimentation. For this purpose, from every 3D image, we extract slices 70 to 90, which mostly capture the central part of the brain. For model training, from the slices extracted from the training subjects, we only use those that do not contain any tumors. For validation and testing of all compared methods, from each validation and test subject, we use the slice that contains the largest tumor area out of the 20 central slices. We use the Adam optimizer with a learning rate of 1e-4 and train the model for 80,000 iterations. The U-Net for the first step is trained in the same way as the main model; the only difference between the two models is in the model structure, where the first-step model has only one input channel. Our models are currently trained and tested only on 2D slices, but we have a 3D version coming soon!

  2. Regarding the ‘best threshold’ results shown in the gray-shaded cells of Tables 1, 2, and 3, this is explained in the ‘Human-AI Collaboration’ section. The results with the best threshold for IterMask2 are obtained by determining the optimal threshold for each image during the iterative process, demonstrating the best performance the model can achieve through human-AI interaction. The non-colored cells for IterMask2 show the result of using the same threshold for the entire dataset (the threshold comes from the healthy validation set’s error map). For baseline methods, the gray-shaded best threshold refers to the optimal per-image threshold when computing the Dice score from the final error map, for a fair comparison. The non-colored cells for baselines use a single threshold for the whole test set, chosen to achieve the maximum Dice.

  3. We also want to clarify that the sensitivity-precision trade-off we mentioned applies not only to diffusion models but to all anomaly-segmentation methods that involve an initial distortion of information followed by the reconstruction or regeneration of the missing information. In these cases, more distortion amplifies the reconstruction errors of anomalies, thereby improving segmentation. However, it also increases the reconstruction error of normal areas, leading to false positives. This trade-off serves as the main motivation for our work.
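
The slice selection in point 1 and the two baseline thresholding protocols in point 2 can be sketched as follows. This is an illustrative reading of the rebuttal, not the released code: all function names are hypothetical, the slice axis is assumed to be axis 0, and "slices 70 to 90" is assumed to mean the 20 slices with indices 70 to 89.

```python
import numpy as np

def largest_tumor_slice(label_volume, start=70, stop=90):
    """Index of the central slice with the largest tumor area (assumes axis 0 = slices)."""
    central = label_volume[start:stop] > 0
    areas = central.reshape(stop - start, -1).sum(axis=1)
    return start + int(np.argmax(areas))

def dice(pred, target):
    """Dice overlap between two boolean masks."""
    inter = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 1.0 if denom == 0 else 2.0 * inter / denom

def dice_best_per_image(error_maps, targets, thresholds):
    """Mean Dice when each image gets its own optimal threshold."""
    return float(np.mean([max(dice(e >= t, y) for t in thresholds)
                          for e, y in zip(error_maps, targets)]))

def dice_best_shared(error_maps, targets, thresholds):
    """Mean Dice for the single threshold that is best over the whole set."""
    return max(float(np.mean([dice(e >= t, y) for e, y in zip(error_maps, targets)]))
               for t in thresholds)

# Toy label volume: the largest tumor inside slices 70..89 is on slice 80.
labels = np.zeros((100, 4, 4))
labels[75, :2, :2] = 1
labels[80] = 1
labels[95] = 1                      # outside the central range, ignored
chosen = largest_tumor_slice(labels)

# Two toy error maps whose ideal thresholds differ.
target = np.array([[True, False], [False, False]])
errors = [np.array([[0.9, 0.4], [0.1, 0.1]]), np.array([[0.3, 0.05], [0.05, 0.05]])]
per_image = dice_best_per_image(errors, [target, target], thresholds=[0.2, 0.5])
shared = dice_best_shared(errors, [target, target], thresholds=[0.2, 0.5])
```

Because the per-image protocol picks the best threshold separately for every image, its mean Dice can never be lower than that of the shared-threshold protocol, which is why the gray-shaded cells act as an upper bound.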




Meta-Review

Meta-review not available, early accepted paper.


