Abstract

No studies have formulated endoscopic grading (EG) of gastric atrophy (GA) as a multi-label classification (MLC) problem, which requires the simultaneous detection of GA and its gastric sites during an endoscopic examination. Accurate EG of GA is crucial for assessing the progression of early gastric cancer. However, endoscopic images suffer from strong visual interference caused by various inter-image differences and subtle intra-image differences, which leads to confounding contexts and hinders the causalities between class-aware features (CAFs) and multi-label predictions. We propose, for the first time, a multilevel causality learning approach for multi-label gastric atrophy diagnosis that learns robust causal CAFs by de-confounding multilevel confounders. Our multilevel causal model is built on a transformer to construct a multilevel confounder set and implement a progressive causal intervention (PCI) on it. Specifically, the confounder set is constructed by a dual token path sampling module that leverages multiple class tokens and different hidden states of patch tokens to stratify various visual interference. PCI involves attention-based sample-level re-weighting and uncertainty-guided logit-level modulation. Comparative experiments on an endoscopic dataset demonstrate the significant improvement of our model over existing methods such as IDA (0.95% on OP, and 0.65% on mAP) and TS-Former (1.11% on OP, and 1.05% on mAP).

Keywords: Multi-label Classification · Causal Intervention · Gastric Atrophy Detection

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3296_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/rabbittsui/Multilevel-Causal

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Cui_Multilevel_MICCAI2024,
        author = { Cui, Xiaoxiao and Jiang, Shanzhi and Sun, Baolin and Li, Yiran and Cao, Yankun and Li, Zhen and Lv, Chaoyang and Liu, Zhi and Cui, Lizhen and Li, Shuo},
        title = { { Multilevel Causality Learning for Multi-label Gastric Atrophy Diagnosis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a method for multi-label gastric atrophy diagnosis, introducing a multilevel causality learning approach to address visual interference and achieving improved accuracy. Through experiments on an endoscopic dataset, it outperforms existing methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The first program to achieve an automatic diagnosis of multi-label gastric atrophy. The paper illustrates the confounding effect of strong visual interference in endoscopic images with an SCM and proposes a causal intervention strategy to remove it.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The description of the methodology is not clear and the rigor of the experimental part is slightly weak.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    a) The explanation of the multilevel confounder in Part 2.2 is not clear, and some steps seem to have been omitted from the description of the sampling process.

    b) It is unclear how the tensor U in Part 2.3 is obtained; please add the theoretical basis for why u can represent uncertainty.

    c) In general, the paper does not provide enough evidence that the proposed strategy can weaken the influence of confounders.

    d) The experimental part does not state whether repeated cross-validation experiments were conducted, and the ablation study does not show that the individual components make a significant difference.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The presentation of causal learning is partially unclear, and the idea of removing confounding factors is not convincing.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The response addressed most of the concerns, although the explanation of the causality aspect remains unclear (lacking sufficient theoretical and experimental support). Nonetheless, it overall meets the quality requirements of a conference paper.



Review #2

  • Please describe the contribution of the paper

    The authors propose a multi-level causality learning method to formulate the endoscopic grading of gastric atrophy as the multi-label classification problem.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper formulates the endoscopic grading of gastric atrophy as the multi-label classification problem. Multilevel causality to construct multilevel confounders is utilized to derive class-aware features that are robust to visual interference. The proposed multilevel causality with transformer leads to the improvement of mAP and OP for gastric atrophy diagnosis compared with state-of-the-art methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The correlation between the classifier prediction in Eq. (1) and the transformer is hard to interpret from Fig. 2. The loss function is also unclear in the paper. ResNet is coupled with the Transformer [19]; a Transformer-only structure is expected. The difference between the proposed method and [11] should be clarified.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors claim that the dataset and code will be released after the paper is accepted.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Computing class-aware features based on class tokens and patch tokens is also done in [R1]. A technical justification is suggested. [R1] Xu, Lian and Ouyang, Wanli and Bennamoun, Mohammed and Boussaid, Farid and Xu, Dan, “Multi-class Token Transformer for Weakly Supervised Semantic Segmentation”, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4310-4319, 2022.

    The following paper should be reviewed and compared. [R2] Lin N, Yu T, Zheng W, Hu H, Xiang L, Ye G, Zhong X, Ye B, Wang R, Deng W, Li J, Wang X, Han F, Zhuang K, Zhang D, Xu H, Ding J, Zhang X, Shen Y, Lin H, Zhang Z, Kim JJ, Liu J, Hu W, Duan H, Si J. Simultaneous Recognition of Atrophic Gastritis and Intestinal Metaplasia on White Light Endoscopic Images Based on Convolutional Neural Networks: A Multicenter Study. Clin Transl Gastroenterol. 2021 Aug 3;12(8).

    Please label $E_p$, $E_c$ and $f^k()$ in Fig. 2 for clarity. Please give the details of the loss function. Based on Eq. (4), $O_p$ is defined but not used in the network. Please explain how to obtain the tensor $u$. When citing references, leave a space before the reference numbers. The words in Fig. 3 are relatively small. What is the definition of $K$ in Fig. 3? In Table 2, bold should mark the best results. What does "without backbone" in the third row of Table 2 mean? [11] was published in ICLR 2023.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors formulate the endoscopic grading of gastric atrophy as the multi-label classification problem and solve the problem by using multilevel causality with transformer.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors address most of my concerns.



Review #3

  • Please describe the contribution of the paper

    To facilitate the endoscopic grading of gastric atrophy, the authors designed a structural causal model and utilized PCI to eliminate confounding features. Class-aware features from this process were subsequently employed for multi-label predictions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novelty: a novel technique for a multilevel causality learning for multi-label classification is proposed.

    Data collection: Endoscopy data was gathered from a substantial cohort of subjects (approximately 4000) with and without gastric atrophy and utilized for the experiments. The commitment to open-source the data is commendable.

    Detailed analysis: The paper includes comparisons with recently published state-of-the-art papers from ICLR and CVPR. Additionally, it conducts further ablation studies to justify the utilization of various components such as the multi-class token transformer, multilevel sampling, self-attention, and uncertainty-guided modulation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Lack of clarity: Some subsections are brief and lack detailed descriptions. It appears that the authors have attempted to condense a full journal paper into an 8-page limit, making it difficult to understand and even more challenging to replicate the methodology.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) The paper does not address the limitations of this work.

    (2) The causal graph is structured with image features causing class-aware features, but the reasoning behind this graph is unclear.

    (3) The paper lacks clarity on how logit-level modulation is suggested to strengthen the accurate causal relationship between images and labels.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces a novel technique that surpasses state-of-the-art methods. Additionally, it includes ablation studies to justify the reasoning behind the design choices. Providing more details on the components would enhance the readability of the paper; nonetheless, in its current form, I think it meets the acceptance criteria.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Even though the authors addressed some questions in their rebuttal, overall they did not significantly improve the paper. Therefore, I will keep my scores the same as before.




Author Feedback

Thank you for your valuable comments on our work. Special thanks to R3 and R5 for accepting the paper directly.

  1. Novelty (R1: “first program to achieve an automatic diagnosis of multi-label gastric atrophy” R5: “a novel technique surpasses SOTA methods”);

  2. Sufficient experiments (R3 & R5: “comparisons with recently published SOTA methods”, R5: “justify the utilization of various components”).

Q&A1: Code and dataset will be released publicly for reproducibility, along with detailed implementation.

Q2: slightly weak experimental rigor, no…confounder, no… ablation (R1). A2: 1) Table 1 shows that our method achieves the highest scores on most overall metrics (OR, OF1, mAP) and on CF1, and competitive scores on the other metrics. The highest OR and mAP scores mean that our method achieves lower clinical risk and an improvement in total average accuracy, demonstrating that it can weaken the influence of confounders. 2) In Table 2, adding sampling, modulation, and attention in turn brings significant improvements (%): 1.58, 1.26, 1.11 on OR and 0.57, 0.59, 0.51 on OF1. Although a slight performance drop on mAP or CR is observed when removing attention or modulation, using both brings an overall improvement in our method.
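For readers unfamiliar with the abbreviations in A2, the sketch below computes the overall (OP, OR, OF1) and per-class (CP, CR, CF1) scores as they are commonly defined in the multi-label classification literature. It assumes binarized predictions; the function name `multilabel_metrics` and the toy labels are illustrative only, and the paper's own evaluation code may differ (for example in thresholding or zero-division handling).

```python
from typing import Dict, List


def multilabel_metrics(y_true: List[List[int]], y_pred: List[List[int]]) -> Dict[str, float]:
    """Overall and per-class precision/recall/F1 for 0/1 label matrices.

    Both inputs have shape [num_images][num_classes].
    """
    n_classes = len(y_true[0])
    tp = [0] * n_classes    # true positives per class
    pred = [0] * n_classes  # predicted positives per class
    gt = [0] * n_classes    # ground-truth positives per class
    for t_row, p_row in zip(y_true, y_pred):
        for k in range(n_classes):
            tp[k] += t_row[k] * p_row[k]
            pred[k] += p_row[k]
            gt[k] += t_row[k]

    def safe_div(a: float, b: float) -> float:
        # Assumption: an undefined ratio is reported as 0.0.
        return a / b if b else 0.0

    op = safe_div(sum(tp), sum(pred))   # overall precision
    or_ = safe_div(sum(tp), sum(gt))    # overall recall
    cp = sum(safe_div(tp[k], pred[k]) for k in range(n_classes)) / n_classes
    cr = sum(safe_div(tp[k], gt[k]) for k in range(n_classes)) / n_classes
    return {
        "OP": op,
        "OR": or_,
        "OF1": safe_div(2 * op * or_, op + or_),
        "CP": cp,
        "CR": cr,
        "CF1": safe_div(2 * cp * cr, cp + cr),
    }


# Toy example: 4 images, 2 classes (hypothetical data).
y_true = [[1, 0], [1, 1], [0, 1], [1, 0]]
y_pred = [[1, 0], [0, 1], [1, 1], [1, 1]]
metrics = multilabel_metrics(y_true, y_pred)
```

The overall scores pool counts across all classes, so frequent classes dominate them, while the per-class scores average the ratios and weight every class equally. mAP is not covered here because it needs ranked confidence scores rather than binarized predictions.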

  3. Clarity and organization (R3: “Good”, R5: “Very Good”).

Q3: Unclear explanation of multilevel confounder (MC) and some steps omitted (R1). A3: MC represents the various possible confounding contexts that lead to incorrect causalities between image features (IFs) and predictions. To mitigate its negative effects on IFs, dual token path sampling learns multilevel features affected by MC, from which multilevel class-aware features (CAFs) are derived, as stated and summarized in the first and last paragraphs of Sec. 2.2.

Q4: Reasoning of structural causal model (SCM) with image features (X) causing CAFs (R5). A4: Our novel SCM leverages the CAF (Z) as the true causal path from X to the labels (Y). Although X is affected by confounding contexts (C), backdoor adjustment with do-calculus cuts off the link C-X, preventing the influence of C on X from passing to Z and ensuring that the CAF is insensitive to contextual bias.

Q5: Correlations between prediction of classifier in Eq. (1) and transformer in Fig. 2 (R3). A5: 1) P(Y|X=x,C=c,Z=z) in Eq. (1) corresponds to obtaining the prediction (Logits in the Logit-level Modulation module in Fig. 2) with the classifier (in the Sample-level Re-weighting module) applied to the multilevel CAFs derived from the Sampling Features. We will label all variables in Fig. 2 in the camera-ready version. 2) Notably, the parameters of the classifiers used for spatial grouping and final prediction are the same. 3) P(Z|X=x)P(C=c) in Eq. (1) denotes the progressive causal intervention implemented by sample-level re-weighting and logit-level modulation (right of Fig. 2).
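The factors named in A5 are consistent with a backdoor adjustment over the confounder set. One plausible reading is sketched below. It is a reconstruction from the rebuttal text and not a copy of the paper's Eq. (1), and it assumes a graph with C -> X, C -> Y and X -> Z -> Y in which Z is independent of C given X.

```latex
% Assumed graph: C -> X, C -> Y, X -> Z -> Y, with Z independent of C given X.
\begin{align}
P(Y \mid do(X = x))
  &= \sum_{c} P(Y \mid X = x, C = c)\, P(C = c) \\
  &= \sum_{c} \sum_{z} P(Y \mid X = x, C = c, Z = z)\, P(Z = z \mid X = x)\, P(C = c)
\end{align}
```

The first line is the standard backdoor adjustment with C as the adjustment set. The second expands over the class-aware feature Z by the law of total probability and uses the stated independence assumption to replace P(Z = z | X = x, C = c) with P(Z = z | X = x).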

Q6: Different from [11] and [R2] (R3). A6: 1) Our technical advantages include: -Confounder set building: [11] uses a single feature level, whereas our multilevel confounders are sufficient to represent strong visual interference. -Causal intervention: [11] uses sample-level attention, whereas our sample-level re-weighting and logit-level modulation are effective, as shown in Table 2. 2) Clinical difference: [R2] only identifies normality but not the gastric site.

Q7: Technical justification with [R1] in obtaining CAF using class and patch tokens (PT) (R3). A7: The two methods combine different features: [R1] averages intermediate attention weights and multiplies them with PT to refine the CAF, whereas we re-weight intermediate hidden states in the PT path to derive a robust CAF that is causal to the label.

Q&A8: Detailed implementation, including: -Loss function: binary cross-entropy loss with sigmoid (R3). -Tensor u in logit-level modulation is obtained as the variance of O (R1 & R3). -u refers to aleatoric uncertainty, which captures the inherent ambiguity in outputs for a given input (R1). -$O_p$ is part of O (R3). -K in Fig. 3 denotes the number of sampled patch token paths (R3). -W/o backbone in the 3rd row of Table 2 denotes applying a single class token in the transformer (R3). -ResNet-50+ViT-B_16 is also used in IDA and CCD (R3).
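Two of the implementation details in Q&A8 can be illustrated with a minimal sketch: binary cross-entropy applied to sigmoid outputs, and an uncertainty vector taken as the variance of the logits O. The rebuttal does not say along which axis the variance is computed or how u then modulates the logits, so the choice below (population variance per class across sampled paths) and the names `bce_with_logits` and `logit_variance` are assumptions for illustration, not the authors' code.

```python
import math
from typing import List


def bce_with_logits(logits: List[float], targets: List[float]) -> float:
    """Mean binary cross-entropy over labels, with the sigmoid folded into the loss."""
    total = 0.0
    for x, t in zip(logits, targets):
        # Numerically stable form of -[t*log(s(x)) + (1-t)*log(1-s(x))].
        total += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def logit_variance(paths: List[List[float]]) -> List[float]:
    """Per-class population variance of logits across sampled paths.

    `paths` has shape [num_paths][num_classes]. Taking the variance over the
    path axis is an assumption; the rebuttal only says "variance of O".
    """
    n = len(paths)
    num_classes = len(paths[0])
    u = []
    for k in range(num_classes):
        mean_k = sum(p[k] for p in paths) / n
        u.append(sum((p[k] - mean_k) ** 2 for p in paths) / n)
    return u


# Toy example: logits from 2 sampled paths for 2 classes (hypothetical values).
O = [[1.0, 2.0], [3.0, 2.0]]
u = logit_variance(O)
loss = bce_with_logits([0.0], [1.0])
```

Under this reading, a class whose logits disagree across sampled paths gets a larger u, while a class with identical logits gets u = 0. How u is then used to modulate the final logits is left out because the source does not specify it.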




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper introduces a multi-level causality learning method for multi-label gastric atrophy diagnosis, achieving improved accuracy by addressing visual interference and outperforming existing methods on an endoscopic dataset. The method is interesting and novel. The paper is clearly written, and the experimental setup and ablation study are sound. After rebuttal, all reviewers indicate that their concerns have been addressed and reach consensus about its acceptance.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    This paper introduces a multi-level causality learning method for multi-label gastric atrophy diagnosis, achieving improved accuracy by addressing visual interference and outperforming existing methods on an endoscopic dataset. The method is interesting and novel. The paper is clearly written, and the experimental setup and ablation study are sound. After rebuttal, all reviewers indicate that their concerns have been addressed and reach consensus about its acceptance.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


