Abstract

Lymph node (LN) assessment is an indispensable yet very challenging task in the daily clinical workload of radiology and oncology, offering valuable insights for cancer staging and treatment planning. Finding scattered, low-contrast, clinically relevant LNs in 3D CT is difficult even for experienced physicians, and inter-observer variation is high. Previous CNN-based lesion and LN detectors often take a 2.5D approach, using a 2D network architecture with multi-slice inputs; this leverages pretrained 2D model weights and shows better accuracy than direct 3D detectors. However, slice-based 2.5D detectors place no explicit constraint on inter-slice consistency: a single 3D LN can be falsely predicted as two or more LN instances, or multiple LNs can be erroneously merged into one large LN. This adversely affects the downstream LN metastasis diagnosis task, since 3D size is one of the most important indicators of malignancy. In this work, we propose an effective and accurate 2.5D LN detection transformer that explicitly considers the inter-slice consistency within an LN. It first enhances a detection transformer with an efficient multi-scale 2.5D fusion scheme that leverages pre-trained 2D weights. Then, we introduce a novel cross-slice query contrastive learning module, which pulls the query embeddings of the same 3D LN instance closer and pushes the embeddings of adjacent similar anatomies (hard negatives) farther apart. Trained and tested on 3D CT scans of 670 patients (with 7252 labeled LN instances) covering different body parts (neck, chest, and upper abdomen) and pathologies, our method significantly improves upon previous leading detection methods, by at least 3% average recall at the same FP rates, in both internal and external testing.
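
To make the core idea concrete, the following is a minimal, hypothetical PyTorch sketch of a cross-slice query contrastive loss of the kind described above. The function name, the temperature value, and the way positives and negatives are grouped are illustrative assumptions, not the paper's exact formulation (its Eq. 2):

```python
import torch
import torch.nn.functional as F

def cross_slice_query_contrastive_loss(queries, instance_ids, tau=0.07):
    # queries:      (N, D) decoder query embeddings matched to ground-truth
    #               LNs, gathered from every 2D slice of a CT volume
    # instance_ids: (N,)   3D LN instance id per query; queries sharing an id
    #               come from different slices of the same 3D LN
    q = F.normalize(queries, dim=1)
    sim = q @ q.t() / tau                               # pairwise similarities
    self_mask = torch.eye(len(q), dtype=torch.bool, device=q.device)
    pos = (instance_ids[:, None] == instance_ids[None, :]) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))     # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    per_query = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return per_query[pos.any(dim=1)].mean()             # queries with >=1 positive
```

Queries matched to the same 3D LN instance on different slices act as positives and are pulled together; all other queries, including hard negatives from visually similar nearby anatomies, are pushed apart.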

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0362_paper.pdf

SharedIt Link: https://rdcu.be/dV181

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72086-4_58

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0362_supp.pdf

Link to the Code Repository

https://github.com/CSCYQJ/MICCAI24-Slice-Consistent-Lymph-Nodes-DETR

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Yu_SliceConsistent_MICCAI2024,
        author = { Yu, Qinji and Wang, Yirui and Yan, Ke and Lu, Le and Shen, Na and Ye, Xianghua and Ding, Xiaowei and Jin, Dakai},
        title = { { Slice-Consistent Lymph Nodes Detection Transformer in CT Scans via Cross-slice Query Contrastive Learning } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        pages = {616 -- 626}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a cross-slice query contrastive learning module in a transformer-based object detector, Mask DINO, for lymph node detection in CT.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper addresses a clinically relevant lymph node detection problem.
    2. The results are encouraging, showing at least a 3% improvement over competing methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The proposal has very limited novelty.
    2. Not all steps of the method are clearly explained.
    3. Issues with some mathematical formulations.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. How is (d) created in Fig. 1? Is it shown in [20]?
    2. Fig. 2 is not fully self-explanatory. Why are the steps of the proposal shown in Fig. 2 not sequentially explained in the text? What are the mask head, box head, class head, and contrastive head? What is their role? It seems the paper is written only for readers who are experienced in this area.
    3. How are the multi-scale features created? Where are the explanations?
    4. What is the 2.5D fusion layer? Where do we get the details?
    5. "where loss weights λ1, λ2, and λ3 are set to 1.0, 2.0 and 5.0 by default." - How are they set?
    6. A serious issue is with Eq. (2). What is the argument of the exp() function? Does it take two arguments? If so, how is it calculated?
    7. How is 0.5 FPs calculated?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The decision is made based on multiple issues discussed in “detailed and constructive comments for the authors”.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    Lymph node detection and segmentation using an extended Mask DINO. The key components are 2.5D feature extraction and cross-slice query contrastive learning.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Rather than simply using Mask DINO, the method also extracts features with a 2.5D strategy. One lymph node spans several slices, so this is quite an important extension.
    • Cross-slice query contrastive learning allows the model to distinguish lymph nodes from similar objects, such as the esophagus. There are many kinds of soft tissue around the targeted regions, and this is also an important extension.
    • Experiments were performed using datasets from five hospitals. Data collection for this research is another contribution.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • This paper lacks a description of the targeted lymph node regions (such as the neck, lung fields, abdomen, and mediastinum) and does not discuss the differences in their characteristics.
    • Five datasets are used, and four of them seem to be in-house. It would be nice if training and testing were also performed using only NIH's public online dataset: not only for reproducibility, but also to alleviate concerns about whether the method works with limited data.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    In Experiments, training was performed with three datasets (two of which are in-house), and testing was done with five datasets (four of which are in-house).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    As mentioned above, one concern is about the ability to train with a small amount of data.

    While each technical component is well described, the overall training and testing procedure is not clearly laid out in Section 2 (Method). Although some preprocessing and training conditions are described in "Implementation Details" in Section 3 (Experiments and Results), a step-by-step explanation is desired in Section 2.

    Your datasets are from five hospitals with different types of patients. If you want to claim that lymph nodes in regions not seen during training can be detected (e.g., training on abdominal lymph nodes and inference on neck lymph nodes), you should clearly sort this out.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While some concerns remain, the key ideas and results may be enough for a MICCAI contribution.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    Many thanks for the rebuttal. The remaining concern is about the in-house datasets. The fundamental issue is that the potential of the proposed method cannot be demonstrated without large datasets. While I understand the difficulty of obtaining ethical approval, I hope you can publish your in-house data or find other applicable datasets to demonstrate good performance.



Review #3

  • Please describe the contribution of the paper

    The authors propose a new detector, based on Mask DINO, with a 2.5D backbone and an additional cross-slice query contrastive learning approach for detecting lymph nodes (LNs) in CT images. The 2.5D fusion approach of the backbone was initially presented in [MULAN, ref. 22 in the paper] and is used to extend a 2D backbone to 2.5D while still being able to use pre-trained weights. The novel contrastive learning approach for the queries aims to pull queries of the same LN instance together while pushing queries of different instances apart. The manuscript includes multiple well-established baselines, such as nnDetection, nnU-Net, A3D + SATr, LENS, and MULAN, as well as several ablation experiments. Evaluation was conducted on an internal test set (a test split separated from the training data) and an external test set.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors present a novel lymph node (LN) detection framework based on Mask DINO, featuring a 2.5D backbone and a cross-slice query contrastive learning approach tailored to CT images, which shows very impressive performance.
    • The manuscript provides a comprehensive comparison with multiple established baselines, including nnDetection, nnU-Net, A3D + SATr, LENS, and MULAN, along with several ablation experiments, including DINO + Mask DINO and the ablation of their proposed method.
    • The paper is well-written, complemented by clear figures, and adheres to standard evaluation practices, conducting sound experimentation on a large dataset.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The primary shortcomings of the paper are summarised as follows:

    • The authors put quite a large emphasis on being able to use pre-trained weights throughout the manuscript (abstract, motivation for the 2.5D backbone, conclusion). Nevertheless, it remains unclear whether the proposed method used pre-trained weights of some form and, if so, where the pre-trained weights were obtained from (natural images, DeepLesion, etc.). Furthermore, including information about pre-trained weights for the selected baseline models would be interesting as well. While not strictly necessary, including results without pre-trained weights (if not already the case) would also be of interest, as an additional ablation experiment, to underline their importance.
    • The proposed detector was only evaluated for lymph node detection, which is a single (yet very difficult and important) problem. Given that the method is not restricted to lymph node detection, the inclusion of other medical problems to show the general applicability of the presented method would significantly enhance the manuscript and highlight its robustness.
    • Since all of the experiments were conducted on a mixture of public and private datasets and there is no information about releasing the code, the reproducibility of this study is significantly limited. Providing reproducible experiments (e.g. only on the public data) and providing the source code would build additional trust in the results.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    As mentioned in the review, the reproducibility of the manuscript is quite limited, since the experiments were carried out on a mixture of public and private datasets and there is no mention of making the source code publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors are highly encouraged to include additional information on the pre-trained weights to highlight the necessity and influence of them on final model performance. Furthermore, open sourcing the used source code would greatly improve the reproducibility of the study. While probably out of scope for this study, the inclusion of other medical problems as well as public datasets would significantly improve the manuscript in the future.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The manuscript provides promising results in a well constructed experimental setup with multiple well established baselines. Nevertheless, some of the details need to be specified to make the manuscript complete and allow for better reproducibility.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thank you to the authors for clarifying the points raised in the reviews. With the added clarifications, the manuscript will provide a useful resource in the future. Due to the remaining limitations of the method and evaluation (the focus on a single medical problem, i.e., lymph node detection), the original evaluation of the manuscript remains the same.




Author Feedback

We thank all reviewers for their comments, especially in noting that our paper is well-written (R4), presents a novel lymph node (LN) detector with a sensible 2.5D fusion and cross-slice query contrast module (R1, R4), and reports satisfying results on the LN detection task (R1, R3, R4) with a dataset contribution (R1). We address the individual comments below.

Q1: LN regions/characteristics of each dataset (R1). We have provided the dataset description in Table 1 of the supplementary material (due to the space limit in the main text), including the resolutions and LN regions in each dataset. We also want to clarify that LNs in different body parts have their own contextual characteristics. E.g., a model trained only on abdominal LNs does not generalize well to the neck region due to the unseen surrounding anatomical structures. Thus, training the model on LNs from various body regions is imperative to enable universal LN detection.

Q2: Results using only the public NIH dataset, and reproducibility (R1, R4). When trained and tested only on the public NIH dataset, the vanilla Mask DINO achieves an average recall of only 34.26% across 0.5 to 4 FPs. With the proposed 2.5D fusion and cross-slice query contrastive learning, performance improves significantly to 43.94%. Once accepted, we plan to release the source code to ensure reproducibility.

Q3: Method details (R1, R3). 1) Fig. 1(d) is the merged 3D LN prediction obtained from the 2D slice-level predictions in Fig. 1(a-c). We trained method [20] on our LN training set and generated this prediction on our test set. 2) Clarity of Fig. 2. The framework consists of a 2.5D backbone and a detection transformer (i.e., a transformer encoder and decoder) with multiple prediction heads (mask head, box head, and class head). All these components are shown in Fig. 2 from left to right (the grey feature maps are the output of the 2.5D backbone). The class, box, and mask heads are standard modules of CNN- or transformer-based detectors that produce class, box, and mask predictions, while the contrastive head is our newly proposed head defined in Sec. 2.2. 3) Multi-scale features. We first extract multi-scale feature maps from the outputs of res-block2 to res-block5 in ResNet50, where the feature map is downsampled once after each res-block. The feature map at each scale is then processed by a 2.5D fusion layer to obtain a 3D-context-enhanced feature map, and these maps together form the multi-scale 3D-context-enhanced features (see the backbone sketch after this feedback). 4) 2.5D fusion layer. We only briefly describe the 2.5D fusion layer in the main text due to the space limit; the detailed layer structure is shown in Fig. 1 of the supplementary material. We will add more description in the final version. 5) Loss weights. We follow the loss weights λ1-λ3 used in the original Mask DINO. 6) 0.5 FPs. 0.5 FPs refers to 0.5 false positives per CT scan: e.g., 10 FPs over 20 CT scans equates to 0.5 FPs per scan. Recall at 0.5 FPs is commonly used in lesion and LN detection [2,23,15,20-22].

Q4: Issues with Eq. 2 (R3). We acknowledge the oversight and agree that there is an error in Eq. 2 of the initial submission. exp() is the exponential function, and its argument should be the inner product of two query embeddings (we omitted the inner product). We will correct this in the final version; a plausible corrected form is sketched below.
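
For concreteness, here is a plausible corrected form of Eq. 2, assuming the standard InfoNCE formulation (the symbols q_i, q_+, N_i, and τ are illustrative, not taken verbatim from the paper):

```latex
\mathcal{L}_{\mathrm{con}}
= -\log \frac{\exp\left(\langle q_i, q_{+} \rangle / \tau\right)}
             {\exp\left(\langle q_i, q_{+} \rangle / \tau\right)
              + \sum_{q_k \in \mathcal{N}_i} \exp\left(\langle q_i, q_k \rangle / \tau\right)}
```

Here q_i is an anchor query embedding, q_+ is a positive query from another slice of the same 3D LN instance, N_i is the set of negative queries (e.g., hard negatives from adjacent similar anatomies), ⟨·,·⟩ is the inner product, and τ is a temperature.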
Q5: Pre-trained weights (R4). The backbones of the selected baseline models and ours are all initialized with ImageNet pre-trained weights, except 3D nnDetection and 3D nnU-Net, which are randomly initialized. Taking our Mask DINO† as an example, we find that without pre-trained weights the model takes longer to converge (~50 epochs vs. 30 epochs with pretraining) and drops 4-5% in average recall. We will add the pretraining information in the final version.

Q6: Other medical detection tasks (R4). We initially intended to also evaluate on DeepLesion; however, it only provides masks on one slice of each 3D lesion, making it unsuitable for evaluating our method. We may explore other datasets in the future.
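
As referenced in Q3 above, the following is a minimal sketch of how a multi-scale 2.5D backbone could reuse ImageNet-pretrained 2D ResNet50 weights, with one fusion layer per feature scale. The class name and the simple concatenate-then-1x1-conv fusion are assumptions for illustration; the paper's actual fusion layer structure is given in Fig. 1 of its supplementary material and may differ.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class MultiScale25DBackbone(nn.Module):
    """Hypothetical sketch: run a pretrained 2D ResNet50 on each slice of a
    small slice stack, then fuse features across slices at every scale."""

    def __init__(self, num_slices: int = 3):
        super().__init__()
        r = resnet50(weights="IMAGENET1K_V1")      # reuse pretrained 2D weights
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.blocks = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])
        chans = [256, 512, 1024, 2048]             # res-block2..5 output channels
        # one "2.5D fusion layer" per scale: stack slice features along the
        # channel axis and collapse them with a 1x1 convolution
        self.fuse = nn.ModuleList(
            nn.Conv2d(c * num_slices, c, kernel_size=1) for c in chans)

    def forward(self, x: torch.Tensor):            # x: (B, S, H, W) slice stack
        B, S, H, W = x.shape
        # treat each grayscale CT slice as an independent 3-channel image
        x = x.reshape(B * S, 1, H, W).repeat(1, 3, 1, 1)
        x = self.stem(x)
        feats = []
        for blk, fuse in zip(self.blocks, self.fuse):
            x = blk(x)                             # (B*S, C, h, w)
            _, C, h, w = x.shape
            merged = x.reshape(B, S * C, h, w)     # gather slices per volume
            feats.append(fuse(merged))             # (B, C, h, w), context-enhanced
        return feats                               # multi-scale 2.5D features
```

Because fusion happens after each unmodified 2D res-block, the ImageNet weights stay intact; in the full model, these multi-scale 3D-context-enhanced feature maps would feed the detection transformer encoder.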




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviews tend mainly towards rejection for this paper. While the rebuttal has answered some of the issues, fundamental problems remain that are not fully addressed; in particular, the potential of the proposed method cannot be demonstrated without using large datasets (in the current state, limited datasets are used). The reviewers also note that the paper has limited novelty, and important information remains missing in terms of the training and weights. The authors are encouraged to further pursue this work with more extensive datasets.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper introduces an enhanced lymph node detection and segmentation method using an extended Mask DINO model, incorporating a 2.5D feature extraction and a novel cross-slice query contrastive learning approach.

    The authors have addressed many of the reviewers’ questions. The paper is of good quality, comparing data from multiple institutions and several state-of-the-art models.




Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I think the paper can be accepted. The authors performed an extensive evaluation on several datasets, which showed superior performance. #R1 lowered the score because the data is not open source; however, this is not necessarily a criterion.



