Abstract

PET-CT integrates metabolic information with anatomical structures and plays a vital role in revealing systemic metabolic abnormalities. Automatic segmentation of lesions from whole-body PET-CT could assist diagnostic workflow, support quantitative diagnosis, and increase the detection rate of microscopic lesions. However, automatic lesion segmentation from PET-CT images still faces challenges due to 1) limitations of single-modality-based annotations in public PET-CT datasets, 2) difficulty in distinguishing between pathological and physiological high metabolism, and 3) lack of effective utilization of CT’s structural information. To address these challenges, we propose a threefold strategy. First, we develop an in-house dataset with dual-modality-based annotations to improve clinical applicability. Second, we introduce a model called Latent Mamba U-Net (LM-UNet) to more accurately identify lesions by modeling long-range dependencies. Third, we employ an anatomical enhancement module to better integrate tissue structural features. Experimental results show that our comprehensive framework achieves improved performance over the state-of-the-art methods on both public and in-house datasets, further advancing the development of AI-assisted clinical applications. Our code is available at https://github.com/Joey-S-Liu/LM-UNet.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1851_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1851_supp.pdf

Link to the Code Repository

https://github.com/Joey-S-Liu/LM-UNet

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Liu_LMUNet_MICCAI2024,
        author = { Liu, Anglin and Jia, Dengqiang and Sun, Kaicong and Meng, Runqi and Zhao, Meixin and Jiang, Yongluo and Dong, Zhijian and Gao, Yaozong and Shen, Dinggang},
        title = { { LM-UNet: Whole-body PET-CT Lesion Segmentation with Dual-Modality-based Annotations Driven by Latent Mamba U-Net } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a whole-body PET-CT cancer lesion segmentation procedure composed of dual annotations (separately for PET and CT), a novel hybrid CNN-Mamba based architecture and an “anatomical enhancement module” to more precisely delineate the lesion contours in the CT modality.

    They conduct experiments and an ablation study on both a public and in-house dataset, achieving SOTA or close to SOTA results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very clear, and describes the contribution well. Authors use recent, suitable improvements in DL techniques, specifically the Mamba module. Technically the architecture is not very complicated but the results are at least incrementally good. Qualitative results show that long-range information is used at least in some cases, which is difficult to achieve with pure CNN architectures such as nn-Unet.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The improvements are slight (2nd decimal in lowball DICE scores on the public dataset, specifically 0.754 for the proposed method vs 0.742 for nn-Unet). In my experience this is not significant and hard to reproduce. In particular Dice scores below 0.85 are typically not very useful and can only serve as detection markers.
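For readers less familiar with the metric under discussion, the following is a minimal sketch of how a Dice score between a predicted and a reference binary mask is typically computed. The function name `dice_score`, the empty-mask convention, and the toy volumes are illustrative assumptions, not taken from the paper or its code.

```python
import numpy as np


def dice_score(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


# Toy example: a 4x4x4 "lesion" and a prediction shifted by one voxel.
target = np.zeros((16, 16, 16), dtype=bool)
target[4:8, 4:8, 4:8] = True
pred = np.zeros_like(target)
pred[5:9, 4:8, 4:8] = True

print(dice_score(pred, target))  # 0.75
```

In this toy case a one-voxel shift of a small lesion already lowers Dice to 0.75, which illustrates how sensitive the metric can be when lesions are small.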

    The improvements when using dual annotations are more marked but not reproducible. It would have been more useful to re-annotate the public dataset and provide those annotations.

    The authors don’t offer to publicize their code or dataset or annotations, further highlighting reproducibility concerns.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No code and no data particularly for the dual-annotation case.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I’m looking forward to discussing this paper with the other reviewers and reading the authors’ rebuttal. I really don’t think such small incremental improvements are worth publishing in MICCAI without strong reproducibility properties.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Incremental improvements on low Dice scores.

    • No training/validation loss curve in the supplementary material. There could be overfitting concerns

    • No code, no new data, no new annotation will make reproducibility a strong concern.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    (1) The paper curated a new dataset for 3D whole body PET-CT lesion segmentation. (2) The paper proposed a new method for PET-CT lesion segmentation. (3) Through experiments, the results demonstrated the effectiveness of the framework.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) They curated a new dataset with detailed annotations for 3D whole body lesion segmentation. This dataset incorporates both CT and PET modalities, which are commonly used in clinical practice. (2) The integration of Mamba and attention modules within their UNet architecture marks a notable innovation, boosting the model’s performance significantly. They also utilized a two-branch decoder, enabling simultaneous segmentation of CT and PET. Both help to optimize the workflow and increase the efficiency of the segmentation process. (3) The authors have conducted extensive experiments demonstrating substantial improvements in segmentation performance over existing state-of-the-art (SOTA) methods. Their evaluation is comprehensive, including both quantitative metrics and qualitative analyses.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) Motivation for the AE module is not clear, particularly why it is applied exclusively to the CT branch of the network. (2) Results should be further discussed. On the in-house dataset, the proposed method achieved the best Dice score but second best HD95 score. This discrepancy requires a deeper analysis. (3) Implementation details are not clear. The paper does not provide adequate details regarding the inputs for the models designed for single-task segmentation. Is it a fair comparison if they are originally designed for single modality? (4) Availability not stated. The paper does not mention the availability of the newly curated dataset and the codes.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The availability of the data and code should be clearly stated.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) While the use of the AE module in the CT branch is effective, the reason behind this choice remains unclear. A more detailed explanation should be provided. (2) The paper could benefit from a more extensive analysis of the performance discrepancies noted between the Dice and HD95 scores. (3) The availability of the data and the code should be clearly stated.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    (1) The paper proposed a novel method for segmenting lesions from PET and CT. (2) Experiments are good enough to prove the effectiveness.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have made efforts to clarify the points raised in the initial review. The motivation for the AE module is based on the fact that CT images require more precise lesion delineation due to blurred boundaries in PET, which is understandable. They also provide more explanations about the results and implementations.



Review #3

  • Please describe the contribution of the paper

    This paper introduces a novel network architecture LM-UNet which improves PET/CT lesion segmentation performance as compared to existing methods. In particular, this work emphasizes (the relatively under-explored) better utilization of CT information by incorporating dual-modality based annotations (on in-house dataset) and by using an anatomical enhancement module based on CBAM network (that primarily tries to improve the segmentation of lesions on CT). Unlike generic U-shaped networks for segmentation, the bottleneck in LM-UNet consists of Mamba-encoder to better facilitate long-range modeling. Moreover, the decoding operation is split into two distinct pathways, one facilitating the metabolic activity-based lesion segmentation using PET-based annotation, while the other facilitating the anatomical structure-based lesion segmentation using CT-based annotations. The experiments have been conducted on an in-house (private) dataset, and the AutoPET (public) challenge dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (a) Usually, existing works concatenate PET and CT patches along channel dimension and use that as input to the network. This work proposes to use two differently clipped CT images (lung window and soft tissue window) along with the PET patch. This way of using the CT input is novel (a step towards data-centric AI in PET/CT lesion segmentation). (b) The proposed architecture itself is novel, especially the anatomical enhancement module, for segmentation based on structural information. This is also an efficient utilization of a combination of Mamba architecture within a U-Net (CNN) based framework. (c) The proposed method has been thoroughly benchmarked against the (existing) representative state-of-the-art supervised deep learning methods for medical image segmentation.
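The input construction described in (a) can be sketched as follows: the same CT volume is clipped to two intensity windows and stacked with the PET volume along the channel axis. This is only an illustration under stated assumptions. The window centers and widths below are commonly used textbook values, not the values used in the paper, and the names `apply_window` and `build_input` are hypothetical.

```python
import numpy as np

# Assumed (center, width) settings in Hounsfield units; not taken from the paper.
LUNG_WINDOW = (-600.0, 1500.0)
SOFT_TISSUE_WINDOW = (40.0, 400.0)


def apply_window(ct_hu, center, width):
    """Clip a CT volume to [center - width/2, center + width/2] and rescale to [0, 1]."""
    lo, hi = center - width / 2.0, center + width / 2.0
    clipped = np.clip(np.asarray(ct_hu, dtype=np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)


def build_input(ct_hu, pet):
    """Stack lung-window CT, soft-tissue-window CT, and PET as three channels."""
    lung = apply_window(ct_hu, *LUNG_WINDOW)
    soft = apply_window(ct_hu, *SOFT_TISSUE_WINDOW)
    # PET is passed through unchanged here; any PET normalization is left out.
    return np.stack([lung, soft, np.asarray(pet, dtype=np.float32)], axis=0)


ct_patch = np.full((8, 8, 8), 40.0)   # soft-tissue-like intensities
pet_patch = np.ones((8, 8, 8))
x = build_input(ct_patch, pet_patch)
print(x.shape)  # (3, 8, 8, 8)
```

The two windows emphasize different tissue contrast ranges of the same scan, so the network sees lung detail and soft-tissue detail as separate channels rather than one compressed intensity range.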

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (a) Since the PET and CT patches are concatenated, I assume that the anatomical structural information is entangled between the two modalities. As a result, it is hard to understand if the AE module actually utilizes the anatomical information solely from the CT data or from a combination of CT and PET (although the output from this branch of decoder backpropagates the error with respect to CT lesions only). An explanation for this would be beneficial. (b) The authors have not explicitly discussed why the AE module (given its architecture) should specifically improve performance by learning representations from the anatomical information. How exactly does the CBAM module utilize anatomical information? (c) There are several useful hyperparameters in this work, like depth of Mamba encoder (L), number of channels in latent representation (J), number of hidden dimensions (K), inference sliding window size, structure of ResNet-blocks in the PET-decoder branch, etc. The authors do not report the values they used for these hyperparameters (although the value for L is mentioned in the Suppl., it should have been mentioned in the main paper as well). This limits the potential reproducibility of the method. (d) Moreover, the importance of, and the improvement due to, multiple-modality annotations can be claimed only on the (privately owned) in-house dataset, which is the one that actually contains dual-modality annotations. The larger AutoPET dataset only contains single-modality annotation, hence one of the key ideas of the paper (i.e., using annotations on both PET and CT) only really applies to the smaller dataset. This, in a way, limits the overall claims made in the paper about the importance of using annotations on both modalities.
    (e) Several recent works, e.g., Liu, Z., et al. (2023) [https://doi.org/10.1117/12.2647894], emphasize the importance of task-based (clinical) evaluation of PET/CT lesion segmentation networks, such as estimating the total metabolic tumor volume, SUVmean, SUVmax, etc. Although the authors emphasize the clinical utility of their method since they utilize the rich information from both CT and PET, the evaluations were solely performed on task-agnostic metrics like DSC and HD95. This might limit the clinical applicability of their method since networks performing well on DSC and HD95 might still fail to accurately estimate clinically relevant metrics, as also discussed in Ahamed, S., et al. (2023) [https://arxiv.org/pdf/2311.09614.pdf].
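The task-based quantities named in (e) can be derived directly from a segmentation mask and the PET volume. The sketch below assumes the PET volume is already expressed in SUV units and that the voxel volume is known. The function name `lesion_metrics`, the dictionary keys, and the toy values are hypothetical and not from the paper or the cited works.

```python
import numpy as np


def lesion_metrics(suv, mask, voxel_volume_ml):
    """Return total metabolic tumor volume (mL), SUVmean, and SUVmax inside a mask."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        # Assumed convention for an empty segmentation.
        return {"tmtv_ml": 0.0, "suv_mean": 0.0, "suv_max": 0.0}
    values = np.asarray(suv, dtype=float)[mask]
    return {
        "tmtv_ml": float(mask.sum() * voxel_volume_ml),
        "suv_mean": float(values.mean()),
        "suv_max": float(values.max()),
    }


# Toy 2x2 "volume": two segmented voxels with SUV 4 and 8, each 0.5 mL.
suv = np.array([[1.0, 2.0], [4.0, 8.0]])
mask = np.array([[0, 0], [1, 1]])
m = lesion_metrics(suv, mask, voxel_volume_ml=0.5)
print(m)  # {'tmtv_ml': 1.0, 'suv_mean': 6.0, 'suv_max': 8.0}
```

Comparing these values between predicted and reference masks is one way a segmentation could be judged on clinically relevant quantities rather than on overlap alone.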

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No code was shared. Although the manuscript is largely well-written, it lacks in specifying some of the details about various hyperparameter choices, limiting the potential for the reproducibility of the work. Moreover, the in-house dataset that actually contains the dual-modality annotations was private and was not shared with the work, although the AutoPET dataset is public (although this dataset doesn’t contain dual-modality annotations).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (a) As discussed in the weakness section above, please explain how the given structure of the AE module enhances segmentation performance based on anatomical information, since the PET and CT inputs are channel-concatenated and thus their representations are entangled throughout the network. Discuss the specific role of the CBAM network too. (b) Explain or report the values of different hyperparameters used in the work in the main paper as well, so the main paper by itself becomes self-contained. (c) In Section 3.4, Qualitative Evaluation, the authors have explained how LM-UNet is superior to other networks in differentiating pathological and physiological uptakes (especially within kidneys). Please explain this more explicitly within Fig 3 (by expanding the Fig 3 caption) since this would be hard for the non-clinical audience to understand. What do the yellow-dashed boxes and darker and lighter red shades of patches represent in Fig 3 (please add a legend to improve the readability of Fig 3)? (d) Discuss the various limitations of this work in the main paper, as explained above in the “weakness section”. (e) Please discuss how you plan to test/improve the clinical applicability of the proposed method (testing on additional clinically relevant metrics, better/newer way to utilize PET/CT data, etc.) in your future work with relevant citations in the Conclusion/Discussion section.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces a novel network incorporating Mamba-like architecture along with CNN-based U-shaped networks which improves lesion segmentation performance over existing SOTA methods. The paper is largely well-written and the benchmarking is pretty thorough. I am still not fully convinced about some of the claims made in the paper regarding utilization of dual modality annotations (since this was demonstrated only on one of the datasets), or how AE (or CBAM) module utilizes structural information. A discussion about these and other aspects as mentioned in the “Constructive comments” section above might make the manuscript suitable for acceptance to MICCAI 2024.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I couldn’t find the code for LM-UNet on either GitHub or via Google search. It might still be available there and open-sourced but probably not very readily available (not among the first 100 or so links I checked).

    Looking at the network architecture in Fig 2, the AE module seems to be working on both CT and PET data (since the input to the downstream AE module originates from both PET and CT inputs upstream, leading to the AE getting an entangled feature representation from both modalities), although the branch with the AE module does end up providing a supervision signal via the CT-based ground truth. As a result, I still don’t think that the AE module utilizes the anatomical information from the CT solely. This was one of my questions in the review, but I do not think the authors have provided a convincing explanation for this in their rebuttal.

    I would suggest that the authors include the explanations regarding the different choices made in this work (if the paper is accepted, since you can add half a page of extra explanatory content), be it the use of the AE module only for CT, the importance of CBAM, different hyperparameter choices, and some more qualitative results aimed to explain clinical relevance. This is definitely a strong work in terms of method development, but the lack of availability of CT-based annotations almost limits the scope of this work (which the authors themselves point to in their rebuttal).

    In the light of these limitations, I would still like to maintain my score of 4 (Weak Accept). Congratulations to the authors on this work and good luck with the decisions.




Author Feedback

We would like to thank the reviewers (R1, R3, and R4) for their valuable and constructive comments, which have significantly improved our manuscript. In response to the concerns, we have provided the following responses.

Methodology Concerns: The motivation of the AE module (R3 and R4). R3 was concerned about why it is applied exclusively to the CT branch of the network. R4 was concerned about how the AE module enhances segmentation performance based on anatomical information. 1) In our work, we aimed to highlight the structural and functional differences between PET and CT modalities. Lesion delineation on CT needs to be more precise than on PET, as PET images often have relatively blurred boundaries in high metabolic areas. Therefore, we introduced the AE module in the CT branch. 2) The AE module consists of two components: multi-scale convolution and CBAM. The multi-scale convolution captures fine details and overall structure in CT images, while CBAM highlights important features and suppresses noise, leading to improved segmentation performance.

Experimental Concerns:

  1. Incremental improvements on low Dice scores (R1). 1) We acknowledge that the improvements in Dice scores compared to nnU-Net seem modest in Table 1. 2) However, considering that nnU-Net is a well-established benchmark model, our method demonstrated competitive results. 3) We did not pursue the optimal hyperparameters for the best performance; our aim was only to compare the performance of all models under the same settings. 4) Our method outperformed nnU-Net in distinguishing between pathological and physiological metabolism, providing a more comprehensive understanding of the model’s strengths through qualitative evaluation.
  2. Implementation details (R3 and R4). R3 was concerned about the lack of adequate details regarding the inputs for the models designed for single-task segmentation. R4 was concerned about the values we used for several useful hyperparameters. 1) As explained in Sec. 3.3 of the manuscript, we utilized the original encoder design for multi-channel input, and adjusted the decoder structure to suit various tasks. This approach ensured fair comparisons across different experiments. 2) Due to space limitations, we didn’t provide a comprehensive report of the hyperparameters. However, we conducted experiments with Mamba depths of 6, 12, 18, and 24, maintained a fixed hidden dimension of 768, and explored sliding window sizes of 96 and 128.
  3. Overfitting concerns (R1). By observing loss curves during the training process, we employed various regularization techniques and carefully tuned hyperparameters to mitigate overfitting. Although we monitored the loss curves, we did not include them in the manuscript, following the approach of other papers in the field.
  4. Several recent works emphasize the importance of task-based (clinical) evaluation of PET/CT lesion segmentation networks. Discuss the clinical applicability of the proposed method (R4). 1) We have reviewed the papers you mentioned above prior to submitting the manuscript. In our future work, we plan to conduct more extensive data analysis, including examining SUVmax distribution and organ distribution, which are relevant to clinical applicability. 2) We will incorporate additional metrics such as precision and recall, which are crucial in clinical practice for assessing the degree of lesion detection.
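To make the sliding window sizes mentioned in item 2 concrete, the sketch below shows one common way patch start positions are computed along a single axis during sliding-window inference. The 50% overlap, the axis length of 300 voxels, and the function name `window_starts` are assumptions for illustration; the rebuttal only states the window sizes (96 and 128).

```python
def window_starts(length, window, overlap=0.5):
    """Start indices of sliding windows that cover an axis of the given length."""
    if length <= window:
        # The axis fits in one window (padding would be handled elsewhere).
        return [0]
    step = max(1, int(window * (1.0 - overlap)))
    starts = list(range(0, length - window, step))
    # Always add a final window flush with the end so the whole axis is covered.
    starts.append(length - window)
    return starts


# Hypothetical axis length of 300 voxels with the two window sizes from the rebuttal.
print(window_starts(300, 96))   # [0, 48, 96, 144, 192, 204]
print(window_starts(300, 128))  # [0, 64, 128, 172]
```

A larger window gives each patch more spatial context and needs fewer patches per axis, at the cost of more memory per forward pass.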

Reproducibility Concerns:

  1. No code was shared (R1, R3 and R4). You can search for ‘LM-UNet’ on GitHub for our code which is open-sourced. The official GitHub link will be provided upon acceptance.
  2. Improvements with dual annotations are more marked but not reproducible (R1). Due to temporary data confidentiality and alignment with hospital requirements, we preserve the unique characteristics of PET and CT modalities. However, we understand the reproducibility of this strategy might be challenging. Our intention was to inspire similar initiatives rather than expand the scope of our work.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    one reviewer increased, overall rebuttal looks good, no significant issue.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    one reviewer increased, overall rebuttal looks good, no significant issue.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


