Abstract

Early identification of lymphoma patients with poor prognosis is crucial to determining personalized treatment plans and improving prognosis. Currently, commonly used prognostic biomarkers include clinical variables such as the International Prognostic Index. Quantitative parameters based on PET/CT and deep learning methods have also shown promising results. However, there are still several challenges in PET/CT-based prognostic studies: heterogeneity in the number and location of lesions, insufficient representation of lesion features, and the lack of anatomical context modeling of the lesions. We propose a novel framework named LAMP, with lesion-anatomy context fusion and attention-based multi-lesion aggregation as its two key components. The former takes into account information about the surrounding anatomical organs of the lesions to improve their representation. The latter treats each lesion region as an instance, assigning attention scores that reflect the contribution of each lesion, and aggregates them accordingly. A total of 229 lymphoma patients were collected to evaluate our model. In prediction tasks for progression-free survival and overall survival, the 5-fold cross-validation C-index is 0.791 and 0.828, respectively, outperforming existing models based on clinical variables and deep learning. LAMP has the potential to become a clinical auxiliary tool to differentiate patients with varying risk levels, facilitating the development of personalized treatment plans.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1634_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ZhaSon_Lymphoma_MICCAI2025,
        author = { Zhang, Song and Zhang, Jiajin and Qiu, Liheng and Liu, Wei and Jin, Dakai and Lu, Le and Yang, Shenmiao and Yan, Ke},
        title = { { Lymphoma Prognosis with Lesion-Anatomy Context Fusion and Attention-Based Multi-Lesion Aggregation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15960},
        month = {September},
        pages = {324--334}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    In their manuscript “Lymphoma Prognosis with Lesion-Anatomy Context Fusion and Attention-Based Multi-Lesion Aggregation”, the authors propose a novel method for assessing lymphoma prognosis leveraging lesion anatomy context fusion together with transformer-based multi-lesion aggregation in order to derive progression free (PFS) and overall survival (OS) in lymphoma patients.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Overall, the manuscript felt mostly well-written and comprehensible and depicted a clear and relevant application scenario with satisfactory results.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Particularly with respect to future reproducibility, I felt that in its current form the manuscript contains several larger ambiguities that should be sufficiently addressed before publication, which I would like to point out as follows:

    • First, regarding the overall evaluation, it felt to me that the choice of using a 5-fold CV is somewhat unfortunate. This is in particular due to the rather low number of available patients (N=229), leading to a large expected variance due to the experimental setup. Particularly due to this setup, the described low variance across folds (stds of down to 0.001) seemed somewhat unintuitive and may significantly underrepresent the actual variance seen if faced with a different cohort. However, since the authors do not provide an evaluation on any external or split-off dataset, this assumption cannot be tested adequately. A more adequate estimate on only a single cohort may instead be given by bootstrapped confidence intervals, as is possible with the bootstrapping approach from Efron et al. This is particularly underlined by the results observed in the ablation experiment in Tab. 2, which shows superiority in 1y-OS prediction for the baseline approach, while ABMLA, LACF as well as the full model lead to worse results.
    • Regarding the depicted Kaplan-Meier curves for PFS and OS, the p-values also do not seem to represent the uncertainties that are highly likely under the specific experimental setup, notably including additional covariates, such as overall immune response, patient age, comorbidities, etc.
    • Overall, I would therefore expect limited reproducibility on a separate dataset.
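The bootstrapped confidence interval suggested above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the synthetic cohort, and the choice of a percentile interval are all assumptions made for illustration. Patients are resampled with replacement and Harrell's C-index is recomputed on each resample.

```python
import random

def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i has the shorter time and an
    observed event (events[i] == 1). Higher risk should mean shorter time.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / comparable if comparable else float("nan")

def bootstrap_ci(times, events, risks, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval over patients resampled with replacement."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        c = concordance_index([times[k] for k in idx],
                              [events[k] for k in idx],
                              [risks[k] for k in idx])
        if c == c:  # drop resamples without any comparable pair (NaN)
            stats.append(c)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi

# Synthetic toy cohort (purely illustrative, not the study data).
rng = random.Random(1)
times = [rng.uniform(1, 60) for _ in range(40)]
events = [1 if rng.random() < 0.6 else 0 for _ in range(40)]
risks = [-t + rng.gauss(0, 15) for t in times]

point = concordance_index(times, events, risks)
low, high = bootstrap_ci(times, events, risks, n_boot=500)
print(f"C-index {point:.3f}, 95% bootstrap CI [{low:.3f}, {high:.3f}]")
```

On a cohort of a few hundred patients, the width of such an interval gives a direct view of the sampling uncertainty, which the standard deviation across five folds does not.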

    • Regarding the average pooling introduced in Sec. 2.2, I felt that it may clearly limit the information acquired from the context. Notably, additional factors such as scanner noise & scattering, biological differences due to sex, age or body weight, as well as many other covariates might significantly influence the results here. It would be strongly appreciated if the authors would briefly discuss how they took care of these factors while deriving their feature description.

    • Regarding the experimental setup, on p6 the authors describe they utilize a batch size of 1 (which may itself be a tricky choice due to the inner workings of many normalization layers) with a gradient accumulation over 32 samples “due to the nonuniform lesion numbers across patients”. Since the authors propose an approach that is capable of fusing multiple regions, what is the contribution of this setup and how does it support training?
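For context on the batch-size question, the following is a minimal sketch of what gradient accumulation does. It uses a one-parameter toy model with hypothetical function names, not the authors' training loop. Processing one sample at a time and updating once every 32 samples yields the same parameter update as a single mini-batch of 32, which is why the technique is commonly used when inputs of different sizes (here, different lesion counts) cannot be stacked into one tensor.

```python
def grad_single(w, x, y):
    """Gradient of the squared error 0.5 * (w * x - y) ** 2 with respect to w."""
    return (w * x - y) * x

def step_with_accumulation(w, samples, lr, accum_steps):
    """Batch size 1: accumulate scaled gradients, update every accum_steps samples.

    Samples left over after the last full group are not applied in this sketch.
    """
    accumulated = 0.0
    for k, (x, y) in enumerate(samples, start=1):
        # Dividing by accum_steps turns the running sum into a mean gradient.
        accumulated += grad_single(w, x, y) / accum_steps
        if k % accum_steps == 0:
            w -= lr * accumulated
            accumulated = 0.0
    return w

def step_with_full_batch(w, samples, lr):
    """Reference: one update from the mean gradient of the whole batch."""
    mean_grad = sum(grad_single(w, x, y) for x, y in samples) / len(samples)
    return w - lr * mean_grad

# 32 toy samples of the relation y = 3 * x (illustrative values only).
samples = [(0.1 * i, 0.3 * i) for i in range(1, 33)]
w_accum = step_with_accumulation(0.0, samples, lr=0.01, accum_steps=32)
w_batch = step_with_full_batch(0.0, samples, lr=0.01)
print(w_accum, w_batch)
```

The equivalence holds only for the gradient. Layers that compute statistics over the batch dimension still see a single sample per forward pass, so the concern about normalization layers is not resolved by accumulation.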
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • In the abstract, the authors describe their method to be “significantly” outperforming other SoA approaches - a claim repeated on multiple occasions. However, no significance tests have been conducted at any point other than within their method (cf. Fig.2), and would be difficult with the chosen experimental setup. Thus, the word “significant” should be omitted.
    • In the introduction, p2, I would have liked reading something about the current clinical assessment, which could have served as a baseline and would strongly support the authors’ argument regarding superiority, if shown.
    • Also in the introduction, the authors claim that graph attention mechanisms fail to take into account “crucial spatial context, including lesion distribution and its anatomical correlation with surrounding organs”. Why is that? Does this only refer to the cited publications? It should be noted that there is a large variety of region-aware GCN approaches, and even one of the cited approaches specifically takes into account the anatomical subregion [28].
    • On p3, the authors claim that the learned attention score is interpretable and can show how each lesion location contributes to prognosis prediction. However, this is neither shown nor clearly evident, since attention scores may themselves be limited in interpretability. While I understand the point the authors make, this should either be shown through (at least qualitative) results or underlined by an adequate citation.
    • On p4, the authors outline the results of their lesion segmentation approach. First, it is not fully clear what the authors’ FPV and FNV values refer to (percentages? ml? voxels?). Second, the results should be moved to the results section rather than be put into the methodological description. Finally, the word “significantly” should be omitted due to the lack of any significance testing.
    • On p6, the authors write that they propose an “innovative solution”. I would recommend using a less bold wording here, such as “a novel solution”, “we propose a reformulation of …”, etc.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    (see above)

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper presents an automated deep learning-based approach for lymphoma prognosis using PET-CT images. Specifically, it introduces a two-stage method: the first stage focuses on extracting meaningful high-resolution features from PET-CT scans, while the second stage predicts a survival risk score based on these features. The primary contribution lies in the second stage, where two key modules, LACF and ABMLA, are introduced. LACF integrates lesion and anatomical features using an attention mechanism: anatomical features, extracted from the pretrained TotalSegmentator network, are fused with the lesion features extracted from the first stage. The ABMLA block employs a self-attention mechanism to aggregate these anatomy-aware features, assigning different attention scores to individual lesions and combining them to generate the final global risk score.
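The aggregation idea summarized above, one score per lesion followed by a weighted combination, can be sketched as follows. This is an assumption-laden illustration using simple attention pooling in the style of attention-based multiple-instance learning, not the paper's self-attention block. The vectors `v` and `w_risk` stand in for learned parameters and are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(lesion_feats, v, w_risk):
    """Aggregate any number of lesion feature vectors into one risk score.

    lesion_feats: list of per-lesion feature vectors (at least one lesion)
    v:            hypothetical learned vector that scores each lesion
    w_risk:       hypothetical learned vector mapping pooled features to a risk
    Returns (risk, weights); weights sum to 1, one entry per lesion.
    """
    scores = [math.tanh(sum(a * b for a, b in zip(f, v))) for f in lesion_feats]
    weights = softmax(scores)
    dim = len(lesion_feats[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, lesion_feats))
              for d in range(dim)]
    risk = sum(p * w for p, w in zip(pooled, w_risk))
    return risk, weights

# Patients with different lesion counts go through the same function.
one_lesion = [[1.0, 2.0]]
three_lesions = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(attention_pool(one_lesion, [0.5, -0.5], [1.0, 1.0]))
print(attention_pool(three_lesions, [0.3, -0.2], [1.0, 0.5]))
```

Because the weights are normalized per patient, the pooled feature does not depend on lesion order or count, and the per-lesion weights are the quantities a reader would inspect when asking which lesion drove the prediction.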

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The key innovation in this work is the use of a self-attention block to aggregate diverse latent features while assigning varying attention scores, enhancing model interpretability. This enables the final predicted risk score to be visually represented and supported by individual lesion scores, reinforcing the clinical significance of the self-attention weights learned during training. By prioritizing interpretability, this approach ensures the proposed method remains relevant for clinical applications, making it an essential feature in any framework implementation.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Despite the promising introduction of the ABMLA block, the rest of the framework lacks a rigorous and fair comparison with state-of-the-art (SOTA) solutions.

    Specifically:

    • The authors do not adequately justify the choice of using two independent encoders for feature extraction instead of a single encoder block.
    • The connection between the three decoder branches and the two encoders is not clearly explained, nor is it justified how these branches effectively extract CT, PET, and fused PET-CT features independently.
    • Regarding the initial segmentation step, the claim that the chosen configuration “significantly” outperforms the baseline nnUNet is not quantified.

    For the LACF block (anatomy and lesion feature fusion), a more appropriate comparison would have been against a network that takes both PET-CT and TotalSegmentator masks as input, allowing the network to learn to fuse this information during training.

    Finally, for the downstream risk score prediction task, the authors should have included comparisons with other transformer-based networks, such as ViT, as well as standard convolutional-based classification networks like ResNet or SegNet.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Overall, the paper is well-structured and easy to follow, providing a detailed explanation of all the introduced blocks in the framework. I particularly appreciated the methodological section, where each component is presented and justified in terms of its clinical significance and interpretability.

    However, some aspects could be better explained. For instance, the process of concatenating lesion features and anatomy-aware lesion features remains unclear—specifically, why average pooling was used instead of a simple masking approach. Additionally, further clarification is needed on how anatomy-enhanced lesion features are computed using cross-attention. If organ features serve as the value in the attention block, wouldn’t it be more appropriate to refer to the fused features as “lesion-enhanced anatomy features” instead?
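The naming question can be made concrete with a small sketch of scaled dot-product cross-attention for a single lesion. The setup is assumed rather than taken from the paper: the lesion feature acts as the query, organ features act as keys and values, learned projections are omitted, and all numbers are toy values. The output is a convex combination of organ value vectors, selected by the lesion and then concatenated with the lesion feature.

```python
import math

def cross_attention(lesion_q, organ_k, organ_v):
    """Scaled dot-product cross-attention for one lesion query.

    lesion_q: query vector of the lesion
    organ_k:  one key vector per organ (same length as lesion_q)
    organ_v:  one value vector per organ
    Returns (context, weights): context is a weighted sum of organ values.
    """
    d = len(lesion_q)
    scores = [sum(q * k for q, k in zip(lesion_q, key)) / math.sqrt(d)
              for key in organ_k]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    context = [sum(w * val[j] for w, val in zip(weights, organ_v))
               for j in range(len(organ_v[0]))]
    return context, weights

# Toy example: three organs with one-hot values, so context equals the weights.
lesion = [1.0, 0.0]
organs_k = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
organs_v = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

context, weights = cross_attention(lesion, organs_k, organs_v)
fused = lesion + context  # concatenate lesion feature and organ context
print(weights, fused)
```

In this formulation there is one output vector per lesion, but its content comes entirely from the organ values, which is the ambiguity behind the two possible names.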

    Finally, regarding the use of TotalSegmentator for extracting anatomy-specific features, I assume the features were directly obtained from the pretrained network. However, since TotalSegmentator generates its segmentation masks through an ensemble of five different models, each focusing on different organs, how was this ensemble-based prediction accounted for in the feature extraction process presented in the paper?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite the lack of a rigorous comparison, which raises doubts about the actual advantages and performance of the proposed method over existing solutions, the introduction of the two attention-based blocks is noteworthy. Their application in enhancing clinical interpretability adds significant value to the study. Given that key methodological choices, such as the selection of baseline methods for comparison, are properly justified, this work can be of interest to the MICCAI scientific community.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper presents a novel approach for segmenting and evaluating lymph nodes in conjunction with the surrounding organs. To this end, the authors combine a segmentation approach with a prediction approach. The major novelty is the consideration of organ structures in addition to the lymph nodes when assessing the state of the patient.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Lymphoma is currently under-investigated and the proposed approach is a nice and relevant addition to existing approaches. It adds a creative idea to the current line of research.

    The used patient cohort is rather large for the given disease.

    The experiments are well-conducted.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The clarity of the paper could be improved. Although it is understandable and the order is meaningful, I had some difficulty in understanding some details.

    The technical novelty is present, but not too big. The primary improvement is the general idea and its implementation.

    The proposed joint solution for segmenting and classification is technically not necessary. The paper could have been stronger with a clear focus on the prediction.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    I do not feel confident that I would be able to reproduce the proposed solution, it would be great to release some code or update the explanations with more technical details. (I know that the latter is difficult for a MICCAI paper due to the length restrictions)

    The authors claim that the features of the segmentation should be meaningful for the downstream task also but do not evaluate that claim. This would further strengthen the paper.

    Are the smaller numbers in the table the standard deviation? This is unclear to me; additional clarification would reduce ambiguity.

    Only private data are used for the evaluation. Two points: the paper would benefit from also being tested on a public dataset (although I don’t have one in mind), and an ethics approval statement is missing for the private data.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I found the paper interesting and well done. The underlying idea is novel, and the evaluation is good.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

N/A




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


