Abstract
Accurate T-staging classification of nasopharyngeal carcinoma (NPC) is crucial for guiding individualized treatment strategies and predicting patient prognosis. However, this task remains challenging due to the limitations of unimodal approaches, which often fail to capture the full complexity of NPC progression, and the severe class imbalance in clinical datasets, where early-stage cases (T1/T2 stage) are significantly underrepresented. In this paper, we propose a Prototype-Aware Dynamic Fusion Network (PDF-Net), a novel multimodal framework that integrates MR images with Epstein-Barr virus (EBV) DNA tabular data to improve NPC T-staging classification. Our framework introduces two key components: (1) the Dynamic Multi-Modal Alignment (DMMA) module, which aligns MR imaging features with EBV DNA data to capture complementary information across modalities, and (2) the Optimal Prototype-Aware Transport (OPAT) module, which incorporates a Prototypical Constraint to enhance the representation of T2-staging features and mitigate class imbalance. To the best of our knowledge, PDF-Net is the first framework to leverage EBV DNA data as an auxiliary tool for T-staging classification, significantly improving accuracy and robustness. Experimental results on a real clinical dataset demonstrate that our approach outperforms state-of-the-art methods, achieving an accuracy of 0.8006 ± 0.0488 and an AUC of 0.8191 ± 0.0551 for T1C images, highlighting its potential to advance NPC diagnosis and personalized treatment strategies.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3748_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{LuWan_PDFNet_MICCAI2025,
author = { Lu, Wantong and Han, Xu and Wei, Yibo and Ye, Zanting and Lu, Lijun},
title = { { PDF-Net: Prototype-Aware Dynamic Fusion Network for Nasopharyngeal Carcinoma T-staging Classification with Epstein-Barr Virus DNA } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15960},
month = {September},
pages = {485--494}
}
Reviews
Review #1
- Please describe the contribution of the paper
The main contribution of this paper is the development of PDF-Net, a multi-modal framework that integrates MR images with EBV DNA tabular data for NPC T-staging classification. The method includes two innovative components: (1) The Dynamic Multi-Modal Alignment module, which aligns MR image features with EBV DNA data to capture complementary information across modalities. (2) The Optimal Prototype-Aware Transport module, which incorporates a Prototypical Constraint to address class imbalance, particularly for underrepresented T2-stage cases.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
(1) The PDF-Net framework integrates two diverse modalities—MR images and EBV DNA data—demonstrating a new way of handling multi-modal data for NPC classification. (2) The introduction of the DMMA and OPAT modules addresses significant challenges like class imbalance and the alignment of multi-modal features, which is a contribution to the field. (3) The experimental results demonstrate that PDF-Net outperforms state-of-the-art methods on a real-world clinical dataset.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
(1) The authors claim that their framework is the first to incorporate EBV DNA as an auxiliary tool for T-staging classification. However, as mentioned in the AJCC Staging Manual (page 107, [R1]), the T0 stage in NPC already considers EBV-positive cervical node involvement, which implies that EBV DNA is already considered in the T-staging system. This raises questions about the novelty of incorporating EBV DNA in the proposed method.
(2) The abstract does not specify which task’s accuracy is being reported. It would be helpful for the authors to clearly state that the accuracy mentioned refers to T-staging classification of NPC.
(3) In the section Prototype Extraction, it is unclear whether the Prototypical Network is trained together with other networks or separately for the extraction of feature F_P. It is also unclear whether the Prototypical Network is used for classification into T2 vs. non-T2 directly through a classifier applied to the extracted features (F_P), or if other methods are used. Further clarification on the training procedure for the Prototypical Network would improve the transparency of the methodology.
(4) There are several writing inconsistencies in the paper: (4a) Inconsistent punctuation in citations: The body text and citations should be properly separated by commas. (4b) Repetition of full terms and abbreviations: The paper often repeats both the full term and abbreviation (e.g., “Vision Transformer (ViT)”). This can be streamlined by consistently using either the full term or the abbreviation once introduced. (4c) Unnecessary capitalization and abbreviations: Terms like “Cross-Entropy Loss (CE Loss)” are not needed, as the abbreviation “CE Loss” is not referenced in the later sections of the paper. It’s important to avoid redundant abbreviations or terms that are not used consistently throughout the manuscript.
[R1] Amin, M.B., et al., AJCC Cancer Staging Manual, 2017, eighth ed. Springer, New York. Page 107.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
As pointed out in Question-7, the main factors leading to my recommendation are the novelty concerns regarding EBV DNA and several writing and formatting issues. These points are critical and should be addressed for the paper to be considered for acceptance.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
The authors provided a generally adequate rebuttal addressing several of the concerns raised, particularly regarding the novelty of their approach and clarification of the methodology. However, the core issue remains that the claimed novelty of incorporating EBV DNA into T-staging classification is weakened by prior clinical guidelines already considering EBV positivity in NPC staging (as per the AJCC manual). Although the authors clarify that their focus is on T2–T4 stages rather than T0, the broader claim of being the “first to incorporate EBV DNA” still feels overstated. Additionally, while the authors explained how the Prototypical Network is trained separately, this detail should have been clearly presented in the original manuscript. The paper would benefit from stronger validation across multiple datasets or more detailed ablation studies to support the effectiveness of the proposed fusion method. Given these unresolved concerns, I maintain a Weak Reject stance.
Review #2
- Please describe the contribution of the paper
This study innovatively integrates EBV DNA data with MR images for automated diagnosis of nasopharyngeal carcinoma (NPC) staging, while proposing a method to enhance the model’s focus on early-stage NPC samples.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
This article explains how to integrate EBV DNA data with MR images to improve the accuracy of nasopharyngeal carcinoma identification, and it provides methods to enhance stage-specific identification accuracy, which facilitates reproducibility and wider adoption.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The study relied exclusively on single-center data, so the generalizability of the results requires further validation. Some suggestions to improve the article: 1. In the OPAT part of Fig. 1, colored balls are used to represent the vectors. Does the size of a ball have any meaning? If so, it needs to be explained. In addition, the meaning of the light blue and red balls is not given in the legend. 2. The Prototype Extraction part appears in Section 2.2, which describes the OPAT module. However, according to Fig. 1, this part does not seem to belong to the OPAT module. Please check whether the structure of this part is appropriate. 3. The Prototype Extraction part states that T3- and T4-stage patients are randomly selected. What is the proportion of this selection? Is the number of selected non-T2 samples equal to the number of T2 samples? This is not clearly described. 4. Reporting the accuracy for T2-stage patients would provide strong evidence that the model enhances attention to T2-stage cases.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The study demonstrates that integrating EBV data with MR imaging could enhance the identification of nasopharyngeal carcinoma (NPC) staging and provides methods to improve early-stage NPC detection, though the data validation remains limited.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I have no further comments on the paper, and my positive rating remains unchanged.
Review #3
- Please describe the contribution of the paper
This paper proposes a Prototype-Aware Dynamic Fusion Network (PDF-Net), a multimodal framework that integrates MR images with Epstein-Barr virus (EBV) DNA tabular data to improve NPC T-staging classification. Specifically, PDF-Net leverages a dynamic multimodal alignment (DMMA) module to align image and DNA features, and an optimal prototype-aware transport (OPAT) module to enhance the representation of T2-staging features for class imbalance mitigation.
The proposed method is evaluated on a clinical dataset for T-staging classification, demonstrating favorable improvements in performance.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
S1: This paper explores integrating EBV DNA data with images while addressing the class imbalance issue, which is of interest to the community. They are the first to incorporate EBV DNA data for T-staging classification. S2: Compared to the prior methods evaluated in the paper, the proposed method demonstrates non-trivial improvements in performance. S3: The paper is well-structured and easy to follow.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
W1: This paper aims to address the class imbalance issue, but does not show the results for minority groups, which weakens the claim. W2: Do the compared multimodal methods use the same image and tabular encoders as PDF-Net for a fair comparison? W3: The architecture design decision to align tabular features to image features, rather than image features to tabular features, is not well justified. There is no theoretical or experimental explanation for this choice in the paper.
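For reference, the class-wise results requested in W1 could be reported with a per-class recall computation along the following lines. This is a minimal sketch on made-up labels, not the paper's evaluation code; the function names `per_class_recall` and `balanced_accuracy` and the toy label lists are hypothetical.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that appears in y_true."""
    total = Counter(y_true)
    # Count only the samples whose prediction matches the true label.
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Toy, hypothetical labels: T2 is the minority class.
y_true = ["T2", "T2", "T3", "T3", "T3", "T4"]
y_pred = ["T2", "T3", "T3", "T3", "T4", "T4"]

for label, recall in sorted(per_class_recall(y_true, y_pred).items()):
    print(f"{label}: recall = {recall:.3f}")
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.3f}")
```

Reporting the T2 recall alongside overall accuracy makes it visible whether a gain comes from the minority class or only from the majority classes.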
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
The colors and lines in Fig. 1 (c) are confusing. What do purple and red balls represent? Do those red double arrow lines indicate that the representations are being pushed apart?
The implementation details are incomplete. What are the learning rate, the optimizer, and the number of training epochs?
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I find the research idea of integrating DNA data with images while addressing class imbalance through a prototypical constraint interesting. However, the paper would benefit from a clearer justification for aligning tabular to image features, as well as ensuring consistency in encoder usage across compared multimodal methods.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I really appreciate the detailed feedback from the authors.
After reading the explanations in the rebuttal, my concerns regarding the consistency of encoder usage and the motivation for aligning tabular to image features have been addressed. The authors have also committed to improving the metric reporting in the final version, and I look forward to seeing these revisions and clarifications incorporated.
I have therefore decided to keep my positive score.
Author Feedback
We sincerely thank all reviewers for their thoughtful and constructive comments.
Novelty and Motivation (R1-Q1 / R3-Q3 / R2-General): We thank Reviewer #1 for raising an important point. While the AJCC manual uses EBV positivity to define T0-stage NPC, this is a binary diagnostic criterion limited to identifying nodal involvement without a visible primary tumor. In contrast, our study targets locally advanced NPC (T2–T4), which accounts for most real-world cases due to the difficulty of early detection. We will revise the manuscript to clarify that our task focuses on this clinically dominant subgroup. Although EBV is a recognized biomarker in NPC diagnosis [1], prior work has focused on prognosis or treatment monitoring [2] rather than integrating EBV DNA into predictive models. Our work is, to our knowledge, the first to model temporal EBV DNA dynamics across multiple timepoints and fuse them with MR imaging in a unified learning framework for objective T-staging. By reporting classification metrics (e.g., AUC), we provide the first quantitative assessment of EBV’s predictive contribution to staging, going beyond post hoc survival correlations. To enable this, we propose a novel alignment-based fusion that maps auxiliary EBV features into the image feature space, leveraging their complementarity. We further introduce prototype-aware constraints to enhance recognition of underrepresented T2-stage cases under class imbalance.
Regarding alignment direction, we map tabular features to image features based on data availability and representational hierarchy. MR images are consistently available and clinically central, while EBV DNA is often missing due to high cost and collection difficulty. ViT-derived image features provide richer spatial-semantic context, making them a more stable reference space for alignment.
Our dataset is single-center due to the lack of dual-modality public cohorts, but we ensured rigorous partitioning and validation. The proposed framework is modular and generalizable to future multi-center settings.
[1] Quantitative analysis of cell-free Epstein-Barr virus DNA in plasma of patients with nasopharyngeal carcinoma. 1999.
[2] The prognostic value of plasma Epstein-Barr viral DNA and tumor response to neoadjuvant chemotherapy in advanced-stage nasopharyngeal carcinoma. 2015.
Unclear Parts (R1-Q3 / R2-Q2/Q3): The Prototypical Network is trained separately for T2 vs. non-T2 classification using the largest ROI slice. For each epoch, we save the training T2 prototype and evaluate on the test set, selecting the best-performing prototype as F_P, which is fixed for OPAT. We adopt few-shot episodic training (5 support + 5 query per class), maintaining a 1:1 ratio between T2 and randomly sampled T3/T4 cases. We will add a “Prototype Extraction” subsection in the final version.
Figures and Visual Clarity (R2-Q1 / R3-Optional): In Fig. 1(c), colors denote modality: blue = image, green = EBV, orange = prototype. Ball sizes represent patient instances and are purely visual. In the alignment space, gray = EBV features; multi-colored = image features; red arrows = OT-based alignment. We will clarify this in the figure legend.
Implementation and Experimental Details (R3-Q2 / R3-Optional): All baselines use the same ViT and FT-Transformer encoders as PDF-Net. Training used Adam with a learning rate schedule of 1e-4 → 5e-5 (epoch 50) → 2e-5 (epoch 100) over 150 epochs. Best checkpoint selected via validation AUC. Details will be included in the final version.
Others (R1-Q2/Q4 / R2-Q4 / R3-Q1): We will revise the abstract to explicitly state that the reported accuracy and AUC refer to T-staging classification using masked T1C images. We will also improve writing consistency, citation formatting, and remove unused abbreviations. While class-wise metrics were omitted due to space, our design, including OPAT and the prototype constraint, specifically targets minority class improvement. We will make this clearer in the final version.
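The training details given in this feedback can be illustrated with a short sketch. It encodes the stated learning rate schedule (1e-4, then 5e-5 from epoch 50, then 2e-5 from epoch 100, over 150 epochs) and the stated episode composition (5 support + 5 query samples per class, T2 vs. randomly sampled T3/T4). The function names, the zero-based epoch counting, the exact switch points, and the integer patient IDs are assumptions for illustration only; this is not the authors' code.

```python
import random


def lr_at_epoch(epoch: int) -> float:
    """Piecewise-constant learning rate; epochs are assumed to be zero-based."""
    if epoch < 50:
        return 1e-4
    if epoch < 100:
        return 5e-5
    return 2e-5


def sample_episode(t2_ids, non_t2_ids, n_support=5, n_query=5, rng=None):
    """Draw one balanced episode: the same number of T2 and non-T2 cases."""
    rng = rng or random.Random()
    episode = {}
    for label, pool in (("T2", t2_ids), ("non-T2", non_t2_ids)):
        # Sample without replacement so support and query sets never overlap.
        picked = rng.sample(list(pool), n_support + n_query)
        episode[label] = {
            "support": picked[:n_support],
            "query": picked[n_support:],
        }
    return episode


# Hypothetical patient IDs: 20 T2 cases and 60 T3/T4 cases.
episode = sample_episode(range(20), range(100, 160), rng=random.Random(0))
for label, split in episode.items():
    print(label, len(split["support"]), "support,", len(split["query"]), "query")
print([lr_at_epoch(e) for e in (0, 49, 50, 99, 100, 149)])
```

Sampling an equal number of cases from each pool in every episode is what keeps the 1:1 ratio regardless of how imbalanced the underlying cohort is.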
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This is a borderline submission. For the camera-ready version, the authors are encouraged to improve the clarity of the paper, particularly by addressing the concerns raised by Reviewer 1.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The authors have addressed several concerns about the novelty. However, the design is still unclear with respect to R1's point about EBV positivity in NPC staging. Also, all reviewers pointed out the limited validation, and wider multi-center datasets should be considered. After reading the paper, the reviewers’ comments, and the authors’ feedback, I recommend rejecting this paper, as the submission still needs improvement.