Abstract

Accurate tracking of abdominal lymph nodes (LN) across follow-up computed tomography (CT) scans is crucial for colorectal cancer staging and treatment response evaluation. However, establishing reliable LN correspondences remains underexplored due to challenges including scale variations, low resolution, difficulty distinguishing nodes from adjacent structures, inability to handle tissue deformation, and dynamic visibility. To address these challenges, we propose an asymmetric matching framework that strikes a balance between enhancing LN specificity and contextual correlations. For specificity, we achieve cross-dimensional feature consistency and generate discriminative LN features via self-supervised learning on orthogonal 2D projections of 3D node volumes. For correlation, we develop a graph model capturing lymphatic topology within scans, reinforced by temporal contrastive learning that encourages consistency between matched node pairs across CT scans. To balance specificity and correlation, we propose a multi-module architecture that integrates volumetric LN features with projection embeddings through attention-based fusion, enabling confidence-calibrated similarity assessment across temporal scans. Experimental results demonstrate that our solution provides reliable lymph node correspondence for clinical follow-up and disease monitoring. Code is available at https://github.com/maoyij/Asymmetric-Matching.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3853_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/maoyij/Asymmetric-Matching

Link to the Dataset(s)

N/A

BibTex

@InProceedings{MaoYij_Asymmetric_MICCAI2025,
        author = { Mao, Yiji and Zhang, Yi and Zou, Xinyu and Zheng, Yuling and Huang, Hao and Zhang, Haixian},
        title = { { Asymmetric Matching in Abdominal Lymph Nodes of Follow-up CT Scans } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15968},
        month = {September},
        pages = {34 -- 43}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The contribution of the paper is an asymmetric matching framework that integrates spatial and temporal information in Abdominal Lymph Nodes of Follow-Up CT Scans.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method seems interesting and unique. The paper addresses an emerging clinical problem. The paper shows a massive performance jump.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1) The text in the figures is too small and very difficult to read in the printed version.
    2) The entire paper is about asymmetric matching, yet the term is never defined. It must be.
    3) Typos need to be corrected, e.g., “pronlem” on page 3.
    4) In Fig. 1, the flow indicates that (c) is the initial step, and the text supports this: Section 2.2 explains the Specificity Spatial Extraction Module first. Why, then, is it labeled (c) in Fig. 1? Am I missing something?
    5) “… nine orthogonal 2D projections from each 3D LN are treated as positive samples …” How were the orthogonal 2D projections obtained? And how was the 3D feature vector obtained from a CT sub-volume?
    6) What is meant by specificity in the SE module? What is the motivation for the 2D-3D contrastive loss? The 2D and 3D representations of the same sub-volume lie in two different spaces, so how can they be compared? When you measure the similarity between f^2D and f^3D, they will never be similar. I am lost. Please explain.
    7) Multiple methods are applied sequentially to build an integrated solution without proper explanation. What is the purpose of the Graph Neural Network? While reading about TE, I suddenly found a GNN being used with no explanation of the motivation.
    8) The proposed method appears overly complex, and the most significant drawback of the paper is its lack of clarity.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please refer to the weaknesses.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    The authors’ responses do not resolve my doubts.



Review #2

  • Please describe the contribution of the paper

    This paper addresses the challenging and underexplored problem of lymph node tracking in abdominal follow-up CT scans, where node appearances and disappearances between timepoints add complexity beyond traditional lesion tracking tasks. Unlike conventional approaches that rely on center-point matching, the authors formulate the task as an asymmetric matching problem, enabling the identification of newly appearing and disappearing lymph nodes, an important clinical consideration. To tackle this problem, the authors propose a novel multiple object tracking framework composed of three core modules: (a) a correlation-extraction module, (b) a specificity-correlation balancing strategy, and (c) a specificity-spatial extraction module. The method takes as input a pair of lymph node ROIs (one from each timepoint) and predicts whether they represent the same anatomical structure. The framework is evaluated using a newly introduced private dataset with annotated segmentations and node correspondences across two timepoints. Evaluation metrics include matching precision, new lymph node rate, and disappeared lymph node rate, which are well aligned with the clinical motivation. An ablation study demonstrates the contribution of each module, and experiments show that model performance remains robust across different backbone architectures (ViT and ResNet variants). The proposed method outperforms a representative baseline from computer vision (“Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking”). Furthermore, standard lesion tracking approaches using conventional metrics (e.g., CPM@Radius, mean Euclidean distance) are evaluated on the newly introduced dataset.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper addresses a clinically relevant and technically challenging problem: lymph node tracking in longitudinal CT scans, where nodes may appear or disappear between timepoints—a scenario not adequately handled by standard lesion tracking methods.
    • The authors propose a novel task formulation based on asymmetric matching rather than conventional center-point tracking, which more accurately reflects the nature of lymph node progression and regression.
    • A new private dataset is introduced for this task, including manual segmentations and inter-timepoint correspondences, which is a valuable contribution to the field.
    • The proposed framework is thoroughly evaluated through an ablation study, highlighting the importance of each module within the architecture.
    • The method demonstrates strong performance and clearly outperforms a representative baseline (Learnable Graph Matching), while also benchmarking against standard medical lesion tracking methods.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Clarity and Reproducibility: The method is highly complex and currently lacks sufficient detail to ensure reproducibility. The training pipeline is not clearly described—specifically, it is unclear whether all components are trained jointly or in a staged manner (e.g., contrastive learning followed by graph learning and fine-tuning). The process by which the Correlation Extraction Module is fine-tuned via the Specificity-Correlation Balancing Strategy also requires clarification. For example, is the balancing strategy used only for inference or also for training?
    • Unclear Inference/Prediction Formulation: Despite introducing several loss functions, the paper does not clearly explain how the final predictions are generated. It remains ambiguous whether outputs come directly from the graph network or are modified/overwritten by the balancing strategy. Furthermore, the integration of different modules (e.g., correlation module, SCB, SEAG) and how their outputs are combined to yield the final results should be better explained.
    • Graph Learning Ambiguity: The graph construction process is briefly mentioned but not detailed—specifically, when and how often the graph is constructed. There is also confusion around the feature representations: Are the 128-dimensional features in TE and SEAG extracted from the same encoder? Are these features pre-trained using only the supervised contrastive loss?
    • Figure Clarity: The main method figure is difficult to interpret. Repetitive color usage (e.g., red representing different elements in different subfigures) causes confusion. The temporal scheduling of training (parallel vs sequential) is not visually or textually explained. Additionally, several inconsistencies exist between the figure and the main text. For instance, the loss term $L_{tmf}$ appears only in the figure, while $L_{scb}$ is present in the text but missing from the figure. The visual flow between T1/T2 volumes, patches, and ROIs across different panels is also unclear.
    • Limited Value in Certain Experiments: The experiment evaluating different backbone networks (ViT vs ResNet variants) adds little value, as it does not meaningfully contribute to understanding the model’s performance or limitations. This space could be better used to provide deeper insight into the model architecture and training strategy.
    • Ineffective Visualizations: The heatmaps in Figure 2 do not clearly convey useful or interpretable information and could be replaced by a more illustrative depiction of the model’s internal reasoning or tracking decisions.
    • Comparison to Related Work Needs Contextualization: The comparison with prior methods (e.g., Table 4) lacks interpretability. Since the problem formulation differs from standard lesion tracking (e.g., direct matching vs center-point tracking with lesion localization), a direct comparison is difficult. It would be helpful to propose a proxy evaluation setup (e.g., CPM@Radius plus postprocessing to determine newly appeared/disappeared instances) to establish a fairer comparison or, at minimum, acknowledge these differences more explicitly.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The paper addresses a relevant and important clinical problem with a novel task formulation. However, substantial clarification is needed for the reader to fully understand and assess the proposed method. The manuscript would benefit greatly from:

    • Clearer and more detailed methodological explanation: Key aspects such as the training schedule, module integration, and output generation pipeline should be described more explicitly. This would significantly enhance the paper’s readability and reproducibility.
    • A more intuitive and self-explanatory figure: The current method figure is difficult to interpret due to ambiguous visual encoding and lack of temporal context. A revised version with a clearer flow of information and consistent color coding would be helpful.
    • Improved space allocation: A considerable amount of space is dedicated to experiments with limited added value (e.g., backbone comparison and standard lesion trackers). This space could be better used to provide in-depth explanations of the core framework.
    • Reconsideration of the heatmap visualizations: The presented heatmaps do not clearly illustrate trends or offer significant insight into the model’s behavior. Alternative visualizations or more interpretive commentary would strengthen this section.

    Overall, the idea and task are valuable, but the clarity of presentation must be improved to fully appreciate the contributions.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the paper addresses an important and clinically relevant problem and proposes a novel task formulation, the significant methodological ambiguities and lack of clarity in key parts of the paper currently outweigh its contributions. The proposed method is complex, yet the description is insufficient to ensure a clear understanding or reproducibility. Additionally, some experimental sections provide limited insight and could be better utilized to strengthen the methodological presentation. In its current form, the paper would require substantial revisions to be suitable for acceptance. While the work shows promise and could potentially be impactful, the depth of revision needed goes beyond what is feasible within the scope of a single review cycle.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    The authors still did not provide all details about model inference. For example, the rebuttal states: “The outputs of SCB subsequently refine the identification of new/disappeared nodes, enhancing overall accuracy.” How are the matchings refined? If there are two different classification results, which one is taken?

    I am still not convinced that the large number of missing details in the model description can be resolved before publication at MICCAI, especially as there is no further review cycle. Thus, I did not change my opinion of the paper.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel asymmetric matching framework for tracking abdominal lymph nodes (LNs) across longitudinal CT scans, addressing key challenges such as anatomical variability, deformation, and ambiguous appearance. The method enhances node specificity by enforcing feature consistency across 2D projections of 3D LN volumes using self-supervised contrastive learning. Another contrastive loss component to ensure temporal consistency of the same LN features among different time points also exists. To improve contextual correlation, the authors develop a graph-based model that captures intra-scan lymphatic topology and uses temporal prototypes for inter-scan alignment. A dual-stream architecture combines volumetric features and projection embeddings via attention-based fusion, yielding confidence-calibrated similarity scores for accurate LN correspondence.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Tracking lymph nodes over time is crucial for cancer staging and treatment monitoring, yet rarely tackled due to anatomical and imaging challenges.
    • Using 2D slices to enhance contrastive learning through 2D–3D alignment is a clever strategy to increase the diversity of views in a uni-modal setting. This cross-dimensional approach improves feature consistency and boosts node-level distinctiveness.
    • Experiments show that the framework is agnostic to different 3D networks for feature extraction (Table 2).
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The final loss function is not given. Is it a weighted sum of all the explained loss components? If so, what are the weights?
    • No data augmentations are mentioned. The contrastive losses can benefit from aggressive augmentations.
    • Too many unnecessary in-line mathematical expressions (e.g., cosine similarity, attention, symbols with many sub- and superscripts), creating clutter and reducing readability.
    • Overall, the language of the paper is not very clear. There are also typos here and there.
    • Experiments are conducted on a private dataset. For the sake of comparability and reproducibility, public datasets could be used alongside it. There aren’t many available, but since the proposed method should work with any cancer type with multiple small nodules and not just lymph nodes, lung nodules from NLSTt (https://conferences.miccai.org/2022/papers/468-Paper0060.html) could further verify the performance.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • Please reformulate NLR and DLR as “accuracies”.
    • Recall should be reported alongside precision. False negative nodules are important, as also stated by the authors.
    • Why is your method not included in Table 4? Please update.
    • What is the final loss function? (see 7.)
    • In section 2.3, the wording “128-dimensional 3D features” causes ambiguity. Rephrase as “128-dimensional feature representations/embeddings from 3D views”.
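
For reference, precision and recall over matched node pairs can be computed as sketched below. This is a minimal illustration using the standard set-based definitions, which may differ from the paper's own "matching precision"; the function name and the toy pairs are hypothetical.

```python
def precision_recall(predicted, ground_truth):
    """Standard precision/recall over (T1 index, T2 index) node pairs.

    Assumed definitions (not taken from the paper):
      precision = correct predicted pairs / all predicted pairs
      recall    = correct predicted pairs / all annotated pairs
    """
    pred, gt = set(predicted), set(ground_truth)
    true_positives = len(pred & gt)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gt) if gt else 0.0
    return precision, recall


# Toy example: three predicted pairs, four annotated pairs, two in common.
predicted = [(0, 0), (1, 2), (3, 1)]
ground_truth = [(0, 0), (1, 1), (2, 3), (3, 1)]
precision, recall = precision_recall(predicted, ground_truth)
```

Here a missed annotated pair lowers recall but leaves precision untouched, which is why reporting precision alone can hide false negatives.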
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall this is a satisfactory paper. There are some weaknesses (see 7.) and improvements to make (see 10.).

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank all reviewers for their suggestions and affirmation. R2 mentions “interesting and unique method” and “a massive performance jump”. R3 highlights “2D–3D alignment is a clever strategy”. R4 notes “strong performance” and “thoroughly evaluated”. We are pleased to address all the concerns.

[Writing and Presentation Issues, R2, R3, R4] We have (1) rewritten and emphasized the asymmetric matching definition in Section 2.1 for improved clarity; (2) redrawn Fig. 1 to enhance visual presentation; (3) corrected all typos and refined mathematical formatting; and (4) reorganized the manuscript and improved the writing for better logical flow and readability. The code will be available to support reproducibility.

[Module Design and Functionality, R2, R4] The SE module captures fine-grained differences to enhance inter-node specificity, which is critical for matching due to the similar appearance and semantics of lymph nodes. To address this, we extract nine 2D views from each lymph node’s 3D cube: three orthogonal slices and six diagonal/oblique cuts. Each 2D view is encoded into a 128-dimensional vector aligned with its 3D representation. We treat a node’s 2D views and 3D representation as positive pairs, and use 2D views from other nodes as negatives to build the contrastive loss. The SEAG and SCB modules function independently, both receiving 128-dimensional representations of lymph nodes from the SE module. SEAG refines features by modeling intra-CT relational context with a GNN, where lymph nodes are nodes connected by edges based on pairwise Euclidean distances. Node and edge updates integrate neighboring information, as detailed in Sec. 2.3. The SCB module balances specificity and correlation between lymph node pairs across CT scans through multi-view fusion, enabling detection of new/disappeared lymph nodes. TE is a feature enhancement strategy that introduces an auxiliary loss based on contrastive learning to support model training.
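
The positive/negative pairing described above can be sketched as an InfoNCE-style loss in which 2D and 3D embeddings share one vector space. This is only an illustration under stated assumptions, not the authors' implementation: the InfoNCE form, the function names, the temperature value, and the toy 3-dimensional embeddings (the rebuttal uses 128 dimensions) are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_2d3d_loss(feat3d, views2d, temperature=0.1):
    """InfoNCE-style 2D-3D loss (assumed form, not taken from the paper).

    feat3d:  list of N embeddings, one 3D representation per lymph node.
    views2d: list of N lists of embeddings, the 2D views of each node.
    For anchor feat3d[i], views2d[i] are positives; the 2D views of all
    other nodes act as negatives.
    """
    total, count = 0.0, 0
    for i, anchor in enumerate(feat3d):
        # Exponentiated, temperature-scaled similarity to every 2D view.
        exp_sims = [
            [math.exp(cosine(anchor, view) / temperature) for view in views]
            for views in views2d
        ]
        denominator = sum(sum(row) for row in exp_sims)
        for positive in exp_sims[i]:
            total += -math.log(positive / denominator)
            count += 1
    return total / count


# Toy data: two nodes, two 2D views each (hypothetical values).
feat3d = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
aligned = [
    [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]],  # views of node 0
    [[0.1, 0.9, 0.0], [0.0, 1.0, 0.1]],  # views of node 1
]
swapped = [aligned[1], aligned[0]]  # views deliberately assigned to the wrong node

loss_aligned = contrastive_2d3d_loss(feat3d, aligned)
loss_swapped = contrastive_2d3d_loss(feat3d, swapped)
```

The loss is lower when each node's 2D views sit close to its own 3D embedding than when the views are swapped, which is the behavior the pairing is meant to encourage.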

[Training Strategy, R3, R4] (1) The model is trained end-to-end in a joint manner. At each training step, the input data is first processed by the SE module to extract features, which are then fed into the SEAG and SCB modules. Each module computes its own loss, and together with the auxiliary loss from the TE strategy, the total loss is the unweighted sum of all four losses. (2) Data augmentation is not used in this study, as it led to decreased performance in experiments. This is likely due to the small size of lymph nodes, which makes them sensitive to subtle distortions, thereby impairing specificity and correlation modeling.
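
Written out, the unweighted sum stated in point (1) would take the following form. The subscript names are placeholders chosen here to mirror the module names (SE, SEAG, SCB, TE); the rebuttal does not give the symbols.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{SE}}
  + \mathcal{L}_{\text{SEAG}}
  + \mathcal{L}_{\text{SCB}}
  + \mathcal{L}_{\text{TE}}
```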

[Inference Formulation, R4] All modules, including SE, SEAG, and SCB, contribute to inference. SE performs feature extraction. The outputs of SEAG are used to compute pairwise similarities between lymph nodes across CT scans, and the Hungarian algorithm is then applied for initial matching. The outputs of SCB subsequently refine the identification of new/disappeared nodes, enhancing overall accuracy.
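
The two-stage inference described above can be sketched as follows. This is a toy illustration under explicit assumptions, not the authors' pipeline: exhaustive search stands in for the Hungarian algorithm (both return an optimal one-to-one assignment, but exhaustive search is only practical for a handful of nodes), and a fixed similarity threshold stands in for the SCB refinement, whose actual mechanism the rebuttal does not specify. All names and values are hypothetical.

```python
from itertools import permutations


def match_nodes(sim, threshold=0.5):
    """Match lymph nodes between two scans from a similarity matrix.

    sim[i][j] is the similarity between node i in scan T1 and node j in
    scan T2 (non-empty, possibly rectangular). Returns
    (matches, disappeared, new).
    """
    n1, n2 = len(sim), len(sim[0])

    # Stage 1: optimal one-to-one assignment by exhaustive search
    # (stand-in for the Hungarian algorithm; toy sizes only).
    if n1 <= n2:
        candidates = (
            [(i, perm[i]) for i in range(n1)]
            for perm in permutations(range(n2), n1)
        )
    else:
        candidates = (
            [(perm[j], j) for j in range(n2)]
            for perm in permutations(range(n1), n2)
        )
    best = max(candidates, key=lambda pairs: sum(sim[i][j] for i, j in pairs))

    # Stage 2: reject weak pairs (assumed threshold rule, standing in for
    # the SCB-based refinement of new/disappeared nodes).
    matches = [(i, j) for i, j in best if sim[i][j] >= threshold]
    matched_t1 = {i for i, _ in matches}
    matched_t2 = {j for _, j in matches}
    disappeared = [i for i in range(n1) if i not in matched_t1]
    new = [j for j in range(n2) if j not in matched_t2]
    return matches, disappeared, new


# Two nodes in T1, three in T2 (hypothetical similarities).
sim = [
    [0.9, 0.2, 0.1],
    [0.3, 0.1, 0.2],
]
matches, disappeared, new = match_nodes(sim)
```

In this example node 0 keeps its match, node 1 of T1 is flagged as disappeared, and nodes 1 and 2 of T2 are flagged as new. For realistic node counts, `scipy.optimize.linear_sum_assignment` with `maximize=True` solves stage 1 efficiently and accepts rectangular matrices.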

[Experiments Evaluation Concerns, R3, R4] (1) The heatmaps were intended to intuitively illustrate how the model identifies the same lymph node across CT scans. While we believe they provide meaningful insights, we understand they may not be essential and are open to removing them if necessary. (2) The backbone comparison has been removed as suggested. (3) Regarding dataset generalization, we considered the NLSTt but found its one-to-one scan matching setup different from our asymmetric matching task. While we can include additional results on NLSTt to demonstrate applicability, its lack of prior matching studies limits its comparative value. (4) Lastly, we agree that asymmetric lymph node matching is a distinct task and will clarify the differences from center-point tracking to better contextualize comparisons. Introducing postprocessing in the evaluation setup may obscure whether improvements arise from the baseline method or the postprocessing itself, compromising fair comparison.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    There are several major issues with this paper. For example, the entire paper is about asymmetric matching, yet the term is never defined; the method is highly complex yet lacks sufficient detail; and the experiment with different backbones has limited value. Hence, I recommend rejection.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper presents a method for asymmetric matching to accurately track abdominal lymph nodes across follow-up CT scans. However, the manuscript requires significant improvements in clarity before it can be considered for acceptance.


