Abstract

Pulmonary Embolism (PE) is a life-threatening condition. Computed tomography pulmonary angiography (CTPA) is the gold standard for PE diagnosis, offering high-resolution soft tissue visualization and three-dimensional imaging. However, its high cost, increased radiation exposure, and limited accessibility restrict its widespread use. In this work, we aim to introduce faster diagnosis opportunities by using 2D chest X-ray (CXR) data. CXR provides only limited two-dimensional visualization and is not typically used for PE diagnosis due to its inability to capture soft tissue contrast effectively. Here, we develop a novel methodology that distills knowledge from a trained CTPA-based teacher classifier model embedding to a CXR-based student embedding, by feature alignment - leveraging paired CTPA and CXR features as supervision, which can be readily acquired. This enables us to train without requiring annotated data. Our approach utilizes a latent diffusion model to generate CTPA-based PE classifier embeddings from CXR embeddings. In addition, we show that incorporating cross-entropy loss together with the corresponding loss of the teacher-student embeddings increases performance, bringing it close to clinical-level performance. We show state-of-the-art AUC in a PE categorization task using only the initial CXR input. This approach broadens the diagnostic capabilities of CXRs by enabling their use in PE classification, thereby extending their applicability beyond traditional imaging roles. The code for this project will be made available.
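The combined objective described in the abstract (an embedding alignment term plus a cross-entropy term) can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes the alignment term is a mean squared (L2) error between the generated student embedding and the paired teacher embedding, and that the classifier outputs a single PE probability. The function names and the `ce_weight` parameter are hypothetical.

```python
import math


def l2_alignment_loss(student, teacher):
    """Mean squared error between a student embedding and its paired teacher embedding."""
    if len(student) != len(teacher):
        raise ValueError("embeddings must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def binary_cross_entropy(prob, label, eps=1e-7):
    """Binary cross-entropy for one predicted PE probability and a 0/1 label."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def combined_loss(student, teacher, pe_prob, pe_label, ce_weight=1.0):
    """Alignment term plus a weighted classification term (the weighting is an assumption)."""
    alignment = l2_alignment_loss(student, teacher)
    classification = binary_cross_entropy(pe_prob, pe_label)
    return alignment + ce_weight * classification


# Toy values only: a 4-dimensional embedding pair and one classifier output.
student_embedding = [0.1, 0.4, -0.2, 0.0]
teacher_embedding = [0.0, 0.5, -0.1, 0.1]
loss = combined_loss(student_embedding, teacher_embedding, pe_prob=0.8, pe_label=1)
```

In this sketch the alignment term needs only paired embeddings, while the cross-entropy term additionally needs a PE label for each pair.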

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3177_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/meshims/Cross-Modal_CXR-CTPA_Knowledge_Distillation

Link to the Dataset(s)

N/A

BibTeX

@InProceedings{CahNoa_CrossModal_MICCAI2025,
        author = {Cahan, Noa and Sizikov, Meshi and Greenspan, Hayit},
        title = {{Cross-Modal CXR-CTPA Knowledge Distillation using latent diffusion priors towards CXR Pulmonary Embolism Diagnosis}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15974},
        month = {September},
        pages = {125--135}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper is about diagnosing Pulmonary Embolism (PE) from X-rays. In training, Computed tomography pulmonary angiography (CTPA) data is available, which is used to help the prediction. The way it is used is to map X-ray embeddings to CTPA embeddings, which are then used as input to the PE classifier.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The figures present a clear overview of the technical concepts. The method part also clearly describes the methods, although I have concerns about positioning this work as knowledge transfer, see below.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Related work:

    • The paper does not refer to any prior work on X-ray to CT generation, which is closely related (Xprospect, X2ct, VolumeNeRF, XctNet, etc.).
    • Similarly, there is no reference to PE classification studies.

    Method:

    • Referring to the entire approach as knowledge distillation is confusing, as the authors themselves state that knowledge distillation commonly uses teacher-student networks. Instead, they use a latent diffusion model (LDM), which has been used for modality translation before. Instead of reconstruction, they directly use the PE classifier in latent space.
    • CT scans are processed with a 2D network, and the sequence of embeddings for one scan is then concatenated. Why not use a 3D network?

    Results:

    • The only comparison is to the original knowledge distillation (KD) paper from Hinton in 2015. No comparison to more recent KD work. Most importantly, no comparison to other work on PE classification.
    • A problem with working in the latent space is that it is not clear whether pathologies are preserved or not. In generative tasks, generating pathologies is the critical and most challenging part. It is only indirectly evaluated with classification accuracy. The dataset is unbalanced with 34% of PE cases. Unfortunately, only accuracy is reported (73%), not balanced accuracy. Given these numbers, there is limited insight as to whether pathologies are truly simulated.
    • The statement “All results are statistically significant” is confusing. What has been compared? I don’t believe that the comparison of sensitivity of 73.91 and 73.913 is statistically significant.
    • In the comparison of trainable parameters, what exactly is counted? The proposed approach does need two encoders and a classification model. Are they considered?
    • It would have been interesting to see how simpler models would perform in the prediction of CTPA embeddings.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper lacks references to related work, a clear positioning of the approach, and a comparison to the current state of the art. Framing the task as knowledge distillation is not suitable. This seems like an attempt to claim novelty (diffusion in knowledge distillation), although a basic latent diffusion model is used. Hence, I don’t see a clear technical novelty.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a new approach to diagnosing pulmonary embolism (PE) using chest X-rays (CXR) instead of the gold standard computed tomography pulmonary angiography (CTPA). The authors develop a cross-modal knowledge distillation method that uses latent diffusion models to transfer knowledge from CTPA to CXR embeddings, allowing for PE diagnosis from the more accessible CXR modality. The approach demonstrates promising results with an AUC of 0.824, approaching the performance of CTPA-based methods while requiring fewer parameters.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1-new application in PE diagnosis using CXR

    2-the use of diffusion models with cross modal knowledge distillation is relatively new

    3-evaluations look feasible

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1-performance comparison is limited. Although the authors said there are no prior studies, there are many other cross modal distillation methods that can be used / compared.

    2-dataset size is small compared to Stanford’s recent dataset on the same topic (>20,000 if I remember correctly)

    3-clinical evaluation is missing.

    4-single center study, hence generalization ability of the method is limited.

    5-The paper would benefit from analyzing the types of cases where the model fails.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a potentially impactful approach to PE diagnosis using more accessible imaging modalities - CXR. While there are some limitations, particularly regarding clinical validation and generalizability, the technical contribution is sound and the results are promising. Comparison with other cross modal distillation will make the paper more solid.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The paper now has three more methods to compare, some clarifications have been made, and the paper should also be considered from an application point of view. I think the authors also did a good job in the rebuttal.



Review #3

  • Please describe the contribution of the paper

    The paper proposes a novel cross-modal knowledge distillation (CMKD) framework using latent diffusion models to enable pulmonary embolism (PE) diagnosis from chest X-rays (CXRs). Key contributions include:

    Diffusion-based CMKD: A generative prior model translates CXR embeddings into CTPA-like latent representations, bridging the modality gap without requiring annotated CXR-PE labels.

    Synthetic data augmentation: Leverages the RSPECT dataset (7,000+ CTPA scans) with synthetic DRR-generated CXRs for pretraining, addressing limited paired data.

    Near-CTPA performance: Achieves an AUC of 0.824 on CXR-based PE classification, closing the gap with CTPA-only models (AUC: 0.858) while using 10x fewer trainable parameters than classic CMKD methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Innovative modality fusion: First to apply diffusion priors for CXR-to-CTPA embedding alignment in PE diagnosis, outperforming standard CMKD by 5.4 percentage points of AUC (0.824 vs. 0.77) [Table 1].

    Computational efficiency: The diffusion prior model uses only 49M parameters vs. 126M in classic CMKD, enabling faster training on a single GPU.

    Clinical practicality: Eliminates reliance on annotated CXR-PE datasets, which are scarce due to CXR’s low diagnostic sensitivity for PE.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Limited benchmarking: Fails to compare against recent CMKD frameworks like X2CT [1] (transformer-based cross-modal fusion).

    Unclear clinical relevance: No validation on external cohorts or radiologist evaluations of generated embeddings’ interpretability.

    Simplistic loss function: Relies solely on L2 loss for embedding alignment, ignoring contrastive or adversarial objectives used in state-of-the-art CMKD.

    [1] Zhu, Rui, et al. “A Multimodal Fusion Generation Network for High-quality MR Image Synthesis.” 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD). IEEE, 2024.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the diffusion-based CMKD approach is novel and computationally efficient, the lack of benchmarking against 2023–2024 SOTA methods and unproven clinical applicability limit its immediate impact.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #4

  • Please describe the contribution of the paper

    This paper aims to speed up the diagnosis of Pulmonary Embolism (PE) by using 2D chest X-ray (CXR) data instead of gold standard Computed Tomography Pulmonary Angiography (CTPA). The authors present a methodology that distills knowledge from a trained CTPA-based teacher classifier model embedding to a CXR-based student embedding through feature alignment - using paired CTPA and CXR features as supervision.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper proposes a new way to diagnose Pulmonary Embolism (PE) by using 2D chest X-ray (CXR) data instead of gold standard Computed Tomography Pulmonary Angiography (CTPA). A feasibility study of a PE categorization task using only the initial CXR input is presented. This approach extends the diagnostic capabilities of CXRs by enabling their use in PE classification, thereby extending their applicability beyond traditional imaging roles.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    In Table 1, how do the authors explain that their proposed model is less sensitive than the classical CMKD model?

    The results presented in Table 2 are not clear enough.

    In Figure 4, to show how the process evolves, more t-SNE stages could be shown.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed new way to diagnose pulmonary embolism (PE) by using 2D chest X-ray (CXR) data instead of the gold standard computed tomography pulmonary angiography (CTPA) is attractive and has many potential applications.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank all reviewers for their constructive feedback. Below we address key concerns raised:

Novelty and Technical Contribution: To our knowledge, ours is the first approach using CXR images directly for PE classification. By using generative modeling to shift between modalities, we open the possibility of earlier screening for PE, already in the ER, using widely accessible chest X-rays. We thus view the primary contribution of our work as falling under “Application studies including work focusing on translation”; it is also relevant to CAD. Our novelty lies explicitly in the application of the 1D diffusion prior within the context of CMKD in medical imaging. While diffusion models have been widely used for image generation, our use of diffusion for direct embedding-level knowledge transfer between modalities for classification, without reconstructing images, represents a significant applicative innovation. The method can be generalized to any other modality while using 10x fewer trainable parameters than classic CMKD methods.

Dataset[R1]: Our dataset (898 paired CXR-CTPA cases) is the largest reported for this task. Stanford’s large dataset is unpaired, making it unsuitable for our paired approach. We will explicitly clarify this distinction.

Comparisons Requested: Other CMKD Methods[R1,R3,R4]: We performed comparisons to recent CMKD methods [1,2,3]; these were not included initially due to limited space. These results (AUC range 75.2%-80.13%) will be added to Table 1.

Image Generation[R3,R4]: We clarify explicitly that our diffusion prior operates on 1D embeddings for classification, unlike voxel-based image synthesis methods. Thus, direct image-quality metric comparisons are not applicable. We will broaden related work discussions accordingly.

PE Classification Methods[R4]: To our knowledge, ours is the first approach using CXR images directly for PE classification, making comparisons with existing CTPA-based methods inapplicable. Extensive experiments were done for optimal CTPA PE classification selection, and we are willing to add these details if reviewers find it beneficial.

Exploring different loss functions[R3]: We did not rely solely on L2 loss for embedding alignment. We chose L2 to isolate the effect of the diffusion prior. We highlight the existing BCE variant (+0.03 AUC) in Table 2, which demonstrates extensibility, and will explicitly add these results for completeness. In addition, we experimented with adding contrastive, feature-attention [3] and prototype loss [2] to our objective but did not add them to Table 2 as they did not improve our results. We will add these results to Table 2 accordingly.

Interpretability and Failure Cases[R1,R3]: We have existing Grad-CAM visualizations highlighting anatomical relevance and will mention future directions including these visualizations for enhanced clinical interpretability.

CTPA PE Classifier Selection[R4]: The choice follows SOTA methods for CTPA PE classification and leverages powerful 2D pre-trained backbones. This rationale for choosing this architecture will be added.

Performance Metrics and Presentation Clarity[R4,R5]: To enhance clarity, the revised manuscript will include the following. [R4] The requested balanced accuracy, computed as the average of sensitivity and specificity, will be provided for the existing predictions in Tables 1 and 2 (e.g., in Table 1, accuracy: 73%; balanced accuracy: 73.46% for our model). [R4] Clarified statistical significance in Table 1: “All AUC results are statistically significant with Kolmogorov–Smirnov test, with p ≤ 0.05.” [R5] We will improve the presentation of Table 2. [R5] Fig. 4: We will extend the existing t-SNE embedding plots with stages at 30% and 60% training progress. [R4] Table 1 counts trainable parameters only (both encoders are frozen). We will clarify this in the caption and provide total parameter counts in the Supplement.
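Balanced accuracy, as defined in the paragraph above, is the mean of sensitivity and specificity. The sketch below is an illustration only. It uses a hypothetical cohort of 100 cases with the 34% PE prevalence mentioned by Reviewer #1 and a trivial classifier that labels every case as negative. The counts are made up for the example and are not results from the paper.

```python
def sensitivity(tp, fn):
    """True positive rate: fraction of PE-positive cases that are detected."""
    if tp + fn == 0:
        raise ValueError("no positive cases")
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: fraction of PE-negative cases correctly cleared."""
    if tn + fp == 0:
        raise ValueError("no negative cases")
    return tn / (tn + fp)


def accuracy(tp, fn, tn, fp):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Average of sensitivity and specificity."""
    return 0.5 * (sensitivity(tp, fn) + specificity(tn, fp))


# Hypothetical cohort: 34 PE-positive and 66 PE-negative cases,
# with a classifier that predicts "no PE" for everyone.
tp, fn, tn, fp = 0, 34, 66, 0
print(accuracy(tp, fn, tn, fp))           # 0.66
print(balanced_accuracy(tp, fn, tn, fp))  # 0.5
```

With this prevalence, plain accuracy gives the trivial classifier 66% while balanced accuracy stays at chance level, which is why the two metrics are worth reporting side by side on an unbalanced dataset.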

[1] Ienco, D. et al. (2024). arXiv:2408.07080; [2] Wang, S. et al. (2023). arXiv:2303.09830; [3] Yang, G. et al. (2023). doi:10.1038/s41598-023-43986-y




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Reviewers raise some concerns, e.g., data coming from one center only and limited evidence for clinical utility. Nonetheless, based on the merits of the work, 3 of the 4 reviewers recommend acceptance.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


