Abstract

Positron Emission Tomography / Computed Tomography (PET/CT) plays a critical role in medical imaging, combining functional and anatomical information to aid accurate diagnosis. However, image quality degradation due to noise, compression, and other factors can lead to diagnostic uncertainty and increase the risk of misdiagnosis. When evaluating the quality of a PET/CT image, both low-level features such as distortions and high-level features such as organ anatomical structures affect the diagnostic value of the image. However, existing medical image quality assessment (IQA) methods are unable to account for both feature types simultaneously. In this work, we propose MS-IQA, a novel multi-scale feature fusion network for PET/CT IQA, which utilizes multi-scale features from various intermediate layers of ResNet and Swin Transformer, enhancing its ability to perceive both local and global information. In addition, a multi-scale feature fusion module is introduced to effectively combine high-level and low-level intermediate features through a dynamically weighted channel attention mechanism. Finally, to fill the gap in PET/CT IQA datasets, we construct PET-CT-IQA-DS, a dataset containing 2,700 varying-quality PET/CT images with quality scores assigned by radiologists. Experiments on our dataset and the publicly available LDCTIQAC2023 dataset demonstrate that our proposed model achieves superior performance over existing state-of-the-art methods on various IQA metrics. This work provides an accurate and efficient IQA method for PET/CT. Our code and dataset are available at https://github.com/MS-IQA/MS-IQA/.
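The channel-attention fusion idea described in the abstract can be illustrated with a minimal NumPy sketch. This is a hypothetical, simplified stand-in for the paper's actual AGCA module (which uses a learned, graph-based attention): here two same-shaped feature maps are concatenated, squeezed into per-channel descriptors by global average pooling, and re-weighted by softmax-normalized weights in place of a learned excitation network.

```python
import numpy as np

def channel_attention_fuse(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Toy dynamically weighted channel fusion of two (C, H, W) feature maps.

    Illustrative only; the paper's AGCA module learns its weights and
    incorporates graph-based channel relations, which this sketch omits.
    """
    fused = np.concatenate([feat_a, feat_b], axis=0)   # stack channels -> (2C, H, W)
    descriptor = fused.mean(axis=(1, 2))               # global average pool -> (2C,)
    # Softmax as a stand-in for a learned excitation MLP: one weight per channel.
    weights = np.exp(descriptor) / np.exp(descriptor).sum()
    return fused * weights[:, None, None]              # dynamically re-weight channels

rng = np.random.default_rng(0)
low = rng.standard_normal((4, 8, 8))    # e.g. a shallow (low-level) ResNet stage
high = rng.standard_normal((4, 8, 8))   # e.g. a deep (high-level) Swin stage, resized
out = channel_attention_fuse(low, high)
print(out.shape)  # prints (8, 8, 8)
```

In the real network the two branches produce features at four stages and at different resolutions, so spatial alignment (interpolation or pooling) would precede such a fusion step at each scale.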

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2780_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LiSiq_MSIQA_MICCAI2025,
        author = { Li, Siqiao and Hui, Chen and Zhang, Wei and Liang, Rui and Song, Chenyue and Jiang, Feng and Zhu, Haiqi and Li, Zhixuan and Huang, Hong and Li, Xiang},
        title = { { MS-IQA: A Multi-Scale Feature Fusion Network for PET/CT Image Quality Assessment } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15972},
        month = {September},
        pages = {402--412}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose MS-IQA, a novel multi-scale feature fusion network combining ResNet and Swin Transformer features for PET/CT image quality assessment, and introduce PET-CT-IQA-DS, the first dedicated 2,700-image dataset with radiologist-assigned quality scores.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper introduces PET-CT-IQA-DS, the first dedicated 2,700-image dataset with radiologist-assigned quality scores. This dataset fills a critical gap in PET/CT quality assessment datasets and holds significant potential to advance research in PET/CT quality assessment.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The fusion of CNN and Transformer architectures has already been utilized in no-reference IQA (NR-IQA) tasks for both natural images [1] and medical images [2], demonstrating limited novelty.

    Regarding the LDCTIQAC2023 challenge:

    Why was the KROCC metric omitted when the evaluation originally included three metrics (PLCC, SROCC, and KROCC)? Could you provide the official source/reference for all participating teams’ results in LDCTIQAC2023?

    [1] Lao, Shanshan, et al. “Attentions help CNNs see better: Attention-based hybrid image quality assessment network.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

    [2] Song, Tao, et al. “MD-IQA: Learning multi-scale distributed image quality assessment with semi-supervised learning for low dose CT.” 2024 IEEE International Symposium on Biomedical Imaging (ISBI). IEEE, 2024.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method lacks sufficient innovation, and the experimental evaluation is incomplete due to missing metrics.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper presents MS-IQA, a novel multi-scale feature fusion network for PET/CT image quality assessment (IQA), which integrates both low-level and high-level features by utilizing ResNet and Swin Transformer. It introduces a multi-scale feature fusion module with a dynamically weighted channel attention mechanism to enhance perceptual capabilities. Additionally, the authors created the PET-CT-IQA-DS dataset, consisting of 2,700 PET/CT images rated by radiologists, filling a critical gap in existing resources. Experimental results demonstrate that MS-IQA outperforms state-of-the-art no-reference IQA methods, showcasing its effectiveness in improving diagnostic accuracy in medical imaging.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of the paper include:

    1. Innovative Approach: The introduction of the MS-IQA model, which combines low-level and high-level features through a multi-scale feature fusion network, addresses the limitations of existing IQA methods that typically focus on local or global aspects.

    2. Dynamic Feature Fusion Mechanism: A dynamically weighted channel attention mechanism effectively integrates features from various layers, enhancing the model’s ability to understand detailed artefacts and anatomical structures.

    3. Unique Dataset Creation: The construction of the PET-CT-IQA-DS dataset, consisting of 2,700 PET/CT images with radiologist-assigned quality scores, provides a valuable resource for evaluating and developing IQA methods in medical imaging.

    4. Superior Performance: Experimental results demonstrate that MS-IQA outperforms state-of-the-art no-reference IQA methods on PET-CT-IQA-DS and LDCTIQAC2023 datasets, highlighting its effectiveness and robustness.

    5. Comprehensive Validation: The inclusion of ablation studies validates the importance of each component in the MS-IQA framework, providing strong evidence for the proposed approach’s effectiveness.

    6. Clinical Relevance: By addressing image quality challenges in PET/CT imaging, the work has significant implications for improving diagnostic accuracy and clinical decision-making, making it relevant to the medical field.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The main weaknesses of the paper include:

    1. Limited Evaluation Spectrum: While the paper focuses on PET/CT imaging, it acknowledges that the model has not been extensively tested across a broader range of medical imaging modalities, potentially limiting its generalizability.

    2. Reliance on Radiologist Scores: Although valuable, the quality scores assigned by radiologists may introduce variability based on individual interpretation, which could affect the robustness of the dataset and the evaluation of the model.

    3. Potential for Overfitting: The model’s complexity and training on a relatively new dataset might lead to overfitting, mainly if the dataset is not large enough or diverse enough to represent the variability in clinical practice.

    4. Absence of Comparison with FR and RR Methods: The focus on no-reference methods means that the paper does not explore or compare the efficacy of full-reference (FR) and reduced-reference (RR) IQA methods, which may also be applicable in specific clinical scenarios.

    5. Lack of Real-World Testing: While the experimental results are promising, the model’s performance in real-world clinical settings is not assessed, which is crucial for establishing its practical utility.

    6. Complexity of Implementation: The advanced architecture, while effective, may pose challenges for implementation in clinical practice due to the need for significant computational resources and expertise in deep learning techniques.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. Clear Objective Statement: While the introduction presents a solid background on the significance of PET/CT image quality assessment (IQA), it would benefit from a more explicit articulation of the research objective or hypothesis. Including a concise statement of the main goal at the end of the introduction would better guide readers through the paper’s motivations.
    2. Dataset Description: The PET-CT-IQA-DS dataset is a substantial contribution. However, more details about the image selection process and the criteria used to determine image quality levels are needed. Additionally, providing basic demographic information or expertise levels of the annotating radiologists would strengthen the dataset’s credibility.
    3. Model Architecture Visualization: The MS-IQA model is well described in the text, but adding a schematic diagram would significantly improve clarity. A visual representation of the interaction between the ResNet and Swin Transformer modules and the multi-scale feature fusion mechanism would enhance the reader’s understanding and aid reproducibility.
    4. Training Details: Additional details on training parameters would improve reproducibility. This includes batch size, learning rate, optimizer type, number of epochs, and any data preprocessing or augmentation techniques used. These details are important for other researchers attempting to replicate the results.
    5. Evaluation Methodology: Using SROCC and PLCC as evaluation metrics is appropriate, but a further explanation would be useful. Describing how these metrics were computed, what they indicate about performance, and their relevance to real-world diagnostic tasks would enhance the results section.
    6. Ablation Study Insights: The inclusion of an ablation study is appreciated. However, a more in-depth analysis of the results would strengthen the conclusions. For example, explaining how specific modules impact model performance positively or negatively would provide more actionable insights for model development.
    7. Hyperparameter Tuning: Information about hyperparameter tuning strategies is currently limited. A brief description of how parameters were chosen or optimized (e.g., grid search, validation strategy) and which configurations yielded the best performance would further support the robustness of the proposed method.
    8. Generalizability Discussion: The model performs well on PET/CT images, but its applicability to other imaging modalities is worth exploring. A brief discussion on how the MS-IQA framework might generalize to MRI, CT, or ultrasound data would expand the paper’s relevance and demonstrate broader potential.
    9. Limitations and Future Work: Acknowledging limitations is crucial. Potential issues such as dataset bias, overfitting, or limited clinical validation should be discussed. Outlining possible improvements or future directions (e.g., testing on multi-centre datasets, integration into clinical pipelines) would also present a more balanced and forward-looking perspective.
    10. References and Literature Context: The related work section is generally appropriate, but a clearer distinction between the proposed method and existing ones would help position the paper more strongly. Summarizing the key advantages of MS-IQA compared to previous approaches would emphasize its novelty and contributions.
    11. User Accessibility: For broader adoption, the authors could consider including user documentation. This might entail usage examples, installation instructions, or suggested applications. Such guidelines would make it easier for practitioners and researchers to implement and evaluate the model in their settings.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Justification:

    1. Strength of Contribution: The paper introduces a novel and effective method (MS-IQA) for PET/CT image quality assessment, supported by a newly curated dataset and strong experimental results. These contributions are substantial and fill existing gaps in the field.
    2. Technical Soundness: The model architecture is well-designed, effectively combining ResNet and Swin Transformer. The multi-scale feature fusion approach with attention mechanisms is technically innovative and well-supported by ablation studies.
    3. Experimental Validation: The performance on two datasets and the thorough ablation studies provide compelling evidence for the method’s effectiveness. Results align with the stated goals and demonstrate clinical potential.
    4. Clarity and Organization: The manuscript is straightforward, logical, and well-organized. It effectively communicates complex ideas, although technical terminology and lack of visual aids may present challenges to non-experts.
    5. Reproducibility: The authors take meaningful steps toward reproducibility by releasing the code and dataset. However, missing training details and hardware specifications should be added to ensure that others can replicate the results thoroughly.
    6. Weaknesses Are Addressable: The noted limitations (e.g., generalizability to other modalities, reliance on radiologist scores, and implementation complexity) are typical for this research stage and can be addressed with minor revisions or clarifications in the paper.

    Required Revisions:

    1. Add a clear architectural diagram of MS-IQA.
    2. Include training details: learning rate, batch size, optimizer, number of epochs, etc.
    3. Discuss the clinical interpretability and potential for real-world integration.
    4. Expand on limitations and generalizability to other modalities.
    5. Add context on inter-rater agreement or variability in radiologist scoring (if available).

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel multi-scale feature fusion network for PET/CT image quality assessment. The network integrates two branches—ResNet50 and Swin Transformer—and fuses their aligned intermediate outputs through a multi-scale feature fusion module designed with a dynamically weighted channel attention mechanism. Additionally, the authors constructed a dedicated dataset tailored for the specific requirements of PET/CT image quality assessment.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    First, the paper constructs and open-sources a dataset tailored for PET/CT image quality assessment, which is highly commendable as it facilitates further research in this field.

    Additionally, the authors integrate knowledge from graph theory to develop an adaptive graph-based channel attention mechanism (as shown in Equations (4)-(6)), providing a novel paradigm for multi-scale feature fusion mechanisms.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper adopted RN50 and Swin Transformer architectures for the dual-branch framework. Although these modules have been widely applied in image processing tasks, I believe that to validate the adaptability of the proposed framework, it would be beneficial to introduce additional architectures in the ablation experiments for comparative analysis and discussion.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    (1) There are still some details in this article that need to be revised. For example, Section 2.3 of the paper states that the concatenated features from both branches are first passed through an average pooling layer followed by a 1×1 convolutional layer, which conflicts with the workflow illustrated in Figure 1 and the open-source code. The paper needs to be revised to align with the actual architecture.

    (2) The structure proposed in the paper exhibits a certain degree of innovativeness. For example, the proposed AGCA module introduces graph theory into channel attention mechanisms to refine attention computation. Additionally, the authors conducted sufficient experiments and visualizations to verify and analyze the performance of the proposed model.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We sincerely thank all three reviewers for their constructive comments and appreciative remarks such as “highly commendable dataset” (R1), “sufficient experiments and visualizations” (R1), “significant potential to advance research in PET/CT quality assessment” (R2), “Innovative Approach” (R3) and “Superior Performance” (R3).

To R1: Q1.1: Discrepancy between Section 2.3 and the workflow shown in Figure 1/code. A1.1: Thanks for your correction. The actual structure of our model is consistent with the workflow illustrated in Figure 1 and the open-source code. In Section 2.3, the description of operations before entering AGCA module contains a mistake. It should be a 1×1 convolution layer followed by an average pooling layer. We will revise this mistake in the final version.

Q1.2: Suggestion to include additional backbone architectures. A1.2: Thanks for your suggestion. We fully agree that including additional architectures could enhance generalization analysis. In fact, we also evaluated other backbones such as U-Net. However, due to space limitations, we regret that we were unable to include this part in the paper.

To R2: Q2.1: Limited novelty of CNN–Transformer fusion. A2.1: Thanks for your comment. As shown in the referenced AHIQ and MD-IQA, CNN–Transformer fusion has been explored in NR-IQA. However, we respectfully argue that our method is novel in several respects. AHIQ aligns a single shallow CNN with ViT via deformable convolution, whereas our method extracts multi-scale features from 4 stages of ResNet and Swin Transformer, enabling more comprehensive modeling. MD-IQA applies simple attention layers for CNN–ViT fusion, while our method introduces a novel graph-theory-based channel attention module, enabling more structured feature integration.

Q2.2: Omission of KROCC in evaluation. A2.2: Thanks for your comment. We sincerely apologize for not including KROCC in our evaluation. In this paper, we primarily reported SROCC and PLCC, the most commonly used NR-IQA metrics measuring rank correlation and linear consistency. KROCC evaluates pairwise correlation and is highly correlated with SROCC. As per your suggestion, we explored the KROCC performance of our method and found the same trend as SROCC and PLCC. However, we regret that we are unable to provide new experimental results due to rebuttal policy restrictions.

Q2.3: Request for the official source of LDCTIQAC2023. A2.3: The results of all 6 participating teams in LDCTIQAC2023 were from [1], which states that “six teams submitted their final algorithms in the testing phase.” [1] Lee, Wonkyeong, et al. “Low-dose computed tomography perceptual image quality assessment.” Medical Image Analysis 99 (2025): 103343.

To R3: Q3.1: Limited Evaluation Spectrum & Reliance on Radiologist Scores & Potential for Overfitting. A3.1: Thanks for your comment. We conducted experiments on two representative scenarios: PET/CT (multi-modality) and low-dose CT (single-modality), indicating strong generalization. Since image quality is closely related to human perception, it is a commonly adopted practice in IQA to score image quality through subjective experiments conducted by domain experts. To minimize bias, we employed experienced radiologists to independently score. We also used regularization, dropout, and data shuffling to reduce overfitting.

Q3.2: Absence of Comparison with FR and RR Methods & Lack of Real-World Testing & Complexity of Implementation. A3.2: Thanks for your comment. FR and RR methods require distortion-free reference images, which are often unavailable in real-world medical imaging. Therefore, our work focuses on the more practical NR-IQA methods and includes comparison with SOTA NR-IQA methods for fairness. Our dataset was constructed based on real clinical cases, aiming to reflect realistic scenarios. Our lightweight backbones also lead to a user-friendly implementation on GPU. We hope to apply our method in real clinical environments for further validation in the future.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


