Abstract

Reconstructing standard-dose positron emission tomography (PET) images from low-dose acquisitions poses a significant challenge in medical imaging. While integrating Magnetic Resonance Imaging (MRI) as complementary guidance shows promise for enhancing reconstruction fidelity, current multi-modal approaches typically treat PET and MRI uniformly, neglecting their inherent asymmetry within the multi-modal context. As a result, they underutilize the anatomical guidance provided by MRI and overlook the unique metabolic characteristics of PET. To address these limitations, we propose MAK-GAN, a novel Generative Adversarial Network (GAN) that incorporates Multi-level Adaptive Kernels to differentiate the feature extraction and interaction strategies of the primary (PET) and auxiliary (MRI) modalities in the asymmetric multi-modal PET reconstruction task. Specifically, we design a Multi-Kernel Extraction (MKE) block for both the PET and MRI branches, replacing the linear projections of vanilla Transformers with hierarchical multi-kernel convolutions. This enables efficient extraction of modality-specific features at multiple scales while reducing computational overhead. Subsequently, we asymmetrically introduce an Adaptive-Kernel Interaction (AKI) block in the PET branch. This block integrates self- and cross-attention modules to dynamically generate weights for adaptive kernels, preserving PET-specific characteristics while exploiting MRI’s anatomical information. Finally, we incorporate two PET-centric optimization strategies to prioritize PET during reconstruction: a residual connection for direct mapping from low-dose PET (LPET) to standard-dose PET (SPET), and an edge-aware consistency loss to enforce structural coherence. Experiments on two PET/MRI datasets demonstrate the superiority of MAK-GAN.
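To make the MKE idea concrete (convolutions at several kernel sizes standing in for a single linear projection), here is a toy sketch. It is a minimal 1D, single-channel illustration under assumed settings: the kernel sizes 1, 3 and 5, the fixed averaging weights, and the zero "same" padding are all hypothetical choices, not the paper's configuration. In a real model the kernels would be learned, multi-channel 3D convolutions whose outputs feed the attention's Q/K/V.

```python
from typing import List, Sequence


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """1D cross-correlation with zero padding so len(output) == len(x).

    Assumes an odd kernel length so the padding is symmetric.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


def multi_kernel_projection(x: List[float], sizes: Sequence[int] = (1, 3, 5)) -> List[List[float]]:
    """Project one feature sequence with several kernel sizes.

    Returns one output channel per kernel size, i.e. a multi-scale
    replacement for a single linear projection. The averaging kernels
    are hypothetical placeholders for learned weights.
    """
    channels = []
    for k in sizes:
        kernel = [1.0 / k] * k  # placeholder for learned weights
        channels.append(conv1d_same(x, kernel))
    return channels


# A single spike is kept sharp by the 1-tap kernel and spread over
# wider neighborhoods by the 3- and 5-tap kernels.
out = multi_kernel_projection([0.0, 0.0, 3.0, 0.0, 0.0])
```

The larger kernels see wider context at the same output resolution, which is the multi-scale property the abstract attributes to MKE.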

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1812_paper.pdf

SharedIt Link: https://rdcu.be/eHwNg

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04937-7_31

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ZenXin_MAKGAN_MICCAI2025,
        author = { Zeng, Xinyi AND Zeng, Pinxian AND Wang, Yan AND Cui, Jiaqi AND Zhou, Luping AND Jiang, Caiwen AND Zhang, Han AND Shen, Dinggang},
        title = { { MAK-GAN: Multi-level Adaptive Convolutional Kernels for Asymmetric Multi-modal PET Reconstruction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15961},
        month = {September},
        pages = {324 -- 334}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper proposed a GAN-based framework to reconstruct standard-dose PET from low-dose PET and MRI. It introduced a Multi-Kernel Extraction (MKE) block for both PET and MRI branches to extract modality-specific features at multiple scales, an Adaptive-Kernel Interaction (AKI) block in the PET branch to dynamically generate weights for adaptive kernels, and two PET-centric optimization strategies to prioritize PET during reconstruction.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The proposed method is motivated by a well-identified problem in medical imaging: reconstructing standard-dose PET from low-dose PET using MRI as auxiliary information. The authors consider domain-specific aspects and propose an interesting architectural design that incorporates multi-kernel projections within transformer blocks.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Questionable effectiveness of the PET-Centric optimization strategies:
  - The proposed end-to-end residual connection adds the input low-dose PET directly to the reconstructed PET image, which raises concerns about shortcut learning. This approach may encourage the model to bypass the meaningful learning blocks by simply reusing the input, undermining the impact of the proposed MKE and AKI blocks. Supporting this concern, the ablation studies show only marginal performance gains, particularly in SSIM, when the proposed blocks are included.
  - The second strategy, edge-aware consistency, seems ill-suited for PET image reconstruction. PET inherently has lower resolution and blurry edges, and the emphasis is typically on intensity distribution rather than edge sharpness. Forcing the model to enhance edges may shift its focus away from clinically relevant features. Together with the end-to-end residual connection, this raises further concerns about whether the model is merely enhancing input edges rather than learning meaningful mappings, which again points to a risk of shortcut learning.
2. Insufficient dataset transparency: Key details about the two datasets are lacking. No references are provided for either dataset, and it is unclear whether they are publicly available. Regarding the Dynamic-PET dataset:
  - Are the 16 subjects distinct from those in the Clinical dataset?
  - What is the disease distribution within the Dynamic-PET dataset?
  - Do these subjects also have corresponding standard-dose PET scans?
3. Ambiguities in the input strategy: The data preparation strategy is not very clear. With only 16 subjects, the dataset is very small. The authors state that ‘each image is sliced into overlapping patches with a stride of 8, resulting in 729 patches with a size of 64 for each patient’. This raises several questions:
  - Are both PET and MRI patchified?
  - Are these patches the actual basic unit of both input and reconstruction?
  - How are patchification artifacts addressed when reconstructing the final images? Further, given the small number of subjects (16), patch-wise processing may still introduce inconsistencies or overfitting risks, which should be addressed.
4. Unconvincing qualitative and quantitative results: The qualitative comparisons shown in Figure 2 do not clearly outperform existing methods. The authors only showed the results from two other methods, and the reconstructed SPET images from these baseline methods appear visually comparable to the proposed method. Quantitatively, the reported metrics are also similar across methods, and the performance gains are not substantial.
In addition, the authors report statistical significance only for PSNR; however, it is more important to see significance tests on SSIM and NMSE, as they are more indicative of reconstruction fidelity.

Moreover, the authors’ statement about the PSNR statistical significance: “paired t-test shows p-values for PSNR consistently below 0.05 for both datasets in most cases”, is vague, as it is unclear how many cases this applies to, and whether it is enough to support the claimed improvements.
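The patch count quoted in item 3 can be sanity-checked with the usual sliding-window formula, n = (D - P) / s + 1 windows per axis. The sketch assumes cubic volumes and cubic 64-voxel patches with stride 8 along every axis; none of these shapes is confirmed on this page. Under those assumptions, 729 = 9^3 patches implies a side length of D = 128 voxels, which is an inference, not a reported value.

```python
def patches_per_axis(size: int, patch: int, stride: int) -> int:
    """Number of sliding-window positions along one axis (no padding)."""
    if size < patch:
        return 0
    return (size - patch) // stride + 1


def patches_per_volume(size: int, patch: int = 64, stride: int = 8) -> int:
    """Patches in a cubic volume of side `size` (assumed cubic)."""
    return patches_per_axis(size, patch, stride) ** 3


def last_patch_reaches_edge(size: int, patch: int = 64, stride: int = 8) -> bool:
    """True if the final window ends exactly at the volume border,
    so no voxels are left uncovered along that axis."""
    return size >= patch and (size - patch) % stride == 0


# Assumed side of 128 voxels: 9 windows per axis, 729 per volume.
n_axis = patches_per_axis(128, 64, 8)
n_total = patches_per_volume(128)
```

Because adjacent windows overlap by 56 voxels, most voxels are covered by many patches, so the 729 patches per subject are far from independent samples, which is relevant to the overfitting question raised above.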

Minor:
1. The rationale for using different strides for query, key, and value projections in the MKE block is unclear. Further explanation would help justify this design choice.
2. The method lacks detail on how the hyperparameters α and β are selected. A brief discussion or sensitivity analysis would be beneficial.
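For readers unfamiliar with the setup, the role of α and β can be sketched as a weighted sum of generator loss terms. The form below (adversarial term with weight 1, an α-weighted L1 term and a β-weighted edge term, with α = 100 and β = 50) follows the authors' description in their rebuttal; the exact definitions of each term in the paper may differ, and the input values in the usage line are made up for illustration.

```python
from typing import Dict


def generator_loss_terms(adv: float, l1: float, edge: float,
                         alpha: float = 100.0, beta: float = 50.0) -> Dict[str, float]:
    """Return each weighted loss term and their total.

    Assumed form: L_G = 1 * L_adv + alpha * L_1 + beta * L_edge.
    """
    terms = {
        "adv": 1.0 * adv,
        "l1": alpha * l1,
        "edge": beta * edge,
    }
    terms["total"] = sum(terms.values())
    return terms


# Hypothetical raw loss values, only to show how the weights rescale them.
terms = generator_loss_terms(adv=0.7, l1=0.02, edge=0.01)
```

Inspecting the weighted terms side by side, rather than only the total, is a simple way to run the kind of sensitivity check the reviewer asks for.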
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

While the method is motivated by a clinically relevant problem and introduces some interesting design elements, the contributions remain ambiguous and the improvements in both qualitative and quantitative aspects are not convincing.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors have addressed most of my concerns. My remaining concern would mainly be that the qualitative and quantitative results are not strong enough to fully convince. Overall, I would give a weak accept.

Review #2

Please describe the contribution of the paper

This paper proposes a new GAN-based framework for translating low-dose PET (LPET) to standard-dose PET (SPET), guided by structural MRI. The framework incorporates two key components: a Multi-Kernel Extraction (MKE) block to capture modality-specific features, and an Adaptive-Kernel Interaction (AKI) block to effectively leverage anatomical information from MRI.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper improves upon the existing self-attention mechanism by replacing standard linear projections with multi-scale kernel feature extraction, enabling the model to capture richer spatial context at varying receptive fields.

The proposed AKI (Adaptive-Kernel Interaction) block utilizes three convolutional kernels to extract cross-modality information. It includes:

Intra-AK, a self-attention mechanism where queries, keys, and values are all derived from LPET features to model intra-modality dependencies.

Inter-AK, which reuses the query from LPET but uses keys and values from MRI features to integrate anatomical guidance.

Additionally, a BSK (Balanced-Static Kernel) is incorporated as a static attention weight to further enhance anatomical consistency.
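The difference between the Intra-AK and Inter-AK branches described above comes down to where the keys and values come from. The sketch below is a minimal scaled dot-product attention in plain Python that shows only that distinction. It omits the multi-kernel Q/K/V projections (identity is used instead), omits the BSK branch, and does not turn the attention output into adaptive kernel weights as AKI reportedly does; treat it as an illustration of self- versus cross-attention, not as the paper's block.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are feature channels


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out


def intra_ak(pet: Matrix) -> Matrix:
    """Intra-AK style: queries, keys and values all come from LPET tokens."""
    return attention(pet, pet, pet)


def inter_ak(pet: Matrix, mri: Matrix) -> Matrix:
    """Inter-AK style: queries from LPET, keys and values from MRI tokens."""
    return attention(pet, mri, mri)


pet_tokens = [[1.0, 0.0], [0.0, 1.0]]
mri_tokens = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused = inter_ak(pet_tokens, mri_tokens)  # one output row per PET token
```

Because the queries always come from PET, the output keeps one row per PET token, which mirrors the asymmetric, PET-centric design the reviewers describe.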
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The paper introduces the BSK (Balanced-Static Kernel) as part of the AKI block, but does not clearly explain or justify the use of a static attention weight, nor does it state the value of BSK or how it is selected. To better justify its effectiveness, the authors should consider including a “no BSK” variant in the ablation study to isolate its contribution.

Additionally, the paper lacks comparison with recent diffusion-based multi-modality reconstruction methods, which have shown strong performance in related tasks. Including such baselines would provide a more comprehensive evaluation and help position the proposed method within the current state of the art.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

The spacing between equations and surrounding text is too tight, which affects readability. The authors should consider improving the formatting for better presentation.

If GFLOPs is not a key contribution of the paper and does not strongly support the main claims, it may be reasonable to remove this metric to avoid distraction.

In Figure 2, the lower portion could be reduced in size. If both subfigures share the same legend, the authors could consider combining them to conserve space and better allocate room for equations or additional content.

In Figure 1, the residual connection is labeled with the symbol “add,” yet the method section describes it as a concatenation. This inconsistency should be clarified. The figure could be improved by adding this explanation directly next to the matrix multiplication section, where there is still available space.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Refer to section 6,7,10
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

This paper presents a GAN-based approach for multi-modal PET image reconstruction. The authors introduce a novel Multi-Kernel Extraction (MKE) module that employs multiple convolutions with varying kernel sizes, along with a self-attention mechanism, to effectively extract features from PET or MRI images. Additionally, they propose an Adaptive-Kernel Interaction (AKI) module, which leverages multi-kernel projection and attention mechanisms to dynamically generate convolutional kernels. This design aims to preserve PET-specific features while incorporating fine-grained structural details from MRI images. Furthermore, a Sobel-based edge consistency loss is introduced to enhance boundary preservation. Experiments on two datasets report both quantitative and qualitative results.
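A Sobel-based edge consistency loss of the kind mentioned above can be sketched as follows. This is a minimal 2D version under stated assumptions: it compares Sobel gradient magnitudes with a mean absolute (L1) difference over the valid interior, with no padding. Whether the paper works in 2D or 3D, and which distance and padding it uses, is not stated on this page.

```python
from typing import List

Image = List[List[float]]

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]
SOBEL_Y = [[-1.0, -2.0, -1.0], [0.0, 0.0, 0.0], [1.0, 2.0, 1.0]]


def filter3x3_valid(img: Image, kernel: Image) -> Image:
    """3x3 cross-correlation over the valid interior (output is (h-2) x (w-2))."""
    h, w = len(img), len(img[0])
    return [[sum(kernel[a][b] * img[i + a][j + b] for a in range(3) for b in range(3))
             for j in range(w - 2)]
            for i in range(h - 2)]


def sobel_magnitude(img: Image) -> Image:
    """Per-pixel gradient magnitude sqrt(gx^2 + gy^2)."""
    gx = filter3x3_valid(img, SOBEL_X)
    gy = filter3x3_valid(img, SOBEL_Y)
    return [[(x * x + y * y) ** 0.5 for x, y in zip(rx, ry)]
            for rx, ry in zip(gx, gy)]


def edge_consistency_loss(pred: Image, target: Image) -> float:
    """Mean absolute difference between the Sobel magnitudes of pred and target."""
    ep, et = sobel_magnitude(pred), sobel_magnitude(target)
    diffs = [abs(a - b) for rp, rt in zip(ep, et) for a, b in zip(rp, rt)]
    return sum(diffs) / len(diffs)


# A vertical step edge versus a flat prediction: the loss is positive.
step = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
flat = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
loss = edge_consistency_loss(flat, step)
```

Since the loss only compares gradients, a prediction that matches the target's edges but is offset by a constant intensity would score zero here, which is one reason such a term is normally combined with an intensity loss such as L1.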
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

(a) This paper combines multi-kernel convolutions and self-attention mechanisms in the proposed Multi-Kernel Extraction (MKE) module. Previous studies have extensively explored multi-scale feature extraction using convolutional layers with varying kernel sizes, as well as the use of self-attention mechanisms for visual feature modeling. Therefore, integrating these two approaches is a logical and well-motivated design choice.

(b) This paper introduces the Adaptive-Kernel Interaction (AKI) module, which employs multi-kernel projection and an attention mechanism to dynamically generate convolutional kernels, rather than directly extracting features. This design is both reasonable and interesting, as it allows the generated kernels to adapt to changes in the input images. Consequently, the kernels become image-dependent, enabling greater flexibility in capturing fine-grained details.

(c) The proposed GAN-based method incorporates not only the conventional generator and adversarial loss, but also an edge-aware consistency loss. This additional loss term helps to better preserve edge and boundary details, which is particularly beneficial for image restoration tasks. Overall, this design is both novel and interesting.

(d) The evaluations conducted in this paper are generally reasonable and well-executed. Experiments on two datasets show improvements both quantitatively and qualitatively. The use of a t-test on PSNR adds credibility to the reported gains. In addition, the ablation studies on the MKE and AKI modules, as well as the analysis of the ratio of kernel channels, are thorough and informative.
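Point (b) turns on kernels that depend on the input image. The sketch below shows a generic dynamic-convolution pattern, in which a softmax-weighted mix of a small bank of base kernels is computed from a global summary of the input. This is a named stand-in, not the paper's AKI: AKI reportedly derives its weights from self- and cross-attention, whereas here the gating rule (score_i = gate_i * mean(x)), the kernel bank, and the 1D setting are all hypothetical.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def generate_kernel(x: List[float], bank: List[List[float]],
                    gate: List[float]) -> List[float]:
    """Input-conditioned kernel: softmax-weighted mix of base kernels.

    Hypothetical gating: score_i = gate_i * mean(x), i.e. global average
    pooling followed by a per-kernel scalar gate.
    """
    context = sum(x) / len(x)  # global average pooling
    weights = softmax([g * context for g in gate])
    size = len(bank[0])
    return [sum(w * kern[t] for w, kern in zip(weights, bank)) for t in range(size)]


def conv1d_valid(x: List[float], kernel: List[float]) -> List[float]:
    """Apply the generated kernel (cross-correlation, no padding)."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


bank = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # two hypothetical base kernels
gate = [1.0, -1.0]
k_pos = generate_kernel([2.0, 2.0, 2.0], bank, gate)     # favors bank[0]
k_neg = generate_kernel([-2.0, -2.0, -2.0], bank, gate)  # favors bank[1]
```

The same weights (bank and gate) yield different effective kernels for different inputs, which is the image-dependence the reviewer highlights.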
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

(a) The edge-aware consistency loss is designed to preserve structural details. However, its impact is neither explicitly analyzed nor discussed in the experimental results. Providing a quantitative or qualitative evaluation of structural preservation would strengthen the justification for its inclusion and improve the overall clarity of the method’s effectiveness.

(b) Although the t-test results in Table 1 demonstrate a statistically significant improvement in PSNR, similar statistical analyses are not reported for SSIM and NMSE. Moreover, Table 1 shows that the improvement in SSIM is relatively marginal. The lack of statistical validation for all metrics, combined with the limited gain in SSIM, weakens the overall assessment of the proposed method’s effectiveness across multiple dimensions.

(c) In Figure 2, the images corresponding to different methods have very similar color tones, making it difficult to distinguish the differences without close inspection. It is recommended to enhance the visual contrast or adopt more distinguishable color mappings to make the differences easier to identify.
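For reference, PSNR and NMSE as commonly defined can be sketched as below. Definitions vary across papers (for example, whether the data range is fixed, taken from the target, or normalized per subject), so this is not necessarily how the paper computes them. SSIM is omitted because its windowed formulation is considerably more involved; the input values are hypothetical.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def psnr(pred: Sequence[float], target: Sequence[float], data_range: float) -> float:
    """PSNR in dB: 10 * log10(data_range^2 / MSE). Infinite for a perfect match."""
    err = mse(pred, target)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / err)


def nmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Normalized MSE: ||pred - target||^2 / ||target||^2."""
    num = sum((p - t) ** 2 for p, t in zip(pred, target))
    den = sum(t ** 2 for t in target)
    return num / den


target = [0.0, 0.5, 1.0, 0.5]
pred = [0.1, 0.5, 1.0, 0.5]
score_db = psnr(pred, target, data_range=1.0)  # MSE = 0.0025, about 26.02 dB
score_nmse = nmse(pred, target)                # 0.01 / 1.5
```

Because PSNR is logarithmic in the MSE, a small absolute change in MSE near saturation can still shift PSNR noticeably, which is one reason the three metrics can disagree on how large an improvement looks.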
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

The naming convention in Table 2, such as “Model-1,” is not sufficiently intuitive. For example, when viewing the table in isolation, it is unclear what “Model-1” specifically represents. While this is not a major weakness, and does not affect the novelty or technical strength of the paper, it slightly hinders readability. It is recommended to adopt a more descriptive naming scheme, such as “Baseline+X,” to clearly indicate the modifications made to the baseline model. For instance, “Model-1” could be renamed to “Baseline+MKE” to reflect the inclusion of the MKE module. This would make it easier for readers to follow and interpret the ablation study.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper proposes a GAN-based method for multi-modal PET image reconstruction, introducing the MKE and AKI modules to enhance feature extraction and adaptively generate convolutional kernels. The design is reasonable, and the integration of multi-kernel and attention mechanisms is interesting. The edge-aware consistency loss is also a thoughtful addition. The paper is well-organized and presents results on two datasets. However, there are several weaknesses. The impact of the edge-aware loss is not clearly demonstrated. Statistical analysis is only provided for PSNR, but not for SSIM and NMSE, which weakens the overall evaluation. The improvements in SSIM are relatively marginal. Visual results are hard to distinguish due to similar color tones. Overall, I give a weak acceptance. The method is technically sound and has potential, but the experimental results are not strong enough to fully convince. Further analysis and clearer presentation would improve the paper.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors have addressed several of my concerns in the rebuttal. They clarified the importance of the edge-aware consistency loss and provided relevant references to support its contribution. For the concern regarding statistical testing of NMSE and SSIM, the authors explained that these metrics are already close to saturation. They also addressed the concern about the minimal variation in Figure 2 and promised to enhance the contrast of the color bar in the final version. One remaining point is that, while citing relevant literature helps to justify the importance of the edge-aware consistency loss, an ablation analysis would have made the argument more convincing. Including such analysis in future work would strengthen the paper further. Overall, I maintain my recommendation of a weak accept.

Author Feedback

We thank all reviewers for their constructive comments.

Q1: Rationale of methodology
A1:
- BSK (R1): Unlike the intra- and inter-AKs, whose weights are determined adaptively from the features, BSK uses randomly initialized learnable weights (it is “static” in contrast to the AKs’ feature-based weighting). Its purpose is to balance the AKs by preventing them from underlearning in the early stages of training. We also explored the “no BSK” variant (ratio {1:1:0} in Fig. 3(b)) and found that it performs suboptimally. More details will be added.

- PET-centric strategies (R2, R3): We clarify that the end-to-end residual connection not only enhances the dominance of the PET modality but also reduces the difficulty of prediction, since the network predicts residuals rather than the full image. Similar strategies have been used in [1] RKEM (TMI 2023) and [2] PK-Trido (TMI 2024), where they were shown not to induce shortcut learning. In addition, the edge-aware consistency loss enhances high-frequency information, forcing the model to capture finer details and sharper edges, as shown in [3] AR-GAN (MIA 2022). These two strategies are complementary innovations of our method: they reinforce PET’s dominance while driving performance gains, as demonstrated by the Model-3 comparison in the ablation study. We’ll provide further explanations.
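The residual argument above can be illustrated with a minimal sketch: the output is LPET plus whatever the network branch predicts, so the branch only has to learn the difference SPET - LPET. Element-wise addition is assumed here (Review #2 noted that Fig. 1 shows "add" while the method text mentions concatenation), and the `branch` callables are hypothetical placeholders for the MKE/AKI network.

```python
from typing import Callable, List

Volume = List[float]  # flattened voxels, for illustration only


def residual_generator(lpet: Volume, mri: Volume,
                       branch: Callable[[Volume, Volume], Volume]) -> Volume:
    """Global residual mapping: SPET_hat = LPET + branch(LPET, MRI)."""
    r = branch(lpet, mri)
    return [a + b for a, b in zip(lpet, r)]


def residual_target(lpet: Volume, spet: Volume) -> Volume:
    """What the branch must learn: SPET - LPET."""
    return [s - l for s, l in zip(spet, lpet)]


lpet = [1.0, 2.0]
spet = [1.5, 2.5]
mri = [0.0, 0.0]

# A branch that outputs zeros reproduces the input (the identity solution).
identity_out = residual_generator(lpet, mri, lambda l, m: [0.0] * len(l))
# An oracle branch that outputs the exact residual recovers SPET.
oracle_out = residual_generator(lpet, mri, lambda l, m: residual_target(l, spet))
```

The zero-branch case is exactly the trivial solution that the shortcut-learning concern refers to; whether a trained model stays close to it is an empirical question that this sketch does not settle.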

Q2: Dataset details and data processing (R3)
A2:
- More dataset details: The two datasets are in-house. The 16 subjects in Dynamic-PET are distinct from those in the Clinical dataset and were scanned on a different acquisition system. These subjects have mild cognitive impairment. All subjects have corresponding standard-dose PET scans. Further details will be added.

- Data preparation strategy: The PET and MRI images are registered, and the same patchification is applied to both modalities during preprocessing, so patches serve as the basic input units. The final PET images are formed by stitching the estimated patches together and averaging overlapping regions to minimize boundary artifacts. While patch-based processing helps mitigate overfitting by increasing the sample size, it cannot fully eliminate it. We’ll clarify these details in the revised version and build larger multi-modal datasets in the future.
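The stitching step described in A2 (placing estimated patches back and averaging where they overlap) can be sketched in 1D as below. This is a minimal illustration assuming patches are stored in extraction order with a fixed stride; the 3D case applies the same sum-and-count logic along each axis. Positions never covered by any patch are returned as 0.0 here, which is an arbitrary choice.

```python
from typing import List


def extract_patches(x: List[float], patch: int, stride: int) -> List[List[float]]:
    """Overlapping windows of length `patch`, starting every `stride` samples."""
    starts = range(0, len(x) - patch + 1, stride)
    return [x[s:s + patch] for s in starts]


def stitch_patches(patches: List[List[float]], length: int,
                   patch: int, stride: int) -> List[float]:
    """Put each patch back at its start position and average the overlaps."""
    acc = [0.0] * length  # running sum of predictions per position
    cnt = [0] * length    # how many patches cover each position
    for idx, p in enumerate(patches):
        start = idx * stride
        for i in range(patch):
            acc[start + i] += p[i]
            cnt[start + i] += 1
    return [a / c if c else 0.0 for a, c in zip(acc, cnt)]


signal = [float(i) for i in range(10)]
pieces = extract_patches(signal, patch=4, stride=2)  # 4 overlapping patches
restored = stitch_patches(pieces, len(signal), patch=4, stride=2)
# Disagreeing predictions are averaged in the overlap region.
blended = stitch_patches([[1.0] * 4, [3.0] * 4], length=6, patch=4, stride=2)
```

Averaging smooths disagreements between neighboring patch predictions, which is why it reduces visible seams at patch borders.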

Q3: Improvement and statistical significance of results (R2, R3)
A3: Although SSIM and NMSE are key evaluation metrics, their values are close to saturation in PET reconstruction and generally do not show statistical significance, as shown in [1-3]. Thus, small gains in these metrics are considered meaningful. In addition, the substantial PSNR improvement further supports the superiority of our method: all PSNR comparisons, except against MTransGAN on Dynamic-PET, are statistically significant (marked with “*” in Table 1). This indicates that the performance gains are of notable magnitude compared with SOTA methods.
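The paired t-test mentioned in this exchange compares two methods on the same subjects through their per-subject differences. A minimal sketch of the test statistic with the standard library is below; the PSNR values are made up, and whether the authors paired per subject, per slice, or per patch is not stated. Converting t to a p-value requires the t distribution with n - 1 degrees of freedom, for example via SciPy's `scipy.stats.ttest_rel`, which is not reimplemented here.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) with d = a - b and sample sd (n - 1).

    The result has n - 1 degrees of freedom.
    """
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    if n < 2:
        raise ValueError("need at least two paired observations")
    sd = statistics.stdev(d)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(d) / (sd / math.sqrt(n))


# Hypothetical per-subject PSNR (dB) for two methods on the same three subjects.
ours = [3.0, 5.0, 7.0]
baseline = [2.0, 3.0, 4.0]
t_value = paired_t_statistic(ours, baseline)  # differences 1, 2, 3 -> t = 2 * sqrt(3)
```

With only 16 subjects per dataset, the test has few degrees of freedom, so small metric differences can fail to reach significance even when they are consistent, which is relevant to the SSIM and NMSE discussion above.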

Q4: Comparison with SOTA diffusion models (R1)
A4: We reproduced [4] PET-diffusion (MICCAI 2023) on the Clinical dataset; it achieved a PSNR 0.25 dB lower than our method.

Q5: Visualization in Fig. 2 (R2, R3)
A5: Due to space limits, we only visualized the two best-performing baselines. In the zoomed-in areas of Fig. 2 (second row), our method shows details closer to the SPET images. We’ll enhance the contrast of the error-map color bar to better highlight the qualitative differences.

Q6: Minor issues
A6:
- Hyperparameters α and β (R3): Based on trial studies, the L1 loss (weighted by α) is slightly larger than the edge loss (weighted by β) but much smaller than the adversarial loss (weight 1). Our chosen setting of α=100 and β=50 effectively balances these loss terms. We’ll include a brief discussion of their selection.

- K/V projection with stride 2 (R3): Reducing the spatial size of the K/V matrices effectively lowers the attention computation cost with minimal performance loss.

- Illustration and formatting (R1): We’ll correct the inconsistent symbols in Fig. 1 and improve the formatting in the revised version.

- GFLOPs and renaming (R1, R2): We’ll remove GFLOPs from the tables and rename the ablation models, replacing IDs with descriptive module names.
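The cost argument for the stride-2 K/V projection can be made concrete by counting attention-score entries, (number of query tokens) x (number of key tokens). The sketch assumes 3D feature maps with the stride applied to keys and values only, so the query (and output) resolution is unchanged; the 16-voxel feature-map side is a hypothetical example, and the 3D assumption is inferred from the 729 = 9^3 patch count rather than stated on this page.

```python
def num_tokens(side: int, dims: int = 3) -> int:
    """Tokens in a cubic (dims=3) or square (dims=2) feature map."""
    return side ** dims


def attention_entries(side: int, kv_stride: int = 1, dims: int = 3) -> int:
    """Entries in the Q K^T score matrix when K/V are downsampled by kv_stride."""
    n_q = num_tokens(side, dims)
    n_kv = num_tokens(side // kv_stride, dims)
    return n_q * n_kv


full = attention_entries(16)                    # 4096 x 4096 entries
reduced = attention_entries(16, kv_stride=2)    # 4096 x 512 entries
ratio_3d = full // reduced                      # 8x fewer in 3D
ratio_2d = attention_entries(16, 1, 2) // attention_entries(16, 2, 2)  # 4x in 2D
```

Keeping the queries at full resolution is what preserves the output size; only the set of positions each query attends over becomes coarser, which is consistent with the "minimal performance loss" claimed in the rebuttal.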

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The authors propose a GAN-based method for low-dose PET reconstruction using MRI guidance, featuring MKE and AKI modules. Strengths include a clear network design and effective multi-modal integration. Weaknesses include limited analysis of the BSK component, insufficient statistical validation, and the lack of comparison with diffusion models. The post-rebuttal responses addressed the key issues. I recommend acceptance.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A


MAK-GAN: Multi-level Adaptive Convolutional Kernels for Asymmetric Multi-modal PET Reconstruction

Author(s):