Abstract

Through-plane super-resolution (SR) in brain magnetic resonance imaging (MRI) is important for clinical assessments. Most existing multi-contrast SR models mainly focus on enhancing in-plane image resolution, relying on functions already integrated into MRI scanners. These methods usually leverage proprietary fusion techniques to integrate multi-contrast images, resulting in diminished interpretability. Furthermore, the requirement for reference images during testing limits their applicability in clinical settings. We propose a TEst time reference-free through-plane Super-resoLution network using disentAngled representation learning in multi-contrast MRI (TESLA) to address these challenges. Our method is developed on the premise that multi-contrast images consist of shared content (structure) and independent stylistic (contrast) features. Thus, after progressively reconstructing the target image in the first stage, we divide it into shared and independent elements during the structure enhancement phase. In this stage, we employ a pre-trained ContentNet to effectively disentangle high-quality structural information from the reference image, enabling the shared components of the target image to learn directly from those of the reference image through patch-wise contrastive learning during training. Consequently, the proposed model enhances clinical applicability while ensuring model interpretability. Extensive experimental results demonstrate that the proposed model performs favorably against other state-of-the-art multi-contrast SR models, especially in restoring structural fine details in the through-plane direction. The code is publicly available at https://github.com/Yonsei-MILab/TESLA.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0343_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Yonsei-MILab/TESLA

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ChoYoo_TESLA_MICCAI2025,
        author = { Choi, Yoonseok and Jung, Sunyoung and Al-masni, Mohammed A. and Yang, Ming-Hsuan and Kim, Dong-Hyun},
        title = { { TESLA: Test-time Reference-free Through-plane Super-resolution for Multi-contrast Brain MRI } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15972},
        month = {September},
        pages = {584 -- 593}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a multi-contrast MRI super-resolution approach that requires no Ref images during the testing phase. Extensive experiments are conducted and show promising performance.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Disentangling multi-contrast images into shared contents and distinct stylistic features, and utilizing patch-wise contrastive learning to distill the Ref features toward soft SR features, sounds novel and feasible. (2) Eliminating the requirement of Ref images during testing is of great value, effectively expanding the application scope of multi-contrast MRI SR. The quantitative and qualitative results suggest the proposed method outperforms those baselines that require Ref images, which is a major strength of this paper.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    (1) This paper focuses on through-plane SR. However, it is unclear why the proposed innovations are applied only to through-plane SR, as they do not seem to have a unique design for it. (2) The baseline methods (e.g., MINet) are designed for in-plane SR. For example, MINet uses an upsampling tail that increases the resolution in two directions, but the through-plane SR task only increases the resolution in one direction. Please clarify. (3) The detailed network structure and hyperparameters are not provided. (4) Some descriptions are confusing. For example, the authors state that “using an optimized nnU-Net [8] for gradually reducing the domain gap of shared content with HR Ref as well as mitigating structural distortion between LR Tar and HR Tar” (Page 3). But Eq. (1) only shows the latter and does not demonstrate how to reduce the domain gap of shared content. Additionally, it is unclear why nnU-Net, which commonly serves as a segmentation model, is used here as a low-level vision model.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work has good innovation, but the lack of clarity in the description of the method and experiments causes limited reproducibility, despite the authors’ statement of releasing codes upon acceptance. I do hope that the authors can provide reasonable answers to the main weakness mentioned above, and then I will seriously consider increasing the score.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I would like to see this innovative, simple-to-apply method to be published in MICCAI. I strongly recommend that the authors carefully address all issues raised by reviewers in the revised version.



Review #2

  • Please describe the contribution of the paper

    The authors present a deep learning-based multi-contrast MRI super-resolution approach and demonstrate that the proposed method outperforms existing methods across multiple datasets.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method effectively disentangles structural and style components, reconstructing final HR images based on the hypothesis that multi-contrast images share common structural content. Contrastive learning is adopted to extract structural features.

    Extensive experiments on diverse datasets highlight the robustness and generalizability of the proposed approach.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The authors fail to provide details on how to deal with through-plane images.

    The specific contribution of the first stage in the progressive reconstruction process is not clearly explained.

    It is recommended to visually present equations 2–4 in Figure 1 to enhance clarity and aid reader comprehension.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of the proposed method and the manuscript focuses on an important topic.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a through-plane super-resolution (SR) model trained on multi-contrast brain MRI with the underlying assumption of shared structural information and unique style between contrasts. The SR from low resolution (LR) to high resolution (HR) is conducted iteratively via an nnU-Net. Subsequently, a structure enhancement phase is supposed to mitigate smoothing effects of the SR phase through patch-wise contrastive learning and a data consistency term between the SR reconstruction and the original LR volume. The method is rigorously tested on multiple datasets and compared to several baselines, showing superior quantitative and qualitative results.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-structured and mostly easy to follow.
    2. Contrary to many existing approaches, TESLA requires no test-time HR contrast reference for SR of LR volumes.
    3. The authors present a convincing quantitative and qualitative evaluation on multiple datasets and baselines. TESLA outperforms all baselines including those that require HR references for inference.
    4. A detailed ablation study shows quantitative and qualitative impacts of the proposed building blocks.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The complex training setup, including a 2-stage approach and a pretrained ContentNet, raises questions with respect to training stability and hyperparameter choice, e.g. when applied to a new domain. Providing more details on training complexity and hyperparameter search could help to better clarify this aspect.
    2. In Fig.1 it is not fully clear which parts are used during training only and which during inference. Better clarification here would help to understand the purpose of the different building blocks.
    3. Although the authors show one case where TESLA is applied on a domain different from the one it was trained on (IXI and inhouse), more experiments and analysis, as well as outlining the limitations of the approach in terms of domain shifts would be helpful to assess the clinical applicability of the tool.
    4. Minor: Fig 2: The image x_raw for DC has an odd aspect ratio.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Innovative two-stage method combining iterative SR via nnUNet with a structure enhancement phase (using patch-wise contrastive learning and data consistency) achieves superior results without needing HR test-time references.
    • Comprehensive experiments and ablation studies support the method’s effectiveness across multiple datasets.
    • While the training setup is complex and more details on hyperparameter tuning and domain shifts would be beneficial, these issues do not outweigh the clear performance gains and strong evaluation.
  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors addressed all before mentioned concerns and weaknesses including training stability, adaptation to new domains, and out-of-distribution performance.




Author Feedback

R1: Details on through-plane processing are missing. We clarified in Sec. 3 that LR Tar is generated by downsampling only along the through-plane (z) axis using b-spline interpolation, reflecting the anisotropic acquisition patterns commonly observed in clinical MRI.
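The paper does not include code for this degradation step; the following is a minimal sketch of through-plane-only downsampling using SciPy's spline-based `zoom` (order=3, i.e. cubic B-spline). The axis order (x, y, z) and integer downsampling factor are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import zoom

def simulate_through_plane_lr(volume: np.ndarray, factor: int = 4) -> np.ndarray:
    """Downsample a 3D volume along the through-plane (z) axis only,
    using cubic B-spline interpolation (order=3). The in-plane (x, y)
    resolution is untouched, mimicking anisotropic clinical acquisitions."""
    return zoom(volume, (1.0, 1.0, 1.0 / factor), order=3)

# Illustrative HR volume: 64 x 64 in-plane, 32 slices
hr = np.random.rand(64, 64, 32).astype(np.float32)
lr = simulate_through_plane_lr(hr, factor=4)
print(lr.shape)  # (64, 64, 8): only the slice dimension is reduced
```

Note that only the third zoom factor differs from 1, so in-plane resolution is preserved exactly, consistent with the anisotropic degradation described in the rebuttal.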

R1, R3: First stage contribution unclear. As described in Sec. 2, the PR stage serves two key roles. First, directly mapping highly downsampled LR Tar to HR Tar often fails to recover fine structures due to high scaling factors and anatomical complexity. Gradual upsampling can alleviate this issue. Second, since multi-contrast MR images share structural information, PR reduces the domain gap between LR Tar and HR Ref by generating a coarse Soft SR Tar. This improves alignment for contrastive learning in the SE phase. However, the domain gap reduction is a conceptual effect and not easily formulated as an equation.

R1: Equations (2)–(4) in Fig. 1. We revised Fig. 1 to indicate where Eqs. (2)–(4) are applied. The updated figure is available here (https://www.notion.so/TESLA-fig1-1f8a564497d280b19ac4d9bef0304d64?pvs=4) and will be included in the camera-ready version.

R2, R3: Complex training setup and unclear hyperparameters. Two-stage designs are widely used in medical image reconstruction to separate coarse prediction from fine-detail refinement [TSCNet, DBGAN]. Following this design, the PR stage aligns structural content, and the SE stage refines it via contrastive learning. As illustrated in Fig. 4, ContentNet is pretrained using a fixed loss setup (L1 + SSIM + adversarial) and a PatchGAN discriminator to effectively disentangle structure from HR Ref. All loss weights were equally applied and reused in the SE phase without additional tuning. For all datasets (IXI, HCP, BraTS21, WMMS), we used identical hyperparameters (described in Sec. 2), and training was stable and reproducible across domains. Despite being multi-stage, TESLA does not require dataset-specific tuning.

R2: Training vs. inference components in Fig. 1. We revised the Fig. 1 caption to clarify module usage: “HR Ref and ContentNet are used only during training; inference requires only PR, the encoders (E_c and E_s), and decoder (G) in SE.” This distinction supports our reference-free design at test time.

R2: Domain shift analysis. We added cross-domain results where TESLA trained on IXI was tested on BraTS21 and WMMS. While key anatomical structures were preserved, fine pathological details such as tumor margins and MS lesions were harder to reconstruct, indicating both robustness and limitations.

R3: Design specific to through-plane SR. TESLA is tailored for through-plane SR, addressing anisotropic degradation common in clinical MRI. The PR stage performs slice-wise upsampling to reduce stair-step artifacts. SE enhances spatially smooth features to recover z-axis detail using contrastive learning and disentangled representation learning. Data consistency is enforced along the z-axis.

R3: MINet and 2D upsampling. We agree that MINet performs 2D in-plane upsampling, while TESLA targets only the z-axis. Despite this, TESLA achieves superior through-plane reconstruction quality, as shown in Fig. 2 and Fig. 3. We will clarify this in Sec. 3.

R3: Network structure. PR stage: consecutive optimized nnU-Nets. SE stage: E_s contains 4 downsampling layers and outputs a 256-dimensional style vector, while E_c includes 2 downsampling layers and 4 residual blocks. G: four AdaIN-based residual blocks, followed by two upsampling layers (each with bilinear upsampling and a Conv2d that halves the feature dimension).
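The decoder G is described as using AdaIN-based residual blocks. As a reference for readers unfamiliar with the operation, here is a minimal NumPy sketch of adaptive instance normalization itself; the names `gamma` and `beta` are illustrative, and in TESLA they would presumably be predicted from the 256-dimensional style vector produced by E_s (an assumption, not confirmed by the paper).

```python
import numpy as np

def adain(content: np.ndarray, gamma: np.ndarray, beta: np.ndarray,
          eps: float = 1e-5) -> np.ndarray:
    """Adaptive Instance Normalization: normalize each channel of the
    content feature map to zero mean / unit variance over its spatial
    dimensions, then re-scale and re-shift with style-derived parameters.
    content: (C, H, W); gamma, beta: (C,)."""
    mu = content.mean(axis=(1, 2), keepdims=True)
    sigma = content.std(axis=(1, 2), keepdims=True)
    normalized = (content - mu) / (sigma + eps)
    return gamma[:, None, None] * normalized + beta[:, None, None]

feat = np.random.randn(8, 16, 16)   # hypothetical content features from E_c
gamma = np.random.randn(8)          # hypothetical scale from the style vector
beta = np.random.randn(8)           # hypothetical shift from the style vector
out = adain(feat, gamma, beta)
print(out.shape)  # (8, 16, 16)
```

With `gamma = 1` and `beta = 0` the output is simply the instance-normalized content, which is why AdaIN is a natural fit for re-injecting a style (contrast) onto disentangled structural features.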

R3: Why use nnU-Net for SR? The nnU-Net framework, though primarily designed for segmentation, offers several features transferable to super-resolution. Its encoder-decoder architecture with skip connections preserves spatial details by combining low-level texture information with high-level features, which is critical for reconstructing high-frequency components in super-resolution.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Please carefully address all issues raised by reviewers in the revised version.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper is recommended for acceptance, as all the reviewers have reached a unanimous positive consensus.


