Paper Info Reviews Author Feedback Meta-Review Back to top

List of Papers Browse by Subject Areas Author List

Abstract

Acquiring images in high resolution is often a challenging task. Especially in the medical sector, image quality has to be balanced with acquisition time and patient comfort. To strike a compromise between scan time and quality for Magnetic Resonance (MR) imaging, two anisotropic scans with different low-resolution (LR) orientations can be acquired. Typically, LR scans are analyzed individually by radiologists, which is time consuming and can lead to inaccurate interpretation. To tackle this, we propose tripleSR, a novel approach for fusing two orthogonal anisotropic LR MR images, to reconstruct anatomical details in a unified representation. Our multi-view neural network is trained in a self-supervised manner, without requiring corresponding high-resolution (HR) data. To optimize the model, we introduce a sparse coordinate-based loss, enabling the integration of LR images with arbitrary scaling. We evaluate our method on MR images from two independent cohorts. Our results demonstrate comparable or even improved super-resolution (SR) performance compared to state-of-the-art (SOTA) self-supervised SR methods for different upsampling scales. By combining a patient-agnostic offline and a patient-specific online phase, we achieve a substantial speed-up of up to ten times for patient-specific reconstruction while achieving similar or better SR quality. Code is available at https://github.com/MajaSchle/tripleSR.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2881_paper.pdf

SharedIt Link: https://rdcu.be/eHwPD

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04947-6_17

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/MajaSchle/tripleSR

Link to the Dataset(s)

N/A

BibTex

@InProceedings{SchMaj_Faster_MICCAI2025,
        author = { Schlereth, Maja AND Schillinger, Moritz AND Breininger, Katharina},
        title = { { Faster, Self-Supervised Super-Resolution for Anisotropic Multi-View MRI Using a Sparse Coordinate Loss } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15962},
        month = {September},
        page = {172 -- 182}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes a self-supervised method for multi-view MRI super-resolution that does not require HR ground truth. It introduces a sparse coordinate-based loss and a two-phase training pipeline, aiming to fuse two orthogonal anisotropic MR scans into a unified high-resolution output. The method achieves competitive PSNR to some baselines while offering faster inference.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The methodology is presented in a clear and logical manner.
2. Inference time optimization sounds a promising way in medical domain.
3. The experiment is comprehensive including both quantitative and qualitative comparisons, cross-dataset evaluations.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The proposed method appears relatively straightforward, primarily relying on feature-space interpolation (via trilinear interpolation) rather than learning direct voxel-space reconstruction. If my understanding is correct, the architecture functions more like a registration or alignment module rather than a fully standalone super-resolution framework. In this context, the design seems analogous to the motion compensation modules commonly used in video super-resolution pipelines, such as the one described in [a]. While effective and efficient, the core contributions—namely feature fusion, coordinate-based sparse loss, and interpolation—build on existing ideas and may be viewed as incremental rather than fundamentally novel.
  - [a] Caballero J, Ledig C, Aitken A, Acosta A, Totz J, Wang Z, Shi W. Real-time video super-resolution with spatio-temporal networks and motion compensation. InProceedings of the IEEE conference on computer vision and pattern recognition 2017 (pp. 4778-4787).
2. Since the model is supervised only on the intersection regions M_ax and M_cor, it is unclear how well the network generalizes to voxels outside these regions. The paper would benefit from a more detailed analysis of prediction quality beyond the supervised areas—both in terms of error metrics and qualitative assessments.
3. The proposed approach focuses on fusing orthogonal views (multi-view), whereas many of the baseline methods (e.g., SMORE, BISR in its original form) are designed for multi-contrast inputs. This discrepancy may limit the fairness of direct comparisons. Moreover, a brief survey of the literature suggests that multi-contrast super-resolution is a more common and widely studied problem in the MRI community. It would strengthen the paper to more clearly articulate the motivation for tackling multi-view SR specifically, including any clinical relevance or unique challenges that make this task particularly important.
4. The description of the data pipeline could benefit from clarification. It is not explicitly stated whether the input shape h x w x (d/e) refers to LR or HR images. The training process seems to start from LR volume h x w x (d/e) to reconstruct a higher-resolution volume of shape h x w x d. However, for inference and evaluation, the text implies the need for HR data to assess performance. If this is correct, does inference use inputs of shape h x w x (d/e/s) to generate predictions of shape h x w x (d/e)? Clarifying this would help readers understand the generalizability and practical deployment of the model. Moreover, the purpose and context of the downsampling mentioned at the beginning of page 5 is ambiguous. Is it one of the processes to obtain h x w x (d/e/s) shape data?
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

My concerns are mainly focus on the simplicity of proposed method and the motivation to investigate multi-view instead of multi-contrast MRI SR. In addition, the comparison between multi-view input with multi-contrast baselines seems not fair.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Reject
[Post rebuttal] Please justify your final decision from above.

Thanks to the authors for the detailed feedback, and to Reviewer 3 for the response in the discussion.

Actually, I’m aware of the difference, as I previously noted in discussion: “The primary distinction here is merely the adaptation from a multi-frame to a multi-view context.”

However, my main concern lies with the following point: motion compensation in video super-resolution is commonly treated as a preprocessing step and is rarely considered a standalone SR work—as evidenced by the referenced 2017 paper, where it was not presented as an innovation.

Given this, I find it difficult to see how adapting such a preprocessing step from the multi-frame to the multi-view domain constitutes a sufficiently significant contribution to support a publication on its own.

Review #2

Please describe the contribution of the paper

this paper proposed a self-supervised SR framework for MRI, utilizing two orthogonal anisotropic views using coordinate-based implicit representation, with two-phase pipeline that do patient-agnostic offline pertaining first and a more adaptive fast patient-specific online training phase.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The two-phase with SSL make the method is well-motivated and potentially generalizable better to many different clinical settings, reduced inference time also make it good for deployment in realtime, which is often overlooked by many SR methods, particularly those vit-based approaches.
2. The sparse coordinate loss is an interesting setup, good fo arbitrary-scale upsampling and resolution.
3. the results on popular public datasets BRATS and HCP showed clear improvement over SSL baselines with significant faster inference speed.
4. no HR data needed for training, which is proven to be critical for supervised approach [1].
[1] Yu, Pengxin, et al. “RPLHR-CT dataset and transformer baseline for volumetric super-resolution from CT scans.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2022.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. although novel from a engineering point of view, the key components certainly reminds me of several works like the mentioned INR-based method, multi-vew SR like SMORE, SAINT, TVSRN.
2. although the aforementioned methods are supervised, the authors should report their performance as a upper bound comparison to give reader basic ideas on how difficult the tasks is, where is the limit.
3. although SSIM/PSNR are classic evaluation metric, more and more critics have been received in recent years the value of these metrics. researchers tried to prove better SR result by reader study, downstream task, etc., which this work lacks.
4. you mentioned ablation study but I don’t see an ablation study. do you just mean the big table with online or offline? the depth of ablation study is not enough, should investigate the sparse loss and patch sampling strategy.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

Please add executable code and setup given you are using public dataset
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method is clear and intersting, the result is sounding. Need more clarification, need better literature review, and address my comments and points in weakness if feasible.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

it addressed all my concerns perfectly

Review #3

Please describe the contribution of the paper

This paper proposes a simple-yet-effective sparse coordinate loss-based multi-view MRI super-resolution method.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

(1) The proposed method is simple, yet the performance evaluated on multiple datasets is quite good. Moreover, the results in Table 3 indicates a good generalization ability. (2) The dual-stage “offline training - online patient-specific fine-tuning” strategy is adopted, which effectively shortens the time compared to baseline “BISR” that sorely relies on online patient-specific training. Therefore, the proposed method can benefit real-time reconstruction of HR MR images.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

(1) To my knowledge, it is generally unusual to acquire images in the same modality from different views in real cases. The clinical value would be greatly improved if the method can be evaluated on multi-view and multi-modal MR images. (2) Since there are no regularization terms (e.g., total variational penalty) in the loss, one concern is the online patient-specific tuning may suffer from overfitting. Is any strategy such as early-stop adopted to avoid this? How is the number “10” from “10 additional epochs” (in Section 3.3) determined? (3) Why is the performance of “training on Br T1 CE and testing on Br T1” better than that of “both training and testing on Br T1”?
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Although the innovation in methodology is somewhat limited, the proposed method is effective and requires low time cost. The experiments on multiple datasets and modalities reveal the method’s promising clinical value.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors’ rebuttal effectively address my concern about “why training on T1CE, testing on T1 outperforms training on T1, testing on T1”. I would like to see this simple-yet-effective paper to be published.

Author Feedback

We thank all reviewers for their suggestions and interest in our fast self-supervised MRI super-resolution method. We have grouped the comments topics-wise and answer below. We will improve the clarity of the manuscript accordingly. Code will be published with the camera ready version.

Clinical value (R2, R3) of single-contrast multi-view (MV): Assessing HR information is essential for several diseases, but motion artifacts, that increase with longer scan durations, pose a challenge. Splitting the scan into two orthogonal anisotropic views can improve patient comfort and reduce motion, especially in populations who struggle to remain still. Similar MV anisotropic acquisitions are already used in rheumatology and abdominal imaging. We will strengthen this motivation more clearly in the final paper. While the baseline BISR was designed for multi-contrast+MV SR, its principle does not differ for single-contrast MV and performs very good on this task, as the omission of multi-contrast does not increase difficulty. To the best of our knowledge, there is no other suitable baseline for self-supervised MV SR than BISR. We focused on single-contrast SR in our paper, but will consider multi-contrast for future work.

Distinction & Baseline (R1): In contrast to SMORE, SAINT, and TVSRN, our method enables the joint SR of two anisotropic LR images in an unsupervised manner with arbitrary resolution. We will include these methods in our related work section. A comparison to a supervised method (e.g. TVSRN) as upper bound will be incorporated in future work.

Methods and Evaluation Metrics (R1): Incorporation of additional metrics like LPIPS, a reader study and downstream task is planned for future work to showcase the clinical benefits of the method.

Ablation Study (R1): We misphrased this and rather meant the comparison between offline and online. An ablation is not really possible because these are the core components of our proposed method.

Registration/alignment module (R2): Compared to motion compensation in video SR, our work focuses on the fusion of LR views to combine areas with sparse and dense voxel information into one representation. While our model incorporates parts that build on prior work, the core novelty lies in the sparse coordinate loss, which enables self-supervised joint reconstruction of orthogonal 3D images.

Generalization (R2): Our approach assumes that both LR images show the same FOV. During offline and online phase, only M_ax and M_cor are available. The metrics are assessed by comparing the predicted SR result to the GT HR image. This means we analyze the generalization performance to regions that are not available during training. When using separate FOVs for each LR image, our method could be extended to SISR.

Data pipeline (R2): We agree with the reviewer that the terminology for the inference can be improved. The input shape h x w x (d/e) refers to LR images. During inference only these LR images (h x w x (d/e)) are available. Only for the final evaluation, i.e. assessment of metrics the GT HR images (h x w x d) are used. The dataset description (p.4/5) underscores that during training any random downsampling in the range [2,4] is possible and during inference the two specific downsampling scales 2 or 4.

Inference/Overfitting (R3): We determined the number ‘10’ by tuning on the validation set. Referring to related work BISR, strong adaptation to a specific patient is beneficial, and has a low risk of “traditional” overfitting due to the inherent inductive bias of the network. Early stopping can be initiated by measuring Mutual Information that saturates after a specific number of epochs.

Results (R3). The performance of “training on Br T1 CE and testing on Br T1” could be better than that of “both training and testing on Br T1” because the CE training data offers additional contrasts and patterns to be learned that are also relevant to represent data without contrast. A brief discussion will be added.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The authors have addressed the reviewers concerns. Outstanding concerns about novelty have been discussed and in my opinion this method satisfies the requirement for novelty here.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

back to top

Faster, Self-Supervised Super-Resolution for Anisotropic Multi-View MRI Using a Sparse Coordinate Loss

Author(s):