Abstract

Anisotropic resolution remains a fundamental challenge in 3D microscopy, where axial resolution is significantly lower than lateral resolution due to physical limitations. To address this, we propose a self-supervised volume super-resolution (VSR) framework named Diffusion to Resolution (D2R), which leverages 2D diffusion priors to enhance axial resolution without requiring high-resolution (HR) volumes as supervision. D2R consists of three stages: (1) learning biological priors via a 2D diffusion model trained on high-resolution XY slices, (2) generating pseudo-HR lateral (XZ/YZ) volumes through cross-plane fusion, and (3) performing stable structure distillation to train a 3D VSR network. To further improve VSR quality, we introduce the Axial Enhancement Network (AENet), a 3D VSR model incorporating lightweight channel attention to enhance fine details while maintaining inter-slice continuity. Extensive experiments on FIB-SEM datasets demonstrate that D2R-AENet outperforms state-of-the-art self-supervised methods in both image similarity and membrane segmentation accuracy, achieving performance close to supervised approaches. These results validate the effectiveness of our framework in high-fidelity volumetric reconstruction under practical conditions where HR references are unavailable. Code is available at https://github.com/hmzawz2/D2R-models.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2453_paper.pdf

SharedIt Link: https://rdcu.be/eHw4v

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05127-1_45

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/hmzawz2/D2R-models

Link to the Dataset(s)

N/A

BibTex

@InProceedings{CheBoh_Selfsupervised_MICCAI2025,
        author = { Chen, Bohao AND Zhang, Yanchao AND Lv, Yanan AND Han, Hua AND Chen, Xi},
        title = { { Self-supervised Axial Super-Resolution for Volume Microscopy via Diffusion-Guided Structure Distillation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15969},
        month = {September},
        pages = {467 -- 477}
}

Reviews

Review #1

Please describe the contribution of the paper
This paper proposes a three-stage approach for volumetric image reconstruction and enhancement using diffusion models and 3D convolutional neural networks. The proposed method proceeds as follows:
- 2D Diffusion Model Pretraining: The authors pretrain a 2D diffusion model using XY slices of the volume.
- Volume Reconstruction Using Slice-by-Slice Diffusion: Using the pretrained 2D model, the authors reconstruct the full volume by generating ZX and ZY slices slice-by-slice.
- 3D Regression Using Pseudo High-Resolution Volumes: The reconstructed volume is treated as a high-resolution pseudo-label to train a 3D CNN model. The model is trained to map low-resolution 3D volumes to pseudo high-resolution ones.
The key idea of the paper is to use the output of a diffusion-based reconstruction as a pseudo-label to train another 3D regression model.
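
The second step summarized above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: `restore_slice` is a hypothetical stand-in for the pretrained 2D diffusion model (here it is plain linear interpolation along Z), and averaging the XZ and YZ passes is an assumed fusion rule, since the review does not say how the two passes are combined.

```python
import numpy as np

def restore_slice(lr_slice: np.ndarray, factor: int) -> np.ndarray:
    """Hypothetical placeholder for the 2D diffusion model.

    Upsamples the first axis (Z) of a 2D slice by linear interpolation.
    """
    z_lr, width = lr_slice.shape
    src = np.arange(z_lr) * factor           # Z positions of the known rows
    dst = np.arange(z_lr * factor)           # Z positions to fill
    out = np.empty((z_lr * factor, width), dtype=np.float64)
    for col in range(width):
        out[:, col] = np.interp(dst, src, lr_slice[:, col])
    return out

def pseudo_hr_volume(lr: np.ndarray, factor: int) -> np.ndarray:
    """Build a pseudo-HR volume from an anisotropic volume indexed (Z, Y, X)."""
    z, y, x = lr.shape
    xz_pass = np.empty((z * factor, y, x))
    yz_pass = np.empty((z * factor, y, x))
    for j in range(y):                       # XZ slices: fix Y
        xz_pass[:, j, :] = restore_slice(lr[:, j, :], factor)
    for i in range(x):                       # YZ slices: fix X
        yz_pass[:, :, i] = restore_slice(lr[:, :, i], factor)
    return 0.5 * (xz_pass + yz_pass)         # assumed fusion: simple average

lr_volume = np.random.default_rng(0).random((4, 5, 6))
hr_volume = pseudo_hr_volume(lr_volume, factor=8)
```

Because each slice is restored independently, nothing in this loop enforces consistency in the XY plane, which is the motivation the authors give for the subsequent 3D distillation step.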
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This work achieves superior quantitative performance, as evidenced by consistently higher PSNR and SSIM scores compared to existing state-of-the-art methods, indicating improved image fidelity and structural similarity.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Limited Novelty and Technical Contribution
  - The first two steps in the pipeline appear similar to previously published approaches (e.g., Lee et al.), and the final step relies on standard supervised learning using a 3D CNN. The method largely reuses established techniques, and it is unclear what unique technical innovations are being introduced. More explicit comparisons and citations are needed to clarify how this work differs from prior methods.
2. Unclear Benefit of 3D CNN with Pseudo HR Supervision
  - The rationale for using pseudo high-resolution outputs from a diffusion model to train the 3D CNN is not clearly explained. It is unclear how this strategy improves the final reconstruction quality or addresses the limitations of existing approaches. Since the training data consists of pseudo volumes rather than true high-resolution ground truth, there is a risk that the 3D CNN may not faithfully reproduce accurate structures. The observed improvements in PSNR and SSIM may largely stem from the use of SSIM and FFL losses, which are not applied in the other methods compared against.
3. Lack of Detail in the Diffusion-based Reconstruction Step
  - The paper provides insufficient technical detail and lacks clear references for the diffusion model component. Readers would benefit from a more thorough description of the diffusion process, training objective, and integration within the pipeline.
4. Inconsistent and Unconvincing Visual Results
  - Although the reported PSNR and SSIM scores show improvement, the visual quality of the reconstructed axial views remains questionable. As seen in the provided images, the results appear blurry, and models that directly use diffusion-based reconstruction often produce more realistic-looking results. It seems that the regression-based 3D CNN trained on diffusion outputs may smooth out details, leading to improved PSNR but perceptually worse results.
5. Discrepancy with Previously Reported Results
  - There is a notable inconsistency between the reported results and previous studies. For example, Lee et al. [8] report a PSNR of ~27 dB on the FIB-25 dataset, while this paper reports significantly lower performance (~25 dB). This discrepancy needs to be addressed and clearly explained.
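
For context on the size of the gap raised in point 5: PSNR is defined as 10·log10(MAX²/MSE), so a fixed difference in dB corresponds to a fixed ratio of mean squared errors. The sketch below uses hypothetical helper names and assumes intensities normalized to [0, 1]; it is not taken from either paper's evaluation code.

```python
import math

def psnr(mse: float, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_ratio_for_gap(gap_db: float) -> float:
    """Factor by which MSE differs between two results `gap_db` dB apart."""
    return 10.0 ** (gap_db / 10.0)

ratio_2db = mse_ratio_for_gap(2.0)
```

A 2 dB gap therefore corresponds to roughly a 1.58x difference in MSE, which is large enough that differences in how the low-resolution inputs were generated could plausibly account for it.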
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

At its current stage, the paper demonstrates limited technical novelty and lacks sufficient clarity and justification in several key areas, especially the rationale behind 3D volume regression using pseudo volumes and the performance of other baseline methods in the results.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

This paper proposes a self-supervised axial super-resolution method that does not require high-resolution volumes as supervision. The method is based on pseudo high-resolution generation with diffusion priors and a distillation stage that yields a 3D model to produce the final 3D volume. Extensive experiments show its effectiveness.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The method design is clear and shows strong motivation in each stage, i.e., generation priors, pseudo-HR generation, and structure distillation.
- The paper is easily comprehensible.
- The method achieves good performance.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- It is not clear from the description of the diffusion model training (Sec 2.1) how and why the priors can ensure high-quality generation of the XZ and YZ planes. The paper should make clear how the generative model is designed and trained.
- The resolution of the volumes (HR and LR) is not described in Sec 3.1.
- The inference stage could be illustrated in more detail in the Method section.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

Are the dimensions (X,Y,Z) balanced? What if a certain dimension, e.g., Z, requires longer frames? How might the method be adapted to accommodate this change?
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

See Strengths & Weaknesses.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

This paper presents a self-supervised volume super-resolution framework (D2R) aimed at improving axial resolution. It also introduces a 3D neural network (AENet) that enhances the performance of the framework. The proposed method is thoroughly evaluated on both synthetic and real datasets, with solid comparisons against state-of-the-art (SOTA) approaches, demonstrating its effectiveness.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper is well written, clearly organized, and easy to follow.
2. Both the proposed framework and the super-resolution network are novel and well motivated. The method is carefully validated on synthetic and real datasets, which supports the claims effectively.
3. The D2R framework uses pre-trained 2D diffusion models to generate pseudo high-resolution data, which is especially useful in medical imaging scenarios where high-quality data is limited. Since this framework can be applied to other volume super-resolution methods, it has the potential to inspire future research.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. In Section 2.3, it says: “Since the XY plane lacks explicit constraints, in Stage III, we solely train VSR networks on lateral planes (XZ/YZ) of pseudo-HR volumes.” Could the authors clarify what is meant by “lacks explicit constraints” in this context?
2. The illustration of Stage III in Figure 1 is a bit unclear. It seems that the lateral-direction (XZ/YZ) slices are used for training, but the input and output of AENet are shown as XY slices.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(6) Strong Accept — must be accepted due to excellence
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper presents a novel and well-validated self-supervised framework with strong practical relevance, especially in medical imaging. Its solid technical contributions, clear presentation, and potential impact justify a strong accept.
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

To all: We sincerely thank all reviewers for their time and insightful feedback.

To R1:

Q1: Limited Novelty and Technical Contribution
A1: Our D2R framework introduces a novel, decoupled design for VSR, distinct from methods like Lee et al. [8] that employ complex, end-to-end diffusion models. In D2R, the 2D diffusion model generates an intermediate pseudo-HR volume. This volume then guides the training of the 3D VSR model (AENet) in Stage III. During the distillation stage, AENet leverages its 3D convolutions to enforce robust structural continuity across slices, which is a key distinction from methods that process XZ/YZ slices independently. This focus on 3D coherence directly yields the significantly improved image similarity and structural detail.

Q2: Unclear Benefit of 3D CNN with Pseudo HR Supervision
A2: The pseudo-HR volume serves as intermediate supervision for training the 3D VSR model. The key benefit in Stage III lies in the ability of AENet to distill robust structural transformations across slices, rather than directly replicating the imperfect pseudo-HR input. This process enables consistent 3D structure learning. While we incorporate FFL [6] for details, the main performance gains are attributed to the D2R framework itself. For example, IsoVEM [4] also adopts SSIM loss but still underperforms compared to D2R-SRUNet and D2R-AENet.

Q3: Lack of Detail in the Diffusion-based Reconstruction Step
A3: We adopt an SDE-based diffusion framework, with the diffusion process and training objective outlined in Sec. 2.1. Due to space limitations, please refer to the code implementation for more details.

Q4: Inconsistent and Unconvincing Visual Results
A4: Although the outputs of D2R-AENet may appear visually smoother, the model emphasizes structural fidelity, which is critical for accurate biological interpretation. Conversely, while Lee et al. [8] may yield sharper textures, its membrane segmentation accuracy is lower, suggesting such sharpness does not equate to superior biological structure recovery.

Q5: Discrepancy with Previously Reported Results
A5: The PSNR discrepancy of Lee et al. [8] on FIB-25 arises from different LR volume generation methods. Like vEMDiffuse [11], we use slice-sampling to better reflect real-world VSR conditions. In contrast, Lee et al. [8] used average downsampling, which reduces zero-mean noise and blends HR information into LR inputs, making the VSR task easier and potentially affecting PSNR values.

To R2:

Q1: About the Diffusion Priors
A1: In Stage I, a 2D diffusion model is trained to restore HR slices from LR inputs and is then applied slice-by-slice on the XZ/YZ planes of LR volumes to generate a pseudo-HR volume V^H. As V^H is generated independently along XZ/YZ and lacks XY consistency, in Stage III, AENet is trained only on these XZ/YZ slices. The SDE design and training details are in Sec 2.1. We will clarify this pipeline in the revision.

Q2: Volume Resolutions
A2: As noted in Sec 3.1: FIB-25 HR is 10 nm isotropic, LR axial resolution is 80 nm; EPFL HR is 5 nm isotropic, LR axial resolution is 40 nm.

Q3: Illustration of Inference
A3: As described in Sec. 2.3 and Fig. 1(d), AENet takes a slice sequence and a relative depth as input, and predicts an output slice at that depth. We will add a clarifying sentence to Sec. 2.3 to further summarize the inference process.

To R3:

Q1: Clarification of “Lacks Explicit Constraints”
A1: The phrase refers to the structural artifacts introduced in XY planes of the pseudo-HR volume, just as illustrated in Fig. 2 (see Lee et al. [8]). These artifacts arise because the constituent XZ/YZ planes of the pseudo-HR volume are independently generated by the 2D diffusion model without explicit constraints on structural continuity in the XY plane across them.

Q2: Figure 1 Unclear
A2: As stated in Sec 2.3, Stage III uses only XZ/YZ slices of the pseudo-HR volume. Fig. 1(c) shows these sequences, with their orientation adjusted for clarity. We will improve it to eliminate confusion.
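
The distinction between the two LR-generation schemes in the reply to R1's Q5 can be illustrated numerically. The sketch below is a toy under stated assumptions, not the authors' data pipeline: the function names, the constant synthetic volume, and the noise level are all hypothetical. It degrades an isotropic volume along Z by a factor of 8 (matching the 10 nm to 80 nm setting quoted for FIB-25) in both ways and compares the residual noise.

```python
import numpy as np

def slice_sample(hr: np.ndarray, factor: int) -> np.ndarray:
    """Keep every `factor`-th slice along Z (axis 0)."""
    return hr[::factor]

def average_downsample(hr: np.ndarray, factor: int) -> np.ndarray:
    """Average each block of `factor` consecutive Z slices."""
    z = hr.shape[0] - hr.shape[0] % factor    # drop an incomplete trailing block
    blocks = hr[:z].reshape(z // factor, factor, *hr.shape[1:])
    return blocks.mean(axis=1)

rng = np.random.default_rng(0)
clean = np.full((64, 32, 32), 0.5)            # hypothetical noise-free volume
noisy = clean + rng.normal(0.0, 0.1, size=clean.shape)

lr_sampled = slice_sample(noisy, 8)
lr_averaged = average_downsample(noisy, 8)

# Residual noise (standard deviation around the known clean value).
noise_sampled = (lr_sampled - 0.5).std()
noise_averaged = (lr_averaged - 0.5).std()
print(f"slice-sampling: {noise_sampled:.3f}, averaging: {noise_averaged:.3f}")
```

Averaging 8 slices of independent zero-mean noise reduces its standard deviation by a factor of about sqrt(8), whereas slice-sampling leaves it unchanged. This is consistent with the authors' argument that averaged LR inputs are cleaner and carry information from all HR slices, so PSNR values obtained under the two protocols are not directly comparable.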

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A

Self-supervised Axial Super-Resolution for Volume Microscopy via Diffusion-Guided Structure Distillation

Author(s):