Abstract
Noise in medical imaging is an inevitable challenge, often stemming from acquisition artifacts, varying imaging protocols, and external interference. While some studies suggest that noise can enhance model robustness, excessive or unstructured noise degrades training quality and classification performance.
This issue is further exacerbated in federated learning settings, where individual clients have limited local data, making it difficult to train robust models independently. Federated imputation has been explored as a solution, yet existing methods do not fully leverage federated learning settings for optimal noise reconstruction. In this work, we introduce a novel encoder-decoder based federated imputation method, designed to replace noisy images with more representative reconstructions before training. Experimental results demonstrate that classification models trained with images imputed by the proposed method consistently outperform those trained with raw noisy images or without noisy images, highlighting the importance of effective noise handling in federated learning-based medical imaging.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1791_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
PathMNIST dataset: https://github.com/MedMNIST/MedMNIST
PathMNIST is a benchmark dataset derived from histopathology images, created for evaluating deep learning models.
BibTex
@InProceedings{ChaYun_Decentralized_MICCAI2025,
author = { Chang, Yunyoung and Noh, Yeonwoo and Lee, Sang-Woong and Lee, Minwoo and Noh, Wonjong},
title = { { Decentralized Noise Handling in Medical Imaging: Encoder-Decoder Based Federated Imputation for Robust Training } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15973},
month = {September},
pages = {99 -- 107}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes an end-to-end federated denoising network, mainly achieved through the prediction of noise masks, together with a Swin-Transformer-based module for more accurate reconstruction. The effectiveness of the proposed denoising network is demonstrated through single-center experiments, federated experiments, and auxiliary experiments on a downstream classification task.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- This paper proposes to achieve end-to-end image denoising by predicting noise masks in federated learning environments.
- A novel Swin-Transformer-based reconstruction network is proposed to achieve accurate image denoising.
- The method performs well, surpassing the WavePaint baseline.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The experimental description is ambiguous. Does LN refer to Layer Norm? If so, how are the LN (proposed) and Swin (proposed) experiments conducted?
- Lack of comparison with state-of-the-art methods. It should be compared with the other denoising methods to demonstrate that the proposed method can better assist downstream task.
- Lack of key explanations of experimental details. What are the weights of the three losses?
- It is unclear whether the mask generated by the Noise Prediction Module has undergone supervised training, especially for the noisy mask images. It would be better to show the predicted masks, which would aid understanding.
- In Table 3, is there any difference between the image denoised through the network and the image with Drop Noise? Why is there such a big difference in accuracy?
- Please rate the clarity and organization of this paper
Poor
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(2) Reject — should be rejected, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper proposes an end-to-end federated denoising method, improving the effectiveness of image reconstruction through a Swin Transformer. However, the comparison with other methods is insufficient, and the evaluation is too limited.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
This paper introduces an end-to-end federated imputation framework that automatically predicts occlusion masks from noisy medical images and performs inpainting via an encoder–decoder integrated into the FedAvg loop, eliminating the need for external masks and improving robustness across non‑IID clients. It further incorporates a SwinTransformer-based WaveMix Module, combining 2D wavelet decomposition with shifted‑window self‑attention to suppress artifacts and preserve fine details. Experiments on PathMNIST demonstrate significant gains in reconstruction metrics and downstream classification performance compared to baselines.
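For readers unfamiliar with the wavelet side of the WaveMix design summarized above, a single-level 2D Haar decomposition (the simplest discrete wavelet transform; the paper's exact transform is not specified here, so this is purely an illustrative sketch) splits an image into one low-frequency band (LL) and three high-frequency detail bands (LH, HL, HH):

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar wavelet decomposition of an even-sized
    image into four quarter-resolution subbands: LL holds the
    low-frequency content; LH, HL, HH hold directional details."""
    a, b = x[0::2, :], x[1::2, :]          # pair up adjacent rows
    lo, hi = (a + b) / 2, (a - b) / 2      # 1D Haar along rows
    def cols(m):
        c, d = m[:, 0::2], m[:, 1::2]      # pair up adjacent columns
        return (c + d) / 2, (c - d) / 2    # 1D Haar along columns
    ll, lh = cols(lo)
    hl, hh = cols(hi)
    return ll, lh, hl, hh
```

In a WaveMix-style block, attention or mixing is applied to these subbands at reduced resolution, which is what makes the design lightweight relative to full-resolution self-attention.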
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
This paper considers an essential problem in federated learning and proposes a new method to handle it.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The writing needs to be improved. For example, the learning setting is not clearly highlighted: is the classification model still trained in a federated scenario? It would be better to use a figure to demonstrate the overall pipeline.
- Typos: in the first column of Fig. 2, the noisy image seems quite different from the original image; is the image incorrectly placed?
- While this is a supervised learning problem, have you discussed other related Federated SSL methods? For example, [1-5].
- Fig.1 also needs to be optimized. It focuses on introducing the network details, but fails to highlight the key idea and novel points of the proposed method.
- The numbers of considered benchmarks and compared baselines are very limited. The generalization ability is hard to verify through these experiments.
[1] Li Q, He B, Song D. Model-contrastive federated learning[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 10713-10722.
[2] Zhang F, Kuang K, Chen L, et al. Federated unsupervised representation learning[J]. Frontiers of Information Technology & Electronic Engineering, 2023, 24(8): 1181-1193.
[3] Shuai Z, Wu C, Tang Z, et al. Distributionally Robust Alignment for Medical Federated Vision-Language Pre-training Under Data Heterogeneity[J]. arXiv preprint arXiv:2404.03854, 2024.
[4] Han S, Park S, Wu F, et al. Fedx: Unsupervised federated learning with cross knowledge distillation[C]//European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 691-707.
[5] Rehman Y A U, Gao Y, De Gusmão P P B, et al. L-dawa: Layer-wise divergence aware weight aggregation in federated self-supervised visual representation learning[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2023: 16464-16473.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Federated learning is a very impactful domain worth exploring in medical imaging.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have carefully addressed each of my questions and concerns; therefore, I recommend Accept after the rebuttal.
Review #3
- Please describe the contribution of the paper
In this paper, an encoder-decoder approach to removing noise from medical images is proposed for model training in the context of federated learning. A federated learning environment is simulated to show the feasibility of the intended approach.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper proposes a complete pipeline ranging from noise detection to noise removal for model training in a context of data scarcity. The noise prediction module and the deep feature extractor are a novel way to improve the pipeline proposed in WavePaint. Simulating a federated learning environment is also a good approach to show the feasibility of the approach.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The results presented are not clear and the classification results could be biased if the same data were used for training the proposed denoising model and for training the classification model.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Minor comments: It would be interesting to have the information about inference time. Diffusion models generally achieve good performance, but are very time-consuming. This could be a good argument for your paper.
Is the number of blocks in Figure 1 correct? For example, does the network only have one Swin WaveMix block?
To define the mask, are the values rounded according to the rule: if >=0.5 then 1 otherwise 0?
Is there a particular reason for not including the Deep Feature Extractor for the low-frequency features (LL)?
It is mentioned that the authors 'upscale or resize them to 224×224'. What upsampling or resizing methods were used?
It is not clear how the type of noise added was selected and how it was generated.
The number of epochs and the machine used for the experiments are missing in the methods section. The criteria for selecting the weighting of the individual components of the loss function should be explained.
Are the results included in the tables of the test set? This should be clearer.
Two proposed networks are mentioned in the tables, but only one model was proposed in the entire paper. LN is not defined. Is it the noise mask predictor? If so, does this mean that the networks were evaluated individually, the noise prediction network and the noise removal network? If the results of the noise predictor are not available, they should be added, as the prediction of the region to be inpainted is of great importance.
The first line of the classification section may not be correct.
In the federated environment, does each client use its own portion of the data to train the denoising model? This should be mentioned for the sake of clarity.
Major comment: Was the classification model trained with the same training data that was used to train the denoising model? If so, it means that the denoised data used to train the classification model has an unfair advantage over the noisy data. That is, if the denoising model overfitted the training data, it means that it produces near duplicates of the data without noise. This would be comparable to training the classification model with the data without noise. Therefore, it should be clarified whether the same training samples that were used to train the denoising model were also used to train the classification model. For a fair comparison, the classification model should be trained with cases that were not considered when training the denoising model.
It would be interesting to add an image of the classification model. Most importantly, it should include some images of inpainted cases to visually assess the quality of the reconstruction. Some cases with real noise would also be interesting.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The lack of clarity of the results and how the tests were performed.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
“First, the data used for training the denoising model is clean data which is 50% of original training data. The data used for training the classification, in four methods, are clean data (Drop Noise), clean+corrupted data (With Noise) and clean+restored data (Proposed LN and Swin).” -> I am still not convinced that bias is impossible. However, I believe the paper should be accepted if the remaining reviewers do not raise this concern, given the overall concept of the proposed method. Still, this possibility of bias should be studied more deeply. The downstream task should also be tested using only non-noisy data, i.e., all data without any noise or reconstruction.
Author Feedback
We would like to thank Reviewer-1(R1), Reviewer-2(R2), and Reviewer-3(R3) for their insightful and constructive comments.
R1-W1&R3-Minor9: LN refers to Layer Normalization (LN). In the experiments, LN (proposed) is obtained by converting all Batch Normalization layers of the benchmark model, WavePaint, into LN; Swin (proposed) uses both LN and the Swin-Transformer module.
R1-W2&R2-W5: As R1 and R2 pointed out, there is a lack of comparison with state-of-the-art methods. However, the benchmark model, WavePaint, is generally known to be the best denoising model relative to its compact size [1]. Therefore, in this work, we compared the proposed models only with WavePaint and showed that they yield higher performance with far fewer parameters. As part of our future work, we plan to compare with new state-of-the-art methods.
R1-W3: In our paper, the loss function was a weighted sum of three losses: L = (1-A)·L1 + A·MSE + LPIPS, where A was set to 0.5 in our experiments. For a fair comparison, we used the same loss function and weights as the benchmark.
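The weighted loss above can be sketched as follows. This is only an illustrative reconstruction from the rebuttal's formula: the LPIPS perceptual term is passed in as a precomputed scalar stub, since in practice it comes from a pretrained network (e.g. the `lpips` package), and the function name `combined_loss` is our own.

```python
import numpy as np

def combined_loss(pred, target, lpips_value, alpha=0.5):
    """Weighted reconstruction loss as described in the rebuttal:
    L = (1 - alpha) * L1 + alpha * MSE + LPIPS, with alpha = 0.5.
    `lpips_value` is a stand-in for the perceptual LPIPS term."""
    l1 = np.mean(np.abs(pred - target))   # pixel-wise L1 term
    mse = np.mean((pred - target) ** 2)   # pixel-wise MSE term
    return (1 - alpha) * l1 + alpha * mse + lpips_value
```

With alpha = 0.5 the L1 and MSE terms are balanced equally, matching the setting reported in the rebuttal.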
R1-W4: In the Noise Prediction Module section, we explained that the module is trained in an unsupervised manner.
R1-W5: First, the data used to train the denoising model is clean data, comprising 50% of the original training data. The data used to train the classification model, across the four methods, are clean data (Drop Noise), clean+corrupted data (With Noise), and clean+restored data (Proposed LN and Swin). Second, the accuracy gap results from the difference in total training data volume.
R2-W1: The classification model is also trained in the federated scenario. We will describe the overall pipeline in the final version.
R2-W2: Yes, Fig. 2 is incorrect. We’ll try to replace it with the correct image in the final version.
R2-W3: While our primary goal was to demonstrate how inpainting in a federated setting influences downstream task performance, we will consider the methods for future research.
R2-W4: We will try to highlight the key ideas and novel points of our Swin-Transformer based Deep Feature Extractor of Fig.1 in the final version.
R3-W1&Major1: It is unlikely that there are any biases in our framework, for the following reasons. First, the data partition explained in R1-W5 shows there can be no bias. Second, the denoising model affects the classification model only by improving the quality of the data that the classification model uses for training.
R3-Minor4: The LL captures low-frequency features, and we wanted the model to retain and utilize this feature as it is. Therefore, we didn’t apply the Deep Feature Extractor.
R3-Minor5: We mistakenly wrote that the 224×224 version was obtained by resizing the original 28×28 data; in fact, PathMNIST officially provides both 28×28 and 224×224 versions by default. We will correct this in the final version.
R3-Minor6: The mask application rule is simple. We added a Noisy Mask to the noise types used in the original noise rule, and all noise types were generated in equal proportions.
R3-Minor7: We set the number of global rounds for both federated inpainting and classification to 10, with each client performing 10 epochs of local training per round. The machines used were 4 GeForce RTX 2080 Ti GPUs. The loss function is explained in R1-W3.
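The federated schedule described here (10 global rounds, 10 local epochs per client per round, FedAvg-style equal-weight averaging) can be sketched as below. The `local_train` step is a hypothetical stand-in for the actual local optimization of the denoising or classification model; only the round/epoch structure mirrors the rebuttal.

```python
import numpy as np

def local_train(weights, data, epochs=10, lr=0.1):
    """Hypothetical local update: nudges the client's copy of the
    weights toward the mean of its local data (stands in for SGD)."""
    w = weights.copy()
    for _ in range(epochs):
        w += lr * (data.mean(axis=0) - w)
    return w

def fedavg(global_weights, client_datasets, rounds=10, local_epochs=10):
    """FedAvg schedule matching the rebuttal: 10 global rounds,
    10 local epochs per client per round, then server-side averaging."""
    w = global_weights
    for _ in range(rounds):
        updates = [local_train(w, d, local_epochs) for d in client_datasets]
        w = np.mean(updates, axis=0)  # server averages client models
    return w
```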
For R3-Minor 1, 2, 3, 8, and 10, the comments are correct and we agree with these points.
We believe the addressed points will help to clarify our work and improve the overall quality of the paper. Again, we are grateful for your feedback, and we are looking forward to reflecting your suggestions in the final version.
[Reference] [1] Zhong, Cheng, et al. “Restoring intricate Miao embroidery patterns: a GAN-based U-Net with spatial-channel attention.” The Visual Computer (2025): 1-13.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This work received two positive reviews and one negative review. After checking the comments and the rebuttal, I agree with the positive reviewers to accept this work. The authors are encouraged to further revise the paper based on the reviewer comments and rebuttal.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper received mixed reviews. Some reviewers were concerned that the evaluation is unconvincing. In addition, the noise types may not realistically simulate clinical practice, and the method may not generalize well to real noise. I agree with Reviewer #3 on possible biases.