Abstract

Unsupervised anomaly detection (UAD) in brain imaging is crucial for identifying pathologies without the need for labeled data. However, accurately localizing anomalies remains challenging due to the intricate structure of brain anatomy and the scarcity of abnormal examples. In this work, we introduce REFLECT, a novel framework that leverages rectified flows to establish a direct, linear trajectory for correcting abnormal MR images toward a normal distribution. By learning a straight, one-step correction transport map, our method efficiently corrects brain anomalies and can precisely localize them by detecting discrepancies between the anomalous input and its corrected counterpart. In contrast to diffusion-based UAD models, which require iterative stochastic sampling, rectified flows provide a direct transport map, enabling single-step inference. Extensive experiments on popular UAD brain segmentation benchmarks demonstrate that REFLECT significantly outperforms state-of-the-art unsupervised anomaly detection methods. The code is available at https://github.com/farzad-bz/REFLECT
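
As a rough illustration of the single-step idea in the abstract, the sketch below applies one Euler step of a velocity field to an input and scores anomalies by the absolute difference between the input and its corrected version. The `toy_velocity` function is a hypothetical stand-in for a trained network, and the flat list of floats is a simplification; neither reflects the authors' implementation.

```python
from typing import Callable, List

Vector = List[float]


def one_step_correct(x: Vector, velocity: Callable[[Vector, float], Vector]) -> Vector:
    """Single Euler step over the whole interval t in [0, 1]: x1 = x0 + v(x0, 0)."""
    v = velocity(x, 0.0)
    return [xi + vi for xi, vi in zip(x, v)]


def anomaly_map(x: Vector, corrected: Vector) -> Vector:
    """Per-element discrepancy between the input and its corrected counterpart."""
    return [abs(a - b) for a, b in zip(x, corrected)]


def toy_velocity(x: Vector, t: float, normal_value: float = 0.0) -> Vector:
    """Hypothetical stand-in for a trained network (assumption, not the real model):
    points straight at `normal_value` for strongly deviating elements, zero elsewhere."""
    return [(normal_value - xi) if abs(xi - normal_value) > 1.0 else 0.0 for xi in x]


x = [0.1, -0.2, 3.0, 0.05]  # the third element plays the role of an anomaly
x_hat = one_step_correct(x, toy_velocity)
scores = anomaly_map(x, x_hat)
```

Only the deviating element is moved, so it is the only position with a non-zero score. This is the behaviour a straight, one-step transport map is meant to provide.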

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/5136_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/farzad-bz/REFLECT

Link to the Dataset(s)

BraTS 2021 dataset: http://www.braintumorsegmentation.org

ATLAS v2 dataset: https://atlas.grand-challenge.org

BibTex

@InProceedings{BeiFar_Reflect_MICCAI2025,
        author = { Beizaee, Farzad and Hajimiri, Sina and Ben Ayed, Ismail and Lodygensky, Gregory and Desrosiers, Christian and Dolz, Jose},
        title = { { Reflect: Rectified Flows for Efficient Brain Anomaly Correction Transport. } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15963},
        month = {September},

}


Reviews

Review #1

  • Please describe the contribution of the paper

    The author is the first to propose using Rectified Flows for brain anomaly detection.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors made the first attempt to apply rectified flow models for anomaly detection in medical images.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Some of the references cited by the authors appear to be fake, such as reference [16]. Were these references generated by an AI tool (e.g., ChatGPT)?

    2. In the field of anomaly detection, numerous state-of-the-art works have been validated in both industrial inspection and brain imaging. However, the comparative methods adopted in this study are somewhat limited, making it difficult to fully assess the contributions made by the authors.

    3. The authors only present the reconstruction results and anomaly scores of their own model, without demonstrating the reconstruction performance of the competing methods or providing a thorough analysis of the reasons behind the advantages of the proposed approach.

    4. The authors lack critical ablation studies to validate the effectiveness of the proposed method; for instance, they did not demonstrate the specific validation results of the sample pair generation approach and the rectified flow.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. Lack of comparisons with related anomaly detection works.
    2. The ablation experiments do not capture the core contributions of the authors.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    Although the author applied the method to the task of brain anomaly detection, the paper did not reflect the special design for the characteristics of brain medical images. There are already a large number of general methods in the field of medical image anomaly detection that are widely used in brain anomaly detection scenarios, but the author only chose to limit the method to brain detection for comparison. The limitations of this comparison method lead to the paper failing to fully highlight the contribution of the proposed method, especially the advantages compared with existing general methods in the task of brain image detection.

    From the description of the method, the current framework is closer to the direct migration application of the Rectified Flow in the task of anomaly detection, lacking structural improvements or theoretical innovations for this task. In the ablation experiment, the author did not clearly disassemble the specific contribution of the proposed module to the performance improvement, and thus failed to clearly explain the original technical contribution of the method.



Review #2

  • Please describe the contribution of the paper

    The authors propose using rectified flows for unsupervised reconstruction-based anomaly detection. It is similar to diffusion models, except rather than learning the mapping from noise to normal images it learns a direct linear trajectory from abnormal images to their normal counterparts (learned within the latent space of a VAE). To train this they introduce synthetic anomalies into the latent representation of healthy images, using masks generated by random walks to select a region to blend in an external texture. The anomaly score is the average of the reconstruction error in the latent space and the reconstruction error in image space.
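
The training-pair construction and scoring summarized above can be sketched roughly as follows. This is a minimal single-channel illustration: the blending weight `alpha`, the grid sizes, and the assumption that both error maps share one resolution are hypothetical choices, not details taken from the paper.

```python
import random
from typing import List

Grid = List[List[float]]
Mask = List[List[int]]


def random_walk_mask(height: int, width: int, steps: int, seed: int = 0) -> Mask:
    """Binary mask traced by a 4-connected random walk that stays inside the grid."""
    rng = random.Random(seed)
    mask = [[0] * width for _ in range(height)]
    r, c = rng.randrange(height), rng.randrange(width)
    mask[r][c] = 1
    for _ in range(steps):
        dr, dc = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
        r = min(max(r + dr, 0), height - 1)  # clamp to grid bounds
        c = min(max(c + dc, 0), width - 1)
        mask[r][c] = 1
    return mask


def blend(latent: Grid, texture: Grid, mask: Mask, alpha: float) -> Grid:
    """Inside the mask, mix the texture into the latent with weight `alpha`."""
    return [
        [(1 - alpha * m) * z + alpha * m * s for z, s, m in zip(zr, sr, mr)]
        for zr, sr, mr in zip(latent, texture, mask)
    ]


def combined_score(latent_err: Grid, image_err: Grid) -> Grid:
    """Average of two error maps (assumed here to have the same resolution)."""
    return [[0.5 * (a + b) for a, b in zip(ar, br)] for ar, br in zip(latent_err, image_err)]


healthy = [[0.0] * 8 for _ in range(8)]   # stand-in for a healthy latent
texture = [[1.0] * 8 for _ in range(8)]   # stand-in for an external texture latent
mask = random_walk_mask(8, 8, steps=20, seed=0)
corrupted = blend(healthy, texture, mask, alpha=0.7)
```

The pair `(corrupted, healthy)` then plays the role of one abnormal/normal training example. In practice the latent error map would need resampling before it could be averaged with an image-space map.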

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The method achieves strong state-of-the-art results across multiple modalities (FLAIR, T1CE, T1, T2) and pathologies (glioma and stroke lesions).
    • The use of rectified flows is both intuitively explained at a high level and thoroughly detailed mathematically, making it very accessible for a reader who is not familiar with them.
    • The creation of synthetic anomalies in a VAE’s latent space is interesting, as the majority of recent synthetic-anomaly-based work has been operating in image space.
    • Adaptation of diffusion-model-like methods for anomaly detection is an important research direction, as it shows that applying methods which excel at generating data directly to application areas such as anomaly detection is not as simple as it may seem.
    • The authors provide an anonymized code repository, which is very helpful and clearly shows the authors' commitment to open, reproducible research.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • W1 The replacement vector description at the end of section 3.1 is a bit confusing. The final equation (equation 7) says the vector is randomly sampled independently of location, but just before it the use of textured images is mentioned, and afterwards the paragraph mentions spatial dependencies. Section 4.1 confirms that you do use the Describable Textures Dataset, presumably as part of the creation of the replacement vector. Could the authors please clarify this stage?

    • W2 Could the reflow process be explained a little more? From my understanding it is learning a mapping to normal images from those which have been corrupted and then corrected by the 1-REFLECT model. If one more sentence could be added to solidify this it would aid readability.

    • W3 This work seems to have some similarities with [23] (using the work's citation number, included in full below this weakness). Both take diffusion-like approaches (rectified flows for this work, cold-diffusion for [23]) directly mapping from abnormal images to normal images, both use the timestep to control anomaly severity in their respective synthetic anomalies, and both use a random walk to create the anomaly masks (although this is obscured in [23] as they inherit it from previous work). Currently [23] is only cited within a list of 8 methods as examples of diffusion models being used for unsupervised anomaly detection, but given the practical similarities of the two works it would be more appropriate to have a sentence or two so that readers can see the growing prevalence of these sorts of methods.

    Reference 23: Marimont, S.N., Siomos, V., Baugh, M., Tzelepis, C., Kainz, B., Tarroni, G.: Ensembled cold-diffusion restorations for unsupervised anomaly detection. In: MICCAI. pp. 243–253 (2024)

    • W3b Similarly, as the practical aspects of this method are structured similarly to self-supervised synthetic anomaly detection works it would be good to reference some (ideally they should be fully compared against but this is outside the scope of a rebuttal).

    • W4 The choice of using 2D slices from 3D images which contain pathologies is likely making the models robust to non-pathological abnormalities in the chosen datasets, leading to results that are likely higher than they would be in clinical practice. For example, in BraTS only the tumors are annotated as anomalous, but the surrounding tissue is squashed and deformed by the presence of the nearby tumor. The models will see these deformities during training, and so do a good job at not labelling them as anomalous at test time. However, if the same model were applied to a different pathology with different adjacent non-pathological abnormalities, it may struggle. In this way the results may be overestimating the performance that these models would have in a clinical setting. It would be better to train on healthy slices from fully healthy volumes, such as those in the HCP, IXI or CamCAN datasets, as these would better resemble the situation of these models being used in practice.

    • W5 The use of only unhealthy slices for evaluation is not ideal, as a common problem for anomaly detection methods is false positives. By testing on all unhealthy slices there is some uncertainty about whether the model is biased to always change some aspect of the image. Similarly, by selecting the single slice where the pathology is most prominent from each subject it is being shown the ‘easy’ cases. I understand that the authors were likely following the evaluation procedure of previous work, but it would be good to move towards better evaluation setups in the future.

    • W6 Some small typos:
    • The second paragraph of the introduction says “out-of-distribution detention” instead of “out-of-distribution detection”
    • The github’s readme text says “Train MAD-AD” instead of “Train REFLECT” under the Train subheading
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    It would be cool to visualise anomalies created in the VAE’s latent space in image space (via the VAE decoder), but I understand that space limitations make this difficult (an unfortunate casualty of the lack of supplementary material this year). Maybe put some in the github’s readme?

    Also, to emphasise, thank you for providing an anonymized code repository. It can be annoying to create but your effort has not gone unnoticed!

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The majority of the work is strong and clear, but some areas need to be clarified a bit. The experiment setup weaknesses are frustrating but not uncommon in the anomaly detection field, so do not warrant rejection; however, I truly encourage the authors to consider these when approaching future anomaly detection work. On balance I am between a weak accept (4) and a standard acceptance (5) but want to see the authors' responses to the weaknesses listed (which are ordered from most to least important).

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I am satisfied that the authors have addressed my concerns, in particular the computation of the replacement vector was unclear in the main paper but the two variants are clearly described in the rebuttal. Having made that change I would encourage the authors to also review their evaluation to make sure it is clear which variant is presented at each point.

    Re similarities with DISYRE: I agree that there are distinct differences between the presented method and DISYRE; I just want these to be clear to readers who are not as familiar with the sub-field.

    Re experiment setup: I would say that it is also common practice to use slices from fully healthy images; DISYRE itself follows this, to name one example. If I were to guess, it's probably roughly 50/50 between anomaly detection methods at MICCAI, so I highly doubt using all-healthy volumes in the train set would raise any concerns. Similarly, evaluating on both unhealthy and healthy slices would be very normal, as that better reflects the clinical setting, and thus would not hinder your submission.

    To conclude, the method is strong, novel and clear. If the authors follow through on the changes they propose it will be a welcome addition to MICCAI 2025.



Review #3

  • Please describe the contribution of the paper

    The main contribution of this paper is the application of Rectified Flows (suggested for Diffusion models [1] ) to the Unsupervised Anomaly Detection (UAD) task with MRI, by generating the healthy counterparts of the unhealthy brain images for detection of anomalous regions. Compared to other diffusion methods, Rectified Flows learns a linear trajectory mapping between sample and target distributions, which minimizes the time steps needed in the diffusion process to even a single step and generates high-quality healthy images. Additionally, the paper introduces a masking strategy based on random walks and textures to increase robustness.

    [1] Liu, X., Gong, C., Liu, Q.: Flow straight and fast: Learning to generate and transfer data with rectified flow. In: The Eleventh International Conference on Learning Representations (ICLR) (2023)
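
For readers unfamiliar with [1], the linear trajectory mentioned above can be sketched as below: points on the path are straight-line interpolations between a source sample and a target sample, and the network is regressed onto the constant velocity between them. Treating `x0` as the abnormal sample and `x1` as its normal counterpart follows the description in this review. The function names and the plain-list representation are assumptions for illustration only.

```python
from typing import List

Vector = List[float]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight path: x_t = (1 - t) * x0 + t * x1."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Constant velocity of the straight path: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def velocity_loss(predicted: Vector, x0: Vector, x1: Vector) -> float:
    """Mean squared error between a predicted velocity and the straight-line target."""
    target = target_velocity(x0, x1)
    return sum((p - v) ** 2 for p, v in zip(predicted, target)) / len(target)


x0 = [0.0, 2.0]   # stand-in for an abnormal sample
x1 = [1.0, 4.0]   # stand-in for its normal counterpart
x_half = interpolate(x0, x1, 0.5)
v = target_velocity(x0, x1)
```

Because the target velocity does not depend on `t`, a model that fits it well can in principle jump from `x0` to `x1` in one step, which is what makes single-step inference plausible.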

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well structured and easy to read. The paper addresses a clear gap in the scientific research, and the application of Rectified Flows to the UAD task is very interesting and can potentially have a wide range of applications in the medical imaging field. The approach presented achieves substantial gains over the state-of-the-art. Providing anomaly maps increases the explainability and trustworthiness of the chosen approach. The code will be provided, which is an advantage. A second method (2-REFLECT) is provided in the paper, which is obtained by applying the reflow process again, to clarify the question of “How many flow processes are needed?”. The findings are similar to those in [2]: adding another flow can slightly decrease the performance metric, but using a U-shaped timestep distribution and an LPIPS-Huber loss might address this issue.

    [2] Lee, S., Lin, Z., & Fanti, G. (2024). Improving the training of rectified flows. Advances in Neural Information Processing Systems, 37, 63082-63109.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Qualitative comparison between the baseline methods and the proposed approach is lacking, thus making it harder to judge the real qualitative performance gains. For the ablation studies, the use of the textured masking and random walk strategy is not investigated further, leaving it unclear whether the improvements come solely from rectified flows or from the masking approach. Limitations and future work are missing to show the broader impact of the proposed method. A time and parameter comparison of each method would help the reader understand the usability as well.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The approach is novel and interesting with possible wider range of applications in medical imaging. The paper is well written and structured making it easy to read. The experiments show important performance gains over the state-of-the-art. Anomaly maps provide useful insight to qualitative performance. Though it would be more helpful to see the qualitative comparison with the baselines as well and the time/parameter comparisons.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The proposed approach is novel and holds potential for broader applicability across medical imaging tasks. The paper is well-written and clearly structured, making it easy to follow. Experimental results demonstrate meaningful performance improvements over state-of-the-art methods, and the inclusion of anomaly maps offers valuable qualitative insights.




Author Feedback

We sincerely thank the reviewers for their insightful comments and are thrilled with the overall highly positive feedback. We are pleased that they recognize our method as novel (R3) and interesting (R2, R3), being an important research direction (R2), as it addresses a clear gap (R3). Reviewers stress the substantial empirical improvements compared to existing SoTA (R2, R3), acknowledging its potential for a wide range of applications (R3). Last, reviewers also point to the clear structure of the paper (R3), where methodology is intuitively explained and accessible (R2).

R1-Wrong referencing: Thank you for pointing this out. We take full responsibility for the oversight, as some citations were unfortunately carried over from related works, e.g., [16] was inadvertently copied from [6] (correct citation: “Kascenas et al., Denoising autoencoders for unsupervised anomaly detection in brain MRI. MIDL’22”). We sincerely apologize and will carefully review and correct all references.

R1-Compared methods limited: We respectfully disagree. We compared against two very recent SOTA models specifically proposed for medical imaging: IPMI’25 [6] and an oral presentation at MICCAI’24 [19].

R2-Replacement vector clarification: Thank you for pointing this out, and we apologize for the confusion. To clarify, each replacement vector is either (1) a cropped segment from the latent representation of a textured image (to impose large structured anomalies), or (2) a weighted combination of a pixel-wise random vector and an image-level vector, constructed so that it follows a standard normal distribution. We will revise the text to clearly describe this process.

R2-Reflow process: Your understanding is correct. The reflow process is devised to further straighten the trajectories by learning a mapping from abnormal inputs to clean, anomaly-free images obtained using the 1-REFLECT model. This process avoids the crossing trajectories induced by the input paired samples and refines the trajectories toward the target distribution (here, normal anomaly-free brains). We will add a clarifying sentence in the camera-ready version to improve the readability of this section.

R2-Similarities with DISYRE: Thanks! We agree that there are conceptual similarities with DISYRE and we will revise the related work section to better contextualize our method within this emerging class of approaches. However, we would like to stress that our method is distinct in several key ways. Unlike DISYRE, which relies on carefully designed synthetic anomalies, our approach modifies the latent space using noise or textured image embeddings without any constraint on generating synthetic anomalies. This makes our method more flexible and potentially more robust to rare or unseen abnormalities, an essential goal in UAD. Additionally, rectified flows enable efficient single-step inference, in contrast to the multi-step nature of diffusion models in DISYRE.

R2-Self-supervised synthetic anomaly detection: We will try to include and discuss relevant works for this topic.

R2-Experiment setup (W4&W5): We thank R2 for the very constructive and appropriate comments and appreciate the reviewer’s understanding. We completely agree that revisiting experimental designs that better reflect real-world clinical scenarios is paramount. However, we also note that deviating from commonly accepted practices often raises concerns during peer review. Having said this, we remain committed to improving our methodology and aligning future work with stronger evaluation protocols, for example by considering more clinically realistic setups.

R3-#Params: We chose MAD-AD as it is the best baseline. Parameter counts: MAD-AD (~260M), Ours-Default (~140M), and Ours XS model (~16M).

R1&R3-Additional qualitative results, ablations or extended analysis and limitations (R3): Due to length constraints (as noted by R2) we had to make selective choices, focusing on the most critical components. We plan to expand on additional details in future work.
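
One way to read the second replacement-vector variant described in this rebuttal is sketched below. The rebuttal does not give the exact weighting, so the square-root mixing used here is an assumption: it is a standard way to combine two independent standard normal draws so that each element of the result is again standard normal. The parameter names `weight`, `num_positions`, and `dim` are hypothetical.

```python
import math
import random
from typing import List


def replacement_vectors(num_positions: int, dim: int, weight: float, seed: int = 0) -> List[List[float]]:
    """Mix a per-position random vector with one image-level vector shared by all positions.

    With independent N(0, 1) components, sqrt(w) * pixel + sqrt(1 - w) * image has
    variance w + (1 - w) = 1, so every element stays marginally standard normal.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    rng = random.Random(seed)
    image_level = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # shared across positions
    a, b = math.sqrt(weight), math.sqrt(1.0 - weight)
    return [
        [a * rng.gauss(0.0, 1.0) + b * g for g in image_level]
        for _ in range(num_positions)
    ]


vectors = replacement_vectors(num_positions=4, dim=3, weight=0.5, seed=0)
```

Under this reading, `weight` close to 0 makes all positions share nearly the same vector (strong spatial dependence), while `weight` close to 1 makes them nearly independent.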




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper received 2 acceptance and 1 rejection recommendations. R1’s main concern lies in the lack of theoretical innovation and of additional experiments. Given the submission page limit of MICCAI and its application-oriented nature, I agree with R2, and I believe the comparisons in the submission are already sufficient for a MICCAI paper. Therefore, my recommendation is to accept.


