Abstract

Accurate anomaly detection in brain MRI is critical for early disease diagnosis, yet existing single-sequence reconstruction methods often fail to distinguish pathological anomalies from both normal anatomical variations and multi-sequence contrast discrepancies. We propose MultiTransAD, a novel framework that leverages inter-sequence contrast differences as primary biomarkers for unsupervised anomaly detection. Our approach introduces: (1) a disentangled architecture with anatomical edge constraints to decouple sequence-invariant anatomy from sequence features, (2) cross-sequence translation error analysis for direct anomaly quantification, and (3) dual-level anomaly detection combining pixel-level errors and patch-level feature dissimilarities. Evaluated on BraTS 2021, MultiTransAD achieves state-of-the-art performance with a Dice score of 0.6334 (14.6% improvement over reconstruction baselines) and an AUROC of 0.9722, validating the effectiveness of multi-sequence contrast analysis while establishing an extensible cross-sequence translation paradigm. The code is publicly available at: https://github.com/zhibaishouheilab/MT-AD
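As an illustration for readers, the following is a minimal Python sketch of the dual-level idea described above: a pixel-level translation-error map is fused with an upsampled patch-level feature-dissimilarity map. Array names, shapes, and the normalization are hypothetical assumptions for illustration; this is not the released implementation.

```python
import numpy as np
from scipy.ndimage import zoom


def dual_level_anomaly_map(pixel_error: np.ndarray, patch_dissim: np.ndarray) -> np.ndarray:
    """Fuse a pixel-level error map (H, W) with a coarse patch-level
    feature-dissimilarity map (h, w). Illustrative sketch only."""
    H, W = pixel_error.shape
    h, w = patch_dissim.shape
    # Upsample the coarse patch map to the full image resolution.
    patch_up = zoom(patch_dissim, (H / h, W / w), order=1)

    def norm(m: np.ndarray) -> np.ndarray:
        return (m - m.min()) / (m.max() - m.min() + 1e-8)

    # Multiplicative fusion: low patch dissimilarity suppresses spurious
    # pixel errors, high dissimilarity amplifies genuine ones.
    return norm(pixel_error) * norm(patch_up)


# Usage with random placeholders (240x240 slice, 15x15 patch grid).
pixel_error = np.abs(np.random.rand(240, 240) - np.random.rand(240, 240))
patch_dissim = np.random.rand(15, 15)
anomaly_map = dual_level_anomaly_map(pixel_error, patch_dissim)
```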

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2272_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/zhibaishouheilab/MT-AD

Link to the Dataset(s)

https://www.med.upenn.edu/cbica/brats2021/

BibTex

@InProceedings{ZhaQi_MultiTransAD_MICCAI2025,
        author = { Zhang, Qi and Hu, Yibo and Sun, Jianqi},
        title = { { MultiTransAD: Cross-Sequence Translation-Driven Anomaly Detection in Multi-Sequence Brain MRI } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15961},
        month = {September},
        pages = {387 -- 396}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose MultiTransAD, an unsupervised anomaly detection method that leverages contrast-anatomy-disentangled MRI sequence translation of healthy brains to detect anomalies at test time through pixel-wise errors and latent feature dissimilarity. The authors claim state-of-the-art unsupervised anomaly detection for brain tumors from multi-sequence MRI.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The study identifies and addresses a core limitation in multi-sequence MRI anomaly detection, specifically the issue of reconstructing sequences individually while disregarding cross-sequence information.
    2. Encoding anatomical and structural information through gradient maps is an interesting idea.
    3. The ablation study nicely evaluates the individual components of the proposed approach.
    4. The code is publicly available.
    5. The authors compare to a large variety of baselines.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Major

    1. The comparison with Liang et al., 2023, is not sound because it does not compare the translation between the same MRI sequences. Hence, an evaluation of whether the disentanglement module actually improves UAD compared to a pure translation-based approach is unclear.
    2. During their evaluation, the authors always only translate to the FLAIR sequence. While this might be the best sequence to translate to for UAD, the authors do not justify why this might be the case or why they do not translate to other sequences also available in the dataset.

    Minor

    • Visualization and terminology issues: Figure 1 is overly complex with unexplained symbols, inconsistent terminology (mixing “sequence” and “modality”), identical MRI sequences despite the method requiring different ones, and the “Cross-model alignment module”, which is not described in the text, should likely be termed “Cross-modal (…).” These visualization problems make it difficult to understand the core methodology.
    • Mathematical presentation: Multiple equations contain undefined terms ($\phi_{patch}$ in Equation 1, $\phi_{patch}^{edge}$ in Equation 5), and lack clear justification (particularly for the dual-threshold clipping in Equation 4).
    • Methodological justification gaps: To me, the advantages of the proposed method over previous work (Liang et al., 2023; Huang et al., 2024) are unclear. Also, the post-processing choices lack clear justification.
    • Dataset and evaluation concerns: It is unclear to me what is meant by using “normal” images from BraTS since this dataset only contains subjects with brain tumors. Do the authors here refer to slices without tumor in them?
    • Discussion: The discussion section inadequately addresses shortcomings of previous methods.
    • Structural and formatting issues: Implementation details, e.g., number of ViT layers, appear in methodology sections rather than in the dedicated implementation section 3.1. Table 1 exceeds page margins, and table captions are incorrectly positioned below tables rather than above.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the overall impression of the paper is good (ablation study, introduction), the comparison to the main competitor (Liang et al., 2023) seems inadequate and makes me wonder whether the authors actually improve over this method.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors addressed my main concern about the fairness of the comparison with Liang et al. In the final version of the paper, I suggest that the authors mention that they selected the best sequences for both methods. Otherwise, the comparison seems dubious.



Review #2

  • Please describe the contribution of the paper

    The manuscript proposes a new self-supervised framework for multi-sequence brain MRI anomaly detection. Specifically, the method uses a Contrastive Alignment Loss to learn invariants between modalities, designs an Adaptive Fusion Module to merge style and edge features, and calculates dissimilarity between styles, achieving outstanding UAD performance on the BraTS 2021 dataset.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper presents a clear motivation, innovatively integrating features between edges and content across different modalities, while combining both pixel-level and patch-level differences to enhance multimodal anomaly detection performance.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1. Designing constraints on disentangled latent representations to learn invariant anatomy across modalities to improve medical anomaly detection has been explored in previous work [1], which partially reduces the novelty of this work. [1] Zhang, Yinghao, et al. “A model-agnostic framework for universal anomaly detection of multi-organ and multi-modal images.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2023.
    2. The feature dissimilarity map contributes the primary performance improvement, but the roles of the cross-modality alignment and AECM require further discussion. Adding ablation results that remove only AECM would better demonstrate the separate contributions of the cross-modality alignment and the edge constraint. Additionally, I believe the strategy of measuring similarity between source and target features could also apply to other translation-based methods. A further comparison between the results of the proposed method and this strategy applied to cyclic methods would better clarify the contributions of the alignment and AECM.
    3. The paper uses four MRI modalities from BraTS 2021 for training, but only reports the results of T1-to-FLAIR and T1ce-to-FLAIR translation. Additionally, the Cyclic UNet only shows T2/FLAIR-related results. How does the proposed method perform in T2/FLAIR-related detection?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Providing the rationale for utilizing only healthy T2 scans from the IXI dataset, along with a more in-depth analysis, including a thorough comparison and the necessary ablations between the proposed method and other translation-based approaches, would enhance the potential impact of this work.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes a novel unsupervised anomaly detection framework with notable innovations in cross-modal alignment, multimodal style and edge feature fusion, and dual-level anomaly detection.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors resolved the concerns raised by me in the rebuttal.



Review #3

  • Please describe the contribution of the paper

    An image translation-based anomaly detection method. Instead of compressing/corrupting and reconstructing a single scan, the authors propose conditioning on an alternative sequence (T1 vs. FLAIR, etc.) and performing a style transfer between the two domains to determine the residual. They also generate a coarse feature-similarity map that, when multiplied with the pixel-wise reconstruction error, suppresses low pixel errors and elevates high ones.
    The authors evaluate against 5 reconstruction-based methods and a translation-based method. Additionally, they compare the works qualitatively in Fig 2 and ablate components in Table 2.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Extensive evaluation and ablation studies with improvements on the BraTS 2021 test set (Table 1). They even compare against another cyclic architecture and show strong improvements over it, proving that the cyclic design is not the sole source of the performance gain (Table 1). The concept of utilising cross-sequence reconstructions for anomaly prediction is very new and growing. Highly efficient in comparison to previous diffusion-based architectures (diffusion: $\mathcal{O}(T)$ denoising steps vs. a single run of two encoders and a generator in this work). Massive improvement in the segmentation of small, fine-grained anomalies (Fig 2), which I know is a difficult issue with reconstruction methods.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Only a single anomalous test set is used; however, the availability of multi-sequence data is rare. The architecture diagram (Fig 1) feels quite overwhelming. Perhaps it is worth splitting the figure into subfigures (for example, b and c are not part of the same image) and additionally moving the cross-model alignment module to where it is explained in the paper.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    A couple of formatting issues could do with fixing: line overflow on pages 1 & 8, and Table 1 overflowing the margins. Maybe rotate the text in the first column 90 degrees clockwise? If space is still a struggle, convert results to percentages (×100). The authors cite [12] as a similar cross-sequence work and highlight the differences, but I would recommend adding a short sentence on efficiency, as that is the primary downfall of DDMs.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Cross-sequence anomaly detection is new, interesting and shows promise on smaller anomalies. This approach works incredibly well without a slow iterative generator like a DDPM. Whilst, I recommend acceptance, I emphasise fixing Table 1 and other minor overfull latex lines.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    After careful consideration, the reviewer has chosen to keep the initial accept recommendation. This reviewer’s primary concerns surrounded the overwhelming Fig 1. The authors in the rebuttal have explicitly indicated that Fig 1 “will be split into subfigures (translation workflow, feature map, detection) with consistent terminology (‘sequence’ instead of ‘modality’)”. While this reviewer also highlighted that only one anomalous dataset had been used, the high-quality qualitative and quantitative assessments mitigate this.

    Furthermore, as other reviewers indicated, the mathematical presentation and justifications have been explained in the rebuttal, and stating that the manuscript has been updated to address these is great. Finally, upon second review, this reviewer noticed the ViT artefacting in Fig 2, where perhaps a CNN generator or multi-scale adapter [1,2,3] would have been a higher-performing design decision. Additionally, this reviewer feels that while it is fantastic that Fig 2 includes so many other works, earlier works could be swapped out in favour of showing the T1 input that the authors’ model conditions on. Also, while it may be clear to some that “C” in Fig 1 denotes concatenation, it is still an unlabelled operation ($\oplus$, as used in Eq. 7, may be better).

    [1] Kerssies, Tommie, et al. “Your ViT is Secretly an Image Segmentation Model.” arXiv preprint arXiv:2503.19108 (2025).
    [2] Chen, Zhe, et al. “Vision Transformer Adapter for Dense Predictions.” arXiv preprint arXiv:2205.08534 (2022).
    [3] Cheng, Bowen, et al. “Masked-Attention Mask Transformer for Universal Image Segmentation.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.




Author Feedback

We sincerely thank the reviewers for their constructive feedback. Below, we address major concerns and outline revisions to strengthen clarity and rigor.

Novelty & Comparison with Prior Work (Reviewers 1 & 3)

  1. Zhang et al. (2023) vs. Our Work (R1): While Zhang et al. proposed a general multi-organ/multi-modal framework using disentangled representations, our work introduces brain MRI-specific innovations: (1) a cross-sequence translation paradigm directly leveraging inter-sequence contrast discrepancies as primary biomarkers, avoiding reconstruction overfitting; (2) anatomical edge constraints (AECM) and dual-level anomaly detection (pixel + feature) to preserve structural integrity and enhance sensitivity; and (3) most importantly, unlike Zhang’s single-sequence focus, our framework pioneers domain-specific disentanglement tailored to multi-sequence MRI analysis, establishing a new paradigm for neurological anomaly detection.
  2. Liang et al. (2023) (R3): Liang’s method relies on single-sequence cyclic translation (e.g., T2→FLAIR→T2), detecting anomalies via cycle inconsistency. In contrast, MultiTransAD pioneers cross-sequence translation (e.g., T1ce→FLAIR) to directly quantify inter-sequence contrast mismatches, avoiding pathology propagation. This fundamentally different paradigm explains the performance gap (Table 1: 0.5080/0.6334 vs. 0.4537/0.4966 in Dice). Even when we remove the proposed modules, MultiTransAD achieves performance competitive with or better than Liang’s (ablation studies in Table 2). Another advantage of MultiTransAD is its excellent sensitivity to smaller anomalies, owing to the dual-level anomaly detection (Figure 2, Case 1). In contrast, single-sequence reconstruction/cyclic-translation methods, including Liang’s, fail to capture them due to overfitting.

Experimental Design & Fairness (Reviewers 1 & 3)
  3. Sequence Selection (R1,3): FLAIR is clinically prioritized for tumor detection (the BraTS benchmark standard). While our framework supports any sequence pair (e.g., T2→T1), we report T1ce→FLAIR and T1→FLAIR due to their maximal contrast differences. Performance for other pairs (e.g., T2→FLAIR) remains competitive but is not demonstrated due to the page limit.
  4. Comparison Fairness (R3): All baselines report results for their optimal sequences, which we consider a fair comparison. Liang’s method fundamentally differs (single-sequence cyclic translation vs. cross-sequence translation), making direct sequence alignment infeasible.
  5. “Normal” Data in BraTS (R3): BraTS subjects are pathological, but individual slices may lack tumors. We selected normal slices from BraTS training data using official tumor masks. This ensures domain alignment with test data.
  6. Postprocessing (R3): Median filtering and the 55-pixel threshold follow prior UAD works to suppress noise (an illustrative sketch of this post-processing is given at the end of this feedback). A further ablation showed little Dice variation for thresholds of 50–60 pixels.

Revisions & Limitations (Reviewers 1, 2 & 3)
  7. Visualization (R2,3): Figure 1 will be split into subfigures (translation workflow, feature map, detection) with consistent terminology (“sequence” instead of “modality”).
  8. Mathematical Clarity (R3): Equation terms (e.g., the image patching operator $\phi_{patch}$ and the edge projection $\phi_{patch}^{edge}$) are now defined explicitly in Section 2.2. Dual-threshold clipping (Eq. 4) is used to retain structural edges (mid-range gradients) and suppress noise and artifacts.
  9. Flexibility (R2): MultiTransAD can also be applied to other multi-sequence/multi-modality anomaly datasets and tasks, such as the MSSEG2015 and MSLUB datasets.
  10. Expanded Discussion (R3): Limitations of prior methods will be added, e.g., VAE’s overfitting and cyclic methods’ pathology propagation.
  11. Table 1 (R2,3): We will adjust its column formatting and table caption position.

Broader Impact

Our framework pioneers cross-sequence translation as a biomarker for MRI anomaly detection, validated by SOTA results (Dice: 0.6334 on BraTS) and a clinician-aligned design. It can be extended to other pathologies (e.g., stroke) and modalities.
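As a concrete illustration of the post-processing mentioned in point 6, here is a minimal Python sketch. The median filtering and the 55-pixel minimum-area rule come from the rebuttal; the function name, filter size, and thresholding step are assumptions for illustration and do not reproduce the released implementation.

```python
import numpy as np
from scipy.ndimage import median_filter, label


def postprocess_anomaly_mask(anomaly_map: np.ndarray, threshold: float,
                             min_area: int = 55, filter_size: int = 5) -> np.ndarray:
    """Median-filter the anomaly map, binarise it, and discard connected
    components smaller than `min_area` pixels (illustrative values)."""
    smoothed = median_filter(anomaly_map, size=filter_size)
    mask = smoothed > threshold
    labeled, n_components = label(mask)  # 4-connectivity by default in 2D
    for comp_id in range(1, n_components + 1):
        component = labeled == comp_id
        if component.sum() < min_area:   # drop tiny, likely-noise regions
            mask[component] = False
    return mask
```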




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    The reviewers broadly agree that the core idea—using cross-sequence translation with innovations like contrastive alignment and adaptive fusion—is promising and relevant. The method is well-motivated, empirically solid, and shows improved sensitivity to small anomalies.

    However, a few key points require clarification in the rebuttal. Most importantly, the comparison with prior work, particularly Liang et al. (2023), is not directly aligned in terms of sequence translation direction, making it unclear whether the performance gain stems from architectural innovations or different modality mappings. Clarification is also needed regarding the use of “normal” data in BraTS, which contains only pathological cases. It should be made explicit how normal slices were defined and selected. Finally, the post-processing strategy—e.g., excluding regions smaller than 55 pixels—should be briefly justified.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper receives an initial review of 1WR (R3), 1WA (R1), and 1A (R2). After rebuttal, all three reviewers change to Accept. The main concerns raised include: 1) R1 and R3 question the novelty compared to prior work, particularly Zhang et al. (2023) for disentangled representations and the fairness of comparison with Liang et al. (2023) since different MRI sequences are used, making it unclear whether the disentanglement module actually improves performance over pure translation-based approaches. 2) R3 points out that the authors only translate to FLAIR sequence without justification and lack mathematical clarity with undefined terms in equations. 3) R2 notes the limitation of using only a single anomalous test set, though acknowledges multi-sequence data availability is rare. 4) All reviewers criticize the overwhelming Figure 1 architecture diagram and formatting issues. However, the reviewers also recognize strengths including the novel cross-sequence translation paradigm for anomaly detection, extensive evaluation with strong improvements on BraTS 2021, high efficiency compared to diffusion models, and excellent performance on small anomaly detection. The authors’ rebuttal successfully addressed the main concerns about comparison fairness and experimental design, leading to acceptance from all reviewers.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper presents a novel unsupervised anomaly detection framework that leverages cross-sequence MRI translation and dual-level residual analysis to detect brain anomalies.

    The reviewers unanimously appreciated the practical motivation, clear experimental improvements, and sensitivity to small anomalies, while minor concerns around figure clarity, formatting, and sequence comparison fairness were sufficiently addressed in the rebuttal. Given its methodological soundness, clinical relevance, and solid empirical support, I recommend acceptance.


