Abstract

Multi-sequence magnetic resonance imaging (MRI) plays a critical role in tumor diagnosis but relies heavily on manual interpretation, which is both labor-intensive and dependent on expert knowledge. While deep learning-based diagnostic methods show significant potential, they typically require large datasets for effective training. However, the high cost of data collection and annotation often limits the available dataset size. This highlights the need for models that can effectively train on small datasets, mitigate overfitting, and achieve reliable performance. To address these challenges, we propose RadioFormer, a novel model that incorporates radiologist inductive bias to facilitate efficient learning on small MRI datasets. Unlike traditional 2D or 3D architectures, RadioFormer emulates the radiologist’s diagnostic process by explicitly parsing MRI data into three hierarchical levels: (1) single-sequence slice feature extraction, (2) multi-sequence slice information aggregation, and (3) inter-slice information (volume) aggregation. Each level builds upon the previous one, ensuring smooth information flow and a hierarchical understanding of lesion characteristics. By integrating expert knowledge into its design, RadioFormer effectively leverages inductive bias to enhance model generalization on small datasets. We evaluated RadioFormer on three public datasets for brain, breast, and liver tumor classification, where it achieved state-of-the-art performance across all tasks. The code and pre-processed data for RadioFormer are available at https://github.com/aa1234241/RadioFormer/tree/master.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1494_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/aa1234241/RadioFormer/tree/master

Link to the Dataset(s)

N/A

BibTex

@InProceedings{BaiXia_RadioFormer_MICCAI2025,
        author = { Bai, Xiaoyu and Xia, Yong},
        title = { { RadioFormer: Integrating Radiologist Inductive Bias for Tumor Classification on Multi-Sequence MR Images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15960},
        month = {September},
        pages = {548 -- 558}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a transformer-based architecture comprising three main steps to merge information from multiple MR sequences and from 3D image series. The approach is inspired by the radiologist's approach to manual image examination, encoded as an inductive bias. The approach is evaluated on three public MRI cancer datasets and extensively compared to other approaches. The numerical results show an advantage for the approach in most cases.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is relatively well written. The approach is extensively evaluated and compared to other approaches on three datasets. Ablation studies demonstrate the impact of all three steps of the approach and of their ordering.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    While the approach seems to perform relatively well on most datasets, its novelty and justification remain weak. It consists essentially of a combination of well-established transformer blocks, and the analogy to human interpretation of medical images via inductive bias remains conceptual. The authors claim that the approach is expected to perform best in low-data settings, but no experiment specifically evaluates this.

    One key component of the approach is also to crop to the tumor region, which implies a very strong and costly requirement of having tumor masks. As far as I understand, this is also a key but unfair advantage over all other methods used in Tables 1 and 2, which do not use the tumor masks. To address this, an ablation study of the crop operation is needed.

    While relatively well written overall, the paper could be much clearer on several aspects. First, the text descriptions in Section 2 do not align well with the architecture description in Fig. 2; introducing feature-representation notation in Fig. 2 would make it much easier to follow. No explanation is provided for the purpose of the MSA module; an ablation study of the latter would be beneficial. For all results based on average performance on the LLD-MMRI dataset, bootstrapped confidence intervals (CI) should be provided based on fold-wise performance estimates. This would be particularly important to assess the statistical significance of the ablation studies presented in Tables 3 and 4. It is not clear whether the results in Table 3 correspond to the LLD-MMRI dataset. Tables 3 and 4 would present stronger results if run over the test set and over all 5 folds with CIs reported. It is also not clear whether the results presented in Table 4 correspond to the validation or test set of the LLD-MMRI dataset (Fold 1). Finally, what is meant by single- or multi-“phase” tokens in Section 2.1 is unclear.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While extensively evaluated and compared to other methods over three datasets, the novelty/justification remains limited, and the performance advantage might be primarily attributable to the cropping operation of the approach. Minor flaws in writing/clarity also hinder the understanding of the reported results and method.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    While the authors clarified my concern about the unfairness of cropping, many aspects remain unclear and unjustified. In particular, the authors’ rebuttal does not mention or provide information on how they plan to make the manuscript clearer in all aspects raised by the reviewers.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a radiologist-inspired model, RadioFormer, for tumor classification in multi-sequence MRI data. The model architecture is built upon the vision transformer (ViT) and is structured into three stages: single-sequence slice feature extraction, multi-sequence slice information aggregation, and inter-slice information aggregation. The model is evaluated on three public MRI datasets, and the results are encouraging.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The radiologist-inspired design is interesting.
    2. A comprehensive evaluation of the proposed model is performed on three diverse MRI datasets.
    3. The paper is well-written and easy to follow.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The authors mention that during training and testing, lesion volumes are cropped randomly. This raises concerns about the potential exclusion of regions of interest (ROIs). How did the authors ensure that ROIs were not inadvertently excluded during the random cropping process?
    2. The proposed model is built based on ViT. The authors should clarify why ViT was chosen over other transformer variants. Additionally, an ablation study comparing different transformers would strengthen the claim that ViT is the most appropriate choice.
    3. Most of the baseline methods used for comparison are relatively dated, with only one from 2025. Why were more recent methods (especially those from 2024) not included in the comparison?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. Please include the statistical evidence in the Abstract and Conclusion to make it easier for the readers to get insight about the research work.
    2. Please include a bulleted list in the Introduction section to clearly outline the primary objectives of the research work.
    3. Acronyms such as “ViT” should be spelled out in their first use.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In my opinion, while the method is interesting, concerns regarding the selection of baseline methods and the clarity of certain methodological choices should be addressed in the rebuttal.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have addressed my concerns. Therefore, I recommend the paper for acceptance.



Review #3

  • Please describe the contribution of the paper

    The authors introduce RadioFormer, a radiologist-inspired hierarchical Transformer model designed to learn from modestly sized tumor datasets effectively. The proposed RadioFormer is examined on three public datasets and achieves the best performance across all three datasets, highlighting its superior capability.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed RadioFormer

    • has a novel architecture for 3D image classification.
    • achieves competitive performance compared to some existing models.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Suspected desk rejection. 1) ‘Do not modify or remove the provided anonymized author section’ is a requirement for the MICCAI 2025 paper template. 2) ‘Please ensure your paper is properly anonymized.’ The link to the code repository should use anonymous GitHub. 3) ‘Authors are not allowed to change the default margins, font size, font type, and document style.’ The default color for URLs in the LaTeX template and final publishing PDF file is blue (\renewcommand\UrlFont{\color{blue}\rmfamily}), not green or red. 4) The table format should follow the published paper in the MICCAI proceedings.
    • The name of the proposed method, RadioFormer, has been used in a preprint paper (https://arxiv.org/abs/2504.19161).
    • References should be up to date, for example, the paper for the ReMIND dataset has been published. Juvekar, P., et al.: ReMIND: The Brain Resection Multimodal Imaging Database. Sci Data 11, 494 (2024)
    • The Reference section should follow the Springer Reference Style. Please refer to the previous papers in the MICCAI proceedings.
    • The source code cannot be validated during review because there is no code in the repository that the authors provided (https://t.ly/l6mFr, https://github.com/aa1234241/RadioFormer/tree/main). Could the authors provide an anonymized link to the source code for review?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is innovative for 3D image classification. However, the authors provided no code in the repository, and the link to the code repository should use anonymous GitHub.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    My concerns are resolved after reading the authors’ rebuttal.




Author Feedback

We sincerely thank all reviewers for their time and constructive feedback. We are encouraged by Reviewer 2 and Reviewer 4 acknowledging our radiologist-inspired design, innovative architecture, and competitive performance.

  1. Novelty and Justification (R1, R2) While RadioFormer uses standard Vision Transformer (ViT) blocks, its core novelty lies in the hierarchical architecture specifically designed to emulate a radiologist’s step-wise diagnostic reasoning. This bio-inspired structure, rather than the specific Transformer variant, is the key innovation leading to performance gains. The analogy to human interpretation informs the three-level processing hierarchy (single-sequence slice, multi-sequence slice, inter-slice aggregation). We also tested Swin transformers at the first level and observed comparable results.
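
     The three-level flow described above can be sketched schematically. This is an illustrative shape-level sketch only, with mean pooling standing in for the actual ViT attention blocks at each level; the tensor shapes (number of sequences, slices, patch tokens, channels) are assumptions, not values from the paper.

     ```python
     import numpy as np

     # Illustrative shapes: sequences, slices, patch tokens per slice, channels.
     S, Z, P, C = 4, 12, 49, 256

     patch_tokens = np.random.rand(S, Z, P, C)   # level-1 input tokens

     # Level 1: single-sequence slice feature extraction
     # -> one feature vector per (sequence, slice) pair.
     slice_feats = patch_tokens.mean(axis=2)     # (S, Z, C)

     # Level 2: multi-sequence slice information aggregation
     # -> one fused feature vector per slice.
     fused_slices = slice_feats.mean(axis=0)     # (Z, C)

     # Level 3: inter-slice (volume) aggregation
     # -> one descriptor per tumor volume, fed to the classifier head.
     volume_feat = fused_slices.mean(axis=0)     # (C,)
     ```

     Each level consumes only the outputs of the previous one, which is the hierarchical, step-wise flow the rebuttal describes.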

  2. Tumor Region Cropping and Fairness (R1, R2) All methods were evaluated using the same pre-processed tumor-cropped subvolumes derived from dataset annotations. Starting from ~512×512×n MRI volumes, we first register all sequences, then crop tumor regions based on dataset-provided annotations. The cropped volumes are resized to 128×128×16, and during training, we apply a random crop of 114×114×12 as data augmentation. This crop introduces only a small margin, making it highly unlikely that the tumor is excluded. This standardized preprocessing ensures fair and consistent comparison.
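
     The random-crop augmentation described above can be sketched as follows. This is a minimal sketch assuming NumPy volumes in (H, W, D) layout; the function name is hypothetical, not from the released code. The margins (14, 14, 4 voxels) illustrate why the centered tumor is very unlikely to be excluded.

     ```python
     import numpy as np

     def random_crop(vol, crop=(114, 114, 12), seed=None):
         """Randomly crop a subvolume from a (H, W, D) array.

         With a 128x128x16 input and a 114x114x12 crop, the start offsets
         can vary by at most (14, 14, 4) voxels, so the crop shifts the
         field of view only slightly around the centered tumor.
         """
         rng = np.random.default_rng(seed)
         starts = [rng.integers(0, s - c + 1) for s, c in zip(vol.shape, crop)]
         x, y, z = starts
         h, w, d = crop
         return vol[x:x + h, y:y + w, z:z + d]

     # One pre-processed tumor subvolume, as described in the rebuttal.
     vol = np.zeros((128, 128, 16), dtype=np.float32)
     patch = random_crop(vol, seed=0)
     ```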

  3. Low Data Setting (R1) Results on modest datasets (ABL: 94 cases, ReMIND: 71 cases) suggest effectiveness in data-limited settings. A dedicated low-data experiment is left for future work.

  4. MSA Module (R1) MSA (multi-head self-attention) is the standard attention in our Transformer blocks and plays its usual role; ablating it would remove the Transformer's core function. No custom or non-standard attention modules are used.

  5. ViT Choice/Ablation (R2) ViT was chosen for its performance and simplicity. Our focus was the hierarchical design, not a comparison of Transformer variants.

  6. 2024 Method Evaluation (R2) We evaluated VideoMamba (Li et al., 2024) on the LLD-MMRI Phase I dataset using 5-fold cross-validation. The F1/Kappa scores for the five folds are: 0.71/0.68, 0.68/0.68, 0.71/0.66, 0.69/0.61, and 0.68/0.64. These results are consistently lower than those of our proposed RadioFormer, further validating its effectiveness.

  7. Anonymized Source Code (R4) We have released pretrained models and the cropped tumor subvolumes on the LLD-MMRI dataset to facilitate reproducibility. The code is available at the following anonymized link: https://anonymous.4open.science/r/RadioFormer-03E6

  8. Naming and Format Issue (R4) We appreciate the reviewer’s suggestion and will revise the naming and formatting accordingly in the final version.

  9. Results in Table 3 and 4 (R1) Table 3 presents results on the validation subset of Fold 1 in the LLD-MMRI Stage I protocol, which consists of 316 cases (251 for training and 65 for validation). In contrast, Table 4 reports results on the official LLD-MMRI Stage I test set, using the model trained on the Fold 1 training/validation split—the same setup as in Table 1.

  10. Bootstrapped Confidence Intervals (R1) We report bootstrapped 95% confidence intervals (CI) over 500 resamples for each of the 5 folds in Table 1. Metrics are reported as F1/Kappa intervals: [67.9, 88.6]/[66.0, 87.6], [58.1, 81.3]/[58.8, 81.5], [65.1, 85.1]/[62.4, 85.3], [65.5, 86.1]/[64.5, 86.6], and [68.5, 87.6]/[66.7, 87.1].
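
  A percentile bootstrap of the kind reported above can be sketched as follows. This is a minimal sketch, not the authors' evaluation code: the function name is hypothetical, and a simple accuracy metric stands in for the F1 and Cohen's kappa scores used in the paper.

  ```python
  import numpy as np

  def bootstrap_ci(y_true, y_pred, metric, n_resamples=500, alpha=0.05, seed=0):
      """Percentile bootstrap CI: resample cases with replacement,
      recompute the metric each time, and take the (alpha/2, 1-alpha/2)
      percentiles of the resulting score distribution."""
      rng = np.random.default_rng(seed)
      n = len(y_true)
      scores = np.empty(n_resamples)
      for i in range(n_resamples):
          idx = rng.integers(0, n, size=n)          # case-level resampling
          scores[i] = metric(y_true[idx], y_pred[idx])
      lo, hi = np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
      return lo, hi

  # Toy demo with accuracy as a stand-in metric.
  y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 1])
  y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 1, 1, 1])
  accuracy = lambda t, p: float(np.mean(t == p))
  lo, hi = bootstrap_ci(y_true, y_pred, accuracy)   # 95% CI bounds
  ```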




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors did a good job in the rebuttal, including addressing most of the technical concerns from R1. I think this interesting paper which is ‘radiologist-informed’ would raise discussion at MICCAI.


