Abstract

Unpaired medical image synthesis aims to provide complementary information for accurate clinical diagnostics and to address the challenges of obtaining aligned multi-modal medical scans. Transformer-based models excel in image translation tasks thanks to their ability to capture long-range dependencies. Although effective in supervised training, their performance falters in unpaired image synthesis, particularly in synthesizing structural details. This paper empirically demonstrates that, lacking strong inductive biases, Transformers can converge to non-optimal solutions in the absence of paired data. To address this, we introduce UNet Structured Transformer (UNest), a novel architecture incorporating structural inductive biases for unpaired medical image synthesis. We leverage the foundational Segment-Anything Model to precisely extract the foreground structure and perform structural attention within the main anatomy. This guides the model to learn key anatomical regions, thus improving structural synthesis despite the lack of supervision in unpaired training. Evaluated on two public datasets spanning three modalities (MR, CT, and PET), UNest improves on recent methods by up to 19.30% across six medical image synthesis tasks. Our code is released at https://github.com/HieuPhan33/MICCAI2024-UNest.
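
The idea of restricting attention to the foreground anatomy can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal illustration under stated assumptions: the foreground mask is already given as a boolean grid over tokens, foreground tokens attend to all foreground tokens, and background tokens attend only within a small square neighbourhood. The function names `dual_attention_mask` and `masked_attention` and the `window` parameter are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dual_attention_mask(fg, window=1):
    """Build an (N, N) boolean matrix: True where query i may attend to key j.

    fg is an (H, W) boolean foreground mask over the token grid (assumed to
    come from a segmentation model; here it is simply an input).
    """
    h, w = fg.shape
    flat = fg.reshape(-1)
    # Assumption: foreground queries attend to every foreground key.
    allowed = flat[:, None] & flat[None, :]
    # Assumption: background queries attend to keys within `window` cells
    # (Chebyshev distance), regardless of whether those keys are foreground.
    rows, cols = np.divmod(np.arange(h * w), w)
    near = (np.abs(rows[:, None] - rows[None, :]) <= window) & (
        np.abs(cols[:, None] - cols[None, :]) <= window
    )
    allowed |= (~flat)[:, None] & near
    return allowed


def masked_attention(q, k, v, allowed):
    # Scaled dot-product attention with disallowed pairs set to -inf.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(allowed, scores, -np.inf)
    return softmax(scores, axis=-1) @ v


if __name__ == "__main__":
    fg = np.zeros((4, 4), dtype=bool)
    fg[1:3, 1:3] = True  # a 2x2 foreground block in a 4x4 token grid
    allowed = dual_attention_mask(fg, window=1)
    tokens = np.random.default_rng(0).normal(size=(16, 8))
    out = masked_attention(tokens, tokens, tokens, allowed)
    print(out.shape)
```

Every query can always attend to itself in this sketch, so no row of the score matrix is fully masked and the softmax stays well defined.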

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0456_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0456_supp.pdf

Link to the Code Repository

https://github.com/HieuPhan33/MICCAI2024-UNest

Link to the Dataset(s)

N/A

BibTeX

@InProceedings{Pha_Structural_MICCAI2024,
        author = {Phan, Vu Minh Hieu and Xie, Yutong and Zhang, Bowen and Qi, Yuankai and Liao, Zhibin and Perperidis, Antonios and Phung, Son Lam and Verjans, Johan W. and To, Minh-Son},
        title = {{Structural Attention: Rethinking Transformer for Unpaired Medical Image Synthesis}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors identify a limitation of Transformer-based models in unpaired image synthesis due to a lack of strong inductive biases, leading to suboptimal solutions. To overcome this, they introduce UNest, which incorporates structural inductive biases to enhance structural synthesis in unpaired settings.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper empirically demonstrates that, lacking strong inductive biases, Transformers can converge to non-optimal solutions in the absence of paired data.
    2. This paper introduces a simple yet effective architecture, the UNet Structured Transformer (UNest), which applies a dual attention strategy: structural attention for the foreground and local attention for the background.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Add arrows in Fig. 1(a) to annotate the areas where the synthetic effects differ.
    2. The description of how the Attention maps in Fig. 1(b) were obtained is not clear.
    3. In Fig. 2(c), is there any special meaning to the colors of the encoders and decoders for UNest 𝐺XY and UNest 𝐺YX? This is not explained. Why do the encoder of UNest 𝐺XY and the decoder of UNest 𝐺YX have the same color?
    4. The diagram of CycleGAN in Fig. 2(c) is not detailed. CycleGAN has two generators and two discriminators, but there is only one discriminator in Fig. 2(c), which could mislead readers.
    5. How can you ensure that the ground-truth binary mask obtained by SAM is accurate?
    6. The results of AttentionGAN in Fig. 3 are noticeably smaller in size compared to the results of other models, which is not explained.
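
Regarding point 5, one common way to quantify mask accuracy is the Dice coefficient against reference masks. The sketch below assumes that a small set of manually annotated reference masks is available for comparison; this is an assumption for illustration, since neither the review nor the abstract states how the SAM masks were validated. The function name `dice_coefficient` is hypothetical.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Dice overlap between two binary masks of the same shape.

    Returns 1.0 when both masks are empty (treated as perfect agreement).
    """
    pred = np.asarray(pred).astype(bool)
    ref = np.asarray(ref).astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    total = pred.sum() + ref.sum()
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = np.array([[1, 1], [0, 0]])  # e.g. a mask from a segmentation model
    reference = np.array([[1, 0], [0, 0]])  # e.g. a manual annotation
    print(dice_coefficient(predicted, reference))
```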
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. In Fig. 1(a), the addition of arrows to highlight the areas where the synthetic effects differ would greatly enhance the clarity and impact of the figure.
    2. Please provide a clearer description of how the Attention maps in Fig. 1(b) were obtained.
    3. It is important to clarify whether the colors used for the encoders and decoders in UNest 𝐺XY and UNest 𝐺YX have any special meaning in Fig. 2(c).
    4. The diagram of CycleGAN in Fig. 2(c) should accurately represent the architecture of the model. Please revise the diagram to clearly show both generators and both discriminators to ensure the accuracy of the representation.
    5. Please provide details on how you ensure the accuracy of the ground-truth binary mask obtained by SAM.
    6. The noticeable difference in the size of the results from AttentionGAN compared to other models in Fig. 3 should be explained.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper addresses an important problem in unpaired medical image synthesis by proposing a novel architecture to improve the synthesis of structural detail. The authors identify a limitation of Transformer-based models in unpaired image synthesis due to a lack of strong inductive biases, leading to suboptimal solutions. To overcome this, they introduce UNest, which incorporates structural inductive biases to enhance structural synthesis in unpaired settings. However, some details in the figures need to be revised and explained.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors have proposed a method for translation using attention based on segmentation and pixel features.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method introduces semantic information by incorporating segmentation with SAM.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The method segments only the foreground and background, which is somewhat coarse. It would be more interesting and potentially beneficial to use segmented tissues instead.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It is difficult to be convinced that segmenting the foreground and background will significantly enhance performance. The authors should provide more detailed explanations and conduct ablation studies.

    Additionally, the authors could explore the feasibility of incorporating tissue or organ segmentation to potentially improve results.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Method novelty

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper introduces a novel architecture called UNet Structured Transformer (UNest) to address the limitations of Transformer models in unpaired medical image synthesis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The motivation of this paper is clear. In response to the challenge of unpaired medical image synthesis, a structural inductive bias is incorporated into the Transformer architecture to focus on discriminative areas, thus enhancing the synthesis of anatomical structures.
    2. This paper provides sufficient experiments, with detailed qualitative and quantitative results.
    3. The visualizations are clear. Fig. 5, for example, makes it easy to see which regions the proposed method attends to.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. There are grammatical errors and logical problems in this paper. The authors are advised to check and polish the entire text.
    2. The paper should add a quantitative comparison of computational efficiency between the proposed method and the compared methods, such as parameter counts and FLOPs.
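
As a rough illustration of the efficiency figures requested in point 2, the sketch below counts parameters and multiply-accumulate operations (MACs) for a single standard self-attention layer. It is a back-of-the-envelope estimate under stated assumptions (four `d_model x d_model` projections for query, key, value, and output, full attention over all tokens, softmax and normalization costs ignored), not a measurement of UNest or of any compared method. The function names are hypothetical; FLOPs are often reported as roughly twice the MAC count, though conventions vary.

```python
def attention_params(d_model, bias=True):
    # Four projections (query, key, value, output), each d_model x d_model,
    # plus an optional bias vector per projection.
    per_projection = d_model * d_model + (d_model if bias else 0)
    return 4 * per_projection


def attention_macs(n_tokens, d_model):
    # Projections: each token passes through four d_model x d_model matrices.
    projections = 4 * n_tokens * d_model * d_model
    # Attention: Q @ K^T and (weights @ V), each n_tokens^2 * d_model MACs.
    attention = 2 * n_tokens * n_tokens * d_model
    return projections + attention


if __name__ == "__main__":
    print(attention_params(64))    # 16640
    print(attention_macs(16, 64))  # 294912
```

The quadratic `n_tokens * n_tokens` term is what restricting attention to a subset of tokens would reduce.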
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See the above strengths and weaknesses.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is clearly written and its experiments are detailed. It is recommended for acceptance with minor revisions.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

N/A




Meta-Review

Meta-review not available, early accepted paper.


