Abstract
Synthesizing multi-phase contrast-enhanced CT (CE-CT) images is clinically significant, as it can mitigate clinical risks such as radiation exposure and allergic reactions to contrast agents. However, existing methods treat multi-phase synthesis as separate tasks, failing to maintain the inter-phase dependencies and consistency between synthesized multi-phase CE-CT images. Moreover, the limited variability in CT intensity distributions makes it challenging to capture subtle variations in multi-phase imaging. For the first time, we propose a novel Causality-driven Spatio-temporal Generator (CSGen) for synthesizing multi-phase CE-CT imaging through three key novelties:
1) Using a novel phase-causality to exploit multi-phase variation content to drive multi-phase CE-CT synthesis, addressing the challenge of capturing multi-phase discriminative features with one model.
2) Introducing a new Spatio-temporal Transformer that establishes spatio-temporal correlations between multi-phase CE-CT images, leveraging inter- and intra-phase dependencies to improve synthesis quality.
3) Designing multi-phase adversarial learning to enhance multi-phase discriminative feature learning. Experimental results (mean PSNR: 31.15, mean SSIM: 0.9066, mean NMAE: 3.17) demonstrate that CSGen outperforms state-of-the-art synthesis methods and, for the first time, successfully synthesizes multi-phase CE-CT images.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2963_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{ZhuQik_Causalitydriven_MICCAI2025,
author = { Zhu, Qikui and Wu, Hao and Zhang, Yanyan and Li, Shuo},
title = { { Causality-driven Spatio-temporal Generator for Multi-phase Contrast-enhanced CT Synthesis } },
booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15962},
month = {September},
pages = {111 -- 120}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes a method to synthesize multi-phase contrast-enhanced CT (CECT) images—namely arterial, venous, and delayed phases—from a single non-contrast CT (NCCT) input. The authors argue that existing methods treat each phase generation as an independent task and fail to capture cross-phase dependencies and subtle intensity differences that are crucial in multi-phase CT imaging. To address these challenges, the paper introduces a causality-driven spatio-temporal generator architecture that incorporates phase-specific content representations and a transformer-based design to model inter-phase correlations. An adversarial latent autoencoder is used to extract phase-specific latent codes, and a multi-phase adversarial learning strategy is applied to supervise the synthesis across phases.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Problem Relevance: The task of synthesizing contrast-enhanced CT from NCCT has high clinical relevance, particularly in settings where contrast administration is contraindicated or unavailable.
Attempt to Model Cross-phase Relationships: The use of a spatio-temporal transformer and the design intent to model inter-phase dependencies acknowledge an important limitation of prior phase-isolated synthesis methods.
Structured Latent Representation: The idea of using phase-specific latent codes to condition synthesis introduces a degree of disentanglement, which can contribute to more controllable or interpretable generation under certain circumstances.
End-to-end Framework: The model is trained in an end-to-end manner, and the integration of latent encoding and adversarial training within a unified framework is well-aligned with trends in deep generative modeling.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Misuse and Overstatement of “Causality”: The term “causality-driven” is heavily emphasized but not supported by rigorous causal inference methodology. There is no evidence of intervention modeling, counterfactual reasoning, or explicit use of structural causal models (SCMs). The so-called “causality” is reduced to inserting a latent vector (representing contrast material and tissue response) into the key of a self-attention layer, a weak and insufficient gesture toward true causal modeling. This weak form of conditioning is better interpreted as structured representation learning or disentanglement, not causal inference.
Architectural Novelty is Limited: The spatio-temporal generator largely follows the now-standard pattern of convolutional layers followed by attention blocks, as in many recent vision transformers. The cross-phase and inter-phase attention mechanisms are described in general terms, but the architecture does not introduce a fundamentally new design or mechanism beyond conventional multi-input attention. The phase-specific latent code conditioning is minimal and does not appear to influence the generation process in a significant, controllable, or interpretable way.
Weak Adversarial Training Design: The “multi-phase adversarial learning” setup uses one discriminator per phase, which is a common and simplistic design in multi-domain GAN literature. There is no mechanism to ensure inter-phase consistency across generated phases (e.g., no shared discriminator, cycle consistency, or contrast dynamics regularization). Without any form of cross-phase constraint, it’s unclear how the adversarial framework enforces phase coherence, which is central to the paper’s motivation.
Insufficient and Narrow Evaluation: The dataset is private and limited to 92 CT scans, including training and testing. This raises serious concerns about generalizability, reproducibility, and robustness. Evaluation metrics are limited to pixel-level similarities (PSNR, SSIM, NMAE), which do not reflect the clinical relevance or perceptual plausibility of generated CECT images. Intensity distribution plots are insufficient as a substitute for expert-based or task-based evaluations. Baseline comparisons are narrow, omitting many recent and relevant methods in one-to-one, one-to-many, and multi-modal medical image synthesis, especially in MRI literature, which has tackled analogous problems. No visual qualitative results are presented that convincingly demonstrate the superiority or radiological validity of the generated images.
Lack of Clinical Validation: The work lacks any form of expert validation (e.g., radiologist scoring) or downstream task evaluation (e.g., lesion detection or segmentation). Without such validation, the claim that the generated images are clinically meaningful remains unsubstantiated. There is no discussion of potential risks, failure cases, or how the synthetic CECT images could be integrated into clinical workflows.
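To make the conditioning mechanism criticized in the first weakness above more concrete (a phase latent vector inserted into the attention key), here is a minimal Python sketch. It is not the authors' implementation. The function name `phase_conditioned_attention`, the choice to gate each key element-wise by the phase code, and the toy inputs are all assumptions made purely for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def phase_conditioned_attention(queries, keys, values, phase_code):
    """Scaled dot-product attention with keys modulated by a phase code.

    ASSUMPTION: the phase code gates every key element-wise. The paper may
    combine code and key differently (e.g. concatenation or a projection).
    """
    d = len(phase_code)
    gated_keys = [[k * c for k, c in zip(key, phase_code)] for key in keys]
    outputs = []
    for q in queries:
        # Similarity of the query to each phase-gated key.
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d)
                  for key in gated_keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Toy example: same query, keys, and values under two hypothetical phase codes.
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0], [0.0]]
out_a = phase_conditioned_attention(queries, keys, values, [1.0, 1.0])
out_b = phase_conditioned_attention(queries, keys, values, [0.0, 1.0])
```

The sketch gates the keys rather than adding the code to them because adding one shared vector to every key shifts all scores for a given query by the same amount, which leaves the softmax weights unchanged. Whichever combination is used, the operation is ordinary conditional attention, which is the point of the reviewer's criticism.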
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Clarify the Use of “Causality”: The paper should either adopt a more rigorous causal inference framework (e.g., structural causal models, interventions, counterfactuals) or drop the term “causality-driven” and instead describe the method as phase-aware or temporally structured.
Enhance the Generator Architecture: Consider designing a mechanism that enforces inter-phase consistency explicitly (e.g., shared attention heads, temporal cycle losses, or anatomical structure alignment) rather than treating each phase independently.
Expand Evaluation: Broaden the baseline comparisons to include more recent and diverse approaches. Incorporate both pixel-level and perceptual quality metrics (e.g., FID, LPIPS), and provide qualitative samples with radiological annotations or assessments.
Include Clinical Validation: Engage medical experts to evaluate the realism and utility of the generated images. Ideally, this should include blind scoring or downstream task performance comparisons (e.g., diagnosis, segmentation).
Improve Dataset Transparency: Consider releasing at least a portion of the dataset or providing detailed statistics on the patient cohort and acquisition parameters to help the community evaluate generalizability.
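For readers unfamiliar with the pixel-level metrics discussed in these comments, the sketch below shows how PSNR and NMAE might be computed on flattened images. PSNR follows its standard definition. The paper's exact NMAE formula is not given in this page, so the version here (mean absolute error divided by the intensity range, as a percentage) is an assumption, as are the function names and toy data.

```python
import math


def psnr(reference, synthesized, data_range):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    n = len(reference)
    mse = sum((r - s) ** 2 for r, s in zip(reference, synthesized)) / n
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


def nmae(reference, synthesized, data_range):
    """Normalized mean absolute error, in percent.

    ASSUMPTION: normalization by the intensity range; the paper may
    normalize differently.
    """
    n = len(reference)
    mae = sum(abs(r - s) for r, s in zip(reference, synthesized)) / n
    return 100.0 * mae / data_range


# Toy "images" flattened to 1-D lists with intensities in [0, 1].
ref = [0.0, 0.5, 1.0, 0.25]
syn = [0.1, 0.6, 0.9, 0.35]
psnr_value = psnr(ref, syn, data_range=1.0)
nmae_value = nmae(ref, syn, data_range=1.0)
```

Both quantities depend only on per-pixel differences, which is why the reviewer asks for perceptual or task-based measures in addition.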
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(1) Strong Reject — must be rejected due to major flaws
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Justification of Recommendation: This paper tackles an important problem—synthesizing multi-phase contrast-enhanced CT (CECT) images from a non-contrast CT (NCCT)—and proposes a method that attempts to model phase-wise dependencies through a spatio-temporal transformer and adversarial training. While the problem setting is relevant and the architectural framework is well-structured, the paper ultimately falls short in several critical areas that limit its contribution and credibility.
Major factors leading to rejection: Overstated Claims of Causality: The paper heavily emphasizes a “causality-driven” design, yet it lacks any rigorous treatment of causal inference. The modeling of latent variables such as contrast agent and tissue response is not operationalized in a way consistent with causal frameworks (e.g., interventions, counterfactuals). The use of the term “causal” appears more like a loose metaphor than a methodological foundation.
Limited Novelty in Architecture: The proposed spatio-temporal generator structure is built from standard components (convolutions + transformers), and the phase-conditioning mechanism via latent code insertion is minimal. These elements constitute a sound design but do not offer meaningful architectural innovation beyond current literature.
Weak Evaluation Protocol: The experiments are conducted on a small, private dataset of only 92 scans, and use only pixel-level similarity metrics (PSNR, SSIM, NMAE). The lack of clinically relevant evaluation, expert assessment, and baseline comparisons with more recent and diverse methods severely undermines the validity of the results.
Absence of Clinical Validation: Despite the clinical motivation, the paper offers no radiologist evaluation or evidence that the synthetic CECT images are diagnostically useful. Without this, the claim of clinical applicability remains speculative.
Incomplete Baseline Coverage: Many relevant works in one-to-one, one-to-many, and especially multi-modal medical image synthesis (e.g., MRI-based) are not considered. This limits the ability to assess where this method stands relative to the state of the art.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
The authors have provided additional experimental results and architectural clarifications in response to the initial review. In particular, the inclusion of segmentation-based metrics such as Dice and RVD, and the comparison against a recent baseline, are welcome additions that strengthen the empirical aspect of the paper to some extent.
However, the central concerns—particularly regarding the overstatement of causal modeling, limited architectural novelty, lack of inter-phase consistency enforcement, and insufficient clinical validation—remain largely unresolved.
The authors attempt to position their method as causality-driven, presenting a causal diagram where the non-contrast CT serves as anatomical context and the phase-specific latent codes are treated as intervention variables. However, while the language of causal inference is used, the actual implementation does not meet the essential criteria of a structural causal model. Specifically:
There is no formal definition of a structural causal model, such as a set of structural equations or a generative process involving independent external factors.
The latent codes are learned internal representations derived from contrast-enhanced images and do not represent independent variables that can be meaningfully manipulated.
The idea of intervening on latent codes is better understood as conditional generation based on learned features, rather than a true interventional operation in the causal inference sense.
The model does not simulate alternative outcomes under different contrast conditions for the same anatomy, and thus cannot be considered to perform counterfactual reasoning.
In this light, the use of terms like “causality-driven” is conceptually misleading. The work is more accurately described as a structured or disentangled representation learning framework that models phase-specific image generation, rather than one grounded in causal inference.
From an architectural standpoint, while the model is well-structured and tailored to the task, it relies heavily on existing components such as convolutional networks, transformer blocks, and phase-wise latent code conditioning. These are standard tools in generative modeling, and their combination here, though coherent, does not constitute a significant architectural innovation.
The adversarial training approach also follows a conventional design. Using separate discriminators for each phase is a common practice, and the training setup does not include mechanisms to enforce consistency or coherence across generated phases—despite such consistency being a core motivation of the paper.
The segmentation-based evaluation is appreciated, but the overall evaluation protocol remains limited. Without perceptual quality measures or expert assessment, it is difficult to judge the clinical relevance or utility of the generated images. The claim of practical applicability remains speculative without such validation.
In summary, the paper addresses a clinically relevant problem with a sound technical pipeline. However, the causal claims are not sufficiently grounded in formal methodology, and the architectural and evaluative contributions fall short of representing a substantial advancement. I therefore maintain my original recommendation for rejection.
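The segmentation-based metrics mentioned in this justification (Dice and RVD) can be sketched as follows for binary masks. This is an illustrative sketch, not the authors' evaluation code: the function names are hypothetical, and RVD is assumed here to be the unsigned relative volume difference, since the page does not state which variant was used.

```python
def dice(pred, truth):
    """Dice coefficient between two binary masks given as flat sequences."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def rvd(pred, truth):
    """Relative volume difference.

    ASSUMPTION: unsigned form |V_pred - V_truth| / V_truth.
    """
    v_pred = sum(1 for p in pred if p)
    v_truth = sum(1 for t in truth if t)
    if v_truth == 0:
        raise ValueError("RVD is undefined for an empty reference mask")
    return abs(v_pred - v_truth) / v_truth


# Toy masks: the prediction covers the true voxel plus one extra voxel.
pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 0, 0]
dice_value = dice(pred_mask, true_mask)
rvd_value = rvd(pred_mask, true_mask)
```

Higher Dice and lower RVD indicate better agreement between the segmentation obtained on synthesized images and the reference annotation.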
Review #2
- Please describe the contribution of the paper
This paper introduces a novel causality-driven spatio-temporal transformer for synthesizing multi-phase CECT images simultaneously.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The idea of a causality-driven framework that decouples phase and content is interesting, and the paper presents solid ablation studies demonstrating the necessity of each module.
- The task of generating multi-phase CECT is important and highly relevant to clinical applications.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The writing is somewhat unclear.
1.1 In Section 2.3, the loss function is difficult to follow; for instance, it is unclear where x^i originates from. Given that this component is a key contribution of the paper, additional explanation and clarification are needed.
1.2 In Section 2.1, the dimensionality of z is not specified. Does the codebook contain only three codes? If so, it is unclear how such a limited representation is sufficient for accurate image reconstruction in the first stage. Additionally, \hat{y} in Equation (2) is undefined.
1.3 Figure 2.1 is hard to follow, and I can’t find any explanation of this figure in the text.
- Regarding the tumor-wise evaluation, the rationale for computing PSNR, SSIM, and NMSE solely within tumor regions is not well motivated. If the goal is to assess sensitivity to tumor-specific features, a more meaningful approach might be to evaluate performance on downstream tasks, such as tumor segmentation.
- The choice of baseline methods is not comprehensive. In particular, considering the strong performance of diffusion-based architectures in recent image generation and translation tasks, the absence of any diffusion-based baseline comparisons is a notable omission.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Minor points:
- The font size in two figures is too small, making them difficult to read. Please consider adjusting the layout to improve figure clarity and overall readability.
- The paper would benefit from additional implementation details to enhance reproducibility. For example, please clarify the model architecture and its hyperparameters, the volume size used during training, and the method used for multi-phase image registration.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The idea of decoupling phase properties from content is interesting and the task of generating multi-phase CECT is important. The ablation studies are solid to show the effects of proposed modules. However, due to the unclear writing, I give the current score.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have addressed most of my concerns, so I keep my current score for acceptance.
Review #3
- Please describe the contribution of the paper
The authors introduce a novel framework for synthesizing three-phase contrast-enhanced CT (CE-CT) images from non-contrast CT (NCCT) using a single model. The proposed method, Causality-driven Spatio-temporal Generator (CSGen), addresses key limitations in existing methods by ensuring cross-phase consistency and eliminating the need for contrast agents. The approach combines phase-causal modeling, a cross-phase spatio-temporal transformer (StTransformer), and multi-phase adversarial learning (MpAL) to capture structural and temporal relationships across phases. The authors validate CSGen on real CE-CT datasets, where it achieves superior performance over state-of-the-art techniques and demonstrates strong potential for safer, contrast-free diagnostic imaging.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
CSGen's main strength is that it is the first to synthesize three-phase contrast-enhanced CT (CE-CT) images from non-contrast CT (NCCT) using a single model. It introduces a causal graph reasoning approach to capture phase dependencies and a cross-phase spatio-temporal transformer that effectively models both spatial and temporal correlations across imaging phases. Additionally, the proposed multi-phase adversarial learning enhances the discriminator’s ability to distinguish phase-specific features, improving the quality of synthesized images. The method is comprehensively validated on real CE-CT datasets, showing superior performance across metrics and demonstrating strong clinical relevance by eliminating the need for contrast agents. These contributions make the work both technically innovative and potentially impactful in advancing better and safer diagnostic imaging.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Authors did not address any limitations of the proposed method. While authors demonstrate robustness by evaluating their CSGen framework across various synthesis techniques and achieving notable metrics such as higher SSIM and PSNR and lower NMAE, they fail to acknowledge potential shortcomings. For instance, it remains unclear how well the method generalizes across different anatomical regions, imaging conditions, or scanner variations. Discussing these aspects would strengthen the paper’s credibility and help assess its broader clinical applicability.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Authors should revise the in-text citation numbering to ensure that citations appear in the correct sequential order.
In Fig. 1, point 2), the phrase “We uses” should be corrected to “We use.”
In Fig. 3(a), the authors list several methods (CycleGAN, AGAN, CyTran, CausalGAN, and True CT), but it is unclear which of these represents the proposed method, CSGen. The authors should clarify this by labeling their method with a clear and consistent acronym and ensuring it is easily distinguishable from the baselines used for comparison.
Authors must include more references to prior work to support the novelty and contextual relevance of their proposed method.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(6) Strong Accept — must be accepted due to excellence
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Authors present a highly novel and technically sound framework for synthesizing multi-phase contrast-enhanced CT (CECT) images from non-contrast CT (NCCT) using a single model. The integration of phase-causality, a spatio-temporal transformer, and multi-phase adversarial learning demonstrates a well-rounded and innovative approach that addresses the core challenge of phase-consistent synthesis. Authors provide extensive evaluations across multiple datasets and consistently outperform state-of-the-art methods across multiple quantitative metrics. These strengths, combined with the clinical relevance and potential to reduce reliance on contrast agents, justify a strong recommendation.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
Although the authors did not initially address limitations such as generalizability to different anatomical regions, imaging conditions, or scanner variability, they acknowledged in the rebuttal that the current study is based on a single hospital and committed to validating the approach on multicenter datasets in future work. Given the technical strength of the method, the quality of the results, and the willingness of the authors to incorporate these important clarifications, I believe that the paper deserves acceptance.
Author Feedback
We thank Reviewer #1 (R1) for acknowledging that our work has “high clinical relevance,” addresses “an important limitation of prior methods,” and “contributes to more controllable interpretable generation”; Reviewer #2 (R2) for recognizing the “interesting idea, solid ablation, important and highly relevant to clinical”; and Reviewer #3 (R3) for highlighting that our work is “highly novel and technically sound, addresses key limitations, achieves superior performance, strong potential for contrast-free diagnosis, technically innovative.”
To R1:
1) Misuse of Causality: Our method aligns with the principles of causal representation learning, framing multi-phase CE-CT synthesis as a counterfactual reasoning task. The NCCT serves as the anatomical context X, and the phase-specific latent codes C_i serve as phase intervention variables. The synthesized multi-phase CE-CT images Y_i represent counterfactual outcomes, modeled as P(Y_i | do(C_i = 1), i), where i ∈ {AP, VP, DP}. On this basis, our model generates phase-specific images through causal representation learning.
2) Architectural Novelty: The novelty of our work lies in formulating multi-phase CE-CT image synthesis as a sequence synthesis task, while establishing spatio-temporal correlations between multi-phase images. Our StTransformer with two attention modules is newly designed to model spatio-temporal correlation between multi-phase images.
3) Adversarial Training: Our adversarial training does not focus solely on the differences between one synthetic and one real CE-CT image. It incorporates multiple true CE-CT images as negative samples to enable multi-phase adversarial learning, enhancing the discriminator's ability to capture phase-specific features.
Table 1. Comparison with the latest method.
| Method | Holistic PSNR | Holistic SSIM | Holistic NMAE | Local PSNR | Local SSIM | Local NMAE |
| --- | --- | --- | --- | --- | --- | --- |
| AdverDM [1] | 30.37 | 0.8842 | 3.53 | 30.29 | 0.8796 | 3.57 |
| Ours | 31.15 | 0.9066 | 3.17 | 31.20 | 0.9120 | 3.05 |
4) Evaluation: To the best of our knowledge, ours is the largest four-phase CE-CT dataset. We evaluated performance from holistic and local perspectives, providing a comprehensive evaluation. The latest method is also compared (Table 1).
Table 2. Segmentation performance.
| Method | Dice | RVD |
| --- | --- | --- |
| CycleGAN | 0.57 ± 0.42 | 0.53 ± 0.42 |
| AGAN | 0.58 ± 0.41 | 0.52 ± 0.47 |
| CyTran | 0.58 ± 0.39 | 0.52 ± 0.43 |
| AdverDM [1] | 0.60 ± 0.38 | 0.50 ± 0.39 |
| Ours | 0.62 ± 0.37 | 0.48 ± 0.36 |
5) Clinical Validation: We used a VNet pre-trained on the LITS dataset to segment liver tumors from synthesized arterial-phase CE-CT images for clinical validation. Our method achieved the highest Dice of 62% (Table 2), highlighting its clinical utility.
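To illustrate the multi-phase adversarial learning described in point 3) above, the sketch below shows one possible discriminator loss for a single phase. It is a reading of the rebuttal's description, not the authors' code. The assumption is that the discriminator for a phase treats the true image of that phase as the positive sample, and treats both the synthesized image and the true images of the other phases as negatives. The function names and probability inputs are hypothetical.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for one predicted probability."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def phase_discriminator_loss(d_real_same, d_fake, d_real_others):
    """Average loss for one phase's discriminator.

    d_real_same   : D's probability for the true image of this phase (target 1)
    d_fake        : D's probability for the synthesized image (target 0)
    d_real_others : D's probabilities for true images of the other phases
                    (target 0). ASSUMPTION: this is how "multiple true CE-CT
                    images as negative samples" enter the loss.
    """
    terms = [bce(d_real_same, 1.0), bce(d_fake, 0.0)]
    terms += [bce(p, 0.0) for p in d_real_others]
    return sum(terms) / len(terms)


# A discriminator that rejects other-phase images scores a lower loss than
# one that confuses them with the target phase.
loss_phase_aware = phase_discriminator_loss(0.9, 0.1, [0.1, 0.1])
loss_phase_blind = phase_discriminator_loss(0.9, 0.1, [0.9, 0.9])
```

Under this reading, the extra negatives push each discriminator to learn what distinguishes its phase from the others. The sketch contains no term that couples the generated phases to each other, which is the gap Reviewer #1 points out.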
To R2:
1) y^i in loss: y^i represents the true CE-CT image, i ∈ {VP, AP, DP}.
2) Dimension of codebook: Each phase has a codebook of dimension R^(n×h×w), where n = 1024 and h = w = 8. Three codebooks are used for the VP, AP, and DP phases.
3) Fig. 2.1: Fig. 2.1 is the causal diagram of phase-causality. The multi-phase CE-CT images y^i are generated based on two independent variables: contrast agent A and content C. The responses from various organs (C^i, Cause) caused by the contrast agent generate the multi-phase CE-CT imaging (y^i, Effect).
4) Tumor segmentation: We used a VNet pre-trained on the LITS dataset to segment liver tumors from synthesized arterial-phase CE-CT images for clinical validation. Our method achieved the highest Dice of 62% (Table 2), highlighting its clinical utility.
5) Comparison with diffusion: We compared our method with the latest diffusion-based method [1], achieving the best average PSNR, SSIM, and NMAE across three phases (Table 1).
To R3: Shortcoming: Our study is based on data from a single hospital. Future work will involve multi-center datasets to assess generalizability and clinical utility.
[1] Zhu et al. Cross domain distribution adversarial diffusion model for synthesizing contrast-enhanced abdomen CT imaging. Pattern Recognition (2025): 111695.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The main issues were addressed after the rebuttal. I recommend accepting this paper, though minor issues remain.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The manuscript was rejected after rebuttal due to inadequate responses to key concerns. Although the authors included experimental results, crucial issues persisted. The claims of causal modeling were flawed, lacking a formal structural causal model and counterfactual reasoning, which made the descriptions misleading. The model combined standard components without notable innovation and lacked consistency mechanisms. Evaluation was limited to segmentation metrics, without perceptual or clinical validation. Despite the relevance of the problem, the flawed causal claims, modest novelty, and insufficient validation led to rejection, even with the authors’ mention of future multicenter plans.