Abstract
CardioMetabolic Risk (CMR) assessment requires numerous risk factors derived from anthropometric measurements, sphygmomanometry, and blood tests. Deep learning enables CMR factors to be acquired from a medical image (e.g., a fundus photograph); however, a model-per-factor approach is an insufficient solution in terms of cost-efficiency. It is also challenging to predict multiple factors simultaneously from a single image, since the CMR factors are not only inter-correlated among themselves but also correlated with fundus features at various depths. To address this challenge, we propose Self-Propagative multi-task Learning (SePL), which utilizes six comparatively simple CMR factor predictions as prior knowledge to guide the prediction of more complex CMR factors. The proposed SePL propagates its initial predictions to a latent space, enriching unimodal features into a multimodal representation. A discriminative mixture of experts leverages the relevant priors for the nine complex CMR factor predictions. The training and testing of SePL use 5,232 sets of fundus images and corresponding CMR factors. Experimental results demonstrate that the proposed SePL outperforms existing methods by up to 10.46% in AUC and 8.07% in MAE across all 15 CMR factor predictions. The code is available at https://github.com/shko0215/SePL.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2691_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/shko0215/SePL
Link to the Dataset(s)
N/A
BibTex
@InProceedings{KoSeo_SelfPropagative_MICCAI2025,
author = { Ko, Seonghyeon and Yang, Huigyu and Bum, Junghyun and Le, Duc-Tai and Choo, Hyunseung},
title = { { Self-Propagative Multi-Task Learning for Predicting Cardiometabolic Risk Factors } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15974},
month = {September},
pages = {570 -- 579}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper presents a novel model, Self-Propagative Multi-task Learning (SePL), which leverages the prediction of six relatively simple CMR factors as prior knowledge to guide the estimation of more complex CMR factors.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The model integrates multi-task and multi-modal learning, combining fundus images and anthropometric data to effectively estimate CMR-related variables.
- This approach has potential clinical utility for improving CMR risk assessment.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Figure 1 could be improved with appropriate rescaling to enhance clarity and readability.
- The predicted values for gender, age, height, weight, BMI, and waist may not be essential, and the approach would be more convincing if compared against models that use direct anthropometric inputs.
- Since BMI can be directly derived from height and weight, treating it as an independent prediction target may introduce redundancy.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
In the experiment, comparing anthropometric factors is unnecessary, as they are predetermined.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The paper proposes a multi-task learning framework to simultaneously predict 15 cardiometabolic risk (CMR) factors from retinal fundus images. Traditionally, these CMR factors require various measurements (e.g., anthropometric data, sphygmomanometry, and blood tests), making assessment time-consuming and resource-intensive. Previous works relied on training separate models for each factor, which is computationally inefficient.
To address this, the authors introduce the Self-Propagative multi-task Learning (SePL) framework. This method first predicts six relatively simple CMR factors (such as age, height, and BMI) and then leverages these as prior knowledge to guide the prediction of the remaining nine, more complex factors. A key component is the Discriminative Mixture of Experts (DMoE) module, which fuses image features with the initially predicted anthropometric factors using a gating network and expert network structure. The framework is evaluated on a dataset of 5,232 fundus images.
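The gated fusion described above can be illustrated with a minimal sketch. This is not the authors' implementation; the array dimensions, linear experts, and softmax gating are assumptions based on standard mixture-of-experts designs: image features and the predicted priors are concatenated, a gating network produces per-expert weights, and the expert outputs are combined as a weighted sum.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

D_IMG, D_PRIOR, D_OUT, N_EXPERTS = 16, 6, 9, 4  # 6 simple priors -> 9 complex targets

# Hypothetical parameters; a trained model would learn these.
W_gate = rng.normal(size=(D_IMG + D_PRIOR, N_EXPERTS))
W_experts = rng.normal(size=(N_EXPERTS, D_IMG + D_PRIOR, D_OUT))

def moe_forward(img_feat, priors):
    """Fuse image features with predicted priors via a gated mixture of experts."""
    x = np.concatenate([img_feat, priors], axis=-1)      # (B, D_IMG + D_PRIOR)
    gate = softmax(x @ W_gate)                           # (B, N_EXPERTS), rows sum to 1
    expert_out = np.einsum('bd,edo->beo', x, W_experts)  # (B, N_EXPERTS, D_OUT)
    return np.einsum('be,beo->bo', gate, expert_out)     # (B, D_OUT)

img_feat = rng.normal(size=(2, D_IMG))  # batch of 2 fundus feature vectors
priors = rng.normal(size=(2, D_PRIOR))  # 6 predicted anthropometric factors
out = moe_forward(img_feat, priors)
print(out.shape)  # (2, 9): one prediction per complex CMR factor
```

The gating network lets the model weight each expert differently per sample, which matches the "discriminative" selection of relevant priors mentioned in the paper.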
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper introduces an innovative approach that simulates multimodal prediction using only unimodal input (fundus images), by first predicting simple anthropometric features and then propagating them for more complex risk factor estimation.
The single-model multitask setup is more computationally efficient than training one model per target, which is a significant practical advantage in clinical deployment.
The paper is well-written, with clear motivation, thorough architectural explanations, and alignment with clinical reasoning behind the model design.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The comparison with prior state-of-the-art methods is limited. While the cited works span 2018–2022, many rely on older architectures (Inception, MobileNet, DenseNet), whereas the proposed method builds on ConvNeXt (CVPR 2022). This introduces a potential unfair advantage and makes it difficult to isolate the benefits of the proposed framework from those of the backbone architecture.
No statistical testing is performed to support claims of performance improvement. Confidence intervals or p-values comparing the SePL method with baselines are absent, making it hard to assess the significance of observed gains.
The performance boost appears to stem primarily from the multitask learning framework rather than the self-propagative structure. However, without statistical comparisons, this is difficult to confirm.
The clinical relevance of predicting simple anthropometric features such as age, gender, or height from fundus images may be limited, as these values are readily available in most clinical settings.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Is it necessary to use deep learning for predicting features such as age, height, or gender—values that are routinely available in clinical workflows? The use of complex models for predicting these factors may be seen as over-engineering.
The drop in baseline model performance (ConvNeXt) on the anthropometric features is surprising—particularly when simpler models like MobileNet or DenseNet outperform it. Can you explain this discrepancy?
In future work, the integration of imaging and anthropometric data using dedicated multimodal learning frameworks could provide a more robust setup. I recommend checking these papers:
- Pölsterl, S., et al. Combining 3D image and tabular data via the dynamic affine feature map transform. MICCAI 2021
- Grzeszczyk, M.K., et al. TabAttention: Learning attention conditionally on tabular data. MICCAI 2023
- Hager, P., et al. Best of both worlds: Multimodal contrastive learning with tabular and imaging data. CVPR 2023
- Du, S., et al. TIP: Tabular-image pre-training for multimodal classification with incomplete data. ECCV 2024
- Grzeszczyk, M.K., et al. TabMixer: Noninvasive estimation of the mean pulmonary artery pressure via imaging and tabular data mixing. MICCAI 2024
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Despite its limitations in terms of comparative evaluation and lack of statistical analysis, the paper presents a compelling framework for multitask prediction of CMR factors using only fundus images. The idea of simulating multimodal learning by predicting simple factors and leveraging them to guide more complex predictions is innovative.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
Contribution:
- This paper proposes the self-propagative multi-task learning (MTL) framework, which adopts a structured MTL approach to (1) predict easier labels from the input image and (2) predict the harder labels using both the input image and the predicted easier labels. In the second step, a mixture-of-experts approach is used to further improve the performance.
- Experiments on multi-task regression demonstrate both the higher accuracy and the greater efficiency of the proposed model.
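The two-step structure summarized above can be sketched as follows. This is a hypothetical illustration rather than the authors' code; the linear heads stand in for the actual prediction networks, and the dimensions are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
D_FEAT, N_EASY, N_HARD = 32, 6, 9  # shared feature size, 6 easy and 9 hard targets

# Hypothetical linear heads standing in for the real prediction networks.
W_easy = rng.normal(size=(D_FEAT, N_EASY))
W_hard = rng.normal(size=(D_FEAT + N_EASY, N_HARD))

def self_propagative_forward(feat):
    """Step 1: predict easy labels; Step 2: reuse them as extra inputs."""
    easy = feat @ W_easy                                   # (B, 6) easy targets
    hard = np.concatenate([feat, easy], axis=-1) @ W_hard  # (B, 9) hard targets
    return easy, hard

feat = rng.normal(size=(4, D_FEAT))  # shared image features for a batch of 4
easy, hard = self_propagative_forward(feat)
print(easy.shape, hard.shape)  # (4, 6) (4, 9)
```

The key point is that the second head consumes the first head's predictions, so the hard tasks receive the easy predictions as a learned prior rather than as external inputs.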
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The framework is innovative and clear. It utilizes domain prior knowledge to regularize and guide the MTL process and improve performance.
- The computational cost is also significantly lower than that of the baseline models.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Currently, the division into easy/hard tasks seems to require domain knowledge, which potentially limits the applicability of the proposed approach.
- The currently used easier labels (e.g., patient weight and height) are very easy to obtain in practice, which decreases the significance of the proposed approach. For those features, directly using the existing labels from the dataset might give similar performance.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- It would be good to provide more details on how the easy/hard cases are divided.
- It would also be quite interesting to inspect the expert weights and see which ones are more informative, which may bring some new knowledge.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The framework is innovative and clear.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We are deeply grateful for your constructive feedback and will revise the article according to your insightful suggestions.
R1/Q7-1: We believe this is a reasonable and general strategy. In many multi-task medical imaging scenarios, one can identify an easily obtainable target that has a strong correlation with the input. In future applications, one could also use data-driven methods to help determine which tasks to treat as priors, reducing reliance on manual selection.
R1/Q10-1: The case division is based mainly on the fact that anthropometric measures are readily available and intrinsically related to the other CMR factors. We will provide more details in the camera-ready paper.
R1/Q10-2: It would indeed provide insight into which predicted priors are most influential for each complex task. We empirically found that 4 experts show the best performance with reasonable weights. We will provide a figure of expert activations in a future extension of this work.
R2/Q7-1: We will rescale Figure 1 to improve its clarity.
R2/Q7-2: We include BMI as an independent prediction task from a learning perspective. Predicting BMI directly provides an additional supervisory signal that helps the model. There could be image cues (such as certain retinal fat deposit indicators or vascular changes) that correlate with BMI, and vice versa. Interpretability techniques will be adopted in future studies to understand this redundancy.
R3/Q7-1: In an ideal analysis, one would hold the backbone constant and compare across single-task, multi-task, and our self-propagative multi-task learning. Table 3 shows that the single-task learning approach with a ConvNeXt backbone performs worse than the prior state-of-the-art methods that use older architectures (Table 2). This indicates that the proposed SePL benefits from self-propagative multi-task learning rather than from the recently proposed backbone model.
R3/Q7-2: We recognize its importance and will address it in our future work. We will also clarify the consistency of improvements: SePL is never worse than the baselines on any CMR factor in our experiments, which gives us some confidence in the robustness of the improvement.
R3/Q7-3: We understand the importance of statistical validation, and this will be prioritized in our future work. In the proposed SePL, the self-propagation architecture further boosts performance on top of MTL, and consistent improvements are observed across all CMR predictions.
R3/Q10-2: Simpler models (MobileNet, ~4M params; DenseNet, ~20M params) can indeed match or outperform larger models (ConvNeXt, ~28M) on relatively easy prediction tasks, such as anthropometric feature estimation. ConvNeXt has a larger receptive field and more sophisticated architectural components tailored for complex visual patterns, and it might therefore struggle or underperform on these simpler tasks.
R3/Q10-3: We appreciate the forward-looking suggestions. In future work, we plan to explore such multimodal integration; the cited papers will be an excellent guide.
R1/Q7-2, R2/Q7-2/Q12-1, R3/Q7-4/Q10-1: In a routine clinical workflow, these values would indeed be readily available. Our approach is not meant to imply that one should predict them in practice. Rather, we envision a scenario in which a simply obtained fundus photo (e.g., from a portable camera) is analyzed to estimate the cardiometabolic risk profile without any additional tests. The model would output all factors, which would be beneficial for telemedicine and remote screening applications. To this end, our goal is to enable a model to perform comprehensive risk estimation from the image alone, allowing us to benefit from multimodal cues without requiring multimodal inputs. It is also necessary to report how well our model predicts the anthropometric factors, both to verify the effectiveness of the shared feature extractor for those tasks and to ensure that the propagated prior knowledge is accurate.
We will present in-depth comparisons in future extensions of our work.
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A