Abstract

Since the COVID-19 outbreak, global health systems have faced unprecedented challenges, with mechanical ventilation playing a critical role in supporting patients in ICUs. However, precise adjustment of ventilation parameters remains complex, requiring continuous monitoring and personalized interventions by clinicians. This paper introduces a novel formulation of ventilator parameter adjustment as a composite problem involving optimal stopping and subsequent decision optimization, supported by a domain-specific dataset reflecting real-world scenarios. We propose a framework utilizing Large Language Models (LLMs) to enhance interactivity and interpretability, leveraging their extensive clinical knowledge from large text corpora for informed decision-making. The framework addresses two key tasks: developing scheduled prompts for optimal stopping to replicate clinical observation processes and implementing Best Action Imitation Learning for robust ventilator parameter optimization. Experimental results show significant improvements in LLMs’ ability to predict optimal stopping points and optimize decision-making, advancing clinical ventilator control. To our knowledge, this is the first application of LLMs to this dual-task paradigm.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0932_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTeX

@InProceedings{HaoTeq_MVPLLMs_MICCAI2025,
        author = { Hao, Teqi and Tan, Xiaoyu and Li, Bin and Wang, Xuemin and Qu, Chao and Xu, Yinghui and Qiu, Xihe},
        title = { { MVP-LLMs: Optimizing Intervention Timing and Subsequent Decision Support for Mechanical Ventilation Parameter Control Using Large Language Models } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15964},
        month = {September},
        pages = {457--466}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces a novel formulation of ventilator parameter adjustment as a composite problem involving optimal stopping and subsequent decision optimization, supported by a domain-specific dataset reflecting real-world scenarios.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes a framework that uses Large Language Models (LLMs) to enhance interactivity and interpretability, drawing on the clinical knowledge they acquire from large text corpora to inform decisions. The framework addresses two key tasks: developing scheduled prompts for optimal stopping to replicate clinical observation processes and implementing Best Action Imitation Learning for robust ventilator parameter optimization.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper is just applying existing methods with no solid contributions.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (1) Strong Reject — must be rejected due to major flaws

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is just applying existing methods with no solid contributions.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    The authors failed to provide a satisfactory response to my previous comments. Hence, I recommend rejection.



Review #2

  • Please describe the contribution of the paper
    1. Novel approach for introducing LLMs in ventilation parameter optimization - The paper presents a pioneering application of Large Language Models to the control of mechanical ventilation parameters, which can aid medical professionals in making critical decisions.
    2. Evaluation on a specialized dataset - The paper presents a dataset, built with the help of professionals, that comprises multiple ventilation-parameter data points recorded over a period of time.
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper describes a method that uses the patient's treatment history to adjust the ventilation parameters through a sequential decision-making process.
    2. The use of scheduled prompts to encode proximity to the stopping point is well thought out and well explained, suggesting that the method can efficiently identify and predict the stopping point.
    3. Experimental evaluation - The paper presents internal baselines and shows strong improvements in the results with the proposed framework.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Clarity of the methodology - The proposed approach is clearly novel, but it would have more impact if the framework were explained in more detail. The current description does not sufficiently explain how LLMs are used in the proposed framework and is hard to follow.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. This study could pioneer the use of LLMs in mechanical ventilation parameter optimization and help draw attention to the use of AI in this domain.
    2. Lack of clarity - The paper would be strengthened by a more detailed description of the parameter prediction framework.
    3. Analysis - The paper shows that Qwen1.5-14b gives promising results on the optimal stopping task, but when predicting ventilation parameters the model does not perform as well as other techniques. A discussion of this discrepancy would significantly strengthen the paper.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    All my comments have been addressed. I have no further questions.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a framework named MVP-LLMs, which leverages large language models (LLMs) to optimize two critical tasks in mechanical ventilation management: identifying the optimal stopping point and providing subsequent decision support. By formulating clinicians’ decision-making logic as an optimal stopping and policy optimization problem, and collecting a clinical dataset, the authors demonstrate the potential of LLMs to emulate ICU physician workflows. Experimental results show that the use of scheduled prompts and chain-of-thought (CoT) techniques significantly enhances model performance in both stopping point identification and parameter adjustment tasks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This work innovatively formulates mechanical ventilation adjustment as a two-stage process—optimal stopping and decision-making—better aligning with real-world clinical practice.
    2. A clinically validated ICU mechanical ventilation dataset was collected, with parameters derived from expert physician experience, ensuring both representativeness and practical relevance.
    3. Reinforcement learning and chain-of-thought (CoT) methods are employed to effectively enhance the LLM’s ability to predict optimal stopping points while improving clinical interpretability to some extent.
    4. Comprehensive baseline comparisons and ablation studies confirm the effectiveness of incorporating CoT reasoning and historical data.
    5. This study addresses the critical challenge of ventilator parameter adjustment in the ICU, offering substantial clinical value.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The dataset comprises only 165 patients, with limited size and diversity, which may hinder the generalizability of the model.
    2. The reinforcement learning approach relies on a Gaussian-based assumption, which may not fully capture the complexity of real-world clinical scenarios, potentially compromising reliability and interpretability.
    3. The absence of external validation data limits the ability to assess the robustness of the proposed method.
    4. While the method incorporates historical data as a key advantage over traditional approaches, it lacks direct comparison with conventional strategies and other sequential learning models, such as those based on RNNs or Transformers.
    5. In baseline comparisons and ablation studies, the 1.5B and 14B models are based on Qwen2/Qwen1.5, while the 8B model is LLaMA3; differences in model architectures may also contribute to performance variation.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This study demonstrates strong methodological innovation and experimental design, offering valuable insights into the application of AI in clinical decision-making. Despite limitations in data scale and clinical validation, the proposed framework and results represent a significant academic contribution.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Accept




Author Feedback

We sincerely thank the reviewers for their valuable comments and constructive suggestions. Below, we provide point-by-point responses.

  1. Clarity of the methodology and use of LLMs: We appreciate the reviewer’s recognition of the novelty of our dual-stage formulation. To clarify: MVP-LLMs reformulates ventilator adjustment as a sequential decision-making task, consisting of (i) optimal stopping and (ii) parameter adjustment. We leverage LLMs not as black-box predictors, but as structured decision agents using scheduled prompts and Chain-of-Thought (CoT) reasoning to simulate clinical thinking. Specifically, at each step, LLMs receive historical trajectory data and generate both stopping decisions and action recommendations. This process is detailed in Figures 2–3 and Equations (2)–(5). We will further enhance the description with diagrams and concrete examples in the revised version.

  2. Dataset size and diversity: We acknowledge the relatively small cohort (165 patients). However, each patient contributes extended multi-step ventilation records, resulting in 3,000+ sub-trajectories with observation-action transitions. This provides sufficient granularity for training and evaluating sequential strategies. Moreover, all features are selected and verified by ICU professionals to ensure medical validity. We apply regularization, early stopping, and patient-level cross-validation to improve generalization. Expanding the dataset to additional institutions is ongoing work.

  3. Gaussian-based reward assumption: The Gaussian reward curve in Eq. (2) was chosen to approximate positional relevance in stopping decisions, reflecting clinical heuristics (i.e., physicians often intervene near peak response points). While it simplifies the reward space, our empirical results (e.g., Table 2) validate its effectiveness. Scheduled prompts derived from this reward formulation allow LLMs to better localize temporal signals and produce interpretable output. We agree this assumption may not capture all real-world nuances and will discuss alternatives (e.g., learned rewards or mixture models) in the revision.

  4. Absence of external validation: Due to privacy constraints, our study is based on a single-institution dataset. Nonetheless, we mitigate overfitting via stratified patient-level 5-fold cross-validation and reward shaping. All CoT trajectories are reviewed by ICU physicians to ensure clinical reliability. We have initiated external data acquisition through institutional partnerships and plan to report validation results in future work. To further support reproducibility, we also plan to release a portion of the non-sensitive data upon publication.

  5. Comparison with traditional sequential models: We fully agree that comparing with RNN or Transformer baselines can enhance experimental rigor. However, the goal of this work is to explore the interpretability and decision rationality enabled by LLMs’ language capabilities—something conventional models lack by design. Our focus was on using natural language to reflect clinicians’ observation-reasoning-action cycle, rather than purely optimizing numerical performance. Nevertheless, we value this suggestion and will include RNN/Transformer-based baselines in the final version to provide a more comprehensive view of modeling trade-offs.

  6. Architecture inconsistency across models: We appreciate the reviewer’s concern. Our intent was not to isolate model scaling within one architecture, but to explore whether our method generalizes across diverse LLM families (Qwen and LLaMA). Comparisons are made within each model family to ensure fairness (e.g., MVP vs. CoT Prompt in Qwen2-1.5B). Notably, MVP-LLMs consistently outperform corresponding baselines across all sizes. We will clarify this motivation and include additional ablations using Qwen2-only variants in the open-source project.
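
For readers without access to Eq. (2), the Gaussian positional reward discussed in point 3 can be illustrated with a minimal sketch. This is not the authors' implementation: the exact form of Eq. (2) is not reproduced on this page, so the unnormalized Gaussian below, the function name `gaussian_stop_reward`, and the parameters `t_star` (reference stopping step) and `sigma` (tolerance width) are all assumptions made for illustration.

```python
import math


def gaussian_stop_reward(t, t_star, sigma=1.0):
    """Hypothetical reward for stopping at step t.

    Assumption: the reward peaks at the reference stopping step t_star and
    decays symmetrically with distance from it, as an unnormalized Gaussian.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-((t - t_star) ** 2) / (2.0 * sigma ** 2))


# Score every candidate stopping step in a 10-step trajectory whose
# (hypothetical) reference stopping step is 5.
rewards = [gaussian_stop_reward(t, t_star=5, sigma=2.0) for t in range(10)]
best_step = max(range(10), key=lambda t: rewards[t])
```

Under this assumed form, stopping exactly at the reference step scores 1.0, and stopping too early or too late by the same number of steps is penalized equally. That symmetry is one of the simplifications Reviewer #3 questions, and a learned or mixture-based reward would relax it.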

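The patient-level cross-validation mentioned in points 2 and 4 keeps all sub-trajectories of a patient in the same fold, so that no patient contributes to both training and evaluation. A minimal sketch of such a split is given below, under stated assumptions: the helper `patient_level_folds` and the patient identifiers are hypothetical, and the stratification the authors mention is not implemented here because the stratification variable is not described on this page.

```python
import random
from collections import defaultdict


def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Group sample indices into folds so that no patient spans two folds.

    patient_ids[i] is the (hypothetical) patient identifier of sample i,
    for example one sub-trajectory per sample.
    """
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Deal shuffled patients to folds round-robin.
    fold_of = {pid: i % n_folds for i, pid in enumerate(unique)}
    folds = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        folds[fold_of[pid]].append(idx)
    return [folds[k] for k in range(n_folds)]


# Toy example: 10 patients with 3 sub-trajectories each.
patient_ids = [f"P{i:02d}" for i in range(10) for _ in range(3)]
folds = patient_level_folds(patient_ids, n_folds=5, seed=0)
```

Each fold can then serve once as the held-out set while the remaining folds are used for training. Splitting by sub-trajectory instead of by patient would leak patient-specific patterns across the split and overstate generalization.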



Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    The authors should clarify the points raised by Reviewer 1 and Reviewer 3 in their rebuttal.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    While the application is impactful and the conceptual novelty is appreciated, I must note that the methodological section currently lacks the rigor and detail expected for full academic scrutiny. Nonetheless, I remain interested in how the work will evolve.


