Abstract

Accurately recognizing surgical action triplets in surgical videos is crucial for advancing context-aware systems that deliver real-time feedback, thereby enhancing surgical safety and efficiency. However, recognizing surgical action triplets <instrument, verb, target> is challenging due to subtle variations, complex interdependencies, and severe class imbalance. Most existing approaches focus on individual triplet components while overlooking their interdependencies and the inherent class imbalance in triplet distributions. To address these challenges, we propose a novel framework, Curriculum Contrastive learning with feature Mixup (CurConMix). During pre-training, we employ curriculum contrastive learning, which progressively captures relationships among triplet components and distinguishes fine-grained variations through hard pair sampling and synthetic hard negative generation. In the fine-tuning stage, we further refine the model using self-distillation and mixup strategies to alleviate class imbalance. We evaluate our framework on the CholecT45 dataset using 5-fold cross-validation. Experimental results demonstrate that our approach surpasses existing methods across various model sizes and input resolutions. Moreover, our findings underscore the importance of capturing interdependency among triplet components, highlighting the effectiveness of our proposed framework in addressing key challenges in surgical action recognition. The official implementation is available at https://github.com/MIDAS-SurgAI/CurConMix.
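The abstract names feature mixup as part of the fine-tuning stage, but this page does not give its formulation. As a rough illustration only, the sketch below applies generic mixup (a convex combination of two samples and their labels, with a Beta-distributed coefficient) to feature vectors. The function name `mixup_features`, the `alpha` value, and the toy inputs are assumptions made for this example and are not taken from the paper or its repository.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mixup_features(
    feat_a: Sequence[float],
    feat_b: Sequence[float],
    label_a: Sequence[float],
    label_b: Sequence[float],
    alpha: float = 0.4,  # assumed value; the paper's setting is not stated on this page
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float], float]:
    """Blend two feature vectors and their multi-hot labels with one coefficient."""
    rng = rng or random.Random()
    # Mixing coefficient drawn from Beta(alpha, alpha), as in generic mixup.
    lam = rng.betavariate(alpha, alpha)
    mixed_feat = [lam * a + (1.0 - lam) * b for a, b in zip(feat_a, feat_b)]
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_feat, mixed_label, lam


# Toy usage: two 2-D features with labels for two different classes.
feat, label, lam = mixup_features(
    [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], rng=random.Random(0)
)
```

Blending a frequent-class sample with a rare-class sample yields a soft label that keeps some weight on the rare class, which is the usual intuition for why mixup can help under class imbalance.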

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0872_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/MIDAS-SurgAI/CurConMix

Link to the Dataset(s)

N/A

BibTex

@InProceedings{JeoYon_CurConMix_MICCAI2025,
        author = { Jeon, Yongjun and Shin, Jongmin and Park, Seonmin and Kim, Bogeun and Park, Kanggil and Oh, Namkee and Jung, Kyu-Hwan},
        title = { { CurConMix: A Curriculum Contrastive Learning Framework for Enhancing Surgical Action Triplet Recognition } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15968},
        month = {September},
        pages = {150 -- 159}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This work uses curriculum learning for the first time for surgical triplet recognition. The proposed CurConMix framework analyses the differences between triplets through contrastive learning and solves the problem of limited training samples and class imbalance using the hard pair sampling strategy.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work subtly introduces curriculum learning into a surgical triplet recognition task to model the interrelationships between instances of triplets.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. What is the training cost of curriculum learning from simple to complex tasks?
    2. The authors mention that hard negative and hard positive samples are determined for each instance based on pre-calculated similarity scores and labeling information. But in Fig. 1(a), I don’t see that the positive and negative sample pairs are determined based on the instrument and target values. So I am confused about this part of the operation.
    3. The authors mention the use of self-distillation and mixup techniques in the fine-tuning stage to cope with the class imbalance problem. However, Section 2.2 only mentions the mixup technique while ignoring the introduction of self-distillation. In addition, the authors do not provide the total loss function for model training, which makes the overall constraints of the framework not very clear.
    4. Continuing the above issue, the authors mention that the fine-tuning phase solves the class imbalance problem, but the experimental part does not reflect the relevant ablation study on the improvement of rare triplet recognition. I think this fine-grained ablation study should be included to demonstrate the robustness of the proposed method for mitigating class imbalance.
    5. Although the authors re-trained and tested TERL [1] using the official code, there is a significant drop in the results compared to those in the original paper. I prefer to believe the results provided in the original paper. In this regard, the results of CurConMix-Ens on the APivt metric (40.7) are not significantly superior compared to TERL-Ens (40.4). [1]. Tail-enhanced representation learning for surgical triplet recognition. MICCAI, 2024.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work innovatively introduces curriculum learning on the task of surgical triplet recognition. But, there is a lack of clarity in the description of the training strategy and loss function. In addition, there is a lack of ablation experiments for category imbalance. Further, the experimental results do not have a clear advantage.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper presents a sophisticated pipeline for surgical triplet recognition in cholecystectomy procedures that advances beyond current state-of-the-art approaches. The authors enhance model optimization through an innovative contrastive learning pipeline featuring three key contributions: a hard-negative mining strategy, self-distillation mechanisms, and mixup techniques. The experimental evaluation is methodically structured, comparing their proposed model against established state-of-the-art approaches using a recognized dataset and standard evaluation metrics. This comparative analysis is complemented by a comprehensive ablation study that systematically demonstrates the individual impact of each proposed optimization component. This thorough experimental design effectively validates the merit of each contribution while establishing the overall superiority of their integrated approach in the context of surgical triplet recognition tasks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Although contrastive learning and the optimization strategies employed are not novel in themselves, the authors skillfully integrate these techniques into a triplet recognition pipeline. They provide clear explanations and compelling justifications for each methodological inclusion. The clarity of presentation and logical progression of ideas make the paper engaging and accessible, providing readers with a comprehensive understanding of the technical contributions while maintaining a pleasant reading experience.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Although the paper is well organized and clear and the integration of the key contributions into the state of the art pipeline is nice, I believe that the overall significance of the contribution appears relatively modest.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The contribution of this paper is minor, as it is related to a few simple additions to the optimization pipeline of a non-modified model architecture.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper
    1. A curriculum contrastive learning approach that progressively models triplet component relationships (target → instrument+target → full triplet)

    2. Hard pair sampling with synthetic negative generation to enhance feature discrimination while addressing limited data

    3. Effective class imbalance mitigation through self-distillation and mixup techniques during fine-tuning

    4. State-of-the-art performance on the CholecT45 dataset across multiple model configurations

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Systematic approach to modeling triplet relationships through curriculum progression

    2. Effective class imbalance mitigation strategy, particularly for minority classes

    3. Versatile performance improvements across different model architectures

    4. Comprehensive empirical validation through ablation studies

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The introduction abruptly introduces the concept of triplet representation without sufficient context

    2. The statement that “surgical actions are commonly represented as triplets” appears without proper justification, making it too abrupt for readers

    3. Inadequate explanation of why interdependency modeling is critical compared to independent component recognition

    4. Abstract and introduction sections lack polish and clear flow between concepts

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The results show that the proposed framework improves triplet detection compared to baseline results. The curriculum contrastive concept is an intuitive approach to model the interdependency stepwise. Even though the features are not modeled temporally, it performs very well.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We sincerely thank the reviewers for their thoughtful evaluation and acceptance of our paper. We deeply appreciate the constructive feedback and are encouraged by the recognition of our methodological contributions and experimental rigor.

All reviewers highlighted the methodological innovation of our work. Our paper presents a comprehensive pipeline for Surgical Action Triplet Recognition in laparoscopic cholecystectomy procedures, outperforming existing state-of-the-art methods (R1, R2, R3). Reviewers noted that our work is the first to introduce curriculum learning for this task (R1) and recognized its effectiveness in modeling interdependencies between triplet components (R1, R2). Additionally, our integration of contrastive learning and hard pair sampling with synthetic hard negatives was recognized for enhancing feature discrimination and addressing limited data and class imbalance (R1, R2, R3). Our fine-tuning strategy using self-distillation and mixup was acknowledged as an effective approach for mitigating class imbalance (R2). Reviewer 3 appreciated the clear explanation and solid justification of each methodological component.

The reviewers also found our empirical validation convincing and comprehensive. Our framework achieved state-of-the-art performance across various architectures on the CholecT45 dataset (R1, R2, R3), and demonstrated robust performance even without temporal modeling (R2). Reviewer 3 highlighted our ablation studies as evidence of the framework’s robustness and well-structured experimental design. We are also grateful for the positive comments on the clarity and organization of the manuscript. Reviewer 3 noted that the logical flow and presentation made our technical contributions easy to follow.
We would like to take this opportunity to clarify several points raised during the review:

Curriculum design and training cost [R1]: As described in Section 2.1, our curriculum learning follows a clinically and empirically grounded sequence, <target> → <instrument + target> → full triplet, based on [1], where <target> consistently shows the lowest accuracy. This order reflects real-world clinical reasoning and yielded the best performance among tested alternatives.

Clarification of Fig. 1(a) [R1]: The anchor sample is shown centrally, and positive/negative pairs are constructed based on the labels used at each curriculum stage. For example, in the <target> stage, anchors with “gallbladder” are paired with samples sharing the same target as positives, and with samples having different targets (e.g., “adhesion”) as negatives.

Self-distillation and loss function [R1]: We acknowledge that the paper does not fully describe the self-distillation component or the total loss formulation. We will clarify these details in the revised manuscript.

TERL performance discrepancy [R1]: We reproduced TERL using the official codebase. While slight performance variation may stem from differences in GPU architectures, it is worth noting that the original TERL paper does not specify detailed hardware settings, which may affect exact reproducibility. Nevertheless, CurConMix consistently outperforms all baselines across backbones, demonstrating its robustness.

Writing clarity [R2]: We will revise the manuscript to improve flow and readability.

Once again, we thank the reviewers for their valuable insights.

[1] Nwoye, C.I., Alapatt, D., Yu, T., Vardazaryan, A., Xia, F., Zhao, Z., Xia, T., Jia, F., Yang, Y., Wang, H., et al.: Cholectriplet2021: A benchmark challenge for surgical action triplet recognition. Medical Image Analysis 86, 102803 (2023)
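The label-based pair construction described in the Fig. 1(a) clarification can be illustrated with a small sketch. It assumes each sample carries exactly one triplet, whereas real frames may contain several, and it leaves out the similarity-based hard pair sampling and synthetic hard negatives, whose details are not given on this page. The stage names, field names, and toy samples are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Dict[str, str]

# Triplet components that define the label at each (assumed) curriculum stage.
STAGE_KEYS: Dict[str, Tuple[str, ...]] = {
    "target": ("target",),
    "instrument_target": ("instrument", "target"),
    "triplet": ("instrument", "verb", "target"),
}


def stage_label(sample: Sample, stage: str) -> Tuple[str, ...]:
    """Project a sample onto the components used at the given stage."""
    return tuple(sample[key] for key in STAGE_KEYS[stage])


def split_pairs(
    anchor_idx: int, samples: List[Sample], stage: str
) -> Tuple[List[int], List[int]]:
    """Return indices of positives and negatives for one anchor at one stage."""
    anchor = stage_label(samples[anchor_idx], stage)
    positives: List[int] = []
    negatives: List[int] = []
    for idx, sample in enumerate(samples):
        if idx == anchor_idx:
            continue  # the anchor is never paired with itself
        if stage_label(sample, stage) == anchor:
            positives.append(idx)
        else:
            negatives.append(idx)
    return positives, negatives


# Toy samples, invented for illustration.
samples = [
    {"instrument": "grasper", "verb": "retract", "target": "gallbladder"},
    {"instrument": "hook", "verb": "dissect", "target": "gallbladder"},
    {"instrument": "grasper", "verb": "grasp", "target": "gallbladder"},
    {"instrument": "scissors", "verb": "cut", "target": "adhesion"},
]

target_pairs = split_pairs(0, samples, "target")
instrument_target_pairs = split_pairs(0, samples, "instrument_target")
triplet_pairs = split_pairs(0, samples, "triplet")
```

In this toy set, the positive set for the first sample shrinks as the stage label becomes more specific, which mirrors the progression from coarse to fine distinctions that the rebuttal describes.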




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


