Abstract
Surgical phase recognition plays a crucial role in surgical workflow analysis, enabling various applications such as surgical monitoring, skill assessment, and workflow optimization. Despite significant advancements in deep learning–based surgical phase recognition, these models remain inherently opaque, making it difficult to understand how they make decisions. This lack of interpretability hinders trust and makes it challenging to debug the model. To address this challenge, we propose SurgX, a novel concept-based explanation framework that enhances the interpretability of surgical phase recognition models by associating neurons with relevant concepts. In this paper, we introduce the process of selecting representative example sequences for neurons, constructing a concept set tailored to the surgical video dataset, associating neurons with concepts and identifying neurons crucial for predictions. Through extensive experiments on two surgical phase recognition models, we validate our method and analyze the explanation for prediction. This highlights the potential of our method in explaining surgical phase recognition. The code is available at https://github.com/ailab-kyunghee/SurgX.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2547_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/ailab-kyunghee/SurgX
Link to the Dataset(s)
Cholec80 dataset: https://camma.unistra.fr/datasets/
BibTex
@InProceedings{KimKa_SurgX_MICCAI2025,
author = { Kim, Ka Young and Kim, Hyeon Bae and Kim, Seong Tae},
title = { { SurgX: Neuron-Concept Association for Explainable Surgical Phase Recognition } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15969},
month = {September},
pages = {531--541}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper presents a concept-based explanation framework for surgical phase recognition. The authors aim to enhance the interpretability of existing surgical phase recognition models by mapping their internal neuron activations to domain-specific, human-understandable concepts.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper addresses a critical challenge in the field - explainability and interpretability of deep learning models in surgical phase recognition.
- It is well-structured and clearly written, making it easy to follow the proposed methodology.
- The authors make a commendable effort to include quantitative evaluations of explainability, rather than relying solely on qualitative results, which is a common limitation in this kind of work.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- A key assumption is that the visual features extracted by the Visual Encoder are aligned with those learned by the phase recognition models. However, the experiments do not clearly validate this assumption.
- The framework appears to focus on positively correlated neuron activations. It remains unclear whether neurons with strong negative activations, which could also provide meaningful explanations, are considered.
- The method for handling ties in representative sequence selection and neuron contributions is not discussed. Clarification on how such ties are resolved would improve transparency.
- The accuracies of the underlying phase recognition models are not reported. Including these would provide valuable context for interpreting the results.
- The cosine similarity values reported across experiments are relatively modest (<0.7). An analysis or discussion of potential reasons behind these values and their implications would strengthen the paper.
- Further details are needed on how the threshold for neuron-concept annotation was chosen and computed.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The method is interesting and addresses a key problem in CAI. However, one of the main assumptions needs to be tested. Also, clarifications and intuitions for the quantitative results should be provided. Therefore, a chance for rebuttal should be provided.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
One of my main concerns was the potential mismatch between the feature spaces of the visual encoder and the phase-recognition model: without explicit alignment, how can we trust that a concept tied to the encoder’s embedding truly corresponds to a given neuron of the phase recognition model? The authors address this convincingly in their rebuttal. By operating directly on the video frames and ensuring a sufficiently diverse frame set, they argue that a neuron’s activation pattern remains reliably linked to meaningful visual concepts. Their ablation study, which compares several frame-selection strategies, further supports this point: strategies that draw on a broader, more varied frame set yield clearer neuron–concept matches. As my major concerns are addressed in the rebuttal, I would recommend an accept. However, I would urge the authors to include these justifications in the camera ready version of the paper for better comprehension.
Review #2
- Please describe the contribution of the paper
The paper introduces SurgX, a concept-based explanation framework designed to enhance the interpretability of surgical phase recognition models. By associating neurons in deep learning models with human-understandable surgical concepts, SurgX provides explanations for model predictions. This is the first work to apply concept-based explainability to surgical phase recognition, addressing the “black-box” nature of deep learning models in this domain. The framework includes methods for constructing surgical-specific concept sets, selecting representative neuron sequences, and identifying influential neurons for predictions. The approach is validated on two models (Causal ASFormer and TeCNO) using the public dataset, showing its ability in explaining both correct and incorrect predictions.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
New idea in concept-based explanation framework for surgical phase recognition. While concept-based explainability has been explored in image-based models, this is the first adaptation for surgical video models, which involve temporal dependencies and clinical specificity.
Addresses a critical gap in surgical AI, where interpretability is essential for regulatory compliance and clinical trust.
Specialized concept sets: Introduces three tailored concept sets (CholecT45-W, CholecT45-S, ChoLec-270) for cholecystectomy, leveraging surgical action triplets and lecture-derived terminology.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Evaluation metrics of Concept Alignment Score and Prediction Interpretability Score are based on cosine similarity in SurgVLP’s embedding space. This assumes SurgVLP’s embeddings perfectly align with clinical semantics, which may not hold.
The paper lacks deeper discussion of the experimental results, especially the analysis of incorrect predictions. For example, consider the description “In the right example of (b), the neuron which is annotated with ‘hepatocystic triangle’ and ‘cystic artery is isolated between clips’ led to misprediction of ‘(3) Clipping and Cutting’.” The authors should elaborate on whether ‘hepatocystic triangle’ and ‘cystic artery is isolated between clips’ are indeed easily mistaken in these frames, and explain the connection among the concept sets, the image frames, and the ground truth. This would further demonstrate the interpretability of the method and its clinical meaningfulness.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Authors should indicate whether they will release the datasets and code, which will be considered as important contributions.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The overall idea of the method is considered novel and inspiring. It would be better if authors can address the major weaknesses of the paper.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
This paper introduces SurgX, a novel concept-based explanation framework specifically designed for surgical phase recognition models, which is the first study to apply concept-based interpretability to this domain. SurgX offers a new lens to understand model decision-making in surgical video analysis. The paper presents a method for constructing specialized concept sets tailored to a cholecystectomy dataset, enabling more meaningful associations between learned concepts and neural activations. Finally, SurgX is validated on two state-of-the-art models, demonstrating that the concept-neuron associations it identifies are both interpretable and valuable for explaining model predictions.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
A key strength of this work lies in its novel approach to enhancing the interpretability of surgical phase recognition models through concept-based explanations. SurgX provides a method to systematically identify neurons that contribute to model predictions and associate them with medically relevant concepts.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
While the paper presents a compelling framework for interpreting surgical phase recognition models, there are several areas that could be improved. First, the motivation for associating concepts with individual neurons requires further justification. It is unclear why single-neuron associations are assumed, especially when deep learning models typically rely on distributed representations and multiple neurons normally contribute to predictions. Additionally, intuitively, one might expect the most activated neurons to be the most relevant, yet this is not fully explained. Another limitation is the choice of evaluation models. Both ASFormer (2021) and TeCNO (2020) are relatively old. It limits the significance and generalizability of the results to newer architectures. VLMs are increasingly explored for video recognition tasks. Finally, the paper does not report the performance of the two models on the Cholec80 dataset, making it difficult to assess the effectiveness of the proposed SurgX method.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- Are only neurons in the penultimate layers annotated?
- How are the example frames selected in Section 2.2? Are they randomly sampled or manually selected?
- In figure 3, why only identifying the best-contributing neuron? Shouldn’t model predictions be attributed to some other neurons as well? If only identifying one best-contributing neuron, wouldn’t it be the neuron with the highest value after activation?
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While the proposed SurgX framework introduces a new method for improving interpretability in surgical phase recognition, the method lacks some justifications and clarity in several areas. The rationale for associating concepts with single neurons is underexplained, especially given the distributed nature of neural representations. Additionally, the two models from 2020 and 2021 limit the broader relevance of the findings, particularly as newer architectures and VLMs are gaining traction in video understanding.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank all reviewers for their valuable comments and constructive feedback. The reviewers recognized that our concept-based explanation framework for surgical phase recognition, SurgX, is the first (R1), novel (R3), and inspiring (R1), addressing a key challenge in surgical AI (R2). We also appreciate that R2 acknowledged our efforts to quantitatively evaluate SurgX. We will address all comments in the camera-ready version and release the code and concept sets.
[R2,R3] Surgical Phase Recognition Models Most phase recognition models use TCN or Transformer architecture. To demonstrate SurgX’s generality, we selected TCN-based TeCNO and Transformer-based Causal ASFormer. Causal ASFormer combined with LoViT [15] features achieved competitive accuracy among models with publicly available code. The accuracy of the model was 91.22 on Cholec80.
[R2] Alignment Assumption Clarification R2 noted that we assumed the Visual Encoder’s features align with features learned by the phase recognition models. However, SurgX does not make this assumption. Instead, we select frames that strongly activate specific neurons, then extract features from those frames using SurgVLP’s Visual Encoder for neuron-concept annotation, avoiding embedding misalignment between the models.
[R3] Associating with Individual Neurons SurgX is inspired by [1,3,9,12,11] that successfully explained models using single-neuron associations. Multiple neurons can share a concept, but associating concepts with individual neurons still accounts for this. Although Fig.3 shows only the best-contributing neuron due to space limitations, multiple neurons sharing the same concept can be selected as Important neurons.
[R3] Relevance Definition While R3 noted that “one might expect the most activated neurons to be the most relevant, yet this is not fully explained”, SurgX defines relevance using the Shapley value-based contribution score (Eq.2) [2,10], rather than activations.
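To illustrate why a contribution score can differ from raw activation, the following is a minimal sketch of exact Shapley values on a toy linear classification head. It is not the paper's Eq. 2, which is not reproduced on this page; the three neurons, their activations, the class weights, and the choice to "remove" a neuron by zeroing it are all hypothetical assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, value):
    """Exact Shapley values for n players, given value(frozenset) -> float."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical penultimate-layer activations and weights for one phase logit.
activations = [2.0, 0.5, 1.0]
weights = [0.1, 1.5, -0.4]


def logit(active):
    # Assumption: neurons outside the coalition are zeroed out.
    return sum(weights[j] * activations[j] for j in active)


phi = shapley_values(3, logit)
most_active = max(range(3), key=lambda j: activations[j])
most_contributing = max(range(3), key=lambda j: phi[j])
print(most_active, most_contributing)
```

In this toy setting the most activated neuron (index 0) is not the one with the largest contribution (index 1), because the class weight on neuron 0 is small. For a purely linear head the Shapley value of each neuron reduces to its weight times its activation.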
[R3] Annotation Layer Selection While SurgX can be applied to all layers, we specifically chose the penultimate layer because, according to [3], neurons closer to the final layer tend to learn high-level concepts.
[R2] Focusing on Positive Neuron Activations SurgX uses the ReLU activations, so negative activations do not exist. Since ReLU activations close to 0 are interpreted as inactivity rather than negative activity, we focused only on positive activations.
[R2] Neuron-Concept Annotation Threshold Following [12], the adaptive threshold for the i-th neuron in the l-th layer is calculated as: T(l,i) = max(A(l,i)) - (1 - α) × (max(A(l,i)) - min(A(l,i))), where A(l,i) denotes the activations of that neuron.
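The threshold formula can be sketched in a few lines of Python. The activation values and the value of α below are hypothetical; the rebuttal does not state which α was used, and the strict ">" comparison is an assumption based on the word "exceeding".

```python
def adaptive_threshold(activations, alpha):
    """T = max(A) - (1 - alpha) * (max(A) - min(A)) for a single neuron."""
    a_max = max(activations)
    a_min = min(activations)
    return a_max - (1.0 - alpha) * (a_max - a_min)


def frames_above_threshold(activations, alpha):
    """Indices of frames whose activation strictly exceeds the threshold."""
    t = adaptive_threshold(activations, alpha)
    return [i for i, a in enumerate(activations) if a > t]


# Hypothetical per-frame activations of one neuron (post-ReLU, so >= 0).
acts = [0.0, 0.2, 1.0, 0.7, 0.1, 0.95]
alpha = 0.9  # hypothetical value, not taken from the paper

threshold = adaptive_threshold(acts, alpha)
selected = frames_above_threshold(acts, alpha)
print(threshold, selected)
```

With α = 1 the threshold equals the neuron's maximum activation, and with α = 0 it equals the minimum, so α controls how close to the peak a frame must be to count as representative.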
[R2] Ties Since all sequences and neurons exceeding the threshold are selected, tie issues do not arise.
[R3] Example Frames Selection For each neuron, we automatically select frames exceeding the activation threshold and the preceding N frames at intervals of K.
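One possible reading of this selection rule is sketched below: for every frame above the threshold, take that frame together with N earlier frames spaced K frames apart. The function name, the parameter names, the handling of indices before the start of the video, and the toy values are assumptions, not details confirmed by the rebuttal.

```python
def example_sequences(activations, threshold, n_prev, stride):
    """Build one frame-index sequence per frame that exceeds the threshold.

    Each sequence holds the triggering frame t and up to n_prev earlier
    frames at t - stride, t - 2*stride, ..., in chronological order.
    """
    sequences = []
    for t, a in enumerate(activations):
        if a <= threshold:
            continue
        idxs = [t - k * stride for k in range(n_prev, -1, -1)]
        # Assumption: indices before the first frame are simply dropped.
        idxs = [i for i in idxs if i >= 0]
        sequences.append(idxs)
    return sequences


# Hypothetical activations of one neuron over eight frames.
acts = [0.1, 0.0, 0.3, 0.2, 0.9, 0.1, 0.0, 0.95]
seqs = example_sequences(acts, threshold=0.8, n_prev=2, stride=2)
print(seqs)
```

Here frames 4 and 7 exceed the threshold, giving the sequences [0, 2, 4] and [3, 5, 7].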
[R1] Reliability of Evaluation Metrics Concept Alignment and Prediction Interpretability Score are calculated as the cosine similarity between concepts and phase texts, thereby avoiding the primary misalignment issue between cross-modalities inherent to VLMs. Additionally, for reliability, we average word- and sentence-level similarities.
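As a rough sketch of a text-to-text score of this kind, the snippet below computes cosine similarity between a concept embedding and two phase-text embeddings and averages the two values. The vectors are made-up placeholders standing in for text-encoder outputs, and treating the "word-level" and "sentence-level" similarities as two separate phase-text embeddings is an assumption about how the averaging is done.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def averaged_similarity(concept_vec, phase_word_vec, phase_sentence_vec):
    """Mean of the word-level and sentence-level cosine similarities."""
    word_sim = cosine(concept_vec, phase_word_vec)
    sentence_sim = cosine(concept_vec, phase_sentence_vec)
    return 0.5 * (word_sim + sentence_sim)


# Placeholder 2-D embeddings; real text embeddings are much higher-dimensional.
concept = [1.0, 0.0]
phase_word = [2.0, 0.0]      # same direction as the concept
phase_sentence = [0.0, 3.0]  # orthogonal to the concept

score = averaged_similarity(concept, phase_word, phase_sentence)
print(score)
```

Because both inputs to each similarity are text embeddings, no image embedding enters the score, which is the point the rebuttal makes about avoiding cross-modal misalignment.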
[R2] Analysis of Cosine Similarity R2 noted that our reported cosine similarities seem modest. While cosine similarity ranges over [-1, 1], its distribution can vary across VLMs and concept sets. On ChoLec-270, the similarity distribution has mean 0.19 and standard deviation 0.20; a score of 0.6 lies in the top 2.27%.
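The scale of that figure can be checked with a short calculation. The sketch below assumes the similarities are roughly normally distributed, which the rebuttal does not state; it also does not say how the 2.27% was obtained.

```python
import math


def z_score(x, mean, std):
    """Number of standard deviations that x lies above the mean."""
    return (x - mean) / std


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Mean and standard deviation quoted in the rebuttal for ChoLec-270.
z = z_score(0.6, mean=0.19, std=0.20)
tail = normal_upper_tail(z)
print(f"z = {z:.2f}, upper tail = {tail:.2%}")
```

Under this approximation a score of 0.6 sits about two standard deviations above the mean, with an upper tail of roughly 2%, which is consistent in magnitude with the quoted 2.27% (the exact normal tail at two standard deviations).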
[R1] Incorrect Prediction Analysis When the GT is “Gallbladder Dissection” but mispredicted as “Clipping and Cutting”, 88.22% of these cases involve neurons associated with the concept “cystic artery is isolated between clips.” In contrast, when correctly predicted, 92.88% of cases do not involve these neurons. In Fig. 3b, the presence of visible clips leads the model to detect the concept “cystic artery is isolated between clips,” resulting in the misprediction.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
Please consider R2’s comment about adding the rebuttal justifications in the camera-ready version, which is an important clarification for the reader.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A