Abstract

Artificial intelligence (AI) has shown great potential in medical imaging, yet its adoption in veterinary medicine remains limited due to data scarcity and anatomical complexity. This study introduces EdgeANet, a novel transformer-based edge representation learning network for verifying rotated vertebral bodies in canine thoracic X-ray images. The proposed method integrates a localization module to identify the spinous process, a transformer encoder for global feature extraction using a self-attention mechanism, and an edge encoder that enhances the extraction of fine-grained details, improving classification performance. Experimental results demonstrate that our method achieves superior accuracy, precision, and recall, outperforming state-of-the-art (SOTA) methods with a classification accuracy of 0.7838. Furthermore, the ablation study confirms that the proposed encoders contribute significantly to performance, demonstrating their effectiveness in improving classification accuracy. These findings highlight the importance of multi-scale feature extraction in veterinary imaging and suggest that EdgeANet can be a valuable tool for AI-assisted X-ray verification in veterinary and human medical applications.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1525_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LeeIn_EdgeANet_MICCAI2025,
        author = { Lee, In-Gyu and Oh, Jun-Young and Choi, Hyewon and Kam, Tae-Eui and Lee, Namsoon and Hyun, Sang-Hwan and Lee, Euijong and Jeong, Ji-Hoon},
        title = { { EdgeANet: A Transformer-based Edge Representation Learning Network for Canine X-ray Verification } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15960},
        month = {September},
        pages = {188--198}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    While artificial intelligence (AI) has made significant strides in human medical imaging, its application in veterinary medicine remains relatively limited. This study attempts to bridge that gap by exploring the use of AI for detecting vertebral rotation in canine thoracic X-rays—a task complicated by anatomical variability and scarce labeled data. Although the challenges remain substantial, this work represents a meaningful step toward introducing AI into real-world veterinary diagnostic settings. As such, it contributes to broadening the scope of AI research into less-studied domains like pet healthcare.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper presents a well-organized pipeline that brings together existing components in a cohesive manner tailored for the task. YOLOv10 is employed in the localization stage to efficiently identify the spinous process, leveraging its fast and accurate detection capabilities. A transformer encoder is then used to extract global contextual features, allowing the model to capture the overall anatomical structure. At the same time, Canny edge detection is applied to highlight local structural details, which are further encoded through a dedicated edge encoder. These two complementary feature types—global and edge-level—are fused to inform the final classification. While the individual components are not novel, the integration is thoughtfully designed to address the challenges of spinous process verification in X-ray images.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    One major limitation of the study lies in the relatively small dataset size, with only 190 thoracic X-ray images across multiple species. Such a limited sample may be insufficient for training deep learning models effectively, particularly for tasks involving fine-grained anatomical classification. In addition, due to the challenges of accurately localizing the spinous process—especially in low-contrast and overlapping anatomical regions—the dataset required manual annotation by radiology specialists. The manual labeling step also introduces potential variability, depending on the annotator’s interpretation and experience. These factors highlight the need for larger-scale datasets and more robust, automated labeling techniques in future work.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The authors show that their proposed model outperforms state-of-the-art (SOTA) models; however, they have not conducted comparative experiments with recent hybrid models such as CoAtNet and Hybrid-ViT. Given that these models combine the strengths of Convolutional Neural Networks (CNNs) and Transformers, it is essential to evaluate the proposed model against them to comprehensively assess its performance. For instance, CoAtNet has demonstrated superior accuracy by effectively integrating convolution and attention mechanisms. Similarly, Hybrid-ViT models have shown promising results in various image classification tasks. Therefore, incorporating such hybrid models into the comparative analysis would provide a more robust validation of the proposed model’s efficacy.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the paper contributes to expanding AI research into less-studied domains like pet healthcare, it has notable limitations. The dataset comprises only 190 thoracic X-ray images from various species, which is relatively small for training deep learning models effectively. This limited data volume may hinder the model’s ability to generalize well to unseen cases. Additionally, the approach involves integrating existing components—YOLOv10 for localization, a transformer encoder for global feature extraction, and an edge encoder utilizing Canny edge detection for fine-grained feature extraction. While this integration is well-executed, it does not introduce novel methodologies. Furthermore, the study lacks comparative experiments with recent hybrid models that combine Convolutional Neural Networks (CNNs) and Transformers, such as CoAtNet and Hybrid-ViT. These models have demonstrated superior performance by effectively merging convolutional and attention mechanisms. Evaluating the proposed model against these hybrids would provide a more comprehensive assessment of its performance.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    While the dataset size is limited and the proposed model primarily integrates existing components—hence offering limited methodological novelty—I acknowledge that the application of AI to veterinary imaging, a clinically meaningful yet underexplored domain, represents a valuable extension of research boundaries. Moreover, the authors’ effort to ensure annotation reliability was particularly impressive. The labeling process involved eight veterinarians with 1–5 years of clinical experience and 1–2 years of specialization in veterinary radiology, serving as a well-considered approach to enhance objectivity and reduce bias. Especially in the veterinary field, where high-quality annotated data remains scarce, this kind of initiative could stimulate future research and dataset development, and help broaden the application scope of the MICCAI community.



Review #2

  • Please describe the contribution of the paper

    The manuscript introduces EdgeANet, a novel transformer-based edge representation network for vertebra classification in canine X-rays. It combines YOLOv10 for localization, a transformer encoder for global features, and Canny edge detection for fine-grained details, outperforming other deep learning models.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel Transformer-Based Edge Representation Approach: The introduction of EdgeANet, which fuses global features from a transformer encoder with fine-grained edge features from Canny edge detection, is a unique and promising approach for vertebra classification in canine X-rays.

    2. Multi-Stage Feature Extraction: The combination of YOLOv10 for localization, a transformer encoder for global feature extraction, and edge-based fine-grained features adds a multi-scale feature representation, which can improve classification accuracy.

    3. Comprehensive Benchmarking: The study compares EdgeANet with multiple state-of-the-art deep learning models, including Swin Transformer, ResNet, DenseNet, EfficientNet, and ConvNeXt, demonstrating its superiority in classification performance.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Small Dataset Size: The dataset consists of only 90 abnormal and 100 normal images, which is extremely limited for training deep learning models, especially transformers that require large-scale training data. There is no discussion on data augmentation, which could have improved generalization. The authors should justify how they mitigated the risk of overfitting with such a small dataset.

    2. Lack of Justification for Using Canny Edge Detection: The study only employs Canny edge detection for fine-grained feature extraction, without exploring other edge detection techniques such as Sobel, Laplacian, or structured edge detection methods. A justification for choosing Canny edge detection over other methods is necessary. Additionally, an ablation study comparing different edge detection techniques could strengthen the paper.

    3. Potential Bias in Data & Evaluation: There is no mention of how the dataset was collected, annotated, and balanced. If the images are from a single source, the model may not generalize well to other clinical settings. The authors should include cross-validation or external validation on an independent dataset to assess model robustness.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major factors that influenced my overall score for this paper are its novelty, methodological soundness, and thorough experimental validation. The proposed EdgeANet presents a unique transformer-based edge representation approach that effectively combines global contextual information from a transformer encoder with localized fine-grained features via Canny edge detection. This hybrid design is particularly well-suited for challenging tasks such as vertebra classification in canine X-rays, where both global structure and local details are critical. Additionally, the multi-stage feature extraction pipeline—incorporating YOLOv10 for localization, transformers for semantic understanding, and edge-based refinement—demonstrates a well-thought-out integration of complementary techniques. The authors also provide a comprehensive comparative analysis against several state-of-the-art models (e.g., Swin Transformer, ResNet, DenseNet, EfficientNet, ConvNeXt), showcasing the superiority of their proposed method across multiple evaluation metrics. This solid benchmarking supports the robustness and practical relevance of the work. Overall, the combination of methodological innovation, practical application, and strong empirical results justifies a positive recommendation.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The primary contribution of this work is EdgeANet, a novel architecture that fuses transformer-based global feature learning and edge-based fine-grained structural representation to detect rotated vertebral bodies in canine thoracic X-rays. The paper is positioned in the underexplored area of veterinary radiology and demonstrates how combining localization (YOLOv10), vision transformer encoders, and Canny-based edge encoding can yield significant improvements over conventional CNN and transformer models.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The integration of a transformer encoder with an edge-focused module is a thoughtful approach, especially for fine structural variations in radiographic images.

    The task of X-ray quality verification, particularly identifying rotational errors, is meaningful in veterinary medicine where such quality control is crucial. Veterinary imaging is less explored in MICCAI literature. The paper addresses a real diagnostic challenge.

    The model outperforms strong baselines such as Swin Transformer, ConvNeXt, and EfficientNet across accuracy, precision, recall, and F1-score. The ablation study further confirms the effectiveness of each module.

    The task is well defined, with sufficient context on why such quality verification is necessary in veterinary practice.

    Using YOLO for localization and visualizing confusion matrices provides a degree of transparency useful for clinical acceptance.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    While the combination of transformer and edge encoding is sensible, each individual component is derived from existing methods (YOLOv10, ViT, Canny). The novelty is more in integration than in algorithmic innovation.

    While the use of Canny detection is stated, it’s unclear whether edge features are used as auxiliary inputs, concatenated, or merged via attention—more clarity is needed.

    Although performance differences are reported, statistical significance between methods is not analyzed, which is critical given the small dataset of 190 cases. While results are promising, a small test set and lack of prospective or external validation reduce the strength of the claim of robustness.

    Implementation and methodological details are sometimes missing: training times, hardware used, image preprocessing specifics (e.g., resolution, normalization), and augmentation parameters.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Consider clarifying whether the Canny-processed edge image is fed as a separate channel or merged differently.

    An error analysis—especially false positives or negatives—could help improve clinical insight.

    Future iterations might incorporate explainable AI methods (e.g., Grad-CAM) to highlight which regions influenced classification. This may ease clinical acceptance and provide valuable feedback for the next generation of model design.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper tackles an under-explored but clinically relevant problem in veterinary imaging and proposes a well-motivated model with competitive performance. However, the limited dataset size, lack of external validation, and absence of code/data availability raise concerns. Addressing the issues during rebuttal (especially about edge encoding and dataset generalization) could raise its impact.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    Although large revisions have been proposed to clarify dataset limitations, justify edge-detection choices, and detail implementation specifics, the manuscript remains constrained by the very small sample size, the lack of systematic comparison of alternative edge filters, and the absence of external validation; consequently, greater empirical support and methodological transparency would be required before acceptance could be recommended.




Author Feedback

We sincerely thank the reviewers for reviewing our manuscript and providing valuable feedback. We have carefully considered all suggestions and have planned detailed revisions to address the reviewers’ concerns, as outlined below.

  1. Small Dataset, Data Acquisition, and External Validation: We agree that our dataset is limited; however, this reflects the practical realities of veterinary clinical imaging. We proactively managed labeling bias through multi-veterinarian validation. Traditional augmentation methods were impractical because of discrepancies with clinically obtained data; future studies may explore synthetic augmentation methods such as GANs. These limitations and future directions will be clearly articulated in the revised manuscript’s “Dataset” section. Radiographs were obtained using appropriate exposure settings based on abdominal thickness measurements (200–300 mA, 60–80 kVp, and a focal-film distance of 100 cm). Thoracic radiographs were taken with a digital radiographic system (Toshiba Rotanode™, Tokyo, Japan) and processed with a radiographic system (BLADE v1., Median International Co., Anyang, Korea). To mitigate the annotator bias highlighted by Reviewer #2, and to ensure objectivity in the absence of external validation datasets, labeling was conducted through rigorous multi-rater validation by eight veterinarians, each with 1–5 years of clinical experience and 1–2 years of specialization in veterinary radiology. The labeling criteria were based on the “Textbook of Veterinary Diagnostic Radiology” by Donald Thrall, one of the most authoritative references in veterinary diagnostic imaging. Our dataset includes only cases confirmed by at least six of the eight veterinarians, further ensuring objectivity. Moreover, we performed 4-fold cross-validation and monitored loss curves to control and mitigate potential overfitting. We assessed traditional augmentation methods but concluded that they lack clinical applicability; we may explore generative models to improve generalization in future work. These details will be clarified in the revised manuscript’s “Dataset” section.

  2. Justification for Canny Edge Detection and Implementation Details: We agree that a clearer justification for the use of Canny edge detection and additional implementation details are needed. Canny edge detection was selected for its effective Gaussian noise filtering and its non-maximum suppression with adjustable hysteresis thresholds. These characteristics enabled superior qualitative edge detection on X-ray images and significantly improved model performance. Representative visual examples demonstrating the superiority of Canny edge detection will be provided in the “Supplementary Materials”. Our experiments used a Ryzen 7 7800X3D CPU and four RTX 4090 GPUs. Hyperparameters were empirically tuned based on initial experiments to ensure an optimal balance between sensitivity and specificity. For example, we set the lower and upper thresholds for Canny edge detection to twice and 2.5 times the median of the channel pixel values, respectively. Detailed implementation settings, a comparison of edge-detection methods, and explicit descriptions of the integration process will be provided in the “Methods” and “Supplementary Materials” sections of the revised manuscript.

  3. Additional Experiments: While the direct application of Grad-CAM was limited by our network’s FC fusion architecture, we appreciate this valuable suggestion. This limitation arises because our model improves performance by fusing features extracted from the FC layers at the end of the two encoders, which discards the spatial activation maps that Grad-CAM relies on. We will document this limitation. In addition, comparisons were conducted using SOTA models on the same benchmark dataset; however, hybrid models were not included in this comparison. We will incorporate a more detailed analysis including these models in future studies. Error analysis details will be systematically added to the “Supplementary Materials”.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper presents a method combining transformer-based global features with edge-based local features for classifying vertebrae in canine X-rays. The topic is interesting and the application area is less explored. However, the contribution is quite limited, and several concerns from reviewers were only partly addressed in the rebuttal.

    Main issues include a small dataset, missing comparisons with recent related methods, and lack of clear technical details. While the idea has potential, the paper does not yet meet the level of novelty, clarity, and thorough evaluation we expect for acceptance.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


