Abstract

We introduce ColonSLAM, a system that combines classical multiple-map metric SLAM with deep features and topological priors to create topological maps of the whole colon. The SLAM pipeline by itself is able to create disconnected individual metric submaps representing locations from short video subsections of the colon, but is not able to merge covisible submaps due to deformations and the limited performance of the SIFT descriptor in the medical domain. ColonSLAM is guided by topological priors and combines a deep localization network trained to distinguish if two images come from the same place or not and the soft verification of a transformer-based matching network, being able to relate far-in-time submaps during an exploration, grouping them in nodes imaging the same colon place, building more complex maps than any other approach in the literature. We demonstrate our approach in the Endomapper dataset, showing its potential for producing maps of the whole colon in real human explorations. Code and models are available at: https://github.com/endomapper/ColonSLAM

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1110_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1110_supp.pdf

Link to the Code Repository

https://github.com/endomapper/ColonSLAM

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Mor_Topological_MICCAI2024,
        author = { Morlana, Javier and Tardós, Juan D. and Montiel, José M. M.},
        title = { { Topological SLAM in colonoscopies leveraging deep features and topological priors } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a method that takes advantage of dense and accurate reconstructions of anatomy from endoscopy image sequences and topological submaps with linear connectivity with respect to time of acquisition. This paper builds on these two foundations by further searching for graph connectivity based on image descriptor similarity within a neighbourhood.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper clearly identifies the problem it is trying to solve - building a topological map of the entire colon from reconstructions of anatomy and topological submaps, rather than building a more meaningful representation of anatomy. This clarification, at the start of the paper, sets clear expectations for the reader, which is helpful. The paper builds its methods on solid foundations.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Methods are hard to follow as pieces of information that are connected to each other appear in disjointed sections of the paper. Evaluation between various methods is hard to compare due to the use of different thresholds.

    Readability is somewhat poor due to the methods and experiments section being interspersed with comments on improvements achieved (which should appear in results) and prior art (which should appear in prior art), respectively.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    From the description, the method described seems hard to reproduce. For instance, few details about the MLP used to determine whether a new image should be assigned to a node or not are provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The clarity in defining the scope of the paper and building on strong foundations is the biggest strength of this paper. Authors should carry over the clarity into explanation of the methods. As mentioned earlier, methods are hard to follow as pieces of information that are connected to each other appear in disjointed sections of the paper. For instance, various aspects of CudaSIFT-SLAM are explained in different sections of the paper, making it difficult to understand a major foundation that this paper builds upon. Another ambiguity is at the explanation of comparison of image descriptors within L described mid page 5. Should this be assumed to be performed by the MLP at the end of page 4? Authors should number the equation g = d_A - d_B and reference the equation on page 5. Similarly, although less disconnected, for score_L computation (mid page 5) and use (further down on page 5). By not numbering these details, they get lost in text and are hard to reference. These types of disconnect make details hard to follow / difficult to find.

    Window size m is described in 6, but it does not seem to appear again. Assuming that window size is the size of omega, authors should describe it as such so that it is clear to the reader.

    Why are different th_sim used for different methods being compared? It would seem that low th_sim would result in lower recall for these methods, weakening the claim of the presented method having highest recall. It would make sense to fix the threshold to be able to properly compare these methods. If not, please include an explanation for the different thresholds.

    The sentence “All approaches except LightGlue improve their precision significantly when the topological prior is applied” does not seem to match values in Table 1 (0.97 > 0.95). Additionally, if there is no test for significance described, authors should avoid statements about significance in improvement. Please also describe the underline/bold in Table 1.

    Readability comments:

    • ‘previous’ and ‘posterior’ both mean behind. Do you mean anterior and posterior, meaning in front and behind, in terms of links per node? If so, please rectify this throughout the text.
    • ‘leverage on’ can be replaced with ‘leverage’ throughout the text
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall lack of clarity and reproducibility, as well as the way in which evaluations are performed (specifically with varying thresholds for different methods), would require substantial changes.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Authors have sufficiently addressed all major concerns. The main concern about comparisons between different methods using different thresholds has been explained by authors and does not need them to repeat the experiment.



Review #2

  • Please describe the contribution of the paper
    • The paper proposes a metric-topological SLAM system called ColonSLAM that can map the whole colon and create a complex graph.
    • The paper also proposes a visual place recognition network.
    • The paper evaluates with Endomapper data, where it shows that the method can build complex maps that cover the entire colon.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper presents a first metric-topological SLAM system called ColonSLAM that can map the whole colon and create a complex graph.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The organization of the Related Work section, especially in the latter two subsections, could benefit from restructuring for better clarity and flow.

    • The selection process for the two sequences used as ground truth appears to be unexplained. Describing how these sequences were chosen would clarify the rationale for the evaluations conducted.

    • The Conclusions section could benefit from a more thorough discussion on the implications of the findings.

    • While a limitation is noted, its impact on the research or potential future directions is not explored. An expanded discussion regarding how this limitation affects the findings and future research possibilities would be valuable.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Related Work

    • “Despite its simplicity, Colon Mapper is … same patient.” Stating the connection between “ColonMapper” and the work presented in this paper would be beneficial.
    • The pronoun “They” in “They build a graph” is ambiguous; naming the referent would enhance clarity.
    • “We empirically found that..” The inclusion of empirical findings in the Related Work section is unusual. The authors might consider relocating these findings to sections that could be more contextually relevant.

    Node Building

    • “CudaSIFT-SLAM, recent feature-based SLAM …” may seem redundant as it is described in multiple sections.

    Experiments

    • Details regarding the manual labeling process would be useful.
    • In “We manually labelled”, the description about the individuals involved would be useful as well.
    • The term “with enough matching power” is vague. A definition would be helpful.
    • In Table 1, the significance of bolded and underlined values is not explained. A brief description or legend would improve the interpretability of the table.

    Some minor typos were found:

    • “cronological” in section 3.3
    • “recall an being” in section 4.2
    • “perfromance” in section 4.2

    • Figures 3 and 4 in the supplementary material are not referenced in the text
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper requires substantial improvements in organization, clarity, and flow. These enhancements are essential for improving readability and clearly articulating the research’s contributions and implications.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have successfully addressed the major concerns.



Review #3

  • Please describe the contribution of the paper

    Colonoscopic video sequences are decomposed in ‘submaps’, consistent small disjoint 3D maps. The contribution is to create a topological graph, deciding if a submap belongs to an already explored node (covisible submaps) or if a new node must be created.

    1. Localization Network L: from the difference of their descriptors, a multi-layer perceptron predicts if two images are similar or not. An existing foundational network is specifically retrained.
    2. Application-specific topological connectivity priors (e.g. distance constraints between submaps) limit the search space of candidate nodes.
    3. A transformer-based image matcher is used to speed up the search if enough images of two nodes are similar. This image matcher lacks robustness and submaps can still be classified in the same node based on network L only.

    The method is evaluated on two manually curated sequences. Compared to previous methods, the precision is stable but the recall is significantly improved at similar cost.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • the method is sound, with an elegant combination of existing and fine-tuned methods
    • results show an increase of robustness in comparison with existing methods of the authors, in a very challenging context.
    • qualitative results are very convincing.
    • the related work section is comprehensive, and the proposed approach is clearly justified with respect to existing works
    • overall, the paper is very well written and extremely clear
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    No major weakness honestly, only very minor comments.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Minor details only:

    • Fig. 1, 2: what is the meaning of the green circles next to the nodes? This is eventually described at the end of section 4.2, but for readability this should appear in the figures caption as well.
    • Table 1: the runtime for your full method (L + Topological Prior ° LightGlue) is 25 min?? Do you mean 25s, or even less? (since in the text you state that “LightGlue […] reduces computation time by a 5…”)
    • a couple of misspellings: “continuosly”, “perfromance”
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Sound method, novelty, increased robustness in a very challenging context. No weakness in the method, study, or manuscript.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The paper was already clear in my opinion, and the proposed (minor) modifications are relevant.




Author Feedback

We appreciate the reviewers’ evaluation and constructive feedback. We address the major points raised in their comments.

-Clarity and organization (R1, R4) We agree that paper readability could be improved with some reorganization, but we believe it can be achieved without major modifications. We have adjusted the following:

Main idea: Sec 3.3 now starts with: “ColonSLAM receives a linear topological graph formed by all submaps from CudaSIFT-SLAM. The main idea behind ColonSLAM is to identify which submaps represent the same colon location, merging them into the same node. This observation capability builds traversability links between distant nodes, resulting in a richer graph than the linear one.”

CudaSIFT-SLAM: explanations have been unified in Sec 2, easing the understanding and reducing redundancies.

Comparison of image descriptors (R1): descriptor similarity is obtained by the MLP of our L network. We have slightly rewritten Sec 3.2 for clarification: “Our localization network L predicts if two images come from the same place or not, and we use it to determine if the incoming submap is already included in the map. The network is composed of a backbone and a 5-layer MLP. The backbone is initialized from the endoscopy foundational model EndoFM, which, for an image $I$, extracts a global descriptor $d \in \mathbb{R}^{768}$. To decide if two images $I_A$, $I_B$ come from the same place, we subtract their descriptors $g = d_{A} - d_{B}$ and feed $g$ to the MLP followed by a softmax, predicting if they are similar or dissimilar. We fine-tune the last two layers of the backbone and the MLP using a cross-entropy objective. Training details are explained in Sec. 4.1.”

We have formulated the acceptance conditions in 3.3 as numbered equations for easy reference.
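
For readers who want a concrete picture of the descriptor comparison quoted from Sec 3.2, the following is a minimal sketch and not the released ColonSLAM code. It assumes only what the rebuttal states: 768-dimensional descriptors, the difference g = d_A - d_B, a 5-layer MLP and a softmax over two classes. The hidden widths, the ReLU activations, the class ordering, the random weights and every function name are illustrative assumptions; the EndoFM backbone is replaced by random vectors.

```python
import math
import random

DESCRIPTOR_DIM = 768                 # descriptor size stated in the rebuttal
HIDDEN_WIDTHS = [256, 128, 64, 32]   # assumption: only the layer count (5) is given
NUM_CLASSES = 2                      # assumption: index 0 = dissimilar, 1 = similar


def make_mlp(rng):
    """Build 5 linear layers with random weights standing in for trained ones."""
    sizes = [DESCRIPTOR_DIM] + HIDDEN_WIDTHS + [NUM_CLASSES]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def same_place_probability(d_a, d_b, layers):
    """Return P(similar) for two descriptors, computed from g = d_A - d_B."""
    x = [a - b for a, b in zip(d_a, d_b)]
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU between layers (assumption)
    return softmax(x)[1]


rng = random.Random(0)
mlp = make_mlp(rng)
# Random vectors stand in for backbone descriptors of two images.
d_a = [rng.gauss(0.0, 1.0) for _ in range(DESCRIPTOR_DIM)]
d_b = [rng.gauss(0.0, 1.0) for _ in range(DESCRIPTOR_DIM)]
p_similar = same_place_probability(d_a, d_b, mlp)
```

With untrained weights the returned probability carries no meaning; the sketch only shows the data flow from two descriptors to a similar/dissimilar score.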

-Reproducibility (R1, R4) We have included a figure detailing the network architecture. Upon acceptance, we will release ColonSLAM source code, the trained models and the submaps extracted by CudaSIFT-SLAM, so results can be fully reproducible.

-Varying thresholds for different methods (R1) We have explained it in Sec. 4.2: “We apply a different threshold for each of the networks as the score distribution given by each network is different. To allow a fair comparison between them, we tuned the best threshold for every network in terms of precision-recall performance.”
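
The per-network tuning described above can be illustrated with a small sketch. It is not the evaluation code of the paper: the scores and labels are made up, the two networks are hypothetical, and choosing the threshold that maximizes F1 is an assumption, since the text only says the threshold was tuned "in terms of precision-recall performance".

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when pairs with score >= threshold are accepted."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(scores, labels):
    """Pick the candidate threshold with the highest F1 (criterion is an assumption)."""
    best_f1, best_th = -1.0, None
    for threshold in sorted(set(scores)):
        precision, recall = precision_recall(scores, labels, threshold)
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        if f1 > best_f1:
            best_f1, best_th = f1, threshold
    return best_th


# Made-up scores for two hypothetical networks with different score ranges.
labels = [True, True, True, False, False, False]
scores_net_a = [0.95, 0.90, 0.85, 0.80, 0.40, 0.30]
scores_net_b = [0.40, 0.35, 0.30, 0.10, 0.05, 0.02]

th_a = best_threshold(scores_net_a, labels)
th_b = best_threshold(scores_net_b, labels)
```

In this toy data both networks separate the pairs perfectly, but at different operating points. Reusing the first network's threshold on the second would reject every pair, which is the kind of mismatch a shared threshold can introduce when score distributions differ.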

-Conclusions (R4) Conclusions have been modified: “We have presented ColonSLAM, the first topological SLAM able to build rich graphs of the whole colon, capturing the complexity of the colonoscopy exploration. Leveraging on our robust localization network and guided by topological priors, ColonSLAM is able to reliably build a graph by finding traversability and covisibility connections between distant nodes. The graphs obtained with ColonSLAM will serve as personalized patient maps, paving the way to assisted navigation and disease monitoring in colonoscopy. In future work, we will focus on finding even longer term relationships i.e. entry-withdrawal and second explorations of the same patient as they are a limitation for ColonSLAM. Finding these long-term correspondences is the key to the building and exploitation of personalized patient maps.”

-Other comments

(R4) Sequence selection and labeling. We have explained this in Sec. 4.2: “We chose the same sequences as ColonMapper, the closest work to ours, easing the comparison. Labeling was done following the text footage available in the Endomapper dataset, created by the doctor during the exploration.”

(R1) Table 1. “All approaches except LightGlue […]” (page 7) corrected to “All approaches improve their precision when the topological prior is applied.”

(R1, R4) We have included “Bold: best. Underlined: second best” in the table caption.

(R4) Connection with ColonMapper, stated in Sec 2: “ColonMapper builds the map and afterwards localizes. Our ColonSLAM performs a proper topological SLAM, simultaneously localizing and updating the map for each new incoming submap.”
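
The idea of localizing and updating the map for each incoming submap can be sketched as a simple loop. This is a toy illustration under stated assumptions, not the ColonSLAM algorithm: `Node`, `nodes_within`, `insert_submap`, the hop limit used as a stand-in for the topological prior, and the acceptance threshold are all hypothetical, the similarity function replaces the localization network, and the LightGlue verification step is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    submaps: list = field(default_factory=list)
    neighbors: set = field(default_factory=set)


def nodes_within(nodes, start_id, max_hops):
    """Breadth-first search: ids reachable from start_id in at most max_hops links."""
    seen = {start_id}
    frontier = [start_id]
    for _ in range(max_hops):
        next_frontier = []
        for current in frontier:
            for neighbor in sorted(nodes[current].neighbors):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen


def insert_submap(nodes, current_id, submap, similarity, threshold, max_hops):
    """Localize the submap in a nearby node or create a new node; return its id."""
    best_id, best_score = None, threshold
    if current_id is not None:
        # Hypothetical prior: only nodes close to the current one are candidates.
        for node_id in sorted(nodes_within(nodes, current_id, max_hops)):
            score = max(similarity(submap, other) for other in nodes[node_id].submaps)
            if score >= best_score:
                best_id, best_score = node_id, score
    if best_id is None:
        best_id = len(nodes)
        nodes[best_id] = Node(best_id)
    nodes[best_id].submaps.append(submap)
    if current_id is not None and current_id != best_id:
        # Moving between two different nodes adds a traversability link.
        nodes[current_id].neighbors.add(best_id)
        nodes[best_id].neighbors.add(current_id)
    return best_id


def same_place(a, b):
    """Toy similarity: submaps are integer place labels, equal labels match."""
    return 1.0 if a == b else 0.0


nodes = {}
current = None
path = []
for submap in [0, 1, 2, 1, 0, 3]:
    current = insert_submap(nodes, current, submap, same_place,
                            threshold=0.5, max_hops=2)
    path.append(current)
```

In this toy run, the revisited places 1 and 0 are merged into their existing nodes instead of extending a linear chain, so node 0 ends up linked to both node 1 and node 3.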




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal provides sufficient information to address the questions. The reviewers have increased their scores. I also recommend to accept this paper as it is a good contribution.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The rebuttal provides sufficient information to address the questions. The reviewers have increased their scores. I also recommend to accept this paper as it is a good contribution.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


