Abstract

Style transfer is a promising approach to close the sim-to-real gap in medical endoscopy. Rendering synthetic endoscopic videos by traversing pre-operative scans (such as MRI or CT) can generate structurally accurate simulations as well as ground truth camera poses and depth maps. Although image-to-image (I2I) translation models such as CycleGAN can imitate realistic endoscopic images from these simulations, they are unsuitable for video-to-video synthesis due to the lack of temporal consistency, resulting in artifacts between frames. We propose MeshBrush, a neural mesh stylization method to synthesize temporally consistent videos with differentiable rendering. MeshBrush uses the underlying geometry of patient imaging data while leveraging existing I2I methods. With learned per-vertex textures, the stylized mesh guarantees consistency while producing high-fidelity outputs. We demonstrate that mesh stylization is a promising approach for creating realistic simulations for downstream tasks such as training networks and preoperative planning. Although our method is tested and designed for ureteroscopy, its components are transferable to general endoscopic and laparoscopic procedures. The code will be made public on GitHub.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2259_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/juseonghan/MeshBrush

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Han_MeshBrush_MICCAI2024,
        author = { Han, John J. and Acar, Ayberk and Kavoussi, Nicholas and Wu, Jie Ying},
        title = { { MeshBrush: Painting the Anatomical Mesh with Neural Stylization for Endoscopy } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This work proposes MeshBrush, a method for neural mesh stylization of anatomical models, which can generate realistic and consistent video sequences mimicking the style of real endoscopic footage by learning per-vertex textures on a 3D mesh from patient imaging data and using differentiable rendering.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Innovative approach for neural mesh stylization of anatomical models, generating realistic video sequences resembling real endoscopic footage. Improved realism compared to untextured renderings, utilizing patient imaging data and per-vertex textures. Potential for further improvement in photorealism, as indicated by relatively high FID and KID compared to real endoscopic data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Limited generalization to other anatomical structures or procedures. Room for improvement in achieving photorealism compared to real endoscopic data. Computational complexity, especially for larger mesh resolutions or longer video sequences. Insufficient evaluation on downstream tasks to validate practical utility.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Firstly, the method is only demonstrated on a single anatomy (renal collecting duct) and it is unclear how well it would generalize to other anatomical structures or procedures with different visual appearances. Secondly, although the stylized videos show improved realism compared to untextured renderings, FID and KID are still relatively high compared to real endoscopic data, suggesting room for further improvement in photo-realism. Besides, the training process involves iteratively rendering and stylizing views, which could be computationally expensive, especially for larger mesh resolutions or longer video sequences; the authors should provide more details on relative performance. Additionally, regarding evaluation, while the feature matching and structure-from-motion experiments demonstrate temporal consistency, a more comprehensive evaluation on downstream tasks like depth estimation, camera tracking, or surgical skill assessment is needed to further validate the practical utility of the approach.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The effectiveness and efficiency of the method.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I think this paper has certain merits. At the present stage, researchers focus on how to reconstruct the 2D endoscopic view into a 3D scene; if the authors can transfer the 3D anatomy style into the endoscopic view, we will have more ground truth to optimize scene reconstruction results.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a 3D-consistent style transfer method for neural stylization of endoscopic videos. A mesh is segmented from a CT scan, and a data-driven approach to estimating vertex texture using style transfer (image-to-image, I2I) is proposed. Once each vertex texture is obtained, realistic endoscopic videos are created.

    Results demonstrate that the generated images have matching features that are sufficient to estimate 3D shape using structure from motion (SfM), which is not possible using previous methods. Some quantitative results using FID/KID, which measure distance from real endoscopy images, as well as nearest-neighbor feature matching are presented. Finally, an ablation study of the components of the algorithm (no Fourier vertex encoding, no heatmap-weighted loss function) is shown.
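For readers unfamiliar with KID, the following is a minimal sketch of the standard unbiased estimator (squared maximum mean discrepancy with a cubic polynomial kernel). It assumes feature vectors have already been extracted by some network; the toy 2-D features and the function names `poly_kernel` and `kid` are illustrative and are not taken from the paper's evaluation code.

```python
# Sketch only: unbiased KID estimate (squared MMD, cubic polynomial kernel).
# Feature vectors are assumed to be precomputed; all names are illustrative.

def poly_kernel(x, y):
    """Cubic polynomial kernel k(x, y) = (x . y / d + 1)^3."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def kid(real_feats, fake_feats):
    """Unbiased squared-MMD estimate between two sets of feature vectors."""
    m, n = len(real_feats), len(fake_feats)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples per set")
    # Within-set terms exclude the diagonal (i == j) to stay unbiased.
    k_rr = sum(poly_kernel(real_feats[i], real_feats[j])
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_ff = sum(poly_kernel(fake_feats[i], fake_feats[j])
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    # Cross term uses every real/fake pair.
    k_rf = sum(poly_kernel(r, f)
               for r in real_feats for f in fake_feats) / (m * n)
    return k_rr + k_ff - 2.0 * k_rf


# Toy 2-D "features": one set close to the real set, one far from it.
real = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05]]
close = [[0.02, 0.08], [0.08, 0.02], [0.05, 0.06]]
far = [[2.0, 2.1], [2.1, 2.0], [2.05, 2.05]]
```

Lower values indicate that the two feature distributions are closer; because the estimator is unbiased, it can be slightly negative for very similar sets.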

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper introduces a very novel and clever approach to consistent style transfer in 3D. To achieve this, some tricks (Fourier vertex encoding, HM weighted loss) are proposed.
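As a rough illustration of what a Fourier encoding of vertex positions can look like, the sketch below maps each coordinate to sine and cosine features at doubling frequencies. This is an assumed, axis-aligned variant; the exact encoding used in MeshBrush (number of bands, frequency sampling) is not stated in this review, and `fourier_encode` and `num_bands` are hypothetical names.

```python
import math

# Sketch only: an axis-aligned Fourier feature encoding of a vertex position.
# The frequency schedule (2^k * pi) is an assumption, not the paper's setting.

def fourier_encode(vertex, num_bands=4):
    """Map an (x, y, z) position to sin/cos features at doubling frequencies."""
    features = []
    for k in range(num_bands):
        freq = (2.0 ** k) * math.pi
        for coord in vertex:
            features.append(math.sin(freq * coord))
            features.append(math.cos(freq * coord))
    return features


# Each of the 3 coordinates yields 2 features per band: 3 * 2 * 4 = 24 values.
encoded = fourier_encode((0.25, -0.5, 1.0), num_bands=4)
```

Encodings of this kind are commonly used so that a small network fed with positions can represent high-frequency detail, which is consistent with the ablation the reviewer mentions.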

    The approach is general and can be applied outside of endoscopy.

    Results that demonstrate structure from motion estimation are very convincing.

    The paper is generally clearly written, although the result section is a little hard to understand.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The evaluation section provides only limited qualitative results. A visual comparison showing the advantages of the proposed method over I2I would be helpful.

    Results from Table 1 are not consistent, and don’t demonstrate advantages of the proposed approach. I also did not understand which feature extraction method is used to obtain matches. On the other hand, the SfM results are convincing and should be highlighted earlier in the paper.

    It took me several reads to understand the approach. Specifically, Figure 2 is missing key details, e.g., where style transfer supervision comes in. I encourage the authors to add labels used in the text (e.g., A, B) to Figure 2 to streamline the text.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors provide high-level implementation details in Section 4 and promise to release their code. The approach is validated on an in-house dataset (the authors did not state whether it will be made public), but the proposed methodology should be re-implementable.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please provide qualitative results (frame renderings) demonstrating advantages of the proposed methodology.

    Please report SfM and qualitative results with different styles transferred onto the mesh, to demonstrate generalizability.

    Please also provide additional details about how the mesh is cropped from a CT scan. Does the mesh need to be smoothed/preprocessed for the approach to work?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces a clever approach, but evaluation is weak

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Based on the authors’ rebuttal and other reviews, the paper presents a strong innovative technical contribution and therefore should be accepted. Some aspects of the evaluation are missing, but the material is enough to demonstrate a proof of concept.



Review #3

  • Please describe the contribution of the paper

    The authors are focused on enhancing ureteroscopy imaging, introducing MeshBrush, a specialized neural network designed for mesh visualization. As novelty, the authors state that their innovative method synthesizes temporally consistent videos with distinguishable rendering. Validation of the approach was conducted using renal collecting duct mesh.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Multiple state-of-the-art works are dedicated to advancing realistic simulated environments, particularly for endoscopic imaging or its equivalent. The authors of this study have achieved an interesting methodology and results. This work introduces several technical advancements, including:

    • Enhancing texture resolution
    • Implementing a novel approach to integrate color into the mesh, resulting in enhanced realism
    • Employing a different deep learning architecture to incorporate finer details during training. In particular, the utilization of Fourier Feature Representation is an intriguing concept introduced by the authors. Additionally, the authors conducted an ablation study, which clearly validates the potential of the described technique.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the technical aspect is both relevant and well articulated, my primary critique pertains to the experiments section, which is rather limited. The authors assert that “this method has only been tested for a renal collecting duct model; however, the proposed method is generic and self-supervised, thus holds potential for application across other anatomies and procedures.” While I acknowledge their perspective, and technically there are no limitations apparent for other applications, it would be beneficial to include additional results from various anatomies in the final paper.

    Furthermore, the organization of the paper leaves much to be desired, with images and tables placed incorrectly, thereby hindering the readability of the document.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    • If possible, please provide another example result in a different anatomy.
    • Please comment on how the method would work when surgical instruments are in view, or under task-specific conditions (e.g. smoke).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Technically, the paper is excellent. I have some concerns about the experimental section. For this reason, my recommendation is accept. The paper is well written.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers (R1, R3, R4) for their comprehensive evaluation of MeshBrush, a neural mesh stylization method for endoscopy to generate realistic views and consistent features. To the best of our knowledge, the proposed work is the first of its kind, supported by R1 and R3’s assessment. The reviewers classified our work as “innovative” (R1), “clever and convincing” (R3), as well as “interesting and well articulated” (R4). We appreciate the reviewers’ enthusiasm for the proposed work’s methodology as well as their relevant concerns.

In this rebuttal letter, we would like to address 3 main aspects: 1) the applicability to other anatomies and other downstream vision tasks (R1, R4), 2) insufficient evaluation and presentation of our work (R3, R4), and 3) practical concerns regarding realism and data preparation (R1, R3, R4).

1) Applicability and generalizability. Prior to stylizing our mesh, we train an image-to-image (I2I) style transfer network to translate rendered views from CT to realistic endoscopic views, which is used as ground truth for MeshBrush. We claim that the method is generalizable to other anatomies since I2I methods have already been proposed in other anatomies [e.g., Aldo et al. Computer Methods and Programs in Biomedicine 200 (2021), Mathew et al. CVPR 2020] (R1, R4). Nevertheless, we plan to extend our work in a journal paper where we provide a more comprehensive evaluation of our approach on other anatomies.

To address R1’s concern about other downstream tasks like depth estimation and camera tracking, we also plan to include these components in the extended work. However, the results from Structure-from-Motion (SfM) are promising in support of downstream vision tasks as SfM directly estimates the camera trajectory and can be used to generate depth estimations.

2) Inadequate evaluation and presentation. Despite our desire to add more qualitative results, the manuscript page constraints led us to include them in only Figures 1 and 3 (R3). To address the inconsistent evidence in Table 1 (R3), we emphasize that a stylized mesh produces textures sufficiently distinct for a much harder task than matching ORB features (which was done using OpenCV, R3), namely Structure-from-Motion, as shown in Fig. 4.
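To make the ORB matching step more concrete, here is a minimal, dependency-free sketch of brute-force matching of binary descriptors by Hamming distance with a mutual-nearest-neighbour (cross-check) filter. The rebuttal does not state which matcher or settings were used, so the cross-check strategy is an assumption, and the 2-byte descriptors below are toy stand-ins for real ORB descriptors (32 bytes each by default in OpenCV).

```python
# Sketch only: brute-force Hamming matching of binary descriptors with a
# cross-check. Descriptor values and function names are illustrative.

def hamming(a, b):
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def cross_check_matches(desc_a, desc_b):
    """Return (i, j, distance) for pairs that are mutual nearest neighbours."""
    if not desc_a or not desc_b:
        return []
    # Nearest descriptor in B for each descriptor in A, and vice versa.
    best_ab = [min(range(len(desc_b)), key=lambda j: hamming(d, desc_b[j]))
               for d in desc_a]
    best_ba = [min(range(len(desc_a)), key=lambda i: hamming(desc_a[i], d))
               for d in desc_b]
    return [(i, j, hamming(desc_a[i], desc_b[j]))
            for i, j in enumerate(best_ab) if best_ba[j] == i]


# Toy descriptors from two consecutive frames.
frame_1 = [bytes([0b11110000, 0b00001111]), bytes([0b10101010, 0b01010101])]
frame_2 = [bytes([0b10101010, 0b01010111]), bytes([0b11110000, 0b00001110]),
           bytes([0b00000000, 0b00000000])]
matches = cross_check_matches(frame_1, frame_2)
```

With temporally consistent textures, the same surface point should yield similar descriptors in neighbouring frames, so more pairs survive the cross-check; the third descriptor in `frame_2` has no mutual partner and is dropped.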

Regarding Figure 2 missing key details (R3), we will clarify variables in the Methods section by adding them to the figure. If accepted, we will correct the notation in the figure and paragraph to be more cohesive, as well as restructure the figures to better organize the paper and improve its readability (R4). We will also add further details for extracting mesh from CT (cropped from CT using 3D Slicer and subdivided to increase texture resolution, R3).
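To illustrate why subdividing a mesh raises the resolution available to per-vertex textures, the sketch below performs one round of midpoint subdivision, splitting every triangle into four. The rebuttal does not say which subdivision scheme was applied, so midpoint subdivision is an assumption here, and the `subdivide` helper and tetrahedron input are purely illustrative.

```python
# Sketch only: one round of midpoint (1-to-4) triangle subdivision.
# The scheme and all names are assumptions, not the authors' pipeline.

def subdivide(vertices, faces):
    """Split every triangle into four by inserting edge midpoints."""
    new_vertices = list(vertices)
    midpoint_index = {}  # maps a sorted edge (i, j) to its midpoint's index

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_index:
            a, b = vertices[i], vertices[j]
            new_vertices.append(tuple((p + q) / 2.0 for p, q in zip(a, b)))
            midpoint_index[key] = len(new_vertices) - 1
        return midpoint_index[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # Three corner triangles plus the central one, same winding order.
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return new_vertices, new_faces


# A tetrahedron: 4 vertices, 4 faces, 6 edges.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tris = [(0, 1, 2), (0, 3, 1), (0, 2, 3), (1, 3, 2)]
fine_verts, fine_tris = subdivide(verts, tris)
```

Each round adds one vertex per edge and quadruples the face count, so every learned per-vertex color covers a smaller patch of the surface.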

3) Realism and data preparation. Regarding R1’s assessment of realism being an area of future improvement, realism can be improved with more sophisticated rendering strategies since the mesh textures are learned. Furthermore, we kindly urge the reviewers to recognize that our novel work is a foundation for consistent style transfer in endoscopy. To address computational complexity concerns (R1): our mesh is still relatively small with only a few thousand vertices. As a result, rendering a scene is on the order of a few seconds; mesh stylization only took 6 hours on an RTX 4090 GPU. Although other anatomies will have various sizes, shapes, and geometric complexities, the training time will still be tractable as MeshBrush’s applications do not require real-time execution. Once the mesh is stylized, it can be used for any downstream task without further modifications. Regarding longer video sequences (R1) and interferences like smoke or surgical instruments (R4), this will only impact the I2I style transfer network training, which our work assumes to be ready on-hand.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All reviewers recommend acceptance, and I believe this work is interesting and strong enough for publication at MICCAI. I will go with the reviewers’ universal opinion, and recommend acceptance.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    All reviewers recommend acceptance, and I believe this work is interesting and strong enough for publication at MICCAI. I will go with the reviewers’ universal opinion, and recommend acceptance.


