Abstract

Three-dimensional reconstruction of the surgical area based on intraoperative laparoscopic videos can restore 2D information to 3D space, providing a solid technical foundation for many applications in computer-assisted surgery. SLAM methods often suffer from imperfect pose estimation and tissue motion, leading to the loss of original texture information. On the other hand, methods like Neural Radiance Fields and 3D Gaussian Splatting require offline processing and lack generalization capabilities. To overcome these limitations, we explore a texture optimization method that generates high-resolution and continuous texture. It designs a mechanism for transforming 3D point clouds into 2D texture space and utilizes a generative network architecture to design 2D registration and image fusion modules. Experimental results and comparisons with state-of-the-art techniques demonstrate the effectiveness of this method in preserving high-fidelity texture.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2885_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zhe_Misaligned_MICCAI2024,
        author = { Zheng, Jieyu and Li, Xiaojian and Mo, Hangjie and Li, Ling and Ma, Xiang},
        title = { { Misaligned 3D Texture Optimization in MIS Utilizing Generative Framework } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper addresses how to provide better texture blending in MIS. They approach this via 1. using a GAN to register image pairs, 2. warping the images to better align, 3. fusing said images using a neural network into a merged image. They evaluate qualitatively on the Hamlyn dataset and the SCARED dataset, along with showing their blending performance compared to EndoMotion and EMDQ-SLAM.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Strengths:

    • This paper addresses a clear need for better alignment and more robust image fusion for scene reconstruction in endoscopy. Their qualitative results are promising and texture-preserving.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Weaknesses:

    • The flow of the paper is difficult to read. I find it difficult to understand which pieces in the math correspond to which pieces in the figures (e.g., Fig. 1).

    • The paper proposes a method for pairwise blending, but it is unclear how these pairs are selected to then be fused. For example, in a downstream application, how would we select which pairs to use?

    • The ground-truth texture map is not provided, and it is unclear to me which ground-truth images are used to compute PSNR, SSIM, etc. in Table 1. If the ground truth is one of the images in the image pair, then the analysis is unclear, since the network already ingests these images.

    • There is no visualization or analysis of the optical flow used in the flow registration part of fusion. How can we know this part is useful for the algorithm? Couldn’t it get a better image for the GAN by just picking one of the input frames?

    • Although the authors say otherwise, the SCARED dataset does have ground truth that can be used to estimate corresponding points using the ground truth depth in combination with the recorded poses.
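The correspondence idea in the last point can be illustrated with a minimal sketch. It assumes a pinhole camera, a rigid scene, and a relative pose (R, t) that maps camera-1 coordinates into camera-2 coordinates; the function name and the intrinsics layout are hypothetical and are not taken from the paper or from the SCARED tooling. Tissue deformation and pose error, which the authors raise in their rebuttal, would both violate these assumptions.

```python
def reproject_pixel(u, v, depth, intrinsics, rotation, translation):
    """Map pixel (u, v) with known depth from camera 1 into camera 2.

    intrinsics  : (fx, fy, cx, cy) of a pinhole camera (assumed shared by both views)
    rotation    : 3x3 nested list, camera-1 to camera-2 rotation (assumption)
    translation : length-3 sequence, camera-1 to camera-2 translation (assumption)
    """
    fx, fy, cx, cy = intrinsics

    # Back-project the pixel to a 3D point in the camera-1 frame.
    point_cam1 = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

    # Apply the rigid transform p2 = R * p1 + t.
    point_cam2 = [
        sum(rotation[i][j] * point_cam1[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if point_cam2[2] <= 0:
        raise ValueError("point lies behind the second camera")

    # Project the transformed point into the camera-2 image plane.
    u2 = fx * point_cam2[0] / point_cam2[2] + cx
    v2 = fy * point_cam2[1] / point_cam2[2] + cy
    return u2, v2
```

With an identity rotation and zero translation the function returns the input pixel unchanged, which is a quick sanity check before applying it to recorded poses.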

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Here are some edits I believe could make the paper more clear:

    • The paper can be difficult to parse. The title, for example, would make more sense as something like: “Aligning 3D Textures in MIS using Generative Frameworks”

    • Gaussian Splatting (not Splitting).

    • There is a fair amount of space spent on attention/cross-attention. This can be referenced, or shrunk in order to allow more space to describe the rest of the paper.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The need for better fusion of texture from multiple 3D views is clear in MIS. This paper provides promising qualitative visual results. That said, as I understand the paper, it provides a means of blending two views, and thus does not solve the problem of multi-view blending. How this would be used in practice is unclear, and the paper would need a motivation to justify its position. In addition, the analysis is unclear to me; in particular, where the ground truth comes from, and why SCARED cannot be used for ground-truth correspondences.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I am changing my score from weak reject to weak accept, and I thank the authors for clarifying their results. The primary reason for weak accept is that the clarity of the paper could use improvement. This could be both in text and in the figure. For example, more clearly annotating the figure with the GAN (Fig 1) and registration module (Fig 2). It would make it more easily apparent if flow output, F_*, was in the diagram for the registration module.

    In the rebuttal, the authors have addressed my concerns by stating how this method is for blending image pairs 1-10 frames apart, along with the usage of images in the pairs for a pseudo-GT via a photometric reconstruction error. They also cleared up briefly how their blending could be used in practice with a SLAM framework.

    3/4: ranking in rebuttal stack



Review #2

  • Please describe the contribution of the paper

    The paper addresses the important problem of texture blurring and loss in 3D reconstruction due to inaccurate depth and pose estimation, proposing an effective solution that leverages 2D mapping and deep learning techniques to optimize texture quality from misaligned data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of this paper lie in its novel formulation for 3D texture optimization, strong evaluation on public MIS datasets, potential for real-time application in surgical guidance, and technical soundness with a detailed methodology.

    Firstly, the paper introduces a new approach to address the problem of texture blurring and loss in 3D reconstruction due to misaligned data in MIS. It formulates the problem in 2D texture space by transforming 3D point clouds using camera projection and interpolation regression, enabling the use of effective 2D generative models. The coarse-to-fine registration module with transformers for optical flow prediction and the multi-scale fusion module for deeply integrating information are innovative components of the framework. This novel formulation tackles the limitations of existing methods and provides a fresh perspective on optimizing texture quality in 3D reconstruction.

    Secondly, the method is thoroughly evaluated on two public in-vivo datasets (SCARED and Hamlyn) and compared against state-of-the-art methods (EndoMotion and MBB). Quantitative results using SSIM, PSNR, and VIF metrics demonstrate superior performance in texture consistency and quality. Qualitative results visually confirm the method’s ability to correct misalignments, preserve details, and achieve high-fidelity texture reconstruction. The strong evaluation on relevant datasets strengthens the validity and robustness of the proposed approach.

    Thirdly, the paper discusses the potential integration of the proposed method into real-time SLAM frameworks for intra-operative surgical navigation and guidance. Improved 3D texture quality could enhance visualization and situational awareness for surgeons during MIS procedures. The validation on in-vivo datasets further supports the clinical feasibility and potential for translation. This aspect highlights the practical value and applicability of the research in real-world surgical scenarios.

    Lastly, the paper presents a technically sound and detailed methodology. It provides a clear description of the technical approach, including the 2D mapping process, registration module architecture, and fusion module design. The use of a pre-trained generative model (VQ-GAN) and the incorporation of appearance, total variation, and perceptual losses demonstrate a well-grounded methodology. Ablation studies and comparisons with alternative approaches further validate the design choices. The thoroughness in explaining and justifying the technical components enhances the credibility and reproducibility of the work.

    In summary, the paper’s novel formulation, strong evaluation results, potential for clinical application, and technical soundness constitute its major strengths. The innovative approach to 3D texture optimization, the comprehensive experiments on public datasets, the discussion of real-time surgical guidance, and the detailed methodology make this work a valuable contribution to the field of 3D.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses of this paper lie in its limited evaluation of generalization and robustness, lack of direct comparison with learning-based 3D reconstruction methods, limited discussion on computational efficiency and real-time performance, absence of clinical validation and user feedback, and lack of ablation studies and sensitivity analysis.

    Firstly, while the method is evaluated on two public MIS datasets (SCARED and Hamlyn), the performance on data from different surgical scenarios, anatomical regions, or imaging conditions is not extensively explored. The paper lacks an in-depth analysis of how the method handles variations in lighting, occlusions, or tissue deformations, which are common challenges in MIS. Assessing the generalization of the proposed approach to a broader range of datasets and conditions would strengthen its practical applicability and robustness.

    Secondly, while the paper mentions the potential for integration into real-time SLAM frameworks, it does not provide a detailed analysis of the computational requirements and runtime performance of the proposed method. The scalability of the approach to high-resolution images or larger-scale reconstructions is not thoroughly discussed. More information on the trade-offs between texture quality and computational efficiency would be helpful for assessing the method’s practical usability in real-world surgical scenarios.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper is well-structured, and the methodology is clearly described. However, there are several areas that could be improved to strengthen the paper’s contributions and impact. Firstly, the paper’s main strengths lie in its novel formulation for 3D texture optimization, strong evaluation on public MIS datasets, potential for real-time application in surgical guidance, and technical soundness with a detailed methodology. The innovative approach of transforming 3D point clouds into 2D texture space and utilizing a generative network architecture with coarse-to-fine registration and multi-scale fusion modules is commendable. The comprehensive experiments on SCARED and Hamlyn datasets, demonstrating superior performance compared to state-of-the-art methods, validate the effectiveness of the proposed method. The discussion on the potential integration into real-time SLAM frameworks for surgical navigation highlights the practical value of the research.

    Absence of clinical validation and user feedback: To strengthen the paper’s clinical relevance, consider incorporating validation or evaluation by clinical experts. Assess the usability, interpretability, and clinical value of the enhanced 3D texture quality through user studies or feedback from surgeons. This would support the claims regarding the potential impact on surgical practice and decision-making.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There are several areas that need improvement, as mentioned in the weaknesses section, and addressing them could improve the score:

    • Limited evaluation of generalization and robustness

    • Lack of direct comparison with learning-based 3D reconstruction methods

    • Limited discussion on computational efficiency and real-time performance

    • Absence of clinical validation and user feedback

    • Lack of ablation studies and sensitivity analysis

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper describes a novel texture optimization method for 3D reconstruction of surgical scenes from RGB images. Unlike the other referenced methods, which do not account for overlaying 3D information from two subsequent images, this approach transforms the 3D point clouds from these two images, then relies on the camera projection and a regression algorithm to obtain a fused 2D image. After this, a generative model is used to fuse misaligned information from the two frames. The approach is evaluated on a test set collected from the Hamlyn dataset. By computing the similarity index, peak SNR, and visual fidelity, the texture optimization of the approach is compared to other popular techniques. Several qualitative examples are also shown for better understanding.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written, easy to follow, has a logical experimental validation and excellent visuals. In particular, I like the choice of zooming in on cross sections in Fig 3 and Fig 4. I could immediately understand the benefit of this technique.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    One of the main arguments in the introduction is that NeRF and 3DGS are not general (although this is not elaborated on) and require offline processing. However, the final results are not compared to these approaches (which would be interesting to see), nor are they demonstrated in a way that validates that the technique is general (it’s only applied on one test set from Hamlyn, based on my understanding) or online / fast. In order to argue the necessity and value-add of this new technique, I think these are important investigations to be done.

    The methods that are compared to / evaluated against are also not mentioned explicitly in the introduction. Are these fair techniques to compare to?

    The paper is good and the results are well put together, but without answers to these questions it’s difficult to say if this contribution is significant.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No additional comments.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I think that the weaknesses that I have outlined should be addressed in order to strengthen the argument for developing this technique in the first place. Or the benefit of this algorithm over the described approaches should be modified to better highlight the contribution.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think this paper is strong as is, and with better clarification as to why this method adds value over the other described techniques, it will be a good fit for MICCAI.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I am changing my score from a weak accept to an accept. I thank the authors for their detailed responses to my comments. My accept was initially weak because I didn’t feel that the benefit of this approach over others was clear, but now I understand.

    I feel this paper will make a nice contribution to the MICCAI proceedings.




Author Feedback

We appreciate the reviewers’ time, effort, and valuable comments to enhance our paper. They found our approach interesting and novel (R3, R4), with rigorous experiments (R3) and strong evaluation results (R4), showing good qualitative effects (R1, R3) and clinical application potential (R4). All reviewers (R1, R3, R4) recognized our work’s importance in addressing the critical issue of texture loss or blurring in 3D reconstruction. Below, we provide succinct responses to their major concerns.

  • GT of SCARED (R1): Our tests show that the GT poses in SCARED are not very accurate, which results in noticeable pixel errors when points are reprojected at the image level.
  • Pairwise blending and how to apply (R1, R4): During training, we randomly select image pairs from the original video with a frame gap of 1-10 frames and crop 256×256 regions for data augmentation. This algorithm is therefore designed to address texture stitching and optimization of two neighboring frames rather than solving the multi-view problem all at once. It can be integrated as a plugin into existing SLAM algorithms, continuously optimizing neighboring keyframes to enhance 3D texture and reduce accumulated errors.
  • Evaluation Methods (R1): We directly input Hamlyn images for testing, so we do not have GT for the optimized textures. Any misalignment in features due to inaccurate optical flow will result in noticeable ghosting or artifacts in images generated by the frozen VQ-GAN, making the textures worse than the input images. Therefore, we can indirectly measure the optical flow accuracy and the texture consistency by calculating the PSNR and SSIM, and use VIF to assess the quality of optimized textures. Figures 3 and 4 also confirm our results.
  • Only GAN (R1): Multiple observations of the same scene from different viewpoints can restore high-resolution textures and structures. Our algorithm follows this principle, synchronously building the texture and pixel relationship between two frames to achieve texture optimization and stitching. Directly using GAN to optimize a single image would lose the relationship between frames.
  • Method generality (R3): After training on SCARED, we directly validated it on four test sets of the Hamlyn dataset, rather than just one test set, as shown in Table 1. Figure 3 provides qualitative examples from completely different videos. This demonstrates that the good performance of our algorithm is general.
  • Advantages (R3): We introduced EndoMotion[9] and EMDQ-SLAM[20] for comparison because they are existing texture optimization methods. The former corresponds to texture-weighted summation operations, while the latter corresponds to multi-band blending texture operations. Additionally, unlike NeRF and 3DGS, these two methods can be used for incremental, near real-time 3D reconstruction. Therefore, we believe that comparing against them is reasonable and effective. Our algorithm’s results are clearly superior to theirs and can simultaneously optimize and stitch textures between two frames.
  • Speed (R3, R4): The proposed texture optimization network can infer at a speed of 8 FPS, which is faster than offline processing with NeRF and 3DGS. Although it is not real-time, it can be applied only to the keyframes of SLAM to ensure effectiveness.
  • Robustness analysis (R4): The training set includes various lighting and tissue deformations by selecting image pairs across frames. By utilizing the designed self-supervised loss, the network learns to overcome these complex conditions. Currently, the algorithm does not consider instrument occlusion.
  • New Experiments(R3, R4): This year’s new regulations prohibit the addition of new experimental results. Violating this rule will result in immediate rejection. However, after further improving the algorithm, we will design experiments based on your valuable suggestions and conduct corresponding analyses. Clarifications discussed will be included in the manuscript. Thanks again for your insightful comments and constructive suggestions.
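
The pair selection and the PSNR measurement mentioned in the responses above can be sketched as follows. This is an illustration only, not the authors' code: the function names are hypothetical, images are assumed to be grayscale nested lists with values in 0-255, both frames of a pair are assumed to share the same crop window, and which two images the authors actually compare when computing PSNR is not spelled out in the rebuttal.

```python
import math
import random


def sample_pair_indices(num_frames, rng, max_gap=10):
    """Pick two frame indices whose gap is between 1 and max_gap frames."""
    if num_frames < 2:
        raise ValueError("need at least two frames")
    gap = rng.randint(1, min(max_gap, num_frames - 1))
    first = rng.randrange(num_frames - gap)
    return first, first + gap


def crop_pair(frame_a, frame_b, rng, size=256):
    """Cut the same size x size window out of both frames (assumed choice)."""
    height, width = len(frame_a), len(frame_a[0])
    if height < size or width < size:
        raise ValueError("frame is smaller than the crop size")
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)

    def crop(frame):
        return [row[left:left + size] for row in frame[top:top + size]]

    return crop(frame_a), crop(frame_b)


def psnr(image_a, image_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_errors = [
        (a - b) ** 2
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Passing a seeded `random.Random` instance as `rng` makes the sampled pairs and crop windows reproducible across runs.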




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


