Abstract

Reconstruction of deformable tissues in robotic surgery from endoscopic stereo videos holds great significance for a variety of clinical applications. Existing methods primarily focus on enhancing inference speed, overlooking depth distortion issues in reconstruction results, particularly in regions occluded by surgical instruments. This may lead to misdiagnosis and surgical misguidance. In this paper, we propose an efficient algorithm designed to address the reconstruction challenges arising from depth distortion in complex scenarios. Unlike previous methods that treat each feature plane equally in the dynamic and static fields, our framework guides the static field with the dynamic field, generating a dynamic-mask to filter features at the time level. This allows the network to focus on more active dynamic features, reducing depth distortion. In addition, we design a module to address dynamic blurring. Using the dynamic-mask as guidance, we iteratively refine color values through Gated Recurrent Units (GRU), improving the clarity of tissue detail in the reconstructed results. Experiments on a public endoscope dataset demonstrate that our method outperforms existing state-of-the-art methods without compromising training time. Furthermore, our approach shows outstanding reconstruction performance in occluded regions, making it a more reliable solution in medical scenarios. Code is available at https://github.com/CUMT-IRSI/DnFPlane.git.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2081_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2081_supp.zip

Link to the Code Repository

https://github.com/CUMT-IRSI/DnFPlane.git

Link to the Dataset(s)

https://med-air.github.io/EndoNeRF/

BibTex

@InProceedings{Bu_DnFPlane_MICCAI2024,
        author = { Bu, Ran and Xu, Chenwei and Shan, Jiwei and Li, Hao and Wang, Guangming and Miao, Yanzi and Wang, Hesheng},
        title = { { DnFPlane For Efficient and High-Quality 4D Reconstruction of Deformable Tissues } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes the DnFPlane algorithm (a NeRF-based approach) designed to address reconstruction challenges in complex scenarios occurring in endoscopic stereo videos. The main contributions are (1) a framework that computes a dynamic-mask to filter features at the time level, which allows the network to focus on active dynamic features; and (2) using the dynamic mask as guidance, color values are iteratively refined through Gated Recurrent Units, further improving the reconstruction results. Experiments on a public dataset show state-of-the-art performance in rendering of synthetic views and 3D reconstruction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The positive aspects are: (1) the idea of the paper is, in general, simple to follow; (2) the main contributions of the proposed NeRF-based architecture (Dynamic-mask Generation, Dynamic Feature Enhancement, Color Iterative Refinement) are not major scientific contributions (the base pipeline is similar to LerPlane [22]), but are simple and seem to be effective; (3) the experimental results show top performance in rendering quality of synthetic views and 3D scene computation when compared to state-of-the-art approaches.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The negative aspects of the paper are: (1) it is not clear from the paper whether the competing approaches (EndoNeRF, EndoSurf and LerPlane) were re-trained using similar settings (e.g. number of training images, effort for hyper-parameter tuning, etc.) or whether the models trained by their authors were used; (2) I was not able to understand all parts of the proposed architecture:

    • In Section 2.3, F_{X’Y’} is obtained by fusing F_{XT} and F_{YT}. What computations involve this fusion step?
    • Delta is obtained by subtracting F_{XY} from F_{X’Y’}, but a strange symbol is used in the text for this calculation?
    • For the dynamic feature enhancement, you employ the dynamic mask (I am assuming it’s a binary vector) and you perform element-wise multiplication with the fused features. Nevertheless, from Figure 1, this binary multiplication does not give rise to zero values in the enhanced features. Shouldn’t it, as the dynamic mask is binary?

    I think it would be important to improve the presentation of Section 2, better explaining the relevant steps of the pipeline; (3) The authors motivate this investigation as it will potentially allow surgeons to navigate surgeries in a more comprehensive and precise manner, improving the safety and success rates of surgeries. Can you comment on whether the quality of the proposed rendering/reconstruction pipeline has reached the quality needed for a real medical application? Is the rendering time acceptable for a clinical setting? (4) Gaussian Splatting [A] was introduced last year as a more efficient alternative to NeRFs. It would have been relevant to understand the pros and cons of the proposed NeRF approach with respect to Gaussian Splatting.

    [A] “3D Gaussian Splatting for Real-Time Radiance Field Rendering”, Kerbl et al., ACM Transactions on Graphics, 2023

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) Strange sentence in Section 2.2: “To enhance the efficiency of training and rendering, inspired by LerPlane [22], representing the surgical process as a 4D volume” (2) In page 2, the sentence “Firstly, the obstruction of surgical instruments in soft tissues can easily result in depth distortions in the reconstruction results…” is strange. Can you explain why these are depth distortions? Aren’t these simply errors in the depth estimation caused by the instruments in the soft tissue?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a new NeRF-based architecture for synthetic view generation and 3D reconstruction targeting stereo endoscopy videos. The experimental results show state-of-the-art performance on one public dataset. Nevertheless, several aspects of the proposed architecture are not well presented in the paper, and a discussion about the suitability of the rendering/3D reconstruction quality and time of the proposed approach for a real medical setting is missing. If the authors address my comments, provide a better explanation of the relevant aspects of the pipeline, and discuss how far the rendering/reconstruction quality and time are from the requirements of a real medical setting, then I would be willing to increase my rating.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper deals with the optimization of time-efficient NeRF methods, e.g. the LerPlane algorithm. The optimization refers to both the distortion of depth maps and color blurring due to complex, dynamic changes, such as those caused by instruments. The strategy presented here is based on the targeted reinforcement of relevant dynamic features.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A strength of the work is the clear and easy-to-understand description of the proposed approach and the developed framework.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    A weakness of the work is the uncritical presentation and discussion of the results from the comparative study. The global performance parameters listed in Table 1 show a rather marginal trend that is not statistically proven. Figures 1 and 3 show examples that can prove the effects predicted in the claims, but overall it is not proven that the performance improvement shown can generally be attributed to these local differences. Furthermore, there is no critical discussion of possible negative effects of selective emphasis on certain features at the expense of other features.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors are recommended to have a self-critical discussion of their paper: What could be possible negative effects of Dynamic Feature Enhancement and how would they become noticeable? To what extent are the results of the comparative study generalizable? How can clinical relevance of the demonstrated quality improvement be demonstrated?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Little self-critical approach and discussion.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose an efficient algorithm capable of coping with 3D reconstruction challenges in dynamic scenarios. Their method uses the dynamic field to guide a static field, generating a method that can filter features in real time, focusing on active dynamic features and reducing depth distortions. Using the generated dynamic masks, the method refines color rendering by means of a GRU, improving the quality of the tissue detail after reconstruction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Very clear and concise motivation for the clinical problem and for the interest of incorporating 4D reconstruction of deformable tissues, and a method for attaining higher performance in this task.

    2. The authors make a point for the lack of methods addressing this problem and they nicely describe the contribution at the end of the introduction. They also clearly define the scope of their research (but a discussion on limitations seems to be missing).

    3. The authors provide a schematic representation of the proposed solution although this could be improved in my opinion (comments later on)

    4. The experimental design is good: several experiments have been carried out, along with ablation studies to show the effectiveness of the DFE and CIR modules, although I have some comments on the balance between model complexity and the obtained results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. I had a hard time trying to understand the method. It would be nice to have a more conceptual figure (think of a graphical abstract) to make the contribution easier to grasp. Some comments on this are detailed in question 10.

    2. One of my main concerns is that although the results of the method show a significant improvement over other state-of-the-art methods (EndoNeRF, EndoSurf and LerPlane), the authors do not perform a statistical test (particularly for their own model configurations). It would be interesting to see some box plots and to have an idea of the standard deviations of the methods and of their overall robustness. How was the data partitioned?

    3. Related to the previous point, could the authors explain why the improvements gained by using DFE and CIR are so marginal? What would be the processing time if those modules are removed? The proposed model without those modules seems to still do quite fine. What would be the impact of not using them?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper makes use of the EndoNeRF dataset, which is publicly available. If the code is made available, the work could be easily reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. In Figure 2, it would be nice to have a bar below showing what the different tensors are (some of them are easy to understand, others not so much). Figure 1 could be used to provide a more conceptual description of the method instead of the qualitative comparison alone.

    2. It would be interesting to have an idea of the number of parameters of the DFE and CIR modules, as well as the complexity they introduce, as they offer very little improvement over the baseline. A statistical test would also be nice, to see whether the results are statistically significant.

    3. Could the authors elaborate a bit more on how the weights for the losses were selected? Particularly for Ltv and Lts.

    4. It would be interesting to have a robustness study using box plots to see how the different methods perform over the entire dataset, as they do not provide standard deviations either. This could help evaluate whether the overall proposal and the modules indeed show a significant improvement.

    5. If possible, it would be interesting to see qualitative analyses showing the method’s performance on the best sample, the mean sample and the worst sample. The authors could discuss the limitations of the method in this way.
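
    The robustness summary asked for in comments 2 and 4 could be computed roughly as follows. This is only a sketch: it assumes that one score per test frame (for example a rendering-quality metric) is available for each method, and the function names `summarize_scores` and `paired_mean_difference` are hypothetical, not taken from the paper or its code.

```python
import numpy as np

def summarize_scores(scores):
    """Box-plot style summary of per-frame scores (needs at least 2 values)."""
    s = np.asarray(scores, dtype=float)
    q1, median, q3 = np.percentile(s, [25, 50, 75])
    return {
        "mean": float(s.mean()),
        "std": float(s.std(ddof=1)),  # sample standard deviation
        "min": float(s.min()),
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "max": float(s.max()),
    }

def paired_mean_difference(scores_a, scores_b):
    """Mean per-frame difference (a - b) and its standard error.

    ASSUMPTION: both methods were evaluated on the same frames, in the
    same order, so the scores can be paired.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("paired scores must have the same shape")
    diff = a - b
    std_err = diff.std(ddof=1) / np.sqrt(diff.size)
    return float(diff.mean()), float(std_err)
```

    A mean difference that is small relative to its standard error would support the concern that the gains from the additional modules are marginal. A formal paired test could then be applied to the same per-frame differences.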

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the paper is a solid contribution and it could be a good paper for MICCAI. However, the writing and organization of the paper could be improved to simplify certain aspects and enhance the readability of the paper. More details need to be given and more clarity about the contributions of the paper could make it easier to grasp.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I maintain my original review for the paper if the authors address my comments as discussed in their rebuttal




Author Feedback

We thank Reviewers #3, #4 and #5 for their valuable feedback. All comments have been carefully considered and will be reflected in the final version. In the future, we will create a homepage that includes our code, more details and additional quantitative analyses (box plots, variance).

R3&R5: [Q]: DFE module, feature fusion, zero values [A]: The DFE might reduce the feature information of the Z-plane. However, since we concatenate the enhanced features with the original fused features, no features are sacrificed, so there will be no blank areas during decoding. In the DFE module, we adopt element-wise addition for the fusion process. We also tried using cross-attention to fuse features, which not only did not improve performance but also increased training time. We will update Fig. 2 and the description in the revised version.

R3&R4: [Q]: Results details, robustness [A]: The metrics shown in the tables are the averages of multiple results. Due to space limitations, we cannot include the variance of these metrics in the table. Compared to other related studies, our model has a smaller variance, which also reflects that our model is more robust.

R3&R4: [Q]: Performance improvement, metrics [A]: Our work focuses on reducing the depth reconstruction errors caused by the occlusion of instruments. However, the commonly used evaluation metrics compare the rendering results from the main view, which cannot reflect the differences in depth. The significant visual contrast in Fig. 1 is usually more meaningful than slight improvements in quantitative metrics.

R3&R5: [Q]: Clinical relevance, quality needed for medical application [A]: The EndoNeRF dataset is collected from real surgical scenes. Previous studies have demonstrated significant practical value in clinical medicine. We have further reduced the depth errors. This undoubtedly serves as a meaningful boost to practical medical applications. So, we believe the rendering quality has reached the “quality needed”. Furthermore, we will collaborate with doctors to optimize our approach for practical medical applications in the future.

R4: [Q]: Limitation, a graphical abstract for method [A]: Our method is only applicable to networks that model dynamic and static fields separately. However, separating dynamic and static fields can reduce training time. A clear graphical abstract of Fig. 2 and a bar describing the different tensors will be provided in the revised version.

R4&R5: [Q]: Dataset, more information, rendering time [A]: The dataset is a sequence of RGB images containing surgical instruments. DnFPlane is used to render a 4D tissue structure without surgical instruments. So, the RGB image sequence does not need to be partitioned. Regarding the loss function weights, we made appropriate adjustments based on the baseline. The DFE and CIR modules only resulted in a slight increase in parameters, and the time cost is almost negligible. The entire model completes the training and rendering process within 3 minutes. Individual rendering meets real-time requirements.

R5: [Q]: Comparison study [A]: To ensure fairness, we strictly followed the requirements and hyperparameter settings of the comparative works, replicating their results under the same conditions. We also reference the metrics provided in their papers (showing the better metrics in our tables).

R5: [Q]: Unclear expression [A]: The “strange symbol for Delta” indicates element-wise subtraction. The “strange sentence in Section 2.2” aims to convey that we represent the surgical scene as a 4D volume consisting of spatial dimensions XYZ and time dimension T. This structure greatly reduces the computational cost of NeRF-based methods. The “depth distortions” on Page 2 actually refer to depth errors. These will be uniformly revised in the revised version.

R5: [Q]: 3DGS [A]: 3DGS requires point cloud data to initialize the 3D Gaussians, resulting in a large number of parameters. The advantage of NeRF is its implicit representation, which has fewer parameters.
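
The fusion, subtraction, masking and concatenation steps discussed in the reviews and in the rebuttal can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The element-wise addition, the element-wise subtraction and the concatenation follow the descriptions above; the thresholding rule used to turn Delta into a binary mask, the `threshold` value, the choice of the fused feature as the one being masked, and the function name are assumptions made only for illustration.

```python
import numpy as np

def dynamic_feature_enhancement(f_xy, f_xt, f_yt, threshold=0.5):
    """Hypothetical sketch of the DFE steps for features of shape (N, C)."""
    # Fuse the two space-time plane features by element-wise addition
    # (the rebuttal states that addition is used for the fusion step).
    f_fused = f_xt + f_yt
    # Delta: element-wise subtraction of the static feature F_XY
    # from the fused feature F_X'Y'.
    delta = f_fused - f_xy
    # ASSUMPTION: a binary mask obtained by thresholding |delta|.
    # The source does not say how the mask is derived from delta.
    mask = (np.abs(delta) > threshold).astype(f_fused.dtype)
    # Element-wise multiplication zeroes the entries the mask rejects.
    enhanced = mask * f_fused
    # Concatenating with the original fused features keeps every value
    # available to the decoder, even where the mask is zero.
    return np.concatenate([enhanced, f_fused], axis=-1), mask
```

Under these assumptions the output has twice as many channels as the input: the first half may contain zeros where the mask rejects a feature, while the second half always carries the unmasked fused features, which is consistent with the rebuttal's statement that no blank areas arise during decoding.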




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper presents an efficient and high-quality deformable tissue reconstruction method. By rethinking the value of dynamic and static fields, the authors use dynamic fields to guide static fields at the time level to address depth distortions at regions of instrument occlusion without compromising training time. Compared to previous methods, our DnFPlane outperforms existing state-of-the-art methods on common endoscopic datasets. Reviewers required a more detailed explanation of the architecture, as well as comments regarding the omission of some sort of statistical analysis when presenting results. After careful consideration of the authors’ rebuttal, most reviewers lean towards weak accept of the paper, with one weak reject. I agree that the authors have adequately addressed the major concerns and questions raised by the reviewers regarding the novelty of the proposed methods and some of the technical aspects. This said, I lean towards a weak accept of the paper because of the improved results when compared to state-of-art.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Reviewers agree on the novelty of the paper. Although most major concerns appear to have been addressed in the rebuttal, the initial scores remain unchanged. Two reviewers highlighted issues regarding the clarity of the paper. While this aspect can be corrected in a revised version without major modifications or additions, it may be challenging to address fully, as it requires additional review to ensure clarity is achieved. I strongly recommend that the authors perform a thorough proofreading of the final version.



