Abstract

Existing learning-based cortical surface reconstruction approaches heavily rely on the supervision of pseudo ground truth (pGT) cortical surfaces for training. Such pGT surfaces are generated by traditional neuroimage processing pipelines, which are time-consuming and generalize poorly to low-resolution brain MRI, e.g., from fetuses and neonates. In this work, we present CoSeg, a learning-based cortical surface reconstruction framework weakly supervised by brain segmentations without the need for pGT surfaces. CoSeg introduces temporal attention networks to learn time-varying velocity fields from brain MRI for diffeomorphic surface deformations, which fit an initial surface to target cortical surfaces within only 0.11 seconds for each brain hemisphere. A weakly supervised loss is designed to reconstruct pial surfaces by inflating the white surface along the normal direction towards the boundary of the cortical gray matter segmentation. This alleviates partial volume effects and encourages the pial surface to deform into deep and challenging cortical sulci. We evaluate CoSeg on 1,113 adult brain MRI scans at 1mm and 2mm resolution. CoSeg achieves superior geometric and morphological accuracy compared to existing learning-based approaches. We also verify that CoSeg can extract high-quality cortical surfaces from fetal brain MRI on which traditional pipelines fail to produce acceptable results.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1517_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1517_supp.pdf

Link to the Code Repository

https://github.com/m-qiang/CoSeg

Link to the Dataset(s)

https://www.humanconnectome.org/study/hcp-young-adult/data-releases
https://biomedia.github.io/dHCP-release-notes/

BibTex

@InProceedings{Ma_Weakly_MICCAI2024,
        author = { Ma, Qiang and Li, Liu and Robinson, Emma C. and Kainz, Bernhard and Rueckert, Daniel},
        title = { { Weakly Supervised Learning of Cortical Surface Reconstruction from Segmentations } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper uses cortical surfaces reconstructed from pseudo ground truth (pGT) segmentations to supervise the reconstruction of the white and pial surfaces from brain images. The authors call this method a novel weakly supervised learning approach to cortical surface reconstruction, which, however, seems similar to previous explicit pGT-cortical-surface-guided methods in my opinion.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well written, very clear, and highly reproducible.
    2. Evaluation is comprehensive, with ablation studies of parameters, comparisons to previous implicit and explicit methods.
    3. Novel boundary and inflation losses for pial surface reconstruction that can tackle partial volume effects
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Limited novelty. The whole framework seems very similar to previous explicit methods, e.g., CoTAN [18], CortexODE [19], Vox2Cortex [3]. The only difference is the loss used for training the network: previous methods used a bi-Chamfer loss, while this paper develops several novel losses, e.g., the boundary and inflation losses for pial surface reconstruction.
    2. Overclaim and unclear description of weakly supervised learning. The loss formulation seems the same as in previous methods, i.e., it calculates the distance between the deformed surfaces and pGT surfaces generated by traditional pipelines. I didn’t see why such losses constitute weak supervision.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The implementation is described in detail. Reproducibility should be very high.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This paper presents some novel loss functions that can improve the quality of cortical surface reconstruction, especially for pial surfaces, which is useful and awesome. However, besides this contribution, other parts seem very similar to previous methods, e.g., the whole framework, the diffeomorphic deformation, the pGT generation process, etc. I didn’t get the meaning of the weakly supervised learning of cortical surfaces that the authors kept emphasizing in the paper. The authors still need to use a traditional pipeline, e.g., the HCP pipeline, to generate pGT segmentations, then use the marching cubes method to reconstruct pGT cortical surfaces, and finally compute the losses between the predicted surfaces and the pGT surfaces. What is the fundamental difference between this process and the other explicit methods in Table 1? Moreover, the authors did not apply topology correction to the pGT cortical surfaces. Does that mean the upper bound of the achievable accuracy of the proposed method is even lower than that of previous methods that apply topology correction when generating the pGT cortical surfaces?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Limited novelty.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    On the one hand, authors confirmed that they used surfaces extracted from pGT seg by MC as the GT to reconstruct white surfaces and as the weak supervision to reconstruct pial surfaces. On the other hand, the authors still emphasized they present “a framework weakly supervised by brain segmentations without the need for pGT surfaces”. Are not the two statements contradictory? Although in some places, the statement was complemented as “without the need of pGT surfaces generated by traditional pipelines”, does that mean the authors think “MC is not a traditional pipeline”? Or, pGT surfaces generated by FreeSurfer or dHCP pipelines are pGT surfaces, pGT surfaces generated by MC are not pGT surfaces? It’s so confusing and misleads readers into thinking “pGT are not required” as posted by R1.

    “previous explicit approaches [3,18,19,25] in Table 1 only use the bi-Chamfer loss” is not true. It is no doubt that previous methods also adopted some regularization terms, such as the normal consistency loss L_nc in [3].

    The authors also didn’t explain the meaning of “weakly supervised learning” in the title. If “pGT seg providing weak supervision” makes the method “weakly supervised learning”, all previous methods that used pGT surfaces are “weakly supervised learning”?

    I also agree with R4 that the paper lacks the description and comparison with traditional deformation-based pial surface reconstruction methods such as FreeSurfer.

    To summarize, the rebuttal provided by the authors is not effective and does not address my concerns at all. I acknowledge the novel loss functions proposed by the authors to address the partial volume effect, which are the contribution and should be the focus of this paper. But they used other fancy words like “weak supervision” and “without the need”, diverting attention from the original contribution and misleading the readers. My intention was to ask the authors to revise the narrative and overclaims, but they didn’t respond to any of these points in the rebuttal.



Review #2

  • Please describe the contribution of the paper

    The paper presents an approach for weakly supervised learning aimed at reconstructing cortical surfaces from brain MRI segmentations. This method diverges from conventional techniques that depend extensively on pseudo ground truth (pGT) surfaces produced by labor-intensive neuroimaging processing pipelines. Instead, Seg2CS directly uses cortical ribbon segmentations as a form of weak supervision. This strategy avoids the need for pGT surfaces during training, which proves especially beneficial for processing low-resolution images, such as those obtained from fetal and neonatal MRIs. Additionally, the framework employs temporal attention networks to capture time-varying velocity fields, enabling quick and precise cortical surface deformations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Seg2CS is specifically designed to learn cortical surface deformations without depending on pseudo ground truth (pGT) surfaces. This method, which employs a weakly supervised approach, utilizes only cortical ribbon segmentations, which are easier to obtain, particularly for low-resolution images. Implementing temporal attention networks to directly learn diffeomorphic deformations from MRI data improves the model’s adaptability across varying resolutions and subject ages. With this approach, high-quality cortical surfaces can be generated from fetal brain MRI scans where traditional techniques have been unsuccessful. When tested on the HCP young adult dataset and the dHCP fetal dataset, the framework demonstrated strong geometric and morphological precision, setting a robust benchmark for future research in this area.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The method guides vertex displacement along the normal of the input white surface, which could potentially compromise the anatomical fidelity of the reconstructed pial surfaces. This limitation is acknowledged by the authors and could be a point of concern for clinical applications where precise anatomical detail is crucial.

    The performance of Seg2CS heavily relies on the quality of the input cortical ribbon segmentations. In cases where these segmentations are of poor quality, particularly in lower resolution scans, the reconstructed surfaces might not accurately reflect true anatomical structures.

    Lack of comparative analysis with non-learning-based methods. The paper extensively compares Seg2CS against other learning-based methods but does not provide a direct comparison with traditional non-learning-based cortical surface reconstruction approaches in terms of efficiency and accuracy. This could have provided a more comprehensive understanding of the improvements offered by Seg2CS.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    A thorough examination of the errors, especially in instances where Seg2CS underperforms or fails, could offer valuable insights into areas for enhancement. This could involve investigating shortcomings in segmentation accuracy or in scenarios involving unusually complex anatomy.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The approach is formulated with considerable novelty, and its contributions are clearly outlined. It has been benchmarked, providing the reader with the opportunity to compare this method against others. The experiment utilized a public dataset that includes both adult and fetal images.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Based on the rebuttal and comments from other reviewers, there is a significant contribution.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a CNN-based cortical surface reconstruction method that operates on meshes but does not require surfaces extracted by tools like FreeSurfer for training, using gray matter and white matter (ventricles in subcortical gray matter included) segmentations instead. The method starts with a topologically correct template that is deformed towards the white matter surface using an LDDMM-like transformation model and Chamfer loss to the boundary of the white matter segmentation. Then the white matter surface is deformed towards the pial boundary using a loss that includes a geometrical term that encourages deformation along the normal direction and results in better modeling of deep sulci. The method is compared with several alternative approaches with excellent performance.
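    For readers unfamiliar with the Chamfer term mentioned above, a minimal numpy sketch of a bidirectional Chamfer distance between a predicted vertex set and boundary points is given below. This is an illustration only, not the paper’s implementation, and the function name is hypothetical:

    ```python
    import numpy as np

    def bi_chamfer(pred, target):
        """Bidirectional Chamfer distance between point sets pred (N,3) and target (M,3).

        Averages, in both directions, the distance from each point to its
        nearest neighbor in the other set.
        """
        d = np.linalg.norm(pred[:, None, :] - target[None, :, :], axis=-1)  # (N, M) pairwise distances
        return d.min(axis=1).mean() + d.min(axis=0).mean()
    ```

    In training, `target` would be points sampled from the segmentation boundary; the loss is zero when the two point sets coincide.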

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very clearly written and design choices are clearly explained. While the method is relatively straightforward, it is very logical and the main innovation of using a cosine-like loss for the white->pial deformation to maintain deformation along the normal vector is well justified. The results are presented well and show improvement.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I don’t find any significant weaknesses with the approach, rather some questions with respect to the design choices in the algorithm.

    For the ODE method, it was not really justified why the attention map is better than, say, repeatedly integrating the SVF (as in log-demons or VoxelMorph) or using t as an input to the U-Net to modulate its output with respect to time. Also, in principle LDDMM (which is what the ODE is) requires the time-varying velocity field to be smooth and bounded for the final transformation to be diffeomorphic, but I don’t see anything in the loss functions or in the network to explicitly enforce this; perhaps this happens implicitly due to the edge and normal consistency losses?

    Also, the distinction between R and M in the TA-Net approach is not clearly made. Are the R resolution levels obtained from different layers of the U-Net? Or is there no practical distinction between R and M, in which case we can say that the U-Net produces six layers that are weighted by the attention net?

    Relatedly, even though integrating the ODE in theory results in diffeomorphisms, when you do this with a fixed grid and a mesh, self-intersections can still occur due to discretization in both space and time (this is why VoxelMorph deformation fields have negative Jacobians in 1-2% of the voxels). And minimizing them without dampening the overall registration requires careful calibration of the regularization parameters. Here there does not seem to be any such regularization. Could the authors report on self-intersections - do they really never happen?

    The inflation loss is a powerful feature of the method but has a bit of a shortcoming in that its gradients explode for small displacements. Could this be addressed more elegantly in the loss, e.g., by making the loss “kick in” only once the vertex has moved some minimal distance away from its original location (e.g., using a ReLU), or by using a dummy variable to represent displacement along the normal from the original surface and then using the MSD between the displaced mesh vertex v_hat and v_0 displaced along the normal by the dummy amount?

    While it is clear how the bidirectional Chamfer is problematic for the pial surface, it can also pose challenges for white matter. There are some relatively thin white matter sheets and I can see the chamfer loss preventing the initial template squeezing into these narrow regions. For example, the WM between the inferior hippocampus and perirhinal cortex can be quite narrow, especially with atrophy. Would be nice to see how Sec2CS and other methods perform in this area.

    In general, the hippocampal region is a struggle for cortical reconstruction methods. It would be good if the figures showed this region, not only lateral surfaces of the cortex.
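    The ReLU gating suggested above for the inflation loss could be sketched roughly as follows. All names, the gating form, and the `eps` threshold are hypothetical illustrations of the reviewer’s suggestion, not taken from the paper:

    ```python
    import numpy as np

    def gated_inflation_loss(disp, normals, eps=0.01):
        """Hypothetical ReLU-gated variant of a normal-alignment (inflation) loss.

        disp:    (N, 3) vertex displacements from the initial white surface
        normals: (N, 3) unit normals of the initial surface
        eps:     minimal displacement below which the loss is switched off,
                 avoiding the exploding cosine gradient near zero displacement
        """
        mag = np.linalg.norm(disp, axis=-1)                         # (N,) displacement magnitudes
        cos = (disp * normals).sum(-1) / np.maximum(mag, 1e-8)      # cosine to the surface normal
        gate = np.maximum(mag - eps, 0.0) / np.maximum(mag, 1e-8)   # ReLU gate in [0, 1)
        return (gate * (1.0 - cos)).mean()
    ```

    Displacements shorter than `eps` contribute nothing, so the cosine term is never evaluated in the near-zero regime where its gradient would blow up.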

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Very reproducible because of very clear explanation. I can see myself coding this up!

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please see the comments under weaknesses - they are really suggestions for how the paper and the method could be further improved or how some lingering questions could be addressed.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This method, while fairly straightforward, makes very sensible decisions and accomplishes very good cortical reconstructions. The comparison is thorough. Not using FreeSurfer meshes as the ground truth during learning is a huge plus, because we are no longer imposing a ceiling on the performance of the method and because generalization to new domains (infant brains, postmortem brains) is more achievable. This can have great practical implications in neuroimaging.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I stand by my original review after the rebuttal. I think the paper is sufficiently novel to be accepted. I think that the rebuttal addresses the concerns about novelty raised by R5 as it clarifies the point that pseudo ground truth is not required for training this method.




Author Feedback

We thank all Reviewers for their constructive comments!

To R#5 (Novelty): The main novelty of this work is that Seg2CS learns cortical surfaces weakly supervised by pseudo ground truth (pGT) segmentations (segs), without the need for pGT cortical surfaces generated by traditional pipelines. As described in the Introduction, the motivation is that traditional pipelines are too time-consuming to extract pGT surfaces for a large dataset. Besides, these pipelines may fail to produce acceptable pGT surfaces, e.g., for low-resolution fetal MRI (Fig. 5). In contrast, the pGT segs are relatively easy to acquire, e.g., using fast learning-based methods [2,14,24,29,32].

Seg2CS only uses cortical ribbon segs as pGT for training (Fig. 1). The boundary of the seg is extracted by MC. However, such a pGT seg boundary only provides noisy and weak supervision especially for the pial surface, which is severely affected by the partial volume effects of cGM segs and fails to capture the deep cortical sulci (Fig.1 leftmost).

Given pGT segs, previous explicit approaches [3,18,19,25] in Table 1 only use the bi-Chamfer loss, which overfits to the noisy pGT seg boundary, and thus fail to predict acceptable pial surfaces (Table 3, Fig. 4). In contrast, Seg2CS introduces a novel weakly supervised loss (Fig. 2, Eq. 3-4), which effectively addresses the partial volume effects of pGT segs and predicts high quality cortical surfaces (Fig. 3-5). We will summarize the novelty in the paper more clearly.

Our Seg2CS framework is model agnostic and we extend CoTAN [18] to TA-Net, enabling both adult and fetal cortical surface reconstruction. Seg2CS can preserve the surface topology without any topological requirement on the pGT. Although weakly supervised by pGT segs, Seg2CS still performs well, e.g., on dHCP fetal data, while the traditional dHCP pipeline fails (Fig. 5).

To R#4 (Accuracy): To enhance anatomical accuracy, as shown in Fig. 1, Seg2CS only uses brain MRI as the input volume. The cortical ribbon segs are only used as pGT for training. This can effectively alleviate the influence of imprecise segmentation. The pGT segs could also be refined manually to increase the surface quality if required, e.g., in clinical applications. Seg2CS inflates the white surface strictly along the normal, which could affect anatomical fidelity. We will address this in future work by leveraging unsupervised MRI intensity to further refine the surfaces.

We have compared Seg2CS with both traditional HCP and dHCP pipelines. As shown in Fig. 3, Seg2CS (0.11s) achieves similar surface quality to the HCP pipeline (>6h) for both resolutions, while being orders of magnitude faster. In Fig. 5, the traditional dHCP pipeline fails to generalize on fetal data. We will also report the runtime of the dHCP pipeline, which requires ~6.5h.

Due to the low image resolution and contrast of the dHCP fetal MRI, Seg2CS might produce a few errors for older fetal subjects with increasing complexity of brain anatomy. We will report and visualize these errors to provide insights for further refinement.

To R#1 (Method): Integrating multiple SVFs or using t as the input to the U-Net could lead to lengthy and unstable training, since the gradients would be backpropagated through all SVFs recurrently. Our attention-based TVF avoids this issue. The TVF is smoothed implicitly, since we add smoothness losses on the surface mesh explicitly.
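Deforming a surface with a time-varying velocity field, as discussed in the rebuttal, amounts to integrating an ODE over the mesh vertices. A minimal forward-Euler sketch follows; all names are hypothetical, and the actual method uses a learned network and an ODE solver rather than this stand-in:

```python
import numpy as np

def deform_mesh(verts, velocity_fn, steps=10, dt=0.1):
    """Forward-Euler integration of a time-varying velocity field (TVF).

    verts:       (N, 3) mesh vertices
    velocity_fn: callable (verts, t) -> (N, 3) velocities; in the paper this
                 would be predicted by a network, here it is a stand-in
    """
    v = verts.copy()
    for k in range(steps):
        v = v + dt * velocity_fn(v, k * dt)  # one explicit Euler step
    return v
```

With sufficiently small steps and a smooth velocity field, the resulting map is approximately diffeomorphic, though discretization can still introduce self-intersections, as R#1 notes.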

For TA-Net, R is the resolution level for coarse-to-fine deformations. M is the number of SVFs at each level to enhance the representation ability of the TVF.

Due to numerical errors, as reported in Sec. 3, Seg2CS indeed produces a negligible number of self-intersecting faces, i.e., 0.06/3.65 out of 330k faces for white/pial surface on the HCP dataset.

Many thanks for the valuable suggestions to address the issue of exploding gradients.

Seg2CS also achieves good quality in the hippocampal region. We will visualize the medial view of both white and pial surfaces.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The problem addressed in the paper is important, and the rebuttal addresses some concerns raised by the reviewers. However, there are several major drawbacks. First, it lacks some important baseline methods: R4 pointed out that Seg2CS is not directly compared with traditional non-learning-based cortical surface reconstruction approaches in terms of efficiency and accuracy, and R5 pointed out that the Marching Cubes algorithm should also be considered a conventional method. Second, all the models are trained with pseudo ground truth (pGT) from segmentations and evaluated against pGT cortical surfaces from traditional pipelines. This does not guarantee that the weakly supervised method can perform as well as conventional pipelines or other supervised DL methods (trained with pGT cortical surfaces, as they are supposed to be). Additionally, quantitative results are not reported for the dHCP fetal dataset. Overall, the paper must include more baselines for fair comparisons, carefully revise its claims, and polish its presentation. It is unlikely that these revisions could be finished within a short period.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


