
Reference

GaussReg: Fast 3D Registration with Gaussian Splatting

From coarse to fine: Robust hierarchical localization at large scale

DReg-NeRF: Deep Registration for Neural Radiance Fields

Registration pipeline: feature extraction, feature matching, transformation estimation

The basic registration pipeline generally proceeds as follows:

  • Feature extraction:
    • Extract meaningful keypoints and descriptors from the two inputs (e.g., two images or two point clouds).
    • Examples: SIFT, ORB, SuperPoint
  • Feature matching:
    • Find correspondences between the two inputs based on the extracted features.
    • Methods: nearest-neighbor (NN) search, KNN, and RANSAC are used to filter out unreliable matches
  • Transformation estimation:
    • Compute the transformation between the two point clouds (e.g., rotation, translation, scale) from the matched correspondences.
    • Examples: homography, rigid transformation, affine transformation, ICP (Iterative Closest Point)
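
The transformation-estimation step at the end of this pipeline can be sketched with the classic SVD-based (Kabsch) solve for a rigid transform. This is a minimal illustration assuming correspondences have already been matched; the function name `estimate_rigid_transform` is my own, not from any of the referenced papers.

```python
import numpy as np

def estimate_rigid_transform(src, tgt):
    """Estimate R, t such that R @ src_i + t ~= tgt_i for matched
    correspondences (Kabsch / orthogonal Procrustes via SVD)."""
    src_c = src - src.mean(axis=0)             # center both point sets
    tgt_c = tgt - tgt.mean(axis=0)
    H = src_c.T @ tgt_c                        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Usage: recover a known rotation/translation from exact correspondences.
rng = np.random.default_rng(0)
src = rng.standard_normal((100, 3))
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
tgt = src @ R_true.T + t_true
R, t = estimate_rigid_transform(src, tgt)
assert np.allclose(R, R_true, atol=1e-6) and np.allclose(t, t_true, atol=1e-6)
```

In practice this closed-form solve is what RANSAC or ICP calls repeatedly on candidate correspondence sets.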

Point cloud registration

DReg-NeRF: Deep Registration for Neural Radiance Fields

Point cloud registration is a classic problem in 3D computer vision, which aims at computing the relative transformation from the source point cloud to the target point cloud.

Point cloud registration์€ source point cloud์™€ target point cloud์˜ 3D ์ขŒํ‘œ๊ณ„์—์„œ correspondences๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์ƒ๋Œ€์ ์ธ rigid transformation์„ ๊ณ„์‚ฐํ•˜๋Š” ๋ฌธ์ œ์ž…๋‹ˆ๋‹ค.

์ด ๊ณผ์ •์—์„œ๋Š” correspondences(๋‘ ์ ๊ตฐ ์‚ฌ์ด์˜ ๋Œ€์‘์ )์— ๋Œ€ํ•ด confidence scores๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. ์ด ์ ์ˆ˜๋Š” ๊ฐ ๋Œ€์‘์ ์˜ ์‹ ๋ขฐ๋„๋ฅผ ๋‚˜ํƒ€๋‚ด๋ฉฐ,

  • ๋†’์€ confidence scores๋ฅผ ๊ฐ€์ง„ correspondences๋Š” ๋” ์ค‘์š”ํ•œ ๊ฒƒ์œผ๋กœ ๊ฐ„์ฃผ๋˜์–ด ๊ฐ€์ค‘์น˜๊ฐ€ ๋” ํฌ๊ฒŒ ๋ถ€์—ฌ๋ฉ๋‹ˆ๋‹ค.
  • ๋ฐ˜๋ฉด, ๋‚ฎ์€ confidence scores๋ฅผ ๊ฐ€์ง„ correspondences๋Š” ์‹ ๋ขฐ๋„๊ฐ€ ๋‚ฎ๋‹ค๊ณ  ํŒ๋‹จ๋˜์–ด ๊ณ„์‚ฐ์—์„œ ์ œ์™ธ(mask out)๋ฉ๋‹ˆ๋‹ค.

Here, a confidence score indicates how likely the predicted points $\tilde{X}_{source}$ and $\tilde{X}_{target}$ are to actually correspond to each other. At the same time, it reflects how visible these points are in the source NeRF and the target NeRF, respectively.

๊ฐ„๋‹จํžˆ ๋งํ•˜๋ฉด, ์ ์ˆ˜๊ฐ€ ๋†’์„์ˆ˜๋ก ๋‘ ์ ์ด ์ง„์งœ ๋Œ€์‘์ ์ผ ํ™•๋ฅ ์ด ๋†’๊ณ , ํ•ด๋‹น ์ ์ด NeRF์˜ ๋ Œ๋”๋ง๋œ 3D ๊ณต๊ฐ„์—์„œ ์นด๋ฉ”๋ผ ์‹œ์ (viewpoint)์—์„œ ๊ฐ€๋ ค์ง€์ง€ ์•Š๊ณ  ๋ช…ํ™•ํžˆ ๊ด€์ฐฐ๋  ์ˆ˜ ์žˆ๋Š”, ๊ฐ€์‹œ์„ฑ์ด ๋†’์€ ์œ„์น˜์— ์žˆ๋‹ค๋Š” ์˜๋ฏธ์ž…๋‹ˆ๋‹ค.

After encoding features by transformer, we further adopt a single-head attention layer to predict the corresponding point locations $\tilde{X}_{source}$ and confidence scores $\tilde{S}_{source}$ of the source voxel points $\hat{X}_{source}$ in the target NeRFโ€™s coordinate frame.

Similarly, we also predict the corresponding point locations $\tilde{X}_{target}$ and confidence scores $\tilde{S}_{target}$ of the target voxel points $\hat{X}_{target}$ in the source NeRF’s coordinate frame.

Finally, we utilize the predicted correspondences to compute the relative rigid transformation.

The confidence scores are used as weights that mask out the irrelevant correspondences and can be interpreted as how likely the predicted points from $\tilde{X}_{source}$ and $\tilde{X}_{target}$ are correspondences and are visible in the source NeRF and the target NeRF.
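
This confidence-weighted estimation can be sketched as a weighted variant of the Kabsch solve: low-score pairs are masked out and the rest contribute in proportion to their score. The function name `weighted_rigid_transform` and the 0.5 threshold are illustrative assumptions, not DReg-NeRF's actual implementation.

```python
import numpy as np

def weighted_rigid_transform(src, tgt, scores, thresh=0.5):
    """Confidence-weighted rigid transform (weighted Kabsch).
    Pairs with score below `thresh` are masked out; the rest are
    weighted by their normalized confidence score."""
    mask = scores >= thresh                    # mask out unreliable pairs
    src, tgt, w = src[mask], tgt[mask], scores[mask]
    w = w / w.sum()                            # normalize weights
    mu_s, mu_t = w @ src, w @ tgt              # weighted centroids
    H = (src - mu_s).T @ ((tgt - mu_t) * w[:, None])  # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mu_t - R @ mu_s

# Usage: 80 true correspondences plus 20 corrupted pairs with low scores.
rng = np.random.default_rng(1)
src = rng.standard_normal((100, 3))
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])
tgt = src @ R_true.T + t_true
tgt[80:] += rng.standard_normal((20, 3)) * 5.0  # corrupt the last 20 pairs
scores = np.concatenate([np.full(80, 0.9), np.full(20, 0.1)])
R, t = weighted_rigid_transform(src, tgt, scores)
assert np.allclose(R, R_true, atol=1e-6) and np.allclose(t, t_true, atol=1e-6)
```

Because the corrupted pairs fall below the threshold, they do not perturb the recovered transform at all; with softer scores they would merely be down-weighted.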

3D point cloud registration (correspondence searching & transformation estimation)

3D point cloud registration can also be described as consisting of correspondence searching (feature extraction and feature matching combined) and transformation estimation.

3D point cloud registration has been developed for decades. Given two overlapping point clouds with different coordinate systems, the target of this task is to find the transformation between them.

Traditional methods [4, 17, 18, 21, 23, 40, 42] divide this process into two parts: correspondence searching and transformation estimation.

Correspondence searching involves finding sparse matched feature points between the source and target point clouds.

Transformation estimation is to calculate the transformation matrix using these correspondences.

These two stages will be conducted iteratively to find the optimal transformation.
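
This iterative alternation between correspondence searching and transformation estimation is exactly the structure of ICP. Below is a minimal sketch using a nearest-neighbor search for stage one and the SVD-based solve for stage two; the function name `icp` and the fixed iteration count are illustrative choices.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src, tgt, iters=30):
    """Minimal ICP: alternate (1) correspondence searching via nearest
    neighbor and (2) transformation estimation via SVD (Kabsch)."""
    tree = cKDTree(tgt)
    R_acc, t_acc = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        _, idx = tree.query(cur)                 # correspondence searching
        matched = tgt[idx]
        mu_s, mu_t = cur.mean(axis=0), matched.mean(axis=0)
        H = (cur - mu_s).T @ (matched - mu_t)    # transformation estimation
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t = mu_t - R @ mu_s
        cur = cur @ R.T + t                      # apply the incremental step
        R_acc, t_acc = R @ R_acc, R @ t_acc + t  # accumulate the transform
    return R_acc, t_acc

# Usage: a small misalignment, which vanilla ICP handles well.
rng = np.random.default_rng(2)
src = rng.standard_normal((300, 3))
theta = 0.03
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.05, -0.05, 0.05])
tgt = src @ R_true.T + t_true
R, t = icp(src, tgt)
assert np.allclose(src @ R.T + t, tgt, atol=1e-4)
```

Note that plain ICP only converges from a reasonably good initialization; this is why learned or feature-based methods (SuperPoint/SuperGlue, GaussReg, DReg-NeRF) are used to provide coarse alignment first.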

Point cloud registration์„ ํ•˜๋Š” ์ด์œ ๋Š” large-scale 3D scene reconstruction์„ ํ•˜๊ธฐ ์œ„ํ•ด์„œ์ž…๋‹ˆ๋‹ค.

Point cloud registration is a fundamental problem for large-scale 3D scene scanning and reconstruction.

In traditional 3D scene scanning and reconstruction, a large-scale scene is usually divided into different blocks, resulting in many independent sub-scenes that may not be in the same coordinate system.

Therefore, the registration between them plays a crucial role.

The mainstream methods typically involve extracting features from point clouds and locating matching points to calculate the transformation between the two input scenes.

Why registration is needed for large-scale scene reconstruction with NeRF

  1. For large-scale reconstruction, the data collection process becomes long, so additional data keeps being acquired over time. The scene trained on this new data is then registered and merged in, extending the overall scene.
  2. Training a NeRF on many images takes a long time. A large-scale scene can therefore be divided into several small scenes that are trained in parallel and then merged via registration.

When considering large-scale scene reconstruction based on NeRF, there are two main challenges:

  1. Due to the complex occlusions present in real-world scenes, lots of images or videos are often required to capture for large-scale reconstruction, leading to a time-consuming data collection process.
  2. Optimizing NeRF with numerous images is computationally intensive. Therefore, a direct approach is to divide a large-scale scene into some smaller scenes, reconstruct them separately, and then use registration to combine all these small scenes together.

HLoc vs GaussReg

Our GaussReg is 44× faster than HLoc (SuperPoint as the feature extractor and SuperGlue as the matcher) with comparable accuracy.
