
Reference

Geometric Transformer for Fast and Robust Point Cloud Registration

SuperPoint: Self-Supervised Interest Point Detection and Description

The process of downsampling a point cloud and converting it into superpoints is as follows:

Downsampling

Downsampling reduces the spatial resolution of a point cloud.

  • It reduces the number of points while preserving the original 3D representation.
  • It brings the data down to a more manageable size.
  • It lowers storage and processing requirements.
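As a rough illustration of the downsampling step, here is a minimal voxel-grid sketch in plain Python. The post does not say which downsampling scheme is used, so the voxel-grid approach, the function name `voxel_downsample`, and the `voxel_size` parameter are all assumptions made for this example.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same cubic voxel by their centroid.

    Assumption: a voxel grid is only one possible downsampling scheme.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    # Bucket points by the integer index of the voxel that contains them.
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[key].append((x, y, z))

    # One representative point (the centroid) per occupied voxel.
    centroids = []
    for members in buckets.values():
        n = len(members)
        centroids.append(tuple(sum(axis) / n for axis in zip(*members)))
    return centroids


cloud = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (1.2, 0.0, 0.0), (1.4, 0.2, 0.0)]
coarse = voxel_downsample(cloud, voxel_size=1.0)
print(len(cloud), "->", len(coarse))
```

A larger `voxel_size` merges more points per voxel, trading geometric detail for lower storage and processing cost.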

Superpoints

Superpoints have the following characteristics:

  • Each superpoint is a group of points with similar geometric properties.
  • They capture the redundancy in a point cloud and greatly reduce processing cost by grouping points whose local geometric structure is similar.

Downsampling a point cloud into superpoints is an effective way to shrink the original data while preserving its important geometric properties. This improves computational efficiency and processing speed in tasks such as point cloud registration and matching.
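One simple way to form the groups is to assign every dense point to its nearest superpoint. The sketch below assumes that spatial proximity stands in for "similar local geometry"; the function name `group_points_by_superpoint` is hypothetical, and the brute-force search is for clarity only.

```python
import math


def group_points_by_superpoint(points, superpoints):
    """Return, for each superpoint, the indices of the dense points closest to it.

    Assumption: nearest-neighbor assignment is used as the grouping rule.
    """
    groups = [[] for _ in superpoints]
    for idx, p in enumerate(points):
        # Brute-force nearest superpoint; a k-d tree would scale better.
        nearest = min(
            range(len(superpoints)),
            key=lambda j: math.dist(p, superpoints[j]),
        )
        groups[nearest].append(idx)
    return groups


dense = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (1.0, 0.0, 0.0), (0.9, 0.1, 0.0)]
nodes = [(0.1, 0.0, 0.0), (1.0, 0.0, 0.0)]
patches = group_points_by_superpoint(dense, nodes)
print(patches)
```

Each resulting patch can then be processed as a unit, so coarse matching runs over a handful of superpoints instead of every dense point.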

SuperPoint found the most correct matches on image pairs in which one image is mapped to the other by a homography H.

SuperPoint tends to produce a larger number of correct matches which densely cover the image, and is especially effective against illumination changes.


SIFT performs well for sub-pixel precision homographies $\epsilon = 1$ and has the lowest mean localization error (MLE).

This is likely due to the fact that SIFT performs extra sub-pixel localization, while other methods do not perform this step.
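The threshold $\epsilon$ controls how closely an estimated homography must agree with the ground truth. A minimal sketch of one such check follows, assuming a corner-error criterion: warp the four image corners with both homographies and accept the estimate when the mean corner distance is at most $\epsilon$ pixels. The function names and the 640x480 image size are illustrative assumptions, not taken from the post.

```python
import math


def warp_point(H, pt):
    """Apply a 3x3 homography (nested lists) to a 2D point."""
    x, y = pt
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)


def homography_is_correct(H_est, H_gt, width, height, eps=1.0):
    """True if the mean corner error between H_est and H_gt is <= eps pixels."""
    corners = [
        (0.0, 0.0),
        (width - 1.0, 0.0),
        (width - 1.0, height - 1.0),
        (0.0, height - 1.0),
    ]
    errors = [
        math.dist(warp_point(H_est, c), warp_point(H_gt, c)) for c in corners
    ]
    return sum(errors) / len(errors) <= eps


# Ground truth: pure translation; estimate is off by half a pixel in x.
H_gt = [[1.0, 0.0, 5.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]]
H_est = [[1.0, 0.0, 5.5], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]]
ok_loose = homography_is_correct(H_est, H_gt, 640, 480, eps=1.0)
ok_tight = homography_is_correct(H_est, H_gt, 640, 480, eps=0.25)
print(ok_loose, ok_tight)
```

With a tight threshold, small keypoint localization errors are enough to fail the check, which is consistent with sub-pixel refinement helping at $\epsilon = 1$.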

High repeatability does not guarantee that the final homography estimation will be accurate.

Repeatability is the ability to detect feature points at the same locations even when conditions between images (illumination, viewpoint, etc.) change.
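To make the definition concrete, here is a simplified, one-directional sketch: warp the keypoints of image A into image B with a known homography and count how many land within `eps` pixels of a detection in B. The function names and the default threshold of 3 pixels are assumptions for this example; published evaluations often use a symmetric variant.

```python
import math


def warp_point(H, pt):
    """Apply a 3x3 homography (nested lists) to a 2D point."""
    x, y = pt
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)


def repeatability(kps_a, kps_b, H, eps=3.0):
    """Fraction of keypoints in A that are re-detected in B after warping by H."""
    if not kps_a or not kps_b:
        return 0.0
    repeated = 0
    for p in kps_a:
        q = warp_point(H, p)
        # A keypoint counts as repeated if any detection in B is within eps.
        if any(math.dist(q, r) <= eps for r in kps_b):
            repeated += 1
    return repeated / len(kps_a)


# Image B is image A shifted 10 pixels to the right.
H = [[1.0, 0.0, 10.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
kps_a = [(5.0, 5.0), (20.0, 30.0), (40.0, 40.0)]
kps_b = [(15.0, 5.0), (31.0, 30.0), (90.0, 90.0)]
rep = repeatability(kps_a, kps_b, H)
print(round(rep, 3))
```

Note that this score says nothing about where in the image the repeated keypoints lie, so a detector can score well here while still covering the image poorly.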

ORB has high repeatability, so it reliably finds feature points at the same locations, but its feature points are not evenly distributed and instead cluster in specific regions, so it scores poorly on the final homography estimation task.

ORB achieves the highest repeatability (Rep.); however, its detections tend to form sparse clusters throughout the image as shown in Figure 8, thus scoring poorly on the final homography estimation task.

This suggests that optimizing solely for repeatability does not result in better matching or estimation further up the pipeline.

SuperPoint scored highly on the descriptor-focused metrics nearest neighbor mAP (NN mAP) and matching score (M. Score).

SuperPoint scores strongly in descriptor-focused metrics such as nearest neighbor mAP (NN mAP) and matching score (M. Score), which confirms findings from both Choy et al. [3] and Yi et al. [32] which show that learned representations for descriptor matching outperform hand-tuned representations.
