
$\bf{d} \in \mathbb{R}^{H \times W}$

  • Depth is represented as an $H \times W$ map. In actual code it is also a 2D array of shape (H, W), i.e. two dimensions rather than two channels.
  • A depth value is the distance from the camera to the 3D point. (In general it is not a value in the range -1 to 1.)

Most depth sensors also have the concept of a near plane and a far plane.

image

  • A LiDAR sensor likewise has a minimum distance and a maximum distance.
  • If the near plane is 0.1 cm, the sensor does not return a real value for points closer than 0.1 cm.
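
As a rough illustration of the near/far idea, the sketch below masks out pixels whose depth falls outside the sensor's working range. The function name `mask_valid_depth`, the toy values, and the convention that 0 or NaN means "no return" are assumptions made for this example, not taken from any particular sensor SDK.

```python
import numpy as np

def mask_valid_depth(depth, near, far):
    """Return a boolean (H, W) mask of pixels whose depth lies in [near, far]."""
    depth = np.asarray(depth, dtype=np.float64)
    # NaN and inf are treated as missing measurements (an assumption).
    return np.isfinite(depth) & (depth >= near) & (depth <= far)

# Toy 2x3 depth map in metres; 0.0 and NaN stand in for "no return".
depth = np.array([[0.0, 0.05, 1.2],
                  [3.4, 12.0, np.nan]])
valid = mask_valid_depth(depth, near=0.1, far=10.0)
print(depth.shape)  # (2, 3): one depth value per pixel
print(valid)
```

Only the pixels inside the range should be used as ground truth; the rest carry no real measurement.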

Ground-truth depth from a depth sensor is obtained as follows: a light emitter shoots a beam, and an IR sensor (infrared sensor) captures the light that is reflected back.

  • Depth sensors (RealSense depth camera, LiDAR, etc.) are always affected by how light is reflected by the material properties present in the scene.
  • When the reflected light reaches the IR sensor and the reflection is too strong, it exceeds the value range the IR sensor can capture and shows up as white. image
  • For example, in the Ground Truth below, highly reflective (i.e. shiny) materials are rendered white. image
  • Ground-truth normals obtained with something like a RealSense depth sensor are also of very poor quality. image

true depth needs scale

  • Representing the scene with fake depth and rendering the depth as RGB colors gives the result below.
  • The problem is that the camera-to-dog distance and the camera-to-building distance differ by orders of magnitude, but fake depth is normalized to -1 to 1, so the true depth cannot be recovered. image
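
A small sketch of why normalization throws the scale away: two depth maps with the same layout but very different metric distances become identical after normalization to [-1, 1]. Min-max normalization is an assumption here; the normalization used for the figure is not specified.

```python
import numpy as np

def normalize_to_unit_range(depth):
    """Min-max normalise a depth map to [-1, 1] (assumed normalisation)."""
    d_min, d_max = depth.min(), depth.max()
    return 2.0 * (depth - d_min) / (d_max - d_min) - 1.0

# Hypothetical metric depths in metres.
near_scene = np.array([[1.0, 2.0], [3.0, 5.0]])
far_scene = 20.0 * near_scene + 7.0  # same layout, much farther away

same = np.allclose(normalize_to_unit_range(near_scene),
                   normalize_to_unit_range(far_scene))
print(same)
```

Any positive scale and shift applied to the metric depth disappears after normalization, which is exactly the information needed to recover true depth.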

The ‘scale ambiguity problem’ comes up frequently in depth estimation.

  • It arises when estimating the size or distance of objects in a scene without any reference that ties them to real physical sizes or distances.
  • In other words, a depth estimation model can capture the relative distances between objects well, but it cannot know how large those distances are in the real world, i.e. the absolute scale.
  • Models such as ZoeDepth and DepthAnything focus on estimating depth for every pixel in the scene. However, this depth is only relative and does not provide absolute distance.
  • A reference is therefore needed to bring this depth into agreement with the real-world scale. This is where comparison against ‘sparse SfM points’ comes in.
  • The idea is to use the absolute depth of a subset of scene points, obtained with an SfM (Structure from Motion) algorithm, to adjust the scale of the whole depth map.
  • The predicted depth is aligned so that it matches the scale of the sparse depth map obtained by projecting the SfM points into the camera view.
  • To do this, a scale parameter (a) and a shift parameter (b) are solved for each image using the closed-form linear regression solution. image
  • DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing
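
The per-image scale and shift described above can be sketched as an ordinary least-squares fit. This is a minimal sketch, not the DN-Splatter implementation: the function name `align_scale_shift`, the boolean `valid` mask marking pixels that have an SfM depth, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def align_scale_shift(mono_depth, sparse_depth, valid):
    """Solve min over (a, b) of sum_valid (a * mono + b - sparse)^2."""
    x = mono_depth[valid].astype(np.float64)
    y = sparse_depth[valid].astype(np.float64)
    # Design matrix [x, 1], so the least-squares solution is [a, b].
    A = np.stack([x, np.ones_like(x)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b

# Synthetic check: the sparse depth is an exact affine map of the prediction.
rng = np.random.default_rng(0)
mono = rng.uniform(0.0, 1.0, size=(4, 5))   # relative monocular depth
valid = np.zeros(mono.shape, dtype=bool)
valid[::2, ::2] = True                      # only a few pixels have SfM depth
sparse = np.zeros_like(mono)
sparse[valid] = 3.0 * mono[valid] + 0.5

a, b = align_scale_shift(mono, sparse, valid)
aligned = a * mono + b                      # dense depth in the SfM scale
```

Only the sparse pixels constrain the fit, but the recovered `a` and `b` are then applied to every pixel of the monocular depth map.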

Depth Evaluation Metrics

image

image
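
The metric definitions in the figures are not reproduced in text, so the sketch below uses three metrics that are commonly reported for depth estimation: absolute relative error (AbsRel), RMSE, and the threshold accuracy δ < 1.25. Whether these are the ones shown in the figures is an assumption, as are the function name and the convention that a ground-truth value of 0 marks a missing pixel.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Compute AbsRel, RMSE and delta < 1.25 over pixels with valid ground truth."""
    valid = gt > 0                      # assumed: 0 marks missing ground truth
    p, g = pred[valid], gt[valid]
    abs_rel = np.mean(np.abs(p - g) / g)
    rmse = np.sqrt(np.mean((p - g) ** 2))
    ratio = np.maximum(p / g, g / p)    # symmetric ratio, always >= 1
    delta1 = np.mean(ratio < 1.25)
    return {"abs_rel": float(abs_rel), "rmse": float(rmse), "delta1": float(delta1)}

gt = np.array([[1.0, 2.0], [4.0, 0.0]])    # last pixel has no ground truth
pred = np.array([[1.1, 2.0], [6.0, 3.0]])
metrics = depth_metrics(pred, gt)
print(metrics)
```

Lower is better for AbsRel and RMSE, higher is better for the threshold accuracy.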

It is said that a person who has lost one eye and looks at an object with only the other eye finds it very hard to estimate that object's depth.

With an understanding of the scene, as humans have, depth can be estimated to some extent with a single eye, but it remains difficult.

In other words, in monocular depth estimation, where depth is estimated from a single image, even a model that understands the scene has a hard time estimating depth when only one view is available.

monocular depth estimation is a dense, structured regression task

  • Monocular depth estimation is a dense task because a depth value must be predicted for every pixel.
  • It is a regression task because the model must regress the correct continuous depth value, here within the normalized range -1 to 1.

The Ground Truth Depth is whatever was captured with a depth sensor (e.g. depth captured with an iPhone is used as the GT).

image

image

Sensor depth

  • Depth regularization is applied directly to the depth maps of datasets that contain depth measured by LiDAR or another sensor.
  • Typical commercial depth sensors often produce non-smooth edges at object boundaries and inaccurate values on smooth surfaces.
  • The authors therefore propose a gradient-aware depth loss for adaptive depth regularization based on the RGB image.
  • In regions where the image gradient is large, such as object edges, the depth loss is down-weighted, while regularization is enforced more strongly in smooth regions.
\[L_{\hat{D}} = g_{rgb} \frac{1}{|\hat{D}|} \sum \log \left( 1 + \| \hat{D} - D \|_1 \right)\]

where

\[g_{rgb} = \exp \left( - \nabla I \right)\]
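
A NumPy sketch of this loss, under a few assumptions that the formulas leave open: $g_{rgb}$ is applied per pixel inside the mean, $\nabla I$ is taken as the gradient magnitude of the grayscale image computed with finite differences, and the function name is made up for this example.

```python
import numpy as np

def gradient_aware_depth_loss(pred_depth, sensor_depth, rgb):
    """Mean over pixels of exp(-|grad I|) * log(1 + |D_hat - D|)."""
    gray = rgb.astype(np.float64).mean(axis=-1)     # (H, W) grayscale
    gy, gx = np.gradient(gray)                      # finite differences
    g_rgb = np.exp(-np.sqrt(gx ** 2 + gy ** 2))     # small weight at edges
    per_pixel = np.log1p(np.abs(pred_depth - sensor_depth))
    return float(np.mean(g_rgb * per_pixel))

# Same depth error everywhere, with and without an image edge.
pred = np.ones((4, 4))
sensor = pred + 0.5
flat_rgb = np.full((4, 4, 3), 0.5)
edge_rgb = flat_rgb.copy()
edge_rgb[:, :2] = 0.0   # dark left half
edge_rgb[:, 2:] = 1.0   # bright right half -> vertical edge

loss_flat = gradient_aware_depth_loss(pred, sensor, flat_rgb)
loss_edge = gradient_aware_depth_loss(pred, sensor, edge_rgb)
print(loss_flat, loss_edge)
```

With a flat image the weight is 1 everywhere, so the loss reduces to the plain log-L1 term; adding an edge lowers the weight around it and therefore the loss.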

image

DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization

  • Scene scale
  • See the figure in Dense Depth Priors for Neural Radiance Fields from Sparse Input Views
  • Relationship between SfM and the sparse depth map

    image
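
To make the relationship concrete, here is a minimal sketch of turning SfM points into a sparse depth map for one view. It assumes a pinhole camera with intrinsics `K` and a world-to-camera pose `(R, t)`; the function name, the use of 0 for empty pixels, and the toy points are illustrative assumptions rather than the procedure of any specific paper.

```python
import numpy as np

def sparse_depth_from_points(points_world, K, R, t, height, width):
    """Project 3D points into a pinhole camera; 0 marks pixels with no point."""
    cam = points_world @ R.T + t            # world -> camera coordinates
    z = cam[:, 2]
    in_front = z > 0                        # drop points behind the camera
    cam, z = cam[in_front], z[in_front]
    uv = cam @ K.T                          # homogeneous pixel coordinates
    u = np.round(uv[:, 0] / z).astype(int)
    v = np.round(uv[:, 1] / z).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]
    depth = np.zeros((height, width))
    order = np.argsort(-z)                  # far first, so the nearest point wins
    depth[v[order], u[order]] = z[order]
    return depth

K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
points = np.array([[0.0, 0.0, 2.0],    # lands on the principal point
                   [0.2, 0.0, 4.0],    # lands 5 px to the right
                   [0.0, 0.0, -1.0],   # behind the camera
                   [10.0, 0.0, 1.0]])  # outside the image
sparse = sparse_depth_from_points(points, K, R, t, height=48, width=64)
```

Most pixels stay empty because SfM only reconstructs a sparse set of points, which is why the result is called a sparse depth map.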

