
CUDA Programming Tutorial 1

CUDA is NVIDIA's parallel computing platform and programming model for GPUs. Writing custom CUDA kernels lets you run parallel computations on the GPU and speed up PyTorch code.

image

The CUDA code performs the parallel computation and returns the output to PyTorch.

So the part that really matters is the CUDA code, not the C++.

The C++ code is only a bridge that connects PyTorch and CUDA.

In this tutorial 1, we will build a simplified version of NeRF's volume rendering.

  • We will implement trilinear interpolation, which computes a feature f from the features at 8 vertices (the corners of a cube).

image

image
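To make the target computation concrete, here is a minimal pure-Python reference for trilinear interpolation. It is only a sketch for sanity-checking results, not the tutorial's C++/CUDA code; the unit-cube coordinates, the corner indexing `vertex_feats[i][j][k]`, and the function name `trilinear_interpolate` are assumptions made for illustration.

```python
def trilinear_interpolate(vertex_feats, point):
    """Interpolate a scalar feature inside a unit cube.

    vertex_feats[i][j][k] is the feature at corner (x=i, y=j, z=k), with i, j, k in {0, 1}.
    point is (u, v, w), each coordinate assumed to lie in [0, 1].
    """
    u, v, w = point
    result = 0.0
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                # Each corner's weight is the product of per-axis linear weights.
                weight = (u if i else 1 - u) * (v if j else 1 - v) * (w if k else 1 - w)
                result += weight * vertex_feats[i][j][k]
    return result


# Corner features sampled from the linear function f = x + 2y + 3z.
feats = [[[x + 2 * y + 3 * z for z in (0, 1)] for y in (0, 1)] for x in (0, 1)]
print(trilinear_interpolate(feats, (0.5, 0.25, 0.75)))  # prints 3.25
```

Trilinear interpolation reproduces linear functions exactly, which is why the sample point returns 0.5 + 2(0.25) + 3(0.75) = 3.25.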

  • In tutorial 1 we write only the C++ bridge, without any CUDA.
  • Create a Python virtual environment with conda in VS Code.
  • Create a file named interpolation.cpp and write the C++ bridge there; it will connect PyTorch and CUDA.

    image

  • Because the code receives tensors from PyTorch, add #include <torch/extension.h> so that the C++ code knows what a tensor is.
    • At this point the include gets a red underline. To get rid of it, press Ctrl+Shift+P and select C/C++: Edit Configurations (UI).

      image

    • Select C/C++: Edit Configurations (JSON) and add the Python and PyTorch include paths from your conda environment to includePath.

      image

  • After saving the file, the red underline under #include <torch/extension.h> disappears.

image
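If you are unsure which directories to add to includePath, a small script run inside the conda environment can print likely candidates. This is a sketch, not part of the original tutorial: the helper name `candidate_include_paths` is hypothetical, and it relies on `torch.utils.cpp_extension.include_paths()`, which PyTorch provides for locating its headers.

```python
import sysconfig


def candidate_include_paths():
    """Return header directories worth adding to VS Code's includePath."""
    # Directory that contains Python.h for the active interpreter.
    paths = [sysconfig.get_paths()["include"]]
    try:
        # PyTorch's own helper for locating its C++ headers.
        from torch.utils.cpp_extension import include_paths
        paths.extend(include_paths())
    except ImportError:
        # torch is not installed here, so only the Python path is returned.
        pass
    return paths


for path in candidate_include_paths():
    print(path)
```

Run it with the environment's own python so the printed paths belong to that environment.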

In tutorial 1 we only write a placeholder, because the actual computation runs in the CUDA file, not in the cpp file.

  • So in the cpp file we only need to write the function name, its inputs, and its output.

image

  • Name the function trilinear_interpolation, declare the inputs feats and point as Tensors, and return feats as the output Tensor, as shown below.

image

image

  • The C++ code defines the function that will call the CUDA execution.
  • The C++ code also provides the interface through which Python calls it. (This is why a function defined in the cpp file can be called from Python.)

If you have ever called C++ from Python, you probably know the pybind library, which provides a way to call C++ code from Python.

pybind is not tied to PyTorch. For example, it can just as well be used with OpenCV code.

Our job is to use this pybind interface to expose the C++ function so that Python can call it.

  • The first argument is the function name that Python will call: "trilinear_interpolation".
  • The second argument is the C++ function being called, trilinear_interpolation.
  • Pass these two arguments to m.def inside the PYBIND11_MODULE block, as shown below, and the C++ bridge is complete.

image

Now let's see how to call this C++ function from Python.

  • To call the function, we first have to build the C++ code.
  • C++ code must be compiled and built before it can be called from elsewhere.
  • setup.py lets us compile and build the C++ code.

image

image

setup.py defines how the C++ code will be built.

  • You can copy and paste the content below and modify it.
  • name sets the package name; the other metadata fields are supplementary.
  • The most important part is ext_modules.
    • First, list the C++ files you want to build in sources. (If there are several, add the other cpp files to the list, separated by commas.)

      image

    • Finally, cmdclass is the part that tells setuptools we are building the code.
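Since the setup.py itself appears only as a screenshot, here is a minimal sketch of what such a file could look like for the C++-only bridge. The package name cppcuda_tutorial and the source file interpolation.cpp come from this post; the version string is a placeholder, and the use of CppExtension (rather than CUDAExtension) is an assumption based on tutorial 1 having no CUDA sources yet.

```python
# setup.py (sketch): builds the C++ bridge as a PyTorch extension.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="cppcuda_tutorial",  # package name used by `pip install .`
    version="0.1",            # placeholder value, not from the original post
    ext_modules=[
        CppExtension(
            # Module name that Python imports; PyTorch passes it to the
            # compiler as TORCH_EXTENSION_NAME for PYBIND11_MODULE.
            name="cppcuda_tutorial",
            # Add further .cpp files to this list, separated by commas.
            sources=["interpolation.cpp"],
        )
    ],
    # Use PyTorch's build_ext implementation to compile the extension.
    cmdclass={"build_ext": BuildExtension},
)
```

Once CUDA files are added, CUDAExtension takes the place of CppExtension, as in the diff-gaussian-rasterization setup.py shown later in this post.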

Finally, upgrade pip with python -m pip install pip -U, then build the C++ code with setup.py.

  • The path argument is where setup.py is located. Since setup.py is in the current folder, we just pass ., which denotes the current directory.
  • Run pip install . and, after a little while, the build completes.

image

Once setup.py has finished building the C++ file, let's import the function in test.py and use it.

  • You might think importing torch is unnecessary because cppcuda_tutorial already includes torch, but you must import torch first.
  • Otherwise, the custom package we wrote in C++ will fail to import.
  • We import the trilinear_interpolation function from our own cppcuda_tutorial extension, call it in test.py, and successfully get the result.

image
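A test.py along these lines exercises the built extension. It is a sketch under assumptions: the tensor shapes are arbitrary, the helper name `run_demo` is hypothetical, and the check for an installed package is added only so the script reports a missing build instead of raising a traceback.

```python
# test.py (sketch): call the C++ function from Python.
import importlib.util


def run_demo():
    """Call trilinear_interpolation if the extension has been built."""
    if importlib.util.find_spec("cppcuda_tutorial") is None:
        print("cppcuda_tutorial is not installed; run `pip install .` first")
        return None

    # torch must be imported before the extension, as explained above.
    import torch
    import cppcuda_tutorial

    feats = torch.ones(2)   # arbitrary example input
    point = torch.zeros(2)  # arbitrary example input
    # The placeholder described in this post simply returns feats.
    out = cppcuda_tutorial.trilinear_interpolation(feats, point)
    print(out)
    return out


run_demo()
```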

Thank you.

Practical example

The C++ bridge in diff-gaussian-rasterization/ext.cpp is written as follows.

  • As usual, #include <torch/extension.h> tells the cpp file what torch is.
  • Once ext.cpp is built through setup.py with pip install ., Python files can import and use the functions written in C++ and CUDA.
  • As mentioned earlier, ext.cpp serves only as the bridge between PyTorch and CUDA.
    • RasterizeGaussiansCUDA, written in CUDA, is called from Python as rasterize_gaussians.
    • RasterizeGaussiansBackwardCUDA, written in CUDA, is called from Python as rasterize_gaussians_backward.
    • markVisible, written in CUDA, is called from Python as mark_visible.
    // ext.cpp
    /*
     * Copyright (C) 2023, Inria
     * GRAPHDECO research group, https://team.inria.fr/graphdeco
     * All rights reserved.
     *
     * This software is free for non-commercial, research and evaluation use 
     * under the terms of the LICENSE.md file.
     *
     * For inquiries contact  george.drettakis@inria.fr
     */
      
    #include <torch/extension.h>
    #include "rasterize_points.h"
      
    PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
      m.def("rasterize_gaussians", &RasterizeGaussiansCUDA);
      m.def("rasterize_gaussians_backward", &RasterizeGaussiansBackwardCUDA);
      m.def("mark_visible", &markVisible);
    }
    
    • rasterize_points.h declares the function names, inputs, and outputs in C++.

      // rasterize_points.h
      /*
       * Copyright (C) 2023, Inria
       * GRAPHDECO research group, https://team.inria.fr/graphdeco
       * All rights reserved.
       *
       * This software is free for non-commercial, research and evaluation use 
       * under the terms of the LICENSE.md file.
       *
       * For inquiries contact  george.drettakis@inria.fr
       */
          
      #pragma once
      #include <torch/extension.h>
      #include <cstdio>
      #include <tuple>
      #include <string>
          	
      std::tuple<int, torch::Tensor, torch::Tensor, torch::Tensor, torch::Tensor, torch::Tensor>
      RasterizeGaussiansCUDA(
          const torch::Tensor& background,
          const torch::Tensor& means3D,
          const torch::Tensor& colors,
          const torch::Tensor& opacity,
          const torch::Tensor& scales,
          const torch::Tensor& rotations,
          const float scale_modifier,
          const torch::Tensor& cov3D_precomp,
          const torch::Tensor& viewmatrix,
          const torch::Tensor& projmatrix,
          const float tan_fovx,
          const float tan_fovy,
          const int image_height,
          const int image_width,
          const torch::Tensor& sh,
          const int degree,
          const torch::Tensor& campos,
          const bool prefiltered,
          const bool debug);

      std::tuple<torch::Tensor, torch::Tensor, torch::Tensor, torch::Tensor, torch::Tensor, torch::Tensor, torch::Tensor, torch::Tensor>
      RasterizeGaussiansBackwardCUDA(
          const torch::Tensor& background,
          const torch::Tensor& means3D,
          const torch::Tensor& radii,
          const torch::Tensor& colors,
          const torch::Tensor& scales,
          const torch::Tensor& rotations,
          const float scale_modifier,
          const torch::Tensor& cov3D_precomp,
          const torch::Tensor& viewmatrix,
          const torch::Tensor& projmatrix,
          const float tan_fovx,
          const float tan_fovy,
          const torch::Tensor& dL_dout_color,
          const torch::Tensor& sh,
          const int degree,
          const torch::Tensor& campos,
          const torch::Tensor& geomBuffer,
          const int R,
          const torch::Tensor& binningBuffer,
          const torch::Tensor& imageBuffer,
          const bool debug);

      torch::Tensor markVisible(
          torch::Tensor& means3D,
          torch::Tensor& viewmatrix,
          torch::Tensor& projmatrix);
      

In diff-gaussian-rasterization/setup.py, the code that actually gets built includes several CUDA files in addition to the C++ bridge, ext.cpp.

  • ext_modules, the most important part of setup.py, builds the cpp and cu files listed in sources.
  • As a result, the functions implemented in "cuda_rasterizer/rasterizer_impl.cu", "cuda_rasterizer/forward.cu", "cuda_rasterizer/backward.cu", and "rasterize_points.cu", and exposed through "ext.cpp", can be called from Python.
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use 
# under the terms of the LICENSE.md file.
#
# For inquiries contact  george.drettakis@inria.fr
#

from setuptools import setup
from torch.utils.cpp_extension import CUDAExtension, BuildExtension
import os
os.path.dirname(os.path.abspath(__file__))

setup(
    name="diff_gaussian_rasterization",
    packages=['diff_gaussian_rasterization'],
    ext_modules=[
        CUDAExtension(
            name="diff_gaussian_rasterization._C",
            sources=[
            "cuda_rasterizer/rasterizer_impl.cu",
            "cuda_rasterizer/forward.cu",
            "cuda_rasterizer/backward.cu",
            "rasterize_points.cu",
            "ext.cpp"],
            extra_compile_args={"nvcc": ["-I" + os.path.join(os.path.dirname(os.path.abspath(__file__)), "third_party/glm/")]})
        ],
    cmdclass={
        'build_ext': BuildExtension
    }
)

Reference

Pytorch+cpp/cuda extension 教學 tutorial 1 - English CC -
