small_gicp
small_gicp is a header-only C++ library providing efficient and parallelized algorithms for fine point cloud registration (ICP, Point-to-Plane ICP, GICP, VGICP, etc.). It is a refined and optimized version of its predecessor, fast_gicp, re-written from scratch with the following features.
- Highly Optimized : The core registration algorithm implementation has been further optimized from fast_gicp, achieving up to 2x speed gain.
- Fully parallerized : small_gicp offers parallel implementations of several preprocessing algorithms, making the entire registration process parallelized (e.g., Downsampling, KdTree construction, Normal/Covariance estimation). It supports OpenMP and Intel TBB as parallelism backends.
- Minimum dependencies : The library requires only Eigen along with the bundled nanoflann and Sophus. Optionally, it supports a PCL registration interface for use as a drop-in replacement
- Customizable : small_gicp allows the integration of any custom point cloud class into the registration algorithm via traits. Its template-based implementation enables customization of the registration process with original correspondence estimators and registration factors.
- Python bindings : By being isolated from PCL, small_gicp's Python bindings are more portable and can be used seamlessly with other libraries such as Open3D.
Note that GPU-based implementations are NOT included in this package.
If you find this package useful for your project, please consider leaving a comment here. It would help the author receive recognition in his organization and keep working on this project.
Requirements
This library uses C++17 features. The PCL interface is not compatible with PCL older than 1.11 that uses boost::shared_ptr
.
Dependencies
Installation
C++
small_gicp is a header-only library. You can just download and drop it in your project directory to use it.
If you need only basic point cloud registration functions, you can build and install the helper library as follows.
sudo apt-get install libeigen3-dev libomp-dev
cd small_gicp
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release && make -j
sudo make install
Python (Linux / Windows / MacOS)
Install from PyPI
pip install small_gicp
Install from source
cd small_gicp
pip install .
pip install pybind11-stubgen
cd ~/.local/lib/python3.10/site-packages
pybind11-stubgen -o . --ignore-invalid=all small_gicp
Documentation
Usage (C++)
The following examples assume using namespace small_gicp
is placed somewhere.
The helper library (registration_helper.hpp
) enables easily processing point clouds represented as std::vector<Eigen::Vector(3|4)(f|d)>
.
Expand
small_gicp::align
takes two point clouds (std::vectors
of Eigen::Vector(3|4)(f|d)
) and returns a registration result (estimated transformation and some information on the optimization result). This is the easiest way to use small_gicp but causes an overhead for duplicated preprocessing.
#include <small_gicp/registration/registration_helper.hpp>
std::vector<Eigen::Vector3d> target_points = ...;
std::vector<Eigen::Vector3d> source_points = ...;
RegistrationSetting setting;
setting.num_threads = 4;
setting.downsampling_resolution = 0.25;
setting.max_correspondence_distance = 1.0;
Eigen::Isometry3d init_T_target_source = Eigen::Isometry3d::Identity();
RegistrationResult result = align(target_points, source_points, init_T_target_source, setting);
Eigen::Isometry3d T = result.T_target_source;
size_t num_inliers = result.num_inliers;
Eigen::Matrix<double, 6, 6> H = result.H;
There is also a way to perform preprocessing and registration separately. This enables saving time for preprocessing in case registration is performed several times for the same point cloud (e.g., typical odometry estimation based on scan-to-scan matching).
#include <small_gicp/registration/registration_helper.hpp>
std::vector<Eigen::Vector3d> target_points = ...;
std::vector<Eigen::Vector3d> source_points = ...;
int num_threads = 4;
double downsampling_resolution = 0.25;
int num_neighbors = 10;
auto [target, target_tree] = preprocess_points(target_points, downsampling_resolution, num_neighbors, num_threads);
auto [source, source_tree] = preprocess_points(source_points, downsampling_resolution, num_neighbors, num_threads);
RegistrationSetting setting;
setting.num_threads = num_threads;
setting.max_correspondence_distance = 1.0;
Eigen::Isometry3d init_T_target_source = Eigen::Isometry3d::Identity();
RegistrationResult result = align(*target, *source, *target_tree, init_T_target_source, setting);
Eigen::Isometry3d T = result.T_target_source;
size_t num_inliers = result.num_inliers;
Eigen::Matrix<double, 6, 6> H = result.H;
The PCL interface allows using small_gicp as a drop-in replacement for pcl::Registration
. It is also possible to directly feed pcl::PointCloud
to algorithms implemented in small_gicp.
Expand
#include <small_gicp/pcl/pcl_registration.hpp>
pcl::PointCloud<pcl::PointXYZ>::Ptr raw_target = ...;
pcl::PointCloud<pcl::PointXYZ>::Ptr raw_source = ...;
pcl::PointCloud<pcl::PointXYZ>::Ptr target = voxelgrid_sampling_omp(*raw_target, 0.25);
pcl::PointCloud<pcl::PointXYZ>::Ptr source = voxelgrid_sampling_omp(*raw_source, 0.25);
RegistrationPCL<pcl::PointXYZ, pcl::PointXYZ> reg;
reg.setNumThreads(4);
reg.setCorrespondenceRandomness(20);
reg.setMaxCorrespondenceDistance(1.0);
reg.setVoxelResolution(1.0);
reg.setRegistrationType("VGICP");
reg.setInputTarget(target);
reg.setInputSource(source);
auto aligned = pcl::make_shared<pcl::PointCloud<pcl::PointXYZ>>();
reg.align(*aligned);
reg.swapSourceAndTarget();
reg.align(*aligned);
It is also possible to directly feed pcl::PointCloud
to small_gicp::Registration
. Because all preprocessed data are exposed in this way, you can easily re-use them to obtain the best efficiency.
#include <small_gicp/pcl/pcl_point.hpp>
#include <small_gicp/pcl/pcl_point_traits.hpp>
pcl::PointCloud<pcl::PointXYZ>::Ptr raw_target = ...;
pcl::PointCloud<pcl::PointXYZ>::Ptr raw_source = ...;
pcl::PointCloud<pcl::PointCovariance>::Ptr target = voxelgrid_sampling_omp<pcl::PointCloud<pcl::PointXYZ>, pcl::PointCloud<pcl::PointCovariance>>(*raw_target, 0.25);
pcl::PointCloud<pcl::PointCovariance>::Ptr source = voxelgrid_sampling_omp<pcl::PointCloud<pcl::PointXYZ>, pcl::PointCloud<pcl::PointCovariance>>(*raw_source, 0.25);
const int num_threads = 4;
const int num_neighbors = 20;
estimate_covariances_omp(*target, num_neighbors, num_threads);
estimate_covariances_omp(*source, num_neighbors, num_threads);
auto target_tree = std::make_shared<KdTree<pcl::PointCloud<pcl::PointCovariance>>>(target, KdTreeBuilderOMP(num_threads));
auto source_tree = std::make_shared<KdTree<pcl::PointCloud<pcl::PointCovariance>>>(source, KdTreeBuilderOMP(num_threads));
Registration<GICPFactor, ParallelReductionOMP> registration;
registration.reduction.num_threads = num_threads;
registration.rejector.max_dist_sq = 1.0;
auto result = registration.align(*target, *source, *target_tree, Eigen::Isometry3d::Identity());
If you want to fine-control and customize the registration process, use small_gicp::Registration
template that allows modifying the inner algorithms and parameters.
Expand
#include <small_gicp/ann/kdtree_omp.hpp>
#include <small_gicp/points/point_cloud.hpp>
#include <small_gicp/factors/gicp_factor.hpp>
#include <small_gicp/util/normal_estimation_omp.hpp>
#include <small_gicp/registration/reduction_omp.hpp>
#include <small_gicp/registration/registration.hpp>
std::vector<Eigen::Vector3d> target_points = ...;
std::vector<Eigen::Vector3d> source_points = ...;
int num_threads = 4;
double downsampling_resolution = 0.25;
int num_neighbors = 10;
double max_correspondence_distance = 1.0;
auto target = std::make_shared<PointCloud>(target_points);
auto source = std::make_shared<PointCloud>(source_points);
target = voxelgrid_sampling_omp(*target, downsampling_resolution, num_threads);
source = voxelgrid_sampling_omp(*source, downsampling_resolution, num_threads);
auto target_tree = std::make_shared<KdTree<PointCloud>>(target, KdTreeBuilderOMP(num_threads));
auto source_tree = std::make_shared<KdTree<PointCloud>>(source, KdTreeBuilderOMP(num_threads));
estimate_covariances_omp(*target, *target_tree, num_neighbors, num_threads);
estimate_covariances_omp(*source, *source_tree, num_neighbors, num_threads);
Registration<GICPFactor, ParallelReductionOMP> registration;
registration.reduction.num_threads = num_threads;
registration.rejector.max_dist_sq = max_correspondence_distance * max_correspondence_distance;
Eigen::Isometry3d init_T_target_source = Eigen::Isometry3d::Identity();
auto result = registration.align(*target, *source, *target_tree, init_T_target_source);
Eigen::Isometry3d T = result.T_target_source;
size_t num_inliers = result.num_inliers;
Eigen::Matrix<double, 6, 6> H = result.H;
See 03_registration_template.cpp for more detailed customization examples.
Cookbook
Expand
Example A : Perform registration with numpy arrays
result = small_gicp.align(target_raw_numpy, source_raw_numpy, downsampling_resolution=0.25)
result.T_target_source
result.converged
result.iterations
result.num_inliers
result.H
result.b
result.e
Example B : Perform preprocessing and registration separately
target, target_tree = small_gicp.preprocess_points(target_raw_numpy, downsampling_resolution=0.25)
source, source_tree = small_gicp.preprocess_points(source_raw_numpy, downsampling_resolution=0.25)
target.size()
target.points()
target.normals()
target.covs()
result = small_gicp.align(target, source, target_tree)
Example C : Perform each of preprocessing steps one-by-one
target_raw = small_gicp.PointCloud(target_raw_numpy)
source_raw = small_gicp.PointCloud(source_raw_numpy)
target = small_gicp.voxelgrid_sampling(target_raw, 0.25)
source = small_gicp.voxelgrid_sampling(source_raw, 0.25)
target_tree = small_gicp.KdTree(target)
source_tree = small_gicp.KdTree(source)
small_gicp.estimate_covariances(target, target_tree)
small_gicp.estimate_covariances(source, source_tree)
result = small_gicp.align(target, source, target_tree)
Example D: Example with Open3D
target_o3d = open3d.io.read_point_cloud('small_gicp/data/target.ply').paint_uniform_color([0, 1, 0])
source_o3d = open3d.io.read_point_cloud('small_gicp/data/source.ply').paint_uniform_color([0, 0, 1])
target, target_tree = small_gicp.preprocess_points(numpy.asarray(target_o3d.points), downsampling_resolution=0.25)
source, source_tree = small_gicp.preprocess_points(numpy.asarray(source_o3d.points), downsampling_resolution=0.25)
result = small_gicp.align(target, source, target_tree)
source_o3d.transform(result.T_target_source)
open3d.visualization.draw_geometries([target_o3d, source_o3d])
Cookbook
Running examples
C++
cd small_gicp
mkdir build && cd build
cmake .. -DBUILD_EXAMPLES=ON && make -j
cd ..
./build/01_basic_registration
./build/02_basic_registration_pcl
./build/03_registration_template
Python
cd small_gicp
pip install .
python3 src/example/basic_registration.py
Processing speed comparison between small_gicp and Open3D (youtube).
Downsampling
- Single-threaded
small_gicp::voxelgrid_sampling
is about 1.3x faster than pcl::VoxelGrid
. - Multi-threaded
small_gicp::voxelgrid_sampling_tbb
(6 threads) is about 3.2x faster than pcl::VoxelGrid
. small_gicp::voxelgrid_sampling
provides accurate downsampling results that are nearly identical to those of pcl::VoxelGrid
, while pcl::ApproximateVoxelGrid
can produce spurious points (up to 2x more points).small_gicp::voxelgrid_sampling
can handle larger point clouds with finer voxel resolutions compared to pcl::VoxelGrid
. For a point cloud with a width of 1000m, the minimum voxel resolution can be 0.5 mm.
KdTree construction
- Multi-threaded implementation (TBB and OMP) can be up to 6x faster than the single-threaded version. The single-thread version performs almost equivalently to nanoflann.
- The new KdTree implementation demonstrates good scalability due to its well-balanced task assignment.
- This benchmark compares only the construction time (query time is not included). Nearest neighbor queries are included and evaluated in the following odometry estimation evaluation.
Odometry estimation
- Single-thread
small_gicp::GICP
is about 2.4x and 1.9x faster than pcl::GICP
and fast_gicp::GICP
, respectively. small_gicp::(GICP|VGICP)
demonstrates better multi-threaded scalability compared to fast_gicp::(GICP|VGICP)
.small_gicp::GICP
parallelized with TBB flow graph shows excellent scalability in many-threads scenarios (~128 threads), though with some latency degradation.- Outputs of
small_gicp::GICP
are almost identical to those of fast_gicp::GICP
.
License
This package is released under the MIT license.
If you find this package useful for your project, please consider leaving a comment here. It would help the author receive recognition in his organization and keep working on this project.
Contact
Kenji Koide, National Institute of Advanced Industrial Science and Technology (AIST)