Efficient FPGA Implementation of
K-Nearest-Neighbor Search Algorithm for 3D
LIDAR Localization and Mapping in Smart Vehicles
Hao Sun, Xinzhe Liu, Qi Deng, Weixiong Jiang, Graduate Student Member, IEEE,
Shaobo Luo, and Yajun Ha, Senior Member, IEEE
Abstract—K-Nearest-Neighbor (KNN) search has been extensively used in localization and mapping based on 3D laser point clouds in smart vehicles. Considering the real-time requirement of localization and the stringent battery constraints of smart vehicles, it is a great challenge to develop highly energy-efficient KNN implementations. Unfortunately, previous KNN implementations either cannot build their search data structures efficiently or cannot search efficiently in massive and unevenly distributed point clouds. To solve this issue, we propose a new framework to optimize the implementation of KNN on FPGAs. First, we propose a novel data structure with a spatial subdivision method, which can be built efficiently even for massive point clouds. Second, based on this data structure, we propose a KNN search algorithm that is able to search efficiently in unevenly distributed point clouds. We have implemented the new framework on both FPGA and GPU. Energy efficiency results show that our proposed method is on average 2.1 times and 6.2 times more energy efficient than the state-of-the-art implementations of KNN on FPGA and GPU platforms, respectively.
Index Terms—K-nearest-neighbor search, FPGA, 3D LIDAR localization and mapping, smart vehicles, KNN.
I. INTRODUCTION
ALTHOUGH widely used, KNN has been especially useful in smart vehicles to match the current frame of LIDAR (Light Detection and Ranging) point cloud with a massive LIDAR point cloud map, where the number of points exceeds 50,000 and the points are distributed over a wide space of more than 100 × 100 × 5 m³. KNN is an essential step in various SLAM (Simultaneous Localization and Mapping) algorithms [1] because matching two point clouds is important to correct any localization error. Unlike image point clouds, LIDAR point clouds are usually distributed unevenly in the vehicle's vast surrounding area. The object classification of laser point clouds has been challenging and is not yet well studied. As a result, when matching two laser point clouds, we have to rely heavily on the KNN algorithm, which finds the K nearest neighbors in a laser point cloud map as the pairing points. Although KNN uses simple operations to search in a massive point cloud map, it takes up about 80 percent of the matching process [1]. Considering the real-time requirement of localization and the stringent battery constraints of smart vehicles, it is a great challenge to develop highly energy-efficient KNN implementations that consume as little energy as possible to calculate KNN results under the timing constraint.
Manuscript received April 1, 2020; revised June 11, 2020; accepted June 28, 2020. Date of publication August 3, 2020; date of current version September 3, 2020. This work was supported by Shanghai Science and Technology Commission Funding under Grant 19511131200. This brief was recommended by Associate Editor J. M. de la Rosa. (Corresponding author: Yajun Ha.)
Hao Sun, Xinzhe Liu, Qi Deng, and Weixiong Jiang are with the School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China, also with the Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China, and also with the School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China.
Shaobo Luo is with Université Paris-Est, Paris 93162, France (e-mail: [email protected]).
Yajun Ha is with the School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China (e-mail: [email protected]).
Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSII.2020.3013758
Several previous works have attempted to implement KNN efficiently. First, researchers have used GPUs and FPGAs [2] to accelerate KNN, but only with brute-force-equivalent search methods, which achieve limited energy efficiency improvement. Second, researchers have proposed a tree structure called the KD-tree [3] to search KNN efficiently in a short time. However, it needs substantial time to build the KD-tree for large point cloud maps before the search can start. Because the data structure must be rebuilt whenever smart vehicles update their maps, the long building time of the KD-tree prevents it from practical applications, where point cloud maps are usually large. Reference [3] proposes a parallel construction of the KD-tree on GPU. It meets the speed requirements but at the cost of unacceptable power. Third, researchers have proposed the GBDS (Grid-Based Data Structure) [4]. Unlike the KD-tree, GBDS can be built quickly for big point cloud maps. It segments the data space into a user-defined number of grids and enables the search of KNN only in the nearby grids rather than in the whole data space. However, GBDS performs well only on evenly distributed datasets and poorly on 3D LIDAR point cloud maps. When a massive and uneven point cloud is processed by GBDS, some grids are empty while others are dense, which is inefficient for KNN searching. Furthermore, GBDS needs to find enough candidate neighbors in an expanding circle, which wastes time on outliers. Fourth, researchers have proposed an efficient large-scale ANN search on FPGA [5]. They present a highly parallel k-selection method to accelerate the search process and demonstrate FPGA performance superior to CPU and GPU on high-dimensional, large-scale datasets, but its energy efficiency can be further improved.
To solve these issues, we propose an efficient FPGA implementation of KNN using an HLS (High-Level Synthesis) tool. Our method includes a novel data structure called DSVS (Double-Segmentation-Voxel-Structure) and a new search algorithm based on DSVS. With the DSVS data structure, we segment the point cloud a limited number of times to generate three-dimensional cubic spaces (voxels) and sub-voxels with a decent number of points. With the search algorithm, we only search for the nearest neighbors in the neighboring voxels and sub-voxels, avoiding redundant searches in an expanding search circle [4].
The contributions of this brief are as follows:
- A new data structure (DSVS) for fast KNN search: it ensures a scalable number of points in each voxel or sub-voxel to reduce redundant searches. Unlike GBDS [4], this new structure yields a balanced number of points in each sub-space. Unlike tree structures [3], we only segment the data space a limited number of times, which costs a tiny amount of time to construct the data structure.
- A new KNN search method based on DSVS and FPGA with the following features: (1) it searches KNN in a well-defined range within a limited number of neighboring voxels and sub-voxels; (2) it uses an optimized data transfer and access strategy: only the points near the well-defined range are transferred, to minimize data transfer traffic, and the points in one voxel are stored sequentially in block RAMs so that they can be accessed in parallel.
The remainder of this brief is organized as follows. Section II gives an overview of the DSVS framework. Section III introduces the novel DSVS data structure. Section IV introduces the new KNN search algorithm based on DSVS. Section V shows the experimental results and analysis. Section VI gives the conclusions and future work.
II. OVERVIEW OF DSVS FRAMEWORK
In this section, we first define the KNN search problem in LIDAR SLAM applications, then introduce our framework of DSVS for efficient KNN implementation.
Formally, the KNN search problem is defined as follows: given a set S of points in a space M and a query point q \in M , find the K closest points in S to q . For LIDAR SLAM applications in smart vehicles, S consists of the points from a surrounding point cloud map, named the reference set. Generally, the reference set is huge and unevenly spread over a vast region. The query set consists of the feature points in a frame of LIDAR data. M is the 3-dimensional vector space, in which distance is measured by the squared Euclidean distance.
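As a baseline for the discussion below, the following C++ sketch states the problem operationally: a brute-force KNN over the reference set under the squared Euclidean metric. The names Point3, dist2, and knn_brute_force are illustrative, not from our implementation; this is exactly the O(|S|)-per-query cost that DSVS is designed to avoid.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Point3 { float x, y, z; };

// Squared Euclidean distance, the metric used throughout this brief.
inline float dist2(const Point3& a, const Point3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Reference (brute-force) KNN: indices of the K points in S closest to q.
std::vector<size_t> knn_brute_force(const std::vector<Point3>& S,
                                    const Point3& q, size_t K) {
    std::vector<size_t> idx(S.size());
    for (size_t i = 0; i < S.size(); ++i) idx[i] = i;
    K = std::min(K, S.size());
    std::partial_sort(idx.begin(), idx.begin() + K, idx.end(),
                      [&](size_t a, size_t b) {
                          return dist2(S[a], q) < dist2(S[b], q);
                      });
    idx.resize(K);  // keep only the K nearest
    return idx;
}
```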
For KNN in LIDAR applications, only those neighbors within a specific range are meaningful. If the KNN results are far from the query point, the query point is considered an outlier. We define a predetermined search range R_{in} within which we ensure all the neighbors can be found, while we pay little attention to the points outside this range. R_{in} reflects the requirement from the user, and both the building process and the search process depend on it. First, we segment the point cloud to obtain a set of voxels with side length equal to R_{in} . If one voxel contains more points than a threshold, we segment the voxel into several sub-voxels; we denote this threshold as T_{voxel} .
Fig. 1. An overview of DSVS framework.
In the search process, we search the KNN only within the predetermined range R_{in} in the neighboring voxels and sub-voxels, avoiding redundant searches in an expanding search circle [4].
We implement the DSVS framework on a heterogeneous system that combines a powerful PS (Processing System) and PL (user-Programmable Logic) on the same FPGA chip. As Fig. 1 shows, the DSVS method consists of two parts: building the DSVS data structure and searching KNN based on DSVS. Through the HLS profiling tool, we find that sorting the reference set and the query set is not a computation-intensive task, so we execute it on the PS side. We generate the RTL (Register-Transfer Level) code for the FPGA with the Xilinx Vivado HLS tool. All the interfaces are implemented with direct memory access, and the arrays are packed to obtain the most efficient data transfer.
There are mainly three modules related to searching KNN:
- Dataset buffer: buffers the points of the reference set surrounding a query point.
- Search candidate neighbors: searches candidate neighbors within R_{in} in the neighboring voxels and sub-voxels.
- Hardware parallel K-selection: selects the K nearest neighbors from the candidate neighbors in parallel.
III. DSVS DATA STRUCTURE
In this section, we introduce the DSVS data structure and how to build it.
First of all, we need to find a bounding box by traversing all points in the reference set. The minimum and maximum range along each axis (x, y, z) are calculated as the AABB (Axis-Aligned Bounding Box) of the point cloud, denoted as
\begin{aligned} \text{highbound} &= \{x_{max}, y_{max}, z_{max}\} \\ \text{lowerbound} &= \{x_{min}, y_{min}, z_{min}\} \end{aligned} \qquad (1)
Next, we segment the AABB into a set of voxels. Ideally, the side length of each voxel should equal R_{in} . However, because the memory resource for voxels is limited, the side length may be enlarged to avoid overflow. The achieved side length is denoted as R_{re} , where R_{re} \ge R_{in} . The number of voxels along each axis is denoted as VS = \{VS_x, VS_y, VS_z\} , calculated by
VS = \left\lceil \frac{\text{highbound} - \text{lowerbound}}{R_{re}} \right\rceil \qquad (2)
Fig. 2. An example of building the DSVS data structure. (a) A raw point cloud with 32 points. (b) First segmentation result. (c) DSVS result. (d) Information for the first segmentation: the first line represents the hash table; the second line represents the number of points in each voxel. (e) Information for the second segmentation: the first line represents the number of sub-voxels in each voxel; the second line represents the sub-hash table; the third line represents the number of points in each sub-voxel.
To quickly find out which voxel contains a particular point, we assign an index to each voxel. Given a point p = \{x, y, z\} , suppose the point falls in the voxel indexed by vp = \{vp_x, vp_y, vp_z\} . vp can be calculated by
vp \leftarrow voxel(p) = \lfloor \frac{p - lowerbound}{R_{re}} \rfloor \quad (3)
To simplify the index of the voxel, an injective hashing function is defined as:
hash(p) = vp_x * VS_y * VS_z + vp_y * VS_z + vp_z \quad (4)
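For concreteness, Eqs. (1)-(4) can be sketched in C++ as follows. The Grid record and the function names voxel and voxel_hash are illustrative conventions of this sketch, not identifiers from our implementation.

```cpp
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };
struct VoxelIdx { int x, y, z; };

// Grid parameters named after Eqs. (1) and (2); values come from the AABB pass.
struct Grid {
    Vec3  lowerbound;        // minimum corner of the AABB, Eq. (1)
    float R_re;              // achieved voxel side length, R_re >= R_in
    int   VS_x, VS_y, VS_z;  // voxel counts per axis, Eq. (2)
};

// Eq. (3): the voxel index containing point p.
VoxelIdx voxel(const Grid& g, const Vec3& p) {
    return { (int)std::floor((p.x - g.lowerbound.x) / g.R_re),
             (int)std::floor((p.y - g.lowerbound.y) / g.R_re),
             (int)std::floor((p.z - g.lowerbound.z) / g.R_re) };
}

// Eq. (4): injective hash linearizing the 3D voxel index.
uint32_t voxel_hash(const Grid& g, const Vec3& p) {
    VoxelIdx v = voxel(g, p);
    return (uint32_t)(v.x * g.VS_y * g.VS_z + v.y * g.VS_z + v.z);
}
```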
By following the previous procedure, we can quickly calculate the index of the voxel that contains a particular point. When the number of points in one voxel exceeds the threshold T_{voxel} , we segment it further into a set of sub-voxels with at most K points inside, for the following two reasons: (1) the second segmentation eliminates many redundant searches because only the K nearest neighbors are to be found; (2) the second segmentation enables efficient search even in an uneven point cloud because the voxels are prevented from becoming dense. This is a considerable advantage over GBDS [4].
Fig. 2 gives a detailed example of building the DSVS. First, we segment a point cloud into 4 \times 4 voxels. Assuming T_{voxel} = 4 and K = 1 , we then segment the two voxels with more points (6 and 7 in the hash table) into 2 \times 2 sub-voxels each. As a result, each voxel or sub-voxel contains a decent number of points.
After the two levels of segmentation, two attributes, hash and sub_hash, are added to each point p,
p = \{x, y, z, hash, sub\_hash\} \quad (5)
where x, y, z represent the position of point p, hash represents the voxel index, and sub_hash represents the sub-voxel index inside voxel hash. If the voxel indexed by hash has not been further segmented, sub_hash is assigned a non-sub-voxel flag.
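A minimal sketch of the point record in Eq. (5). The field widths and the encoding of the non-sub-voxel flag are illustrative assumptions of this sketch.

```cpp
#include <cstdint>

// "Non-sub-voxel flag" for points whose voxel was not further segmented;
// the concrete encoding is an assumption.
constexpr uint16_t NO_SUB_VOXEL = 0xFFFF;

// Point record after both segmentation levels, mirroring Eq. (5).
struct DsvsPoint {
    float    x, y, z;   // position
    uint32_t hash;      // containing voxel, Eq. (4)
    uint16_t sub_hash;  // sub-voxel index inside that voxel, or NO_SUB_VOXEL
};
```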
Since we have the hash and sub_hash values of all points, we sort them so that points from the same voxel and sub-voxel are placed sequentially in physically contiguous memory.
Fig. 3. Comparison of candidate neighbors. The red point is the query point and the red circle represents the ideal search region with a radius of R_{in} . (a) When searching only at the voxel level, there are 15 candidate neighbors (in yellow) in voxels {6, 7, 10, 11, 14, 15} (in blue). (b) When searching at the sub-voxel level, there are 7 candidate neighbors (in yellow) in voxels {10, 11, 14, 15} and sub-voxels {6-3, 7-2} (in blue).
The sorted points greatly benefit the hardware implementation. First, we obtain a higher transfer speed from DDR to the PL side when traversing the points in one voxel or sub-voxel. Second, sorted points allow us to use cyclic partitioning of the reference set to access more points in parallel.
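A sketch of this PS-side sort, repeating the DsvsPoint record for self-containment. The comparator orders points by hash first and sub_hash second, which is exactly what places co-located points contiguously in memory.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct DsvsPoint { float x, y, z; uint32_t hash; uint16_t sub_hash; };

// Sort so that points sharing a voxel, and within it a sub-voxel,
// become contiguous in memory (executed on the PS side).
void sort_by_hash(std::vector<DsvsPoint>& pts) {
    std::sort(pts.begin(), pts.end(),
              [](const DsvsPoint& a, const DsvsPoint& b) {
                  return a.hash != b.hash ? a.hash < b.hash
                                          : a.sub_hash < b.sub_hash;
              });
}
```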
IV. KNN SEARCH ALGORITHM
In this section, we introduce the KNN search process based on DSVS. There are three steps: (1) calculate the hash and sub_hash values of the query point; (2) select the neighboring voxels and sub-voxels within R_{in} of the query point as search regions; (3) traverse the points in the search regions, calculate their squared Euclidean distance to the query point, and select the KNN.
For a given query point, the hash value can be calculated by Eq. (4). At the hash level, we find the nearby regions by simulating a ball with a radius equal to R_{in} . All voxels intersecting with or inside the ball are labeled as candidate search voxels. Then we traverse those candidate voxels to remove the sub-voxels outside the range R_{in} .
The final search regions consist of the voxels (those not secondarily segmented) and the sub-voxels within the range R_{in} . As shown in Fig. 3, the search regions are reduced from 6 voxels to 4 voxels plus 2 sub-voxels, and the number of candidate neighbors is reduced from 15 to 7, thanks to the second segmentation.
Finally, we traverse the candidate neighbors in the search regions to select the KNN with K-selection [5].
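The K-selection step can be modeled as a compare-and-shift structure, as in the illustrative C++ sketch below. This is our own model for exposition, not the exact k-selection design of [5]; with K fixed at compile time, the insertion loop can be fully unrolled into a register chain on the FPGA fabric.

```cpp
#include <cfloat>

// Running K-best selection over a stream of squared distances.
template <int K>
struct KSelect {
    float dist[K];  // kept sorted ascending; dist[K-1] is the worst kept
    int   id[K];

    void init() {
        for (int i = 0; i < K; ++i) { dist[i] = FLT_MAX; id[i] = -1; }
    }

    // Insert one candidate, discarding it if worse than the current K best.
    void push(float d, int candidate_id) {
        for (int i = K - 1; i >= 0; --i) {  // fully unrollable in HLS
            if (i > 0 && d < dist[i - 1]) {       // shift worse entry down
                dist[i] = dist[i - 1];
                id[i]   = id[i - 1];
            } else if (d < dist[i]) {             // insertion slot found
                dist[i] = d;
                id[i]   = candidate_id;
                break;
            }
        }
    }
};
```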
However, implementing the search process on hardware poses several challenges:
- the dynamic number of neighboring voxels and sub-voxels;
- the dynamic number of points in a voxel or sub-voxel;
- accessing points from external memory constrains the performance, while the limited number of block RAMs on the PL side is insufficient for large-scale reference sets.
To address these three challenges, we have developed the following solutions.
First, due to the limitation on the maximum number of voxels, we achieve a side length of R_{re} , where R_{re} \ge R_{in} . As a result, we can limit the search regions to the 3 \times 3 \times 3 voxels around a query point, which are guaranteed to contain the desired searching ball with a radius of R_{in} . Furthermore, from the building process of DSVS, we know that the number of points in a voxel without second segmentation is limited by T_{voxel} . Therefore, the number of neighboring voxels is at most 27, and the number of points in one voxel is at most T_{voxel} .
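This first solution amounts to enumerating the 3 × 3 × 3 voxel block around the query's voxel, clamped at the AABB boundary. A sketch, with illustrative names, follows; reusing Eq. (4) for the linearized indices.

```cpp
#include <cstdint>
#include <vector>

// Enumerate the (at most) 27 voxels in the 3x3x3 block around the
// query's voxel (vx, vy, vz), clamped to the grid boundary. With
// R_re >= R_in, this block is guaranteed to contain the search ball.
std::vector<uint32_t> neighbor_voxels(int vx, int vy, int vz,
                                      int VS_x, int VS_y, int VS_z) {
    std::vector<uint32_t> out;
    for (int dx = -1; dx <= 1; ++dx)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dz = -1; dz <= 1; ++dz) {
                int nx = vx + dx, ny = vy + dy, nz = vz + dz;
                if (nx < 0 || ny < 0 || nz < 0 ||
                    nx >= VS_x || ny >= VS_y || nz >= VS_z)
                    continue;  // neighbor falls outside the AABB
                out.push_back((uint32_t)(nx * VS_y * VS_z + ny * VS_z + nz));
            }
    return out;
}
```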
Second, the maximum number of points in one sub-voxel can also be set to T_{voxel} for the following two reasons. (1) If the number of points in one sub-voxel is above T_{voxel} , the sub-voxel is exceptionally dense, and we can neglect the redundant points, which have little influence on the result. (2) The same maximum value means that a voxel can be treated as a special sub-voxel, which reduces the logic complexity and benefits the hardware implementation. After profiling the datasets, the number of sub-voxels in one voxel can be fixed to guarantee that the number of points in one sub-voxel is less than T_{voxel} . Generally, the fixed number can be adjusted according to the distribution of the reference set and the memory allocated for voxels and sub-voxels. Therefore, the number of sub-voxels in one voxel is bounded by a fixed size, and the number of points in one sub-voxel is at most T_{voxel} .
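These fixed bounds translate directly into fixed-capacity storage, sketched below. All capacities are illustrative placeholders; the key point is that a voxel and a sub-voxel share the same bounded layout, so one datapath serves either case.

```cpp
// Illustrative capacities, not the values used in the brief.
constexpr int T_VOXEL        = 64; // max points kept per voxel or sub-voxel
constexpr int MAX_SUB_VOXELS = 8;  // fixed number of sub-voxels per voxel

struct SubVoxel {
    int   num_points;              // clamped to T_VOXEL; extra points dropped
    float px[T_VOXEL], py[T_VOXEL], pz[T_VOXEL];
};

struct Voxel {
    int      num_subs;             // 0: voxel was not secondarily segmented
    SubVoxel subs[MAX_SUB_VOXELS]; // a plain voxel occupies subs[0] only
};
```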
Third, for the dataset transfer and access challenge, we have developed a new strategy based on DSVS. We note that in the DSVS data structure, points are sorted by hash and sub_hash, so the points in the same voxel and sub-voxel are placed sequentially in memory. We also note that we can calculate the hash of the query points to sort them in the same way. Furthermore, we know that only the points surrounding the query point are needed when searching KNN. Based on these three features, we have designed a customized dataset buffer. (1) It supports a high-data-rate streaming interface for transferring the reference set from external memory to the PL side. (2) It uses block RAMs that are cyclically partitioned, so the points in one voxel or sub-voxel can be accessed simultaneously. (3) It is continuously updated as query points are processed, ensuring that the points surrounding the current query point are always in the dataset buffer.
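A Vivado-HLS-style sketch of the distance stage over such a cyclically partitioned buffer. The buffer depth and the unroll factor of 8 are illustrative, not the values of our design; cyclic partitioning spreads consecutive points of a (sub-)voxel across different block RAMs so that several are read in the same cycle.

```cpp
constexpr int BUF_DEPTH = 4096;  // illustrative on-chip buffer depth

void candidate_distances(const float ref_x[BUF_DEPTH],
                         const float ref_y[BUF_DEPTH],
                         const float ref_z[BUF_DEPTH],
                         int begin, int end,  // contiguous (sub-)voxel span
                         float qx, float qy, float qz,
                         float dist_out[BUF_DEPTH]) {
#pragma HLS array_partition variable=ref_x cyclic factor=8
#pragma HLS array_partition variable=ref_y cyclic factor=8
#pragma HLS array_partition variable=ref_z cyclic factor=8
    for (int i = begin; i < end; i += 8) {
#pragma HLS pipeline II=1
        for (int u = 0; u < 8; ++u) {
#pragma HLS unroll
            int j = i + u;
            if (j < end) {
                float dx = ref_x[j] - qx;
                float dy = ref_y[j] - qy;
                float dz = ref_z[j] - qz;
                dist_out[j] = dx * dx + dy * dy + dz * dz;  // squared distance
            }
        }
    }
}
```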
Fig. 4. The KNN search process of each query point.
An overview of the KNN search process is given in Fig. 4. In func_1, for each query point, we obtain its nearby regions, which consist of the neighboring voxels and sub-voxels in the range R_{in} . The neighboring voxels are processed in a pipeline, and the sub-voxels in one voxel are processed in parallel. In func_2, we choose the points in the nearby regions and label them as candidate neighbors. The points in one voxel or sub-voxel are processed in parallel. In func_3, we partition candidate-neighbor groups with a large number of points into several smaller groups. In func_4, we select the KNN in parallel from the partitioned candidate neighbors. The four functions are designed to be balanced to achieve high performance in dataflow operation. The data channels between functions are implemented as streams to enable task-level pipelining.
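A Vivado-HLS-style sketch of this task-level pipeline. The stage signatures are assumptions (only the names func_1 to func_4 come from the text); the point here is the stream wiring and the DATAFLOW directive that lets the four stages overlap.

```cpp
#include <hls_stream.h>

struct Candidate { float dist; int id; };

// Stage prototypes; bodies are omitted in this sketch.
void func_1(hls::stream<int>& regions);                                // nearby (sub-)voxels in R_in
void func_2(hls::stream<int>& regions, hls::stream<Candidate>& cands); // label candidate neighbors
void func_3(hls::stream<Candidate>& cands, hls::stream<Candidate>& parts); // split large groups
void func_4(hls::stream<Candidate>& parts, Candidate knn_out[]);       // parallel K-selection

// Top level: the four balanced stages run as a task-level pipeline.
void knn_search_top(Candidate knn_out[]) {
#pragma HLS dataflow
    hls::stream<int>       region_st;  // func_1 -> func_2
    hls::stream<Candidate> cand_st;    // func_2 -> func_3
    hls::stream<Candidate> part_st;    // func_3 -> func_4
    func_1(region_st);
    func_2(region_st, cand_st);
    func_3(cand_st, part_st);
    func_4(part_st, knn_out);
}
```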
V. EXPERIMENTAL RESULTS AND ANALYSIS
A. Experiment Setup
Dataset and Testbench: We choose KITTI [6] (Visual Odometry/SLAM Evaluation 2012 point cloud) as the dataset and the LOAM (Lidar Odometry and Mapping) algorithm [1] as the testbench, because LOAM achieves the best results on the selected KITTI dataset. In the LOAM algorithm, KNN is used to match the feature points in one frame of LIDAR data (the query set) with the surrounding point cloud map (the reference set) to correct the vehicle's pose. In our experiments, we first run the LOAM algorithm on the KITTI point cloud dataset. Next, before calling KNN, we output the reference set and the query set as PCD files. Then we test our KNN and the state-of-the-art KNN implementations with those two PCD files for comparison. The KNN search time should be less than 80 ms to meet the 1 Hz frequency constraint in LOAM, because the KNN search process can be iterated up to 10 times to obtain the best match results. Note that the data structure only needs to be built once because the map remains the same throughout the matching process. We do not compare the build time because, in a typical case, both GBDS and DSVS spend less than 20 ms on building, which is small enough.
After profiling the sizes of the reference set and query set of KNN in LOAM with the KITTI dataset, we have evaluated different KNN implementations in the two most representative cases. Case I fixes the reference set size at 200,000 (the average reference set size) and K at 5, and varies the query set size from 1,000 to 20,000. Case II fixes the query set size at 10,000 (the average query set size) and K at 5, and varies the reference set size from 20,000 to 400,000.
Besides, we have randomly generated a reference set with 300,000 points and a query set with 10,000 points as case III, to evaluate our proposed method in a general KNN search scenario.
Experimental Platform: All results reported on CPU are measured on an Intel i5-6300U 2.4 GHz CPU. All results reported on GPU are measured on an Nvidia GeForce GTX 1080 Ti graphics card. All results reported on FPGA are designed with Xilinx SDSoC 2018.2 and measured on a Xilinx UltraScale+ MPSoC ZCU102 FPGA board running at 200 MHz. The power of the CPU is measured by a power meter. The power of the FPGA is measured by an embedded sensor, the INA226 [7]. The power of the GPU is measured by the APIs from the NVIDIA Management Library.
B. Results and Analysis
To compare our method with the state-of-the-art methods, we have collected results from five different KNN implementations. The first is a baseline implementation of GBDS on CPU, called GBDS_CPU. The second is an implementation of GBDS on GPU [4] following its pseudo code, called GBDS_GPU. The third is an implementation of GBDS on FPGA with the optimization strategies in [5], called GBDS_FPGA. The fourth and the fifth are our proposed method DSVS implemented on GPU and FPGA, called DSVS_GPU and DSVS_FPGA.
Fig. 5. KNN search time for different implementations with different sizes of query sets.
Fig. 6. KNN search time for different implementations with different sizes of reference sets.
In Fig. 5 and Fig. 6, we compare the KNN search time for case I and case II, respectively. The results show that DSVS_FPGA achieves performance similar to GBDS_GPU. DSVS_GPU achieves the best performance, which is about 5 to 10 times faster than GBDS_GPU and DSVS_FPGA. However, when considering energy efficiency, the FPGA-based implementations perform better than the GPU ones. The average power and energy efficiency for case I are reported in Table I. The results show that DSVS_FPGA performs best in energy efficiency, which is 2.1 times, 2.8 times, and 17.4 times that of GBDS_FPGA [5], DSVS_GPU, and GBDS_GPU [4], respectively.
The energy efficiency gain of DSVS_FPGA over GBDS_FPGA mainly comes from two aspects: (1) due to DSVS, we only need to search KNN in neighboring voxels rather than in an expanded circle; (2) there are fewer candidate neighbors in DSVS, as Fig. 3 shows. The energy efficiency gain of DSVS_FPGA over DSVS_GPU mainly comes from the fact that the search process of DSVS_FPGA is implemented in dataflow with high parallelism.
To verify the adaptability of DSVS and to compare it with a previous FPGA-based KNN implementation [2], we test GBDS_FPGA and DSVS_FPGA on case III and record the per-query time, i.e., the total execution time divided by the number of queries.
TABLE I
COMPARISON OF ENERGY EFFICIENCY
| Method | Power (W) | Time (ms) | Energy (mJ) | EER |
|---|---|---|---|---|
| GBDS_CPU | 20 | 574 | 11480 | 0.009 |
| GBDS_GPU [4] | 74.13 | 24.18 | 1792.46 | 0.058 |
| GBDS_FPGA [5] | 3.463 | 62.0 | 214.706 | 0.480 |
| DSVS_GPU | 82.56 | 3.5 | 288.960 | 0.357 |
| DSVS_FPGA | 3.604 | 28.6246 | 103.074 | 1 |
TABLE II
COMPARISON OF FPGA-BASED IMPLEMENTATION
| Methods | KNN_FPGA [2] | GBDS_FPGA | DSVS_FPGA |
|---|---|---|---|
| Time (ms) | 1.231 | 0.0085 | 0.007 |
| Power (W) | 3.136 | 3.5 | 3.68 |
| EER | 1 | 129.76 | 149.86 |
| BRAMs / FFs | 512 / 23892 | 564 / 47391 | 701 / 65024 |
| DSPs / LUTs | 12 / 11838 | 67 / 49805 | 83 / 68960 |
Results in Table II show that the EER of DSVS_FPGA is 149.86 times that of [2] and 1.15 times that of GBDS_FPGA, respectively. The acceleration ratio between DSVS_FPGA and GBDS_FPGA decreases because DSVS almost degrades into GBDS when processing an evenly distributed point cloud. Besides, DSVS_FPGA utilizes more resources than GBDS_FPGA for building the sub-voxels and searching in them.
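As a sanity check on Tables I and II, the listed EER values are reproduced by assuming EER = E_baseline / E_method with E = power × time, normalized to DSVS_FPGA in Table I and to KNN_FPGA [2] in Table II. This definition is our assumption (EER is not formally defined in the brief); the short program below is for verification only.

```cpp
#include <cstdio>

// Recompute the EER columns from the Power and Time columns,
// assuming EER = E_baseline / E_method with E = power (W) x time (ms).
int main() {
    const double e_dsvs_fpga = 3.604 * 28.6246;  // Table I baseline energy
    std::printf("GBDS_GPU  EER: %.3f\n", e_dsvs_fpga / (74.13 * 24.18)); // ~0.058
    std::printf("GBDS_FPGA EER: %.3f\n", e_dsvs_fpga / (3.463 * 62.0));  // ~0.480
    std::printf("DSVS_GPU  EER: %.3f\n", e_dsvs_fpga / (82.56 * 3.5));   // ~0.357

    const double e_knn_fpga = 3.136 * 1.231;     // Table II baseline energy
    std::printf("GBDS_FPGA EER: %.2f\n", e_knn_fpga / (3.5  * 0.0085));  // 129.76
    std::printf("DSVS_FPGA EER: %.2f\n", e_knn_fpga / (3.68 * 0.007));   // 149.86
    return 0;
}
```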
VI. CONCLUSION
In this brief, we present an energy-efficient FPGA implementation of KNN for massive 3D LIDAR point clouds in smart vehicles. We propose the DSVS framework to accelerate the KNN search process. DSVS segments the point cloud a limited number of times to avoid dense points in one voxel. We have efficiently implemented the DSVS framework on both FPGA and GPU. Energy efficiency results show that our proposed method is on average 2.1 times and 6.2 times more energy efficient than the state-of-the-art implementations of KNN on FPGA and GPU, respectively. In future work, we will look into ways to further improve the search process and explore how to balance the number of voxels and sub-voxels.
REFERENCES
[1] J. Zhang and S. Singh, “Low-drift and real-time Lidar odometry and mapping,” Auton. Robots, vol. 41, pp. 401–416, Feb. 2017.
[2] F. B. Muslim, A. Demian, L. Ma, L. Lavagno, and A. Qamar, “Energy-efficient FPGA implementation of the k-nearest neighbors algorithm using OpenCL,” in Proc. FedCSIS, 2016, pp. 141–145.
[3] D. Wehr and R. Radkowski, “Parallel KD-tree construction on the GPU with an adaptive split and sort strategy,” Int. J. Parallel Program., vol. 46, pp. 1139–1156, Apr. 2018.
[4] P. J. S. Leite et al., “Nearest neighbor searches on the GPU,” Int. J. Parallel Program., vol. 40, pp. 313–330, Oct. 2011.
[5] J. Zhang, J. Li, and S. Khoram, “Efficient large-scale approximate nearest neighbor search on OpenCL FPGA,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 4924–4932.
[6] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 3354–3361.
[7] H. Ding and M. Huang, “Improve memory access for achieving both performance and energy efficiencies on heterogeneous systems,” in Proc. IEEE Int. Conf. Field Program. Technol. (FPT), 2014, pp. 91–98.