# FPGA-based Acceleration Applied to Spherical Projection and Distance Image Feature Extraction Algorithms in 3D LiDAR Odometry for Autonomous Driving

## Abstract

With the rapid development of autonomous driving technology, LiDAR (Light Detection and Ranging) has gradually become a mainstream tool for vehicle positioning and navigation. LiDAR odometry relies on the processing and feature extraction of 3D point cloud data to achieve high-precision environmental perception and path planning. Traditional computational methods face significant bottlenecks when handling these data, particularly in terms of real-time processing and hardware acceleration. To address this, this paper proposes a hardware implementation scheme for 3D LiDAR odometry accelerated by FPGA, with a focus on spherical projection and distance image-based feature extraction algorithms. First, we designed and implemented a multi-core parallel FPGA hardware architecture, which includes two main modules: the multi-core parallel spherical projection hardware module and the normal vector feature extraction module. By optimizing the hardware architecture and leveraging FPGA’s parallel processing capabilities, we accelerated the computation process, significantly improving the data processing speed. Experimental results show that, compared to traditional CPU processing methods, the FPGA-based acceleration scheme demonstrates significant acceleration in both spherical projection and feature extraction, with processing time greatly reduced. By comparing the processing times across different platforms, we validated the potential of FPGA in efficiently processing LiDAR data, providing an effective solution for hardware acceleration in 3D LiDAR data processing for autonomous driving systems. The findings of this paper offer a reference path for future FPGA-based hardware acceleration implementations in autonomous driving systems and open new directions for hardware optimization and applications in related fields.

## Authors

Yu Hu *Department of Electrical and Computer Engineering, AMSV University of Macau, Macau, China*

Sio Hang Pun *Department of Electrical and Computer Engineering, AMSV University of Macau, Macau, China*

Albert Li *Lingyange Semiconductor Inc, Zhuhai, China*

Mang I Vai *Faculty of Science and Technology, University of Macau, Macau, China*

Jianke Zhu *College of Computer Science, Zhejiang University, Hangzhou, China*

Peng Un Mak *Faculty of Science and Technology, University of Macau, Macau, China*

## Publication Information

**Journal:** 2025 6th International Conference on Electrical, Electronic Information and Communication Engineering (EEICE) **Year:** 2025 **Pages:** 453-457 **DOI:** [10.1109/EEICE65049.2025.11033697](https://doi.org/10.1109/EEICE65049.2025.11033697) **Article Number:** 11033697

## Metrics

**Total Downloads:** 49

## Funding

- Technology Development

---

## Keywords

**IEEE Keywords:** Laser radar, Three-dimensional displays, Computer architecture, Feature extraction, Data processing, Real-time systems, Odometry, Hardware acceleration, Field programmable gate arrays, Autonomous vehicles

**Index Terms:** Light Detection And Ranging, Autonomous Vehicles, Projection Images, Odometry, Project Characteristics, Projection Algorithm, Feature Extraction Algorithm, Great Circle Distance, Spherical Projection, Spherical Features, FPGA-based Accelerator, Normal Vector, Point Cloud, Path Planning, Light Detection, 3D Point, Hardware Implementation, 3D Point Cloud, Point Cloud Data, Hardware Architecture, Simultaneous Localization And Mapping, Eigenvalue Decomposition Of Matrix, Kernel Computation, Parallelization, MATLAB Platform, Single Kernel, Matrix Factorization, Covariance Matrix, Feature Extraction Part, Single Project

**Author Keywords:** LiDAR, Odometry, FPGA Acceleration, Spherical Projection, Feature Extraction

## SECTION I. Introduction

In recent years, autonomous driving has attracted increasing attention, leading to rapid advances in related technologies [^1]. Odometry plays a crucial role in autonomous driving systems: its main function is to measure and estimate the vehicle's position and motion state, providing key data for localization and navigation [^2]. Currently, odometry in autonomous driving systems relies largely on cameras and LiDAR (Light Detection and Ranging) [^3] [^4]. Compared with camera-based odometry, LiDAR odometry offers strong immunity to lighting conditions, precise depth information, low environmental dependence, and a large measuring range, making it a focal point of research. In LiDAR odometry algorithms, preparing the front-end feature data is one of the core modules, and it typically relies on feature extraction algorithms to obtain structural features from point clouds. In odometry and SLAM (Simultaneous Localization and Mapping) systems, the method of projecting 3D point cloud data onto 2D distance images has been widely studied in recent years [^5]. The introduction of this method in [^5] marked a new approach, and many researchers have continued to optimize it to improve system accuracy and speed [^6]. The method preserves the spatial neighborhood relationships of 3D point clouds and is well suited to parallel computing. Other studies focus on LOAM and on hardware acceleration of its improved variants [^7] [^8] [^9]. However, there are almost no hardware acceleration studies based on [^10]: existing efforts accelerate the algorithm only through joint CPU and GPU execution, and to date no study has implemented this method on an FPGA. Therefore, this paper explores FPGA-based acceleration, focusing on spherical projection and distance image-based feature extraction algorithms. The goal is to provide a foundational implementation path for hardware acceleration of odometry algorithms based on spherical projection and feature extraction, and to offer a reference for hardware implementation research in this field.

Based on this objective, we propose a multi-core parallel FPGA hardware architecture, which mainly includes the following two components:

1. **Multi-core Parallel Spherical Projection Hardware Implementation**: Since the points in spherical projection are independent of each other, parallel computation can be performed. We have designed a unit module for projecting each point independently and achieved acceleration through a multi-core parallel architecture.
2. **Multi-point Parallel Dual RAM Access Hardware Implementation for Normal Vector Feature Extraction**: In the normal vector feature extraction process, we adopt a vertical multi-point parallel horizontal scanning strategy to improve processing efficiency and reduce computation time.

## SECTION II. Overview of the Front-End System

In this section, the overall implementation architecture and basic principles of the front-end system are introduced. First, the raw 3D point cloud data from the LiDAR on the vehicle is continuously fed into four parallel spherical projection modules through a double-buffer mechanism (active buffer and inactive buffer). After processing by the spherical projection modules, the 3D point cloud is projected to generate a 2D distance image, which is then stored in dual-port RAM. Subsequently, the feature extraction module reads the distance image data from the dual-port RAM and calculates the feature points in the distance image, ultimately generating the vertex map and normal vector map. The hardware architecture of the whole system is shown in Figure 1.

![Figure 1](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11033714/11033691/11033697/hu1-p5-hu-large.gif)

*Figure 1. Hardware Architecture of the Entire Front-End System.*

### A. Hardware Implementation Method of Multi-Core Parallel Spherical Projection

The spherical projection part is the first stage of data preprocessing, and the projection calculation of each 3D point is independent, making it highly suitable for parallel computation. We designed a computation kernel for the spherical projection of individual points, which can be called multiple times in the higher-level system to implement multi-core parallelism. For resource considerations, we have implemented a four-core parallel computation. To ensure continuous operation of the computation kernels, we designed a double-buffer mechanism, where the active buffer continuously transmits data to the computation kernel, while the inactive buffer prepares the next data for computation. After the computation is complete, the buffers are swapped to maintain continuous data transmission. In addition, to ensure that data is continuously transmitted to RAM without interruption, the data is first written to a buffer before being sent to RAM. The flowchart of the spherical projection part of the realization process is shown in Figure 2.
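The double-buffer scheme described above can be illustrated with a small software model. The sketch below is a sequential Python simulation, not the authors' Verilog: the names `PingPongBuffer`, `run_pipeline`, and the `kernel` callback are hypothetical, and the refill that the hardware performs concurrently with computation is modeled here as a step taken just before each compute phase.

```python
class PingPongBuffer:
    """Two buffers: one feeds the kernels while the other is refilled."""

    def __init__(self):
        self.active = []    # currently streamed to the computation kernels
        self.inactive = []  # being filled with the next batch of points

    def load(self, batch):
        # Fill the inactive buffer (in hardware this overlaps with compute).
        self.inactive = list(batch)

    def swap(self):
        # Exchange roles once the active buffer has been consumed.
        self.active, self.inactive = self.inactive, []


def run_pipeline(batches, kernel, cores=4):
    """Process batches in groups of `cores` points, swapping buffers between batches."""
    buf = PingPongBuffer()
    source = iter(batches)
    results = []
    buf.load(next(source, []))
    buf.swap()
    while buf.active:  # note: an empty batch ends the run in this simple model
        buf.load(next(source, []))  # prepare the next batch
        for start in range(0, len(buf.active), cores):
            group = buf.active[start:start + cores]  # one point per kernel
            results.extend(kernel(p) for p in group)
        buf.swap()
    return results


doubled = run_pipeline([[1, 2, 3, 4, 5], [6, 7]], kernel=lambda p: 2 * p)
```

With four kernels, the first batch of five points is consumed as one full group and one partial group before the buffers swap; the output order matches the input order.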

![Figure 2](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11033714/11033691/11033697/hu2-p5-hu-large.gif)

*Figure 2. The flowchart of the spherical projection part shows that the processes within a dashed box are performed synchronously.*

The following describes the computation of a single spherical projection kernel. 3D point clouds can be mapped to an organized 2D distance image through spherical projection, where the specific mapping function for a given 3D point is defined as follows [^5]:

$$
\begin{equation*}\left[ {\begin{array}{l} r \\ \theta \\ \varphi \end{array}} \right] = \left[ {\begin{array}{c} {\sqrt {{x^2} + {y^2} + {z^2}} } \\ {\arctan \left( {\frac{y}{x}} \right)} \\ {\arcsin \left( {\frac{z}{{\sqrt {{x^2} + {y^2} + {z^2}} }}} \right)} \end{array}} \right]\tag{1}\end{equation*}
$$

After the basic spherical projection, the projected image needs to undergo normalization. The specific function for obtaining the final point is as follows [^10]:

$$
\begin{equation*}\left[ {\begin{array}{l} u \\ v \end{array}} \right] = {\Pi _s}(x,y,z) = \left[ {\begin{array}{c} {\frac{1}{2}\left( {1 - \frac{\theta }{\pi }} \right){\omega _s}} \\ {\left[ {1 - \left( {\varphi + {f_{up}}} \right){f^{ - 1}}} \right]{h_s}} \end{array}} \right]\tag{2}\end{equation*}
$$
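Equations (1) and (2) can be followed step by step in a short floating-point sketch. This is only a software illustration under stated assumptions: the function name `spherical_project` is hypothetical, `atan2` is used in place of $\arctan(y/x)$ so that all four quadrants are handled, and the default values of `h_s`, `f_up`, and `f` (in radians) are illustrative rather than taken from the paper. The width `w_s=800` follows the 800 points per row mentioned later in the text.

```python
import math


def spherical_project(x, y, z, w_s=800, h_s=64, f_up=0.25, f=0.5):
    """Map a 3D point to (r, u, v) following Equations (1) and (2).

    w_s, h_s: image width and height in pixels (h_s is an assumed value).
    f_up, f:  angular offset and vertical field of view in radians (assumed values).
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("the origin has no defined projection")
    theta = math.atan2(y, x)  # azimuth; atan2 is an assumption (see lead-in)
    phi = math.asin(z / r)    # elevation
    u = 0.5 * (1.0 - theta / math.pi) * w_s   # first row of Equation (2)
    v = (1.0 - (phi + f_up) / f) * h_s        # second row of Equation (2)
    return r, u, v


r0, u0, v0 = spherical_project(1.0, 0.0, 0.0)
r1, u1, v1 = spherical_project(0.0, 1.0, 0.0)
```

A point straight ahead on the x-axis lands in the horizontal center of the image, and a point on the positive y-axis lands a quarter of the image width to its left. A real implementation would additionally round `u` and `v` to integer pixel indices and discard points that fall outside the image.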

![Figure 3](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11033714/11033691/11033697/hu3-p5-hu-large.gif)

*Figure 3. Internal Framework of the Single Spherical Projection Computation Kernel.*

In the single spherical projection computation kernel that we designed, we implemented the operations of Equations (1) and (2). The specific internal structure is shown in Figure 3. The trigonometric functions involved are computed using the CORDIC IP. The flow inside a single spherical projection kernel is shown in Figure 4.
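For readers unfamiliar with CORDIC, the following sketch shows the vectoring mode of the algorithm, which recovers a vector's magnitude and angle using only shift-and-add style iterations. It is a floating-point illustration of the general technique, not the interface or configuration of the CORDIC IP used in the design. The function names are hypothetical, and obtaining the elevation angle from a second vectoring pass on $(\sqrt{x^2+y^2}, z)$ is an assumption about one possible mapping of Equation (1) onto such a core.

```python
import math


def cordic_vectoring(x, y, iterations=24):
    """Return (magnitude, angle) of the vector (x, y) via CORDIC vectoring mode."""
    if x == 0.0 and y == 0.0:
        return 0.0, 0.0
    offset = 0.0
    if x < 0.0:
        # Pre-rotate by 180 degrees so the iterations start in the right half-plane.
        offset = math.pi if y >= 0.0 else -math.pi
        x, y = -x, -y
    angle = 0.0
    gain = 1.0
    for i in range(iterations):
        shift = 2.0 ** -i                 # a right shift by i bits in hardware
        d = -1.0 if y > 0.0 else 1.0      # rotate so that y is driven toward zero
        x, y = x - d * y * shift, y + d * x * shift
        angle -= d * math.atan(shift)     # a small lookup table in hardware
        gain *= math.sqrt(1.0 + shift * shift)
    return x / gain, offset + angle       # divide out the accumulated CORDIC gain


def range_and_angles(x, y, z):
    """Two vectoring passes give r, theta, and phi (an assumed decomposition)."""
    rho, theta = cordic_vectoring(x, y)   # planar distance and azimuth
    r, phi = cordic_vectoring(rho, z)     # range and elevation
    return r, theta, phi


mag, ang = cordic_vectoring(3.0, 4.0)
r, theta, phi = range_and_angles(1.0, 1.0, math.sqrt(2.0))
```

Each iteration adds roughly one bit of angular precision, so the iteration count trades latency against accuracy. A fixed-point hardware version would replace the multiplications by `shift` with bit shifts.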

![Figure 4](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11033714/11033691/11033697/hu4-p5-hu-large.gif)

*Figure 4. Internal process flow diagram of a single core for spherical projection.*

### B. Hardware Implementation Method for Multi-Point Parallel Dual-RAM Access for Normal Vector Features

To prepare the data for the subsequent point-to-point registration, we need the surface normal vectors. In [^5], the normal vector is calculated with the cross-product method, whereas in this work we compute it with the method from [^10].

![Figure 5](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11033714/11033691/11033697/hu5-p5-hu-large.gif)

*Figure 5. Quad-Core Parallel Scanning Computation Process for Distance Image.*

We use a fixed window to construct the neighborhood point set and select the nearest neighbors. For the subset of points that meet the requirements, the best-fitting local surface normal vector is obtained by constructing an error function and performing least-squares fitting; the error function is given in Equation (3). The point cloud subset can be modeled using a covariance matrix, and the normal vector is derived by decomposing this covariance matrix. The construction of the covariance matrix is shown in Equation (4). The normal direction is taken as the eigenvector associated with the smallest eigenvalue, and unsuitable points are excluded based on curvature.

$$
\begin{align*} & e = \sum\nolimits_{i = 1}^{k} {\left( \mathbf{p}_i^{\mathrm{T}}\mathbf{n} - d \right)^2} \quad \text{subject to } \|\mathbf{n}\| = 1 \tag{3} \\ & \Sigma = \frac{1}{k}\sum\nolimits_{i = 1}^{k} \left( \mathbf{p}_i - \overline{\mathbf{p}} \right)\left( \mathbf{p}_i - \overline{\mathbf{p}} \right)^{\mathrm{T}},\quad \overline{\mathbf{p}} = \frac{1}{k}\sum\nolimits_{i = 1}^{k} \mathbf{p}_i \tag{4}\end{align*}
$$
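The covariance construction of Equation (4) can be written out directly. The sketch below is a plain-Python illustration with a hypothetical function name, `covariance_3x3`. It uses a made-up 3×3 patch of points lying on the plane $z = 0$, chosen so that the result is easy to check by hand: the covariance has no spread along z, which is exactly the direction of the surface normal.

```python
def covariance_3x3(points):
    """Return (mean, covariance) of a list of 3D points, as in Equation (4)."""
    k = len(points)
    if k == 0:
        raise ValueError("need at least one point")
    mean = [sum(p[axis] for p in points) / k for axis in range(3)]
    cov = [[0.0, 0.0, 0.0] for _ in range(3)]
    for p in points:
        d = [p[axis] - mean[axis] for axis in range(3)]  # p_i minus the mean
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / k             # outer product, averaged
    return mean, cov


# Illustrative neighborhood: a flat 3x3 patch on the plane z = 0.
patch = [(float(x), float(y), 0.0) for x in (-1, 0, 1) for y in (-1, 0, 1)]
mean, cov = covariance_3x3(patch)
```

For this patch the z row and column of the covariance are zero, so the smallest eigenvalue is zero and its eigenvector is the z-axis, matching the intuition that the normal is the direction of least variance.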

![Figure 6](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11033714/11033691/11033697/hu6-p5-hu-large.gif)

*Figure 6. The internal structure of the single-core computation, which also includes another module for eigenvalue decomposition.*

In the hardware implementation, we designed a feature extraction core that processes one point. Since the computation for each point is independent, points can be processed in parallel, and we implemented four feature extraction cores that operate in parallel. We select four rows for parallel processing, scan horizontally to the right, and send the four points of the current column simultaneously into the feature extraction cores. After the computation, the scan continues with the next points to the right. Once the 800 points in a row have been computed, the scan moves downward with a step of three points, until the entire distance image has been processed. The scan order and the way the four cores traverse the distance image in parallel are shown in Figure 5.
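The traversal described above can be sketched as a generator of pixel coordinates. This is one reading of the text, not a confirmed description of the hardware: the function name `quad_row_scan` is hypothetical, and it is an assumption that "a step of three points" means the four-row band moves down by three rows, so that consecutive bands share one row.

```python
def quad_row_scan(height, width, rows_per_group=4, row_step=3):
    """Yield groups of (row, col) coordinates, one coordinate per parallel core.

    Each group holds the pixels of `rows_per_group` adjacent rows in one column.
    The band of rows scans left to right, then moves down by `row_step` rows.
    """
    for top in range(0, height - rows_per_group + 1, row_step):
        for col in range(width):
            yield [(top + k, col) for k in range(rows_per_group)]


# Tiny illustrative image: 7 rows by 2 columns.
groups = list(quad_row_scan(height=7, width=2))
```

For the 7×2 example, the first band covers rows 0 to 3 and the second covers rows 3 to 6, giving four groups of four coordinates each.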

![Figure 7](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11033714/11033691/11033697/hu7-p5-hu-large.gif)

*Figure 7. Internal process flow diagram of a single feature extraction core.*

The structure of a single feature extraction core is shown in Figure 6, and its internal process flow is shown in Figure 7. For the division calculations, we used a division IP. Another important module performs the eigenvalue decomposition of the covariance matrix. We designed it as an independent module that can be instantiated repeatedly and called from the surrounding modules. For the eigenvalue decomposition, we used the traditional Jacobi iteration method. A state machine serves as the main framework for the iteration, and the iteration is aborted once it exceeds a preset threshold.
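The Jacobi method mentioned above can be illustrated in software. The following is a minimal floating-point sketch of the classical Jacobi eigenvalue iteration for a small symmetric matrix, with a cap on the number of rotations that plays the role of the break-out condition. The function names, the tolerance, and the interpretation of the threshold as an iteration count are assumptions; the hardware state machine may be organized differently.

```python
import math


def jacobi_eigen(matrix, tol=1e-12, max_rotations=50):
    """Eigen-decompose a symmetric matrix; returns (eigenvalues, eigenvector columns)."""
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for _ in range(max_rotations):  # break-out threshold (assumed to be a count)
        p, q = max(pairs, key=lambda ij: abs(a[ij[0]][ij[1]]))
        if abs(a[p][q]) < tol:
            break  # off-diagonal entries are negligible: converged
        # Rotation angle that zeroes a[p][q].
        theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
        c, s = math.cos(theta), math.sin(theta)
        for k in range(n):  # apply the rotation to columns p and q
            akp, akq = a[k][p], a[k][q]
            a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
        for k in range(n):  # apply the transposed rotation to rows p and q
            apk, aqk = a[p][k], a[q][k]
            a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
        for k in range(n):  # accumulate eigenvectors as columns of v
            vkp, vkq = v[k][p], v[k][q]
            v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def smallest_eigenvector(matrix):
    """Return the eigenvector of the smallest eigenvalue (the normal estimate)."""
    values, vectors = jacobi_eigen(matrix)
    idx = min(range(len(values)), key=lambda i: values[i])
    return [vectors[k][idx] for k in range(len(values))]


# Covariance-like matrix of points spread over the plane z = x (illustrative).
sigma = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
values, vectors = jacobi_eigen(sigma)
normal = smallest_eigenvector(sigma)
```

For this example the eigenvalues are 0, 1, and 2, and the eigenvector of the zero eigenvalue is proportional to $(1, 0, -1)$, the normal of the plane $z = x$ (up to sign).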

## SECTION III. Experiment and Evaluation

To evaluate our design, we chose the KITTI dataset as the test data. Since our focus is on accelerating the front-end data preparation, and we accelerate existing algorithms without making any algorithmic improvements, we do not assess the overall system quality. Instead, we focus on comparing the point cloud data captured by the LiDAR, and we conduct only a simple comparison of functionality and of the time improvements achieved.

![Figure 8](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11033714/11033691/11033697/hu8-p5-hu-large.gif)

*Figure 8. The original image.*

![Figure 9](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11033714/11033691/11033697/hu9-p5-hu-large.gif)

*Figure 9. Two result images from MATLAB, and two result images from FPGA.*

To visualize our results, we show two maps obtained after spherical projection and feature extraction on the MATLAB platform and two maps obtained after implementing the algorithm on an FPGA. Figure 8 shows the unprocessed original point cloud image, while Figure 9 compares the vertex maps and normal vector maps obtained on the MATLAB platform with those from the hardware implementation. The feature point maps indicate that the orthogonality between the extracted edge points and the normal vectors is as expected. We primarily compared the execution times of the spherical projection and feature extraction implementations on two platforms. On the PC side, we used an Intel Core i7-12700 CPU running at 2100 MHz and tested the algorithms in MATLAB R2023a. The FPGA design was written in Verilog using Xilinx Vivado 2022.2 and runs on a Xilinx Virtex UltraScale VCU129 board at a frequency of 250 MHz.

We evaluated the time performance of the two platforms and detailed the time for each phase, as shown in Table I and Table II. The overall front-end data preparation on the FPGA achieved a speedup of about 2.57 times; the spherical projection part achieved a speedup of about 3.55 times, and the feature extraction part a speedup of about 2.26 times. The speedup for the feature extraction part is lower mainly because of the high computational complexity of the eigenvalue decomposition, in which some iterative processes take longer and lengthen the overall execution time. In addition, the Jacobi iteration algorithm might not be the optimal choice in terms of speed. We plan to investigate in future research whether there are faster methods for performing the eigenvalue decomposition.
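The three reported speedups are linked by simple arithmetic if the two phases are assumed to run back to back without overlap. Under that assumption, and treating the rounded figures 2.57, 3.55, and 2.26 as exact, the share of CPU time spent in spherical projection can be estimated. This is a rough consistency check with hypothetical function names; the measured times in Table I and Table II remain authoritative.

```python
def overall_speedup(frac_proj, s_proj, s_feat):
    """Overall speedup when a fraction `frac_proj` of CPU time is projection."""
    return 1.0 / (frac_proj / s_proj + (1.0 - frac_proj) / s_feat)


def projection_time_fraction(s_total, s_proj, s_feat):
    """Solve overall_speedup(frac, s_proj, s_feat) == s_total for frac."""
    return (1.0 / s_feat - 1.0 / s_total) / (1.0 / s_feat - 1.0 / s_proj)


frac = projection_time_fraction(s_total=2.57, s_proj=3.55, s_feat=2.26)
check = overall_speedup(frac, s_proj=3.55, s_feat=2.26)
```

With the reported numbers, `frac` comes out at roughly one third, which suggests that feature extraction accounts for about two thirds of the CPU runtime and therefore dominates the overall speedup.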

![Table I](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11033714/11033691/11033697/hu.t1-p5-hu-large.gif)

*TABLE I.*

![Table II](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11033714/11033691/11033697/hu.t2-p5-hu-large.gif)

*TABLE II.*

## SECTION IV. Conclusions

In this work, we have implemented hardware acceleration for spherical projection and feature extraction. The spherical projection part is accelerated primarily through a double-buffer, multi-core parallel structure, while feature extraction is accelerated through multi-core parallel processing with dual-port RAM access. According to our results, the hardware implementation achieves a 2.57x speedup over the PC-based implementation. However, measured against real-time requirements and existing GPU-CPU hybrid architectures, there is still significant room for speed improvement. Currently, there are no FPGA hardware acceleration implementations targeting these spherical projection and feature extraction algorithms, and while our speedup may not be very large, it still provides a foundation and reference for future related research. Our research also has limitations, such as the lack of focus on low-power implementation, which is a key issue in autonomous driving. In the future, we will explore faster hardware implementations of the eigenvalue decomposition and more efficient acceleration hardware structures.

## References

[^1]: E. Yurtsever, J. Lambert, A. Carballo and K. Takeda, “A Survey of Autonomous Driving: Common Practices and Emerging Technologies,” in IEEE Access, vol. 8, pp. 58443 - 58469, 2020, doi: 10.1109/ACCESS.2020.2983149. [IEEE](https://ieeexplore.ieee.org/document/9046805) [Google Scholar](https://scholar.google.com/scholar?as_q=A+Survey+of+Autonomous+Driving%3A+Common+Practices+and+Emerging+Technologies&as_occt=title&hl=en&as_sdt=0%2C31)

[^2]: C. Cadena et al, “Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age,” in IEEE Transactions on Robotics, vol. 32, no. 6, pp. 1309 - 1332, Dec. 2016, doi: 10.1109/TRO.2016.2624754. [IEEE](https://ieeexplore.ieee.org/document/7747236) [Google Scholar](https://scholar.google.com/scholar?as_q=Past%2C+Present%2C+and+Future+of+Simultaneous+Localization+and+Mapping%3A+Toward+the+Robust-Perception+Age&as_occt=title&hl=en&as_sdt=0%2C31)

[^3]: R. Mur-Artal, J. M. M. Montiel and J. D. Tardós, “ORB-SLAM: A Versatile and Accurate Monocular SLAM System,” in IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147 - 1163, Oct. 2015, doi: 10.1109/TRO.2015.2463671. [IEEE](https://ieeexplore.ieee.org/document/7219438) [Google Scholar](https://scholar.google.com/scholar?as_q=ORB-SLAM%3A+A+Versatile+and+Accurate+Monocular+SLAM+System&as_occt=title&hl=en&as_sdt=0%2C31)

[^4]: J. Zhu, “Image gradient-based joint direct visual odometry for stereo camera,” Proc. Int. Joint Conf. Artif. Intell., pp. 4558 - 4564, 2017. [DOI](https://doi.org/10.24963/ijcai.2017/636) [Google Scholar](https://scholar.google.com/scholar?as_q=Image+gradient-based+joint+direct+visual+odometry+for+stereo+camera&as_occt=title&hl=en&as_sdt=0%2C31)

[^5]: J. Behley and C. Stachniss, “Efficient surfel-based SLAM using 3D laser range data in urban environments,” Robot.: Sci. Syst., vol. 2018, 2018. [DOI](https://doi.org/10.15607/rss.2018.xiv.016) [Google Scholar](https://scholar.google.com/scholar?as_q=Efficient+surfel-based+SLAM+using+3D+laser+range+data+in+urban+environments&as_occt=title&hl=en&as_sdt=0%2C31)

[^6]: X. Chen, A. Milioto, E. Palazzolo, P. Giguère, J. Behley and C. Stachniss, “SuMa++: Efficient LiDAR-based Semantic SLAM,” 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 2019, pp. 4530 - 4537, doi: 10.1109/IROS40897.2019.8967704. [IEEE](https://ieeexplore.ieee.org/document/8967704) [Google Scholar](https://scholar.google.com/scholar?as_q=SuMa%2B%2B%3A+Efficient+LiDAR-based+Semantic+SLAM&as_occt=title&hl=en&as_sdt=0%2C31)

[^7]: J. Zhang and S. Singh, “LOAM: Lidar odometry and mapping in real-time,” Robot.: Sci. Syst., vol. 2, no. 9, 2014. [DOI](https://doi.org/10.15607/RSS.2014.X.007) [Google Scholar](https://scholar.google.com/scholar?as_q=LOAM%3A+Lidar+odometry+and+mapping+in+real-time&as_occt=title&hl=en&as_sdt=0%2C31)

[^8]: H. Wang, C. Wang, C. -L. Chen and L. Xie, “F-LOAM: Fast LiDAR Odometry and Mapping,” 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021, pp. 4390 - 4396, doi: 10.1109/IROS51168.2021.9636655. [IEEE](https://ieeexplore.ieee.org/document/9636655) [Google Scholar](https://scholar.google.com/scholar?as_q=F-LOAM%3A+Fast+LiDAR+Odometry+and+Mapping&as_occt=title&hl=en&as_sdt=0%2C31)

[^9]: T. Shan and B. Englot, “LeGO-LOAM: Lightweight and Ground-Optimized Lidar Odometry and Mapping on Variable Terrain,” 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 2018, pp. 4758 - 4765, doi: 10.1109/IROS.2018.8594299. [IEEE](https://ieeexplore.ieee.org/document/8594299) [Google Scholar](https://scholar.google.com/scholar?as_q=LeGO-LOAM%3A+Lightweight+and+Ground-Optimized+Lidar+Odometry+and+Mapping+on+Variable+Terrain&as_occt=title&hl=en&as_sdt=0%2C31)

[^10]: X. Zheng and J. Zhu, “Efficient LiDAR Odometry for Autonomous Driving,” in IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 8458 - 8465, Oct. 2021, doi: 10.1109/LRA.2021.3110372. [IEEE](https://ieeexplore.ieee.org/document/9531543) [Google Scholar](https://scholar.google.com/scholar?as_q=Efficient+LiDAR+Odometry+for+Autonomous+Driving&as_occt=title&hl=en&as_sdt=0%2C31)
