# Packetized Pipelined Pillar Feature Net Accelerator for LiDAR 3D Object Detection

## Abstract

Implementing LiDAR-based 3D object detection algorithms in practical autonomous driving situations presents a significant challenge. In current research algorithms, the inherent sparsity and randomness of point cloud data necessitate significant memory usage and frequent data read/write operations during preprocessing. Such demands are not well-suited for terminal devices with stringent real-time requirements and constrained resources. In this paper, we present a packetized processing Pillar Feature Net accelerator for LiDAR 3D object detection. By integrating voxelization and feature extraction into a pipelined architecture, the proposed accelerator significantly reduces the storage requirements for point cloud data and enhances the speed of feature extraction and pseudo-image generation. Experimental results indicate that the proposed method improves the computational throughput from point cloud data to pseudo-image generation by 1.2 times and eliminates the need for off-chip memory access during preprocessing.

## Authors

Qingyu Deng *School of Communication and Information Engineering, Shanghai University, Shanghai, China*

Xinyu Chen *School of Communication and Information Engineering, Shanghai University, Shanghai, China*

Wei Zhang *School of Communication and Information Engineering, Shanghai University, Shanghai, China*

Beining Zhao *School of Communication and Information Engineering, Shanghai University, Shanghai, China*

Yuhang Gu *School of Communication and Information Engineering, Shanghai University, Shanghai, China*

Shan Cao *School of Communication and Information Engineering, Shanghai University, Shanghai, China*

Zhiyuan Jiang *School of Communication and Information Engineering, Shanghai University, Shanghai, China*

## Publication Information

**Journal:** 2025 IEEE International Symposium on Circuits and Systems (ISCAS) **Year:** 2025 **Pages:** 1-5 **DOI:** [10.1109/ISCAS56072.2025.11043447](https://doi.org/10.1109/ISCAS56072.2025.11043447) **Article Number:** 11043447 **ISSN:** Electronic ISSN: 2158-1525, Print on Demand(PoD) ISSN: 0271-4302

## Metrics

**Total Downloads:** 35

## Funding

- National Natural Science Foundation of China

---

## Keywords

**IEEE Keywords:** Point cloud compression, Three-dimensional displays, Laser radar, Circuits and systems, Object detection, Computer architecture, Feature extraction, Throughput, Real-time systems, Field programmable gate arrays

**Index Terms:** 3D Object Detection, Point Cloud, Point Cloud Data, Terminal Devices, Off-chip Memory, Processing Unit, Hash Function, Data Frame, Data Packets, Beam Scanning, Lidar Data, Feature Encoder, Point Index, Table Entries, Hardware Accelerators, KITTI Dataset, External Interface

**Author Keywords:** 3D object detection, LiDAR, PointPillars, FPGA

## SECTION I. Introduction

The significance of 3D object detection in autonomous driving has gained increasing recognition, particularly in LiDAR-based detection systems, which offer highly accurate point cloud data. Consequently, LiDAR-based detection algorithms have become a central focus of current research [^1] [^2]. In many terminal devices requiring 3D object detection, timeliness and energy consumption are critical factors determining whether the device can be deployed on a large scale [^3]. Due to the sparsity and irregularity of point clouds, processing them directly with neural networks incurs high computational complexity and is inefficient [^4]. The voxel-based point cloud processing approach significantly reduces computational complexity by partitioning raw data into voxels [^5] [^6]. The PointPillars [^7] algorithm further simplifies the voxel representation into pillars and replaces the 3D convolution used for feature extraction with 2D convolution, thereby further lowering the computational cost of model inference. This model achieves a commendable balance between speed and accuracy in the field of LiDAR-based 3D object detection.

The workflow of the PointPillars algorithm is illustrated in Fig. 1. Initially, the point cloud data is fed into the Pillar Feature Net (PFN) module, where it is voxelized into individual pillars in a 2D space. These pillars, containing the point cloud data, then proceed to a feature extraction layer, where the pillar data is processed to extract higher-level representations. Subsequently, the spatial relationships between points in the original data are used to generate pseudo-images, which resemble traditional images. Finally, 2D convolution is applied for feature extraction, followed by 3D box regression predictions via the Detection Head.

![Figure 1](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11043142/11042930/11043447/deng1-p5-deng-large.gif)

*Fig. 1: An overview of the PointPillars network structure*

Many hardware architectures have been proposed to accelerate the backbone and SSD stages of the PointPillars network [^8]–[^9], yet relatively little research has addressed the voxelization and feature extraction steps for point cloud data. In these studies, the preprocessing phase, including voxelization and encoding, is typically implemented in software and run on the CPU. Despite various software optimizations, the entire preprocessing stage still takes 70 milliseconds in a multithreaded environment [^8]. Performing point cloud preprocessing on the CPU therefore severely limits inference speed and results in high power consumption [^10]. To address this issue, several hardware accelerators have been proposed for the 3D point cloud preprocessing stage [^11]–[^12]. The work in [^11] focuses solely on accelerating feature extraction, so the rest of the preprocessing flow still requires CPU assistance. A hardware accelerator for point cloud voxelization was introduced in [^13], which uses hash lookup to reduce redundancy in voxel information storage and to minimize frequent memory access during voxelization; however, the feature extraction process was not jointly optimized. A comprehensive hardware acceleration framework for the entire PFN stage was developed in [^12], but CPU intervention is still needed to manage interactions with external interfaces. Furthermore, both [^13] and [^12] require off-chip memory to store the original point cloud data, which adds area, power consumption, and latency [^14] and makes them unsuitable for deployment in certain resource-constrained terminal applications.

This paper presents a Pillar Feature Net Accelerator (PFNA) based on the PointPillars network. The accelerator comprises a pillar generator and a feature extractor, supporting packetized and pipelined processing of point cloud data within each frame, thereby achieving high-speed and efficient preprocessing. The pillar generator processes the raw point cloud data in packets, voxelizes it via hash mapping, and classifies it into a two-dimensional grid coordinate system. It then compares the voxels against their occurrence in previous packets and outputs qualifying voxels as pillars, which are passed to the feature extractor for stream processing. The extracted feature information is remapped to the two-dimensional grid to generate pseudo-images, which are stored in DDR. The proposed approach removes the need to store an entire frame of point cloud data during the PFN stage. It not only eliminates frequent interactions with off-chip memory during the point cloud preprocessing phase but also increases computational throughput through the pipelining gains of packetized processing. Finally, we implemented a complete FPGA-based PointPillars network using the proposed PFNA and validated its feasibility through experiments on the KITTI dataset.

![Figure 2](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11043142/11042930/11043447/deng2-p5-deng-large.gif)

*Fig. 2: Different Approaches of Pillar Feature Net Process: (a) The traditional method requires storing all voxelized point clouds in a frame. (b) Pipelined operations allow point clouds to be temporarily stored in packets within each frame.*

## SECTION II. Motivations

Point cloud data typically exhibit irregular spatial distribution characteristics, resulting in substantial variations in point cloud density across different regions. Hash tables, serving as efficient mapping and lookup tools, offer an effective solution for the storage and retrieval of voxel information [^15]. In this study, we reference the voxel encoding accelerator detailed in [^13] to address the challenges associated with redundant storage and frequent access to voxel data effectively. However, existing research on PFN accelerators fundamentally adheres to the workflow of the original network. As illustrated in Fig. 2(a), the traditional PFN process performs feature encoding only after the voxelization of each frame of point cloud data is complete. Consequently, a significant amount of memory is required to store all point cloud data within a frame during processing, which is typically reliant on off-chip memory [^13] [^12].

According to the principles of LiDAR data acquisition, point cloud coordinates captured prior to the laser beam scanning a specific angle will not reappear in the current frame. Therefore, it is unnecessary to wait for the voxelization of a frame of data to conclude before initiating feature encoding in the PFN process. Instead, the data within a frame can be segmented into several consecutive packets, each encompassing point cloud data within a defined angular range. Subsequently, it is possible to determine, on a packet-by-packet basis, whether a particular pillar and its corresponding point cloud data can be transferred to the subsequent processing stage. As depicted in Fig. 2(b), when the LiDAR beam scans between intervals 2 and 3, only the pillar information associated with point clouds in interval 1 can be transmitted to the feature extraction layer for encoding. Consequently, during processing, we can determine whether a pillar can be output to the next stage by comparing the pillar information across the two packets. This pipelined PFN approach necessitates the storage of at most the point cloud information of two packets, thereby effectively eliminating the requirement to retain an entire frame of point cloud data. This not only significantly reduces storage demands but also presents potential for on-chip implementation.
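The packet-by-packet decision described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the authors' hardware logic: the release rule used here (a pillar is output once a whole packet passes without adding a point to it) is one reading of the two-packet comparison, and the names `run_pipeline`, `packetize`, and `key_of` are hypothetical.

```python
def packetize(points, packet_size):
    """Split the point stream into consecutive packets of at most packet_size points."""
    for start in range(0, len(points), packet_size):
        yield points[start:start + packet_size]


def run_pipeline(points, packet_size, key_of):
    """Release pillars packet by packet instead of after the whole frame.

    Assumed rule: a pillar that received points in the previous packet but
    none in the current packet is considered complete and is released.
    Points that arrive for an already released pillar are dropped.
    """
    open_pillars = {}   # pillar key -> points collected so far
    closed = set()      # keys of pillars that were already released
    released = []       # (key, points) in release order
    dropped = 0
    prev_keys = set()

    for packet in packetize(points, packet_size):
        cur_keys = set()
        for p in packet:
            k = key_of(p)
            if k in closed:
                dropped += 1          # pillar already sent downstream
                continue
            open_pillars.setdefault(k, []).append(p)
            cur_keys.add(k)
        # Compare the two packets: pillars seen before but not now are done.
        for k in sorted(prev_keys - cur_keys):
            released.append((k, open_pillars.pop(k)))
            closed.add(k)
        prev_keys = cur_keys

    # End of frame: flush whatever is still open.
    for k in sorted(open_pillars):
        released.append((k, open_pillars[k]))
    return released, dropped


# Toy stream of (x, y) points; the pillar key is the integer grid cell.
demo_points = [(0.1, 0.1), (0.2, 0.3), (1.5, 0.2), (1.6, 0.1), (2.2, 0.0), (0.4, 0.4)]
demo_released, demo_dropped = run_pipeline(
    demo_points, packet_size=2, key_of=lambda p: (int(p[0]), int(p[1]))
)
print([k for k, _ in demo_released], demo_dropped)
```

In this toy stream the last point returns to a cell that was already released, so it is dropped; this is the loss that the packet size trades against buffer memory.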

This hypothesis can be validated using the KITTI [^16] dataset. We treat every *N* consecutive points within a single scanning cycle as a data packet and check whether points from *k* adjacent packets fall into the same voxel. If points in packets *j* and *j*+*i* belong to the same voxel but the voxel is absent from the intermediate packets, the points in packet *j*+*i* are discarded during processing. As *k* approaches infinity, we obtain the result shown in Fig. 3: for LiDAR data from one scanning cycle, comprising a total of 122,637 points, the number of lost points approaches zero as the number of cached points per packet *N* increases. This substantiates the hypothesis that caching all point cloud data during voxelization is unnecessary and that, at runtime, qualifying voxels can be output early for feature extraction and pseudo-image generation.
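A rough way to reproduce this kind of measurement in software is sketched below. It assumes a simplified release rule (a voxel is closed once a full packet passes without a point for it) and runs on a small synthetic key stream, not on KITTI data; `count_missed` and `stream` are hypothetical names chosen for illustration.

```python
def count_missed(voxel_keys, packet_size):
    """Count points that arrive after their voxel has already been closed.

    voxel_keys: voxel key of each point, in acquisition order.
    Assumed rule: a voxel seen in the previous packet but not in the
    current one is closed; later points for it count as missed.
    """
    closed = set()
    prev = set()
    missed = 0
    for start in range(0, len(voxel_keys), packet_size):
        cur = set()
        for k in voxel_keys[start:start + packet_size]:
            if k in closed:
                missed += 1
            else:
                cur.add(k)
        closed |= prev - cur   # voxels that went quiet for one packet
        prev = cur
    return missed


# Synthetic stream: each voxel is revisited shortly after its first point.
stream = [0, 1, 0, 2, 1, 3, 2, 4, 3, 5, 4, 5]
sweep = {n: count_missed(stream, n) for n in (1, 2, 4, 12)}
print(sweep)
```

On this toy stream the number of missed points falls as the packet size grows, which is the same qualitative trend the paper reports for a real scan in Fig. 3.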

![Figure 3](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11043142/11042930/11043447/deng3-p5-deng-large.gif)

*Fig. 3: The impact of packet size on missed points*

## SECTION III. Pillar Feature Net Accelerator

### A. SoC Architecture

The overall architecture of the proposed accelerator is illustrated in Fig. 4. The accelerator comprises a PFN preprocessing unit and a Neural Processing Unit (NPU) [^17]. The PFN preprocessing unit provides an external interface for LiDAR point cloud data and an AXI-stream data flow interface, which allows the computed feature information to be remapped and stored in DDR to generate pseudo-images. The PFN unit also provides an interrupt signal to notify the NPU. A *Point Buffer* stores the point cloud data acquired from LiDAR and triggers computation on a per-packet basis. Within the *Preprocessor* unit, the point cloud data undergoes quantization and pillar coordinate calculation, and the results are relayed to the *Pillar Generator* unit to determine the corresponding pillar. The *Packet Controller* unit contains several record tables and decision units, which monitor the generation of pillars, determine which pillars can be output, and instruct the *Pillar Generator* to continuously output pillar information for storage in the *Pillar Buffer*. The *Feature Extractor* continuously receives pillars for encoding, after which the encoded features are combined with the original 2D spatial coordinates of the point cloud data and converted into AXI-stream data signals. Once a complete data frame has been stored in DDR and the pseudo-image has been generated, the PFNA raises an interrupt to the NPU, signaling that the data is ready for subsequent prediction and computation. Because voxelization and encoding are pipelined, the memory requirements of both buffers remain small, allowing them to be implemented entirely in on-chip RAM.

### B. Pillar Generator

As shown in Fig. 5, the data collected by LiDAR is stored in the *Point Buffer*. Once the number of stored points reaches the preset packet size, the buffer activates the subsequent processing unit to start the computation. First, points outside the detection range are filtered out based on their positional information; the grid coordinates of the remaining points are then calculated and sent as hash keys to the voxel generator. The voxel generator is based on the hierarchical table described in [^13] and outputs both the voxel index and the point index. The voxel index is used to check whether a corresponding entry exists in the *Voxel Index Table*. If none exists, a new entry is created. If one exists, the unit checks the corresponding entry in the *Mem Status Table*: an entry marked "F" (Free) indicates that the index has already been processed and output, so the point is discarded; for an entry marked "A" (Allocated), the *Voxel Index Table* entry is used as the column address and the point index as the row address to store the data in the *Point Data Memory*. The corresponding entry in the *Point Index Table* is also updated with the point index. After all data in a packet has been processed, the entry in the *Package Index Table* corresponding to the current Package Counter-1 is located. The data is then retrieved column by column, and the relevant position in the *Mem Status Table* is updated to "F". A mask marking the valid points is generated from the entry values in the *Point Index Table* and applied to the retrieved column data. The resulting data forms a pillar, which is placed in the *Pillar Buffer* for further processing.
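The range filter, grid-coordinate calculation, and table lookups described above can be approximated by the following sketch. Everything numeric here is an assumption for illustration: the detection range and 0.16 m pillar size follow the commonly used PointPillars KITTI configuration rather than values stated in this paper, `MAX_POINTS` is a hypothetical per-pillar capacity, and a plain dictionary stands in for the hierarchical hash table of [^13].

```python
# Assumed detection range and pillar size (not taken from this paper).
X_MIN, X_MAX = 0.0, 69.12
Y_MIN, Y_MAX = -39.68, 39.68
Z_MIN, Z_MAX = -3.0, 1.0
PILLAR = 0.16
GRID_W = round((X_MAX - X_MIN) / PILLAR)
GRID_H = round((Y_MAX - Y_MIN) / PILLAR)
MAX_POINTS = 32  # hypothetical capacity of one pillar column


def grid_key(x, y, z):
    """Return a linear grid key for an in-range point, or None if filtered out."""
    if not (X_MIN <= x < X_MAX and Y_MIN <= y < Y_MAX and Z_MIN <= z < Z_MAX):
        return None
    gx = min(int((x - X_MIN) / PILLAR), GRID_W - 1)
    gy = min(int((y - Y_MIN) / PILLAR), GRID_H - 1)
    return gy * GRID_W + gx


class PillarStore:
    """Software stand-in for the index, status, and point tables."""

    def __init__(self):
        self.voxel_index = {}  # grid key -> column address
        self.status = []       # per column: "A" (allocated) or "F" (free)
        self.count = []        # per column: number of stored points
        self.columns = []      # point data memory, one list per column

    def insert(self, key, point):
        """Store a point; return False if it has to be discarded."""
        col = self.voxel_index.get(key)
        if col is None:                      # first point of a new pillar
            col = len(self.columns)
            self.voxel_index[key] = col
            self.status.append("A")
            self.count.append(0)
            self.columns.append([None] * MAX_POINTS)
        if self.status[col] == "F" or self.count[col] >= MAX_POINTS:
            return False                     # already output, or column full
        self.columns[col][self.count[col]] = point
        self.count[col] += 1
        return True

    def emit(self, key):
        """Read out one column, mark it "F", and mask away unused rows."""
        col = self.voxel_index[key]
        self.status[col] = "F"
        mask = [row < self.count[col] for row in range(MAX_POINTS)]
        return [p for p, valid in zip(self.columns[col], mask) if valid]
```

A caller would compute `grid_key` for each incoming point, skip points that return `None`, call `insert` for the rest, and call `emit` for each pillar that the packet controller decides to release.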

![Figure 4](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11043142/11042930/11043447/deng4-p5-deng-large.gif)

*Fig. 4: SoC architecture for 3D object detection with LiDAR*

![Figure 5](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11043142/11042930/11043447/deng5-p5-deng-large.gif)

*Fig. 5: The architecture of the Pillar Generator*

![Figure 6](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11043142/11042930/11043447/deng6-p5-deng-large.gif)

*Fig. 6: The architecture of the Feature Extractor*

### C. Feature Extractor

The *Feature Extractor* continuously fetches pillars for feature encoding whenever the *Pillar Buffer* is not empty. This unit performs one-dimensional convolution, BatchNorm, ReLU, and max pooling operations, ultimately outputting features. As shown in Fig. 6, the Feature Extractor first receives pillar data from the *Pillar Buffer* and then uses DSPs to perform one-dimensional convolution. To enhance hardware execution efficiency, we fuse the convolution and BatchNorm operations. Let the output of the convolution be $z = w \times x + b$ and the output of BatchNorm be $y = \gamma \frac{z - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta$. Combining the two equations yields the fused convolution $y = \gamma \frac{w \times x + b - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta$. The fused parameters, computed from the trained weights, are pre-loaded into the unit. Finally, the Feature Extractor outputs the features as data and the coordinates as addresses to generate AXI-stream signals, thereby storing the feature data in DDR to form the pseudo-image.
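Expanding the fused equation gives a single multiply-add with a new weight $w' = \gamma w / \sqrt{\sigma^2 + \varepsilon}$ and a new bias $b' = \gamma (b - \mu) / \sqrt{\sigma^2 + \varepsilon} + \beta$. The sketch below checks this numerically for one scalar channel; the function names and the sample parameter values are illustrative assumptions, and a real layer would apply the same folding per output channel.

```python
import math


def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm statistics into the convolution weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    fused_w = w * scale
    fused_b = (b - mean) * scale + beta
    return fused_w, fused_b


def conv_then_bn(x, w, b, gamma, beta, mean, var, eps=1e-5):
    """Reference path: convolution followed by a separate BatchNorm."""
    z = w * x + b
    return gamma * (z - mean) / math.sqrt(var + eps) + beta


# Arbitrary example parameters (not taken from the paper).
params = dict(w=0.7, b=-0.2, gamma=1.3, beta=0.1, mean=0.4, var=2.0)
fw, fb = fuse_conv_bn(**params)
for x in (-2.0, 0.0, 3.5):
    reference = conv_then_bn(x, **params)
    fused = fw * x + fb
    print(x, abs(reference - fused) < 1e-9)
```

Because the folding is done once offline, the hardware only needs the multiply-accumulate it already performs for the convolution, which is why the fused parameters can simply be pre-loaded.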

## SECTION IV. Implementation Results

### A. Experimental Setup

We implemented the SoC design on a development board with the Kintex-7 XC7K410T chip and evaluated it on the KITTI dataset. Considering both area and accuracy, we set the number of points per packet to 128 for board validation. Both the PFNA and the NPU are clocked at 200 MHz. We follow the PointPillars network structure implemented in [^18]. Compared to the official implementation of PointPillars, our feature extraction component uses only 4-dimensional point information for computation. This reduction in dimensionality decreases the processing time by 4 milliseconds while maintaining nearly the same accuracy, and it saves 5% of LUT resources and 8% of BRAM resources in the PFN process.

### B. Evaluation Results

The proposed PFN accelerator is compared with previous studies in terms of latency and throughput in Table I. Our implementation achieves the highest throughput, which is 21 times that of CPU-based computation (Cortex-A53) and 1.2 times higher than the best existing hardware accelerators. This indicates that our accelerator executes the greatest number of multiply-accumulate (MAC) operations per second, a result we attribute to the packetized pipelined architecture. Furthermore, the PFN hardware units described in [^13] and [^12] require additional off-chip memory to hold point cloud data. In contrast, our approach needs no off-chip memory: all computations are performed on-chip, and the associated storage is accounted for in the synthesis results. Additionally, the Feature Encoding in [^13] is integrated with the backbone inside the DNN accelerator, so the resource utilization of its PFN unit cannot be determined. As shown in Table II, our PFN accelerator achieves more efficient computation with fewer logic and computational resources. The increase in BRAM utilization stems from moving point cloud data storage from off-chip to on-chip, which enables faster and more energy-efficient data interaction and transfer.

![Table I](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11043142/11042930/11043447/deng.t1-p5-deng-large.gif)

*TABLE I*

![Table II](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/11043142/11042930/11043447/deng.t2-p5-deng-large.gif)

*TABLE II*

## SECTION V. Conclusion and Future Work

This paper presents an FPGA-based preprocessing accelerator for LiDAR point cloud data that optimizes the PFN layer of the PointPillars network. By employing packetization and pipelined processing, the design reduces memory usage and increases speed. It is integrated with an NPU for full inference and features a tightly coupled LiDAR data interface. Validation on a Xilinx Kintex-7 XC7K410T shows improved throughput with lower resource consumption. Future work includes dynamic packetization for varying point densities and real-world deployment in autonomous driving to assess robustness in complex environments.

## References

[^1]: G. Zamanakos, L. Tsochatzidis, A. Amanatiadis, and I. Pratikakis, “A comprehensive survey of LIDAR-based 3D object detection methods with deep learning for autonomous driving,” Computers & Graphics, vol. 99, pp. 153–181, 2021. [DOI](https://doi.org/10.1016/j.cag.2021.07.003)

[^2]: Y. Li and J. Ibanez-Guzman, “Lidar for autonomous driving: The principles, challenges, and trends for automotive lidar and perception systems,” IEEE Signal Processing Magazine, vol. 37, no. 4, pp. 50–61, 2020. [IEEE](https://ieeexplore.ieee.org/document/9127855)

[^3]: L. Liu, S. Lu, R. Zhong, B. Wu, Y. Yao, Q. Zhang, and W. Shi, “Computing systems for autonomous driving: State of the art and challenges,” IEEE Internet of Things Journal, vol. 8, no. 8, pp. 6469–6486, 2020. [IEEE](https://ieeexplore.ieee.org/document/9288755)

[^4]: S. Y. Alaba and J. E. Ball, “A survey on deep-learning-based lidar 3D object detection for autonomous driving,” Sensors, vol. 22, no. 24, p. 9577, 2022. [DOI](https://doi.org/10.3390/s22249577)

[^5]: Y. Zhou and O. Tuzel, “Voxelnet: End-to-end learning for point cloud based 3D object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4490–4499. [IEEE](https://ieeexplore.ieee.org/document/8578570)

[^6]: Y. Yan, Y. Mao, and B. Li, “Second: Sparsely embedded convolutional detection,” Sensors, vol. 18, no. 10, p. 3337, 2018. [DOI](https://doi.org/10.3390/s18103337)

[^7]: A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, “Pointpillars: Fast Encoders for Object Detection From Point Clouds,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 12689–12697. [IEEE](https://ieeexplore.ieee.org/document/8954311)

[^8]: J. Stanisz, K. Lis, and M. Gorgon, “Implementation of the pointpillars network for 3d object detection in reprogrammable heterogeneous devices using FINN,” Journal of Signal Processing Systems, vol. 94, no. 7, pp. 659–674, 2022. [DOI](https://doi.org/10.1007/s11265-021-01733-4)

[^9]: Y. Li, Y. Zhang, and R. Lai, “Tinypillarnet: Tiny pillar-based network for 3D point cloud object detection at edge,” IEEE Transactions on Circuits and Systems for Video Technology, 2023. [IEEE](https://ieeexplore.ieee.org/document/10189833)

[^10]: Y. Choi, B. Kim, and S. W. Kim, “Performance Analysis of PointPillars on CPU and GPU Platforms,” in 2021 36th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC), 2021, pp. 1–4. [IEEE](https://ieeexplore.ieee.org/document/9611297)

[^11]: C. Latotzke, A. Kloeker, S. Schoening, F. Kemper, M. Slimi, L. Eckstein, and T. Gemmeke, “FPGA-based Acceleration of Lidar Point Cloud Processing and Detection on the Edge,” in 2023 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2023, pp. 1–8. [IEEE](https://ieeexplore.ieee.org/document/10186612)

[^12]: C. Park, S. Lee, and Y. Jung, “FPGA Implementation of Pillar-Based Object Classification for Autonomous Mobile Robot,” Electronics, vol. 13, no. 15, p. 3035, 2024. [DOI](https://doi.org/10.3390/electronics13153035)

[^13]: X. Li, A. Ren, Y. Tan, X. Li, Z. Huang, C. Wang, X. Chen, and D. Liu, “VEA: An FPGA-Based Voxel Encoding Accelerator for 3D Object Detection with LiDAR,” in 2022 IEEE 40th International Conference on Computer Design (ICCD), 2022, pp. 509–516. [IEEE](https://ieeexplore.ieee.org/document/9978533)

[^14]: Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127–138, 2016. [IEEE](https://ieeexplore.ieee.org/document/7738524)

[^15]: Y. Yang, S. R. Kuppannagari, A. Srivastava, R. Kannan, and V. K. Prasanna, “FASTHash: FPGA-Based High Throughput Parallel Hash Table,” in High Performance Computing: 35th International Conference, ISC High Performance 2020, Frankfurt/Main, Germany, June 22–25, 2020, Proceedings 35. Springer, 2020, pp. 3–22. [DOI](https://doi.org/10.1007/978-3-030-50743-5_1)

[^16]: KITTI, “KITTI database website.” https://www.cvlibs.net/datasets/kitti/.

[^17]: L. Hui, S. Cao, Z. Chen, S. Li, and S. Xu, “Configurable CNN Accelerator in Speech Processing based on Vector Convolution,” in 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2022, pp. 146–149. [IEEE](https://ieeexplore.ieee.org/document/9869904)

[^18]: D-Robotics, “PointPillars reference algorithm,” https://developer.d-robotics.cc/forumDetail/118364000835765874.

### Additional References

9. H. Brum, M. Véstias, and H. Neto, “LiDAR 3D Object Detection in FPGA with Low Bitwidth Quantization,” in International Symposium on Applied Reconfigurable Computing. Springer, 2024, pp. 90–105.

10. Xilinx, “Vitis-ai,” https://github.com/Xilinx/Vitis-AI/blob/master/model_zoo/model-list/pt_pointpillars_3.5.

