# Design of a Drivable Area Segmentation Network Using a Field Programmable Gate Array Based on Light Detection and Ranging

## Abstract

With the continuing development of autonomous driving systems, drivable area segmentation has become an indispensable part of self-driving cars. Drivable area segmentation supports self-driving car technology by providing information about the surrounding environment, assisting decision-making mechanisms, selecting appropriate driving paths, and avoiding obstacles. Accurate segmentation of drivable areas is therefore crucial for improving self-driving navigation. To enable effective identification of drivable areas on the basis of environmental information, this study designed a drivable area segmentation network named DASNet. The proposed DASNet uses depthwise separable convolution as the basis for feature extraction so that features are extracted efficiently, which reduces both the computational load and the number of network parameters. Additionally, the proposed DASNet enhances the inference speed of the network while maintaining high accuracy. To reduce point cloud density without compromising essential information, we perform sampling and fusion on the point clouds in both Cartesian and spherical coordinate spaces during data preprocessing. The fused point cloud serves as the input to DASNet, while the output is the drivable area map. Finally, the proposed DASNet is ported to a field programmable gate array to achieve real-time drivable area detection. This study employs the publicly available KITTI dataset and a proposed wooded environment dataset for experimental evaluation. By evaluating the model on two distinct datasets, we aim to demonstrate its capabilities across three types of urban road scenes and more densely wooded environments. The experimental results indicate that DASNet achieved an F1-score of 0.9449 and an inference time of 9.32 ms on the KITTI dataset.
Furthermore, the proposed DASNet was applied to autonomous vehicles operating in a wooded environment, achieving an F1-score of 98.64%.

## Authors

Xue-Qian Lin *Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan* [ORCID: 0009-0000-6020-162X](https://orcid.org/0009-0000-6020-162X)

Jyun-Yu Jhang *Department of Computer Science and Information Engineering, National Taichung University of Science and Technology, Taichung, Taiwan* [ORCID: 0000-0001-6179-9125](https://orcid.org/0000-0001-6179-9125)

Cheng-Jian Lin *Department of Computer Science and Information Engineering, National Chin-Yi University of Technology, Taichung, Taiwan* [ORCID: 0000-0002-8709-2715](https://orcid.org/0000-0002-8709-2715)

Sheng-Fu Liang *Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan* [ORCID: 0000-0002-6347-5017](https://orcid.org/0000-0002-6347-5017)

## Publication Information

**Journal:** IEEE Access **Year:** 2025 **Volume:** 13 **Pages:** 130535-130548 **DOI:** [10.1109/ACCESS.2025.3591974](https://doi.org/10.1109/ACCESS.2025.3591974) **Article Number:** 11091303 **ISSN:** Electronic ISSN: 2169-3536


---

## Keywords

**IEEE Keywords:** Point cloud compression, Field programmable gate arrays, Roads, Laser radar, Feature extraction, Convolution, Cameras, Real-time systems, Logic gates, Deep learning

**Index Terms:** Light Detection And Ranging, Drivable Area Segmentation, Point Cloud, Coordination Sphere, Autonomous Vehicles, Computational Load, Inference Speed, Urban Scenes, KITTI Dataset, Depthwise Separable Convolution, Coordinate System, Power Consumption, Feature Maps, Graphics Processing Unit, Convolution Operation, Semantic Segmentation, Segmentation Task, Output Feature Map, Spherical Coordinate System, Point Cloud Data, Cylindrical Coordinate System, Convolution Module, Parameter Count, Traditional Convolution, Width Of The Feature Map, Hardware Accelerators, Unstructured Environments, Dice Loss

**Author Keywords:** Drivable area segmentation, deep learning network, depthwise separable convolution, field programmable gate array, coordinate system

## SECTION I. Introduction

The rapid development of autonomous driving systems (ADSs) is driving a technological revolution in the automotive industry. According to information from the U.S. National Highway Traffic Safety Administration [^1], up to 94% of traffic accidents in the United States are caused by human error. Many studies have focused on the development of ADSs to reduce the frequency of human errors and thus reduce the occurrence of traffic accidents; however, achieving a safe and efficient ADS continues to pose many challenges. Autonomous driving tasks encompass two subtasks: (1) object detection/tracking and (2) drivable area segmentation/planning. Drivable area segmentation is more complex than object detection. This technology provides an ADS with rich environmental information that can be used to plan safe driving paths and thus avoid collisions with obstacles. Sensors are indispensable for gathering information regarding the environment surrounding an autonomous vehicle. Most ADSs analyze environmental information by using either two-dimensional (2D) images from cameras or point cloud data from light detection and ranging (LiDAR). Studies such as Chan et al. [^2] have used camera images to simulate human vision systems. Environmental information was extracted by algorithms that analyze images captured by cameras. However, cameras are susceptible to interference from various factors (e.g., fog, rain, and light), leading to relatively unreliable sensing results under some conditions. By contrast, Xue et al. [^3] proposed an algorithm for drivable area segmentation based on LiDAR. Unlike two-dimensional images that contain color and texture information, LiDAR-generated point clouds represent positions in the spherical coordinate system. The distance between an object and the LiDAR sensor is determined from the time taken for the emitted light to reflect back: it equals the speed of light multiplied by half of this round-trip time.
In adverse weather conditions, LiDAR offers key information; however, it also poses the challenge of requiring processing of large amounts of point cloud data. Nevertheless, a well-defined point cloud coordinate system can effectively sample point clouds and thereby enhance computational efficiency. Consequently, point cloud coordinate systems have become a frequent topic of discussion among scholars. Some of the most well-known point cloud coordinate systems include the Cartesian coordinate system, cylindrical coordinate system, and spherical coordinate system. Bia et al. [^4] used the Cartesian coordinate system, a common system used to describe the positions of objects in an environment, for point cloud distribution. Zhu et al. [^5] recorded point cloud coordinates using the cylindrical coordinate system, which can simulate panoramic cameras, capturing depth-related information from the environment surrounding an autonomous vehicle. Finally, Li et al. [^6] constructed images containing distance information from point cloud data for object detection, tracking, and recognition. Since the point clouds are already mapped to the spherical coordinate system from LiDAR, coordinate transformations can be minimized during computation, thereby improving processing efficiency.

Data acquisition is the starting point of the ADSs. Multiple stages are required for intelligent driving capabilities. Therefore, many researchers have begun developing road detection methods based on their domain knowledge. The detection process for determining the drivable area for an autonomous vehicle can be divided into four steps: preprocessing, feature extraction, detection, and postprocessing. Hata et al. [^7] used the least trimmed squares method to handle discontinuous occlusions in edge detection and then employed the Monte Carlo method to identify road edges. Kumar et al. [^8] converted elevation, intensity, and pulse width information into 2D raster images, which were then input into the researchers’ proposed snake model to extract road boundaries and drivable areas. This snake model was then tested in rural, urban, and main road scenarios to verify its stability. Azizi et al. [^9] performed interpolation on point clouds by using the inverse distance weighted method and constructed road layers by using digital surface models, digital terrain models, and digital non-terrain models. Finally, that study used support vector machines to divide road layer information into road and nonroad categories. Although these methods yielded robust evaluation results in drivable area tasks, expert-designed feature extraction methods are still required under complex and challenging road conditions.

Unlike traditional methods based on domain knowledge, deep learning offers superior feature extraction capabilities, which have made it widely applied across various fields. The rapid advancement of deep learning technologies has led to the development of many semantic segmentation models (e.g., fully convolutional network (FCN) [^10], U-Net [^11], deep fully convolutional neural network architecture for semantic pixel-wise segmentation (SegNet) [^12], deconvolution network (DeConvNet) [^13], DeepLab [^14], multi-path refinement network (RefineNet) [^15], Pyramid Scene Parsing Network (PSPNet) [^16], and gated shape convolutional neural network (GSCNN) [^17]) for performing image segmentation tasks. Although these techniques are applicable to drivable area segmentation tasks, many researchers have proposed various methods to enhance the performance of ADSs. In Milioto et al. [^18], point clouds were used as inputs for RangeNet++ to perform semantic segmentation tasks; the experimental results revealed that RangeNet++ achieved higher accuracy compared with traditional machine learning methods. Aksoy et al. [^19] designed a deep encoder-decoder network (SalsaNet) to detect vehicles and roads in the environment, using three-dimensional (3D) point clouds as model input data to achieve satisfactory segmentation performance. The SalsaNet model was evaluated on the Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago (KITTI) public dataset to determine its ability to perform semantic segmentation tasks effectively. He et al. [^20] developed a ground segmentation framework for outdoor LiDAR point clouds (SectorGSnet) to perform ground segmentation tasks in outdoor settings. SectorGSnet employs depthwise separable convolutions to minimize the number of model parameters and uses a convolutional block attention module to enhance feature extraction capabilities, thereby achieving an excellent balance between performance and computational complexity. 
Therefore, deep learning has become an indispensable core technology for achieving efficient and accurate results in autonomous driving systems.

Deep learning has been criticized because of the numerous network parameters and the long computation time involved. However, optimization can be achieved by reducing the number of network parameters or by implementing the model on field programmable gate array (FPGA) hardware platforms for high-speed computation. Xu et al. [^21] proposed a model architecture called the Local Binary Convolutional Neural Network (LBCNet). In the LBCNet, convolution operations use only three values (−1, 0, and 1) as weights, and the computation time and number of parameters are thus substantially reduced. Howard et al. [^22] designed a lightweight network called MobileNet to deploy neural networks on embedded devices. MobileNet introduces depthwise convolution (DWConv) and pointwise convolution (PWConv) operations to reduce network weights and to increase the computation speed. Lin et al. [^23] developed an LBCNet to minimize the number of memory access points used by computational units and implemented it on an FPGA; compared with other state-of-the-art methods, the proposed approach reduced memory usage by 39.41%. Lin et al. [^24] proposed a high-performance FPGA accelerator for depthwise separable CNNs. This accelerator reduces computation and memory space, thereby achieving relatively low power consumption and real-time computation. Lyu et al. [^25] presented an FPGA-based network called ChipNet for drivable area segmentation; the experimental results revealed that LiDAR scan data could be processed within 17.59 ms. Baczmanski et al. [^26] proposed MultiTaskV3 for drivable area segmentation and validated this network for autonomous vehicles. Compared with a central processing unit (CPU) implementation, the FPGA-based platform consumed only 5 watts and achieved 97% mean Average Precision (mAP) for object detection and 90% mean Intersection over Union (mIoU) for image segmentation accuracy.
Thus, compared with CPUs and GPUs, FPGAs offer superior performance in power efficiency and real-time processing, making them particularly well-suited for ADSs.

Although previous studies have achieved strong software performance in drivable area segmentation tasks, they often overlook practical challenges in real-world applications, such as real-time processing and power consumption constraints. Building on prior studies [^27], [^28], the current study proposes DASNet to enable efficient and rapid completion of drivable area segmentation tasks in real-world forest environments and on the KITTI public dataset. In DASNet, first, point cloud data obtained from LiDAR are transformed into both Cartesian and spherical coordinate systems and then fused to serve as the input data for the DASNet network. Unlike traditional point cloud fusion methods, our approach enables the model to capture both local and global features from the Cartesian and spherical coordinate systems. DASNet then employs depthwise separable convolutions as the core of feature extraction, which reduces both the computational load and the number of parameters. The results of DASNet are mapped into Cartesian coordinates through post-processing algorithms for actual control of autonomous vehicles. Finally, DASNet is implemented on an FPGA platform to achieve the real-time performance and low power consumption required by ADSs. During the experimental evaluation phase, two datasets, the KITTI public dataset [^29] and the wooded dataset collected by our laboratory, were used for testing and assessment. To promote transparency and reproducibility, all implementation details have been made publicly available on GitHub at: https://github.com/Th0rnLin/DASNet. The specific contributions of this study are as follows:

1. Proposal of the DASNet network architecture for drivable area segmentation to provide environmental information about roads for ADSs.
2. Optimization of the network architecture using depthwise separable convolutions to enhance the efficiency and performance of DASNet during inference.
3. Fusion of point cloud data from Cartesian and spherical coordinate systems to retain critical spatial information while minimizing the number of sampled points.
4. Implementation of DASNet on an FPGA to maximize inference speed and meet practical environmental demands.
5. Verification of the robustness of the proposed DASNet in a challenging wooded environment.
6. Validation of the DASNet model by using the KITTI public dataset to demonstrate the model’s performance and inference speed.

## SECTION II. Proposed Driving Area Segmentation System

This section introduces the proposed drivable area segmentation system. Fig. 1 illustrates the architecture of the drivable area segmentation system. LiDAR is primarily used to perceive the surrounding environment and to transmit information to an NVIDIA® Jetson AGX. The AGX then preprocesses the received point cloud data and presents it in Cartesian and spherical coordinate systems. The processed point cloud data are then provided to DASNet, which is implemented on an FPGA, to perform the drivable area segmentation task. Notably, DASNet generates predictions in a spherical view; these predictions are then transmitted back to the AGX and converted into a frontal view for clearer and more convenient observation and verification. This section is divided into five subsections focusing on the following topics: (1) preprocessing algorithms for point cloud data in coordinate systems, (2) the DASNet network architecture, (3) model optimization analysis of the computational load and parameter count, (4) transformation methods for drivable area segmentation, and (5) FPGA hardware implementation.

![Figure 1](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin1-3591974-large.gif)

*FIGURE 1. Driving area segmentation system architecture.*

### A. Preprocessing Algorithms for Point Cloud Data in Coordinate Systems

In an ADS, one of the following three coordinate systems is typically utilized: Cartesian, cylindrical, and spherical coordinate systems. Additionally, four types of image views are employed: the camera view, bird’s-eye view, cylindrical view, and spherical view. The camera view and bird’s-eye view are common image views based on the Cartesian coordinate system, making them relatively intuitive for human interpretation. However, the point cloud data in the Cartesian coordinate system tends to be sparse, meaning that the drivable area segmentation system requires additional computational resources to process many zero-value data points. Compared with the Cartesian coordinate system, the spherical coordinate system can more accurately represent objects’ positions and orientations in a 3D space, thereby enhancing the coverage of point clouds in images and improving segmentation efficiency. The Cartesian coordinate system is more effective for representing local structures, while the spherical coordinate system captures global spatial context. Therefore, this study adopts both coordinate systems to overcome the limitations of using either system alone and to provide the model with richer and more informative spatial features. In the DASNet architecture, the point cloud data in the Cartesian and spherical coordinate systems serve as inputs to DASNet to provide rich environmental information. The output of DASNet is the drivable area represented in the spherical coordinate system.

The pseudo code of data preprocessing is shown in Algorithm 1. Information related to the surrounding environment, including the 3D coordinates of objects ($\gamma,\theta,\varphi$) and their reflectivity (r), is obtained using LiDAR, where $\gamma$ represents the radial distance, $\theta$ represents the polar angle, and $\varphi$ represents the azimuth angle. These data are then transformed from spherical coordinates to Cartesian coordinates to obtain the object coordinates (x, y, z). The azimuth angle range of [−45°, 45°) was selected as the region of interest, with the azimuth angle grouped at intervals of 0.5°; this range covers the drivable area of the self-driving car at a good resolution and yields 180 groups. The polar angle division is based on the 64 laser angles used by LiDAR, resulting in 64 groups. For each group, the maximum and minimum values within the group are sampled to represent the longest and shortest distances from the sensor. This study both projects LiDAR data onto the spherical coordinate system and uses Cartesian coordinates as additional feature channels to enrich the model’s local and global environment features. Through integration of data from both coordinate systems, unstructured point cloud data can be converted into a $180\times 64\times 14$ point cloud feature map, as shown in Fig. 2. To address the scanning gaps that occur when LiDAR encounters low-reflectivity objects, interpolation is introduced to enhance data completeness and reduce information loss.

![Figure 2](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin2ab-3591974-large.gif)

*FIGURE 2. Point cloud data preprocessing flow chart: (a) Input data processing steps of DASNet, (b) Output data processing steps of DASNet.*

### Algorithm 1 Point Cloud Data Preprocessing

1: **Input**: Point cloud data $X=\left ({{ \gamma,\theta,\varphi, r }}\right)$, ROI angle range $\left [{{ -R, R }}\right]$, number of laser beams *L*

2: **Initialize**: $Y=\emptyset$

3: $\left ({{ x,y,z }}\right)\leftarrow$ by Eqs. (1), (2), and (3)

4: $Y\leftarrow \left ({{ x,y,z,\gamma,\theta,\varphi,r }}\right)$

5: $\tilde {Y} \leftarrow$ Grouping *Y* by ROI angle range $\left [{{ -R, R }}\right]$ and number of laser beams *L*

6: $\acute {Y} \leftarrow$ Downsampling $\tilde {Y}$ by Eq. (4)

7: $\hat {Y} \leftarrow$ Interpolation of $\acute {Y}$ by Eq. (5)

8: **Output**: $\hat {Y}$

The relationship between Cartesian coordinates and spherical coordinates is as follows:

$$
\begin{align*} \gamma & =\sqrt {x^{2}+y^{2}+z^{2}}. \tag {1}\\ \theta & ={\tan }^{-1}\left ({{ \frac {\sqrt {x^{2}+y^{2}}}{z} }}\right). \tag {2}\\ \varphi & ={\tan }^{-1}\left ({{ \frac {y}{x} }}\right). \tag {3}\end{align*}
$$

where $\gamma$ represents the radial distance between the origin and the measured point; $\theta$ represents the polar angle between the z-axis and the line from the origin to the point; and $\varphi$ represents the azimuth angle between the x-axis and the projection of that line onto the x-y plane.
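As an illustration of Eqs. (1) to (3), the following minimal Python sketch converts between the two coordinate systems. The function names are hypothetical and not taken from the DASNet repository, and `math.atan2` is used in place of a plain arctangent as an assumption, so that the quadrant is resolved and a zero denominator does not raise an error.

```python
import math

def cartesian_to_spherical(x, y, z):
    """Return (gamma, theta, phi) following Eqs. (1)-(3); angles in radians."""
    gamma = math.sqrt(x * x + y * y + z * z)
    # atan2 resolves the quadrant and tolerates z == 0 or x == 0.
    theta = math.atan2(math.hypot(x, y), z)
    phi = math.atan2(y, x)
    return gamma, theta, phi

def spherical_to_cartesian(gamma, theta, phi):
    """Inverse mapping, as needed to obtain (x, y, z) from a LiDAR return."""
    x = gamma * math.sin(theta) * math.cos(phi)
    y = gamma * math.sin(theta) * math.sin(phi)
    z = gamma * math.cos(theta)
    return x, y, z

# Round trip on an arbitrary point: the radial distance is sqrt(9 + 16 + 144) = 13.
gamma, theta, phi = cartesian_to_spherical(3.0, 4.0, 12.0)
x, y, z = spherical_to_cartesian(gamma, theta, phi)
```

The round trip returns the original point up to floating-point error, which is a convenient sanity check before the grouping step.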

In each group of data, sampling is performed based on the maximum and minimum values. The formula is as follows:

$$
\begin{align*} \acute {Y}=\left \{{{\begin{array}{l} \max \left ({{ x }}\right)\!,\min \left ({{ x }}\right)\!,\max \left ({{ y }}\right)\!,\min \left ({{ y }}\right)\!, \\ \max \left ({{ z }}\right)\!,\min \left ({{ z }}\right)\!,\max \left ({{ \gamma }}\right)\!,\min \left ({{ \gamma }}\right)\!, \\ \max \left ({{ \theta }}\right)\!,\min \left ({{ \theta }}\right)\!,\max \left ({{ \varphi }}\right)\!,\min \left ({{ \varphi }}\right)\!, \\ \max \left ({{ r }}\right)\!,\min \left ({{ r }}\right) \\ \end{array}}}\right \}. \tag {4}\end{align*}
$$
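The grouping and max/min sampling of Eq. (4) could be sketched as follows. This is only an illustration under stated assumptions: each point is assumed to carry the index of the laser beam that produced it (0 to 63), empty cells are assumed to be stored as zeros, and the names `make_point` and `build_feature_map` are hypothetical.

```python
import math

AZ_MIN_DEG, AZ_MAX_DEG, AZ_STEP_DEG = -45.0, 45.0, 0.5
N_AZ = int((AZ_MAX_DEG - AZ_MIN_DEG) / AZ_STEP_DEG)  # 180 azimuth groups
N_BEAMS = 64                                          # one group per laser beam
N_FEATURES = 7                                        # x, y, z, gamma, theta, phi, r

def make_point(beam, x, y, z, r):
    """Attach spherical coordinates to a Cartesian point (beam index assumed known)."""
    gamma = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(math.hypot(x, y), z)
    phi = math.atan2(y, x)
    return beam, (x, y, z, gamma, theta, phi, r)

def build_feature_map(points):
    """Group points by (azimuth bin, beam) and keep the max and min of each feature."""
    groups = {}
    for beam, feat in points:
        phi_deg = math.degrees(feat[5])
        if not (AZ_MIN_DEG <= phi_deg < AZ_MAX_DEG):
            continue  # outside the region of interest
        az = int((phi_deg - AZ_MIN_DEG) // AZ_STEP_DEG)
        groups.setdefault((az, beam), []).append(feat)
    # Empty cells stay at zero (assumed convention for missing returns).
    fmap = [[[0.0] * (2 * N_FEATURES) for _ in range(N_BEAMS)] for _ in range(N_AZ)]
    for (az, beam), feats in groups.items():
        for c in range(N_FEATURES):
            values = [f[c] for f in feats]
            fmap[az][beam][2 * c] = max(values)      # channel order follows Eq. (4)
            fmap[az][beam][2 * c + 1] = min(values)
    return fmap

points = [make_point(10, 5.0, 0.0, -1.0, 0.30),   # azimuth 0 degrees
          make_point(10, 8.0, 0.05, -1.2, 0.10),  # same 0.5 degree bin
          make_point(10, 1.0, 5.0, 0.0, 0.90)]    # outside the ROI, dropped
fmap = build_feature_map(points)
cell = fmap[90][10]
```

In this toy input the two in-range points fall into the same cell, so `cell` holds their per-feature maxima and minima, and every other cell of the 180 x 64 x 14 map remains zero.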

During the point cloud generation process by LiDAR, data loss may occur due to factors such as scanning angle limitations, poor laser reflections, and interference from strong light. If the sampled feature map contains missing values, the interpolation operation fills each missing value from the values at the adjacent upper and lower laser angles of the LiDAR. The interpolation operation formula is as follows:

$$
\begin{equation*} \hat {Y}_{j}=\begin{cases} \displaystyle \frac {\acute {Y}_{j-1}+\acute {Y}_{j+1}}{2},& \acute {Y}_{j-1}\gt 0~\text {and}~\acute {Y}_{j+1}\gt 0 \\ \displaystyle \acute {Y}_{j},& \text {otherwise} \end{cases} \quad \text {for}~j=0, \ldots,L. \tag {5}\end{equation*}
$$
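A minimal sketch of the interpolation in Eq. (5), applied to one column of the feature map (one value per laser beam), is given below. Two choices here are assumptions rather than details stated in the text: only entries that are actually missing (stored as 0) are replaced, in line with the description of the operation as a gap-filling step, and the first and last beams are left unchanged because they have only one neighbor.

```python
def interpolate_column(column):
    """Fill a missing entry with the mean of its upper and lower neighbours."""
    out = list(column)
    for j in range(1, len(column) - 1):
        above, below = column[j - 1], column[j + 1]
        # Read from the original column so filled values do not cascade.
        if column[j] == 0 and above > 0 and below > 0:
            out[j] = (above + below) / 2
    return out

filled = interpolate_column([4.0, 0.0, 6.0, 0.0, 0.0, 9.0])
# filled == [4.0, 5.0, 6.0, 0.0, 0.0, 9.0]
```

The isolated gap at index 1 is filled, while the two adjacent gaps remain because neither has valid values on both sides.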

### B. DASNet Network Architecture

One of the key focuses of this study was optimizing the network architecture with limited hardware resources. The proposed DASNet comprises three distinct convolutional modules: EnConvBlock, DSConvBlock, and DeConvBlock, as illustrated in Fig. 3.

![Figure 3](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin3-3591974-large.gif)

*FIGURE 3. DASNet network architecture.*

The architecture of the proposed DASNet is detailed in Table 1. After preprocessing, the point cloud data is first passed through the EnConvBlock for dimensional expansion, followed by multiple DSConvBlocks for feature extraction, and finally processed by a DeConvBlock for dimensionality reduction. Notably, to enhance the feature extraction capability for DASNet on resource-constrained embedded devices, we increased the number of DSConvBlocks. These blocks are reused during FPGA computation to optimize resource efficiency. Moreover, since memory is a limited resource in embedded systems, all computations in DASNet use feature maps of size (180, 64). The hardware implementation details are presented in Section II-E.

![Table 1](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin.t1-3591974-large.gif)

*TABLE 1*

The three convolutional modules are described as follows.

#### 1) EnConvBlock

EnConvBlock uses a $1\times 1$ convolutional kernel for convolution operations. The purpose of EnConvBlock is to increase the number of input feature maps to provide richer feature information for subsequent processing. The calculation formula of EnConvBlock is as follows:

$$
\begin{equation*} F_{k, l,n}^{EnC}=\sum \nolimits _{m} {W_{m,n}^{EnC}\ast X_{k, l,m}}. \tag {6}\end{equation*}
$$

where $F^{EnC}$ represents the output feature map of EnConvBlock, $W^{EnC}$ represents the weight of EnConvBlock, *X* is the input feature map, *n* is the index of the output feature map, *k* and *l* are the indexes of the output feature map values, *m* is the index of the input feature map, and * is the convolution operator.
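Because the kernel is $1\times 1$, Eq. (6) reduces to a weighted sum over the input channels at each position. The following pure-Python sketch illustrates this on a tiny feature map; the function name and the toy sizes are illustrative assumptions, not the configuration listed in Table 1.

```python
def en_conv_block(x, w):
    """Eq. (6): x is indexed [k][l][m], w is indexed [m][n]; returns [k][l][n]."""
    n_out = len(w[0])
    return [[[sum(pixel[m] * w[m][n] for m in range(len(pixel)))
              for n in range(n_out)]
             for pixel in row]
            for row in x]

# A 2 x 2 map with 2 input channels expanded to 3 output channels.
x = [[[1.0, 2.0], [0.0, 1.0]],
     [[3.0, 0.0], [2.0, 2.0]]]
w = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
f_enc = en_conv_block(x, w)
# f_enc[0][0] == [1.0, 2.0, 3.0]
```

The spatial size is unchanged and only the channel count grows, which matches the role of EnConvBlock as a dimensional expansion step.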

#### 2) DSConvBlock

DSConvBlock is a feature extraction convolutional module optimized using the principle of depthwise separable convolution. Depthwise separable convolution is a type of convolution operation frequently used in embedded systems. This operation breaks down traditional convolution into two separate steps: DWConv and PWConv. This design substantially reduces both the computational load and the number of parameters required during processing. The architecture of DSConvBlock, presented in Fig. 4, involves four operations: DWConv, PWConv, rectified linear unit (ReLU) activation, and batch normalization.

![Figure 4](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin4-3591974-large.gif)

*FIGURE 4. DSConvBlock architecture diagram.*

The calculation formula of the output feature map of the depthwise convolution ($F^{DWC}$) is as follows:

$$
\begin{equation*} F_{k, l,m}^{DWC}=\sum \nolimits _{i,j} {W_{i,j,m}^{DWC}\ast F^{EnC}_{k+i, l+j,m}}. \tag {7}\end{equation*}
$$

where $W^{DWC}$ represents the weight of the depthwise convolution; $F^{EnC}$ is the output feature map from EnConvBlock; *k* and *l* are the indexes of the output feature map values; *m* is the index of the input feature map; *i* and *j* are the kernel offset indexes. The calculation formula of the output feature map of the pointwise convolution ($F^{PWC}$) is as follows:

$$
\begin{equation*} F_{k, l,n}^{PWC}=\sum \nolimits _{m} {W_{m,n}^{PWC}\ast F_{k,l,m}^{DWC}}. \tag {8}\end{equation*}
$$

where $W^{PWC}$ represents the weight of the pointwise convolution; $F^{DWC}$ is the output feature map from the depthwise convolution; *n* is the index of the output feature map.

The calculation formula of the output feature map of ReLU (*R*) is as follows:

$$
\begin{align*} R=\begin{cases} \displaystyle 0,& for~F^{PWC}\lt 0 \\ \displaystyle F^{PWC},& for~F^{PWC}\ge 0. \end{cases} \tag {9}\end{align*}
$$

The calculation formula of the output feature map of batch normalization (*B*) is as follows:

$$
\begin{equation*} B=\frac {\alpha \left ({{ R-\mu }}\right)}{\sqrt {\sigma ^{2}+\epsilon }}+\beta. \tag {10}\end{equation*}
$$

where $\mu$ is the mean value of *R*; $\sigma ^{2}$ is the variance of *R*; $\alpha$ and $\beta$ are trainable weights.
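The four operations of Eqs. (7) to (10) can be chained as in the sketch below. It is a plain-Python illustration under stated assumptions: zero padding is assumed so that the spatial size is preserved, the batch normalization statistics are computed over the single input map (a deployed network would normally use stored statistics), and all function names are hypothetical.

```python
import math

def depthwise_conv(x, w):
    """Eq. (7): one K x K filter per channel. x: [H][W][M], w: [K][K][M]."""
    rows, cols, chans = len(x), len(x[0]), len(x[0][0])
    k = len(w)
    pad = k // 2  # assumed zero padding so the spatial size is preserved
    out = [[[0.0] * chans for _ in range(cols)] for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < rows and 0 <= cc < cols:
                        for m in range(chans):
                            out[r][c][m] += w[i][j][m] * x[rr][cc][m]
    return out

def pointwise_conv(x, w):
    """Eq. (8): 1 x 1 convolution that mixes channels. w: [M][N]."""
    n_out = len(w[0])
    return [[[sum(p[m] * w[m][n] for m in range(len(p))) for n in range(n_out)]
             for p in row] for row in x]

def relu(x):
    """Eq. (9): negative values are set to zero."""
    return [[[max(0.0, v) for v in p] for p in row] for row in x]

def batch_norm(x, alpha, beta, eps=1e-5):
    """Eq. (10), with per-channel mean and variance taken over this map."""
    chans = len(x[0][0])
    flat = [p for row in x for p in row]
    out = [[[0.0] * chans for _ in row] for row in x]
    for n in range(chans):
        vals = [p[n] for p in flat]
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals)
        scale = alpha[n] / math.sqrt(var + eps)
        for r, row in enumerate(x):
            for c, p in enumerate(row):
                out[r][c][n] = scale * (p[n] - mu) + beta[n]
    return out

def ds_conv_block(x, w_dw, w_pw, alpha, beta):
    """DWConv -> PWConv -> ReLU -> batch normalization, in the order of Eqs. (7)-(10)."""
    return batch_norm(relu(pointwise_conv(depthwise_conv(x, w_dw), w_pw)), alpha, beta)

# Toy input: a 3 x 3 map of ones with one channel, expanded to two channels.
x = [[[1.0] for _ in range(3)] for _ in range(3)]
w_dw = [[[1.0] for _ in range(3)] for _ in range(3)]
w_pw = [[1.0, -1.0]]
dw = depthwise_conv(x, w_dw)
y = ds_conv_block(x, w_dw, w_pw, alpha=[1.0, 1.0], beta=[0.0, 0.0])
```

With all-ones inputs and weights, the depthwise output counts the valid neighbors of each cell (9 at the center, 4 at the corners), and the second output channel is zeroed by ReLU because its pointwise weight is negative.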

#### 3) DeConvBlock

DeConvBlock fuses the feature maps generated by DSConvBlock. The output of DASNet is $180\times 64$. The calculation formula of the output feature of DeConvBlock ($F^{DeC}$) is as follows:

$$
\begin{equation*} F_{k, l}^{DeC}=\sum \nolimits _{m} {W_{m}^{DeC}\ast B_{k, l,m}}. \tag {11}\end{equation*}
$$

where $W^{DeC}$ represents the weight of DeConvBlock.

### C. Model Optimization Analysis

To demonstrate that the proposed DSConvBlock has a relatively low computational cost and relatively few required parameters in convolution operations, this subsection presents a comparison of the computational load and parameter count of traditional convolution operations with those of the DSConvBlock. The calculations for the computational load and parameter count of traditional convolution operations are expressed as follows:

$$
\begin{align*} T^{O}& =W\times H\times K\times K\times M\times N. \tag {12}\\ T^{W}& =K\times K\times M\times N. \tag {13}\end{align*}
$$

where $T^{O}$ and $T^{W}$ represent the computational load and parameter count of traditional convolution, respectively; *W* and *H* denote the width and height of the input feature map, respectively; *M* and *N* represent the numbers of input and output feature maps (channels), respectively; and *K* denotes the kernel size.

The calculations for the computational load and parameter count of DSConvBlock are expressed as follows:

$$
\begin{align*} D^{O}& =W\times H\times K\times K\times M+W\times H\times M\times N. \tag {14}\\ D^{W}& =K\times K\times M+M\times N. \tag {15}\end{align*}
$$

where $D^{O}$ and $D^{W}$ represent the computational load and parameter count of DSConvBlock, respectively; *W* and *H* denote the width and height of the input feature map, respectively; *M* and *N* represent the numbers of input and output feature maps (channels), respectively; and *K* denotes the kernel size.

The ratios of the computational load and parameter count of DSConvBlock to those of traditional convolution are calculated as follows:

$$
\begin{align*} P^{O}& =\frac {D^{O}}{T^{O}}=\frac {1}{N}+\frac {1}{K^{2}}. \tag {16}\\ P^{W}& =\frac {D^{W}}{T^{W}}=\frac {1}{N}+\frac {1}{K^{2}}. \tag {17}\end{align*}
$$

where $P^{O}$ and $P^{W}$ represent the ratios of the computational load and parameter count of DSConvBlock to those of traditional convolution, respectively. Both ratios are well below 1 for typical values of *N* and *K* (for a $3\times 3$ kernel they approach $1/9$ as *N* grows), so DSConvBlock requires less computation and fewer parameters than traditional convolution.
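A short numerical check of Eqs. (12) to (15) is sketched below. The channel counts ($M=N=32$) and the $3\times 3$ kernel are illustrative assumptions, not values taken from Table 1; only the $180\times 64$ feature map size comes from the text. The check shows that the quantity $1/N+1/K^{2}$ equals the DSConvBlock cost divided by the traditional convolution cost.

```python
def conv_cost(width, height, k, m, n):
    """Eqs. (12) and (13): operations and parameters of a traditional convolution."""
    ops = width * height * k * k * m * n
    params = k * k * m * n
    return ops, params

def ds_conv_cost(width, height, k, m, n):
    """Eqs. (14) and (15): depthwise cost plus pointwise cost."""
    ops = width * height * k * k * m + width * height * m * n
    params = k * k * m + m * n
    return ops, params

W, H, K, M, N = 180, 64, 3, 32, 32  # M, N, and K are assumed for illustration
t_ops, t_params = conv_cost(W, H, K, M, N)
d_ops, d_params = ds_conv_cost(W, H, K, M, N)
closed_form = 1 / N + 1 / K**2

print(d_ops / t_ops, d_params / t_params, closed_form)
```

For these assumed sizes all three printed values are about 0.142, which corresponds to roughly a sevenfold reduction in both operations and parameters.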

### D. Conversion of Authentic Scene Images of Drivable Areas

The output of the DASNet network is a drivable area map represented in the spherical coordinate system. During the postprocessing stage, the output is converted into a Cartesian coordinate map. The image’s perspective is then adjusted from the spherical view to the camera view for more effective integration with the authentic scene imagery. The detailed workflow of the postprocessing algorithm is illustrated in Fig. 5 and involves four steps: binarization, coordinate system conversion, dilation, and mapping to the authentic scene.

![Figure 5](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin5-3591974-large.gif)

*FIGURE 5. Flowchart for integrating drivable areas into authentic scene images.*

The pseudo code of data postprocessing is shown in Algorithm 2. The drivable area output by DASNet is presented in a spherical coordinate system and is binarized according to the threshold $\tau$. The calculation formula for numerical binarization is as follows:

$$
\begin{align*} P=\begin{cases} \displaystyle X,& X\gt \tau \\ \displaystyle 0,& otherwise. \end{cases} \tag {18}\end{align*}
$$

### Algorithm 2 Drivable Area Postprocessing

1: **Input**: Drivable area segmentation based on spherical coordinate system *X*, ROI angle range $\left [{{ -R, R }}\right]$, number of laser beams *L*, threshold $\tau$, convert matrix $K\in R^{3\times 4}$, dilation rate *D*

2: **Initialize**: $P=\emptyset$

3: $P \leftarrow$ Binarization based on threshold $\tau$ by Eq. (18)

4: $\tilde {P}\leftarrow KP$

5: $\acute {P}\leftarrow$ Dilate $\tilde {P}$ by dilation rate *D*

6: **Output**: $\acute {P}$

For easier observation and verification, the results are presented in a Cartesian coordinate system using a transformation matrix (*K*). Finally, the pseudo code in Algorithm 3 shows how the point cloud result is dilated using the dilation rate (*D*) to obtain a complete drivable area.

### Algorithm 3 Dilate Algorithm

**Input**: Drivable area segmentation based on Cartesian coordinate system *X*, ROI angle range $\left [{{ -R, R }}\right]$, number of laser beams *L*, dilation rate *D*

**Initialize**: $\acute {P}=\emptyset$

**for** $i\leftarrow -R$ **to** *R* **do**

**for** $j\leftarrow 0$ **to** *L* **do**

**if** $X_{i,j}=1$ **then**

**for** $m\leftarrow -D$ **to** *D* **do**

**for** $n\leftarrow -D$ **to** *D* **do**

$\acute {P}_{i+m,j+n}\leftarrow 1$

**end for**

**end for**

**else**

$\acute {P}_{i,j}\leftarrow 0$

**end if**

**end for**

**end for**

**Output**: $\acute {P}$
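The dilation loop of Algorithm 3 can be sketched in Python as shown below. This is an illustrative version under three assumptions that differ from the pseudo code: the grid is a zero-indexed NumPy array rather than an index range $[-R, R]$, windows are clipped at the grid border, and the output is initialized to zero instead of using the **else** branch, so that a cell already set by a neighboring window is never reset.

```python
import numpy as np

def dilate(x: np.ndarray, d: int) -> np.ndarray:
    """Set every cell within a (2d+1) x (2d+1) window of a drivable cell to 1."""
    rows, cols = x.shape
    out = np.zeros_like(x)                      # all cells start as non-drivable
    for i, j in zip(*np.nonzero(x == 1)):       # visit each drivable cell
        r0, r1 = max(i - d, 0), min(i + d + 1, rows)
        c0, c1 = max(j - d, 0), min(j + d + 1, cols)
        out[r0:r1, c0:c1] = 1                   # fill the clipped window
    return out

grid = np.zeros((5, 5), dtype=np.uint8)
grid[2, 2] = 1
dilated = dilate(grid, d=1)                     # one point grows to a 3 x 3 patch
```

With `d=1`, a single drivable cell in the interior expands to nine cells, while the same cell in a corner expands to four because of the border clipping.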

### E. FPGA Hardware Implementation

The proposed DASNet is implemented on the FPGA platform to achieve real-time performance and low power consumption. The hardware system architecture is illustrated in Fig. 6. This architecture is divided into two main parts: a processing system (PS) and programmable logic (PL). The PS is responsible for data processing and system control functions, and the PL serves as the hardware accelerator that implements DASNet. The Advanced eXtensible Interface (AXI) protocol handles communication between the PS and PL. This protocol requires a handshake between the PS and PL before data transmission to ensure accurate data transfer. The input data to the neural network and the output results are read and written using direct memory access (DMA) and double data rate (DDR) synchronous dynamic random access memory.

![Figure 7](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin6-3591974-large.gif)

*FIGURE 6. Diagram of DASNet hardware system architecture.*

Fig. 7 illustrates the architecture of the DASNet hardware accelerator, which consists of the input feature map buffer, the ping-pong buffer, EnConvBlock, DSConvBlock, and DeConvBlock. During the inference process of DASNet, the control flow is based on a finite-state machine. Input data are stored in external memory and then sequentially transferred to the input feature map buffer to await processing by EnConvBlock. The ping-pong buffer is used to manage access to feature maps during computation. The results obtained by DeConvBlock are sent back to the external memory for storage, as shown in Fig. 8. A detailed description of each computational module is provided as follows.

![Figure 8](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin7-3591974-large.gif)

*FIGURE 7. DASNet hardware accelerator.*

![Figure 9](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin8-3591974-large.gif)

*FIGURE 8. DASNet data flow.*

#### 1) EnConvBlock Hardware Module

Fig. 9 presents the hardware module of EnConvBlock. The processing element (PE) of EnConvBlock consists of a $1\times 1$ convolution array module, a ReLU module, and a batch normalization module. EnConvBlock retrieves the input feature maps from the input feature map buffer. The PE performs a $1\times 1$ convolution operation on the weights and input feature maps, thereby increasing the number of feature maps. The resulting output is then transferred to the ping-pong buffer for storage.

![Figure 10](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin9-3591974-large.gif)

*FIGURE 9. EnConvBlock hardware module.*

#### 2) DSConvBlock Hardware Module

Fig. 10 illustrates the hardware module of DSConvBlock. The PE of DSConvBlock consists of a $7\times 7$ depthwise convolution (DWConv) array module, a $1\times 1$ pointwise convolution (PWConv) array module, a ReLU module, and a batch normalization module. DSConvBlock reads the output data of EnConvBlock from the ping-pong buffer, performs the convolution operations, and then transfers the results back to the ping-pong buffer for storage.

![Figure 11](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin10-3591974-large.gif)

*FIGURE 10. DSConvBlock hardware module.*

#### 3) DeConvBlock Hardware Module

Fig. 11 presents the hardware module of DeConvBlock. The PE of DeConvBlock consists only of a $1\times 1$ convolution array module. DeConvBlock reads the output data of DSConvBlock from the ping-pong buffer and then transfers the processed results back to the external memory to serve as the output of DASNet.

![Figure 12](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin11-3591974-large.gif)

*FIGURE 11. DeConvBlock hardware module.*

#### 4) Line Buffer

The line buffer is used in the hardware architecture of DSConvBlock, and its hardware architecture is shown in Fig. 12. Without a line buffer, the $7\times 7$ convolution operation would require 49 buffers for data access in DSConvBlock, and the same data would be read and processed repeatedly, wasting a large amount of access resources. The line buffer reads values from $X_{in}$ sequentially and exposes 7 different output interfaces that hold data ready for the convolution operation of the PE. The output interfaces are $X_{1}$, $X_{2}$, $X_{3}$, $X_{4}$, $X_{5}$ and $X_{6}$.

![Figure 13](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin12-3591974-large.gif)

*FIGURE 12. Hardware architecture of line buffer.*
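To make the idea behind the line buffer concrete, the following behavioral model in Python streams pixels in row-major order and returns, for each new sample, the seven vertically aligned values that a $7\times 7$ window needs from one column. It is a software sketch only and not the Verilog design: the class name `LineBuffer`, the method `push`, and the tap ordering (oldest row first) are assumptions made for illustration.

```python
from collections import deque

class LineBuffer:
    """Behavioral model of a line buffer feeding one column of a k x k window."""

    def __init__(self, width: int, taps: int = 7):
        self.width = width
        self.taps = taps
        depth = (taps - 1) * width + 1      # (taps - 1) full rows plus one sample
        self.fifo = deque([0] * depth, maxlen=depth)

    def push(self, x_in: int) -> list:
        """Shift in one sample and return the taps, oldest row first."""
        self.fifo.append(x_in)              # the oldest sample drops out
        return [self.fifo[k * self.width] for k in range(self.taps)]

# Stream a 7-row, 4-column image whose pixel value encodes (row, column).
width = 4
lb = LineBuffer(width)
for r in range(7):
    for c in range(width):
        column = lb.push(r * 10 + c)
# After the last pixel, `column` holds column 3 of rows 0 to 6.
```

Each input sample is read from the stream once and reused across seven rows, which is the saving the text describes relative to fetching all 49 window values separately.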

#### 5) Ping-Pong Buffer

The architecture diagram of the ping-pong buffer is shown in Fig. 13. When the write signal (*Wr*) is high, the ping buffer will be set to the write state and wait for the $D_{in}$ data to be written. At the same time, the pong buffer is set to the read state and waits for $D_{out}$ to read the data. If the write signal (*Wr*) is in a low state, the operating states of the ping buffer and the pong buffer are swapped.

![Figure 14](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin13-3591974-large.gif)

*FIGURE 13. Ping-pong buffer hardware architecture.*
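The role swapping described above can be illustrated with a small Python model. It is a sketch under stated assumptions: the class `PingPongBuffer`, the `depth` parameter, and the use of a single shared address for the read and write ports are hypothetical simplifications, and only the *Wr*-controlled swap is taken from the text.

```python
class PingPongBuffer:
    """Two banks: one accepts writes while the other serves reads."""

    def __init__(self, depth: int):
        self.ping = [0] * depth
        self.pong = [0] * depth

    def access(self, wr: bool, addr: int, d_in: int) -> int:
        """Write d_in to the bank selected by wr and read from the other bank."""
        if wr:                               # Wr high: write ping, read pong
            write_bank, read_bank = self.ping, self.pong
        else:                                # Wr low: roles are swapped
            write_bank, read_bank = self.pong, self.ping
        d_out = read_bank[addr]
        write_bank[addr] = d_in
        return d_out

buf = PingPongBuffer(depth=4)
for a in range(4):
    buf.access(True, a, 10 + a)              # fill ping while pong is read
readback = [buf.access(False, a, 20 + a) for a in range(4)]  # read ping, fill pong
```

Because reads and writes always target different banks, one layer can consume a feature map while the next feature map is being stored.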

## SECTION III. Experimental Results

To evaluate the effectiveness of the proposed drivable area segmentation system, the KITTI public dataset [^29] and the wooded dataset collected by our laboratory were used to validate the segmentation results. The experimental results are presented in the following subsections.

### A. Experimental Equipment and Parameter Settings

DASNet was trained on a computer equipped with an Intel Core i7-11700 CPU, 16 GB of RAM, and an NVIDIA GeForce RTX 2060 graphics processing unit (GPU) with 6 GB of memory. Additionally, to enhance the inference speed of the neural network and to reduce power consumption, DASNet was ported to a ZCU104 FPGA. The DASNet network was implemented using the TensorFlow and Keras frameworks for model training and validation, respectively. The FPGA design was written in the Verilog hardware description language (HDL) and developed in Xilinx’s Vivado design suite. Finally, Vitis was used to facilitate data communication between the peripherals and the FPGA chip.

The hyperparameter settings used during DASNet training are listed in Table 2. The network was trained for 200 epochs with a batch size of 4. The optimizer was Adaptive Moment Estimation (Adam), and the mean squared error was used as the loss function.

![Figure 15](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin.t2-3591974-large.gif)

*TABLE 2*

### B. Evaluation Metrics

In road segmentation tasks, pixel-level classification is a critical objective, where the model determines whether each pixel belongs to a road. To evaluate the performance of DASNet, four metrics (accuracy, recall, precision, and F1-score) were employed. Accuracy measures the overall proportion of correct predictions. A high precision indicates that the model is cautious in labeling pixels as road, minimizing the misclassification of non-road pixels. A high recall means the model successfully identifies most actual road pixels, though it may include some non-road pixels. The F1-score, which is the harmonic mean of precision and recall, provides a balanced evaluation of these two metrics. These metrics were calculated as follows:

$$
\begin{align*} \text{Accuracy}& =\frac {TP+TN}{TP+TN+FP+FN}. \tag {19}\\ \text{Recall}& =\frac {TP}{TP+FN}. \tag {20}\\ \text{Precision}& =\frac {TP}{TP+FP}. \tag {21}\\ \text{F1-score}& =\frac {2\times \text{Recall}\times \text{Precision}}{\text{Recall}+\text{Precision}}. \tag {22}\end{align*}
$$

where *TP* stands for “true positive,” representing the number of pixels correctly identified by the model as road pixels; *TN* stands for “true negative,” representing the number of pixels correctly identified as nonroad pixels; *FP* stands for “false positive,” indicating instances where the model incorrectly identified nonroad areas as road areas; and *FN* stands for “false negative,” indicating instances where the model incorrectly identified road areas as nonroad areas.
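As an illustrative sketch, the four metrics can be computed from a predicted mask and a ground-truth mask as follows, using the standard definitions in which precision is $TP/(TP+FP)$ and recall is $TP/(TP+FN)$. The function name and the toy masks are assumptions. The code also assumes that each mask contains at least one road pixel, so no division by zero is guarded against.

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Pixel-wise accuracy, precision, recall, and F1 for binary road masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = int(np.sum(pred & truth))      # road predicted as road
    tn = int(np.sum(~pred & ~truth))    # non-road predicted as non-road
    fp = int(np.sum(pred & ~truth))     # non-road predicted as road
    fn = int(np.sum(~pred & truth))     # road predicted as non-road
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Toy example with 8 pixels: TP = 2, FP = 2, FN = 1, TN = 3.
pred = np.array([1, 1, 1, 1, 0, 0, 0, 0])
truth = np.array([1, 1, 0, 0, 0, 1, 0, 0])
m = segmentation_metrics(pred, truth)
```

In this toy case precision (0.5) is lower than recall (2/3) because the prediction labels two non-road pixels as road but misses only one road pixel.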

### C. The Wooded Environment Dataset

The wooded environment dataset was collected in daylight. However, because of tree coverage, the overall image brightness is relatively low. The forest environment contains trees and several large rocks, which increase the difficulty of obstacle avoidance for autonomous vehicles. In addition, the presence of slopes and uneven terrain poses further challenges for autonomous navigation. This study constructed a wooded environment dataset based on previous studies [^27], [^28], as shown in Fig. 14. The dataset includes 135 samples of point cloud and image data captured by LiDAR and cameras. Before the model is trained, the point cloud is preprocessed according to Algorithm 1 to form the input of DASNet, and supervised learning is performed using the drivable areas labeled in the images to improve the training results of DASNet.

![Figure 16](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin14-3591974-large.gif)

*FIGURE 14. Data collection of autonomous vehicles in a wooded environment.*

During DASNet training, we split the collected wooded dataset into training and test data at a ratio of 8:2 and trained the model with the parameters given in Subsection III-A. The experimental results show that the accuracy and F1-score of the proposed DASNet model are 0.9877 and 0.9864, respectively. In this experiment, the proposed DASNet was also compared with an RGB-based DASNet (RGB_DASNet), the Depth-fused Trilateral Network (DTN) model [^30], and the RoadSeg model [^31]. The comparison results are shown in Table 3. The RGB_DASNet model takes RGB images with a resolution of $360\times 512$ as input and achieves an accuracy of 0.6663 and an F1-score of 0.7656 in the road segmentation task. The DTN model [^30] is divided into a spatial path and a context path, which extract features from images and LiDAR data, respectively, and then fuse and upsample these features; its accuracy and F1-score are 0.9650 and 0.9615, respectively. The RoadSeg model [^31] uses multiple convolutional layers to extract features from the data and a residual-like architecture to preserve the texture information of the original features; its accuracy and F1-score are 0.9609 and 0.9572, respectively. The proposed DASNet therefore achieves higher accuracy and a higher F1-score than RGB_DASNet, DTN [^30], and RoadSeg [^31].

![Figure 17](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin.t3-3591974-large.gif)

*TABLE 3*

The prediction results of DASNet on the drivable area of the wooded dataset are shown in Fig. 15. Consistent with its evaluation metrics, DASNet produces good results in scenes (a) and (b) of Fig. 15. Although DTN and RoadSeg can distinguish most of the drivable areas in scenes (a) and (b), their segmentation results still contain omissions in the area around the trees. However, because the LiDAR point cloud is sparse, DASNet misjudges the drivable area between trees, as shown in scene (c) of Fig. 15. In scene (d) of Fig. 15, point cloud data are lost because the object is too far away, so the drivable area predicted by DASNet falls short of expectations. Nevertheless, compared with DTN and RoadSeg, which classify non-drivable areas as drivable, the proposed DASNet is more suitable for wooded environments in terms of driving safety. RGB_DASNet produces smooth boundaries between roads and obstacles across the four tested scenes. However, because the vision-based model is susceptible to variations in lighting conditions, it can misclassify obstacles as drivable areas. This is problematic in narrow and complex forest environments, where such misclassifications could lead to collisions for autonomous vehicles. Therefore, the LiDAR-based DASNet proposed in this study is better suited to forest environments with limited lighting, as it effectively reduces the risk of misclassification caused by variations in illumination.

![Figure 18](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin15-3591974-large.gif)

*FIGURE 15. Comparison prediction results of various models for the drivable area in the wooded dataset.*

### D. KITTI Public Dataset

The KITTI public dataset [^29] is widely used in the fields of autonomous driving and computer vision research and is recognized as a standard dataset rich in diverse environmental sensor data. This dataset contains data from LiDAR, cameras, the global positioning system, and inertial measurement units. It encompasses three types of urban road scenes—namely urban unmarked, urban marked, and urban multiple marked scenes—comprising a total of 289 training samples and 290 test samples. In the present experiment, the KITTI public dataset was used to validate the performance of the proposed DASNet across multiple platforms.

The proposed DASNet architecture was also compared with other drivable area segmentation methods implemented on multiple computational platforms [^25], [^32], [^33], [^34], [^35], [^36], [^37], [^38], [^39], [^40], [^41]. As presented in Table 4, among the GPU-based methods, the approach proposed by Caltagirone et al. [^38] demonstrated the best performance, with an F1-score of 0.9603; however, its inference time was 0.15 seconds, which presents a challenge for practical application in ADSs that require low power consumption and real-time processing. Lyu et al. implemented their method on an FPGA for hardware acceleration, which reduced the inference time to 0.017 seconds; even so, their method’s performance and speed still lagged behind those of the proposed DASNet because the model architecture was not optimized. The proposed DASNet was implemented on the Xilinx UltraScale ZCU104 FPGA development board. It achieved an F1-score, precision, and recall of 0.9449, 0.9428, and 0.9520, respectively. In addition, the inference time was 0.009 seconds, which was considerably shorter than those of the other methods. The drivable area segmentation results corresponding to the KITTI public dataset are presented in Fig. 16, which also indicates that DASNet on the ZCU104 effectively segmented drivable areas.

![Figure 19](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin.t4-3591974-large.gif)

*TABLE 4*

![Figure 20](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin16-3591974-large.gif)

*FIGURE 16. Experimental results of the DASNet drivable area segmentation operation using the ZCU104.*

### E. Drivable Area Segmentation Analysis

To comprehensively validate the effectiveness and contributions of each design choice in our proposed method, we conducted a series of ablation studies. These experiments were designed to investigate how different components and techniques influence the overall performance of the model, providing deeper insight into the role of each element in the drivable area segmentation task. Specifically, we systematically analyzed network optimization architectures, point cloud preprocessing methods, the impact of coordinate system selection, and loss function choices. Through these ablation studies, we aimed to verify whether each design decision meaningfully improves the model’s accuracy and practical applicability.

Different optimization methods affect the model’s performance in drivable area segmentation in different manners, with the number of parameters directly influencing the model’s inference speed. The proposed DASNet was experimentally compared with a traditional CNN and LBCNet, as presented in Table 5. The results indicated that all three methods performed well across the four evaluation metrics. In terms of parameter count, DASNet contained only 9,409 parameters, which was considerably lower than both the traditional CNN’s count of 101,313 and LBCNet’s count of 820,033. These results demonstrate a notable advantage of DASNet in terms of parameter efficiency. Furthermore, regarding segmentation performance, DASNet achieved an F1-score of 94.88%, surpassing the performance of both the traditional CNN and LBCNet.

![Figure 21](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin.t5-3591974-large.gif)

*TABLE 5*
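The parameter saving of depthwise separable convolution, on which DASNet is based, follows from a simple count. The sketch below compares the weight count of a standard convolution with that of a depthwise convolution followed by a pointwise convolution. The $7\times 7$ kernel size matches DSConvBlock, whereas the channel counts are hypothetical and are not the DASNet configuration. Biases and batch normalization parameters are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution."""
    return k * k * c_in * c_out

def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in        # one k x k filter per input channel
    pointwise = c_in * c_out        # 1 x 1 mixing across channels
    return depthwise + pointwise

k, c_in, c_out = 7, 16, 16          # channel counts are assumed for illustration
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
reduction = std / sep
```

For these assumed sizes the separable form needs 1,040 weights instead of 12,544, a reduction of roughly twelvefold, and the gap widens as the kernel size or the number of output channels grows.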

When LiDAR scans the surrounding environment, transparent or reflective surfaces can cause loss of information. In addition, if an object is too far from the sensor or lies at an unfavorable angle, the LiDAR may fail to capture it correctly, which degrades the perception ability of the autonomous vehicle. The point cloud preprocessing algorithm in this study therefore aims to compensate for the information loss that may occur during LiDAR scanning, and its performance evaluation results are shown in Table 6. Experimental results indicate that the interpolation method exhibits superior performance in drivable area segmentation, with an F1-score of up to 98.33%. The accuracy of the proposed interpolation method is 0.5% and 0.8% higher than that of the raw (unprocessed) method and the GaussianBlur method, respectively.

![Figure 22](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin.t6-3591974-large.gif)

*TABLE 6*

Different coordinate systems give the model inputs different physical meanings. Cartesian coordinates describe where point cloud returns lie on objects in the way humans perceive the scene, whereas spherical coordinates are closer to the working principle of LiDAR. Therefore, this study conducts ablation experiments on different coordinate systems, as shown in Table 7. The accuracy, precision, recall, and F1-score of the proposed fusion of coordinate systems are 0.9833, 0.9501, 0.9476, and 0.9488, respectively. Experimental results show that the fusion of coordinate systems achieves the best performance compared with a single coordinate system. Among single coordinate systems, Cartesian coordinates perform slightly better than spherical coordinates.

![Figure 23](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin.t7-3591974-large.gif)

*TABLE 7*
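For readers unfamiliar with the two representations compared above, the following sketch converts Cartesian points into range, azimuth, and elevation. It uses one common convention (azimuth measured in the $x$-$y$ plane, elevation measured from that plane) and is an assumption for illustration. The exact definition used in the DASNet preprocessing may differ.

```python
import numpy as np

def cartesian_to_spherical(xyz: np.ndarray) -> np.ndarray:
    """Convert N x 3 points (x, y, z) to N x 3 (range, azimuth, elevation) in radians."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)
    # Guard against division by zero for a point at the origin.
    ratio = np.divide(z, r, out=np.zeros_like(r), where=r > 0)
    elevation = np.arcsin(ratio)
    return np.stack([r, azimuth, elevation], axis=1)

pts = np.array([[1.0, 1.0, 0.0],    # on the ground plane, 45 degrees to the left
                [0.0, 0.0, 2.0]])   # directly above the sensor
sph = cartesian_to_spherical(pts)
```

In the spherical form each point is indexed by the beam direction that produced it, which is why that representation maps naturally onto the laser beam and angle grid of a rotating LiDAR.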

In drivable area segmentation tasks, the choice of loss function plays a critical role in spatial boundary handling and class discrimination. In this study, we evaluated three different loss functions (mean square error, dice loss, and cross-entropy loss) through ablation experiments, as shown in Table 8. Although cross-entropy loss and dice loss are commonly used in classification and segmentation tasks, respectively, the results of our ablation experiments show that mean square error outperforms cross-entropy loss and dice loss in terms of F1-score by 1.38% and 0.45%, respectively.

![Figure 24](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin.t8-3591974-large.gif)

*TABLE 8*
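The three loss functions compared in this ablation can be written compactly in NumPy. The sketch below is illustrative only: it assumes the binary form of cross-entropy and the soft (probability-based) form of dice loss, and the smoothing constant `eps` and the sample values are hypothetical. The training code of DASNet itself is not shown in this paper.

```python
import numpy as np

def mse_loss(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error between labels and predicted probabilities."""
    return float(np.mean((y_true - y_pred) ** 2))

def dice_loss(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-7) -> float:
    """Soft dice loss: 1 - 2|A.B| / (|A| + |B|), with eps for numerical stability."""
    inter = np.sum(y_true * y_pred)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(y_true) + np.sum(y_pred) + eps))

def bce_loss(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy with predictions clipped away from 0 and 1."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

y_true = np.array([1.0, 1.0, 0.0, 0.0])
y_pred = np.array([0.9, 0.6, 0.2, 0.1])
losses = {
    "mse": mse_loss(y_true, y_pred),
    "dice": dice_loss(y_true, y_pred),
    "bce": bce_loss(y_true, y_pred),
}
```

The three losses weight errors differently: mean square error penalizes each pixel by its squared distance from the label, dice loss scores the overlap of the whole mask, and cross-entropy grows sharply for confident mistakes.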

### F. FPGA Hardware Implementation Results

In this study, the ZCU104 was used as the acceleration platform, and the neural network was implemented in Verilog HDL. A clock constraint of approximately 200 MHz was used. The resource utilization of DASNet on the ZCU104 is detailed in Table 9. To efficiently store input feature maps and the feature maps generated during computation within the ZCU104, a large quantity of block RAM (BRAM) was employed for feature map access. The results revealed BRAM utilization of 98.88%. During the convolution operations, digital signal processing (DSP) units were reused for multiply-accumulate operations to enhance feature extraction efficiency. This design approach limited DSP usage to 832 units, yielding a utilization rate of 48.15%.

![Figure 25](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin.t9-3591974-large.gif)

*TABLE 9*

GPUs and FPGAs provide efficient computational power and are widely used in the field of artificial intelligence. These two types of acceleration platform differ markedly in how they handle numerical precision. GPUs are designed primarily to handle a large volume of floating-point operations. By contrast, FPGAs have faster Input/Output (I/O) access speeds, enabling high performance in real-time computation. To reduce hardware resource consumption, the FPGA design replaces floating-point operations with fixed-point arithmetic. Although this approach reduces computational complexity, it involves a trade-off in computational precision. The present study compared 32-bit GPU floating-point operations and 18-bit FPGA fixed-point operations, as illustrated in Table 10, which indicates that compared with the GPU platform, the designed system experienced only a 0.05% reduction in accuracy and a 0.39% reduction in the F1-score.

![Figure 26](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin.t10-3591974-large.gif)

*TABLE 10*
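The precision trade-off of fixed-point arithmetic can be illustrated with a small quantization sketch. The 18-bit word length is taken from the text, whereas the split into integer and fractional bits is not stated in the paper. The value of 10 fractional bits below, the function names, and the sample weights are therefore assumptions.

```python
import numpy as np

def to_fixed_point(x, total_bits: int = 18, frac_bits: int = 10) -> np.ndarray:
    """Quantize to signed fixed-point integers with saturation at the range limits."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))               # most negative representable code
    hi = (1 << (total_bits - 1)) - 1            # most positive representable code
    q = np.clip(np.round(np.asarray(x, dtype=np.float64) * scale), lo, hi)
    return q.astype(np.int32)

def from_fixed_point(q: np.ndarray, frac_bits: int = 10) -> np.ndarray:
    """Convert fixed-point integer codes back to real values."""
    return q.astype(np.float64) / (1 << frac_bits)

w = np.array([0.1234567, -1.5, 200.0])          # hypothetical weights
q = to_fixed_point(w)
w_hat = from_fixed_point(q)
error = np.abs(w - w_hat)
```

Values inside the representable range are rounded to the nearest step of $2^{-10}$, so their error is at most $2^{-11}$. The third value exceeds the range and saturates, which shows why the bit allocation has to match the dynamic range of the weights and activations.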

To demonstrate the advantages of an FPGA in computation speed and power consumption, the present experiment compared the FPGA with an Intel Core i7-11700 CPU, an NVIDIA RTX 2060 GPU, and an NVIDIA Jetson Xavier NX, as indicated in Table 11. The evaluation metrics were calculated as follows:

$$
\begin{align*} T_{r}& =\frac {T_{C/G}}{T_{F}}. \tag {23}\\ P_{r}& =\frac {P_{C/G}}{P_{F}}. \tag {24}\end{align*}
$$

where $T_{r}$ and $P_{r}$ represent the time and power consumption ratios, respectively; the subscript $C/G$ denotes the platform being compared (the CPU, GPU, or NX); and the subscript $F$ denotes the FPGA.

![Figure 27](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/6287639/10820123/11091303/lin.t11-3591974-large.gif)

*TABLE 11*

Regarding computation speed, the CPU, GPU, and NX required 44.11, 34.87, and 137.01 ms, respectively, all of which were considerably slower than the FPGA’s 9.32 ms. Regarding power consumption, the CPU, GPU, and NX consumed 45, 102, and 17.1 W, respectively, which is 12.17, 27.58, and 4.62 times that of the FPGA. In summary, the FPGA outperformed the CPU, GPU, and NX in terms of both computation speed and power consumption.
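The ratios in Eqs. (23) and (24) can be checked with a few lines of arithmetic. The sketch below computes the time ratios from the latencies quoted above. Because the FPGA power figure is not quoted in the text, it is back-derived here from the reported power ratios and should be read as an implied value rather than a measurement.

```python
fpga_ms = 9.32
latency_ms = {"CPU": 44.11, "GPU": 34.87, "NX": 137.01}
power_w = {"CPU": 45.0, "GPU": 102.0, "NX": 17.1}
reported_power_ratio = {"CPU": 12.17, "GPU": 27.58, "NX": 4.62}

# Eq. (23): time ratio of each platform relative to the FPGA.
time_ratio = {name: t / fpga_ms for name, t in latency_ms.items()}

# Rearranged Eq. (24): FPGA power implied by each reported power ratio.
implied_fpga_w = {name: power_w[name] / reported_power_ratio[name]
                  for name in power_w}

for name in latency_ms:
    print(f"{name}: T_r = {time_ratio[name]:.2f}, "
          f"implied FPGA power = {implied_fpga_w[name]:.2f} W")
```

The three reported power ratios imply nearly the same FPGA power, about 3.7 W, which indicates that the quoted figures are mutually consistent. On the latency side, the FPGA is between roughly 3.7 and 14.7 times faster than the other platforms.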

## SECTION IV. Conclusion

This study developed a lightweight neural network designed for use in ADSs to segment drivable areas and to provide information about the environment surrounding a self-driving car. The DASNet architecture was accelerated using FPGA hardware, and its performance was evaluated using the wooded dataset collected by our laboratory and the KITTI public dataset. In addition to achieving superior segmentation performance in both structured (KITTI) and unstructured (wooded) environments, this work highlights the benefits of combining Cartesian and spherical coordinate systems for feature representation. This fusion enhances the self-driving car’s ability to perceive and recognize both local and global environments. The experimental results revealed that the proposed model achieved F1-scores of 0.9864 and 0.9449 on the wooded dataset and the KITTI public dataset, respectively, outperforming many existing methods. Moreover, through FPGA hardware acceleration, the inference time was reduced to 9.32 ms, substantially enhancing the real-time capabilities of DASNet. The method proposed in this study enables self-driving cars to accurately segment drivable areas and respond in real time, making it highly suitable for ADSs.

In our future research, we intend to shift our focus toward unsupervised learning methods. More specifically, techniques including domain adversarial learning, self-training, and co-training will be explored to enable the DASNet model to extract key features from unlabeled data to further facilitate the effective segmentation of drivable areas. Although the KITTI dataset and the forest dataset we collected sufficiently demonstrate the effectiveness of DASNet based on LiDAR in both structured and unstructured environments, existing public datasets for drivable area segmentation are predominantly image-based, with limited attention given to LiDAR data. We regard this as another important direction for future work. We plan to expand our LiDAR dataset to include various weather conditions and perform drivable area segmentation tasks on these diverse scenarios. This will help evaluate and demonstrate the model’s robustness and effectiveness under different environmental conditions.

## References

[^1]: S. Singh, “Critical reasons for crashes investigated in the national motor vehicle crash causation survey,” National Highway Traffic Safety Administration (NHTSA), Washington, DC, USA, Tech. Rep. DOT HS 812 115, 2015. [Google Scholar](https://scholar.google.com/scholar?as_q=Critical+reasons+for+crashes+investigated+in+the+national+motor+vehicle+crash+causation+survey&as_occt=title&hl=en&as_sdt=0%2C31)

[^2]: Y.-C. Chan, Y.-C. Lin, and P.-C. Chen, “Lane mark and drivable area detection using a novel instance segmentation scheme,” in Proc. IEEE/SICE Int. Symp. Syst. Integr. (SII), Paris, France, Jan. 2019, pp. 502–506. [IEEE](https://ieeexplore.ieee.org/document/8700359) [Google Scholar](https://scholar.google.com/scholar?as_q=Lane+mark+and+drivable+area+detection+using+a+novel+instance+segmentation+scheme&as_occt=title&hl=en&as_sdt=0%2C31)

[^3]: H. Xue, H. Fu, R. Ren, J. Zhang, B. Liu, Y. Fan, and B. Dai, “LiDAR-based drivable region detection for autonomous driving,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Prague, Czech Republic, Sep. 2021, pp. 1110–1116. [IEEE](https://ieeexplore.ieee.org/document/9636289) [Google Scholar](https://scholar.google.com/scholar?as_q=LiDAR-based+drivable+region+detection+for+autonomous+driving&as_occt=title&hl=en&as_sdt=0%2C31)

[^4]: L. Bai, Y. Lyu, X. Xu, and X. Huang, “PointNet on FPGA for real-time LiDAR point cloud processing,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Seville, Spain, Oct. 2020, pp. 1–5. [IEEE](https://ieeexplore.ieee.org/document/9180841) [Google Scholar](https://scholar.google.com/scholar?as_q=PointNet+on+FPGA+for+real-time+LiDAR+point+cloud+processing&as_occt=title&hl=en&as_sdt=0%2C31)

[^5]: Z. Zhu, X. Li, J. Xu, J. Yuan, and J. Tao, “Unstructured road segmentation based on road boundary enhancement point-cylinder network using LiDAR sensor,” Remote Sens., vol. 13, no. 3, p. 495, Jan. 2021. [DOI](https://doi.org/10.3390/rs13030495) [Google Scholar](https://scholar.google.com/scholar?as_q=Unstructured+road+segmentation+based+on+road+boundary+enhancement+point-cylinder+network+using+LiDAR+sensor&as_occt=title&hl=en&as_sdt=0%2C31)

[^6]: Y. Li, C. Le Bihan, T. Pourtau, T. Ristorcelli, and J. Ibanez-Guzman, “Coarse-to-fine segmentation on LiDAR point clouds in spherical coordinate and beyond,” IEEE Trans. Veh. Technol., vol. 69, no. 12, pp. 14588–14601, Dec. 2020. [IEEE](https://ieeexplore.ieee.org/document/9226132) [Google Scholar](https://scholar.google.com/scholar?as_q=Coarse-to-fine+segmentation+on+LiDAR+point+clouds+in+spherical+coordinate+and+beyond&as_occt=title&hl=en&as_sdt=0%2C31)

[^7]: A. Y. Hata, F. S. Osorio, and D. F. Wolf, “Robust curb detection and vehicle localization in urban environments,” in IEEE Intell. Vehicles Symp. Proc., Dearborn, MI, USA, Jun. 2014, pp. 1257–1262. [IEEE](https://ieeexplore.ieee.org/document/6856405) [Google Scholar](https://scholar.google.com/scholar?as_q=Robust+curb+detection+and+vehicle+localization+in+urban+environments&as_occt=title&hl=en&as_sdt=0%2C31)

[^8]: P. Kumar, C. P. McElhinney, P. Lewis, and T. McCarthy, “An automated algorithm for extracting road edges from terrestrial mobile LiDAR data,” ISPRS J. Photogramm. Remote Sens., vol. 85, pp. 44–55, Nov. 2013. [DOI](https://doi.org/10.1016/j.isprsjprs.2013.08.003) [Google Scholar](https://scholar.google.com/scholar?as_q=An+automated+algorithm+for+extracting+road+edges+from+terrestrial+mobile+LiDAR+data&as_occt=title&hl=en&as_sdt=0%2C31)

[^9]: Z. Azizi, A. Najafi, and S. Sadeghian, “Forest road detection using LiDAR data,” J. Forestry Res., vol. 25, no. 4, pp. 975–980, Dec. 2014. [DOI](https://doi.org/10.1007/s11676-014-0544-0) [Google Scholar](https://scholar.google.com/scholar?as_q=Forest+road+detection+using+LiDAR+data&as_occt=title&hl=en&as_sdt=0%2C31)

[^10]: J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 3431–3440. [IEEE](https://ieeexplore.ieee.org/document/7298965) [Google Scholar](https://scholar.google.com/scholar?as_q=Fully+convolutional+networks+for+semantic+segmentation&as_occt=title&hl=en&as_sdt=0%2C31)

[^11]: O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2015, pp. 234–241. [DOI](https://doi.org/10.1007/978-3-319-24574-4_28) [Google Scholar](https://scholar.google.com/scholar?as_q=U-net%3A+Convolutional+networks+for+biomedical+image+segmentation&as_occt=title&hl=en&as_sdt=0%2C31)

[^12]: V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder–decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481–2495, Dec. 2017. [IEEE](https://ieeexplore.ieee.org/document/7803544) [Google Scholar](https://scholar.google.com/scholar?as_q=SegNet%3A+A+deep+convolutional+encoder%E2%80%93decoder+architecture+for+image+segmentation&as_occt=title&hl=en&as_sdt=0%2C31)

[^13]: H. Noh, S. Hong, and B. Han, “Learning deconvolution network for semantic segmentation,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1520–1528. [IEEE](https://ieeexplore.ieee.org/document/7410535) [Google Scholar](https://scholar.google.com/scholar?as_q=Learning+deconvolution+network+for+semantic+segmentation&as_occt=title&hl=en&as_sdt=0%2C31)

[^14]: L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, Atrous convolution, and fully connected CRFs,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834–848, Apr. 2018. [IEEE](https://ieeexplore.ieee.org/document/7913730) [Google Scholar](https://scholar.google.com/scholar?as_q=DeepLab%3A+Semantic+image+segmentation+with+deep+convolutional+nets%2C+Atrous+convolution%2C+and+fully+connected+CRFs&as_occt=title&hl=en&as_sdt=0%2C31)

[^15]: G. Lin, A. Milan, C. Shen, and I. Reid, “RefineNet: Multi-path refinement networks for high-resolution semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5168–5177. [IEEE](https://ieeexplore.ieee.org/document/8100032) [Google Scholar](https://scholar.google.com/scholar?as_q=RefineNet%3A+Multi-path+refinement+networks+for+high-resolution+semantic+segmentation&as_occt=title&hl=en&as_sdt=0%2C31)

[^16]: H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 2881–2890. [IEEE](https://ieeexplore.ieee.org/document/8100143) [Google Scholar](https://scholar.google.com/scholar?as_q=Pyramid+scene+parsing+network&as_occt=title&hl=en&as_sdt=0%2C31)

[^17]: T. Takikawa, D. Acuna, V. Jampani, and S. Fidler, “Gated-SCNN: Gated shape CNNs for semantic segmentation,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 5229–5238. [IEEE](https://ieeexplore.ieee.org/document/9009833) [Google Scholar](https://scholar.google.com/scholar?as_q=Gated-SCNN%3A+Gated+shape+CNNs+for+semantic+segmentation&as_occt=title&hl=en&as_sdt=0%2C31)

[^18]: A. Milioto, I. Vizzo, J. Behley, and C. Stachniss, “RangeNet++: Fast and accurate LiDAR semantic segmentation,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Macau, Nov. 2019, pp. 4213–4220. [IEEE](https://ieeexplore.ieee.org/document/8967762) [Google Scholar](https://scholar.google.com/scholar?as_q=RangeNet+%2B%2B%3A+Fast+and+accurate+LiDAR+semantic+segmentation&as_occt=title&hl=en&as_sdt=0%2C31)

[^19]: E. E. Aksoy, S. Baci, and S. Cavdar, “SalsaNet: Fast road and vehicle segmentation in LiDAR point clouds for autonomous driving,” in Proc. IEEE Intell. Vehicles Symp. (IV), Las Vegas, NV, USA, Oct. 2020, pp. 926–932. [IEEE](https://ieeexplore.ieee.org/document/9304694) [Google Scholar](https://scholar.google.com/scholar?as_q=SalsaNet%3A+Fast+road+and+vehicle+segmentation+in+LiDAR+point+clouds+for+autonomous+driving&as_occt=title&hl=en&as_sdt=0%2C31)

[^20]: D. He, F. Abid, Y.-M. Kim, and J.-H. Kim, “SectorGSnet: Sector learning for efficient ground segmentation of outdoor LiDAR point clouds,” IEEE Access, vol. 10, pp. 11938–11946, 2022. [IEEE](https://ieeexplore.ieee.org/document/9691325) [Google Scholar](https://scholar.google.com/scholar?as_q=SectorGSnet%3A+Sector+learning+for+efficient+ground+segmentation+of+outdoor+LiDAR+point+clouds&as_occt=title&hl=en&as_sdt=0%2C31)

[^21]: F. Juefei-Xu, V. N. Boddeti, and M. Savvides, “Local binary convolutional neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4284–4293. [IEEE](https://ieeexplore.ieee.org/document/8099939) [Google Scholar](https://scholar.google.com/scholar?as_q=Local+binary+convolutional+neural+networks&as_occt=title&hl=en&as_sdt=0%2C31)

[^22]: A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” 2017, arXiv:1704.04861. [Google Scholar](https://scholar.google.com/scholar?as_q=MobileNets%3A+Efficient+convolutional+neural+networks+for+mobile+vision+applications&as_occt=title&hl=en&as_sdt=0%2C31)

[^23]: I.-C. Lin, C.-H. Tang, C.-T. Ni, X. Hu, Y.-T. Shen, P.-Y. Chen, and Y. Xie, “A novel, efficient implementation of a local binary convolutional neural network,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 68, no. 4, pp. 1413–1417, Apr. 2021. [IEEE](https://ieeexplore.ieee.org/document/9249011) [Google Scholar](https://scholar.google.com/scholar?as_q=A+novel%2C+efficient+implementation+of+a+local+binary+convolutional+neural+network&as_occt=title&hl=en&as_sdt=0%2C31)

[^24]: L. Bai, Y. Zhao, and X. Huang, “A CNN accelerator on FPGA using depthwise separable convolution,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 65, no. 10, pp. 1415–1419, Oct. 2018. [IEEE](https://ieeexplore.ieee.org/document/8438987) [Google Scholar](https://scholar.google.com/scholar?as_q=A+CNN+accelerator+on+FPGA+using+depthwise+separable+convolution&as_occt=title&hl=en&as_sdt=0%2C31)

[^25]: Y. Lyu, L. Bai, and X. Huang, “ChipNet: Real-time LiDAR processing for drivable region segmentation on an FPGA,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 66, no. 5, pp. 1769–1779, May 2019. [IEEE](https://ieeexplore.ieee.org/document/8580596) [Google Scholar](https://scholar.google.com/scholar?as_q=ChipNet%3A+Real-time+LiDAR+processing+for+drivable+region+segmentation+on+an+FPGA&as_occt=title&hl=en&as_sdt=0%2C31)

[^26]: M. Baczmanski, M. Wasala, and T. Kryjak, “Implementation of a perception system for autonomous vehicles using a detection-segmentation network in SoC FPGA,” in Proc. Int. Symp. Appl. Reconfigurable Comput., 2023, pp. 200–211. [DOI](https://doi.org/10.1007/978-3-031-42921-7_14) [Google Scholar](https://scholar.google.com/scholar?as_q=Implementation+of+a+perception+system+for+autonomous+vehicles+using+a+detection-segmentation+network+in+SoC+FPGA&as_occt=title&hl=en&as_sdt=0%2C31)

[^27]: C.-H. Shih, C.-J. Lin, and J.-Y. Jhang, “Ackerman unmanned mobile vehicle based on heterogeneous sensor in navigation control application,” Sensors, vol. 23, no. 9, p. 4558, May 2023. [DOI](https://doi.org/10.3390/s23094558) [Google Scholar](https://scholar.google.com/scholar?as_q=Ackerman+unmanned+mobile+vehicle+based+on+heterogeneous+sensor+in+navigation+control+application&as_occt=title&hl=en&as_sdt=0%2C31)

[^28]: Z. Zhang, J.-Y. Jhang, and C.-J. Lin, “Detection and navigation of unmanned vehicles in wooded environments using light detection and ranging sensors,” Sensors Mater., vol. 35, no. 11, pp. 3637–3654, Nov. 2023. [DOI](https://doi.org/10.18494/SAM4688) [Google Scholar](https://scholar.google.com/scholar?as_q=Detection+and+navigation+of+unmanned+vehicles+in+wooded+environments+using+light+detection+and+ranging+sensors&as_occt=title&hl=en&as_sdt=0%2C31)

[^29]: J. Fritsch, T. Kühnl, and A. Geiger, “A new performance measure and evaluation benchmark for road detection algorithms,” in Proc. 16th Int. IEEE Conf. Intell. Transp. Syst. (ITSC), Oct. 2013, pp. 1693–1700. [IEEE](https://ieeexplore.ieee.org/document/6728473) [Google Scholar](https://scholar.google.com/scholar?as_q=A+new+performance+measure+and+evaluation+benchmark+for+road+detection+algorithms&as_occt=title&hl=en&as_sdt=0%2C31)

[^30]: Q. Liu and S. Zhou, “LightFusion: Lightweight CNN architecture for enabling efficient sensor fusion in free road segmentation of autonomous driving,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 71, no. 9, pp. 4296–4300, Sep. 2024. [IEEE](https://ieeexplore.ieee.org/document/10488384) [Google Scholar](https://scholar.google.com/scholar?as_q=LightFusion%3A+Lightweight+CNN+architecture+for+enabling+efficient+sensor+fusion+in+free+road+segmentation+of+autonomous+driving&as_occt=title&hl=en&as_sdt=0%2C31)
