# Robust Compressive Histogramming Based on Autoencoder for SPAD Direct ToF LiDAR Covering Challenging Scenarios

## Abstract

The high-resolution single-photon avalanche diode (SPAD) sensor array chip for light detection and ranging (LiDAR) faces a challenge in handling high data rates, due to the use of time-correlated single-photon counting (TCSPC). On-chip partial histogramming methods suffer from low compression ratio (CR) and high latency. Count-free and histogram-less methods are highly sensitive to noise. Direct compression of received photons has achieved high CRs but relies on handcrafted analytical codebooks (CBs) and lacks robustness. This article proposes a robust, data-driven compressive histogramming method based on the autoencoder, which improves accuracy over previous compression methods and covers challenging imaging scenarios. Furthermore, a compact 4-bit quantized compression engine that can process at least 64 timestamps per illumination is proposed and implemented in FPGA. A line-scanning LiDAR system is constructed by connecting the engine with our previous SPAD array chip. Compared to the TCSPC-based full histogram circuit, a memory size reduction of $3.76\times$ is achieved while maintaining similar depth accuracy. Compared to the sweep-based partial histogramming circuit, the proposed design achieves a $4\times$ improvement in CR and a 65% reduction in root-mean-square depth error (RMSE).

## Authors

Lichen Feng *Key Laboratory of Analog Integrated Circuits, School of Integrated Circuits, Xidian University, Xi’an, China* [ORCID: 0000-0002-7685-2141](https://orcid.org/0000-0002-7685-2141)

Yimeng Liu *Key Laboratory of Analog Integrated Circuits, School of Integrated Circuits, Xidian University, Xi’an, China* [ORCID: 0009-0008-7342-170X](https://orcid.org/0009-0008-7342-170X)

Dong Li *Key Laboratory of Analog Integrated Circuits, School of Integrated Circuits, Xidian University, Xi’an, China* [ORCID: 0000-0002-0368-3017](https://orcid.org/0000-0002-0368-3017)

Yaoqi Bao *Key Laboratory of Analog Integrated Circuits, School of Integrated Circuits, Xidian University, Xi’an, China* [ORCID: 0000-0002-7560-5990](https://orcid.org/0000-0002-7560-5990)

Rui Ma *Key Laboratory of Analog Integrated Circuits, School of Integrated Circuits, Xidian University, Xi’an, China* [ORCID: 0000-0001-5015-4989](https://orcid.org/0000-0001-5015-4989)

Zhangming Zhu *Key Laboratory of Analog Integrated Circuits, School of Integrated Circuits, Xidian University, Xi’an, China* [ORCID: 0000-0002-7764-1928](https://orcid.org/0000-0002-7764-1928)

## Publication Information

**Journal:** IEEE Sensors Journal **Year:** 2025 **Volume:** 25 **Issue:** 14 **Pages:** 27701-27711 **DOI:** [10.1109/JSEN.2025.3575784](https://doi.org/10.1109/JSEN.2025.3575784) **Article Number:** 11028946 **ISSN:** Print ISSN: 1530-437X, Electronic ISSN: 1558-1748, CD: 2379-9153

## Metrics

**Total Downloads:** 202

## Funding

- National Science and Technology Major Project (Grant: 2021ZD0114403)
- National Natural Science Foundation of China (Grant: 62474128, 62134005, 62021004 and 62204181)

---

## Keywords

**IEEE Keywords:** Histograms, Single-photon avalanche diodes, Photonics, Autoencoders, Laser radar, Encoding, System-on-chip, Imaging, Accuracy, Lighting

**Index Terms:** Light Detection And Ranging, Challenging Scenarios, Single-photon Avalanche Diode, Illumination, Data Rate, Time-correlated Single-photon Counting, Memory Size, Accurate Depth, Depth Error, Neural Network, Imaging Results, Depth Images, Imaging Conditions, Obvious Peak, Photon Counting, Challenging Conditions, Full Method, Depth Estimation, Head Shape, Histogram Method, Time-to-digital Converter, Gray Code, Coding Method, Scene Changes, Inter-frame, Bin Values, Technology Node, Form Of Histograms, Autoencoder Model

**Author Keywords:** Autoencoder, histogram compression, light detection and ranging (LiDAR), single-photon avalanche diode (SPAD), time-correlated single-photon counting (TCSPC)

## SECTION I. Introduction

Light detection and ranging (LiDAR) systems [^1] facilitate a wide range of vision applications, particularly long-range sensing for automotive driving [^2], which involves challenging imaging scenarios. Single-photon avalanche diode (SPAD)-based LiDAR offers advantages such as photon-level sensitivity [^3], picosecond resolution [^4], fast gating [^5], and compatibility with the well-established CMOS process [^6] for cost-effectiveness.

In direct time-of-flight (D-ToF) LiDAR, a detected photon generates a pulse, which is then converted to a timestamp through a time-to-digital converter (TDC). To compensate for nonideal effects like ambient light [^7], the time-correlated single-photon counting (TCSPC) method [^8] is widely used to build the timestamp histogram for surface depth estimation. A LiDAR operating at quarter video graphics array (QVGA) resolution with 30 frames/s would generate a data rate of 18 Gb/s (assuming $1\times 10^{3}$ bins per SPAD pixel and 1 byte per bin), which poses a significant challenge for data readout. Direct on-chip depth calculation based on the full histogram could reduce the data rate by $10^{3}\times$ ($10^{3}$ bins are reduced to one depth), but the required memory size is quite large. Assuming that the scanning scheme processes one row of 240 pixels per cycle, caching the histogram of one row requires 1.92 Mb, which occupies a considerable portion of silicon area.
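The figures quoted above can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes that QVGA means 320 × 240 pixels and uses the bin count and bin width stated in the text; the variable names are illustrative only.

```python
# Back-of-the-envelope check of the data-rate and memory figures (illustrative names).
QVGA_WIDTH, QVGA_HEIGHT = 320, 240   # assumption: standard QVGA resolution
BINS_PER_PIXEL = 1_000               # 1e3 bins per SPAD pixel (from the text)
BITS_PER_BIN = 8                     # 1 byte per bin (from the text)
FRAMES_PER_SECOND = 30
PIXELS_PER_ROW = 240                 # one scanned row, as stated in the text

# Raw readout of every full histogram, every frame.
bits_per_frame = QVGA_WIDTH * QVGA_HEIGHT * BINS_PER_PIXEL * BITS_PER_BIN
data_rate_bps = bits_per_frame * FRAMES_PER_SECOND

# On-chip memory needed to cache the full histograms of one row.
row_cache_bits = PIXELS_PER_ROW * BINS_PER_PIXEL * BITS_PER_BIN

print(f"full-histogram readout: {data_rate_bps / 1e9:.2f} Gb/s")
print(f"one-row histogram cache: {row_cache_bits / 1e6:.2f} Mb")
```

Under these assumptions the readout comes to about 18.4 Gb/s and the row cache to 1.92 Mb, consistent with the numbers above.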

On-chip partial histogramming methods with customized in-pixel TDCs in flash LiDARs have been silicon verified [^9], [^10], [^11], [^12], [^13], [^14]. “Zooming” techniques adaptively adjust the size of the time window to reduce the number of bins [^10], [^11], [^12]. Gyongy et al. [^13] propose a shifting method to locate and track peaks. A sweeping-based method is realized in [^14] with on-chip peak extraction by an FIR filter. However, these indirect methods need to open a narrowed time window in each illumination cycle. Illumination power is wasted whenever photons arrive outside the time window, so more exposure steps are required, leading to increased latency [^15]. Recently, a count-free method [^16] that updates a comparison threshold on the fly has been proposed. It requires only two registers per pixel but offers limited accuracy in challenging conditions. Tontini et al. [^17] propose a histogram-less method that needs only two counters and an accumulator per pixel, but an additional phase that records background noise before each depth estimation increases its latency. Recurrent neural networks, such as long short-term memory [^18] and the spiking Legendre memory unit [^19], have also been adopted to estimate depth from timesteps [^18] or from detected SPAD pulses [^19] in a data-driven manner. However, their high complexity, which cannot be shared among SPAD pixels, must be reduced before they can be applied to high-resolution SPAD arrays.

Compressive sensing strategies [^20], [^21], [^22], [^23], whose codebooks (CBs) can be shared among SPADs, have been successfully applied to full histograms without narrowing the time window. The timestamps received in each illumination cycle are encoded into the compressed representations within the cycle, which reduces the required memory size and data transferring rate on the fly. Sheehan et al. [^20] encode the histogram into the Fourier domain and reconstruct the D-ToF based on maximum likelihood estimation. Gutierrez-Barragan et al. [^22] propose and evaluate different coding schemes and achieve a high compression ratio (CR) of 128 by Gray-based coding with low depth error. Poisson et al. [^23] introduce pseudorandom projections for histogram compression. In general, these methods encode the received photon analytically using a hardware-friendly CB. However, customized for certain scenes, their handcrafted analytical CBs are sensitive to changes in imaging conditions.

The recent success of artificial neural networks [^24] has greatly reduced the requirement for handcrafted analytical CBs and has improved performance in extremely challenging imaging conditions. However, there has been no explicit discussion of applying learning-based compression to histograms on chip. In this article, we draw inspiration from coding schemes [^22] and data-driven neural networks [^24] to design and evaluate the autoencoder [^25] for efficient on-chip SPAD histogram compression. An autoencoder is a type of neural network designed to learn a compressed representation of input data (the SPAD histogram in our work). The input can be decoded from the representation with minimal error, which is consistent with existing compressive histogramming methods [^20], [^21], [^22], [^23]. Therefore, we can take advantage of the autoencoder to achieve learning-based compressive histogramming. The key contributions of this study can be summarized as follows.

1. The autoencoder is applied to histogram compression for the first time. The customized autoencoder is proposed to reduce the on-chip computation and improve the accuracy of depth estimation simultaneously. The proposed method is evaluated across various imaging scenarios, including the very challenging ones, to show the superior robustness of this data-driven model over previous methods.
2. Based on the proposed method, a compact compression engine that can process at least 64 timestamps per illumination is implemented in FPGA. A LiDAR system is constructed by connecting the engine with our SPAD array chip [^26]. Compared to the TCSPC-based circuit, a memory size reduction of $3.76\times$ is achieved with similar depth accuracy. Compared to the sweep-based circuit, a $4\times$ CR improvement is achieved with a 65% reduction in root-mean-square depth error (RMSE).

The remainder of this article is organized as follows. Section II describes the compressive single-photon histogramming based on autoencoder. Section III shows the quantitative metrics analysis and the imaging results, showing the performance and robustness improvements. The compact compression engine is described in Section IV, where FPGA implementation results are presented and compared with previous methods. The conclusion is drawn in Section V.

## SECTION II. Compressive Histogram Formation Based on Autoencoder

### A. Overview of TCSPC-Based Histogramming of SPAD D-ToF LiDAR

Fig. 1 depicts a simplified SPAD D-ToF LiDAR system schematic using TCSPC. A laser triggered by a start pulse emits a pulse of photons toward a scene. The SPAD pixels on the receiver chip detect individual reflected photons. When a single photon arrives, the SPAD triggers an avalanche that generates a directly detectable stop pulse. A TDC then converts the time difference between the start and stop pulses to a digital timestamp, which is the D-ToF and corresponds to twice the depth of the reflective surface (the round trip).

![Figure 1](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/7361/11080690/11028946/feng1-3575784-large.gif)

*Fig. 1. Simplified system-level schematic of SPAD D-ToF LiDAR.*

In a general scene with one distinct reflecting surface, the reflected photons exhibit the same envelope as the laser pulse, which is assumed to follow a normal distribution $\mathbf{N}(\mu, \sigma^{2})$ for simplicity. Due to ambient light, the SPAD on the receiver chip also randomly receives unwanted photons, which follow a uniform distribution $\mathbf{U}'$ in low ambient light conditions. The number of discretized timestamp bins over the range of interest is denoted by $N$. Let $\tau$ denote the physical timestamp, so the discretized timestamp is $t = \tau/\Delta$, where $\Delta$ represents the temporal resolution of the TDC. Denote $T$ as the timestamp range, which equals the laser pulse repetition period; then $N = T/\Delta$.

For an arbitrary pixel, the photon count $y_{t}$ at discretized timestamp $t = 0, 1, \ldots, N-1$ can be modeled as

$$
\begin{equation*}y_t \mid\left(\mu, \sigma, \pi_n, \pi_u\right) \sim \pi_n \mathbf{N}\left(\mu, \sigma^2\right)+\pi_u \mathbf{U}^{\prime} \tag {1}\end{equation*}
$$

where the expectation $\mu$ corresponds to the depth of the reflection surface and $\pi _{n}$ and $\pi _{u}$ ($\pi _{n}+\pi _{u}=1$) are the weight parameters of the normal and uniform distributions, respectively. Assuming a constant number of photons, different signal-to-background ratios (SBRs) can be set by adjusting the ratio between $\pi _{n}$ and $\pi _{u}$. Since it is not possible to tell whether a received photon comes from the laser or from ambient light, the TCSPC scheme is utilized to perform a statistical estimation of $\mu$. The SPAD sensor can minimize pile-up distortions by noise filtering [^26] or by multievent mode [^27], which guarantees that $y_{t}$ is an appropriate approximation of the mixture distribution.
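As an illustration of the mixture model in (1), the following sketch draws photon timestamps from a Gaussian signal component and a uniform background component and bins them into a histogram. It is a minimal NumPy simulation under assumed parameter values (`mu`, `sigma`, `pi_n`, and the function name are all illustrative), not the authors' simulation code.

```python
import numpy as np

def simulate_histogram(mu, sigma, pi_n, n_photons, n_bins, rng):
    """Draw n_photons timestamps from pi_n*N(mu, sigma^2) + (1 - pi_n)*U and bin them."""
    # Each photon is a signal photon with probability pi_n, otherwise background.
    is_signal = rng.random(n_photons) < pi_n
    n_signal = int(is_signal.sum())
    signal = rng.normal(mu, sigma, size=n_signal)                   # laser return
    background = rng.uniform(0, n_bins, size=n_photons - n_signal)  # ambient light
    timestamps = np.concatenate([signal, background])
    # floor() maps a continuous timestamp to its bin index; clip keeps it in range.
    bins = np.clip(np.floor(timestamps).astype(int), 0, n_bins - 1)
    return np.bincount(bins, minlength=n_bins)

rng = np.random.default_rng(0)
hist = simulate_histogram(mu=300.0, sigma=1.0, pi_n=0.5,
                          n_photons=1000, n_bins=1024, rng=rng)
depth_bin = int(np.argmax(hist))   # plain TCSPC estimate: the most populated bin
```

With a large `pi_n` the argmax lands next to `mu`; lowering `pi_n` toward 0.01 makes the peak progressively harder to separate from the background.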

As shown in Fig. 2, the photon-emission-detection and time-to-digit conversion process is repeated for *Cyl* ($\sim 10^{3}$) cycles, and the timestamp is used to update the statistic in a bin in an online manner. The full histogram ${\mathbf { Y}} ={(y_{t})}_{t=0}^{N-1}\in {\mathbf { R}}^{N}$ of the timestamps is constructed, which approximates the mixture distribution [^28]. The histogram formation process generates a 3-D histogram image, one histogram per pixel. In emerging SPAD D-ToF LiDAR with high resolution, building the histogram image and transferring the image off-chip for postprocessing leads to tens of gigabit-per-second to terabit-per-second data rates, which is the bottleneck of the system as illustrated above.

![Figure 2](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/7361/11080690/11028946/feng2-3575784-large.gif)

*Fig. 2. Full histogram formation process according to the received photons.*

### B. Compressive Histogramming Based on Autoencoder

As observed by Gutierrez-Barragan et al. [^22], the histogram can be compressed on the fly where we see a photon and its timing information only once, without the need to build the entire **Y** explicitly, if the compressive computation can be linearly applied to individual photon timestamps.

In general, this compressive histogram formation can be expressed as follows. Denote $y_{j}^{c}$ as the one-hot vector for the photon detected at the $j$th timestamp in the $c$th cycle, i.e., $y_{j}^{c} = [0, \ldots, 0, \underbrace{1}_{j\text{th}}, 0, \ldots, \underbrace{0}_{N\text{th}}]$. The measured photon count $y_{j}$ at the $j$th histogram bin is rewritten as $y_{j}=\sum_{c=1}^{c_{j}\vert Cyl} y_{j}^{c}$, where $c_{j}$ is the cycle with the photon detected at the $j$th bin.

In this case, given a coding matrix $C = [C_{1}; \ldots; C_{k}; \ldots; C_{K}] \in \mathbf{R}^{K \times N}$, $k = 1, 2, \ldots, K$, the projection of $y_{j}$ on $C_{k} = [C_{0}^{k}, C_{1}^{k}, \ldots, C_{N-1}^{k}]$ can be written as

$$
\begin{equation*} p_{k}=\sum _{j=0}^{N-1} \sum _{c=1}^{c_{j}\vert Cyl} C_{j}^{k} y_{j}^{c}. \tag {2}\end{equation*}
$$

By prestoring the coding matrix $C$, the above equation reduces to a simple addition per photon-emission-detection cycle, which is the same as the original histogram formation process. By choosing an appropriate $C$, the full histogram $\mathbf{Y} \in \mathbf{R}^{N}$ can be compressed into $\mathbf{P} = [p_{1}, p_{2}, \ldots, p_{K}] \in \mathbf{R}^{K}$ with a CR of $N/K$. The latent expression $\mathbf{P}$ has no physical meaning. The smaller $K$ is, the larger the CR. However, a smaller $K$ also discards more information during compression, so the depth estimation error grows. A tradeoff between CR and depth error should therefore be considered when choosing $K$ [^22]. Note that this compressive histogramming process works provided that a timestamp is generated for each received photon, i.e., the SPAD operates in D-ToF mode and is equipped with a TDC. There are no additional constraints on SPAD sensor types or architectures.
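The equivalence between per-photon accumulation and projecting the full histogram can be checked numerically. The sketch below uses a random integer matrix as a stand-in for the coding matrix and random bin indices as stand-ins for detected photons; both are placeholders chosen for illustration, not a codebook or dataset from this work.

```python
import numpy as np

def compress_on_the_fly(timestamps, coding_matrix):
    """Accumulate one column of the coding matrix per detected photon, as in Eq. (2)."""
    latent = np.zeros(coding_matrix.shape[0], dtype=np.int64)
    for t in timestamps:
        latent += coding_matrix[:, t]   # one K-element addition per photon
    return latent

rng = np.random.default_rng(0)
N, K = 1024, 16                               # CR = N / K = 64
C = rng.integers(-8, 8, size=(K, N))          # placeholder integer codebook
timestamps = rng.integers(0, N, size=1000)    # placeholder bin indices of photons

P_streaming = compress_on_the_fly(timestamps, C)

# Reference: build the full histogram explicitly, then project it.
Y = np.bincount(timestamps, minlength=N)
P_batch = C @ Y
```

The two results are identical, but the streaming version only ever stores the `K` accumulators rather than the `N` histogram bins.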

Gutierrez-Barragan et al. [^22] design and evaluate various coding schemes for SPAD histogram compression, including the Fourier domain coding in the sketching framework [^20]. They use a zero-mean normalized cross correlation method [^29] involving impulse response function to decode the depth. However, the analytical matrix design requires professional coding knowledge and techniques, and the performance (depth accuracy), robustness, and transferability of the linear handcrafted analytical projection are limited.

In this work, we apply the end-to-end data-driven autoencoder to histogram compression for the first time. An autoencoder is a type of artificial neural network trained to encode the input data into a compressed and reduced representation (latent space) and then decode it back to the original data. The primary goal is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction or feature extraction. Additionally, a neural-network-based autoencoder can be easily deployed in hardware using mature quantization methods [^30].

As shown in Fig. 3(a), autoencoder-based compression is usually modeled with the encoder $E(\cdot)$ and decoder $G(\cdot)$ [^30], [^31]. To encode the input $\mathbf{Y}$, sequentially connected layers (Layers $1 \sim x$ in blue rectangles) that progressively decrease in size are constructed to compute the latent $\mathbf{P} = E(\mathbf{Y})$. By controlling the dimension of $\mathbf{P}$ according to the desired CR, compression can be achieved. The lossy reconstruction of $\mathbf{Y}$ can be realized using the decoder, $\mathbf{Y}' = G(\mathbf{P})$. The compression incurs a distortion $d(\mathbf{Y}, \mathbf{Y}')$, e.g., $d = \text{MSE}(\mathbf{Y}, \mathbf{Y}')$. The distortion can be utilized as the loss function to train the autoencoder model in an unsupervised manner.

![Figure 3](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/7361/11080690/11028946/feng3ab-3575784-large.gif)

*Fig. 3. (a) Basic structure of autoencoder. (b) Proposed asymmetric autoencoder.*

To apply the autoencoder to histogram compression, the input $\mathbf{Y}$ is set to be the full histogram. The depth can then be estimated by finding the argmax of $\mathbf{Y}'$. To realize this kind of autoencoder on the fly and on chip according to (2), the encoder $E(\cdot)$ in Fig. 3(a) should consist of as few layers as possible and should exclude nonlinear activation layers. To meet this requirement, we propose the asymmetric inverted autoencoder shown in Fig. 3(b). The proposed model consists of an encoder with only one fully connected layer and a decoder with layers progressively increasing in size. The single encoder layer is fully connected and linear, consistent with the operations in (2). We can read out the output of this first layer and regard it as the encoded result. The remaining operations are processed off chip, where nonlinear layers can be utilized to represent high-level features. In this way, the bulk of the computation is offloaded from the chip.
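The split between the linear on-chip encoder and the nonlinear off-chip decoder can be sketched as follows. This is a minimal NumPy forward pass with random, untrained weights. The layer sizes follow the (1024)-16-128-1024 shape used later in the article, while the class name, the absence of an encoder bias, and the placement of a Hardtanh after the hidden layer only are assumptions made for illustration.

```python
import numpy as np

def hardtanh(x):
    """Clamp activations to [-1, 1]."""
    return np.clip(x, -1.0, 1.0)

class AsymmetricAutoencoder:
    """Hypothetical sketch: one linear encoder layer, a small nonlinear decoder."""

    def __init__(self, n_bins=1024, n_latent=16, n_hidden=128, seed=0):
        rng = np.random.default_rng(seed)
        # Random placeholder weights; a real model would be trained with an MSE loss.
        self.w_enc = rng.normal(0.0, 0.05, (n_latent, n_bins))
        self.w_hid = rng.normal(0.0, 0.05, (n_hidden, n_latent))
        self.b_hid = np.zeros(n_hidden)
        self.w_out = rng.normal(0.0, 0.05, (n_bins, n_hidden))
        self.b_out = np.zeros(n_bins)

    def encode(self, histogram):
        # On chip: purely linear, so it can be accumulated photon by photon as in Eq. (2).
        return self.w_enc @ histogram

    def decode(self, latent):
        # Off chip: nonlinear layers reconstruct the histogram from the latent code.
        hidden = hardtanh(self.w_hid @ latent + self.b_hid)
        return self.w_out @ hidden + self.b_out

    def estimate_depth_bin(self, histogram):
        return int(np.argmax(self.decode(self.encode(histogram))))

model = AsymmetricAutoencoder()
y = np.zeros(1024)
y[100] = 5.0
latent = model.encode(y)
```

Because `encode` has no bias or activation, scaling or summing histograms scales or sums the latent codes, which is the property that allows per-photon accumulation on chip.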

## SECTION III. Quantitative Performance Comparisons With Previous Methods

### A. Simulation Setup and Performance Metrics Definition

To trade off between the performance and the complexity, we applied the Monte Carlo simulation for the inverted autoencoders with different sizes and activation functions following the procedure in [^22].

For a fair comparison, 1000 histogram samples with $N = 1024$, each containing 1000 received photons ($N_{p} = 1000$), are generated for each of 64 depths according to (1) under two SBRs, 0.01 and 0.5. The depths are equally spaced, as in [^22]. The SBR is defined as $\sum y(\mathbf{N}) / \sum y(\mathbf{U}')$, as in [^22], to evaluate the autoencoder’s performance in various scenes. Hence, there are 64 000 samples for each SBR; 70% of the samples are randomly selected for training with the loss $\text{Loss} = \text{MSE}(\mathbf{Y}, G[E(\mathbf{Y})])$, and the rest are used for testing. In the simulation, the histogram is the input, which is encoded directly without considering the accumulation process shown in (2). This setup is the same as that used for the Gray coding and other compressive histogramming methods [^22] and is sufficient for evaluating the performance of these compressive histogramming methods.

The estimated depths $\mu_{e}$ are determined by finding the $t$ with the maximum $y_{t}$ in the histogram reconstructed by the autoencoder. The plain TCSPC using full histogram, the quaternary zooming [^12], the sweep-based (with 16 subranges) [^14], the histogram-less [^17], and the Gray coding [^22] methods are reimplemented and tested on the same datasets.
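To make the SBR definition concrete, the sketch below converts an SBR into the mixture weights of (1) using $\pi_{n}/\pi_{u} = \text{SBR}$ and $\pi_{n}+\pi_{u}=1$, and works out the dataset split sizes. The function name is hypothetical, and the conversion assumes that the SBR is the ratio of expected signal to expected background counts over the whole range, as defined above.

```python
def mixture_weights_from_sbr(sbr):
    """Return (pi_n, pi_u) such that pi_n / pi_u == sbr and pi_n + pi_u == 1."""
    pi_n = sbr / (1.0 + sbr)
    return pi_n, 1.0 - pi_n

N_P = 1000  # received photons per histogram
for sbr in (0.01, 0.5):
    pi_n, pi_u = mixture_weights_from_sbr(sbr)
    print(f"SBR={sbr}: about {N_P * pi_n:.1f} signal photons out of {N_P}")

# Dataset bookkeeping: 64 depths x 1000 samples per depth, 70/30 split.
n_samples = 64 * 1000
n_train = n_samples * 70 // 100
n_test = n_samples - n_train
```

At SBR = 0.01 only about 10 of the 1000 photons are expected to come from the laser, which is why the pulse width matters so much for whether a peak remains visible.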

The performance metrics used are relative mean expected absolute depth error (RMDE) and RMSE as in [^22], defined as

$$
\begin{align*} \begin{cases} \displaystyle \mathrm {RMDE}=\frac {1}{n\times N}\sum _{iu=1}^{n} | \mu _{e,iu}-\mu _{o,iu} | \\ \displaystyle \mathrm {RMSE}=\sqrt {\frac {1}{n}\sum _{iu=1}^{n} {\left ({{\mu _{e,iu}-\mu _{o,iu}}}\right )^{2}}} \end{cases} \tag {3}\end{align*}
$$

where $\mu_{o}$ is the original depth embedded in the dataset, $n$ is the total number of testing samples, and $iu$ denotes the index of the sample. In addition, we use the metric of accuracy (Acc5), defined as $(\text{number of samples with } |\mu_{e}-\mu_{o}| < 5~\text{bins} \,/\, \text{total number of samples}) \times 100\%$, to evaluate these methods more comprehensively. Note that the definitions of SBR and RMDE consider the entire range for the background light, which differs from the traditional definitions and is used only for a fair comparison with other methods using the same metric definitions. The estimated depth $\mu_{e}$ and the original depth $\mu_{o}$ are defined as the index of the bin where the peak is located, so RMSE is expressed in units of “bin” and no real distance or physical depth is considered during the simulation.

The width of the received Gaussian pulse, $W_{\mathrm{G}}$, is set to $\Delta$, i.e., $\exp[-(t-\mu)^{2}/\Delta]$, which is also the same as in [^22] for a fair comparison. Two full histogram examples corresponding to SBR = 0.01 and 0.5 are shown in Fig. 4(a). This setup of $W_{\mathrm{G}}=\Delta$ makes the task relatively simple even for SBR = 0.01, since the peak is easily distinguished in the histograms.
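The three metrics can be computed directly from arrays of estimated and original bin indices. The sketch below follows (3) and the Acc5 definition; the function name and the toy inputs are illustrative, and RMDE is returned as a fraction (multiply by 100 for a percentage).

```python
import numpy as np

def depth_metrics(mu_est, mu_true, n_bins):
    """Return (RMDE, RMSE, Acc5) for depths expressed as bin indices."""
    err = np.asarray(mu_est, dtype=float) - np.asarray(mu_true, dtype=float)
    rmde = np.mean(np.abs(err)) / n_bins        # mean absolute error relative to N
    rmse = np.sqrt(np.mean(err ** 2))           # in units of bins
    acc5 = 100.0 * np.mean(np.abs(err) < 5)     # percent of samples within 5 bins
    return rmde, rmse, acc5

# Toy example: four test samples, one of which is off by 10 bins.
rmde, rmse, acc5 = depth_metrics([10, 20, 30, 40], [10, 22, 30, 50], n_bins=1024)
```

A single large outlier dominates RMSE while lowering Acc5 by only one sample, which is why the two are reported together.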

![Figure 4](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/7361/11080690/11028946/feng4abcd-3575784-large.gif)

*Fig. 4. (a) 1024-bin full histograms with ${N}_{p}=1000$ and ${W}_{\text {G}}={\mathsf {\Delta }} $ . (b) 1024-bin full histograms with ${N}_{p}=6000$ and ${W}_{\text {G}}= 4{\mathsf {\Delta }} $ . (c) 1024-bin full histograms with ${N}_{p}=6000$ and ${W}_{\text {G}}= 6{\mathsf {\Delta }} $ . Multipeaks exist due to the extreme condition. (d) 1024-bin full histograms after FIR filtering with ${N}_{p}=6000$ and ${W}_{\text {G}}= 6{\mathsf {\Delta }} $ . Pseudo peaks are eliminated.*

The performances of these depth estimation methods are shown in Table I. For SBR =0.5, nearly all the methods achieve the Acc5 of 100% with small RMSEs and RMDEs, which further demonstrates the ideality of the imaging condition (${W}_{\mathrm {G}}=\Delta$). For SBR =0.01, the performances of quaternary zooming, sweep-based, histogram-less, and Gray coding greatly degrade to the Acc5’s of less than 65% and RMSEs over 200, which are much worse than the plain full histogram method. Adding the FIR filtering to the sweep-based method as in [^14] improves the RMDE by $2.65\times$, RMSE by $1.71\times$, and Acc5 by 14.58%. The sweep-based technique with 16 subranges can only use around 63 photons in each subrange, which results in suboptimal performance despite the use of FIR filtering.

![Table I](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/7361/11080690/11028946/feng.t1-3575784-large.gif)

*TABLE I*

The proposed autoencoder, on the other hand, achieves much better performance without the help of FIR filtering. We have varied the number of neurons and the activation function in each layer to find the best setup of the autoencoder. Acc5 of 97.13% with the CR of 128 [(1024)-8-128-1024] and Acc5 of 99.97% with the CR of 64 [(1024)-16-128-1024] are achieved by the autoencoders using the Hardtanh activation, which shows the superior performance improvement brought by the data-driven learning; (1024)-8-128-1024 represents the fully connected neural network model with 1024 input neurons, and 8 and 128 neurons in the first and second hidden layers, respectively, and 1024 output neurons.

To implement the proposed method efficiently in hardware, we quantize the encoder model (1024)-16-128-1024 to fixed-point arithmetic using the post-training quantization method of [^30] and evaluate the influence of quantization. Both the weights and the outputs of the activation layers need to be converted to fixed point for efficient hardware realization. The RMDE and RMSE versus bit-width on the dataset with $N=1024$, $N_{p}=1000$, $W_{\mathrm{G}}=\Delta$, and SBR = 0.01 are shown in Fig. 5. As the activation bit-width decreases from floating point to 2-bit fixed point, RMDE and RMSE gradually increase. As the weight bit-width decreases from 8 to 2 bits, the performance varies. A bit-width of 4 bits is selected for both activations and weights as a tradeoff between performance and hardware complexity.
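To give a feel for what 4-bit fixed-point weights look like, the sketch below applies a generic symmetric uniform quantizer to an array. This is a textbook scheme chosen for illustration and is an assumption throughout; it is not the post-training quantization method of [^30], whose details are not given here.

```python
import numpy as np

def quantize_symmetric(x, n_bits):
    """Generic symmetric uniform quantizer (illustrative, not the method of the cited work).

    Returns integer codes in [-2**(n_bits-1), 2**(n_bits-1) - 1] and the scale
    such that codes * scale approximates x.
    """
    x = np.asarray(x, dtype=float)
    q_max = 2 ** (n_bits - 1) - 1                   # 7 for 4 bits
    max_abs = np.max(np.abs(x))
    scale = max_abs / q_max if max_abs > 0 else 1.0
    codes = np.clip(np.round(x / scale), -q_max - 1, q_max).astype(int)
    return codes, scale

weights = np.linspace(-1.0, 1.0, 101)               # placeholder weight values
codes, scale = quantize_symmetric(weights, n_bits=4)
max_error = np.max(np.abs(codes * scale - weights))
```

With 4 bits there are at most 16 distinct codes, and the rounding error per weight is bounded by half the scale; lowering the bit-width enlarges the scale and therefore the error.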

![Figure 5](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/7361/11080690/11028946/feng5-3575784-large.gif)

*Fig. 5. Performance versus quantization bit-widths on dataset with ${N}={1024}$ , $ {N}_{p}=1000$ , ${W}_{\text {G}}={\mathsf {\Delta }} $ , and SBR =0.01.*

However, as illustrated above, this setup is still an overly ideal imaging condition. A $W_{\mathrm{G}}$ equal to $\Delta$ is difficult to achieve with typical laser emitters. To further evaluate the robustness of these depth estimation methods, we extend $W_{\mathrm{G}}$ to $4\Delta$ and $6\Delta$ while keeping the same SBRs, to simulate more challenging conditions that combine low SBR with no obvious peak. The number of received photons is increased to 6000 to adapt to the challenging imaging scenarios [^26]. The histograms after pulse extension are shown in Fig. 4(b) and (c). Multiple peaks appear when $W_{\mathrm{G}}= 6\Delta$ at SBR = 0.01, an extreme imaging condition under which even the plain full histogram method generates errors.

The performances of these depth estimation methods on this challenging dataset are shown in Table II. The autoencoder model (1024)-16-128-1024, trained and tested on the datasets with SBR = 0.01 and $W_{\mathrm{G}}= 6\Delta$, achieves Acc5 over 80%, which is close to that of the full histogram method. Increasing the CR to 128 dramatically degrades Acc5 to 44.20%. This model [(1024)-16-128-1024 trained on the datasets with SBR = 0.01 and $W_{\mathrm{G}}= 6\Delta$] is also tested on the dataset with SBR = 0.5 and $W_{\mathrm{G}}= 6\Delta$ to examine its transferability. RMDE = 0.002%, RMSE = 0.04, and Acc5 = 100% are achieved, which further shows the robustness of the proposed autoencoder method toward SBR changes. After adding FIR filtering, the pseudo peaks can be eliminated even in the extreme scene with $W_{\mathrm{G}}= 6\Delta$ and SBR = 0.01, as shown in Fig. 4(d). Therefore, the sweep-based method with FIR filtering as in [^14] achieves much better performance than the zoom, histogram-less, and Gray coding methods in terms of RMSE/RMDE/Acc5 under extreme illumination scenarios.
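The effect of FIR filtering on pseudo peaks can be illustrated with a toy histogram. The sketch below uses a simple moving-average (boxcar) filter with seven taps; the tap count, the filter shape, and the hand-made histogram are all assumptions for illustration, since the coefficients used in [^14] are not given here.

```python
import numpy as np

def fir_peak_bin(histogram, n_taps=7):
    """Smooth with a boxcar FIR filter (assumed taps) and return the argmax bin."""
    taps = np.ones(n_taps) / n_taps
    smoothed = np.convolve(histogram, taps, mode="same")
    return int(np.argmax(smoothed))

# Toy histogram: a broad, low return over bins 40..46 and a taller one-bin noise spike.
hist = np.zeros(100)
hist[40:47] = 5.0
hist[80] = 8.0

raw_bin = int(np.argmax(hist))      # picks the noise spike at bin 80
filtered_bin = fir_peak_bin(hist)   # picks the center of the broad return, bin 43
```

The filter rewards energy spread over a pulse-wide window, so an isolated spike no longer wins the argmax once the pulse spans several bins.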

![Table II](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/7361/11080690/11028946/feng.t2-3575784-large.gif)

*TABLE II*

The training of the abovementioned autoencoders was conducted using a laptop equipped with an NVIDIA GeForce RTX 3060 GPU, an Intel Core i7-12700H CPU, and two Samsung 8-GB DDR memories. A maximum of 200 epochs (1647.54 s) was required for the convergence of training these autoencoders with different sizes for various SBRs, $N_{p}$ values, and $W_{\mathrm{G}}$ values.

### B. Imaging Results and Comparisons

In this section, we assess the imaging performance of the proposed autoencoder by applying it to two illumination datasets.

The first set of data consists of physically accurate histogram images rendered with MitsubaToF [^32] ($\Delta =50$ ps and $N=2000$). To adapt to the full histogram with $N=2000$ bins in this dataset, we have extended the model to the shape of (2000)-20-160-2000. As shown in Table II, the full histogram method, the sweep-based method with FIR filtering [^14], and the proposed autoencoder achieve better performance in the challenging condition ($N_{p}=6000$, $W_{\mathrm{G}}= 6\Delta$, SBR = 0.01, and no obvious peak). Therefore, in this imaging experiment, we focus on comparing the proposed autoencoder with the full histogram method and the sweep-based method with FIR filtering. Since the proposed autoencoder method is a compressive histogramming method, the state-of-the-art compressive histogramming method, Gray coding [^22], is also implemented for comparison to show the performance improvement in challenging conditions in terms of compressive histogramming. The imaging result using the 4-bit quantized autoencoder is also included, which shows a negligible loss of accuracy. The estimated depth images are compared to the original depth embedded in the dataset to show the depth errors.

The depth images and depth errors obtained from five methods for the kitchen scene with SBR = 0.01 and 0.5, respectively, are shown in Fig. 6. The proposed autoencoder achieves the best RMDE/RMSE/Acc5 among the four methods at both SBR = 0.01 and 0.5. For SBR = 0.5, all four methods generate good depth images with Acc5 $> 95$%, while for SBR = 0.01, only the autoencoder and full histogram methods output depth images of good quality, which shows the superior robustness of the proposed autoencoder toward SBR changes.

![Figure 6](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/7361/11080690/11028946/feng6-3575784-large.gif)

*Fig. 6. Imaging comparisons. Depth images and depth errors for the scene in kitchen with SBR =0.01 and 0.5 using the full histogram, the sweep-based method w/FIR [14], the Gray-based coding [22], and the proposed autoencoder (floating point and fixed point), respectively. The best RMDE/RMSE/Acc5 among the four methods are denoted in red. Four representative histogram examples are also included.*

As shown in the four histogram examples, the pixels in the image can be divided into two categories: pixels generating histograms with obvious peaks and pixels without obvious peaks. According to the Acc5 results from the full histogram method, 17.76% and 3.15% of pixels produce absolute depth errors larger than 5 bins at SBR = 0.01 and 0.5, respectively. By applying the proposed autoencoder with high-level representation, however, these percentages decrease to 16.39% and 2.99%, respectively, which shows that the proposed data-driven, learning-based method can eliminate the influence of pseudo peaks to some extent.

To further evaluate the robustness toward scene changes that the autoencoder brings to compressive histogramming, it is tested on the polystyrene head dataset, which consists of images of a polystyrene head captured by a scanning LiDAR at Heriot-Watt University [^33]. The data cube has a width and height of 141 pixels and a total of $N=4613$ timestamps. A total acquisition time of 100 ms was used for each pixel, resulting in an average photon count of 337 with an SBR of approximately 6.82. This is much cleaner data with significantly fewer photon counts compared to the previously synthesized kitchen dataset.

First, we apply our autoencoder (2000)-20-160-2000 trained on the kitchen dataset [^32] directly (without retraining or parameter changes) to this new dataset [^33]. The number of neurons on the input and output layers is extended from 2000 to 2560 to realize the CR of 128, where the additional 560 neurons have connection weights of 0 to guarantee that the new model is the same as the original one. Since there is no embedded true depth, the estimated depth image obtained using a full-resolution histogram is adopted as the baseline, which is used to calculate the performance metrics. The 4-bit quantized autoencoder is also applied to show how the fixed-point autoencoder behaves with respect to the floating-point implementation. The Gray-based CB used for imaging in Fig. 6 is also applied to this polystyrene head dataset directly without parameter changes.
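The zero-padding step can be checked with a small sketch: appending zero-weight input neurons to the encoder leaves the latent code unchanged for any input, while the nominal CR becomes 2560/20 = 128. The random matrix below is a placeholder for the trained encoder weights, and the function name is hypothetical; how the 4613 recorded timestamps are mapped onto the model input is not covered here.

```python
import numpy as np

def pad_encoder_weights(w_enc, new_n_bins):
    """Append zero-weight input neurons so the encoder accepts new_n_bins inputs."""
    n_latent, n_bins = w_enc.shape
    padded = np.zeros((n_latent, new_n_bins), dtype=w_enc.dtype)
    padded[:, :n_bins] = w_enc
    return padded

rng = np.random.default_rng(0)
w_enc = rng.normal(size=(20, 2000))               # placeholder for trained weights
w_padded = pad_encoder_weights(w_enc, 2560)

y = rng.poisson(1.0, size=2560).astype(float)     # arbitrary 2560-bin input
latent_padded = w_padded @ y
latent_original = w_enc @ y[:2000]                # extra 560 bins contribute nothing
```

Because the added columns are zero, whatever falls into the extra 560 bins has no effect on the latent code, so the padded encoder behaves exactly like the original one.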

The 3-D plots of these methods are shown in Fig. 7. The proposed autoencoder achieves good RMDE/RMSE/Acc5, and the shape of the head is well defined. Quantization has a negligible influence on the imaging result. The 3-D plot obtained by directly applying the Gray-based CB from the previous dataset shows that the shape and depth of the head are not well defined, indicating the low transferability of the Gray-based method. Fig. 7 also includes the 3-D plot obtained with the readily available sketch-based CB released in [^20], which is adapted to the polystyrene head dataset. Since the sketch-based coding method (denoted as truncated Fourier coding in [^22]) is inferior to the Gray coding method, as demonstrated in [^22], this result indicates the performance improvement that the Gray coding method could reach after careful parameter adjustment.

![Figure 9](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/7361/11080690/11028946/feng7-3575784-large.gif)

*Fig. 7. Illumination compressive imaging. The 3-D plots of the polystyrene head with photon count = 337 and SBR = 6.82 using the autoencoder (floating point and fixed point) and the Gray-based method. The CBs of the two methods are from the imaging experiments in Fig. 6. The result from the sketch-based CB, released in [20] and adapted to the polystyrene head dataset, is also included.*

It is worth noting that the recorded data of the polystyrene head contain much sparser photons (fewer than 1200 photons, 337 on average, across 4613 bins). A typical histogram example is included in Fig. 7, which is quite different from the histograms in Figs. 4 and 6. The imaging results demonstrate the robustness of the proposed data-driven autoencoder: our autoencoder-based method can be directly and successfully applied to another scene with a different SBR and different histogram envelopes. Note that this robustness of the autoencoder toward scene changes does not concern motion artifacts, which is in accordance with the previous works [^9], [^10], [^11], [^12], [^13], [^14], [^20], [^21], [^22], [^23].

According to these results, we can conclude that the proposed autoencoder can compress the histogram and reconstruct the depth image with error levels similar to those of the full histogram method, even in extremely challenging imaging scenarios, and that it outperforms the sweep- and Gray-based compression methods in terms of robustness. The circuit design of the compact autoencoder engine is described in detail in Section IV.

## SECTION IV. Circuit Design of the Compact Compression Autoencoder Engine

### A. Circuit Design of the Compression Engine

According to (2), the compressive histogramming process can be realized on the fly for each received photon of the pixel. Computing the up-to-date coded histogram bin values ($p_{k}^{\text {new}}$, $k=0\sim K-1$) from the timestamp ($y_{j}^{c}$) is an accumulation operation, $p_{k}^{\text {new}}=p_{k}^{\text {old}}+y_{j}^{c}C_{j}^{k}$. In the circuit implementation, $y_{j}^{c}$ is the index of the *j*th timestamp (the output of a TDC), which is used as the address to read out the *K* codes from the *j*th row of the *K* CB memories ($C_{j}^{k}$). The *K* codes are added one by one to the previously stored coded histogram bin values ($p_{k}^{\text {old}}$, $k=0\sim K-1$) to update $p_{0\sim K-1}$ sequentially.
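The accumulation in (2) can be sketched in Python as follows. This is a behavioral illustration rather than the RTL of the engine: the toy codebook values are assumptions, `update_coded_histogram` and `encode_full_histogram` are hypothetical names, and each detected photon is assumed to contribute exactly one copy of codebook row *j*. The sketch also compares the streamed result with the reference path that first builds the full histogram and then projects it.

```python
def update_coded_histogram(coded_hist, codebook, timestamp):
    """Add codebook row `timestamp` to the K coded bins in place, one bin per step."""
    row = codebook[timestamp]          # the timestamp acts as the read address
    for k in range(len(coded_hist)):   # K sequential read-add-write steps
        coded_hist[k] += row[k]        # p_k_new = p_k_old + C_j^k


def encode_full_histogram(codebook, timestamps):
    """Reference path: build the N-bin histogram, then project it onto the codebook."""
    n, k = len(codebook), len(codebook[0])
    hist = [0] * n
    for t in timestamps:
        hist[t] += 1
    return [sum(hist[j] * codebook[j][col] for j in range(n)) for col in range(k)]


# Toy codebook (assumed values): N = 6 time bins, K = 3 coded bins.
codebook = [
    [1, 0, -1],
    [2, 1, 0],
    [0, -1, 3],
    [-2, 2, 1],
    [1, 1, 1],
    [0, 3, -2],
]
timestamps = [3, 3, 1, 5, 3, 0]  # TDC outputs for six detected photons

streamed = [0, 0, 0]
for t in timestamps:
    update_coded_histogram(streamed, codebook, t)

reference = encode_full_histogram(codebook, timestamps)
```

Both paths give the same coded bins, while the streaming path never stores the *N*-bin histogram; this is why only *K* words per pixel need to be kept.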

The resulting architecture of the proposed compact autoencoder engine is shown in Fig. 8. This simple compression engine is composed of *K* shared CB memories with a depth of *N*, a controller (“Ctrl.” block), an address generator (“*K* Addr. Gen.” block), an accumulator (“Accum.” block), and $N_{\text {px}}$ coded histogram (cH) memories with a depth of *K*. In this way, $N_{\text {px}}$ timestamps (i.e., received photons) can be processed in the engine within one illumination cycle.

![Figure 10](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/7361/11080690/11028946/feng8-3575784-large.gif)

*Fig. 8. Circuit design of the compact autoencoder-based histogram compression engine.*

The processing timing diagram of the engine is shown in Fig. 9. The $N_{\text {px}}$ timestamps are read into the engine sequentially. Once a timestamp “Tstp#*i*” ($i=1,\ldots,N_{\text {px}}$) is read in through the multiplexer, the read-enable signal “Rd” for the *K* shared CB memories and the “Trig” signal for the address generator become valid. The timestamp “Tstp#*i*” is used as the address to read the *K* codes from the *K* shared CB memories sequentially. In the *K* Addr. Gen. block, addresses from 1 to *K* (A1~A*K* in Fig. 9) are generated sequentially to read the *K* original coded histogram bins from cH Mem.#*i*. Because “Rd” and “Trig” are enabled at the same time, the *K* codes from the CB memories and the *K* original coded histogram bins from cH Mem.#*i* are aligned, so they can be added one-to-one to form the *K* updated coded histogram bins, realizing the compressive histogram formation process in (2). After A*K* is generated, the update of the coded histogram is complete. Then, the next timestamp “Tstp#($i+1$)” is read in and the procedure repeats to update cH Mem.#($i+1$).
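The sequencing in Fig. 9 can be mimicked with a small cycle-counting model. The Python sketch below is a simplified behavioral model under stated assumptions: one clock cycle per coded-bin update, no pipeline or multiplexer latency, zero-based addresses instead of A1~A*K*, and hypothetical names (`process_illumination`, `ch_mems`). It only shows that timestamp *i* updates cH Mem.#*i* over *K* consecutive addresses, so one illumination costs $K\times N_{\text {px}}$ update cycles.

```python
def process_illumination(codebook, ch_mems, timestamps):
    """Update one coded-histogram memory per timestamp; return the cycle count used."""
    k = len(codebook[0])
    cycles = 0
    for i, tstp in enumerate(timestamps):   # Tstp#1 ... Tstp#N_px, read in sequentially
        for addr in range(k):               # addresses from the address generator
            code = codebook[tstp][addr]     # "Rd": code read from the shared CB memory
            ch_mems[i][addr] += code        # accumulate into cH Mem.#i at the same address
            cycles += 1                     # assumption: one cycle per coded bin
    return cycles


# Toy sizes (assumed): N = 4 time bins, K = 2 coded bins, N_px = 3 pixels.
codebook = [[1, 0], [0, 1], [1, 1], [-1, 2]]
ch_mems = [[0, 0] for _ in range(3)]

# Two consecutive illuminations, one timestamp per pixel in each.
cycles_first = process_illumination(codebook, ch_mems, [2, 0, 3])
cycles_second = process_illumination(codebook, ch_mems, [2, 1, 3])
```

Each call consumes $K\times N_{\text {px}}=6$ cycles in this toy setup, and the coded histograms keep accumulating across illuminations.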

![Figure 11](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/7361/11080690/11028946/feng9-3575784-large.gif)

*Fig. 9. Timing diagram of the histogram compression engine.*

The proposed engine encodes the $N_{\text {px}}$ timestamps into the coded histograms sequentially within each illumination cycle, so there is no restriction on whether the $N_{\text {px}}$ timestamps come from the same or different SPADs (i.e., on the type of SPAD detector). If the $N_{\text {px}}$ timestamps come from $N_{\text {px}}'$ ($\lt N_{\text {px}}$) SPADs (i.e., there are multievent SPADs among the $N_{\text {px}}'$ SPADs), then some of the $N_{\text {px}}$ coded histogram memories are not used within one illumination. In our hardware experiment, the $N_{\text {px}}$ timestamps come from $N_{\text {px}}$ different single-event SPADs to match our previous sensor chip [^26], which can suppress the pile-up effect even in the single-event mode. This setup maximizes the number of coded histogram memories in use and minimizes the average memory consumed per SPAD. Assuming that the bit-widths of the CB memory and the cH memory are both bw, the total number of bits for processing the $N_{\text {px}}$ pixels is $(K\times N+K\times N_{\text {px}})\times \text{bw}$, i.e., $(K\times N/N_{\text {px}}+K)\times \text{bw}$ bits per pixel. For example, to implement the 4-bit quantized (1024)-16-128-1024 autoencoder model for processing 64 pixels per illumination, $N=1024$, $K=16$, $N_{\text {px}}=64$, and bw = 4 bits, so 68 kb is required for storing all the content, or 1.0625 kb per pixel. Note that CR $=N/K$ represents the theoretical limit of memory reduction, which is approached when the CB is shared by thousands of pixels ($N_{\text {px}}\gg N$). For the above example, the required memory size is reduced by $3.76\times$ if the full histogram method uses 4-bit bin values (68 kb versus 256 kb $= 64\times 1024\times 4$ bits).
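The memory arithmetic above can be checked with a few lines of Python. The function names are hypothetical, and 1 kb is taken as 1024 bits, an assumption that reproduces the 68-kb and 256-kb figures quoted in the text.

```python
def engine_memory_bits(n, k, n_px, bw):
    """Shared codebook (K x N words) plus N_px coded histograms (K words each)."""
    return (k * n + k * n_px) * bw


def full_histogram_bits(n, n_px, bw):
    """One N-bin histogram per pixel."""
    return n * n_px * bw


N, K, N_PX, BW = 1024, 16, 64, 4   # example values from the text
KB = 1024                          # assumption: 1 kb = 1024 bits

total_kb = engine_memory_bits(N, K, N_PX, BW) / KB     # total engine memory in kb
per_pixel_kb = total_kb / N_PX                         # average memory per pixel in kb
full_kb = full_histogram_bits(N, N_PX, BW) / KB        # full-histogram memory in kb
reduction = full_kb / total_kb                         # memory reduction factor
```

With these inputs the sketch yields 68 kb in total, 1.0625 kb per pixel, and a reduction factor of about 3.76 relative to the 256-kb full histogram.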

Since the coded histograms are updated sequentially, reading out all the coded histograms takes the same amount of time as updating them. Therefore, half of the illumination period is reserved for reading out all the coded histograms in the $N_{i}$th illumination. $N_{i}$ can take any value to suit various experimental setups; this is a conservative design that can be improved for a fixed $N_{i}$.

Assuming an interframe rate [^34] (illumination repetition rate) of 180 kHz, which allows a maximum of 6k interframes to adapt to the challenging scenes illustrated above (low SBR and no obvious peaks) while reaching a true frame rate of 30 frames/s, the clock frequency of the compression engine’s “CLK” is $2\times 180\times K\times N_{\text {px}}$ kHz. The factor of 2 accounts for the scanning readout method, which needs half of the illumination period to read out cH Mem.#$1\sim N_{\text {px}}$. For the encoder model with a depth resolution of 1024 and a CR of 64 ($K=16$), a clock frequency of 368.64 MHz guarantees an $N_{\text {px}}$ of 64. Note that the frequency calculation assumes the 180-kHz illumination repetition rate and 6k interframes to adapt to extremely low SBR scenes. Decreasing the number of interframes (i.e., the laser repetition rate) in scenes with relatively higher SBR increases the number of pixels that can be processed within an illumination period while keeping the same CR. The 368.64-MHz clock frequency is not the limiting factor of the proposed method, since a 3-D-stacked SPAD-based LiDAR can address this challenge easily, with the CMOS-SPAD using the 90–180-nm technology nodes and the compression engine using the 28-nm technology node.
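The clock budget can be reproduced with a short Python sketch. The function names are hypothetical, and the 90-kHz case is an assumed example of a halved repetition rate used only to illustrate the trade-off between repetition rate and the number of pixels served by the same clock.

```python
def engine_clock_hz(rep_rate_hz, k, n_px, readout_factor=2):
    """Clock needed to update K coded bins for N_px timestamps per illumination.

    `readout_factor` = 2 reserves half of each illumination period for readout.
    """
    return readout_factor * rep_rate_hz * k * n_px


def max_pixels(clock_hz, rep_rate_hz, k, readout_factor=2):
    """Largest N_px that fits in one illumination period at a given clock."""
    return clock_hz // (readout_factor * rep_rate_hz * k)


REP_RATE_HZ = 180_000    # illumination repetition rate from the text
INTERFRAMES = 6_000      # interframes accumulated per depth frame
K = 16                   # coded bins for N = 1024 at CR = 64

clock_hz = engine_clock_hz(REP_RATE_HZ, K, 64)
frame_rate = REP_RATE_HZ / INTERFRAMES

# Assumed example: halving the repetition rate at the same clock frequency.
pixels_at_half_rate = max_pixels(clock_hz, REP_RATE_HZ // 2, K)
```

Here `clock_hz` equals 368.64 MHz and `frame_rate` equals 30 frames/s, matching the figures above; at the assumed 90-kHz rate the same clock would serve 128 pixels per illumination.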

### B. Quantized Realization in FPGA

The 4-bit fixed-point compression engine with $N=1024$ and $K=16$ is then coded in Verilog HDL and implemented on a Xilinx KC705 FPGA. Similar to the imaging experiment in Section III-B, we focus on comparing the proposed compression engine with the circuits for the TCSPC-based full histogramming and sweep-based partial histogramming methods. For a fair comparison, a 4-bit bin value is used in the two circuits. In practice, the bin values of the full histogram and sweep-based methods would need to be much wider ($8\sim 10$ bits) to record enough photon counts in challenging scenarios.

The resource utilization of the three circuits is shown in Table III. In terms of logic resources, the three circuits consume a similar number of lookup tables (LUTs) and flip-flops (FFs) for processing the maximum of 64 photons from the 64 pixels per illumination. The proposed compression engine requires the same number of LUTs and FFs when the number of pixels (photons) processed within one cycle increases, whereas the logic resources consumed by the full histogram and sweep-based circuits increase linearly with the number of pixels (photons) processed within one illumination cycle. As shown in Table III, by decreasing the interframe rate from 6 kHz to 3 kHz, 1.5 kHz, and 750 Hz, respectively, 128, 256, and 512 pixels (photons) can be processed within one illumination cycle, with only the memory requirement increasing.

In terms of memory size, 4, 8, 16, and 32 kb are required for 64, 128, 256, and 512 photons per illumination cycle, respectively, to store the encoded histograms in the proposed engine. In contrast, the full histogram circuit requires 256 kb, 512 kb, 1 Mb, and 2 Mb. Even when the shared weight memory with its constant size of 64 kb is included, the memory size of the compression engine is $3.76\times$ smaller than that of the full histogram when processing 64 pixels (photons) within one illumination cycle [68 kb $= (64\text{ k} + 64\times 16\times 4)$ bits for the autoencoder versus 256 kb $= 64\times 1024\times 4$ bits for the full histogram]. With 512 pixels (photons) processed in one illumination cycle, the memory size is $21.33\times$ smaller (96 kb versus 2 Mb). The sweep-based circuit requires the least memory when processing $\le 341$ photons per cycle, since $N_{\text {px}}'\times 64\times 4$ bits $\lt 64\text{ k}+N_{\text {px}}'\times 16\times 4$ bits holds for $N_{\text {px}}'\lt 341.3$. However, when the number of photons is larger than 341, the proposed engine requires less memory, since its memory grows $4\times$ more slowly with the number of photons than that of the sweep-based circuit [64 b/pixel (photon) versus 256 b/pixel (photon)]. Moreover, the proposed engine is much more accurate and robust than the sweep-based circuit, as evaluated in Section III. In short, the proposed compression engine is more logic- and memory-efficient.

The power consumption of the proposed histogram compression engine is 73 mW for processing 64 pixels per illumination, consisting of 4 mW consumed by the logic circuit and 69 mW consumed by the 68-kb memory. As the number of processed pixels per illumination increases, only the required memory size increases. Therefore, the logic circuit consumes a constant 4 mW, while the power consumption of the memory increases linearly with the number of processed pixels.
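The memory scaling in the comparison above can be checked numerically. The Python sketch below encodes the three per-pixel costs quoted in the text (64 bits for the proposed engine plus the shared 64-kb codebook, 256 bits for the sweep-based circuit, and 4096 bits for the full histogram); the function names are hypothetical, and 1 kb is assumed to be 1024 bits.

```python
CB_BITS = 16 * 1024 * 4   # shared codebook: K x N x 4 bits = 64 kb


def proposed_bits(n_px):
    """Shared codebook plus K = 16 four-bit coded bins per pixel."""
    return CB_BITS + n_px * 16 * 4


def sweep_bits(n_px):
    """Sweep-based partial histogram: 64 four-bit bins per pixel."""
    return n_px * 64 * 4


def full_bits(n_px):
    """Full histogram: 1024 four-bit bins per pixel."""
    return n_px * 1024 * 4


# Smallest pixel count at which the proposed engine needs less memory than the sweep circuit.
break_even = next(n for n in range(1, 10_000) if proposed_bits(n) < sweep_bits(n))

# Memory reduction versus the full histogram at 512 pixels per illumination.
ratio_512 = full_bits(512) / proposed_bits(512)
```

The search returns 342 as the first pixel count favoring the proposed engine, consistent with the $\le 341$ bound, and `ratio_512` is about 21.33 (96 kb versus 2 Mb).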

![Figure 12](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/7361/11080690/11028946/feng.t3-3575784-large.gif)

*TABLE III*

### C. Real-Time Depth Imaging and Depth Accuracy Evaluation of the Compression Engine

The line scanning LiDAR system is constructed by connecting the proposed compression engine implemented in FPGA to our previous $32\times 32$ SPAD sensor chip [^26], as shown in Fig. 10. This sensor chip has a per-pixel noise filtering circuit that can suppress pile-up induced by strong background light. An RGB photograph of a typical scene and its $32\times 32$ depth image reconstructed from the coded histogram are included in Fig. 10 to show the good imaging quality of the system using the proposed compression engine.

![Figure 13](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/7361/11080690/11028946/feng10-3575784-large.gif)

*Fig. 10. LiDAR system construction and real-time imaging.*

To further evaluate the quantitative performance of the 4-bit fixed-point engine in FPGA, a movable whiteboard (80% reflectivity) is placed at equal intervals from 10 to 14 m (five distances in total) within the field of view of the LiDAR receiver, as shown in Fig. 11. The actual distance, measured using a perambulator, serves as the ground truth. At each distance, 30 sets of histograms are collected under two distinct scenes: one represents relatively ideal imaging conditions, characterized by $N_{p}=1000$, $W_{\mathrm {LE}}$ (the width of the emitted laser pulse) $=5$ ns, and SBR $=0.1$; the other represents challenging imaging conditions, with $N_{p}=1000$, $W_{\mathrm {LE}}=10$ ns, and SBR $=0.01$. The experimental results show that the proposed compression engine achieves a level of depth accuracy comparable to that of the TCSPC-based full histogram method at a high CR of 64. Compared to the sweep-based partial histogramming method, the proposed approach achieves enhanced depth accuracy and a $4\times$ improvement in CR, which is in accordance with the simulation results.

![Figure 14](https://ieeexplore.ieee.org/mediastore/IEEE/content/media/7361/11080690/11028946/feng11-3575784-large.gif)

*Fig. 11. Depth accuracy evaluation setup. A reconstructed depth image example. The depth accuracy summary and comparison table.*

## SECTION V. Conclusion

This article proposes a robust, data-driven compressive histogramming method based on the autoencoder. The relation between depth estimation performance and encoder size is comprehensively evaluated. Achieving quantitative performance similar to that of the full histogram method at a CR of 64 across various imaging conditions, the proposed method outperforms handcrafted analytical CBs and is robust across scene changes, especially in challenging scenes with low signal-to-background ratios. A compact compression engine design is also proposed, quantized to 4 bits, and implemented in FPGA. The proposed engine achieves a $3.76\sim 21.33\times$ memory size reduction compared to the full histogram circuit. Compared to the sweep-based partial histogramming circuit, a $4\times$ improvement in CR and a 65% reduction in RMSE are achieved.

## References

[^1]: M. Ghioni, A. Gulinatti, I. Rech, F. Zappa, and S. Cova, “Progress in silicon single-photon avalanche diodes,” IEEE J. Sel. Topics Quantum Electron., vol. 13, no. 4, pp. 852–862, Jul./Aug. 2007. [IEEE](https://ieeexplore.ieee.org/document/4305219) [Google Scholar](https://scholar.google.com/scholar?as_q=Progress+in+silicon+single-photon+avalanche+diodes&as_occt=title&hl=en&as_sdt=0%2C31)

[^2]: G. Chen, C. Wiede, and R. Kokozinski, “Data processing approaches on SPAD-based d-TOF LiDAR systems: A review,” IEEE Sensors J., vol. 21, no. 5, pp. 5656–5667, Mar. 2021. [IEEE](https://ieeexplore.ieee.org/document/9261382) [Google Scholar](https://scholar.google.com/scholar?as_q=Data+processing+approaches+on+SPAD-based+d-TOF+LiDAR+systems%3A+A+review&as_occt=title&hl=en&as_sdt=0%2C31)

[^3]: S. Cova, M. Ghioni, A. Lacaita, C. Samori, and F. Zappa, “Avalanche photodiodes and quenching circuits for single-photon detection,” Appl. Opt., vol. 35, no. 12, pp. 1956–1976, Apr. 1996. [DOI](https://doi.org/10.1364/AO.35.001956) [Google Scholar](https://scholar.google.com/scholar?as_q=Avalanche+photodiodes+and+quenching+circuits+for+single-photon+detection&as_occt=title&hl=en&as_sdt=0%2C31)

[^4]: S. Pellegrini, G. S. Buller, J. M. Smith, A. M. Wallace, and S. Cova, “Laser-based distance measurement using picosecond resolution time-correlated single-photon counting,” Meas. Sci. Technol., vol. 11, no. 6, pp. 712–716, Jun. 2000. [DOI](https://doi.org/10.1088/0957-0233/11/6/314) [Google Scholar](https://scholar.google.com/scholar?as_q=Laser-based+distance+measurement+using+picosecond+resolution+time-correlated+single-photon+counting&as_occt=title&hl=en&as_sdt=0%2C31)

[^5]: G. Boso, A. Dalla Mora, A. Della Frera, and A. Tosi, “Fast-gating of single-photon avalanche diodes with 200ps transitions and 30ps timing jitter,” Sens. Actuators A, Phys., vol. 191, pp. 61–67, Mar. 2013. [DOI](https://doi.org/10.1016/j.sna.2012.11.042) [Google Scholar](https://scholar.google.com/scholar?as_q=Fast-gating+of+single-photon+avalanche+diodes+with+200ps+transitions+and+30ps+timing+jitter&as_occt=title&hl=en&as_sdt=0%2C31)

[^6]: E. Charbon, “Single-photon imaging in complementary metal oxide semiconductor processes,” Phil. Trans. Roy. Soc. A, Math., Phys. Eng. Sci., vol. 372, no. 2012, Mar. 2014, Art. no. 20130100. [DOI](https://doi.org/10.1098/rsta.2013.0100) [Google Scholar](https://scholar.google.com/scholar?as_q=Single-photon+imaging+in+complementary+metal+oxide+semiconductor+processes&as_occt=title&hl=en&as_sdt=0%2C31)

[^7]: M. Sicre, “Dark count rate in single-photon avalanche diodes: Characterization and modeling study,” in Proc. IEEE 51st Eur. Solid-State Device Res. Conf. (ESSDERC), Sep. 2021, pp. 143–146. [IEEE](https://ieeexplore.ieee.org/document/9631797) [Google Scholar](https://scholar.google.com/scholar?as_q=Dark+count+rate+in+single-photon+avalanche+diodes%3A+Characterization+and+modeling+study&as_occt=title&hl=en&as_sdt=0%2C31)

[^8]: W. Becker, Advanced Time-Correlated Single Photon Counting Techniques. Berlin, Germany: Springer, 2005. [DOI](https://doi.org/10.1007/3-540-28882-1) [Google Scholar](https://scholar.google.com/scholar?as_q=Advanced+Time-Correlated+Single+Photon+Counting+Techniques&as_occt=title&hl=en&as_sdt=0%2C31)

[^9]: O. Kumagai, “A 189 × 600 back-illuminated stacked SPAD direct time-of-flight depth sensor for automotive LiDAR systems,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2021, pp. 110–112. [IEEE](https://ieeexplore.ieee.org/document/9365961) [Google Scholar](https://scholar.google.com/scholar?as_q=A+189+%C3%97+600+back-illuminated+stacked+SPAD+direct+time-of-flight+depth+sensor+for+automotive+LiDAR+systems&as_occt=title&hl=en&as_sdt=0%2C31)

[^10]: S. W. Hutchings, “A reconfigurable 3-D-stacked SPAD imager with in-pixel histogramming for flash LiDAR or high-speed time-of-flight imaging,” IEEE J. Solid-State Circuits, vol. 54, no. 11, pp. 2947–2956, Nov. 2019. [IEEE](https://ieeexplore.ieee.org/document/8848491) [Google Scholar](https://scholar.google.com/scholar?as_q=A+reconfigurable+3-D-stacked+SPAD+imager+with+in-pixel+histogramming+for+flash+LiDAR+or+high-speed+time-of-flight+imaging&as_occt=title&hl=en&as_sdt=0%2C31)

[^11]: C. Zhang, S. Lindner, I. M. Antolovic, J. M. Pavia, M. Wolf, and E. Charbon, “A 30-frames/s, 252 × 144 SPAD flash LiDAR with 1728 dual-clock 48.8-ps TDCs, and pixel-wise integrated histogramming,” IEEE J. Solid-State Circuits, vol. 54, no. 4, pp. 1137–1151, Apr. 2019. [IEEE](https://ieeexplore.ieee.org/document/8595429) [Google Scholar](https://scholar.google.com/scholar?as_q=A+30-frames%2Fs%2C+252+%C3%97+144+SPAD+flash+LiDAR+with+1728+dual-clock+48.8-ps+TDCs%2C+and+pixel-wise+integrated+histogramming&as_occt=title&hl=en&as_sdt=0%2C31)

[^12]: S. Park, “An 80 × 60 flash LiDAR sensor with in-pixel delta-intensity quaternary search histogramming TDC,” IEEE J. Solid-State Circuits, vol. 57, no. 11, pp. 3200–3211, Nov. 2022. [IEEE](https://ieeexplore.ieee.org/document/9882175) [Google Scholar](https://scholar.google.com/scholar?as_q=An+80+%C3%97+60+flash+LiDAR+sensor+with+in-pixel+delta-intensity+quaternary+search+histogramming+TDC&as_occt=title&hl=en&as_sdt=0%2C31)

[^13]: I. Gyongy, A. Erdogan, N. A. W. Dutton, H. Mai, F. M. D. Rocca, and R. K. Henderson, “A 200 kFPS, 256× 128 SPAD dToF sensor with peak tracking and smart readout,” in Proc. Int. Image Sensor Workshop, 2021, pp. 1–5. [Google Scholar](https://scholar.google.com/scholar?as_q=A+200+kFPS%2C+256%C3%97+128+SPAD+dToF+sensor+with+peak+tracking+and+smart+readout&as_occt=title&hl=en&as_sdt=0%2C31)

[^14]: D. Stoppa, “A reconfigurable QVGA/Q3VGA direct time-of-flight 3D imaging system with on-chip depth-map computation in 45/40 nm 3D-stacked BSI SPAD CMOS,” in Proc. Int. Image Sensor Workshop, 2021, pp. 1–4. [Google Scholar](https://scholar.google.com/scholar?as_q=A+reconfigurable+QVGA%2FQ3VGA+direct+time-of-flight+3D+imaging+system+with+on-chip+depth-map+computation+in+45%2F40+nm+3D-stacked+BSI+SPAD+CMOS&as_occt=title&hl=en&as_sdt=0%2C31)

[^15]: I. Gyongy, N. A. W. Dutton, and R. K. Henderson, “Direct time-of-flight single-photon imaging,” IEEE Trans. Electron Devices, vol. 69, no. 6, pp. 2794–2805, Jun. 2022. [IEEE](https://ieeexplore.ieee.org/document/9650745) [Google Scholar](https://scholar.google.com/scholar?as_q=Direct+time-of-flight+single-photon+imaging&as_occt=title&hl=en&as_sdt=0%2C31)

[^16]: A. Ingle and D. Maier, “Count-free single-photon 3D imaging with race logic,” IEEE Trans. Pattern Anal. Mach. Intell., early access, Oct. 7, 2024, doi: 10.1109/TPAMI.2023.3302822. [IEEE](https://ieeexplore.ieee.org/document/10210115) [Google Scholar](https://scholar.google.com/scholar?as_q=Count-free+single-photon+3D+imaging+with+race+logic&as_occt=title&hl=en&as_sdt=0%2C31)

[^17]: A. Tontini, S. Mazzucchi, R. Passerone, N. Broseghini, and L. Gasparini, “Histogram-less LiDAR through SPAD response linearization,” IEEE Sensors J., vol. 24, no. 4, pp. 4656–4669, Feb. 2024. [IEEE](https://ieeexplore.ieee.org/document/10375298) [Google Scholar](https://scholar.google.com/scholar?as_q=Histogram-less+LiDAR+through+SPAD+response+linearization&as_occt=title&hl=en&as_sdt=0%2C31)

[^18]: T. Milanese, J. Zhao, B. Hearn, and E. Charbon, “Histogram-less direct time-of-flight imaging based on a machine learning processor on FPGA,” in Proc. Int. Image Sensor Workshop, 2023, pp. 1–4. [Google Scholar](https://scholar.google.com/scholar?as_q=Histogram-less+direct+time-of-flight+imaging+based+on+a+machine+learning+processor+on+FPGA&as_occt=title&hl=en&as_sdt=0%2C31)

[^19]: J. MacLean, B. Stewart, and I. Gyongy, “TDC-less direct time-of-flight imaging using spiking neural networks,” 2024, arXiv:2401.10793. [IEEE](https://ieeexplore.ieee.org/document/10678856) [Google Scholar](https://scholar.google.com/scholar?as_q=TDC-less+direct+time-of-flight+imaging+using+spiking+neural+networks&as_occt=title&hl=en&as_sdt=0%2C31)

[^20]: M. P. Sheehan, J. Tachella, and M. E. Davies, “A sketching framework for reduced data transfer in photon counting LiDAR,” IEEE Trans. Comput. Imag., vol. 7, pp. 989–1004, 2021. [IEEE](https://ieeexplore.ieee.org/document/9541047) [Google Scholar](https://scholar.google.com/scholar?as_q=A+sketching+framework+for+reduced+data+transfer+in+photon+counting+LiDAR&as_occt=title&hl=en&as_sdt=0%2C31)

[^21]: J. Tachella, M. P. Sheehan, and M. E. Davies, “Sketched RT3D: How to reconstruct billions of photons per second,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2022, pp. 1566–1570. [IEEE](https://ieeexplore.ieee.org/document/9746304) [Google Scholar](https://scholar.google.com/scholar?as_q=Sketched+RT3D%3A+How+to+reconstruct+billions+of+photons+per+second&as_occt=title&hl=en&as_sdt=0%2C31)

[^22]: F. Gutierrez-Barragan, A. Ingle, T. Seets, M. Gupta, and A. Velten, “Compressive single-photon 3D cameras,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), New Orleans, LA, USA, Jun. 2022, pp. 17833–17843. [IEEE](https://ieeexplore.ieee.org/document/9878523) [Google Scholar](https://scholar.google.com/scholar?as_q=Compressive+single-photon+3D+cameras&as_occt=title&hl=en&as_sdt=0%2C31)

[^23]: V. Poisson, V. T. Nguyen, W. Guicquero, and G. Sicard, “Luminance-depth reconstruction from compressed time-of-flight histograms,” IEEE Trans. Comput. Imag., vol. 8, pp. 148–161, 2022. [IEEE](https://ieeexplore.ieee.org/document/9706248) [Google Scholar](https://scholar.google.com/scholar?as_q=Luminance-depth+reconstruction+from+compressed+time-of-flight+histograms&as_occt=title&hl=en&as_sdt=0%2C31)

[^24]: K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778. [IEEE](https://ieeexplore.ieee.org/document/7780459) [Google Scholar](https://scholar.google.com/scholar?as_q=Deep+residual+learning+for+image+recognition&as_occt=title&hl=en&as_sdt=0%2C31)

[^25]: G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, Jul. 2006. [DOI](https://doi.org/10.1126/science.1127647) [Google Scholar](https://scholar.google.com/scholar?as_q=Reducing+the+dimensionality+of+data+with+neural+networks&as_occt=title&hl=en&as_sdt=0%2C31)

[^26]: J. Hu, B. Liu, R. Ma, M. Liu, and Z. Zhu, “A 32 × 32-pixel flash LiDAR sensor with noise filtering for high-background noise applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 69, no. 2, pp. 645–656, Feb. 2022. [IEEE](https://ieeexplore.ieee.org/document/9321181) [Google Scholar](https://scholar.google.com/scholar?as_q=A+32+%C3%97+32-pixel+flash+LiDAR+sensor+with+noise+filtering+for+high-background+noise+applications&as_occt=title&hl=en&as_sdt=0%2C31)

[^27]: H. Seo, “Direct TOF scanning LiDAR sensor with two-step multievent histogramming TDC and embedded interference filter,” IEEE J. Solid-State Circuits, vol. 56, no. 4, pp. 1022–1035, Apr. 2021. [IEEE](https://ieeexplore.ieee.org/document/9324814) [Google Scholar](https://scholar.google.com/scholar?as_q=Direct+TOF+scanning+LiDAR+sensor+with+two-step+multievent+histogramming+TDC+and+embedded+interference+filter&as_occt=title&hl=en&as_sdt=0%2C31)

[^28]: Y. Altmann and S. McLaughlin, “Range estimation from single-photon LiDAR data using a stochastic EM approach,” in Proc. 26th Eur. Signal Process. Conf. (EUSIPCO), Sep. 2018, pp. 1112–1116. [IEEE](https://ieeexplore.ieee.org/document/8553536) [Google Scholar](https://scholar.google.com/scholar?as_q=Range+estimation+from+single-photon+LiDAR+data+using+a+stochastic+em+approach&as_occt=title&hl=en&as_sdt=0%2C31)

[^29]: P. Mirdehghan, W. Chen, and K. N. Kutulakos, “Optimal structured light a la carte,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 6248–6257. [IEEE](https://ieeexplore.ieee.org/document/8578752) [Google Scholar](https://scholar.google.com/scholar?as_q=Optimal+structured+light+a+la+carte&as_occt=title&hl=en&as_sdt=0%2C31)

[^30]: Y. Li, “BRECQ: Pushing the limit of post-training quantization by block reconstruction,” in Proc. Int. Conf. Learn. Represent., 2021, pp. 1–16. [Google Scholar](https://scholar.google.com/scholar?as_q=BRECQ%3A+Pushing+the+limit+of+post-training+quantization+by+block+reconstruction&as_occt=title&hl=en&as_sdt=0%2C31)

### Additional References