• Survey Report on the Research Status and Development Trends of 3D Point Cloud/Image Accelerator Chips (2022-2023)

    1. Literature in Chronological Order, Paper by Paper

    Research Published in 2022

    1. A Low-Power Graph Convolutional Network Processor With Sparse Grouping for 3D Point Cloud Semantic Segmentation in Mobile Devices (January 2022)

    2. A Block PatchMatch-Based Energy-Resource Efficient Stereo Matching Processor on FPGA (March 2022)

    3. Processing-in-SRAM Acceleration for Ultra-Low Power Visual 3D Perception (July 2022)

    4. Eventor: An Efficient Event-Based Monocular Multi-View Stereo Accelerator on FPGA Platform (August 2022)

    5. Automatic Domain-Specific SoC Design for Autonomous Unmanned Aerial Vehicles (October 2022)

    6. Pushing Point Cloud Compression to the Edge (October 2022)

    7. ViA: A Novel Vision-Transformer Accelerator Based on FPGA (November 2022)

    8. SENTunnel: Fast Path for Sensor Data Access on Automotive Embedded Systems (November 2022)

    9. Chaos LiDAR Based RGB-D Face Classification System With Embedded CNN Accelerator on FPGAs (December 2022)

    10. MobileSP: An FPGA-Based Real-Time Keypoint Extraction Hardware Accelerator for Mobile VSLAM (December 2022)

    11. An Energy-Efficient SIFT Based Feature Extraction Accelerator for High Frame-Rate Video Applications (December 2022)

    Research Published in 2023

    12. EmPointMovSeg: Sparse Tensor-Based Moving-Object Segmentation in 3-D LiDAR Point Clouds for Autonomous Driving-Embedded System (January 2023)

    13. A Reconfigurable Coprocessor for Simultaneous Localization and Mapping Algorithms in FPGA (January 2023)

    14. ParallelNN: A Parallel Octree-based Nearest Neighbor Search Accelerator for 3D Point Clouds (February 2023)

    15. CEGMA: Coordinated Elastic Graph Matching Acceleration for Graph Matching Networks (February 2023)

    16. A 40-nm 91-mW, 90-fps Learning-Based Full HD Super-Resolution Accelerator (February 2023)

    17. A 40nm 2TOPS/W Depth-Completion Neural Network Accelerator SoC With Efficient Depth Engine for Realtime LiDAR Systems (May 2023)

    18. HDSuper: Algorithm-Hardware Co-design for Light-weight High-quality Super-Resolution Accelerator (July 2023)

    19. An Energy Efficient and Runtime Reconfigurable Accelerator for Robotic Localization (July 2023)

    20. FLNA: An Energy-Efficient Point Cloud Feature Learning Accelerator with Dataflow Decoupling (July 2023)

    21. QuickFPS: Architecture and Algorithm Co-Design for Farthest Point Sampling in Large-Scale Point Clouds (November 2023)

    22. A Low-Latency Framework With Algorithm-Hardware Co-Optimization for 3-D Point Cloud (November 2023)

    23. Point Cloud Acceleration by Exploiting Geometric Similarity (December 2023)

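    2. Analysis of Domestic and International Research Status and Development Trends

    The literature from 2022 to 2023 shows clear development trends in accelerator chips for 3D point cloud/image processing, and the work carries real scientific significance. The current status and dynamics can be summarized as follows: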

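    1. International research frontier: Institutions such as KAIST (South Korea) and Harvard University, Pennsylvania State University, and the University of Pittsburgh (United States) have led theoretical research on point cloud processing and accelerator design, particularly in low-power processors and custom SoC design.

    2. Domestic (Chinese) research gaining momentum: The Chinese Academy of Sciences, Shanghai Jiao Tong University, the University of Electronic Science and Technology of China, Beihang University, South China University of Technology, and other institutions are catching up quickly in algorithm-hardware co-design and domain-specific accelerators, and have contributed many innovative studies.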

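    3. Technology evolution trends

      • From single-function accelerators toward systems that support multiple modalities and functions

      • Algorithm-hardware co-design has become mainstream, delivering higher performance and better energy efficiency

      • A gradual shift from high-compute, low-efficiency designs toward lightweight, energy-efficient ones

      • Frontier techniques such as processing-in-memory (PIM) are beginning to be applied to point cloud processing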

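    4. Application-driven innovation: Demand for real-time, low-power 3D perception in autonomous driving, robotics, AR/VR, and similar applications has driven technical innovation and given rise to many dedicated accelerator designs.

    5. Key technical breakthroughs: Advances in data structure optimization, sparse computation, memory access optimization, and parallel processing have sped up point cloud processing by tens of times while improving energy efficiency severalfold.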

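    In terms of scientific significance, research in this field is valuable both theoretically and practically: it has established a full-stack optimization methodology for point cloud processing, from data structures down to circuit implementation, and it provides key support for next-generation information technologies such as smart mobile devices, autonomous driving, and robotics.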

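    3. Consolidated Survey Results

    3.1 Major Milestones in the Field

    Analysis of the 23 papers published in 2022-2023 identifies the following major milestones for 3D point cloud/image accelerator chips: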

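    1. Point cloud feature extraction acceleration

      • The dataflow decoupling technique proposed in FLNA (2023) substantially reduces computation (by more than 86%) and improves energy efficiency

      • The lightweight point cloud network (LPN; Yu et al., 2023) has 30 times fewer parameters than PointNet while maintaining accuracy

      • The SIFT feature extraction accelerator (Liu et al., 2022) processes 162 frames per second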

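    2. Nearest-neighbor search optimization

      • ParallelNN (2023) breaks the bandwidth bottleneck with high-bandwidth memory (HBM) and parallel octree construction, running 107.7 times faster than a CPU

      • QuickFPS (2023) introduces a bucket-based sampling algorithm that speeds up point cloud processing by 43.4 times

      • GDPCA (2023) is the first to apply the concept of geometric similarity to point cloud acceleration, improving energy efficiency by 2.7 times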

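    3. Registration algorithm accelerators

      • The PatchMatch-based stereo matching processor (Wang et al., 2022) reaches a high frame rate of 165.7 FPS

      • Block-based PIM acceleration for 3D perception (He et al., 2022) addresses the memory-wall bottleneck and improves speed by 11 times

      • SENTunnel (2022) offloads the protocol stack to hardware, reducing sensor data processing latency by up to 93.8%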

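    4. Breakthroughs in dedicated chips

      • HDSuper (2023) uses algorithm-hardware co-design to reach 37.44 dB PSNR on an FPGA while consuming only 2.08 W

      • The low-power GCN processor (Kim et al., 2022) reduces the energy consumption of 3D point cloud semantic segmentation by 76.9%

      • The Chaos LiDAR system (Chiu et al., 2022) is the first to achieve high-precision depth image acquisition both indoors and outdoors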
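
    Several of the milestones above build on farthest point sampling (FPS), the operation that QuickFPS accelerates. The report does not describe the bucket-based algorithm itself, so the sketch below shows only the textbook FPS baseline that such designs start from. The function name and the sample points are illustrative assumptions, not taken from any of the cited papers.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Return indices of k points chosen by textbook farthest point sampling.

    Baseline O(N * k) version for illustration only; it is NOT the
    bucket-based algorithm of QuickFPS.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # min_dist[i]: squared distance from point i to its nearest selected point
    min_dist = [math.inf] * len(points)
    for _ in range(k - 1):
        last = points[selected[-1]]
        # Only the most recently selected point can shrink a minimum distance
        for i, p in enumerate(points):
            d = sum((a - b) ** 2 for a, b in zip(p, last))
            if d < min_dist[i]:
                min_dist[i] = d
        # The next sample is the point farthest from the selected set
        selected.append(max(range(len(points)), key=min_dist.__getitem__))
    return selected


cloud = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (0, 5, 0)]
print(farthest_point_sampling(cloud, 3))  # [0, 2, 3]
```

    Every iteration rescans all N points, which is the cost that tree-based or bucket-based schemes aim to avoid on large point clouds.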

    3.2 Trends in Performance, Area, and Power

    The literature shows the following clear trends:

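    1. Performance gains

      • Compute performance: from single-digit GOPS in early designs to hundreds of GOPS and even TOPS today

      • Frame rate: a rapid rise from low frame rates (<30 fps) to high frame rates (>90 fps)

      • Relative speedup: the latest designs achieve 10-100x speedups over GPUs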

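    2. Area optimization

      • Lightweight algorithms cut parameter counts and computation, lowering on-chip resource requirements

      • Dedicated architectures and data reuse strategies significantly reduce chip area

      • Quantization is widely used to lower memory requirements (for example, HDSuper reduces memory by 72%)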

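    3. Power improvements

      • Energy efficiency has risen from the GOPS/W range in early designs to the TOPS/W range today (for example, the depth-completion accelerator reaches 2 TOPS/W)

      • Reducing memory accesses and computation lowers energy consumption by 2-20x

      • Runtime reconfiguration (Liu et al., 2023) adjusts power dynamically under different workloads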
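
    To make the headline figures above concrete, the short sketch below converts them into per-frame and per-operation energy. The inputs (91 mW at 90 fps, and 2 TOPS/W) are taken from the titles of the super-resolution and depth-completion papers in the list; the helper names are hypothetical, and the results are back-of-the-envelope values that assume the reported power and throughput hold at the same operating point.

```python
def energy_per_frame_mj(power_mw, fps):
    """Energy per frame in millijoules: power (mW = mJ/s) divided by frames/s."""
    return power_mw / fps


def energy_per_op_pj(tops_per_watt):
    """Energy per operation in picojoules for a given TOPS/W figure."""
    # 1 TOPS/W is 1e12 operations per joule, so one operation costs
    # 1 / (TOPS/W) picojoules.
    return 1.0 / tops_per_watt


# 91 mW at 90 fps: roughly 1.011 mJ per Full HD output frame
print(round(energy_per_frame_mj(91, 90), 3))
# 2 TOPS/W: 0.5 pJ per operation
print(energy_per_op_pj(2))
```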

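    3.3 Shortcomings of Existing Solutions

    Despite significant progress, existing solutions still fall short in several respects on "deeply coupled optimization of data structure, algorithm, and architecture":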

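    1. Data structure level

      • Most designs are still optimized for a specific algorithm and lack a general data representation and processing paradigm

      • Handling the sparsity and irregularity of point cloud data still relies on algorithm-specific solutions

      • There is no systematic method for converting between data modalities or for a unified representation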

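    2. Algorithm level

      • Hardware-aware algorithm design is still at an early stage; most work still follows a "design the algorithm first, then consider the hardware" approach

      • Lightweight algorithms often sacrifice accuracy, making it hard to find the best balance between accuracy and performance

      • Multimodal fusion algorithms rarely consider hardware implementation efficiency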

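    3. Architecture level

      • Most accelerators are dedicated single-task designs and lack scalability

      • Memory hierarchies are still based on general-purpose architectures and do not fully account for the characteristics of point cloud data

      • Truly adaptive mechanisms for dynamic architectural reconfiguration are lacking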

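    4. Insufficient deeply coupled optimization

      • A few studies (such as HDSuper, FLNA, and GDPCA) attempt cross-layer optimization, but most still optimize each layer independently

      • A systematic cross-layer co-design methodology is lacking

      • End-to-end optimization of the full point cloud processing pipeline is rare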

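    These shortcomings indicate that establishing a systematic cross-layer co-design methodology for 3D point cloud/image processing, one that achieves deeply coupled optimization of data structure, algorithm, and architecture, has significant scientific value and practical importance. It would lay the theoretical and technical foundation for future efficient, low-power, real-time 3D perception systems.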

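    4. Comparison Table of Key Parameters

    Reference | Task type | Algorithm | Main optimization strategies | Implementation platform
    Kim et al. (2022) | 3D point cloud semantic segmentation | Sparse-grouping-based dilated graph convolution (SG-DGC) | Two-level pipeline (TLP); point-level module-wise fusion (PMF); center point feature reuse (CPFR) | 65 nm CMOS
    Wang et al. (2022) | Stereo matching | Block-level PatchMatch with multi-scale propagation | Sparse Census feature representation; random search strategy that avoids evaluating every disparity level | FPGA (350 MHz)
    He et al. (2022) | Visual odometry | Edge-based visual odometry (EBVO) | PIM-friendly data layout; bit-parallel, reconfigurable SRAM-PIM architecture | 90 nm CMOS (simulation)
    Li et al. (2022) | Event-camera-based multi-view stereo | Event-based monocular multi-view stereo (EMVS) | Algorithm-hardware co-design; highly parallel, fully pipelined processing elements | Zynq FPGA
    Krishnan et al. (2022) | Autonomous UAV SoC design | Bayesian optimization; reinforcement learning | Multi-objective algorithm-hardware co-design; F-1 model | Simulation-based evaluation
    Ying et al. (2022) | Point cloud compression | Intra-frame plus inter-frame compression | Morton-code-assisted parallel octree construction; exploitation of temporal similarity | NVIDIA Jetson AGX Xavier
    Wang et al. (2022) | Vision Transformer acceleration | Half-layer mapping and throughput analysis | Partitioning strategy to reduce the impact of data locality; reuse of processing engines | Xilinx Alveo U50 FPGA
    Zheng et al. (2022) | Sensor data access | Hardware protocol parsing and preprocessing | Unified access module; preprocessor module; lightweight driver and zero-copy mechanism | FPGA
    Chiu et al. (2022) | Face classification | RGB-D-based embedded CNN | High-speed ToF processing architecture; hybrid RGB-D feature fusion | 40 nm CMOS, Xilinx ZCU102
    Liu et al. (2022) | Keypoint extraction | Modified SuperPoint | Partially shared detection and description encoding; pre-sorting-based NMS | Zynq ZCU104 FPGA
    Liu et al. (2022) | SIFT feature extraction | Optimized SIFT algorithm | Fast/slow dual-clock-domain design; partial-sum reuse; dynamic padding | 180 nm CMOS
    He et al. (2023) | Moving-object segmentation | AR-SI-based feature extraction | Sparse tensors and sparse convolution; fusion of temporal and geometric features | Embedded system (not specified in detail)
    Tan et al. (2023) | SLAM pose estimation | Quaternion and Lie algebra optimization | Reconfigurable architecture; memory reuse strategy; two parallel compute cores | Zynq 7020 FPGA
    Chen et al. (2023) | Nearest-neighbor search | Parallel octree construction; keyframe-based approximate kNN | Trajectory encoding and depth information optimization; multi-channel HBM; mini crossbar | Virtex HBM FPGA
    Dai et al. (2023) | Graph matching | Elastic matching filtering | Elastic matching metadata structure; cross-graph coordinator; SRAM buffers | Not specified in detail
    Shen et al. (2023) | Super-resolution | RAISR algorithm | Patch data reuse; hash-based filtering; kernel compression | 40 nm CMOS
    Sun et al. (2023) | Depth completion | Two-step interpolation scheme; multi-feature neural network | Fully padded dataflow management engine; hardware tiling coprocessor | 40 nm CMOS
    Chang et al. (2023) | Super-resolution | Lightweight depthwise separable convolution | Unified computing core (UCC); efficient F-A mapping strategy; patch-based training | FPGA
    Liu et al. (2023) | Robotic localization | SLAM optimization | Hardware-aware algorithm; exploitation of data sparsity and locality; configurable hardware architecture | FPGA
    Lyu et al. (2023) | Point cloud feature learning | Dataflow decoupling | Parallel architecture; block-wise processing; transposed SRAM strategy | 40 nm CMOS
    Han et al. (2023) | Farthest point sampling | Bucket-based FPS | Two-level tree data structure; merged and implicit computation mechanisms; 4-stage pipeline | 28 nm CMOS
    Yu et al. (2023) | Point cloud classification | Lightweight point cloud network (LPN) | Reconfigurable computing core (RCC); adaptive dataflow; partially parallel computation | Xilinx Kintex UltraScale KCU150 FPGA
    Chen et al. (2023) | Point cloud neural networks | Geometry-aware differential algorithm | Voxelized data structure; geometry-aware engine; differential update and aggregation engines | Not specified in detail (algorithm-architecture co-design)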
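
    The table lists Morton-code-assisted octree construction (Ying et al., 2022) among the optimization strategies. As a rough illustration of why Morton codes help, the sketch below interleaves the bits of quantized x, y, z coordinates so that dropping the lowest three bits of a code yields the code of the parent octree cell. This is the generic bit-interleaving technique under assumed 10-bit coordinates, not the implementation from that paper; `morton3d` and its `bits` parameter are hypothetical names.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z into a single Morton code.

    Bit i of x lands at position 3*i, bit i of y at 3*i + 1,
    and bit i of z at 3*i + 2.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


print(morton3d(3, 5, 1))  # 143

# Shifting a code right by 3 bits gives the parent cell one octree level up:
# voxel (2, 3, 2) lies inside the parent cell (1, 1, 1).
print(morton3d(2, 3, 2) >> 3 == morton3d(1, 1, 1))  # True

# Sorting voxels by Morton code places octree siblings next to each other.
voxels = [(3, 5, 1), (0, 0, 0), (1, 1, 1), (2, 3, 2)]
print(sorted(voxels, key=lambda v: morton3d(*v)))
```

    Because siblings become contiguous after sorting, each octree level can be derived from a sorted code array with independent per-element operations, which is what makes the construction amenable to parallel hardware.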

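    5. Technology Roadmap and Development Trends

    Based on the analysis of the literature above, the following future trends and research hotspots can be anticipated for 3D point cloud/image accelerator chips:

    5.1 Data Structure Optimization Directions

    5.2 Trends in Lightweight Algorithms

    5.3 Architecture Design Innovations

    5.4 Application Expansion Directions

    5.5 Key Enabling Technologies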

    6. References

    [1] S. Kim, S. Kim, J. Lee, and H.-J. Yoo, "A Low-Power Graph Convolutional Network Processor With Sparse Grouping for 3D Point Cloud Semantic Segmentation in Mobile Devices," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 69, no. 4, pp. 1507-1518, Apr. 2022.

    [2] H. Wang, W. Zhou, X. Zhang, and X. Lou, "A Block PatchMatch-Based Energy-Resource Efficient Stereo Matching Processor on FPGA," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 69, no. 7, pp. 2893-2905, Jul. 2022.

    [3] Y. He, S. Qu, G. Lin, C. Liu, L. Zhang, and Y. Wang, "Processing-in-SRAM acceleration for ultra-low power visual 3D perception," in Proceedings of the 59th ACM/IEEE Design Automation Conference, Jul. 2022, pp. 295-300.

    [4] M. Li et al., "Eventor: an efficient event-based monocular multi-view stereo accelerator on FPGA platform," in Proceedings of the 59th ACM/IEEE Design Automation Conference, Aug. 2022, pp. 331-336.

    [5] S. Krishnan et al., "Automatic Domain-Specific SoC Design for Autonomous Unmanned Aerial Vehicles," in 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2022, pp. 300-317.

    [6] Z. Ying et al., "Pushing Point Cloud Compression to the Edge," in 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2022, pp. 282-299.

    [7] T. Wang et al., "ViA: A Novel Vision-Transformer Accelerator Based on FPGA," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 41, no. 11, pp. 4088-4099, Nov. 2022.

    [8] R. Zheng et al., "SENTunnel: Fast Path for Sensor Data Access on Automotive Embedded Systems," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 41, no. 11, pp. 3697-3708, Nov. 2022.

    [9] C.-T. Chiu et al., "Chaos LiDAR Based RGB-D Face Classification System With Embedded CNN Accelerator on FPGAs," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 69, no. 12, pp. 4847-4859, Dec. 2022.

    [10] Y. Liu et al., "MobileSP: An FPGA-Based Real-Time Keypoint Extraction Hardware Accelerator for Mobile VSLAM," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 69, no. 12, pp. 4919-4929, Dec. 2022.

    [11] B. Liu et al., "An Energy-Efficient SIFT Based Feature Extraction Accelerator for High Frame-Rate Video Applications," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 69, no. 12, pp. 4930-4943, Dec. 2022.

    [12] Z. He, X. Fan, Y. Peng, Z. Shen, J. Jiao, and M. Liu, "EmPointMovSeg: Sparse Tensor-Based Moving-Object Segmentation in 3-D LiDAR Point Clouds for Autonomous Driving-Embedded System," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 42, no. 1, pp. 41-53, Jan. 2023.

    [13] Y. Tan et al., "A Reconfigurable Coprocessor for Simultaneous Localization and Mapping Algorithms in FPGA," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 70, no. 1, pp. 286-290, Jan. 2023.

    [14] F. Chen, R. Ying, J. Xue, F. Wen, and P. Liu, "ParallelNN: A Parallel Octree-based Nearest Neighbor Search Accelerator for 3D Point Clouds," in 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2023, pp. 403-414.

    [15] Y. Dai, Y. Zhang, and X. Tang, "CEGMA: Coordinated Elastic Graph Matching Acceleration for Graph Matching Networks," in 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2023, pp. 584-597.

    [16] H.-Y. Shen, Y.-C. Lee, T.-W. Tong, and C.-H. Yang, "A 40-nm 91-mW, 90-fps Learning-Based Full HD Super-Resolution Accelerator," IEEE Journal of Solid-State Circuits, vol. 58, no. 2, pp. 520-529, Feb. 2023.

    [17] M. Sun et al., "A 40nm 2TOPS/W Depth-Completion Neural Network Accelerator SoC With Efficient Depth Engine for Realtime LiDAR Systems," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 70, no. 5, pp. 1704-1708, May 2023.

    [18] L. Chang, X. Zhao, D. Fan, Z. Hu, and J. Zhou, "HDSuper: Algorithm-Hardware Co-design for Light-weight High-quality Super-Resolution Accelerator," in 2023 60th ACM/IEEE Design Automation Conference (DAC), Jul. 2023, pp. 1-6.

    [19] Q. Liu et al., "An Energy Efficient and Runtime Reconfigurable Accelerator for Robotic Localization," IEEE Transactions on Computers, vol. 72, no. 7, pp. 1943-1957, Jul. 2023.

    [20] D. Lyu, Z. Li, Y. Chen, N. Xu, and G. He, "FLNA: An Energy-Efficient Point Cloud Feature Learning Accelerator with Dataflow Decoupling," in 2023 60th ACM/IEEE Design Automation Conference (DAC), Jul. 2023, pp. 1-6.

    [21] M. Han et al., "QuickFPS: Architecture and Algorithm Co-Design for Farthest Point Sampling in Large-Scale Point Clouds," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 42, no. 11, pp. 4011-4024, Nov. 2023.

    [22] Y. Yu, W. Mao, J. Luo, and Z. Wang, "A Low-Latency Framework With Algorithm-Hardware Co-Optimization for 3-D Point Cloud," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 70, no. 11, pp. 4221-4225, Nov. 2023.

    [23] C. Chen, X. Zou, H. Shao, Y. Li, and K. Li, "Point Cloud Acceleration by Exploiting Geometric Similarity," in Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, Dec. 2023, pp. 1135-1147.