Single-Photon 3D Imaging with
Deep Sensor Fusion
David B. Lindell, Matthew O’Toole, Gordon Wetzstein
Stanford University
d = c · t / 2
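As a quick illustration of this relation, here is a minimal Python sketch (the helper name and the 80 ps example bin are our own, chosen only for illustration):

```python
# Depth from round-trip time of flight: d = c * t / 2.
C = 3.0e8  # speed of light in m/s

def tof_to_depth(t_seconds):
    """Depth in meters for a photon timestamp t measured from pulse emission."""
    return C * t_seconds / 2.0

# Example: one 80 ps timing bin corresponds to roughly 1.2 cm of depth.
print(tof_to_depth(80e-12))  # ~0.012 m
```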
Tradeoffs in Active 3D Imaging
• maximum range
• acquisition speed
• resolution
Tradeoffs in Active 3D Imaging
• maximum range
• acquisition speed
• resolution
Velodyne
Single-Photon Avalanche Diode (SPAD)
(Figure: a laser pulse is sent into the scene and the SPAD records a timestamp for each photon event; detections are binned into a histogram of num. detections vs. time of flight, which also includes ambient light.)
Active 3D Imaging
Pulsed:
  Hardware: Conventional LIDAR, Single-Photon Avalanche Diodes (SPADs)
  Algorithms: Single-Photon Depth Estimation [Kirmani et al. 12; Shin et al. 15, 16; Rapp and Goyal 17]
  Tradeoff: + Maximum Range, - Acquisition Speed, - Resolution
Continuous Wave:
  Hardware: Kinect-type systems
  Algorithms: Deep Depth [Li et al. 16, Hui et al. 16], Deep Sensor Fusion [Marco et al. 17, Su et al. 18]
  Tradeoff: - Maximum Range, + Acquisition Speed, + Resolution
Active 3D Imaging
Pulsed:
  Hardware: Conventional LIDAR, Single-Photon Avalanche Diodes (SPADs)
  Algorithms: Single-Photon Depth Estimation [Kirmani et al. 12; Shin et al. 15, 16; Rapp and Goyal 17]
    + single-photon depth estimation
    - make assumptions
  Tradeoff: + Maximum Range, - Acquisition Speed, - Resolution
Continuous Wave:
  Hardware: Kinect-type systems
  Algorithms: Deep Depth [Li et al. 16, Hui et al. 16], Deep Sensor Fusion [Marco et al. 17, Su et al. 18]
  Tradeoff: - Maximum Range, + Acquisition Speed, + Resolution
[Kirmani et al. 12;
Shin et al. 15, 16]
[Rapp and Goyal 17]
Laser detections
Noise/Ambient light
Single-Photon Depth Estimation
[Kirmani et al. 12;
Shin et al. 15, 16]
[Rapp and Goyal 17]
Censored Detections
Laser detections
Noise/Ambient light
Single-Photon Depth Estimation
[Kirmani et al. 12;
Shin et al. 15, 16]
[Rapp and Goyal 17]
Censored Detections → Solve for Depth
Laser detections
Noise/Ambient light
Noisy Detections → Censored Detections → Solve for Depth
Heuristics for censoring:
• Median time of flight of surrounding pixels (sketched below)
• Use superpixel clustering
• Less effective with low signal, high ambient light, or complex scenes
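To make the first heuristic above concrete, the following is a minimal Python sketch of median-based censoring; the neighborhood radius, the time window, and the data layout are illustrative assumptions rather than the parameters of the cited methods:

```python
import numpy as np

def censor_by_neighborhood_median(timestamps, window_ps=1000, radius=2):
    """Simplified censoring heuristic (illustrative, not the published algorithm):
    keep only detections whose time of flight lies within +/- window_ps of the
    median time of flight over a (2*radius+1)^2 pixel neighborhood.

    timestamps: H x W list-of-lists of photon arrival times (picoseconds).
    Returns an H x W list-of-lists of censored detections.
    """
    H, W = len(timestamps), len(timestamps[0])
    censored = [[[] for _ in range(W)] for _ in range(H)]
    for i in range(H):
        for j in range(W):
            neighborhood = []
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < H and 0 <= x < W:
                        neighborhood.extend(timestamps[y][x])
            if not neighborhood:
                continue
            med = np.median(neighborhood)
            censored[i][j] = [t for t in timestamps[i][j] if abs(t - med) <= window_ps]
    return censored
```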
Single-Photon Depth Estimation
[Kirmani et al. 12;
Shin et al. 15, 16]
[Rapp and Goyal 17]
Active 3D Imaging
Pulsed:
  Hardware: Conventional LIDAR, Single-Photon Avalanche Diodes (SPADs)
  Algorithms: Single-Photon Depth Estimation [Kirmani et al. 12; Shin et al. 15, 16; Rapp and Goyal 17]
    + single-photon depth estimation
    - make assumptions
  Tradeoff: + Maximum Range, - Acquisition Speed, - Resolution
Continuous Wave:
  Hardware: Kinect-type systems
  Algorithms: Deep Depth [Li et al. 16, Hui et al. 16], Deep Sensor Fusion [Marco et al. 17, Su et al. 18]
    - Applies to CW-TOF
  Tradeoff: - Maximum Range, + Acquisition Speed, + Resolution
Active 3D Imaging
Pulsed:
  Hardware: Conventional LIDAR, Single-Photon Avalanche Diodes (SPADs)
  Algorithms: Single-Photon Depth Estimation [Kirmani et al. 12; Shin et al. 15, 16; Rapp and Goyal 17]
    + single-photon depth estimation
    - make assumptions
  Tradeoff: + Maximum Range, - Acquisition Speed, - Resolution
Continuous Wave:
  Hardware: Kinect-type systems
  Algorithms: Deep Depth [Li et al. 16, Hui et al. 16], Deep Sensor Fusion [Marco et al. 17, Su et al. 18]
    - don't use raw photon counts
  Tradeoff: - Maximum Range, + Acquisition Speed, + Resolution
Goal: + Maximum range, + Acquisition speed, + Resolution
Single-Photon 3D Imaging
Intensity Image + Photon Detections (20 Hz; avg. < 1 laser photon per spatial position) → CNN Sensor Fusion → 3D Reconstruction
Single-Photon 3D Imaging
• Single-Photon Avalanche Diodes (SPADs) + Intensity Image
• CNN Processing
• Photon-Efficient Prototype
SPAD Image Formation
Measurement Histogram (avg. detections per time bin) = Poisson Sampling of
  [ Num. Laser Shots × ( Detection Efficiency × Radial Falloff/Reflectivity × Avg. Laser Photon Arrivals + Avg. Ambient Photons + Sensor Noise Detections ) ]
SPAD Image Formation
Parameter          Value
pulse duration     80 ps
peak power         30 W
range              200 m
num. pulses (N)    10000 (~8 ms exposure)
→ on average, ~2 photons detected from the laser and ~1 photon from ambient light over the exposure
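The image-formation model above can be simulated directly as a per-bin Poisson process. Below is a minimal NumPy sketch under that model; the grouping of ambient light and dark counts into a single background term, the Gaussian pulse shape, and the specific constants are our assumptions, and SPAD dead-time/pile-up effects are ignored, which is reasonable in this low-flux regime:

```python
import numpy as np

def simulate_spad_histogram(tau, gamma=1.0, eta=1.0, background=0.0, n_pulses=10000, rng=None):
    """Sample one measurement histogram from the per-bin Poisson model.

    tau        : average laser photon arrivals per time bin, per pulse
    gamma      : radial falloff x reflectivity factor for this pixel
    eta        : detection efficiency
    background : average ambient + dark-count detections per bin, per pulse
                 (grouped into a single term here for simplicity)
    n_pulses   : number of laser shots N over the exposure
    """
    rng = np.random.default_rng() if rng is None else rng
    rate = n_pulses * (eta * gamma * tau + background)  # avg. detections per bin
    return rng.poisson(rate)

# Example matching the numbers above: a Gaussian pulse centered at bin 300,
# scaled to ~2 signal photons and ~1 ambient photon over the 10,000-pulse exposure.
bins = np.arange(1024)
pulse = np.exp(-0.5 * ((bins - 300) / 2.0) ** 2)
pulse *= 2.0 / (10000 * pulse.sum())                 # ~2 signal photons total
hist = simulate_spad_histogram(pulse, background=1.0 / (10000 * 1024))
```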
Single-Photon Avalanche Diodes (SPADs) + Intensity Image
CNN Processing
Photon-Efficient Prototype
Dataset
• ~16K simulated SPAD measurements from NYU v2
• Time of flight from depth
• Intrinsic decomposition [Chen and Koltun 13]
• Train on various avg. signal and noise levels
(Intensity + depth images from NYU v2 [Silberman et al. 12])
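As a rough sketch of how one pixel of an NYU v2 depth/albedo pair could be turned into a per-bin signal rate for a simulator like the one above (the bin width, number of bins, and photon-budget constant are illustrative assumptions, not the paper's calibration):

```python
import numpy as np

C = 3.0e8            # speed of light (m/s)
BIN_WIDTH = 80e-12   # TDC bin width in seconds (illustrative)
NUM_BINS = 1024      # time bins per histogram (illustrative)

def pixel_signal_rate(depth_m, albedo, pulse_shape, photons_at_unit_range=2.0):
    """Per-bin average laser photon arrivals for one pixel (a sketch, not the
    paper's exact calibration).

    depth_m     : ground-truth depth at this pixel (meters)
    albedo      : reflectivity estimate from intrinsic decomposition
    pulse_shape : discretized laser pulse, normalized to sum to 1
    """
    bin_idx = int(round(2.0 * depth_m / (C * BIN_WIDTH)))   # round-trip time -> bin index
    gamma = albedo / max(depth_m, 1e-3) ** 2                # reflectivity and 1/d^2 falloff
    tau = np.zeros(NUM_BINS)
    lo = max(0, bin_idx - len(pulse_shape) // 2)
    hi = min(NUM_BINS, lo + len(pulse_shape))
    tau[lo:hi] = pulse_shape[: hi - lo]
    return photons_at_unit_range * gamma * tau              # feed this to the simulator above
```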
Processing Pipeline
Noisy Detections (+ Intensity Image) → CNN Processing → Censored Detections & Regressed Pulse → Estimated Depth, or Estimated Depth + Upsampling
Training
Simulated Noisy Detections (+ intensity image) → CNN Processing → Estimated Pulse → Depth Map
KL Divergence Loss against the Ground Truth Pulse:
L(ĥ, h) = Σ_k D_KL(ĥ_k, h_k)
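A minimal PyTorch sketch of this loss is shown below; the softmax normalization of the network output, the direction of the divergence, and the tensor layout are our assumptions:

```python
import torch
import torch.nn.functional as F

def kl_pulse_loss(pred_logits, target_pulse, eps=1e-8):
    """KL divergence between the ground-truth pulse and the regressed pulse,
    summed over time bins and averaged over pixels (simplified sketch).

    pred_logits  : (B, T, H, W) raw network outputs over T time bins
    target_pulse : (B, T, H, W) ground-truth pulse at each pixel
    """
    log_pred = F.log_softmax(pred_logits, dim=1)                     # normalize over time
    target = target_pulse / (target_pulse.sum(dim=1, keepdim=True) + eps)
    kl = (target * (torch.log(target + eps) - log_pred)).sum(dim=1)  # KL(target || pred)
    return kl.mean()
```

At test time, the depth at each pixel can then be read off from the peak (argmax) of the regressed pulse.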
CNN Architecture for Depth Estimation (1 of 3): SPAD measurements only
  Input: SPAD measurements → Output: geometry
CNN Architecture for Depth Estimation (2 of 3): improved denoising by sensor fusion
  Input: SPAD measurements + intensity image → Output: geometry
CNN Architecture for Depth Estimation (3 of 3): guided upsampling by sensor fusion
  Input: SPAD measurements + intensity image → Output: upsampled geometry
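To make the three stages concrete, here is a toy PyTorch sketch; the layer counts, channel widths, and the placeholder residual in the upsampler are illustrative assumptions and far smaller and simpler than the published architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthFusionCNN(nn.Module):
    """Toy sketch: multi-scale 3D convolutions over the SPAD volume, fusion with
    2D intensity features, and a per-bin pulse output (not the published network).
    """
    def __init__(self):
        super().__init__()
        # (1 of 3): 3D feature extraction on the (T, H, W) photon-count volume
        self.enc1 = nn.Sequential(nn.Conv3d(1, 4, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv3d(4, 8, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.Conv3d(8, 4, 3, padding=1), nn.ReLU())
        # (2 of 3): intensity branch producing 2D features broadcast along time
        self.intensity = nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv3d(8, 1, 3, padding=1)

    def forward(self, spad, intensity):
        # spad: (B, 1, T, H, W) photon counts; intensity: (B, 1, H, W)
        f1 = self.enc1(spad)
        f2 = self.dec(self.enc2(f1))
        f2 = F.interpolate(f2, size=f1.shape[2:], mode="trilinear", align_corners=False)
        img = self.intensity(intensity).unsqueeze(2)      # (B, 4, 1, H, W)
        img = img.expand(-1, -1, f1.shape[2], -1, -1)     # repeat along time bins
        fused = torch.cat([f1 + f2, img], dim=1)          # (B, 8, T, H, W)
        return self.fuse(fused).squeeze(1)                # per-bin logits (B, T, H, W)

def guided_upsample(depth_lr, scale=8):
    """(3 of 3) sketch: residual-plus-bicubic structure with a trivial residual.
    depth_lr: (B, 1, h, w) low-resolution depth map."""
    bicubic = F.interpolate(depth_lr, scale_factor=scale, mode="bicubic", align_corners=False)
    residual = torch.zeros_like(bicubic)  # placeholder for the learned high-pass residual
    return bicubic + residual
```

In the full system the upsampling stage is itself a CNN that predicts the high-frequency residual using the intensity image, and the whole pipeline can be trained end to end with the KL loss above.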
Depth Estimation (simulated)
Scene 1 (SBR 2/50): Ground Truth Depth, Intensity, Measurements; Log-matched Filter (RMSE: 5.7336), Shin et al. 2016 (RMSE: 5.0787), Rapp and Goyal 2017 (RMSE: 0.0482), Ours with intensity (RMSE: 0.0343); 4x magnification insets.
Scene 2 (SBR 2/50): Ground Truth Depth, Intensity, Measurements; Log-matched Filter (RMSE: 5.795), Rapp and Goyal 2017 (RMSE: 0.0242), Ours w/o intensity (RMSE: 0.0765), Ours with intensity (RMSE: 0.0239).
Depth Estimation + Upsampling (simulated)
Inputs: Intensity, Low-Res. Measurements (SBR 2/50); reference: High-Res. Ground Truth Depth
Depth CNN + Bicubic Upsampling    RMSE: 0.0622
Rapp & Goyal + Upsample CNN       RMSE: 0.1156
Depth CNN + Upsample CNN          RMSE: 0.0663
Proposed End-to-End               RMSE: 0.0593
Single-Photon Avalanche Diodes (SPADs) + Intensity Image
CNN Processing
Photon-Efficient Prototype
Prototype setup: illumination optics, imaging optics, intensity camera, SPAD line array; the laser sweeps a vertical scanline across the scene
(note: laser illumination is too weak to observe visually while scanning under ambient light)
scan rate: 20 Hz, lights on
scan rate: 20 Hz, lights off
Intensity image and SPAD measurements (20 Hz); average per spatial position: 0.64 signal detections, 0.87 background detections
Results: Intensity image, Log-matched filter, [Rapp and Goyal 2017], Denoised (w/o intensity), Denoised (w/ intensity), Denoised + Guided upsampling
Intensity image and SPAD measurements (5 Hz); average per spatial position: 0.85 signal detections, 2.9 background detections
Results: Intensity image, Log-matched filter, [Rapp and Goyal 2017], Denoised (w/o intensity), Denoised (w/ intensity), Denoised + Guided upsampling
Intensity image and SPAD measurements (5 Hz), captured outdoors under indirect sunlight
Results: Intensity image, Log-matched filter, [Rapp and Goyal 2017], Denoised (w/o intensity), Denoised (w/ intensity), Denoised + Guided upsampling
Limitations
• Signal limitations (~800 m SPAD ranging has been demonstrated with a more engineered optical system [Pawlikowska et al. 17])
• Processing
• Intensity image
• Temporal consistency
• 2D SPAD arrays [Burri et al. 17]
Summary
Single-Photon Avalanche Diodes (SPADs) + Intensity Image
CNN Processing
Photon-Efficient Prototype
• Maximum range
• Acquisition speed
• Resolution
Single-Photon 3D Imaging with Deep Sensor Fusion
David B. Lindell, Matthew O’Toole, Gordon Wetzstein
Stanford University
Contact: lindell@stanford.edu
Code and data available: computationalimaging.org
Editor's Notes
  • #3: LIDAR stands for light detection and ranging, and is an active 3D imaging technology that is frequently used, for example, in autonomous navigation. LIDAR calculates distance by measuring the time it takes for a pulse of light to travel to an object and back. Here, the measurements show the arrival times of the detected photons from the pulse.
  • #4: If we increase the distance and we have ambient detections, the signal-to-noise ratio is much lower and it becomes difficult to see the return pulse in the measurements. We could solve this by increasing the exposure time, but then the acquisition speed goes down and so does the number of points we could potentially scan. So we have this tradeoff between range, acquisition speed, and resolution.
  • #5: Commercial lidars that need to achieve long range return a sparsely scanned 3D point cloud of the scene.
  • #6: So how does a lidar work? There are different types of detectors used in LIDAR systems, but one emerging class of extremely sensitive sensor is the single-photon avalanche diode, or SPAD. Here's how a SPAD works: we use a picosecond laser to send millions of short pulses of light into the scene every second. Each pulse reflects off the scene and scatters back to our detector. For each pulse emitted into the scene, the SPAD has a chance of detecting the arrival of a single photon. When a detection occurs, a piece of circuitry called a time-to-digital converter, or TDC, generates a timestamp to record the time of flight of the photon. Time zero corresponds to when the laser pulse was emitted, and the largest timestamp value corresponds to the instant just before the next laser pulse is emitted. By binning detections by their time of flight, we start to see when the laser pulse arrives. This histogram corresponds to a single spatial position of a 3D scan of the scene.
  • #7: Pulsed systems scan a concentrated pulse of light around the scene and can be paired with sensitive detectors called SPADs, which can time the arrival of individual photons. These systems achieve long range at the cost of long acquisition times and low resolution. Kinect-type systems diffuse their light over the whole scene; since they don't require scanning, they achieve high acquisition speed and resolution at the cost of range.
  • #8: There are also different reconstruction algorithms for these sensors. While SPADs can be used to recover depth with only a single detected photon, existing algorithms make limiting assumptions about the types of scenes that can be reconstructed. Depth estimation with deep networks and sensor fusion approaches exists for Kinect-type sensors, but those methods use entirely different measurements than sensitive SPAD detectors.
  • #9-11: These methods take as input a noisy set of photon detections at each pixel. Again, some photons come from the laser and some come from sensor noise or ambient light. The ambient and noise detections make solving for the depth a non-convex problem, so these methods attempt to remove any photon detections from noise or ambient light and then solve an optimization problem to determine the depth from the remaining photon detections.
  • #12: There are different heuristic approaches for censoring the ambient and noise photons. Some methods filter detections by comparing them to the median time of flight of surrounding pixels, or average measurements from pixels with similar albedo values to try to increase the amount of signal from the laser pulse. Another method histograms the detections and tries to identify peaks corresponding to a sparse set of depth planes. These heuristic approaches might not work as well for scenes with complex geometry or with extremely low signal and high ambient light.
  • #13: There are also different reconstruction algorithms for these sensors. While SPADs can be used to recover depth with only a single detected photon, existing algorithms make limiting assumptions about the types of scenes that can be reconstructed. Depth estimation with deep networks and sensor fusion approaches exists for Kinect-type sensors, but those methods use entirely different measurements than sensitive SPAD detectors.
  • #14: So we want to alleviate this tradeoff between range, acquisition speed and resolution.
  • #15: In this work, we use sensor fusion with a normal intensity image and the time of flight of detected photons from sensitive photodetectors to robustly recover 3D geometry with less than a single laser photon returning on average at each spatial position. We'd like to fuse information from the intensity image and the time of flight, because they contain complementary information. The intensity image is 2D with low noise and high resolution, and the depth map is 3D, but noisy and low resolution. But it's not clear how we can actually jointly model and exploit these modalities, so we propose to use a learned approach with convolutional neural networks to robustly recover 3D geometry. Moreover, I'll show how we can learn an end-to-end mapping for depth upsampling, and we also imagine using such an approach for higher-level goals such as object detection or waypoint estimation for driving from the raw measurements.
  • #16: So I’m going to talk about how we model these SPAD measurements, a CNN framework for depth estimation and sensor fusion, and how we combine this with a photon efficient prototype for depth estimation.
  • #17: I'll first talk about how these SPAD sensors work and the types of measurements that we get.
  • #18-19: If we look at this histogram of measurements, we can identify three sources of signal: noise from the sensor, ambient light, and our laser pulse. We can model this histogram with the average rate of detections from each source. The forward image formation of the measurement histogram is then given by sampling a Poisson process with these average detections as a time-varying arrival rate. This is described by the equation where N is the number of laser pulses over the exposure period, tau is the arrival rate of the laser pulse, gamma accounts for radial distance falloff and object reflectivity, eta accounts for the efficiency of the photodetector, and a and d indicate the average ambient and sensor noise detections.
  • #20: So we could run the numbers for a simple example with 80 ps pulse duration, 30 W peak power and 200 m range with 10K pulses. On average from the return pulse we might expect to get back just a couple photons, and then during that same time interval we would get around a single photon from ambient light. So this helps to identify some of the difficulty facing commercial lidar systems that are trying to estimate depth at such long ranges. And this also motivates a sensor fusion approach to help disentangle the signal photons from the background detections.
  • #21-22: Additional information might help with the problem. For example, a conventional image of the scene contains image gradients with information about physical structure and a measure of how much ambient light comes from each location. But an analytical mapping between these complementary sensor measurements is not easily modeled, so we look towards a learned algorithm. Moreover, we can train a learned approach end-to-end not just for denoising tasks, but for upsampling, classification, or even to estimate waypoints for self-driving cars.
  • #23: So given the difficulty in isolating the signal
  • #24: So how can we use CNNs to learn to perform the sensor fusion and depth estimation task?
  • #25: In order to train the CNN, we use our image formation model to construct a dataset of simulated SPAD measurements and intensity images. We model the SPAD measurements from the NYU v2 dataset of RGB-D images, where at each pixel location we calculate the laser pulse arrival time and radial falloff based on the depth, and we account for object reflectivity by estimating the albedo using a method for intrinsic decomposition. We don't model multipath effects, though we find this is not necessary for the CNN to learn the depth reconstruction.
  • #26: The algorithm takes as input the noisy photon detections and optionally an intensity image. The CNN censors spurious detections and regresses the laser pulse from the measurements at each spatial location. From there we can output a depth map, or use the intensity image, which is higher resolution than the SPAD sensor, to do a guided depth upsampling.
  • #27: We train the network on the dataset of SPAD measurements. The network regresses the pulse at each spatial location, and we can take the peak to determine the depth value. We compare the output pulse from the network to the normalized ground truth pulse arrival and use the KL divergence as a loss function. The intuition is that depth values close to the ground truth should be penalized less than those far away, and this loss allows the network to receive some reward for getting close to the ground truth pulse if it's not perfect.
  • #28: Here’s an overview of the CNN architecture for the case where we use only the SPAD measurements to estimate depth. We pass the spad measurement volume into a network of 3D convolutions at multiple resolution scales, which are then upsampled to the original resolution and processed with additional 3d convolutional layers. The output of the network is the regressed laser pulse at each spatial location.
  • #29: We can concatenate the intensity image into the network and jointly process it with features from the SPAD measurements to improve the output depth estimate.
  • #30: Finally, we can use the high-resolution intensity image to upsample the initial low-resolution depth map output. We model the upsampling network on a state of the art image-guided depth upsampling approach. This takes a high-pass filtered version of the low-resolution depth map, upsamples it, and adds it to a bicubicly upsampled version of the depth map to predict the high-resolution version. In this way we can have an end-to-end trained network that predicts an upsampled depth map from the raw photon counts and an intensity image.
  • #31-32: Here are some results for simulated SPAD measurements on a scene from the Middlebury dataset. The scene and the following are all simulated with an average of 2 laser photon detections and 50 background detections per pixel. We can outperform conventional log-matched filtering and Shin et al.,
  • #33: and recover more fine details in the scene compared to recent work from Rapp and Goyal.
  • #34-35: Here is another scene. Our method with the intensity image recovers details in the laundry basket that are smoothed over by Rapp and Goyal. These details are also preserved in our approach without the intensity image, though at this noise level there are artifacts in the reconstruction. Note that the RMSE metric does not heavily penalize the output of Rapp and Goyal despite the loss of detail.
  • #36: This table shows RMSE values averaged across a test set of Middlebury scenes for a range of signal and background detection levels. Our approach with the intensity image achieves comparable quantitative results to Rapp and Goyal, but qualitatively better preserves detail as shown in the Laundry scene. These results are for a single model trained across a range of noise levels. We can also improve the performance in some cases by training models specific to each noise level.
  • #37-38: Here's an example of the image-guided depth upsampling. In this case we show the upsampled result after depth estimation with different methods. At lower resolutions the superpixel clustering approach of Rapp and Goyal doesn't work as well to produce a good initial depth estimate, and so the upsampled output fails to recover many of the details. We can run our depth estimate with the intensity image and then use bicubic upsampling, or run our depth CNN and an image-guided depth upsampling CNN in a two-step procedure. Finally, we can train the depth estimation and upsampling end-to-end and achieve a result which has the least error and a better qualitative appearance than the other images.
  • #39: Our end-to-end approach also demonstrates significant improvements over other approaches to image guided upsampling from the raw spad measurements.
  • #40: Finally, to demonstrate our method, we built a hardware prototype
  • #41: The prototype contains two optical paths: one path for the illumination optics and one for the imaging optics. Along the illumination path, a picosecond laser pulse passes through a cylindrical lens to illuminate a scanline on the scene. We use a linear 256x1 array of SPAD pixels to image this illuminated line. The linear SPAD array and the laser line are scanned using two sets of synchronized scanning mirrors.
  • #42: Here’s an example of the prototype in action
  • #44: In slow motion you can see the scanning path of the laser.
  • #45: Here’s an example video we captured with the SPAD prototype along with the SPAD measurement volume
  • #46: Our reconstruction approach with and without the intensity image outperforms both log-matched filtering and Rapp and Goyal's approach. We can also increase the density of the point cloud with upsampling, though at the cost of introducing floating pixel artifacts.
  • #47: Here’s another example of an indoor scene which we captured.
  • #48: … and the reconstructed point clouds
  • #49: Here’s another example of an indoor scene which we captured.
  • #50: … and the reconstructed point clouds
  • #51: Finally, we also captured these last scenes outdoors in indirect sunlight.
  • #53: Finally, we also captured this last scene outdoors in indirect sunlight. Notice that we barely get any signal, so it’s surprising we can recover anything at all.
  • #55: One clear limitation of the prototype is that our range was limited to a couple of meters. This is because our low-power laser and low fill-factor SPAD give us very low photon counts and limit the range we can achieve. This first plot shows the number of photon detections vs. distance, and shows that at around 1.5 meters we receive less than a single photon detection from the laser pulse on average. This is less than the average number of ambient and dark counts that are detected. However, another group has been able to achieve 800 m range for the same or better signal-to-background ratio with a more engineered optical system, and our method could apply to that as well.
  • #56-59: Finally, we note that our algorithm, along with other techniques for single-photon depth estimation, currently requires offline processing of the data. We use a fairly large input measurement volume, which could be downsampled to mitigate the processing requirements. But along with other solutions, I would point out that compute resources are becoming increasingly capable, so processing on a TPU would already increase the bandwidth by nearly 20x. Also, if we were to deploy the system at night-time, we wouldn't be able to make much use of the intensity image. However, the measurements would also be much less noisy without ambient light, and so we could still use the denoising framework without the intensity image. Note too that since the SPAD observes the world at the laser wavelength and the intensity image captures other wavelengths of light, we assume that the image of the scene changes smoothly over the spectrum so that the measurements are compatible. While we do achieve fairly good temporal consistency in the video results, this isn't explicitly encoded into the framework; it could be exploited with a different kind of architecture. Finally, acquisition speed could be further improved using a 2D SPAD array rather than a line sensor, which would mitigate the scanning requirement. SPADs are a CMOS-based technology, and 2D SPAD arrays are already being produced at sizeable resolutions, including greater than 256 x 256.
  • #60: In summary, we've presented a photon-efficient method for 3D imaging which leverages single-photon avalanche diodes and a high-resolution intensity image. While sensor fusion for this task is a very challenging analytical problem, we can readily leverage CNNs in a learned approach, and we demonstrate a photon-efficient hardware prototype.
  • #61: I’d like to thank our sponsors and also mention that code and data for this project are online on our project webpage. Thank you