Marius Miknis, Ross Davies, Peter Plassmann & Andrew Ware
International Journal of Image Processing (IJIP), Volume (10) : Issue (2) : 2016 63
Efficient Point Cloud Pre-processing using The Point Cloud
Library
Marius Miknis Marius.Miknis@southwales.ac.uk
Faculty of Computing, Engineering and Science
University of South Wales
Pontypridd, CF37 1DL, UK
Ross Davies Ross.Davies@southwales.ac.uk
Faculty of Computing, Engineering and Science
University of South Wales
Pontypridd, CF37 1DL, UK
Peter Plassmann Peter.Plassmann@southwales.ac.uk
Faculty of Computing, Engineering and Science
University of South Wales
Pontypridd, CF37 1DL, UK
Andrew Ware Andrew.Ware@southwales.ac.uk
Faculty of Computing, Engineering and Science
University of South Wales
Pontypridd, CF37 1DL, UK
Abstract
Robotics, video games, environmental mapping and medicine are some of the fields that use 3D
data processing. In this paper we propose a novel optimization approach for the open source
Point Cloud Library (PCL) that is frequently used for processing 3D data. Three main aspects of
the PCL are discussed: point cloud creation from disparity of color image pairs; voxel grid
downsample filtering to simplify point clouds; and passthrough filtering to adjust the size of the
point cloud. Additionally, OpenGL shader based rendering is examined. An optimization
technique based on CPU cycle measurement is proposed and applied in order to optimize those
parts of the pre-processing chain where measured performance is slowest. Results show that
with optimized modules the performance of the pre-processing chain increased 69-fold.
Keywords: Point Cloud, Point Cloud Library, Point Data Pre-processing.
1. INTRODUCTION
Point clouds are sparse spatial representations of 3D object shapes. Algorithms such as the
frequently used RANSAC method [1] can then be applied to reconstruct the complete object shapes
from the point clouds.
A popular library for storing and manipulating point cloud data is the Point Cloud Library (PCL)
[2]. The PCL is a large scale open source project that is focused on both 2D and 3D point clouds
and includes some image processing functionality. Currently the Library has over 120 developers,
from universities, commercial companies and research institutes. The PCL is released under the
terms of the BSD license, which means that it is free for commercial and research use. It can be
cross compiled for many different platforms including Windows, Linux, Mac OS, Android and iOS.
This allows the library to also be used in embedded systems. The main algorithm groups in the
PCL are for segmentation, registration, feature estimation, surface reconstruction, model fitting,
visualization and filtering.
In the work presented in this paper stereo-photogrammetry is used as the main method of 3D
data acquisition. This method is based on stereoscopy where two spatially separated images are
obtained from different viewing positions [3]. The analysis of disparity (separation) between
corresponding points in both images encodes the distance of object points which are then stored
in a disparity map.
This paper is organized as follows: section 2 presents related work in the field of 3D data
acquisition and point cloud processing, followed in section 3 by a description of PCL modules and
their optimization, while conclusions and future work are discussed in sections 4 and 5.
2. RELATED WORK
There are many uses for 3D data, ranging from environmental perception for robots and
autonomous car navigation to video games and medical uses such as wound measurement and
facial reconstruction. A number of ways to capture 3D data have been proposed and
implemented. Many existing technologies rely heavily on the use of structured or infrared lighting
to extract the depth data [4]. The technique of structured lighting is widely used in computer vision
for its many benefits [5] in terms of accuracy and ease of use. Over the last 15 years 3D laser
scanners have been developed [6] as active remote sensing devices. Such scanners can quickly
scan thousands or even millions of 3D cloud points in a scene. Time of flight cameras are also
widely used in computer vision. The principle behind these cameras is similar to that of a sonar,
but with light replacing sound. Such cameras were introduced into the wider public domain by the
Microsoft Xbox One console [7] to replace its older structured lighting based Kinect sensor.
Once 3D data has been acquired by the above systems some kind of processing needs to be
applied to extract useful information as well as to remove noise, outliers or any unnecessary
information. With the number of points that can be sampled point clouds can get extremely large
and contain noise as well as outliers and errors. Thus the pre-processing stage is important [8] [9]
as it deals with noise, error and outlier removal through the use of filters as well as smoothing the
point cloud and reducing the point count while still keeping the relevant feature information. There
are software tools available for such processing [10] [11] [12] but very few provide a complete
library framework that can be incorporated into software projects. 3DReshaper [13] is one such
library that provides point cloud processing capabilities. The PCL is the most commonly used
library for point cloud processing and was therefore used as the main development library in this
research.
The current application focus of the PCL library is in the field of robotics. For robots to sense,
compute and interact with objects or whole scenes a way to perceive the world is needed, which
is why the PCL is used as a part of the Robot Operating System (ROS). Using the PCL as a part
of ROS, robots can compute a 3D environment in order to understand it, detect objects and
interact with them. Due to space and power restrictions such systems rarely use desktop-like
computing devices and are therefore in most cases implemented on relatively small embedded
systems. In these systems the universal nature of the PCL (many operating systems, many 3D
data formats, etc.) results in slow performance. Section 3 proposes a range of
optimizations in order to improve performance.
3. POINT CLOUD PROCESSING OPTIMISATIONS
Four key algorithm areas were selected for optimization: point cloud creation (section 3.1),
rendering (section 3.2), voxel grid down-sampling (section 3.3) and pass through filtering (section
3.4). The combined pre-processing chain is then examined in section 3.5. For the stereo test data
the New Tsukuba Stereo
Dataset [14] was used. This is a collection of synthetic stereo image pairs created using computer
graphics. Additionally, the OpenCV (Open Source Computer Vision Library) was used for image
loading. The project code was run on a desktop Intel i7 machine. The first set of tests used the
Microsoft Visual Studio 2013 code analyzer for inspecting code and its performance statistics.
The purpose of the tests was to identify which parts of the code consume the most CPU cycles
and then to optimize those.
3.1. Point Cloud Creation Speed Improvements
When a stereo camera setup is used, depth values are represented as a disparity map, which in
most cases is a greyscale image in which the brightness of a pixel encodes its depth value. A
second output is a color image that stores the actual color value of each point. From
the disparity and color images a point cloud can be produced. The PCL provides the
OrganizedConversion<>::convert() method, which uses the disparity map, the color image and the
focal length of the camera to produce a point cloud.
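The geometry behind this conversion can be illustrated with a minimal C++ sketch. It is not the PCL implementation: the ColorPoint struct, the disparityToCloud() name, the stereo baseline parameter and the choice of the image centre as principal point are all assumptions introduced here for illustration. The sketch uses the standard pinhole stereo relation, depth = focal length x baseline / disparity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One coloured 3D point; a stand-in for a PCL point type, not the PCL struct itself.
struct ColorPoint {
    float x, y, z;
    std::uint8_t r, g, b;
};

// Back-project a row-major disparity map into 3D using the pinhole stereo model.
// Assumptions (not from the paper): baseline in metres, focal length in pixels,
// principal point at the image centre, disparity <= 0 meaning "no measurement".
std::vector<ColorPoint> disparityToCloud(const std::vector<float>& disparity,
                                         const std::vector<std::uint8_t>& rgb,
                                         std::size_t width, std::size_t height,
                                         float focalLengthPx, float baselineM) {
    assert(disparity.size() == width * height);
    assert(rgb.size() == 3 * width * height);

    const float cx = 0.5f * static_cast<float>(width);
    const float cy = 0.5f * static_cast<float>(height);

    std::vector<ColorPoint> cloud;
    cloud.reserve(disparity.size());
    for (std::size_t v = 0; v < height; ++v) {
        for (std::size_t u = 0; u < width; ++u) {
            const std::size_t i = v * width + u;
            const float d = disparity[i];
            if (d <= 0.0f) continue;  // skip pixels without a valid match

            const float z = focalLengthPx * baselineM / d;  // depth from disparity
            ColorPoint p;
            p.x = (static_cast<float>(u) - cx) * z / focalLengthPx;
            p.y = (static_cast<float>(v) - cy) * z / focalLengthPx;
            p.z = z;
            p.r = rgb[3 * i];
            p.g = rgb[3 * i + 1];
            p.b = rgb[3 * i + 2];
            cloud.push_back(p);
        }
    }
    return cloud;
}
```

With a focal length of 100 pixels and an assumed baseline of 0.5 m, a disparity of 10 pixels maps to a depth of 5 m; pixels with zero disparity are dropped.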
Point cloud generation proceeds in three stages. First the input images are loaded into memory
using OpenCV, which converts them to vectors that can be passed as parameters to the second
stage, PCL point cloud creation. The point cloud is then rendered on screen in the third stage.
Using the Microsoft Visual Studio 2013 code profiler, CPU cycles were measured per line of code.
In order to average out operating system specific random overheads all following tests were
performed three times. Results are shown in FIGURE 1.
FIGURE 1: Test figures for CPU usage of different stages. Tests 1 to 3 show pre-optimised PCL code,
while tests 4 and 5 show optimised conversion.
• For the first test OpenCV was used to read the Tsukuba dataset as a sequence of images,
loaded one at a time. OpenCV, PCL point cloud generation and rendering algorithms were
used ‘as is’ without changes and as provided from public repositories. The results are
shown in the first bar in FIGURE 1. PCL point cloud generation required 36% of CPU
cycles, rendering 45%. This resulted in a processing speed of 2 frames per second (fps).
• In the second test rendering was disabled to identify CPU load more accurately when
OpenCV loaded images one at a time.
• This is contrasted by the third test where OpenCV loaded images not as individual stills
but as a video sequence. Encoding the still images into a video sequence was achieved
using the Intel IYUV codec in OpenCV. This had a dramatic effect as OpenCV CPU cycles fell
from 27% to only 3%, leaving almost all of the remaining 97% to the PCL conversion.
• In order to improve PCL performance numerous optimizations were made. In particular,
these were a) bit-shifting and pointer incrementation of color values to allow faster access
and modification of values, b) vector clear and resize checks to avoid clearing and resizing
a vector when it is the same size as the previous one, c) vector access optimizations
through the use of data pointers, which removed the vector pushback overhead, and
d) several minor optimizations. The source code and documentation of these changes are
available in the PCL developer’s forum [15]. The fourth bar in FIGURE 1 shows that as a
result the CPU cycles needed for PCL conversion were reduced by 66% to less than the
cycles needed for image loading by OpenCV.
• The two improvements documented in tests 3 and 4 were finally tested in the same way
as in the first test of this series, i.e. with rendering switched on again. With image loading
replaced by video loading and conversion optimized, these two components together now
consume less than 10% of processor cycles while rendering takes 72%. Importantly, the
overall frame rate increased to 5 frames per second.
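Optimizations b) and c) from the list above might look roughly as follows. This is a simplified sketch with hypothetical names (Point3, fillCloud), not the code posted to the PCL developer's forum: the output vector is resized only when the frame size changes, and points are written through a data pointer rather than appended one by one with push_back.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point3 {
    float x, y, z;
};

// Refill `out` with one point per depth sample, reusing its storage across frames.
// Hypothetical helper: the name and the trivial x/y values are illustrative only.
void fillCloud(const std::vector<float>& depth, std::vector<Point3>& out) {
    // Resize only when the frame size changed. For a fixed-size video stream this
    // branch is taken once, so later frames neither clear nor reallocate.
    if (out.size() != depth.size()) {
        out.resize(depth.size());
    }
    assert(out.size() == depth.size());

    // Write through raw pointers instead of calling push_back per point, which
    // avoids a capacity check and a size update for every element.
    const float* src = depth.data();
    Point3* dst = out.data();
    for (std::size_t i = 0; i < depth.size(); ++i, ++src, ++dst) {
        dst->x = static_cast<float>(i);
        dst->y = 0.0f;
        dst->z = *src;
    }
}
```

Calling fillCloud() twice with frames of the same size leaves the vector's storage address unchanged, which is the effect the resize check is meant to achieve.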
3.2. Rendering Speed Improvements
Since rendering was now the new bottleneck, steps were taken to improve its performance.
By default, rendering for the PCL is done by The Visualization Toolkit (VTK) which is an open
source library for 3D computer graphics and image processing. This was replaced with a shader
(i.e. graphics processor) based OpenGL rendering implementation for desktop PCs.
The basic data structure inside the PCL is the point cloud. This is an assembly of sub-fields. The
main ones are ‘width’, ‘height’ and ‘points’. ‘Points’ is a vector that stores points of type PointT,
which in turn can be PointXYZ, PointXYZRGB, PointXYZRGBA (and some other basic types). Under the
existing PCL data structure non-colored point clouds of type PointXYZ could be rendered with our
new OpenGL implementation but not colored ones. To enable this several changes were made to
the PCL:
• A fourth float value was added to the point cloud type union. This was easy to do since the
union already had memory allocated for four float values but only the x, y and z floats were
declared. The fourth value now stores the color value to be passed to the
OpenGL shaders.
• To store the color values the three constituent independent integer values were bit-shifted
into a single float which was then stored as the fourth value of the above union. This was
done to avoid integer calculations having to be performed in the shaders while at the
same time having minimal impact on the PCL.
• However, OpenGL shaders do not support bit shifting. The color values were therefore
extracted in the shader by exploiting the known layout (8 bits for each channel). In the
vertex shader the floor() function was used to extract each color channel separately, as it
returns a whole number.
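The packing scheme described in the list above can be sketched on the CPU side in C++. The sketch assumes that the three channels are combined into a 24-bit integer whose numeric value is stored in the float (rather than its bit pattern being reinterpreted), which is what makes floor() based extraction possible. The names Rgb, packRgb() and unpackRgb() are hypothetical, and the unpacking only mirrors what a shader without bit operations would have to do.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Rgb {
    int r, g, b;
};

// Pack three 8-bit channels into the numeric value of one float.
// The largest result, 2^24 - 1, is still exactly representable in a 32-bit float.
float packRgb(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    const std::uint32_t packed = (static_cast<std::uint32_t>(r) << 16) |
                                 (static_cast<std::uint32_t>(g) << 8) |
                                 static_cast<std::uint32_t>(b);
    return static_cast<float>(packed);
}

// Recover the channels with floor() and arithmetic only, without bit operations.
Rgb unpackRgb(float packed) {
    assert(packed >= 0.0f && packed < 16777216.0f);  // must be a 24-bit value
    const float r = std::floor(packed / 65536.0f);   // top 8 bits
    const float rest = packed - r * 65536.0f;        // lower 16 bits
    const float g = std::floor(rest / 256.0f);       // middle 8 bits
    const float b = rest - g * 256.0f;               // bottom 8 bits
    return Rgb{static_cast<int>(r), static_cast<int>(g), static_cast<int>(b)};
}
```

For example, packRgb(255, 128, 7) yields the float value 16744455, from which unpackRgb() recovers the three channels exactly, because every step divides or multiplies by a power of two.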
The results of the above manipulations are shown in FIGURE 2. The two bars labelled ‘VTK’ are
unchanged re-runs of the first and fifth tests from the previous section (see FIGURE 1).
When in the first test VTK is replaced by OpenGL the frame rate increases by a modest 50% to 3
fps. When, however, this is done in the optimized system produced in the previous section the
speed improvement is considerable: 38 fps. In this final system where all three components are
optimized, OpenGL rendering uses only 8.5% of the processor cycles while before VTK used up
72%.
FIGURE 2: 1st and 5th re-tests (using the standard VTK renderer) compared to new OpenGL renderer.
The first 2 bars represent performance of the non-optimised PCL code and the 3rd and 4th bar the optimised
PCL/OpenCV code.
Frame rates for the four bars, left to right: 2 fps, 3 fps, 5 fps and 38 fps.
3.3. Voxel Grid Downsample Filter Improvements
After the point cloud has been produced further processing is usually required, e.g. for data
reduction and filtering operations. A relatively low resolution point cloud of 640 x 480 (e.g.
produced by the Kinect) contains 307,200 points. While some operations (e.g. thresholding)
have O(n) complexity, a more complex algorithm (e.g. k nearest neighbor filtering) has
O(nk) complexity. This can place a heavy workload on the processor.
One of the methods frequently used to lower the number of points in a point cloud, and with it
unnecessary complexity, while retaining detail and information is voxel grid down sampling. The
down sampling is performed using an octree to sub-divide the point cloud into multiple cube
shaped regions (voxels). After processing, all points in a voxel are reduced to a single one. This
results in a point cloud that is smaller in size and complexity but is still precise enough to work
with and has a smaller cost in terms of CPU performance. The PCL has a dedicated method for
this called voxelGrid.filter(). For testing the leaf size values of the filter were 0.03f, 0.03f, 0.03f
(3 x 3 x 3 cm). Three groups of tests were performed as shown in FIGURE 3.
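A minimal sketch of the idea, assuming centroid-based reduction, is given here. For brevity it uses an ordered map keyed by integer voxel coordinates instead of an octree, so it illustrates the result of the filter rather than the PCL implementation. Point3 and voxelDownsample() are hypothetical names and all coordinates are assumed to be finite.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

struct Point3 {
    float x, y, z;
};

// Replace all points that fall into the same cubic voxel by their centroid.
// Assumes finite coordinates; `leaf` is the voxel edge length.
std::vector<Point3> voxelDownsample(const std::vector<Point3>& cloud, float leaf) {
    assert(leaf > 0.0f);

    // Running sum and point count for one voxel.
    struct Accumulator {
        double x = 0.0, y = 0.0, z = 0.0;
        std::size_t n = 0;
    };
    std::map<std::tuple<long, long, long>, Accumulator> voxels;

    for (const Point3& p : cloud) {
        // Integer voxel coordinates: floor keeps negative positions consistent.
        const auto key = std::make_tuple(
            static_cast<long>(std::floor(p.x / leaf)),
            static_cast<long>(std::floor(p.y / leaf)),
            static_cast<long>(std::floor(p.z / leaf)));
        Accumulator& a = voxels[key];
        a.x += p.x;
        a.y += p.y;
        a.z += p.z;
        ++a.n;
    }

    // Emit one centroid per occupied voxel, in ascending voxel order.
    std::vector<Point3> out;
    out.reserve(voxels.size());
    for (const auto& kv : voxels) {
        const Accumulator& a = kv.second;
        const double n = static_cast<double>(a.n);
        out.push_back(Point3{static_cast<float>(a.x / n),
                             static_cast<float>(a.y / n),
                             static_cast<float>(a.z / n)});
    }
    return out;
}
```

With the leaf size used in the tests the call would be voxelDownsample(cloud, 0.03f). The output size then depends on how many voxels are occupied rather than on the number of input points.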
FIGURE 3: Test figures for CPU usage of voxel grid. Test 1 shows the stock code, test 2 shows results
with Quicksort algorithm implemented and test 3 shows overall optimised voxel grid performance.
• In the first group of tests voxel filtering was added to the optimized processing chain
developed in sections 3.1 and 3.2. Voxel grid computation proved to be very
CPU intensive with overall CPU cycle usage of 98%. This also resulted in a poor frame
rate of about 0.1 fps (8.6 seconds per frame). Analysis of the filter code revealed that 30%
of the processing was spent on sorting the points using a standard C++ library vector sort
method.
• The second group of tests was therefore performed with the sort method replaced by a
Quicksort algorithm [16]. This algorithm takes on average O(n log n) steps to sort n points,
but in the worst case, when the chosen pivot value is the smallest or largest of the points
to sort, the algorithm has to make O(n^2) comparisons. To reduce this risk a mean value is
computed before the sorting so that very small or very large values are not used as the
pivot. Compared to the standard C++ sort with 30% of processor cycles used,
Quicksort was significantly more efficient, using only 0.9%. Unfortunately this improved
the overall filter method by only 5.2% as the computation shifted to different parts of the
algorithm, mostly to vector access overheads.
• For the third test group vector access was therefore optimized by replacing vector
pushback calls with pointer accesses and by improving the centroid finding, which together
took up 65% of the processing. These changes reduced the voxel filter computation
by 26%, so that in total it used only 72% of CPU cycles.
The combined changes to the sorting and vector processes increased the frame rate 91-fold to an
average frame rate of about 10 fps.
Frame rates for tests 1 to 3: 0.1 fps, 0.1 fps and 10 fps.
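The mean-based pivot selection from the second test group could be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the keys are taken to be non-negative integers (such as voxel indices), the pivot is the floor of their mean, and std::partition does the rearranging. The function name meanPivotQuicksort() is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sort keys[first, last) in place, using the mean of the range as the pivot value.
void meanPivotQuicksort(std::vector<std::uint32_t>& keys,
                        std::size_t first, std::size_t last) {
    assert(first <= last && last <= keys.size());
    if (last - first < 2) return;

    // Floor of the mean: never below the minimum, and strictly below the
    // maximum unless every key in the range is equal.
    std::uint64_t sum = 0;
    for (std::size_t i = first; i < last; ++i) sum += keys[i];
    const std::uint32_t pivot = static_cast<std::uint32_t>(sum / (last - first));

    // Move keys <= pivot to the front; `mid` indexes the first key above the pivot.
    const auto lo = keys.begin() + static_cast<std::ptrdiff_t>(first);
    const auto hi = keys.begin() + static_cast<std::ptrdiff_t>(last);
    const auto midIt = std::partition(
        lo, hi, [pivot](std::uint32_t k) { return k <= pivot; });
    const std::size_t mid = static_cast<std::size_t>(midIt - keys.begin());

    if (mid == last) return;  // nothing exceeds the mean, so all keys are equal
    meanPivotQuicksort(keys, first, mid);
    meanPivotQuicksort(keys, mid, last);
}
```

A call such as meanPivotQuicksort(keys, 0, keys.size()) sorts the whole vector. Because the pivot lies between the smallest and largest key, both halves are non-empty whenever the keys differ, although strongly skewed key distributions can still produce unbalanced splits.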
3.4. Pass Through Filter Improvements
Another processing method provided by the PCL is passthrough.filter(), which removes points
from the cloud that are not within a specified range. This allows the point cloud to be cropped
in any coordinate direction, similar to a frustum cut-off. The passthrough.filter() method
accepts parameters for upper and lower limits and a direction along the x, y or z axis. For the
Tsukuba dataset the depth range values of 3 and 12 were used for testing in the z coordinate
direction. Two groups of tests were performed with results shown in FIGURE 4.
FIGURE 4: Test figures for CPU usage of pass through filter. The first test shows the stock code
performance and the second the improved code performance.
• In the first test the pass through filter was appended to the optimized processing chain
outlined previously in sections 3.1 and 3.2. The filter was very CPU intensive, using 93.6% of
cycles and bringing the frame rate down to 3 fps. Analysis of the code showed that (as before
with voxel filtering) vector accesses were inefficient.
• After vector access optimization along the lines outlined before for voxel filtering, and after
improving the non-finite entries check (54%) as well as the field value memory copy calls
(24%), the pass through filter consumes only 41% of CPU cycles with the frame rate
rising to 18 fps.
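The behaviour of such a filter, combined with pointer-based output writing, can be sketched in a few lines of C++. Point3, Axis and passThrough() are hypothetical names, and the code is a simplified stand-in for passthrough.filter(), not the PCL source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point3 {
    float x, y, z;
};

enum class Axis { X, Y, Z };

// Keep only finite points whose coordinate along `axis` lies in [lower, upper].
std::vector<Point3> passThrough(const std::vector<Point3>& cloud, Axis axis,
                                float lower, float upper) {
    assert(lower <= upper);

    std::vector<Point3> out(cloud.size());  // allocate once, shrink at the end
    Point3* dst = out.data();
    for (const Point3& p : cloud) {
        // Non-finite entries check: drop points containing NaN or infinity.
        if (!std::isfinite(p.x) || !std::isfinite(p.y) || !std::isfinite(p.z)) continue;

        const float v = (axis == Axis::X) ? p.x : (axis == Axis::Y) ? p.y : p.z;
        if (v < lower || v > upper) continue;  // outside the requested range

        *dst++ = p;  // direct write instead of push_back
    }
    out.resize(static_cast<std::size_t>(dst - out.data()));
    return out;
}
```

The range used for the Tsukuba tests would correspond to passThrough(cloud, Axis::Z, 3.0f, 12.0f).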
3.5. Combined Pre-processing Chain
The PCL modules analyzed above when combined create the main pre-processing chain of the
point cloud manipulation. The order in which these algorithms are applied makes a substantial
performance difference.
Running the voxel filter first proved to be the slower combination as the down sampling had to be
performed on the whole point cloud, in this case 307,200 points. Looking at FIGURE 5 it can be
seen that voxel grid computation is the most CPU intensive task taking up 92% of all processing.
Pass through filtering only took up 2% of CPU cycles and organized PCL conversion 4%. An
optimized version saw a more balanced use of the processing with voxel grid processing lowered
to 68% and pass through filtering at 12%. Organized conversion rose to 10% and OpenCV’s
contribution increased to 7% from 0.2% previously.
FIGURE 5: CPU usage shown for pre-processing applying voxel grid filter computations before pass
through filtering.
When the pass through filter was applied first the performance changed by a great margin.
FIGURE 6 shows that the voxel grid process is still the most CPU intensive part
but has improved over the previous order: it used only 65% of processing steps
instead of 92%. CPU cycles were thus distributed more evenly between the pass through
filter (13%) and organized PCL conversion (20%). The optimized version of the modules exhibits
the most even distribution of processing with the voxel grid contribution lowered to 21% and pass
through filtering taking up 33% of CPU cycles. Organized conversion used 24% and OpenCV
18%.
FIGURE 6: CPU usage shown for pre-processing applying pass through filtering before voxel grid
computations.
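Why the order matters can be seen from how many points reach the expensive stage. In the following sketch, cropZ() stands in for the pass through filter and decimate() for the costlier reduction stage. Both are deliberately simplified, hypothetical helpers (a real voxel grid groups points spatially), and inputSizes() merely records the number of points each stage receives.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Point3 {
    float x, y, z;
};
using Cloud = std::vector<Point3>;
using Stage = std::function<Cloud(const Cloud&)>;

// Run the stages in order and record how many points each stage receives.
std::vector<std::size_t> inputSizes(Cloud cloud, const std::vector<Stage>& stages) {
    std::vector<std::size_t> sizes;
    for (const Stage& stage : stages) {
        sizes.push_back(cloud.size());
        cloud = stage(cloud);
    }
    return sizes;
}

// Stand-in for the pass through filter: keep points with lower <= z <= upper.
Cloud cropZ(const Cloud& in, float lower, float upper) {
    Cloud out;
    for (const Point3& p : in) {
        if (p.z >= lower && p.z <= upper) out.push_back(p);
    }
    return out;
}

// Stand-in for an expensive reduction stage: keep every `step`-th point.
// Only the amount of input it has to process matters for this illustration.
Cloud decimate(const Cloud& in, std::size_t step) {
    assert(step > 0);
    Cloud out;
    for (std::size_t i = 0; i < in.size(); i += step) out.push_back(in[i]);
    return out;
}
```

With 20 points at z = 0, 1, ..., 19 and a crop to 3 <= z <= 12, the reduction stage receives all 20 points when it runs first but only 10 when the crop runs first.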
The order of code execution led to a significant change in performance (see FIGURE 7).
When the voxel grid was processed before the pass through filter the stock code was not able to
render more than 0.1 fps, i.e. it took around 9.1 seconds to render a single frame. With the
optimized code this order showed a significant improvement as the frame rate
increased to 9 fps, i.e. it took only 98 milliseconds on average to render a single frame, making it
on average up to 93 times faster. Similar results were seen in the reverse arrangement. The
stock code with pass through filtering applied first was able to render 0.4 fps (2.5 seconds
per frame), which is four times better than with the other order. The biggest change was seen in
the overall optimised code frame rate, which on average was 25 fps, making it close to real time
performance as it took only 37 milliseconds to render a frame. Overall this is a 69 times better
performance compared to the original unaltered stock code.
FIGURE 7: Frame rates shown of stock and optimised modules in different execution orders.
To further support and test the findings additional testing was performed on a wide range of
devices, which included embedded systems such as Raspberry Pi 1 and 2, tablets, laptops and
powerful rendering machines. In total eighteen different machines were used to perform a
comparative evaluation between the stock and optimised code, some of which ran a Linux
operating system, to give a full spectrum of hardware and software combinations. The results
show that the optimised code increased the performance on every single machine tested.
The embedded systems saw the smallest increase due to the limited power of their ARM based
processors, but still saw four times better performance with optimised code compared to stock. As
the power of the machines increased so did the optimised code performance, while stock
performance stayed almost level.
FIGURE 8: Comparative evaluation test results between stock and optimised code on eighteen different
machines sorted from least powerful (left) to most powerful (right).
4. CONCLUSIONS
Since the PCL is a general purpose and multi-platform library many of its internal aspects are
generalized, not all parts are optimized and performance can suffer in time sensitive processing.
As shown in section 3 optimized PCL modules provide significant performance gains over the
stock modules. Neglecting the minimal overhead of the performance measurements, speed
increased 2.4 times for the organized PCL conversion, 91 times for voxel grid
filtering and 7.8 times for pass through filtering. As seen in section 3.5 this allows
multiple PCL modules to be used together while still maintaining near real-time frame rates,
giving an average of 69 times improved performance for the pre-processing of the point clouds. It is
important to note that the optimized code is still generalized, not specific to a particular platform
and backwards compatible with existing stock code. The stock versions of the modules optimized
in this paper have not been changed since the library's release in 2011, which shows the need for
update and improvement. The point cloud pre-processing optimizations are important for various
point cloud tasks such as registration, object recognition and segmentation. Some of these
improvements are already being integrated into the library project by the community.
5. FUTURE WORK
Future plans focus on working with the PCL developer community and contributing optimized
algorithms to the official PCL code repository. A further strand of research has already been started
to allow the PCL to be used with embedded devices to perform real time point cloud processing.
6. REFERENCES
[1] R. Schnabel, R. Wahl and R. Klein, "Efficient RANSAC for Point-Cloud Shape Detection,"
Computer Graphics Forum, vol. 26, no. 2, pp. 214–226, 2007.
[2] R. B. Rusu and S. Cousins, "3D is here: Point Cloud Library (PCL)," in Robotics and Automation
(ICRA), 2011 IEEE International Conference, Shanghai, 2011.
[3] C. Sun, "A Fast Stereo Matching Method," in Digital Image Computing: Techniques and
Applications, Auckland, 1997.
[4] S. Izadi, D. Kim and O. Hiliges, "Real-time 3D Reconstruction and Interaction Using a Moving
Depth Camera," in 24th annual ACM Symposium on User Interface Software and
Technology, New York, NY, 2011.
[5] D. Lanman, D. Crispell and G. Taubin, "Surround Structured Lighting for Full Object
Scanning," in Sixth International Conference on 3-D Digital Imaging and Modeling, Montreal,
Aug. 2007.
[6] A. Zhang, S. Hu, Y. Chen, H. Liu, F. Yang and J. Liu, "Fast Continuous 360 Degree Color 3D
Laser Scanner," in The International Archives of the Photogrammetry, Remote Sensing and
Spatial Information Sciences, Volume XXXVII, Beijing, 2008.
[7] Microsoft, "Kinect for Windows," Microsoft, [Online]. Available:
https://www.microsoft.com/en-us/kinectforwindows/develop/. [Accessed 2 June 2015].
[8] I. Budak, D. Vukelić, D. Bračun, J. Hodolič and M. Sokovi, "Pre-Processing of Point-Data
from Contact and Optical 3D Digitization Sensors," Sensors, vol. 12, no. 1, pp. 1100-1126,
2013.
[9] X. Zhang, C. K. Sun, C. Wang and S. Ye, "Study on Preprocessing Methods for Color 3D
Point Cloud," Materials Science Forum, Vols. 471-472, pp. 716-721 , 2004.
[10] Bentley Systems, "Bentley Pointools V8i," Bentley Systems, [Online]. Available:
http://www.bentley.com/en-US/Promo/Pointools/pointools.htm. [Accessed 16 June 2015].
[11] Mirage-Technologies, "Home: PointCloudViz," Mirage-Technologies, [Online]. Available:
http://www.pointcloudviz.com/. [Accessed 16 June 2015].
[12] Faro, "Home: PointSense," Faro, [Online]. Available:
http://faro-3d-software.com/CAD/Products/PointSense/index.php. [Accessed 16 June 2015].
[13] E. K. Stathopoulou, J. L. Lerma and A. Georgopoulos, "Geometric documentation of the
almoina door of the cathedral of Valencia.," in Proceedings of EuroMed2010 3rd International
Conference dedicated on Digital Heritage, Cyprus, 2010.
[14] S. Martull, M. Peris and K. Fukui, "Realistic CG stereo image dataset with ground truth
disparity maps," Trak-Mark, 2012.
[15] Point Cloud Library, "Point Cloud Library (PCL) Developers mailing list," Naddle, [Online].
Available: http://www.pcl-developers.org/. [Accessed July 2015].
[16] C. A. R. Hoare, "Quicksort," The Computer Journal, pp. 10-16, 1962.
[17] Willow Garage, "Software: ROS," Willow Garage, 3 June 2015. [Online]. Available:
https://www.willowgarage.com/pages/software/ros-platform.
[18] Itseez, "Home page: OpenCV," Itseez, [Online]. Available: http://opencv.org/. [Accessed 15
January 2015].
[19] Kitware, "Home: VTK," Kitware, [Online]. Available: http://www.vtk.org/. [Accessed 15 June
2015].
[20] GitHub, "Point Cloud Library Repository," [Online]. Available:
https://github.com/PointCloudLibrary/pcl. [Accessed 23 June 2015].

More Related Content

What's hot (20)

PDF
A Review: Metaheuristic Technique in Cloud Computing
IRJET Journal
 
PDF
LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY
ijccsa
 
PDF
Ijarcce9 b a anjan a comparative analysis grid cluster and cloud computing
Harsh Parashar
 
PDF
A01260104
IOSR Journals
 
PDF
Performance Improvement of Cloud Computing Data Centers Using Energy Efficien...
IJAEMSJORNAL
 
PDF
An Efficient Cloud Scheduling Algorithm for the Conservation of Energy throug...
IJECEIAES
 
PDF
A Review on Scheduling in Cloud Computing
ijujournal
 
PDF
Data Division in Cloud for Secured Data Storage using RSA Algorithm
IRJET Journal
 
PDF
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
PDF
B02120307013
theijes
 
PPTX
An optimized scientific workflow scheduling in cloud computing
DIGVIJAY SHINDE
 
PDF
AUTO RESOURCE MANAGEMENT TO ENHANCE RELIABILITY AND ENERGY CONSUMPTION IN HET...
IJCNCJournal
 
PDF
Paper444012-4014
saumya yuval
 
PDF
An Enhanced trusted Image Storing and Retrieval Framework in Cloud Data Stora...
IJERA Editor
 
PDF
Multicloud Deployment of Computing Clusters for Loosely Coupled Multi Task C...
IOSR Journals
 
PDF
Scheduling in cloud computing
ijccsa
 
PDF
Virtual Machine Allocation Policy in Cloud Computing Environment using CloudSim
IJECEIAES
 
PDF
Improving Cloud Performance through Performance Based Load Balancing Approach
IRJET Journal
 
PDF
Achieving High Performance Distributed System: Using Grid, Cluster and Cloud ...
IJERA Editor
 
PDF
A Novel Approach for Workload Optimization and Improving Security in Cloud Co...
IOSR Journals
 
A Review: Metaheuristic Technique in Cloud Computing
IRJET Journal
 
LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY
ijccsa
 
Ijarcce9 b a anjan a comparative analysis grid cluster and cloud computing
Harsh Parashar
 
A01260104
IOSR Journals
 
Performance Improvement of Cloud Computing Data Centers Using Energy Efficien...
IJAEMSJORNAL
 
An Efficient Cloud Scheduling Algorithm for the Conservation of Energy throug...
IJECEIAES
 
A Review on Scheduling in Cloud Computing
ijujournal
 
Data Division in Cloud for Secured Data Storage using RSA Algorithm
IRJET Journal
 
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
B02120307013
theijes
 
An optimized scientific workflow scheduling in cloud computing
DIGVIJAY SHINDE
 
AUTO RESOURCE MANAGEMENT TO ENHANCE RELIABILITY AND ENERGY CONSUMPTION IN HET...
IJCNCJournal
 
Paper444012-4014
saumya yuval
 
An Enhanced trusted Image Storing and Retrieval Framework in Cloud Data Stora...
IJERA Editor
 
Multicloud Deployment of Computing Clusters for Loosely Coupled Multi Task C...
IOSR Journals
 
Scheduling in cloud computing
ijccsa
 
Virtual Machine Allocation Policy in Cloud Computing Environment using CloudSim
IJECEIAES
 
Improving Cloud Performance through Performance Based Load Balancing Approach
IRJET Journal
 
Achieving High Performance Distributed System: Using Grid, Cluster and Cloud ...
IJERA Editor
 
A Novel Approach for Workload Optimization and Improving Security in Cloud Co...
IOSR Journals
 

Efficient Point Cloud Pre-processing using The Point Cloud Library

Marius Miknis, Ross Davies, Peter Plassmann & Andrew Ware
International Journal of Image Processing (IJIP), Volume (10) : Issue (2) : 2016 63

Marius Miknis Marius.Miknis@southwales.ac.uk
Faculty of Computing, Engineering and Science
University of South Wales
Pontypridd, CF37 1DL, UK
Ross Davies Ross.Davies@southwales.ac.uk
Faculty of Computing, Engineering and Science
University of South Wales
Pontypridd, CF37 1DL, UK
Peter Plassmann Peter.Plassmann@southwales.ac.uk
Faculty of Computing, Engineering and Science
University of South Wales
Pontypridd, CF37 1DL, UK
Andrew Ware Andrew.Ware@southwales.ac.uk
Faculty of Computing, Engineering and Science
University of South Wales
Pontypridd, CF37 1DL, UK

Abstract
Robotics, video games, environmental mapping and medicine are some of the fields that use 3D data processing. In this paper we propose a novel optimization approach for the open source Point Cloud Library (PCL), which is frequently used for processing 3D data. Three main aspects of the PCL are discussed: point cloud creation from the disparity of color image pairs; voxel grid downsample filtering to simplify point clouds; and passthrough filtering to adjust the size of the point cloud. Additionally, OpenGL shader based rendering is examined. An optimization technique based on CPU cycle measurement is proposed and applied in order to optimize those parts of the pre-processing chain where measured performance is slowest. Results show that with optimized modules the performance of the pre-processing chain increased 69-fold.
Keywords: Point Cloud, Point Cloud Library, Point Data Pre-processing.

1. INTRODUCTION
Point clouds are sparse spatial representations of 3D object shapes. Algorithms such as those of the frequently used RANSAC [1] method can then be applied to reconstruct the complete object shapes from the point clouds.
A popular library for storing and manipulating point cloud data is the Point Cloud Library (PCL) [2].
The PCL is a large-scale open source project that is focused on both 2D and 3D point clouds and includes some image processing functionality. Currently the library has over 120 developers from universities, commercial companies and research institutes. The PCL is released under the terms of the BSD license, which means that it is free for commercial and research use. It can be cross-compiled for many different platforms including Windows, Linux, Mac OS, Android and iOS, which also allows the library to be used in embedded systems. The main algorithm groups in the PCL are for segmentation, registration, feature estimation, surface reconstruction, model fitting, visualization and filtering.
In the work presented in this paper stereo-photogrammetry is used as the main method of 3D data acquisition. This method is based on stereoscopy, where two spatially separated images are obtained from different viewing positions [3]. The disparity (separation) between corresponding points in the two images encodes the distance of object points; these values are stored in a disparity map.
This paper is organized as follows: section 2 presents related work in the field of 3D data acquisition and point cloud processing, followed in section 3 by a description of PCL modules and their optimization, while conclusions and future work are discussed in sections 4 and 5.

2. RELATED WORK
There are many uses for 3D data, ranging from environmental perception for robots, autonomous car navigation and video games to medical uses such as wound measurement and facial reconstruction. A number of ways to capture 3D data have been proposed and implemented. Many existing technologies rely heavily on the use of structured or infrared lighting to extract the depth data [4]. The technique of structured lighting is widely used in computer vision for its many benefits [5] in terms of accuracy and ease of use. Over the last 15 years 3D laser scanners have been developed [6] as active remote sensing devices. Such scanners can quickly scan thousands or even millions of 3D points in a scene. Time of flight cameras are also widely used in computer vision. The principle behind these cameras is similar to that of sonar, but with light replacing sound. Such cameras were introduced into the wider public domain by the Microsoft Xbox One console [7] to replace its older structured lighting based Kinect sensor.
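For the stereo approach used in this paper, the relation between disparity and distance described in section 1 can be illustrated with the standard pinhole stereo model. The following Python sketch is illustrative only and is not the PCL implementation: the function name and the baseline, cx and cy parameters are assumptions introduced here, and a rectified image pair is assumed.

```python
def disparity_to_point(u, v, disparity, focal_length, baseline, cx, cy):
    """Convert one pixel of a disparity map into a 3D point.

    Assumes a rectified stereo pair and the pinhole model:
    depth z = focal_length * baseline / disparity.
    focal_length, cx and cy are in pixels; baseline is in scene units.
    All parameter names are hypothetical.
    """
    if disparity <= 0:
        return None  # no valid match for this pixel, so no depth
    z = focal_length * baseline / disparity
    x = (u - cx) * z / focal_length
    y = (v - cy) * z / focal_length
    return (x, y, z)


# A pixel 100 columns right of the image centre with a disparity of 10 pixels.
point = disparity_to_point(420, 240, 10.0, 100.0, 0.5, 320.0, 240.0)
print(point)
```

Because depth is inversely proportional to disparity, a disparity of zero carries no usable depth and is skipped in this sketch.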
Once 3D data has been acquired by the above systems some kind of processing needs to be applied to extract useful information as well as to remove noise, outliers or any unnecessary information. Given the number of points that can be sampled, point clouds can become extremely large and contain noise, outliers and errors. Thus the pre-processing stage is important [8] [9] as it deals with noise, error and outlier removal through the use of filters, smooths the point cloud and reduces the point count while still keeping the relevant feature information. There are software tools available for such processing [10] [11] [12] but very few provide a complete library framework to incorporate into software projects. 3DReshaper [13] is such a library that provides point cloud processing capabilities. The PCL is the most commonly used library for point cloud processing, thus the PCL was used as the main development library in this research.
The current application focus of the PCL is in the field of robotics. For robots to sense, compute and interact with objects or whole scenes a way to perceive the world is needed, which is why the PCL is used as a part of the Robot Operating System (ROS). Using the PCL as a part of ROS, robots can compute a 3D environment in order to understand it, detect objects and interact with them. Due to space and power restrictions such systems rarely use desktop-like computing devices and are therefore in most cases implemented on relatively small embedded systems. In these systems the universal nature of the PCL (many operating systems, many 3D data formats, etc.) results in slow performance. Section 3 proposes a range of optimizations in order to improve performance.

3. POINT CLOUD PROCESSING OPTIMISATIONS
Four key algorithm areas were selected for optimization: point cloud creation (section 3.1), rendering (section 3.2), voxel grid down-sampling (section 3.3) and pass through filtering (section 3.4). The combined pre-processing chain is then evaluated in section 3.5. For the stereo test data the New Tsukuba Stereo Dataset [14] was used. This is a collection of synthetic stereo image pairs created using computer graphics. Additionally, OpenCV (Open Source Computer Vision Library) was used for image loading. The project code was run on a desktop Intel i7 machine. The first set of tests used the Microsoft Visual Studio 2013 code analyzer for inspecting code and its performance statistics. The purpose of the tests was to identify which parts of the code use the most CPU cycles and then to optimize those.
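The idea of ranking the stages of a processing chain by their share of the total cost can be sketched without a dedicated profiler. The following Python sketch is not the Visual Studio 2013 profiler used in this work: it measures wall-clock time rather than CPU cycles, and the stage names and stand-in functions are hypothetical.

```python
import time


def stage_shares(stages, frame):
    """Run (name, function) stages in order and return each stage's
    share of the total elapsed time in percent."""
    elapsed = {}
    data = frame
    for name, function in stages:
        start = time.perf_counter()
        data = function(data)  # output of one stage feeds the next
        elapsed[name] = time.perf_counter() - start
    total = sum(elapsed.values())
    if total == 0.0:
        return {name: 0.0 for name in elapsed}
    return {name: 100.0 * seconds / total for name, seconds in elapsed.items()}


# Hypothetical stand-ins for image loading, conversion and rendering.
def load(frame):
    return [float(value) for value in frame]


def convert(values):
    return [(value, value * 0.5, value * 0.25) for value in values]


def render(points):
    return sum(x + y + z for x, y, z in points)


shares = stage_shares(
    [("load", load), ("convert", convert), ("render", render)], range(50000)
)
slowest = max(shares, key=shares.get)
print(shares, "-> optimise first:", slowest)
```

The stage with the largest share is the natural first candidate for optimization, which is the selection rule applied throughout this section.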
3.1. Point Cloud Creation Speed Improvements
When using a stereo camera setup depth values are represented as a disparity map, which in most cases is a greyscale image where the brightness of pixels represents depth values. A second output is a color image that stores the actual color value of each point. From the disparity and color images a point cloud can be produced. The PCL provides the OrganizedConversion<>::convert() method, which uses the disparity map, the color image and the focal length of the camera to produce a point cloud.
Point cloud generation proceeds in three stages: first the input images are loaded into memory using OpenCV, which converts them to vectors that can be passed as parameters to the second stage, PCL point cloud creation. The point cloud is then rendered on screen in the third stage. Using the Microsoft Visual Studio 2013 code profiler, CPU cycles were measured per line of code. In order to average out operating system specific random overheads all following tests were performed three times. Results are shown in FIGURE 1.

FIGURE 1: Test figures for CPU usage of different stages. Tests 1 to 3 show pre-optimised PCL code, while tests 4 and 5 show optimised conversion.

• For the first test OpenCV was used to read the Tsukuba dataset as a sequence of images, loaded one at a time. OpenCV, PCL point cloud generation and rendering algorithms were used ‘as is’, without changes and as provided from public repositories. The results are shown in the first bar in FIGURE 1. PCL point cloud generation required 36% of CPU cycles, rendering 45%. This resulted in a processing speed of 2 frames per second (fps).
• In the second test rendering was disabled to identify CPU load more accurately when OpenCV loaded images one at a time.
• This is contrasted by the third test where OpenCV loaded images not as individual stills but as a video sequence. Encoding the still images into a video sequence was achieved using the Intel IYUV codec in OpenCV. This had a dramatic effect as OpenCV CPU cycles reduced from 27% to only 3%, leaving almost all of the remaining 97% to the PCL conversion.
• In order to improve PCL performance numerous optimizations were made. In particular, these were a) bit-shifting and pointer incrementation of color values to allow faster access and modification of values, b) vector clear and resize checks to avoid clearing and resizing a new vector when it is the same size as the previous one, c) vector access optimizations through the use of data pointers, which removed the vector push_back overhead, and d) several minor optimizations. The source code and documentation of these changes are available in the PCL developers' forum [15]. The 4th bar in FIGURE 1 shows that as a result the CPU cycles needed for PCL conversion reduced by 66%, to less than the cycles needed for image loading by OpenCV.
• The two improvements documented in tests 3 and 4 were finally tested in the same way as in the first test of this series, i.e. with rendering switched on again. With image loading replaced by video loading and conversion optimized, these two components together now consume less than 10% of processor cycles while rendering now takes 72%. Importantly, the overall frame rate increased to 5 frames per second.

3.2. Rendering Speed Improvements
Since rendering was now the new bottleneck, steps were taken to improve its performance. By default, rendering for the PCL is done by The Visualization Toolkit (VTK), which is an open source library for 3D computer graphics and image processing. This was replaced with a shader (i.e. graphics processor) based OpenGL rendering implementation for desktop PCs.
The basic data structure inside the PCL is the point cloud. This is an assembly of sub-fields. The main ones are ‘width’, ‘height’ and ‘points’. ‘Points’ is a vector that stores points of PointT type, which in turn can be PointXYZ, PointXYZRGB, PointXYZRGBA (and some other basic types). Under the existing PCL data structure non-colored point clouds of type PointXYZ could be rendered with our new OpenGL implementation but colored ones could not. To enable this several changes were made to the PCL:
• A fourth float value was added to the point cloud type union. This was easy to do since the union already had memory allocated for four float values but only the x, y and z floats were declared. The fourth parameter now stores the color value to be passed to the OpenGL shaders.
• To store the color values the three constituent independent integer values were bit-shifted into a single float which was then stored as the fourth value of the above union. This was done to avoid integer calculations having to be performed in the shaders while at the same time having minimal impact on the PCL.
• However, OpenGL shaders do not support bit shifting. The color values were therefore extracted in the shader by manipulating the known structure (8 bits for each channel).
In the vertex shader the floor() method was used to extract each color channel separately, as its return value is a whole number.
The results of the above manipulations are shown in FIGURE 2. The two bars labelled ‘VTK’ are unchanged re-runs of the first and fifth tests from the previous section (see FIGURE 1). When VTK is replaced by OpenGL in the first test the frame rate increases by a modest 50% to 3 fps. When, however, this is done in the optimized system produced in the previous section the speed improvement is considerable: the frame rate rises to 38 fps. In this final system, where all three components are optimized, OpenGL rendering uses only 8.5% of the processor cycles whereas VTK previously used 72%.

FIGURE 2: 1st and 5th re-tests (using the standard VTK renderer) compared to the new OpenGL renderer. The first 2 bars represent performance of the non-optimised PCL code and the 3rd and 4th bars the optimised PCL/OpenCV code.
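The packing and floor() based extraction of color values described in this section can be illustrated as follows. This Python sketch rests on an assumption the text does not spell out: the three 8-bit channels are combined into one 24-bit integer whose numeric value is stored in the float (a 32-bit float represents every integer below 2^24 exactly). The function names are hypothetical, and the arithmetic in unpack_rgb_with_floor mirrors what a vertex shader could do without bit operations.

```python
import math
import struct


def pack_rgb(r, g, b):
    """Combine three 8-bit channels into one value held in a 32-bit float."""
    packed = (r << 16) | (g << 8) | b
    # Round-trip through a 32-bit float to mimic the precision of the stored value.
    return struct.unpack("f", struct.pack("f", float(packed)))[0]


def unpack_rgb_with_floor(value):
    """Recover the channels using only division, subtraction and floor()."""
    r = math.floor(value / 65536.0)      # top 8 bits
    rest = value - r * 65536.0
    g = math.floor(rest / 256.0)         # middle 8 bits
    b = rest - g * 256.0                 # lowest 8 bits
    return int(r), int(g), int(b)


color = unpack_rgb_with_floor(pack_rgb(255, 128, 7))
print(color)
```

Keeping the packed value below 2^24 is what makes the round trip lossless in this sketch; a shader running at reduced float precision would need a different encoding.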
3.3. Voxel Grid Downsample Filter Improvements
After the point cloud has been produced further processing is usually required, e.g. for data reduction and filtering operations. A relatively low resolution point cloud of 640 x 480 (e.g. produced by the Kinect) results in 307,200 points. While some operations (e.g. thresholding) process the points in O(n) time, a more complex algorithm (e.g. k nearest neighbor filtering) becomes O(nk). This can place a heavy workload on the processor.
One of the methods frequently used to reduce the number of points in a point cloud, and with it unnecessary complexity, while retaining detail and information is voxel grid down sampling. The down sampling is performed using an octree to sub-divide the point cloud into multiple cube shaped regions (voxels). After processing, all points in a voxel are reduced to a single one. This results in a point cloud that is smaller in size and complexity but is still precise enough to work with and has a smaller cost in terms of CPU performance. The PCL has a dedicated method for this called voxelGrid.filter(). For testing, the leaf size values of the filter were set to 0.03f, 0.03f, 0.03f (3 x 3 x 3 cm). Three groups of tests were performed as shown in FIGURE 3.

FIGURE 3: Test figures for CPU usage of voxel grid. Test 1 shows the stock code, test 2 shows results with the Quicksort algorithm implemented and test 3 shows overall optimised voxel grid performance.

• In the first group of tests voxel filtering was added to the optimized processing chain developed in sections 3.1 and 3.2. Voxel grid computation proved to be very CPU intensive with an overall CPU cycle usage of 98%. This also resulted in a poor frame rate of about 0.1 fps (8.6 seconds per frame).
Analysis of the filter code revealed that 30% of the processing was spent on sorting the points using a standard C++ library vector sort method.
• The second group of tests was therefore performed with the sort method replaced by a Quicksort algorithm [16]. This algorithm takes on average O(n log n) steps to sort n points, but in the worst case, when a chosen pivot value is the smallest or largest of the points to sort, the algorithm has to make O(n²) comparisons. To avoid this, a mean value is computed before sorting so that very small or very large values are not used as the pivot. Compared to the standard C++ sort with 30% of processor cycles used, Quicksort was significantly more efficient, using only 0.9%. Unfortunately this improved the overall filter method by only 5.2% as the computation shifted to different parts of the algorithm, mostly to vector access overheads.
• For the third test group vector access was therefore optimized by replacing vector push_back calls with pointer accesses and by improving the centroid finding, which together took up 65% of the processing. These changes reduced the voxel filter's share of computation by 26 percentage points, from 98% to 72% of CPU cycles. The combined changes to the sorting and vector processes increased the frame rate 91-fold to an average of about 10 fps.
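The combination of sorting and centroid finding discussed in the tests above can be sketched as follows. This Python sketch is a simplified illustration and not the PCL code: instead of an octree it assigns every point a flat integer voxel index, sorts the points by that index with a Quicksort whose pivot is the mean key, and then replaces each run of equal indices by its centroid. All function names are hypothetical.

```python
import math


def sort_by_key_mean_pivot(pairs):
    """Quicksort (key, value) pairs by integer key, using the mean key as pivot."""
    if len(pairs) <= 1:
        return list(pairs)
    pivot = sum(key for key, _ in pairs) / len(pairs)
    lower = [pair for pair in pairs if pair[0] < pivot]
    equal = [pair for pair in pairs if pair[0] == pivot]
    upper = [pair for pair in pairs if pair[0] > pivot]
    # The mean lies between the smallest and largest key, so both
    # recursive calls always receive fewer elements than this call.
    return sort_by_key_mean_pivot(lower) + equal + sort_by_key_mean_pivot(upper)


def voxel_grid_downsample(points, leaf_size):
    """Replace all points that fall into the same cubic voxel by their centroid."""
    if not points:
        return []
    # Integer voxel coordinates of every point.
    cells = [tuple(math.floor(c / leaf_size) for c in p) for p in points]
    mins = [min(cell[a] for cell in cells) for a in range(3)]
    dims = [max(cell[a] for cell in cells) - mins[a] + 1 for a in range(3)]

    def linear_index(cell):
        # Flatten (i, j, k) so that points in the same voxel share one key.
        i, j, k = (cell[a] - mins[a] for a in range(3))
        return i + dims[0] * (j + dims[1] * k)

    pairs = [(linear_index(cell), p) for cell, p in zip(cells, points)]
    ordered = sort_by_key_mean_pivot(pairs)

    # Walk the sorted list; each run of equal keys is one voxel.
    result = []
    start = 0
    while start < len(ordered):
        end = start
        while end < len(ordered) and ordered[end][0] == ordered[start][0]:
            end += 1
        run = [p for _, p in ordered[start:end]]
        result.append(tuple(sum(p[a] for p in run) / len(run) for a in range(3)))
        start = end
    return result


cloud = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0), (1.0, 1.0, 1.0)]
reduced = voxel_grid_downsample(cloud, 2.0)
print(reduced)
```

Sorting first means that the centroid step only has to make a single linear pass over the data, which is why the cost of the sort and of the per-point vector accesses dominates in the measurements reported above.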
3.4. Pass Through Filter Improvements
Another post-processing method provided by the PCL is passthrough.filter(), which removes points from the cloud that are not within a specified range. This allows the point cloud to be adjusted in any coordinate direction, similar to a frustum cut-off. The passthrough.filter() method accepts parameters for upper and lower limits and a direction along the x, y or z axis. For the Tsukuba dataset depth range values of 3 and 12 were used for testing in the z coordinate direction. Two groups of tests were performed with results shown in FIGURE 4.

FIGURE 4: Test figures for CPU usage of pass through filter. The first test shows the stock code performance and the second the improved code performance.

• In the first test the pass through filter was appended to the optimized processing chain outlined previously in sections 3.1 and 3.2. The filter was very CPU intensive, using 93.6% of cycles and bringing down the frame rate to 3 fps. Analysis of the code showed that (as before with voxel filtering) vector accesses were inefficient.
• After vector access optimization along the lines outlined before for voxel filtering, and after improving the non-finite entries check (54%) as well as the field value memory copy calls (24%), the pass through filter now consumes only 41% of CPU cycles with the frame rate rising to 18 fps.

3.5. Combined Pre-processing Chain
The PCL modules analyzed above, when combined, form the main pre-processing chain of the point cloud manipulation. The order in which these algorithms are applied makes a substantial performance difference. Running the voxel filter first proved to be the slower combination as the down sampling had to be performed on the whole point cloud, in this case 307,200 points.
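The range test performed by the pass through filter, and the reason why applying it first reduces the work of later stages, can be sketched as follows. This Python sketch is illustrative only and does not reproduce the PCL implementation; the function name and the sample points are hypothetical, while the z limits of 3 and 12 are the ones used for the Tsukuba dataset in section 3.4.

```python
import math


def pass_through(points, axis, lower, upper):
    """Keep finite points whose coordinate on the chosen axis lies in [lower, upper]."""
    index = "xyz".index(axis)
    kept = []
    for point in points:
        if not all(math.isfinite(value) for value in point):
            continue  # drop points with NaN or infinite entries
        if lower <= point[index] <= upper:
            kept.append(point)
    return kept


cloud = [
    (0.0, 0.0, 2.0),            # too close
    (1.0, 0.5, 5.0),            # inside the range
    (0.2, 0.1, 11.5),           # inside the range
    (0.3, 0.3, 40.0),           # too far away
    (float("nan"), 0.0, 6.0),   # non-finite entry
]
in_range = pass_through(cloud, "z", 3.0, 12.0)
print(len(cloud), "points before,", len(in_range), "after")
```

Any later stage, such as voxel grid down sampling, then only has to process the points that survive the range test, which is consistent with the ordering result reported in this section.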
FIGURE 5 shows that in this order voxel grid computation is the most CPU intensive task, taking up 92% of all processing. Pass through filtering took up only 2% of CPU cycles and organized PCL conversion 4%. The optimized version saw a more balanced use of the processor, with voxel grid processing lowered to 68% and pass through filtering at 12%. Organized conversion rose to 10% and OpenCV's contribution increased to 7% from 0.2% previously.
FIGURE 5: CPU usage shown for pre-processing applying voxel grid filter computations before pass through filtering.

When the pass through filter was applied first, performance changed by a large margin. FIGURE 6 shows that the voxel grid process is still the most CPU intensive part but has improved over the previous order: it used only 65% of CPU cycles instead of 92%. This led to CPU cycles being distributed more evenly between the pass through filter (13%) and organized PCL conversion (20%). The optimized version of the modules exhibits the most even distribution of processing, with the voxel grid contribution lowered to 21% and pass through filtering taking up 33% of CPU cycles. Organized conversion used 24% and OpenCV 18%.

FIGURE 6: CPU usage shown for pre-processing applying pass through filtering before voxel grid computations.

The order of code execution led to a significant change in performance (see FIGURE 7). When the voxel grid was processed before the pass through filter the stock code was not able to render more than 0.1 fps, i.e. it took around 9.1 seconds to render a single frame. With the optimized code this order showed a significant improvement as the frame rate increased to 9 fps, i.e. it took only 98 milliseconds on average to render a single frame, making it on average up to 93 times faster. Similar results were seen in the reverse arrangement. The stock code with pass through filtering applied first was able to render 0.4 fps (2.5 seconds per frame), which is four times better than with the voxel grid applied first. The biggest change was seen in the overall optimised code frame rate, which on average was 25 fps, making it close to real time performance as it took only 37 milliseconds to render a frame. Overall this is a 69 times better performance compared to the original unaltered stock code.

FIGURE 7: Frame rates shown of stock and optimised modules in different execution orders.

To further support and test the findings additional testing was performed on a wide range of devices, which included embedded systems such as the Raspberry Pi 1 and 2, tablets, laptops and powerful rendering machines. In total eighteen different machines were used to perform a comparative evaluation between the stock and optimised code, of which some ran a Linux operating system to give a full spectrum of hardware and software combinations. The results show that the optimised code increased the performance on every single machine tested. The embedded systems saw the smallest increase due to the limited power of their ARM based processors, but still saw four times better performance with the optimised code compared to stock. As the power of the machines increased so did the performance of the optimised code, while that of the stock code stayed almost level.

FIGURE 8: Comparative evaluation test results between stock and optimised code on eighteen different machines sorted from least powerful (left) to most powerful (right).
  • 9. Marius Miknis, Ross Davies, Peter Plassmann & Andrew Ware International Journal of Image Processing (IJIP), Volume (10) : Issue (2) : 2016 71 4. CONCLUSIONS Since the PCL is a general purpose and multi-platform library many of its internal aspects are generalized, not all parts are optimized and performance can suffer on time sensitive processing. As shown in section 3 optimized PCL modules provide significant performance gains over the stock modules. When neglecting the minimal cost of performance testing measurement overheads speed increased 2.4 times for the organized PCL conversion, 91 times for voxel grid filtering and 7.8 times for pass through filtering. As seen in section 3.5 this allows for the use of multiple PCL modules together while still maintaining near real-time frame rates giving an average of 69 times improved performance for the pre-processing of the point clouds. It is important to note that the optimized code is still generalized, not specific to a particular platform and backwards compatible with existing stock code. The optimized modules in this paper have not been changed since libraries release 2011 showing the need for the update and improvement. The point cloud pre-processing optimizations are important for various point cloud tasks such as registration, object recognition and segmentation. Part of these improvements are already being implemented to the library project by the community. 5. FUTURE WORK Future plans focus on working with PCL developer community, and to contribute optimized algorithms to the official PCL code repository. Another part of research has already been started to allow the PCL to be used with embedded devices to perform real time point cloud processing. 6. REFERENCES [1] S. Ruwen, W. Roland and R. Klei, "Efficient RANSAC for Point-Cloud Shape Detection," Computer Graphics Forum, vol. 26, no. 2, p. 214–226, 2007. [2] S. C. 
Rusu Radu Bogdan, "3d is here: Point cloud library (pcl)," in Robotics and Automation (ICRA), 2011 IEEE International Conference, Shanghai, 2011. [3] C. Sun, "A Fast Stereo Matching Method," in Digital Image Computing: Techniques and Applications, Auckland, 1997. [4] S. Izadi, D. Kim and O. Hiliges, "Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera," in 24th annual ACM Symposium on User Interface Software and Technology, New York, NY, 2011. [5] D. Lanman, D. Crispell and G. Taubin, "Surround Structured Lighting for Full Object Scanning," in Sixth International Conference on 3-D Digital Imaging and Modeling, Montreal, Aug. 2007. [6] A. Zhang, S. Hu, Y. Chen, H. Liu, F. Yang and J. Liu, "Fast Continuous 360 Degree Color 3D Laser Scanner," in The Internal Archives of the Photogrammetry, Remote Sensing and Spatial Information sciences, Volume XXXVII, Beijing, 2008. [7] Microsoft, "Kinect for Windows," Microsoft, [Online]. Available: https://www.microsoft.com/en- us/kinectforwindows/develop/. [Accessed 2 June 2015]. [8] I. Budak, D. Vukelić, D. Bračun, J. Hodolič and M. Sokovi, "Pre-Processing of Point-Data from Contact and Optical 3D Digitization Sensors," Sensors, vol. 12, no. 1, pp. 1100-1126, 2013. [9] X. Zhang, C. K. Sun, C. Wang and S. Ye, "Study on Preprocessing Methods for Color 3D Point Cloud," Materials Science Forum, Vols. 471-472, pp. 716-721 , 2004.
[10] Bentley Systems, "Bentley Pointools V8i," Bentley Systems, [Online]. Available: http://www.bentley.com/en-US/Promo/Pointools/pointools.htm. [Accessed 16 June 2015].
[11] Mirage-Technologies, "Home: PointCloudViz," Mirage-Technologies, [Online]. Available: http://www.pointcloudviz.com/. [Accessed 16 June 2015].
[12] Faro, "Home: PointSense," Faro, [Online]. Available: http://faro-3d-software.com/CAD/Products/PointSense/index.php. [Accessed 16 June 2015].
[13] E. K. Stathopoulou, J. L. Lerma and A. Georgopoulos, "Geometric documentation of the Almoina door of the cathedral of Valencia," in Proceedings of EuroMed2010 3rd International Conference dedicated on Digital Heritage, Cyprus, 2010.
[14] S. Martull, M. Peris and K. Fukui, "Realistic CG stereo image dataset with ground truth disparity maps," Trak-Mark, 2012.
[15] Point Cloud Library, "Point Cloud Library (PCL) Developers mailing list," Nabble, [Online]. Available: http://www.pcl-developers.org/. [Accessed July 2015].
[16] C. A. R. Hoare, "Quicksort," The Computer Journal, pp. 10-16, 1962.
[17] Willow Garage, "Software: ROS," Willow Garage, 3 June 2015. [Online]. Available: https://www.willowgarage.com/pages/software/ros-platform.
[18] Itseez, "Home page: OpenCV," Itseez, [Online]. Available: http://opencv.org/. [Accessed 15 January 2015].
[19] Kitware, "Home: VTK," Kitware, [Online]. Available: http://www.vtk.org/. [Accessed 15 June 2015].
[20] GitHub, "Point Cloud Library Repository," [Online]. Available: https://github.com/PointCloudLibrary/pcl. [Accessed 23 June 2015].