1 Introduction

With the increasing demand for autonomous vehicles and autonomy in general, motion planning has become a prominent area of research. The goal of motion planning is to find an optimal, collision-free path from a starting state to a goal state, while satisfying a set of kinematic and dynamic constraints. Motion planners can operate locally or globally. Global motion planners provide a path to any given goal state, while local motion planners arrange maneuvers to satisfy the motion constraints between states along the path. Local motion planning can be approached through sampling, optimization, and novel machine learning methods. In this paper, we propose a data-driven, deep learning-assisted local predictive motion planner for mobile robots, referred to as LPM-Net.

Sampling-based methods iterate randomly over a tree in the state space and assemble a control sequence for the autonomous robot (AR). Non-holonomic rapidly exploring random trees (RRT) [1, 2] and stable sparse RRT (SST) [3] grow randomly from a base node and can take a long time to identify a viable path. Optimized variants (RRT\(^*\)) take parallel approaches [4, 5] or incorporate linear quadratic regulation (LQR) algorithms [6] to compensate for shortcomings such as getting trapped in a local minimum or oscillating around an obstacle [7]. RRT\(^*\) algorithms are among the most popular methods in this category; they quickly but sparsely span the state space and hence lack completeness [8].

Optimization methods cast the motion planning problem as a concise cost function that is minimized by an optimization algorithm; the most popular approaches include LQR [9] and the model predictive controller (MPC) [10, 11]. These methods perform complex arithmetic in every iteration, which burdens the processor heavily. The term predictive refers to motion planners that forecast the imminent state of their environment from existing models and take the forecast into account as well. This prognostic ability reduces hasty decision-making and averts patchy trajectories [10]. However, the high computational complexity of these methods often exceeds the capacity of smaller on-board computers, making them less accessible. This problem prompted a series of investigations, surveyed in [12, 13], into aiding the common techniques with machine learning, seeking less computationally expensive predictive motion planning algorithms [14, 15].

Machine learning has entered motion planning through reinforcement and imitation learning. For example, in [16] the performance of a conventional RRT global planner was enhanced with reinforcement learning (RL). The pathfinding ability of an RL agent was demonstrated in simulation in [17], and its robustness and stability in unknown environments were further improved in [18]. These RL approaches collect comprehensive, descriptive sensory data to practice and learn in their environments. They work by approximating a cost function [19], as in the Q-learning method [20], policy gradient strategies [21], or actor-critic hierarchical RL [22], which combines both [23]. The RL-RRT technique has been employed in a variety of applications, such as robot manipulation in obstructed environments [24] and autonomous car parking [25]. Imitation learning approaches merge supervised learning with motion planning and reproduce the desired behaviors of the conventional methods with ideal computational efficiency [26]. For instance, in [27] the inputs and outputs of an RRT algorithm were mapped into a latent space, which was then sampled to train a new imitating model. This technique was later combined with other sampling-based motion planning (SMP) algorithms to obtain worst-case theoretical guarantees [28].

The remainder of this paper is organized as follows. The problem is discussed comprehensively in Sect. 2. In Sect. 3, a role-model motion planner, referred to as the paradigm model, is constructed so that its behavior can be monitored. The paradigm model is then simulated in a virtual world and descriptive data is collected in Sect. 4. Sect. 5 elaborates the working principles and architecture of the proposed motion planner, and its performance is evaluated in Sect. 6. Lastly, the paper is concluded in Sect. 7.

2 Problem Formulation

As introduced in Sect. 1, most current works use numeric calculations to predict the state of the autonomous robot (AR) and suggest paths that minimize a predefined cost function \(J\) over a prediction horizon. In practice, however, this approach has a drawback: the complex real-time arithmetic must often be executed locally on the AR's hardware. AR units span a wide range of sizes and scales, from palm-size drones [29, 30] to medium-size wheeled robots, and they are mostly equipped with small, low-power computers. As a result, AR computers often fail to steer with the desired throughput, which ultimately degrades the quality of maneuvers and leads to frequent braking and sharp steering.

Predictive motion planners rely on a predefined cost function to make decisions that are optimal with respect to specific criteria. These cost functions incorporate various parameters that determine the suitability of a solution, but the interpretation and significance of those parameters are subjective and vary with the use case, so the cost function must be customized for each scenario. For example, routing an autonomous race car requires different considerations than routing a fuel lorry, as their priorities regarding speed and safety differ. Furthermore, even for similar hardware and payload characteristics, a range of optimization algorithms is employed to minimize the cost function. This diversity of approaches hinders the development of a consistent solution that delivers the desired performance across different problem domains.

To overcome these drawbacks of the conventional approaches, this work proposes a deep learning assisted local predictive motion planner (LPM-Net) that is both less computationally expensive on common hardware and flexible enough to serve a wide span of applications. In contrast to conventional methods, which must find a path by running complex optimization algorithms, LPM-Net uses a deep neural network (DNN) that, once trained on expert data, predicts feasible paths satisfying the given kinodynamic constraints. Unlike conventional motion planners that run on a CPU, LPM-Net is compatible with DNN-friendly hardware such as GPU- and TPU-enabled system-on-chip (SoC) platforms, which can dramatically boost performance when the planner is implemented locally on an AR's hardware.

LPM-Net takes the lidar observations as a 2-D cost map, which is mapped into the latent space through a consecutive CNN-RNN network. The global planner's path and the robot heading are concurrently mapped into the latent space through RNN networks. The resulting latent vectors are fed into a dense network that produces the mobile robot's command input. LPM-Net works in conjunction with a global path planner, which provides an overall road map toward the goal, and a potential field (PF) cost map (Fig. 1), which repels stationary and moving obstacles. LPM-Net takes the beginning fraction of the global planner's path, \({\psi _k}\subset \Psi \), and a sequence of the most recent PF cost maps, \({\gamma _k}\subset \Gamma \), as inputs to predict the robot's maneuvers.

Fig. 1

Potential field in action. The arrow on the right image shows the AR's position in a pentagon-shaped world in the simulation seen from above. The left side image is the PF's cost map showing the AR's perception of the surrounding walls and obstacles

In order to train an LPM-Net, expert knowledge is required, obtained by logging the behavior of an existing paradigm motion planner; this is discussed thoroughly in Sect. 3.

3 LPM-Net Paradigm Model

LPM-Net is a data-driven motion planning approach; therefore, an adequate body of expert data is required to train it. Existing conventional predictive motion planners can serve this purpose perfectly. The paradigm model can be run in computer simulation at a lower speed to accommodate the computational payload of the conventional motion planner and the performance limitations of the computer. In the following, a model predictive controller (MPC) motion planner is utilized as the local planner in the paradigm model. The objective of the MPC planner is to find a collision-free trajectory as a sequence of system inputs \([u]_{t=0}^T \mapsto U\) at each time step to be conducted to the AR. Furthermore, a PF scheme is implemented to measure the proximity of the surrounding obstacles. The utilized model predictive controller and potential field are explained below.

Fig. 2

Three different profiles for obstacle avoidance strictness of the potential field cost map

3.1 The Model Predictive Controller

Let \(C\) denote the configuration space (C-space) of an AR and \(C_{free} = C\setminus C_{obs}\) be its collision-free portion. Let \(\lambda (c)\in \Lambda \) be the state of the AR in \(C\), where \(c \in C\). The system's dynamics can be formulated as \(\lambda _{k+1} = f(\lambda _k, u_k)\), where \(u_k \in U\) is a feasible input for the system. Eq. 1 is an MPC quadratic cost function penalizing the deviation \(\delta \) from the reference path and the control commands \(u\) over a prediction horizon \(N_p\):

$$\begin{aligned} J(k|t) = \sum _{k=1}^{N_p}{\Bigl (\delta (k|t)^T Q\,\delta (k|t)+ u^T(k|t)\,R\,u(k|t)+\Phi (k|t)\Bigr )} \end{aligned}$$
(1)

where:

$$\begin{aligned} \delta (k|t) = \psi (k|t) - \lambda (k|t) \end{aligned}$$
(2)

\(\lambda \) and \(\psi \) are the actual state and the global path's desired state, respectively. \(\Phi \) is the cost returned by the PF, and \(Q\) and \(R\) are weighting matrices. The optimal input \(u_k\) is then:

$$\begin{aligned} u(k|t) = \mathop {\textrm{argmin}}_{u_k}\;J(k|t) \end{aligned}$$
(3)

In this particular paradigm, a basic cost function is considered to evaluate the system’s performance. Alternatively, other parameters can also be incorporated into the cost function to satisfy subject-specific constraints.
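To make Eqs. 1–3 concrete, the following minimal sketch evaluates \(J\) for a candidate control sequence. All names and array shapes are illustrative assumptions; in a real planner, the minimization of Eq. 3 would be delegated to a numerical optimizer (e.g., scipy.optimize.minimize) searching over the input sequence.

```python
import numpy as np

def mpc_cost(psi, lam, u, Q, R, phi):
    """Evaluate the quadratic MPC cost of Eq. 1 over the prediction horizon.

    psi : (Np, 2) desired states taken from the global path
    lam : (Np, 2) predicted states of the AR under the candidate inputs
    u   : (Np, m) candidate control sequence
    Q, R: weighting matrices of matching dimensions
    phi : (Np,)  potential-field costs along the predicted states
    """
    J = 0.0
    for k in range(len(psi)):
        delta = psi[k] - lam[k]                       # tracking error, Eq. 2
        J += delta @ Q @ delta + u[k] @ R @ u[k] + phi[k]
    return J
```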

3.2 The Potential Field

As Eq. 1 suggests, the PF's output \(\Phi (k|t)\) is incorporated into the cost function to penalize obstacle proximity, allowing the AR to avoid collisions safely. The obstacles' contribution to the cost is calculated through a Gaussian function (\(\Phi \sim N(\mu _\Phi , \sigma _\Phi ^2)\)) with the robot located at the mean. The cost caused by the obstacles is an exponential function of their distances to the robot and can be obtained as follows:

$$\begin{aligned} \Phi (x) = \sum _{n} \frac{1}{\sqrt{2\pi \sigma _\Phi ^2}} e^{-\frac{(\zeta _n - \mu _\Phi )^2}{2\sigma _\Phi ^2}} \end{aligned}$$
(4)

where \(\zeta _n\) are the distances to the surrounding obstacles obtained from the point cloud \(Z\) returned by the lidar.

The distribution can be modified such that it satisfies any particular application’s constraints in terms of obstacle avoidance strictness by varying its standard deviation, \(\sigma _\Phi \) (Fig. 2). Algorithm 1 describes the PF implementation.

Algorithm 1

Potential Field Costmap Calculation
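Since Algorithm 1 is only reproduced as a figure, the following is a minimal sketch of how the PF cost map of Eq. 4 could be rasterized around the robot; the grid size, extent, and function names are assumptions, not values from the paper.

```python
import numpy as np

def pf_costmap(scan_xy, size=20, extent=5.0, sigma=1.0):
    """Build a 2-D potential-field cost map (cf. Fig. 1) from lidar points.

    scan_xy : (N, 2) obstacle points (x, y) in the robot frame
    size    : cost map resolution (size x size cells)
    extent  : half-width of the mapped area in metres
    sigma   : obstacle-avoidance strictness sigma_Phi (cf. Fig. 2)
    """
    axis = np.linspace(-extent, extent, size)
    gx, gy = np.meshgrid(axis, axis)
    cost = np.zeros((size, size))
    for ox, oy in scan_xy:           # one Gaussian bump per obstacle point
        d2 = (gx - ox) ** 2 + (gy - oy) ** 2
        cost += np.exp(-d2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    return cost
```

Raising or lowering sigma reproduces the strictness profiles of Fig. 2: a larger sigma repels the robot from a wider margin around each obstacle.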

4 Simulation and Dataset Collection

In this article, the robot operating system (ROS) is selected as the infrastructure for the simulations. The paradigm is implemented on a differential-drive mobile robot in the Gazebo simulator. We chose the TurtleBot simulation model for our mobile robot due to its popularity among researchers. The dataset collected from this experiment will be used to train an LPM-Net that replaces the paradigm on this robot. Let \(\lambda _k = [x_k, y_k]^T\) denote the state of the AR in \(C\). The robot's dynamics can be expressed as:

$$\begin{aligned} \lambda _{k+1}=f(\lambda _k,u_k)= \begin{bmatrix} x_k+\dot{x}T\\ y_k+\dot{y}T \end{bmatrix} \end{aligned}$$
(5)

where \(u_k = [\dot{x}, \dot{y}]\) is the paradigm planner's navigation command and \(T\) is the sampling time, such that \(t=kT\) yields the actual elapsed time. The robot's linear and angular velocity can be obtained as follows:

$$\begin{aligned} \begin{bmatrix} \upsilon \\ \omega \end{bmatrix} = \begin{bmatrix} \sqrt{\dot{x}^2+\dot{y}^2}\;\\ \\ \frac{\partial \theta }{\partial t} \end{bmatrix} \end{aligned}$$
(6)

where:

$$\begin{aligned} \frac{\partial \theta }{\partial t} = \frac{\partial {\,\mathrm {atan2}(\dot{y},\dot{x})}}{\partial {t}}= \frac{\dot{x}\ddot{y}-\dot{y}\ddot{x}}{\dot{x}^2+\dot{y}^2} \end{aligned}$$
(7)

Then, the angular velocity command deployed to each motor will be:

$$\begin{aligned} \begin{bmatrix} \omega _{R}\\ \omega _{L} \end{bmatrix} =\frac{1}{R} \begin{bmatrix} \upsilon +\frac{\omega W}{2}\\ \\ \upsilon -\frac{\omega W}{2} \end{bmatrix} \end{aligned}$$
(8)

where \(R\) and \(W\) are the wheel radius and the wheelbase width, respectively.

Combining Eqs. 6–8, the angular velocity of each motor can be obtained directly as a function of the system input \(u_k\):

$$\begin{aligned} \begin{bmatrix} \omega _{R}\\ \omega _{L} \end{bmatrix} =\xi (u_k) \end{aligned}$$
(9)
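A minimal sketch of \(\xi (u_k)\), chaining Eqs. 6–8; the wheel radius and wheelbase width are TurtleBot-like placeholder values (assumptions, not taken from the paper), and the accelerations are approximated by finite differences over the sampling time.

```python
import numpy as np

def xi(u, u_prev, T, R=0.033, W=0.160):
    """Map the planner output u_k = [x_dot, y_dot] to wheel angular
    velocities (omega_R, omega_L) via Eqs. 6-8."""
    x_dot, y_dot = u
    x_dd, y_dd = (np.asarray(u) - np.asarray(u_prev)) / T   # finite-difference accelerations
    v = np.hypot(x_dot, y_dot)                               # linear speed, Eq. 6
    denom = x_dot ** 2 + y_dot ** 2
    w = (x_dot * y_dd - y_dot * x_dd) / denom if denom > 1e-9 else 0.0  # Eq. 7
    omega_r = (v + w * W / 2) / R                            # right wheel, Eq. 8
    omega_l = (v - w * W / 2) / R                            # left wheel, Eq. 8
    return omega_r, omega_l
```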

To avoid learning bias, the simulation scenarios are chosen to be as generic as possible. During the simulation, the mobile robot is deployed to a new goal position within the free space, \(\lambda _{goal} \in C_{free}\), every 5 s. The global planner then constructs a path \(\Psi =[\psi (k|t)]_{k=0}^N\) toward the goal such that the first \(N_p\) elements, \([\psi (k|t)]_{k=0}^{N_p}\), are traced by the MPC local planner (\(N_p\) being the prediction horizon).

At each time step of the simulation, a data recorder probes four parameters:

  1. \(\psi \): The beginning fraction of the global plan, \(\Psi _\nu =[\psi (k|t)]_{k=0}^\nu \subset \mathbb {R}^2\), where \(\nu \) \((\nu \le N_p<N)\) is the look-ahead horizon, i.e. the length of the global plan that LPM-Net takes into account.

  2. \(\eta \): A sequence of the recently stored heading angles of the AR, \(H =[\eta (k|t)]_{k=0}^\pi \subset \mathbb {R}^2\), where \(\pi \) is the duration over which \(\eta \) can be recalled. It reports the angle between the local and the global coordinate systems at each time step. The angle is reported in quaternion form, \(\hat{\eta }_k = \{q_0,q_1,q_2,q_3\}\). Since the AR operates on a 2-D plane, the first two elements of \(\hat{\eta }\) can be overlooked; therefore:

     $$\begin{aligned} H=\{\;\eta _k|k\in [0,\pi ]\;,\;\eta _k=[ \hat{\eta }_{k_2},\hat{\eta }_{k_3}]\;\} \end{aligned}$$
     (10)

  3. \(\gamma \): A sequence of recently stored 2-D cost maps obtained from the PF, \(\Gamma =[\gamma (k|t)]_{k=0}^\rho \subset \mathbb {R}^2\), where \(\rho \) is the duration over which \(\Gamma \) can be recalled.

  4. \(u\): The output of the paradigm motion planner.

Note that \(\nu \), \(\pi \), and \(\rho \) are three hyperparameters that adjust the buffers' memory lengths.

Algorithm 2 describes the dataset collection procedure.

Algorithm 2

Dataset Collection
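Algorithm 2 is likewise only available as a figure; the sketch below illustrates one plausible per-step recorder under the buffer definitions above. The buffer lengths and the ROS-style quaternion fields (z and w, matching the last two elements kept in Eq. 10) are assumptions for illustration.

```python
from collections import deque

NU, PI, RHO = 10, 8, 8          # look-ahead, heading, and cost map buffer lengths
eta_buf = deque(maxlen=PI)      # recent heading angles, H
gamma_buf = deque(maxlen=RHO)   # recent PF cost maps, Gamma
dataset = []

def record_step(global_plan, orientation, costmap, u):
    """Store one coherent sample (psi, eta, gamma, u) per time step."""
    eta_buf.append((orientation.z, orientation.w))    # planar quaternion part, Eq. 10
    gamma_buf.append(costmap)
    if len(eta_buf) == PI and len(gamma_buf) == RHO:  # buffers warmed up
        dataset.append({"psi": global_plan[:NU],      # leading nu points of Psi
                        "eta": list(eta_buf),
                        "gamma": list(gamma_buf),
                        "u": u})                      # paradigm planner output
```

Storing all four quantities in a single record is what guarantees the coherence requirement discussed below: every entry can be associated with the exact speed and direction at its moment of capture.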

To collect training data, the paradigm planner is executed in the simulation environment as follows. The simulation environment allows the user to select a target location within the free, reachable space on the map. Once a target is selected, the robot pursues it while the data collection tool records the required information. Additionally, to keep the paradigm planner continuously active and engaged, a background program automatically assigns a new random target location within the permissible range each time the robot approaches the current target.

Since the paradigm planner imposes a relatively high computational load, its execution speed may drop during the simulation. This delay does not affect the quality of the data, however, because no time reference is stored in the data. An important point in the data collection process is to ensure the coherence of the data, so that any database entry can later be accurately associated with all the information recorded at that moment, including speed and direction. Otherwise, the network would not be able to make meaningful inferences from the data.

The simulation is conducted for 6 h under the mentioned conditions, with a sampling frequency of 5 Hz, in four separate virtual simulation spaces. The simulated spaces are designed specifically for training purposes and do not represent a particular real-world location.

5 Network Architecture

In this section, the underlying architecture of the LPM-Net is presented. As Fig. 3 depicts, it primarily consists of an obstacle repulsion unit (ORU) and a path pursuing unit (PPU), which comprise convolutional and dense neural encoders that map the workspace observations into a latent vector \(Z\). A recurrent network comprising a suite of long short-term memory (LSTM) units interprets the rate at which \(Z\) has progressed or changed over a short period of elapsed time [31]. Lastly, a dense decoder translates these interpretations into steering commands for the AR. The network is trained on the dataset collected in Sect. 4, and an evaluation accuracy of 79% was obtained over 150 epochs. The following sections elaborate on each component of the network.

Fig. 3

LPM-Net architecture with three inputs and one output. It takes the lidar observations as a 2D cost map which gets mapped into the latent space through a consecutive CNN-RNN network. The global planner's path and the robot heading get mapped concurrently into the latent space through RNN networks. The obtained latent vectors then get fed into a dense network which results in the mobile robot's command input

5.1 Obstacle Repulsion Unit

The LPM-Net detects the presence of obstacles using PF cost maps. To avoid potential collisions, the ORU processes each frame of the 2-D cost maps by passing it through a convolutional neural network (CNN), which, after a dropout, yields a latent vector \(Z_\gamma \). Regardless of the original cost map dimensions, the ORU remaps the cost map into a fixed-size input, \(W_\gamma \times W_\gamma \times 1\), and passes it through a pre-adapted normalizer layer that scales the input data between 0 and 1. The value of \(W_\gamma \) is an adjustable hyperparameter governing the precision-speed trade-off; \(W_\gamma = 20\) proved optimal experimentally and is set by default.
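A minimal sketch of this preprocessing step, assuming TensorFlow; the resize-and-scale below is a simple stand-in for the paper's pre-adapted normalizer.

```python
import tensorflow as tf

def preprocess_costmap(costmap, w_gamma=20):
    """Remap a cost map of arbitrary size to the fixed W_gamma x W_gamma x 1
    ORU input and scale its values into [0, 1]."""
    x = tf.convert_to_tensor(costmap, dtype=tf.float32)[..., tf.newaxis]
    x = tf.image.resize(x, (w_gamma, w_gamma))        # fixed-size remapping
    return tf.math.divide_no_nan(x, tf.reduce_max(x)) # scale into [0, 1]
```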

Once the ORU network is trained, any given neuron in the network corresponds to an abstract notion of the input cost map. Note that each pixel of a cost map represents the probability of an obstacle being present in that particular area only; the absence of an obstacle in one area must not affect the probability of an adjacent obstacle's presence. To account for this, the rectified linear unit (ReLU) activation function is used in the ORU. As Eq. 11 shows, ReLU returns 0 for inputs \(x\in \mathbb {R}^-\), representing the absence of an obstacle, so the values induced by those neurons do not contribute negatively and hence do not affect the rest of the network.

$$\begin{aligned} f_{ReLU}(x) = \max (0,x) \end{aligned}$$
(11)
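A minimal ORU encoder sketch consistent with the description above; the filter counts, latent size, and dropout rate are illustrative assumptions rather than the paper's values.

```python
import tensorflow as tf
from tensorflow.keras import layers

W_GAMMA = 20  # default fixed cost map side length

inp = layers.Input(shape=(W_GAMMA, W_GAMMA, 1))
x = layers.Conv2D(16, 3, activation="relu")(inp)  # ReLU: empty cells contribute 0, Eq. 11
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.2)(x)
z_gamma = layers.Dense(64, activation="relu")(x)  # latent vector Z_gamma
oru = tf.keras.Model(inp, z_gamma, name="ORU")
```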

5.2 Path Pursuing Unit

Let \(\psi \) denote the beginning \(\nu \) elements of the global planner's path \(\Psi \), comprising state points \(\lambda _k=[x_k,y_k]\). Let \(\lambda _O\) denote the current AR odometry state. To track \(\Psi \), the PPU subtracts \(\lambda _O\) from \(\psi \) to obtain absolute steering vectors \(\psi _s\) that are independent of the AR's position. \(\psi _s\) can be defined as follows:

$$\begin{aligned} \psi _s=\{\;\lambda _k\;|\;k\in [0,\nu ]\;,\;\lambda _k=\psi _k-\lambda _O\;\} \end{aligned}$$
(12)

\(\psi _s\) has variable dimensions of \(2\times \nu _s\), where \(\nu _s\le \nu \), covering the case where the AR is so close to the goal that the remaining length of \(\Psi \) is less than \(\nu \). The tracking path \(\psi _s\) and the heading angle \(\eta \) are passed through pre-adapted normalizers, resulting in the latent vector \(Z_\psi \). Unlike the ORU, the PPU is best served by non-linear activation functions, for two reasons. First, since the PPU is a relatively shallow network, it is immune to the gradient-vanishing phenomena that often occur when training non-linear deep networks [32]. Second, linear activation functions cannot capture the non-linear behavior of \(\psi \). To select an activation function, the PPU was isolated and tested alone; iterative experiments showed that the hyperbolic tangent yields the best performance.
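A sketch of the PPU under the same caveats: Eq. 12 in NumPy, followed by a small tanh encoder. The buffer lengths and layer sizes are assumptions, and \(\psi _s\) is presumed padded to the fixed length \(\nu \) near the goal.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def steering_vectors(psi, lam_o):
    """Eq. 12: subtract the AR odometry state from the tracked path so the
    steering vectors are independent of the robot's absolute position."""
    return np.asarray(psi, dtype=np.float32) - np.asarray(lam_o, dtype=np.float32)

NU, PI = 10, 8
path_in = layers.Input(shape=(NU, 2))   # steering vectors psi_s
head_in = layers.Input(shape=(PI, 2))   # heading sequence H
x = layers.Flatten()(layers.Concatenate(axis=1)([path_in, head_in]))
x = layers.Dense(64, activation="tanh")(x)      # tanh chosen per the isolated tests
z_psi = layers.Dense(32, activation="tanh")(x)  # latent vector Z_psi
ppu = tf.keras.Model([path_in, head_in], z_psi, name="PPU")
```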

5.3 Recurrent Interpreter

In the second stage of feature extraction, two recurrent neural networks interpret the temporal relationship between the elements of \(Z_\psi \) and \(Z_\gamma \). This allows the LPM-Net to comprehend the curvature profile of the global planner's path \(\Psi \), the AR's rotation speed, and the changes in a dynamic environment, i.e. whether the AR and the obstacles are moving towards or away from each other. The number of LSTM blocks in each recurrent network corresponds to the length of its input feature vector. A dropout layer is placed between the outermost dense layers to prevent overfitting. Ultimately, the outputs of the recurrent networks are concatenated into a new latent vector \(\hat{Z}\), which is propagated to a dense topping network that converts it into the AR's command input \(u_k\). Algorithm 3 gives the pseudo-code for the LPM-Net implementation, and a sketch of the assembled network follows it.

Algorithm 3

LPM-Net
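Since Algorithm 3 is only reproduced as a figure, the following end-to-end assembly sketch continues the assumptions of the ORU/PPU snippets above: sequences of per-frame latent vectors feed two LSTM interpreters, their outputs are concatenated into \(\hat{Z}\), and a dense head with dropout decodes the command input.

```python
import tensorflow as tf
from tensorflow.keras import layers

RHO, PI = 8, 8                                 # cost map / path-heading buffer lengths
gamma_seq = layers.Input(shape=(RHO, 64))      # sequence of Z_gamma latent vectors
psi_seq = layers.Input(shape=(PI, 32))         # sequence of Z_psi latent vectors
h_gamma = layers.LSTM(64)(gamma_seq)           # how Z_gamma evolves over time
h_psi = layers.LSTM(32)(psi_seq)               # how Z_psi evolves over time
z_hat = layers.Concatenate()([h_gamma, h_psi]) # augmented latent vector Z_hat
x = layers.Dense(64, activation="relu")(z_hat)
x = layers.Dropout(0.2)(x)                     # dropout between the outer dense layers
u_out = layers.Dense(2)(x)                     # command input u_k = [x_dot, y_dot]
lpm_net = tf.keras.Model([gamma_seq, psi_seq], u_out, name="LPM_Net")
```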

6 Performance Evaluation

In this section, we investigate the performance of the LPM-Net motion planner in a simulated environment that represents a real-world setting. Figure 4 depicts the selected map of a house environment. This environment was not used for training the planner; it is the first time the planner encounters it. The performance of the LPM-Net is examined from various aspects, and the results for each aspect are presented separately.

Fig. 4

Top view of a simulated environment of a house in Gazebo

6.1 Travel Duration

To assess the planner's success in guiding the robot to its destination, we follow the methodology described in Sect. 4: a target point is randomly selected in the simulated environment and the robot is allowed to navigate to it. Although the selected target points vary in proximity to the robot's initial position, which affects the travel time, repeating this experiment over a long duration enables us to measure the planner's average performance. The box plot in Fig. 5 illustrates the statistical distribution of the time elapsed from the generation of a new target point to the robot reaching that location. Outliers at the upper end of the plot typically indicate instances where the robot failed to find a path to the goal and a new destination was chosen after a specified time. The figure shows the results for both the paradigm planner and the LPM-Net under the same conditions.

Fig. 5

Comparison of average time spent to reach the destination

6.2 CPU Payload

Now that the LPM-Net has been shown to guide the robot successfully to a desired destination within a limited time in an unfamiliar environment (Footnote 1), we proceed to investigate its CPU payload. All simulations were conducted on an Intel 7th-generation processor with x64 architecture. The CPU payload was measured throughout the simulation as a percentage of the total processing power, using Linux hardware resource monitoring tools. The CPU payload of the paradigm planner was measured under the same conditions, and the comparison between the two planners is shown in Fig. 6.

Fig. 6

Comparison of processing power between the two planning algorithms: the paradigm planner and the imitation-based LPM-Net

6.3 Motion Commands

Another important aspect of evaluating a motion planner is the range of output commands it generates. These commands should be within the robot's dynamic and kinematic constraints. Assuming this condition holds for the paradigm planner, we need to demonstrate that the output commands of the LPM-Net fall within the same range. The plot in Fig. 7 illustrates that over a long simulation, the outputs of both planners remain consistent for all output vector parameters. Furthermore, Fig. 8 compares the angular acceleration of each wheel of the robot under the guidance of the paradigm planner and the LPM-Net, showing the expected similar distribution.

Fig. 7

Comparison of output commands (wheel speed commands) between the paradigm planner and the LPM-Net

Fig. 8

Comparison of wheel accelerations under the guidance of the paradigm planner and the LPM-Net

6.4 Comparison Between the Paradigm Planner and the LPM-Net

The experiments conducted in Sect. 6 indicate that the LPM-Net delivers stable performance while consuming less than half the processing power of the paradigm planner. The LPM-Net achieved a reduction of approximately 50% in CPU payload relative to the paradigm planner, albeit with a slight decrease in execution speed. This makes the LPM-Net a suitable alternative for the many problems in which processing efficiency is a more critical constraint than execution speed.

7 Conclusion

In this paper, we introduced LPM-Net, a novel data-driven, supervised-learning-based local predictive planner designed for autonomous robot navigation. Utilizing environmental observations, LPM-Net efficiently predicts steering commands while adhering to stringent kinodynamic constraints. Its architecture stands out by offering significant adaptability to diverse 2-D navigation challenges and enhanced computational efficiency on DNN-friendly hardware platforms.

Our comparative analysis revealed that although LPM-Net occasionally sacrifices the near-optimal solution quality typically delivered by conventional predictive models, it compensates by reducing CPU usage by up to 50%. This makes it not only sustainable but also preferable for scenarios where power efficiency is critical. Looking ahead, we plan to further refine LPM-Net by integrating transfer learning techniques, which will broaden its applicability across varied operational contexts and enhance its learning efficiency. Additionally, we are aiming to extend our model’s utility to 3-D motion planning, which is pivotal for complex manipulations in robotic arms, thereby pushing the boundaries of its capabilities in industrial and service robots.

These advancements promise to elevate the practical deployment of autonomous systems, potentially leading to more robust, energy-efficient solutions that could revolutionize how machines interact and navigate in dynamic environments.