Abstract
This paper proposes a data-driven predictive motion planner for mobile robots, referred to as LPM-Net. Conventional predictive motion planners are computationally expensive, often resulting in insufficient throughput on mobile robot hardware. LPM-Net is an imitation-learning-assisted local predictive non-holonomic motion planner that learns from conventional motion planners, regarded as paradigm models, and replicates their behavior while satisfying the same kinodynamic constraints. In addition, LPM-Net is compatible with GPU and TPU hardware, allowing for faster and more efficient processing. LPM-Net uses convolutional and recurrent long short-term memory deep neural networks to predict steering commands. The improved computational efficiency allows autonomous vehicles to be equipped with more cost-effective computers. In the present study, LPM-Net was tuned to mimic the behavior of a model predictive controller paradigm model. Measurements demonstrate that the proposed mimic planner, LPM-Net, consumes approximately half the processing power of the conventional predictive planner, albeit with a slight increase in hesitation when reaching goals.
1 Introduction
With the increasing demand for autonomous vehicles and autonomy in general, motion planning has become a prominent area of research. The goal of motion planning is to find an optimal, collision-free path from a starting state to a goal state, while satisfying a set of kinematic and dynamic constraints. Motion planners can operate locally or globally. Global motion planners provide a path to any given goal state, while local motion planners arrange maneuvers to satisfy the motion constraints between states along the path. Local motion planning can be approached through sampling, optimization, and novel machine learning methods. In this paper, we propose a data-driven, deep learning-assisted local predictive motion planner for mobile robots, referred to as LPM-Net.
The sampling-based methods randomly iterate over a state-space tree and assemble a control sequence for the autonomous robot (AR). Non-holonomic rapidly exploring random trees (RRT) [1, 2] or stable sparse RRT (SST) [3] set off randomly from a base node and take a long time to identify a viable path. More optimized versions (RRT\(^*\)) take parallel approaches [4, 5] or incorporate linear quadratic regulation (LQR) algorithms [6] to compensate for this method’s shortcomings, such as getting trapped in a local minimum or oscillating around an obstacle [7]. RRT\(^*\) algorithms are among the most popular methods in this category; they span the state space quickly but sparsely and hence lack completeness [8].
The optimization methods define the motion planning problem as a concise cost function that is minimized using an optimization algorithm. The most popular approaches include LQR [9] and the model predictive controller (MPC) [10, 11]. Optimization methods involve complex arithmetic calculations in each iteration, which burden the processors heavily. The term predictive refers to motion planners that forecast the imminent state of their environment based on existing models and take the results into consideration as well. The prognostic advantage of these methods reduces hasty decision-making and averts patchy trajectories [10]. However, they exhibit high computational complexity that often exceeds the capacity of smaller computers, making them less attainable. This problem prompted a series of investigations, surveyed in [12, 13], that address the common techniques with the aid of machine learning, seeking less computationally expensive predictive motion planning algorithms [14, 15].
Machine learning has entered motion planning through reinforcement and imitation learning. For example, in [16] the performance of a conventional RRT global planner was enhanced with the aid of reinforcement learning (RL). The pathfinding ability of an RL agent was demonstrated in simulation in [17], and its robustness and stability in unknown environments were further enhanced in [18]. These RL approaches collect comprehensive, descriptive sensory data to practice and learn in their environments. They work based on cost function approximation [19], such as the Q-learning method [20], gradient strategies [21], or actor-critic hierarchical RL [22], which utilizes a combination of both [23]. The RL-RRT technique has been employed for a variety of applications, such as robot manipulation in obstructed environments [24] and autonomous car parking [25]. The imitation learning approaches merge supervised learning with motion planning and reproduce the desired behaviors of the conventional methods with ideal computational efficiency [26]. For instance, in [27], the inputs and outputs of an RRT algorithm were mapped into a latent space which was then sampled to train a new imitating model. This technique was later combined with other sampling-based motion planning (SMP) algorithms for worst-case theoretical guarantees [28].
In the remainder of this paper, the problem is discussed comprehensively in Sect. 2. Then, in Sect. 3, a role-model motion planner, referred to as the paradigm model, is constructed so that its behavior can be monitored. Next, the paradigm model is simulated in a virtual world and descriptive data is collected in Sect. 4. In Sect. 5 the working principles and the architecture of the proposed motion planner are elaborated, and its performance is evaluated in Sect. 6. Lastly, the paper is concluded in Sect. 7.
2 Problem Formulation
As introduced in Sect. 1, most of the current works use numeric calculations to predict the state of the autonomous robot (AR) and suggest paths upon which the value of a predefined cost function \(J\) is minimized throughout a prediction horizon. In practice, however, this approach has a drawback: it requires complex real-time arithmetic computations that must often run locally on an AR’s hardware. AR units span a wide range of sizes and scales; from palm-size drones [29, 30] to medium-size wheeled robots, they are mostly equipped with small low-power computers. As a result, AR computers often fail to steer with a desirable throughput, which ultimately degrades the quality of maneuvers, leading to frequent braking and sharp steering.
Predictive motion planners rely on a predefined cost function to make optimal decisions based on specific criteria. These cost functions incorporate various parameters that determine the suitability of a solution. However, the interpretation and significance of these parameters are subjective and can vary depending on the specific use case. Therefore, it is necessary to customize the cost function for each scenario. For example, routing an autonomous race car requires different considerations than routing a fuel lorry, as their priorities regarding speed and safety may differ. Furthermore, even for problems with similar hardware and payload characteristics, a range of optimization algorithms is employed to minimize the cost function. Consequently, this diversity of approaches can hinder the development of a consistent solution that delivers the desired performance across different problem domains.
To overcome the noted drawbacks of the conventional approaches, in this work a deep-learning-assisted local predictive motion planner (LPM-Net) is proposed that is both less computationally expensive on suitable hardware and flexible enough to be utilized for a wide span of applications. In contrast to the conventional methods that must find a path by running complex optimization algorithms, LPM-Net uses a deep neural network (DNN) which, once trained on expert data, predicts feasible paths that satisfy the given kinodynamic constraints. Unlike conventional motion planners that run on a CPU, LPM-Net offers the advantage of being compatible with DNN-friendly hardware such as GPU- and TPU-enabled systems-on-chip (SoC), which can dramatically boost performance when implemented locally on an AR’s hardware.
Fig. 1: LPM-Net takes the lidar observations as a 2D cost map which gets mapped into the latent space through a consecutive CNN-RNN network. The global planner’s path and the robot heading get mapped concurrently into the latent space through RNN networks. The obtained latent vectors then get fed into a dense network which results in the mobile robot’s command input

LPM-Net works in conjunction with a global path planner to obtain an overall road map toward the goal and a potential field (PF) cost map (Fig. 1) to repel stationary and moving obstacles. LPM-Net takes the beginning fraction of the global planner’s path, \({\psi _k}\subset \Psi \), and a sequence of the most recent PF cost maps, \({\gamma _k}\subset \Gamma \), as inputs to predict the robot’s maneuvers.
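For concreteness, the following minimal sketch shows the planner’s input-output signature under the notation above; the class name and array shapes are illustrative assumptions rather than the authors’ code.

```python
import numpy as np

class LPMNetPlanner:
    """Illustrative I/O signature for LPM-Net; names and shapes are assumptions."""

    def step(self, psi_k: np.ndarray, gamma_k: np.ndarray) -> np.ndarray:
        """psi_k:   (nu, 2)     beginning fraction of the global planner's path
        gamma_k: (rho, W, W) sequence of the most recent PF cost maps
        returns: u_k, the steering command for the next time step"""
        raise NotImplementedError  # realized by the trained network of Sect. 5
```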
In order to train an LPM-Net, expert knowledge is required, obtained by logging the behavior of an existing paradigm motion planner; this process is discussed thoroughly in Sect. 3.
3 LPM-Net Paradigm Model
LPM-Net is a data-driven motion planning approach; therefore, an adequate set of knowledge data is required to train it. The existing conventional predictive motion planners can perfectly serve this purpose. The paradigm model can be run in computer simulation at a lower speed to accommodate the computational payload of the conventional motion planners and the computer’s performance limitations. In the following, a model predictive controller (MPC) motion planner is utilized as the local planner in the paradigm model. The objective of the MPC planner is to find a collision-free trajectory as a sequence of system inputs \([u]_{t=0}^T \mapsto U\) at each time step to be applied to the AR. Furthermore, a PF scheme is implemented to measure the proximity of the surrounding obstacles. The following explains the utilized model predictive controller and potential field.
3.1 The Model Predictive Controller
Let \(C\) denote the configuration space (C-space) of an AR and \(C_{free} = C \setminus C_{obs}\) be the collision-free portion of it. Let \(\lambda (c)\in \Lambda \) be the state of the AR in \(C\) where \(c \in C\). The system’s dynamics can be formulated as \(\lambda _{k+1} = f(\lambda _k, u_k)\) where \(u_k \in U\) is a feasible input for the system. Eq. 1 is an MPC quadratic cost function penalizing the deviation from the reference path \(\psi \) and the control commands \(u\) over a prediction horizon \(N_p\):

$$\begin{aligned} J = \sum _{k=0}^{N_p} \left[ (\lambda _k - \psi _k)^T Q \,(\lambda _k - \psi _k) + u_k^T R \, u_k + \Phi (k|t) \right] \end{aligned}$$(1)

where \(\lambda \) and \(\psi \) are the actual and the global path’s desired states, respectively, \(\Phi \) is the cost returned from the PF, and \(Q\) and \(R\) are weighting variables. The optimum value for input \(u_k\) is then the first element of the minimizing input sequence:

$$\begin{aligned} u_k^* = \mathop {\mathrm {arg\,min}}\limits _{u \in U} J \end{aligned}$$(2)
In this particular paradigm, a basic cost function is considered to evaluate the system’s performance. Alternatively, other parameters can also be incorporated into the cost function to satisfy subject-specific constraints.
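For illustration, a minimal sketch of one receding-horizon step minimizing a cost of the form of Eq. 1 is given below, assuming the simple integrator dynamics of Sect. 4 and an off-the-shelf solver (scipy); the paper does not specify its optimizer, so the solver choice and input bounds are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def mpc_step(lam0, psi_ref, pf_cost, Np, T, Q=1.0, R=0.1, u_max=0.5):
    """One receding-horizon step: minimize the quadratic cost of Eq. 1.
    lam0: current state [x, y] (np.ndarray); psi_ref: (Np, 2) desired states
    from the global path; pf_cost(state) -> float is the PF penalty Phi."""
    def rollout(u_flat):
        u = u_flat.reshape(Np, 2)
        lam, J = lam0.copy(), 0.0
        for k in range(Np):
            lam = lam + T * u[k]                      # lambda_{k+1} = f(lambda_k, u_k)
            err = lam - psi_ref[k]
            J += Q * (err @ err) + R * (u[k] @ u[k]) + pf_cost(lam)
        return J

    u0 = np.zeros(2 * Np)
    bounds = [(-u_max, u_max)] * (2 * Np)             # feasible input set U
    res = minimize(rollout, u0, bounds=bounds, method="L-BFGS-B")
    return res.x[:2]                                  # apply only the first input
```

The per-step cost of each candidate rollout is exactly the bracketed term of Eq. 1, so tightening \(Q\), \(R\), or the PF parameters reproduces the tuning knobs discussed above.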
3.2 The Potential Field
As Eq. 1 suggests, the PF’s output \(\Phi (k|t)\) is incorporated into the cost function to penalize obstacle proximity, allowing the AR to safely avoid collisions. The obstacles’ contribution to the cost is calculated through a Gaussian function (\(\Phi \sim N(\mu _\Phi , \sigma _\Phi ^2)\)) with the robot located at the mean. The cost caused by obstacles is an exponential function of their distances to the robot and can be obtained as follows:

$$\begin{aligned} \Phi (k|t) = \sum _{\zeta _n \in Z} \exp \left( -\frac{\zeta _n^2}{2\sigma _\Phi ^2}\right) \end{aligned}$$(3)

where \(\zeta _n\) are the distances to the surrounding obstacles obtained from the point cloud \(Z\) returned by the lidar.
The distribution can be modified such that it satisfies any particular application’s constraints in terms of obstacle avoidance strictness by varying its standard deviation, \(\sigma _\Phi \) (Fig. 2). Algorithm 1 describes the PF implementation.
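Since Algorithm 1 is not reproduced here, the following sketch illustrates the idea under the Gaussian form of Eq. 3; the grid extent and \(\sigma \) value are illustrative assumptions.

```python
import numpy as np

def pf_cost_map(points, W=20, extent=2.0, sigma=0.5):
    """Build a W x W potential-field cost map with the robot at the center.
    points: (N, 2) lidar obstacle coordinates in the robot frame.
    Each cell's cost decays as a Gaussian of the distance to the nearest obstacle."""
    if len(points) == 0:
        return np.zeros((W, W))
    xs = np.linspace(-extent, extent, W)
    gx, gy = np.meshgrid(xs, xs)                      # cell-center coordinates
    cells = np.stack([gx.ravel(), gy.ravel()], axis=1)
    d = np.linalg.norm(cells[:, None, :] - np.asarray(points)[None, :, :], axis=2)
    zeta = d.min(axis=1)                              # nearest-obstacle distance per cell
    cost = np.exp(-zeta**2 / (2 * sigma**2))          # Gaussian repulsion (cf. Eq. 3)
    return cost.reshape(W, W)
```

Raising \(\sigma \) widens the repulsive halo around obstacles, which is precisely the strictness adjustment illustrated in Fig. 2.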
4 Simulation and Dataset Collection
In this article, the Robot Operating System (ROS) is selected as the infrastructure for the simulations. In the following, the paradigm is implemented on a differential-drive mobile robot in the Gazebo simulator environment. For our mobile robot, we chose TurtleBot’s simulation model due to its high popularity among researchers. The dataset collected from this experiment will be used to train an LPM-Net that will replace the paradigm on this robot. Let \(\lambda _k = [x_k, y_k]^T\) denote the state of the AR in \(C\). The robot’s dynamics can be expressed as:

$$\begin{aligned} \lambda _{k+1} = \lambda _k + T u_k \end{aligned}$$(5)
where \(u_k = [\dot{x}_k, \dot{y}_k]^T \) is the paradigm planner’s navigation direction and \(T\) is the sampling time such that \(t=kT\) yields the actual passing time. The robot’s linear and angular velocities can be obtained as follows:

$$\begin{aligned} v_k = \sqrt{\dot{x}_k^2 + \dot{y}_k^2}, \qquad \omega _k = \frac{\theta _k - \theta _{k-1}}{T} \end{aligned}$$(6)

where:

$$\begin{aligned} \theta _k = {\text {atan2}}(\dot{y}_k, \dot{x}_k) \end{aligned}$$(7)

Then, the angular velocity command deployed to each motor will be:

$$\begin{aligned} \omega _{r,l} = \frac{2v_k \pm \omega _k W}{2R} \end{aligned}$$(8)
where R and W are the radius of the wheels and the width of the wheelbase respectively.
Combining Eqs. 6–8, the angular velocity of each motor can be directly obtained as a function of the system input \(u_k\):

$$\begin{aligned} \omega _{r,l} = \frac{2\sqrt{\dot{x}_k^2 + \dot{y}_k^2} \pm W(\theta _k - \theta _{k-1})/T}{2R} \end{aligned}$$(9)
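The conversion in Eqs. 6–9 can be written compactly in code; the sketch below assumes TurtleBot-like wheel geometry and a heading derived from the commanded direction, both illustrative choices.

```python
import numpy as np

def wheel_speeds(u_k, theta_prev, T, R=0.033, W=0.16):
    """Map the planner output u_k = [x_dot, y_dot] to per-wheel angular velocities.
    R, W: wheel radius and wheelbase width (TurtleBot-like values, for illustration)."""
    v = np.hypot(u_k[0], u_k[1])                      # linear speed (Eq. 6)
    theta = np.arctan2(u_k[1], u_k[0])                # commanded heading (Eq. 7)
    dtheta = np.arctan2(np.sin(theta - theta_prev),   # wrap the difference to [-pi, pi]
                        np.cos(theta - theta_prev))
    omega = dtheta / T                                # angular speed
    omega_r = (2 * v + omega * W) / (2 * R)           # right wheel (Eq. 8)
    omega_l = (2 * v - omega * W) / (2 * R)           # left wheel
    return omega_r, omega_l, theta
```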
To avoid learning bias, the simulation scenarios are chosen to be as generic as possible. During the simulation, the mobile robot gets deployed to a new goal position within the free space, \(\lambda _{goal} \in C_{free}\) every 5 s. The global planner will then construct a path, \(\Psi =[\psi (k|t)]_{k=0}^N\) toward the goal each time such that the first \(N_p\) elements, \([\psi (k|t)]_{k=0}^{N_p}\) will be traced by the MPC local planner (\(N_p\) being the prediction horizon).
At each time step of a simulation, a data recorder probes four parameters:
1. \(\psi \): The beginning fraction of the global plan, \(\Psi _\nu =[\psi (k|t)]_{k=0}^\nu \subset \mathbb {R}^2\), where \(\nu \) \((\nu \le N_p<N)\) is the look-ahead horizon: the length of the global plan that LPM-Net takes into account.

2. \(\eta \): A sequence of the recently stored heading angles of the AR, \(H =[\eta (k|t)]_{k=0}^\pi \subset \mathbb {R}^2\), where \(\pi \) is the duration over which \(\eta \) can be recalled. It reports the angle between the local and the global coordinate systems at each time step. The angle is reported in quaternion form, \(\hat{\eta }_k = \{q_0,q_1,q_2,q_3\}\). Since the AR operates on a 2-D plane, the first two elements of \(\hat{\eta }\) can be overlooked, therefore:

$$\begin{aligned} H=\{\;\eta _k \mid k\in [0,\pi ]\;,\;\eta _k=[ \hat{\eta }_{k_2},\hat{\eta }_{k_3}]\;\} \end{aligned}$$(10)

3. \(\gamma \): A sequence of recently stored 2-D cost maps obtained from the PF, \(\Gamma =[\gamma (k|t)]_{k=0}^\rho \subset \mathbb {R}^2\), where \(\rho \) is the duration over which \(\Gamma \) can be recalled.

4. \(u\): The output of the paradigm motion planner.
Note that \(\nu \), \(\pi \), and \(\rho \) are three hyperparameters that adjust the buffers’ memory lengths.
Algorithm 2 describes the dataset collection procedure.
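Since Algorithm 2 is not reproduced here, the following sketch conveys its structure: fixed-length buffers for \(\eta \) and \(\gamma \) and a probe called at 5 Hz; all names are illustrative.

```python
from collections import deque

class DataRecorder:
    """Sketch of the 5 Hz data probe; nu, pi_len, rho are the look-back
    hyperparameters noted above."""

    def __init__(self, nu=10, pi_len=8, rho=8):
        self.nu = nu
        self.eta_buf = deque(maxlen=pi_len)    # recent headings (q2, q3)
        self.gamma_buf = deque(maxlen=rho)     # recent PF cost maps
        self.samples = []

    def probe(self, global_path, heading, cost_map, u):
        """Record one coherent (psi, eta, gamma, u) tuple once the buffers are full."""
        self.eta_buf.append(heading)
        self.gamma_buf.append(cost_map)
        if len(self.eta_buf) == self.eta_buf.maxlen and \
           len(self.gamma_buf) == self.gamma_buf.maxlen:
            psi = global_path[: self.nu]       # beginning fraction of the global plan
            self.samples.append((psi, list(self.eta_buf), list(self.gamma_buf), u))
```

Storing all four quantities in a single tuple per time step is what guarantees the data coherence discussed below.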
To collect training data, the paradigm planner algorithm is executed in the simulation environment as follows. The simulation environment allows the user to select a target location in the free, reachable space on the map. Once a target is selected, the robot pursues it, and the required information is recorded using the simulator’s data collection tool. Additionally, to keep the paradigm planner continuously active and engaged, a continuously running background program automatically assigns a new random target location within the permissible range each time the robot approaches the current target.
Since the paradigm planner imposes a relatively high computational load, its execution speed may drop during the simulation. However, this delay does not affect the quality of the data, because no time reference is stored in the data. An important point in the data collection process is to ensure the coherence of the data, so that any database entry can later be accurately associated with all the information recorded at that moment, including speed and direction. Otherwise, the network will not be able to draw consistent inferences from the data.
The simulation is conducted for 6 h under the mentioned conditions, at a sampling frequency of 5 Hz, in four separate virtual simulation spaces. The simulated spaces are specifically designed for training purposes and do not represent a particular real-world location.
5 Network Architecture
Fig. 3: LPM-Net architecture with three inputs and one output. It takes the lidar observations as a 2D cost map which gets mapped into the latent space through a consecutive CNN-RNN network. The global planner’s path and the robot heading get mapped concurrently into the latent space through RNN networks. The obtained latent vectors then get fed into a dense network which results in the mobile robot’s command input

In this section, the underlying architecture of the LPM-Net is presented. As Fig. 3 depicts, it primarily consists of an obstacle repulsion unit (ORU) and a path pursuing unit (PPU), which comprise convolutional and dense neural encoders that map the workspace observations into a latent vector \(Z\). A recurrent network comprising a suite of long short-term memory (LSTM) units interprets the rate at which \(Z\) has progressed or changed over a short period of elapsed time [31]. Lastly, a dense decoder translates these interpretations into steering commands for the AR. This network is trained on the dataset collected in Sect. 4, and an evaluation accuracy of 79% was obtained over 150 epochs. The following sections elaborate on each unit of the network comprehensively.
5.1 Obstacle Repulsion Unit
The LPM-Net notices the presence of an obstacle using PF cost maps. To avoid potential collisions, the ORU infers each frame of the 2-D cost maps by passing it through one convolutional neural network (CNN) which, after a dropout, results in a latent vector \(Z_\gamma \). Regardless of the cost map dimensions, the ORU remaps the cost map into a fixed-size input, \(W_\gamma \times W_\gamma \times 1\), and passes it through a pre-adapted normalizer layer that maps the input data between 0 and 1. The value of \(W_\gamma \) determines the precision-speed trade-off. \(W_\gamma \) is an adjustable hyperparameter; \(W_\gamma = 20\) has experimentally been shown to be optimal and is set by default.
When the ORU network is trained, any given neuron in the network corresponds to an abstract notion of the input cost map. It must be considered that each pixel of a cost map represents the probability of an obstacle being present in that particular corresponding area only, meaning that the absence of an obstacle in one area must not affect the probability of an adjacent obstacle’s presence. To take this into account, the rectified linear unit (ReLU) activation function is used in the ORU. As Eq. 11 shows, ReLU returns 0 for inputs \(x\in \mathbb {R}^-\), representing the absence of an obstacle, allowing the values induced by these neurons not to contribute negatively and hence not affect the rest of the network:

$$\begin{aligned} {\text {ReLU}}(x) = \max (0, x) \end{aligned}$$(11)
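A minimal Keras sketch of the ORU is given below; the paper does not name its framework or layer sizes, so both are our choices, and the normalizer stand-in is an assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers

W_GAMMA = 20  # default fixed cost-map size noted above

def build_oru(latent_dim=64):
    """Sketch of the obstacle repulsion unit: one conv encoder per cost-map frame."""
    inp = layers.Input(shape=(W_GAMMA, W_GAMMA, 1))
    x = layers.Rescaling(1.0 / 255.0)(inp)           # stand-in for the pre-adapted
                                                     # normalizer; adjust to input range
    x = layers.Conv2D(16, 3, activation="relu")(x)   # ReLU: absent obstacles contribute 0
    x = layers.Flatten()(x)
    x = layers.Dropout(0.2)(x)
    z_gamma = layers.Dense(latent_dim, activation="relu")(x)
    return tf.keras.Model(inp, z_gamma, name="oru")
```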
5.2 Path Pursuing Unit
Let \(\psi \) denote the beginning \(\nu \) elements of the global planner’s path \(\Psi \), comprising state points \(\lambda _k=[x_k,y_k]\). Let \(\lambda _O\) denote the current AR odometry state. In order to track \(\Psi \), the PPU subtracts \(\lambda _O\) from \(\psi \) to obtain absolute steering vectors, \(\psi _s\), independent of the AR’s position. \(\psi _s\) can be defined as the following:

$$\begin{aligned} \psi _s = [\,\psi (k|t) - \lambda _O\,]_{k=0}^{\nu _s} \end{aligned}$$(12)
\(\psi _s\) has variable dimensions of \(2\times \nu _s\), where \(\nu _s\le \nu \), in case the AR is so close to the goal point that the remaining length of \(\Psi \) is less than \(\nu \). The tracking path \(\psi _s\) and the heading angle \(\eta \) are passed through pre-adapted normalizers, resulting in the latent vector \(Z_\psi \). Unlike the ORU, non-linear activation functions suit the PPU best for two reasons. First, since the PPU is a relatively shallow network, it is immune to the vanishing-gradient phenomenon that often occurs in the training of deep non-linear networks [32]. Second, linear activation functions cannot capture the non-linear behavior of \(\psi \). To select an activation function, the PPU was isolated and tested alone; iterative experiments showed that the hyperbolic tangent results in the best performance.
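The steering-vector computation itself is a one-liner; the sketch below adds the zero-padding we assume is used to keep a fixed input size when \(\nu _s < \nu \).

```python
import numpy as np

def steering_vectors(psi, lam_o, nu):
    """Absolute steering vectors psi_s = psi - lambda_O (Eq. 12).
    psi: up to nu upcoming global-path points, shape (nu_s, 2);
    lam_o: current odometry state [x, y]."""
    psi_s = np.asarray(psi, dtype=float) - np.asarray(lam_o, dtype=float)
    pad = np.zeros((nu - len(psi_s), 2))   # zero-pad near the goal (our assumption)
    return np.vstack([psi_s, pad])         # fixed shape (nu, 2)
```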
5.3 Recurrent Interpreter
At the second stage of feature extraction, two recurrent neural networks interpret the temporal relationship between the elements of \(Z_\psi \) and \(Z_\gamma \). This allows the LPM-Net to comprehend the curvature profile of the global planner’s path \(\Psi \), the AR’s rotation speed, and the changes in a dynamic environment, i.e., whether the AR and the obstacles are moving towards or away from each other. The number of LSTM blocks in each recurrent network corresponds to the length of its input feature vector. A dropout layer is implemented between the outermost dense layers to prevent overfitting. Ultimately, the outputs of the recurrent networks are concatenated to construct a new latent vector, \(\hat{Z}\), which is propagated through a final dense network to be converted into the AR’s command inputs, \(u_k\). Algorithm 3 demonstrates the pseudo-code for the LPM-Net implementation.
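Since Algorithm 3 is given only as pseudo-code, the following end-to-end Keras sketch mirrors the Fig. 3 pipeline: per-frame ORU encodings and per-step PPU encodings, each read by an LSTM, fused, and decoded into \(u_k\). All layer sizes are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_lpm_net(rho=8, nu=10, pi_len=8, w_gamma=20):
    """End-to-end sketch of the LPM-Net architecture (sizes are our guesses)."""
    maps_in = layers.Input(shape=(rho, w_gamma, w_gamma, 1))  # gamma: PF cost maps
    path_in = layers.Input(shape=(nu, 2))                     # psi_s: steering vectors
    head_in = layers.Input(shape=(pi_len, 2))                 # eta: heading history

    # ORU applied per frame; an LSTM then reads how Z_gamma evolves over time
    x = layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu"))(maps_in)
    x = layers.TimeDistributed(layers.Flatten())(x)
    z_gamma = layers.TimeDistributed(layers.Dense(64, activation="relu"))(x)
    h_gamma = layers.LSTM(64)(z_gamma)

    # PPU encoding (tanh) per step; an LSTM reads path curvature and rotation speed
    track = layers.Concatenate(axis=1)([path_in, head_in])
    z_psi = layers.TimeDistributed(layers.Dense(32, activation="tanh"))(track)
    h_psi = layers.LSTM(32)(z_psi)

    # Fused latent vector Z_hat, dropout, and the dense decoder producing u_k
    z_hat = layers.Dropout(0.2)(layers.Concatenate()([h_gamma, h_psi]))
    u_out = layers.Dense(2)(layers.Dense(64, activation="relu")(z_hat))
    return Model([maps_in, path_in, head_in], u_out, name="lpm_net")
```

Trained with, e.g., `model.compile(optimizer="adam", loss="mse")` on the \((\psi , \eta , \gamma ) \rightarrow u\) tuples of Sect. 4, such a model performs only a forward pass at run time, which is the source of the efficiency gains evaluated next.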
6 Performance Evaluation
In this section, we aim to investigate the performance of the LPM-Net motion planner in a simulated environment that represents a real-world setting. Figure 4 depicts the selected map of a house environment. This environment has not been used for training the planner previously, and it is the first time the planner encounters it. The performance of the LPM-Net has been examined from various aspects, and the results for each aspect are presented separately.
6.1 Travel Duration
To assess the planner’s success in guiding the robot to its destination, similar to the methodology described in Sect. 4, we randomly select a target point in the simulated environment and allow the robot to navigate to that location. Although the selected target points can vary in proximity to the robot’s initial position, which affects the travel time, repeating this experiment over a long duration enables us to measure the planner’s average performance. The box plot in Fig. 5 illustrates the statistical distribution of the time elapsed from generating a new target point to the robot reaching that location. Outliers at the upper end of the plot often indicate instances where the robot failed to find the path to the goal and, after a specified time, a new destination was chosen. Additionally, this figure demonstrates the results for both the paradigm planner and the LPM-Net under the same conditions.
6.2 CPU Payload
Now that we know the LPM-Net is capable of successfully guiding the robot to a desired destination within a limited time and in an unfamiliar environment (i.e., an environment not encountered during the training phase), we proceed to investigate its CPU payload. All simulations were conducted on an Intel 7th-generation processor with x64 architecture. The CPU payload was measured as a percentage of the total processing power using Linux hardware resource monitoring tools throughout the simulation. Similarly, the CPU payload of the paradigm planner was measured under the same conditions, and the comparison between the two planners is shown in Fig. 6.
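As one concrete way to reproduce such a measurement, the sketch below samples a planner process’s share of total CPU capacity with psutil; the paper only states that standard Linux monitoring tools were used, so this tool choice is ours.

```python
import statistics
import time

import psutil

def mean_cpu_share(pid, duration_s=60.0, interval_s=0.5):
    """Average CPU payload of a process as a percentage of total capacity."""
    proc = psutil.Process(pid)
    ncpu = psutil.cpu_count()
    proc.cpu_percent(None)                 # prime the counter (first call returns 0)
    readings = []
    end = time.time() + duration_s
    while time.time() < end:
        time.sleep(interval_s)
        readings.append(proc.cpu_percent(None) / ncpu)
    return statistics.mean(readings)
```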
6.3 Motion Commands
Another important aspect to examine when evaluating a motion planner is the range of output commands it generates. These commands should be within the robot’s dynamic and kinematic constraints. Assuming this condition holds for the paradigm planner, we need to demonstrate that the output commands of the LPM-Net also fall within the same range. The plot in Fig. 7 illustrates that over a long simulation duration, the output of both planners remains consistent for all vector output parameters. Furthermore, Fig. 8 compares the angular acceleration of each wheel of the robot under the guidance of the paradigm planner and the LPM-Net, showing a similar distribution as expected.
6.4 Comparison Between the Paradigm Planner and the LPM-Net
The experimental results of Sect. 6 indicate that the LPM-Net was able to demonstrate a stable performance while consuming less than half the processing power of the paradigm planner. The LPM-Net achieved a reduction of approximately 50% in CPU payload relative to the paradigm planner, albeit with a slight decrease in performance speed. This implies that the LPM-Net can be a suitable alternative for many problems in which processing efficiency is a more critical constraint than execution speed.
7 Conclusion
In this paper, we introduced LPM-Net, a novel data-driven, supervised-learning-based local predictive planner designed for autonomous robot navigation. Utilizing environmental observations, LPM-Net efficiently predicts steering commands while adhering to stringent kinodynamic constraints. Its architecture stands out by offering significant adaptability to diverse 2-D navigation challenges and demonstrating enhanced computational efficiency on DNN-friendly hardware platforms.
Our comparative analysis revealed that although LPM-Net occasionally sacrifices the near-optimal solution quality typically delivered by conventional predictive models, it compensates by reducing CPU usage by up to 50%. This makes it not only sustainable but also preferable for scenarios where power efficiency is critical. Looking ahead, we plan to further refine LPM-Net by integrating transfer learning techniques, which will broaden its applicability across varied operational contexts and enhance its learning efficiency. Additionally, we are aiming to extend our model’s utility to 3-D motion planning, which is pivotal for complex manipulations in robotic arms, thereby pushing the boundaries of its capabilities in industrial and service robots.
These advancements promise to elevate the practical deployment of autonomous systems, potentially leading to more robust, energy-efficient solutions that could revolutionize how machines interact and navigate in dynamic environments.
References
Adiyatov O, Varol HA (2013) Rapidly-exploring random tree based memory efficient motion planning. In: 2013 IEEE international conference on mechatronics and automation. IEEE, pp 354–359. https://doi.org/10.1109/ICMA.2013.6617944
LaValle SM, Kuffner JJ Jr (2001) Randomized kinodynamic planning. Int J Robot Res 20(5):378–400. https://doi.org/10.1177/02783640122067453
Li Y, Littlefield Z, Bekris KE (2016) Asymptotically optimal sampling-based kinodynamic planning. Int J Robot Res 35(5):528–564. https://doi.org/10.1177/0278364915614386
Xiao S, Bergmann N, Postula A (2017) Parallel RRT architecture design for motion planning. In: 2017 27th international conference on field programmable logic and applications (FPL). IEEE, pp 1–4. https://doi.org/10.23919/FPL.2017.8056773
Agarwal S et al (2018) Potential and sampling based RRT star for real-time dynamic motion planning accounting for momentum in cost function. In: International conference on neural information processing. Springer, Berlin, pp 209–221. https://doi.org/10.1007/978-3-030-04239-4_19
Perez A et al (2012) LQR-RRT*: optimal sampling-based motion planning with automatically derived extension heuristics. In: 2012 IEEE international conference on robotics and automation. IEEE, pp 2537–2542. https://doi.org/10.1109/ICRA.2012.6225177
Xinyu W et al (2019) Bidirectional potential guided RRT* for motion planning. IEEE Access 7:95046–95057. https://doi.org/10.1109/ACCESS.2019.2928846
Noreen I, Khan A, Habib Z (2016) Optimal path planning using RRT* based approaches: a survey and future directions. Int J Adv Comput Sci Appl. https://doi.org/10.14569/IJACSA.2016.071114
Goretkin G, Perez A, Platt R (2013) Optimal sampling-based planning for linear-quadratic kinodynamic systems. In: 2013 International conference on robotics and automation. https://doi.org/10.1109/ICRA.2013.6630907
Li L et al (2021) MPC-MPNet: model-predictive motion planning networks for fast, near-optimal planning under kinodynamic constraints. IEEE Robot Autom Lett 6(3):4496–4503. https://doi.org/10.1109/LRA.2021.3067847
Faroni M, Beschi M, Pedrocchi N (2019) An MPC framework for online motion planning in human-robot collaborative tasks. In: 2019 24th IEEE international conference on emerging technologies and factory automation (ETFA). IEEE, pp 1555–1558. https://doi.org/10.1109/ETFA.2019.8869047
Xiao X et al (2022) Motion planning and control for mobile robot navigation using machine learning: a survey. Auton Robots 46(5):569–597. https://doi.org/10.1007/s10514-022-10039-8
Wang J et al (2021) A survey of learning-based robot motion planning. IET Cyber-Syst Robot 3(4):302–314. https://doi.org/10.1049/csy2.12020
Lai S, Lan M, Chen BM (2019) Model predictive local motion planning with boundary state constrained primitives. IEEE Robot Autom Lett 4(4):3577–3584. https://doi.org/10.1109/LRA.2019.2928255
Mansard N et al (2018) Using a memory of motion to efficiently warm-start a nonlinear predictive controller. In: 2018 IEEE international conference on robotics and automation (ICRA). IEEE, pp 2986–2993. https://doi.org/10.1109/ICRA.2018.8463154
Chiang H-TL et al (2019) RL-RRT: kinodynamic motion planning via learning reachability estimators from RL policies. IEEE Robot Autom Lett 4(4):4298–4305. https://doi.org/10.1109/LRA.2019.2931199
Ota K et al (2020) Efficient exploration in constrained environments with goal-oriented reference path. In: 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 6061–6068. https://doi.org/10.1109/IROS45743.2020.9341620
Tsukamoto H, Chung S-J (2021) Learning-based robust motion planning with guaranteed stability: a contraction theory approach. IEEE Robot Autom Lett 6(4):6164–6171. https://doi.org/10.1109/LRA.2021.3091019
Franceschini R, Fumagalli M, Becerra JC (2022) Learn to efficiently exploit cost maps by combining RRT* with Reinforcement Learning. In: 2022 IEEE international symposium on safety, security, and rescue robotics (SSRR), pp 251–256. https://doi.org/10.1109/SSRR56537.2022.10018735
Das PK et al (2012) An improved Q-learning algorithm for path-planning of a mobile robot. Int J Comput Appl. https://doi.org/10.1109/ICInfA.2015.7279322
Xu J et al (2020) Reinforcement learning to rank with pairwise policy gradient. In: Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval. SIGIR ’20. Virtual Event, China: Association for Computing Machinery, pp 509–518. https://doi.org/10.1145/3397271.3401148
Li T et al (2018) Learning to interrupt: a hierarchical deep reinforcement learning framework for efficient exploration. https://doi.org/10.1109/ROBIO.2018.8665177. arXiv:1807.11150
Sun H et al (2021) Motion planning for mobile robots—focusing on deep reinforcement learning: a systematic review. IEEE Access 9:69061–69081. https://doi.org/10.1109/ACCESS.2021.3076530
Yamada J et al (2021) Motion planner augmented reinforcement learning for robot manipulation in obstructed environments. In: Conference on robot learning. PMLR, pp 589–603. https://doi.org/10.48550/arXiv.2010.11940
Shahi S, Lee H (2022) Autonomous rear parking via rapidly exploring random-tree-based reinforcement learning. Sensors 22(17):6655. https://doi.org/10.3390/s22176655
Qureshi AH et al (2019) Motion planning networks. In: 2019 International conference on robotics and automation (ICRA). IEEE, pp 2118–2124. https://doi.org/10.48550/arXiv.1806.05767
Ichter B, Pavone M (2019) Robot motion planning in learned latent spaces. IEEE Robot Autom Lett 4(3):2407–2414. https://doi.org/10.48550/arXiv.1807.10366
Qureshi AH et al (2020) Motion planning networks: bridging the gap between learning-based and classical motion planners. IEEE Trans Robot 37(1):48–66. https://doi.org/10.48550/arXiv.1907.06013
Amirhosseini H, Najafi F (2020) Design, prototyping and performance evaluation of a bio-inspired walking microrobot. Iran J Sci Technol Trans Mech Eng 44(3):799–811. https://doi.org/10.1007/s40997-019-00281-4
Zhakypov Z et al (2019) Designing minimal and scalable insect-inspired multi-locomotion millirobots. Nature 571(7765):381–386. https://doi.org/10.1038/s41586-019-1388-8
Heaton J (2018) Ian Goodfellow, Yoshua Bengio, and Aaron Courville: Deep learning. https://doi.org/10.1007/s10710-017-9314-z
Van Houdt G, Mosquera C, Nápoles G (2020) A review on the long short-term memory model. Artif Intell Rev 53(8):5929–5955. https://doi.org/10.1007/s10462-020-09838-1
Acknowledgements
This work was partially supported by project SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the European Union - NextGenerationEU.
Funding
The authors did not receive support from any organization for the submitted work.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Amirhosseini, F., Nilforoushan, Z. & Leili Mirtaheri, S. LPM-Net: A Data-Driven Resource-Efficient Predictive Motion Planner for Mobile Robots. Neural Process Lett 57, 5 (2025). https://doi.org/10.1007/s11063-024-11671-4
DOI: https://doi.org/10.1007/s11063-024-11671-4