Posted by @alex-roba:
I’ve observed an issue where a robot unnecessarily waits for traffic at a waypoint, even though its destination is clear and unblocked. The robot remains stuck in the “Waiting for Traffic” state indefinitely, until another robot moves to a different location. At that point, the waiting robot resumes its operation.
I’ve attached a video demonstrating the behavior. In the video, pay close attention to Robot 1, which exhibits the issue. It stops and waits unnecessarily without any apparent reason for the delay.
https://github.com/user-attachments/assets/ea3a483f-5ed7-4f1a-9e36-3a67c37448cc
Posted by @mxgrey:
Thanks for catching and reporting this. It appears to be a bug in either the traffic dependency system or the traffic schedule update system. If you’re able to provide a package (i.e. the map file and some launch files) that reliably reproduce the bug, I’ll dig in to identify the cause and figure out a fix.
Posted by @alex-roba:
Hi @mxgrey,
I was able to reliably reproduce the bug in the office world. Below is the step-by-step procedure in Jazzy:
-
Launch the office world using:
ros2 launch rmf_demos_gz office.launch.xml
-
Send a patrol task for robot 1 to the lounge using:
ros2 run rmf_demos_tasks dispatch_patrol -F tinyRobot -R tinyRobot1 -p lounge -n 1 --use_sim_time
-
Once robot 1 passes the patrol_D1 waypoint, send a patrol task for robot 2 to the lounge waypoint using:
ros2 run rmf_demos_tasks dispatch_patrol -F tinyRobot -R tinyRobot2 -p lounge -n 1 --use_sim_time
You will notice that tinyRobot1 is unnecessarily waiting at the waypoint before the lounge, even though tinyRobot2 has already completed its task, arrived at the lounge, and is parked. This wait is eventually interrupted by the excessively delayed behavior, which lasts around 30 seconds.
I am not using the reservation node here, but the same bug is also observed when the reservation node is enabled.
Let me know if you need more details!
https://github.com/user-attachments/assets/6c7b2c80-51a4-4560-a358-b05834ed48ab
Edited by @alex-roba at 2025-01-27T07:16:40Z
Posted by @mxgrey:
Thanks for figuring out the minimum reproducible example, I wouldn’t have been able to figure out the cause of the bug without this. I’ve opened a PR to fix it here: Fix negotiation route index bug by mxgrey · Pull Request #113 · open-rmf/rmf_traffic · GitHub
If you get a chance to try out that PR, let me know how it goes, especially if you run into any other cases of traffic waiting incorrectly.
Posted by @mxgrey:
I am not using the reservation node here, but the same bug is also observed when the reservation node is enabled.
On a separate note, if you see the exact same behavior with and without the reservation system running, then I think you aren’t enabling reservations for your fleet. That behavior doesn’t get turned on by default since it would be a major behavioral change for prior users.
Posted by @alex-roba:
Hi @mxgrey,
I was able to reliably reproduce the bug in the office world. Below is the step-by-step procedure in Jazzy:
Launch the office world using:
ros2 launch rmf_demos_gz office.launch.xml
Send a patrol task for robot 1 to the lounge using:
ros2 run rmf_demos_tasks dispatch_patrol -F tinyRobot -R tinyRobot1 -p lounge -n 1 --use_sim_time
Once robot 1 passes the patrol_D1 waypoint, send a patrol task for robot 2 to the lounge waypoint using:
ros2 run rmf_demos_tasks dispatch_patrol -F tinyRobot -R tinyRobot2 -p lounge -n 1 --use_sim_time
You will notice that tinyRobot1 is unnecessarily waiting at the waypoint before the lounge, even though tinyRobot2 has already completed its task, arrived at the lounge, and is parked. This wait is eventually interrupted by the excessively delayed behavior, which lasts around 30 seconds.
I am not using the reservation node here, but the same bug is also observed when the reservation node is enabled.
Let me know if you need more details!
Posted by @osama-z-salah:
@mxgrey I wanted to follow up and ask if you’re planning to work on fixing this issue soon. I’d love to help debug if possible—do you have any insights or suggestions on where I should start looking?
Looking forward to your thoughts!
Posted by @mxgrey:
It’s very near the top of my priority list. I expect I’ll be able to make time for it this week. But if anyone wants to investigate sooner, I do know where I’d look to track down the problem.
- [1] is where we check if the traffic is ready for us to pass. If this isn’t getting triggered when it should, or if it’s receiving the wrong index numbers for the other robots, then the waiting robot will not know to go.
- [2] and [3] are where a robot updates what progress it has made along its path. If these lines aren’t getting triggered when they should or if they’re sending out the wrong index numbers, then other robots waiting on them won’t know to proceed.
I would put gratuitous debug statements in both of these locations and then run a scenario where the issue occurs. Then try to narrow down where exactly the problem is happening based on which part isn’t working the way it’s supposed to.
Edited by @mxgrey at 2025-02-17T12:33:34Z
Posted by @alex-roba:
@mxgrey Thanks for your fix! I tried your PR (Fix negotiation route index bug by mxgrey · Pull Request #113 · open-rmf/rmf_traffic · GitHub), and it has reduced some of the delay. However, there are still occasional delays that appear unpredictably.
In the video below, I send both robots to the lounge sequentially with using_parking_reservation
set to true
. This means one robot should wait at the waypoint “wait”, which I placed near the lounge.
As you can see, for the first attempt, the robots proceed without unnecessary waiting. However, on the second attempt, at 00:57 in the video, tinyRobot1 experiences an unexpected delay near to the negotiation area. I’m not sure why this happens, as the task remains the same across all two attempts.
If you’d like, I can share the configuration and navigation graph of my experiment.
https://github.com/user-attachments/assets/56117297-b93f-406e-8b3c-2196a38b3815
Posted by @mxgrey:
I see, please do share the configuration file as well as the office.building.yaml
file so I can fully recreate the problem.
Posted by @alex-roba:
@mxgrey you can find them here: unnecessary waits for traffic files · GitHub
Edited by @alex-roba at 2025-03-03T05:43:25Z
Posted by @alex-roba:
@mxgrey this is the i am sending the tasks sequentially ros2 run rmf_demos_tasks dispatch_go_to_place -p lounge -F tinyRobot -R tinyRobot2 --use_sim_time && ros2 run rmf_demos_tasks dispatch_go_to_place -p lounge -F tinyRobot -R tinyRobot1 --use_sim_time
Posted by @alex-roba:
@mxgrey Were you able to reproduce that scenario? If not, I can investigate further to find more reliable cases where the unnecessary wait occurs.
Posted by @mxgrey:
Yep, I’m managing to reproduce it and in the process of debugging it right now!
It’s taking longer to debug than it normally would because the issue almost vanishes as I add in debug output, but I’m gradually narrowing the possible causes down.
Posted by @mxgrey:
I’ve tracked down the cause and opened a PR to fix it: Immeidately mark dependencies as deprecated when the target plan was cleared by mxgrey · Pull Request #115 · open-rmf/rmf_traffic · GitHub
Thanks again for coming up with the reproducible example. Once again I wouldn’t have been able to track down the cause without it.
Posted by @mxgrey:
The fix is merged now. Let us know if you run into any new situations where the dependency lingers longer than it should.
Posted by @alex-roba:
Hi @mxgrey I’ve noticed some unexpected behavior in a few places. The fleet adapter appears to wait for traffic even at locations where another robot has already passed. Although the wait is brief and done instantly, it does impact the continuity of the path. I’m unsure whether this is an acceptable side effect or if it might be a bug in the dependency clearing behavior.
Posted by @mxgrey:
What you’re describing is a known shortcoming that comes from the sequential execution of steps in the robot’s navigation plan. Essentially the steps look something like this:
- Move to location A
- Wait for known traffic to be clear at location A
- Move in front of door D
- Request door D to open
- Wait for door D to open
- Move through door D
- Move to location B
- Wait for known traffic to be clear at location B
Each step comes from a plan that gets generated based on the future traffic conditions that are predicted at the time that the plan is made. Each step is executed sequentially, which means the robot needs to stop at location A and location B before checking whether the traffic is clear there.
This rigidly sequential execution is a design choice largely motivated by race condition problems that we’ve seen happen when trying to juggle too many parallel events at once. Those race conditions can be things like incorrect behavior from different branches of execution falling out of sync, up to and including crashes due to memory errors (data races).
These problems were a big motivator for the bevy_impulse
library that we’re writing for the next generation. With bevy_impulse
we’ll be able to generate complex workflows with parallel branches and synchronization, and we’ll have guarantees that all the behavior will execute correctly. In the next generation we’ll use the navigation plan to generate workflows instead of sequential steps, and that will allow us to react to traffic conditions in advance instead of always needing to stop and examine.
Posted by @alex-roba:
@mxgrey thanks for the explanation!
I have another question — does that also apply to negotiations? I’ve noticed that sometimes unnecessary negotiations occur, even though they’re resolved instantly. Is that expected behavior?