Driving Behavior for ADAS and Autonomous Driving II

Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California

Outline
• AutonoVi: Autonomous Vehicle Planning with Dynamic Maneuvers and Traffic Constraints
• Identifying Driver Behaviors using Trajectory Features for Vehicle Navigation
• End-to-end Driving via Conditional Imitation Learning
• Conditional Affordance Learning for Driving in Urban Environments
• Failure Prediction for Autonomous Driving
• LIDAR-based Driving Path Generation Using Fully Convolutional Neural Networks
• LiDAR-Video Driving Dataset: Learning Driving Policies Effectively
• E2E Learning of Driving Models with Surround-View Cameras and Route Planners

AutonoVi: Autonomous Vehicle Planning with
Dynamic Maneuvers and Traffic Constraints
• AutonoVi is a novel algorithm for autonomous vehicle navigation that supports dynamic
maneuvers and satisfies traffic constraints and norms.
• It is based on optimization-based maneuver planning that supports dynamic lane-changes,
swerving, and braking in all traffic scenarios and guides the vehicle to its goal position.
• It takes into account various traffic constraints, including collision avoidance with other
vehicles, pedestrians, and cyclists using control velocity obstacles.
• It uses a data-driven approach to model the vehicle dynamics for control and collision avoidance.
• The trajectory computation algorithm takes into account traffic rules and behaviors, such as
stopping at intersections and stoplights, based on an arc spline representation.
• Simulated scenarios include jaywalking pedestrians, sudden stops from high speeds, safely
passing cyclists, a vehicle suddenly swerving into the roadway, and high-density traffic
where the vehicle must change lanes to progress more effectively.

The Navigation Algorithm Pipeline: The autonomous vehicle planning algorithm operates in four steps.
1) A route is planned using graph search over the network of roads. 2) Traffic and lane-following
rules are combined to create a guiding trajectory for the vehicle for the next planning phase. 3) This
guiding trajectory is transformed to generate a set of candidate control inputs. These controls are
evaluated for dynamic feasibility using the data-driven vehicle dynamics model and for collision-free
navigation via extended control obstacles. 4) The remaining trajectories are evaluated using the
optimization technique to determine the most appropriate set of controls for the next execution cycle.
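Steps 3 and 4 of the pipeline can be sketched as a filter-then-minimize loop. This is only an illustrative sketch: the `Control` type, its units, and the three callables are hypothetical placeholders, not the planner's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Control:
    throttle: float  # hypothetical units: -1 (full brake) to 1 (full throttle)
    steering: float  # hypothetical units: -1 (full left) to 1 (full right)


def select_control(
    candidates: Iterable[Control],
    is_feasible: Callable[[Control], bool],        # stand-in for the dynamics check
    is_collision_free: Callable[[Control], bool],  # stand-in for the control-obstacle check
    cost: Callable[[Control], float],              # stand-in for the optimization cost
) -> Optional[Control]:
    """Drop infeasible or colliding candidates, then return the cheapest survivor."""
    remaining = [u for u in candidates if is_feasible(u) and is_collision_free(u)]
    if not remaining:
        return None  # nothing safe to execute; the caller must handle this case
    return min(remaining, key=cost)


# Toy usage with made-up checks and a made-up cost.
candidates = [Control(t, s) for t in (-1.0, 0.0, 1.0) for s in (-1.0, 0.0, 1.0)]
best = select_control(
    candidates,
    is_feasible=lambda u: abs(u.steering) <= 0.5,
    is_collision_free=lambda u: u.throttle <= 0.0,
    cost=lambda u: u.throttle ** 2 + u.steering ** 2,
)
```

Filtering before optimizing keeps the cost function free of hard constraints, which matches the order of steps 3 and 4 above.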

Finite State Machine: different behavior
states determined by the routing and
optimization algorithms.
Guiding Path Computation: The vehicle computes a
guiding path to the center of its current lane based on
a circular arc. (A): a path off the center of its current
lane. (B): abrupt changes to heading. (C): lane changes.

Path Cost:
Cpath: costs associated with the vehicle’s success at
tracking its path and the global route.
Cvel: squared difference between the desired speed and the current speed.
Cdrift: squared distance between the center line of the
vehicle’s lane and its current position.
Cprog: the vehicle’s desire to maximally progress along its
current path.
Comfort Costs:
caccel penalizes large accelerations and decelerations.
cyawr penalizes large heading changes and discourages sharp turning.

Maneuver Costs:
WrongLane: true if the vehicle’s lane does not match the lane
for the next maneuver.
LaneChanged: true if a candidate path crosses a lane boundary.
Proximity Costs:
Ctype: constant, larger for pedestrians and bicycles than for vehicles, and
guides the ego-vehicle to pass those entities with greater distance.
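The cost terms described above can be sketched in a few lines. The squared velocity and drift terms follow the descriptions on the slide; the type constants, the exponential distance decay in the proximity term, and the weighted sum are assumptions made for illustration only.

```python
import math

# Hypothetical type constants: larger for pedestrians and bicycles than for vehicles.
C_TYPE = {"vehicle": 1.0, "bicycle": 2.0, "pedestrian": 3.0}


def velocity_cost(v_desired: float, v_current: float) -> float:
    """Squared difference between desired and current speed."""
    return (v_desired - v_current) ** 2


def drift_cost(lateral_offset: float) -> float:
    """Squared distance between the lane center line and the vehicle position."""
    return lateral_offset ** 2


def proximity_cost(neighbors) -> float:
    """Sum of type-weighted penalties; the exp(-distance) decay is an assumption."""
    return sum(C_TYPE[kind] * math.exp(-dist) for kind, dist in neighbors)


def total_cost(v_desired, v_current, lateral_offset, neighbors,
               w_vel=1.0, w_drift=1.0, w_prox=1.0) -> float:
    """Weighted sum of the terms; the weights are hypothetical tuning parameters."""
    return (w_vel * velocity_cost(v_desired, v_current)
            + w_drift * drift_cost(lateral_offset)
            + w_prox * proximity_cost(neighbors))
```

Because the pedestrian constant is larger, a pedestrian at a given distance costs more than a vehicle at the same distance, which pushes the optimizer toward wider passing margins.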

Acceleration and steering functions, A(v, ut) and
Φ(v, us), respectively, describe the relationship
between the vehicle’s speed, steering, and control
inputs and its potential for changes in
acceleration and steering.
Safety function S(u, X) determines if a set of
controls is feasible, given the vehicle state.
Given the generated functions S, A, and Φ, the future path of the vehicle can be evaluated quickly
for planning future controls.
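A rough sketch of how such functions could be used to evaluate a future path: the rollout below integrates a simple kinematic state forward in time. The state layout, the time step, and the interpretation of the steering function as a yaw rate are assumptions; the callables stand in for the fitted functions A, Φ, and S and are not the real models.

```python
import math
from typing import Callable, List, Optional, Tuple

State = Tuple[float, float, float, float]  # x (m), y (m), heading (rad), speed (m/s)


def rollout(
    state: State,
    u_throttle: float,
    u_steer: float,
    accel_fn: Callable[[float, float], float],     # stand-in for A(v, ut)
    yaw_rate_fn: Callable[[float, float], float],  # stand-in for Φ(v, us)
    safe_fn: Callable[[Tuple[float, float], State], bool],  # stand-in for S(u, X)
    dt: float = 0.1,
    steps: int = 10,
) -> Optional[List[State]]:
    """Integrate the state forward; return None as soon as the controls are unsafe."""
    x, y, heading, v = state
    path = [state]
    for _ in range(steps):
        if not safe_fn((u_throttle, u_steer), (x, y, heading, v)):
            return None
        v = max(0.0, v + accel_fn(v, u_throttle) * dt)
        heading += yaw_rate_fn(v, u_steer) * dt
        x += v * math.cos(heading) * dt
        y += v * math.sin(heading) * dt
        path.append((x, y, heading, v))
    return path


# Toy usage: constant speed, no steering, always safe.
straight = rollout((0.0, 0.0, 0.0, 10.0), 0.0, 0.0,
                   accel_fn=lambda v, u: 0.0,
                   yaw_rate_fn=lambda v, u: 0.0,
                   safe_fn=lambda u, s: True)
```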

Results: (A) and (B): The ego-vehicle is forced to stop as a pedestrian enters the roadway. Once the
pedestrian has moved away, the vehicle resumes its course. (C) and (D): The ego-vehicle approaches
a slower moving vehicle from behind. (E): The Hatchback ego-vehicle during the S-Turns benchmark.
(F): An overview of the Simulated City benchmark. (G): The ego-vehicle (outlined in green) yields to an
oncoming vehicle (outlined in red) during the Simulated City benchmark. (H): The ego-vehicle (outlined
in green) stops in traffic waiting for a stoplight to change.

Identifying Driver Behaviors using Trajectory
Features for Vehicle Navigation
• Goal: automatically identify driver behaviors from vehicle trajectories and use them for safe
navigation of autonomous vehicles.
• A set of features that are easily extracted from car trajectories, and a data-driven mapping between
these features and 6 driver behaviors, obtained through an elaborate web-based user study.
• A summarized score indicating the awareness level needed while driving next to other vehicles.
• Driving Behavior Computation: Trajectory to Driver Behavior Mapping (TDBM), based on factor
analysis of 6 behaviors: 2 common behaviors, aggressiveness and carefulness, plus
four more behaviors, reckless, threatening, cautious, and timid.
• Improved Real-time Navigation: AutonoVi is enhanced to navigate according to the neighboring
drivers’ behavior; it identifies potentially dangerous drivers in real time and chooses a path that
avoids them.

During the training of TDBM, features are extracted from the trajectory database and a user
evaluation is conducted to find the mapping between them. During the navigation stage, a set of
trajectories is computed and their features are extracted; the driving behavior is then computed
using TDBM. Finally, real-time navigation is planned, taking into account these driver behaviors.

10 candidate features f0,…,f9 for selection. Features in
green are selected for mapping to behavior-metrics
only, and those in blue are selected for mapping to
both behavior-metrics and attention metrics.
jl denotes the longitudinal jerk and jp the progressive jerk.

Two example videos used in the user study. Participants are
asked to rate the 6 driving behavior metrics and 4 attention
metrics of the target car colored in red.
6 Driving Behavior metrics (b0, b1, ...,b5) and 4 Attention
metrics (b6, b7, b8, b9) used in TDBM.
Least absolute shrinkage and selection operator
(Lasso) analysis based on the objective function
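The objective function itself did not survive on this slide. As a reminder of what Lasso minimizes in general, the sketch below evaluates the standard form: a mean squared error term plus an L1 penalty on the weights. The 1/(2n) scaling and the parameter name `lam` are assumptions (one common convention), not necessarily the form used by the authors.

```python
def lasso_objective(rows, targets, weights, lam):
    """Standard Lasso objective: (1 / (2n)) * sum of squared residuals + lam * L1 norm.

    rows    : list of feature vectors (e.g. trajectory features per video)
    targets : list of scalar responses (e.g. user ratings of one behavior metric)
    weights : candidate linear mapping from features to the response
    lam     : regularization strength (hypothetical name)
    """
    n = len(rows)
    squared_error = 0.0
    for row, target in zip(rows, targets):
        prediction = sum(x * w for x, w in zip(row, weights))
        squared_error += (target - prediction) ** 2
    l1_norm = sum(abs(w) for w in weights)
    return squared_error / (2 * n) + lam * l1_norm


# Toy usage with made-up numbers.
value = lasso_objective([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], [1.0, 0.0], lam=0.5)
```

The L1 penalty drives some weights to exactly zero, which is why Lasso doubles as a feature-selection step.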

End-to-end Driving via Conditional
Imitation Learning
• Deep networks trained on demonstrations of human driving learn to follow roads and avoid obstacles.
• However, driving policies trained via imitation learning cannot be controlled at test time.
• A vehicle trained end-to-end to imitate cannot be guided to take a turn at an upcoming intersection.
• Conditional imitation learning:
• At training time, the model is given not only the perceptual input and the control signal, but also a
representation of the expert’s intention.
• At test time, the network can be given corresponding commands, which resolve the ambiguity in
the perceptuomotor mapping and allow the trained model to be controlled by a passenger or a
topological planner, just as mapping applications and passengers provide turn-by-turn directions
to human drivers.
• The trained network is thus freed from the task of planning and can devote its representational
capacity to driving.
• This enables scaling imitation learning to vision-based driving in complex urban environments.

Imitation Learning
Conditional imitation learning allows an autonomous vehicle trained e2e to be directed by high-level commands.

Imitation Learning
Expose the latent state h to the controller by
introducing an additional command input: c = c(h).
The controller receives an observation ot from the
environment and a command ct. It produces an action at
that affects the environment, advancing to the next
time step.

Imitation Learning
Two network architectures for command-conditional imitation learning. (a) command input: the
command is processed as input by the network, together with the image and the measurements. The
same architecture can be used for goal-conditional learning (one of the baselines), by replacing the
command by a vector pointing to the goal. (b) branched: the command acts as a switch that selects
between specialized sub-modules.
(a) command input: j = J(i, m, c) = ⟨I(i), M(m), C(c)⟩
(b) branched: F(i, m, c^i) = A^i(J(i, m))
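The branched variant can be sketched as a dictionary lookup: the command picks which action head A^i is applied to the joint image and measurement features. The encoders, branch functions, and command names below are hypothetical toys, not the trained network.

```python
from typing import Callable, Dict, List, Sequence


def branched_policy(
    image: Sequence[float],
    measurements: Sequence[float],
    command: str,
    encode_image: Callable[[Sequence[float]], List[float]],         # stand-in for I(i)
    encode_measurements: Callable[[Sequence[float]], List[float]],  # stand-in for M(m)
    branches: Dict[str, Callable[[List[float]], float]],            # stand-ins for A^i
) -> float:
    """F(i, m, c^i) = A^i(J(i, m)): the command selects a branch, it is not a network input."""
    joint = encode_image(image) + encode_measurements(measurements)  # J(i, m) as concatenation
    return branches[command](joint)


# Toy usage: the "image" is a list of pixel values, the measurement is the speed.
steer = branched_policy(
    image=[0.0, 1.0],
    measurements=[5.0],
    command="left",
    encode_image=lambda img: [sum(img) / len(img)],
    encode_measurements=lambda m: list(m),
    branches={
        "left": lambda j: -0.5 * j[0],
        "right": lambda j: 0.5 * j[0],
        "straight": lambda j: 0.0,
    },
)
```

In the command-input variant (a), the command would instead be encoded and concatenated into the joint feature vector, with a single shared action head.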

Imitation Learning
• CARLA, an urban driving simulator, is used to corroborate design decisions and evaluate the
proposed approach in a dynamic urban environment with traffic.
• A human driver is presented with a first-person view of the environment (center camera) at
a resolution of 800×600 pixels.
• The driver controls the simulated vehicle using a physical steering wheel and pedals, and
provides command input using buttons on the steering wheel.
• The driver keeps the car at a speed below 60 km/h and strives to avoid collisions with cars
and pedestrians, but ignores traffic lights and stop signs.
• Images from the 3 simulated cameras are recorded, along with other measurements such as
speed and the position of the car.
• The images are cropped to remove part of the sky.
• CARLA also provides extra information such as distance travelled, collisions, and the occurrence of
infractions such as drifting onto the opposite lane or the sidewalk.
• This information is used in evaluating different controllers.

Imitation Learning
Physical system setup. Red/black
indicate +/- power wires, green
indicates serial data connections,
and blue indicates PWM control
signals.

Conditional Affordance Learning for Driving
in Urban Environments
• Approaches to autonomous driving:
• 1) modular pipelines, which build an extensive model of the environment;
• 2) imitation learning approaches, which map images directly to control outputs;
• 3) direct perception, which combines 1) and 2) by using a neural network to learn appropriate
intermediate representations.
• Existing direct perception approaches are restricted to simple highway situations, lacking the ability to
navigate intersections, stop at traffic lights or respect speed limits.
• The proposed direct perception approach maps video input to intermediate representations suitable for
autonomous navigation in complex urban environments given high-level directional inputs.
• It is evaluated on goal-directed navigation on the challenging CARLA simulation benchmark.
• It handles traffic lights, speed signs and smooth car-following.

Conditional Affordance Learning (CAL) for Autonomous Urban Driving. The input video
and the high-level directional input are fed into a NN which predicts a set of affordances.
These affordances are used by a controller to calculate the control commands.

The CAL agent (top) receives the current camera image and a directional command from CARLA (“straight”, “left”, “right”).
The feature extractor converts the image into a feature map. The agent stores the last N feature maps in memory. This
sequence of feature maps, together with the directional commands from the planner, is used by the task blocks to
predict affordances. The control commands calculated by the controller are sent back to CARLA.

• An intermediate representation for autonomous navigation should be low-dimensional, yet support
the following requirements:
• (i) the agent should be able to drive from A to B as fast as possible while obeying traffic rules.
• (ii) infractions against traffic rules should be avoided at all times (6 types):
• Driving in the wrong lane, driving on the sidewalk, running a red light, colliding with other vehicles,
hitting pedestrians and hitting static objects.
• The speed limit must be obeyed at all times.
• (iii) the agent should be able to provide a pleasant driving experience.
• The car should stay in the center of the road and take turns smoothly.
• It should be able to follow a leading car at a safe distance.
• Controller: a longitudinal controller for the throttle and brake, and a lateral controller for the steering;
• Longitudinal control: subdivided into several states, in order of increasing importance: cruising,
following, over limit, red light, and hazard stop; a PID controller is applied.
• Lateral control: the Stanley Controller (SC) is applied, which uses 2 error metrics: the distance to the
centerline d(t) and the relative angle ψ(t).
• Perception: a multi-task learning (MTL) problem, using a single NN to predict all affordances
in a single forward pass.
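The two controllers described above can be sketched as follows. The state priority follows the ordering on the slide, and the steering law is the usual Stanley form, heading error plus the arctangent of the gain-scaled cross-track error over speed. The gain `k`, the low-speed guard `eps`, and the boolean inputs are assumptions for illustration.

```python
import math

# Longitudinal states in order of increasing importance, as listed on the slide.
STATES_BY_IMPORTANCE = ["cruising", "following", "over_limit", "red_light", "hazard_stop"]


def select_longitudinal_state(following: bool, over_limit: bool,
                              red_light: bool, hazard_stop: bool) -> str:
    """Return the most important active state; cruising is the default."""
    active = {
        "cruising": True,
        "following": following,
        "over_limit": over_limit,
        "red_light": red_light,
        "hazard_stop": hazard_stop,
    }
    for name in reversed(STATES_BY_IMPORTANCE):
        if active[name]:
            return name
    return "cruising"


def stanley_steering(d: float, psi: float, v: float,
                     k: float = 1.0, eps: float = 1e-3) -> float:
    """Stanley-style steering angle from centerline distance d and relative angle psi.

    k is a hypothetical gain; eps guards against division by zero at standstill.
    """
    return psi + math.atan2(k * d, max(v, eps))
```

For example, `select_longitudinal_state(True, False, True, False)` resolves to the red-light state even though a leading car is also present, because red light ranks higher than following.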

Affordances. Categorize affordances according to their type (discrete/continuous) and whether
they are conditional (dependent on directional input) or unconditional. The affordances (red)
and observation areas used by our model.

Failure Prediction for Autonomous Driving
• Driving models are more likely to fail at places with heavy traffic, at complex intersections,
and/or under adverse weather/illumination conditions.
• The method learns to predict the occurrence of these failures, i.e. to assess how
difficult a scene is for a given driving model, and possibly to give the driver an early heads-up.
• A camera-based driving model is developed and trained on real driving datasets.
• The discrepancies between the model’s predictions and the human ‘ground-truth’
maneuvers are then recorded to yield the ‘failure’ scores.
• The prediction method can improve the overall safety of an automated driving model
by alerting the human driver in time, leading to better human-vehicle collaborative driving.
• Scene drivability: how easy a scene is for an automated car to navigate.
• A low drivability score means that the automated vehicle is likely to fail in that particular scene.
• The scene drivability scores are quantized into two levels: Safe and Hazardous.

The architecture of the driving model, which provides future maneuvers (i.e. speed and steering
angle) and the drivability score of the scene. The drivability scores are quantized into two levels:
Safe and Hazardous. In this case, the coming scene is Safe for the driving model, so the system
does not need to alert the human driver.

A safe scene allows for a driving mode of High
Automation and a hazardous scene allows for a
driving mode of Partial/No Automation. These
thresholds can be set and tuned according to specific
driving models and legal regulations.
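A minimal sketch of how a scene could be labeled from the discrepancy between predicted and human maneuvers. The tolerance values, their units, and the rule that either discrepancy alone marks a failure are assumptions; as noted above, the thresholds are meant to be tuned per driving model and regulation.

```python
def scene_label(pred_speed: float, true_speed: float,
                pred_steer: float, true_steer: float,
                speed_tol: float = 5.0, steer_tol: float = 10.0) -> str:
    """Label a scene Hazardous when the model deviates too far from the human driver.

    speed_tol and steer_tol are hypothetical thresholds (e.g. km/h and degrees).
    """
    speed_error = abs(pred_speed - true_speed)
    steer_error = abs(pred_steer - true_steer)
    failed = speed_error > speed_tol or steer_error > steer_tol
    return "Hazardous" if failed else "Safe"
```

Labels produced this way would serve as training targets for the failure prediction model, which must then predict them from the scene alone, before the maneuver happens.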

An illustrative flowchart of the training procedure and solution
space of the driving model and the failure prediction model.

Scene examples with their drivability scores, quantized into two levels: Safe and Hazardous
for rows, and the confidence of the prediction in columns ranging from high to low.

LIDAR-based Driving Path Generation Using
Fully Convolutional Neural Networks
• Mediated perception: modularity and interpretability;
• Behavior reflex: a black box that maps raw input to control actions.
• The proposed learning-based approach generates driving paths by integrating LIDAR point clouds,
GPS-IMU information, and Google driving directions.
• An FCN jointly learns to carry out perception and path generation from real-world driving
sequences and is trained using automatically generated training examples.
• It helps fill the gap between low-level scene parsing and behavior-reflex approaches by
generating outputs that are close to vehicle control and at the same time human-interpretable.
• Google Maps is used to obtain driving instructions when approaching locations where multiple
directions can be taken.
• When queried with the current position and a destination, Google Maps returns a human-
interpretable driving action, such as turn left or take exit, together with an approximate distance to
where that action should be taken.
• The intention is encoded as an intention direction id ∈ {left, straight, right} and an intention
proximity ip ∈ [0, 1].

Overview of the I/O tensors relative to a validation example. (A) Forward acceleration. (B) Forward speed. (C) Yaw
rate. (D) Intention proximity. (E) Max elevation. (F) Average reflectivity. (G) Future path ground-truth. (H) Intention
direction. (I) Overlay of occupancy LIDAR image with ground-truth past (red) and future path (blue).

A schematic illustration of the FCN

Qualitative comparison of the FCN’s performance using several combinations of sensor and
information modalities. The future path is shown in blue and the past path in red. (A) and (B): cases
where intention information was particularly useful; only Lidar-IMU-INT and Lidar-INT were able to
predict accurate paths. (C): a case where IMU information helps predict the future path, especially
when a turning maneuver has recently been initiated. (D): a single road ahead; with the exception of
IMU-only, all the other FCNs predicted accurate paths in this case.

In each column, the top panel shows the occupancy top-view overlaid with the past path (red) and the Lidar-IMU-INT’s
future path prediction (blue). The bottom panels are the driving intention proximity (left) and driving intention direction
(right). Column A shows a case of disagreement between the driving intention (turn right) and the LIDAR point cloud, which
shows a straight road with no exits. Columns B–D show cases where multiple directions are possible.

LiDAR-Video Driving Dataset: Learning
Driving Policies Effectively
• A LiDAR-Video dataset that provides large-scale, high-quality point clouds scanned by a
Velodyne laser, videos recorded by a dashboard camera, and standard drivers’ behaviors.

The data collection platform with multiple sensors.

The pipeline of data preprocessing when constructing dataset.

• Discrete action prediction: predict the current probability distribution over all possible actions.
• The limitation of discrete prediction is that the autonomous vehicle can only choose
among a limited set of predefined actions.
• It is not suitable for real driving, since it is too coarse to guide the vehicle.
• Continuous prediction: predict the current states of the vehicle, such as wheel angle and vehicle
speed, as a regression task.
• If driving policies on all real-world states can be predicted correctly, vehicles are expected to be
driven successfully by the trained model.
• The driving process is modeled as a continuous prediction task: train a model that receives
multiple kinds of perception information, including video frames and point clouds, and predicts
correct steering angles and vehicle speeds.
• Learning tools: DNN + LSTM.
• Depth representation: Point Cloud Mapping and PointNet.

The pipeline of extracting feature maps from raw point clouds. First, split the XOZ plane into small
grids, each of which corresponds to one pixel in the feature maps. Second, group the raw points by
projecting them into the grids in XOZ. Then calculate feature values (F-values) in each grid. Finally, generate
feature maps by visualizing the feature matrix and rendering it with the jet color map.
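The first three steps of this pipeline can be sketched in plain Python. The grid parameters are hypothetical, and the F-value used here, the mean Y coordinate of the points falling in each cell, is only an assumed example of a per-grid feature; the color-map rendering step is omitted.

```python
import math


def point_cloud_to_feature_map(points, x_min, z_min, cell, rows, cols):
    """Project (x, y, z) points onto the XOZ plane and compute one F-value per grid cell.

    Each cell corresponds to one pixel of the feature map. Points outside the grid
    are ignored, and empty cells get an F-value of 0.0.
    """
    sums = [[0.0] * cols for _ in range(rows)]
    counts = [[0] * cols for _ in range(rows)]
    for x, y, z in points:
        col = math.floor((x - x_min) / cell)
        row = math.floor((z - z_min) / cell)
        if 0 <= row < rows and 0 <= col < cols:
            sums[row][col] += y
            counts[row][col] += 1
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(cols)]
        for r in range(rows)
    ]


# Toy usage: a 2 x 2 grid of 1 m cells; the last point lies outside the grid.
cloud = [(0.5, 1.0, 0.5), (0.6, 3.0, 0.4), (1.5, 2.0, 0.5), (9.0, 0.0, 9.0)]
feature_map = point_cloud_to_feature_map(cloud, x_min=0.0, z_min=0.0, cell=1.0, rows=2, cols=2)
```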

End-to-End Learning of Driving Models with
Surround-View Cameras and Route Planners
• The sensor setup consists of a surround-view camera system, a route planner, and a CAN bus reader.
• It provides data for a 360-degree view surrounding the vehicle, the driving
route, and low-level driving maneuvers by human drivers.
• The driving dataset covers diverse driving scenarios and weather/illumination conditions.
• A driving model is learned by integrating information from the surround-views and the route planner.
• Two route planners are exploited:
• 1) by representing the planned routes on OpenStreetMap as a stack of GPS coordinates;
• 2) by rendering the routes on TomTom Go Mobile and recording the progression into a video.
• Data and code: http://www.vision.ee.ethz.ch/~heckers/Drive360.

An illustration of the driving system

• Cameras provide a 360-degree view of the area surrounding the vehicle.
• The driving maps or GPS coordinates generated by the route planner and the videos from
the cameras are synchronized; they are used as inputs to train the driving model.
• The driving model consists of CNNs for feature encoding, LSTM networks to
integrate the outputs of the CNNs over time, and fully connected networks to integrate
information from multiple sensors to predict the driving maneuvers.
• A driving dataset of 60 hours, featuring videos from eight surround-view cameras, two
forms of data representation for a route planner, low-level driving maneuvers, and GPS-IMU
data of the vehicle’s odometry.
• To keep the task tractable, the driving model is learned in an end-to-end manner, i.e. it maps inputs from
the surround-view cameras and the route planner directly to low-level maneuvers of the car.
• The incorporation of detection and tracking modules for traffic agents (e.g. cars and
pedestrians) and traffic control devices (e.g. traffic lights and signs) is future work.

The configuration of cameras (panels: camera rig; rig on the vehicle). The rig is 1.6 meters wide so
that the side-view cameras have a good view of the road surface without obstruction by the roof of
the vehicle. The cameras are evenly distributed laterally and angularly.

• 1. TomTom Map represents one of the state-of-the-art commercial maps for driving applications, but it
does not provide open APIs to access its ‘raw’ data.
• Instead, the visual information provided by the TomTom GO Mobile app is exploited by recording the
rendered map views using screen recording.
• Since the map rendering updates rather slowly, the screen is captured at 30 fps with a size of 1280 × 720.
• 2. The real-time routing method [Luxen & Vetter] for OSM data is used as the second route planner.
• The past driving trajectories (GPS coordinates) are provided to the routing algorithm to localize
the vehicle, and the GPS tags of the planned road for the next 300m ahead are taken as the
representation of the planned route for the ‘current’ position.
• Because the GPS tags of the OSM road networks are not distributed evenly according to
distance, a cubic smoothing spline is fitted to the obtained GPS tags and 300 data points are then
sampled from the fitted spline with a stride of 1m.
• For the OSM route planner, this yields a 300 × 2 matrix (300 GPS coordinates) as the representation of the
planned route for every ‘current’ position.
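The resampling step that produces the 300 × 2 route matrix can be sketched as follows. Note the simplifications: this sketch uses piecewise-linear interpolation along the arc length instead of the cubic smoothing spline described above, it assumes the route points are already in a local metric frame (meters) rather than latitude and longitude, and it repeats the last point if the route is shorter than the requested length.

```python
import math


def resample_route(points, n_samples=300, stride=1.0):
    """Resample a polyline of (x, y) points in meters at a fixed stride along its length.

    Returns n_samples rows of (x, y), i.e. the 300 x 2 route representation.
    Requires at least two input points.
    """
    # Cumulative arc length at each input point.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))

    out = []
    seg = 0
    for k in range(n_samples):
        s = min(k * stride, cum[-1])  # clamp to the end of the route
        while seg < len(points) - 2 and cum[seg + 1] < s:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = (s - cum[seg]) / span if span > 0 else 0.0
        x = points[seg][0] + t * (points[seg + 1][0] - points[seg][0])
        y = points[seg][1] + t * (points[seg + 1][1] - points[seg][1])
        out.append((x, y))
    return out


# Toy usage: unevenly spaced route tags, 100 m east and then 400 m north.
route = resample_route([(0.0, 0.0), (100.0, 0.0), (100.0, 400.0)])
```

A fixed stride gives every ‘current’ position an input of the same shape, regardless of how densely the map happens to tag that stretch of road.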

• Human Driving Maneuvers: low-level driving maneuvers, i.e. the steering wheel angle
and vehicle speed, are recorded as registered on the CAN bus of the car at 50Hz.
• The CAN protocol is a simple ID and data payload broadcasting protocol that is used for low-level
information broadcasting in a vehicle.
• The specific CAN IDs and corresponding payloads for steering wheel angle and vehicle
speed are read out via a CAN-to-USB device and recorded on a computer connected to the bus.
• Vehicle’s Odometry: the GoPro cameras’ built-in GPS and IMU module is used to record GPS
data at 18Hz and IMU measurements at 200Hz while driving.
• This data is then extracted and parsed from the meta-track of the video created by the GoPro.
• Synchronization: the internal clocks of all sensors are synchronized to the GPS clock.
• The resulting synchronization error for the video frames is up to 8.3 milliseconds (ms).

Qualitative results for driving action prediction, comparing 3 cases to the front-camera-only
model: (1) learning with the TomTom route planner, (2) learning with surround-view cameras, and (3)
learning with the TomTom route planner and surround-view cameras.

The architecture of the Surround-View and TomTom route planner model.
Qualitative evaluation of Surround-View + TomTom and Front-Camera-Only models.
