Interactive Report: Safety Evaluation of Motion Plans

Evaluating Motion Plan Safety with Trajectory Predictors

The Safety-Soundness Dilemma

Autonomous systems need safety monitors that are both **complete** (catch all true dangers) and **sound** (don't flag safe situations as dangerous). Figure 1 below, taken from the paper, illustrates this trade-off.

Comparison of reachable set computation methods from the research paper — **Figure 1:** (a) A baseline predictor is overconfident and misses the true path (a False Negative). (b) A worst-case FRS is too conservative and flags a safe path as dangerous (a False Positive). (c) FORCE-OPT provides a balanced, accurate reachable set.

The FORCE-OPT Framework

Our solution, FORCE-OPT, is a principled framework built on four key components that work together to create a robust and reliable safety monitor.

FORCE-OPT framework process diagram — **Figure 2: FORCE-OPT Process Overview:** The complete pipeline showing how trajectory predictions are transformed into calibrated Forward Reachable Sets through convex optimization and conformal prediction.

Trajectory Predictor

Treats modern trajectory predictors as data-driven estimators of Forward Reachable Sets (FRS), grounding predictions in realistic, learned behavior.

Calibration

Calibrates the FRS to correct for model errors, guaranteeing that the true future path is covered with high, user-specified probability.
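The report names conformal prediction as the calibration tool. A minimal sketch of the standard split-conformal step is given here, assuming a scalar nonconformity score such as the distance between predicted and true position. The function name and the calibration values are hypothetical, and the score actually used by FORCE-OPT may differ.

```python
import math

def conformal_quantile(scores, alpha):
    """Split-conformal threshold for a target miscoverage level alpha.

    scores: nonconformity scores from a held-out calibration set
    alpha:  allowed miscoverage, e.g. 0.25 for 75% coverage
    """
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to certify this level.
        return math.inf
    return sorted(scores)[rank - 1]

# Hypothetical calibration scores (metres between prediction and ground truth).
calibration_scores = [0.2, 0.5, 0.3, 0.9, 0.4, 0.7, 0.1, 0.6, 0.8]
radius = conformal_quantile(calibration_scores, alpha=0.25)
print(radius)
```

If calibration and test scores are exchangeable, a new score falls at or below this threshold with probability at least 1 - alpha, which is the kind of coverage guarantee described above.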

Convex Optimization

Solves an efficient convex optimization problem to find the smallest region that captures the most likely future positions, producing tight, accurate sets without slow sampling.
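To give a feel for what "smallest region at a given probability" means, the sketch below handles the simplest case: a single 2D Gaussian prediction mode. There the smallest such region is an ellipse with a closed-form size, so no optimizer is needed. This is a simplification for illustration, not the FORCE-OPT optimization itself, and all names and values are hypothetical.

```python
import math

def ellipse_scale(p):
    """Squared Mahalanobis radius enclosing probability p of a 2D Gaussian.

    This is the chi-square quantile with 2 degrees of freedom: -2 * ln(1 - p).
    """
    return -2.0 * math.log(1.0 - p)

def ellipse_area(cov, p):
    """Area of the probability-p ellipse: pi * r2 * sqrt(det(cov))."""
    (a, b), (c, d) = cov
    return math.pi * ellipse_scale(p) * math.sqrt(a * d - b * c)

def inside(point, mean, cov, p):
    """True if point lies in the probability-p ellipse of N(mean, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    m2 = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return m2 <= ellipse_scale(p)

# Hypothetical predicted position distribution for one agent.
mean = (0.0, 0.0)
cov = ((1.0, 0.0), (0.0, 1.0))
print(inside((2.0, 0.0), mean, cov, p=0.95))
print(inside((3.0, 0.0), mean, cov, p=0.95))
```

With several mixture modes there is no such closed form, which is presumably where an optimization-based approach becomes useful.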

Bayesian Update

Dynamically adjusts how conservative the FRS is based on the predictor's real-time performance, adding a layer of safety for unexpected scenarios.
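The report does not spell out the update rule, so the following is only one plausible sketch. It assumes a Beta-Bernoulli belief over how often the predictor's sets cover the truth, and a linear mapping from that belief to the miscoverage level. The function names, the prior, and the alpha range are all hypothetical.

```python
def update_belief(hits, misses, covered):
    """One Beta-Bernoulli update of the belief that the predictor is reliable."""
    return (hits + 1, misses) if covered else (hits, misses + 1)

def adapted_alpha(hits, misses, alpha_nominal=0.10, alpha_min=0.01):
    """Shrink the miscoverage level as trust drops.

    A smaller alpha demands higher coverage, which means a larger FRS.
    """
    trust = hits / (hits + misses)  # posterior mean of Beta(hits, misses)
    return alpha_min + trust * (alpha_nominal - alpha_min)

belief = (1, 1)  # uniform Beta(1, 1) prior (an assumption)
for covered in [False, False, False]:  # three consecutive misses
    belief = update_belief(*belief, covered)

alpha = adapted_alpha(*belief)
print(belief, alpha)
```

After three misses the belief in the predictor falls, alpha drops below its nominal value, and a calibration step run at that alpha would return a more conservative set.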

Performance

Experimental results on the nuScenes dataset show how the different methods perform on in-distribution scenarios.

False Positive Rate (FPR): lower is better
False Negative Rate (FNR): lower is better
Coverage (Cov): higher is better
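As a rough illustration of how these metrics are computed, the sketch below derives FPR and FNR from per-plan monitor decisions and coverage from per-trajectory containment checks. The function names and toy inputs are made up for illustration and do not come from the paper or from nuScenes. It also reports the Balanced Error Rate using its usual definition, the mean of FPR and FNR, on the assumption that the report follows that definition.

```python
def monitor_rates(flagged, unsafe):
    """Return (FPR, FNR, BER) from per-plan monitor decisions.

    flagged[i]: the monitor flagged plan i as dangerous
    unsafe[i]:  plan i really was dangerous (ground truth)
    Assumes at least one safe and one unsafe plan, so no division by zero.
    """
    pairs = list(zip(flagged, unsafe))
    tp = sum(f and u for f, u in pairs)
    fp = sum(f and not u for f, u in pairs)
    fn = sum(u and not f for f, u in pairs)
    tn = sum(not f and not u for f, u in pairs)
    fpr = fp / (fp + tn)  # safe plans wrongly flagged
    fnr = fn / (fn + tp)  # dangerous plans missed
    return fpr, fnr, (fpr + fnr) / 2

def coverage(covered):
    """Fraction of true trajectories that fall inside their predicted set."""
    return sum(covered) / len(covered)

# Toy data, not experimental results.
flagged = [True, True, False, False, True, False]
unsafe = [True, False, False, False, True, True]
fpr, fnr, ber = monitor_rates(flagged, unsafe)
cov = coverage([True, True, True, False])
print(fpr, fnr, ber, cov)
```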

An example of FORCE-OPT successfully covering the future trajectories of multiple agents. — **Visualizing High Coverage:** This image demonstrates FORCE-OPT's ability to generate predicted sets (red ellipses) that successfully cover the true future trajectories (blue boxes) of multiple agents.

An example of FORCE-OPT detecting a potential collision. — **Detecting Potential Collisions:** Here, the planned path of the ego vehicle (green outline) intersects with the predicted FRS (red ellipse) of another agent, correctly identifying a potential crash.
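The check described in this caption can be sketched as a point-in-ellipse test over the planned waypoints. This is a simplified illustration: it treats the ego vehicle as a sequence of points and ignores its footprint, and the function names and geometry are hypothetical rather than taken from the paper.

```python
import math

def point_in_ellipse(point, center, semi_axes, angle):
    """True if point lies inside an ellipse rotated by angle (radians)."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    # Rotate the offset into the ellipse's own axis-aligned frame.
    u = cos_t * dx + sin_t * dy
    v = -sin_t * dx + cos_t * dy
    return (u / semi_axes[0]) ** 2 + (v / semi_axes[1]) ** 2 <= 1.0

def plan_is_flagged(waypoints, ellipses):
    """Flag the plan if any waypoint enters any agent's predicted set."""
    return any(
        point_in_ellipse(p, center, axes, angle)
        for p in waypoints
        for center, axes, angle in ellipses
    )

# One hypothetical agent FRS: centre (5, 0), semi-axes 2 m and 1 m, unrotated.
agent_sets = [((5.0, 0.0), (2.0, 1.0), 0.0)]
crossing_plan = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
passing_plan = [(0.0, 3.0), (5.0, 3.0)]
print(plan_is_flagged(crossing_plan, agent_sets))
print(plan_is_flagged(passing_plan, agent_sets))
```

A fuller check would test the vehicle outline against the ellipse at each time step rather than individual points.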

Adapting to Uncertainty with Bayesian Update

When the system detects that the predictor is unreliable, the belief-based version of FORCE-OPT makes the reachable set more conservative so that the agent's true path remains covered.

Standard FORCE-OPT failing to cover a trajectory. — **Failure Case:** Standard FORCE-OPT is overconfident and fails to cover the agent's true path.

Belief-based FORCE-OPT successfully covering a trajectory. — **Success Case:** The belief-based version detects the failure, becomes more conservative, and successfully covers the path.

Performance Comparison (Balanced Error Rate)

Impact of Multi-Modality on Performance

As shown in the heatmaps from the paper, using more prediction modes (moving from left to right on the x-axis) generally improves the performance of FORCE-OPT and its variants by reducing the Balanced Error Rate (BER).

Ablation study on the number of GMM modes from the research paper — **Figure 2 from the paper:** Performance heatmaps for (a) Balanced Error Rate, (b) False Positive Rate, and (c) False Negative Rate as the number of modes increases.

Conclusion & Future Directions

Key Takeaways

FORCE-OPT offers a robust, efficient, and principled framework for safety monitoring in learned autonomy stacks. By integrating convex optimization, conformal prediction, and a Bayesian update, it significantly outperforms existing methods in balancing safety (low false negatives) and practicality (low false positives), even in challenging out-of-distribution scenarios.

Future Work

Joint Multi-Agent Reachability: Extend the framework to compute FRS for multiple agents simultaneously, capturing complex interactions in dense traffic.
Direct FRS Generation: Train a neural network specifically to output Forward Reachable Sets directly, potentially improving efficiency and accuracy.