where the weights p1 and p2 are chosen by the user, c̄ is the constraint violation rate defined as in Fig. 14, v̄x is the average speed over the 200-s simulations, and vmin and vmax are the lower and upper bounds of the test vehicle’s speed, respectively. Note that this reward function is designed so that each of its terms is dimensionless. The parameters optimized in this example are the ratio (wl1)/(wl2), which represents the relative weighting of the two layers in the evaluation metric function (13) used in the decision tree evaluations, and xB, the size of region B in the longitudinal direction, which serves as a threshold for triggering the path planner.
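
To make the role of each term concrete, the following is a minimal sketch of how such a dimensionless reward could be evaluated from simulation outputs. It is an illustration under stated assumptions, not the exact expression used here: the sign convention (violations penalized, speed rewarded), the min-max normalization of the average speed, and the names `normalized_speed` and `reward` are all hypothetical.

```python
def normalized_speed(v_avg, v_min, v_max):
    """Map the average speed onto [0, 1] so that the term carries no units.

    ASSUMPTION: min-max normalization by the speed bounds; the paper's own
    normalization may differ.
    """
    if v_max <= v_min:
        raise ValueError("v_max must be strictly greater than v_min")
    return (v_avg - v_min) / (v_max - v_min)


def reward(c_bar, v_avg, v_min, v_max, p1=1.0, p2=1.0):
    """Weighted combination of two dimensionless terms.

    c_bar : constraint violation rate (a fraction, already dimensionless)
    v_avg : average speed over the simulation, same units as v_min and v_max
    p1, p2: user-chosen weights

    ASSUMPTION: the violation rate enters with a negative sign and the
    normalized speed with a positive sign.
    """
    return -p1 * c_bar + p2 * normalized_speed(v_avg, v_min, v_max)


# Example with made-up numbers: 5% violation rate, 30 m/s average speed,
# speed bounds of 20 and 40 m/s, and a heavier weight on safety.
example = reward(c_bar=0.05, v_avg=30.0, v_min=20.0, v_max=40.0, p1=2.0, p2=1.0)
```

Because both terms are dimensionless, p1 and p2 can be compared directly: in this sketch, doubling p1 makes one percentage point of violation rate cost as much as two percentage points of the admissible speed range.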