Stochastic Mixture Modeling of Driving Behavior During Car Following
 Author: Angkititrakul Pongtep, Miyajima Chiyomi, Takeda Kazuya
 Organization: Angkititrakul Pongtep; Miyajima Chiyomi; Takeda Kazuya
 Publish: Journal of information and communication convergence engineering Volume 11, Issue 2, p95~102, 30 June 2013

ABSTRACT
This paper presents a stochastic driver behavior modeling framework that takes into account both individual and general driving characteristics as one aggregate model. Patterns of individual driving styles are modeled using a Dirichlet process mixture model, a nonparametric Bayesian approach that automatically selects the optimal number of model components to fit sparse observations of each particular driver’s behavior. In addition, general or background driving patterns are captured with a Gaussian mixture model using a reasonably large amount of development data from several drivers. By combining both probability distributions, the aggregate driver-dependent model can better emphasize the driving characteristics of each particular driver, while also backing off to exploit general driving behavior in cases of unseen/unmatched parameter spaces from individual training observations. The proposed driver behavior model was employed to anticipate pedal operation behavior during car-following maneuvers involving several drivers on the road. The experimental results showed advantages of the combined model over the model adaptation approach.

KEYWORD
Car following, Driver behavior, Mixture model, Model adaptation, Nonparametric Bayesian

I. INTRODUCTION
Predicting driving behavior by employing mathematical driver models, which are obtained directly from the observed driving-behavior data, has gained much attention in recent research. Various approaches have been proposed for modeling driving behavior based on different interpretations and assumptions, such as the piecewise autoregressive exogenous (PWARX) model [1,2], hidden Markov model (HMM) [3], neural network (NN) [4], and Gaussian mixture model (GMM) [5]. These approaches have reported impressive performance on simulated and controlled driving data. Some of these promising techniques exploit a set of localized relationships to model driving behavior (e.g., mixture models, piecewise linear models). These models assume that the observed data are generated by a set of latent components, each having different characteristics and corresponding parameters. Therefore, complex driving behavior can be broken down into a reasonable number of subpatterns. For instance, during car following, it is believed that drivers adopt different driving patterns or driving modes (e.g., normal following, approaching) under different driving situations, depending on individual and contextual factors. One challenge in behavior modeling is to determine how many latent classes or localized relationships exist between the stimuli and the driver’s responses (i.e., model selection problem), and to estimate the properties of these hidden components from the given observations. In general, a tradeoff in selecting the number of components arises: with too many components, the obtained model may overfit the data, while a model with too few components may not be flexible enough to represent an underlying distribution of observations.
A finite GMM [6] is a well-known probabilistic and unsupervised modeling technique for multivariate data with an arbitrarily complex probability density function (pdf). Expectation-maximization (EM) is a powerful algorithm for estimating parameters of finite mixture models that maximizes the likelihood of observed data. However, the EM algorithm is sensitive to initialization (i.e., it may converge to a local maximum), and may converge to the boundary of a parameter space, leading to a meaningless estimate [6]. Moreover, EM provides no explicit solution to the model selection problem, and may not yield a well-behaved distribution when the amount of training data is insufficient.
Recently, the Dirichlet process mixture model (DPM), a nonparametric Bayesian approach, has been proposed to circumvent such issues [7,8]. Unlike finite mixture models, DPM estimates the joint distribution of stimuli and responses using a Dirichlet process mixture by assuming that the number of components is random and unknown. Specifically, a hidden parameter is first drawn from a base distribution; consequently, observations are generated from a parametric distribution conditioned on the drawn parameter. Therefore, DPM avoids the problem of model selection by assuming that there are an infinite number of latent components, but that only a finite number of observations could be observed. Most importantly, DPM is capable of choosing an appropriate number of latent components to explain the given data in a probabilistic manner. DPM has been successfully applied in several applications such as modeling content of documents and spike sorting [7,9].
In car following, driver behavior is influenced by both individual and situational factors [10,11]; hence, the best driver behavior model for each particular driver should be obtained by using individual observations that include all possible driving situations. However, at present, it is not practical to collect such a large amount of driving data from one particular driver in order to create a driver-specific model. To circumvent this issue, a general or universal driver model, which is obtained by using a reasonable amount of observations from several drivers, is used to represent driving behavior in a broad sense (e.g., average or common relationships between stimuli and responses). Subsequently, a driver-dependent model can be obtained using a model adaptation framework that can automatically adjust the parameters of the universal driver model by shifting the localized distributions towards the available individual observations [5].
In this paper, we propose a new stochastic driver behavior model that better represents underlying individual driving characteristics, while retaining general driving patterns. To cope with sparse amounts of individual driving data and the model selection problem, we employed DPM to train an individual driver behavior model in order to capture unique driving styles from available observations. Furthermore, in order to cope with unseen or unmatched driving situations that may not be present in individual training observations, we employed a GMM with a classical EM algorithm to train a universal driver model from observations of several drivers. Finally, the driver-dependent model is obtained by combining both driver models into one aggregate model in a probabilistic manner. As a result, the combined model contains both individual and background distributions that can better represent both observed and unobserved driving behavior of individual drivers.
Experimental validation was conducted by observing the car-following behavior of several drivers on the road. The objective of a driver behavior model is to anticipate car-following behavior in terms of pedal control operations (i.e., gas and brake pedal pressures) in response to the observable driving signals, such as the vehicle velocity and the following distance behind the leading vehicle. We demonstrated that the proposed combined driver model showed better prediction performance than both individual and general models, as well as the driver-adapted model based on the maximum a posteriori (MAP) criterion [5].
II. CAR FOLLOWING AND DRIVER BEHAVIOR MODEL
Car following characterizes the longitudinal behavior of a driver while following behind another vehicle. In this study, we focus on how the behavior of the driver of a following vehicle is affected by the driving environment (i.e., the behavior of the leading vehicle) and by the status of the driver’s own vehicle. There are several contributory factors in car-following behavior, such as the relative position and velocity of the following vehicle with respect to the lead vehicle, the acceleration and deceleration of both vehicles, and the perception and reaction time of the following driver.
Fig. 1 shows a basic diagram of car following and its corresponding parameters, where $v^{f}_{t}$, $a^{f}_{t}$, $f_{t}$, and $x^{f}_{t}$ represent the vehicle velocity, acceleration and deceleration, distance between vehicles, and observed feature vector at time $t$, respectively.
In general, a driver behavior model predicts a pattern of pedal depression by a driver in response to the present velocity of the driver’s vehicle and the relative distance between the vehicles. Subsequently, the vehicle velocity and the relative distance are altered corresponding to the vehicle dynamics, which responds to the driver’s control behavior of the gas and brake pedals. Most conventional car-following models [12-14] ignore the stochastic nature and multiple states of driving behavior characteristics. Some models assume that a driver’s responses depend on only one stimulus such as the distance between vehicles. In this study, we aim to model driver behavior by taking into account stochastic characteristics with multiple states involving multidimensional stimuli. Therefore, we adopt stochastic mixture models to represent driving behavior.
III. STOCHASTIC DRIVER MODELING
The underlying assumption of a stochastic driver behavior modeling framework is that as a driver operates the gas and brake pedals in response to the stimuli of the vehicle velocity and following distance, the patterns can be modeled accordingly using the joint distribution of all the correlated parameters. In the following subsections, we will describe driver behavior models based on GMM, DPM, and the model combination.
> A. Gaussian Mixture Model
In a finite mixture model, we assume that $K$ latent (hidden) components with different characteristics and corresponding parameters ($\theta_k$) underlie the observed data $O$. The observed data are generated from a mixture of these multiple components. In particular, the total amount of data generated by component $k$ is defined by its mixing probability $\pi_k$. The model is formulated as:
$$p(O) = \sum_{k=1}^{K} \pi_k\, p(O \mid \theta_k), \qquad p(O \mid \theta_k) = \mathcal{N}(O; \mu_k, \Sigma_k)$$
where $p(O)$ denotes the pdf of $O$ and $\sum_{k=1}^{K} \pi_k = 1$.
In general, the hidden parameters ($\theta = \{\mu, \Sigma\}$) and mixing probability can be obtained or trained automatically by maximizing standard evaluation functions such as the maximum likelihood (ML) criterion. The most practical and powerful method for obtaining ML estimates of the parameters is the EM algorithm. However, the major drawback of the EM algorithm is that it is necessary to determine $K$ in advance. In addition, specifying the correct value of $K$ is not an easy task, and using an improper value for $K$ may degrade model fitting [6], given that obtaining well-defined full-covariance matrices for higher values of $K$ requires a large amount of training data. Further details on GMM-based driver models can be found in [5].
> B. Dirichlet Process Mixture Model
By adopting a fully Bayesian approach, DPM does not require $K$ to be specified; instead, it chooses an appropriate number of components to explain the given data in a probabilistic manner. In a Bayesian mixture model, we assume that the underlying distribution of observations $O$ can be represented by a mixture of parametric densities conditioned on a hidden parameter $\theta = \{\mu, \Sigma\}$. In the abovementioned finite mixture model, the EM algorithm assumes that the prior probability of all hypotheses is equal, and hence seeks a single model with the highest posterior probability. However, in DPM, the hidden parameter $\theta$ is also considered to be a random variable that is drawn from a probability distribution, particularly a Dirichlet process, as:
$$\theta_k \sim G, \qquad G \sim \mathrm{DP}(\alpha, G_0)$$
where $\alpha$ is a concentration parameter, and $G_0$ is a base distribution. The DPM here chooses the conjugate priors in advance for the model parameters, Dirichlet for $\pi$ and normal-inverse Wishart (NIW) for $\theta$ (therefore, both prior and posterior distributions are in the same family):
$$\pi \mid \alpha \sim \mathrm{Dirichlet}(\alpha/K, \ldots, \alpha/K), \qquad \theta_k \sim G_0, \quad G_0 = \mathrm{NIW}(\mu_0, \upsilon, \Lambda, \alpha)$$
where NIW is represented by a mean vector $\mu_0$ and its scaling parameter $\upsilon$, and a covariance matrix $\Lambda$ with its scaling parameter $\alpha$. These parameters are used to encode our prior belief regarding the shape and position of the mixture density. Finally, the posterior distribution of this model can be expressed by:
$$P(C, \Theta, \pi \mid O) \propto P(O \mid C, \Theta)\, P(\Theta \mid G_0) \left[ \prod_{i=1}^{N} P(c_i \mid \pi) \right] P(\pi \mid \alpha)$$
where $C = \{c_i\}_{i=1}^{N}$ indicates the component ownership or mixture index of each observation. One can obtain samples from this distribution using Markov chain Monte Carlo (MCMC) methods [8], particularly Gibbs sampling, in which new values of each model parameter are repeatedly sampled, conditioned on the current values of all the other parameters. Eventually, Gibbs samples approximate the posterior distribution upon convergence. As a result, this avoids the problem of model selection and local maxima by assuming that there are an infinite number of hidden components, only a finite number of which could be observed from the data.
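To make the idea of infinitely many hidden components with only finitely many observed more concrete, the following minimal sketch draws cluster assignments from a Chinese restaurant process prior, the construction this section uses later for the assignment prior. It illustrates the prior only (no likelihood term, no Gibbs sweep) and is not the authors' implementation; the function names, the seed, and the values of alpha are hypothetical choices.

```python
import random


def crp_probabilities(counts, alpha):
    """Prior probabilities for the next observation.

    One entry per occupied cluster (proportional to its size m_k),
    followed by a final entry for opening a new cluster
    (proportional to the concentration parameter alpha).
    """
    total = sum(counts) + alpha
    return [m / total for m in counts] + [alpha / total]


def sample_crp_assignments(n_obs, alpha, seed=0):
    """Sequentially assign n_obs observations to clusters under a CRP."""
    rng = random.Random(seed)
    counts = []       # m_k for every occupied cluster
    assignments = []  # cluster index c_i of every observation
    for _ in range(n_obs):
        probs = crp_probabilities(counts, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)   # a new cluster is opened
        else:
            counts[k] += 1     # an existing cluster grows
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    # Larger alpha tends to occupy more clusters for the same data size.
    for alpha in (0.1, 1.0, 10.0):
        _, counts = sample_crp_assignments(500, alpha)
        print(f"alpha={alpha}: {len(counts)} occupied clusters")
```

The number of occupied clusters is not fixed in advance; it emerges from the draws and tends to grow slowly with the number of observations, which is the behavior the DPM relies on to sidestep choosing K.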
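As the state of the distribution consists of parameters $C$ and $\Theta$, the Gibbs sampling will first sample the new values of $\Theta$ conditioned on the initialized $C$ and most recent values of the other variables as:
$$P(\theta_k \mid C, O, \Theta_{-k}) \propto \left[ \prod_{i:\, c_i = k} P(o_i \mid \theta_k) \right] P_{NIW}(\theta_k)$$
where $\Theta_{-k} = \{\theta_1, \ldots, \theta_{k-1}, \theta_{k+1}, \ldots, \theta_N\}$ and $P_{NIW}(\theta_k)$ is the probability of $\theta_k$ under a given NIW. Subsequently, given a new $\Theta$, $C$ can be sampled according to the following conditional distribution:
$$P(c_i = k \mid C_{-i}, O, \Theta) \propto P(o_i \mid \theta_k)\, P(c_i = k \mid C_{-i})$$
where $C_{-i} = \{c_1, \ldots, c_{i-1}, c_{i+1}, \ldots, c_N\}$. The term $P(c_i \mid C_{-i})$ can be derived using the Chinese restaurant process (a constructive representation of a Dirichlet process) [7]:
$$P(c_i = k \mid C_{-i}) = \begin{cases} \dfrac{m_k}{N - 1 + \alpha}, & \text{if cluster } k \text{ is already occupied} \\ \dfrac{\alpha}{N - 1 + \alpha}, & \text{if } k \text{ is a new cluster} \end{cases}$$
where $m_k$ is the number of data in cluster $k$, not counting observation $i$. Both steps are repeated iteratively until the sampler converges. Further details can be found in [7-9].
> C. Maximum A Posteriori Adaptation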
Also known as Bayesian adaptation, MAP adaptation re-estimates the model parameters individually by shifting the original statistic toward the new adaptation data. Given a set of adaptation data, $\{o_n\}$, $n = 1, \ldots, N$, and an initialized GMM (i.e., driver model), the adapted GMM can be obtained by modifying the mean vectors as follows:
$$\hat{\mu}_k = \frac{n_k}{n_k + r}\, E_k + \frac{r}{n_k + r}\, \mu_k$$
where $r$ is a constant relevance factor (e.g., [15]), and $n_k$ and $E_k$ can be computed as
$$n_k = \sum_{n=1}^{N} h_k(o_n), \qquad E_k = \frac{1}{n_k} \sum_{n=1}^{N} h_k(o_n)\, o_n$$
where $h_k(o_n)$ is the posterior probability that $o_n$ belongs to the $k$th component:
$$h_k(o_n) = \frac{\pi_k\, p(o_n \mid \theta_k)}{\sum_{j=1}^{K} \pi_j\, p(o_n \mid \theta_j)}$$
where $p(o_n \mid \theta_k)$ is the marginal probability of the observed parameter $o_n$ generated by the $k$th Gaussian component.
The adapted model is thus updated so that the mixture components with high counts of data from a particular characteristic/correlation rely more on the new sufficient statistic of the final parameters. More discussion of MAP adaptation for a GMM can be found in [16].
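The mean-only MAP update described in this subsection can be sketched in a few lines. The sketch below is a simplified illustration under stated assumptions: it uses univariate Gaussians instead of the full-covariance multivariate components of the paper, the function names are hypothetical, and the default relevance factor `r=16.0` is an assumed value, not one taken from this paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def map_adapt_means(weights, means, variances, data, r=16.0):
    """Return MAP-adapted means; weights and variances are left unchanged.

    r is the relevance factor (assumed default). Components that
    receive few adaptation samples stay close to their prior means.
    """
    num_components = len(weights)
    soft_counts = [0.0] * num_components    # n_k
    weighted_sums = [0.0] * num_components  # sum_n h_k(o_n) * o_n
    for o in data:
        lik = [weights[k] * gaussian_pdf(o, means[k], variances[k])
               for k in range(num_components)]
        total = sum(lik)
        if total == 0.0:
            continue  # sample is numerically unexplained by every component
        for k in range(num_components):
            h = lik[k] / total  # posterior h_k(o_n)
            soft_counts[k] += h
            weighted_sums[k] += h * o
    adapted = []
    for k in range(num_components):
        n_k = soft_counts[k]
        if n_k == 0.0:
            adapted.append(means[k])  # no evidence: keep the prior mean
            continue
        e_k = weighted_sums[k] / n_k          # data mean seen by component k
        a_k = n_k / (n_k + r)                 # data-dependent interpolation weight
        adapted.append(a_k * e_k + (1.0 - a_k) * means[k])
    return adapted


if __name__ == "__main__":
    new_means = map_adapt_means([0.5, 0.5], [0.0, 10.0], [1.0, 1.0], [9.0] * 50)
    print(new_means)
```

In the usage example, only the component near the adaptation data moves noticeably, which is the localized shifting behavior the text attributes to MAP adaptation.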
> D. Model Combination
Fig. 2 illustrates an example of an observed driving trajectory (solid line) overlaid with a corresponding pdf generated by the well-trained DPM (the smaller pdf plot). The bigger pdf plot in the background represents a general joint distribution (e.g., the universal driver model). The dotted line represents an unseen car-following trajectory during the validation stage. As we can see, the individual driver model obtained using a DPM is better at modeling the joint probability of the observed driving trajectory than the universal background driver model. However, the individual model is focused on a region of the parameter space that does not cover the test driving trajectory, and hence cannot represent unseen driving behavior. Although not specifically optimized for this driver, the universal background model can better represent common driving behavior in most situations.
By combining these two probability distributions into a single aggregate distribution, the resulting driver-dependent model can better represent individual driving characteristics that were previously observed by the individual distribution ($\Theta_{individual}$), as well as explain unseen driving characteristics by the background distribution ($\Theta_{general}$). In this study, we apply a weighted linear aggregation of the two probability distributions, in which one distribution is weighted by $\delta$ and the other by $1 - \delta$, where $0 \le \delta \le 1.0$ is the mixing weight. This simple combination method is easy to comprehend and performs as well as more complex aggregation models. Moreover, the aggregation result satisfies the axioms of a probability distribution, in particular the marginalization property [17]. As the mixing density components of both a DPM and GMM are assumed to be Gaussian, the combined mixture model can be obtained by merging all mixtures of both distributions and then rescaling the mixing weights so that they sum to one.
IV. MIXTURE MODEL REGRESSION
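In a regression problem, an observation consists of both input stimuli and output responses ($O = \{X, Y\}$). Given a new set of stimuli $x_{new}$, the corresponding responses can be predicted via its conditional expectation $E(Y \mid x_{new})$. In Bayesian regression, given a joint (Gaussian) distribution between $X$ and $Y$, the posterior probability can be computed as follows:
$$p(Y \mid X, \theta) = \frac{p(X, Y \mid \theta)}{p(X \mid \theta)}, \qquad \mu = \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}$$
where the mean vector $\mu$ is a concatenation of a mean vector of the present observation $\mu_x$ and a mean of response value $\mu_y$. Similarly, the covariance matrix is composed of the autocovariance and cross-covariance matrices of these two parameter sets.
Thus, the optimal prediction for the observation $x_{new}$ given by each mixture component can be represented as the posterior expectation:
$$\hat{y}_k(x_{new}) = \mu_{y,k} + \Sigma_{yx,k}\, \Sigma_{xx,k}^{-1} (x_{new} - \mu_{x,k})$$
Consequently, the predicted responses $y_{pred}$, given $x_{new}$ and a number of Gaussian components, can be computed as:
$$y_{pred} = \sum_{k=1}^{K} h_k(x_{new})\, \hat{y}_k(x_{new}), \qquad h_k(x_{new}) = \frac{\pi_k\, \mathcal{N}(x_{new}; \mu_{x,k}, \Sigma_{xx,k})}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_{new}; \mu_{x,j}, \Sigma_{xx,j})}$$
V. EXPERIMENTAL EVALUATION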
> A. Data Preprocessing
The driving signals utilized are limited to following distance (m), vehicle velocity (km/h), and gas and brake pedal forces (N), obtained from a real-world driving corpus [18]. All the acquired analog driving signals from the sensory systems of the instrumented vehicle are resampled to 10 Hz, as well as rescaled into their original units. The offset values caused by gas and brake pedal sensors are removed from each file, based on estimates obtained using a histogram-based technique. Furthermore, manual annotation of driving-signal data and driving scenes was used to verify that only concrete car-following events with legitimate driving signals that last more than 10 seconds are considered in this study. Cases where the lead vehicle changes its lane position, or another vehicle cuts in and then acts as a new lead vehicle, are regarded as two separate car-following events. Consequently, the evaluation is performed using approximately 300 minutes of clean and realistic car-following data from 64 drivers. The data was randomly partitioned into two subsets of drivers for the open-test evaluation (i.e., training and validation of the driver behavior model). All the following evaluation results are reported as the average of both subsets, except when stated otherwise.
> B. Feature Vector
In this study, an observed feature vector (stimuli) at time $t$, $x_t$, consists of the vehicle velocity, following distance, and pedal pattern ($P_t$) with their first-order ($\Delta$) and second-order ($\Delta^2$) derivatives as:
$$x_t = [v^{f}_{t}, \Delta v^{f}_{t}, \Delta^2 v^{f}_{t}, f_t, \Delta f_t, \Delta^2 f_t, P_t, \Delta P_t, \Delta^2 P_t]^T$$
where the $\Delta(\cdot)$ operator of a parameter is computed over a window of length $L$ (e.g., 0.8 seconds). Here, the driver’s response parameter $Y$ is the future pedal operation $P_{t+1}$. Consequently, the observed feature vectors $o_t$ can be defined as
$$o_t = [x_t^T, P_{t+1}]^T$$
> C. Signal-to-Deviation Ratio
In order to assess the ability of the driver behavior model to anticipate pedal control behavior, the difference between the predicted and actually observed gas-pedal operation signals is used as our measurement. The signal-to-deviation ratio (SDR) is defined as follows:
$$\mathrm{SDR} = 10 \log_{10} \frac{\sum_{t=1}^{T} G(t)^2}{\sum_{t=1}^{T} \left( G(t) - \hat{G}(t) \right)^2} \quad \text{[dB]}$$
where $T$ is the length of the signal, $G(t)$ is the actually observed signal, and $\hat{G}(t)$ is the predicted signal.
> D. Evaluation Results
First, the individual driver model is obtained by training a DPM with individual driving data [19]. Again, a DPM automatically selects the appropriate number of mixture components that best fits the training observations. Next, the general driver models, or universal background models (UBMs), were obtained by employing the EM algorithm, using driving data from a pool of several drivers in the development set. In this study, we prepared the UBMs with 4, 8, 16, and 32 mixtures for comparison. Subsequently, a driver-dependent model is obtained by merging the DPM-based individual driver model and the general driver models (UBMs).
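The merging step can be illustrated with a short sketch. Because both models are Gaussian mixtures, a weighted linear combination of their pdfs is itself a Gaussian mixture whose component list is the union of the two lists with rescaled weights. The sketch below uses univariate components for brevity and hypothetical names; in particular, applying `delta` to the individual model and `1 - delta` to the general model is an assumption made here for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(components, x):
    """Evaluate a mixture given as a list of (weight, mean, variance)."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


def combine_mixtures(individual, general, delta):
    """Merge two Gaussian mixtures into one aggregate mixture.

    Assumption: delta scales the individual model and (1 - delta)
    scales the general (background) model. If each input has weights
    summing to one, the merged weights also sum to one.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    merged = [(delta * w, m, v) for w, m, v in individual]
    merged += [((1.0 - delta) * w, m, v) for w, m, v in general]
    return merged


if __name__ == "__main__":
    individual = [(0.6, 20.0, 4.0), (0.4, 35.0, 9.0)]   # toy DPM-style model
    general = [(0.5, 10.0, 25.0), (0.5, 40.0, 36.0)]    # toy UBM
    combined = combine_mixtures(individual, general, delta=0.3)
    print(len(combined), sum(w for w, _, _ in combined))
```

With this representation the combined model can be used by the same prediction code as either input model, at the cost of carrying the components of both.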
Fig. 3 shows the prediction performance of the proposed combined models using a 16-mixture UBM with different weighting scales (i.e., $\delta$) varying from 0 to 1.0. Without the background model, the individual model alone showed the worst prediction performance. This implied a significant portion of unmatched driving situations between the training and test data of each individual driver. However, merging the individual model and the background model provided a significant improvement over what either the individual model or the background model could achieve alone. The best performance in this experiment was achieved using a weighting scale of around 0.3, which resulted in a prediction performance of 19.95 dB.
Fig. 4 illustrates example pdfs generated by an individual model (DPM), general model (UBM), driver-adapted model (UBM-MAP), and combined model (UBM+DPM). Finally, Fig. 5 compares the gas-pedal prediction performance of various driver models based on a DPM, UBM-MAP adaptation (with 4, 8, 16, and 32 mixtures), and the proposed combined DPM-UBM models obtained from the same UBM sets.
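For readers who want to relate dB figures such as the one quoted above to raw prediction error, the following sketch computes a signal-to-deviation ratio. It assumes the usual energy-ratio form, ten times the base-10 logarithm of the observed signal energy divided by the summed squared prediction error; the function name is hypothetical.

```python
import math


def signal_to_deviation_ratio(observed, predicted):
    """Return the SDR in dB between an observed and a predicted signal.

    Assumed definition: 10 * log10(sum(G^2) / sum((G - G_hat)^2)).
    """
    if len(observed) != len(predicted):
        raise ValueError("signals must have the same length")
    signal_energy = sum(g * g for g in observed)
    deviation_energy = sum((g - p) ** 2 for g, p in zip(observed, predicted))
    if signal_energy == 0.0:
        raise ValueError("observed signal has zero energy")
    if deviation_energy == 0.0:
        return float("inf")  # perfect prediction
    return 10.0 * math.log10(signal_energy / deviation_energy)


if __name__ == "__main__":
    observed = [1.0, 1.0, 1.0, 1.0]
    predicted = [0.9, 0.9, 0.9, 0.9]
    # A uniform 10% amplitude error gives an SDR of about 20 dB.
    print(signal_to_deviation_ratio(observed, predicted))
```

Under this definition, an SDR near 20 dB corresponds to a prediction error whose energy is roughly one hundredth of the signal energy.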
In contrast to the EM-based individual driver model, the driver-adapted (UBM-MAP) models tended to have better performance as the number of mixture components increased. This is because a reasonable amount of training data is needed to train a well-defined UBM, and some local mixtures were then adapted to better fit individual driving characteristics. When we combined the UBMs with the DPM-based individual model, the prediction performance was better than that of the driver-adapted (UBM-MAP) model. The best performance was obtained by combining the 16-mixture UBM with DPMs that contained approximately 10 mixtures per driver on average. Although the combined model has more components than the original 16-mixture UBM, it performs considerably better than the 32-mixture UBM-MAP adapted model while using fewer total mixtures (26 mixtures per driver on average).
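The pedal predictions compared in this section are conditional expectations under a Gaussian mixture, as described in Section IV. The sketch below shows that predictor for the simplest case of one scalar stimulus and one scalar response, so that the covariance blocks reduce to plain numbers. The dictionary keys and function names are hypothetical, and the toy parameters are illustrative only.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr_predict(components, x_new):
    """Predict E[y | x_new] under a mixture of joint (x, y) Gaussians.

    Each component is a dict with keys:
      pi   - mixing weight
      mu_x - mean of the stimulus, mu_y - mean of the response
      s_xx - stimulus variance, s_xy - stimulus/response covariance
    """
    # Unnormalized responsibilities from the marginal over x only.
    resp = [c["pi"] * gaussian_pdf(x_new, c["mu_x"], c["s_xx"]) for c in components]
    total = sum(resp)
    if total == 0.0:
        raise ValueError("x_new is numerically unexplained by every component")
    y_pred = 0.0
    for c, r in zip(components, resp):
        # Per-component conditional mean of y given x_new.
        y_k = c["mu_y"] + c["s_xy"] / c["s_xx"] * (x_new - c["mu_x"])
        y_pred += (r / total) * y_k
    return y_pred


if __name__ == "__main__":
    model = [
        {"pi": 0.5, "mu_x": -1.0, "mu_y": 0.0, "s_xx": 1.0, "s_xy": 0.0},
        {"pi": 0.5, "mu_x": 1.0, "mu_y": 10.0, "s_xx": 1.0, "s_xy": 0.0},
    ]
    print(gmr_predict(model, 0.0))
```

Because the responsibilities depend only on the stimulus marginal, a combined model simply contributes more components to the same weighted sum; no change to the predictor is needed.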
VI. CONCLUSIONS
In this paper, we presented a stochastic driver behavior model that takes into account both individual and general driving characteristics. In order to capture individual driving characteristics, we employed a DPM, which is capable of selecting the appropriate number of components to capture underlying distributions from a sparse or relatively small number of observations. Using a different approach, a general driver model was obtained by using a parametric GMM trained with a reasonable amount of data from several drivers, and then employed as a background distribution. By combining these two distributions, the resulting driver model can effectively emphasize a driver’s observed personalized driving styles, as well as support many common driving patterns for unseen situations that may be encountered. The experimental results using on-the-road car-following behavior showed the advantages of the combined model over the adapted model. Our future work will consider a driver behavior model with tighter coupling between individual and general characteristics, while reducing the number of model components used, in order to achieve more efficient computation.

[Fig. 1.] Car following and corresponding parameters.

[Fig. 2.] Illustration of the observed driving trajectory (solid line) overlaid with corresponding pdf of the trained DPM (smaller pdf). The bigger pdf represents the universal or background model. The dotted trajectory represents unseen/unmatched driving data from training observations. pdf: probability density function, DPM: Dirichlet process mixture model, GMM: Gaussian mixture model.

[Fig. 3.] Prediction performance of combined driver models using different weighting scales. SDR: signal-to-deviation ratio.

[Fig. 4.] Example probability density function generated by different driver models. DPM: Dirichlet process mixture model, UBM: universal background model, MAP: maximum a posteriori, GMM: Gaussian mixture model.

[Fig. 5.] Gas-pedal prediction performance employing different driver models. SDR: signal-to-deviation ratio, UBM: universal background model, DPM: Dirichlet process mixture model, MAP: maximum a posteriori.