Modern Probabilistic Machine Learning and Control Methods for Portfolio Optimization
 Author: Park Jooyoung, Lim ungdong, Lee Wonbu, Ji Seunghyun, Sung Keehoon, Park Kyungwook
 Organization: Park Jooyoung; Lim ungdong; Lee Wonbu; Ji Seunghyun; Sung Keehoon; Park Kyungwook
 Publish: International Journal of Fuzzy Logic and Intelligent Systems Volume 14, Issue2, p73~83, 25 June 2014

ABSTRACT
Many recent theoretical developments in the field of machine learning and control have rapidly expanded its relevance to a wide variety of applications. In particular, a variety of portfolio optimization problems have recently been considered as a promising application domain for machine learning and control methods. In highly uncertain and stochastic environments, portfolio optimization can be formulated as optimal decisionmaking problems, and for these types of problems, approaches based on probabilistic machine learning and control methods are particularly pertinent. In this paper, we consider probabilistic machine learning and control based solutions to a couple of portfolio optimization problems. Simulation results show that these solutions work well when applied to real financial market data.

KEYWORD
Machine learning , Portfolio optimization , Evolution strategy , Value function.

1. Introduction
Recent theoretical progress in the field of machine learning and control has many implications for related academic and professional fields. The field of financial engineering is one particular area that has benefited greatly from these advancements. Portfolio optimization problems [1–8] and the pricing/hedging of derivatives [9] can be performed more effectively using recently developed machine learning and control methods. In particular, since portfolio optimization problems are essentially optimal decisionmaking problems that rely on actual data observed in a stochastic environment, theoretical and practical solutions can be formulated in light of recent advancements. These problems include the traditional meanvariance efficient portfolio problem [10], index tracking portfolio formulation [6–8, 11], riskadjusted expected return maximizing strategy [1, 2, 12], trend following strategy [13–17], longshort trading strategy (including the pairs trading strategy) [13, 18–20], and behavioral portfolio management.
Modern machine learning and control methods can effectively handle almost all of the portfolio optimization problems just listed. In this paper, we consider a solution to the trend following trading problem based on the natural evolution strategy (NES) [21–23, 25] and a riskadjusted expected profit maximization problem based on an approximate value function (AVF) method [27–30].
This paper is organized as follows: In Section 2, we briefly discuss relevant probabilistic machine learning and control methods. The exponential NES and iterated approximate value function method, which are the two main tools employed in this paper, are also summarized. Solutions to the trend following trading problem and the riskadjusted expected profit maximization problem as well as simulation results using real financial market data are presented in Section 3. Finally, in Section 4, we present our concluding remarks.
2. Modern Probabilistic Machine Learning and Control Methods
In this section, we describe relevant advanced versions of the NES and AVF methods that will be applied later in this paper.
The NES method belongs to a family of evolution strategy (ES) type optimization methods. Evolution strategy, in general, attempts to optimize utility functions that cannot be modeled directly, but can be efficiently sampled by users. A probability distribution (typically, a multivariate Gaussian distribution) is utilized by NES to generate a group of candidate solutions. In the process of updating distribution parameters based on the utility values of candidates, NES employs the socalled natural gradient [23, 31] to obtain a samplebased search direction. In other words, the main idea of NES is to follow a sampled natural gradient of expected utility to update the search distribution. The samples of NES are generated according to the search distribution π(·∣
θ ), and by utilizing these samples, NES tries to locate a parameter update direction that will increase the performance index,J (θ ). This performance index is defined as the expected value of the utility,f(z) , under the search distribution:Note that by the loglikelihood strategy, the gradient of the expected utility with respect to the search distribution parameter
θ can be expressed asHence, when we have independent and identically distributed samples,
z_{i}, i ∈ {1,···,n }, a samplebased approximation to the regular policy gradient (often referred to as the vanilla gradient) of the expected utility can be expressed asIt is widely accepted that using the natural gradient is more advantageous than the vanilla gradient when it is necessary to search optimal distribution parameters while staying close to the present search distribution [23, 31]. In the natural gradient based search method, the search direction is obtained by replacing the gradient ∇
_{θ}J(θ) with the natural gradient defined by F^{1}(θ )∇_{θ}J(θ) , whereF (θ ) =E [∇_{θ} log π(x ∣θ )∇_{θ} log π(x ∣θ )^{T} ] is the Fisher information matrix. Note that the Fisher information matrix can be estimated from samples. Therefore, the core procedure of the NES algorithm can be summarized as follows [22]:Preliminary steps :1. Choose the learning rate, η, number of samples in each generation, n, and utility function, f. 2. Initialize parameter θ of the search distribution π(·∣θ).Main steps : Repeat the following procedure until the stopping condition is met.1. For i = 1, · · · , n Draw a sample zi from the current search distribution. Compute the utility of the sample, f(zi). Compute the gradient of the loglikelihood, ∇θ logπ(ziθ). end
2. Obtain the MonteCarlo estimate of the gradient:
3. Obtain the MonteCarlo estimate of the Fisher information matrix:
4. Update parameter θ of the search distribution:
The procedure shown above is a basic form of NES and can be modified based on the application. For example, the concept of the baseline can be employed to reduce the estimation variance [21]. Recent improvements to the NES procedure can be found in [21–23, 25], and one of the most remarkable improvements is the exponential NES [23, 25]. The main idea of the exponential NES method is to represent the covariance matrix of the multivariate Gaussian distribution,
π (·θ ), using the following matrix exponential map:Two remarkable advantages of using the matrix exponential are that it enables the covariance matrix to be updated in a vector space, and it makes the resultant algorithm invariant under linear transformations. The key idea of the exponential NES is the use of natural coordinates defined according to the following change of variables [23]:
where
A andM are matrix variables satisfyingC =A^{T} A =exp (M ) for the covariance matrixC of the search distribution. The use of natural coordinates renders the step of inverting the Fisher information matrix unnecessary, hence, bypassing a major computational burden of the original NES. In Section 3, we apply the exponential NES, which is now a stateoftheart evolution strategy, to the problem of finding a flexible longflatshort type rule for a trend following trading strategy.Another tool used in portfolio optimization applications of this paper is a special class of approximate value function methods [27–30]. In general, stochastic optimal control problems can be solved by utilizing state value functions, which estimate performance at a given state. The solutions of stochastic optimal control problems based on the value function are called dynamic programming. For more details on the various applications of dynamic programming, please refer to [32, 33]. Solving stochastic control problems by dynamic programming corresponds to finding the best statefeedback control policy
to optimize the performance index of specified constraints and dynamics
where
x (t ) is the state,u (t ) is the control input, andw (t ) is the disturbance. The expected sum of discounted stage costs, is widely used as the performance index. The minimal performance index value,J *, is obtained by minimizing the performance index over all admissible control policies; the optimal control policy achievingJ * is denoted by The state value function is defined as the minimum total expected cost achieved by an optimal control policy for the given initial statex (0) =z . Formally,Note that the optimal performance index value is
J * =V *(x _{0}) for the given initial conditionx _{0}. The state value function in (11) is the fixed point of the following Bellman equation:In operator equation form, this fixed point property can be written as
where
Because it is hard to compute optimal control policy that satisfies the Bellman equation except special case [32], the AVF, :
X →U , is utilized to obtain approximate solutions to the stochastic control problem. In an example considering a set of real financial market data, we utilize an ADPbased solution procedure utilizing the iterated AVF policy approach of O’Donoghue, Wang, and Boyd [29]. In the iterated AVF method [29], by letting the parameters of the approximate value functions satisfy the iterated Bellman inequalitieswith , it is ensured that is a lower bound of the optimal state value function V* [27, 29]. Also by optimizing this lower bound via convex optimization, the iterated AVF approach finds the approximate state value functions, , and the associated policies Note that in this step, the associated iterated AVF policies are obtained by
for
t = 0, . . . ,M , andfor
t >M [29].3. Machine Learning and Control Based Portfolio Optimization
In this section, we present probabilistic machine learning and control based solutions to two important portfolio optimization problems: the trend following trading problem and the riskadjusted expected profit maximization problem. The first topic of our portfolio application concerns trading strategy. There have been a great deal of theoretical studies on trading strategies in financial markets; however, a majority of them are focused on the contratrend strategy, which includes trading policies for a meanreverting market [14]. Research interests in trendfollowing trading rules are also growing. A strong mathematical foundation has led to some important theorems regarding the trend following trading strategy [14–17]. We consider an exponential NES based solution to find an efficient trend following strategy. One of the key references for our solution is the stochastic control approach of Dai et al. [14, 15]. In [14], the authors considered a bullbear switching market, where the drift of the asset price switches back and forth between two values representing the bull market mode and the bear market mode. These switching patterns follow an unobservable Markov chain. More precisely, the governing equation for the asset price,
S_{r} , in [14] is given bywhere
α_{r} ∈ {1, 2} is the mode of the market at timer , and its movement follows a twostate Markov chain. The first state,α_{r} = 1, represents the bull market mode, and its second state,α_{r} = 2, represents the bear market mode. Drifts,μ (1) =μ _{1} andμ (2) =μ _{2}, represent the expected return rates in the bull and bear markets, respectively. Clearly, these drift values must satisfyμ _{1} > 0 andμ _{2} < 0. The Markov chain for the movement ofα_{r} is described by the following generator [14]:Note that in this Markov chain generator, λ_{1} and λ_{2} are the switching intensities for the bulltobear transition and bearto bull transition, respectively. In [14], it is assumed that the switching market mode, {
α_{r} }, and the Brownian motion for the asset price variation, {B_{r} }, are independent. Moreover, Dai et al. [14] showed that the optimal trend following longflat type trading rules can be found by solving the associated Hamilton JacobiBellman (HJB) equation and employing the conditional probability for the bull market to generate the trade signals. In their problem formulation, the transaction cost and riskfree interest rate were fixed at 100K [%] andρ , respectively. The resulting optimal buying times, τ_{1}, τ_{2}, · · · , and selling times,v 1,v 2, · · · , were obtained by optimizing the performance index given byif initially flat, and
if initially long [14]. The results of [14] are mathematically rigorous and establish a strong theoretical justification for the trend following trading theory. We utilize a less mathematical, but hopefully easier to understand approach, which is based on the exponential NES method [23, 25] for the trend following trading problem. Note that in the exponential NES approach, the performance index may be chosen with more flexibility. This paper extends our previous work on this topic [13] in two ways. First, we utilize a more advanced version of the NES–the exponential NES [23, 25]–that is now the stateoftheart method in the field. Second, we focus on more flexible longflatshort type trading rules whereas our previous paper considered only the longflat strategy. According to [14], there exist two monotonically increasing optimal sell and buy boundaries, and , in case of the finitehorizon problem for obtaining the optimal trend following longflat type trading rule. When longterm investments are emphasized, the behavior of the system is similar to the infinite horizon case, and these threshold functions can be approximated by constants [14, 15]. Following this approximation scheme, we try to find threshold constants for all possible transitions that can occur in longflatshort type trading, i.e., by applying the exponential NES method [23, 25]. The priceseries sample paths generated in accordance with the switching geometric Brownian motion (18) are required during the training phase. Simulating both Markov chains and geometric Brownian motions is not difficult; thus, the priceseries sample path generation can be performed efficiently. By combining the thresholds found by the exponential NES method together with the Wonham filter [14, 34] to estimate the conditional probability that the mode of the market is bull, one can obtain a longflatshort type trend following trading strategy. To illustrate the applicability of the exponential NES based trading strategy, we considered the problem of determining a trend following trading rule for the NASDAQ index. For the example, NASDAQ closing data from 1991 to 2008 was considered (see Fig. 2). According to the estimation results of [14], the parameters of the switching geometric Brownian motion for the NASDAQ are as follows: μ1 = 0:875, μ2 = −1.028, σ1 = 0.273, σ2 = 0.35, σ = 0.31, λ1 = 2.158, λ2 = 2.3, where
σ is the simple average ofσ _{1} andσ _{2}. Furthermore, the ratio of slippage per transaction and the riskfree interest rate were fixed as K = 0.001, and ρ = 5.4%, respectively. For training data, we generated episodes by utilizing parameter estimation results. By performing the exponential NES based training for these episodes, we obtained the threshold values for the trend following trading rules.Figures 16 show the simulation results for the exponential NES based trend following trading rules. For these simulations, we set the initial wealth to one. Figure 1 shows the learning curve, which graphs the average total cost sums versus the policy update over a set of 10 simulation runs. As shown in the curve, the exponential NES method exhibits desirable behavior within less than 250 policy updates. This indicates that the exponential NES method works well for finding a longflatshort type trading strategy. Figure 2 shows the NASDAQ index values together with the longflatshort trading signals resulting from a policy obtained by the exponential NES method over the entire period. For comparison purposes, we also show the longflat trading signals obtained by the NES approach [13]. According to Fig. 2, the total number of position changes in the longflatshort type trading strategy (2nd panel) is 345. Note that this value differs from the corresponding number of position changes in the longflat type trading strategy (3rd panel, 100 changes) obtained by the NES approach [13]. Simulation results in Fig. 3 show that with the exponential NES based longflatshort type trading strategy, trade returns are generally large and wealth steadily increases until it reaches 21.52 at the final time. In comparison, Fig. 4 shows that with the longflat type trading rule, the trade returns are generally small, and wealth increases at a relatively slower rate. Its wealth value at the final time is only 10.65. From these simulation results, one can see that when
K = 0.001, short positions slightly changed the number of trading and significantly improved wealth. To investigate the robustness of NES based trading rules against system changes, we performed simulations for various values of the transaction cost ratio. In particular, we considered the case when the transaction cost increased tenfold (i.e.,K = 0.01); simulation results are shown in Figs. 5 and 6. Figure 5 shows the trading positions resulting from the longflatshort type strategy (top) and the longflat type strategy (bottom) whenK = 0.01. When the trading cost increases, both the longflat short type strategy and the longflat type strategy trade less frequently compared to the case whenK = 0.001. Interestingly, the trading frequency of the longflatshort type strategy decreased at a slower rate than that of the longflat type trading strategy. We believe that this difference is due to the fact that the longflatshort type strategy has more flexibility and can cope with system changes with less sensitivity. Finally, Fig. 6 shows the wealth resulting from the longflatshort type policy (top) and the longflat type policy (bottom) whenK = 0.01. Note that the wealth values of the longflatshort type strategy and the longflat type strategy at the final time are 10.85 and 5.09, respectively. These values are still considerably larger than the wealth values of the buyandhold strategy (3.34) and the riskfree interest rate (2.74).For the second application example, we considered the riskadjusted, expected profit maximization problem and utilized an AVF [27–30] based procedure to find an efficient solution. To express the riskadjusted, expected profit maximization problem in statespace format, it is necessary to define the state and control input together with the performance index that is used as an optimization criterion. To do this, we follow the research of Boyd et al. [1, 28, 30]. We define the state vector as the collection of the portfolio positions. Let
x_{i}(t) denote the dollar value of asseti at the beginning of timet . Then the state vector is given byThe control input considered for this problem is a vector of trades,
executed for portfolio
x(t) at the beginning of each time stept . Note thatu_{i}(t) represents buying or selling assets; the asset associated withx_{i}(t) is bought whenu_{i}(t) ＞ 0 and sold whenu_{i}(t) ＜ 0. Having these state and input definitions, the state transition is given bywhere
r(t) is the vector of asset returns in periodt . The return vector,r(t) , is independent and identically distributed with mean vector and covariance matrix . Note that the mean vector and covariance matrix do not change over time. For the performance index, we consideredwhere the total gross cash entered in the portfolio is 1
^{T} u ,λ (x +u )^{T} Ʃ(x +u ) is the quadratic posttrade risk penalty,u^{T} diag(s )u is the quadratic transaction cost, andĸ ∣u ∣ is the linear transaction cost. Furthermore,λ ≥ 0 is the risk aversion parameter,s_{i} ≥ 0 is the priceimpact cost for thei th asset, andĸ is the ratio of slippage per transaction. We considered the case when the initial portfoliox (0) was fixed atx _{0}. In general, when portfolio optimization problems are solved, the control inputu(t) should satisfy certain, naturally arising constraints. In particular, we considered the control input bound constraint:which means that only a limited amount of trading is allowed for each asset. Thus, the riskadjusted profit maximization problem can be expressed as
To solve this optimization problem, we utilized the iterated AVF approach [29], which is one of the most advanced AVF methods. In the iterated AVF approach, convex quadratic functions
are used to approximate the state value function at time
t , and the parameters satisfy a series of Bellman inequalitieswith The Bellman inequalities guarantee that is a lower bound of the optimal state value function [27–29]. The iterated AVF method maximizes this lower bound using convex optimization [29]. Another constraint is kuk1 ∥
u ∥_{∞} ≤U _{bdd}, which can be written in terms of quadratic inequalities aswhere
e_{i} is thei th column of the identity matrixI_{n} . This constraint enables us to obtain sufficient conditions for the constrained Bellman inequality requirements in the form of linear matrix inequalities (LMIs) using the Sprocedure [28, 35]. As a realmarket example of portfolio optimization, we examined an application of the iterated AVF approach [29] for a set of real financial market data [2, 6]. The data considered five major stocks: IBM, 3M, Altria, Boeing, and AIG (the ticker symbols of these are IBM, MMM, MO, BA, and AIG, respectively). For the training data, we used the weekly prices of the five major stocks from Jan. 2, 1990 to Dec. 27, 2004, and obtained the exponentially weighted moving average (EWMA) of the mean return vector, (with the effective window size of three years), and covariance matrix, Ʃ. During the test period (2005 to 2007), iterated AVF based trading was performed every four weeks (20 trading days). For the riskfree rate, we assumedρ = 0.05 as in [2], and the discount factor,ϒ , was defined accordingly (i.e.,ϒ = exp(ρ /(52/4))). For the upper bound of the trading amount, we usedU _{bdd} = 20. The coefficients of the risk penalty and transaction costs wereλ = 0.005,ĸ = 0.005,s_{i} = 0.005,i = 1, , 5.For the Bellman inequalities in (28), we considered
M = 150 time steps. Finally, the initial portfolio vector and the initial wealth level were chosen to bex _{0} = [0, 0, 0, 0, 0]^{T} andW = 100, respectively.The control input considered for this problem is a vector of trades,
executed for portfolio
x (t ) at the beginning of each time stept . Note thatu_{i} (t ) represents buying or selling assets; the asset associated withx_{i} (t ) is bought whenu_{i} (t ) > 0 and sold whenu_{i} (t ) < 0. Having these state and input definitions, the state transition is given bywhere
r (t ) is the vector of asset returns in periodt . The return vector,r (t ), is independent and identically distributed with mean vector and covariance matrix Note that the mean vector and covariance matrix do not change over time. For the performance index, we considered where the total gross cash entered in the portfolio is 1^{T}u , λ (x +u )^{T} Σ(x +u ) is the quadratic posttrade risk penalty,u^{T} diag(s )u is the quadratic transaction cost, andκ u  is the linear transaction cost. Furthermore, λ ≥ 0 is the risk aversion parameter,s_{i} ≥ 0 is the priceimpact cost for thei th asset, andκ is the ratio of slippage per transaction. We considered the case when the initial portfoliox (0) was fixed atx _{0}. In general, when portfolio optimization problems are solved, the control inputu (t ) should satisfy certain, naturally arising constraints. In particular, we considered the control input bound constraint:which means that only a limited amount of trading is allowed for each asset. Thus, the riskadjusted profit maximization problem can be expressed as
To solve this optimization problem, we utilized the iterated AVF approach [29], which is one of the most advanced AVF methods. In the iterated AVF approach, convex quadratic functions
are used to approximate the state value function at time
t , and the parameters satisfy a series of Bellman inequalitieswith The Bellman inequalities guarantee that is a lower bound of the optimal state value function [27–29]. The iterated AVF method maximizes this lower bound using convex optimization [29]. Another constraint is ∥
u ∥_{∞} ≤U _{bdd}, which can be written in terms of quadratic inequalities aswhere
e_{i} is thei th column of the identity matrixI_{n} . This constraint enables us to obtain sufficient conditions for the constrained Bellman inequality requirements in the form of linear matrix inequalities (LMIs) using the Sprocedure [28, 35]. As a realmarket example of portfolio optimization, we examined an application of the iterated AVF approach [29] for a set of real financial market data [2, 6]. The data considered five major stocks: IBM, 3M, Altria, Boeing, and AIG (the ticker symbols of these are IBM, MMM, MO, BA, and AIG, respectively). For the training data, we used the weekly prices of the five major stocks from Jan. 2, 1990 to Dec. 27, 2004, and obtained the exponentially weighted moving average (EWMA) of the mean return vector, (with the effective window size of three years), and covariance matrix, Σ. During the test period (2005 to 2007), iterated AVF based trading was performed every four weeks (20 trading days). For the riskfree rate, we assumed ρ = 0.05 as in [2], and the discount factor,γ , was defined accordingly (i.e.,γ = exp(−ρ /(52/4))). For the upper bound of the trading amount, we usedU_{bdd} = 20. The coefficients of the risk penalty and transaction costs were λ = 0.005, κ = 0.005, si = 0.005, i = 1, · · · , 5.For the Bellman inequalities in (28), we consideredM = 150 time steps. Finally, the initial portfolio vector and the initial wealth level were chosen to bex _{0} = [0, 0, 0, 0, 0]^{T} andW = 100, respectively.Figures 713 show the simulation results of the portfolio optimization example under the iterated AVF method. Figure 7 depicts the portfolio profile during the test period. From this figure, one can see that with the passage of time, the portfolio profile slowly changes its direction to increase the performance index. Figure 8 shows the gross cash put into the portfolio, and Fig. 9 plots the cumulative cost sums. From Fig. 8, it is clear cash is entered into the portfolio during the early stage of trade; as time progresses, the portfolio gains income. Furthermore, based on the trend of this figure, we can expect more profit to be derived in the later stage of trading. Figure 10 shows the transaction cost. When the portfolio is being built, the transaction cost is very high; however, it stabilizes over time. Figure 11 shows the risk penalty, which is always nonnegative and increasing. Wealth history and amount of cash holdings are plotted in Figs. 12 and 13, respectively. Note that in this scenario, wealth steadily increases as trading proceeds and reaches approximately 220 (yielding a 120% profit at the end of 2007). Also note that according to Fig. 13, the amount of cash holding briefly remains at the initial value, rapidly decreases, and then slowly begins to restore itself. This behavior suggests that in the iterated AVF based trading strategy, a large amount of profit is obtained when aggressive initial investments are made with cashing out subsequently.
4. Conclusion
Machine learning and control methods have been applied to a variety of portfolio optimization problems. In particular, we considered two important classes of portfolio optimization problems: the trend following trading problem and the riskadjusted profit maximization problem. The exponential NES and iterated approximate value function methods were applied to solve these problems. Simulation results showed that these probabilistic machine learning and control based solutions worked well when applied to real financial market data. In the future, we plan to consider more extensive simulation studies, which will further identify the strengths and weaknesses of probabilistic machine learning and control based methods, and applications of our methods to other types of financial decision making problems.

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[Figure 1.] Learning curve.

[Figure 2.] NASDAQ index and trading position when K = 0.001.

[Figure 3.] Trade return and wealth resulting from the longflatshort type policy when K = 0.001.

[Figure 4.] Trade return and wealth resulting from the longflat type policy when K = 0.001.

[Figure 5.] Trading position resulting from the longflatshort type policy (top) and the longflat type policy (bottom) when K = 0.01.

[Figure 6.] Wealth resulting from the longflatshort type policy (top) and the longflat type policy (bottom) when K = 0.01.

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[Figure 7.] Portfolio profile.

[Figure 8.] Gross cash put into the portfolio.

[Figure 9.] Cumulative cost.

[Figure 10.] Transaction cost.

[Figure 11.] Risk penalty.

[Figure 12.] Wealth.

[Figure 13.] Amount of cash.