Modern Probabilistic Machine Learning and Control Methods for Portfolio Optimization

  • ABSTRACT

    Many recent theoretical developments in machine learning and control have rapidly expanded the relevance of these fields to a wide variety of applications. In particular, portfolio optimization problems have recently emerged as a promising application domain for machine learning and control methods. In highly uncertain, stochastic environments, portfolio optimization can be formulated as an optimal decision-making problem, and for problems of this type, approaches based on probabilistic machine learning and control are particularly pertinent. In this paper, we consider probabilistic machine learning and control based solutions to two portfolio optimization problems. Simulation results show that these solutions work well when applied to real financial market data.


  • KEYWORD

    Machine learning, Portfolio optimization, Evolution strategy, Value function.

  • 1. Introduction

    Recent theoretical progress in machine learning and control has many implications for related academic and professional fields. Financial engineering is one area that has benefited greatly from these advances. Portfolio optimization [1–8] and the pricing and hedging of derivatives [9] can be performed more effectively using recently developed machine learning and control methods. In particular, since portfolio optimization problems are essentially optimal decision-making problems that rely on data observed in a stochastic environment, both theoretical and practical solutions can be formulated in light of these advances. These problems include the traditional mean-variance efficient portfolio problem [10], index tracking portfolio formulation [6–8, 11], risk-adjusted expected return maximizing strategies [1, 2, 12], trend following strategies [13–17], long-short trading strategies (including pairs trading) [13, 18–20], and behavioral portfolio management.

    Modern machine learning and control methods can effectively handle almost all of the portfolio optimization problems just listed. In this paper, we consider a solution to the trend following trading problem based on the natural evolution strategy (NES) [21–23, 25] and a solution to a risk-adjusted expected profit maximization problem based on an approximate value function (AVF) method [27–30].

    This paper is organized as follows: In Section 2, we briefly discuss relevant probabilistic machine learning and control methods. The exponential NES and iterated approximate value function method, which are the two main tools employed in this paper, are also summarized. Solutions to the trend following trading problem and the risk-adjusted expected profit maximization problem as well as simulation results using real financial market data are presented in Section 3. Finally, in Section 4, we present our concluding remarks.

    2. Modern Probabilistic Machine Learning and Control Methods

    In this section, we describe relevant advanced versions of the NES and AVF methods that will be applied later in this paper.

    The NES method belongs to the family of evolution strategy (ES) optimization methods. Evolution strategies, in general, optimize utility functions that cannot be modeled directly but can be evaluated efficiently at sampled points. NES uses a probability distribution (typically, a multivariate Gaussian distribution) to generate a population of candidate solutions. When updating the distribution parameters based on the utility values of the candidates, NES employs the so-called natural gradient [23, 31] to obtain a sample-based search direction. In other words, the main idea of NES is to follow a sampled natural gradient of the expected utility to update the search distribution. NES draws samples from the search distribution π(·∣θ) and uses them to find a parameter update direction that increases the performance index J(θ), defined as the expected value of the utility f(z) under the search distribution:

    $$J(\theta) = \mathbb{E}_{\theta}[f(z)] = \int f(z)\,\pi(z \mid \theta)\,dz$$

    By the log-likelihood trick, the gradient of the expected utility with respect to the search distribution parameter θ can be expressed as

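    $$\nabla_\theta J(\theta) = \int f(z)\,\nabla_\theta \pi(z \mid \theta)\,dz = \int f(z)\,\nabla_\theta \log \pi(z \mid \theta)\,\pi(z \mid \theta)\,dz = \mathbb{E}_{\theta}\big[f(z)\,\nabla_\theta \log \pi(z \mid \theta)\big]$$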

    Hence, when we have independent and identically distributed samples, zi, i ∈ {1,···,n}, a sample-based approximation to the regular policy gradient (often referred to as the vanilla gradient) of the expected utility can be expressed as

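    $$\nabla_\theta J(\theta) \approx \frac{1}{n}\sum_{i=1}^{n} f(z_i)\,\nabla_\theta \log \pi(z_i \mid \theta)$$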

    Using the natural gradient is generally more advantageous than using the vanilla gradient when the goal is to search for optimal distribution parameters while staying close to the current search distribution [23, 31]. In natural gradient based search, the search direction is obtained by replacing the gradient ∇θJ(θ) with the natural gradient F⁻¹(θ)∇θJ(θ), where F(θ) = E[∇θ log π(z∣θ) ∇θ log π(z∣θ)ᵀ] is the Fisher information matrix. The Fisher information matrix can also be estimated from samples. Therefore, the core procedure of the NES algorithm can be summarized as follows [22]:

    Preliminary steps:
    1. Choose the learning rate η, the number of samples per generation n, and the utility function f.
    2. Initialize the parameter θ of the search distribution π(·∣θ).

    Main steps: Repeat the following procedure until the stopping condition is met.

    1. For i = 1, …, n:
       draw a sample zi from the current search distribution π(·∣θ);
       compute the utility of the sample, f(zi);
       compute the gradient of the log-likelihood, ∇θ log π(zi∣θ).

    2. Obtain the Monte-Carlo estimate of the gradient:

    $$\nabla_\theta J \approx \frac{1}{n}\sum_{i=1}^{n} f(z_i)\,\nabla_\theta \log \pi(z_i \mid \theta)$$

    3. Obtain the Monte-Carlo estimate of the Fisher information matrix:

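    $$F \approx \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta \log \pi(z_i \mid \theta)\,\nabla_\theta \log \pi(z_i \mid \theta)^{T}$$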

    4. Update parameter θ of the search distribution:

    $$\theta \leftarrow \theta + \eta\,F^{-1}\nabla_\theta J$$
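    The four main steps can be illustrated with a minimal Python sketch for a one-dimensional Gaussian search distribution parameterized as θ = (μ, log σ). It is not the authors' implementation: the parameterization, the quadratic test utility, the learning rate, the sample size, and the use of the mean utility as a baseline (the variance-reduction device of [21]) are illustrative assumptions.

```python
import numpy as np


def nes_gaussian_1d(f, mu=0.0, log_sigma=0.0, eta=0.1, n=50, iters=300, seed=0):
    """Basic NES on a 1-D Gaussian search distribution with theta = (mu, log sigma).

    f maps a NumPy array of samples to their utilities, which NES tries to maximize.
    """
    rng = np.random.default_rng(seed)
    theta = np.array([mu, log_sigma], dtype=float)
    for _ in range(iters):
        m, s = theta[0], np.exp(theta[1])
        # Step 1: draw n samples, evaluate utilities and log-likelihood gradients (scores).
        z = rng.normal(m, s, size=n)
        fz = f(z)
        scores = np.column_stack([(z - m) / s**2,             # d log N(z | m, s^2) / d m
                                  ((z - m) / s) ** 2 - 1.0])  # d log N(z | m, s^2) / d log s
        # Step 2: Monte-Carlo gradient estimate, with the mean utility as a baseline.
        grad = scores.T @ (fz - fz.mean()) / n
        # Step 3: Monte-Carlo estimate of the Fisher information (tiny ridge for stability).
        fisher = scores.T @ scores / n + 1e-8 * np.eye(2)
        # Step 4: natural-gradient step theta <- theta + eta * F^{-1} grad.
        theta = theta + eta * np.linalg.solve(fisher, grad)
    return theta


theta = nes_gaussian_1d(lambda z: -(z - 3.0) ** 2)   # utility peaks at z = 3
```

    Solving F x = ∇J with `np.linalg.solve` avoids forming F⁻¹ explicitly, which is the numerically safer choice even for this 2 × 2 case.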

    The procedure shown above is a basic form of NES and can be modified to suit the application. For example, a baseline can be employed to reduce the estimation variance [21]. Recent improvements to the NES procedure can be found in [21–23, 25], and one of the most remarkable is the exponential NES [23, 25]. The main idea of the exponential NES method is to represent the covariance matrix of the multivariate Gaussian distribution π(·|θ) using the following matrix exponential map:

    $$C = \exp(M) = \sum_{k=0}^{\infty} \frac{M^{k}}{k!}$$

    Two remarkable advantages of using the matrix exponential are that it enables the covariance matrix to be updated in a vector space, and it makes the resultant algorithm invariant under linear transformations. The key idea of the exponential NES is the use of natural coordinates defined according to the following change of variables [23]:

    image

    where A and M are matrix variables satisfying C = AᵀA = exp(M) for the covariance matrix C of the search distribution. Working in natural coordinates makes inverting the Fisher information matrix unnecessary, thereby removing a major computational burden of the original NES. In Section 3, we apply the exponential NES, now a state-of-the-art evolution strategy, to the problem of finding a flexible long-flat-short type rule for a trend following trading strategy.

    Another tool used in the portfolio optimization applications of this paper is a special class of approximate value function methods [27–30]. In general, stochastic optimal control problems can be solved using state value functions, which estimate the performance achievable from a given state. Solving stochastic optimal control problems through the value function is called dynamic programming; for more details on its various applications, please refer to [32, 33]. Solving a stochastic control problem by dynamic programming corresponds to finding the best state-feedback control policy

    $$u(t) = \phi(x(t)), \qquad \phi : \mathcal{X} \to \mathcal{U}$$

    that optimizes the performance index subject to the specified constraints and dynamics

    $$x(t+1) = f\big(x(t), u(t), w(t)\big), \qquad t = 0, 1, \ldots$$

    where x(t) is the state, u(t) is the control input, and w(t) is the disturbance. The expected sum of discounted stage costs, $J = \mathbb{E}\sum_{t=0}^{\infty}\gamma^{t}\ell(x(t), u(t))$, where ℓ is the stage cost and γ ∈ (0, 1) is the discount factor, is widely used as the performance index. The minimal performance index value, J*, is obtained by minimizing the performance index over all admissible control policies; the optimal control policy achieving J* is denoted by ϕ*. The state value function is defined as the minimum total expected cost achieved by an optimal control policy for the given initial state x(0) = z. Formally,

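    $$V^{*}(z) = \inf_{\phi}\; \mathbb{E}\Big[\sum_{t=0}^{\infty}\gamma^{t}\,\ell\big(x(t), \phi(x(t))\big) \;\Big|\; x(0) = z\Big] \tag{11}$$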

    Note that the optimal performance index value is J* = V*(x0) for the given initial condition x0. The state value function in (11) is the fixed point of the following Bellman equation:

    $$V^{*}(z) = \inf_{v \in \mathcal{U}} \mathbb{E}_{w}\big[\ell(z, v) + \gamma\,V^{*}(f(z, v, w))\big]$$

    In operator equation form, this fixed point property can be written as

    $$V^{*} = \mathcal{T} V^{*}$$

    where

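    $$(\mathcal{T} g)(z) = \inf_{v \in \mathcal{U}} \mathbb{E}_{w}\big[\ell(z, v) + \gamma\,g(f(z, v, w))\big]$$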
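    To make the fixed-point characterization concrete, here is a small Python sketch of the Bellman operator and value iteration on a hypothetical two-state, two-action problem (not from the paper): action 0 keeps the state, action 1 switches it, and the costs are chosen so that the optimal values can be checked by hand. Because the Bellman operator is a γ-contraction, applying it repeatedly converges to its unique fixed point V*.

```python
import numpy as np

# Hypothetical toy problem: cost[z, v] is the stage cost l(z, v);
# P[v, z, y] = Prob(x(t+1) = y | x(t) = z, u(t) = v).
cost = np.array([[1.0, 2.0],
                 [0.0, 0.5]])
P = np.array([np.eye(2),                     # action 0: stay
              [[0.0, 1.0], [1.0, 0.0]]])    # action 1: switch state
gamma = 0.9


def bellman_operator(g):
    """(T g)(z) = min_v E[l(z, v) + gamma * g(next state)]; also returns the minimizing v."""
    q = cost + gamma * np.einsum("vzy,y->zv", P, g)
    return q.min(axis=1), q.argmin(axis=1)


def value_iteration(tol=1e-12, max_iter=10_000):
    """Iterate V <- T V until the fixed point V* = T V* is reached."""
    v = np.zeros(cost.shape[0])
    for _ in range(max_iter):
        v_new, greedy = bellman_operator(v)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, greedy
        v = v_new
    raise RuntimeError("value iteration did not converge")


v_star, phi_star = value_iteration()
print(v_star, phi_star)  # [2. 0.] [1 0]
```

    By hand: staying in state 1 forever costs nothing, so V*(1) = 0; from state 0, switching once (cost 2) beats staying forever (cost 1/(1 − 0.9) = 10), so V*(0) = 2.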

    Because an optimal control policy satisfying the Bellman equation is hard to compute except in special cases [32], an AVF, $\hat V : \mathcal{X} \to \mathbb{R}$, is used to obtain approximate solutions to the stochastic control problem. In an example using real financial market data, we apply an approximate dynamic programming (ADP) solution procedure based on the iterated AVF policy approach of O’Donoghue, Wang, and Boyd [29]. In the iterated AVF method [29], by requiring the parameters of the approximate value functions to satisfy the iterated Bellman inequalities

    $$\hat V_t \le \mathcal{T}\hat V_{t+1}, \qquad t = 0, \ldots, M$$

    with $\hat V_{M+1} = \hat V_M$, it is ensured that $\hat V_0$ is a lower bound of the optimal state value function V* [27, 29]. By maximizing this lower bound via convex optimization, the iterated AVF approach finds the approximate state value functions $\hat V_0, \ldots, \hat V_M$ and the associated policies $\phi_0, \phi_1, \ldots$, which are obtained by

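    $$\phi_t(z) = \arg\min_{v \in \mathcal{U}} \mathbb{E}_{w}\big[\ell(z, v) + \gamma\,\hat V_{t+1}(f(z, v, w))\big]$$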

    for t = 0, …, M, and

    $$\phi_t = \phi_M$$

    for t > M [29].
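    The role of the iterated Bellman inequalities can be checked on a hypothetical two-state, two-action problem. The Python sketch below does not solve the convex program of [29]; as a simplification it builds one feasible family by hand (V̂_M = 0, which satisfies V̂_M ≤ 𝒯V̂_M because all costs are non-negative, and V̂_t = 𝒯V̂_{t+1} for t < M), verifies the inequalities with V̂_{M+1} = V̂_M, compares V̂_0 with V* from value iteration, and extracts the time-varying policies ϕ_t. The toy problem and all function names are hypothetical.

```python
import numpy as np

# Hypothetical toy problem: COST[z, v] = l(z, v); P[v, z, y] = Prob(next state y | state z, action v).
COST = np.array([[1.0, 2.0],
                 [0.0, 0.5]])
P = np.array([np.eye(2),                    # action 0: stay
              [[0.0, 1.0], [1.0, 0.0]]])   # action 1: switch
GAMMA = 0.9


def q_values(g):
    """q[z, v] = l(z, v) + gamma * E[g(next state)]."""
    return COST + GAMMA * np.einsum("vzy,y->zv", P, g)


def bellman(g):
    """Bellman operator (T g)(z) = min_v q[z, v]."""
    return q_values(g).min(axis=1)


def iterated_family(M):
    """Feasible V_0..V_M: V_M = 0 (valid since costs >= 0), then V_t = T V_{t+1}."""
    V = [np.zeros(COST.shape[0]) for _ in range(M + 1)]
    for t in range(M - 1, -1, -1):
        V[t] = bellman(V[t + 1])
    return V


def satisfies_iterated_bellman(V, tol=1e-12):
    """Check V_t <= T V_{t+1} for t = 0..M, with V_{M+1} := V_M."""
    M = len(V) - 1
    return all(np.all(V[t] <= bellman(V[min(t + 1, M)]) + tol) for t in range(M + 1))


def iterated_avf_policy(V, t):
    """phi_t is greedy w.r.t. V_{t+1} for t <= M, and phi_t = phi_M for t > M."""
    M = len(V) - 1
    t = min(t, M)
    return q_values(V[min(t + 1, M)]).argmin(axis=1)


def optimal_value(tol=1e-12):
    """V* by value iteration, for comparison with the lower bound V_0."""
    v = np.zeros(COST.shape[0])
    while True:
        v_new = bellman(v)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new


V = iterated_family(M=3)
```

    In this tabular example V̂_0 = (2, 0) coincides with V*, and the extracted policies vary with t (ϕ_0 switches out of state 0, while ϕ_t for t ≥ 1 keeps the state). With a restricted family such as the convex quadratics used later, the bound may be strict.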

    3. Machine Learning and Control Based Portfolio Optimization

    In this section, we present probabilistic machine learning and control based solutions to two important portfolio optimization problems: the trend following trading problem and the risk-adjusted expected profit maximization problem. The first application concerns trading strategy. There has been a great deal of theoretical work on trading strategies in financial markets; however, most of it focuses on contra-trend strategies, which include trading policies for a mean-reverting market [14]. Research interest in trend-following trading rules is also growing, and a strong mathematical foundation has led to some important theorems on the trend following trading strategy [14–17]. We consider an exponential NES based solution for finding an efficient trend following strategy. One of the key references for our solution is the stochastic control approach of Dai et al. [14, 15]. In [14], the authors considered a bull-bear switching market, where the drift of the asset price switches back and forth between two values representing the bull market mode and the bear market mode. These switches follow an unobservable Markov chain. More precisely, the governing equation for the asset price, Sr, in [14] is given by

    $$dS_r = S_r\big[\mu(\alpha_r)\,dr + \sigma\,dB_r\big]$$

    where αr ∈ {1, 2} is the mode of the market at time r, and its movement follows a two-state Markov chain. The first state, αr = 1, represents the bull market mode, and its second state, αr = 2, represents the bear market mode. Drifts, μ(1) = μ1 and μ(2) = μ2, represent the expected return rates in the bull and bear markets, respectively. Clearly, these drift values must satisfy μ1 > 0 and μ2 < 0. The Markov chain for the movement of αr is described by the following generator [14]:

    $$Q = \begin{pmatrix} -\lambda_1 & \lambda_1 \\ \lambda_2 & -\lambda_2 \end{pmatrix}$$

    Note that in this Markov chain generator, λ1 and λ2 are the switching intensities for the bull-to-bear transition and the bear-to-bull transition, respectively. In [14], it is assumed that the switching market mode, {αr}, and the Brownian motion for the asset price variation, {Br}, are independent. Moreover, Dai et al. [14] showed that the optimal trend following long-flat type trading rules can be found by solving the associated Hamilton-Jacobi-Bellman (HJB) equation and using the conditional probability of the bull market to generate the trade signals. In their problem formulation, the transaction cost and risk-free interest rate were fixed at 100K[%] and ρ, respectively. The resulting optimal buying times, τ1, τ2, …, and selling times, v1, v2, …, were obtained by optimizing the performance index given by

    image

    if initially flat, and

    image

    if initially long [14]. The results of [14] are mathematically rigorous and establish a strong theoretical justification for trend following trading. We use a less mathematical but, we hope, easier-to-understand approach to the trend following trading problem, based on the exponential NES method [23, 25]. In the exponential NES approach, the performance index can be chosen more flexibly. This paper extends our previous work on this topic [13] in two ways. First, we use a more advanced version of the NES, the exponential NES [23, 25], which is now a state-of-the-art method in the field. Second, we focus on more flexible long-flat-short type trading rules, whereas our previous paper considered only the long-flat strategy. According to [14], the finite-horizon problem of obtaining the optimal trend following long-flat type trading rule has two monotonically increasing optimal boundaries, one for selling and one for buying. When long-term investments are emphasized, the system behaves similarly to the infinite-horizon case, and these threshold functions can be approximated by constants [14, 15]. Following this approximation scheme, we apply the exponential NES method [23, 25] to find threshold constants for all possible transitions that can occur in long-flat-short type trading. Price-series sample paths generated according to the switching geometric Brownian motion (18) are required during the training phase. Simulating both Markov chains and geometric Brownian motions is straightforward, so the sample paths can be generated efficiently. By combining the thresholds found by the exponential NES method with the Wonham filter [14, 34], which estimates the conditional probability that the market is in the bull mode, one can obtain a long-flat-short type trend following trading strategy.
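    Generating training paths and the bull-market probability can be sketched in a few lines of Python. The code below simulates the switching geometric Brownian motion (18) with a constant σ and a two-state Markov chain with switching intensities λ1 (bull to bear) and λ2 (bear to bull), and runs an Euler discretization of the standard two-state Wonham filter for p = P(αr = 1 | prices). The daily step dt = 1/252, the initial probability p0 = 0.5, the clipping of p to [0, 1], and the function names are our assumptions; the parameter values in the usage lines are the NASDAQ estimates of [14].

```python
import numpy as np


def simulate_regime_gbm(mu1, mu2, sigma, lam1, lam2, n_steps, dt=1 / 252, s0=1.0, seed=0):
    """Simulate dS = S[mu(alpha) dt + sigma dB] with alpha in {1 (bull), 2 (bear)}."""
    rng = np.random.default_rng(seed)
    s = np.empty(n_steps + 1)
    alpha = np.empty(n_steps + 1, dtype=int)
    s[0], alpha[0] = s0, 1
    for k in range(n_steps):
        mu = mu1 if alpha[k] == 1 else mu2
        # Exact log-normal step of the GBM within the current regime.
        s[k + 1] = s[k] * np.exp((mu - 0.5 * sigma**2) * dt
                                 + sigma * np.sqrt(dt) * rng.standard_normal())
        # Switch regime with probability 1 - exp(-rate * dt): lam1 bull->bear, lam2 bear->bull.
        rate = lam1 if alpha[k] == 1 else lam2
        switch = rng.random() < 1.0 - np.exp(-rate * dt)
        alpha[k + 1] = 3 - alpha[k] if switch else alpha[k]
    return s, alpha


def wonham_filter(s, mu1, mu2, sigma, lam1, lam2, dt=1 / 252, p0=0.5):
    """Euler step of dp = [lam2 - (lam1 + lam2) p] dt + (mu1 - mu2) p (1 - p) / sigma^2 * innovation,
    where innovation = d log S - [(mu1 - mu2) p + mu2 - sigma^2 / 2] dt."""
    p = np.empty(len(s))
    p[0] = p0
    for k in range(len(s) - 1):
        innovation = np.log(s[k + 1] / s[k]) - ((mu1 - mu2) * p[k] + mu2 - 0.5 * sigma**2) * dt
        dp = ((lam2 - (lam1 + lam2) * p[k]) * dt
              + (mu1 - mu2) * p[k] * (1.0 - p[k]) / sigma**2 * innovation)
        p[k + 1] = np.clip(p[k] + dp, 0.0, 1.0)   # keep the Euler iterate a valid probability
    return p


params = dict(mu1=0.875, mu2=-1.028, sigma=0.31, lam1=2.158, lam2=2.3)
prices, regime = simulate_regime_gbm(n_steps=2000, **params)
p_bull = wonham_filter(prices, **params)
```

    Training episodes for the threshold search would be many such paths drawn with different seeds.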
    To illustrate the applicability of the exponential NES based trading strategy, we considered the problem of determining a trend following trading rule for the NASDAQ index, using NASDAQ closing data from 1991 to 2008 (see Fig. 2). According to the estimation results of [14], the parameters of the switching geometric Brownian motion for the NASDAQ are μ1 = 0.875, μ2 = −1.028, σ1 = 0.273, σ2 = 0.35, σ = 0.31, λ1 = 2.158, and λ2 = 2.3, where σ is the simple average of σ1 and σ2. The ratio of slippage per transaction and the risk-free interest rate were fixed at K = 0.001 and ρ = 5.4%, respectively. Training episodes were generated using these estimated parameters, and exponential NES based training on these episodes yielded the threshold values for the trend following trading rules.
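    The trading rule itself is not given in code in the paper. As an assumption-laden illustration, the Python sketch below scores a hypothetical four-threshold long-flat-short rule on one price path, given filtered bull-market probabilities (for example from a Wonham filter); averaging such a score over simulated episodes gives a function an evolution strategy could maximize. The threshold layout, the (1 − K) cost factor per unit of position change, and the one-period short return are our assumptions, not the authors' specification.

```python
import math


def next_position(pos, p, thresholds):
    """Hypothetical long(+1)/flat(0)/short(-1) rule driven by p = P(bull | prices)."""
    enter_long, exit_long, enter_short, exit_short = thresholds
    if pos == 1 and p < exit_long:        # long -> flat
        pos = 0
    elif pos == -1 and p > exit_short:    # short -> flat
        pos = 0
    if pos == 0:                          # from flat, a direct long <-> short switch is possible
        if p > enter_long:                # flat -> long
            pos = 1
        elif p < enter_short:             # flat -> short
            pos = -1
    return pos


def rule_wealth(prices, probs, thresholds, K=0.001, rho=0.054, dt=1 / 252, w0=1.0):
    """Final wealth; each unit change in position costs a factor (1 - K) (assumption)."""
    w, pos = w0, 0
    for k in range(len(prices) - 1):
        new_pos = next_position(pos, probs[k], thresholds)
        w *= (1.0 - K) ** abs(new_pos - pos)
        pos = new_pos
        gross = prices[k + 1] / prices[k]
        if pos == 1:
            w *= gross
        elif pos == -1:
            w *= 2.0 - gross              # one-period short return (assumption)
        else:
            w *= math.exp(rho * dt)       # flat: earn the risk-free rate
    return w


w_example = rule_wealth([100.0, 110.0, 99.0, 99.0], [0.9, 0.1, 0.5, 0.5],
                        thresholds=(0.7, 0.4, 0.3, 0.6), K=0.01, rho=0.0)
```

    In the example, the rule buys (cost factor 0.99), gains 10%, switches directly from long to short (two units of position change, factor 0.99²), and gains 10% on the short, so the final wealth is 0.99³ × 1.1².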

    Figures 1–6 show the simulation results for the exponential NES based trend following trading rules. For these simulations, the initial wealth was set to one. Figure 1 shows the learning curve, which plots the average total cost against the number of policy updates over a set of 10 simulation runs. As the curve shows, the exponential NES method reaches good performance within fewer than 250 policy updates, which indicates that it works well for finding a long-flat-short type trading strategy. Figure 2 shows the NASDAQ index values together with the long-flat-short trading signals produced over the entire period by a policy obtained with the exponential NES method. For comparison, we also show the long-flat trading signals obtained by the NES approach [13]. According to Fig. 2, the long-flat-short type trading strategy (2nd panel) changes position 345 times in total, whereas the long-flat type trading strategy obtained by the NES approach [13] (3rd panel) changes position 100 times. Figure 3 shows that with the exponential NES based long-flat-short type trading strategy, trade returns are generally large and wealth increases steadily, reaching 21.52 at the final time. In comparison, Fig. 4 shows that with the long-flat type trading rule, trade returns are generally small and wealth increases more slowly, reaching only 10.65 at the final time. These results show that, when K = 0.001, adding short positions increased the number of trades and significantly improved wealth. To investigate the robustness of NES based trading rules against system changes, we performed simulations for various values of the transaction cost ratio. In particular, we considered the case in which the transaction cost increased tenfold (i.e., K = 0.01); the simulation results are shown in Figs. 5 and 6.
    Figure 5 shows the trading positions resulting from the long-flat-short type strategy (top) and the long-flat type strategy (bottom) when K = 0.01. When the trading cost increases, both strategies trade less frequently than when K = 0.001. Interestingly, the trading frequency of the long-flat-short type strategy decreased more slowly than that of the long-flat type strategy. We believe this is because the long-flat-short type strategy has more flexibility and is therefore less sensitive to system changes. Finally, Fig. 6 shows the wealth resulting from the long-flat-short type policy (top) and the long-flat type policy (bottom) when K = 0.01. The final wealth values of the long-flat-short type strategy and the long-flat type strategy are 10.85 and 5.09, respectively. These values are still considerably larger than the final wealth of the buy-and-hold strategy (3.34) and of investing at the risk-free interest rate (2.74).

    For the second application example, we considered the risk-adjusted expected profit maximization problem and used an AVF [27–30] based procedure to find an efficient solution. To express this problem in state-space form, we need to define the state and control input together with the performance index used as the optimization criterion. To do this, we follow the research of Boyd et al. [1, 28, 30]. We define the state vector as the collection of portfolio positions. Let xi(t) denote the dollar value of asset i at the beginning of time t. Then the state vector is given by

    $$x(t) = [x_1(t), \ldots, x_n(t)]^{T} \in \mathbb{R}^{n}$$

    The control input considered for this problem is a vector of trades,

    $$u(t) = [u_1(t), \ldots, u_n(t)]^{T} \in \mathbb{R}^{n}$$

    executed for portfolio x(t) at the beginning of each time step t. Note that ui(t) represents buying or selling assets; the asset associated with xi(t) is bought when ui(t) > 0 and sold when ui(t) < 0. Having these state and input definitions, the state transition is given by

    $$x(t+1) = \operatorname{diag}(r(t))\,\big(x(t) + u(t)\big)$$

    where r(t) is the vector of asset returns in period t. The return vectors r(t) are independent and identically distributed with mean vector r̄ and covariance matrix Σ, which do not change over time. For the performance index, we considered

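    $$J = \mathbb{E}\sum_{t=0}^{\infty}\gamma^{t}\,\ell(x(t), u(t)), \qquad \ell(x, u) = \mathbf{1}^{T}u + \lambda (x+u)^{T}\Sigma(x+u) + u^{T}\operatorname{diag}(s)\,u + \kappa\,\mathbf{1}^{T}|u|$$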

    where the total gross cash put into the portfolio is 1ᵀu, λ(x + u)ᵀΣ(x + u) is the quadratic post-trade risk penalty, uᵀdiag(s)u is the quadratic transaction cost, and κ1ᵀ|u| (with |u| taken elementwise) is the linear transaction cost. Furthermore, λ ≥ 0 is the risk aversion parameter, si ≥ 0 is the price-impact cost for the ith asset, and κ is the ratio of slippage per transaction. We considered the case in which the initial portfolio x(0) is fixed at x0. In general, when portfolio optimization problems are solved, the control input u(t) must satisfy certain naturally arising constraints. In particular, we considered the control input bound constraint:

    $$|u_i(t)| \le U_{\mathrm{bdd}}, \qquad i = 1, \ldots, n, \quad t = 0, 1, \ldots$$

    which means that only a limited amount of trading is allowed for each asset. Thus, the risk-adjusted profit maximization problem can be expressed as

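    $$\begin{aligned} \text{minimize}\quad & \mathbb{E}\sum_{t=0}^{\infty}\gamma^{t}\,\ell(x(t), u(t)) \\ \text{subject to}\quad & x(t+1) = \operatorname{diag}(r(t))\,\big(x(t) + u(t)\big), \quad x(0) = x_0, \\ & |u_i(t)| \le U_{\mathrm{bdd}}, \quad i = 1, \ldots, n \end{aligned}$$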
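    The stage cost and dynamics of this formulation translate directly into code. The Python sketch below evaluates ℓ(x, u), the transition x(t+1) = diag(r(t))(x(t) + u(t)) in the form used by Boyd et al., and the per-asset bound |ui| ≤ Ubdd. Reading r(t) as gross returns (so that 1.02 means +2%) is our assumption, and the numbers in the usage example are hypothetical.

```python
import numpy as np


def stage_cost(x, u, Sigma, s, lam, kappa):
    """l(x, u) = 1^T u + lam (x+u)^T Sigma (x+u) + u^T diag(s) u + kappa 1^T |u|."""
    post_trade = x + u
    return (u.sum()                                      # gross cash put into the portfolio
            + lam * post_trade @ Sigma @ post_trade      # quadratic post-trade risk penalty
            + u @ (s * u)                                # quadratic (price-impact) transaction cost
            + kappa * np.abs(u).sum())                   # linear transaction cost


def transition(x, u, r):
    """x(t+1) = diag(r(t)) (x(t) + u(t)); r is read as gross returns (assumption)."""
    return r * (x + u)


def within_bounds(u, u_bdd):
    """Input bound |u_i| <= U_bdd for every asset."""
    return bool(np.all(np.abs(u) <= u_bdd))


# Hypothetical two-asset example: buy $10 of asset 1 and short $5 of asset 2 from an empty portfolio.
x0 = np.zeros(2)
u0 = np.array([10.0, -5.0])
cost0 = stage_cost(x0, u0, Sigma=np.diag([0.04, 0.09]), s=np.full(2, 0.005), lam=0.005, kappa=0.005)
x1 = transition(x0, u0, r=np.array([1.02, 0.99]))
```

    By hand: 1ᵀu = 5, the risk term is 0.005 × (0.04·10² + 0.09·5²) = 0.03125, the quadratic cost is 0.005 × 125 = 0.625, and the linear cost is 0.005 × 15 = 0.075, so ℓ = 5.73125.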

    To solve this optimization problem, we utilized the iterated AVF approach [29], which is one of the most advanced AVF methods. In the iterated AVF approach, convex quadratic functions

    $$\hat V_t(z) = z^{T}P_t z + 2p_t^{T}z + q_t, \qquad P_t \succeq 0, \quad t = 0, \ldots, M$$

    are used to approximate the state value function at time t, and the parameters satisfy a series of Bellman inequalities

    $$\hat V_t \le \mathcal{T}\hat V_{t+1}, \qquad t = 0, \ldots, M$$

    with $\hat V_{M+1} = \hat V_M$. The Bellman inequalities guarantee that $\hat V_0$ is a lower bound of the optimal state value function [27–29]. The iterated AVF method maximizes this lower bound using convex optimization [29]. The bound constraint ‖u‖∞ ≤ Ubdd can be written in terms of quadratic inequalities as

    $$(e_i^{T}u)^{2} \le U_{\mathrm{bdd}}^{2}, \qquad i = 1, \ldots, n$$

    where ei is the ith column of the identity matrix In. This reformulation enables us to obtain sufficient conditions for the constrained Bellman inequalities in the form of linear matrix inequalities (LMIs) using the S-procedure [28, 35]. As a real-market example of portfolio optimization, we applied the iterated AVF approach [29] to a set of real financial market data [2, 6]. The data cover five major stocks: IBM, 3M, Altria, Boeing, and AIG (ticker symbols IBM, MMM, MO, BA, and AIG, respectively). For training, we used the weekly prices of the five stocks from Jan. 2, 1990 to Dec. 27, 2004, and computed the exponentially weighted moving average (EWMA) of the mean return vector r̄ (with an effective window size of three years) and of the covariance matrix Σ. During the test period (2005 to 2007), iterated AVF based trading was performed every four weeks (20 trading days). For the risk-free rate, we assumed ρ = 0.05 as in [2], and the discount factor γ was defined accordingly (i.e., γ = exp(−ρ/(52/4))). For the upper bound of the trading amount, we used Ubdd = 20. The coefficients of the risk penalty and transaction costs were

    λ = 0.005, κ = 0.005, si = 0.005, i = 1, …, 5.

    For the Bellman inequalities in (28), we considered M = 150 time steps. Finally, the initial portfolio vector and the initial wealth level were chosen to be x0 = [0, 0, 0, 0, 0]T and W = 100, respectively.
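    For a convex quadratic AVF V̂(z) = zᵀPz + 2pᵀz + q, the greedy step of the iterated AVF policy is a small convex problem, because E[diag(r)P diag(r)] = P ∘ E[rrᵀ] (elementwise product, which stays positive semidefinite by the Schur product theorem) and E[rrᵀ] = Σ + r̄r̄ᵀ. The paper does not specify its solver; the Python sketch below is our own choice: it splits u = u⁺ − u⁻ so the κ|u| term becomes linear and runs projected gradient descent on the box 0 ≤ u± ≤ Ubdd. The AVF coefficients P and p would come from the LMI-constrained program and are not reported in the text, so the one-asset numbers below are hypothetical apart from λ = κ = si = 0.005, Ubdd = 20, and γ = exp(−0.05/13).

```python
import numpy as np


def expected_cost(u, x, Sigma, r_bar, s, lam, kappa, gamma, P, p, q):
    """l(x, u) + gamma * E[V_hat(diag(r)(x + u))] for V_hat(z) = z^T P z + 2 p^T z + q."""
    y = x + u
    PR = P * (Sigma + np.outer(r_bar, r_bar))        # E[diag(r) P diag(r)] = P o E[r r^T]
    stage = u.sum() + lam * y @ Sigma @ y + u @ (s * u) + kappa * np.abs(u).sum()
    future = y @ PR @ y + 2 * (p * r_bar) @ y + q    # E[2 p^T diag(r) y] = 2 (p o r_bar)^T y
    return stage + gamma * future


def avf_policy(x, Sigma, r_bar, s, lam, kappa, gamma, P, p, u_bdd, iters=5000):
    """Greedy trade w.r.t. a convex quadratic V_hat subject to |u_i| <= u_bdd.

    Writes u = up - um with 0 <= up, um <= u_bdd so kappa * |u| becomes linear,
    then runs projected gradient descent on the smooth convex objective.
    """
    PR = P * (Sigma + np.outer(r_bar, r_bar))
    H = 2 * (lam * Sigma + np.diag(s) + gamma * PR)  # Hessian of the smooth part in u
    step = 1.0 / (2 * np.linalg.eigvalsh(H).max())   # 1/L for the split variables (up, um)
    up, um = np.zeros(x.size), np.zeros(x.size)
    for _ in range(iters):
        u = up - um
        y = x + u
        g = 1.0 + 2 * lam * Sigma @ y + 2 * s * u + gamma * (2 * PR @ y + 2 * p * r_bar)
        up, um = (np.clip(up - step * (g + kappa), 0.0, u_bdd),
                  np.clip(um - step * (kappa - g), 0.0, u_bdd))
    return up - um


# Hypothetical one-asset example; P and p would come from the iterated AVF program.
gamma = np.exp(-0.05 / 13)                            # rho = 0.05, trading every four weeks
args = dict(Sigma=np.array([[0.04]]), r_bar=np.array([1.01]), s=np.array([0.005]),
            lam=0.005, kappa=0.005, gamma=gamma, P=np.array([[0.01]]), p=np.array([-0.5]))
x = np.array([5.0])
u_star = avf_policy(x, u_bdd=20.0, **args)
```

    Any convex QP solver could replace the projected-gradient loop; the split into u⁺ and u⁻ is the standard way to turn the ℓ1 transaction cost into a smooth problem with simple bounds.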

    Figures 7–13 show the simulation results of the portfolio optimization example under the iterated AVF method. Figure 7 depicts the portfolio profile during the test period; as time passes, the portfolio profile gradually changes direction so as to improve the performance index. Figure 8 shows the gross cash put into the portfolio, and Fig. 9 plots the cumulative cost. Figure 8 shows that cash is put into the portfolio during the early stage of trading and that, as time progresses, the portfolio generates income. Based on the trend in this figure, we expect further profit in the later stage of trading. Figure 10 shows the transaction cost, which is very high while the portfolio is being built but stabilizes over time. Figure 11 shows the risk penalty, which is always non-negative and increasing. Wealth history and cash holdings are plotted in Figs. 12 and 13, respectively. In this scenario, wealth increases steadily as trading proceeds and reaches approximately 220 at the end of 2007, a 120% profit on the initial wealth of 100. According to Fig. 13, the cash holding briefly remains at its initial value, decreases rapidly, and then slowly begins to recover. This behavior suggests that the iterated AVF based trading strategy earns a large profit by investing aggressively at the start and cashing out gradually afterwards.

    4. Conclusion

    Machine learning and control methods have been applied to a variety of portfolio optimization problems. In particular, we considered two important classes of portfolio optimization problems: the trend following trading problem and the risk-adjusted profit maximization problem. The exponential NES and iterated approximate value function methods were applied to solve these problems. Simulation results showed that these probabilistic machine learning and control based solutions worked well when applied to real financial market data. In the future, we plan to consider more extensive simulation studies, which will further identify the strengths and weaknesses of probabilistic machine learning and control based methods, and applications of our methods to other types of financial decision making problems.

  • 1. Boyd S., Mueller M. T., O’Donoghue B., Wang Y. 2014 “Performance bounds and suboptimal policies for multiperiod investment,” [Foundations and Trends in Optimization] Vol.1 P.1-72
  • 2. Primbs J. A. July 9-13, 2007 “Portfolio optimization applications of stochastic receding horizon control,” [Proceedings of the 2007 American Control Conference] P.1811-1816
  • 3. Calafiore G. C. 2008 “Multi-period portfolio optimization with linear control policies,” [Automatica] Vol.44 P.2463-2473
  • 4. Alenmyr S., Ögren A. 2010 “Model Predictive Control for Stock Portfolio Selection,” M.S. Thesis
  • 5. Barmish B. R. 2011 “On performance limits of feedback control-based stock trading strategies,” [Proceedings of the 2011 American Control Conference] P.3874-3879
  • 6. Primbs J. A., Sung C. 2008 “A stochastic receding horizon control approach to constrained index tracking,” [Asia-Pacific Financial Markets] Vol.15 P.3-24
  • 7. Beasley J. E., Meade N., Chang T. J. 2003 “An evolutionary heuristic for the index tracking problem,” [European Journal of Operational Research] Vol.148 P.621-643
  • 8. Jeurissen R., van den Berg J. 2005 “Index tracking using a hybrid genetic algorithm,” [Proceedings of the ICSC Congress on Computational Intelligence Methods and Applications]
  • 9. Primbs J. A. 2009 “Dynamic hedging of basket options under proportional transaction costs using receding horizon control,” [International Journal of Control] Vol.82 P.1841-1855
  • 10. Markowitz H. 1959 Portfolio Selection: Efficient Diversification of Investments (Cowles Foundation for Research in Economics at Yale University Monograph 16)
  • 11. Park J., Yang D., Park K. 2013 “Approximate dynamic programming-based dynamic portfolio optimization for constrained index tracking,” [International Journal of Fuzzy Logic and Intelligent Systems] Vol.13 P.19-28
  • 12. Park J., Jeong J., Park K. 2012 “An investigation on dynamic portfolio selection problems utilizing stochastic receding horizon approach,” [Journal of Korean Institute of Intelligent Systems] Vol.22 P.386-393
  • 13. Park J., Yang D., Park K. 2013 “Investigations on dynamic trading strategy utilizing stochastic optimal control and machine learning,” [Journal of Korean Institute of Intelligent Systems] Vol.23 P.348-353
  • 14. Dai M., Zhang Q., Zhu Q. J. 2010 “Trend following trading under a regime switching model,” [SIAM Journal on Financial Mathematics] Vol.1 P.780-810
  • 15. Dai M., Zhang Q., Zhu Q. J. 2011 “Optimal trend following trading rules,”
  • 16. Kong H. T., Zhang Q., Yin G. G. 2011 “A trend-following strategy: conditions for optimality,” [Automatica] Vol.47 P.661-667
  • 17. Yu J., Zhang Q. 2012 “Optimal trend-following trading rules under a three-state regime switching model,” [Mathematical Control and Related Fields] Vol.2 P.81-100
  • 18. Kim S. J., Primbs J., Boyd S. 2008 “Dynamic spread trading,”
  • 19. Primbs J. A. 2009 “A control systems based look at financial engineering,”
  • 20. Mudchanatongsuk S., Primbs J. A., Wong W. 2008 “Optimal pairs trading: a stochastic control approach,” [Proceedings of the American Control Conference] P.1035-1039
  • 21. Wierstra D., Schaul T., Peters J., Schmidhuber J. June 1-6, 2008 “Natural evolution strategies,” [Proceedings of the IEEE World Congress on Evolutionary Computation] P.3381-3387
  • 22. Wierstra D., Schaul T., Glasmachers T., Sun Y., Schmidhuber J. 2011 “Natural evolution strategies,”
  • 23. Glasmachers T., Schaul T., Yi S., Wierstra D., Schmidhuber J. July 7-11, 2010 “Exponential natural evolution strategies,” [Proceedings of the 12th Genetic and Evolutionary Computation Conference]
  • 24. Schaul T. July 7-11, 2012 “Benchmarking exponential natural evolution strategies on the noiseless and noisy black-box optimization testbeds,” [Proceedings of the 14th Genetic and Evolutionary Computation Conference]
  • 27. Wang Y., O’Donoghue B., Boyd S. 2014 “Approximate dynamic programming via iterated Bellman inequalities,” [International Journal of Robust and Nonlinear Control]
  • 28. O’Donoghue B., Wang Y., Boyd S. September 28-30, 2011 “Min-max approximate dynamic programming,” [Proceedings of the IEEE International Symposium on Computer-Aided Control System Design] P.424-431
  • 29. O’Donoghue B., Wang Y., Boyd S. July 17-19, 2013 “Iterated approximate value functions,” [Proceedings of the European Control Conference] P.3882-3888
  • 30. Keshavarz A., Boyd S. 2014 “Quadratic approximate dynamic programming for input-affine systems,” [International Journal of Robust and Nonlinear Control] Vol.24 P.432-449
  • 31. Peters J., Schaal S. 2008 “Natural actor-critic,” [Neurocomputing] Vol.71 P.1180-1190
  • 32. Bertsekas D. P. 1995 Dynamic Programming and Optimal Control
  • 33. Powell W. B. 2007 Approximate Dynamic Programming: Solving the Curses of Dimensionality
  • 34. Wonham W. M. 1965 “Some applications of stochastic differential equations to optimal non-linear filtering,” [SIAM Journal on Control] Vol.2 P.347-369
  • 35. Boyd S. P. 1994 Linear Matrix Inequalities in System and Control Theory
  • [Figure 1.] Learning curve.
  • [Figure 2.] NASDAQ index and trading position when K = 0.001.
  • [Figure 3.] Trade return and wealth resulting from the long-flat-short type policy when K = 0.001.
  • [Figure 4.] Trade return and wealth resulting from the long-flat type policy when K = 0.001.
  • [Figure 5.] Trading position resulting from the long-flat-short type policy (top) and the long-flat type policy (bottom) when K = 0.01.
  • [Figure 6.] Wealth resulting from the long-flat-short type policy (top) and the long-flat type policy (bottom) when K = 0.01.
  • [Figure 7.] Portfolio profile.
  • [Figure 8.] Gross cash put into the portfolio.
  • [Figure 9.] Cumulative cost.
  • [Figure 10.] Transaction cost.
  • [Figure 11.] Risk penalty.
  • [Figure 12.] Wealth.
  • [Figure 13.] Amount of cash.