Heuristics for Motion Planning Based on Learning in Similar Environments
 Author: Ogay Dmitriy, Kim EunGyung
 Organization: Ogay Dmitriy; Kim EunGyung
 Publish: Journal of information and communication convergence engineering Volume 12, Issue2, p116~121, 30 June 2014

ABSTRACT
This paper discusses computergenerated heuristics for motion planning. Planning with many degrees of freedom is a challenging task, because the complexity of most planning algorithms grows exponentially with the number of dimensions of the problem. A welldesigned heuristic may greatly improve the performance of a planning algorithm in terms of the computation time. However, in recent years, with increasingly challenging highdimensional planning problems, the design of good heuristics has itself become a complicated task. In this paper, we present an approach to algorithmically develop a heuristic for motion planning, which increases the efficiency of a planner in similar environments. To implement the idea, we generalize modern motion planning algorithms to an extent, where a heuristic is represented as a set of random variables. Distributions of the variables are then analyzed with computer learning methods. The analysis results are then utilized to generate a heuristic. During the experiments, the proposed approach is applied to several planning tasks with different algorithms and is shown to improve performance.

KEYWORD
Heuristics , Learning from experience , Machine learning , Motion planning , Path finding

I. INTRODUCTION
Motion planning has been an essential part of robotics for a long time. Many algorithms have been developed, which develop a plan for a robot to move from the initial configuration to the target configuration. Nevertheless, algorithm complexity has been a challenge for reallife applications.
The development of motion planning from graph search algorithms [1] and has led to the modern complicated samplingbased algorithms [2]. Dijkstra’s algorithm is one of the simplest and basic algorithms for finding paths on a graph. One of the approaches for the construction of a motion plan in a configuration space is to discretize the space with hypercubes and then, execute a graph search algorithm on a discrete structure. However, in practical applications, this approach can rarely be used for tasks with high dimensionality, because the algorithm complexity increases exponentially.
An extension of Dijkstra’s algorithm, which is called A* [1], introduced heuristics for graphsearch algorithms to significantly improve the algorithm’s performance in terms of the computation time, depending on the task.
The other successful approach for motion planning in a highdimensional space is the Rapidly Exploring Random Tree algorithm [3]. This algorithm rapidly extends a tree structure in a highdimensional space and has proved to be efficient in many practical applications. However, it suffers in the presence of narrow paths. Various methods have been proposed to overcome this problem [4].
However, in recent years, more attention has been drawn to the area of experiencebased planning, because tasks have become increasingly complicated. Replanning strategies in similar environments are covered in [57]. Trajectory libraries approaches are covered in [810].
Heuristics are applied in many practical motion planning tasks. However, as planning problems become increasingly difficult, the complexity of heuristics increases, and designing a heuristic itself becomes a challenging task.
The main contribution of this work is the development of an approach for the algorithmic generation of heuristics. The work is done in the following steps: • Represent a motion planning algorithm’s interaction with the environment as a random process. • Represent a heuristic as a distribution of random variables that control the random process of planning. • Since a heuristic is usually developed by humans on the basis of their knowledge about the type of environment or the type of task. Assume that knowledge about environment is the knowledge of the distribution of obstacles in the environment. • Apply machine learning to learn the distribution of obstacles in typical environments. Utilize the learnt distributions as a heuristic to control the extension of a motion planning algorithm.
In the experimental section, the designed heuristic is applied to different types of motion planning algorithms in order to show the performance improvement.
II. PROPOSED METHOD DESCRIPTION
> A. Heuristic and Distribution of Obstacles
Let us consider an example of motion planning in a twodimensional space for a point mass in a free space. The shortest path would be a straight line, connecting the initial and the target configurations. The shortest path is different from a straight line, if there are obstacles between the initial and the target points. Therefore, obstacles do affect the final plan.
Now, let us analyze an example of planning in a real environment with obstacles. Let us have an intelligent agent
A , and an observable environmentI around it. We can model this environmentI in a natural world with a vector of random variables X = {X_{1}, X_{2}, ..., X_{n}}. If we take another vector of uniformly distributed random variables, then the information content ofX would be lower than a uniform random and thus, would have a certain pattern. We use the term information from the information theory developed by Shannon [11]. It means that the average length of the sequence that encodes the environmentI would be shorter than the sequence to encode the vector of uniformly distributed independent random variables.To illustrate the above idea, let us consider an example of a robot, which navigates on the ground. Let us model the surrounding physical world as a set of tuples {
,p }, whereo denotes the position of the object’s center mass andp represents the orientation of an object along the longest dimension. This is one example of how the problem may be modeled for the planning algorithm.o We can also model the abovementioned world with random variables
X andY , the first corresponding to the position and the second to the orientation. What would the distribution of these variables be? Here, the laws of nature and the corresponding constraints start to affect the distribution. For our example, let us take gravitation. Without gravitation, the distribution ofY would be closer to uniform. However, in the real world, most of the objects are either vertical or horizontal. This implies that the entropy H(Y ) is less than the maximum value, which is the entropy of a uniform distribution.Basic planning algorithms without heuristics are designed to be systematic and sufficiently universal to deal with any kind of input problems. Thus, they consider that obstacles are distributed uniformly. However, in fact, if obstacles have some pattern and are not distributed uniformly, then the planner may work inefficiently. This has led to the introduction of various heuristics to make planning algorithms solve particular problems more efficiently.
Let us represent the interaction between a planner and an environment as a random process. As a heuristic controls the behavior of this interaction at every cycle of the algorithm, the heuristic is considered to be a distribution of the possible actions that the planner could perform.
It should be noted that a heuristic may be deterministic with respect to some calculated values, and have conditional entropy equal to zero (e.g., the A* algorithm), but due to the randomness of obstacles, the full entropy of the heuristic may be more than zero.
The main idea behind this work is as follows: If a human designs a heuristic for a particular task, then it means that he/she tries to model a probability distribution function of the actions that the planner would take. In the proposed approach, we try to model this probability distribution function algorithmically, with the help of machine learning.
> B. Generalized Planning Framework
To further analyze motion planning algorithms and heuristics, let us describe a generalized planning framework.
It is an algorithm template that may be instantiated to describe most of the modern samplingbased planning algorithms. See Algorithm 1.
G (V,E ) is a graph, which is extended to cover the configuration space. The main extension point of the algorithm is the functionselectNodeAndAction (G ).This function generates a pair {
q _{ext},u _{new}}, whereq _{ext} denotes a node of graphG , which is to be extended, andu _{new} represents an action that will be applied to the system from the stateq _{ext}. The functionpropagate (q _{ext},u _{new}) serves as a collision checker, which integrates the control input and validates whether the generated path or trajectory is collision free.The generalized framework may be extended to be a random tree, with configurations sampled according to some random distributions. Algorithm 2 is an extension of a template for a randomly built tree. Nodes
q _{cand} to be extended are selected with a random variableX , and the control input is selected with a random variableY .This graph extension strategy is itself sufficiently general, because it can model other samplingbased planning algorithms in terms of the distributions of random variables
X andY .EXAMPLE 1 :X andY have a uniform distribution. Any node in the graph is selected randomly with equal probability for all nodes, and the control input is also selected to be a uniformly random direction. This example is important because we may use it as an unbiased random planner without heuristics. Therefore, we can consider a bias in the distribution of random variablesX andY as a heuristic.The generalized algorithm may also be utilized to implement the wellknown Rapidly Exploring Random Trees (RRT) [4]. The extension of the function
selectNodeAndAction (G ) for the RRT implementation is shown in Algorithm 3. In line 2, a random configuration is selected. The node to be extended is the node in the tree, which is the nearest node to the selected candidate configuration. The control input is then determined as a control input, required to bring a robot from configurationq _{ext} to the configurationq _{cand}, limited by a maximum time step of Δt .EXAMPLE 2: RRT may be modeled by biasing distributions ofX andY . ForX , the distribution will be determined by the current graphG , which is being extended. The existing nodes divide the configuration space with Voronoi cells. These cells have different areas, and nodes that are closer to the unexplored areas have a larger area of Voronoi cells around it. In the implementation of the RRT algorithm, the probability of a nodex to be selected is proportional to the volume of the Voronoi cell around this node. The distribution forY also depends on the size and the shape of the Voronoi cell around the node.We can conclude from these two examples that biasing of the distribution of
X andY creates a heuristic, which controls the extension of a search graph. The RRT nodes with large Voronoi regions, which are the nodes closer to the unexplored area, have larger chances to be extended. In other words, RRT has the heuristic of exploring the unknown regions.For the study presented in the given work, we apply machine learning algorithms to find the distribution of the variable
Y , which would make our motion planner work more efficiently. During the experiments, we compare the performance of planning with the learnt distribution ofY in two cases: {uniformX and uniformY } vs. {uniformX and learntY }; and {RRTbasedX and RRTbasedY } vs. {RRTbasedX and learntY }.III. EXPERIMENTS AND DISCUSSION
This section describes how heuristics may adapt to the given typical environment by learning. Learning is implemented as finding the probability distribution function of a planner’s parameters from the observations of a typical environment.
In previous sections, samplingbased planners were generalized to a tree structure, which expands in a search space. The expansion is parameterized in two ways. The first is which a node is selected to be extended, and the second is the selection of the control input to be applied to the extending node.
In the presented experiment, the second parameter is utilized to improve the performance of a planning algorithm. The control input is learned from the observations of successfully generated paths. RRT was used to generate the paths for the observations.
After the probability distribution function was approximated with the sum of the Gauss distributions with learned means, it was applied to the sampling of the control input.
In order to show that z planner with applied selflearning heuristics performs better, we applied the proposed strategy to different samplingbased planners. Planners were selected to differ in the way nodes for extensions are selected. In the first case, we utilized the uniform random selection of nodes from the whole tree. In the second case, we utilized the selection of nodes for rapid expansion in space. The probability that a node is selected is proportional to the volume of the Voronoi cell that corresponds to the node.
All the algorithms were run 10 times. The number of iterations, number of failed collision checks, and path lengths were recorded. In the first run, the algorithms were run until the path was found. In the second run, planning was run for a fixed number of iterations.
The results of the experiments with random node selections are shown in Tables 1 and 2. As is apparent, random node selection is inefficient, and an unbiased algorithm could not find the solution even after 500,000 iterations. Selflearning heuristics improved the performance of the planner, and thus, it could find a solution. The boundaries of exploration can be seen in Fig. 1.
The results of the experiments with rapidly expanding node selection are shown in Tables 3 and 4. This strategy extends the algorithm faster. An unbiased version of the algorithm is in fact an implementation of an RRT algorithm. Selflearning heuristics improved the performance of the planner, and thus, it could find the solution faster as shown in Fig. 2. Planning is performed from the initial point in the left top corner to the goal point in the right bottom corner.
Unlike expected, the selflearning heuristics caused more collision check fails during the test with a fixed number of iterations (See Table 2). In both cases, motion planners failed to reach the goal; therefore, the number of iterations was considerably small. If compared to the results from Table 1, it may be seen that the selflearning heuristic experiences relatively few collision failures before it reaches the target.
The same experiments were also performed on a similar map, with a considerably narrower distance between squares; see Fig. 3. The result is presented in Table 5. As expected, the original RRT’s performance started to drop significantly. In our experiments it sometimes took more than 10,000 iterations and the algorithm still could not find the solution. Because of these failures, the average number of iterations is high. In all three experiments, the algorithms enhanced with selflearning heuristics improved the performance of the original algorithm.
IV. CONCLUSIONS
In this work, we have proposed a method, which allows heuristics for a motion planner to be generated by a computer by observing the results of prior successful planning tasks in the environment. We have experimentally shown that if planning algorithms are generalized and parameterized with two random variables, then the observation of the distribution of these parameters on successful paths may be utilized to create a heuristic. The proposed method was applied to different samplingbased planning algorithms in different environments, and in all cases, it showed performance improvements.

[Table 1.] Performance of selflearning heuristics applied to random tree search algorithm

[Table 2.] Performance of random tree search algorithm with selflearning heuristics when the number of iteration is fixed

[Fig. 1.] Comparison of the performance of a planner with random selection of nodes for expansion. The initial point is in the top left corner, and the final point is at the bottom right. (a) Unbiased control input. After 500,000 iterations, a path was still not found. (b) Control input biased by selflearning heuristics.

[Table 3.] Performance of selflearning heuristics applied to the search algorithm with Voronoi cell volumebased node selection

[Table 4.] Performance of the search algorithm with Voronoi cell volumebased node selection and selflearning heuristics when the number of iteration is fixed

[Fig. 2.] Performance of planner with selection of nodes for expansion with rapid expansion. Extension is performed from the Init configuration to the Goal configuration. (a) Unbiased control input and (b) control input biased by selflearning heuristics.

[Fig. 3.] Performance of the planning algorithm on a map with narrower paths.

[Table 5.] Results of running algorithm, until a path is found, on a map with narrow paths: random tree with Voronoi cell volumebased node selection: comparison with and without selflearning heuristics