Selecting Ordering Policy and Items Classification Based on Canonical Correlation and Cluster Analysis
 Author: Nagasawa Keisuke, Irohara Takashi, Matoba Yosuke, Liu Shuling
 Organization: Nagasawa Keisuke; Irohara Takashi; Matoba Yosuke; Liu Shuling
 Publish: Industrial Engineering and Management Systems, Volume 11, Issue 2, pp. 134-141, 30 June 2012

ABSTRACT
It is difficult to find an appropriate ordering policy for many types of items. One reason for this difficulty is that each item has a different demand trend. We classify items by shipment trend and then decide the ordering policy for each item category. In this study, we show that categorizing items by their statistical characteristics leads to an ordering policy suitable for each category. We analyze the ordering policy and shipment trend and propose a new method for selecting the ordering policy, which is based on finding the strongest relation between the classification of the items and the ordering policy. In our numerical experiment, from actual shipment data of about 5,000 items over the past year, we calculated many statistics that represent the trend of each item. Next, we applied canonical correlation analysis between the evaluations of ordering policies and the various statistics. Furthermore, we applied cluster analysis to the statistics related to the performance of ordering policies. Finally, we separated items into several categories and showed that the appropriate ordering policies differ for each category.

KEYWORD
Inventory Management, Ordering Policy, Multivariable Analysis, Canonical Correlation Analysis, Cluster Analysis

1. INTRODUCTION
Inventory management has been studied by many researchers and practitioners. Tiacci and Saetta (2009) considered the relationship between forecasting and ordering policy with carrier capacity. Flores et al. (1992) provided a matrix-based methodology for multicriteria ABC classification. Ramanathan (2006) proposed a weighted linear optimization model and classified inventory items by multiple criteria. Ernst and Cohen (1990) used cluster analysis to group similar items. Xaio et al. (2011) considered the importance of inventory items based on a loss rule. Berling and Marklund (2006) used a linear regression technique to obtain approximate values of the induced backorder cost in a one-warehouse, multiple-retailer system.
Many researchers have also studied the lost-sales model. Regarding stockouts, Gruen et al. (2002) revealed that only 15% of the customers who observe a stockout wait for the item to be on the shelves again. The other 85% of the customers decide to buy a different product, visit another store, or do not buy any product at all. According to Zipkin (2008), the cost deviations can run up to 30% when the lost-sales system is approximated by a backorder model. Janakiraman et al. (2007) compared the performance of optimal replenishment policies in lost-sales and backorder models. Huh et al. (2009) compared the performance of base-stock policies in lost-sales and backorder models. Levi et al. (2008) proposed a dual-balancing policy, in which the risks of ordering and holding are balanced. The authors proved that the expected total cost of this policy is at most twice the expected cost of the optimal policy. Bijvank and Vis (2011) classified the models based on the characteristics of the inventory system and reviewed the proposed replenishment policies. Van Donselaar and Broekmeulen (2011) showed that the assumption that a lost-sales system can simply be approximated by a backordering system if the target fill rate is at least 95% may lead to serious approximation errors.
Forecasting is never separate from inventory management. Liao and Chang (2010) investigated the effects of five demand forecasting methods, two inventory policies, and three lead times on the total inventory cost of a 3-echelon serial supply chain system. Tanaka et al. (2012) proposed a decision support system that measures demand risks through a sales forecasting method, especially for new products. The system enables us to make the demand uncertainty controllable and gives the proper volume and timing guidance for daily reproduction of products.
One expects that if a single ordering policy is applied to many different items with different shipping trends, the inventory would be a mix of items with frequent shortages and items with excess inventory. On the other hand, even when using multiple ordering policies, it is not easy to find an appropriate policy for many types of items, particularly when each has a different shipping trend. In addition, given a particular shipping trend, no method or criterion has been specifically developed for choosing the ordering policy to apply to these items. For these reasons, the proper relationship between different item trends and the appropriate ordering policies is still not clear.
Therefore, we propose a new algorithm for finding a reasonable ordering policy using the relationships between shipping trends. In particular, our proposed algorithm classifies items according to shipping trends in order to find a suitable ordering policy and in addition reveals the shipping trends which are most desirable. In this case, shipping trends refer to shipping statistics related to the criteria of the ordering policy, which are calculated from past shipping data.
In Section 2, we introduce the proposed model. The model development includes converting actual shipping data into data matrices for analysis. The details of the experiment and the results of the analysis are reported in Section 3.
2. PROPOSED MODEL
2.1 Problem Outline
This study considers the inventory management problem for items having different shipping trends. Here, the ordering policy is assumed to perform differently according to item trends. To achieve efficient inventory management, we would like to select a suitable ordering policy for each item. Hence, we want to know which item trends are useful for determining the ordering policy, or to find a reasonable method for grouping items into those that are suitable for a given ordering policy and those that are not.
This study assumes that neither the differences in item shipping trends nor the suitable ordering policy for each trend has yet been revealed. Ideally, the suitable ordering policy could be determined using only a single statistic. However, such a statistic is not yet known. Many researchers have sought such a statistic based on theoretical considerations, and practitioners are likewise struggling to find one.
In this study, we calculate and use shipping statistics and evaluation values of ordering policies. A general algorithm is presented in the following section.
2.2 Proposed Algorithm
Figure 1 shows the flowchart of our proposed algorithm. First, we assume that shipping data are available that record the shipping date and volume of each item, as shown in Table 1.
In the following steps, we repeatedly use multivariate data matrices, as shown in Table 2. Before we proceed to explaining the algorithm flowchart, we will first explain the premise behind multivariate data matrices.
Let X denote an n × v multivariate data matrix that represents the shipping statistics. There are n items and v shipping criteria, and x_{ij} gives the value of the j^{th} shipping criterion for the i^{th} item. For example, if the daily shipping average in Table 1 is shipping criterion j, then x_{ij} = (d_{i1} + d_{i2} + … + d_{im}) / m. Let x_{j} denote an n × 1 multivariate data matrix that holds the values of all items for the j^{th} criterion.
Let Y denote the multivariate data matrix that represents the evaluation values of ordering policies. Y is structured similarly to the previously mentioned X. For example, y_{izp} gives the value of the z^{th} evaluation criterion of the p^{th} ordering policy for the i^{th} item. So, y_{zp} denotes the n × 1 variable matrix of the z^{th} evaluation criterion of the p^{th} ordering policy. Then, Y_{z} denotes the n × q variable matrix of the z^{th} evaluation criterion (one column per ordering policy).
For analysis, since the evaluation values differ in scale, we standardize the evaluation value of each evaluation criterion and item in order to compare the ordering policies. If y_{ikp} is the value of the k^{th} evaluation criterion of the p^{th} ordering policy for the i^{th} item, then let y'_{ikp} be the standardized evaluation value for this same combination, where the y'_{ikp} are calculated using Eq. (1) and take values in [0, 1].
In order to evaluate multiple evaluation values jointly, we apply a weighted linear mixed evaluation criterion. For example, if the criteria under evaluation are criterion e, criterion f, … and the corresponding weights are w_{e}, w_{f}, …, then y_{ilp}, the value of the weighted linear mixed evaluation criterion l for ordering policy p and item i, is calculated by y_{ilp} = w_{e} y_{iep} + w_{f} y_{ifp} + …. Similarly, y_{lp} = w_{e} y_{ep} + w_{f} y_{fp} + …. In addition, in order to find differences in ordering policy trends, we apply another mixed evaluation criterion, which is calculated from the difference between evaluation values. For example, to analyze the trend difference for criterion c between policy f and policy g, naming the resulting evaluation criterion d, it is calculated as y_{d} = y_{cf} − y_{cg}.
All the notation above, shown in Tables 1 and 2, is used in the sections that follow. The two large sets of variables, X and Y, serve as the standard inputs to, for example, canonical correlation analysis and cluster analysis.
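As a rough illustration of the notation above, the following Python sketch computes a daily shipping average, standardizes evaluation values to [0, 1], mixes two criteria with weights, and takes the difference between two policies. The min-max form of the standardization is an assumption chosen only because it maps values into [0, 1]; Eq. (1) may differ. It is applied here across the policies of one item and one criterion, which is also an assumption. All function names and sample numbers are hypothetical.

```python
def daily_average(shipments):
    """Daily shipping average x_ij = (d_i1 + ... + d_im) / m for one item."""
    return sum(shipments) / len(shipments)


def min_max_standardize(values):
    """Map raw values to [0, 1] (assumed min-max form, not necessarily Eq. (1))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_mix(criteria, weights):
    """Element-wise weighted sum: y_l = w_e * y_e + w_f * y_f + ..."""
    n = len(criteria[0])
    return [sum(w * col[i] for w, col in zip(weights, criteria)) for i in range(n)]


def difference(y_cf, y_cg):
    """Element-wise difference y_d = y_cf - y_cg."""
    return [a - b for a, b in zip(y_cf, y_cg)]


# Hypothetical data: one item evaluated under three ordering policies.
shortage_time = [2, 6, 10]       # raw shortage time per policy
total_stock = [300, 120, 100]    # raw total daily stock per policy

y_beta = min_max_standardize(shortage_time)
y_gamma = min_max_standardize(total_stock)
y_pi = weighted_mix([y_beta, y_gamma], [0.5, 0.5])

# Difference between the first policy and each of the other policies.
y_delta = difference([y_pi[0]] * 2, y_pi[1:])
```

Because lower shortage time and lower stock are both preferable, a positive entry in `y_delta` here means the second policy of the pair has the smaller mixed evaluation value.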
2.2.1 Canonical correlation analysis
The simplest known measure of relationships is the simple correlation coefficient between two variables. When interested in the relation between the set of variables X and a single variable y, multiple linear regression analysis can be used. However, in this study, we are interested in the correlation between the set of variables X and the set of variables Y; that is, each object for comparison has multiple variables. Canonical correlation analysis is one method for determining the relationship between sets of variables. According to Basu and Mandal (2010), it was initially developed by Hotelling (1936).
Canonical correlation analysis focuses on the correlation between the i^{th} linear combination of the set of variables X (say U_{i}) and the i^{th} linear combination of the set of variables Y (say V_{i}). It determines the coefficients of the linear combinations that make the correlation between U_{i} and V_{i} the highest possible, whereas the correlation between each of U_{i} and V_{i} and each of U_{i−1}, …, U_{1}, V_{i−1}, …, V_{1} is zero (i.e., uncorrelatedness with the earlier combinations is a constraint on the optimization). The pairs of linear combinations are called the canonical variables, and their correlation coefficients are called the canonical correlation coefficients. In addition, the correlations between a linear combination of the set of variables and the variables themselves are called canonical loadings. Canonical correlation analysis is also useful for determining how many dimensions are needed to account for the relationship.
In the present study, we apply canonical correlation analysis to the set of shipping statistics X and the evaluation values Y. In addition, we expect to find relationships between the sets of linear combinations of variables. This would allow us to sort shipping statistics into those which are significant and those which are not.
2.2.2 Cluster analysis
Cluster analysis is an exploratory technique. Cluster analysis techniques can be broadly grouped into hierarchical clustering and nonhierarchical clustering. Here, we apply hierarchical clustering using the Ward method, which is frequently used.
In this study, we separate items into some number of clusters based on the shipping statistics x_{j}, according to their relationship to Y_{i}, the evaluation value of criterion i on which the practitioner is focused. Then we search for trends in shipping statistics and evaluation values of ordering policies.
3. NUMERICAL EXPERIMENTS
Now, we calculate shipping statistics from actual shipping data and generate evaluation values of ordering policies from the simulation results. Then, we apply canonical correlation analysis to the set of shipping statistics and the evaluation values of the ordering policies. From these results, we can obtain the shipping statistics which exhibit a relatively strong relation with evaluation values of ordering policies. Clustering the items by these shipping statistics, we can group items according to a particular trend in the evaluation values of ordering policies.
In this study, we used SPSS ver. 19.0 (IBM Co., New York, NY, USA), a data mining and statistical analysis software package, for the canonical correlation analysis and cluster analysis, as well as the multiple linear regression analysis. The experimental environment is an Intel^{®} Core™ 2 Duo CPU, E8400 at 3.00 GHz, 4.00 GB RAM; we did not need long computational times for our algorithm.
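The study itself used SPSS. As a rough open-source analog, the sketch below computes canonical correlations and X-side canonical loadings with NumPy, using the standard QR-plus-SVD formulation. The function name, the synthetic data, and the noise level are assumptions made for illustration only; they are not the shipment data or the SPSS procedure.

```python
import numpy as np


def canonical_correlation(X, Y):
    """Return canonical correlations and X-side canonical loadings.

    Assumes X (n x v) and Y (n x q) have full column rank and n > v, q.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the centered column spaces of X and Y.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlation coefficients.
    U, rho, _ = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    rho = np.clip(rho, 0.0, 1.0)
    # Canonical variables (scores) on the X side, one column per variable.
    scores_x = Qx @ U
    # Loadings: correlation of each original statistic with each canonical variable.
    loadings = np.array([
        [np.corrcoef(Xc[:, j], scores_x[:, c])[0, 1] for c in range(len(rho))]
        for j in range(X.shape[1])
    ])
    return rho, loadings


# Synthetic example: 200 "items", 3 statistics, 2 evaluation values.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.column_stack([
    X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=200),  # depends on two statistics
    rng.normal(size=200),                            # unrelated to X
])
rho, loadings = canonical_correlation(X, Y)
```

In this synthetic setup the first canonical correlation is close to 1 and the first two statistics carry the large loadings, which mirrors how the loadings are read in the results section.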
3.1 Ordering Policy and Experiment
We used two ordering policies for this study: the so-called (s, Q) policy and the (R, s, S) policy. There are two reasons why we chose these two ordering policies. First, these policies are the most commonly used ordering policies for inventory control. Second, the company that provided the actual shipment data had set the parameters for and applied these ordering policies. So, we focused on these policies.
For the (s, Q) policy, when the inventory position declines to or below the reorder point s, a batch quantity of size Q is ordered. In this case, the inventory position never goes beyond s + Q.
For the (R, s, S) policy, at the end of each time period of length R, if the inventory position has declined to or below the reorder point s, an amount equal to the difference between the order-up-to level S and the current inventory position is ordered. In the special case in which s = S − 1, we call this the (R, S) policy.
In this study, we use the (s, Q) and (R, S) policies. In particular, s, Q, and S of the i^{th} item in the t^{th} date period are denoted by s_{i,t}, Q_{i,t}, and S_{i,t}, which are calculated as follows:
Q_{i,t}: ordering quantity of item i on the t^{th} day
s_{i,t}: reorder point for item i on the t^{th} day
S_{i,t}: order-up-to level of item i on the t^{th} day
k: safety coefficient
d_{i,t}: shipments of item i on the t^{th} day
R: periodic review term
LT: lead time
SP: sampling period
I: the set of items
T: the set of days
C: the set of days on which the parameters are recalculated (C ⊆ T)
D: the complement of C with respect to T
If practitioners would like to manage items precisely, it is necessary to shift the ordering policy for some reason, such as demand forecasting or seasonality. However, because we would like to find the relation between the shipping statistics and the evaluation values of ordering policies, we did not shift ordering policies in this experiment.
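To make the two policy descriptions concrete, here is a minimal simulation sketch. It takes s, Q, and S as given constants rather than computing them, treats unmet demand as lost, and lets an order placed at the end of day t arrive at the start of day t + lead_time; all three choices are assumptions, as are the function names and the toy demand series.

```python
def simulate(demand, order_rule, lead_time, initial_stock):
    """Return (shortage_days, total_daily_stock) for one item.

    Assumes lead_time >= 1 and that unmet demand is lost.
    """
    on_hand = initial_stock
    on_order = {}              # arrival day -> quantity
    shortage_days = 0
    total_daily_stock = 0
    for t, d in enumerate(demand):
        on_hand += on_order.pop(t, 0)        # receive orders due today
        if d > on_hand:
            shortage_days += 1               # demand not met in full
        on_hand = max(on_hand - d, 0)
        position = on_hand + sum(on_order.values())
        qty = order_rule(t, position)
        if qty > 0:
            due = t + lead_time
            on_order[due] = on_order.get(due, 0) + qty
        total_daily_stock += on_hand         # end-of-day stock
    return shortage_days, total_daily_stock


def sq_rule(s, Q):
    """(s, Q): order batches of Q whenever the position is at or below s."""
    def rule(t, position):
        qty = 0
        while position + qty <= s:
            qty += Q
        return qty
    return rule


def rs_rule(R, S):
    """(R, S): every R days, order up to S if the position is below S."""
    def rule(t, position):
        if (t + 1) % R == 0 and position < S:
            return S - position
        return 0
    return rule


# Toy demand: 3 units per day for 6 days, lead time of 2 days.
demand = [3, 3, 3, 3, 3, 3]
result_sq = simulate(demand, sq_rule(s=4, Q=6), lead_time=2, initial_stock=6)
result_rs = simulate(demand, rs_rule(R=2, S=9), lead_time=2, initial_stock=6)
```

The two returned numbers correspond to the shortage time and total daily stock that are later standardized and combined into the evaluation value.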
3.2 Data Set
In the numerical and simulation experiments, we used one year of actual shipment data from a distribution company. The items are categorized broadly into 11 varieties, for example, beer, Japanese sake, foods, and spices. Within these categories, there are no obvious characteristic trends. We selected about 5,000 items to be analyzed from about 7,000 items in total; the other 2,000 items were shipped less frequently and in smaller amounts. We applied our algorithm to the selected 5,000 items, which were considered analyzable.
We simulated the (s, Q) and (R, S) policies from the shipping data, which were also used to calculate the shipping statistics. We set common parameters for the (s, Q) and (R, S) policies, as described below. The safety coefficient k was set to 1.96, the sampling period SP was set to 2 months, and the lead time LT, the time between when items are ordered and when they are delivered, was set to 2 days.
We compared the performance of five ordering policies: the (s, Q) policy and the (R, S) policies for R equal to 1, 2, 3, or 4 weeks. For the simulation, we used the first 2 months of shipping data to calculate the initial values of s_{i,t}, Q_{i,t}, and S_{i,t}. For the (R, S) policies, S_{i,t} was recalculated periodically, whereas s_{i,t} and Q_{i,t} were fixed at their initially calculated values.
3.3 Evaluation Criteria
From the results of the five ordering policy simulations, we define Y_{π} as the set of evaluation values, where Y_{π} is calculated as Y_{π} = [y_{π,(s, Q)}, y_{π,(R = 1 week, S)}, …, y_{π,(R = 4 weeks, S)}]. Here, y_{π,p} is the linear combination, for ordering policy p, of the standardized evaluation of shortage time, y_{β,p}, and the standardized evaluation of total daily stock, y_{γ,p}. We define y_{π,p} as follows:
y_{π,p} = w_{β} y_{β,p} + w_{γ} y_{γ,p}
We take the weight of shortage time, w_{β}, as 0.5 and the weight of total daily stock, w_{γ}, as 0.5.
Similarly, we define Y_{δ} as the difference between the evaluation values for criterion π of policy f and another policy g, using y_{δg} = y_{πf} − y_{πg}. If y_{iδg} is positive, then ordering policy g is more suitable than f for item i. In the present study, we compare the evaluation value of criterion π between the (s, Q) policy and the (R, S) policies. Thus, Y_{δ} is calculated as follows:
Y_{δ} = [y_{π,(s, Q)} − y_{π,(R = 1 week, S)}, …, y_{π,(s, Q)} − y_{π,(R = 4 weeks, S)}]
We then apply canonical correlation analysis to the set of shipping statistics X and the set of evaluation criteria, using either Y_{π} or Y_{δ}. Next, cluster analysis is applied to all items using the shipping statistics that relate to the ordering policy.
3.4 Experimental Results
The results of the canonical correlation analysis are listed in Table 3. We apply the canonical analysis to the sets (σ, π) and (σ, δ). Here, σ is the set of shipping statistics, π is the set of evaluation values, and δ is the set of differences in evaluation values. This notation is also used in Tables 4-6.
Conducting the analysis shown in Table 3, we obtained five canonical variables, ?_{π1}, ?_{π2}, ?_{π3}, ?_{π4}, and ?_{π5}, from the canonical analysis of the sets (σ, π) and four canonical variables, ?_{δ1}, ?_{δ2}, ?_{δ3}, and ?_{δ4}, from the canonical analysis of the sets (σ, δ). We extracted as canonical variables only those for which the canonical correlation coefficient was at least 0.20, the value at which variables can be said to have a weak relation in terms of one group being affected by the other group. Here, in addition to the canonical correlation coefficients, we must consider the absolute value of the canonical loadings. If a loading is large in absolute value, then the statistic has a strong effect on the other group. For example, in Table 3, if the statistics of ?_{π1} in X are listed in order of the strength of their relation to Y, then the canonical loadings are σ_{6} = 0.751, σ_{13} = 0.695, σ_{11} …. So, σ_{6}, which is the average of the interval between dates of shipping (BDS), BDSAvg, is the most significant statistic for the first canonical variable, ?_{π1}, of Y. Likewise, if the evaluation values of ?_{π1} in Y are listed in order of their relation to X, these are π_{5} = 0.732, π_{1} = 0.558, …. So, π_{5}, which is the evaluation value (EV; R = 4 weeks, S), the EV of the (R, S) policy when R = 4 weeks, and π_{1}, which is EV (R = 1 week, S), the EV of the (R, S) policy when R = 1 week, were strongly affected by the first canonical variable, ?_{π1}, of X. Thus, either π_{5} or π_{1} and σ_{6}, or possibly σ_{13}, are strongly related.
The results of the correlation analysis are listed in Table 4. In Table 4, (σ_{6}, π_{2}) represents the correlation between the average of BDS and the EV of ordering policy (R, S) for R = 1 week, which is 0.04. From Table 4 alone, it can seem that there is no relation between the shipping statistics and the ordering policy.
The last result is interesting in light of a comparison of Tables 3 and 4. Table 4 gives simple correlations between shipping statistics and evaluation values of ordering policies. By comparing these tables, it is possible to say that no individual statistic and individual evaluation value of an ordering policy clearly relate to each other, but groups of statistics and groups of evaluation values do. A comparison of Tables 3 and 5 reveals similar aspects, where Table 5 shows the results of the multiple linear regression analysis. For example, in Table 5, all entries of the σ_{13} and σ_{11} rows indicate a lack of a strong relation, whereas Table 3 does indicate a strong relation.
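The selection rule used for Table 3 (keep canonical variables whose correlation coefficient is at least 0.20, then rank statistics by the absolute value of their loadings) can be sketched in a few lines. The function names and all numbers below are hypothetical placeholders, not values from Table 3.

```python
def select_canonical_variables(correlations, threshold=0.20):
    """Indices of canonical variables whose correlation is at least the threshold."""
    return [i for i, r in enumerate(correlations) if r >= threshold]


def rank_by_loading(loadings):
    """Sort (name, loading) pairs by absolute loading, strongest first."""
    return sorted(loadings.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical canonical correlation coefficients for four canonical variables.
correlations = [0.62, 0.35, 0.21, 0.12]
kept = select_canonical_variables(correlations)

# Hypothetical loadings of three statistics on the first canonical variable.
loadings = {"stat_a": 0.10, "stat_b": -0.65, "stat_c": 0.70}
ranked = rank_by_loading(loadings)
```

Ranking by absolute value keeps a strongly negative loading ahead of a weakly positive one, which is why the sign is examined separately when reading the table.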
We therefore conjecture that searching for relationships between the set of shipping statistics and the set of ordering policy evaluations produces results different from those of the other analyses.
By using cluster analysis, we separate all items into two groups, also called clusters. Table 6 shows the averages of each statistic for each separated group, in columns χ_{α1} and χ_{α2}, and the averages of each statistic over all items, in column χ_{αall}. In the cluster analysis, we use the Ward method and calculate the distance between items from the statistics σ_{6}, σ_{11}, and σ_{13}, those for which the canonical correlation analysis showed a relationship between shipping statistics and evaluation values of ordering policies in ?_{π1}, the first canonical variable of the canonical analysis of the sets (σ, π). The signs "+" and "−" to the right of the values in columns χ_{α1} and χ_{α2} are relative to χ_{αall}. For example, the value of (σ_{6}, χ_{α1}) is higher than (σ_{6}, χ_{αall}). Thus, the sign to the right, separated by a dashed line, is "+". From Table 6, we can observe that there are trends within each cluster.
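The clustering step and the sign comparison against the overall average can be sketched as follows. This is a small pure-Python version of Ward's criterion (merge the pair of clusters whose union increases the within-cluster sum of squares the least), not the SPSS procedure used in the study; the function names and the six three-statistic "items" are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(points, k):
    """Merge clusters greedily until k remain; returns lists of item indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([points[m] for m in clusters[i]],
                                 [points[m] for m in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def sign_vs_overall(points, cluster):
    """'+' where the cluster mean exceeds the overall mean, '-' otherwise."""
    overall = centroid(points)
    local = centroid([points[m] for m in cluster])
    return ["+" if l > o else "-" for l, o in zip(local, overall)]


# Hypothetical items described by three shipping statistics each.
points = [[10, 1, 1], [11, 2, 1], [9, 1, 2],
          [2, 8, 9], [1, 9, 8], [3, 9, 9]]
clusters = ward_clusters(points, k=2)
signs_first = sign_vs_overall(points, clusters[0])
```

With real data the statistics would normally be put on a comparable scale before computing distances; that step is omitted here for brevity.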
From the results shown in Table 3 and the σ rows in Table 6, it is possible to make predictions. For example, looking at the ?_{π1} column of Table 3, let us consider σ_{6}, σ_{11}, and σ_{13}, which were the top three in terms of the strength of their relation in X to Y of column ?_{π1}. In Table 3, the values of those σ's, (σ_{6}, ?_{π1}), (σ_{11}, ?_{π1}), and (σ_{13}, ?_{π1}), are greater than zero, less than zero, and less than zero, respectively; thus, they correspond to the signs (+, −, −). Likewise, in the χ_{α1} column of Table 6, the signs of (σ_{6}, χ_{α1}), (σ_{11}, χ_{α1}), and (σ_{13}, χ_{α1}) are (+, −, −). Therefore, we can predict that (π_{1}, χ_{α1}) and (π_{5}, χ_{α1}) in Table 6 will follow the same trend as that represented in the ?_{π1} column of Table 3, that is, they will be less than the average.
In a similar way, in the ?_{π2} column of Table 3, let us consider σ_{8}, σ_{12}, and σ_{14}, which were the top three in terms of the strength of their relation in X to Y of column ?_{π2}. In Table 3, all three values were less than zero. In the χ_{α2} column of Table 6, (σ_{8}, χ_{α2}), (σ_{12}, χ_{α2}), and (σ_{14}, χ_{α2}) correspond to (−, −, −). From these signs, we can predict that (π_{2}, χ_{α2}) and (π_{3}, χ_{α2}), and possibly (π_{4}, χ_{α2}) and (π_{1}, χ_{α2}) in Table 6, will follow the same trend as that represented in the ?_{π2} column of Table 3, which means they will be less than the average.
Next, in the ?_{π1} column of Table 3, (σ_{6}, ?_{π1}), (σ_{11}, ?_{π1}), and (σ_{13}, ?_{π1}) have signs (+, −, −); in the χ_{α1} column of Table 6, (σ_{6}, χ_{α1}), (σ_{11}, χ_{α1}), and (σ_{13}, χ_{α1}) have the same signs, (+, −, −). Thus, as we would predict, (π_{1}, χ_{α1}), (π_{2}, χ_{α1}), and (π_{3}, χ_{α1}) in Table 6 follow the same trend as that shown in the ?_{π1} column of Table 3, meaning they are less than the average.
When comparing the evaluation values of policies, if the average performance of the ordering policies is not significantly different and there are complementarities, such as between π_{1} and π_{3}, then the difference in evaluation values indicates which ordering policy should be applied. For example, from Table 6, applying the policy underlying the evaluation value π_{1}, which is the (s, Q) policy, to group χ_{α1} and applying the policy underlying the evaluation value π_{3} to group χ_{α2} is superior to using either policy alone.
4. CONCLUSION
In this paper, we focused on how to categorize items so as to select a suitable ordering policy, based on shipping statistics and an evaluation of the ordering policies.
From a practical point of view, our proposed method lets practitioners easily find the shipping statistics that are significant for selecting an ordering policy and efficiently select an ordering policy for each item based on those statistics. In the phase of finding significant shipping statistics, we can look at the relation between the set of shipping statistics and the set of evaluation values of ordering policies.
In the numerical experiments, actual shipping data were used for calculating shipping statistics and simulating ordering policies. To evaluate the ordering policies, we considered the simulation results of the (s, Q) and (R, S) policies for each item. Then, we applied canonical correlation analysis between the set of shipping statistics and the set of evaluation values of the ordering policies.
Using the results, we showed the effectiveness of grouping items by shipping statistics. Because the groups separated by cluster analysis have different trends in evaluation values, it is possible to see how to select an ordering policy from shipping statistics. For example, by selecting ordering policies appropriately, we can reduce both shortage time and total daily stock.
Furthermore, we conjecture that the results of applying our method would vary according to the demand forecasting method, demand uncertainty, parameter settings for ordering policies, evaluation criteria settings and criteria of shifting ordering policy. By analyzing the interactions of these factors, we would like to extend the model beyond inventories to consider the global optimization of the performance of the entire supply chain management system.

[Figure 1.] Flowchart of proposed algorithm.

[Table 1.] Shipping data

[Table 2.] General variable of each item

[Table 3.] Canonical loadings (shipping statistics-evaluation value, differences between policy evaluation values)

[Table 4.] Correlation between shipping statistics and evaluation value of ordering policy

[Table 5.] Standard partial regression coefficients of multiple linear regression analysis between the set of shipping statistics and each evaluation value

[Table 6.] Statistical average of all items and two groups separated by the Ward method (cluster analysis)