Energy Aware Scheduling of Aperiodic RealTime Tasks on Multiprocessor Systems
 DOI : 10.5626/JCSE.2013.7.1.30
 Author: Anne Naveen, Muthukumar Venkatesan
 Organization: Anne Naveen; Muthukumar Venkatesan
 Publish: Journal of Computing Science and Engineering Volume 7, Issue1, p30~43, 30 March 2013

ABSTRACT
Multicore and multiprocessor systems with dynamic voltage scaling architectures are being used as one of the solutions to satisfy the growing needs of high performance applications with low power constraints. An important aspect that has propelled this solution is effective task/application scheduling and mapping algorithms for multiprocessor systems. This work proposes an energy aware, offline, probabilitybased unified scheduling and mapping algorithm for multiprocessor systems, to minimize the number of processors used, maximize the utilization of the processors, and optimize the energy consumption of the multiprocessor system. The proposed algorithm is implemented, simulated and evaluated with synthetic task graphs, and compared with classical scheduling algorithms for the number of processors required, utilization of processors, and energy consumed by the processors for execution of the application task graphs.

KEYWORD
Multiprocessor scheduling , Application mapping , Energy aware scheduling , Task graphs , DVFS

I. INTRODUCTION
Various scheduling policies for realtime multiprocessors systems have been constantly evolving. Scheduling of realtime applications/tasks on multiprocessor systems often has to be combined with task to processor mapping. With the current design trends moving towards multicores and multiprocessor systems for high performance and embedded system applications, the need to develop design techniques to maximize utilization of processor time, and at the same time minimize power consumption, have gained importance. General design techniques to achieve the above goals have been fueled by the development of both hardware and software solutions. Hardware solutions to minimize power and energy consumption include: dynamic voltage and frequency scaling (DVFS) processors, dynamic power management modules (Advanced Configuration and Power Interface, ACPI), thermal management modules, intelligent energy management (IEM), and heterogeneous multicore or multiprocessor systems [1]. Software solutions for maximizing processor utilization and energy minimizations include: parallelization of instructions, threads, and tasks; effective thread/task scheduling; and mapping algorithms for multicore and multiprocessor systems.
In this work we propose a new energy aware, probability based offline scheduling algorithm of aperiodic realtime tasks for a multiprocessor system. The proposed solution combines the traditional mapping and scheduling algorithm into a unified process. The proposed algorithm is modeled for multiprocessor (homogeneous processors) systems that support dynamic voltage scaling (DVS) processors. The algorithm is based of the forcedirected scheduling (FDS) algorithm [2], most commonly used during the highlevel design synthesis. The algorithm determines an optimal schedule for each task, on a processor operating at a discrete voltage level with constant frequency, which will be able to satisfy the deadline. The algorithm also maximizes the utilization of all processors, and minimizes the energy consumption of the system. For comparison purposes, this work implements four different variations of the scheduling algorithms for multiprocessor systems that support DVS processors. The first three scheduling algorithms implement the scheduling and mapping algorithms as different processes. The first scheduling algorithm, called scheduling after scaling (SaS), performs the mapping of tasks to the processors, and iteratively scales down the voltage of the processors from the maximum operable voltage, based on the slack of the tasks. Once the tasks have been mapped to the processors, voltage levels are determined, and scheduling is performed. In the second scheduling algorithm, called scheduling before scaling (SbS), the tasks are first scheduled based on the deadline (earliest deadline first, EDF) [3]. The SbS scheduling algorithm is similar to the lowenergy EDF algorithm proposed by Manzak and Chakrabarti [4]. Once scheduled, the tasks with slack are assigned to processors with lower operating voltages. In the above two algorithms, processors are added to the system, to satisfy the dependency and deadline constraints. In the third scheduling algorithm, called probability based scheduling (PbS), the scheduling is based on optimizing the probability (probability of utilization) of the tasks executing in a unit time step. The probability function optimizes the utilization of each processor, and hence the utilization of the whole multiprocessor system. Mapping of the tasks and assignment of voltage levels to processors is achieved by determining the probability of utilization of each processor operating at different voltages. The final scheduling algorithm, called energyprobability based scheduling (EPbS), combines the process of scheduling, and mapping or assignment of voltages to processors, together. The probability of utilization function incorporates the energy factor, and the scheduling algorithm determines task operation times and voltage levels of the respective processors. The developed algorithms have been implemented, simulated and evaluated for synthetic realtime applications (standard task graphs). The algorithms are compared for utilization of the processors, number of processors (resources), and energy consumed by the processors.
II. PRIOR WORK
Scheduling algorithms have been classically categorized into online or offline, priority or nonpriority, preemptive or non preemptive, and hard or soft deadline scheduling algorithms. Two of the classical scheduling algorithms for uniprocessor are fixed priority ratemonotonic scheduling (RMS), and dynamic priority EDF algorithms [3]. With the advent of multicore and multiprocessor systems, where each processor is capable of dynamic performance and energy consumptions based on variable voltage and frequency operations, scheduling and mapping algorithms to optimize performance and energy have gained importance. Weiser et al. [5] were the first to propose the DVFS technique. Their approach was to reduce CPU ideal times, by dynamically adjusting DVFS levels in OSlevel scheduling algorithms. Yao et al. [6] and Ishihara and Yasuura [7] proposed an optimal offlinescheduling algorithm for independent and dependent tasks, respectively, to minimize energy consumption.
Aydin et al. [8] proposed a dynamic speculative scheduling algorithm, by exploiting slack times. Zhu et al. [9] proposed scheduling algorithms to improve the above algorithms for multiprocessors for dependent and independent tasks, to share slack times. More recent works [1014] on DVFS do not account for frequency scaling, as analysis shows performance loss on processor frequency scaling. Also, work by Dhiman and Rosing [14] has shown that memory intensive tasks have less dependency on frequency scaling, which is true with modern embedded applications. Also, issues such as task granularity and shared resource contention in caches, memories and buses have to be accounted for. Algorithms for resource contention aware or mitigations have been explored in [1518]. Lee and Zomaya [19] proposed a DVFS energy aware scheduling algorithm for a heterogeneous distributed multiprocessor system. Also, work comparing dynamic power management and DVFS methods has been conducted by Cho and Melhem [20], and has concluded that DVFS scheduling methods provide better energy saving for multiprocessor systems. Shin and Kim [21] have considered the problem of mapping the tasks of the task graphs from realtime applications to available cores, and then determined the DVFS for each core to meet the timing deadlines, while minimizing the power consumption. The problem of determining the optimal voltage schedule for a real time system with fixed priority jobs, implemented on a variable voltage processor, is discussed in [22].
Okuma et al. [23] addressed the problem of less energy consumption, by simultaneously assigning the CPU time and a supply voltage to each task that results in low power. This algorithm uses a variable voltage processor core to schedule the tasks [7]. An online energy aware algorithm for distributed heterogeneous hard realtime systems, based on a modification of the EDF algorithm, is proposed in [8]. The energy reduction is obtained by reducing the speed of the processor when the only ready task in the system has no successor, or when the previous task has not exhausted its worstcase execution time (WCET).
In this work, we present an offline dynamic voltage scaling scheduling algorithm for dependent and nonperiodic task graphs, to optimize energy consumption on a multiprocessor system. The proposed work in this article follows a similar approach to the method presented in [21], but with the following differences and advantages: 1) this work considers task graphs derived from realtime applications, as well as synthetic task graphs; 2) the process of mapping the tasks to a processor, voltage scaling of the processor and scheduling are considered together; and 3) only voltage scaling is considered in this work, for the reasons mentioned above. However, frequency scaling can also be implemented, by considering an appropriate cost function.
III. DEFINITIONS
Definition 1 (Task model). A task model is required as the basis for discussing scheduling. A realtime task is a basic executable entity, which can be scheduled; it can be either periodic or aperiodic, with soft or hard timing constraint. A task is best defined with its main timing parameters. This model includes the following primary parameters.ri  release time of the task
ei  WCET of the task
Di  relative deadline of the task
di  absolute deadline of the task
pi  period of the task
gi  start time of the task
hi  end time of the task
si  slack of the task = di ？ ei
Definition 2 (Scheduling model). For the problem of scheduling, we consider a task set or application A = {T_{0}, T_{1}, …, T_{n}} = {T_{i}}, where each task T_{i} is an aperiodic task with a hard deadline d_{i}, and WCET of e_{i} and bestcase execution time (BCET) of e_{z}. These tasks execute on a multiprocessor system consisting of processing elements PE = {PE_{j}} = {PE_{0}, PE_{1}, …, PE_{m}}, which can operate at a prespecified set of voltages V = {V_{1}, V_{2}, …, V_{p}}, where i < j implies V_{i} > V_{j}. We denote a task T_{i} running on the processor PE_{j} scaled at voltage V_{k} as T_{ik}. In this work we also phrase the WCET of a task T_{i} as its execution time at maximum voltage V_{1}, denoted by e_{i1}. Thus, its execution time at voltage V_{k} is determined as:Also, all tasks T are subject to precedence constraints, and a partial order T_{i} < T_{j} is defined on T, where T_{i} < T_{j} implies that T_{i} precedes T_{j}. The problem of scheduling of an application A is denoted as:
If there exists a precedence operation between any two tasks, such that the execution of a task T_{i} should occur before the execution of task T_{j}, represented as T_{i} < T_{j}, then for each T_{i} ∈ T, t_{i} + e_{i} ≤ d_{i} and t_{n} + e_{n} ≤ D, where D is the absolute deadline of the last task.
Definition 3 (Mapping and voltage scaling). Mapping is the process of assigning each task in an application to a processor, such that the processor executes the task and satisfies the task deadline, as well as other constraints (power, throughput, completion time), if any. For an application A = {T_{1}, T_{2}, …, T_{n}}, where T_{i} is a task instance in the application, mapping a task, denoted as σ(T_{i}), is defined as an assignment of the processor {P_{j}} = {PE_{1}, PE_{2}, …, PE_{m}} = PE for the task T_{i}:Application mapping is defined as mapping of all tasks, T_{i}, in the application A to processors, and is denoted as:
Voltage scaling is the processes of assigning a discrete operating voltage to each processor, such that the processor executes the tasks and satisfies the task deadline, as well as other constraints (power, throughput, completion time), if any. For a set of processors PE = {PE_{1}, PE_{2}, …, PE_{m}} that can be scaled at discrete voltages V = {V_{1}, V_{2}, …,V_{p}}, voltage scaling is denoted as δ(T_{ik}), and is defined as an assignment of voltages (V_{k}) and task (T_{i}) to the processors:
Definition 4 (Processor power model). Assuming that a processor PE_{ik} is scaled at a supply voltage V_{k}, and a frequency f_{j}, and a task T_{i} takes ‘n ’ clock cycles on the processor to complete its execution, the power consumption in the processor is given by:where, C_{L} is the output capacitance, and N_{sw} is the number of switches per clock. Energy consumption by a task T_{i} can be computed as:
From the above equations, we can conclude that lowering the supply voltage drastically reduces the power and energy consumption of the particular processor. The supply voltage also affects the circuit delay. The circuit delay is given by T_{d} = k * V_{k}^{2}/(V_{k} ？ V_{t})^{2}, where k is a constant that is dependent on output capacitance, and V_{t} is the threshold voltage.
Definition 5 (Scheduling power and energy model). The power consumed by a schedule is the total power consumed by all tasks that execute at the time period t, and is denoted as P_{α}(t),Energy consumed by the schedule is the integral of P_{α}(t),
Definition 6 (Problem statement). Given an application A with tasks T_{i}, determine an optimal schedule α(A) with minimal energy consumption for a multiprocessor system PE, while mapping tasks to PEs, σ(A) and voltage scaling the processors, δ(PE) in a multiprocessor system, such that all tasks are executed before their respective deadlines, and the end task before the final deadline D.Minimize and minimize (E_{α}), t_{i} + e_{i} ≤ d_{i} and t_{n} + e_{n} ≤ D. IV. ENERGY AWARE SCHEDULING ALGORITHM
> A. Tasks Deadlines
Since the input task graph considered has only a final hard deadline, the first step is to determine the deadlines of all the tasks.
Find the path with the highest execution time: the path with the highest execution time is determined using a depth first search algorithm that determines all paths, from source to sink, of the task graph. The highest execution time is denoted as emax.
Find the deadlines of the tasks by backtracking: the deadlines of all the tasks are determined by backtracking to each task from the sink task. In order to determine the efficiency of the proposed algorithms the highest execution time emax is relaxed to ？max (D = ？max). This relaxation of the deadline allows the tasks to be operated at a lower voltage, and hence better power and energy consumption.
The deadlines of all other tasks are found by backtracking from the sink task. The deadline of a predeccessor task d_{i} with one sucessor task d_{j} is determined as d_{i} = d_{j}e_{j}. If a task has more than one successor, then the deadline is obtained by choosing the minimum of the differences of the deadlines and execution times of the successors. For example, a task i has successors j, k and l, then d_{i} is given, determined as min(d_{j}e_{j}, d_{k}e_{k}, d_{l}e_{l}). The slack of each task is calculated as the difference of deadline and execution time of each task, s_{i} = d_{i}e_{i}.
> B. Scheduling after Scaling Algorithm (SaS)
The proposed SaS algorithm first allocates voltages to each task, and then progressively schedules and binds the tasks to the processors.
Step 1 (Scaling the voltage of each task). Assume that the tasks can only execute at a prespecified set of voltages, and all tasks are by default assigned the highest voltage. First the tasks are sorted with respect to their slack. Then the task with least slack is made the current task (T_{i} ), and its execution time (e_{i} ) is modified by a “scaling factor”. The scaling factor is defined as the ratio of the current voltage (V_{i} ) assigned to the task, to the minimum voltage (V_{k} ) that can be assigned to the task, provided that the execution time of the task is less than the deadline of the task. The new execution time of the task is denoted as e_{i}. The start times and end times of all the successors of the current task are modified, until the sink. If any task misses its deadline, then the scaling factor is reduced. The new voltage assigned to the task is denoted asV _{(k1)}. Again, the start and end times of the successors are modified, to see if any of the tasks misses its deadline. This procedure is repeated until none of the tasks misses its deadline, and the voltage at which the current task executes is fixed. This voltage scaling is performed on all the tasks in the graph, sorted by its slack time.Step 2 (Assigning the tasks on processors). The tasks are then scheduled on the processors, based on the EDF scheduling algorithm, and the first come first served (FCFS) mapping algorithm. At every instance of the start time of a task, a check is done to see if any processor is idle. The task is scheduled on the idle processor, if any; else a new processor is initialized. The pseudocode of the SaS algorithm is shown in Algorithm 1.Algorithm 1 Scheduling after Scaling (SaS)
Inputs: A = {T1, T2,…, Tn}, PE = {PE0, PE1, …, PEm}
Outputs: σ(A), δ(Tik), α(Ti)
repeat {
assign highest voltage to tasks;
sort tasks based on their slacks;
scale voltages based on slacks;
} until (all tasks are scaled)
repeat {
schedule tasks based on deadlines (EDF);
assign tasks to free processors;
} until (all tasks are scheduled)
> C. Scheduling before Scaling Algorithm (SbS)
This algorithm performs scheduling of the tasks on the available processor, before scaling the task voltages. This algorithm is an extension of the EDF algorithm. The SbS algorithm first performs scheduling, followed by assignment of voltage to the tasks.
Step 1 (Sorting the tasks). This algorithm is different from the SaS algorithm in sorting the tasks. All the tasks are initially sorted with respect to their deadlines (similar to EDF scheduling algorithm).Step 2 (Scheduling of tasks). The SbS algorithm proceeds from the source task to the sink task. A breadthfirst search approach is adopted, to determine the task with the shortest deadline, and is scheduled first. This process is repeated, until all tasks are scheduled.Step 3 (Assigning the tasks to processors). The tasks are then scheduled on the processors. At every instance of the start time of a task, a check is done to see if any processor is idle. The task is scheduled on the idle processor, if any; else a new processor is initialized.Algorithm 2 Scheduling before Scaling (SbS)
Inputs: A = {T1, T2, …, Tn}, PE = {PE0, PE1, …, PEm}
Outputs: σ(A), δ(Tik), α(Ti)
repeat {
assign highest voltage to tasks;
sort tasks based on their deadlines (EDF);
assign tasks to free processors;
} until (all tasks are scheduled)
repeat {
for each processor {
for each task {
scale voltages of based on slacks and deadline;
} }
} until (all tasks are scaled)
Step 4 (Voltage scaling of the task on processors). The schedule produced from the above step is based on task deadlines only, and results in the minimum number of processors required to execute a task set. After scheduling, the voltages of the tasks are scaled, as in the SaS algorithm. The algorithm proceeds by sorting the processor, based on the tasks assigned to the processor. Voltage scaling is performed for each task on the processor. As voltage scaling is preformed on each task, the start and end times of its successor task in the same processor and any other processor, and the subsequent tasks on the same processor, are also modified. The pseudocode of the SbS algorithm is shown in Algorithm 2.> D. Probability based Scheduling Algorithm to Maximize Utilization
The proposed PbS algorithm increases the utilization of the processors, by optimizing the number of processors and energy consumed by the processors, by voltage scaling the tasks (T_{i}) that execute in the processor. The schedule is divided into equal time steps (Δt_{j}). The PbS algorithm determines the probability of utilization P(T_{i}(Δt_{j})) of a task in each time step (Δt_{j}). A task is scheduled to a time step where the probability of utilization of the task is maximum. The probability of utilization is the sum of the probabilities of utilization of the current task (T_{i}) and its successors (T_{k}). Initial probabilities are calculated by assuming all tasks are operating at the highest voltage. When voltage scaling is performed on the tasks, the probability of utilization is dynamically updated, based on the time steps assigned to the voltage scaled task. Finally, the number of processors required to execute the final schedule is calculated. The following steps explain the algorithm in detail.
Step 1 (Determining the time step). All the tasks in the task set A = {T_{1}, T_{2}, …, T_{n}} are assumed to be executing at their BCET, i.e., all the tasks are currently running at the highest voltage. Divide the task graph into time steps, with the smallest execution time of all tasks as the unit of time steps.Step 2 (Bestcase time scheduling). The next step is to perform the bestcase time (BCT) scheduling algorithm. The BCT algorithm starts with the source tasks in the task graph, and proceeds in a breadthfirst order. Each task is scheduled in the earliest available time step (BCT(T_{i})), and only if its predecessors are scheduled. A task T_{i} is characterized by its start time, BCT_{g}(T_{i}), and end time, BCT_{h}(T_{i}). This algorithm determines the earliest possible execution time of the task graph.Step 3 (Worstcase time scheduling). The worstcase time (WCT) scheduling algorithm works exactly in the same way as the BCT algorithm, except that it starts at the sink of the task graph, and proceeds upwards. This algorithm determines the longest possible execution time for the task graph. A task T_{i} is characterized by its start time, WCT_{g}(T_{i}), and end time, WCT_{h}(T_{i}). This algorithm determines the latest possible execution time of the task graph.Step 4 (Determining the bound limit). The algorithm proceeds by determining the bound limits (Δt_{i}) of each task. The bound limit of a task T_{i} is calculated as:where Δt_{i} represents the time steps within which the current task can be scheduled.
Step 5 (Determining the probability of utilization of each task in each time step). The next step in the algorithm is to determine the probability of utilization of each task in each time step, within the bound limit of the task (Δt_{ij}). The probability of utilization of a task T_{i} in a time step Δt_{ij} is represented as PU(T_{i}(Δt_{ij})), where Δt_{ij} ranges from the earliest start time, to the latest end time of the task T_{i}.Step 6 (Distributed probability of utilization). The distributed probability of utilization is the summation of probabilities of utilization of each task in each time step. It signifies the portion of a task that will be executed in a time step. The distribution probability of utilization DPU(Δt_{ij}) for a time step Δt_{ij} is given as:Step 7 (Schedule the task in a time step). The algorithm schedules the task T_{i} at time step Δt_{ij} that has the maximum probability of utilization. Once the task (T_{i}) is fixed to a time step (Δt_{ij}), the probabilities of utilization of its successors (T_{k}) are to be updated.The change in the probability of utilization of a task T_{k} is denoted by (ΔPU(T_{k})). If the current task T_{i} is scheduled to execute in time step t_{j}, then the change in the probability of utilization of T_{i} is determined as ΔPU(T_{i}(Δt_{ij})) = 1 ？ PU(T_{i}(Δt_{ij})), and the change in the probability of utilization of successor task T_{k} is determined as ΔPU(T_{k}(Δt_{ij})) = PU(T_{i}(Δt_{ij})).
Step 8 (Calculate the cumulative probability of utilization of the current task). The algorithm balances the distribution probability, by calculating the cumulative probability of utilization (CPU) of each task to time step assignment, and then selects the tasks with the smallest CPU. The CPU of a task T_{i} in time step Δt_{ij} is determined as CPU(T_{i}(Δt_{ij})) = DPU(Δt_{ij}) * ΔPU(T_{i}(Δt_{ij})). The total CPU of a task that is scheduled from time step Δt_{ij} to Δt_{ik} is calculated as:Step 9 (Calculate the CPU of the successor tasks). Scheduling the current task (T_{i}) directly affects the scheduling of the successor tasks (T_{j}). Hence the successor effect is also considered, while determining the total cumulative probability of utilization TCPU of a task.Step 10 (TCPU). The TCPU of a task determines the feasibility of scheduling the task in that particular time step. The least effect determines the step in which the task has to be executed. TCPU signifies that the task graph will use fewer resources (higher utilization). The TCPU of a task T_{i} is given as:Steps 710 are repeated for all the tasks in the task set, and a final schedule is obtained.
Step 11 (Voltage scaling). Based on the schedule developed above, voltage scaling of the tasks are performed as in the SaS algorithm, and the FCFS mapping algorithm. At every instance of the start time of a task, a check is done to see if any processor is idle. The task is scheduled on the idle processor, if any; else a new processor is initialized. The final step in this algorithm is to find the number of processors required to execute the scheduled task set. By the end of scheduling, all the tasks are ready to be executed at a specified voltage. This algorithm finally returns a schedule in which the task voltages are scaled, making sure none of the tasks miss their deadlines, and that these tasks are executed on processors in such a way that the utilization of the processors is high. The pseudocode of the PbS algorithm is shown in Algorithm 3.Algorithm 3 Probability based Scheduling (PbS)
Inputs: A = {T1, T2, …, Tn}, PE = {PE0, PE1, …, PEm}
Outputs: σ(A), δ(Tik), α(Ti)
repeat {
determine the bound limit for tasks;
compute Probability of Utilization (PU);
compute Distributive PU (DPU);
compute Cummulative PU (CPU);
compute Total CPU (TCPU)
schedule tasks with min. TCPU;
update PU;
} until (all tasks are scheduled)
for each processor {
for each task {
scale voltages of based on slacks and deadline;
} }
} until (all tasks are scaled)
> E. EnergyProbability based Scheduling Algorithm to Minimize Energy
The PbS algorithm proposed above reduces the concurrency of tasks in all time steps, to minimize the resources. However, the energy factor is not considered during scheduling. To achieve this, the energyprobability based scheduling (EPbS) algorithm is proposed. The algorithm is similar to the PbS algorithm, but has an additional factor of voltage included in calculating the TCPU, called the TCPUenergy (TCPUE) factor.
Step 10 (TCPUE factor). Let V = {V_{1}, V_{2}, …, V_{k}} be the set of voltages at which a task can execute. The (TCPUE(T_{ik})) is calculated for different voltages V_{k} for the task T_{i}. The least TCPUE(T_{ik}) determines the voltage at which the task will be executed. After a task is fixed in voltage, the affected distributed probabilities of utilization are updated. This process is repeated for all the tasks, and each one is scheduled for execution at a particular voltage, and the distributed probabilities of utilization are updated dynamically, to get an optimized schedule. After determining the TCPUE, the task is scheduled in the time step where the TCPU is minimal. The TCPUE of a task T_{i} in time step Δt_{j} for each operating voltage V_{k} is determined as follows:The value of CPU(T_{i}) and CPU(T_{j}) for the current task and its sucessor tasks are determined as shown in steps 8 and 9, respectively, in Section IVD. The TCPUE of each task operating at a specific volatge is calculated, by multiplying a voltage factor V_{r}^{2} to the CPU terms. The voltage factor incorporates the energy awareness in the scheduling algorithm. The voltage factor V_{r}^{2} is the ratio of the current operating voltage to the maximum possible operating voltage, V_{k}^{2}/V_{p}^{2}. The algorithm schedules a task at a voltage in the time step where the TCPUE is minimal. Thus in this method, the voltage scaling and mapping are implicitly integrated into the scheduling algorithm. Also, initial voltages are assigned to tasks based on their respective slacks. After a task is assigned a voltage, the affected distributed probabilities of utilization are updated. This process is repeated for all the tasks, and each one is scheduled for execution at a particular voltage, and the TCPUE is updated dynamically to get an optimized schedule. The final step in this algorithm is to find the number of processors required to execute the scheduled task set, as shown in step 11 (Section IVD). The pseudocode of the EPbS algorithm is shown in Algorithm 4.
Algorithm 4 EnergyProbability based Scheduling (EPbS)
Inputs: A = {T1, T2, …, Tn}, PE = {PE0, PE1, …, PEm}
Outputs: σ(A), δ(Tik), α(Ti)
repeat {
determine the bound limit for tasks;
compute Probability of Utilization (PU);
compute Distributive PU (DPU);
compute Cummulative PU (CPU);
compute Total CPUEnergy (TCPUE)
schedule tasks and assign processor;
update PU;
} until (all tasks are scheduled)
V. EXAMPLE
This section explains the SaS, SbS, PbS, and EPbS algorithms, with the aid of an example. Fig. 1 shows an example task graph, consisting of ten tasks, with edges between them indicating the dependencies between the tasks. The source (task 0) and sink (task 11) tasks are redundant tasks that have zero execution time. This task graph is expressed in standard task graph (STG) format, as shown in Table 1. Only the final deadline of the end task in the STG is specified. The first step of all proposed algorithms is to determine the deadline and slack of each task, by backtracking from the sink task, as described in
Section IVA. Each task T_{i} is represented by parameters e_{i}, {T_{j}}, {T_{k}}, g_{i}, h_{i}, d_{i}, and s_{i}, which are the execution time, predecessor task set, successor task set, start time, end time, deadline and slack of the task, respectively. Also, a task executed on a processor operating at a specific voltage is simply refered to as a task operating at a specific voltage, or the voltage assigned to the task.
> A. SaS
Initially, all processors are assumed to be operating at a maximum voltage in the voltage set V = {5 V, 3.3 V, 2.4 V}. The task graph and each task in the task graph meet their respective deadlines when operated at the maximum voltage. Another assumption is that each processor runs at a preset voltage, and voltage switching between tasks is not supported. In the first step, tasks with zero execution time are eliminated, and the tasks are sorted with respect to their slack. Next, voltage scaling is applied to the tasks. For the example, task T_{1} has the least slack (s_{1} = 11.5), and has an initial execution time, e_{1} = 9. The new execution time e_{1} of the task T_{1} at voltage 2.4 V is determined as e_{1} = e_{1}×(5/2.4) = 18.75. The start and end times (？ _{j} and ？ _{j}) of all successor tasks T_{k} of T_{1}, given as T_{k} = {T_{2}, T_{3}, T_{9}, T_{10}}, are updated with respect to the new deadline of task T_{1}, and evaluated to see if they meet their respective deadlines. The above process is called voltage scaling. For the voltage scaled task T_{1}, none of the successor tasks (T_{k}) misses its deadline, and hence the task T_{1} is assigned a voltage of 2.4 V. If any task misses its deadline, then task T_{1} is scaled to a voltage greater than 2.4 V. This process is repeated for voltages greater than 2.4 V, and checked to satisfy deadlines. This procedure is repeated for all tasks in the task graph.
Voltage scaling of all the tasks results in partitioning the tasks into three partitions, where each partition contains tasks operating at a specified voltage. An earliest deadline first scheduling is performed on the task graphs. If a task is to be scheduled at a specific voltage and the processor operating at that specific volatge is busy, a new processor is added to the resource. The schedule developed by the SaS algorithm is shown in Fig. 2. The total number of processors required by the SaS algorithm is 6.
> B. SbS
The SbS algorithm performs scheduling of the tasks on the available processor, before scaling the task voltages.
The algorithm proceeds by first scheduling the tasks based on the EDF scheduling algorithm. The voltage scaling succeeds scheduling by scaling each task in each processor, and determines if the following conditions are satisfied: the current task or any of its successors meet their deadlines, or any of the tasks that are executing on the same processor as the current task do not miss their deadlines. It is important to note that when a task is scaled, all the tasks on that processor are also scaled to the same voltage, because it is assumed that a processor can operate at that specified voltage. As shown in Fig. 3, the SbS algorithm starts by choosing processor P_{1} and task T_{1}. The task T_{1} is voltage scaled to the voltage 2.4 V, and checked to see if it satisfies its deadline. If the deadline of task T_{1} is satisfied, the start and end times of all its successor tasks {T_{2}, T_{3}, T_{9}, T_{10}} are also modified for their start time and end time. If any task in this processor misses its deadline, task T_{1} is scaled to a higher voltage, and their execution times are checked, to see if all tasks satisfy their deadlines. If deadlines are satisfied, the task is operated at that voltage. This procedure is repeated on all the tasks in that processor. The total number of processors required by the EDF algorithm is 4. Fig. 3 shows the final schedule obtained using the SbS algorithm, where the processor P_{1} operates at 5 V, and all other processors are scaled to operate at 2.4 V.
> C. PbS
The first step in this algorithm is to determine the BCT and WCT schedules for the task graph. Since the final deadline of the task graph is ？34.5？ = 34, the whole graph is divided into 34 time steps (Δt_{j}). The bound limit for each task (Δt_{i}) is determined as shown in Equation (10). Table 2a shows the bound limit of each task, along with the BCT and WCT start times (g_{BCT} and g_{WCT}) of each task. For task T_{i}, the task has the earliest start time Δt_{1j} = 0, and the latest start time Δt_{1j} as 11. The next step is to determine the probabilities of utilization of each task in each time step from Δt_{11} to Δt_{111}+ e_{1} = 20 = d_{1}, where e_{1} = 9. These probabilities are determined by using Equation (11). For the example, the probabilities of utilization (PU(T_{1}(Δt_{j}))) for task T_{1} within the range Δt_{1j}, j = 0 to 20 are given in Table 2b. After determining the probabilities of utilization of each task in each time step, the next step is to find the distributed probabilities of utilization (DPU(Δt_{1j})) for each time step. This is done by calculating the sum of probabilities of utilization for all task occurring in each time step. Table 2c lists the distributed probabilities of utilization for time step Δt_{i1}. Then, the CCPU(T_{i}) of the task is determined by scheduling the task in a control step, and finding the variation in probability of utilization, as given by Equations (12) and (13). The TCPU(T_{j}) is again the summation of all CPU(T_{i}) of a task and its successors CPU(T_{j}). Due to the high volume of data, the CPU(T_{i}) and CPU(T_{k}) are omitted. The task T_{j} that has the highest value of TCPU in time step Δt_{1j} is scheduled in that current time step. The probabilities of utilization of subsequent time steps and tasks are updated, and the above processes repeated for sucessor tasks. Throughout this procedure, the tasks are assumed to be running at 5 V.
The next major step is to scale the voltages of the tasks in such a way that neither the task, nor its successors, miss their respective deadlines. The schedule obtained so far assumes that the tasks are assigned 5 V. Each task in this schedule is then voltage scaled, as described in the SaS algorithm. The last and final step in this algorithm is to determine the number of processors required to accommodate all the tasks in the schedule. This is accomplished as explained in step 11 of Section IVD. The processors are assumed to be operating at only one particular voltage,
without the ability to shift voltages. The final schedule for the benchmark program 10.stg is shown in Fig. 4a.
> D. EPbS
The CPU for each task and its sucessor in time step Δt_{i1} are evaluated as shown in step 10 of Section IVD. The TCPUEs of a task at 5.0 V (V_{1}), 3.3 V (V_{2}) and 2.4 V (V_{3}) for the task T_{i} are determined as shown in Equation (16), and are represented as TCPUE(T_{iV1}), TCPU(T_{iV2}), and TCPUE(T_{iV3}), respectively. The voltage factor V_{r}^{2} for the above three cases is given as V_{1}^{2}/V_{1}^{2}, V_{2}^{2}/V_{1}^{2}, and V_{3}^{2}/V_{1}^{2}, respectively. The algorithm schedules a task at a voltage in the time step where the TCPUE is minimal. After a task is assigned a voltage, the affected distributed probabilities of utilization are updated. This process is repeated for all the tasks, and each one is scheduled for execution at a particular voltage; and the TCPUE is
updated dynamically, to get an optimized schedule. The final schedule obtained by EPbS is shown in Fig. 4b.
VI. RESULT
Processors supporting several supply voltages are available. Various supply voltages result in different energy consumption levels for a given task T_{i}. The voltage at which a task is run can be decided before the execution of the task, or sometimes, when the task is executing. In the former method, a particular voltage is assigned to a task, and it is run at that voltage on the processor. In the latter case, the switching of the voltage while the tasks are running on the processor is controlled by the OS. These variable voltage processors operate at different voltage ranges, to achieve different levels of energy efficiency.
All the four scheduling algorithms have been implemented
in C++. The STG set were used to evaluate the performance of the above algorithms. The STG benchmark suite contains both synthetic, and practical realtime application task graphs. The assumptions followed during evaluation are as below:
All the processors operate at a prespecified set of voltages.
All the processors operate at a constant and identical frequency.
Processors cannot shift from one voltage level to another while executing a task set. This constraint allows us to ignore overheads, such as processor voltage transition times, transition energy, states, and steps [24] during scheduling.
The switching capacitance of the processors is constant, and is the same for all the processors.
All the tasks are assumed to be nonpreemptable, aperiodic, dependent tasks.
The performance of the SaS, SbS, PbS, and EPbS is summarized below, based on the results obtained for the various benchmarks, as shown in Tables 35. From Table 3, we conclude that the SbS algorithm results in the smallest required processors to complete the task graphs. However, the SaS and SbS algorithms have more processors operating at 2.4 V (lowest operation voltage), and hence will have larger completeion times for the task graphs. Almost 90% of the processors operate at this voltage. On the other hand, the PbS and EPbS algorithms have processors equally operating at all voltages (distributed almost uniformally). Table 4 shows the average processor power consumed by each algorithm (SaS = 208, SbS = 262, PbS = 351, EPbS = 324), for each benchmark. The following conclusions are derived from Table 4: the SaS and SbS algorithms have lesser average processor power consumption (？25% less) compared to the PbS and EPbS algorithms. Table 5 shows the processor utilization (SaS = 42%, SbS = 43%, PbS = 65%, EPbS = 81%) achieved by the algorithms for the various benchmarks. In this case, the PbS and EPbS algorithms outperform (？35%) the SaS and SbS algorithms. Typically, the SaS algorithm results in lower power consumption and higher completion times. This is evident by the large number of processors and tasks operating at 2.4 V (Table 3). Similarly, the SbS algorithm results in a smaller number of processors, and better completion times. The proposed PbS and EPbS algorithms results in higher processor utilizations, but with smaller completion times (Table 5). Hence, the PbS and EPbS algorithms results in smaller energy consumption (average processor power × completion times (or utilization)), as shown in Fig. 5. In this section we provide a quick time complexity analysis of the various algoirthms implemented in this work. Let ‘
n ’ be the number of tasks, ‘m ’ the number of processors, and ‘k ’ the number of voltage levels of the processors. Also, for the PbS and EPbS scheduling algorithms, let ‘Δt ’ be the fixed time interval for which the probability distibutions are determined. Given the above variables, the worst case complexity of the SaS and SbS algorithms can be derived asO(knlog_{2}n) andO(knm) , respectively. Similarly, the complexity of the PbS and EPbS algorithms can be determined asO(Δtn^{3}) andO(Δtk^{2}n^{3}) , respectively.VII. CONCLUSIONS
Scheduling of applications or task graphs in multiprocessor or multicore systems is an important research question that needs to be addressed, in light of today’s emerging embedded system demands. It is often necessary to develop scheduling and task mapping algorithms that optimize processor utilization and processor power consumption. This work proposes an integrated scheduling and mapping algorithm, called PbS, that optimizes the above requirements. The proposed PbS algorithm, and its variation, called EPbS algorithm, are based on the force directed scheduling algorithm used in highlevel synthesis of digital circuits. The algorithms were implemented and tested with synthetic and real application task graphs (benchmarks). The proposed algorithms are compared with traditional scheduling approaches like FCFS (SaS) and EDF (SbS). The SaS and SbS algorithms also implement an explicit mapping, or voltage scaling algorithms, after and before scheduling, respectively. The algorithms are compared for processor utilization, average power consumed per processor, and the number of processors required to implement the given set of task graphs. Based on the results discussed in Section VI, we can conclude that the PbS and EPbS algorithms provide better processor utilization and lower energy consumption, compared to the SaS and SbS algorithms.

[Fig. 1.] Standard task graph.

[Table 1.] Task graph example

[Fig. 2.] Scheduling after scaling algorithm execution for the sample task graph.

[Fig. 3.] Scheduling after scaling algorithm execution for the sample task graph.

[Table 2a.] BCT and WCT start time and bound limit of each task

[Table 2b.1] Distribution probabilities for task T1 within the bound limit Δt1j

[Table 2c.2] CCPU(T1) for task T1 at time step Δti1

[Fig. 4.] (a) Probability based scheduling algorithm (PbS) execution for the sample task graph and (b) energyPbS algorithm execution for the sample task graph.

[Table 3.] Number of processors required for scheduling

[Table 4.] Average power consumed by each processor in nWatts for the scheduling algorithms

[Table 5.] Utilization of the processors for the scheduling algorithms

[Fig. 5.] Comparison of utilization, average power/processor consumption, and total average energy consumption for the SaS, SbS, PbS and EPbS algorithms.