A Novel High Adaptive Fault Tolerance Model in Real Time Cloud Computing
Parveen Kumar
Asst. Professor, Computer Science &
Engineering Department,
National Institute of Technology,
Uttarakhand, India
parveen.it@gmail.com
Gaurav Raj
Asst. Professor,
ASET-CSE
Amity University, Noida.
raj@
Anjandeep Kaur Rai
Department of Computer Science and
Engineering,
Lovely Professional University
Phagwara, India
anjanrai@gmail.com
Abstract: Cloud computing is now used in a variety of fields,
including storage, computation and education. It has emerged from
a number of technologies such as utility computing, grid
computing and cluster computing. It offers advantages such as
on-demand access, resource pooling and device independence, but
it also faces challenges in security, workflow management and
fault tolerance. Here, a novel model (HAFTRC) is proposed that
provides highly adaptive fault tolerance in real time cloud
computing. The model computes the reliability of each virtual
machine on the basis of its cloudlets, MIPS, RAM and bandwidth.
The virtual machine with the highest reliability is chosen as
the winning virtual machine. If two virtual machines end up with
the same reliability, the winner is chosen on the basis of the
priority assigned to them.
Keywords: Reliability, fault tolerance, priority scheduling,
timeliness.
I. INTRODUCTION
Cloud computing refers to a model that provides broad, scalable
and always-available access over the internet to a variety of
resources, such as infrastructure, platforms, software and
storage, which cloud users can access according to their
requirements [1]. It supports the reuse of such resources across
the boundaries of particular organizations [2]. For example,
with cloud computing a user does not need to install Microsoft
Word on each of his 45 workstations; he only needs access to the
internet to use the required software, which is hosted at some
other location, and he does not need to worry about licensing,
installation or the application of patches [3].
Real time systems are employed in a variety of applications such
as railway reservation systems, small mobile phones, robotics,
laser printers, pacemakers and video conferencing [4]. Real time
systems have two main characteristics, namely timeliness and
fault tolerance [5]. Timeliness denotes the property of the
system to work correctly within the prescribed amount of time,
and fault tolerance is the ability of the system to keep working
gracefully even in the presence of a fault, so that the user
does not notice that any fault has occurred in the system [6].
The cloud provides minimum lag and maximum performance to such
systems, but at the same time it increases the chance of errors
because the nodes are located very far from each other [7]. Real
time systems are also very critical in nature, so they need a
mechanism that allows them to keep working even if something
goes wrong in the system. What is needed, therefore, is a
mechanism that allows such systems to work well in the cloud.
Here, a model, HAFTRC, is proposed for providing high fault
tolerance to real time applications on cloud infrastructure.
II. RELATED WORK
A large amount of work has already been done in the area of real
time systems, but there is still a large research space for
fault tolerance in real time systems on cloud infrastructure.
Anjali D. Meshram et al. [8] presented FTMC (Fault Tolerance
Model for Cloud computing), in which virtual machines are made
to run different algorithms and their reliabilities are
calculated based on whether they produce the correct result
within the time limit. If they do, their reliability increases;
if they do not, it decreases. A checkpoint is also added to the
model so that backward recovery can be performed in case of
complete failure of the system. Sheheryar Malik et al. [7]
presented a model in which the system handles faults and makes
its decision according to the reliability of the virtual
machines. The reliability of the virtual machines is adaptive in
nature, i.e. it changes after every computing cycle. A virtual
machine's reliability increases if it produces the correct
result within the time limit and decreases if it fails to do so.
In addition, if a node's reliability keeps decreasing, the node
is removed and a new node is added in its place. The reliability
of every virtual machine is checked against a minimum
reliability level; if a node achieves that level it is accepted,
otherwise the system performs backward recovery. Sheheryar Malik
et al. [9] gave a model based on the idea of time stamped fault
tolerance. This model adopts a methodology from distributed
computing together with a feed forward artificial neural
network. It comprises forward as well as backward recovery
mechanisms. Weights are assigned to the nodes, which are made to
run a variety of algorithms.
978-1-4799-4236-7/14/$31.00 ©2014 IEEE
Figure 1: Proposed Model (HAFTRC) [11]
The proposed model is based on the adaptive reliability of the
virtual machines, i.e. the reliability of a node changes after
every computing cycle. Fault tolerance is achieved on the basis
of the reliability of the virtual machines.
III. PROPOSED MODEL (HAFTRC)
Here, a model named HAFTRC (High Adaptive Fault Tolerance in
Real time Cloud computing) is introduced (Figure 1). This model
handles faults on the basis of the reliability of the virtual
machines. It consists of two types of nodes: the first consists
of a set of virtual machines and an acceptance module, and the
second, the adjudicator node, consists of three modules for
elasticity calculation, reliability calculation and decision
making.
A. Working
This model comprises ‘N’ virtual machines, or nodes, which are
made to run some operation. The Acceptance Module (AM) is
responsible for verifying whether the output produced by each
virtual machine is correct and was produced within the time
limit. On the basis of the result produced by the AM, the
Elasticity Calculation (EC) module checks whether the failed
cloudlets are eligible for some elasticity in terms of CPU
cycles. If they are, they are declared passed; otherwise they
are declared failed. The Reliability Calculation (RC) module is
then responsible for calculating the reliabilities of the
virtual machines. Each virtual machine's reliability is compared
with the System Reliability Level (SRL). The nodes whose
reliability is equal to or greater than the SRL are passed to
the Decision Making (DM) module. The DM module makes the final
decision by selecting the most reliable node from the
reliabilities of the passed virtual machines given by the RC
module. The node with the highest reliability is selected as the
final output. If two nodes share the same highest reliability,
the winning virtual machine is selected according to priority.
Priority is assigned according to MIPS.
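As an illustration of the SRL check described above, the
following Python sketch filters virtual machines by reliability.
The function name, the dictionary layout and the sample
reliabilities are assumptions made for illustration; only the
threshold of 0.6 is taken from the simulation parameters used in
this paper.

```python
SRL = 0.6  # System Reliability Level assumed in the simulation


def vms_meeting_srl(reliabilities, srl=SRL):
    """Return the sorted ids of VMs whose reliability is >= srl.

    reliabilities: dict mapping a VM id to its current reliability
    (hypothetical layout; the RC module would supply these values).
    """
    return sorted(vm for vm, r in reliabilities.items() if r >= srl)


# Hypothetical reliabilities for three VMs after one computing cycle.
passed = vms_meeting_srl({1: 0.5, 2: 0.6, 3: 0.9})
```

With these sample values, VM1 is held back and VM2 and VM3 are
forwarded to the Decision Making module. An empty result means
that no virtual machine reached the SRL in that cycle.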
B. Model Description
The Acceptance Module (AM) is responsible for two things: first,
it checks whether the operation that has been performed is
correct. Second, it makes sure that the operation has been
performed within the prescribed amount of time. Each node, or
virtual machine, takes a particular input, executes the
operation and then produces the output. The AM simply passes the
results of all the nodes to the elasticity module. It also
informs the Elasticity Calculation (EC) module so that it can
determine whether elasticity can be provided to the nodes in
terms of CPU cycles.
The Elasticity Calculation (EC) module analyses the failed
cloudlets and determines whether each is eligible for an
elasticity of 15 CPU cycles. If a cloudlet is eligible, it is
given the extra cycles and its status is changed from fail to
pass. With this approach, more cloudlets end up successful. If a
cloudlet is not eligible for elasticity, it is discarded and
declared failed.
2014 5th International Conference- Confluence The Next Generation Information Technology Summit (Confluence) 139
The Reliability Calculation (RC) module is the heart of this
model. It is responsible for computing the reliability of each
node. The reliability of a virtual machine is adaptive, that is,
it changes after every computing cycle. The reliability of a
virtual machine increases if any of the following conditions
holds:
• The amount of RAM in the host is greater than the amount of
RAM in each virtual machine.
• The MIPS of the host is greater than the MIPS of each virtual
machine.
• The bandwidth of the host is greater than the bandwidth of
each virtual machine.
• All of the cloudlets on the virtual machine succeed.
The reliability of a virtual machine decreases when any of the
above conditions fails. In particular, if any cloudlet fails,
the reliability of the virtual machine on which it was running
decreases to some extent. The more cloudlets fail, the larger
the decrease in reliability.
The Decision Making (DM) module selects the virtual machine with
the highest reliability among all the nodes. If two nodes have
the same highest reliability, the winning virtual machine is
selected according to the priority assigned. The node with the
higher priority is selected as the more reliable node and is
considered the winner. The priority of a virtual machine follows
its MIPS, i.e. the node with the highest MIPS is given the
highest priority, and so on. The model is robust because it
continues to operate even if some of the nodes fail; it stops
only when all the nodes fail.
IV. FAULT TOLERANCE MECHANISM
Here, the algorithms of the various modules are presented.
Algorithm for Acceptance Module (AM)
If (cloudlet.Status == Success && cloudlet finishes in
prescribed time)
Then
    The cloudlet moves to the next stage
Else
    The cloudlet is passed to the Elasticity Calculation (EC) module
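The acceptance test can be sketched in Python as follows. This
is a minimal illustration rather than the CloudSim
implementation used in the experiments; the Cloudlet record and
its field names are hypothetical, and the sample times are
invented.

```python
from dataclasses import dataclass


@dataclass
class Cloudlet:
    # Hypothetical record; field names are assumptions for illustration.
    cloudlet_id: int
    succeeded: bool     # did the VM produce the correct result?
    finish_time: float  # observed completion time
    deadline: float     # prescribed time limit


def acceptance_test(c):
    """Pass only if the result is correct and arrived within the deadline."""
    return c.succeeded and c.finish_time <= c.deadline


def run_acceptance_module(cloudlets):
    """Split cloudlets into (passed, failed); failed ones go on to the EC module."""
    passed = [c for c in cloudlets if acceptance_test(c)]
    failed = [c for c in cloudlets if not acceptance_test(c)]
    return passed, failed


# Invented sample: one timely success, one late result, one wrong result.
sample = [
    Cloudlet(1, True, 8.0, 10.0),
    Cloudlet(2, True, 12.0, 10.0),
    Cloudlet(3, False, 5.0, 10.0),
]
passed, failed = run_acceptance_module(sample)
```

In this sample only cloudlet 1 passes; cloudlets 2 and 3 are
handed to the elasticity check because one is late and the other
is incorrect.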
Algorithm for Elasticity Calculation (EC) Module
If (cloudlet needs at most 15 more CPU cycles to complete its
execution)
Then
    The cloudlet is allowed to complete its execution and is
    marked as passed
Else
    The cloudlet is designated as failed and is not allowed to
    move to the next level
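A minimal Python sketch of this elasticity rule is given below.
It assumes that the number of CPU cycles a failed cloudlet still
needs is known; the function name, the dictionary layout and the
sample cycle counts are hypothetical, and only the budget of 15
cycles comes from the model.

```python
ELASTICITY_CYCLES = 15  # extra CPU cycles granted to a failed cloudlet


def apply_elasticity(failed, budget=ELASTICITY_CYCLES):
    """Decide which failed cloudlets can be rescued.

    failed: dict mapping a cloudlet id to the CPU cycles it still
    needs (hypothetical input format).
    Returns (rescued_ids, discarded_ids), each sorted by id.
    """
    rescued = sorted(cid for cid, need in failed.items() if need <= budget)
    discarded = sorted(cid for cid, need in failed.items() if need > budget)
    return rescued, discarded


# Invented remaining-cycle counts for two failed cloudlets.
rescued, discarded = apply_elasticity({6: 10, 4: 40})
```

Here cloudlet 6 needs no more than 15 cycles and is rescued,
while cloudlet 4 needs more than the budget and stays failed.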
Algorithm for Reliability Calculation (RC) Module
If (amount of RAM, MIPS or bandwidth in the host is less than
the amount of RAM, MIPS or bandwidth in each virtual machine)
Then
    Reliability decreases
Else if (cloudlet fails)
Then
    Reliability decreases
Else
    Reliability increases
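The adaptive update can be sketched in Python as follows. The
model does not state by how much the reliability changes, so the
step of 0.1, the clamping to the range [0, 1], the scaling of
the decrease by the number of failed cloudlets and all sample
resource values are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    ram: int        # amount of RAM
    mips: int       # processing capacity in MIPS
    bandwidth: int  # available bandwidth


STEP = 0.1  # hypothetical adjustment per cycle; not given in the paper


def update_reliability(reliability, host, vm, failed_cloudlets):
    """Return a VM's reliability after one computing cycle, clamped to [0, 1]."""
    host_short = (host.ram < vm.ram or host.mips < vm.mips
                  or host.bandwidth < vm.bandwidth)
    if host_short:
        # The host cannot cover the VM's resource needs.
        reliability -= STEP
    elif failed_cloudlets > 0:
        # More failed cloudlets give a larger decrease.
        reliability -= STEP * failed_cloudlets
    else:
        reliability += STEP
    return min(1.0, max(0.0, reliability))


# Invented host and VM sizes for a single update.
host = Resources(ram=2048, mips=1000, bandwidth=10000)
vm = Resources(ram=512, mips=200, bandwidth=1000)
after_success = update_reliability(0.7, host, vm, failed_cloudlets=0)
after_failure = update_reliability(0.7, host, vm, failed_cloudlets=2)
```

Because the values are floating point numbers, a comparison
against the SRL in practice may need a small tolerance.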
Algorithm for Decision Making (DM) Module
If (no machine has reliability equal to or greater than SRL)
Then
    No machine is declared the reliable machine
Else if (two machines have the same highest reliability)
Then
    The machine with the higher priority is declared the best
    machine
Else
    The machine with the highest reliability is declared the
    reliable machine
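The selection rule can be sketched in Python as follows. The
tuple layout and the sample reliabilities are hypothetical; the
sketch assumes that the candidates have already passed the SRL
check and that priority is given by MIPS, as stated in the model
description.

```python
def decide(candidates):
    """Pick the winning VM among those that met the SRL.

    candidates: list of (vm_id, reliability, mips) tuples
    (hypothetical layout). The highest reliability wins and a tie
    is broken in favour of the higher MIPS.
    Returns the winning vm_id, or None if there is no candidate.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: (c[1], c[2]))
    return best[0]


# Invented reliabilities: VM2 and VM3 tie, so MIPS decides.
winner = decide([(2, 0.8, 300), (3, 0.8, 400)])
```

Sorting on the pair (reliability, MIPS) keeps the tie-break in
one place, so the same call covers both the ordinary case and
the equal-reliability case.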
V. EXPERIMENTS AND RESULTS
The High Adaptive Fault Tolerance in Real Time Cloud Computing
(HAFTRC) model is implemented in the CloudSim simulator. The
version used is CloudSim 3.0.2, which is a bug-fix release. It
has the following updates over the previous version, CloudSim
3.0.0:
• The problem with the ant class path declaration has been fixed.
• The calculation of MIPS in
PowerVmAllocationPolicyMigrationAbstract.findHostForVm() has
been addressed.
• References have been updated to the CCPE paper [12].
Here, 3 virtual machines are created and two tasks are run on
each virtual machine, i.e. there are 6 tasks, or cloudlets, in
total. The simulation assumes the following parameters:
• The SRL (System Reliability Level) value is assumed to be 0.6.
• An elasticity of 15 CPU cycles is provided to failed
cloudlets.
The MIPS of the virtual machines is varied and the results are
recorded.
Case 1: Here, only the MIPS of VM1 is changed.
MIPS of VM1: 200, VM2: 300, VM3: 400
In this case (Figure 2), all the cloudlets run on the three
virtual machines. Cloudlets 6 and 4 fail, and both are moved to
the elasticity calculation module, where cloudlet 6 is given
elasticity and therefore passes.
Figure 2: MIPS of VM1 changed- VM1: 200, VM2: 300, VM3: 400
After that, only cloudlet 4 has failed status. The reliability
is then calculated on the basis of MIPS, RAM and bandwidth; as
every host has all of these parameters greater than its virtual
machines, this counts in their favour. However, because of the
failure of cloudlet 4, the reliability of VM1 decreases. The
virtual machines are then checked to see which of them have
reliability greater than or equal to the SRL. Here, two virtual
machines (2 and 3) have reliability greater than or equal to the
SRL, so both pass. Finally, to select the most reliable machine,
the virtual machines are compared according to their priorities.
As virtual machine 3 has higher priority than virtual machine 2,
it is declared the most reliable machine.
MIPS of VM1: 250, VM2: 300, VM3: 400
In this case (Figure 3), all the cloudlets run on the three
virtual machines. Cloudlet 6 fails and is moved to the
elasticity calculation module, where it is given elasticity and
therefore passes. Hence, there are now no failed cloudlets.
Figure 3: MIPS of VM1 changed- VM1: 250, VM2: 300,
VM3: 400
Then the reliability is calculated on the basis of MIPS, RAM and
bandwidth; as every host has all of these parameters greater
than its virtual machines, this counts in their favour. In
addition, because all the cloudlets succeed, reliability
increases as well. The virtual machines are then checked to see
which of them have reliability greater than or equal to the SRL.
Here, all the virtual machines (1, 2 and 3) have reliability
greater than or equal to the SRL, so all pass. Finally, to
select the most reliable machine, the virtual machines are
compared according to their priorities. As virtual machine 3 has
higher priority than virtual machines 1 and 2, it is declared
the most reliable machine.
Case 2: Now, only the MIPS value of VM2 is changed; the rest are
kept the same.
MIPS of VM1:200, VM2:250, VM3:400
In this case (Figure 4), all the cloudlets run on the three
virtual machines.
Figure 4: MIPS of VM2 changed- VM1: 200, VM2: 250,
VM3: 400
Cloudlets 6 and 4 fail, and both are moved to the elasticity
calculation module, where cloudlet 6 is given elasticity and
therefore passes. After that, only cloudlet 4 has failed status.
The reliability is then calculated on the basis of MIPS, RAM and
bandwidth; as every host has all of these parameters greater
than its virtual machines, this counts in their favour. However,
because of the failure of cloudlet 4, the reliability of VM1
decreases. The virtual machines are then checked to see which of
them have reliability greater than or equal to the SRL. Here,
two virtual machines (2 and 3) have reliability greater than or
equal to the SRL, so both pass. Finally, to select the most
reliable machine, the virtual machines are compared according to
their priorities. As virtual machine 3 has higher priority than
virtual machine 2, it is declared the most reliable machine.
MIPS of VM1:200, VM2:410, VM3:400
In this case (Figure 5), 4 cloudlets run on two virtual
machines. Of the running cloudlets, cloudlet 4 fails and is
passed to the elasticity calculation module.
Figure 5: MIPS of VM2 changed- VM1: 200, VM2: 410,
VM3: 400
As the cloudlet is not eligible for elasticity, it is ultimately
declared failed. Because of this failure, the reliability of the
corresponding virtual machine decreases. VM2 and VM3 are
declared passed, as their reliability is equal to or greater
than the SRL. Finally, VM2 is selected as the winning machine,
as it has higher priority than the other.
Case 3: Now, only the MIPS value of VM3 is changed; the rest are
kept the same.
MIPS of VM1:200, VM2:300, VM3:215
Figure 6: MIPS of VM3 changed- VM1: 200, VM2: 300,
VM3: 215
In this case (Figure 6), all the cloudlets (1 to 6) run on the
three virtual machines. Cloudlets 4, 5 and 6 fail because they
are unable to perform their tasks within the prescribed time.
They are therefore passed on to the elasticity calculation
module, where they are given the chance to become successful.
However, as they are not eligible for the extra CPU cycles, i.e.
they need more than 15 cycles to complete their tasks, they are
declared failed. Because of their failure, the reliability of
the corresponding virtual machines decreases. In the end, only
one virtual machine has reliability equal to or greater than the
SRL, so it is declared the most reliable machine.
MIPS of VM1:200, VM2:300, VM3:450
In this case (Figure 7), 6 cloudlets run on three virtual
machines. One cloudlet fails and is passed to the EC module.
Figure 7: MIPS of VM3 changed- VM1: 200, VM2: 300,
VM3: 450
As it is not eligible for 15 more CPU cycles, i.e. it requires
more than 15 CPU cycles to complete its task, it is declared
failed. In the end, two machines have the same reliability, so
the winning machine is selected according to priority. As VM3
has the highest MIPS, it is declared the most reliable machine.
VI. DISCUSSIONS AND CONCLUSION
Fault tolerance is the capacity of a system to work normally
even in the presence of a fault. Here, a novel model named
HAFTRC (High Adaptive Fault Tolerance in Real-time Cloud
computing) has been proposed. The model works on the principle
of adaptive fault tolerance. The system has two main sets of
modules. One set consists of the virtual machines, on which
tasks (cloudlets) run, together with the acceptance module. The
other set is used for elasticity calculation, reliability
calculation and decision making.
HAFTRC is intended as a reliable option for fault tolerance in
real time computing applications. Its main advantage is that it
continues to function even if some of the cloudlets fail. It
stops only when all the tasks fail.
VII. FUTURE WORK
Like any other model, HAFTRC can be enhanced so that it performs
better in a cloud computing environment.
More features or parameters can be added to the model to make it
more fault tolerant.
More parameters can be added for checking the reliability of the
virtual machines. Here, features such as MIPS and RAM have been
used; in future, parameters such as the number of PEs, users and
hosts can be added.
In the decision making module, priority scheduling has been used
to select the more reliable machine. In future, some other
technique can be used to make the system more fault tolerant.
The concept of elasticity in terms of CPU cycles can also be
extended by increasing it up to a certain level.
In addition, checkpointing can be introduced, which allows the
user to keep a record of the virtual machines that fail so that
these failed machines can be recovered easily in future.
REFERENCES
[1] Mell Peter, Grance Timothy (2011), The NIST Definition of
Cloud Computing, September, p. 7
[2] Harris Torry (2010), Cloud Computing: An Overview, Torry
Harris Business Solutions, January
[3] Velte Anthony T., Velte Toby J., Elsenpeter Robert (2009),
Cloud Computing: A Practical Approach, Tata McGraw Hill
[4] http://my.safaribooksonline.com/book/software-engineering-and-development/9788131700693/introduction/section_1.2
[5] W. T. Tsai, Q. Shao, X. Sun, J. Elston, “Real Time
Service-Oriented Cloud Computing”, School of Computing,
Informatics and Decision System Engineering, Arizona State
University, USA
[6] J. Coenen, J. Hooman, “A Formal Approach to Fault Tolerance
in Distributed Real-Time Systems”, Department of Mathematics and
Computing Science, Eindhoven University of Technology,
Netherlands
[7] Sheheryar Malik and Fabrice Huet (2011), “Adaptive Fault
Tolerance in Real Time Cloud Computing”, 2011 IEEE World
Congress on Services, (pp. 28-287)
[8] Anjali D. Meshram, A. S. Sambare and S. D. Zade (2013),
“Fault Tolerance Model for Reliable Cloud Computing”,
International Journal on Recent and Innovation Trends in
Computing and Communication, Volume 1, Issue 7, (pp. 600-603)
[9] Sheheryar Malik and M. J. Rehman (2005), “Time Stamped Fault
Tolerance in Real Time Systems”, 9th International Multitopic
Conference, IEEE INMIC 2005, (pp. 1-5)
[10] M. Young, The Technical Writer's Handbook. Mill Valley, CA:
University Science, 1989.
[11] Anjandeep Kaur Rai, Parveen Kumar, Pradheep Manisekaran
(2014), “High Adaptive Fault Tolerance in Cloud Computing”, IOSR
Journal of Engineering (IOSRJEN), Vol. 04, Issue 03 (March
2014), (pp. 24-27)
[12] https://code.google.com/p/cloudsim/downloads/detail?name=cloudsim-3.0.2.zip&can=2&q=
[13] Raj Gaurav, Munish Katoch, “Security Implementation through
PCRE Signature over Cloud Network”, Advanced Computing: An
International Journal, May 2012, Vol. 3, No. 3, ISSN: 2229 -
6727 [Online]; 2229 - 726X [Print], (pp. 119-127)
[14] Raj Gaurav, Nitika, Shaveta, “Comparative Analysis of Load
Balancing Algorithms in Cloud Computing”, International Journal
of Advanced Research in Computer Engineering & Technology, Vol.
1, No. 3 (2012), ISSN: 2278-1323, (pp. 120-124)
[15] Raj Gaurav, Ankit Nischal, “Efficient Resource Allocation
in Resource Provisioning Policies Over Resource Cloud
Communication Paradigm”, International Journal on Cloud
Computing: Services and Architecture, June 2012, Vol. 2, No. 3,
ISSN: 2231 - 5853 [Online]; 2231 - 6663 [Print], (pp. 11-18)
[16] Raj Gaurav, Kamaljeet Kaur, “Secure Cloud Communication for
Effective Cost Management System Through MSBE”, International
Journal on Cloud Computing: Services and Architecture, June
2012, Vol. 2, No. 3, ISSN: 2231 - 5853 [Online]; 2231 - 6663
[Print], (pp. 19-30)
2014 5th International Conference- Confluence The Next Generation Information Technology Summit (Confluence) 143

Fault tolerance on cloud computing

  • 1.
    A Novel HighAdaptive Fault Tolerance Model in Real Time Cloud Computing Parveen Kumar Asst. Professor, Computer Science & Engineering Department, National Institute of Technology, Uttarakhand, India [email protected] Gaurav Raj Asst. Professor, ASET-CSE Amity University, Noida. raj@ Anjandeep Kaur Rai Department of Computer Science and Engineering, Lovely Professional University Phagwara, India [email protected] Abstract—Now-a-days, cloud computing is being used in a variety of fields, whether it is storage, computation, education etc. It has emerged from a larger number of technologies like utility computing, grid computing, cluster computing etc. It offers a number of advantages like on-demand access, resource pool, device independence etc. Also, it suffers from various cons like security, workflow management, fault tolerance. Here, a novel model (HAFTRC) has been proposed, which is providing high adaptive fault tolerance in real time cloud computing. The model is based on computing the reliabilities of the virtual machines on the basis of the cloudlets, mips, ram and bandwidth etc. Whosoever virtual machine has the highest reliability is chosen as the winning virtual machine. If at the end, there are two virtual machines, whose reliabilities comes out to be same ,then the winning machine is chosen base on the priority that is assigned to them. Keywords: — Reliability, fault tolerance, priority scheduling, timeliness. I. INTRODUCTION Cloud computing refers to a model that provides a broad, scalable and always available access to a variety of resources like infrastructure, platform, software, storage etc over the internet that can be accessed by the cloud users according to their requirements [1]. It actually supports the reusability of such resources across the boundaries of particular organizations [2]. 
For example, with cloud computing, user don’t need to install Microsoft Word on his 45 workstations, instead he just needs to have access to the internet and he can have the required software that is hosted on some other location and also he don’t need to worry about the licensing as well as installation or application of patches etc [3]. Real time systems are being employed in variety of applications like railway reservation system, small mobile phones, robotics, laser printers, pacemakers, video conferencing etc [4] . Real time systems have two main characteristics viz timeliness and fault tolerance [5].Timeliness denotes the property of the system to work correctly in the prescribed amount of time and fault tolerance is the ability of the system to work gracefully even in the presence of the fault, so that the user doesn’t get to know that any fault has occurred in the system [6]. Cloud provides minimum lag and maximum performance to such system, but also on the same side it increases the chances of errors in the systems as the nodes are located very far from each other [7]. Real time systems are also very critical in nature, so they need such mechanism which will allow them to work even if something mishappens in the system. So, the need of the hour is to have such mechanism which will allow the system to work well in the cloud. Here, a model HAFTRC has been proposed for providing high fault tolerance to real time application in the cloud infrastructure. II. RELATED WORK A large amount of work has been already done in the area of real time systems, but there is large research space for fault tolerance in real time systems on cloud infrastructure. Anjali D. Meshram et al., [8] presented FTMC (Fault Tolerance Model for Cloud computing) according to which virtual machines are made to run different algorithms, and their respective reliabilities are calculated based on whether the virtual machines produce the correct result and that also within the time. 
If they do so, then their reliability increases and similarly decreases as well, if they can’t do so. Also, a checkpoint has been added up in the model so that backward recovery can be performed in case of complete failure of the system. Sheheryar Malik et al., [7], presented a model which makes the system handle the fault and makes the decision according to the reliability of the virtual machines. Moreover, the reliability of virtual machines is adaptive in nature i.e. it changes after every computing cycle. Virtual machine’s reliability increases if it produces the correct result and within time and also it decreases if it fails to do so. In addition to it, if the node’s reliability goes on decreasing, then it is removed and a new node is added in its place. Reliability of every virtual machine is checked against a minimum reliability level; if that level is achieved by node, then it is fine otherwise the system will perform backward recovery. Sheheryar Malik et al. [8] gave a model which is based on the idea of time stamped fault tolerance. In this model, methodology related to distributed computing along with feed forward artificial neural network has been adopted. It comprises of forward as well as backward 138978-1-4799-4236-7/14/$31.00 c 2014 IEEE
  • 2.
    Figure 1: ProposedModel (HAFTRC) [11] recovery mechanism. Weights are assigned to the nodes that are made to run a variety of algorithms. Proposed model is based on the adaptive reliability of the virtual machine, i.e. the reliability of the node changes after every computing cycle. Fault tolerance has been achieved depending on the reliability of the virtual machines. III. PROPOSED MODEL (HAFTRC) Here, a model HAFTRC (High Adaptive Fault Tolerance in Real time Cloud computing) has been introduced (Figure 1). This model handles the fault on the basis of the reliability of the virtual machines. This model consists of two types of nodes: One node consists of a set of virtual machines and acceptance module and the second node: adjudicator node consists of three nodes for elasticity calculation, reliability calculation and decision making. A. Working This model comprises of ‘N’ virtual machines or nodes which are made to run some operation. Then we have Acceptance Module (AM) which is responsible for verifying whether the output that has been produced by virtual machine is correct and that too within time limit or not. On the basis of result that is produced by the AM, Elasticity Calculation module (EC) checks whether the failed cloudlets are liable of having some elasticity in terms of CPU cycles. If they are, then they are declared as passed, otherwise fail. Then, we have Reliability calculation (RC) module, which is responsible for calculating the reliabilities of the virtual machines Also, the virtual machine’s reliability are matched with the System Reliability Level (SRL).The nodes which have reliability equal to or greater than SRL, are passed to the Decision Making module. Decision Making (DM) module makes the final decision of selecting the most reliable node by considering the reliabilities of the passed virtual machines given by RC module. The node with highest reliability is selected as the final output. 
If two nodes have same higher reliability, then winning virtual machine is selected according to priority. Priority is assigned according to MIPS. B. Model Description Acceptance Module (AM) is responsible for two things: first it is checking whether the operation that has been performed is correct or not. Secondly, it makes sure that the operation has been performed in a prescribed amount of time. Each node or the virtual machine takes a particular input, executes the operation and then produces the output. It only passes the result of all the nodes to the elasticity module. It also informs the Elasticity Calculation (EC) module to determine whether elasticity can be provided to the nodes in terms of CPU cycles. Elasticity Calculation(EC) module analyses the cloudlets and then determines whether the cloudlets are applicable to have elasticity of 15 CPU cycles or not. If the cloudlet is applicable to have elasticity then it is given so and then its fail status is changed to pass. Using this approach we can have more successful cloudlets than failed ones. If the cloudlet is not applicable to get elasticity then it is simply discarded and is declared as fail. Reliability Calculation (RC) module is actually the heart of this model. This module is responsible for analyzing the reliabilities of each node. The reliability of virtual machine is adaptive, that is it changes after every 2014 5th International Conference- Confluence The Next Generation Information Technology Summit (Confluence) 139
  • 3.
    computing cycle. Reliabilityof virtual machine increases if any of the condition becomes true: • The amount of ram in host should be greater than the amount of ram in each virtual machine. • The amount of MIPS in host should be greater than the amount of MIPS in each virtual machine. • The bandwidth in host should be greater than the bandwidth in each virtual machine. • If all of the cloudlet gets succeeded then reliability of the virtual machine increases. Reliability of virtual machine decreases when any of the above defined factors gets failed or in case if any of the cloudlet fails, then the reliability of the virtual machine on which it is running decreases by some extent. More the failed cloudlets more will be the decrease in the reliability. Decision Making (DM) selects the virtual machine which is having the highest reliability among all the nodes. If two nodes are having same highest reliability, then the winning virtual machine will be selected according to priority assigned. The node with highest priority will be selected as the more reliable node and then will be considered as winner. Priority of the virtual machine is according to the MIPS of the virtual machine, i.e. the node with highest MIPS is given the highest priority and so on. This model is very reliable as it continues to operate even if one of the nodes fails i.e. until all the nodes fail. IV. FAULT TOLERANCE MECHANISM Here, the algorithms of various nodes have been discussed. 
Algorithm for Acceptance Module (AM)
If (cloudlet.Status = Success && cloudlet finishes in the prescribed time) Then
    The cloudlet moves to the next stage
Algorithm for Elasticity Calculation (EC) Module
If the cloudlet needs at most 15 more CPU cycles to complete its execution Then
    The cloudlet is allowed to complete its execution
Else
    The cloudlet is designated as failed and is not allowed to move to the next level
Algorithm for Reliability Calculation (RC) Module
If (the amount of RAM, MIPS and bandwidth in the host is less than the amount of RAM, MIPS and bandwidth in each virtual machine) Then
    Reliability decreases
Else if the cloudlet fails Then
    Reliability decreases
Else
    Reliability increases
Algorithm for Decision Making (DM) Module
If (the first machine has higher reliability than the second machine) Then
    The first machine is declared the reliable machine
Else if (the two machines have the same highest reliability) Then
    The machine with the higher priority is declared the best machine
Else
    No machine is declared the reliable machine
V. EXPERIMENTS AND RESULTS
The High Adaptive Fault Tolerance in Real Time Cloud Computing (HAFTRC) model is implemented in the CloudSim simulator. The version used is CloudSim 3.0.2, which is a bug-fix release. It has the following updates over the previous version, CloudSim 3.0.0:
• The problem with the ant class path declaration has been fixed.
• The calculation of MIPS in PowerVmAllocationPolicyMigrationAbstract.findHostForVm() has been addressed.
• References have been updated to the CCPE paper [12].
Here, 3 virtual machines have been created and two tasks are run on each virtual machine, i.e. we have a total of 6 tasks, or cloudlets. In this simulation, the following parameters have been assumed:
• The SRL (System Reliability Level) value is assumed to be 0.6.
• An elasticity of 15 CPU cycles is provided to failed cloudlets.
The MIPS of the virtual machines will be changed and the results will be recorded.
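As a rough illustration, the AM, EC and RC algorithms of Section IV, together with the parameters assumed above (an SRL of 0.6 and an elasticity of 15 CPU cycles), can be sketched in Python. This is a minimal sketch and not the authors' CloudSim implementation: the `Cloudlet` and `Vm` classes, their field names, the starting reliability, the adjustment step `STEP`, and the clamping of reliability to the range 0 to 1 are all assumptions made for the example, since the paper does not state by how much reliability changes.

```python
from dataclasses import dataclass, field

SRL = 0.6          # System Reliability Level, from the simulation setup
ELASTICITY = 15    # extra CPU cycles granted to a late cloudlet, from the setup
STEP = 0.1         # ASSUMPTION: the paper gives no adjustment amount


@dataclass
class Cloudlet:
    # All field names are hypothetical, chosen for this sketch.
    ident: int
    correct: bool        # the operation produced a correct result
    cycles_needed: int   # CPU cycles the task requires
    cycles_allowed: int  # CPU cycles available within the prescribed time


@dataclass
class Vm:
    name: str
    mips: int
    reliability: float = SRL  # ASSUMPTION: starting value is not given
    cloudlets: list = field(default_factory=list)


def acceptance(c):
    """AM: the result is correct and was produced in the prescribed time."""
    return c.correct and c.cycles_needed <= c.cycles_allowed


def elasticity(c):
    """EC: a cloudlet passes if at most 15 extra cycles are enough.

    ASSUMPTION: extra cycles cannot rescue an incorrect result.
    """
    return c.correct and c.cycles_needed <= c.cycles_allowed + ELASTICITY


def update_reliability(vm, host_dominates):
    """RC: one adaptive update of a VM's reliability.

    `host_dominates` is True when the host has more RAM, MIPS and
    bandwidth than the virtual machine.
    """
    failed = [c for c in vm.cloudlets
              if not (acceptance(c) or elasticity(c))]
    if not host_dominates:
        vm.reliability -= STEP
    elif failed:
        vm.reliability -= STEP * len(failed)  # more failures, larger decrease
    else:
        vm.reliability += STEP
    # ASSUMPTION: keep the value in [0, 1]; rounding hides float noise.
    vm.reliability = round(min(1.0, max(0.0, vm.reliability)), 2)
    return vm.reliability


vm1 = Vm("VM1", mips=200)
vm1.cloudlets = [
    Cloudlet(1, correct=True, cycles_needed=100, cycles_allowed=100),
    Cloudlet(2, correct=True, cycles_needed=110, cycles_allowed=100),
]
print(vm1.name, update_reliability(vm1, host_dominates=True))
```

In the usage lines, the second cloudlet misses its deadline by 10 cycles, so it fails the acceptance test but is rescued by the elasticity check, and the reliability of the hypothetical VM1 goes up by one step.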
Case 1: Here, only the MIPS value of VM1 is changed.
MIPS of VM1: 200, VM2: 300, VM3: 400
In this case (Figure 2), all the cloudlets run on the three virtual machines. Cloudlets 6 and 4 fail, and both are moved to the elasticity calculation module, where cloudlet 6 is given elasticity, so it passes.
Figure 2: MIPS of VM1 changed - VM1: 200, VM2: 300, VM3: 400
After that, only cloudlet 4 still has failed status. Now the reliability is calculated on the basis of MIPS, RAM and bandwidth. Every host has all of these parameters greater than the virtual machines, which works in the virtual machines' favour. In addition, because cloudlet 4 failed, the reliability of VM1 decreases. The virtual machines are then checked to find which of them have reliability greater than or equal to SRL. Here, both virtual machines 2 and 3 have reliability greater than or equal to SRL, so both pass. Finally, to select the most reliable machine, the virtual machines are compared according to their priorities. As virtual machine 3 has a higher priority than virtual machine 2, it is declared the more reliable machine at the end.
MIPS of VM1: 250, VM2: 300, VM3: 400
In this case (Figure 3), all the cloudlets run on the three virtual machines. Cloudlet 6 fails and is moved to the elasticity calculation module, where it is given elasticity, so it passes. Hence, there are now no failed cloudlets.
Figure 3: MIPS of VM1 changed - VM1: 250, VM2: 300, VM3: 400
Then the reliability is calculated on the basis of MIPS, RAM and bandwidth. Every host has all of these parameters greater than the virtual machines, which works in the virtual machines' favour. In addition, because all the cloudlets succeed, reliability increases as well. The virtual machines are then checked to find which of them have reliability greater than or equal to SRL. Here, all the virtual machines (1, 2 and 3) have reliability greater than or equal to SRL, so all are considered passed. Finally, to select the most reliable machine, the virtual machines are compared according to their priorities. As virtual machine 3 has a higher priority than virtual machines 1 and 2, it is declared the more reliable machine at the end.
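The selection step used in both runs of Case 1 (keep the virtual machines whose reliability reaches the SRL, prefer the highest reliability, and break ties by MIPS-based priority) can be sketched as follows. The function name `select_winner` and the reliability values are hypothetical: the paper does not report numeric reliabilities, so the figures below are chosen only to reproduce the ordering described for the first run (VM1 below the SRL, VM2 and VM3 tied above it). The MIPS values and the SRL of 0.6 are taken from the text.

```python
SRL = 0.6  # System Reliability Level assumed in the simulation


def select_winner(vms, srl=SRL):
    """Return the name of the winning VM, or None if no VM reaches the SRL.

    `vms` maps a VM name to a (reliability, mips) pair.
    """
    passed = {name: pair for name, pair in vms.items() if pair[0] >= srl}
    if not passed:
        return None
    # Tuples compare element by element: reliability first, then MIPS,
    # so MIPS acts as the priority that breaks a reliability tie.
    return max(passed, key=passed.get)


# First run of Case 1: MIPS from the text, reliabilities ILLUSTRATIVE ONLY.
run1 = {"VM1": (0.5, 200), "VM2": (0.7, 300), "VM3": (0.7, 400)}
print(select_winner(run1))
```

With these illustrative values the call returns VM3, which matches the outcome described for Figure 2: VM1 is filtered out by the SRL check, and VM3 beats VM2 on priority because it has the higher MIPS.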
Case 2: Now, only the MIPS value of VM2 is changed; the rest are kept the same.
MIPS of VM1: 200, VM2: 250, VM3: 400
In this case (Figure 4), all the cloudlets run on the three virtual machines.
Figure 4: MIPS of VM2 changed - VM1: 200, VM2: 250, VM3: 400
Cloudlets 6 and 4 fail, and both are moved to the elasticity calculation module, where cloudlet 6 is given elasticity, so it passes. After that, only cloudlet 4 still has failed status. Now the reliability is calculated on the basis of MIPS, RAM and bandwidth. Every host has all of these parameters greater than the virtual machines, which works in the virtual machines' favour. In addition, because cloudlet 4 failed, the reliability of VM1 decreases. The virtual machines are then checked to find which of them have reliability greater than or equal to SRL. Here, both virtual machines 2 and 3 have reliability greater than or equal to SRL, so both pass. Finally, to select the most reliable machine, the virtual machines are compared according to their priorities. As virtual machine 3 has a higher priority than virtual machine 2, it is declared the more
reliable machine at the end.
MIPS of VM1: 200, VM2: 410, VM3: 400
In this case (Figure 5), we have 4 cloudlets running on two virtual machines. Of the cloudlets that are running, cloudlet 4 fails and is passed to the elasticity calculation module.
Figure 5: MIPS of VM2 changed - VM1: 200, VM2: 410, VM3: 400
As the cloudlet is not eligible for elasticity, it is ultimately declared failed. Because of this failure, the reliability of the virtual machine decreases. VM2 and VM3 are then declared passed, as their reliability is equal to or greater than SRL. At the end, VM2 is selected as the winning machine, as it has a higher priority than the other.
Case 3: Now, only the MIPS value of VM3 is changed; the rest are kept the same.
MIPS of VM1: 200, VM2: 300, VM3: 215
Figure 6: MIPS of VM3 changed - VM1: 200, VM2: 300, VM3: 215
In this case (Figure 6), all the cloudlets (1 to 6) run on the three virtual machines. Cloudlets 4, 5 and 6 fail, as they are unable to perform their tasks within the prescribed time. They are therefore passed on to the elasticity calculation module, where they are given the chance to become successful. However, as they are not eligible for the extra CPU cycles, i.e. they need more than 15 cycles to complete their tasks, they are declared failed. Because of their failure, the reliability of the corresponding virtual machines decreases. At the end, only one virtual machine has reliability equal to or greater than SRL, so it is declared the most reliable machine.
MIPS of VM1: 200, VM2: 300, VM3: 450
In this case (Figure 7), we have 6 cloudlets running on three virtual machines. One cloudlet fails and is therefore passed to the EC module.
Figure 7: MIPS of VM3 changed - VM1: 200, VM2: 300, VM3: 450
As it is not eligible for 15 more CPU cycles, i.e.
it requires more than 15 CPU cycles to complete its task, it is declared failed. At the end, we have two machines with the same reliability. In this situation, the winning machine is selected according to priority. As VM3 has the highest MIPS, it is declared the more reliable machine.
VI. DISCUSSIONS AND CONCLUSION
Fault tolerance is the capacity of a system to work normally even in the presence of a fault in the system. Here, a novel model named HAFTRC (High Adaptive Fault Tolerance in Real-time Cloud computing) has been proposed. This model works on the principle of adaptive fault tolerance. The system has two main sets of modules. One set consists of the virtual machines, on which the tasks (cloudlets) run, along with the acceptance module. The other set is used for elasticity calculation, reliability calculation and decision making.
HAFTRC is a very reliable option, as it can be used for fault tolerance in all real-time computing applications. The main advantage of this model is that it continues to function even if some of the cloudlets fail. It stops only when all the tasks fail.
VII. FUTURE WORK
Like any other model, HAFTRC can be enhanced so that it performs better in a cloud computing environment. More features or parameters can be added to this model to render it more fault tolerant. In particular, more parameters can be used to check the reliability of the virtual machines. Here we have used MIPS, RAM and bandwidth; in future, further parameters such as PEs, number of users and hosts can be added. In the decision making module, we have used priority scheduling to select the more reliable machine. In future, some other technique can be used to make the system more fault tolerant. Also, the concept of elasticity in terms of CPU cycles can be enhanced by increasing it up to a certain level. Along with that, checkpointing can be introduced, which allows the user to keep a record of the virtual machines that fail so that these failed machines can be retrieved easily in future.
REFERENCES
[1] Mell Peter, Grance Timothy (2011), The NIST Definition of Cloud Computing, September, p. 7
[2] Harris Torry (2010), Cloud Computing - An Overview, Torry Harris Business Solutions, January
[3] Velte Anthony T., Velte Toby J., Elsenpeter Robert (2009), Cloud Computing: A Practical Approach, Tata McGraw Hill
[4] http://my.safaribooksonline.com/book/software-engineering-and-development/9788131700693/introduction/section_1.2
[5] W. T. Tsai, Q. Shao, X. Sun, J. Elston, “Real Time Service-Oriented Cloud Computing”, School of Computing, Informatics and Decision System Engineering, Arizona State University, USA
[6] J. Coenen, J. Hooman, “A Formal Approach to Fault Tolerance in Distributed Real-Time Systems”, Department of Mathematics and Computing Science, Eindhoven University of Technology, Netherlands
[7] Sheheryar Malik and Fabrice Huet (2011), “Adaptive Fault Tolerance in Real Time Cloud Computing”, 2011 IEEE World Congress on Services (pp. 280-287)
[8] Anjali D. Meshram, A. S. Sambare and S. D. Zade (2013), “Fault Tolerance Model for Reliable Cloud Computing”, International Journal on Recent and Innovation Trends in Computing and Communication, Volume 1, Issue 7 (pp. 600-603)
[9] Sheheryar Malik and M. J. Rehman (2005), “Time Stamped Fault Tolerance in Real Time Systems”, 9th International Multitopic Conference, IEEE INMIC 2005 (pp. 1-5)
[10] M. Young, The Technical Writer's Handbook. Mill Valley, CA: University Science, 1989.
[11] Anjandeep Kaur Rai, Parveen Kumar, Pradheep Manisekaran (2014), “High Adaptive Fault Tolerance in Cloud Computing”, IOSR Journal of Engineering (IOSRJEN), Vol. 04, Issue 03 (March 2014) (pp. 24-27)
[12] https://code.google.com/p/cloudsim/downloads/detail?name=cloudsim-3.0.2.zip&can=2&q=
[13] Raj Gaurav, Munish Katoch, "Security Implementation through PCRE Signature over Cloud Network", Advanced Computing: An International Journal, May 2012, Vol. 3, No. 3, ISSN: 2229-6727 [Online]; 2229-726X [Print], pp. 119-127.
[14] Raj Gaurav, Nitika, Shaveta, "Comparative Analysis of Load Balancing Algorithms in Cloud Computing", International Journal of Advanced Research in Computer Engineering & Technology, Vol. 1, No. 3 (2012), ISSN: 2278-1323, pp. 120-124.
[15] Raj Gaurav, Ankit Nischal, "Efficient Resource Allocation in Resource Provisioning Policies Over Resource Cloud Communication Paradigm", International Journal on Cloud Computing: Services and Architecture, June 2012, Vol. 2, No. 3, ISSN: 2231-5853 [Online]; 2231-6663 [Print], pp. 11-18.
[16] Raj Gaurav, Kamaljeet Kaur, "Secure Cloud Communication for Effective Cost Management System Through MSBE", International Journal on Cloud Computing: Services and Architecture, June 2012, Vol. 2, No. 3, ISSN: 2231-5853 [Online]; 2231-6663 [Print], pp. 19-30.