On-demand delivery of IT resources over the Internet.
The delivery of computing services, including servers, storage, databases, networking, and software, over the Internet.
SCALABLE COMPUTING OVER THE INTERNET
Over the past 60 years, computing technology has undergone changes in:
• Machine architecture
• Operating system platforms
• Network connectivity
• Application workloads
There has been a shift from centralized computing to parallel and distributed computing for solving large-scale problems over the Internet.
Distributed computing has become data-intensive and network-centric.
Large-scale Internet applications leverage parallel and distributed computing to enhance the quality of life and information services.
THE AGE OF INTERNET COMPUTING
Billions of people access the Internet daily. As a result, supercomputer sites and large data centers must provide high-performance computing services to huge numbers of Internet users concurrently.
The Linpack Benchmark, traditionally used to measure HPC performance, is no longer suitable, because modern computing focuses more on handling a large number of tasks efficiently than on solving complex equations quickly.
As a result, there is a shift from high-performance computing (HPC) to high-throughput computing (HTC) systems, which are built with parallel and distributed computing technologies.
To meet growing demand, data centers need better servers, storage, and high-bandwidth
networks.
THE PLATFORM EVOLUTION
1950–1970: Mainframe Era: Large computers (IBM 360, CDC 6400) used by governments and businesses.
1960–1980: Minicomputer Era: Smaller, cost-effective systems (DEC PDP-11, VAX) for businesses and universities.
1970–1990: Personal Computer (PC) Era: VLSI microprocessors led to the rise of affordable PCs.
1980–2000: Portable and Wireless Computing Era: Laptops, mobile devices, and early wireless connectivity.
1990–Present: Cloud and High-Performance Computing Era: Cloud computing, HPC, and HTC powering large-scale applications.
TECHNOLOGIES FOR NETWORK-BASED SYSTEMS
1. Multicore CPUs and Multithreading Technologies
• Advances in CPU Processors
Today, advanced CPUs or microprocessor chips use a multicore architecture with dual, quad, six, or more processing cores. These processors exploit parallelism at the instruction level (ILP) and the thread level (TLP). Over the years, both processor speed and network bandwidth have grown dramatically.
MULTICORE CPU AND MANY-CORE GPU ARCHITECTURES
Multicore CPU and many-core GPU processors can handle multiple instruction threads at different magnitudes today.
In a typical multicore processor, each core is essentially a processor with its own private (L1) cache; multiple cores are housed on the same chip with an L2 cache that is shared by all cores.
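As an illustration (not from the original slides), the short Python sketch below exploits thread-level (task-level) parallelism on a multicore CPU by running one worker process per core; the workload, chunk sizes, and function name are made up for this example.

    # Illustrative TLP sketch: one worker process per core sums a slice of the range.
    # Processes (rather than threads) are used so CPython can keep all cores busy.
    from multiprocessing import Pool, cpu_count

    def partial_sum(bounds):
        lo, hi = bounds
        return sum(i * i for i in range(lo, hi))

    if __name__ == "__main__":
        n, cores = 10_000_000, cpu_count()
        step = n // cores
        chunks = [(i * step, n if i == cores - 1 else (i + 1) * step)
                  for i in range(cores)]
        with Pool(processes=cores) as pool:          # one worker per core
            total = sum(pool.map(partial_sum, chunks))
        print(f"{cores} cores, sum of squares below {n}: {total}")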
3. MEMORY, STORAGE, AND WIDE-AREA NETWORKING
Memory Technology
DRAM chip capacity grew from 16 KB in 1976 to 64 GB in 2011, roughly a 4x increase in capacity every three years.
Memory access time, however, has improved much more slowly, so the memory wall problem gets worse as processors get faster.
For hard drives, capacity increased from 260 MB in 1981 to 250 GB in 2004.
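As a quick back-of-the-envelope check of the DRAM growth rate quoted above (a sketch in Python, using only the figures on this slide):

    # 16 KB (1976) -> 64 GB (2011): verify "roughly 4x every three years".
    import math
    factor = (64 * 2**30) / (16 * 2**10)       # total capacity growth factor
    steps = math.log(factor, 4)                # number of 4x increases
    years_per_step = (2011 - 1976) / steps
    print(round(factor), round(steps), round(years_per_step, 1))  # ~4.2 million, 11, ~3.2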
Disks and Storage Technology
Beyond 2011, disks or disk arrays have exceeded 3 TB in capacity.
The rapid growth of flash memory and solid-state drives (SSDs) also impacts the future of HPC
and HTC systems.
A typical SSD can handle 300,000 to 1 million write cycles per block.
Power consumption, cooling, and packaging will limit large system development.
Wide-Area Networking
Ethernet bandwidth has grown rapidly, from 10 Mbps in 1979 to 1 Gbps in 1999 and to 40 to 100 GbE in 2011.
High-bandwidth networking increases the capability of building massively distributed systems.
Most data centers use Gigabit Ethernet as the interconnect in their server clusters.
System-Area Interconnects
The nodes in small clusters are mostly interconnected by an Ethernet switch or a local
area network (LAN).
A LAN is typically used to connect client hosts to large servers.
A storage area network (SAN) connects servers to network storage such as disk arrays.
Network attached storage (NAS) connects client hosts directly to the disk arrays.
4. VIRTUAL MACHINES AND VIRTUALIZATION MIDDLEWARE
Virtual Machines
A Virtual Machine (VM) is a software-based emulation of a physical
computer.
The VM is built with virtual resources managed by a guest OS to run a specific
application.
Between the VMs and the host platform, we need to deploy a middleware
layer called a virtual machine monitor (VMM).
The VMM is also called a hypervisor.
VM Primitive Operations
VMs can be multiplexed between hardware machines.
A VM can be suspended and stored in stable storage.
A suspended VM can be resumed.
A VM can be migrated from one hardware platform to another.
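As a concrete illustration of these primitives (not from the original slides), the sketch below uses the libvirt Python bindings; the guest name "demo-vm", the save-file path, and the destination host are hypothetical, and a real deployment would add error handling.

    # Sketch of VM primitive operations via libvirt (assumes libvirt-python is
    # installed and a guest named "demo-vm" exists; names and paths are placeholders).
    import libvirt

    conn = libvirt.open("qemu:///system")            # connect to the local hypervisor
    dom = conn.lookupByName("demo-vm")

    dom.suspend()                                    # pause the guest's virtual CPUs
    dom.resume()                                     # resume execution

    dom.save("/var/tmp/demo-vm.state")               # suspend the VM to stable storage
    conn.restore("/var/tmp/demo-vm.state")           # resume it from the saved state

    dest = libvirt.open("qemu+ssh://host2/system")   # another hardware platform
    dom = conn.lookupByName("demo-vm")               # fresh handle after restore
    dom.migrate(dest, libvirt.VIR_MIGRATE_LIVE, None, None, 0)   # live migration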
5. DATA CENTER VIRTUALIZATION FOR CLOUD COMPUTING
Cloud platforms typically choose popular x86 processors; low-cost terabyte disks and Gigabit Ethernet are used to build data centers.
Data center design prioritizes overall efficiency and cost-effectiveness rather than
just maximizing processing speed.
A large data center may be built with thousands of servers. Smaller data centers are
typically built with hundreds of servers.
The cost to build and maintain data center servers has increased over the years.
Only about 30 percent of data center costs goes toward purchasing IT equipment; the remaining 70 percent goes to management and maintenance.
Convergence of Technologies
Hardware virtualization and multicore chips enable dynamic configurations in the cloud.
Utility and grid computing technologies lay the necessary foundation for computing
clouds.
SOA, Web 2.0, and mashups of platforms are pushing the cloud another step forward.
Autonomic computing and data center automation also contribute to the growth of cloud computing.
SOFTWARE ENVIRONMENTS FOR DISTRIBUTED SYSTEMS AND CLOUDS
Service-Oriented Architecture (SOA)
Layered Architecture for Web Services and Grids
Web Services and Tools
The Evolution of SOA
Grids versus Clouds
GRIDS VERSUS CLOUDS
• The boundary between grids and clouds has been getting blurred in recent years.
• For web services, workflow technologies are used to coordinate or orchestrate services, with specifications that define critical business process models such as two-phase transactions.
• In general, a grid system applies static resources, while a cloud emphasizes elastic resources.
• For some researchers, the differences between grids and clouds are limited to dynamic resource allocation based on virtualization and autonomic computing.
• One can build a grid out of multiple clouds.
• This type of grid can do a better job than a pure cloud, because it can
explicitly support negotiated resource allocation.
• Thus one may end up building a system of systems: a cloud of clouds, a grid of clouds, a cloud of grids, or inter-clouds, with SOA as the basic architecture.
TRENDS TOWARD DISTRIBUTED OPERATING SYSTEMS
• The computers in most distributed systems are loosely
coupled.
– This is mainly because each node machine runs its own independent operating system.
• To promote resource sharing and fast communication among node
machines, it is best to have a distributed OS that manages all
resources coherently and efficiently.
• Such a system is most likely to be a closed system, and it will likely
rely on message passing and RPCs for internode communications.
– It should be pointed out that a distributed OS is crucial for upgrading
the performance, efficiency, and flexibility of distributed applications.
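To illustrate the RPC style of internode communication mentioned above, here is a minimal sketch using Python's standard xmlrpc module; the method name, host, and port are placeholders, not part of the original material.

    # --- on the serving node ---
    from xmlrpc.server import SimpleXMLRPCServer

    def load_average():
        return 0.42        # placeholder; a real node would report its own load

    server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
    server.register_function(load_average)
    server.serve_forever()

    # --- on a client node (run separately) ---
    # from xmlrpc.client import ServerProxy
    # node = ServerProxy("http://node1.example.org:8000")
    # print(node.load_average())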
MESSAGE-PASSING INTERFACE (MPI)
• This is the primary programming standard used to develop
parallel and concurrent programs to run on a distributed
system.
– MPI is essentially a library of subprograms that can be
called from C or FORTRAN to write parallel programs
running on a distributed system.
• The idea is to embody clusters, grid systems, and P2P
systems with upgraded web services and utility computing
applications.
– Besides MPI, distributed programming can also be supported with low-level primitives such as the Parallel Virtual Machine (PVM).
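As an illustration (not part of the original slides), the same message-passing model is also available from Python through the mpi4py bindings; a minimal two-process send/receive sketch, launched with something like mpiexec -n 2 python hello_mpi.py:

    # Minimal point-to-point message passing with mpi4py.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()                            # this process's ID in the communicator

    if rank == 0:
        comm.send({"msg": "hello"}, dest=1, tag=11)   # send to rank 1
    elif rank == 1:
        data = comm.recv(source=0, tag=11)            # matching receive
        print("rank 1 received:", data)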
MAPREDUCE
• MapReduce is a web programming model for
scalable data processing on large clusters over large
data sets.
– The model is applied mainly in web-scale search and
cloud computing applications.
• The user specifies a Map function to generate a set
of intermediate key/value pairs.
• Then the user applies a Reduce function to merge
all intermediate values with the same intermediate
key.
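A toy, single-machine Python sketch of this Map/Reduce model (word count): map_fn emits intermediate key/value pairs and reduce_fn merges all values that share a key, while a real framework would shuffle the pairs across many machines.

    from collections import defaultdict

    def map_fn(document):                     # Map: emit intermediate (key, value) pairs
        for word in document.split():
            yield (word, 1)

    def reduce_fn(word, counts):              # Reduce: merge values with the same key
        return (word, sum(counts))

    docs = ["the cloud", "the grid and the cloud"]
    groups = defaultdict(list)
    for doc in docs:                          # map phase
        for key, value in map_fn(doc):
            groups[key].append(value)         # "shuffle": group values by intermediate key
    results = [reduce_fn(k, v) for k, v in groups.items()]
    print(results)                            # [('the', 3), ('cloud', 2), ('grid', 1), ('and', 1)]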
• MapReduce is highly scalable and can exploit high degrees of parallelism at different job levels.
• A typical MapReduce computation process can
handle terabytes of data on tens of thousands or
more client machines:
– Hundreds of MapReduce programs can be executed
simultaneously; in fact, thousands of MapReduce jobs
are executed on Google’s clusters every day.
HADOOP LIBRARY
• Hadoop offers a software platform that was originally
developed by a Yahoo! group.
• The package enables users to write and run applications over vast
amounts of distributed data.
• Users can easily scale Hadoop to store and process petabytes of data
in the web space.
– Also, Hadoop is economical in that it comes with an open source version
of MapReduce that minimizes overhead in task spawning and massive
data communication.
• It is efficient, as it processes data with a high degree of parallelism
across a large number of commodity nodes, and it is reliable in that
it automatically keeps multiple data copies to facilitate redeployment
of computing tasks upon unexpected system failures.
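One simple way to use the platform without writing Java is Hadoop Streaming, which runs arbitrary executables as the Map and Reduce steps; the illustrative word-count scripts below read stdin and write tab-separated key/value pairs to stdout, as Streaming expects (how the streaming JAR is invoked depends on the installation, so that command is omitted here).

    # --- mapper.py: read raw text on stdin, emit tab-separated (word, 1) pairs ---
    import sys
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    # --- reducer.py: input arrives sorted by key, so equal words are adjacent ---
    import sys
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")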