Operating System
Chapter 1
Presented By:- Dr. Sanjeev Sharma
What is Operating system
• It is a control program that provides an interface between the
computer hardware and the user.
• Part of this interface includes tools and services for the user.
• From Silberschatz (page 3): “An operating system is a
program that acts as an intermediary between a user of a
computer and the computer hardware. The purpose of the OS is
to provide an environment in which the user can execute
programs.
• The primary goal of an OS is thus to make the computer
convenient to use.
• A secondary goal is to use the computer hardware in an
efficient manner.”
Abstract view of the computer System
• Computer Hardware – CPU, memory, I/O devices provide
basic computing resources.
• System and Application Programs – Compilers, database
systems, games, business programs, etc. define the ways the
computing resources are used to solve the user’s problems.
• Operating System – Controls and coordinates the computing
resources among the system and application programs for the
users.
• End User – Views the computer system as a set of
applications. The End User is generally not concerned with
various details of the hardware.
• Programmer – Uses languages, utilities (frequently used
functions) and OS services (linkers, assemblers, etc.) to
develop applications instead of programming the hardware
directly. This reduces complexity by abstracting the details of
machine-dependent calls into APIs, utilities and OS services.
• OS – Masks the hardware details from the programmer and
provides an interface to the system. Manages the computer's
resources. The OS designer has to be familiar with user
requirements and hardware details.
Functions of Operating System
• Memory Management
• Processor Management
• Device Management
• File Management
• Security
• Control over system performance
• Job accounting
• Error detecting aids
• Coordination between other software and users
Memory Management
• Memory management refers to management of Primary
Memory or Main Memory.
• Main memory is a large array of words or bytes where each
word or byte has its own address.
• Main memory provides a fast storage that can be accessed
directly by the CPU.
• For a program to be executed, it must be in main memory.
• An Operating System does the following activities for memory
management:
– Keeps track of primary memory, i.e., which parts of it are in
use and by whom, and which parts are not in use.
– In multiprogramming, the OS decides which process will get
memory when and how much.
– Allocates the memory when a process requests it to do so.
– De-allocates the memory when a process no longer needs it or
has been terminated.
Processor Management
• In multiprogramming environment, the OS decides which
process gets the processor when and for how much time. This
function is called process scheduling. An Operating System
does the following activities for processor management:
– Keeps track of the processor and the status of processes. The
program responsible for this task is known as the traffic controller.
– Allocates the processor (CPU) to a process.
– De-allocates the processor when a process no longer requires it.
Device Management
• An Operating System manages device communication via their
respective drivers. It does the following activities for device
management:
– Keeps track of all devices. The program responsible for this
task is known as the I/O controller.
– Decides which process gets the device when and for how much
time.
– Allocates the device in the most efficient way.
– De-allocates devices.
File Management
• A file system is normally organized into directories for easy
navigation and usage. These directories may contain files and
other directories.
• An Operating System does the following activities for file
management:
– Keeps track of information, location, usage, status, etc. These
collective facilities are often known as the file system.
– Decides who gets the resources.
– Allocates the resources.
– De-allocates the resources.
Other Important Activities
• Security - By means of password and similar other techniques,
it prevents unauthorized access to programs and data.
• Control over system performance - Recording delays
between request for a service and response from the system.
• Job accounting -- Keeping track of time and resources used
by various jobs and users.
• Error detecting aids - Production of dumps, traces, error
messages, and other debugging and error detecting aids.
• Coordination between other software and users -
Coordination and assignment of compilers, interpreters,
assemblers and other software to the various users of the
computer systems.
Operating System
Presented By:- Dr. Sanjeev Sharma
Operating system Evolution
• Let’s see how operating systems have evolved over time.
• This will help us to identify some common features of
operating systems and how and why these systems have been
developed as they are.
• Serial Processing
• Simple Batch Systems (1960)
• Multiprogrammed Batch Systems (1970)
• Time-Sharing and Real-Time Systems (1970)
• Personal/Desktop Systems (1980)
• Multiprocessor Systems (1980)
• Networked/Distributed Systems (1980)
Early System
• Structure
– Single user system.
– Large machines run from console.
– Programmer/User as operator.
– Paper Tape or Punched cards.
– No tapes/disks in computer.
• Significant amount of setup time.
• Low CPU utilization.
• But very secure
Batch Operating System
• The users of a batch operating system do not interact with the
computer directly. Each user prepares his job on an off-line
device like punch cards and submits it to the computer
operator. To speed up processing, jobs with similar needs are
batched together and run as a group. The programmers leave
their programs with the operator and the operator then sorts the
programs with similar requirements into batches.
• The problems with Batch Systems are as follows:
– Lack of interaction between the user and the job.
– CPU is often idle, because the speed of the mechanical I/O
devices is slower than the CPU.
– Difficult to provide the desired priority.
• Here, the pooled jobs are first read by the batch monitor and
grouped, placing jobs with similar needs in the same batch. The
batched jobs are then executed automatically one after another,
saving time by performing common activities (like loading the
compiler) only once. This results in improved system utilization
due to reduced turnaround time.
• The operating system (called the resident monitor) manages the
execution of each program in the batch.
– Monitor utilities are loaded when needed.
– Resident monitor is always in main memory and available for
execution.
– The resident monitor usually has the following parts:
• Control card interpreter – responsible for reading and
carrying out instructions on the cards.
• Loader – loads systems programs and applications
programs into memory.
• Device drivers – know special characteristics and
properties for each of the system’s I/O devices.
• One big problem associated with these operating systems is
that the CPU was often idle.
• To overcome this, spooling can be used.
• Uniprogramming Until Now
– I/O operations are exceedingly slow (compared to instruction
execution).
– A program containing even a very small number of I/O
operations will spend most of its time waiting for them.
– Hence: poor CPU usage when only one program is present in
memory.
Memory model for uniprogramming
Multiprogrammed Batch Systems
• Several jobs are kept in main memory at the same time, and
the CPU is multiplexed among them.
• If memory can hold several programs, then CPU can switch to
another one whenever a program is waiting for an I/O to
complete – This is multiprogramming.
Time-sharing Operating Systems
• Time-sharing is a technique which enables many people,
located at various terminals, to use a particular computer
system at the same time.
• Time-sharing or multitasking is a logical extension of
multiprogramming.
• The processor's time, shared among multiple users
simultaneously, is termed time-sharing.
• The main difference between Multiprogrammed Batch
Systems and Time-Sharing Systems is that in case of
Multiprogrammed batch systems, the objective is to maximize
processor use, whereas in Time-Sharing Systems, the objective
is to minimize response time.
• Multiple jobs are executed by the CPU by switching between
them, but the switches occur so frequently that the user can
receive an immediate response. For example, in transaction
processing, the processor executes each user program in a
short burst or quantum of computation. That is, if n users are
present, then each user gets a time quantum. When the user
submits a command, the response time is a few seconds at
most.
• Advantages of Timesharing operating systems are as follows:
– Provides the advantage of quick response
– Avoids duplication of software
– Reduces CPU idle time
• Disadvantages of Time-sharing operating systems are as
follows:
– Problem of reliability
– Question of security and integrity of user programs and data
– Problem of data communication
Distributed Operating System
• Distribute the computation among several physically separated
processors.
• Loosely coupled system – each processor has its own local
memory; processors communicate with one another through
various communications lines, such as high-speed buses or
telephone lines.
• These processors are referred to as sites, nodes, computers, and
so on.
• The advantages of distributed systems are as follows:
– With resource sharing facility, a user at one site may be able to
use the resources available at another.
– Speedup the exchange of data with one another via electronic
mail.
– If one site fails in a distributed system, the remaining sites can
potentially continue operating.
– Better service to the customers.
– Reduction of the load on the host computer.
– Reduction of delays in data processing.
Network Operating System
• A Network Operating System runs on a server and provides
the server the capability to manage data, users, groups,
security, applications, and other networking functions. The
primary purpose of the network operating system is to allow
shared file and printer access among multiple computers in a
network, typically a local area network (LAN), a private
network or to other networks.
• Examples of network operating systems include Microsoft
Windows Server 2003, Microsoft Windows Server 2008,
UNIX, Linux, Mac OS X, Novell NetWare
• The advantages of network operating systems are as follows:
– Centralized servers are highly stable.
– Security is server managed.
– Upgrades to new technologies and hardware can be easily
integrated into the system.
– Remote access to servers is possible from different locations and
types of systems.
• The disadvantages of network operating systems are as follows:
– High cost of buying and running a server.
– Dependency on a central location for most operations.
– Regular maintenance and updates are required.
Real-Time Operating System
• A real-time system is defined as a data processing system in which the time
interval required to process and respond to inputs is so small that it controls
the environment. The time taken by the system to respond to an input and
display the required updated information is termed the response time. So
in this method, the response time is much shorter than in online
processing.
• Real-time systems are used when there are rigid time requirements on the
operation of a processor or the flow of data and real-time systems can be
used as a control device in a dedicated application. A real-time operating
system must have well-defined, fixed time constraints, otherwise the
system will fail. For example, Scientific experiments, medical imaging
systems, industrial control systems, weapon systems, robots, air traffic
control systems, etc.
• There are two types of real-time operating systems.
• Hard real-time systems
Hard real-time systems guarantee that critical tasks complete on time. In
hard real-time systems, secondary storage is limited or missing and the data
is stored in ROM. In these systems, virtual memory is almost never found.
• Soft real-time systems
• Soft real-time systems are less restrictive. A critical real-time task gets
priority over other tasks and retains the priority until it completes. Soft real-
time systems have more limited utility than hard real-time systems. For example,
multimedia, virtual reality, advanced scientific projects like undersea
exploration and planetary rovers, etc.
Operating System
Concurrent Process and Scheduling
Process Concept
• A process is a program in execution. A process is not the same as
the program code, but a lot more than it. A process is an 'active'
entity, as opposed to a program, which is considered a
'passive' entity. Attributes held by a process include hardware
state, memory, CPU, etc.
• To put it in simple terms, we write our computer programs in a
text file and when we execute this program, it becomes a
process which performs all the tasks mentioned in the program.
• When a program is loaded into the memory and it becomes a
process, it can be divided into four sections ─ stack, heap, text
and data.
Process Section
• Stack:- The process Stack contains the temporary data such as
method/function parameters, return address and local
variables.
• Heap:-This is dynamically allocated memory to a process
during its run time.
• Text:- This section is made up of the compiled program code, read in
from non-volatile storage when the program is launched.
• Data:-This section contains the global and static variables.
Process State
• When a process executes, it passes through different states. These states
may differ in different operating systems, and the names of these states are
also not standardized.
• In general, a process can have one of the following five states at a time.
• New:- This is the initial state when a process is first started/created.
• Ready:- The process is waiting to be assigned to a processor. Ready
processes are waiting to have the processor allocated to them by the
operating system so that they can run. A process may come into this state
after the Start state, or while running, if it is interrupted by the scheduler
so that the CPU can be assigned to some other process.
• Running:-Once the process has been assigned to a processor by the OS
scheduler, the process state is set to running and the processor executes its
instructions.
• Waiting:- The process moves into the waiting state if it needs to
wait for a resource, such as waiting for user input, or waiting for
a file to become available.
• Terminated or Exit:- Once the process finishes its execution, or it
is terminated by the operating system, it is moved to the terminated
state where it waits to be removed from main memory.
Diagram of Process State
Process Control Block (PCB)
A Process Control Block is a data structure maintained by the
Operating System for every process. The PCB is identified by an
integer process ID (PID). A PCB keeps all the information needed to
keep track of a process
Information associated with each process
• Process state:-The current state of the process i.e., whether it is
ready, running, waiting, or whatever.
• Program counter:- Program Counter is a pointer to the address of
the next instruction to be executed for this process.
• CPU registers:- The contents of the various CPU registers, which
must be saved when the process leaves the running state so that it
can resume execution later.
• CPU scheduling information:- Process priority and other
scheduling information which is required to schedule the process.
• Memory-management information:-This includes the
information of page table, memory limits, Segment table
depending on memory used by the operating system.
• Accounting information:- This includes the amount of CPU time
used for process execution, time limits, execution ID, etc.
• I/O status information:-This includes a list of I/O devices
allocated to the process.
• The PCB is maintained for a process throughout its lifetime,
and is deleted once the process terminates.
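As an illustration only (the field names below are assumptions for this sketch, not any real kernel's layout), the information listed above can be pictured as a C structure:

typedef enum { NEW, READY, RUNNING, WAITING, TERMINATED } proc_state;

typedef struct pcb {
    int         pid;               /* integer process ID (PID)            */
    proc_state  state;             /* current process state               */
    void       *program_counter;   /* address of the next instruction     */
    long        registers[32];     /* saved CPU registers                 */
    int         priority;          /* CPU-scheduling information          */
    void       *page_table;        /* memory-management information       */
    long        cpu_time_used;     /* accounting information              */
    int         io_devices[16];    /* I/O status: devices allocated       */
    struct pcb *next;              /* link used by a scheduling queue     */
} pcb_t;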
Process Control Block
CPU Switch from Process to Process
Process Scheduling Queues
• The OS maintains all PCBs in Process Scheduling Queues. The OS
maintains a separate queue for each of the process states and PCBs
of all processes in the same execution state are placed in the same
queue. When the state of a process is changed, its PCB is unlinked
from its current queue and moved to its new state queue.
• The Operating System maintains the following important process
scheduling queues −
• Job queue − This queue keeps all the processes in the system.
• Ready queue − This queue keeps a set of all processes residing in
main memory, ready and waiting to execute. A new process is
always put in this queue.
• Device queues − The processes which are blocked due to
unavailability of an I/O device constitute this queue.
• Schedulers are special system software which handle process
scheduling in various ways. Their main task is to select the
jobs to be submitted into the system and to decide which
process to run. Schedulers are of three types −
• Long-Term Scheduler
• Short-Term Scheduler
• Medium-Term Scheduler
Long Term Scheduler
• It is also called a job scheduler. A long-term scheduler
determines which programs are admitted to the system for
processing. It selects processes from the queue and loads them
into memory for execution. Process loads into the memory for
CPU scheduling.
• The primary objective of the job scheduler is to provide a
balanced mix of jobs, such as I/O bound and processor bound.
It also controls the degree of multiprogramming. If the degree
of multiprogramming is stable, then the average rate of process
creation must be equal to the average departure rate of
processes leaving the system.
Short Term Scheduler
• It is also called the CPU scheduler. Its main objective is to
increase system performance in accordance with the chosen set
of criteria. It is the change of ready state to running state of the
process. CPU scheduler selects a process among the processes
that are ready to execute and allocates CPU to one of them.
• Short-term schedulers, also known as dispatchers, make the
decision of which process to execute next. Short-term
schedulers are faster than long-term schedulers.
Medium Term Scheduler
• Medium-term scheduling is a part of swapping. It removes the
processes from the memory. It reduces the degree of
multiprogramming. The medium-term scheduler is in-charge
of handling the swapped out-processes.
• A running process may become suspended if it makes an I/O
request. A suspended process cannot make any progress
towards completion. In this condition, to remove the process
from memory and make space for other processes, the
suspended process is moved to the secondary storage. This
process is called swapping, and the process is said to be
swapped out or rolled out. Swapping may be necessary to
improve the process mix.
Ready Queue vs I/O queue
Representation of Process Scheduling
Action of Medium Term Scheduler
Context Switch
• When CPU switches to another process, the system must save
the state of the old process and load the saved state for the
new process
• Context-switch time is overhead; the system does no useful
work while switching
• Time dependent on hardware support
Inter Process Communication
• A process can be of two types:
• Independent process.
• Co-operating process.
• An independent process is not affected by the execution of other
processes while a co-operating process can be affected by other
executing processes.
• Though one might think that processes running independently
will execute very efficiently, in practice there are many
situations where their co-operative nature can be utilised for
increasing computational speed, convenience and modularity.
• Inter-process communication (IPC) is a mechanism which allows
processes to communicate with each other and synchronize their actions.
The communication between these processes can be seen as a
method of co-operation between them.
• There are numerous reasons for providing an environment or situation
which allows process co-operation:
• Information sharing: Since a number of users may be interested in the
same piece of information (for example, a shared file), you must provide an
environment that allows concurrent access to that information.
• Computation speedup: If you want a particular task to run faster, you must
break it into sub-tasks, each of which executes in parallel with
the other tasks. Note that such a speed-up can be attained only when the
computer has multiple processing elements such as CPUs or I/O
channels.
• Modularity: You may want to build the system in a modular way by
dividing the system functions into separate processes or threads.
• Convenience: Even a single user may work on many tasks at a time. For
example, a user may be editing, formatting, printing, and compiling in
parallel.
• To work together, multiple processes require an inter-process
communication (IPC) method which will allow them to
exchange data along with various information. There are two
primary models of inter process communication:
– shared memory and
– message passing.
• In the shared-memory model, a region of memory shared by the
cooperating processes is established. Processes can then
exchange information by reading and writing data in the
shared region. In the message-passing model,
communication takes place by way of messages exchanged
among the cooperating processes.
Process Synchronization
• Concurrent access to shared data may result in data
inconsistency
• Maintaining data consistency requires mechanisms to ensure
the orderly execution of cooperating processes
• Suppose that we wanted to provide a solution to the consumer-
producer problem that fills all the buffers. We can do so by
having an integer counter that keeps track of the number of full
buffers. Initially, counter is set to 0. It is incremented by the
producer after it produces a new buffer and is decremented by
the consumer after it consumes a buffer.
Producer- Consumer Problem solution
using Counter Variable
Code For Producer Process
while (true) {
/* produce an item and put in nextProduced */
while (counter == BUFFER_SIZE)
; // do nothing
buffer [in] = nextProduced;
in = (in + 1) % BUFFER_SIZE;
counter++;
}
Code for Consumer Process
while (true) {
while (counter == 0)
; // do nothing
nextConsumed = buffer[out];
out = (out + 1) % BUFFER_SIZE;
counter--;
/* consume the item in nextConsumed */
}
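For reference, a minimal sketch of the shared declarations that both fragments above assume (the names follow the slides; the item type and buffer size are assumptions for illustration):

#define BUFFER_SIZE 10          /* capacity of the circular buffer             */

typedef int item;               /* assumed item type, for illustration only    */

item buffer[BUFFER_SIZE];       /* shared circular buffer                      */
int in = 0;                     /* next free slot, written by the producer     */
int out = 0;                    /* next full slot, read by the consumer        */
int counter = 0;                /* number of full slots, updated by BOTH sides */

Because counter is read and updated by both processes without any synchronization, the increment and decrement can interleave badly; this is exactly the race condition analysed next.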
Race Condition
• A race condition is a special condition that may occur inside a
critical section. A critical section is a section of code that is executed
by multiple threads and where the sequence of execution for the
threads makes a difference in the result of the concurrent execution
of the critical section.
• When the result of multiple threads executing a critical section may
differ depending on the sequence in which the threads execute, the
critical section is said to contain a race condition. The term race
condition stems from the metaphor that the threads are racing
through the critical section, and that the result of that race impacts
the result of executing the critical section.
• This may all sound a bit complicated, so I will elaborate more on
race conditions and critical sections in the following sections.
• To prevent race conditions from occurring you must make sure
that the critical section is executed as an atomic instruction.
That means that once a single thread is executing it, no other
threads can execute it until the first thread has left the critical
section.
• Race conditions can be avoided by proper thread
synchronization in critical sections.
Race Condition
• counter++ could be implemented as
register1 = counter
register1 = register1 + 1
counter = register1
• counter-- could be implemented as
register2 = counter
register2 = register2 – 1
counter = register2
Consider this execution interleaving with “counter= 5” initially:
S0: producer execute register1 = counter {register1 = 5}
S1: producer execute register1 = register1 + 1 {register1 = 6}
S2: consumer execute register2 = counter {register2 = 5}
S3: consumer execute register2 = register2 - 1 {register2 = 4}
S4: producer execute counter = register1 {counter = 6}
S5: consumer execute counter = register2 {counter = 4}
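As an illustration (not from the slides), a small pthreads program shows the same lost-update effect: two threads update an unprotected shared counter, and the final value is usually not the expected 0.

#include <pthread.h>
#include <stdio.h>

#define ITERATIONS 1000000

int counter = 0;                      /* shared and intentionally unprotected */

void *producer(void *arg) {
    for (int i = 0; i < ITERATIONS; i++)
        counter++;                    /* load, add 1, store: not atomic */
    return NULL;
}

void *consumer(void *arg) {
    for (int i = 0; i < ITERATIONS; i++)
        counter--;                    /* load, subtract 1, store: not atomic */
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    printf("counter = %d (expected 0)\n", counter);   /* typically nonzero */
    return 0;
}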
Critical Section
• A critical section is a region of code in which a process uses a
variable (which may be an object or some other data structure)
that is shared with another process (e.g. code that reads,
modifies, and writes a shared account balance).
• Problems can arise if two processes are in critical sections
accessing the same variable at the same time.
• The critical section problem refers to the problem of how to
ensure that at most one process is executing its critical section
at a given time.
Solution to Critical-Section Problem
1. Mutual Exclusion - If process Pi is executing in its critical
section, then no other processes can be executing in their
critical sections
2. Progress - If no process is executing in its critical section and
there exist some processes that wish to enter their critical
section, then the selection of the processes that will enter the
critical section next cannot be postponed indefinitely
3. Bounded Waiting - A bound must exist on the number of
times that other processes are allowed to enter their critical
sections after a process has made a request to enter its critical
section and before that request is granted
Assume that each process executes at a nonzero speed
No assumption concerning relative speed of the N processes
Peterson Solution
• Peterson's Solution is a classic software-based solution to the critical
section problem.
• Peterson's solution is based on two processes, P0 and P1, which
alternate between their critical sections and remainder sections. For
convenience of discussion, "this" process is Pi, and the "other"
process is Pj. ( I.e. j = 1 - i )
• Peterson's solution requires two shared data items:
• int turn - Indicates whose turn it is to enter into the critical section.
If turn = = i, then process i is allowed into their critical section.
• boolean flag[ 2 ] - Indicates when a process wants to enter into their
critical section. When process i wants to enter their critical section,
it sets flag[ i ] to true.
Peterson’s Solution for Process i
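A minimal sketch of the entry and exit sections for process i (with j = 1 - i), in the pseudocode style of the slides, using the shared flag and turn variables defined on the previous slide:

/* Shared variables (declared on the previous slide): int turn; boolean flag[2]; */

/* Structure of process i ( j == 1 - i ) */
do {
    flag[i] = true;                  /* entry section: I want to enter              */
    turn = j;                        /* give the other process the turn             */
    while (flag[j] && turn == j)
        ;                            /* busy-wait while j wants in and has the turn */

    /* ---- critical section ---- */

    flag[i] = false;                 /* exit section: lower my flag                 */

    /* ---- remainder section ---- */
} while (true);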
• In the entry section, process i first raises a flag indicating a desire to
enter the critical section.
• Then turn is set to j to allow the other process to enter their critical
section if process j so desires.
• The while loop is a busy loop ( notice the semicolon at the end ),
which makes process i wait as long as process j has the turn and
wants to enter the critical section.
• Process i lowers the flag[ i ] in the exit section, allowing process j to
continue if it has been waiting.
• To prove that the solution is correct, we must examine the three conditions
listed above:
– Mutual exclusion - If one process is executing their critical section when the other
wishes to do so, the second process will become blocked by the flag of the first
process. If both processes attempt to enter at the same time, the last process to
execute "turn = j" will be blocked.
– Progress - Each process can only be blocked at the while if the other process wants
to use the critical section ( flag[ j ] = = true ), AND it is the other process's turn to
use the critical section ( turn = = j ). If both of those conditions are true, then the
other process ( j ) will be allowed to enter the critical section, and upon exiting the
critical section, will set flag[ j ] to false, releasing process i. The shared variable turn
assures that only one process at a time can be blocked, and the flag variable allows
one process to release the other when exiting their critical section.
– Bounded Waiting - As each process enters their entry section, they set the turn
variable to be the other processes turn. Since no process ever sets it back to their
own turn, this ensures that each process will have to let the other process go first at
most one time before it becomes their turn again.
• Note that the instruction "turn = j" is atomic, that is it is a single machine
instruction which cannot be interrupted.
Semaphore
• In 1965, Dijkstra proposed a new and very significant technique for
managing concurrent processes by using the value of a simple integer
variable to synchronize the progress of interacting processes. This integer
variable is called semaphore. So it is basically a synchronizing tool and is
accessed only through two low standard atomic operations, wait and signal
designated by P() and V() respectively.
• Two standard operations, wait and signal are defined on the semaphore.
Entry to the critical section is controlled by the wait operation and exit
from a critical region is taken care by signal operation.
• The manipulation of semaphore (S) takes place as following:
• The wait operation P(S) decrements the semaphore value by 1.
• The signal operation V(S) increments the semaphore value by 1.
• Mutual exclusion on the semaphore is enforced within P(S) and V(S). If a
number of processes attempt P(S) simultaneously, only one process will be
allowed to proceed & the other processes will be waiting.
Wait and Signal function
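A busy-waiting sketch of the two operations just described (the test and the decrement in wait must execute atomically for this to be correct):

wait(S) {                 /* P(S): used on entry to a critical section        */
    while (S <= 0)
        ;                 /* busy-wait until the semaphore becomes positive   */
    S--;
}

signal(S) {               /* V(S): used on exit from a critical section       */
    S++;
}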
• In practice, semaphores can take on one of two forms:
• Binary semaphores can take on one of two values, 0 or 1. They can
be used to solve the critical section problem as described above, and
are sometimes known as mutexes, because they provide mutual
exclusion.
• Counting semaphores can take on any integer value, and are
usually used to count the number remaining of some limited
resource. The counter is initialized to the number of such resources
available in the system, and whenever the counting semaphore is
greater than zero, then a process can enter a critical section and use
one of the resources. When the counter gets to zero ( or negative in
some implementations ), then the process blocks until another
process frees up a resource and increments the counting semaphore
with a signal call. ( The binary semaphore can be seen as just a
special case where the number of resources initially available is just
one. )
Semaphore Implementation
• Must guarantee that no two processes can execute wait () and
signal () on the same semaphore at the same time
• Thus, implementation becomes the critical section problem
where the wait and signal code are placed in the critical section.
– Could now have busy waiting in critical section implementation
• But implementation code is short
• Little busy waiting if critical section rarely occupied
• Note that applications may spend lots of time in critical
sections and therefore this is not a good solution.
Semaphore Implementation with no Busy waiting
• With each semaphore there is an associated waiting queue.
Each entry in a waiting queue has two data items:
– value (of type integer)
– pointer to next record in the list
• Two operations:
– block – place the process invoking the operation on the
appropriate waiting queue.
– wakeup – remove one of processes in the waiting queue and
place it in the ready queue.
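A sketch of the semaphore record used for this implementation: an integer value plus a pointer to the list (waiting queue) of blocked processes.

typedef struct {
    int value;               /* may become negative: |value| = number of waiters      */
    struct process *list;    /* waiting queue of processes blocked on this semaphore  */
} semaphore;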
Semaphore Implementation with no Busy waiting (Cont.)
• Implementation of wait:
wait(semaphore *S) {
    S->value--;
    if (S->value < 0) {
        /* add this process to S->list (the waiting queue) */
        block();     /* suspend the invoking process */
    }
}
• Implementation of signal:
signal(semaphore *S) {
    S->value++;
    if (S->value <= 0) {
        /* remove a process P from S->list */
        wakeup(P);   /* move P to the ready queue */
    }
}
Deadlock and Starvation
• Deadlock – two or more processes are waiting indefinitely for
an event that can be caused by only one of the waiting
processes
• Let S and Q be two semaphores initialized to 1
P0 P1
wait (S); wait (Q);
wait (Q); wait (S);
. .
. .
. .
signal (S); signal (Q);
signal (Q); signal (S);
• Starvation – indefinite blocking. A process may never be
removed from the semaphore queue in which it is suspended.
Classical Problems of Synchronization
• Bounded-Buffer Problem
• Readers and Writers Problem
• Dining-Philosophers Problem
Bounded-Buffer Problem
• This is a generalization of the producer-consumer problem
wherein access is controlled to a shared group of buffers of a
limited size. In this solution, the two counting semaphores
"full" and "empty" keep track of the current number of full and
empty buffers respectively ( and initialized to 0 and N
respectively. )
• The binary semaphore mutex controls access to the critical
section.
• The producer and consumer processes are nearly identical -
one can think of the producer as producing full buffers, and
the consumer as producing empty buffers.
• Semaphore mutex initialized to the value 1
• Semaphore full initialized to the value 0
• Semaphore empty initialized to the value N.
Bounded Buffer Problem (Cont.)
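A sketch of the producer and consumer structure using the three semaphores described above (mutex = 1, full = 0, empty = N); the produce and consume steps are left as placeholders:

do {                               /* producer */
    /* produce an item in next_produced */
    wait(empty);                   /* wait for an empty slot          */
    wait(mutex);                   /* lock the buffer                 */
    /* add next_produced to the buffer */
    signal(mutex);                 /* unlock the buffer               */
    signal(full);                  /* one more full slot              */
} while (true);

do {                               /* consumer */
    wait(full);                    /* wait for a full slot            */
    wait(mutex);                   /* lock the buffer                 */
    /* remove an item from the buffer into next_consumed */
    signal(mutex);                 /* unlock the buffer               */
    signal(empty);                 /* one more empty slot             */
    /* consume the item in next_consumed */
} while (true);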
Readers-Writers Problem
• In the readers-writers problem there are some processes ( termed readers ) who only
read the shared data, and never change it, and there are other processes ( termed
writers ) who may change the data in addition to or instead of reading it. There is no
limit to how many readers can access the data simultaneously, but when a writer
accesses the data, it needs exclusive access.
• There are several variations to the readers-writers problem, most centered around
relative priorities of readers versus writers. The first readers-writers problem gives
priority to readers. In this problem, if a reader wants access to the data, and there is
not already a writer accessing it, then access is granted to the reader. A solution to
this problem can lead to starvation of the writers, as there could always be more
readers coming along to access the data. ( A steady stream of readers will jump
ahead of waiting writers as long as there is currently already another reader
accessing the data, because the writer is forced to wait until the data is idle, which
may never happen if there are enough readers. )
• The second readers-writers problem gives priority to the writers. In this problem,
when a writer wants access to the data it jumps to the head of the queue - All
waiting readers are blocked, and the writer gets access to the data as soon as it
becomes available. In this solution the readers may be starved by a steady stream of
writers.
Readers-Writers Problem (Cont.)
• The following code is an example of the first readers-writers
problem, and involves an important counter and two binary
semaphores: readcount is used by the reader processes, to count the
number of readers currently accessing the data.
• mutex is a semaphore used only by the readers for controlled access
to readcount.
• rw_mutex is a semaphore used to block and release the writers. The
first reader to access the data will set this lock and the last reader to
exit will release it; The remaining readers do not touch rw_mutex. (
Eighth edition called this variable wrt. )
• Note that the first reader to come along will block on rw_mutex if
there is currently a writer accessing the data, and that all following
readers will only block on mutex for their turn to increment
readcount.
Readers-Writers Problem (Cont.)
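A sketch of the first readers-writers solution using the variables just described (rw_mutex = 1, mutex = 1, readcount = 0):

do {                               /* writer */
    wait(rw_mutex);                /* exclusive access to the shared data     */
    /* ... writing is performed ... */
    signal(rw_mutex);
} while (true);

do {                               /* reader */
    wait(mutex);                   /* protect readcount                       */
    readcount++;
    if (readcount == 1)            /* first reader locks out the writers      */
        wait(rw_mutex);
    signal(mutex);
    /* ... reading is performed ... */
    wait(mutex);
    readcount--;
    if (readcount == 0)            /* last reader lets the writers back in    */
        signal(rw_mutex);
    signal(mutex);
} while (true);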
Dining-Philosophers Problem
• The dining philosophers problem is a classic synchronization
problem involving the allocation of limited resources amongst a
group of processes in a deadlock-free and starvation-free manner:
Consider five philosophers sitting around a table, in which there are
five chopsticks evenly distributed and an endless bowl of rice in the
center, as shown in the diagram below. ( There is exactly one
chopstick between each pair of dining philosophers. )
• These philosophers spend their lives alternating between two
activities: eating and thinking.
• When it is time for a philosopher to eat, it must first acquire two
chopsticks - one from their left and one from their right.
• When a philosopher thinks, it puts down both chopsticks in their
original locations.
Dining-Philosophers Problem (Cont.)
• One possible solution, as shown in the code sketch at the end of
this list, is to use a set of five semaphores ( chopsticks[ 5 ] ), and to
have each hungry philosopher first wait on their left chopstick
( chopsticks[ i ] ), and then wait on their right chopstick (
chopsticks[ ( i + 1 ) % 5 ] )
• But suppose that all five philosophers get hungry at the same
time, and each starts by picking up their left chopstick. They
then look for their right chopstick, but because it is
unavailable, they wait for it, forever, and eventually all the
philosophers starve due to the resulting deadlock.
• Some potential solutions to the problem include: Only allow
four philosophers to dine at the same time. ( Limited
simultaneous processes. )
• Allow philosophers to pick up chopsticks only when both are
available, in a critical section. ( All or nothing allocation of
critical resources. )
• Use an asymmetric solution, in which odd philosophers pick
up their left chopstick first and even philosophers pick up their
right chopstick first. ( Will this solution always work? What if
there are an even number of philosophers? )
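A sketch of the simple (deadlock-prone) solution from the first bullet above: chopstick[ ] is an array of five semaphores, each initialized to 1.

semaphore chopstick[5];                 /* all elements initialized to 1 */

/* structure of philosopher i */
do {
    wait(chopstick[i]);                 /* pick up the left chopstick    */
    wait(chopstick[(i + 1) % 5]);       /* pick up the right chopstick   */
    /* ... eat ... */
    signal(chopstick[i]);               /* put down the left chopstick   */
    signal(chopstick[(i + 1) % 5]);     /* put down the right chopstick  */
    /* ... think ... */
} while (true);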
CPU Scheduling
• CPU scheduling is a process which allows one process to use
the CPU while the execution of another process is on hold (in
waiting state) due to unavailability of any resource like I/O etc,
thereby making full use of CPU. The aim of CPU scheduling
is to make the system efficient, fast and fair.
• Whenever the CPU becomes idle, the operating system must
select one of the processes in the ready queue to be executed.
The selection process is carried out by the short-term scheduler
(or CPU scheduler). The scheduler selects from among the
processes in memory that are ready to execute, and allocates
the CPU to one of them.
CPU-I/O Burst Cycle
• Almost all processes alternate between two states in a
continuing cycle, as shown in Figure 5.1 below :
– A CPU burst of performing calculations, and
– An I/O burst, waiting for data transfer in or out of the system.
• Whenever the CPU becomes idle, it is the job of the CPU
Scheduler ( a.k.a. the short-term scheduler ) to select another
process from the ready queue to run next. The storage structure
for the ready queue and the algorithm used to select the next
process are not necessarily a FIFO queue. There are several
alternatives to choose from, as well as numerous adjustable
parameters for each algorithm
• CPU scheduling decisions take place under one of four conditions:
– When a process switches from the running state to the waiting state,
such as for an I/O request or invocation of the wait( ) system call.
– When a process switches from the running state to the ready state, for
example in response to an interrupt.
– When a process switches from the waiting state to the ready state, say
at completion of I/O or a return from wait( ).
– When a process terminates.
• For conditions 1 and 4 there is no choice - A new process must be
selected. For conditions 2 and 3 there is a choice - To either continue
running the current process, or select a different one. If scheduling
takes place only under conditions 1 and 4, the system is said to be
non-preemptive. Under these conditions, once a process starts
running it keeps running, until it either voluntarily blocks or until it
finishes. Otherwise the system is said to be preemptive.
• Dispatcher
• The dispatcher is the module that gives control of the CPU to
the process selected by the scheduler. This function involves:
– Switching context.
– Switching to user mode.
– Jumping to the proper location in the newly loaded program.
• The dispatcher needs to be as fast as possible, as it is run on
every context switch. The time consumed by the dispatcher is
known as dispatch latency.
Scheduling Criteria
• There are several different criteria to consider when trying to select
the "best" scheduling algorithm for a particular situation and
environment, including:
– CPU utilization - Ideally the CPU would be busy 100% of the time, so
as to waste 0 CPU cycles. On a real system CPU usage should range
from 40% ( lightly loaded ) to 90% ( heavily loaded. )
– Throughput - Number of processes completed per unit time. May
range from 10 / second to 1 / hour depending on the specific processes.
– Turnaround time - Time required for a particular process to complete,
from submission time to completion.
– Waiting time - How much time processes spend in the ready queue
waiting their turn to get on the CPU.
– Response time - The time taken in an interactive program from the
issuance of a command to the commencement of a response to that
command.
Scheduling Algorithms
• First Come First Serve (FCFS)
• Shortest Job First Scheduling Algorithm (SJF)
• Priority Scheduling
• Round Robin Scheduling
• Multilevel Queue Scheduling
• Multilevel Feedback Queue Scheduling Algorithm
First Come First Serve
• FCFS is very simple - Just a FIFO queue, like customers
waiting in line at the bank or the post office or at a copying
machine.
• Unfortunately, however, FCFS can yield some very long
average wait times, particularly if the first process to get there
takes a long time. For example, consider the following three
processes:
Process Burst Time
P1 24
P2 3
P3 3
• In the first Gantt chart below, process P1 arrives first. The
average waiting time for the three processes is ( 0 + 24 + 27 ) /
3 = 17.0 ms.
• In the second Gantt chart below, the same three processes have
an average wait time of ( 0 + 3 + 6 ) / 3 = 3.0 ms. The total run
time for the three bursts is the same, but in the second case two
of the three finish much quicker, and the other process is only
delayed by a short amount.
• FCFS can also block the system in a busy dynamic system in
another way, known as the convoy effect. When one CPU
intensive process blocks the CPU, a number of I/O intensive
processes can get backed up behind it, leaving the I/O devices
idle. When the CPU hog finally relinquishes the CPU, then the
I/O processes pass through the CPU quickly, leaving the CPU
idle while everyone queues up for I/O, and then the cycle
repeats itself when the CPU intensive process gets back to the
ready queue.
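As a small illustration (not from the slides), the FCFS waiting-time arithmetic above can be reproduced directly: each process waits for the sum of the bursts scheduled before it.

#include <stdio.h>

/* Average FCFS waiting time for processes served in the given order. */
double fcfs_avg_wait(const int burst[], int n) {
    int elapsed = 0, total_wait = 0;
    for (int i = 0; i < n; i++) {
        total_wait += elapsed;     /* process i waits for all earlier bursts */
        elapsed += burst[i];
    }
    return (double)total_wait / n;
}

int main(void) {
    int order1[] = { 24, 3, 3 };   /* P1, P2, P3: (0 + 24 + 27) / 3 = 17.0 ms */
    int order2[] = { 3, 3, 24 };   /* P2, P3, P1: (0 + 3 + 6) / 3  =  3.0 ms  */
    printf("%.1f ms\n", fcfs_avg_wait(order1, 3));
    printf("%.1f ms\n", fcfs_avg_wait(order2, 3));
    return 0;
}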
Shortest-Job-First Scheduling, SJF
• The idea behind the SJF algorithm is to pick the quickest
fastest little job that needs to be done, get it out of the way
first, and then pick the next smallest fastest job to do next.
• ( Technically this algorithm picks a process based on the next
shortest CPU burst, not the overall process time. )
• For example, the Gantt chart below is based upon the
following CPU burst times, ( and the assumption that all jobs
arrive at the same time. )
Process Burst Time
P1 6
P2 8
P3 7
P4 3
• In the case above the average wait time is ( 0 + 3 + 9 + 16 ) / 4
= 7.0 ms, ( as opposed to 10.25 ms for FCFS for the same
processes. )
• SJF can be proven to give the minimum average waiting time, but it
suffers from one important problem: How do you know how
long the next CPU burst is going to be? For long-term batch
jobs this can be done based upon the limits that users set for
their jobs when they submit them, which encourages them to
set low limits, but risks their having to re-submit the job if they
set the limit too low. However that does not work for short-
term CPU scheduling on an interactive system.
• Another option would be to statistically measure the run time
characteristics of jobs, particularly if the same tasks are run
repeatedly and predictably. But once again that really isn't a
viable option for short term CPU scheduling in the real world.
• A more practical approach is to predict the length of the next burst,
based on some historical measurement of recent burst times for this
process. One simple, fast, and relatively accurate method is the
exponential average, which can be defined as follows. ( The book
uses tau and t for their variables, but those are hard to distinguish
from one another and don't work well in HTML. )
• estimate[ i + 1 ] = alpha * burst[ i ] + ( 1.0 - alpha ) * estimate[ i ]
• In this scheme the previous estimate contains the history of all
previous times, and alpha serves as a weighting factor for the
relative importance of recent data versus past history. If alpha is 1.0,
then past history is ignored, and we assume the next burst will be the
same length as the last burst. If alpha is 0.0, then all measured burst
times are ignored, and we just assume a constant burst time. Most
commonly alpha is set at 0.5, as illustrated in Figure 5.3:
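A small sketch of the exponential-average prediction above; burst[ ] holds the measured burst lengths, and the initial estimate and alpha are assumed parameters:

/* estimate[i + 1] = alpha * burst[i] + (1.0 - alpha) * estimate[i] */
double predict_next_burst(const double burst[], int n,
                          double initial_estimate, double alpha) {
    double estimate = initial_estimate;   /* e.g. 10 ms as a first guess */
    for (int i = 0; i < n; i++)
        estimate = alpha * burst[i] + (1.0 - alpha) * estimate;
    return estimate;                      /* prediction for burst n      */
}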
• SJF can be either preemptive or non-preemptive. Preemption
occurs when a new process arrives in the ready queue that has
a predicted burst time shorter than the time remaining in the
process whose burst is currently on the CPU. Preemptive SJF
is sometimes referred to as shortest remaining time first
scheduling.
• For example, the following Gantt chart is based upon the
following data:
Process Arrival Time Burst Time
P1 0 8
P2 1 4
P3 2 9
P4 3 5
• The average wait time in this case is ( ( 5 - 3 ) + ( 10 - 1 ) + (
17 - 2 ) + ( 1 - 1 ) ) / 4 = 26 / 4 = 6.5 ms. ( As opposed to 7.75 ms for
non-preemptive SJF or 8.75 for FCFS. )
Priority Scheduling
• Priority scheduling is a more general case of SJF, in which
each job is assigned a priority and the job with the highest
priority gets scheduled first. ( SJF uses the inverse of the next
expected burst time as its priority - The smaller the expected
burst, the higher the priority. )
• Note that in practice, priorities are implemented using integers
within a fixed range, but there is no agreed-upon convention as
to whether "high" priorities use large numbers or small
numbers. This book uses low number for high priorities, with
0 being the highest possible priority.
• For example, the following Gantt chart is based upon these
process burst times and priorities, and yields an average
waiting time of 8.2 ms:
Process Burst Time Priority
P1 10 3
P2 1 1
P3 2 4
P4 1 5
P5 5 2
• Priorities can be assigned either internally or externally. Internal priorities
are assigned by the OS using criteria such as average burst time, ratio of
CPU to I/O activity, system resource use, and other factors available to the
kernel. External priorities are assigned by users, based on the importance of
the job, fees paid, politics, etc.
• Priority scheduling can be either preemptive or non-preemptive.
• Priority scheduling can suffer from a major problem known as indefinite
blocking, or starvation, in which a low-priority task can wait forever
because there are always some other jobs around that have higher priority.
– If this problem is allowed to occur, then processes will either run
eventually when the system load lightens ( at say 2:00 a.m. ), or will
eventually get lost when the system is shut down or crashes. ( There are
rumors of jobs that have been stuck for years. )
– One common solution to this problem is aging, in which priorities of jobs
increase the longer they wait. Under this scheme a low-priority job will
eventually get its priority raised high enough that it gets run.
Round Robin Scheduling
• Round robin scheduling is similar to FCFS scheduling, except that
CPU bursts are assigned with limits called time quantum.
• When a process is given the CPU, a timer is set for whatever value
has been set for a time quantum.
– If the process finishes its burst before the time quantum timer expires,
then it is swapped out of the CPU just like the normal FCFS algorithm.
– If the timer goes off first, then the process is swapped out of the CPU
and moved to the back end of the ready queue.
• The ready queue is maintained as a circular queue, so when all
processes have had a turn, then the scheduler gives the first process
another turn, and so on.
• RR scheduling can give the effect of all processes sharing the CPU
equally, although the average wait time can be longer than with
other scheduling algorithms. In the following example (with a time
quantum of 4 ms) the average wait time is 5.66 ms.
Process Burst Time
P1 24
P2 3
P3 3
• The performance of RR is sensitive to the time quantum
selected. If the quantum is large enough, then RR reduces to
the FCFS algorithm; if it is very small, then each process gets
1/nth of the processor time and they share the CPU equally.
• BUT, a real system invokes overhead for every context switch,
and the smaller the time quantum the more context switches
there are. ( See Figure 6.4 below. ) Most modern systems use
time quantum between 10 and 100 milliseconds, and context
switch times on the order of 10 microseconds, so the overhead
is small relative to the time quantum.
• Turn around time also varies with quantum time, in a non-
apparent manner. Consider, for example the processes shown
in Figure 6.5:
• In general, turnaround time is minimized if most processes
finish their next cpu burst within one time quantum. For
example, with three processes of 10 ms bursts each, the
average turnaround time for 1 ms quantum is 29, and for 10
ms quantum it reduces to 20. However, if it is made too large,
then RR just degenerates to FCFS. A rule of thumb is that 80%
of CPU bursts should be smaller than the time quantum.
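As an illustration (not from the slides), a small round-robin simulation with all processes arriving at time 0; with bursts of 24, 3 and 3 ms and a 4 ms quantum it reproduces the 5.66 ms average waiting time quoted earlier.

#include <stdio.h>

/* Round-robin waiting times; all processes are assumed to arrive at time 0. */
void rr_wait_times(const int burst[], int n, int quantum, int wait[]) {
    int remaining[n];
    int done = 0;
    for (int i = 0; i < n; i++) { remaining[i] = burst[i]; wait[i] = 0; }
    while (done < n) {
        for (int i = 0; i < n; i++) {
            if (remaining[i] == 0) continue;
            int slice = remaining[i] < quantum ? remaining[i] : quantum;
            for (int j = 0; j < n; j++)          /* everyone else keeps waiting */
                if (j != i && remaining[j] > 0) wait[j] += slice;
            remaining[i] -= slice;
            if (remaining[i] == 0) done++;
        }
    }
}

int main(void) {
    int burst[] = { 24, 3, 3 }, wait[3];
    rr_wait_times(burst, 3, 4, wait);
    printf("P1=%d P2=%d P3=%d avg=%.2f ms\n", wait[0], wait[1], wait[2],
           (wait[0] + wait[1] + wait[2]) / 3.0);
    return 0;
}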
Multilevel Queue Scheduling
• When processes can be readily categorized, then multiple
separate queues can be established, each implementing
whatever scheduling algorithm is most appropriate for that
type of job, and/or with different parametric adjustments.
• Scheduling must also be done between queues, that is
scheduling one queue to get time relative to other queues. Two
common options are strict priority ( no job in a lower priority
queue runs until all higher priority queues are empty ) and
round-robin ( each queue gets a time slice in turn, possibly of
different sizes. )
• Note that under this algorithm jobs cannot switch from queue
to queue - Once they are assigned a queue, that is their queue
until they finish.
Multilevel Feedback-Queue Scheduling
• Multilevel feedback queue scheduling is similar to the ordinary multilevel
queue scheduling described above, except jobs may be moved from one
queue to another for a variety of reasons:
– If the characteristics of a job change between CPU-intensive and I/O
intensive, then it may be appropriate to switch a job from one queue to
another.
– Aging can also be incorporated, so that a job that has waited for a long time
can get bumped up into a higher priority queue for a while.
• Multilevel feedback queue scheduling is the most flexible, because it can
be tuned for any situation. But it is also the most complex to implement
because of all the adjustable parameters. Some of the parameters which
define one of these systems include:
– The number of queues.
– The scheduling algorithm for each queue.
– The methods used to upgrade or demote processes from one queue to
another. ( Which may be different. )
– The method used to determine which queue a process enters initially.
• Now let us suppose that queues 1 and 2 follow round robin with
time quanta of 4 and 8 respectively, and queue 3 follows FCFS.
One implementation of MFQS is given below –
• When a process starts executing, it first enters queue 1.
• In queue 1, a process executes for 4 units. If it completes within
these 4 units, or gives up the CPU for an I/O operation within
these 4 units, its priority does not change, and if it comes back
to the ready queue it again starts its execution in queue 1.
• If a process in queue 1 does not complete in 4 units, its
priority is reduced and it is shifted to queue 2.
• The two points above also hold for processes in queue 2, but with a
time quantum of 8 units. In general, if a process does not
complete within its time quantum, it is shifted to the lower-priority queue.
• In the last queue, processes are scheduled in FCFS manner.
• A process in a lower-priority queue can execute only when the
higher-priority queues are empty.
• A process running in the lower priority queue is interrupted by
a process arriving in the higher priority queue.
A CONTROL STRUCTURE FOR
INDICATING PARALLELISM
• Many programming language constructs for indicating parallelism have
appeared in the literature. These generally involve pairs of statements as
follows:
• One statement indicating that execution is to split into several parallel
execution sequences (threads of control).
• One statement indicating that certain parallel execution sequences are to
merge and sequential execution is to resume.
• These statements occur in pairs and are commonly called parbegin and
parend (for parallel begin and parallel end). The form suggested by
Dijkstra is:
parbegin
statement 1
statement 2
...
statement n
parend
• When a program executing a single thread of control reaches the parbegin,
execution splits into several threads of control, one for each statement
between parbegin and parend. These may be simple statements, procedure
calls, blocks, or combinations of these. Each thread of control eventually
terminates and reaches the parend, at which point a single thread of
control resumes after the parend.
Multiple-Processor Scheduling
• When multiple processors are available, then the scheduling
gets more complicated, because now there is more than one
CPU which must be kept busy and in effective use at all times.
• Load sharing revolves around balancing the load between
multiple processors.
• Multi-processor systems may be heterogeneous ( different
kinds of CPUs ), or homogeneous ( all the same kind of CPU ).
• Issue may be related to
– which process to be run and on which CPU
– Whether processes are unrelated or come in groups.
Approaches to Multiple-Processor Scheduling
• One approach to multi-processor scheduling is asymmetric
multiprocessing, in which one processor is the master,
controlling all activities and running all kernel code, while the
others run only user code. This approach is relatively simple,
as there is no need to share critical system data.
• Another approach is symmetric multiprocessing, SMP, where
each processor schedules its own jobs, either from a common
ready queue or from separate ready queues for each processor.
Multi processor timesharing
• The simplest scheduling strategy is time sharing using a single
global ready queue, as in a uniprocessor system. This provides
automatic load balancing, because it can never happen that one
CPU is idle while others are overloaded.
• The disadvantages of this approach are contention for the
scheduling data structure as the number of CPUs grows, and the
usual overhead of doing a context switch when a process
blocks for I/O.
Load Balancing
• Obviously an important goal in a multiprocessor system is to
balance the load between processors, so that one processor won't be
sitting idle while another is overloaded.
• Systems using a common ready queue are naturally self-balancing,
and do not need any special handling. Most systems, however,
maintain separate ready queues for each processor.
• Balancing can be achieved through either push migration or pull
migration:
– Push migration involves a separate process that runs periodically, ( e.g.
every 200 milliseconds ), and moves processes from heavily loaded
processors onto less loaded ones.
– Pull migration involves idle processors taking processes from the
ready queues of other processors.
Affinity Scheduling
• Processor affinity means a process has an affinity for the
processor on which it is currently running.
• When a process runs on a specific processor there are certain effects
on the cache memory. The data most recently accessed by the
process populate the cache for that processor, and as a result
successive memory accesses by the process are often satisfied in the
cache memory. Now if the process migrates to another processor, the
contents of the cache memory must be invalidated for the first
processor and the cache for the second processor must be
repopulated. Because of the high cost of invalidating and
repopulating caches, most of the SMP(symmetric multiprocessing)
systems try to avoid migration of processes from one processor to
another and try to keep a process running on the same processor.
This is known as PROCESSOR AFFINITY.
Deadlocks
(Slides based on Operating System Concepts, 9th Edition, by Silberschatz, Galvin and Gagne, ©2013)
Chapter : Deadlocks
System Model
Deadlock Characterization
Methods for Handling Deadlocks
Deadlock Prevention
Deadlock Avoidance
Deadlock Detection
Recovery from Deadlock
Chapter Objectives
To develop a description of deadlocks, which prevent
sets of concurrent processes from completing their
tasks
To present a number of different methods for
preventing or avoiding deadlocks in a computer
system
System Model
System consists of resources
Resource types R1, R2, . . ., Rm
CPU cycles, memory space, I/O devices
Each resource type Ri has Wi instances.
Each process utilizes a resource as follows:
request
use
release
Deadlock Characterization
Deadlock can arise if four conditions hold simultaneously.
Mutual exclusion: only one process at a time can use a
resource
Hold and wait: a process holding at least one resource is
waiting to acquire additional resources held by other
processes
No preemption: a resource can be released only voluntarily
by the process holding it, after that process has completed
its task
Circular wait: there exists a set {P0, P1, …, Pn} of waiting
processes such that P0 is waiting for a resource that is held
by P1, P1 is waiting for a resource that is held by P2, …, Pn–1
is waiting for a resource that is held by Pn, and Pn is waiting
for a resource that is held by P0.
Resource-Allocation Graph
A set of vertices V and a set of edges E.
V is partitioned into two types:
P = {P1, P2, …, Pn}, the set consisting of all the processes
in the system
R = {R1, R2, …, Rm}, the set consisting of all resource
types in the system
request edge – directed edge Pi → Rj
assignment edge – directed edge Rj → Pi
Resource-Allocation Graph (Cont.)
[Figure: graph notation – a process Pi; a resource type Rj with 4 instances; a request edge Pi → Rj (Pi requests an instance of Rj); an assignment edge Rj → Pi (Pi is holding an instance of Rj)]
Example of a Resource Allocation Graph
Resource Allocation Graph With A Deadlock
Graph With A Cycle But No Deadlock
Basic Facts
If graph contains no cycles ⇒ no deadlock
If graph contains a cycle ⇒
if only one instance per resource type, then deadlock
if several instances per resource type, possibility of
deadlock
Methods for Handling Deadlocks
Ensure that the system will never enter a deadlock
state:
Deadlock prevention
Deadlock avoidance
Allow the system to enter a deadlock state and then
recover
Ignore the problem and pretend that deadlocks never
occur in the system; used by most operating systems,
including UNIX
Deadlock Prevention
Restrain the ways request can be made
Mutual Exclusion – not required for sharable resources
(e.g., read-only files); must hold for non-sharable resources
Hold and Wait – must guarantee that whenever a process
requests a resource, it does not hold any other resources
Require process to request and be allocated all its
resources before it begins execution, or allow process
to request resources only when the process has none
allocated to it.
Low resource utilization; starvation possible
Deadlock Prevention (Cont.)
No Preemption –
If a process that is holding some resources requests
another resource that cannot be immediately allocated to
it, then all resources currently being held are released
Preempted resources are added to the list of resources
for which the process is waiting
Process will be restarted only when it can regain its old
resources, as well as the new ones that it is requesting
Circular Wait – impose a total ordering of all resource types,
and require that each process requests resources in an
increasing order of enumeration
Deadlock Example
/* Two threads acquire the same two mutexes in opposite orders: deadlock is possible. */
#include <pthread.h>

pthread_mutex_t first_mutex  = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t second_mutex = PTHREAD_MUTEX_INITIALIZER;

/* thread one runs in this function */
void *do_work_one(void *param)
{
    pthread_mutex_lock(&first_mutex);
    pthread_mutex_lock(&second_mutex);
    /* Do some work */
    pthread_mutex_unlock(&second_mutex);
    pthread_mutex_unlock(&first_mutex);
    pthread_exit(0);
}

/* thread two runs in this function */
void *do_work_two(void *param)
{
    pthread_mutex_lock(&second_mutex);
    pthread_mutex_lock(&first_mutex);
    /* Do some work */
    pthread_mutex_unlock(&first_mutex);
    pthread_mutex_unlock(&second_mutex);
    pthread_exit(0);
}
Deadlock Example with Lock Ordering
void transaction(Account from, Account to, double amount)
{
mutex lock1, lock2;
lock1 = get_lock(from);
lock2 = get_lock(to);
acquire(lock1);
acquire(lock2);
withdraw(from, amount);
deposit(to, amount);
release(lock2);
release(lock1);
}
Transactions 1 and 2 execute concurrently: Transaction 1 transfers $25
from account A to account B, and Transaction 2 transfers $50 from account
B to account A. Because each transaction locks its own "from" account first,
the two transactions acquire lock1 and lock2 in opposite orders, so deadlock is possible.
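One common remedy, sketched below (not taken from the slides), is to impose a total order on the locks, for example by account number, so that every transfer acquires the two locks in the same order. The Account structure and transfer() helper here are hypothetical illustrations.

/* Sketch: acquire account locks in a fixed global order (smaller id first). */
#include <pthread.h>

typedef struct {
    int id;                       /* unique account number defines the lock order */
    double balance;
    pthread_mutex_t lock;
} Account;

void transfer(Account *from, Account *to, double amount)
{
    /* Always lock the account with the smaller id first, so two concurrent
       transfers A->B and B->A cannot each hold one lock and wait forever. */
    Account *first  = (from->id < to->id) ? from : to;
    Account *second = (from->id < to->id) ? to   : from;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);

    from->balance -= amount;
    to->balance   += amount;

    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}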
Deadlock Avoidance
Requires that the system has some additional a priori information
available
Simplest and most useful model requires that each process
declare the maximum number of resources of each type
that it may need
The deadlock-avoidance algorithm dynamically examines
the resource-allocation state to ensure that there can never
be a circular-wait condition
Resource-allocation state is defined by the number of
available and allocated resources, and the maximum
demands of the processes
Safe State
When a process requests an available resource, system must
decide if immediate allocation leaves the system in a safe state
System is in safe state if there exists a sequence <P1, P2, …, Pn>
of ALL the processes in the systems such that for each Pi, the
resources that Pi can still request can be satisfied by currently
available resources + resources held by all the Pj, with j < i
That is:
If Pi resource needs are not immediately available, then Pi can
wait until all Pj have finished
When Pj is finished, Pi can obtain needed resources, execute,
return allocated resources, and terminate
When Pi terminates, Pi +1 can obtain its needed resources, and
so on
Basic Facts
If a system is in safe state ⇒ no deadlocks
If a system is in unsafe state ⇒ possibility of deadlock
Avoidance ⇒ ensure that a system will never enter an
unsafe state.
Safe, Unsafe, Deadlock State
Avoidance Algorithms
Single instance of a resource type
Use a resource-allocation graph
Multiple instances of a resource type
Use the banker’s algorithm
Resource-Allocation Graph Scheme
Claim edge Pi → Rj indicates that process Pi may request
resource Rj; represented by a dashed line
Claim edge converts to request edge when a process requests
a resource
Request edge converted to an assignment edge when the
resource is allocated to the process
When a resource is released by a process, assignment edge
reconverts to a claim edge
Resources must be claimed a priori in the system
Resource-Allocation Graph
Unsafe State In Resource-Allocation Graph
Resource-Allocation Graph Algorithm
Suppose that process Pi requests a resource Rj
The request can be granted only if converting the
request edge to an assignment edge does not result
in the formation of a cycle in the resource allocation
graph
Banker’s Algorithm
Multiple instances
Each process must a priori claim maximum use
When a process requests a resource it may have to wait
When a process gets all its resources it must return them in a
finite amount of time
Data Structures for the Banker’s Algorithm
Let n = number of processes, and m = number of resources types.
Available: Vector of length m. If available [j] = k, there are k
instances of resource type Rj available
Max: n x m matrix. If Max [i,j] = k, then process Pi may request at
most k instances of resource type Rj
Allocation: n x m matrix. If Allocation[i,j] = k then Pi is currently
allocated k instances of Rj
Need: n x m matrix. If Need[i,j] = k, then Pi may need k more
instances of Rj to complete its task
Need [i,j] = Max[i,j] – Allocation [i,j]
Safety Algorithm
1. Let Work and Finish be vectors of length m and n, respectively.
Initialize:
Work = Available
Finish [i] = false for i = 0, 1, …, n- 1
2. Find an i such that both:
(a) Finish [i] = false
(b) Needi ≤ Work
If no such i exists, go to step 4
3. Work = Work + Allocationi
Finish[i] = true
go to step 2
4. If Finish [i] == true for all i, then the system is in a safe state
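A minimal C sketch of this safety algorithm, using the 5-process, 3-resource Allocation/Max/Available snapshot from the Banker's-algorithm example later in this chapter; it simply reports whether a safe sequence exists.

/* Safety algorithm for n = 5 processes and m = 3 resource types. */
#include <stdio.h>
#include <stdbool.h>

#define N 5
#define M 3

int Available[M]     = {3, 3, 2};
int Max[N][M]        = {{7,5,3},{3,2,2},{9,0,2},{2,2,2},{4,3,3}};
int Allocation[N][M] = {{0,1,0},{2,0,0},{3,0,2},{2,1,1},{0,0,2}};

bool is_safe(void)
{
    int Need[N][M], Work[M];
    bool Finish[N] = {false};

    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            Need[i][j] = Max[i][j] - Allocation[i][j];  /* Need = Max - Allocation */
    for (int j = 0; j < M; j++)
        Work[j] = Available[j];                         /* step 1: Work = Available */

    for (int done = 0; done < N; ) {
        bool found = false;
        for (int i = 0; i < N; i++) {                   /* step 2: find a candidate */
            if (Finish[i]) continue;
            bool fits = true;
            for (int j = 0; j < M; j++)
                if (Need[i][j] > Work[j]) { fits = false; break; }
            if (fits) {                                 /* step 3: Pi can finish    */
                for (int j = 0; j < M; j++)
                    Work[j] += Allocation[i][j];
                Finish[i] = true;
                found = true;
                done++;
            }
        }
        if (!found) break;                              /* no candidate left        */
    }
    for (int i = 0; i < N; i++)                         /* step 4                   */
        if (!Finish[i]) return false;
    return true;
}

int main(void)
{
    printf("System is %s\n", is_safe() ? "in a safe state" : "NOT in a safe state");
    return 0;
}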
Resource-Request Algorithm for Process Pi
Requesti = request vector for process Pi. If Requesti [j] = k then
process Pi wants k instances of resource type Rj
1. If Requesti ≤ Needi go to step 2. Otherwise, raise error condition,
since process has exceeded its maximum claim
2. If Requesti ≤ Available, go to step 3. Otherwise Pi must wait,
since resources are not available
3. Pretend to allocate requested resources to Pi by modifying the
state as follows:
Available = Available – Requesti;
Allocationi = Allocationi + Requesti;
Needi = Needi – Requesti;
If safe ⇒ the resources are allocated to Pi
If unsafe ⇒ Pi must wait, and the old resource-allocation state
is restored
Example of Banker’s Algorithm
5 processes P0 through P4;
3 resource types:
A (10 instances), B (5 instances), and C (7 instances)
Snapshot at time T0:
         Allocation   Max      Available
         A B C        A B C    A B C
P0       0 1 0        7 5 3    3 3 2
P1       2 0 0        3 2 2
P2       3 0 2        9 0 2
P3       2 1 1        2 2 2
P4       0 0 2        4 3 3
Example (Cont.)
The content of the matrix Need is defined to be Max – Allocation
         Need
         A B C
P0       7 4 3
P1       1 2 2
P2       6 0 0
P3       0 1 1
P4       4 3 1
The system is in a safe state since the sequence < P1, P3, P4, P2, P0>
satisfies safety criteria
Example: P1 Request (1,0,2)
Check that Request ≤ Available (that is, (1,0,2) ≤ (3,3,2) ⇒ true)
         Allocation   Need     Available
         A B C        A B C    A B C
P0       0 1 0        7 4 3    2 3 0
P1       3 0 2        0 2 0
P2       3 0 2        6 0 0
P3       2 1 1        0 1 1
P4       0 0 2        4 3 1
Executing safety algorithm shows that sequence < P1, P3, P4, P0, P2>
satisfies safety requirement
Can request for (3,3,0) by P4 be granted?
Can request for (0,2,0) by P0 be granted?
Deadlock Detection
Allow system to enter deadlock state
Detection algorithm
Recovery scheme
Single Instance of Each Resource Type
Maintain wait-for graph
Nodes are processes
Pi → Pj if Pi is waiting for Pj
Periodically invoke an algorithm that searches for a cycle in the
graph. If there is a cycle, there exists a deadlock
An algorithm to detect a cycle in a graph requires on the order of n²
operations, where n is the number of vertices in the graph
Resource-Allocation Graph and Wait-for Graph
Resource-Allocation Graph Corresponding wait-for graph
Several Instances of a Resource Type
Available: A vector of length m indicates the number of
available resources of each type
Allocation: An n x m matrix defines the number of resources
of each type currently allocated to each process
Request: An n x m matrix indicates the current request of
each process. If Request [i][j] = k, then process Pi is
requesting k more instances of resource type Rj.
Detection Algorithm
1. Let Work and Finish be vectors of length m and n, respectively
Initialize:
(a) Work = Available
(b) For i = 1,2, …, n, if Allocationi ≠ 0, then
Finish[i] = false; otherwise, Finish[i] = true
2. Find an index i such that both:
(a) Finish[i] == false
(b) Requesti ≤ Work
If no such i exists, go to step 4
Detection Algorithm (Cont.)
3. Work = Work + Allocationi
Finish[i] = true
go to step 2
4. If Finish[i] == false, for some i, 1 ≤ i ≤ n, then the system is in
deadlock state. Moreover, if Finish[i] == false, then Pi is
deadlocked
Algorithm requires O(m × n²) operations to detect
whether the system is in a deadlocked state
Example of Detection Algorithm
Five processes P0 through P4; three resource types
A (7 instances), B (2 instances), and C (6 instances)
Snapshot at time T0:
         Allocation   Request   Available
         A B C        A B C     A B C
P0       0 1 0        0 0 0     0 0 0
P1       2 0 0        2 0 2
P2       3 0 3        0 0 0
P3       2 1 1        1 0 0
P4       0 0 2        0 0 2
Sequence <P0, P2, P3, P1, P4> will result in Finish[i] = true for all i
Example (Cont.)
P2 requests an additional instance of type C
         Request
         A B C
P0       0 0 0
P1       2 0 2
P2       0 0 1
P3       1 0 0
P4       0 0 2
State of system?
Can reclaim resources held by process P0, but insufficient
resources to fulfill the other processes' requests
Deadlock exists, consisting of processes P1, P2, P3, and P4
Detection-Algorithm Usage
When, and how often, to invoke depends on:
How often a deadlock is likely to occur?
How many processes will need to be rolled back?
one for each disjoint cycle
If detection algorithm is invoked arbitrarily, there may be many
cycles in the resource graph and so we would not be able to tell
which of the many deadlocked processes “caused” the
deadlock.
Recovery from Deadlock: Process Termination
Abort all deadlocked processes
Abort one process at a time until the deadlock cycle is eliminated
In which order should we choose to abort?
1. Priority of the process
2. How long process has computed, and how much longer to
completion
3. Resources the process has used
4. Resources process needs to complete
5. How many processes will need to be terminated
6. Is process interactive or batch?
Recovery from Deadlock: Resource Preemption
Selecting a victim – minimize cost
Rollback – return to some safe state, restart process for that
state
Starvation – same process may always be picked as victim,
include number of rollbacks in the cost factor
Main Memory
Operating System Concepts – 9th Edition Silberschatz, Galvin and Gagne ©2013
Background
Program must be brought (from disk) into memory and
placed within a process for it to be run
Main memory and registers are only storage CPU can
access directly
Memory unit only sees a stream of addresses + read
requests, or address + data and write requests
Register access in one CPU clock (or less)
Main memory can take many cycles, causing a stall
Cache sits between main memory and CPU registers
Protection of memory required to ensure correct operation
Base and Limit Registers
A pair of base and limit registers define the logical address space
CPU must check every memory access generated in user mode to
be sure it is between base and limit for that user
Hardware Address Protection
Address Binding
Programs on disk, ready to be brought into memory to execute form an
input queue
Without support, must be loaded into address 0000
Inconvenient to have first user process physical address always at 0000
How can it not be?
Further, addresses represented in different ways at different stages of a
program’s life
Source code addresses usually symbolic
Compiled code addresses bind to relocatable addresses
i.e. “14 bytes from beginning of this module”
Linker or loader will bind relocatable addresses to absolute addresses
i.e. 74014
Each binding maps one address space to another
Logical vs. Physical Address Space
The concept of a logical address space that is bound to a
separate physical address space is central to proper memory
management
Logical address – generated by the CPU; also referred to
as virtual address
Physical address – address seen by the memory unit
Logical address space is the set of all logical addresses
generated by a program
Physical address space is the set of all physical addresses
generated by a program
Memory-Management Unit (MMU)
Hardware device that at run time maps virtual to physical
address
Many methods possible, covered in the rest of this chapter
To start, consider simple scheme where the value in the
relocation register is added to every address generated by a
user process at the time it is sent to memory
Base register now called relocation register
MS-DOS on Intel 80x86 used 4 relocation registers
The user program deals with logical addresses; it never sees the
real physical addresses
Execution-time binding occurs when reference is made to
location in memory
Logical address bound to physical addresses
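A toy illustration (not from the slides) of the relocation-register scheme just described: every CPU-generated logical address is checked against the limit register and then offset by the relocation register. The register values are made up for the example.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical register values, for illustration only. */
static const unsigned relocation_reg = 14000;   /* base of the process in memory       */
static const unsigned limit_reg      = 3000;    /* size of the logical address space    */

unsigned mmu_translate(unsigned logical)
{
    if (logical >= limit_reg) {                  /* protection check                    */
        fprintf(stderr, "trap: addressing error (logical %u)\n", logical);
        exit(1);
    }
    return logical + relocation_reg;             /* dynamic relocation                  */
}

int main(void)
{
    printf("logical 346 -> physical %u\n", mmu_translate(346));   /* prints 14346 */
    return 0;
}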
Dynamic relocation using a relocation register
Dynamic Loading
Routine is not loaded until it is
called
Better memory-space utilization;
unused routine is never loaded
All routines kept on disk in
relocatable load format
Useful when large amounts of
code are needed to handle
infrequently occurring cases
No special support from the
operating system is required
Implemented through program
design
OS can help by providing libraries
to implement dynamic loading
Swapping
A process can be swapped temporarily out of memory to a
backing store, and then brought back into memory for continued
execution
Total physical memory space of processes can exceed
physical memory
Backing store – fast disk large enough to accommodate copies
of all memory images for all users; must provide direct access to
these memory images
Roll out, roll in – swapping variant used for priority-based
scheduling algorithms; lower-priority process is swapped out so
higher-priority process can be loaded and executed
Major part of swap time is transfer time; total transfer time is
directly proportional to the amount of memory swapped
System maintains a ready queue of ready-to-run processes
which have memory images on disk
Schematic View of Swapping
Context Switch Time including Swapping
If the next process to be put on the CPU is not in memory, need to
swap out a process and swap in target process
Context switch time can then be very high
100MB process swapping to hard disk with transfer rate of
50MB/sec
Swap out time of 2000 ms
Plus swap in of same sized process
Total context switch swapping component time of 4000ms
(4 seconds)
Can reduce if reduce size of memory swapped – by knowing
how much memory really being used
System calls to inform OS of memory use via
request_memory() and release_memory()
Context Switch Time and Swapping (Cont.)
Other constraints as well on swapping
Pending I/O – can’t swap out as I/O would occur to wrong
process
Or always transfer I/O to kernel space, then to I/O device
Known as double buffering, adds overhead
Standard swapping not used in modern operating systems
But modified version common
Swap only when free memory extremely low
Contiguous Allocation
Main memory must support both OS and user processes
Limited resource, must allocate efficiently
Contiguous allocation is one early method
Main memory is usually divided into two partitions:
Resident operating system, usually held in low memory with
interrupt vector
User processes then held in high memory
Each process contained in single contiguous section of
memory
Contiguous Allocation (Cont.)
Relocation registers used to protect user processes from each
other, and from changing operating-system code and data
Base register contains value of smallest physical address
Limit register contains range of logical addresses – each
logical address must be less than the limit register
MMU maps logical address dynamically
Can then allow actions such as kernel code being transient
and kernel changing size
Hardware Support for Relocation and Limit Registers
Multiple-partition allocation
Degree of multiprogramming limited by number of partitions
Variable-partition sizes for efficiency (sized to a given process’ needs)
Hole – block of available memory; holes of various size are scattered
throughout memory
When a process arrives, it is allocated memory from a hole large enough to
accommodate it
Process exiting frees its partition, adjacent free partitions combined
Operating system maintains information about:
a) allocated partitions b) free partitions (hole)
Dynamic Storage-Allocation Problem
How to satisfy a request of size n from a list of free holes?
First-fit: Allocate the first hole that is big enough
Best-fit: Allocate the smallest hole that is big enough; must
search entire list, unless ordered by size
Produces the smallest leftover hole
Worst-fit: Allocate the largest hole; must also search entire list
Produces the largest leftover hole
First-fit and best-fit better than worst-fit in terms of speed and storage
utilization
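A small sketch of the first-fit strategy described above, assuming a hypothetical singly linked free-hole list (the hole_t fields are made up for illustration): scan the list and take the first hole that is big enough.

#include <stddef.h>

/* Hypothetical free-hole list node. */
typedef struct hole {
    size_t start;          /* first address of the hole  */
    size_t size;           /* size of the hole in bytes  */
    struct hole *next;
} hole_t;

/* First-fit: return the first hole large enough for the request, or NULL. */
hole_t *first_fit(hole_t *free_list, size_t request)
{
    for (hole_t *h = free_list; h != NULL; h = h->next)
        if (h->size >= request)
            return h;      /* caller splits the hole and keeps the leftover */
    return NULL;           /* no hole big enough: request must wait or fail */
}

Best-fit and worst-fit differ only in that the whole list is scanned and the smallest (or largest) sufficient hole is remembered.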
Fragmentation
External Fragmentation – total memory space exists to
satisfy a request, but it is not contiguous
Internal Fragmentation – allocated memory may be slightly
larger than requested memory; this size difference is memory
internal to a partition, but not being used
Analysis of first fit reveals that, given N allocated blocks, another
0.5 N blocks will be lost to fragmentation
That is, one-third of memory may be unusable -> the 50-percent rule
Fragmentation (Cont.)
Reduce external fragmentation by compaction
Shuffle memory contents to place all free memory together
in one large block
Compaction is possible only if relocation is dynamic, and is
done at execution time
I/O problem
Latch job in memory while it is involved in I/O
Do I/O only into OS buffers
Now consider that backing store has same fragmentation
problems
Segmentation
Memory-management scheme that supports user view of memory
A program is a collection of segments
A segment is a logical unit such as:
main program
procedure
function
method
object
local variables, global variables
common block
stack
symbol table
arrays
User’s View of a Program
Logical View of Segmentation
[Figure: segments in user space mapped into non-contiguous regions of physical memory space]
Segmentation Architecture
Logical address consists of a two-tuple:
<segment-number, offset>
Segment table – maps the two-dimensional logical address to a
one-dimensional physical address; each table entry has:
base – contains the starting physical address where the
segments reside in memory
limit – specifies the length of the segment
Segment-table base register (STBR) points to the segment
table’s location in memory
Segment-table length register (STLR) indicates number of
segments used by a program;
segment number s is legal if s < STLR
Segmentation Architecture (Cont.)
Protection
With each entry in segment table associate:
validation bit = 0 ⇒ illegal segment
read/write/execute privileges
Protection bits associated with segments; code sharing
occurs at segment level
Since segments vary in length, memory allocation is a
dynamic storage-allocation problem
A segmentation example is shown in the following diagram
Segmentation Hardware
Paging
Physical address space of a process can be noncontiguous;
process is allocated physical memory whenever the latter is
available
Avoids external fragmentation
Avoids problem of varying sized memory chunks
Divide physical memory into fixed-sized blocks called frames
Size is power of 2, between 512 bytes and 16 Mbytes
Divide logical memory into blocks of same size called pages
Keep track of all free frames
To run a program of size N pages, need to find N free frames and
load program
Set up a page table to translate logical to physical addresses
Backing store likewise split into pages
Still have Internal fragmentation
Address Translation Scheme
Address generated by CPU is divided into:
Page number (p) – used as an index into a page table which
contains base address of each page in physical memory
Page offset (d) – combined with base address to define the
physical memory address that is sent to the memory unit
A logical address is divided into a page number p (the high-order m − n bits)
and a page offset d (the low-order n bits)
For a given logical address space of size 2^m and page size 2^n
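A tiny sketch of that split, using shifts and masks; the values match the paging example that follows (m = 4, n = 2).

#include <stdio.h>

/* Split a logical address into page number p and offset d,
   for a logical address space of 2^m bytes and a page size of 2^n bytes. */
int main(void)
{
    const unsigned m = 4, n = 2;
    for (unsigned addr = 0; addr < (1u << m); addr += 5) {
        unsigned p = addr >> n;                 /* high-order m - n bits */
        unsigned d = addr & ((1u << n) - 1);    /* low-order n bits      */
        printf("logical %2u -> page %u, offset %u\n", addr, p, d);
    }
    return 0;
}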
Paging Hardware
Paging Model of Logical and Physical Memory
Paging Example
n = 2 and m = 4: a 32-byte physical memory with 4-byte pages
Paging (Cont.)
Calculating internal fragmentation
Page size = 2,048 bytes
Process size = 72,766 bytes
35 pages + 1,086 bytes
Internal fragmentation of 2,048 - 1,086 = 962 bytes
Worst case fragmentation = 1 frame – 1 byte
On average fragmentation = 1 / 2 frame size
So small frame sizes desirable?
But each page table entry takes memory to track
Page sizes growing over time
Solaris supports two page sizes – 8 KB and 4 MB
Process view and physical memory now very different
By implementation process can only access its own memory
Free Frames
[Figure: free-frame list before allocation and after allocation]
Implementation of Page Table
Page table is kept in main memory
Page-table base register (PTBR) points to the page table
Page-table length register (PTLR) indicates size of the page
table
In this scheme every data/instruction access requires two
memory accesses
One for the page table and one for the data / instruction
The two memory access problem can be solved by the use of
a special fast-lookup hardware cache called associative
memory or translation look-aside buffers (TLBs)
Implementation of Page Table (Cont.)
Some TLBs store address-space identifiers (ASIDs) in each
TLB entry – uniquely identifies each process to provide
address-space protection for that process
Otherwise need to flush at every context switch
TLBs typically small (64 to 1,024 entries)
On a TLB miss, value is loaded into the TLB for faster access
next time
Replacement policies must be considered
Some entries can be wired down for permanent fast
access
Associative Memory
Associative memory – parallel search
Page # Frame #
Address translation (p, d)
If p is in associative register, get frame # out
Otherwise get frame # from page table in memory
Paging Hardware With TLB
Effective Access Time
Associative lookup = ε time units
Can be < 10% of memory access time
Hit ratio = α
Hit ratio – percentage of times that a page number is found in the
associative registers; ratio related to number of associative
registers
Consider α = 80%, ε = 20 ns for TLB search, 100 ns for memory access
EAT = 0.80 x 100 + 0.20 x 200 = 120 ns
Consider a more realistic hit ratio: α = 99%, ε = 20 ns for TLB search,
100 ns for memory access
EAT = 0.99 x 100 + 0.01 x 200 = 101 ns
Memory Protection
Memory protection implemented by associating protection bit
with each frame to indicate if read-only or read-write access is
allowed
Can also add more bits to indicate page execute-only, and
so on
Valid-invalid bit attached to each entry in the page table:
“valid” indicates that the associated page is in the
process’ logical address space, and is thus a legal page
“invalid” indicates that the page is not in the process’
logical address space
Or use page-table length register (PTLR)
Any violations result in a trap to the kernel
Valid (v) or Invalid (i) Bit In A Page Table
Virtual Memory
Background (Cont.)
Virtual memory – separation of user logical memory from
physical memory
Only part of the program needs to be in memory for execution
Logical address space can therefore be much larger than physical
address space
Allows address spaces to be shared by several processes
Allows for more efficient process creation
More programs running concurrently
Less I/O needed to load or swap processes
Background (Cont.)
Virtual address space – logical view of how process is
stored in memory
Usually start at address 0, contiguous addresses until end of
space
Meanwhile, physical memory organized in page frames
MMU must map logical to physical
Virtual memory can be implemented via:
Demand paging
Demand segmentation
Virtual Memory That is Larger Than Physical Memory
Demand Paging
Could bring entire process into memory
at load time
Or bring a page into memory only when
it is needed
Less I/O needed, no unnecessary
I/O
Less memory needed
Faster response
More users
Similar to paging system with swapping
(diagram on right)
Page is needed ⇒ reference to it
invalid reference ⇒ abort
not-in-memory ⇒ bring to memory
Lazy swapper – never swaps a page
into memory unless page will be needed
Swapper that deals with pages is a
pager
Basic Concepts
With swapping, pager guesses which pages will be used before
swapping out again
Instead, pager brings in only those pages into memory
How to determine that set of pages?
Need new MMU functionality to implement demand paging
If pages needed are already memory resident
No difference from non demand-paging
If page needed and not memory resident
Need to detect and load the page into memory from storage
Without changing program behavior
Without programmer needing to change code
Valid-Invalid Bit
With each page table entry a valid–invalid bit is associated
(v ⇒ in-memory / memory resident, i ⇒ not-in-memory)
Initially valid–invalid bit is set to i on all entries
Example of a page table snapshot:
During MMU address translation, if valid–invalid bit in page table
entry is i ⇒ page fault
Page Table When Some Pages Are Not in Main Memory
Page Fault
If there is a reference to a page, first reference to that page will
trap to operating system:
page fault
1. Operating system looks at another table to decide:
Invalid reference ⇒ abort
Just not in memory ⇒ continue with the steps below
2. Find free frame
3. Swap page into frame via scheduled disk operation
4. Reset tables to indicate page now in memory
Set validation bit = v
5. Restart the instruction that caused the page fault
Steps in Handling a Page Fault
Aspects of Demand Paging
Extreme case – start process with no pages in memory
OS sets instruction pointer to first instruction of process, non-
memory-resident -> page fault
The same happens for every other page of the process on its first access
Pure demand paging
Actually, a given instruction could access multiple pages -> multiple
page faults
Consider fetch and decode of instruction which adds 2 numbers
from memory and stores result back to memory
Pain decreased because of locality of reference
Hardware support needed for demand paging
Page table with valid / invalid bit
Secondary memory (swap device with swap space)
Instruction restart
Instruction Restart
Consider an instruction that could access several different locations
block move
auto increment/decrement location
Restart the whole operation?
What if source and destination overlap?
Performance of Demand Paging
Stages in Demand Paging (worst case)
1. Trap to the operating system
2. Save the user registers and process state
3. Determine that the interrupt was a page fault
4. Check that the page reference was legal and determine the location of the page on the disk
5. Issue a read from the disk to a free frame:
1. Wait in a queue for this device until the read request is serviced
2. Wait for the device seek and/or latency time
3. Begin the transfer of the page to a free frame
6. While waiting, allocate the CPU to some other user
7. Receive an interrupt from the disk I/O subsystem (I/O completed)
8. Save the registers and process state for the other user
9. Determine that the interrupt was from the disk
10. Correct the page table and other tables to show page is now in memory
11. Wait for the CPU to be allocated to this process again
12. Restore the user registers, process state, and new page table, and then resume the
interrupted instruction
Performance of Demand Paging (Cont.)
Three major activities
Service the interrupt – careful coding means just several hundred
instructions needed
Read the page – lots of time
Restart the process – again just a small amount of time
Page Fault Rate: 0 ≤ p ≤ 1
if p = 0, no page faults
if p = 1, every reference is a fault
Effective Access Time (EAT)
EAT = (1 – p) x memory access
+ p (page fault overhead
+ swap page out
+ swap page in )
Demand Paging Example
Memory access time = 200 nanoseconds
Average page-fault service time = 8 milliseconds
EAT = (1 – p) x 200 + p (8 milliseconds)
= (1 – p) x 200 + p x 8,000,000
= 200 + p x 7,999,800
If one access out of 1,000 causes a page fault, then
EAT = 8.2 microseconds.
This is a slowdown by a factor of 40!!
If want performance degradation < 10 percent
220 > 200 + 7,999,800 x p
20 > 7,999,800 x p
p < .0000025
< one page fault in every 400,000 memory accesses
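The same arithmetic as a small sketch (times in nanoseconds; the 8-millisecond fault-service time is the figure used above).

#include <stdio.h>

int main(void)
{
    const double mem_access = 200.0;        /* memory access time in ns        */
    const double fault_time = 8000000.0;    /* page-fault service time: 8 ms   */

    double p = 1.0 / 1000.0;                /* one fault per 1,000 accesses    */
    double eat = (1.0 - p) * mem_access + p * fault_time;
    printf("EAT = %.1f ns (about %.1f microseconds)\n", eat, eat / 1000.0);

    /* largest p that keeps degradation under 10 percent (EAT < 220 ns) */
    double p_max = (220.0 - mem_access) / (fault_time - mem_access);
    printf("p must stay below %.7f\n", p_max);
    return 0;
}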
What Happens if There is no Free Frame?
Used up by process pages
Also in demand from the kernel, I/O buffers, etc
How much to allocate to each?
Page replacement – find some page in memory, but not really in
use, page it out
Algorithm – terminate? swap out? replace the page?
Performance – want an algorithm which will result in minimum
number of page faults
Same page may be brought into memory several times
Page Replacement
Prevent over-allocation of memory by modifying page-
fault service routine to include page replacement
Use modify (dirty) bit to reduce overhead of page
transfers – only modified pages are written to disk
Page replacement completes separation between logical
memory and physical memory – large virtual memory can
be provided on a smaller physical memory
Need For Page Replacement
Basic Page Replacement
1. Find the location of the desired page on disk
2. Find a free frame:
- If there is a free frame, use it
- If there is no free frame, use a page replacement algorithm to
select a victim frame
- Write victim frame to disk if dirty
3. Bring the desired page into the (newly) free frame; update the page
and frame tables
4. Continue the process by restarting the instruction that caused the trap
Note now potentially 2 page transfers for page fault – increasing EAT
Page Replacement
Page and Frame Replacement Algorithms
Frame-allocation algorithm determines
How many frames to give each process
Which frames to replace
Page-replacement algorithm
Want lowest page-fault rate on both first access and re-access
Evaluate algorithm by running it on a particular string of memory
references (reference string) and computing the number of page
faults on that string
String is just page numbers, not full addresses
Repeated access to the same page does not cause a page fault
Results depend on number of frames available
In all our examples, the reference string of referenced page
numbers is
7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1
Graph of Page Faults Versus The Number of Frames
First-In-First-Out (FIFO) Algorithm
Reference string: 7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1
3 frames (3 pages can be in memory at a time per process)
15 page faults
Can vary by reference string: consider 1,2,3,4,1,2,5,1,2,3,4,5
Adding more frames can cause more page faults!
Belady’s Anomaly
How to track ages of pages?
Just use a FIFO queue
FIFO Illustrating Belady’s Anomaly
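A compact sketch of FIFO replacement that counts faults for the chapter's reference string with 3 frames; running it should reproduce the 15 page faults quoted above.

#include <stdio.h>

#define FRAMES 3

int main(void)
{
    int ref[] = {7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1};
    int n = sizeof(ref) / sizeof(ref[0]);
    int frame[FRAMES] = {-1, -1, -1};    /* -1 means the frame is empty */
    int next = 0;                        /* points at the oldest page   */
    int faults = 0;

    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < FRAMES; j++)
            if (frame[j] == ref[i]) { hit = 1; break; }
        if (!hit) {
            frame[next] = ref[i];        /* replace the oldest page     */
            next = (next + 1) % FRAMES;
            faults++;
        }
    }
    printf("FIFO page faults: %d\n", faults);   /* 15 for this string */
    return 0;
}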
Optimal Algorithm
Replace page that will not be used for longest period of time
9 is optimal for the example
How do you know this?
Can’t read the future
Used for measuring how well your algorithm performs
Least Recently Used (LRU) Algorithm
Use past knowledge rather than future
Replace the page that has not been used for the longest period of time
Associate time of last use with each page
12 faults – better than FIFO but worse than OPT
Generally good algorithm and frequently used
But how to implement?
LRU Algorithm (Cont.)
Counter implementation
Every page entry has a counter; every time page is referenced
through this entry, copy the clock into the counter
When a page needs to be changed, look at the counters to find
smallest value
Search through table needed
Stack implementation
Keep a stack of page numbers in a double link form:
Page referenced:
move it to the top
requires 6 pointers to be changed
But each update more expensive
No search for replacement
LRU and OPT are cases of stack algorithms that don’t have
Belady’s Anomaly
Use Of A Stack to Record Most Recent Page References
LRU Approximation Algorithms
LRU needs special hardware and still slow
Reference bit
With each page associate a bit, initially = 0
When page is referenced bit set to 1
Replace any with reference bit = 0 (if one exists)
We do not know the order, however
Second-chance algorithm
Generally FIFO, plus hardware-provided reference bit
Clock replacement
If page to be replaced has
Reference bit = 0 -> replace it
reference bit = 1 then:
– set reference bit 0, leave page in memory
– replace next page, subject to same rules
Second-Chance (clock) Page-Replacement Algorithm
Counting Algorithms
Keep a counter of the number of references that have been made
to each page
Not common
Least Frequently Used (LFU) Algorithm: replaces page with
smallest count
Most Frequently Used (MFU) Algorithm: based on the argument
that the page with the smallest count was probably just brought in
and has yet to be used
Applications and Page Replacement
All of these algorithms have OS guessing about future page
access
Some applications have better knowledge – i.e. databases
Memory intensive applications can cause double buffering
OS keeps copy of page in memory as I/O buffer
Application keeps page in memory for its own work
Operating system can give the application direct access to the disk, getting out
of the way of the applications
Raw disk mode
Bypasses buffering, locking, etc
Allocation of Frames
Each process needs minimum number of frames
Example: IBM 370 – 6 pages to handle SS MOVE instruction:
instruction is 6 bytes, might span 2 pages
2 pages to handle from
2 pages to handle to
Maximum of course is total frames in the system
Two major allocation schemes
fixed allocation
priority allocation
Many variations
Fixed Allocation
Equal allocation – For example, if there are 100 frames (after
allocating frames for the OS) and 5 processes, give each process
20 frames
Keep some as free frame buffer pool
Proportional allocation – Allocate according to the size of process
Dynamic as degree of multiprogramming, process sizes
change
m = total number of frames = 64
si = size of process pi; S = Σ si
ai = allocation for pi = (si / S) × m
Example: s1 = 10, s2 = 127, S = 137
a1 = (10 / 137) × 62 ≈ 4
a2 = (127 / 137) × 62 ≈ 57
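The slide's proportional-allocation arithmetic as a tiny sketch; it distributes the 62 allocatable frames from the computation above between the two example processes (integer truncation gives the 4 and 57 shown).

#include <stdio.h>

int main(void)
{
    /* Two processes of size 10 and 127 pages; 62 allocatable frames. */
    double s[] = {10.0, 127.0};
    double S = s[0] + s[1];                    /* 137 */
    int frames = 62;

    for (int i = 0; i < 2; i++) {
        int ai = (int)(s[i] / S * frames);     /* proportional share, truncated */
        printf("a%d = %d frames\n", i + 1, ai);
    }
    return 0;
}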
Priority Allocation
Use a proportional allocation scheme using priorities rather
than size
If process Pi generates a page fault,
select for replacement one of its frames
select for replacement a frame from a process with lower
priority number
Global vs. Local Allocation
Global replacement – process selects a replacement frame
from the set of all frames; one process can take a frame from
another
But then process execution time can vary greatly
But greater throughput so more common
Local replacement – each process selects from only its own
set of allocated frames
More consistent per-process performance
But possibly underutilized memory
Thrashing
If a process does not have “enough” pages, the page-fault rate is
very high
Page fault to get page
Replace existing frame
But quickly need replaced frame back
This leads to:
Low CPU utilization
Operating system thinking that it needs to increase the
degree of multiprogramming
Another process added to the system
Thrashing ≡ a process is busy swapping pages in and out
Thrashing (Cont.)
Demand Paging and Thrashing
Why does demand paging work?
Locality model
Process migrates from one locality to another
Localities may overlap
Why does thrashing occur?
size of locality > total memory size
Limit effects by using local or priority page replacement
Locality In A Memory-Reference Pattern
Working-Set Model
Δ ≡ working-set window ≡ a fixed number of page references
Example: 10,000 instructions
WSSi (working set of process Pi) =
total number of pages referenced in the most recent Δ (varies in time)
if Δ too small, will not encompass entire locality
if Δ too large, will encompass several localities
if Δ = ∞, will encompass entire program
D = Σ WSSi ≡ total demand frames
Approximation of locality
if D > m ⇒ thrashing
Policy: if D > m, then suspend or swap out one of the processes
Keeping Track of the Working Set
Approximate with interval timer + a reference bit
Example: Δ = 10,000
Timer interrupts after every 5,000 time units
Keep in memory 2 bits for each page
Whenever a timer interrupts, copy and set the values of all
reference bits to 0
If one of the bits in memory = 1 ⇒ page in working set
Why is this not completely accurate?
Improvement: 10 bits and interrupt every 1,000 time units
Page-Fault Frequency
More direct approach than WSS
Establish “acceptable” page-fault frequency (PFF) rate
and use local replacement policy
If actual rate too low, process loses frame
If actual rate too high, process gains frame
Working Sets and Page Fault Rates
Direct relationship between working set of a process and its
page-fault rate
Working set changes over time
Peaks and valleys over time
Disk Scheduling
Overview of Mass Storage Structure
Magnetic disks provide bulk of secondary storage of modern computers
Drives rotate at 60 to 250 times per second
Transfer rate is rate at which data flow between drive and computer
Positioning time (random-access time) is time to move disk arm to
desired cylinder (seek time) and time for desired sector to rotate
under the disk head (rotational latency)
Head crash results from disk head making contact with the disk
surface -- That’s bad
Disks can be removable
Drive attached to computer via I/O bus
Busses vary, including EIDE, ATA, SATA, USB, Fibre Channel,
SCSI, SAS, Firewire
Host controller in computer uses bus to talk to disk controller built
into drive or storage array
Moving-head Disk Mechanism
Disk Scheduling
The operating system is responsible for using hardware
efficiently — for the disk drives, this means having a fast
access time and disk bandwidth
Minimize seek time
Seek time ≈ seek distance
Disk bandwidth is the total number of bytes transferred,
divided by the total time between the first request for service
and the completion of the last transfer
Disk Scheduling (Cont.)
There are many sources of disk I/O request
OS
System processes
Users processes
I/O request includes input or output mode, disk address, memory
address, number of sectors to transfer
OS maintains queue of requests, per disk or device
Idle disk can immediately work on I/O request, busy disk means
work must queue
Optimization algorithms only make sense when a queue exists
Disk Scheduling (Cont.)
Note that drive controllers have small buffers and can manage a
queue of I/O requests (of varying “depth”)
Several algorithms exist to schedule the servicing of disk I/O
requests
The analysis is true for one or many platters
We illustrate scheduling algorithms with a request queue (0-199)
98, 183, 37, 122, 14, 124, 65, 67
Head pointer 53
FCFS
Illustration shows total head movement of 640 cylinders
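A short sketch that reproduces the 640-cylinder figure for the FCFS order of the request queue given above.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int queue[] = {98, 183, 37, 122, 14, 124, 65, 67};
    int n = sizeof(queue) / sizeof(queue[0]);
    int head = 53;                      /* initial head position             */
    int total = 0;

    for (int i = 0; i < n; i++) {       /* FCFS: service in arrival order    */
        total += abs(queue[i] - head);
        head = queue[i];
    }
    printf("Total head movement: %d cylinders\n", total);   /* prints 640 */
    return 0;
}

The other algorithms (SSTF, SCAN, C-SCAN, LOOK) differ only in the order in which the queue entries are visited.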
SSTF
Shortest Seek Time First selects the request with the minimum
seek time from the current head position
SSTF scheduling is a form of SJF scheduling; may cause
starvation of some requests
Illustration shows total head movement of 236 cylinders
SCAN
The disk arm starts at one end of the disk, and moves toward the
other end, servicing requests until it gets to the other end of the
disk, where the head movement is reversed and servicing
continues.
The SCAN algorithm is sometimes called the elevator algorithm
Illustration shows total head movement of 236 cylinders
But note that if requests are uniformly dense, largest density at
other end of disk and those wait the longest
SCAN (Cont.)
C-SCAN
Provides a more uniform wait time than SCAN
The head moves from one end of the disk to the other, servicing
requests as it goes
When it reaches the other end, however, it immediately
returns to the beginning of the disk, without servicing any
requests on the return trip
Treats the cylinders as a circular list that wraps around from the
last cylinder to the first one
Total number of cylinders?
C-SCAN (Cont.)
C-LOOK
LOOK a version of SCAN, C-LOOK a version of C-SCAN
Arm only goes as far as the last request in each direction,
then reverses direction immediately, without first going all
the way to the end of the disk
Total number of cylinders?
C-LOOK (Cont.)