1
Shared Memory
Programming with
Pthreads & OpenMP
Dilum Bandara
Dilum.Bandara@uom.lk
Slides extended from
An Introduction to Parallel Programming by
Peter Pacheco
2
Shared Memory System
Copyright © 2010, Elsevier Inc. All rights Reserved
3
POSIX® Threads
 Also known as Pthreads
 Standard for Unix-like operating systems
 Library that can be linked with C programs
 Specifies an API for multi-threaded
programming
Copyright © 2010, Elsevier Inc. All rights Reserved
4
Hello World!
Copyright © 2010, Elsevier Inc. All rights Reserved
pthread.h declares various Pthreads
functions, constants, types, etc.
5
Hello World! (Cont.)
Copyright © 2010, Elsevier Inc. All rights Reserved
6
Hello World! (Cont.)
Copyright © 2010, Elsevier Inc. All rights Reserved
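The hello-world source on slides 4-6 appears only as images in this copy; the following is a minimal sketch in the spirit of Pacheco's pth_hello.c (exact variable and function names are assumptions):

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>   /* Declares various Pthreads functions, constants, types, etc. */

int thread_count;      /* Global (shared) variable */

void* Hello(void* rank) {                /* Thread function */
   long my_rank = (long) rank;           /* Rank was passed by value through the void* argument */
   printf("Hello from thread %ld of %d\n", my_rank, thread_count);
   return NULL;
}

int main(int argc, char* argv[]) {
   long thread;
   pthread_t* thread_handles;

   thread_count = strtol(argv[1], NULL, 10);   /* Number of threads from the command line */
   thread_handles = malloc(thread_count * sizeof(pthread_t));

   for (thread = 0; thread < thread_count; thread++)
      pthread_create(&thread_handles[thread], NULL, Hello, (void*) thread);

   printf("Hello from the main thread\n");

   for (thread = 0; thread < thread_count; thread++)
      pthread_join(thread_handles[thread], NULL);

   free(thread_handles);
   return 0;
}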
7
Compiling a Pthread program
Copyright © 2010, Elsevier Inc. All rights Reserved
gcc -g -Wall -o pth_hello pth_hello.c -lpthread
Link Pthreads library
8
Running a Pthreads program
Copyright © 2010, Elsevier Inc. All rights Reserved
./pth_hello <number of threads>
./pth_hello 1
Hello from the main thread
Hello from thread 0 of 1
./pth_hello 4
Hello from the main thread
Hello from thread 0 of 4
Hello from thread 3 of 4
Hello from thread 2 of 4
Hello from thread 1 of 4
9
Running the Threads
Copyright © 2010, Elsevier Inc. All rights Reserved
Main thread forks & joins 2 threads
10
Global Variables
 Can introduce subtle & confusing bugs!
 Use them only when they are essential
 Shared variables
Copyright © 2010, Elsevier Inc. All rights Reserved
11
Starting Threads
Copyright © 2010, Elsevier Inc. All rights Reserved
pthread.h
pthread_t
int pthread_create (
pthread_t* thread_p, /* out */
const pthread_attr_t* attr_p, /* in */
void* (*start_routine)(void*), /* in */
void* arg_p); /* in */
One object for
each thread
We ignore return value
from pthread_create
12
Function Started by pthread_create
 The function started by pthread_create should
have the following prototype
void* thread_function ( void* args_p ) ;
 void* can be cast to any pointer type in C
 So args_p can point to a list containing one or more
values needed by thread_function
 Similarly, return value of thread_function can
point to a list of one or more values
Copyright © 2010, Elsevier Inc. All rights Reserved
13
Stopping Threads
 Single call to pthread_join will wait for
thread associated with pthread_t object to
complete
 Suspend execution of calling thread until
target thread terminates, unless it has already
terminated
 Call pthread_join once for each thread
int pthread_join(
pthread_t thread /* in */ ,
void** ret_val_p /* out */ ) ;
Copyright © 2010, Elsevier Inc. All rights Reserved
14
Matrix-Vector Multiplication in
Pthreads
Copyright © 2010, Elsevier Inc. All rights Reserved
15
Serial Pseudo-code
Copyright © 2010, Elsevier Inc. All rights Reserved
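The pseudocode image is missing from this copy; the serial computation is the usual double loop over rows and columns (names are assumptions):

/* Serial matrix-vector multiplication: y = A*x, where A is m x n */
for (i = 0; i < m; i++) {
   y[i] = 0.0;
   for (j = 0; j < n; j++)
      y[i] += A[i][j] * x[j];
}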
16
Using 3 Pthreads
 Assign a block of rows to each thread
 Suppose a 6x6 matrix & 3 threads, so each thread gets 2 rows
Copyright © 2010, Elsevier Inc. All rights Reserved
Thread 0
General case
17
Pthreads Matrix-Vector Multiplication
Copyright © 2010, Elsevier Inc. All rights Reserved
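The slide's code is an image in the original; a sketch of the thread function under the block-row assignment above, assuming the shared globals A, x, y, m, n, & thread_count of Pacheco's version (A stored as a 1-D array in row-major order):

#include <pthread.h>

int thread_count, m, n;   /* Shared variables (assumed) */
double *A, *x, *y;

void* Pth_mat_vect(void* rank) {
   long my_rank = (long) rank;
   int local_m = m / thread_count;              /* Rows per thread */
   int my_first_row = my_rank * local_m;
   int my_last_row  = (my_rank + 1) * local_m - 1;

   for (int i = my_first_row; i <= my_last_row; i++) {
      y[i] = 0.0;
      for (int j = 0; j < n; j++)
         y[i] += A[i*n + j] * x[j];
   }
   return NULL;                                 /* No races: each thread writes its own y[i] */
}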
18
Estimating π
Copyright © 2010, Elsevier Inc. All rights Reserved
19
Thread Function for Computing π
Copyright © 2010, Elsevier Inc. All rights Reserved
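The thread function is an image in the original; a sketch of an unsynchronized version (variable names are assumptions) in which every thread adds its terms of the series π = 4(1 - 1/3 + 1/5 - 1/7 + ...) straight into the shared sum:

/* Shared variables (assumed): long long n; int thread_count; double sum; */
void* Thread_sum(void* rank) {
   long my_rank = (long) rank;
   double factor;
   long long i;
   long long my_n = n / thread_count;
   long long my_first_i = my_n * my_rank;
   long long my_last_i  = my_first_i + my_n;

   factor = (my_first_i % 2 == 0) ? 1.0 : -1.0;

   for (i = my_first_i; i < my_last_i; i++, factor = -factor)
      sum += factor / (2*i + 1);   /* Unprotected update of shared sum -- a race condition */

   return NULL;
}

The race on sum is the answer to the next slide's "Why?": with 2 threads the unprotected updates can interleave & be lost.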
20
Using a dual core processor
Copyright © 2010, Elsevier Inc. All rights Reserved
As we increase n, the estimate with 1
thread gets better & better
The 2-thread case produces different
answers in different runs
Why?
21
Pthreads Global Sum with Busy-Waiting
Copyright © 2010, Elsevier Inc. All rights Reserved
Shared variable
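The busy-waiting code itself is an image; a sketch of the loop in Thread_sum for this version, assuming a shared int flag initialized to 0 (in practice flag may need to be declared volatile so the compiler does not optimize the spin away):

for (i = my_first_i; i < my_last_i; i++, factor = -factor) {
   while (flag != my_rank);            /* Busy-wait (spin) until it is this thread's turn */
   sum += factor / (2*i + 1);          /* Critical section */
   flag = (flag + 1) % thread_count;   /* Pass the turn to the next thread */
}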
22
Mutexes
 Ensure only 1 thread is in the critical section at a time
 Pthreads standard includes a special type
for mutexes: pthread_mutex_t
Copyright © 2010, Elsevier Inc. All rights Reserved
23
Mutexes
 Lock
 To gain access to a critical section
 Unlock
 When a thread is finished executing code in a
critical section
 Termination
 When a program finishes using a mutex, it should destroy it
Copyright © 2010, Elsevier Inc. All rights Reserved
24
Global Sum Function Using a Mutex
Copyright © 2010, Elsevier Inc. All rights Reserved
25
Global Sum Function Using a Mutex (Cont.)
Copyright © 2010, Elsevier Inc. All rights Reserved
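Again the code is an image; a sketch of the mutex version, where each thread accumulates a private my_sum & enters the critical section only once (mutex is a shared pthread_mutex_t, initialized in main with pthread_mutex_init & destroyed with pthread_mutex_destroy):

double my_sum = 0.0;
for (i = my_first_i; i < my_last_i; i++, factor = -factor)
   my_sum += factor / (2*i + 1);   /* Private accumulation -- no contention */

pthread_mutex_lock(&mutex);        /* Only 1 thread at a time past this point */
sum += my_sum;
pthread_mutex_unlock(&mutex);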
26
Busy-Waiting vs. Mutex
Copyright © 2010, Elsevier Inc. All rights Reserved
Run-times (in seconds) of π programs using n = 10⁸
terms on a system with 2x4-core processors
27
Semaphores
Copyright © 2010, Elsevier Inc. All rights Reserved
Semaphores are not part of Pthreads;
you need to include semaphore.h to use them
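The slide's example is an image; a minimal sketch of the POSIX semaphore API used as a binary lock around the shared sum (names are assumptions, not the slide's own code):

#include <semaphore.h>   /* POSIX semaphores -- not part of Pthreads itself */

double sum;              /* Shared variable (assumed) */
sem_t sem;               /* Shared; in main: sem_init(&sem, 0, 1); ... sem_destroy(&sem); */

void Add_to_sum(double my_sum) {
   sem_wait(&sem);       /* Decrement; blocks while the semaphore's value is 0 */
   sum += my_sum;        /* Critical section */
   sem_post(&sem);       /* Increment; allows a waiting thread to proceed */
}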
28
Read-Write Locks
 Useful when controlling access to a large, shared
data structure
 Example
 Suppose shared data structure is a sorted
linked list of ints, & operations of interest are
Member, Insert, & Delete
Copyright © 2010, Elsevier Inc. All rights Reserved
29
Linked Lists
Copyright © 2010, Elsevier Inc. All rights Reserved
30
Linked List Membership
Copyright © 2010, Elsevier Inc. All rights Reserved
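The list code on slides 29-30 is not reproduced in this copy; a sketch of the node type & the serial Member function in the style of the book:

struct list_node_s {
   int  data;
   struct list_node_s* next;
};

int Member(int value, struct list_node_s* head_p) {
   struct list_node_s* curr_p = head_p;

   /* The list is sorted, so stop once we pass where value would be */
   while (curr_p != NULL && curr_p->data < value)
      curr_p = curr_p->next;

   if (curr_p == NULL || curr_p->data > value)
      return 0;   /* Not in list */
   else
      return 1;   /* In list */
}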
31
Inserting New Node Into a List
Copyright © 2010, Elsevier Inc. All rights Reserved
32
Inserting New Node Into a List (Cont.)
Copyright © 2010, Elsevier Inc. All rights Reserved
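A sketch of the corresponding Insert, using the node type above (requires <stdlib.h> for malloc):

int Insert(int value, struct list_node_s** head_pp) {
   struct list_node_s* curr_p = *head_pp;
   struct list_node_s* pred_p = NULL;
   struct list_node_s* temp_p;

   /* Find the insertion point in the sorted list */
   while (curr_p != NULL && curr_p->data < value) {
      pred_p = curr_p;
      curr_p = curr_p->next;
   }

   if (curr_p == NULL || curr_p->data > value) {
      temp_p = malloc(sizeof(struct list_node_s));
      temp_p->data = value;
      temp_p->next = curr_p;
      if (pred_p == NULL)          /* New first node */
         *head_pp = temp_p;
      else
         pred_p->next = temp_p;
      return 1;
   } else {
      return 0;                    /* value already in list */
   }
}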
33
Deleting a Node From a Linked List
Copyright © 2010, Elsevier Inc. All rights Reserved
34
Deleting a Node From a Linked List (Cont.)
Copyright © 2010, Elsevier Inc. All rights Reserved
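A sketch of Delete, again following the book's serial version (requires <stdlib.h> for free):

int Delete(int value, struct list_node_s** head_pp) {
   struct list_node_s* curr_p = *head_pp;
   struct list_node_s* pred_p = NULL;

   /* Find the node containing value, if it exists */
   while (curr_p != NULL && curr_p->data < value) {
      pred_p = curr_p;
      curr_p = curr_p->next;
   }

   if (curr_p != NULL && curr_p->data == value) {
      if (pred_p == NULL)             /* Deleting the first node */
         *head_pp = curr_p->next;
      else
         pred_p->next = curr_p->next;
      free(curr_p);
      return 1;
   } else {
      return 0;                       /* value not in list */
   }
}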
35
Multi-Threaded Linked List
 To share access to the list, we can define
head_p to be a global variable
 This will simplify function headers for Member,
Insert, & Delete
 Because we won’t need to pass in either
head_p or a pointer to head_p: we’ll only need
to pass in the value of interest
Copyright © 2010, Elsevier Inc. All rights Reserved
36
Simultaneous Access by 2 Threads
Copyright © 2010, Elsevier Inc. All rights Reserved
37
Solution #1
 Simply lock the list any time that a thread
attempts to access it
 Call to each of the 3 functions can be
protected by a mutex
Copyright © 2010, Elsevier Inc. All rights Reserved
In place of calling Member(value).
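A sketch of what the caption refers to, assuming a global pthread_mutex_t list_mutex & the global-head_p versions of the functions from slide 35:

pthread_mutex_lock(&list_mutex);   /* Lock the entire list */
Member(value);
pthread_mutex_unlock(&list_mutex);

/* Calls to Insert(value) & Delete(value) are wrapped the same way */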
38
Issues
 Serializing access to the list
 If vast majority of our operations are calls
to Member
 We fail to exploit opportunity for parallelism
 If most of our operations are calls to Insert
& Delete
 This may be the best solution
Copyright © 2010, Elsevier Inc. All rights Reserved
39
Solution #2
 Instead of locking entire list, we could try to
lock individual nodes
 A “finer-grained” approach
Copyright © 2010, Elsevier Inc. All rights Reserved
40
Issues
 Much more complex than original Member
function
 Much slower
 Because each time a node is accessed, a
mutex must be locked & unlocked
 Addition of a mutex field to each node
substantially increases memory needed for the
list
Copyright © 2010, Elsevier Inc. All rights Reserved
41
Pthreads Read-Write Locks
 Neither multi-threaded linked list exploits the
potential for simultaneous access to any node by
threads that are executing Member
 1st solution only allows 1 thread to access the entire
list at any instant
 2nd only allows 1 thread to access any given node at
any instant
 Read-write lock is somewhat like a mutex except
that it provides 2 lock functions
 1st locks the read-write lock for reading
 2nd locks it for writing
Copyright © 2010, Elsevier Inc. All rights Reserved
42
Pthreads Read-Write Locks (Cont.)
 Multiple threads can simultaneously obtain lock
by calling read-lock function
 While only 1 thread can obtain lock by calling
write-lock function
 Thus
 If any thread owns lock for reading, any thread that
wants to obtain a lock for writing will be blocked
 If any thread owns lock for writing, any threads that
want to obtain lock for reading or writing will be
blocked
Copyright © 2010, Elsevier Inc. All rights Reserved
43
Protecting Our Linked List Functions
Copyright © 2010, Elsevier Inc. All rights Reserved
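The protected calls appear as an image in the original; a sketch using a global pthread_rwlock_t rwlock (initialized with pthread_rwlock_init & destroyed with pthread_rwlock_destroy) & the global-head_p versions of the functions:

pthread_rwlock_rdlock(&rwlock);   /* Many readers may hold the lock at once */
Member(value);
pthread_rwlock_unlock(&rwlock);

pthread_rwlock_wrlock(&rwlock);   /* Writers get exclusive access */
Insert(value);
pthread_rwlock_unlock(&rwlock);

pthread_rwlock_wrlock(&rwlock);
Delete(value);
pthread_rwlock_unlock(&rwlock);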
44
Linked List Performance
Copyright © 2010, Elsevier Inc. All rights Reserved
100,000 ops/thread
99.9% Member
0.05% Insert
0.05% Delete
100,000 ops/thread
80% Member
10% Insert
10% Delete
45
OpenMP
Copyright © 2010, Elsevier Inc. All rights Reserved
46
OpenMP
 High-level API for shared-memory parallel
programming
 MP = multiprocessing
 Use Pragmas
 Special preprocessor instructions
 #pragma
 Typically added to support behaviors that aren’t
part of the basic C specification
 Compilers that don’t support pragmas ignore
them
Copyright © 2010, Elsevier Inc. All rights Reserved
47
Copyright © 2010, Elsevier Inc. All rights Reserved
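Slide 47's program is an image in the original; a minimal sketch in the spirit of Pacheco's omp_hello.c:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

void Hello(void) {                               /* Each thread in the team runs this */
   int my_rank = omp_get_thread_num();
   int thread_count = omp_get_num_threads();
   printf("Hello from thread %d of %d\n", my_rank, thread_count);
}

int main(int argc, char* argv[]) {
   int thread_count = strtol(argv[1], NULL, 10); /* Number of threads from command line */

#  pragma omp parallel num_threads(thread_count)
   Hello();

   return 0;
}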
48
Compiling & Running
Copyright © 2010, Elsevier Inc. All rights Reserved
gcc -g -Wall -fopenmp -o omp_hello omp_hello.c
./omp_hello 4
compiling
running with 4 threads
possible outcomes (ordering varies between runs):

Hello from thread 0 of 4
Hello from thread 1 of 4
Hello from thread 2 of 4
Hello from thread 3 of 4

Hello from thread 1 of 4
Hello from thread 2 of 4
Hello from thread 0 of 4
Hello from thread 3 of 4

Hello from thread 3 of 4
Hello from thread 1 of 4
Hello from thread 2 of 4
Hello from thread 0 of 4
49
OpenMP Pragmas
Copyright © 2010, Elsevier Inc. All rights Reserved
 # pragma omp parallel
 Most basic parallel directive
 Original thread is called master
 Additional threads are called slaves
 Original thread & new threads called a team
50
Clause
 Text that modifies a directive
 num_threads clause can be added to a
parallel directive
 Allows programmer to specify number of
threads that should execute the following block
Copyright © 2010, Elsevier Inc. All rights Reserved
# pragma omp parallel num_threads ( thread_count )
51
Be Aware…
 There may be system-defined limitations on
number of threads that a program can start
 OpenMP standard doesn’t guarantee that this
will actually start thread_count threads
 Most current systems can start hundreds or even
1,000s of threads
 Unless we’re trying to start a lot of threads, we
will almost always get the desired number of threads
Copyright © 2010, Elsevier Inc. All rights Reserved
52
Mutual Exclusion
Copyright © 2010, Elsevier Inc. All rights Reserved
# pragma omp critical
{
global_result += my_result ;
}
only 1 thread can execute following
structured block at a time
53
Trapezoidal Rule
Copyright © 2010, Elsevier Inc. All rights Reserved
Serial algorithm
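The serial algorithm is an image in this copy; a sketch of the trapezoidal rule, assuming an integrand f & limits a, b with n trapezoids:

double f(double x);   /* Integrand, assumed defined elsewhere */

double Trap(double a, double b, int n) {
   double h = (b - a) / n;                /* Width of each trapezoid */
   double approx = (f(a) + f(b)) / 2.0;

   for (int i = 1; i <= n - 1; i++)
      approx += f(a + i * h);

   return h * approx;
}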
54
Assignment of Trapezoids to Threads
Copyright © 2010, Elsevier Inc. All rights Reserved
55
Copyright © 2010, Elsevier Inc. All rights Reserved
56
Copyright © 2010, Elsevier Inc. All rights Reserved
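The parallel implementation on slides 55-56 is likewise only an image; a sketch of an OpenMP version in which each thread integrates its own sub-interval & adds its result inside a critical section (function & variable names are assumptions):

#include <omp.h>

double f(double x);   /* Integrand, assumed defined elsewhere */

void Trap(double a, double b, int n, double* global_result_p) {
   double h = (b - a) / n;
   int my_rank = omp_get_thread_num();
   int thread_count = omp_get_num_threads();

   int local_n = n / thread_count;               /* Trapezoids per thread */
   double local_a = a + my_rank * local_n * h;
   double local_b = local_a + local_n * h;

   double my_result = (f(local_a) + f(local_b)) / 2.0;
   for (int i = 1; i <= local_n - 1; i++)
      my_result += f(local_a + i * h);
   my_result = my_result * h;

#  pragma omp critical
   *global_result_p += my_result;                /* Only 1 thread at a time updates the sum */
}

/* Called as:
      double global_result = 0.0;
   #  pragma omp parallel num_threads(thread_count)
      Trap(a, b, n, &global_result);
*/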
Editor's Notes
  • #2: 8 January 2024
  • #12: pthread_t object; thread attributes; function that the thread is to run; pointer to arguments passed to the function
  • #23: Actual implementation uses a semaphore
  • #53: Can put brackets