MPI Rohit Banga Prakher Anand K Swagat Manoj Gupta Advanced Computer Architecture Spring, 2010
ORGANIZATION Basics of MPI Point to Point Communication Collective Communication Demo
GOALS Explain basics of  MPI Start coding today! Keep It Short and Simple
MESSAGE PASSING INTERFACE A message passing library specification Extended message-passing model Not a language or compiler specification Not a specific implementation; several implementations exist (just as with pthreads) A standard for distributed-memory, message-passing, parallel computing Distributed Memory – Shared Nothing approach! Needs some interconnection technology – TCP, InfiniBand (on our cluster)
GOALS OF MPI SPECIFICATION Provide source code portability Allow efficient implementations Flexibility to port different algorithms to different hardware environments Support for heterogeneous architectures – processors need not be identical
REASONS FOR USING MPI Standardization – supported on virtually all HPC platforms Portability – the same code runs on other platforms Performance – vendor implementations should exploit native hardware features Functionality – 115 routines Availability – a variety of implementations available
BASIC MODEL Communicators and Groups Group: an ordered set of processes; each process is associated with a unique integer rank, from 0 to (N-1) for N processes; an object in system memory, accessed by a handle Predefined handles: MPI_GROUP_EMPTY, MPI_GROUP_NULL
BASIC MODEL (CONTD.) Communicator Group of processes that may communicate with each other MPI messages must specify a communicator An object in memory, accessed through a handle There is a default communicator (automatically defined): MPI_COMM_WORLD, which identifies the group of all processes
COMMUNICATORS Intra-Communicator – All processes from the same group Inter-Communicator – Processes picked up from several groups
COMMUNICATOR AND GROUPS For a programmer, group and communicator are one They allow you to organize tasks, based upon function, into task groups They enable Collective Communications (later), operations across a subset of related tasks, and safe communications Many communicators may exist at the same time Dynamic – can be created and destroyed at run time A process may be in more than one group/communicator, with a unique rank in every group/communicator Used for implementing user-defined virtual topologies
VIRTUAL TOPOLOGIES Attach Cartesian or graph topology information to an existing communicator Example 2x2 grid: coord (0,0): rank 0, coord (0,1): rank 1, coord (1,0): rank 2, coord (1,1): rank 3
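As an illustration (not from the original deck), a minimal sketch of creating the 2x2 grid above with MPI_Cart_create; it assumes the job is launched with at least four processes, and the variable names are arbitrary:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int dims[2] = {2, 2};                     /* 2x2 process grid */
    int periods[2] = {0, 0};                  /* no wrap-around in either dimension */
    MPI_Comm cart;                            /* new communicator carrying the topology */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);   /* reorder = 0: keep ranks */
    if (cart != MPI_COMM_NULL) {              /* processes beyond the 2x2 grid get MPI_COMM_NULL */
        int rank, coords[2];
        MPI_Comm_rank(cart, &rank);
        MPI_Cart_coords(cart, rank, 2, coords);   /* map rank -> (row, col) */
        printf("coord (%d,%d): rank %d\n", coords[0], coords[1], rank);
        MPI_Comm_free(&cart);
    }
    MPI_Finalize();
    return 0;
}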
SEMANTICS Header file: #include <mpi.h> (C); include 'mpif.h' (Fortran); bindings also exist for Java, Python, etc. Format: rc = MPI_Xxxxx(parameter, ... ) Example: rc = MPI_Bsend(&buf,count,type,dest,tag,comm) Error code: returned as rc; MPI_SUCCESS if successful
MPI PROGRAM STRUCTURE
MPI FUNCTIONS – MINIMAL SUBSET MPI_Init – Initialize MPI MPI_Comm_size – size of group associated with the communicator MPI_Comm_rank – identify the process MPI_Send MPI_Recv MPI_Finalize We will discuss simple ones first
CLASSIFICATION OF MPI ROUTINES Environment Management MPI_Init, MPI_Finalize Point-to-Point Communication MPI_Send, MPI_Recv Collective Communication MPI_Reduce, MPI_Bcast Information on the Processes MPI_Comm_rank, MPI_Get_processor_name
MPI_INIT All MPI programs call this before using other MPI functions int MPI_Init(int *pargc, char ***pargv); Must be called in every MPI program Must be called only  once  and before any other MPI functions are called Pass command line arguments to all processes int main(int argc, char **argv) { MPI_Init(&argc, &argv); … }
MPI_COMM_SIZE Number of processes in the group associated with a communicator int MPI_Comm_size(MPI_Comm comm, int *psize); Find out number of processes being used by your application int main(int argc, char **argv) { MPI_Init(&argc, &argv); int p; MPI_Comm_size(MPI_COMM_WORLD, &p); … }
MPI_COMM_RANK Rank of the calling process within the communicator Unique Rank between 0 and (p-1) Can be called task ID int MPI_Comm_rank(MPI_Comm comm, int *rank); Unique rank for a process in each communicator it belongs to Used to identify work for the processor int main(int argc, char **argv) { MPI_Init(&argc, &argv); int p; MPI_Comm_size(MPI_COMM_WORLD, &p); int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank); … }
MPI_FINALIZE Terminates the MPI execution environment Last MPI routine to be called in any MPI program int MPI_Finalize(void); int main(int argc, char **argv) { MPI_Init(&argc, &argv); int p; MPI_Comm_size(MPI_COMM_WORLD, &p); int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank); printf("no. of processes: %d\n rank: %d\n", p, rank); MPI_Finalize(); }
 
HOW TO COMPILE THIS Open MPI implementation on our cluster: mpicc -o test_1 test_1.c Works like gcc; mpicc is not a special compiler ($ mpicc prints "gcc: no input files") MPI is implemented just as any other library; mpicc is just a wrapper around gcc that adds the required command line parameters
HOW TO RUN THIS mpirun -np X test_1 Will run X copies of the program in your current run-time environment The -np option specifies the number of copies of the program
MPIRUN Only rank 0 process can receive standard input. mpirun redirects standard input of all others to /dev/null Open MPI redirects standard input of mpirun to standard input of rank 0 process Node which invoked mpirun need not be the same as the node for the MPI_COMM_WORLD rank 0 process mpirun directs standard output and error of remote nodes to the node that invoked mpirun SIGTERM, SIGKILL kill all processes in the communicator SIGUSR1, SIGUSR2 propagated to all processes All other signals ignored
A NOTE ON IMPLEMENTATION I want to implement my own version of MPI Evidence: [figure – each process calls MPI_Init, which starts an MPI thread]
SOME MORE FUNCTIONS int MPI_Initialized(int *flag) Checks whether MPI_Init has been called Why? double MPI_Wtime() Returns elapsed wall clock time in seconds (double precision) on the calling processor double MPI_Wtick() Returns the resolution in seconds (double precision) of MPI_Wtime() Message Passing Functionality That is what MPI is meant for!
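A small usage sketch (not from the slides) combining these calls; the loop is just placeholder work to time:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    int flag;
    MPI_Initialized(&flag);                 /* legal even before MPI_Init */
    if (!flag) MPI_Init(&argc, &argv);
    double t0 = MPI_Wtime();                /* wall-clock seconds on this process */
    double sum = 0.0;
    for (long i = 1; i <= 10000000; i++) sum += 1.0 / i;   /* placeholder work */
    double t1 = MPI_Wtime();
    printf("work took %f s (timer resolution %g s), sum = %f\n", t1 - t0, MPI_Wtick(), sum);
    MPI_Finalize();
    return 0;
}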
POINT TO POINT COMMUNICATION
POINT-TO-POINT COMMUNICATION Communication between 2 and only 2 processes One sending and one receiving Types: Synchronous send Blocking send / blocking receive Non-blocking send / non-blocking receive Buffered send Combined send/receive "Ready" send
POINT-TO-POINT COMMUNICATION Processes can be collected into groups Each message is sent in a context, and must be received in the same context A group and context together form a Communicator A process is identified by its rank in the group associated with a communicator Messages are sent with an accompanying user-defined integer tag, to assist the receiving process in identifying the message (MPI_ANY_TAG matches any tag)
POINT-TO-POINT COMMUNICATION How is “data” described? How are processes identified? How  does the receiver recognize messages? What does it mean for these operations to complete?
BLOCKING SEND/RECEIVE int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm communicator) buf: pointer to the data to send count: number of elements in the buffer datatype: which kind of data is in the buffer dest: rank of the receiver tag: the label of the message communicator: set of processes involved (e.g. MPI_COMM_WORLD)
BLOCKING SEND/RECEIVE (CONTD.) [Figure: data path from Process 1 to Process 2 – on each processor, the application buffer and a system buffer involved in the send]
A WORD ABOUT SPECIFICATION The user does not know whether the MPI implementation: copies the buffer into an internal buffer, starts the communication, and returns control before all the data are transferred (BUFFERING); creates links between processors, sends the data and returns control when all the data are sent (but NOT received); or uses a combination of the above methods
BLOCKING SEND/RECEIVE (CONTD.) "Return" happens after it is safe to modify the application buffer Safe means modifications will not affect the data intended for the receive task; it does not imply that the data was actually received A blocking send can be synchronous, which means there is handshaking with the receive task to confirm a safe send A blocking send can be asynchronous if a system buffer is used to hold the data for eventual delivery to the receiver A blocking receive only "returns" after the data has arrived and is ready for use by the program
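To make the blocking semantics concrete, here is a minimal ping sketch (an illustration, not from the original deck); it assumes the job runs with at least two processes:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, tag = 7;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double msg[4] = {1.0, 2.0, 3.0, 4.0};
    if (rank == 0) {
        /* returns once msg can safely be reused, not necessarily once it is received */
        MPI_Send(msg, 4, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;
        /* returns only after the data has arrived in msg */
        MPI_Recv(msg, 4, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status);
        printf("rank 1 received %f %f %f %f\n", msg[0], msg[1], msg[2], msg[3]);
    }
    MPI_Finalize();
    return 0;
}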
NON-BLOCKING SEND/RECEIVE Return almost immediately They simply "request" that the MPI library perform the operation when it is able; you cannot predict when that will happen Request a send/receive and start doing other work! It is unsafe to modify the application buffer (your variable space) until you know that the non-blocking operation has been completed MPI_Isend(&buf,count,datatype,dest,tag,comm,&request) MPI_Irecv(&buf,count,datatype,source,tag,comm,&request)
NON-BLOCKING SEND/RECEIVE (CONTD.) [Figure: same data path as in the blocking case – application buffer and system buffer on each processor]
NON-BLOCKING SEND/RECEIVE (CONTD.) To check whether the send/receive operations have completed: int MPI_Irecv(void *buf, int count, MPI_Datatype type, int source, int tag, MPI_Comm comm, MPI_Request *req); int MPI_Wait(MPI_Request *req, MPI_Status *status); A call to MPI_Wait causes the code to wait until the communication identified by req is complete req is an input/output argument: the identifier associated with a communication event (initiated by MPI_ISEND or MPI_IRECV)
NON-BLOCKING SEND/RECEIVE (CONTD.) int MPI_Test(MPI_Request *req, int *flag, MPI_Status *status); A call to this subroutine sets flag to true if the communication identified by req is complete, and sets flag to false otherwise
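A hedged sketch (not in the slides) that puts MPI_Isend, MPI_Irecv, MPI_Test and MPI_Wait together in a ring exchange; the useful-work section is only a comment placeholder:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int right = (rank + 1) % size;              /* neighbour to send to */
    int left  = (rank - 1 + size) % size;       /* neighbour to receive from */
    int sendval = rank, recvval = -1, done = 0;
    MPI_Request reqs[2];
    MPI_Irecv(&recvval, 1, MPI_INT, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&sendval, 1, MPI_INT, right, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Test(&reqs[0], &done, MPI_STATUS_IGNORE);    /* poll without blocking */
    /* ... do useful work here while the messages are in flight ... */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);       /* buffers are safe to reuse from here on */
    printf("rank %d received %d from rank %d\n", rank, recvval, left);
    MPI_Finalize();
    return 0;
}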
STANDARD MODE Returns when the sender is free to access and overwrite the send buffer The message might be copied directly into the matching receive buffer, or might be copied into a temporary system buffer Message buffering decouples the send and receive operations Message buffering can be expensive It is up to MPI to decide whether outgoing messages will be buffered The standard mode send is non-local
SYNCHRONOUS MODE Send can be started whether or not a matching receive was posted Send completes successfully only if a corresponding receive was already posted and has already started to receive the message sent Blocking send and blocking receive in synchronous mode simulate a synchronous communication Synchronous send is non-local
BUFFERED MODE Send operation can be started whether or not a matching receive has been posted. It may complete before a matching receive is posted. Operation is  local. MPI must buffer the outgoing message. Error will occur if there is insufficient buffer space. The amount of available buffer space is controlled by the user.
BUFFER MANAGEMENT int MPI_Buffer_attach( void* buffer, int size)  Provides to MPI a buffer in the user's memory to be used for buffering outgoing messages. int MPI_Buffer_detach( void* buffer_addr, int* size)  Detach the buffer currently associated with MPI. MPI_Buffer_attach( malloc(BUFFSIZE), BUFFSIZE);  /* a buffer of BUFFSIZE bytes can now be used by MPI_Bsend */  MPI_Buffer_detach( &buff, &size);  /* Buffer size reduced to zero */  MPI_Buffer_attach( buff, size);  /* Buffer of BUFFSIZE bytes available again */
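A minimal buffered-send sketch (an illustration, not from the deck), assuming at least two processes; MPI_Pack_size plus MPI_BSEND_OVERHEAD is the portable way to size the attached buffer:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double data[100];
    for (int i = 0; i < 100; i++) data[i] = (double)i;
    int packsize, bufsize;
    MPI_Pack_size(100, MPI_DOUBLE, MPI_COMM_WORLD, &packsize);
    bufsize = packsize + MPI_BSEND_OVERHEAD;        /* room for one buffered message */
    void *buf = malloc(bufsize);
    MPI_Buffer_attach(buf, bufsize);
    if (rank == 0)
        MPI_Bsend(data, 100, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);   /* completes locally */
    else if (rank == 1)
        MPI_Recv(data, 100, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Buffer_detach(&buf, &bufsize);   /* blocks until buffered messages have been delivered */
    free(buf);
    MPI_Finalize();
    return 0;
}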
READY MODE A send may be started  only  if the matching receive is already posted. The user must be sure of this. If the receive is not already posted, the operation is erroneous and its outcome is undefined. Completion of the send operation does not depend on the status of a matching receive. Merely indicates that the send buffer can be reused. Ready-send could be replaced by a standard-send with no effect on the behavior of the program other than performance.
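One way to make a ready send legal, sketched here as an illustration for two processes (not from the slides): the receiver pre-posts its receive with MPI_Irecv, and a barrier guarantees the receive is posted before the sender calls MPI_Rsend:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double value = (rank == 0) ? 3.14 : 0.0;
    MPI_Request req;
    if (rank == 1)
        MPI_Irecv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);   /* post receive first */
    MPI_Barrier(MPI_COMM_WORLD);     /* no rank leaves before the receive above is posted */
    if (rank == 0) {
        MPI_Rsend(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);         /* receive is known to exist */
    } else if (rank == 1) {
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("rank 1 received %f\n", value);
    }
    MPI_Finalize();
    return 0;
}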
ORDER AND FAIRNESS Order: MPI messages are non-overtaking When a receive matches 2 messages, the message that was sent first is received first When a sent message matches 2 receive statements, the receive that was posted first matches it Message-passing code is deterministic, unless the processes are multi-threaded or the wild-card MPI_ANY_SOURCE is used in a receive statement Fairness: MPI does not guarantee fairness Example: task 0 sends a message to task 2, but task 1 sends a competing message that matches task 2's receive; only one of the sends will complete
EXAMPLE OF NON-OVERTAKING MESSAGES. CALL MPI_COMM_RANK(comm, rank, ierr)  IF (rank.EQ.0) THEN  CALL MPI_BSEND(buf1, count, MPI_REAL, 1, tag, comm, ierr)  CALL MPI_BSEND(buf2, count, MPI_REAL, 1, tag, comm, ierr)  ELSE ! rank.EQ.1  CALL MPI_RECV(buf1, count, MPI_REAL, 0, MPI_ANY_TAG, comm,  status, ierr)  CALL MPI_RECV(buf2, count, MPI_REAL, 0, tag, comm, status, ierr)  END IF
EXAMPLE OF INTERTWINED MESSAGES. CALL MPI_COMM_RANK(comm, rank, ierr)  IF (rank.EQ.0) THEN  CALL MPI_BSEND(buf1, count, MPI_REAL, 1, tag1, comm, ierr)  CALL MPI_SSEND(buf2, count, MPI_REAL, 1, tag2, comm, ierr)  ELSE ! rank.EQ.1  CALL MPI_RECV(buf1, count, MPI_REAL, 0, tag2, comm,  status, ierr)  CALL MPI_RECV(buf2, count, MPI_REAL, 0, tag1, comm, status, ierr)  END IF
DEADLOCK EXAMPLE CALL MPI_COMM_RANK(comm, rank, ierr)  IF (rank.EQ.0) THEN  CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr) CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)  ELSE ! rank.EQ.1  CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr) CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)  END IF
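One common way around this kind of deadlock, shown here as a hedged C sketch rather than the deck's Fortran, is the combined MPI_Sendrecv call, which lets the library schedule the exchange safely; it assumes the job has at least two processes:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank < 2) {                              /* only ranks 0 and 1 take part */
        int partner = (rank == 0) ? 1 : 0;
        double sendbuf = (double)rank, recvbuf = -1.0;
        /* combined send+receive: no send/receive ordering for the programmer to get wrong */
        MPI_Sendrecv(&sendbuf, 1, MPI_DOUBLE, partner, 0,
                     &recvbuf, 1, MPI_DOUBLE, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d got %f from rank %d\n", rank, recvbuf, partner);
    }
    MPI_Finalize();
    return 0;
}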
EXAMPLE OF BUFFERING CALL MPI_COMM_RANK(comm, rank, ierr)  IF (rank.EQ.0) THEN  CALL MPI_SEND(buf1, count, MPI_REAL, 1, tag, comm, ierr)  CALL MPI_RECV (recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr) ELSE ! rank.EQ.1  CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr) CALL MPI_RECV(buf2, count, MPI_REAL, 0, tag, comm, status, ierr)  END IF
COLLECTIVE COMMUNICATIONS
COLLECTIVE ROUTINES Collective routines provide a  higher-level way to organize a parallel program. Each process executes the same communication operations. Communications involving group of processes in a communicator. Groups and communicators can be constructed “by hand” or using topology routines. Tags are not used; different communicators deliver similar functionality. No non-blocking collective operations. Three classes of operations: synchronization, data movement, collective computation.
COLLECTIVE ROUTINES (CONTD.) int MPI_Barrier(MPI_Comm comm) Stop processes until all processes within a communicator reach the barrier Occasionally useful in measuring performance
COLLECTIVE ROUTINES (CONTD.) int MPI_Bcast(void *buf, int count, MPI_Datatype datatype, int root, MPI_Comm comm) Broadcast One-to-all communication: same data sent from root process to all others in the communicator
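A short broadcast sketch (not from the slides): the root sets a value and, after the call, every process has a copy:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, n = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        n = 1024;                                   /* only the root knows the value initially */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* same call on every process */
    printf("rank %d sees n = %d\n", rank, n);       /* all ranks now print 1024 */
    MPI_Finalize();
    return 0;
}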
COLLECTIVE ROUTINES (CONTD.) Reduction The reduction operation allows you to: collect data from each process, reduce the data to a single value, and store the result on the root process (MPI_Reduce) or on all processes (MPI_Allreduce) The reduction function also works with arrays Other operations: product, min, max, logical and, … Internally it is usually implemented with a binary tree
COLLECTIVE ROUTINES (CONTD.) int MPI_Reduce/MPI_Allreduce(void *snd_buf, void *rcv_buf, int count, MPI_Datatype type, MPI_Op op, int root, MPI_Comm comm) snd_buf: input array rcv_buf: output array count: number of elements in snd_buf and rcv_buf type: MPI type of snd_buf and rcv_buf op: parallel operation to be performed root: rank of the process storing the result (MPI_Reduce only; MPI_Allreduce takes no root argument) comm: communicator of the processes involved in the operation
MPI OPERATIONS (MPI_Op and its operator): MPI_MIN – minimum; MPI_SUM – sum; MPI_PROD – product; MPI_MAX – maximum; MPI_LAND – logical and; MPI_BAND – bitwise and; MPI_LOR – logical or; MPI_BOR – bitwise or; MPI_LXOR – logical xor; MPI_BXOR – bitwise xor; MPI_MAXLOC – max value and location; MPI_MINLOC – min value and location
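A minimal reduction sketch (illustration only): every rank contributes its rank number and the root receives the sum:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int contribution = rank;                   /* each process contributes its own rank */
    int total = 0;
    /* sum reduction: only the root (rank 0) receives the result in total */
    MPI_Reduce(&contribution, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum of ranks 0..%d = %d\n", size - 1, total);
    /* MPI_Allreduce would take the same arguments minus root and leave the result on every rank */
    MPI_Finalize();
    return 0;
}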
COLLECTIVE ROUTINES (CONTD.)
Learn by Examples
Parallel Trapezoidal Rule Output: Estimate of the integral from a to b of f(x) using the trapezoidal rule and n trapezoids. Algorithm: 1. Each process calculates "its" interval of integration. 2. Each process estimates the integral of f(x) over its interval using the trapezoidal rule. 3a. Each process != 0 sends its integral to 0. 3b. Process 0 sums the calculations received from the individual processes and prints the result. Notes: 1. f(x), a, b, and n are all hardwired. 2. The number of processes (p) should evenly divide the number of trapezoids (n = 1024).
Parallelizing the Trapezoidal Rule #include <stdio.h> #include "mpi.h" main(int argc, char** argv) { int my_rank; /* My process rank */ int p; /* The number of processes */ double a = 0.0; /* Left endpoint */ double b = 1.0; /* Right endpoint */ int n = 1024; /* Number of trapezoids */ double h; /* Trapezoid base length */ double local_a; /* Left endpoint my process */ double local_b; /* Right endpoint my process */ int local_n; /* Number of trapezoids for */ /* my calculation */ double integral; /* Integral over my interval */ double total; /* Total integral */ int source; /* Process sending integral */ int dest = 0; /* All messages go to 0 */ int tag = 0; MPI_Status status;
Continued… double Trap(double local_a, double local_b, int local_n,double h);  /* Calculate local integral  */ MPI_Init (&argc, &argv); MPI_Barrier(MPI_COMM_WORLD); double elapsed_time = -MPI_Wtime(); MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); MPI_Comm_size(MPI_COMM_WORLD, &p); h = (b-a)/n;  /* h is the same for all processes */ local_n = n/p;  /* So is the number of trapezoids */ /* Length of each process' interval of integration = local_n*h.  So my interval starts at: */ local_a = a + my_rank*local_n*h; local_b = local_a + local_n*h; integral = Trap(local_a, local_b, local_n, h);
Continued… /* Add up the integrals calculated by each process */ if (my_rank == 0) { total = integral; for (source = 1; source < p; source++) { MPI_Recv(&integral, 1, MPI_DOUBLE, source, tag, MPI_COMM_WORLD, &status); total = total + integral; }//End for } else MPI_Send(&integral, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD); MPI_Barrier(MPI_COMM_WORLD); elapsed_time += MPI_Wtime(); /* Print the result */ if (my_rank == 0) { printf("With n = %d trapezoids, our estimate\n",n); printf("of the integral from %lf to %lf = %lf\n",a, b, total); printf("time taken: %lf\n", elapsed_time); }
Continued… /* Shut down MPI */ MPI_Finalize(); } /*  main  */  double Trap(  double  local_a , double  local_b, int local_n, double  h) { double integral;  /* Store result in integral  */ double x; int i; double f(double x); /* function we're integrating */ integral = (f(local_a) + f(local_b))/2.0; x = local_a; for (i = 1; i <= local_n-1; i++) { x = x + h; integral = integral + f(x); } integral = integral*h; return integral; } /*  Trap  */
Continued… double f(double x) { double return_val; /* Calculate f(x). */ /* Store calculation in return_val. */ return_val = 4 / (1+x*x); return return_val; } /* f */
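To build and run this example on the cluster (assuming the source chunks above are saved together as trap.c, a hypothetical file name): mpicc -o trap trap.c, then mpirun -np 4 trap, where 4 evenly divides n = 1024.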
Program 2 Each process other than the root generates a random value less than 1 and sends it to the root. The root sums them up and displays the sum.
#include <stdio.h> #include <mpi.h> #include<stdlib.h> #include <string.h> #include<time.h> int main(int argc, char **argv) { int myrank,  p; int tag =0, dest=0; int i; double randIn,randOut; int source; MPI_Status status; MPI_Init(&argc,&argv); MPI_Comm_rank(MPI_COMM_WORLD,&myrank);
MPI_Comm_size(MPI_COMM_WORLD, &p); if(myrank==0)//I am the root { double total=0,average=0; for(source=1;source<p;source++) { MPI_Recv(&randIn,1, MPI_DOUBLE, source, MPI_ANY_TAG, MPI_COMM_WORLD, &status); printf("Message from root: From %d received number %f\n",source ,randIn); total+=randIn; }//End for average=total/(p-1); }//End if
else//I am other than root { srand48((long int) myrank); randOut=drand48(); printf("randout=%f, myrank=%d\n",randOut,myrank); MPI_Send(&randOut,1,MPI_DOUBLE,dest,tag,MPI_COMM_WORLD); }//End If-Else MPI_Finalize(); return 0; }
MPI References The Standard itself: http://www.mpi-forum.org (all MPI official releases, in both PostScript and HTML) Books: Using MPI: Portable Parallel Programming with the Message-Passing Interface, 2nd Edition, by Gropp, Lusk, and Skjellum, MIT Press, 1999; also Using MPI-2, with R. Thakur MPI: The Complete Reference, 2 vols., MIT Press, 1999 Designing and Building Parallel Programs, by Ian Foster, Addison-Wesley, 1995 Parallel Programming with MPI, by Peter Pacheco, Morgan Kaufmann, 1997 Other information on the Web: http://www.mcs.anl.gov/mpi For man pages of Open MPI on the web: http://www.open-mpi.org/doc/v1.4/ or apropos mpi
THANK YOU


Editor's Notes

  • #5: Can work with shared memory architectures also
  • #6: Why MPI is still being used
  • #7: Many vendors can compete for providing better implementation
  • #8: Boring topic. But fundamental for understanding the basics
  • #11: Safe – different libraries can work together
  • #13: Different return codes for different functions
  • #15: To start coding we need to use these functions.
  • #45: Mention the case where the buffer space might not be available
  • #48: Buffer used only during Buffered mode communication.
  • #49: Ready call indicates the system that a receive has already been posted.
  • #50: Built in collective operations. Reduce, Bcast, Datatypes