Commit f4d6b41

committed
added Rmpi codes and discussions
1 parent 03dcea8 commit f4d6b41

File tree

3 files changed: +104 -12 lines changed


episodes/data/code.tar.gz (735 Bytes): Binary file not shown.

episodes/data/code.zip (2.34 KB): Binary file not shown.

episodes/message-passing.md

Lines changed: 104 additions & 12 deletions
@@ -203,16 +203,82 @@ if ( myrank == 0 ): # Only rank 0 will print results

* [Rmpi documentation](https://cran.r-project.org/web/packages/Rmpi/Rmpi.pdf)

- While there is an MPI interface for R called **Rmpi**, it was originally developed for
- LAM MPI which has not been actively developed in 20 years and the documentations
- still cite LAM commands.
- Although it is getting some updates it is strongly recommended to avoid
- this package. An **Rmpi** version **dot_product_doMPI.R** in the **code** directory
- run on a modern Linux system with up-to-date R 4.3.2 and a GNU build of OpenMPI 5.0.3
- spawns processes but never returns from the **startMPIcluster()** call. **Rmpi** can
- also be used to write explicit MPI code. If **Rmpi** can be made to work it would
- bring the ability to spread work across multiple compute nodes as well as multiple
- cores within each node, but the performance is unknown.

The **Rmpi** package was developed about 20 years ago but has been updated every few
years to stay compatible with current versions of R and OpenMPI (my tests
failed with OpenMPI 5.0.3, so I had to use the older OpenMPI 4.1.6 version).
The package provides a **doMPI** back end that can easily be slipped into a
program using **foreach** loops with **%dopar%**, allowing the code to run
on cores spread across multiple compute nodes.
**Rmpi** also provides wrappers around the standard MPI commands for programmers
who wish to write explicit MPI programs in R.

```R
# Do the dot product between two vectors X and Y then print the result
# USAGE: mpirun -np 4 Rscript dot_product_message_passing.R 100000
# This will run 100,000 elements on 4 cores, possibly spread over multiple compute nodes
# must install.packages("Rmpi") first

library( Rmpi )                  # This does the MPI_Init() behind the scenes

# Get the vector size from the command line

args <- commandArgs(TRUE)
if( length( args ) == 1 ) {
    n <- as.integer( args[1] )
} else {
    n <- 100000L                 # Integer literal so the %i format below works
}

# Get my rank and the number of ranks (MPI talks about ranks instead of threads)

com <- 0                         # MPI_COMM_WORLD, i.e. all ranks
nRanks <- mpi.comm.size( com )   # The number of ranks (processes)
myRank <- mpi.comm.rank( com )   # Which rank am I ( 0 .. nRanks-1 )

if( (n %% nRanks) != 0 ) {
    print("Please ensure the vector size is divisible by the number of ranks")
    quit()
}
myElements <- n %/% nRanks       # Integer number of elements per rank

# Allocate space and initialize the reduced arrays for each rank

x <- vector( "double", myElements )
y <- vector( "double", myElements )

j <- 0
for( i in seq( myRank+1, n, nRanks ) )
{
    j <- j + 1
    x[j] <- as.double(i)
    y[j] <- as.double(3*i)
}

# Clear cache then barrier sync so all ranks are ready, then time

dummy <- matrix( 1:125000000 )   # Clear the cache buffers before timing

ret <- mpi.barrier( com )        # mpi.barrier() returns 1 if successful

t_start <- proc.time()[[3]]

p_sum <- 0.0
for( i in 1:myElements )
{
    p_sum <- p_sum + x[i] * y[i]
}

dot_product <- mpi.allreduce( p_sum, type = 2, op = "sum", comm = com )

t_end <- proc.time()[[3]]

if( myRank == 0 ) {
    print(sprintf("Rmpi dot product with %i ranks took %6.3f seconds", nRanks, t_end - t_start))
    print(sprintf("dot_product = %.6e on %i MPI ranks for vector size %i", dot_product, nRanks, n ))
}

mpi.quit( )
```
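
The **dot_product_doMPI.R** file itself lives in the **code** archive rather than
in this episode, but a **doMPI** version of the same dot product might look roughly
like the sketch below. The chunking scheme, launch line, and the master/worker split
are my illustrative assumptions, so the actual file may differ.

```R
# Sketch: dot product using the doMPI back end with foreach/%dopar%
# Assumed launch: mpirun -np 4 Rscript dot_product_doMPI.R 10000000
# (with mpirun, rank 0 becomes the master and the remaining ranks become workers)

library( doMPI )                    # Also loads Rmpi and the foreach package
library( foreach )                  # For foreach() and %dopar%

cl <- startMPIcluster()             # Worker ranks enter their task loop here
registerDoMPI( cl )                 # Point %dopar% at the MPI workers

args <- commandArgs(TRUE)
n <- if( length( args ) == 1 ) as.integer( args[1] ) else 100000L

x <- as.double( 1:n )
y <- as.double( 3 * (1:n) )

nWorkers <- getDoParWorkers()       # Number of MPI worker processes

# Split the index range into one chunk per worker and ship each worker its slice
idx <- parallel::splitIndices( n, nWorkers )
xs  <- lapply( idx, function(i) x[i] )
ys  <- lapply( idx, function(i) y[i] )

t_start <- proc.time()[[3]]

# Each worker computes a partial dot product; .combine adds the partial sums
dot_product <- foreach( xc = xs, yc = ys, .combine = "+" ) %dopar% {
    sum( xc * yc )
}

t_end <- proc.time()[[3]]

print(sprintf("doMPI dot product with %i workers took %6.3f seconds",
              as.integer(nWorkers), t_end - t_start))
print(sprintf("dot_product = %.6e for vector size %i", dot_product, n))

closeCluster( cl )                  # Shut down the workers
mpi.quit( )
```

Note that, unlike the explicit MPI version above where each rank initializes its own
chunk of the vectors, here the master ships each worker its slice of the data through
the **foreach** back end, which adds overhead.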

### C

@@ -456,7 +522,13 @@ the Python matrix multiply code and test the scaling.

### R

- **Rmpi** is not recommended.

525+
Measure the execution time for the **dot_product_message_passing.R** code
526+
for 1, 4, 8, and 16 cores on a single compute node to compare with other
527+
parallelizaton methods available in R.
528+
If you are on an HPC system with multiple nodes, try running the same
529+
tests on 2 or 4 compute nodes to compare.
530+
You can also try running the **dot_product_doMPI.R** code to see how
531+
it compares to using explicit **MPI** programming in R.
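
Following the USAGE line in the code above, the single-node runs might look like
this (the exact MPI setup and module loading will vary by system):

```R
# Example single-node scaling runs (10,000,000 is divisible by 1, 4, 8, and 16):
#   mpirun -np 1  Rscript dot_product_message_passing.R 10000000
#   mpirun -np 4  Rscript dot_product_message_passing.R 10000000
#   mpirun -np 8  Rscript dot_product_message_passing.R 10000000
#   mpirun -np 16 Rscript dot_product_message_passing.R 10000000
```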

### C

@@ -509,7 +581,27 @@ the added global summation after the loop.

### R

- Not implemented yet.

For the single-node tests I used 10,000,000-element vectors so there is
enough work to expect good scaling.
Tests with smaller vectors illustrate the difference in overhead better,
but they are less indicative of the performance of most real applications.

For the **dot_product_message_passing.R** code I got 481 ms for 1 core,
132 ms for 4 cores, 68 ms for 8 cores, and 49 ms for 16 cores, showing good
performance and scaling, which is expected given that the only communication
is the global summation at the end.
The **dot_product_doMPI.R** code took 1.1 seconds, 0.85 seconds, 4.8 seconds,
and 8.4 seconds respectively, showing much poorer performance that actually
got worse as more cores were used. The overhead was simply too great, so while
using a **doMPI** back end is much easier than using explicit MPI commands,
the performance and scaling are much worse.

Running on 4 nodes with 4 cores each I got 53 ms for **dot_product_message_passing.R**
compared to 49 ms on a single node, which is very good, but again the only
communication is the global summation at the end.
I have not yet managed to get the **dot_product_doMPI.R** code to run on
multiple nodes.
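
To put the single-node numbers in perspective, the speedup and parallel efficiency
can be computed directly from the timings reported above:

```R
# Speedup and parallel efficiency from the single-node timings reported above
cores   <- c( 1, 4, 8, 16 )
t_mpi   <- c( 0.481, 0.132, 0.068, 0.049 )   # dot_product_message_passing.R (seconds)
t_dompi <- c( 1.1,   0.85,  4.8,   8.4   )   # dot_product_doMPI.R (seconds)

speedup    <- t_mpi[1] / t_mpi     # ideal speedup equals the core count
efficiency <- speedup / cores      # ideal efficiency is 1.0

print( data.frame( cores, t_mpi, speedup = round( speedup, 1 ),
                   efficiency = round( efficiency, 2 ) ) )

print( round( t_dompi[1] / t_dompi, 2 ) )   # the doMPI ratios
```

That works out to roughly 3.6x, 7.1x, and 9.8x speedup (about 91%, 88%, and 61%
parallel efficiency) for the message-passing version, while the **doMPI** version
only reaches about 1.3x on 4 cores and then falls below its own single-core time.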

### C
