
Commit fbb0862

spell checked some files
1 parent 4707668 commit fbb0862

4 files changed: +16 -16 lines changed

episodes/message-passing.md

Lines changed: 2 additions & 2 deletions
@@ -209,7 +209,7 @@ failed with OpenMPI 5.0.3 so I had to use an older OpenMPI 4.1.6 version).
This package provides a **doMPI** back end that can be easily slipped into a
program using **foreach** loops with **%dopar%** allowing the code to run
on cores on multiple compute nodes.
-**Rmpi** also provides wrappered MPI commands for programmers who wish to
+**Rmpi** also provides wrapped MPI commands for programmers who wish to
write explicit MPI programs in R.

```R
@@ -524,7 +524,7 @@ the Python matrix multiply code and test the scaling.

Measure the execution time for the **dot_product_message_passing.R** code
for 1, 4, 8, and 16 cores on a single compute node to compare with other
-parallelizaton methods available in R.
+parallelization methods available in R.
If you are on an HPC system with multiple nodes, try running the same
tests on 2 or 4 compute nodes to compare.
You can also try running the **dot_product_doMPI.R** code to see how
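
The hunks above mention the **doMPI** back end for **foreach**/%dopar% loops and the **dot_product_doMPI.R** exercise. As a minimal sketch (not taken from this commit or the lesson files), registering that back end might look like the following; the vector sizes, chunk count, and launch command are assumptions.

```R
# Hypothetical sketch, not part of this commit: registering the doMPI
# back end so an existing foreach/%dopar% loop runs across MPI ranks.
library(doMPI)            # MPI back end for foreach
library(foreach)

cl <- startMPIcluster()   # spawn MPI workers (count set by the mpirun launch)
registerDoMPI(cl)         # point %dopar% at the MPI cluster

# Toy dot product, split into one chunk of indices per worker (assumed sizes)
x <- runif(1e6)
y <- runif(1e6)
chunks <- parallel::splitIndices(length(x), 8)
dot <- foreach(idx = chunks, .combine = "+") %dopar% sum(x[idx] * y[idx])

closeCluster(cl)          # shut down the virtual cluster
mpi.quit()                # exit cleanly through Rmpi
```

A script like this would normally be launched under MPI, for example something like `mpirun -n 1 Rscript dot_product_doMPI.R`; the exact invocation depends on the MPI installation.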

episodes/multi-threaded.md

Lines changed: 2 additions & 2 deletions
@@ -186,7 +186,7 @@ that cleans up the virtual cluster before the program ends.
This basic approach is simple and can be useful but also may be inefficient since
the overhead for dividing the work between threads may be much greater
than the work done within each iteration, as is clearly the case in
-our simle example where there is only a single multiplication for each
+our simple example where there is only a single multiplication for each
pass through the loop.
In the second part of this code, the loop is instead
divided over the number of threads with the function then manually splitting
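
The hunk above refers to manually splitting a loop over the number of threads so that the scheduling overhead is paid only once per chunk rather than once per iteration. A minimal sketch of that chunking idea with the **parallel** package follows; it is not the lesson's own code, and the vector length and worker count are arbitrary assumptions.

```R
# Hypothetical sketch, not the lesson's code: one large chunk per worker
# instead of one task per loop iteration.
library(parallel)

x <- runif(1e6)              # assumed problem size
y <- runif(1e6)
nworkers <- 4                # assumed worker count
cl <- makeCluster(nworkers)  # the "virtual cluster" of worker processes

# Split the iteration space into one index block per worker
chunks <- splitIndices(length(x), nworkers)
partial <- parLapply(cl, chunks,
                     function(idx, x, y) sum(x[idx] * y[idx]), x, y)
dot <- Reduce(`+`, partial)

stopCluster(cl)              # clean up the virtual cluster before the program ends
```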
@@ -296,7 +296,7 @@ the processes that are spawned inherit the environment of the parent process.
So we get more flexibility in the back ends as well as a more convenient
programming approach.
You'll be asked to measure the performance of each approach in the
-excersize below.
+exercise below.

```R
# Dot product in R using a loop and a vector summation

episodes/performance-concepts.md

Lines changed: 1 addition & 1 deletion
@@ -605,7 +605,7 @@ The built in matrix multiplication is expected to be more optimized
and I get 13 ms for a 10x10 matrix, 1 ms for a 100x100 matrix, and
26 ms for the 1000x1000 matrix.
The built in matrix multiplication is clearly better except for the
-very small 10x10 matrix which seems to be an aberation.
+very small 10x10 matrix which seems to be an aberration.
The 1000x1000 matrix is where it shines taking only 26 ms where the
**for** loop takes 148 seconds or nearly 5700 times as long.
It isn't clear why the difference is this large, but clearly the
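
The timings discussed in the hunk above compare the built-in matrix multiplication against an explicit **for** loop. A small sketch of how such a comparison could be reproduced with `system.time()` follows; the matrix size is an assumption and the measured times will differ from the numbers quoted above.

```R
# Hypothetical sketch, not the lesson's code: built-in %*% versus an
# explicit triple loop for matrix multiplication.
n <- 100
A <- matrix(runif(n * n), n, n)
B <- matrix(runif(n * n), n, n)

t_builtin <- system.time(C_builtin <- A %*% B)["elapsed"]

loop_multiply <- function(A, B) {
  n <- nrow(A)
  C <- matrix(0, n, n)
  for (i in 1:n)
    for (j in 1:n)
      for (k in 1:n)
        C[i, j] <- C[i, j] + A[i, k] * B[k, j]
  C
}
t_loop <- system.time(C_loop <- loop_multiply(A, B))["elapsed"]

cat("built-in:", t_builtin, "s   loop:", t_loop, "s\n")
```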

episodes/profiling-code.md

Lines changed: 11 additions & 11 deletions
@@ -39,7 +39,7 @@ Linux has a **time** function that can proceed any command, so this
can be used to time the entire job externally even if we don't have
access to the source code.
This can be used to time a short test run in order to estimate the
-runtime needed for the complete job.
+run time needed for the complete job.
When we get to talking about parallel computing, or using multiple
compute cores to run a single job, the **time** function will
prove useful for getting the execution time for a job as it
@@ -105,7 +105,7 @@ for programs when they are running.
Below we are going to use the **sleep** command to time an interval
of 5 seconds just to simulate what we might see in a real job.
In this case, even though the sleep function was supposed to go
-5 seconds, there was some overhead or inaccuracy in the timnig
+5 seconds, there was some overhead or inaccuracy in the timing
routine or the length of the sleep.
This is one thing that you always need to be aware of when measuring
performance.
@@ -122,7 +122,7 @@ sys 0m0.003s
```

In addition to worrying about the clock accuracy, you also need to
-worry about interferrence from other jobs that may be running on
+worry about interference from other jobs that may be running on
the same compute node you are on.
The best way to time a real job is to test it out on a completely
isolated computer.
@@ -140,7 +140,7 @@ If your job is using the network to communicate with other
compute nodes, that might also be shared with other jobs running
on the same node.
The single largest factor to be aware of is that other jobs using
-the same file server as you are can definitely affect the peroformance
+the same file server as you are can definitely affect the performance
of your job if your code is doing lots of IO (Input and Output).
On HPC systems, this can be true even if the other jobs are not on
the same compute node as your job.
@@ -392,7 +392,7 @@ Since both of these are above the nanosecond range, we can be confident
that the timing routine is accurately measuring each.

Let's see what we can learn by playing around with it some more.
-When I run the python version preceeded by the linux **time** function,
+When I run the python version preceded by the Linux **time** function,
I see a real time significantly larger than the loop time and output time
combined.
The initialization time is not measured but shouldn't be more than
@@ -625,7 +625,7 @@ If we look at the **t_loop** time instead, in my computer it is more
than double what it was before.
When the clock routine is measuring very small intervals each time,
it can be intrusive in that it distorts the measurement by increasing
-the runtime of the entire code.
+the run time of the entire code.
It isn't surprising that this is intrusive since we are measuring the
time it takes to retrieve a single array element and do one addition.
The code is doing a subtraction and addition itself to calculate the
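
The hunk above describes how calling the clock around a single array access and addition distorts the measurement. A rough sketch of that effect is shown below; it is not the lesson's **t_loop** code, and the loop length is an arbitrary assumption.

```R
# Hypothetical sketch, not the lesson's code: accumulating a timer around
# every single iteration is intrusive because each clock call costs more
# than the one addition being measured.
n <- 100000                      # assumed loop length
x <- runif(n)

total <- 0
t_loop <- 0
for (i in 1:n) {
  t0 <- proc.time()["elapsed"]
  total <- total + x[i]          # the tiny amount of real work
  t_loop <- t_loop + (proc.time()["elapsed"] - t0)
}

# Timing the whole loop once from the outside for comparison
t0 <- proc.time()["elapsed"]
total2 <- 0
for (i in 1:n) total2 <- total2 + x[i]
t_whole <- proc.time()["elapsed"] - t0

cat("per-iteration timing:", t_loop, "s   whole-loop timing:", t_whole, "s\n")
```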
@@ -636,10 +636,10 @@ doing the timing in this way is not more intrusive.

The goal is to fully profile your code so that you understand
where all the time is being spent.
-This means timnig each computational section where time is being
+This means timing each computational section where time is being
spent, usually the loops for example.
-While simple print statements may not be important contributers to
-the overall runtime of a code, any large input or output from files
+While simple print statements may not be important contributors to
+the overall run time of a code, any large input or output from files
may be.
When we start talking about parallel programs that use multiple
cores or even multiple compute nodes it will become important
@@ -684,7 +684,7 @@ both increase the scaling efficiency.

## Tracking Memory Usage

-When we think about high peformance, we mostly think about running
+When we think about high performance, we mostly think about running
jobs faster.
For some programs, the memory usage may be the factor limiting what
types of science we can do.
@@ -777,7 +777,7 @@ We will practice these approaches more in the upcoming modules.

::::::::::::::::::::::::::::::::::::: keypoints
- The **time** function can always be used externally to measure performance
-but has limitted accuracy of around 1 millisecond.
+but has limited accuracy of around 1 millisecond.
- Internally there are precise clock routines that can be used to measure
the performance of each part of a code. These are different for each
programming language, but the use is always the same.
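
The keypoints above contrast the external **time** command with internal clock routines. As a hedged illustration (not taken from the lesson), timing one section of an R script internally might look like the following; running the whole script under something like `time Rscript my_script.R` would then give the external measurement for comparison. The vector length is an assumption.

```R
# Hypothetical sketch, not the lesson's code: an internal clock routine
# around one section of the script, complementing the external time command.
n <- 1000000                 # assumed vector length
x <- runif(n)
y <- runif(n)

t_start <- proc.time()
total <- 0
for (i in 1:n) {
  total <- total + x[i] * y[i]
}
t_loop <- proc.time() - t_start

cat("loop time:", t_loop["elapsed"], "seconds\n")
```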
