
Commit fbb0862

spell checked some files
1 parent 4707668 commit fbb0862

4 files changed: +16 -16 lines changed

episodes/message-passing.md

Lines changed: 2 additions & 2 deletions
@@ -209,7 +209,7 @@ failed with OpenMPI 5.0.3 so I had to use an older OpenMPI 4.1.6 version).
This package provides a **doMPI** back end that can be easily slipped into a
program using **foreach** loops with **%dopar%** allowing the code to run
on cores on multiple compute nodes.
-**Rmpi** also provides wrappered MPI commands for programmers who wish to
+**Rmpi** also provides wrapped MPI commands for programmers who wish to
write explicit MPI programs in R.

```R
@@ -524,7 +524,7 @@ the Python matrix multiply code and test the scaling.

Measure the execution time for the **dot_product_message_passing.R** code
for 1, 4, 8, and 16 cores on a single compute node to compare with other
-parallelizaton methods available in R.
+parallelization methods available in R.
If you are on an HPC system with multiple nodes, try running the same
tests on 2 or 4 compute nodes to compare.
You can also try running the **dot_product_doMPI.R** code to see how
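
The hunks above mention the **doMPI** back end for **foreach**/%dopar% loops and the **dot_product_doMPI.R** exercise. As a minimal sketch (not taken from this commit or the lesson files), registering that back end might look like the following; the vector sizes, chunk count, and launch command are assumptions.

```R
# Hypothetical sketch, not part of this commit: registering the doMPI
# back end so an existing foreach/%dopar% loop runs across MPI ranks.
library(doMPI)            # MPI back end for foreach
library(foreach)

cl <- startMPIcluster()   # spawn MPI workers (count set by the mpirun launch)
registerDoMPI(cl)         # point %dopar% at the MPI cluster

# Toy dot product, split into one chunk of indices per worker (assumed sizes)
x <- runif(1e6)
y <- runif(1e6)
chunks <- parallel::splitIndices(length(x), 8)
dot <- foreach(idx = chunks, .combine = "+") %dopar% sum(x[idx] * y[idx])

closeCluster(cl)          # shut down the virtual cluster
mpi.quit()                # exit cleanly through Rmpi
```

A script like this would normally be launched under MPI, for example something like `mpirun -n 1 Rscript dot_product_doMPI.R`; the exact invocation depends on the MPI installation.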

episodes/multi-threaded.md

Lines changed: 2 additions & 2 deletions
@@ -186,7 +186,7 @@ that cleans up the virtual cluster before the program ends.
This basic approach is simple and can be useful but also may be inefficient since
the overhead for dividing the work between threads may be much greater
than the work done within each iteration, as is clearly the case in
-our simle example where there is only a single multiplication for each
+our simple example where there is only a single multiplication for each
pass through the loop.
In the second part of this code, the loop is instead
divided over the number of threads with the function then manually splitting
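
The hunk above refers to manually splitting a loop over the number of threads so that the scheduling overhead is paid only once per chunk rather than once per iteration. A minimal sketch of that chunking idea with the **parallel** package follows; it is not the lesson's own code, and the vector length and worker count are arbitrary assumptions.

```R
# Hypothetical sketch, not the lesson's code: one large chunk per worker
# instead of one task per loop iteration.
library(parallel)

x <- runif(1e6)              # assumed problem size
y <- runif(1e6)
nworkers <- 4                # assumed worker count
cl <- makeCluster(nworkers)  # the "virtual cluster" of worker processes

# Split the iteration space into one index block per worker
chunks <- splitIndices(length(x), nworkers)
partial <- parLapply(cl, chunks,
                     function(idx, x, y) sum(x[idx] * y[idx]), x, y)
dot <- Reduce(`+`, partial)

stopCluster(cl)              # clean up the virtual cluster before the program ends
```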
@@ -296,7 +296,7 @@ the processes that are spawned inherit the environment of the parent process.
So we get more flexibility in the back ends as well as a more convenient
programming approach.
You'll be asked to measure the performance of each approach in the
-excersize below.
+exercise below.

```R
# Dot product in R using a loop and a vector summation

episodes/performance-concepts.md

Lines changed: 1 addition & 1 deletion
@@ -605,7 +605,7 @@ The built in matrix multiplication is expected to be more optimized
and I get 13 ms for a 10x10 matrix, 1 ms for a 100x100 matrix, and
26 ms for the 1000x1000 matrix.
The built in matrix multiplication is clearly better except for the
-very small 10x10 matrix which seems to be an aberation.
+very small 10x10 matrix which seems to be an aberration.
The 1000x1000 matrix is where it shines taking only 26 ms where the
**for** loop takes 148 seconds or nearly 5700 times as long.
It isn't clear why the difference is this large, but clearly the
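
The timings discussed in the hunk above compare the built-in matrix multiplication against an explicit **for** loop. A small sketch of how such a comparison could be reproduced with `system.time()` follows; the matrix size is an assumption and the measured times will differ from the numbers quoted above.

```R
# Hypothetical sketch, not the lesson's code: built-in %*% versus an
# explicit triple loop for matrix multiplication.
n <- 100
A <- matrix(runif(n * n), n, n)
B <- matrix(runif(n * n), n, n)

t_builtin <- system.time(C_builtin <- A %*% B)["elapsed"]

loop_multiply <- function(A, B) {
  n <- nrow(A)
  C <- matrix(0, n, n)
  for (i in 1:n)
    for (j in 1:n)
      for (k in 1:n)
        C[i, j] <- C[i, j] + A[i, k] * B[k, j]
  C
}
t_loop <- system.time(C_loop <- loop_multiply(A, B))["elapsed"]

cat("built-in:", t_builtin, "s   loop:", t_loop, "s\n")
```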

episodes/profiling-code.md

Lines changed: 11 additions & 11 deletions
@@ -39,7 +39,7 @@ Linux has a **time** function that can proceed any command, so this
can be used to time the entire job externally even if we don't have
access to the source code.
This can be used to time a short test run in order to estimate the
-runtime needed for the complete job.
+run time needed for the complete job.
When we get to talking about parallel computing, or using multiple
compute cores to run a single job, the **time** function will
prove useful for getting the execution time for a job as it
@@ -105,7 +105,7 @@ for programs when they are running.
Below we are going to use the **sleep** command to time an interval
of 5 seconds just to simulate what we might see in a real job.
In this case, even though the sleep function was supposed to go
-5 seconds, there was some overhead or inaccuracy in the timnig
+5 seconds, there was some overhead or inaccuracy in the timing
routine or the length of the sleep.
This is one thing that you always need to be aware of when measuring
performance.
@@ -122,7 +122,7 @@ sys 0m0.003s
```

In addition to worrying about the clock accuracy, you also need to
-worry about interferrence from other jobs that may be running on
+worry about interference from other jobs that may be running on
the same compute node you are on.
The best way to time a real job is to test it out on a completely
isolated computer.
@@ -140,7 +140,7 @@ If your job is using the network to communicate with other
compute nodes, that might also be shared with other jobs running
on the same node.
The single largest factor to be aware of is that other jobs using
-the same file server as you are can definitely affect the peroformance
+the same file server as you are can definitely affect the performance
of your job if your code is doing lots of IO (Input and Output).
On HPC systems, this can be true even if the other jobs are not on
the same compute node as your job.
@@ -392,7 +392,7 @@ Since both of these are above the nanosecond range, we can be confident
that the timing routine is accurately measuring each.

Let's see what we can learn by playing around with it some more.
-When I run the python version preceeded by the linux **time** function,
+When I run the python version preceded by the Linux **time** function,
I see a real time significantly larger than the loop time and output time
combined.
The initialization time is not measured but shouldn't be more than
@@ -625,7 +625,7 @@ If we look at the **t_loop** time instead, in my computer it is more
than double what it was before.
When the clock routine is measuring very small intervals each time,
it can be intrusive in that it distorts the measurement by increasing
-the runtime of the entire code.
+the run time of the entire code.
It isn't surprising that this is intrusive since we are measuring the
time it takes to retrieve a single array element and do one addition.
The code is doing a subtraction and addition itself to calculate the
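
The hunk above describes how calling the clock around a single array access and addition distorts the measurement. A rough sketch of that effect is shown below; it is not the lesson's **t_loop** code, and the loop length is an arbitrary assumption.

```R
# Hypothetical sketch, not the lesson's code: accumulating a timer around
# every single iteration is intrusive because each clock call costs more
# than the one addition being measured.
n <- 100000                      # assumed loop length
x <- runif(n)

total <- 0
t_loop <- 0
for (i in 1:n) {
  t0 <- proc.time()["elapsed"]
  total <- total + x[i]          # the tiny amount of real work
  t_loop <- t_loop + (proc.time()["elapsed"] - t0)
}

# Timing the whole loop once from the outside for comparison
t0 <- proc.time()["elapsed"]
total2 <- 0
for (i in 1:n) total2 <- total2 + x[i]
t_whole <- proc.time()["elapsed"] - t0

cat("per-iteration timing:", t_loop, "s   whole-loop timing:", t_whole, "s\n")
```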
@@ -636,10 +636,10 @@ doing the timing in this way is not more intrusive.

The goal is to fully profile your code so that you understand
where all the time is being spent.
-This means timnig each computational section where time is being
+This means timing each computational section where time is being
spent, usually the loops for example.
-While simple print statements may not be important contributers to
-the overall runtime of a code, any large input or output from files
+While simple print statements may not be important contributors to
+the overall run time of a code, any large input or output from files
may be.
When we start talking about parallel programs that use multiple
cores or even multiple compute nodes it will become important
@@ -684,7 +684,7 @@ both increase the scaling efficiency.

## Tracking Memory Usage

-When we think about high peformance, we mostly think about running
+When we think about high performance, we mostly think about running
jobs faster.
For some programs, the memory usage may be the factor limiting what
types of science we can do.
@@ -777,7 +777,7 @@ We will practice these approaches more in the upcoming modules.

::::::::::::::::::::::::::::::::::::: keypoints
- The **time** function can always be used externally to measure performance
-but has limitted accuracy of around 1 millisecond.
+but has limited accuracy of around 1 millisecond.
- Internally there are precise clock routines that can be used to measure
the performance of each part of a code. These are different for each
programming language, but the use is always the same.
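
The keypoints above contrast the external **time** command with internal clock routines. As a hedged illustration (not taken from the lesson), timing one section of an R script internally might look like the following; running the whole script under something like `time Rscript my_script.R` would then give the external measurement for comparison. The vector length is an assumption.

```R
# Hypothetical sketch, not the lesson's code: an internal clock routine
# around one section of the script, complementing the external time command.
n <- 1000000                 # assumed vector length
x <- runif(n)
y <- runif(n)

t_start <- proc.time()
total <- 0
for (i in 1:n) {
  total <- total + x[i] * y[i]
}
t_loop <- proc.time() - t_start

cat("loop time:", t_loop["elapsed"], "seconds\n")
```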
