@stanmoore1 commented Dec 8, 2017
Collaborator

Don't merge yet. This is an example of using the new duplicated-memory feature in Kokkos as an alternative to thread atomics; see kokkos/kokkos#1225 and kokkos/kokkos#825. For OpenMP or PThreads it uses a duplicated, non-atomic view; for CUDA it still uses a non-duplicated atomic view; and for Serial it uses a non-duplicated, non-atomic view. On my Linux box with OpenMP it gives a speedup over atomics. The API and naming may change a bit before the feature is formally released in Kokkos.
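To illustrate the idea behind data duplication, here is a minimal sketch in plain C++ using `std::thread`. It is not the Kokkos API and not the code in this PR; the function name `scatter_add_duplicated` and its parameters are hypothetical. Each thread accumulates into its own private copy of the output array, so no atomic updates are needed, and the copies are summed at the end.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical illustration (not Kokkos): scatter-add contrib[k] into
// force[target[k]] using one private copy of the output per thread.
// Assumes target.size() == contrib.size() and every target[k] < n_out.
std::vector<double> scatter_add_duplicated(const std::vector<std::size_t>& target,
                                           const std::vector<double>& contrib,
                                           std::size_t n_out,
                                           unsigned n_threads) {
  if (n_threads == 0) n_threads = 1;

  // Data duplication: n_threads private copies, each of length n_out.
  std::vector<std::vector<double>> copies(n_threads,
                                          std::vector<double>(n_out, 0.0));

  const std::size_t n = target.size();
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < n_threads; ++t) {
    workers.emplace_back([&, t] {
      // Static partition of the contributions across threads.
      const std::size_t begin = n * t / n_threads;
      const std::size_t end = n * (t + 1) / n_threads;
      std::vector<double>& mine = copies[t];
      for (std::size_t k = begin; k < end; ++k) {
        mine[target[k]] += contrib[k];  // plain add: this copy is private
      }
    });
  }
  for (auto& w : workers) w.join();

  // Reduction step: sum the private copies into the final array.
  std::vector<double> force(n_out, 0.0);
  for (const auto& c : copies) {
    for (std::size_t i = 0; i < n_out; ++i) force[i] += c[i];
  }
  return force;
}
```

The trade-off in this sketch is extra memory (one copy per thread) and a final reduction pass in exchange for avoiding contended atomic writes in the inner loop.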

@crtrott

@stanmoore1 stanmoore1 self-assigned this Dec 8, 2017
@stanmoore1
Collaborator Author

Here is some performance data for ExaMiniMD on a 4-core Linux box (LJ benchmark, 256,000 atoms, 1 MPI x 4 OpenMP threads):

| Method | Performance (atom-steps/s) |
| --- | --- |
| thread atomic | 4.86e+05 |
| work duplication (full neigh list) | 6.15e+05 |
| data duplication | 7.98e+05 |
| data duplication, persistent memory | 7.99e+05 |

@sslattery

@stanmoore1 any comments on differences in memory usage for OpenMP and Pthreads vs. using atomics?

@stanmoore1
Collaborator Author

There is some memory overhead because the force array is duplicated. The force array is the second-largest data structure, after the neighbor list. However, each atom typically has many neighbors, so the neighbor list is much larger than the force array, and we typically duplicate the force array only 8 or fewer times.
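As a rough way to reason about this overhead, the sketch below compares the duplicated force array against the neighbor list. It rests on assumptions that are not taken from ExaMiniMD: three `double` force components per atom, one `int` index per neighbor entry, and a caller-supplied neighbor count. The names `MemoryEstimate` and `estimate_memory` are hypothetical.

```cpp
#include <cstddef>

// Hypothetical back-of-the-envelope estimate; the storage layout is assumed.
struct MemoryEstimate {
  std::size_t force_bytes;     // all duplicated copies of the force array
  std::size_t neighbor_bytes;  // neighbor list indices
};

MemoryEstimate estimate_memory(std::size_t n_atoms,
                               std::size_t n_copies,
                               std::size_t neighbors_per_atom) {
  // Assumption: 3 doubles (fx, fy, fz) per atom in each force copy.
  const std::size_t one_force_copy = n_atoms * 3 * sizeof(double);
  // Assumption: one int index per neighbor entry.
  const std::size_t neighbor_list = n_atoms * neighbors_per_atom * sizeof(int);
  return MemoryEstimate{one_force_copy * n_copies, neighbor_list};
}
```

For example, calling `estimate_memory(256000, 4, 75)` uses the atom count and thread count from the benchmark above together with an assumed 75 neighbors per atom. On a platform with 8-byte `double` and 4-byte `int`, that gives about 24.6 MB for the four force copies versus about 76.8 MB for the neighbor list.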

@stanmoore1
Collaborator Author

Also, the numbers for data duplication may now be a little better because we fixed this bug: #22.

@janciesko

What's the status of this?


4 participants