Describe caching of git repositories #89
| ## Pros & Cons
| - Cloning is much faster for repositories in the cache.
| - Cloning is slower for repositories not present in the cache.
We can keep a list of repositories that we know are cached: cockpit, systemd and kernel sound like great candidates, especially if we can reuse the cached copy for both upstream and c9s.
| - Less memory is needed to clone repositories in the cache.
| (This makes it possible to clone the kernel, for example.)
| - More memory is needed to clone repositories not present in the cache.
It depends on what the workflow for populating the cache would be, but we should do it outside sandcastle/worker.
| - Less storage is needed for the cloned repository if it is in the cache.
| (Only the current state of the repository is saved; historical commits reference the cached repository.)
Storage is much cheaper than memory.
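The storage saving described in the hunk above (only the current state is stored, while history comes from the cache) matches what git's alternates mechanism provides. Below is a minimal sketch that assumes the cache is a local clone used through `git clone --reference`. The helper `build_clone_command` and the cache layout are hypothetical and not part of packit. The cache is only read during the clone, which fits the point that it does not need to be writable.

```python
from pathlib import Path
from typing import List, Optional


def build_clone_command(
    url: str, destination: Path, cache_dir: Optional[Path] = None
) -> List[str]:
    """Return a `git clone` command line, reusing a local cache if it exists.

    Hypothetical helper: `cache_dir` is assumed to hold a (bare or non-bare)
    clone of the same repository, mounted read-only into the worker.
    """
    command = ["git", "clone"]
    if cache_dir is not None and cache_dir.is_dir():
        # Borrow objects from the cache via git alternates; only objects
        # missing from the cache are fetched and stored in `destination`.
        command += ["--reference", str(cache_dir)]
    command += [url, str(destination)]
    return command
```

The command can then be run with `subprocess.run(build_clone_command(url, dest, cache), check=True)`. The resulting clone keeps pointing at the cache, so the cache volume has to stay mounted while the clone is in use. Git's `--dissociate` flag copies the borrowed objects instead, which gives up the storage saving in exchange for independence from the cache.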
| - The cache does not need to be writable for cloning,
| only for creating/updating it.
| - Persistent volumes can be used.
| - How much storage can we afford?
Our current cost in online is €25 per 1 GB of memory and €1 per 1 GB of storage, so we can easily start with 16 GB of storage.
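Using the prices quoted in the comment above, and assuming both are billed over the same period, the proposed 16 GB of storage costs the same as well under 1 GB of memory:

```math
16\,\text{GB} \times 1\,\tfrac{\text{EUR}}{\text{GB}} = 16\,\text{EUR},
\qquad
\frac{16\,\text{EUR}}{25\,\text{EUR}/\text{GB}} = 0.64\,\text{GB of memory}
```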
| - Manually on request. Mount the volume once with more memory and fetch the needed repository.
| - Manually on a Sentry issue. Same as the previous option, but collect the problematic repositories from Sentry.
| - Start with the kernel manually and add new ones as we go.
It would be delightful if we could create a workflow to populate the cache (e.g. regenerate it weekly) and keep metadata about which repositories are in the cache and when they should be "attached", so that they are used transparently.
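The metadata idea from the comment above could look roughly like this. It is a sketch under assumptions: the `CacheEntry` fields, the one-week `REFRESH_PERIOD` and both helper functions are hypothetical names chosen for illustration. `find_cache` decides whether a cache gets "attached" to a clone, and `stale_entries` lists what a weekly regeneration job would refresh.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical refresh period, following the "regenerate it weekly" idea.
REFRESH_PERIOD = timedelta(weeks=1)


@dataclass
class CacheEntry:
    """Metadata about one cached repository (field names are hypothetical)."""

    url: str
    path: Path
    last_refreshed: datetime


def find_cache(entries: Dict[str, CacheEntry], url: str) -> Optional[Path]:
    """Return the cache path to attach for `url`, or None if it is not cached."""
    entry = entries.get(url)
    return entry.path if entry is not None else None


def stale_entries(entries: Dict[str, CacheEntry], now: datetime) -> List[str]:
    """List URLs whose cached copy is at least REFRESH_PERIOD old."""
    return sorted(
        url
        for url, entry in entries.items()
        if now - entry.last_refreshed >= REFRESH_PERIOD
    )
```

Keeping the metadata separate from the cached repositories means the workers only need to read it. Only the regeneration job, which runs outside the workers, writes to it.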
| - Just the kernel.
| - A group of hardcoded/configured repositories.
| - All repositories matching some condition (at least some number of commits, some size, ...).
| - All repositories (add a repository if it is not present).
Using the kernel for a PoC would be very nice; we could prototype this in the SIG.
Once it's proven, we could introduce this upstream for projects with at least N runs a week?
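The selection options in the hunk and the "at least N runs a week" suggestion can be combined in one small policy function. This is a sketch only. The function name, the `runs_last_week` statistics and the default threshold of 10 are hypothetical. `pinned` stands for the hardcoded/configured list, for example just the kernel during the PoC.

```python
from typing import Dict, Iterable, Set


def select_repos_to_cache(
    runs_last_week: Dict[str, int],
    pinned: Iterable[str] = (),
    min_runs_per_week: int = 10,
) -> Set[str]:
    """Pick repositories for the cache.

    `pinned` models the hardcoded/configured list; `min_runs_per_week`
    is the hypothetical threshold N from the review discussion.
    """
    selected = set(pinned)
    # Add every repository that was used often enough last week.
    selected.update(
        repo for repo, runs in runs_last_week.items() if runs >= min_runs_per_week
    )
    return selected
```

Starting with `pinned=["kernel"]` and a very high threshold reproduces the PoC setup. Lowering the threshold later moves toward the "condition-based" option without changing the caching code itself.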
| 4. Or, we can pass in some method for handling the cloning.
| (Defined in the service repo, run in packit.)
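Option 4 in the hunk above amounts to a pluggable clone hook. Below is a minimal sketch, assuming the hook returns the command to run rather than running it, which keeps the example free of side effects. `CloneHandler`, `clone_command`, `default_clone` and `service_clone` are hypothetical names, not packit's actual API. The `/cache/kernel` path is illustrative only.

```python
from pathlib import Path
from typing import Callable, List, Optional

# A clone handler receives the URL and target directory and returns the
# command to run. The type alias and function names are hypothetical.
CloneHandler = Callable[[str, Path], List[str]]


def default_clone(url: str, destination: Path) -> List[str]:
    """Plain clone, used when the service does not provide a handler."""
    return ["git", "clone", url, str(destination)]


def clone_command(
    url: str, destination: Path, handler: Optional[CloneHandler] = None
) -> List[str]:
    """Let the caller (e.g. the service) decide how cloning is done."""
    return (handler or default_clone)(url, destination)


def service_clone(url: str, destination: Path) -> List[str]:
    """Example handler a service repo could define: use a cache for one
    known repository and fall back to a plain clone otherwise."""
    if url.endswith("/kernel.git"):
        return ["git", "clone", "--reference", "/cache/kernel", url, str(destination)]
    return default_clone(url, destination)
```

With this split, packit only knows about the hook. The cache locations and policy stay in the service repo, so the CLI keeps the plain clone unless a handler is passed.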
| ## Is this relevant for CLI users?
I'm sorry, but I don't see any value here, since people already have those projects cloned.
I agree that it does not make sense to spend much time on it, but (as I wrote below, in this part):
- the second repo (upstream/downstream) is temporarily cloned by default,
- both repos are temporarily cloned when a URL is used as an argument. (That's what I use, for example: you don't need to care about the state of the local repository.)
I really liked how you explained it during the arch meeting; it indeed makes perfect sense for those temporary clones. +1
Signed-off-by: Frantisek Lachman <flachman@redhat.com>
630bbbe to b4f4ef9