This project holds model packaging folders used to build containers for deployment to Replicate with the cog project.
You will need to select the release of cog to use for a model.
I am using 0.14.4 as of this writing.
You will need to place the cog command on your PATH.
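For example, a typical way to install a pinned cog release on Linux follows the pattern from cog's GitHub releases page (verify the asset name for your platform on that page):

```
sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/download/v0.14.4/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog
```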
On any cog command you can add --debug for more output.
Install uv:

```
curl -LsSf https://astral.sh/uv/install.sh | sh
```

You will also need to install Docker CE, its buildx plugin, and the NVIDIA Container Toolkit.
For example, on a RHEL-based system:

```
sudo dnf config-manager --add-repo https://download.docker.com/linux/rhel/docker-ce.repo
sudo dnf install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
sudo systemctl enable docker
sudo systemctl start docker
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker   # restart Docker to pick up the runtime change
```

To verify that containers can access the GPU:

```
docker run --rm --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility ubuntu nvidia-smi
```

Also see https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html.
You may need to edit /etc/nvidia-container-runtime/config.toml to change supported-driver-capabilities and restart Docker.

```
supported-driver-capabilities = "compute,utility"
```

Create folders whose path matches the Hugging Face slug for the model.
For example, ibm-granite/granite-3.3-8b-instruct.
Change into this folder for the remaining steps.
You will want to populate this folder with the files from an existing model folder in git.
The new folder should contain:
```
ibm-granite/granite-3.3-8b-instruct
├── .dockerignore
├── cog.yaml
├── predict.py
├── predictor_config.json
├── pyproject.toml
├── requirements.txt
└── weights
    └── .gitignore
```
The pyproject.toml file needs to be edited to specify the cog version matching the cog command installed earlier. You will also need to specify the python version for the container and any other python packages, such as vllm, which are needed by predict.py. Use == version specifications for build reproducibility.
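A minimal sketch of what such a pyproject.toml might contain (the project name and all package versions below are illustrative, not prescriptive; copy the real values from an existing model folder):

```toml
[project]
name = "granite-3-3-8b-instruct"  # illustrative name
version = "1.0.0"
requires-python = "==3.11.*"      # match the python version used everywhere else
dependencies = [
    "cog==0.14.4",  # match the cog command installed earlier
    "vllm==0.8.5",  # illustrative version; pin the release you have tested
]
```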
The requirements.txt file is generated from the pyproject.toml file to capture the python package dependencies with exact versions.

```
uv pip compile pyproject.toml --output-file requirements.txt
```

If you need to add more packages, edit the pyproject.toml file and rerun the uv pip compile command.
Once you have built the requirements.txt file, create a virtual env and install the packages from requirements.txt into it. Use this virtual env in VSCode to enable code completion, etc. Just don't put the virtual env folder in the model folder, or cog will include it in the container image, which we don't want.
```
(cd .. && uv venv --python 3.11 venv)
source ../venv/bin/activate
uv pip install --requirements requirements.txt
```

Make sure to use the same python version as specified in pyproject.toml.
Edit the image key in cog.yaml to the full name of the image to use when pushing to Replicate.
For example,
image: "r8.im/ibm-granite/granite-3.3-8b-instruct:1.0.0"Make sure to update the image version if you have already pushed that version.
Also update any other versions in cog.yaml, such as cuda and python, as needed. Make sure to use the same python version as specified in pyproject.toml.
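A sketch of how these cog.yaml settings might fit together (the cuda version and predictor entry point shown are illustrative assumptions; copy the real values from an existing model folder):

```yaml
build:
  gpu: true
  cuda: "12.4"            # illustrative; match what your python packages require
  python_version: "3.11"  # must match the version in pyproject.toml
  python_requirements: requirements.txt
image: "r8.im/ibm-granite/granite-3.3-8b-instruct:1.0.0"
predict: "predict.py:Predictor"
```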
The predictor_config.json file needs to be edited to specify the served_model_name in the engine args and any other desired engine args for vLLM.
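A hypothetical sketch of the shape this file might take (the exact schema is defined by predict.py in this repo, so check an existing model folder; served_model_name and max_model_len are standard vLLM engine args):

```json
{
  "engine_args": {
    "served_model_name": "ibm-granite/granite-3.3-8b-instruct",
    "max_model_len": 8192
  }
}
```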
The predict.py module contains the setup and predict methods invoked to set up and run inference on the model. This code may need changes to support certain models and their parameters, such as multimodal parameters.
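For orientation, the general shape of a cog predictor looks like the skeleton below. This is a bare sketch using cog's public API, not this repo's actual implementation, which wires up vLLM and reads predictor_config.json:

```python
from cog import BasePredictor, ConcatenateIterator, Input


class Predictor(BasePredictor):
    def setup(self) -> None:
        # Called once when the container starts: load the engine
        # from the weights packaged in the weights folder.
        ...

    def predict(
        self,
        prompt: str = Input(description="The prompt to send to the model"),
    ) -> ConcatenateIterator[str]:
        # Called per request: run inference and stream output text chunks.
        ...
```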
Model weights are packaged in the container.
They need to be downloaded into the weights folder of the model folder.
For example,

```
hf download --local-dir weights ibm-granite/granite-3.3-8b-instruct
```

These model weight files are not committed to the git repo but are added to the container image.
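The weights/.gitignore shown in the folder tree is what keeps the downloaded files out of git; a common pattern for such a file (an assumption here, so check an existing model folder) is:

```
*
!.gitignore
```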
To build the container, use the following command.
```
cog build --progress plain --separate-weights --use-cog-base-image
```

When the build is done, two docker images will be created: one for the weights, and another for the rest, which includes the weights as a layer.
```
➜ docker image ls
REPOSITORY                                  TAG             IMAGE ID       CREATED       SIZE
r8.im/ibm-granite/granite-3.3-8b-instruct   1.0.0           d3f4991be79d   3 hours ago   34.2GB
r8.im/ibm-granite/granite-3.3-8b-instruct   1.0.0-weights   d643be567003   3 hours ago   16.3GB
```

To test the container, you can use the cog predict command.
```
cog predict r8.im/ibm-granite/granite-3.3-8b-instruct:1.0.0 --progress plain --gpus 1 -i "prompt=What is your name?"
```

You will need to specify the --gpus argument to ensure the container can access a GPU.
This will start the container, call setup, and call the predict method with the specified prompt.
To test using curl, you can start the container with
```
docker run --rm -p 5000:5000 --gpus 1 r8.im/ibm-granite/granite-3.3-8b-instruct:1.0.0
```

Then, from another shell:
```
curl -s http://localhost:5000/health-check | jq .
curl -s http://localhost:5000/openapi.json | jq .
curl -s http://localhost:5000/predictions -X POST -H 'Content-Type: application/json' -d '{"input": {"prompt": "Who is the all-time winner of the Masters Golf Tournament?"}}' | jq '.output | join("")'
curl -s http://localhost:5000/shutdown -X POST | jq .
```

Use LocalForward 5000 localhost:5000 in your .ssh/config file if you ssh into the build/docker host so you can curl from your local system.
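For example, a .ssh/config entry along these lines (the alias and host name below are hypothetical):

```
Host buildhost                       # hypothetical alias
    HostName buildhost.example.com   # hypothetical host name
    LocalForward 5000 localhost:5000
```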
When you are ready to deploy to Replicate, you must first create the model on the Replicate web site if this is the first container deployment for the model.
Then log in to Replicate and push the container to the Replicate container repository.
```
cog login
cog push --progress plain --separate-weights --use-cog-base-image
```

After the container is pushed, you will need to go to the Replicate web site, configure the model settings, and create a deployment for the model.