Running CosmicAC
Deploy and operate the CosmicAC Docker Compose stack.
This guide shows you how to deploy and operate the CosmicAC stack with Docker Compose.
Stack
The stack runs six application services, plus Redis and Caddy.
- cosmicac-wrk-ork
- cosmicac-app-node (port 3000)
- cosmicac-proxy-inference-{http,hrpc}
- cosmicac-wrk-server-k8s-nvidia
- cosmicac-ui
- redis
- caddy (port 5173)
The CosmicAC application services use private images published under ghcr.io/tetherto/ (ghcr.io/tetherto/<repo>:<tag>). Redis and Caddy use their official public images.
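As a quick illustration of the naming scheme above, the sketch below builds a full image reference from a repo name and a tag. The function name `image_ref` is hypothetical, and using a service name as its repo name is an assumption; the Compose files in the deployment repo are the authority on the actual image names.

```shell
#!/bin/sh
# Hypothetical helper: build a GHCR image reference of the form
# ghcr.io/tetherto/<repo>:<tag>.
image_ref() {
  # $1 = repo name (assumed here to match the service name)
  # $2 = image tag, for example the TAG value from .env
  printf 'ghcr.io/tetherto/%s:%s\n' "$1" "$2"
}
```

For example, `image_ref cosmicac-ui release-1.0.0` prints `ghcr.io/tetherto/cosmicac-ui:release-1.0.0`.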
Prerequisites
Complete these prerequisites before you deploy the stack.
The recommended host is Ubuntu 22.04 or 24.04 LTS on x86_64. On that host, you need the following.
- Docker Engine and Docker Compose v2.
- Task.
- The jq, node, and kubectl command-line tools.
- Access to the CosmicAC deployment repo, which holds the Compose files and deploy scripts.
- GHCR credentials for private images.
  - Your GitHub username.
  - A GitHub personal access token (classic) with the read:packages scope, able to pull the private ghcr.io/tetherto images. If the tetherto org enforces SSO, authorize the token for the org. See Managing your personal access tokens.
- A GPU Kubernetes cluster that already meets the Requirements. CosmicAC connects to this cluster but doesn't set it up.
- A valid, readable kubeconfig file for the GPU Kubernetes cluster. This is mandatory. Bootstrap stops immediately if the file is missing or invalid.
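Once the tools are installed, a small preflight check along these lines can confirm that the command-line prerequisites are on your PATH. This is a minimal sketch: the function name `missing_tools` is hypothetical and not part of the deployment repo. Docker Compose v2 runs as `docker compose`, so the sketch only looks for the `docker` binary.

```shell
#!/bin/sh
# Hypothetical preflight helper: print every command from the argument list
# that is not on PATH, and return non-zero if any is missing.
missing_tools() {
  mt_status=0
  for mt_tool in "$@"; do
    if ! command -v "$mt_tool" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$mt_tool"
      mt_status=1
    fi
  done
  return "$mt_status"
}

# Example call with the tools named in the prerequisites:
#   missing_tools docker task jq node kubectl
```

The function prints nothing and returns 0 when every listed tool is found.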
The following commands install the base packages and Task on Ubuntu.
sudo apt-get update
sudo apt-get install -y ca-certificates curl jq
sh -c "$(curl -sSL https://taskfile.dev/install.sh)" -- -d -b "$HOME/.local/bin"
export PATH="$HOME/.local/bin:$PATH"
After you install Docker Engine, Compose v2, Node.js, and kubectl from their official repositories, enable Docker and verify the tools.
sudo systemctl enable --now docker
docker compose version
task --version
node --version
kubectl version --client
1. Set up the environment
Clone the deployment repo and change into it.
git clone https://github.com/tetherto/<deployment-repo>.git
cd <deployment-repo>
Create the .env file from the example.
cp .env.example .env
Set the variables in .env.
| Variable | Value |
|---|---|
| TAG | release image tag to deploy, for example release-1.0.0 |
| GITHUB_USER, GITHUB_PAT | GHCR credentials. PAT needs read:packages |
| DOCKER_PLATFORM | linux/amd64 |
| CONTAINER_UID / CONTAINER_GID | 0:0 for rootless Docker, or host UID:GID for rootful |
| KUBECONFIG_SRC | required absolute host path to the kubeconfig file |
| K8S_KUBECONFIG | optional custom in-container path. Omit it to use /app/kube/config |
| K8S_NAMESPACE | Kubernetes namespace for job resources (default cosmic-ac) |
| K8S_WORKER_TYPE / K8S_WORKER_ENV / K8S_WORKER_ARGS | wrk-server-rack-kv, development, "--rack rack-0" |
| K8S_MQ_TOPIC | job-workflow-<env> (for example job-workflow-dev) |
| K8S_RACK_ID / K8S_RACK_LOCATION / K8S_GPU_PRICE | rack id (rack-0), location (IN), GPU price |
| BOOTSTRAP_ADMIN_EMAIL / BOOTSTRAP_ADMIN_PASSWORD | optional first-run UI login seeded by task bootstrap |
| BOOTSTRAP_ADMIN_ROLES | * for a bootstrap admin |
| PROXY_AUTOBASE_KEY | leave blank. wire or bootstrap sets it automatically |
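Before moving on, it can help to confirm that the variables you rely on actually have values in .env. The sketch below is one way to do that with standard tools. The function name `env_missing` is hypothetical, and which variables you pass is your choice; PROXY_AUTOBASE_KEY is meant to stay blank, so it should not be in the list.

```shell
#!/bin/sh
# Hypothetical helper: report variables that are unset or empty in a .env file.
# Usage: env_missing <env-file> <NAME>...
# Limitation: a quoted empty value such as NAME="" is not detected as empty.
env_missing() {
  em_file=$1
  shift
  em_status=0
  for em_name in "$@"; do
    # Take the last NAME=value line and keep everything after the first '='.
    em_value=$(grep -E "^${em_name}=" "$em_file" | tail -n 1 | cut -d= -f2-)
    if [ -z "$em_value" ]; then
      printf 'unset or empty: %s\n' "$em_name"
      em_status=1
    fi
  done
  return "$em_status"
}

# Example call from the deployment directory:
#   env_missing .env TAG KUBECONFIG_SRC
```

The function returns 0 only when every named variable has a non-empty value.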
2. Generate the service config
Generate the per-service config files, then add the secrets config-init can't generate for you. Run these from the deployment directory.
task login # skip if GITHUB_PAT is set in .env
task config-init
config-init pulls each image, writes its default config to services/<repo>/config/*.json, and auto-generates the app-node secrets and the proxy HRPC keypair. When you bootstrap, the wire step links the services' shared keys automatically.
Edit the generated files to provide the secrets that are specific to you.
app-node
| File | Set these keys |
|---|---|
| config/common.json | pushNotification.vapid.*, passkey.* (your domain) |
| config/facs/httpd-oauth2.config.json | Google OAuth clientId, clientSecret, callbackUrl, sessionSecret, cookieDomain |
| config/facs/auth.config.json | a0.superAdmin (email) |
proxy-inference
| File | Set these keys |
|---|---|
| config/common.json | topicConf.capability, topicConf.crypto.key |
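To confirm that config-init produced the files listed in the tables above before you edit them, a check like the following may be useful. It is a sketch only: the function name `missing_configs` is hypothetical, and it assumes the file paths in the tables are relative to each service's directory under services/<repo>.

```shell
#!/bin/sh
# Hypothetical helper: list expected config files that are absent or empty
# under a service directory.
# Usage: missing_configs <service-dir> <relative-path>...
missing_configs() {
  mc_dir=$1
  shift
  mc_status=0
  for mc_rel in "$@"; do
    # -s is true only for an existing file with a size greater than zero.
    if [ ! -s "$mc_dir/$mc_rel" ]; then
      printf 'missing or empty: %s\n' "$mc_dir/$mc_rel"
      mc_status=1
    fi
  done
  return "$mc_status"
}

# Example call for the app-node files (replace <repo> with the service directory):
#   missing_configs "services/<repo>" config/common.json \
#     config/facs/httpd-oauth2.config.json config/facs/auth.config.json
```

The check only looks at presence and size; it does not verify that the keys in the tables have been filled in.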
3. Run the first-time bootstrap
task bootstrap deploys the whole stack with one command, using TAG from .env. It generates the service config, starts and wires the services, then runs the post-deploy tasks. It logs in to GHCR only if it needs to pull a missing image, so a locally built stack deploys offline.
Set KUBECONFIG_SRC to the absolute host path of your kubeconfig in .env before you bootstrap. Bootstrap copies the file to services/cosmicac-wrk-server-k8s-nvidia/kube/config and flattens it into the clusters/users/contexts structure the upstream worker requires. The file stays on your host and is never built into an image.
To deploy release images from GHCR on Ubuntu, run the following steps.
- Verify the kubeconfig source.
  test -s "$KUBECONFIG_SRC"
  kubectl --kubeconfig "$KUBECONFIG_SRC" config current-context
  kubectl --kubeconfig "$KUBECONFIG_SRC" cluster-info
- Run the bootstrap. It uses TAG from .env, for example release-1.0.0.
  task bootstrap
Bootstrap performs its own availability and structure checks. A missing kubeconfig path stops bootstrap with an error, not just a warning.
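If you want a local check of the path before you run bootstrap, the sketch below covers the conditions this guide states for KUBECONFIG_SRC: an absolute path to a readable, non-empty file. The function name `kubeconfig_src_ok` is hypothetical, and the check does not replace bootstrap's own validation. It also assumes KUBECONFIG_SRC is set in your current shell, because a shell does not read .env on its own.

```shell
#!/bin/sh
# Hypothetical pre-bootstrap check for KUBECONFIG_SRC: the path must be
# absolute and must point to a readable, non-empty regular file.
kubeconfig_src_ok() {
  case $1 in
    /*) ;;  # absolute path, keep checking
    *) echo "not an absolute path: $1" >&2; return 1 ;;
  esac
  [ -f "$1" ] || { echo "not a regular file: $1" >&2; return 1; }
  [ -r "$1" ] || { echo "not readable: $1" >&2; return 1; }
  [ -s "$1" ] || { echo "empty file: $1" >&2; return 1; }
}

# Example: only start bootstrap when the path check passes.
#   kubeconfig_src_ok "$KUBECONFIG_SRC" && task bootstrap
```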
Kubeconfig requirements
The kubeconfig is a standard Kubernetes YAML file. Your cluster administrator normally provides it, or you can generate it with kubectl config view --raw --flatten. It must contain the following.
apiVersion: v1
kind: Config
current-context: <context-name>
clusters:
  - name: <cluster-name>
    cluster:
      server: https://<kubernetes-api>
      certificate-authority-data: <base64-ca>
users:
  - name: <user-name>
    user:
      client-certificate-data: <base64-certificate>
      client-key-data: <base64-private-key>
contexts:
  - name: <context-name>
    context:
      cluster: <cluster-name>
      user: <user-name>
Token-based user credentials also work. Flatten paths such as certificate-authority, client-certificate, and client-key into inline *-data fields so they stay valid after Bootstrap copies the file into the container.
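The requirements above can be screened roughly with grep before you bootstrap. This is a text-based sketch, not a YAML validation. Both function names are hypothetical, and the checks assume the top-level keys start at column one, as in the layout shown above.

```shell
#!/bin/sh
# Hypothetical helpers for a quick, text-based look at a kubeconfig file.

# Print each required top-level key that is absent; return 1 if any is missing.
kubeconfig_missing_keys() {
  km_status=0
  for km_key in current-context clusters users contexts; do
    if ! grep -qE "^${km_key}:" "$1"; then
      printf 'missing top-level key: %s\n' "$km_key"
      km_status=1
    fi
  done
  return "$km_status"
}

# Print lines that point at credential files by path instead of inline
# *-data fields; return 1 if any such line exists, 0 if none is found.
kubeconfig_is_flat() {
  [ -r "$1" ] || { echo "cannot read: $1" >&2; return 2; }
  if grep -nE '^[[:space:]]*(certificate-authority|client-certificate|client-key):' "$1"; then
    return 1
  fi
  return 0
}
```

If `kubeconfig_is_flat` reports path-based fields, regenerating the file with `kubectl config view --raw --flatten`, as mentioned above, inlines them.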
4. Verify the deployment
Run these checks. All services should report Up, and the authenticated API calls should return data.
task ps # all services Up
curl -s4 -o /dev/null -w '%{http_code}\n' http://127.0.0.1:5173/ # 200
TOKEN=$(curl -s4 -X POST http://127.0.0.1:5173/api/login -H 'content-type: application/json' -d "{\"email\":\"${BOOTSTRAP_ADMIN_EMAIL}\",\"password\":\"${BOOTSTRAP_ADMIN_PASSWORD}\"}" | jq -r .token)
curl -s4 "http://127.0.0.1:5173/api/auth/servers?overwrite_cache=true" -H "ttr-token: $TOKEN"
curl -s4 "http://127.0.0.1:5173/api/auth/pricing?location=IN&type=gpu&hrs=2" -H "ttr-token: $TOKEN"
curl -s4 "http://127.0.0.1:5173/api/auth/jobs?page=1&pageSize=10" -H "ttr-token: $TOKEN"
The exact values depend on your .env, so treat these as reference numbers. With the reference config (K8S_RACK_LOCATION=IN, an 8-GPU node, and K8S_GPU_PRICE=1), the authenticated calls return the following.
- servers returns {"data":[{"location":"IN","gpu":{"available":8}}]} (your rack and its available GPUs).
- pricing (with hrs=2) returns {"total_cost":2} (the hourly price times the hours).
- jobs returns an empty list on a fresh deployment.
The deployment is healthy when servers lists your GPUs, pricing returns a total_cost instead of an error, and jobs responds.
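Services may need a moment to come up after bootstrap, so the first HTTP check can be wrapped in a short polling loop. The sketch below reuses the same curl status-code probe. The function name `wait_for_200` and its default attempt count and delay are assumptions, not values from the deployment repo.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it answers with HTTP 200.
# Usage: wait_for_200 <url> [attempts] [delay-seconds]
wait_for_200() {
  wf_url=$1
  wf_attempts=${2:-30}   # assumed default, adjust as needed
  wf_delay=${3:-2}       # assumed default, in seconds
  wf_i=1
  while [ "$wf_i" -le "$wf_attempts" ]; do
    # Same probe as the manual check; "|| true" keeps the loop going
    # when curl itself fails, for example on a refused connection.
    wf_code=$(curl -s4 -o /dev/null -w '%{http_code}' "$wf_url") || true
    if [ "$wf_code" = "200" ]; then
      return 0
    fi
    wf_i=$((wf_i + 1))
    sleep "$wf_delay"
  done
  echo "no HTTP 200 from $wf_url after $wf_attempts attempts (last: $wf_code)" >&2
  return 1
}

# Example: wait for the UI, then continue with the authenticated checks.
#   wait_for_200 http://127.0.0.1:5173/ && task ps
```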
5. Stop and start the stack
Stop containers without deleting them.
cd <deployment-repo>
task stop
Start those same containers again.
cd <deployment-repo>
task start
task ps
Use task down when you want to remove the containers and the Compose network. Configuration and worker state remain under services/. Recreate the removed containers with task up. For a new or reset installation, use task bootstrap, not task up, because bootstrap generates configuration and wires the services before starting the full stack.
To upgrade to a new image release, see Upgrades.