Running CosmicAC

This guide shows you how to deploy and operate the CosmicAC stack with Docker Compose.

Stack

The stack runs six application services, plus Redis and Caddy.

cosmicac-wrk-ork
cosmicac-app-node (port 3000)
cosmicac-proxy-inference-{http,hrpc}
cosmicac-wrk-server-k8s-nvidia
cosmicac-ui
redis
caddy (port 5173)

The CosmicAC application services use private images published to GitHub Container Registry as ghcr.io/tetherto/<repo>:<tag>. Redis and Caddy use their official public images.

Prerequisites

Complete these prerequisites before you deploy the stack.

The recommended host is Ubuntu 22.04 or 24.04 LTS on x86_64. On that host, you need the following.

Docker Engine and Docker Compose v2.
Task.
The jq, node, and kubectl command-line tools.
Access to the CosmicAC deployment repo, which holds the Compose files and deploy scripts.
GHCR credentials for private images.
- Your GitHub username.
- A GitHub personal access token (classic) with the read:packages scope, able to pull the private ghcr.io/tetherto images. If the tetherto org enforces SSO, authorize the token for the org. See Managing your personal access tokens.
A GPU Kubernetes cluster that already meets the Requirements. CosmicAC connects to an existing cluster; it doesn't set one up.
A valid, readable kubeconfig file for the GPU Kubernetes cluster. This is mandatory. Bootstrap stops immediately if the file is missing or invalid.

The following commands install the base packages and Task on Ubuntu.

sudo apt-get update
sudo apt-get install -y ca-certificates curl jq
sh -c "$(curl -sSL https://taskfile.dev/install.sh)" -- -d -b "$HOME/.local/bin"
export PATH="$HOME/.local/bin:$PATH"   # current shell only; add this line to ~/.bashrc to keep it

After you install Docker Engine, Compose v2, Node.js, and kubectl from their official repositories, enable Docker and verify the tools.

sudo systemctl enable --now docker
docker compose version
task --version
node --version
kubectl version --client
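As a quick sketch, the hypothetical check_tools function below reports any required command-line tool that isn't on your PATH. It only looks for executables, so Compose v2, which ships as a Docker CLI plugin rather than a separate command, still needs the docker compose version check above.

```shell
#!/bin/sh
# Hypothetical helper: print each named tool that is missing from PATH.
# Returns 0 only when every tool was found.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Example, mirroring the prerequisites list:
# check_tools docker task jq node kubectl curl && echo "all tools found"
```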

1. Set up the environment

Clone the deployment repo and change into it.

git clone https://github.com/tetherto/<deployment-repo>.git
cd <deployment-repo>

Create the .env file from the example.

cp .env.example .env

Set the variables in .env.

Variable	Value
`TAG`	Release image tag to deploy, for example `release-1.0.0`
`GITHUB_USER`, `GITHUB_PAT`	GHCR credentials. The PAT needs the `read:packages` scope
`DOCKER_PLATFORM`	`linux/amd64`
`CONTAINER_UID` / `CONTAINER_GID`	`0:0` for rootless Docker, or the host `UID:GID` for rootful Docker
`KUBECONFIG_SRC`	Required. Absolute host path to the kubeconfig file
`K8S_KUBECONFIG`	Optional custom in-container path. Omit it to use `/app/kube/config`
`K8S_NAMESPACE`	Kubernetes namespace for job resources (default `cosmic-ac`)
`K8S_WORKER_TYPE` / `K8S_WORKER_ENV` / `K8S_WORKER_ARGS`	For example `wrk-server-rack-kv`, `development`, `"--rack rack-0"`
`K8S_MQ_TOPIC`	`job-workflow-<env>`, for example `job-workflow-dev`
`K8S_RACK_ID` / `K8S_RACK_LOCATION` / `K8S_GPU_PRICE`	Rack ID (`rack-0`), location (`IN`), and hourly GPU price
`BOOTSTRAP_ADMIN_EMAIL` / `BOOTSTRAP_ADMIN_PASSWORD`	Optional first-run UI login seeded by `task bootstrap`
`BOOTSTRAP_ADMIN_ROLES`	`*` for a bootstrap admin
`PROXY_AUTOBASE_KEY`	Leave blank. `wire` or `bootstrap` sets it automatically
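For reference, a filled-in .env using the example values from the table might look like the following sketch. The GitHub credentials, the admin login, and the KUBECONFIG_SRC path are hypothetical placeholders, and K8S_GPU_PRICE=1 matches the reference config used in the verification step. The quoting assumes the deploy tooling reads .env with standard dotenv quoting rules.

```shell
# Sketch of a filled-in .env; placeholder values are hypothetical, replace them.
TAG=release-1.0.0
GITHUB_USER=your-github-username
GITHUB_PAT=your-classic-pat-with-read-packages
DOCKER_PLATFORM=linux/amd64
# Rootless Docker: 0 and 0. Rootful Docker: the output of `id -u` and `id -g`.
CONTAINER_UID=0
CONTAINER_GID=0
KUBECONFIG_SRC=/home/ubuntu/.kube/cosmicac-config
# K8S_KUBECONFIG is omitted, so the in-container path is /app/kube/config.
K8S_NAMESPACE=cosmic-ac
K8S_WORKER_TYPE=wrk-server-rack-kv
K8S_WORKER_ENV=development
K8S_WORKER_ARGS="--rack rack-0"
K8S_MQ_TOPIC=job-workflow-dev
K8S_RACK_ID=rack-0
K8S_RACK_LOCATION=IN
K8S_GPU_PRICE=1
BOOTSTRAP_ADMIN_EMAIL=admin@example.com
BOOTSTRAP_ADMIN_PASSWORD=change-me
BOOTSTRAP_ADMIN_ROLES='*'
# Leave blank; wire or bootstrap fills it in.
PROXY_AUTOBASE_KEY=
```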

2. Generate the service config

Generate the per-service config files, then add the secrets config-init can't generate for you. Run these from the deployment directory.

task login        # skip if GITHUB_PAT is set in .env
task config-init

config-init pulls each image, writes its default config to services/<repo>/config/*.json, and auto-generates the app-node secrets and the proxy HRPC keypair. When you bootstrap, the wire step links the services' shared keys automatically.
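If config-init can't pull an image, you can check your GHCR access by hand. This sketch assumes GITHUB_USER and GITHUB_PAT are exported in your shell and that each image repo is named after its service (for example cosmicac-ui). The image_ref and check_ghcr_access helpers are hypothetical, not tasks from the deployment repo.

```shell
#!/bin/sh
# Hypothetical helper: build an image reference following
# ghcr.io/tetherto/<repo>:<tag>.
image_ref() {
  printf 'ghcr.io/tetherto/%s:%s\n' "$1" "$2"
}

# Hypothetical helper: log in to GHCR with the PAT on stdin (so it never
# appears as a command-line argument), then pull one image to confirm the
# token has read:packages access. Args: REPO TAG
check_ghcr_access() {
  printf '%s' "$GITHUB_PAT" | docker login ghcr.io -u "$GITHUB_USER" --password-stdin \
    && docker pull "$(image_ref "$1" "$2")"
}

# Example (repo name assumed to match the service name):
# check_ghcr_access cosmicac-ui "$TAG"
```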

Edit the generated files to add the secrets specific to your deployment.

app-node

File	Set these keys
config/common.json	`pushNotification.vapid.`, `passkey.` (your domain)
config/facs/httpd-oauth2.config.json	Google OAuth `clientId`, `clientSecret`, `callbackUrl`, `sessionSecret`, `cookieDomain`
config/facs/auth.config.json	`a0.superAdmin` (email)

proxy-inference

File	Set these keys
config/common.json	`topicConf.capability`, `topicConf.crypto.key`
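You can edit these files in any editor. As a sketch, jq can also set a key from the shell. The hypothetical set_json_key helper below assumes that dotted names such as a0.superAdmin refer to nested JSON objects, and the example path assumes the app-node directory is services/cosmicac-app-node. jq has no in-place mode, so the helper writes to a temporary file and copies the result back over the original, which keeps the file's existing ownership and permissions.

```shell
#!/bin/sh
# Hypothetical helper: set a string value at a jq path in a JSON file.
# Args: FILE JQ_PATH VALUE
set_json_key() {
  file="$1"; path="$2"; value="$3"
  tmp="$(mktemp)" || return 1
  # Filter becomes e.g. `.a0.superAdmin = $v`; jq creates missing objects.
  if jq --arg v "$value" "$path = \$v" "$file" > "$tmp"; then
    # Copy contents back instead of mv, so the original file's mode stays.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Example (assumed path and nesting):
# set_json_key services/cosmicac-app-node/config/facs/auth.config.json \
#   '.a0.superAdmin' 'you@example.com'
```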

3. Run the first-time bootstrap

task bootstrap deploys the whole stack with one command, using TAG from .env. It generates the service config, starts and wires the services, then runs the post-deploy tasks. It logs in to GHCR only if it needs to pull a missing image, so a locally built stack deploys offline.

Set KUBECONFIG_SRC to the absolute host path of your kubeconfig in .env before you bootstrap. Bootstrap copies the file to services/cosmicac-wrk-server-k8s-nvidia/kube/config and flattens it into the clusters/users/contexts structure the upstream worker requires. The file stays on your host and is never built into an image.

To deploy release images from GHCR on Ubuntu, run the following steps.

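Verify the kubeconfig source. These commands read KUBECONFIG_SRC from your shell environment, not from .env, so set or export it in your shell first.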

test -s "$KUBECONFIG_SRC"
kubectl --kubeconfig "$KUBECONFIG_SRC" config current-context
kubectl --kubeconfig "$KUBECONFIG_SRC" cluster-info

Run the bootstrap. It uses TAG from .env, for example release-1.0.0.
```
task bootstrap
```

Bootstrap performs its own availability and structure checks. A missing kubeconfig path stops bootstrap with an error, not just a warning.
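Bootstrap's own checks aren't documented here. As a rough, hypothetical preflight in the same spirit, preflight_kubeconfig checks that KUBECONFIG_SRC is set, absolute, readable, and non-empty, and that the file has the top-level sections listed under Kubeconfig requirements. The grep-based section check assumes block-style YAML, which is what kubectl writes.

```shell
#!/bin/sh
# Hypothetical preflight, not bootstrap's actual implementation.
# Arg: path to the kubeconfig (normally "$KUBECONFIG_SRC").
preflight_kubeconfig() {
  src="${1:-}"
  if [ -z "$src" ]; then
    echo "KUBECONFIG_SRC is not set" >&2; return 1
  fi
  case "$src" in
    /*) ;;
    *) echo "KUBECONFIG_SRC must be an absolute path: $src" >&2; return 1 ;;
  esac
  if [ ! -r "$src" ] || [ ! -s "$src" ]; then
    echo "kubeconfig missing, unreadable, or empty: $src" >&2; return 1
  fi
  # Each required section must appear as a top-level key.
  for key in clusters users contexts current-context; do
    if ! grep -q "^${key}:" "$src"; then
      echo "kubeconfig has no top-level '${key}:' entry" >&2; return 1
    fi
  done
}

# Example:
# preflight_kubeconfig "$KUBECONFIG_SRC" && echo "kubeconfig looks usable"
```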

Kubeconfig requirements

The kubeconfig is a standard Kubernetes YAML file. Your cluster administrator normally provides it, or you can write a self-contained copy of an existing one with kubectl config view --raw --flatten. It must contain at least the following.

apiVersion: v1
kind: Config
current-context: <context-name>
clusters:
  - name: <cluster-name>
    cluster:
      server: https://<kubernetes-api>
      certificate-authority-data: <base64-ca>
users:
  - name: <user-name>
    user:
      client-certificate-data: <base64-certificate>
      client-key-data: <base64-private-key>
contexts:
  - name: <context-name>
    context:
      cluster: <cluster-name>
      user: <user-name>

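Token-based user credentials also work. If the file references credentials by path, through certificate-authority, client-certificate, or client-key, replace those entries with the inline *-data fields (certificate-authority-data, client-certificate-data, client-key-data). Host paths generally don't exist inside the container, so path-based entries stop resolving after bootstrap copies the file.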
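As a sketch, you can detect path-based credentials before you bootstrap. The hypothetical has_path_credentials helper succeeds when the file still uses certificate-authority, client-certificate, or client-key paths; in that case, kubectl config view --raw --flatten writes a copy with those files inlined as *-data fields. The output path in the example is a placeholder.

```shell
#!/bin/sh
# Hypothetical check: exit 0 if the kubeconfig still references credential
# files. Matches "client-key:" but not "client-key-data:", and likewise for
# the other two keys.
has_path_credentials() {
  grep -Eq '^[[:space:]]*(certificate-authority|client-certificate|client-key):' "$1"
}

# Example: write a flattened copy, then point KUBECONFIG_SRC at it in .env.
# if has_path_credentials "$KUBECONFIG_SRC"; then
#   kubectl --kubeconfig "$KUBECONFIG_SRC" config view --raw --flatten \
#     > "$HOME/.kube/cosmicac-flat"
# fi
```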

4. Verify the deployment

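Run these checks. All services should report Up, and the authenticated API calls should return data. The login call reads BOOTSTRAP_ADMIN_EMAIL and BOOTSTRAP_ADMIN_PASSWORD from your shell environment, not from .env, so export them in your shell first.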

task ps                                          # all services Up
curl -s4 -o /dev/null -w '%{http_code}\n' http://127.0.0.1:5173/   # 200
TOKEN=$(curl -s4 -X POST http://127.0.0.1:5173/api/login -H 'content-type: application/json' -d "{\"email\":\"${BOOTSTRAP_ADMIN_EMAIL}\",\"password\":\"${BOOTSTRAP_ADMIN_PASSWORD}\"}" | jq -r .token)
curl -s4 "http://127.0.0.1:5173/api/auth/servers?overwrite_cache=true" -H "ttr-token: $TOKEN"
curl -s4 "http://127.0.0.1:5173/api/auth/pricing?location=IN&type=gpu&hrs=2" -H "ttr-token: $TOKEN"
curl -s4 "http://127.0.0.1:5173/api/auth/jobs?page=1&pageSize=10" -H "ttr-token: $TOKEN"

The exact values depend on your .env, so treat these as reference numbers. With the reference config (K8S_RACK_LOCATION=IN, an 8-GPU node, and K8S_GPU_PRICE=1), the authenticated calls return the following.

servers returns {"data":[{"location":"IN","gpu":{"available":8}}]} (your rack and its available GPUs).
pricing (with hrs=2) returns {"total_cost":2} (the hourly price times the hours).
jobs returns an empty list on a fresh deployment.

The deployment is healthy when servers lists your GPUs, pricing returns a total_cost instead of an error, and jobs responds.
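The health criteria above can be sketched as jq checks on the three response bodies. The hypothetical check_health helper assumes the response shapes shown above (a data array from servers and a numeric total_cost from pricing) and only checks that jobs returns non-empty, valid JSON, since its exact shape isn't shown here. If TOKEN is empty or the literal string null, the login call failed (jq -r prints null for a missing key), and every authenticated call after it fails too.

```shell
#!/bin/sh
# Hypothetical health check over saved API response bodies; requires jq.
# Args: SERVERS_JSON PRICING_JSON JOBS_JSON [GPU_PRICE HOURS]
check_health() {
  servers="$1"; pricing="$2"; jobs="$3"; price="${4:-}"; hours="${5:-}"

  # servers: .data must be a non-empty array (your rack and its GPUs).
  printf '%s' "$servers" | jq -e '.data | type == "array" and length > 0' >/dev/null \
    || { echo "servers: no racks listed" >&2; return 1; }

  # pricing: total_cost must be a number, not an error payload.
  printf '%s' "$pricing" | jq -e '.total_cost | type == "number"' >/dev/null \
    || { echo "pricing: no numeric total_cost" >&2; return 1; }

  # Optional: total_cost should equal the hourly price times the hours.
  if [ -n "$price" ] && [ -n "$hours" ]; then
    printf '%s' "$pricing" \
      | jq -e --argjson p "$price" --argjson h "$hours" '.total_cost == ($p * $h)' >/dev/null \
      || { echo "pricing: total_cost is not price * hours" >&2; return 1; }
  fi

  # jobs: any non-empty, valid JSON body counts as a response.
  { [ -n "$jobs" ] && printf '%s' "$jobs" | jq empty 2>/dev/null; } \
    || { echo "jobs: empty or invalid response" >&2; return 1; }
}

# Example, reusing TOKEN from the login call above:
# BASE=http://127.0.0.1:5173/api/auth
# S=$(curl -s4 "$BASE/servers?overwrite_cache=true" -H "ttr-token: $TOKEN")
# P=$(curl -s4 "$BASE/pricing?location=IN&type=gpu&hrs=2" -H "ttr-token: $TOKEN")
# J=$(curl -s4 "$BASE/jobs?page=1&pageSize=10" -H "ttr-token: $TOKEN")
# check_health "$S" "$P" "$J" "$K8S_GPU_PRICE" 2 && echo healthy
```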

5. Stop and start the stack

Stop containers without deleting them.

cd <deployment-repo>
task stop

Start those same containers again.

cd <deployment-repo>
task start
task ps

Use task down when you want to remove the containers and the Compose network. Configuration and worker state remain under services/. Recreate the removed containers with task up. For a new or reset installation, use task bootstrap, not task up, because bootstrap generates configuration and wires the services before starting the full stack.

To upgrade to a new image release, see Upgrades.
