GPU Cloud Platform

GPU compute,
unsigned complexity.

Kubernetes-native AI/ML platform. Deploy inference endpoints and training jobs on GPU clusters with scale-to-zero, zero-trust networking, and no vendor lock-in.

Request Access
View Architecture
unsigned deploy
$ unsigned deploy --model llama-3.1-70b --gpu a100 --replicas 0:8
Resolving model registry...
Image pulled from Harbor (cached)
GPU pool: 8x A100-80GB MIG (3g.40gb partitions)
Autoscaler: KEDA + Kueue (0 → 8 replicas)
Network: Cilium eBPF + WireGuard mTLS
Endpoint: llama-70b.unsigned.gg

Deployed. Scale-to-zero active. First request wakes in ~4s.
$

Infrastructure that gets out of your way.

GPU

Scale to zero, scale to thousands

KEDA + Kueue autoscaling with MIG-isolated GPU partitions. Pay nothing when idle. Burst to full cluster capacity on demand. No cold-start tax beyond the first request.
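The scaling behavior described above reduces to simple arithmetic: a KEDA-style autoscaler derives a desired replica count from a load signal (queue depth here) and clamps it to the configured bounds. A minimal sketch, assuming a hypothetical target of 16 queued requests per replica and the 0–8 replica range from the deploy example:

```python
import math

def desired_replicas(queue_depth: int, target_per_replica: int = 16,
                     min_replicas: int = 0, max_replicas: int = 8) -> int:
    """KEDA-style rule: ceil(load / target), clamped to [min, max].

    target_per_replica is a hypothetical tuning value, not a platform default.
    """
    if queue_depth <= 0:
        return min_replicas  # idle: scale to zero
    raw = math.ceil(queue_depth / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))

print(desired_replicas(0))    # idle -> 0 (scale-to-zero)
print(desired_replicas(40))   # ceil(40/16) -> 3 replicas
print(desired_replicas(500))  # capped at max_replicas -> 8
```

The first request after an idle period is what triggers the 0 → 1 transition, which is where the ~4s wake time from the terminal demo applies.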

NET

Zero-trust by default

Cilium eBPF dataplane with WireGuard-encrypted pod-to-pod traffic. Network policies enforced at the kernel level. mTLS everywhere, no sidecars.

OBS

Full observability stack

Prometheus, Grafana, Loki, and Jaeger pre-configured. GPU utilization, inference latency, and cost dashboards out of the box. Alert on what matters.

SEC

Secrets & policy as code

HashiCorp Vault for secrets management. OPA Gatekeeper for admission control. External Secrets Operator syncs credentials. Nothing hardcoded, ever.

GIT

GitOps everything

ArgoCD manages the full stack. Every change is a pull request. Every deployment is auditable. Rollback is a git revert.
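"Rollback is a git revert" can be demonstrated with plain git, independent of any controller: reverting the commit that changed a manifest restores the previous declared state, and a GitOps tool like ArgoCD would then sync the cluster back to it. A self-contained sketch (manifest name and contents are illustrative; requires `git` on PATH):

```python
import os
import subprocess
import tempfile

def git(*args, cwd):
    # Thin wrapper so each step reads like the CLI commands it runs.
    subprocess.run(["git", *args], cwd=cwd, check=True,
                   capture_output=True, text=True)

repo = tempfile.mkdtemp()
git("init", "-q", cwd=repo)
git("config", "user.email", "demo@example.com", cwd=repo)
git("config", "user.name", "demo", cwd=repo)

manifest = os.path.join(repo, "deployment.yaml")  # illustrative file name

# v1: good state, committed (and, in a real setup, synced by ArgoCD)
open(manifest, "w").write("replicas: 2\n")
git("add", ".", cwd=repo)
git("commit", "-q", "-m", "set replicas=2", cwd=repo)

# v2: a bad change lands via a merged pull request
open(manifest, "w").write("replicas: 0\n")
git("add", ".", cwd=repo)
git("commit", "-q", "-m", "bad change", cwd=repo)

# Rollback is literally a revert of the bad commit:
git("revert", "--no-edit", "HEAD", cwd=repo)
print(open(manifest).read())  # -> replicas: 2
```

The revert is itself a new commit, so the rollback is as auditable as the change it undoes.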

API

OpenAI-compatible endpoints

NVIDIA Dynamo + Triton inference server. Drop-in replacement for OpenAI API. Bring your own models or pull from our registry. One endpoint, any framework.
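A sketch of what "drop-in" means here: a client builds a standard OpenAI-style chat-completions request and points it at the platform hostname instead of api.openai.com. The hostname is taken from the deploy output above; the `/v1/chat/completions` path and payload shape follow the OpenAI convention. The request is only constructed in this sketch, not sent, and the API-key header is a placeholder:

```python
import json
import urllib.request

# Endpoint hostname from the deploy example above.
BASE_URL = "https://llama-70b.unsigned.gg"

payload = {
    "model": "llama-3.1-70b",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    url=f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <your-api-key>",  # placeholder, not a real key
    },
    method="POST",
)

# Built but deliberately not sent in this sketch:
print(req.full_url)
print(json.loads(req.data)["model"])
```

Because the wire format is unchanged, existing OpenAI SDKs typically only need their base URL pointed at the new endpoint.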

Production-grade, opinionated stack.

Compute: Vultr VKE
Orchestration: Kubernetes
IaC: Terraform
GitOps: ArgoCD
Network: Cilium eBPF
Ingress: Traefik
API Gateway: Kong
Encryption: WireGuard mTLS
GPU Inference: Dynamo + Triton
Autoscaling: KEDA + Kueue
Secrets: Vault + ESO
Policy: OPA Gatekeeper
Auth: Keycloak
Registry: Harbor
Monitoring: Prometheus + Grafana
Tracing: Jaeger

From request to inference.

Every layer is replaceable. No proprietary glue. Fork the stack and run it yourself.

Request Flow

Client
  │
  ▼
Traefik (TLS termination, rate limiting)
  │
  ▼
Kong (API gateway, auth, quota enforcement)
  │
  ▼
Cilium (eBPF network policy + WireGuard encryption)
  │
  ├─── Control Plane API (Go, manages deployments/scaling)
  │       │
  │       ▼
  │     Keycloak (OIDC, RBAC, tenant isolation)
  │
  └─── Inference Plane
          │
          ▼
        KEDA (scale 0 → N from queue depth / HTTP RPS)
          │
          ▼
        Kueue (GPU quota, fair scheduling, preemption)
          │
          ▼
        Dynamo + Triton (model serving, batching, MIG isolation)
          │
          ▼
        A100 / H100 GPU (MIG-partitioned, scale-to-zero)

Pay for compute, not complexity.

No platform fees. No egress charges. No surprise bills. You pay for GPU-seconds consumed. Scale to zero = $0.
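The billing model above reduces to one multiplication: GPU-seconds consumed times a per-second rate, with idle periods contributing nothing. A sketch using a hypothetical rate (published pricing is still "coming soon"):

```python
def bill(usage_windows, rate_per_gpu_second):
    """Sum cost over (gpu_count, seconds) active windows.

    Idle time produces no window at all, so scale-to-zero periods
    cost exactly $0. The rate is hypothetical, not published pricing.
    """
    return sum(gpus * seconds * rate_per_gpu_second
               for gpus, seconds in usage_windows)

RATE = 0.0005  # hypothetical $/GPU-second

# Two bursts of activity in an otherwise idle day:
windows = [(4, 600), (8, 120)]  # 4 GPUs for 10 min, then 8 GPUs for 2 min
print(f"${bill(windows, RATE):.2f}")  # 3360 GPU-seconds -> $1.68
print(f"${bill([], RATE):.2f}")       # idle all day -> $0.00
```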

A100 80GB (MIG 3g.40gb) Coming soon
A100 80GB (full) Coming soon
H100 80GB (MIG) Coming soon
H100 80GB (full) Coming soon
CPU (general) Coming soon

Get on the list.

We're onboarding a small group of teams for the private beta. Founding users get locked-in pricing and direct access to the engineering team.


No spam. Early access and launch updates only.