DevOps/Platform Engineer

Permanent employee, Full-time · UK Remote

Your role
Overview:
  • Build and run a reliable platform for services and data workflows across Kubernetes and Prefect.
  • Own CI/CD, observability, security, and developer experience for Python/Go/Rust services.
Responsibilities:
  • Design, provision, and operate Kubernetes workloads (deployments, networking, autoscaling, storage).
  • Build and maintain GitLab CI/CD pipelines for Python, Go, and Rust services (build, test, scan, release).
  • Operate Prefect (agents, work queues, deployments, concurrency limits, task execution environments).
  • Implement environment strategy and promotion flow (dev/staging/prod) with clear release gates.
  • Create golden paths and templates for FastAPI microservices and Prefect flows.
  • Manage secrets, configuration, and access (e.g., GitLab variables, K8s secrets).
  • Establish observability: logging, metrics, traces, alerting, runbooks, and SLOs.
  • Operate data stores (MySQL, PostgreSQL, Redis): provisioning, backups, migration execution, monitoring, and capacity planning.
  • Optimise build and runtime costs (container images, caching, autoscaling, resource requests/limits).
  • Lead incident response, postmortems, and reliability improvements.
Your profile
You have:
  • 4+ years in DevOps/SRE/Platform roles with production Kubernetes.
  • Strong GitLab CI/CD experience (pipelines, runners, caching, artifact management).
  • Proficiency with containers and image optimisation; comfort with Linux internals and networking.
  • Hands-on experience with Prefect in production (deployments, flow orchestration, storage, results).
  • Familiarity with operating MySQL/PostgreSQL/Redis in production (availability, performance, backups).
  • Scripting/automation with Python or Go; ability to read Rust build pipelines.
  • Solid understanding of security fundamentals (least privilege, image scanning, SBOM, secret hygiene).
  • Experience instrumenting systems and creating actionable alerts.
Nice to have:
  • Helm/Kustomize, policy-as-code (OPA), and basic gRPC.
  • Performance tuning for high‑throughput data or API services.
  • Experience in multi‑tenant or multi‑cluster environments.
About us
At Stelia, we are building the AI Operating System for a distributed, intelligent world. Our mission is to dismantle the boundaries between humanity and technology by creating an Enterprise AI designed for trust, resilience, and scale.
We look forward to hearing from you!