Here

Born

to build

what’s next. Together

Looking to join a remote team of high performers? Eager to be truly supported by AI and guided by purpose?

DevOps Lead

Location:

Remote/Hybrid (East Coast US Preferred)

Department:

Engineering

Reports To:

CTO

About Baryons

Baryons is at the forefront of AI-powered mentorship, designed to empower every member of an organization, from frontline employees to C-suite executives. Our mission is to bridge skill gaps and enhance decision-making through tailored guidance and actionable insights. We’re a fast-moving, values-driven team building scalable, high-impact products that connect cutting-edge research with real-world needs.

On the technology side, Baryons is transforming how organizations use AI agents (including voice and other modalities) by building scalable, cloud-native products at the intersection of AI and real-world business value. Our team thrives on innovation, technical excellence, and rapid delivery.

Role Overview

As DevOps Lead, you’ll architect, implement, and maintain the cloud infrastructure and automation that power our AI-driven applications. You’ll be hands-on with Azure and Google Cloud (GCP) environments, focusing on Kubernetes orchestration, scalability, automation, cost optimization, and security. You’ll guide DevOps best practices, lead our CI/CD initiatives, and ensure our systems are secure, reliable, and built to scale. Experience with LiveKit and real-time agent infrastructure is a strong plus.

Responsibilities


  • Lead the design, deployment, and management of scalable, secure infrastructure in Azure and Google Cloud (GCP).

  • Architect and manage Kubernetes clusters, ensuring high availability, disaster recovery, and efficient orchestration of containerized workloads.

  • Build, configure, and maintain automation for infrastructure provisioning, application deployment, monitoring, and alerting (using tools such as Terraform and Helm).

  • Implement and refine CI/CD pipelines for all engineering teams, ensuring rapid, safe, and repeatable delivery of code and AI models.

  • Monitor and optimize cloud infrastructure for cost management and resource utilization.

  • Develop and maintain comprehensive observability and logging systems (metrics, logs, tracing) to enable real-time monitoring, alerting, and performance optimization.

  • Implement and enforce cloud security best practices, secrets management, and compliance protocols (SOC 2, HIPAA, or similar, as applicable).

  • Design and maintain disaster recovery, backup, and high-availability strategies for critical applications and data.

  • Champion DevOps best practices, automation, and a culture of ownership and operational excellence across the engineering organization.

  • Collaborate with software engineers, data scientists, and product leads to enable efficient development, deployment, and operation of AI- and voice-powered systems.

  • Ensure all infrastructure, automation, and deployment processes are well-documented and accessible.

  • Mentor and support junior DevOps and engineering team members.

  • Troubleshoot, resolve, and prevent production issues in a fast-paced environment.

Required Qualifications


  • 5+ years of experience in DevOps, Site Reliability Engineering, or related roles.

  • Deep expertise with Azure and Google Cloud (GCP) environments, including network, security, and storage services.

  • Proven experience architecting, deploying, and managing Kubernetes clusters in production environments.

  • Strong automation skills: Terraform (or equivalent IaC tools), Helm, and scripting languages (Python, Bash, etc.).

  • Demonstrated experience building and maintaining robust CI/CD pipelines (GitHub Actions, GitLab CI, or similar).

  • Solid understanding of cost optimization strategies for cloud-native applications.

  • Experience with monitoring, alerting, and observability (Prometheus, Grafana, Datadog, etc.).

  • Experience implementing security best practices and compliance protocols.

  • Experience designing and maintaining disaster recovery and high-availability solutions.

  • Excellent troubleshooting, communication, and collaboration skills.

Nice-to-Have


  • Experience with LiveKit and real-time agent infrastructure (LiveKit Agents, voice/video, WebRTC, etc.).

  • Background working with AI/ML model deployment and scaling in production.

  • Familiarity with additional clouds (AWS, OCI) or hybrid cloud architectures.

  • Prior experience in a startup or high-growth SaaS environment.

Join the Waitlist

Be among the first to lead with AI mentorship. Your Baryon AI Mentor is built to help your team think clearly, act intentionally, and grow with less friction. Join the waitlist to gain early access and shape the future of work.
