Shivam Gupta

Cloud Computing & AI/ML Infrastructure Expert with 7+ years of experience building scalable, enterprise-grade solutions, most recently at Microsoft Azure

Professional Summary

Driving next-generation cloud infrastructure and AI/ML platforms at Microsoft Azure

☁️

7+ Years of Experience

Deep expertise in cloud computing, distributed systems, container orchestration, and AI/ML infrastructure, most recently at Microsoft Azure.

🚀

Azure Container Orchestration Leader

Led the end-to-end architecture and delivery of the Azure Container Instances orchestration platform, with auto-scaling, load balancing, rolling upgrades, and self-healing for enterprise-grade container workloads.

🤖

AI/ML Infrastructure Architect

Driving AI/ML infrastructure for Azure AI and Copilot teams, enabling GPU-backed container orchestration, high-throughput pipelines, and optimized compute environments for large-scale model training and inference.

🔗

Open-Source Integration

Integrated the serverless containers platform into key open-source projects, accelerating adoption and driving onboarding of high-value enterprise customers.

Massive Scale Infrastructure

Designed and scaled serverless container infrastructure supporting 500M+ container workloads globally, building key features such as confidential containers, networking, overcapacity allocation, and E2E testing frameworks.

🛡️

Scalable & Resilient Systems Expert

Expert in building scalable, resilient services with a strong focus on reliability, operational excellence, and distributed system design for next-generation AI workloads.

Professional Experience

Software Engineer 2

Microsoft | Redmond, WA
Jun 2020 – Present
  • Led end-to-end development of the Azure serverless container orchestration platform with auto-scaling, rolling upgrades, load balancing, and self-healing features
  • Drove AI/ML infrastructure for Azure AI and Copilot teams, designing GPU allocation strategies and high-throughput compute pipelines
  • Scaled the Azure Containers platform to support 500M+ container workloads globally
  • Contribute to the Radius open-source project in collaboration with the Azure CTO's team
  • Built monitoring platforms, E2E testing frameworks, and alerting systems that improved service reliability

Software Engineer

Wayfair | Boston, MA
Feb 2019 – Jun 2020
  • Designed and developed a full-stack order management platform for call center teams
  • Built an automated resolution engine that recommended financially optimal solutions for customer issues
  • Improved order resolution time by 20% through system enhancements and workflow optimization
  • Collaborated with cross-functional teams to boost efficiency and enhance customer satisfaction

Software Engineer

Egen Solutions | Denver, CO
Apr 2018 – Feb 2019
  • Developed a real-time metrics and analytics platform for business teams
  • Designed RESTful microservices using Spring Boot, Java, and Apache Spark
  • Configured and optimized Spark clusters, improving data processing efficiency by 30%
  • Contributed to data-driven insights and reporting pipelines supporting critical business decisions

Technical Expertise

Cloud & Distributed Systems

Microsoft Azure, Azure Container Instances, Kubernetes, VMSS & VMSS Flex, Serverless Computing

AI/ML Infrastructure

GPU Scheduling, Model Training Pipelines, High-Throughput Compute, AI Inference Optimization

Containers & Orchestration

Docker, Kafka, Container Orchestration, Confidential Containers, SwiftV2 Networking

DevOps & Programming

C#, Python, Go, CI/CD Pipelines, Infrastructure-as-Code, Kusto (KQL)

Get In Touch

Phone

650-431-4451

Location

Redmond, WA