Kubernetes Deployment Tutorial: Complete Guide

Deploying applications to Kubernetes has become the de facto standard for running containerized workloads at scale, yet most teams struggle with the gap between basic tutorials and production-ready configurations. A misconfigured Kubernetes deployment can lead to cascading failures during traffic spikes, silent data loss during rollouts, security vulnerabilities exposing sensitive workloads, and cost overruns from inefficient resource allocation. In 2025, with organizations running increasingly complex microservices architectures, AI inference workloads, and real-time data pipelines on Kubernetes, understanding production-grade deployment patterns is no longer optional.

The challenge isn't just getting containers running—it's ensuring they stay running reliably under production conditions. Traditional deployment approaches that worked for monolithic applications or simple containerized services fail when faced with modern requirements: zero-downtime deployments across multiple availability zones, automatic rollback on subtle performance degradations, fine-grained resource management for cost optimization, and compliance with evolving security standards like SLSA Level 3 and supply chain attestation requirements.

This Kubernetes deployment tutorial walks through building production-ready deployments from first principles, explaining the architectural decisions that separate hobby projects from systems that handle millions of requests daily.

Why Basic Kubernetes Deployments Fail in Production

Most developers start with a minimal Deployment manifest that runs containers but lacks the resilience mechanisms required for production traffic. These basic configurations typically omit health checks, resource limits, pod disruption budgets, and proper security contexts—all critical for preventing outages.

The consequences manifest in predictable patterns: pods crash-looping during memory pressure because no limits were set, rolling updates causing brief outages because readiness probes weren't configured, and security incidents because containers ran as root with excessive privileges. In 2025, with Kubernetes clusters often running hundreds of microservices and supporting real-time AI inference endpoints, these gaps compound into systemic reliability issues.

Modern Kubernetes environments also face challenges that didn't exist in earlier iterations. Multi-tenancy requirements demand strict resource isolation and network policies. Compliance frameworks require immutable container images with verified signatures. Cost optimization requires precise resource requests that align with actual usage patterns, not guesswork. The deployment configuration itself must encode these requirements.

Anatomy of a Production-Grade Kubernetes Deployment

A production Kubernetes deployment configuration addresses reliability, security, and operational concerns through multiple layers of specification. Let's build one component by component, starting with the fundamental structure and progressively adding production requirements.

Core Deployment Structure

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  namespace: production
  labels:
    app: api-service
    version: v2.4.1
    team: platform
  annotations:
    # deployment.kubernetes.io/revision is managed by the Deployment controller and should not be set by hand
    kubernetes.io/change-cause: "Update to v2.4.1 with performance optimizations"
spec:
  replicas: 5
  revisionHistoryLimit: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 1
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
        version: v2.4.1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
    spec:
      serviceAccountName: api-service-sa
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: api
        image: registry.company.com/api-service:v2.4.1@sha256:abc123...
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        - name: metrics
          containerPort: 9090
          protocol: TCP
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: api-secrets
              key: database-url
        - name: LOG_LEVEL
          value: "info"
        - name: MAX_CONNECTIONS
          value: "100"
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            httpHeaders:
            - name: X-Health-Check
              value: liveness
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
        startupProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 0
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 30
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /app/cache
      volumes:
      - name: tmp
        emptyDir: {}
      - name: cache
        emptyDir:
          sizeLimit: 1Gi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - api-service
              topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: api-service

This configuration embeds multiple production requirements. The image reference uses a digest rather than just a tag, ensuring immutability and preventing tag-based attacks. The security context enforces non-root execution and drops all Linux capabilities, reducing the attack surface. Resource requests and limits prevent resource contention and enable proper bin-packing.
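
The digest requirement described above can be checked mechanically, for example in a CI step that scans manifests. The following is a minimal sketch, assuming a hypothetical helper named isDigestPinned that is not part of any Kubernetes tooling; it only tests whether a reference ends in a full sha256 digest and does not verify that the digest exists in the registry.

```typescript
// Hypothetical helper (not part of any Kubernetes client library):
// returns true when an image reference ends in a full sha256 digest.
const SHA256_DIGEST_SUFFIX = /@sha256:[a-f0-9]{64}$/;

function isDigestPinned(image: string): boolean {
  return SHA256_DIGEST_SUFFIX.test(image);
}

// Placeholder digest for illustration only
const exampleDigest = "0".repeat(64);

console.log(isDigestPinned(`registry.company.com/api-service:v2.4.1@sha256:${exampleDigest}`)); // true
console.log(isDigestPinned("registry.company.com/api-service:v2.4.1")); // false
console.log(isDigestPinned("registry.company.com/api-service:latest")); // false
```

A tag in front of the digest, as in the manifest above, is kept for readability; the container runtime resolves the image by digest.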

Health Checks and Lifecycle Management

The three-probe pattern—startup, liveness, and readiness—addresses different failure modes. Startup probes handle slow-starting applications without triggering premature restarts. Liveness probes detect deadlocked processes that need restarting. Readiness probes prevent traffic routing to pods that aren't ready to serve requests.
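
To make the timing of each probe concrete, the sketch below works out roughly how long the kubelet tolerates failures before acting, using the values from the manifest above. The ProbeTiming type and failureWindowSeconds function are illustrative names, and the figures are approximations because per-attempt timeouts can add a few seconds.

```typescript
interface ProbeTiming {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold: number;
}

// Seconds of consecutive failures the kubelet tolerates before acting.
// Approximate: per-attempt timeouts can add a few seconds.
function failureWindowSeconds(probe: ProbeTiming): number {
  return probe.failureThreshold * probe.periodSeconds;
}

// Values copied from the manifest above
const startupProbe: ProbeTiming = { initialDelaySeconds: 0, periodSeconds: 5, failureThreshold: 30 };
const livenessProbe: ProbeTiming = { initialDelaySeconds: 30, periodSeconds: 10, failureThreshold: 3 };
const readinessProbe: ProbeTiming = { initialDelaySeconds: 10, periodSeconds: 5, failureThreshold: 3 };

// Startup: the container gets about 150 s to come up before it is restarted
console.log(startupProbe.initialDelaySeconds + failureWindowSeconds(startupProbe)); // 150
// Liveness: a hung process is restarted after about 30 s of failed checks
console.log(failureWindowSeconds(livenessProbe)); // 30
// Readiness: a pod is removed from Service endpoints after about 15 s
console.log(failureWindowSeconds(readinessProbe)); // 15
```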

Many teams configure these incorrectly, using the same endpoint for all probes or setting timeouts too aggressively. A liveness probe that fails during temporary database connection issues will cause unnecessary pod restarts, amplifying the problem. The readiness probe should fail fast when dependencies are unavailable, while the liveness probe should only fail when the application itself is unrecoverable.

// Separate health check implementation in application code (TypeScript example)
import express from 'express';
import { DatabaseConnection } from './database';
import { CacheClient } from './cache';

const app = express();
let isShuttingDown = false;

// Liveness: Only checks if the application process is responsive
app.get('/healthz', (req, res) => {
  if (isShuttingDown) {
    return res.status(503).json({ status: 'shutting_down' });
  }
  res.status(200).json({ status: 'alive' });
});

// Readiness: Checks if the application can serve traffic
app.get('/ready', async (req, res) => {
  if (isShuttingDown) {
    return res.status(503).json({ status: 'not_ready', reason: 'shutting_down' });
  }

  try {
    // Check critical dependencies
    await Promise.all([
      DatabaseConnection.ping({ timeout: 2000 }),
      CacheClient.ping({ timeout: 1000 })
    ]);

    res.status(200).json({ status: 'ready' });
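  } catch (error) {
    // Caught values are typed as unknown in strict TypeScript, so narrow before reading .message
    res.status(503).json({
      status: 'not_ready',
      reason: 'dependency_unavailable',
      error: error instanceof Error ? error.message : String(error)
    });
  }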
});

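// Start the HTTP server on the container port declared in the manifest
const server = app.listen(8080);

// Graceful shutdown handling
process.on('SIGTERM', () => {
  isShuttingDown = true;
  console.log('Received SIGTERM, starting graceful shutdown');

  // Stop accepting new connections; the callback runs once in-flight requests finish
  server.close(async () => {
    // Close database connections
    await DatabaseConnection.close();
    await CacheClient.disconnect();
    process.exit(0);
  });

  // Force shutdown after 25 seconds, before the kubelet sends SIGKILL at the
  // default terminationGracePeriodSeconds of 30
  setTimeout(() => {
    console.error('Forced shutdown after timeout');
    process.exit(1);
  }, 25000).unref();
});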

Deployment Strategies and Rollout Control

The rolling update strategy with maxSurge: 2 and maxUnavailable: 1 keeps capacity close to baseline during deployments. For a 5-replica deployment, Kubernetes can run up to 7 pods during a rollout and will not let the number of available pods drop below 4. Setting maxUnavailable: 0 instead keeps all 5 replicas available throughout, at the cost of a slower rollout.
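
The pod-count bounds for a rolling update can be worked out as below. This is a small sketch with illustrative names (rollingUpdateBounds, resolveIntOrPercent); it relies on the documented rounding rules, where a percentage maxSurge rounds up and a percentage maxUnavailable rounds down.

```typescript
type IntOrPercent = number | `${number}%`;

// Resolve an absolute count or a percentage of the replica count.
function resolveIntOrPercent(value: IntOrPercent, replicas: number, roundUp: boolean): number {
  if (typeof value === "number") return value;
  const scaled = (parseFloat(value) * replicas) / 100;
  return roundUp ? Math.ceil(scaled) : Math.floor(scaled);
}

function rollingUpdateBounds(replicas: number, maxSurge: IntOrPercent, maxUnavailable: IntOrPercent) {
  const surge = resolveIntOrPercent(maxSurge, replicas, true); // percentages round up
  const unavailable = resolveIntOrPercent(maxUnavailable, replicas, false); // percentages round down
  return {
    maxTotalPods: replicas + surge,
    minAvailablePods: replicas - unavailable,
  };
}

// The manifest above: 5 replicas, maxSurge 2, maxUnavailable 1
console.log(rollingUpdateBounds(5, 2, 1)); // { maxTotalPods: 7, minAvailablePods: 4 }
// The Kubernetes defaults of 25% / 25% give the same bounds at 5 replicas
console.log(rollingUpdateBounds(5, "25%", "25%")); // { maxTotalPods: 7, minAvailablePods: 4 }
```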

For more sophisticated rollout control, progressive delivery patterns using tools like Argo Rollouts or Flagger enable canary deployments with automatic rollback based on metrics:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api-service
spec:
  replicas: 5
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {duration: 5m}
      - setWeight: 40
      - pause: {duration: 5m}
      - setWeight: 60
      - pause: {duration: 5m}
      - setWeight: 80
      - pause: {duration: 5m}
      analysis:
        templates:
        - templateName: error-rate-analysis
        startingStep: 2
        args:
        - name: service-name
          value: api-service
  revisionHistoryLimit: 5
  selector:
    matchLabels:
      app: api-service
  template:
    # Same pod template as before

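This canary strategy gradually shifts traffic to the new version while monitoring error rates, with the analysis starting once the rollout reaches the setWeight: 40 step (startingStep: 2). Without a trafficRouting section, Argo Rollouts approximates each weight through the ratio of canary to stable replicas. If error rates exceed the thresholds defined in the analysis template, the rollout aborts and returns to the previous version.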

Resource Management and Autoscaling

Proper resource configuration prevents both resource starvation and waste. Resource requests determine scheduling decisions and quality-of-service class. Resource limits prevent runaway processes from affecting other workloads.
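
The quality-of-service class follows from the requests and limits alone. The sketch below models the documented rules with illustrative type and function names; as a simplification it compares quantity strings literally, so "500m" and "0.5" would be treated as different even though Kubernetes considers them equal.

```typescript
interface ResourceList { cpu?: string; memory?: string; }
interface ContainerResources { requests?: ResourceList; limits?: ResourceList; }
type QosClass = "Guaranteed" | "Burstable" | "BestEffort";

function qosClass(containers: ContainerResources[]): QosClass {
  const hasAny = (r?: ResourceList): boolean =>
    r !== undefined && (r.cpu !== undefined || r.memory !== undefined);

  // BestEffort: no container sets any CPU or memory request or limit
  if (!containers.some((c) => hasAny(c.requests) || hasAny(c.limits))) {
    return "BestEffort";
  }

  // Guaranteed: every container has CPU and memory limits, and requests equal to them
  const guaranteed = containers.every((c) => {
    const limits = c.limits;
    if (limits === undefined || limits.cpu === undefined || limits.memory === undefined) return false;
    const requests = c.requests ?? {};
    // Omitted requests default to the limits
    return (requests.cpu ?? limits.cpu) === limits.cpu
      && (requests.memory ?? limits.memory) === limits.memory;
  });

  return guaranteed ? "Guaranteed" : "Burstable";
}

// The api container from the manifest above: requests are lower than limits
console.log(qosClass([
  { requests: { cpu: "500m", memory: "512Mi" }, limits: { cpu: "1000m", memory: "1Gi" } },
])); // Burstable
```

Burstable pods are evicted before Guaranteed pods under node memory pressure, which is one reason latency-critical services sometimes set requests equal to limits.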

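In 2025, with FinOps practices becoming standard, teams use Vertical Pod Autoscaler (VPA) recommendations to right-size resource requests based on actual usage. Note that updateMode: "Auto" applies those recommendations to running pods, which can restart them, while updateMode: "Off" only publishes recommendations: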

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 100m
        memory: 256Mi
      maxAllowed:
        cpu: 2000m
        memory: 2Gi
      controlledResources: ["cpu", "memory"]

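Horizontal Pod Autoscaler (HPA) handles traffic-based scaling. The VPA documentation advises against combining VPA with an HPA that scales on CPU or memory for the same workload, so if both are deployed, restrict one of them (for example, run VPA with updateMode: "Off"):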

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 5
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 5
        periodSeconds: 30
      selectPolicy: Max

The scaling behavior configuration prevents thrashing by limiting scale-down rate and enabling aggressive scale-up during traffic spikes.
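
Before the behavior policies are applied, HPA computes a desired replica count from the ratio of the current metric value to the target. The sketch below follows the formula in the Kubernetes documentation; the function name is illustrative, and the 0.1 tolerance is the documented default, which cluster administrators can change.

```typescript
// desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric),
// skipped when the ratio is within the tolerance, then clamped to min/max.
function desiredReplicas(
  currentReplicas: number,
  currentMetric: number,
  targetMetric: number,
  minReplicas: number,
  maxReplicas: number,
  tolerance = 0.1,
): number {
  const ratio = currentMetric / targetMetric;
  if (Math.abs(ratio - 1) <= tolerance) return currentReplicas;
  const desired = Math.ceil(currentReplicas * ratio);
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}

// CPU at 90% against the 70% target, 5 replicas running
console.log(desiredReplicas(5, 90, 70, 5, 50)); // 7
// CPU at 75% is within the 10% tolerance, so nothing changes
console.log(desiredReplicas(5, 75, 70, 5, 50)); // 5
// A large spike is capped at maxReplicas
console.log(desiredReplicas(40, 140, 70, 5, 50)); // 50
```

When several metrics are configured, as in the manifest above, HPA evaluates each one and uses the largest resulting replica count.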

Security Hardening and Compliance

Modern Kubernetes production deployments must address supply chain security, runtime security, and network isolation. The security context in the deployment manifest enforces baseline protections, but comprehensive security requires additional layers.

Pod Security Standards enforcement at the namespace level prevents deployment of non-compliant workloads:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Network policies restrict traffic to only necessary communication paths:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-service-netpol
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-service
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          # kubernetes.io/metadata.name is set automatically on every namespace
          kubernetes.io/metadata.name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: production
      podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: production
      podSelector:
        matchLabels:
          app: redis
    ports:
    - protocol: TCP
      port: 6379
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: TCP
      port: 53
    - protocol: UDP
      port: 53

Observability and Debugging

Production deployments require comprehensive observability. Beyond basic logging, structured events and metrics enable rapid troubleshooting:

import type { Request, Response, NextFunction } from 'express';
import { Counter, Gauge, Histogram } from 'prom-client';

// Metrics for monitoring deployment health
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5]
});

const httpRequestTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
});

// Info-style metric: a gauge fixed at 1 whose labels carry the build metadata
const deploymentInfo = new Gauge({
  name: 'deployment_info',
  help: 'Deployment metadata',
  labelNames: ['version', 'commit_sha', 'build_timestamp']
});

// Initialize deployment metadata
deploymentInfo.set({
  version: process.env.APP_VERSION || 'unknown',
  commit_sha: process.env.GIT_COMMIT || 'unknown',
  build_timestamp: process.env.BUILD_TIMESTAMP || 'unknown'
}, 1);

// Middleware for request tracking
export function metricsMiddleware(req: Request, res: Response, next: NextFunction) {
  const start = Date.now();

  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    const labels = {
      method: req.method,
      route: req.route?.path || 'unknown',
      status_code: res.statusCode
    };

    httpRequestDuration.observe(labels, duration);
    httpRequestTotal.inc(labels);
  });

  next();
}

Common Pitfalls and Failure Modes

Insufficient Resource Limits: Deployments without memory limits can trigger OOM kills on nodes, affecting other workloads. Always set limits based on observed peak usage plus headroom.

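Missing PodDisruptionBudgets: Without PDBs, voluntary disruptions such as node drains during cluster maintenance or upgrades can evict all replicas at once. PDBs do not prevent involuntary disruptions like node failures, although pods lost that way still count against the budget. Define PDBs to ensure minimum availability: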

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: api-service
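
As a quick worked example of what this budget permits, the sketch below computes how many pods may be evicted at once for a minAvailable budget. The function name is illustrative; the arithmetic is simply healthy pods minus minAvailable, floored at zero.

```typescript
// Number of pods that may be voluntarily evicted right now under a minAvailable budget.
function allowedDisruptions(currentHealthy: number, minAvailable: number): number {
  return Math.max(0, currentHealthy - minAvailable);
}

// 5 healthy replicas, minAvailable: 3 -> a node drain may evict 2 pods at a time
console.log(allowedDisruptions(5, 3)); // 2
// Mid-rollout with only 4 healthy pods, a drain may evict just 1
console.log(allowedDisruptions(4, 3)); // 1
// At or below the floor, evictions are refused until pods recover
console.log(allowedDisruptions(3, 3)); // 0
```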

Incorrect Probe Configuration: Probes that check external dependencies in liveness checks cause cascading failures. Liveness should only verify the application process health.

Ignoring Graceful Shutdown: Applications that don't handle SIGTERM properly cause connection errors during deployments. Implement proper shutdown handlers that stop accepting new requests, drain existing connections, and close resources.

Single Availability Zone Deployments: Without topology spread constraints, all pods might land in one zone. Use topologySpreadConstraints to enforce multi-zone distribution.

Mutable Image Tags: Using latest or mutable tags prevents reliable rollbacks and creates security risks. Always use immutable digests in production.

Excessive Privilege: Running containers as root or with unnecessary capabilities expands the attack surface. Drop all capabilities and use read-only root filesystems where possible.
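
For the single-zone pitfall above, the effect of maxSkew: 1 can be illustrated with a simplified model of the check the scheduler performs for one topology key. The function name is hypothetical and the model ignores node affinity, taints, and other constraints; it assumes the input lists every eligible zone, including zones that currently hold no matching pods.

```typescript
// Simplified model of the maxSkew check for a single topology key.
// podsPerZone must list every eligible zone, including zones with 0 pods.
function wouldViolateMaxSkew(
  podsPerZone: Record<string, number>,
  targetZone: string,
  maxSkew: number,
): boolean {
  const globalMinimum = Math.min(...Object.values(podsPerZone));
  const countAfterPlacement = (podsPerZone[targetZone] ?? 0) + 1;
  return countAfterPlacement - globalMinimum > maxSkew;
}

// Zone names are placeholders
const current = { "zone-a": 2, "zone-b": 2, "zone-c": 1 };

// A third pod in zone-a would leave it 2 ahead of zone-c
console.log(wouldViolateMaxSkew(current, "zone-a", 1)); // true
// Placing the pod in zone-c evens out the distribution
console.log(wouldViolateMaxSkew(current, "zone-c", 1)); // false
```

With whenUnsatisfiable: DoNotSchedule, as in the manifest, a pod that would violate the constraint in every zone stays Pending rather than being placed unevenly.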

Best Practices Checklist

Use immutable image references with SHA256 digests, not mutable tags
Implement all three probe types with appropriate timeouts and failure thresholds
Set resource requests and limits based on actual usage patterns, not guesses
Configure PodDisruptionBudgets to maintain minimum availability during disruptions
Enforce security contexts with non-root users, dropped capabilities, and read-only filesystems
Implement graceful shutdown handlers that respect SIGTERM and drain connections
Use topology spread constraints to distribute pods across availability zones
Apply network policies to restrict traffic to necessary paths only
Enable structured logging with correlation IDs for request tracing
Expose Prometheus metrics for monitoring deployment health and performance
Configure HPA and VPA for automatic scaling based on load and resource optimization
Maintain revision history with meaningful change-cause annotations
Test rollback procedures regularly to ensure they work under pressure
Implement progressive delivery for high-risk deployments with automatic rollback
Use separate service accounts with minimal RBAC permissions per deployment

FAQ

What is the difference between a Kubernetes Deployment and a Pod?

A Pod is the smallest deployable unit in Kubernetes, representing one or more containers that share resources. A Deployment is a higher-level controller that manages Pods, providing declarative updates, scaling, and self-healing capabilities. Deployments handle rolling updates, rollbacks, and ensure the desired number of Pod replicas are running. You should almost never create Pods directly in production—always use Deployments or other controllers.

How does a Kubernetes rolling update work in 2025?
