Kubernetes Deployment: Complete Guide
Welcome to TopperBlog! 👋
Deploying applications to Kubernetes has become the de facto standard for running containerized workloads at scale, yet most teams struggle with the gap between basic tutorials and production-ready configurations. A misconfigured Kubernetes deployment can lead to cascading failures during traffic spikes, silent data loss during rollouts, security vulnerabilities exposing sensitive workloads, and cost overruns from inefficient resource allocation. In 2025, with organizations running increasingly complex microservices architectures, AI inference workloads, and real-time data pipelines on Kubernetes, understanding production-grade deployment patterns is no longer optional.
The challenge isn't just getting containers running—it's ensuring they stay running reliably under production conditions. Traditional deployment approaches that worked for monolithic applications or simple containerized services fail when faced with modern requirements: zero-downtime deployments across multiple availability zones, automatic rollback on subtle performance degradations, fine-grained resource management for cost optimization, and compliance with evolving security standards like SLSA Level 3 and supply chain attestation requirements.
This Kubernetes deployment tutorial walks through building production-ready deployments from first principles, explaining the architectural decisions that separate hobby projects from systems that handle millions of requests daily.
Why Basic Kubernetes Deployments Fail in Production
Most developers start with a minimal Deployment manifest that runs containers but lacks the resilience mechanisms required for production traffic. These basic configurations typically omit health checks, resource limits, pod disruption budgets, and proper security contexts—all critical for preventing outages.
The consequences manifest in predictable patterns: pods crash-looping during memory pressure because no limits were set, rolling updates causing brief outages because readiness probes weren't configured, and security incidents because containers ran as root with excessive privileges. In 2025, with Kubernetes clusters often running hundreds of microservices and supporting real-time AI inference endpoints, these gaps compound into systemic reliability issues.
Modern Kubernetes environments also face challenges that didn't exist in earlier iterations. Multi-tenancy requirements demand strict resource isolation and network policies. Compliance frameworks require immutable container images with verified signatures. Cost optimization requires precise resource requests that align with actual usage patterns, not guesswork. The deployment configuration itself must encode these requirements.
Anatomy of a Production-Grade Kubernetes Deployment
A production Kubernetes deployment configuration addresses reliability, security, and operational concerns through multiple layers of specification. Let's build one component by component, starting with the fundamental structure and progressively adding production requirements.
Core Deployment Structure
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  namespace: production
  labels:
    app: api-service
    version: v2.4.1
    team: platform
  annotations:
    # deployment.kubernetes.io/revision is maintained by the Deployment controller; don't set it by hand
    kubernetes.io/change-cause: "Update to v2.4.1 with performance optimizations"
spec:
  replicas: 5
  revisionHistoryLimit: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 1
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
        version: v2.4.1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
    spec:
      serviceAccountName: api-service-sa
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: api
          image: registry.company.com/api-service:v2.4.1@sha256:abc123...
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
            - name: metrics
              containerPort: 9090
              protocol: TCP
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: database-url
            - name: LOG_LEVEL
              value: "info"
            - name: MAX_CONNECTIONS
              value: "100"
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          # Liveness and readiness checks begin only after the startup probe succeeds
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
              httpHeaders:
                - name: X-Health-Check
                  value: liveness
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            successThreshold: 1
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 30
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: cache
              mountPath: /app/cache
      volumes:
        - name: tmp
          emptyDir: {}
        - name: cache
          emptyDir:
            sizeLimit: 1Gi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - api-service
                topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-service
This configuration embeds multiple production requirements. The image reference pins a digest in addition to the tag, so a re-pushed or hijacked tag cannot silently change what runs. The security context enforces non-root execution and drops all Linux capabilities, reducing the attack surface. Resource requests and limits prevent resource contention and give the scheduler accurate numbers for bin-packing.
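A CI check can enforce the digest rule. The sketch below parses an image reference and reports whether it is pinned to a full sha256 digest. It is a simplified parser for illustration with hypothetical function names, not the full OCI reference grammar. The digest in the manifest above is elided, so it would not pass this check as printed.

```typescript
// Hypothetical helpers: a simplified image-reference parser, not the full
// OCI/Docker reference grammar (no hostname or path-component validation).
interface ImageRef {
  name: string;
  tag?: string;
  digest?: string;
}

function parseImageRef(ref: string): ImageRef {
  let rest = ref;
  let digest: string | undefined;

  // Everything after '@' is the content digest
  const at = rest.indexOf('@');
  if (at !== -1) {
    digest = rest.slice(at + 1);
    rest = rest.slice(0, at);
  }

  // A ':' after the last '/' introduces a tag; an earlier ':' is a registry port
  let tag: string | undefined;
  const colon = rest.lastIndexOf(':');
  if (colon > rest.lastIndexOf('/')) {
    tag = rest.slice(colon + 1);
    rest = rest.slice(0, colon);
  }

  return { name: rest, tag, digest };
}

const SHA256_DIGEST = /^sha256:[a-f0-9]{64}$/;

// True only when the reference carries a complete sha256 digest
function isDigestPinned(ref: string): boolean {
  const { digest } = parseImageRef(ref);
  return digest !== undefined && SHA256_DIGEST.test(digest);
}
```

Running `isDigestPinned` over every `image:` field in rendered manifests flags tag-only references such as `api-service:v2.4.1` before they reach the cluster.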
Health Checks and Lifecycle Management
The three-probe pattern—startup, liveness, and readiness—addresses different failure modes. Startup probes handle slow-starting applications without triggering premature restarts. Liveness probes detect deadlocked processes that need restarting. Readiness probes prevent traffic routing to pods that aren't ready to serve requests.
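The probe values in the manifest translate into concrete time windows. The sketch below does that arithmetic with a simplified model and hypothetical helper names, ignoring probe execution time and jitter:

- The startup budget is initialDelaySeconds plus periodSeconds times failureThreshold.
- A failing check is acted on after roughly periodSeconds times failureThreshold.

```typescript
// Simplified probe-timing model: ignores probe execution time and jitter.
interface ProbeTiming {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold: number;
}

// Longest time a container can take to start before the startup probe fails it
function startupBudgetSeconds(p: ProbeTiming): number {
  return p.initialDelaySeconds + p.periodSeconds * p.failureThreshold;
}

// Approximate time between a failure starting and the kubelet acting on it
// (restart for liveness, removal from Service endpoints for readiness)
function detectionSeconds(p: ProbeTiming): number {
  return p.periodSeconds * p.failureThreshold;
}

// Values taken from the Deployment manifest
const startup: ProbeTiming = { initialDelaySeconds: 0, periodSeconds: 5, failureThreshold: 30 };
const liveness: ProbeTiming = { initialDelaySeconds: 30, periodSeconds: 10, failureThreshold: 3 };
const readiness: ProbeTiming = { initialDelaySeconds: 10, periodSeconds: 5, failureThreshold: 3 };

console.log(startupBudgetSeconds(startup)); // 150
console.log(detectionSeconds(liveness)); // 30
console.log(detectionSeconds(readiness)); // 15
```

Liveness and readiness probes only begin once the startup probe has succeeded. That gives the application up to 150 seconds to boot without being restarted.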
Many teams configure these incorrectly, using the same endpoint for all probes or setting timeouts too aggressively. A liveness probe that fails during temporary database connection issues will cause unnecessary pod restarts, amplifying the problem. The readiness probe should fail fast when dependencies are unavailable, while the liveness probe should only fail when the application itself is unrecoverable.
// Separate health check implementation in application code (TypeScript example)
import express from 'express';
import { DatabaseConnection } from './database';
import { CacheClient } from './cache';

const app = express();
let isShuttingDown = false;

// Liveness: Only checks if the application process is responsive
app.get('/healthz', (req, res) => {
  if (isShuttingDown) {
    res.status(503).json({ status: 'shutting_down' });
    return;
  }
  res.status(200).json({ status: 'alive' });
});

// Readiness: Checks if the application can serve traffic
app.get('/ready', async (req, res) => {
  if (isShuttingDown) {
    res.status(503).json({ status: 'not_ready', reason: 'shutting_down' });
    return;
  }
  try {
    // Check critical dependencies
    await Promise.all([
      DatabaseConnection.ping({ timeout: 2000 }),
      CacheClient.ping({ timeout: 1000 })
    ]);
    res.status(200).json({ status: 'ready' });
  } catch (error) {
    res.status(503).json({
      status: 'not_ready',
      reason: 'dependency_unavailable',
      // catch variables are `unknown` under strict TypeScript
      error: error instanceof Error ? error.message : String(error)
    });
  }
});

// Keep a handle on the HTTP server so it can be closed during shutdown
const server = app.listen(8080);

// Graceful shutdown handling
process.on('SIGTERM', () => {
  isShuttingDown = true;
  console.log('Received SIGTERM, starting graceful shutdown');
  // Stop accepting new connections; the callback runs once open connections finish
  server.close(async () => {
    // Close database and cache connections
    await DatabaseConnection.close();
    await CacheClient.disconnect();
    process.exit(0);
  });
  // Force shutdown before the kubelet sends SIGKILL
  // (terminationGracePeriodSeconds defaults to 30 seconds)
  setTimeout(() => {
    console.error('Forced shutdown after timeout');
    process.exit(1);
  }, 25000);
});
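The readiness handler assumes each client's ping accepts its own timeout option. If a client offers no such option, a generic wrapper can race the check against a timer. That keeps a hung dependency from holding the endpoint open past the probe's 3-second timeoutSeconds. A minimal sketch with hypothetical helper names:

```typescript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Clear the timer whichever promise settles first so it cannot linger
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical readiness helper: true only if every check resolves in time
async function dependenciesReady(
  checks: Record<string, () => Promise<unknown>>,
  ms: number
): Promise<boolean> {
  try {
    await Promise.all(
      Object.entries(checks).map(([name, check]) => withTimeout(check(), ms, name))
    );
    return true;
  } catch {
    // Any rejection or timeout means "not ready"
    return false;
  }
}
```

Inside `/ready`, consider a call such as `dependenciesReady({ db: () => pingDb(), cache: () => pingCache() }, 2000)`, where `pingDb` and `pingCache` stand in for real client calls. It would return false as soon as either check rejects or takes longer than 2 seconds.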
Deployment Strategies and Rollout Control
The rolling update strategy with maxSurge: 2 and maxUnavailable: 1 bounds how much capacity a deployment can lose. For a 5-replica deployment, Kubernetes can run up to 7 pods during the rollout but will never let available pods drop below 4. At most one pod's worth of capacity (20%) is offline at any point in the rollout.
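maxSurge and maxUnavailable also accept percentages, which the Deployment controller resolves against the replica count. maxSurge rounds up and maxUnavailable rounds down. The sketch below (hypothetical function names) reproduces that arithmetic. It also includes the controller's fallback to maxUnavailable = 1 when both values resolve to zero.

```typescript
type IntOrPercent = number | `${number}%`;

// Convert an int-or-percent field into an absolute pod count
function resolveCount(value: IntOrPercent, replicas: number, roundUp: boolean): number {
  if (typeof value === 'number') return value;
  const scaled = (parseFloat(value) * replicas) / 100;
  return roundUp ? Math.ceil(scaled) : Math.floor(scaled);
}

interface RolloutBounds {
  maxTotalPods: number; // upper bound on pods during the rollout
  minAvailablePods: number; // lower bound on available pods
}

function rolloutBounds(replicas: number, maxSurge: IntOrPercent, maxUnavailable: IntOrPercent): RolloutBounds {
  const surge = resolveCount(maxSurge, replicas, true); // maxSurge rounds up
  let unavailable = resolveCount(maxUnavailable, replicas, false); // maxUnavailable rounds down
  // If both resolve to zero, the controller falls back to maxUnavailable = 1
  if (surge === 0 && unavailable === 0) unavailable = 1;
  return { maxTotalPods: replicas + surge, minAvailablePods: replicas - unavailable };
}

console.log(rolloutBounds(5, 2, 1)); // { maxTotalPods: 7, minAvailablePods: 4 }
console.log(rolloutBounds(10, '25%', '25%')); // { maxTotalPods: 13, minAvailablePods: 8 }
```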
For more sophisticated rollout control, progressive delivery patterns using tools like Argo Rollouts or Flagger enable canary deployments with automatic rollback based on metrics:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api-service
spec:
  replicas: 5
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: {duration: 5m}
        - setWeight: 40
        - pause: {duration: 5m}
        - setWeight: 60
        - pause: {duration: 5m}
        - setWeight: 80
        - pause: {duration: 5m}
      analysis:
        templates:
          - templateName: error-rate-analysis
        startingStep: 2
        args:
          - name: service-name
            value: api-service
  revisionHistoryLimit: 5
  selector:
    matchLabels:
      app: api-service
  template:
    # Same pod template as before
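This canary strategy shifts traffic to the new version in 20% increments, with a five-minute pause after each step. Background analysis with the error-rate-analysis template (not shown) begins at step index 2, the 40% weight. If the error rate exceeds the thresholds defined in that template, Argo Rollouts aborts the rollout and returns all traffic to the stable version. Without a traffic-routing integration such as an ingress controller or service mesh, weights are approximated by the ratio of canary to stable pods. With 5 replicas, 20% corresponds to a single canary pod.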
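As a rough mental model of what the analysis does, consider the sketch below. It is not Argo Rollouts' code, and the 5% threshold and failure limit are hypothetical values. An analysis run takes periodic measurements and counts those that breach the success condition. It fails once that count exceeds the template's failureLimit, which is what triggers the abort.

```typescript
// Simplified model of metric analysis for a canary; not Argo Rollouts' implementation.
type Verdict = 'continue' | 'abort';

// Error rate from request counters, treating "no traffic" as 0 rather than NaN
function errorRate(errors: number, total: number): number {
  return total === 0 ? 0 : errors / total;
}

// Abort once more than `failureLimit` measurements breach `maxErrorRate`
function evaluateCanary(rates: number[], maxErrorRate: number, failureLimit: number): Verdict {
  let failures = 0;
  for (const rate of rates) {
    if (rate > maxErrorRate) failures++;
    if (failures > failureLimit) return 'abort';
  }
  return 'continue';
}

// Hypothetical measurements, one per analysis interval
const healthy = [errorRate(3, 1000), errorRate(8, 1200), errorRate(0, 0)];
const degraded = [errorRate(90, 1000), errorRate(120, 1100)];

console.log(evaluateCanary(healthy, 0.05, 1)); // continue
console.log(evaluateCanary(degraded, 0.05, 1)); // abort
```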
Resource Management and Autoscaling
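Proper resource configuration prevents both resource starvation and waste. Resource requests determine scheduling decisions and the pod's quality-of-service (QoS) class. Resource limits cap what a container can consume. A container that exceeds its memory limit is OOM-killed, while one that hits its CPU limit is throttled, so a runaway process cannot starve other workloads on the node.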
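The QoS class follows mechanically from requests and limits:

- **Guaranteed:** every container has CPU and memory limits equal to its requests.
- **BestEffort:** no requests or limits are set at all.
- **Burstable:** anything else.

The sketch below is a simplified model with hypothetical names. It expresses CPU in millicores and memory in bytes to avoid quantity parsing, and it ignores init containers.

```typescript
// Simplified QoS classification; real Kubernetes also considers init containers.
type Amounts = { cpu?: number; memory?: number }; // cpu in millicores, memory in bytes
interface ContainerResources {
  requests?: Amounts;
  limits?: Amounts;
}
type QoSClass = 'Guaranteed' | 'Burstable' | 'BestEffort';

const RESOURCES = ['cpu', 'memory'] as const;

function qosClass(containers: ContainerResources[]): QoSClass {
  const anythingSet = containers.some((c) =>
    RESOURCES.some((r) => c.requests?.[r] !== undefined || c.limits?.[r] !== undefined)
  );
  if (!anythingSet) return 'BestEffort';

  const guaranteed = containers.every((c) =>
    RESOURCES.every((r) => {
      const limit = c.limits?.[r];
      // When only a limit is given, Kubernetes defaults the request to that limit
      const request = c.requests?.[r] ?? limit;
      return limit !== undefined && request === limit;
    })
  );
  return guaranteed ? 'Guaranteed' : 'Burstable';
}

const Mi = 1024 ** 2;
// The api container from the Deployment: requests below limits
console.log(qosClass([{ requests: { cpu: 500, memory: 512 * Mi }, limits: { cpu: 1000, memory: 1024 * Mi } }])); // Burstable
```

The manifest's api container is Burstable. Setting requests equal to limits would make it Guaranteed, at the cost of reserving the full 1Gi and one CPU on the node.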
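In 2025, with FinOps practices becoming standard, teams use Vertical Pod Autoscaler (VPA) recommendations to right-size resource requests based on actual usage. VPA should not act on the same CPU or memory metrics that an HPA scales on. Pairing it with the HPA below therefore means running VPA in recommendation-only mode (updateMode: "Off"):
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    # "Off" only publishes recommendations; "Auto" would apply them to running pods
    # and conflict with an HPA that scales on CPU
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 256Mi
        maxAllowed:
          cpu: 2000m
          memory: 2Gi
        controlledResources: ["cpu", "memory"]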
Horizontal Pod Autoscaler (HPA) handles traffic-based scaling:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 5
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      # Per-pod custom metric; requires a custom metrics API adapter (e.g. Prometheus Adapter)
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
        - type: Pods
          value: 5
          periodSeconds: 30
      selectPolicy: Max
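The behavior section prevents thrashing. Scale-down waits out a 300-second stabilization window and removes at most 50% of the pods per minute. Scale-up acts immediately; because selectPolicy is Max, it can add whichever is larger every 30 seconds: 100% of the current pods or 5 pods.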
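The replica count the HPA converges on can be estimated by hand. The core formula from the Kubernetes documentation is desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue). No change is made when the ratio is within the default 0.1 tolerance. With several metrics the largest proposal wins, and the behavior policies then cap the step size.

The sketch below is a simplified model with hypothetical names:

- It hard-codes this manifest's policies.
- It ignores the stabilization window and per-period history.
- It approximates the controller's rounding.

```typescript
// Simplified model of the HPA calculation for the manifest above.
const TOLERANCE = 0.1; // default HPA tolerance

interface MetricSample {
  current: number; // e.g. average CPU utilization (%) or requests/second per pod
  target: number; // averageUtilization / averageValue from the HPA spec
}

function proposalFor(currentReplicas: number, m: MetricSample): number {
  const ratio = m.current / m.target;
  if (Math.abs(ratio - 1) <= TOLERANCE) return currentReplicas; // close enough: no change
  return Math.ceil(currentReplicas * ratio);
}

// scaleUp: Percent 100 and Pods 5 per 30s with selectPolicy Max, so the larger step wins
function scaleUpCeiling(current: number): number {
  return Math.max(current * 2, current + 5);
}

// scaleDown: at most 50% of pods removed per 60s
function scaleDownFloor(current: number): number {
  return Math.floor(current * 0.5);
}

function nextReplicas(current: number, metrics: MetricSample[], min = 5, max = 50): number {
  // With multiple metrics, the HPA uses the largest proposal
  const desired = Math.max(...metrics.map((m) => proposalFor(current, m)));
  let bounded = desired;
  if (desired > current) bounded = Math.min(desired, scaleUpCeiling(current));
  if (desired < current) bounded = Math.max(desired, scaleDownFloor(current));
  return Math.min(max, Math.max(min, bounded));
}

// 10 pods at 140% CPU against a 70% target, and 1500 rps against 1000 rps per pod
console.log(nextReplicas(10, [{ current: 140, target: 70 }, { current: 1500, target: 1000 }])); // 20
```

CPU utilization is measured against requests, so with a 500m request and a 1000m limit a pod can report more than 100%.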
Security Hardening and Compliance
Modern Kubernetes production deployments must address supply chain security, runtime security, and network isolation. The security context in the deployment manifest enforces baseline protections, but comprehensive security requires additional layers.
Pod Security Standards enforcement at the namespace level prevents deployment of non-compliant workloads:
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
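Under the restricted profile, the API server rejects pods that miss specific securityContext settings. The sketch below checks a subset of those rules so problems surface before a manifest reaches the cluster. It uses hypothetical names and omits the volume-type and host-namespace checks the real admission controller performs.

```typescript
// Checks a subset of the "restricted" Pod Security Standard; not exhaustive.
interface SecurityContext {
  runAsNonRoot?: boolean;
  runAsUser?: number;
  allowPrivilegeEscalation?: boolean;
  seccompProfile?: { type: string };
  capabilities?: { drop?: string[]; add?: string[] };
}

interface PodSpec {
  securityContext?: SecurityContext;
  containers: { name: string; securityContext?: SecurityContext }[];
}

function restrictedViolations(pod: PodSpec): string[] {
  const violations: string[] = [];
  const podCtx = pod.securityContext ?? {};

  for (const c of pod.containers) {
    const ctx = c.securityContext ?? {};
    const fail = (msg: string) => violations.push(`${c.name}: ${msg}`);

    if (ctx.allowPrivilegeEscalation !== false) fail('allowPrivilegeEscalation must be false');

    // Container-level settings override pod-level ones
    if ((ctx.runAsNonRoot ?? podCtx.runAsNonRoot) !== true) fail('runAsNonRoot must be true');
    if ((ctx.runAsUser ?? podCtx.runAsUser) === 0) fail('runAsUser must not be 0');

    const seccomp = (ctx.seccompProfile ?? podCtx.seccompProfile)?.type;
    if (seccomp !== 'RuntimeDefault' && seccomp !== 'Localhost') {
      fail('seccompProfile must be RuntimeDefault or Localhost');
    }

    if (!(ctx.capabilities?.drop ?? []).includes('ALL')) fail('capabilities must drop ALL');
    const added = (ctx.capabilities?.add ?? []).filter((cap) => cap !== 'NET_BIND_SERVICE');
    if (added.length > 0) fail(`capabilities may only add NET_BIND_SERVICE, got ${added.join(', ')}`);
  }
  return violations;
}

// Pod spec from the Deployment manifest
const apiPod: PodSpec = {
  securityContext: { runAsNonRoot: true, runAsUser: 10001, seccompProfile: { type: 'RuntimeDefault' } },
  containers: [{ name: 'api', securityContext: { allowPrivilegeEscalation: false, capabilities: { drop: ['ALL'] } } }],
};
console.log(restrictedViolations(apiPod)); // []
```

readOnlyRootFilesystem is set in the manifest, but the restricted profile does not require it.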
Network policies restrict traffic to only necessary communication paths:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-service-netpol
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-service
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        # kubernetes.io/metadata.name is set automatically on every namespace since Kubernetes 1.21
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
    # As written, this policy also blocks Prometheus from scraping port 9090;
    # add an ingress rule for your monitoring namespace if metrics are scraped in-cluster
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: production
          podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: production
          podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 53
        - protocol: UDP
          port: 53
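NetworkPolicy semantics are easy to misread: a namespaceSelector and podSelector in the same `to` entry are ANDed, while separate entries are ORed. A small evaluator helps reason about which connections the policy permits. The sketch below is a simplified model with hypothetical names that handles matchLabels selectors and numeric ports only. It ignores ipBlock, matchExpressions, named ports, and endPort. Namespace labels use kubernetes.io/metadata.name, which Kubernetes sets automatically on every namespace.

```typescript
// Simplified NetworkPolicy egress model: matchLabels and numeric ports only.
type Labels = Record<string, string>;
type Protocol = 'TCP' | 'UDP';

interface Peer {
  namespaceSelector?: Labels;
  podSelector?: Labels;
}
interface EgressRule {
  to: Peer[]; // empty list = any destination
  ports: { protocol: Protocol; port: number }[]; // empty list = any port
}
interface Destination {
  namespaceLabels: Labels;
  podLabels: Labels;
  sameNamespace: boolean; // is the destination in the policy's own namespace?
  protocol: Protocol;
  port: number;
}

// matchLabels: every key/value must be present; {} matches everything
const matches = (selector: Labels, labels: Labels) =>
  Object.entries(selector).every(([k, v]) => labels[k] === v);

function peerMatches(peer: Peer, d: Destination): boolean {
  // A podSelector without a namespaceSelector only selects pods in the policy's namespace
  if (!peer.namespaceSelector && !d.sameNamespace) return false;
  // Selectors inside a single peer entry are ANDed
  return (
    (!peer.namespaceSelector || matches(peer.namespaceSelector, d.namespaceLabels)) &&
    (!peer.podSelector || matches(peer.podSelector, d.podLabels))
  );
}

// With Egress in policyTypes, traffic not matched by any rule is denied
function egressAllowed(rules: EgressRule[], d: Destination): boolean {
  return rules.some(
    (r) =>
      (r.to.length === 0 || r.to.some((p) => peerMatches(p, d))) &&
      (r.ports.length === 0 || r.ports.some((p) => p.protocol === d.protocol && p.port === d.port))
  );
}

const ns = (name: string): Labels => ({ 'kubernetes.io/metadata.name': name });

// Egress rules from api-service-netpol
const rules: EgressRule[] = [
  { to: [{ namespaceSelector: ns('production'), podSelector: { app: 'postgres' } }], ports: [{ protocol: 'TCP', port: 5432 }] },
  { to: [{ namespaceSelector: ns('production'), podSelector: { app: 'redis' } }], ports: [{ protocol: 'TCP', port: 6379 }] },
  { to: [{ namespaceSelector: {} }], ports: [{ protocol: 'TCP', port: 53 }, { protocol: 'UDP', port: 53 }] },
];

const postgres = { namespaceLabels: ns('production'), podLabels: { app: 'postgres' }, sameNamespace: true };
console.log(egressAllowed(rules, { ...postgres, protocol: 'TCP', port: 5432 })); // true
console.log(egressAllowed(rules, { ...postgres, protocol: 'TCP', port: 6379 })); // false
```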
Observability and Debugging
Production deployments require comprehensive observability. Beyond basic logging, structured events and metrics enable rapid troubleshooting:
import express, { Request, Response, NextFunction } from 'express';
import { Counter, Gauge, Histogram, register } from 'prom-client';

// Metrics for monitoring deployment health
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5]
});

const httpRequestTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
});

// "Info"-style metric: a gauge fixed at 1, with the metadata carried in labels
const deploymentInfo = new Gauge({
  name: 'deployment_info',
  help: 'Deployment metadata',
  labelNames: ['version', 'commit_sha', 'build_timestamp']
});

// Initialize deployment metadata
deploymentInfo.set({
  version: process.env.APP_VERSION || 'unknown',
  commit_sha: process.env.GIT_COMMIT || 'unknown',
  build_timestamp: process.env.BUILD_TIMESTAMP || 'unknown'
}, 1);

// Middleware for request tracking
export function metricsMiddleware(req: Request, res: Response, next: NextFunction) {
  const start = Date.now();
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    const labels = {
      method: req.method,
      // Route pattern rather than raw URL keeps label cardinality bounded
      route: req.route?.path || 'unknown',
      status_code: res.statusCode
    };
    httpRequestDuration.observe(labels, duration);
    httpRequestTotal.inc(labels);
  });
  next();
}

// Serve metrics on the port annotated for Prometheus scraping (9090)
const metricsApp = express();
metricsApp.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
metricsApp.listen(9090);
Common Pitfalls and Failure Modes
Insufficient Resource Limits: Deployments without memory limits can trigger OOM kills on nodes, affecting other workloads. Always set limits based on observed peak usage plus headroom.
Missing PodDisruptionBudgets: Without PDBs, voluntary disruptions such as node drains during cluster upgrades can evict all replicas at once. PDBs only limit voluntary evictions and cannot protect against node failures. Define PDBs to ensure minimum availability:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: api-service
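The budget translates into a number of evictions the API server will allow at any moment: the currently healthy pods minus minAvailable, floored at zero. The sketch below models that arithmetic under a hypothetical function name. As in Kubernetes, percentages for minAvailable are rounded up.

```typescript
type IntOrPercent = number | `${number}%`;

// Simplified model of PodDisruptionBudget status for a minAvailable budget
function disruptionsAllowed(currentHealthy: number, expectedPods: number, minAvailable: IntOrPercent): number {
  const required =
    typeof minAvailable === 'number'
      ? minAvailable
      : Math.ceil((parseFloat(minAvailable) * expectedPods) / 100); // percentages round up
  return Math.max(0, currentHealthy - required);
}

console.log(disruptionsAllowed(5, 5, 3)); // 2: a drain may evict two pods at once
console.log(disruptionsAllowed(4, 5, 3)); // 1: e.g. while a rollout is replacing one pod
```

The API server refuses evictions that would push healthy pods below the budget. `kubectl drain` then waits and retries instead of taking the service below three pods.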
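Incorrect Probe Configuration: Liveness probes that check external dependencies cause cascading failures. When a shared dependency such as the database slows down, every pod fails its liveness check and restarts at once. Liveness should only verify the health of the application process itself.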
Ignoring Graceful Shutdown: Applications that don't handle SIGTERM properly cause connection errors during deployments. Implement proper shutdown handlers that stop accepting new requests, drain existing connections, and close resources.
Single Availability Zone Deployments: Without topology spread constraints, all pods might land in one zone. Use topologySpreadConstraints to enforce multi-zone distribution.
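With whenUnsatisfiable: DoNotSchedule, the scheduler only places a pod in a zone if doing so keeps the skew within maxSkew. The skew is the matching pods in that zone plus the new pod, minus the minimum count across zones. The sketch below is a simplified model under a hypothetical name and hypothetical zone names. It assumes every zone is eligible and that the counts already reflect the labelSelector.

```typescript
// Zones where one more matching pod keeps skew <= maxSkew (simplified model)
function allowedZones(podsPerZone: Record<string, number>, maxSkew: number): string[] {
  const globalMin = Math.min(...Object.values(podsPerZone));
  return Object.entries(podsPerZone)
    .filter(([, count]) => count + 1 - globalMin <= maxSkew)
    .map(([zone]) => zone);
}

// Five replicas spread 2/2/1: the next pod must go to the zone with one pod
console.log(allowedZones({ 'zone-a': 2, 'zone-b': 2, 'zone-c': 1 }, 1)); // [ 'zone-c' ]
```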
Mutable Image Tags: Using latest or mutable tags prevents reliable rollbacks and creates security risks. Always use immutable digests in production.
Excessive Privilege: Running containers as root or with unnecessary capabilities expands the attack surface. Drop all capabilities and use read-only root filesystems where possible.
Best Practices Checklist
- Use immutable image references with SHA256 digests, not mutable tags
- Implement all three probe types with appropriate timeouts and failure thresholds
- Set resource requests and limits based on actual usage patterns, not guesses
- Configure PodDisruptionBudgets to maintain minimum availability during disruptions
- Enforce security contexts with non-root users, dropped capabilities, and read-only filesystems
- Implement graceful shutdown handlers that respect SIGTERM and drain connections
- Use topology spread constraints to distribute pods across availability zones
- Apply network policies to restrict traffic to necessary paths only
- Enable structured logging with correlation IDs for request tracing
- Expose Prometheus metrics for monitoring deployment health and performance
- Configure HPA for load-based scaling and use VPA recommendations to right-size requests, without letting both act on the same CPU or memory metrics
- Maintain revision history with meaningful change-cause annotations
- Test rollback procedures regularly to ensure they work under pressure
- Implement progressive delivery for high-risk deployments with automatic rollback
- Use separate service accounts with minimal RBAC permissions per deployment
FAQ
What is the difference between a Kubernetes Deployment and a Pod?
A Pod is the smallest deployable unit in Kubernetes, representing one or more containers that share resources. A Deployment is a higher-level controller that manages Pods, providing declarative updates, scaling, and self-healing capabilities. Deployments handle rolling updates, rollbacks, and ensure the desired number of Pod replicas are running. You should almost never create Pods directly in production—always use Deployments or other controllers.
How does Kubernetes rolling update work in 2025?