DevOps Tutorial: Complete CI/CD Guide

Complete CI/CD Pipeline Tutorial: Building Production-Grade DevOps Infrastructure

Modern software teams deploy code dozens or hundreds of times per day, yet many organizations still struggle with manual deployments, inconsistent environments, and broken builds that reach production. A properly implemented CI/CD pipeline addresses these challenges by automating the entire software delivery lifecycle, from code commit to production deployment, while maintaining the security, reliability, and compliance standards that regulators and customers demand in 2025.

The consequences of inadequate CI/CD infrastructure are severe and measurable. Teams without automated pipelines experience 46% more production incidents, spend 60% more time on deployment-related tasks, and face significantly higher cloud costs due to inefficient resource utilization. Security vulnerabilities slip through manual review processes, compliance audits fail due to lack of deployment traceability, and developer productivity suffers when engineers wait hours for feedback on code changes.

The problem intensifies as organizations adopt microservices architectures, edge computing, and AI-driven applications that require coordinated deployments across distributed systems. Traditional Jenkins-based pipelines with shell scripts and manual approval gates cannot handle the complexity, velocity, and security requirements of modern cloud-native applications running on Kubernetes clusters across multiple regions.

Why Traditional CI/CD Approaches Fail in 2025

Legacy CI/CD implementations built on monolithic Jenkins servers or basic Travis CI configurations break down under modern requirements. These systems were designed for simpler deployment models—single application servers, infrequent releases, and homogeneous technology stacks. They fail in contemporary environments for specific technical reasons.

First, traditional pipelines lack native container orchestration integration. Deploying to Kubernetes requires custom scripts that don't handle rollback scenarios, health checks, or progressive delivery patterns like canary deployments. Second, security scanning happens as an afterthought rather than being embedded throughout the pipeline, creating compliance gaps that surface in SOC 2 Type II audits and can lead to penalties under regulations such as GDPR.

Third, observability integration is minimal or absent. Modern applications require distributed tracing, structured logging, and real-time metrics collection during deployment. Legacy pipelines don't instrument deployments properly, making incident response and root cause analysis significantly harder when issues occur in production.

Fourth, cost optimization is impossible without dynamic resource allocation. Running dedicated CI/CD servers 24/7 wastes thousands of dollars monthly. Modern pipelines must scale to zero when idle and provision resources on-demand, something traditional architectures cannot achieve without complete redesign.

Finally, multi-cloud and hybrid deployment scenarios are increasingly common. Organizations run workloads across AWS, Google Cloud, Azure, and on-premises Kubernetes clusters simultaneously. Traditional pipelines weren't built for this heterogeneity and require brittle, environment-specific configurations that break frequently.

Modern CI/CD Architecture: A Production-Grade Solution

A contemporary CI/CD pipeline leverages cloud-native tools that integrate seamlessly with container orchestration platforms, provide built-in security scanning, and support GitOps workflows. The architecture consists of five core components: source control integration, automated build and test execution, artifact management, deployment orchestration, and continuous monitoring.

GitHub Actions serves as the pipeline orchestration engine, providing native integration with GitHub repositories, secrets management, and a marketplace of pre-built actions. Docker handles containerization, ensuring consistent environments from development through production. Kubernetes manages deployment orchestration with built-in health checks, rolling updates, and automatic rollback capabilities.

Here's a production-grade GitHub Actions workflow that implements a complete CI/CD pipeline for a Node.js microservice:

name: Production CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  KUBERNETES_NAMESPACE: production

jobs:
  security-scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4

      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'

      - name: Upload Trivy results to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'

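  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [20.x, 22.x]
    # PostgreSQL service container that backs DATABASE_URL in the integration tests
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
          POSTGRES_DB: testdb
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5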
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run unit tests
        run: npm run test:unit -- --coverage

      - name: Run integration tests
        run: npm run test:integration
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/testdb

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          files: ./coverage/coverage-final.json
          flags: unittests
          fail_ci_if_error: true

  build-and-push:
    needs: [security-scan, test]
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    permissions:
      contents: read
      packages: write
      id-token: write  # required for keyless (OIDC) signing with Cosign
    outputs:
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4

      # QEMU provides the emulation needed to build linux/arm64 on an amd64 runner
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix={{branch}}-
            type=semver,pattern={{version}}
            type=raw,value=latest,enable={{is_default_branch}}

      - name: Build and push Docker image
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          platforms: linux/amd64,linux/arm64

      # The cosign binary is not available to the job until it is installed
      - name: Install Cosign
        uses: sigstore/cosign-installer@v3

      # Assumes the repository name is lowercase; image references cannot contain uppercase characters
      - name: Sign container image
        run: |
          cosign sign --yes ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build.outputs.digest }}

  deploy:
    needs: build-and-push
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://api.example.com
    steps:
      - uses: actions/checkout@v4

      - name: Setup kubectl
        uses: azure/setup-kubectl@v4
        with:
          version: 'v1.29.0'

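      - name: Configure Kubernetes context
        run: |
          echo "${{ secrets.KUBECONFIG }}" | base64 -d > "$RUNNER_TEMP/kubeconfig.yaml"
          chmod 600 "$RUNNER_TEMP/kubeconfig.yaml"
          # A plain `export` only lasts for this step; GITHUB_ENV persists the variable for later steps
          echo "KUBECONFIG=$RUNNER_TEMP/kubeconfig.yaml" >> "$GITHUB_ENV"
          kubectl --kubeconfig "$RUNNER_TEMP/kubeconfig.yaml" config use-context production-cluster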

      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/api-service \
            api-container=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-and-push.outputs.image-digest }} \
            -n ${{ env.KUBERNETES_NAMESPACE }}

          kubectl rollout status deployment/api-service \
            -n ${{ env.KUBERNETES_NAMESPACE }} \
            --timeout=5m

      - name: Run smoke tests
        run: |
          npm ci  # this job starts on a fresh runner, so install dependencies first
          npm run test:smoke
        env:
          API_URL: https://api.example.com

      - name: Rollback on failure
        if: failure()
        run: |
          kubectl rollout undo deployment/api-service \
            -n ${{ env.KUBERNETES_NAMESPACE }}

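This workflow implements several critical production requirements. Security scanning runs before any image is built: Trivy scans the repository filesystem (scan-type fs) for CRITICAL and HIGH vulnerabilities in dependencies, and the build-and-push job waits for the scan job to finish. The built container image itself is not scanned here; that would need an additional image scan step. As written, the step reports findings without failing the job; setting the action's exit-code input to 1 would turn it into a blocking gate. Results are uploaded to the GitHub Security tab for centralized vulnerability management.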
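To make the scan output more concrete, here is a minimal Python sketch that counts findings per level in a SARIF report such as the trivy-results.sarif file the workflow writes. It relies only on the generic SARIF 2.1.0 layout (runs, results, level). The function name count_sarif_levels and the sample report are illustrative assumptions, not part of the pipeline above, and the sketch makes no claim about how Trivy maps its severities onto SARIF levels.

```python
import json
from collections import Counter


def count_sarif_levels(sarif_text: str) -> Counter:
    """Count results per SARIF level ("error", "warning", "note", "none")."""
    report = json.loads(sarif_text)
    counts: Counter = Counter()
    for run in report.get("runs", []):
        for result in run.get("results", []):
            # SARIF treats a result without an explicit level as "warning".
            counts[result.get("level", "warning")] += 1
    return counts


# Hypothetical, hand-written report used only to exercise the function.
SAMPLE_SARIF = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "example-scanner"}},
        "results": [
            {"ruleId": "EXAMPLE-0001", "level": "error"},
            {"ruleId": "EXAMPLE-0002", "level": "error"},
            {"ruleId": "EXAMPLE-0003"},
        ],
    }],
})

if __name__ == "__main__":
    print(dict(count_sarif_levels(SAMPLE_SARIF)))
```

A script like this could run as an extra step after the scan to print a summary or to fail the job when the error count is non-zero.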

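The test job runs in a matrix across Node.js 20.x and 22.x, ensuring compatibility with both. Integration tests are pointed at a real PostgreSQL database on localhost:5432 through DATABASE_URL rather than at mocks, which catches issues that unit tests miss. Coverage reports upload to Codecov, and fail_ci_if_error fails the job if that upload fails; a minimum coverage threshold that blocks regressions is configured separately in Codecov's own settings.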
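One way to enforce a coverage floor inside the job itself is to read the Istanbul report that the workflow already uploads (coverage/coverage-final.json). The sketch below is a hedged illustration rather than the author's method: it assumes the standard Istanbul layout, where each file entry carries an "s" map from statement id to hit count, and the names statement_coverage and meets_threshold are hypothetical.

```python
import json


def statement_coverage(coverage_json: str) -> float:
    """Return statement coverage (0-100) from an Istanbul coverage-final.json."""
    data = json.loads(coverage_json)
    total = 0
    covered = 0
    for file_coverage in data.values():
        hits = file_coverage.get("s", {})  # statement id -> execution count
        total += len(hits)
        covered += sum(1 for count in hits.values() if count > 0)
    # An empty report has nothing uncovered, so treat it as fully covered.
    return 100.0 if total == 0 else 100.0 * covered / total


def meets_threshold(coverage_json: str, minimum: float) -> bool:
    """True when statement coverage is at least `minimum` percent."""
    return statement_coverage(coverage_json) >= minimum


# Hypothetical two-file report: 3 of 4 statements were executed.
SAMPLE_COVERAGE = json.dumps({
    "/app/src/a.js": {"s": {"0": 1, "1": 0, "2": 3}},
    "/app/src/b.js": {"s": {"0": 2}},
})

if __name__ == "__main__":
    print(statement_coverage(SAMPLE_COVERAGE))   # 75.0
    print(meets_threshold(SAMPLE_COVERAGE, 80))  # False
```

Exiting with a non-zero status when meets_threshold returns False would fail the test job before the build starts.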

The build process uses Docker Buildx for multi-platform images, supporting both AMD64 and ARM64 architectures. Layer caching through GitHub Actions cache dramatically reduces build times—typically from 8 minutes to under 2 minutes for incremental changes. Container image signing with Cosign provides supply chain security, allowing Kubernetes admission controllers to verify image authenticity before deployment.

Deployment uses image digests rather than tags, preventing race conditions where a tag gets updated between build and deploy stages. The rollout status command blocks until the deployment completes successfully or times out, providing immediate feedback. Smoke tests verify critical functionality post-deployment, triggering automatic rollback if they fail.
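The digest-pinning idea can be sketched in a few lines of Python. The helper below builds a registry/repository@sha256:... reference and refuses anything that is not a full SHA-256 digest, so a mutable tag such as latest can never reach the deploy command. The name pinned_image_ref is hypothetical. Lowercasing the repository is an added safeguard, on the assumption that the repository name may contain uppercase characters, which image references do not allow.

```python
import re

# A content digest is "sha256:" followed by exactly 64 lowercase hex characters.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def pinned_image_ref(registry: str, repository: str, digest: str) -> str:
    """Build an immutable image reference of the form registry/repo@sha256:..."""
    if not DIGEST_RE.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    # Image repository names must be lowercase.
    return f"{registry}/{repository.lower()}@{digest}"


if __name__ == "__main__":
    example_digest = "sha256:" + "a" * 64  # placeholder digest for illustration
    print(pinned_image_ref("ghcr.io", "Org/Api-Service", example_digest))
```

Because the digest is derived from the image content, the reference deployed is exactly the image that was built and signed, even if a tag is moved in between.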

Kubernetes Deployment Configuration

The pipeline deploys to Kubernetes using declarative manifests that define the desired state. Here's a production-ready deployment configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  namespace: production
  labels:
    app: api-service
    version: v1
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
        version: v1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: api-service
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
      - name: api-container
        image: ghcr.io/org/api-service:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        - containerPort: 9090
          name: metrics
          protocol: TCP
        env:
        - name: NODE_ENV
          value: "production"
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: database-credentials
              key: url
        - name: REDIS_URL
          valueFrom:
            secretKeyRef:
              name: redis-credentials
              key: url
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 2
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /app/.cache
      volumes:
      - name: tmp
        emptyDir: {}
      - name: cache
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: api-service
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 4
        periodSeconds: 30
      selectPolicy: Max

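This configuration implements production-grade reliability patterns. The rolling update strategy supports zero-downtime deployments: with maxUnavailable set to 0 and maxSurge set to 1, Kubernetes adds one new pod at a time and removes an old pod only after the new one passes its readiness probe, so all three replicas stay available throughout. Security contexts enforce least-privilege principles: containers run as non-root users with read-only root filesystems and all Linux capabilities dropped.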
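The arithmetic behind maxSurge and maxUnavailable can be checked with a small Python sketch. It computes the minimum number of available pods and the maximum total number of pods during a rolling update, using the documented rounding rules for percentage values (maxSurge rounds up, maxUnavailable rounds down). The function name rolling_update_bounds is an illustrative assumption.

```python
import math
from typing import Tuple, Union

IntOrPercent = Union[int, str]


def rolling_update_bounds(
    replicas: int, max_surge: IntOrPercent, max_unavailable: IntOrPercent
) -> Tuple[int, int]:
    """Return (min_available, max_total) pod counts during a RollingUpdate."""

    def resolve(value: IntOrPercent, round_up: bool) -> int:
        # Percentages are relative to the desired replica count.
        if isinstance(value, str) and value.endswith("%"):
            exact = replicas * int(value[:-1]) / 100
            return math.ceil(exact) if round_up else math.floor(exact)
        return int(value)

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    return replicas - unavailable, replicas + surge


if __name__ == "__main__":
    # Values from the manifest above: 3 replicas, maxSurge 1, maxUnavailable 0.
    print(rolling_update_bounds(3, 1, 0))           # (3, 4)
    print(rolling_update_bounds(10, "25%", "25%"))  # (8, 13)
```

For the manifest above, the result (3, 4) means capacity never drops below three pods, and the cluster needs headroom for one extra pod during a rollout.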

Health checks distinguish between liveness (is the container alive?) and readiness (can it serve traffic?). This separation prevents cascading failures where temporary issues cause Kubernetes to restart healthy pods. Resource requests and limits prevent resource contention and enable efficient cluster bin-packing.
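The different failureThreshold values on the two probes are what keep a short blip from causing a restart. The Python sketch below is a deliberately simplified model of consecutive-failure counting, not the kubelet's actual implementation. The class ProbeTracker and the helper replay are hypothetical, and the model assumes the container starts out healthy.

```python
from typing import List


class ProbeTracker:
    """Simplified model of how one probe flips state after consecutive results."""

    def __init__(self, failure_threshold: int, success_threshold: int = 1) -> None:
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.failures = 0
        self.successes = 0
        self.healthy = True  # assumption: start in the healthy state

    def record(self, ok: bool) -> bool:
        """Record one probe result and return the resulting state."""
        if ok:
            self.successes += 1
            self.failures = 0
            if self.successes >= self.success_threshold:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


def replay(tracker: ProbeTracker, outcomes: List[bool]) -> List[bool]:
    """Feed a sequence of probe outcomes and collect the state after each."""
    return [tracker.record(ok) for ok in outcomes]


if __name__ == "__main__":
    blip = [False, False, True]  # two failed probes, then recovery
    print(replay(ProbeTracker(failure_threshold=2), blip))  # readiness
    print(replay(ProbeTracker(failure_threshold=3), blip))  # liveness
```

With the thresholds from the manifest, the same two failed checks mark the pod not ready, which removes it from Service endpoints, but never reach the liveness threshold of three, so the container is not restarted.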

The HorizontalPodAutoscaler scales based on both CPU and memory utilization, with carefully tuned scale-up and scale-down policies. Aggressive scale-up responds quickly to traffic spikes, while conservative scale-down prevents flapping during normal load variations.
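The scaling behavior can be approximated with the documented HPA formula, desired = ceil(current × currentUtilization / targetUtilization), taking the largest result across metrics. The Python sketch below is a simplified model under stated assumptions: it ignores pod readiness, missing metrics, and stabilization windows, and the function names are hypothetical. The 10% tolerance is the controller's default, and the scale-up limit mirrors the two policies above combined with selectPolicy Max.

```python
import math


def desired_replicas(current: int, utilization: float, target: float,
                     min_replicas: int = 3, max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Replica count one metric asks for, clamped to the HPA bounds."""
    ratio = utilization / target
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to target: no change
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


def scale_up_limit(current: int, percent: int = 100, pods: int = 4) -> int:
    """Highest replica count allowed within one 30-second policy period."""
    by_percent = math.ceil(current * (1 + percent / 100))
    by_pods = current + pods
    return max(by_percent, by_pods)  # selectPolicy: Max picks the larger change


if __name__ == "__main__":
    cpu = desired_replicas(3, utilization=350, target=70)    # 15
    memory = desired_replicas(3, utilization=80, target=80)  # 3
    wanted = max(cpu, memory)  # the HPA acts on the largest recommendation
    print(wanted, min(wanted, scale_up_limit(3)))  # 15 7
```

In this example a CPU spike to 350% asks for 15 replicas, but from 3 replicas the policies allow at most 7 in the first 30 seconds, with further growth in later periods.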

Infrastructure as Code for Pipeline Resources

Managing pipeline infrastructure through code ensures reproducibility and version control. Here's a Terraform configuration for the required cloud resources:

```hcl
terraform {
  required_version = ">= 1.7"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.25"
    }
  }

  backend "gcs" {
    bucket = "terraform-state-prod"
    prefix = "cicd-infrastructure"
  }
}

provider "google" {
  project = var.project_id
  region  = var.region
}

resource "google_container_cluster" "primary" {
  name     = "production-cluster"
  location = var.region

  remove_default_node_pool = true
  initial_node_count       = 1

  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name

  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  release_channel {
    channel = "REGULAR"
  }

  addons_config {
    http_load_balancing {
      disabled = false
    }
    horizontal_pod_autoscaling {
      disabled = false
    }
    network_policy_config {
      disabled = false
    }
  }

  binary_authorization {
    evaluation_mode = "PROJECT_SINGLETON_POLICY_ENFORCE"
  }

  maintenance_policy {
    daily_maintenance_window {
      start_time = "03:00"
    }
  }
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "primary-node-pool"
  location   = var.region
  cluster    = google_container_cluster.primary.name
  node_count = 3

  autoscaling {
    min_node_count = 3
    max_node_count = 20
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  node_config {
    preemptible  = false
    machine_type = "n2-standard-4"

    disk_size_gb = 100
    disk_type    = "pd-ssd"

    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    workload_metadata_config {
      mode = "GKE_METADATA"
    }

    shielded_instance_config {
      enable_secure_boot          = true
      enable_integrity_monitoring = true
    }

    labels = {
      environment = "production"
      managed_by  = "terraform"
    }

    tags = ["production", "kubernetes"]
  }
}

resource "google_artifact_registry_repository" "docker" {
  location      = var.region
  repository_id = "docker-images"
  format        = "DOCKER"

  cleanup_policies {
    id     = "keep-recent-versions"
    action = "KEEP"

    most_recent_versions {
      keep_count = 10
    }
  }

  cleanup_policies {
    id     = "delete-old-untagged"
    action = "DELETE"

    condition {
      tag_state  = "UNTAGGED"
      older_than = "2592000s" # 30 days
    }
  }
}

resource "google_service_account" "github_actions" {
  account_id   = "github-actions-deployer"
  display_name = "GitHub Actions Deployment Service Account"
}

resource "google_project_iam_member" "github_actions_roles" {
  for_each = toset([
    "roles/container.developer",
    "roles/artifactregistry.writer",
  ])

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.github_actions.email}"
}

resource "google_iam_workload_identity_pool" "github" {
  workload_identity_pool_id = "github-pool"
  display_name              = "GitHub Actions Pool"
}

resource "google_iam_workload_identity_pool_provider" "github" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github-provider"
  display_name                       = "GitHub Provider"

  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.actor"      = "assertion.actor"
    "attribute.repository" = "assertion.repository"
  }

oidc { issuer_uri = "https://token.actions.