
Kubernetes Resource Management

Configure CPU and memory limits for Kubernetes workloads.

By EME
Published: February 20, 2025
Tags: kubernetes, resources, limits, requests, scaling

A Simple Analogy

Kubernetes resource limits are like room reservations. You reserve what you need (request) and set a maximum you can use (limit).


Why Resource Management?

  • Efficiency: Use resources effectively
  • Stability: Prevent node overload
  • Scaling: Enable auto-scaling
  • Cost: Right-size containers
  • QoS: Guarantee quality of service

Requests and Limits

apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
  - name: app
    image: myapp:latest
    resources:
      # What the pod needs to run (used for scheduling)
      requests:
        memory: "128Mi"    # 128 mebibytes
        cpu: "250m"        # 250 millicores (0.25 CPU)

      # Maximum it can use (enforced at runtime)
      limits:
        memory: "512Mi"    # 512 mebibytes
        cpu: "1000m"       # 1 full CPU
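Requests are what the scheduler sums when deciding whether a pod fits on a node. A minimal sketch of that fit check in Python (hypothetical values, not the real scheduler code):

```python
# Sketch of the scheduler's fit check: a pod fits on a node only if the
# node's allocatable capacity covers the sum of already-scheduled pod
# requests plus the new pod's requests. All values are hypothetical.

def fits(node_allocatable, scheduled_requests, pod_request):
    """Return True if pod_request fits in what remains on the node."""
    for resource, capacity in node_allocatable.items():
        used = sum(r.get(resource, 0) for r in scheduled_requests)
        if used + pod_request.get(resource, 0) > capacity:
            return False
    return True

node = {"cpu_m": 2000, "memory_mi": 4096}        # 2 cores, 4Gi allocatable
running = [{"cpu_m": 1500, "memory_mi": 1024}]   # requests already placed
pod = {"cpu_m": 250, "memory_mi": 128}           # the example pod above

print(fits(node, running, pod))  # True: 1750m of 2000m, 1152Mi of 4096Mi
```

Note that only requests matter here; limits play no role in scheduling, which is why a node can be overcommitted on limits but never on requests.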

Resource Units

CPU:
  1 = 1 core
  500m = 0.5 cores
  100m = 0.1 cores

Memory:
  1Mi = 1 mebibyte (2^20 bytes)
  1Gi = 1 gibibyte (2^30 bytes)
  Caution: a bare lowercase "m" on memory means millibytes
  ("100m" = 0.1 bytes), which is almost always a typo for "100Mi"
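These quantity strings can be normalized programmatically. A small sketch (not an official client library; the authoritative rules are in the Kubernetes resource.Quantity spec) that converts CPU strings to millicores and memory strings to bytes:

```python
# Convert Kubernetes quantity strings to base units. Sketch only:
# handles the common suffixes, not the full Quantity grammar.

def parse_cpu_millicores(q: str) -> int:
    """'250m' -> 250, '1' -> 1000, '0.5' -> 500."""
    if q.endswith("m"):
        return int(q[:-1])
    return int(float(q) * 1000)

MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,   # binary (power-of-two)
    "k": 10**3, "M": 10**6, "G": 10**9,      # decimal
}

def parse_memory_bytes(q: str) -> int:
    """'128Mi' -> 134217728, '1G' -> 1000000000."""
    for suffix, factor in MEMORY_SUFFIXES.items():
        if q.endswith(suffix):
            return int(float(q[: -len(suffix)]) * factor)
    return int(q)  # plain bytes

print(parse_cpu_millicores("250m"))  # 250
print(parse_memory_bytes("128Mi"))   # 134217728
```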

Quality of Service Classes

# Guaranteed: requests == limits
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
spec:
  containers:
  - name: app
    image: myapp:latest
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "256Mi"
        cpu: "500m"

---
# Burstable: requests < limits
apiVersion: v1
kind: Pod
metadata:
  name: burstable-pod
spec:
  containers:
  - name: app
    image: myapp:latest
    resources:
      requests:
        memory: "128Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "1000m"

---
# BestEffort: no requests or limits
apiVersion: v1
kind: Pod
metadata:
  name: best-effort-pod
spec:
  containers:
  - name: app
    image: myapp:latest

Deployment Resources

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: app
        image: nginx:latest
        resources:
          requests:
            memory: "64Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10

Horizontal Pod Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
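The HPA's core calculation is documented as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds. A sketch of that formula with hypothetical numbers (ignoring the tolerance window and the behavior policies above):

```python
import math

# HPA scaling formula, simplified: scale replicas in proportion to how
# far the observed metric is from its target, then clamp to bounds.

def desired_replicas(current, current_util, target_util, min_r, max_r):
    desired = math.ceil(current * current_util / target_util)
    return max(min_r, min(max_r, desired))

# 3 replicas averaging 90% CPU against a 70% target -> scale up
print(desired_replicas(3, 90, 70, 2, 10))  # 4
# 3 replicas averaging 20% CPU -> scale down, floored at minReplicas
print(desired_replicas(3, 20, 70, 2, 10))  # 2
```

With multiple metrics, as in the manifest above, the HPA computes a desired count per metric and takes the largest.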

Best Practices

  1. Set both requests and limits: requests drive scheduling; limits cap runtime usage
  2. Monitor actual usage: right-size containers from real metrics, not guesses
  3. Use HPA: auto-scale replica counts based on observed metrics
  4. Consider QoS: Guaranteed pods are evicted last under node memory pressure
  5. Test under load: verify that requests and limits hold up in practice

Related Concepts

  • Vertical Pod Autoscaling
  • Resource quotas
  • Network policies
  • Storage management

Summary

Configure requests and limits for predictable resource allocation. Use HPA for automatic scaling based on metrics like CPU and memory utilization.