Health Checks for Services

A Simple Analogy

Health checks are like a pulse monitor. They constantly verify the service is alive and responding.

Why Health Checks?

Monitoring: Know when services fail
Auto-recovery: Restart unhealthy services
Load balancing: Route away from unhealthy instances
Deployment: Verify a new instance is ready before it receives traffic
Alerting: Trigger on issues

ASP.NET Core Health Checks

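using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// AddUrlGroup comes from the AspNetCore.HealthChecks.Uris NuGet package.
// Tags decide which checks run on /health/ready and /health/live.
builder.Services
    .AddHealthChecks()
    // Liveness: no dependencies, only proves the process can answer.
    .AddCheck("self", () => HealthCheckResult.Healthy(), tags: new[] { "live" })
    // Generic overloads resolve the check from DI, so constructor dependencies are injected.
    .AddCheck<DatabaseHealthCheck>("database", tags: new[] { "ready" })
    .AddCheck<CacheHealthCheck>("cache", tags: new[] { "ready" })
    .AddUrlGroup(new Uri("https://api.example.com/health"), "api", tags: new[] { "ready" });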

// Endpoint
app.MapHealthChecks("/health");
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready")
});
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("live")
});

Custom Health Check

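using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Requires AddDbContextFactory<AppDbContext>(...) so the factory can be injected.
public class DatabaseHealthCheck : IHealthCheck
{
    // Named AppDbContext to avoid clashing with the static System.AppContext class.
    private readonly IDbContextFactory<AppDbContext> _factory;

    public DatabaseHealthCheck(IDbContextFactory<AppDbContext> factory)
    {
        _factory = factory;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            // DbContext implements IAsyncDisposable, so dispose it asynchronously.
            await using var dbContext = await _factory.CreateDbContextAsync(cancellationToken);
            var canConnect = await dbContext.Database.CanConnectAsync(cancellationToken);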
            
            return canConnect
                ? HealthCheckResult.Healthy("Database connection successful")
                : HealthCheckResult.Unhealthy("Cannot connect to database");
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Database check failed", ex);
        }
    }
}
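
A dependency check can also enforce its own time budget so that a hung dependency does not stall the whole health endpoint. The following is a minimal sketch, assuming .NET 6 or later (for Task.WaitAsync). The class name PingHealthCheck, the ping delegate, and the 2-second default are hypothetical and stand in for whatever client call the real check would make.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical sketch: wraps any "ping" operation and fails it after a timeout.
public sealed class PingHealthCheck : IHealthCheck
{
    private readonly Func<CancellationToken, Task> _ping;
    private readonly TimeSpan _timeout;

    public PingHealthCheck(Func<CancellationToken, Task> ping, TimeSpan? timeout = null)
    {
        _ping = ping;
        _timeout = timeout ?? TimeSpan.FromSeconds(2); // assumed default
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            // WaitAsync throws TimeoutException if the ping takes longer than _timeout.
            await _ping(cancellationToken).WaitAsync(_timeout, cancellationToken);
            return HealthCheckResult.Healthy("Ping succeeded");
        }
        catch (TimeoutException)
        {
            return HealthCheckResult.Unhealthy($"Ping timed out after {_timeout.TotalMilliseconds} ms");
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Cancellation is left to propagate; any other failure is reported as Unhealthy.
            return HealthCheckResult.Unhealthy("Ping failed", ex);
        }
    }
}
```

Because the constructor takes a delegate, an instance would typically be registered directly, for example with AddCheck("cache", new PingHealthCheck(...)), rather than resolved from DI.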

Kubernetes Probes

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp:latest
    # Liveness: the container is restarted after 3 consecutive failures.
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
    
    # Readiness: the Pod is removed from Service endpoints while this fails.
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 3
      failureThreshold: 2
    
    # Startup: allows up to 30 x 10 s = 300 s to start; the other probes wait until it succeeds.
    startupProbe:
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30
      periodSeconds: 10

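Response Format

By default, MapHealthChecks returns only the overall status as plain text (for example, "Healthy"). A JSON body like the one below requires a custom HealthCheckOptions.ResponseWriter, such as UIResponseWriter.WriteHealthCheckUIResponse from the AspNetCore.HealthChecks.UI.Client package. The overall status is the worst status among the entries.

{
  "status": "Unhealthy",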
  "totalDuration": "00:00:00.1234567",
  "entries": {
    "database": {
      "status": "Healthy",
      "duration": "00:00:00.0567890",
      "description": "Database connection successful"
    },
    "cache": {
      "status": "Unhealthy",
      "duration": "00:00:00.0123456",
      "description": "Redis connection failed",
      "exception": "Connection refused"
    }
  }
}
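
One way to produce a body shaped like the example above is a small custom response writer. This is a sketch under assumptions, not the only approach: the class name HealthResponseWriter and its method names are hypothetical, and the field names simply mirror the example.

```csharp
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical helper: serializes a HealthReport as JSON.
public static class HealthResponseWriter
{
    private static readonly JsonSerializerOptions Options = new()
    {
        WriteIndented = true,
        // Omit "description" and "exception" when they are null.
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
    };

    public static string ToJson(HealthReport report)
    {
        var payload = new
        {
            status = report.Status.ToString(),
            totalDuration = report.TotalDuration.ToString(),
            entries = report.Entries.ToDictionary(
                e => e.Key,
                e => new
                {
                    status = e.Value.Status.ToString(),
                    duration = e.Value.Duration.ToString(),
                    description = e.Value.Description,
                    exception = e.Value.Exception?.Message
                })
        };
        return JsonSerializer.Serialize(payload, Options);
    }

    // Matches the HealthCheckOptions.ResponseWriter delegate signature.
    public static Task WriteAsync(HttpContext context, HealthReport report)
    {
        context.Response.ContentType = "application/json; charset=utf-8";
        return context.Response.WriteAsync(ToJson(report));
    }
}
```

It could then be wired up by setting ResponseWriter = HealthResponseWriter.WriteAsync in the HealthCheckOptions passed to MapHealthChecks. Exposing exception messages may leak internal details, so an endpoint like this is usually kept off the public internet.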

Best Practices

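Separate endpoints: Expose distinct liveness, readiness, and startup endpoints
Quick responses: Keep each check fast and give it a timeout shorter than the probe's timeoutSeconds
Keep liveness minimal: Don't check external dependencies in liveness; a database outage should fail readiness, not trigger restarts
Log failures: Record which check failed and why
Monitor metrics: Alert on repeated failures rather than on a single failed probe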
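
For the logging practice above, one option is an IHealthCheckPublisher, which the health check system invokes periodically with the latest report. The sketch below is illustrative only: the class name LoggingHealthCheckPublisher and the Failures helper are hypothetical, and the log message format is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Logging;

// Hypothetical publisher: logs every entry that is not Healthy.
public sealed class LoggingHealthCheckPublisher : IHealthCheckPublisher
{
    private readonly ILogger<LoggingHealthCheckPublisher> _logger;

    public LoggingHealthCheckPublisher(ILogger<LoggingHealthCheckPublisher> logger)
    {
        _logger = logger;
    }

    // Returns the entries whose status is Degraded or Unhealthy.
    public static IReadOnlyList<KeyValuePair<string, HealthReportEntry>> Failures(HealthReport report)
    {
        return report.Entries
            .Where(e => e.Value.Status != HealthStatus.Healthy)
            .ToList();
    }

    public Task PublishAsync(HealthReport report, CancellationToken cancellationToken)
    {
        foreach (var (name, entry) in Failures(report))
        {
            _logger.LogWarning(entry.Exception,
                "Health check {Name} is {Status}: {Description}",
                name, entry.Status, entry.Description);
        }
        return Task.CompletedTask;
    }
}
```

Registering it with builder.Services.AddSingleton<IHealthCheckPublisher, LoggingHealthCheckPublisher>() should be enough for it to be picked up; the publishing interval can be adjusted through HealthCheckPublisherOptions.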

Related Concepts

Service discovery
Circuit breakers
Graceful shutdown
Deployment strategies

Summary

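Expose health checks at a /health endpoint, with custom checks for dependencies such as the database and cache. In Kubernetes, map the liveness, readiness, and startup probes to separate endpoints, and keep dependency checks out of liveness.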