Health API Documentation

Overview

The Health API provides comprehensive monitoring of the system's operational status and performance metrics. This endpoint is designed for load balancers, monitoring systems, and operational teams to ensure service reliability.

Health Check Endpoint

The main health endpoint is available at /status and returns detailed information about all system components.

Getting Started

The health endpoint returns different HTTP status codes based on system health:

  • 200 OK: All systems operational
  • 503 Service Unavailable: One or more critical components are down
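
A client can make this mapping explicit instead of scattering status-code comparisons through its code. The following is a minimal sketch, assuming a hypothetical helper named `classify_health` and a hypothetical `HealthState` enum, neither of which is part of the API. It treats any code other than 200 or 503 as unknown rather than guessing.

```python
from enum import Enum


class HealthState(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    UNKNOWN = "unknown"


def classify_health(status_code: int) -> HealthState:
    """Map the HTTP status code returned by /status to a coarse state.

    classify_health is a hypothetical helper used for illustration only.
    """
    if status_code == 200:
        return HealthState.HEALTHY
    if status_code == 503:
        return HealthState.UNHEALTHY
    # Any other code (404, 500, a proxy error page, ...) says nothing
    # reliable about the components, so it is reported as unknown.
    return HealthState.UNKNOWN
```

Keeping an explicit unknown state lets a monitor tell "the service says it is unhealthy" apart from "the probe did not reach the health endpoint at all".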

Basic Health Check Request

cURL
curl -X GET "http://localhost:8080/status" \
  -H "Accept: application/json"

JavaScript
const response = await fetch('http://localhost:8080/status', {
  method: 'GET',
  headers: {
    'Accept': 'application/json'
  }
});

const healthData = await response.json();
console.log('System Status:', healthData.status);

Python
import requests

# Set a timeout so a hung service cannot block the caller indefinitely
response = requests.get(
    'http://localhost:8080/status',
    headers={'Accept': 'application/json'},
    timeout=10
)

health_data = response.json()
print(f"System Status: {health_data['status']}")
print(f"Response Time: {response.elapsed.total_seconds()}s")

Go
package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "time"
)

// HealthResponse holds the top-level fields of the /status payload.
type HealthResponse struct {
    Status      string `json:"status"`
    Timestamp   string `json:"timestamp"`
    Environment string `json:"environment"`
}

func checkHealth() error {
    client := &http.Client{Timeout: 10 * time.Second}
    resp, err := client.Get("http://localhost:8080/status")
    if err != nil {
        return fmt.Errorf("request failed: %w", err)
    }
    defer resp.Body.Close()

    // A 503 response still carries a JSON body, so decode it either way
    var health HealthResponse
    if err := json.NewDecoder(resp.Body).Decode(&health); err != nil {
        return fmt.Errorf("decoding response: %w", err)
    }
    fmt.Printf("System Status: %s (HTTP %d)\n", health.Status, resp.StatusCode)
    return nil
}

func main() {
    if err := checkHealth(); err != nil {
        fmt.Printf("Error: %v\n", err)
    }
}

Implementing Health Check Monitoring

Set up automated monitoring for your applications:

Production Ready

This configuration has been tested in production environments with 99.9% uptime.

docker-compose.yml
version: '3.8'
services:
  api:
    image: rapidaigo:latest
    healthcheck:
      # curl must be installed in the image for this probe to run
      test: ["CMD", "curl", "-f", "http://localhost:8080/status"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
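
To reason about how quickly this configuration reacts to a failure, the arithmetic can be written out. The sketch below is an estimate, assuming Docker's documented behaviour that each probe starts `interval` after the previous one finished and that the container is marked unhealthy after `retries` consecutive failures. The function name `seconds_until_unhealthy` is hypothetical.

```python
def seconds_until_unhealthy(interval: float, retries: int, probe_duration: float) -> float:
    """Estimate the worst-case delay between a failure and the unhealthy state.

    Assumes the failure begins right after a successful probe, that each
    probe starts `interval` seconds after the previous one finished, and
    that every failing probe takes `probe_duration` seconds to fail.
    """
    return retries * (interval + probe_duration)


# Values from the docker-compose.yml above: interval 30s, timeout 10s, retries 3.
fast_failure = seconds_until_unhealthy(30, 3, 0)   # connection refused at once
slow_failure = seconds_until_unhealthy(30, 3, 10)  # every probe hits the timeout
print(fast_failure, slow_failure)
```

Under these assumptions a dead service is flagged after roughly 90 to 120 seconds. Failures during the 40-second start period do not count toward the retry limit, so a slow boot does not mark the container unhealthy.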

API Reference

Health Check Endpoint

Prop             Type     Default
method?          string   GET
endpoint?        string   /status
authentication?  boolean  false

Request Headers

Header      Type    Required  Description
Accept      string  No        Response format preference (application/json)
User-Agent  string  No        Client identification for monitoring logs

Response Schema

Component Health Details

Each monitored component reports the following information:

Prop            Type    Default
status?         enum    -
response_time?  string  -
error?          string  -
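
Because `response_time` is a string such as "1.8ms" or "5.001s" rather than a number, clients that want to compare or chart it need to convert it first. The following is a minimal sketch, assuming the values follow the format produced by Go's `time.Duration.String()` (which the backend handler uses). The helper name `parse_response_time` is hypothetical.

```python
import re

# Seconds per unit, covering the units Go's time.Duration.String() can emit.
_UNIT_SECONDS = {
    "ns": 1e-9,
    "us": 1e-6,
    "µs": 1e-6,  # micro sign (U+00B5)
    "μs": 1e-6,  # Greek small letter mu (U+03BC)
    "ms": 1e-3,
    "s": 1.0,
    "m": 60.0,
    "h": 3600.0,
}

# "ms" is listed before "m" and "s" so that it is matched as one unit.
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|μs|ms|s|m|h)")


def parse_response_time(text: str) -> float:
    """Convert a duration string such as "1.8ms" or "1m2.5s" to seconds."""
    pos = 0
    total = 0.0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:
            raise ValueError(f"unexpected characters in duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"not a duration: {text!r}")
    return total
```

Raising on anything that is not fully consumed keeps a malformed value from silently turning into a plausible-looking number.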

Monitored Components

Database (MySQL)

Connection Pool Status

Tests database connectivity, query execution, and connection pool health

Timeout: 5 seconds

Critical: ✅ System fails if unhealthy

Response Examples

Real-world Scenarios

Production Healthy Response
{
  "status": "healthy",
  "timestamp": "2024-08-18T14:30:15.123Z",
  "environment": "production",
  "checks": {
    "database": {
      "status": "healthy",
      "response_time": "1.8ms"
    },
    "redis": {
      "status": "healthy",
      "response_time": "0.9ms"
    }
  }
}

Optimal performance with sub-2ms database response times.

Database Connection Problems
{
  "status": "unhealthy",
  "timestamp": "2024-08-18T14:31:45.789Z",
  "environment": "production",
  "checks": {
    "database": {
      "status": "unhealthy",
      "response_time": "5.001s",
      "error": "dial tcp 127.0.0.1:3306: connect: connection refused"
    },
    "redis": {
      "status": "healthy",
      "response_time": "1.1ms"
    }
  }
}

Database Offline

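The database connection failed, so the endpoint responds with 503 Service Unavailable and the instance needs immediate attention. A load balancer that polls this endpoint will route traffic to the remaining healthy instances.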

Partial Service Degradation
{
  "status": "healthy",
  "timestamp": "2024-08-18T14:32:20.456Z",
  "environment": "production",
  "checks": {
    "database": {
      "status": "healthy",
      "response_time": "3.2ms"
    }
  }
}

Redis is unavailable and its check is absent from the response, but the overall status stays healthy: the system remains operational with reduced performance.
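
The difference between this scenario and the database outage comes down to which components are treated as critical. The sketch below illustrates that rule; it is not the backend's code. It assumes the database is critical (as stated under Monitored Components) and that Redis is non-critical, which is inferred from this example rather than confirmed by the source. The names `CRITICAL_COMPONENTS` and `overall_status` are hypothetical.

```python
# Assumption: only the database is critical. Redis being non-critical is
# inferred from the example response, not confirmed by the source.
CRITICAL_COMPONENTS = ("database",)


def overall_status(checks: dict) -> str:
    """Return "unhealthy" when any critical component is missing or unhealthy."""
    for name in CRITICAL_COMPONENTS:
        check = checks.get(name)
        if check is None or check.get("status") != "healthy":
            return "unhealthy"
    # Non-critical components never change the overall status.
    return "healthy"
```

With this rule, a response that lacks a `redis` entry, or reports it as unhealthy, still yields an overall status of healthy as long as the database check passes.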

Under Heavy Load
{
  "status": "healthy",
  "timestamp": "2024-08-18T14:33:05.234Z",
  "environment": "production",
  "checks": {
    "database": {
      "status": "healthy",
      "response_time": "45.2ms"
    },
    "redis": {
      "status": "healthy",
      "response_time": "8.7ms"
    }
  }
}

Elevated response times during peak load - still within acceptable thresholds.

Implementation Details

Backend Implementation

The health check implementation in Go:

internal/handler/health.go
package handler

import (
    "context"
    "net/http"
    "time"

    "github.com/pranavsoft/rapidaigo-web/internal/middleware"
    "github.com/pranavsoft/rapidaigo-web/internal/server"
    "github.com/labstack/echo/v4"
)

type HealthHandler struct {
    Handler
}

func NewHealthHandler(s *server.Server) *HealthHandler {
    return &HealthHandler{
        Handler: NewHandler(s),
    }
}

// CheckHealth pings the database and responds with 200 when it is reachable
// and 503 otherwise.
func (h *HealthHandler) CheckHealth(c echo.Context) error {
    start := time.Now()
    // The With().Str().Logger() chain suggests a zerolog logger; the logging
    // calls below assume that API.
    logger := middleware.GetLogger(c).With().
        Str("operation", "health_check").
        Logger()

    response := map[string]interface{}{
        "status":      "healthy",
        "timestamp":   time.Now().UTC(),
        "environment": h.server.Config.Primary.Env,
        "checks":      make(map[string]interface{}),
    }

    checks := response["checks"].(map[string]interface{})
    isHealthy := true

    // Check database connectivity, giving up after 5 seconds or as soon as
    // the client disconnects
    ctx, cancel := context.WithTimeout(c.Request().Context(), 5*time.Second)
    defer cancel()

    dbStart := time.Now()
    if err := h.server.DB.DB.PingContext(ctx); err != nil {
        checks["database"] = map[string]interface{}{
            "status":        "unhealthy",
            "response_time": time.Since(dbStart).String(),
            "error":         err.Error(),
        }
        isHealthy = false
        logger.Error().Err(err).Msg("database health check failed")
    } else {
        checks["database"] = map[string]interface{}{
            "status":        "healthy",
            "response_time": time.Since(dbStart).String(),
        }
    }

    logger.Info().
        Dur("duration", time.Since(start)).
        Bool("healthy", isHealthy).
        Msg("health check completed")

    // Set overall status
    if !isHealthy {
        response["status"] = "unhealthy"
        return c.JSON(http.StatusServiceUnavailable, response)
    }

    return c.JSON(http.StatusOK, response)
}

Frontend Type Definitions

packages/zod/src/health.ts
import { z } from "zod";

const ZHealthCheck = z.object({
  status: z.enum(["healthy", "unhealthy"]),
  response_time: z.string(),
  error: z.string().optional(),
});

export const ZHealthResponse = z.object({
  status: z.enum(["healthy", "unhealthy"]),
  timestamp: z.string().datetime(),
  environment: z.string(),
  checks: z.object({
    database: ZHealthCheck,
    redis: ZHealthCheck.optional(),
  }),
});

export type HealthResponse = z.infer<typeof ZHealthResponse>;
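
Clients that do not use TypeScript can apply the same checks by hand. The following is a rough sketch that approximately mirrors the Zod schema above using only the Python standard library. The function names are hypothetical, and the timestamp check accepts only UTC values ending in "Z", which is how the example responses are formatted.

```python
import re

_STATUSES = ("healthy", "unhealthy")
# UTC timestamps only ("...Z"), with optional fractional seconds.
_UTC_DATETIME = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$")


def _check_errors(name, check):
    """Validate one component entry (status, response_time, optional error)."""
    if not isinstance(check, dict):
        return [f"checks.{name}: expected an object"]
    errors = []
    if check.get("status") not in _STATUSES:
        errors.append(f"checks.{name}.status: expected healthy or unhealthy")
    if not isinstance(check.get("response_time"), str):
        errors.append(f"checks.{name}.response_time: expected a string")
    if "error" in check and not isinstance(check["error"], str):
        errors.append(f"checks.{name}.error: expected a string")
    return errors


def validate_health_response(payload):
    """Return a list of problems; an empty list means the payload is valid."""
    if not isinstance(payload, dict):
        return ["payload: expected an object"]
    errors = []
    if payload.get("status") not in _STATUSES:
        errors.append("status: expected healthy or unhealthy")
    timestamp = payload.get("timestamp")
    if not isinstance(timestamp, str) or not _UTC_DATETIME.match(timestamp):
        errors.append("timestamp: expected an ISO 8601 UTC datetime")
    if not isinstance(payload.get("environment"), str):
        errors.append("environment: expected a string")
    checks = payload.get("checks")
    if not isinstance(checks, dict):
        return errors + ["checks: expected an object"]
    if "database" not in checks:
        errors.append("checks.database: required")
    else:
        errors += _check_errors("database", checks["database"])
    if "redis" in checks:  # optional, as in the Zod schema
        errors += _check_errors("redis", checks["redis"])
    return errors
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that is wrong with a response in a single pass.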

Troubleshooting

OpenAPI Integration

Interactive API Documentation

The health endpoint is fully documented in our OpenAPI specification:

OpenAPI Specification

View the complete API documentation at /api-docs for interactive testing and detailed schemas.

OpenAPI Definition

Health Endpoint OpenAPI Spec
/status:
  get:
    summary: System Health Check
    description: |
      Returns comprehensive health status of the API and all its dependencies
      including database connectivity, cache systems, and external services.
    tags:
      - health
    operationId: getHealth
    responses:
      '200':
        description: System is healthy
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/SystemHealthResponse'
            examples:
              healthy_system:
                summary: All systems operational
                value:
                  status: healthy
                  timestamp: "2024-08-18T14:30:00Z"
                  environment: production
                  checks:
                    database:
                      status: healthy
                      response_time: "2.5ms"
      '503':
        description: System is unhealthy
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/SystemHealthResponse'
            examples:
              unhealthy_system:
                summary: Database connection failed
                value:
                  status: unhealthy
                  timestamp: "2024-08-18T14:30:00Z"
                  environment: production
                  checks:
                    database:
                      status: unhealthy
                      response_time: "5.001s"
                      error: "Connection timeout after 5 seconds"

API Testing

Visit the interactive documentation at:

http://localhost:8080/api-docs

Features:

  • ✅ Interactive request/response testing
  • ✅ Real-time schema validation
  • ✅ Response examples and status codes
  • ✅ Authentication handling

Import the OpenAPI spec into Postman:

# Download OpenAPI spec
curl http://localhost:8080/openapi.json > api-spec.json

# Import into Postman collection

Collection Features:

  • Pre-configured environments
  • Automated testing scripts
  • Response assertions

Basic Health Check
# Simple health check
curl -i http://localhost:8080/status

# With verbose output
curl -v \
  -H "Accept: application/json" \
  -H "User-Agent: HealthMonitor/1.0" \
  http://localhost:8080/status

# Check response time
curl -w "Response Time: %{time_total}s\n" \
  -o /dev/null -s \
  http://localhost:8080/status

HTTPie Commands
# Basic request
http GET localhost:8080/status

# With custom headers
http GET localhost:8080/status \
  Accept:application/json \
  User-Agent:HealthMonitor/1.0

# Print request and response headers and bodies
http --print=HhBb GET localhost:8080/status

Monitoring & Alerting

Metrics Collection

Set up comprehensive monitoring for the health endpoint:

Production Monitoring

These configurations are battle-tested in production environments serving millions of requests.

prometheus.yml
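scrape_configs:
  - job_name: 'rapidaigo-health'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 30s

  # Health check monitoring
  # Note: Prometheus expects the text exposition format from a scrape target.
  # /status returns JSON, so this job only works if the endpoint is probed
  # through an exporter (for example blackbox_exporter) or also serves metrics.
  - job_name: 'health-check'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/status'
    scrape_interval: 15s

    # Report every sample under a fixed instance label
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: 'rapidaigo-api'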

Key Metrics:

  • health_check_duration_seconds - Response time histogram
  • health_check_status - Current health status (0=unhealthy, 1=healthy)
  • component_health_status - Per-component health status
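
How the service exports these metrics is not shown here. As an illustration of what the two status gauges could look like on the wire, the sketch below renders a /status payload in the Prometheus text exposition format. The function name `health_to_metrics` and the `component` label name are assumptions made for this example.

```python
def health_to_metrics(payload: dict) -> str:
    """Render a /status payload as Prometheus text-format gauge samples."""
    overall = 1 if payload.get("status") == "healthy" else 0
    lines = [
        "# TYPE health_check_status gauge",
        f"health_check_status {overall}",
        "# TYPE component_health_status gauge",
    ]
    # One sample per component, distinguished by the "component" label.
    for name, check in sorted(payload.get("checks", {}).items()):
        value = 1 if check.get("status") == "healthy" else 0
        lines.append(f'component_health_status{{component="{name}"}} {value}')
    return "\n".join(lines) + "\n"
```

An adapter like this could sit between the JSON endpoint and a metrics endpoint, since Prometheus cannot parse the JSON body of /status directly.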

grafana-dashboard.json
{
  "dashboard": {
    "title": "RapidAI Go Health Dashboard",
    "panels": [
      {
        "title": "System Health Status",
        "type": "stat",
        "targets": [{
          "expr": "health_check_status",
          "legendFormat": "{{instance}}"
        }],
        "fieldConfig": {
          "defaults": {
            "thresholds": {
              "steps": [
                {"color": "red", "value": 0},
                {"color": "green", "value": 1}
              ]
            }
          }
        }
      },
      {
        "title": "Response Time Trends",
        "type": "graph",
        "targets": [{
          "expr": "histogram_quantile(0.95, sum(rate(health_check_duration_seconds_bucket[5m])) by (le))",
          "legendFormat": "p95"
        }]
      },
      {
        "title": "Component Health Matrix",
        "type": "heatmap",
        "targets": [{
          "expr": "component_health_status",
          "legendFormat": "{{component}}"
        }]
      }
    ]
  }
}

Dashboard Features:

  • Real-time health status indicators
  • Response time trends and percentiles
  • Component-level health matrix
  • Alert history and acknowledgments

pagerduty-alerts.yml
groups:
  - name: health_check_alerts
    rules:
      - alert: ServiceDown
        expr: health_check_status == 0
        for: 1m
        labels:
          severity: critical
          service: rapidaigo-api
        annotations:
          summary: "RapidAI Go API is down"
          description: "Health check failing for {{ $labels.instance }}"

      - alert: HighResponseTime
        expr: histogram_quantile(0.95, sum(rate(health_check_duration_seconds_bucket[5m])) by (le)) > 0.1
        for: 5m
        labels:
          severity: warning
          service: rapidaigo-api
        annotations:
          summary: "High response time detected"
          description: "95th percentile health check response time: {{ $value }}s"

      - alert: DatabaseConnectionFailed
        expr: component_health_status{component="database"} == 0
        for: 30s
        labels:
          severity: critical
          service: rapidaigo-api
          component: database
        annotations:
          summary: "Database connection failed"
          description: "Database health check failing"

health-monitor.sh
#!/bin/bash

# Health monitoring script
ENDPOINT="http://localhost:8080/status"
SLACK_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
LOG_FILE="/var/log/health-check.log"

send_alert() {
    local message="$1"
    local details="$2"
    local payload

    # Build the payload with jq so quotes in the text cannot break the JSON
    payload=$(jq -n --arg text "$message" --arg details "$details" '{
        text: $text,
        attachments: [{
            color: "danger",
            fields: [{title: "Details", value: $details, short: false}]
        }]
    }')

    curl -s -X POST "$SLACK_WEBHOOK" \
        -H 'Content-type: application/json' \
        --data "$payload" > /dev/null
}

check_health() {
    local body_file http_code timestamp status db_time
    body_file=$(mktemp)
    timestamp=$(date -Iseconds)

    # curl reports 000 as the status code when the connection itself fails
    http_code=$(curl -s --max-time 10 -w "%{http_code}" -o "$body_file" "$ENDPOINT")

    case "$http_code" in
        200)
            status=$(jq -r '.status' "$body_file")
            db_time=$(jq -r '.checks.database.response_time' "$body_file")
            echo "[$timestamp] Health: $status, DB Response: $db_time" >> "$LOG_FILE"
            if [[ "$status" != "healthy" ]]; then
                send_alert "⚠️ System unhealthy but responding" "$status"
            fi
            ;;
        503)
            # The endpoint answered, but a critical component is down
            echo "[$timestamp] UNHEALTHY: HTTP 503" >> "$LOG_FILE"
            send_alert "🚨 System unhealthy" "$(jq -c '.checks' "$body_file")"
            ;;
        *)
            echo "[$timestamp] ERROR: HTTP $http_code" >> "$LOG_FILE"
            send_alert "🚨 Health endpoint unreachable" "HTTP $http_code"
            ;;
    esac

    rm -f "$body_file"
}

# Run health check
check_health

Security Considerations

Rate Limiting

rate-limiting.go
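// Package name assumed; place this wherever the project keeps its middleware.
package middleware

import (
    "net/http"
    "time"

    "github.com/labstack/echo/v4"
    "golang.org/x/time/rate"
)

// HealthRateLimit limits health checks to a sustained 1 request per second
// with bursts of up to 10. The limiter is shared by all clients.
func HealthRateLimit() echo.MiddlewareFunc {
    limiter := rate.NewLimiter(rate.Every(time.Second), 10)

    return func(next echo.HandlerFunc) echo.HandlerFunc {
        return func(c echo.Context) error {
            if !limiter.Allow() {
                return c.JSON(http.StatusTooManyRequests, map[string]string{
                    "error": "Rate limit exceeded",
                })
            }
            return next(c)
        }
    }
}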
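
To make the two limiter parameters concrete: `rate.Every(time.Second)` refills one token per second, and the burst of 10 is the largest number of tokens the bucket can hold. The following is an illustrative Python sketch of the token bucket idea, not the `golang.org/x/time/rate` implementation. The `TokenBucket` class and its injectable `clock` parameter are hypothetical.

```python
import time


class TokenBucket:
    """Minimal token bucket: `rate` tokens per second, at most `burst` stored."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)  # the bucket starts full
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available and report whether it was granted."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With a rate of 1 and a burst of 10, the first ten back-to-back requests pass and the eleventh is rejected. After that, one more request is admitted for each second that elapses.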

🎉 You're All Set!

You now have a comprehensive understanding of the Health API. Start implementing health checks in your applications and set up monitoring for production readiness.