Unified Troubleshooting Guide - InnoQualis EQMS

This comprehensive troubleshooting guide consolidates all issue resolution steps for the InnoQualis Electronic Quality Management System (EQMS). It covers deployment issues, development problems, testing failures, authentication issues, and production troubleshooting.

Last Updated: October 24, 2025
Version: Phase 8 In Progress (Documentation Consolidation Complete)
Status: Production Ready

Production Issues

Containers failing healthchecks

Symptoms:

  • docker ps shows unhealthy state for backend or frontend
  • docker compose logs show repeated healthcheck failures

Checks:

  • Logs: docker compose -f docker/docker-compose.prod.yml logs -f backend
  • Health endpoint: curl http://localhost:8000/health
  • Ports open: ensure ports 8000 and 3000 are reachable locally

Fixes:

  • Verify env file path passed via --env-file docker/.env.prod
  • Rebuild images: docker compose -f docker/docker-compose.prod.yml --env-file docker/.env.prod build --no-cache
  • Ensure db is healthy before the backend starts (depends_on with condition: service_healthy)
  • Increase healthcheck retries/interval if the VM is slow to start
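To tell a slow start apart from a real failure, the health endpoint can be polled from the VM. The following is a minimal sketch, not part of the EQMS codebase: the function names and the default retry values are illustrative assumptions, and only the /health URL comes from this guide.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=3.0):
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, timeout, DNS failure, and similar


def wait_for_health(url, retries=10, interval=3.0, fetch=fetch_status, sleep=time.sleep):
    """Poll url until it returns 200.

    Returns the attempt number that succeeded, or 0 if every attempt failed.
    """
    for attempt in range(1, retries + 1):
        if fetch(url) == 200:
            return attempt
        if attempt < retries:
            sleep(interval)
    return 0
```

Calling wait_for_health("http://localhost:8000/health") right after docker compose up shows roughly how long the backend needs. If it succeeds only after several attempts, longer healthcheck retries or intervals are likely the right fix.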

Backend cannot connect to database

Symptoms:

  • Backend logs show connection refused or authentication errors
  • HTTP 500 on API endpoints

Checks:

  • DATABASE_URL matches POSTGRES_* settings in env
  • DB health: docker compose logs db; pg_isready in healthcheck should pass
  • psql test inside container:
    • docker exec -it eqms-db psql -U eqms_user -d eqms_db -c "select 1;"

Fixes:

  • Align POSTGRES_DB/USER/PASSWORD and DATABASE_URL
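  • Recreate the postgres_data volume if credentials were changed intentionally. The Postgres image applies POSTGRES_* only when it initializes an empty data directory, so an existing volume keeps the old credentials. Removing the volume deletes all database data; take a backup first:
    • docker compose -f docker/docker-compose.prod.yml --env-file docker/.env.prod down
    • docker volume rm docker_postgres_data (exact name may differ: use docker volume ls)
    • docker compose -f docker/docker-compose.prod.yml --env-file docker/.env.prod up -d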
  • Avoid accidental whitespace or URL encoding issues in DATABASE_URL
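The alignment check above can be automated. This is a sketch under assumptions: it expects the env values in a plain dict, assumes DATABASE_URL has the usual scheme://user:password@host:port/dbname shape, and the function name is illustrative. It reports only which setting disagrees and never prints the password.

```python
from urllib.parse import unquote, urlsplit


def database_url_problems(env):
    """Compare DATABASE_URL with POSTGRES_* values and list any mismatches."""
    raw = env.get("DATABASE_URL", "")
    problems = []
    if raw != raw.strip() or " " in raw:
        problems.append("DATABASE_URL contains whitespace")
    parts = urlsplit(raw.strip())
    # urlsplit leaves percent-escapes in place, so decode before comparing.
    found = {
        "POSTGRES_USER": unquote(parts.username or ""),
        "POSTGRES_PASSWORD": unquote(parts.password or ""),
        "POSTGRES_DB": unquote(parts.path.lstrip("/")),
    }
    for key, value in found.items():
        if value != env.get(key, ""):
            problems.append(f"{key} does not match DATABASE_URL")
    return problems
```

An empty list means the URL and the POSTGRES_* settings agree. A password containing characters such as @ or / must be percent-encoded inside the URL but not in POSTGRES_PASSWORD, which is why the comparison decodes the URL parts first.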

CORS errors in browser

Symptoms:

  • Browser console: Access-Control-Allow-Origin missing
  • Requests blocked

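Checks:

  • BACKEND_CORS_ORIGINS in the env file lists the exact origin (scheme, host, and port) the browser loads the frontend from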

Fixes:

  • Update BACKEND_CORS_ORIGINS in env file (comma-separated if multiple)
  • Restart backend: docker compose -f docker/docker-compose.prod.yml --env-file docker/.env.prod up -d backend
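A frequent cause is an entry that almost matches, for example one with a trailing slash or the wrong port. The sketch below checks a browser origin against the comma-separated setting. It assumes the backend compares origins by exact string match, which this guide does not confirm, and the function name is illustrative.

```python
def cors_origin_problems(origin, setting):
    """Return reasons why origin would be rejected by a comma-separated allow list."""
    entries = [entry.strip() for entry in setting.split(",") if entry.strip()]
    if "*" in entries or origin in entries:
        return []
    problems = []
    for entry in entries:
        # Browsers send the Origin header without a trailing slash.
        if entry.rstrip("/") == origin:
            problems.append(f"{entry!r} has a trailing slash; browsers send {origin!r}")
    if not problems:
        problems.append(f"{origin!r} is not listed")
    return problems
```

Pass the origin exactly as shown in the browser's address bar, without a path, for example http://localhost:3000.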

Email not sending (SendGrid)

Symptoms:

  • No email received; backend logs show SMTP auth or TLS errors

Checks:

  • SMTP_HOST=smtp.sendgrid.net, SMTP_PORT=587, SMTP_USER=apikey, SMTP_PASSWORD set to SendGrid API key
  • SMTP_FROM domain is authorized in SendGrid (SPF/DKIM for non-sandbox)

Fixes:

  • Correct credentials and from address
  • Test outbound connectivity from VM: nc -vz smtp.sendgrid.net 587
  • Use a sandbox key in staging to avoid production sends
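The settings and the login can be checked without sending a message. This is a sketch using Python's standard smtplib. The function names are illustrative, the env values are assumed to be available as a dict of strings, and the expected values are the ones listed in the checks above.

```python
import smtplib


def smtp_setting_problems(env):
    """Compare SMTP_* values with the SendGrid settings listed in this guide."""
    expected = {"SMTP_HOST": "smtp.sendgrid.net", "SMTP_PORT": "587", "SMTP_USER": "apikey"}
    problems = [
        f"{key} should be {value} according to this guide"
        for key, value in expected.items()
        if env.get(key, "").strip() != value
    ]
    if not env.get("SMTP_PASSWORD", "").strip():
        problems.append("SMTP_PASSWORD is empty")
    return problems


def smtp_login_ok(env, timeout=10.0):
    """Open a STARTTLS session and log in, without sending any mail."""
    try:
        with smtplib.SMTP(env["SMTP_HOST"], int(env["SMTP_PORT"]), timeout=timeout) as server:
            server.starttls()
            server.login(env["SMTP_USER"], env["SMTP_PASSWORD"])
        return True
    except (smtplib.SMTPException, OSError) as exc:
        print(f"SMTP check failed: {exc}")
        return False
```

Run smtp_setting_problems first. If it returns an empty list but smtp_login_ok still fails, the printed error usually distinguishes a rejected key from a blocked outbound port.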

File uploads/downloads failing

Symptoms:

  • 500 on upload; 404 or permission denied on download

Checks:

  • Volume mounted: backend_uploads volume mapped to /app/uploads in backend
  • Directory permissions within container (writable by app user)
  • Disk space on VM: df -h

Fixes:

  • Ensure volume exists and container user has write permission
  • Recreate the backend container so it picks up the correct mount:
    • docker compose -f docker/docker-compose.prod.yml --env-file docker/.env.prod up -d --force-recreate backend
  • Verify file size and type limits (if enforced)
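The permission and disk-space checks can be combined into one script that runs inside the backend container as the app user. This is a sketch; the function name and the 100 MB threshold are arbitrary assumptions, not EQMS settings.

```python
import os
import shutil
import tempfile


def upload_dir_problems(path, min_free_bytes=100 * 1024 * 1024):
    """Check that path exists, is writable by the current user, and has free space."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    try:
        # Creating a real file is a more reliable write test than os.access().
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"cannot write to {path}: {exc}")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free on the filesystem holding {path}")
    return problems
```

For the mount described above, call upload_dir_problems("/app/uploads"). The temporary file is removed automatically, so the check leaves nothing behind.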

High CPU or memory usage

Symptoms:

  • Containers OOMKilled or slow responses

Checks:

  • docker stats
  • Log verbosity (LOG_LEVEL) set too high?
  • BACKEND_WORKERS too high for small VM?

Fixes:

  • Reduce BACKEND_WORKERS to 2 on small VM
  • Set LOG_LEVEL=info
  • Add compose resource limits (deploy resources or mem_limit/cpu_shares if supported)

Login/authentication failures

Symptoms:

  • 401 Unauthorized despite correct credentials

Checks:

  • SECRET_KEY set and consistent across restarts
  • Token not expired (client clock skew)
  • Role assignment for user

Fixes:

  • Rotate SECRET_KEY only during planned downtime; rotation invalidates existing tokens and forces every user to log in again
  • Check user record and role
  • Review backend logs for specific cause
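To rule out expiry or clock skew, the token's expiry time can be read directly. The sketch below assumes the backend issues JWTs with a standard exp claim; this guide does not state the token format, so treat that as an assumption. It decodes the payload without verifying the signature, so it is suitable for diagnosis only, never for authorization.

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the 'exp' claim of a JWT (seconds since the epoch), or None if absent."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64 padding that JWTs strip
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims.get("exp")


def is_expired(token, now=None, leeway=0.0):
    """True if the token's exp is more than leeway seconds in the past."""
    exp = jwt_expiry(token)
    if exp is None:
        return False
    now = time.time() if now is None else now
    return now > exp + leeway
```

If is_expired returns True on the client machine but False on the server for the same token, the two clocks disagree and the client's time sync is the likely culprit.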

AI assistant errors (OpenAI)

Symptoms:

  • 500 AI service error or timeout

Checks:

  • OPENAI_API_KEY defined
  • Outbound internet access from VM
  • Rate limits exceeded

Fixes:

  • Set valid API key or leave empty to disable AI features
  • Add retry/backoff client-side
  • Consider disabling AI during pilot if not required
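The retry/backoff fix can be sketched as a small wrapper with exponential delays and jitter. This is a generic pattern, not the EQMS implementation: the function name, defaults, and the choice of which exceptions to retry are all assumptions to adapt to the actual client.

```python
import random
import time


def call_with_backoff(func, retries=4, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff.

    The delay doubles after each failure (1s, 2s, 4s, ...) up to max_delay,
    plus random jitter so that many clients do not retry in lockstep.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

In practice retry_on should be narrowed to timeout and rate-limit errors, since retrying an invalid-key error only delays the failure.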

Volumes/permissions issues

Symptoms:

  • Backend cannot write uploads or chroma db

Checks:

  • Volume mounts defined in docker/docker-compose.prod.yml
  • Container user permissions on /app/uploads and /app/chroma_db

Fixes:

  • Adjust permissions inside container:
    • docker exec -it eqms-backend sh -c "mkdir -p /app/uploads /app/chroma_db && chmod -R 775 /app/uploads /app/chroma_db"
  • Ensure host filesystem isn’t mounted read-only

Backup/restore problems

Symptoms:

  • pg_dump fails, the dump file is empty, or restore reports errors

Checks:

  • The cron user has write permission on the dump location
  • pg_dump available inside container (official Postgres image includes it)
  • Database size and disk space

Fixes:

  • Write dumps to a dedicated directory with sufficient space
  • Test restore on staging before relying on backups
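An empty or truncated dump can be caught right after the backup job runs. The sketch below assumes uncompressed plain-SQL dumps (the pg_dump default format), which end with a "PostgreSQL database dump complete" comment; custom-format or compressed dumps need a different check. The function name is illustrative, and passing this check does not replace a test restore.

```python
import os

DUMP_END_MARKER = b"PostgreSQL database dump complete"


def dump_problems(path, tail_bytes=4096):
    """Check that a plain-SQL dump exists, is non-empty, and ends with the completion marker."""
    if not os.path.isfile(path):
        return [f"{path} does not exist"]
    size = os.path.getsize(path)
    if size == 0:
        return [f"{path} is empty"]
    with open(path, "rb") as handle:
        # Only the end of the file matters, so avoid reading a large dump in full.
        handle.seek(max(0, size - tail_bytes))
        tail = handle.read()
    if DUMP_END_MARKER not in tail:
        return [f"{path} has no end-of-dump marker; the dump may be truncated"]
    return []
```

A missing marker typically means pg_dump was interrupted or the disk filled up mid-write.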

Frontend cannot reach backend

Symptoms:

  • 502 responses or failed fetch errors on frontend pages

Checks:

  • Frontend points to backend via relative /api or direct host:port
  • In Next.js, ensure rewrites or environment are correct in production
  • Compose: frontend depends_on backend with condition: service_healthy

Fixes:

  • If hosting behind a reverse proxy later, update NEXT_PUBLIC_API_BASE_URL and CORS accordingly
  • For pilot, ensure frontend requests go to http://<host>:8000