Unified Troubleshooting Guide - InnoQualis EQMS
This comprehensive troubleshooting guide consolidates all issue resolution steps for the InnoQualis Electronic Quality Management System (EQMS). It covers deployment issues, development problems, testing failures, authentication issues, and production troubleshooting.
Last Updated: October 24, 2025
Version: Phase 8 In Progress (Documentation Consolidation Complete)
Status: Production Ready
Quick Reference
Emergency Issues
Development Issues
Production Issues
Containers failing healthchecks
Symptoms:
- docker ps shows (unhealthy) in the STATUS column for the backend or frontend container
- docker compose logs show repeated healthcheck failures
Checks:
- Logs: docker compose -f docker/docker-compose.prod.yml logs -f backend
- Health endpoint: curl http://localhost:8000/health
- Ports open: ensure 8000 (backend) and 3000 (frontend) are reachable locally
Fixes:
- Verify the env file is passed with --env-file docker/.env.prod
- Rebuild images: docker compose -f docker/docker-compose.prod.yml --env-file docker/.env.prod build --no-cache
- Ensure the db service is healthy first (depends_on with condition: service_healthy)
- Increase healthcheck retries/interval if the VM is slow to start
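To see how long the backend actually takes to become healthy (and therefore how many retries the healthcheck needs), the health endpoint can be polled from the VM. This is a minimal sketch using only the Python standard library; `wait_for_health` and its default retry, interval and timeout values are illustrative and not part of the EQMS codebase:

```python
import time
import urllib.request


def wait_for_health(url: str, retries: int = 10, interval: float = 3.0,
                    timeout: float = 2.0) -> bool:
    """Poll a health endpoint until it returns HTTP 200 or retries run out."""
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    print(f"attempt {attempt}: healthy")
                    return True
                print(f"attempt {attempt}: HTTP {resp.status}")
        except OSError as exc:
            # URLError, HTTPError (4xx/5xx) and socket timeouts are all
            # OSError subclasses, so refused connections also land here.
            print(f"attempt {attempt}: {exc}")
        if attempt < retries:
            time.sleep(interval)
    return False
```

For example, wait_for_health("http://localhost:8000/health", retries=20, interval=5) returns True once /health answers 200. If it only succeeds after many attempts, raise the Compose healthcheck retries or start_period to match.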
Backend cannot connect to database
Symptoms:
- Backend logs show connection refused or authentication errors
- HTTP 500 on API endpoints
Checks:
- DATABASE_URL matches the POSTGRES_* settings in the env file
- DB health: docker compose logs db; the pg_isready healthcheck should pass
- psql test inside the container:
  - docker exec -it eqms-db psql -U eqms_user -d eqms_db -c "select 1;"
Fixes:
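- Align POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD and DATABASE_URL
- If the credentials were changed intentionally, recreate the postgres_data volume: the official Postgres image applies POSTGRES_* only when it initializes an empty data directory. This deletes all database data, so take a backup first:
  - docker compose down
  - docker volume rm docker_postgres_data (the exact name may differ; check with docker volume ls)
  - docker compose up -d
- Check DATABASE_URL for stray whitespace, and percent-encode special characters in the password (for example @ as %40)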
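The DATABASE_URL checks above can be automated. This sketch assumes a standard postgresql://user:password@host:port/dbname URL and that you load the env file values into a plain dict yourself; `check_database_url` is a hypothetical helper. It also flags a localhost host, since inside the backend container localhost is the backend itself, not the db service:

```python
from urllib.parse import unquote, urlsplit


def check_database_url(env: dict) -> list:
    """Compare DATABASE_URL with the POSTGRES_* values; return found problems."""
    problems = []
    raw = env.get("DATABASE_URL", "")
    if raw != raw.strip():
        problems.append("DATABASE_URL has leading/trailing whitespace")
    url = urlsplit(raw.strip())
    # urlsplit leaves user and password percent-encoded; decode them so they
    # compare equal to the plain POSTGRES_* values.
    from_url = {
        "POSTGRES_USER": unquote(url.username or ""),
        "POSTGRES_PASSWORD": unquote(url.password or ""),
        "POSTGRES_DB": url.path.lstrip("/"),
    }
    for key, value in from_url.items():
        if key in env and env[key] != value:
            problems.append(f"{key} does not match DATABASE_URL")
    # Inside the backend container, localhost is the backend, not Postgres.
    if url.hostname in ("localhost", "127.0.0.1"):
        problems.append("DATABASE_URL host is localhost; use the Compose service name (db)")
    return problems
```

An empty list means the URL and the POSTGRES_* values agree; a password mismatch here usually explains authentication errors in the backend logs.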
CORS errors in browser
Symptoms:
- Browser console: Access-Control-Allow-Origin missing
- Requests blocked
Checks:
- BACKEND_CORS_ORIGINS includes your exact frontend origin, scheme and port included, with no trailing slash (http://host:3000 or https://your-domain)
- FastAPI CORS middleware configured (see backend/app/main.py or equivalent)
Fixes:
- Update BACKEND_CORS_ORIGINS in the env file (comma-separated if multiple)
- Restart backend: docker compose up -d backend
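Most CORS failures come from an origin that differs from the configured one only by scheme, port or a trailing slash. A minimal sketch for checking that, assuming the backend passes the parsed list to Starlette/FastAPI's CORSMiddleware as allow_origins (which compares origins by exact string match); `parse_cors_origins` and `diagnose_origin` are hypothetical helpers:

```python
from urllib.parse import urlsplit


def parse_cors_origins(value: str) -> list:
    """Split a comma-separated BACKEND_CORS_ORIGINS value into clean origins."""
    # Browsers send Origin as scheme://host[:port] with no path or trailing
    # slash, so strip whitespace and trailing slashes from each entry.
    return [o.strip().rstrip("/") for o in value.split(",") if o.strip()]


def diagnose_origin(origin: str, allowed: list) -> str:
    """Explain whether a browser Origin would pass an exact-match CORS check."""
    if origin in allowed:
        return "allowed"
    host = urlsplit(origin).hostname
    near = [a for a in allowed if urlsplit(a).hostname == host]
    if near:
        return f"blocked: same host but different scheme or port than {near}"
    return "blocked: add this origin to BACKEND_CORS_ORIGINS"
```

Copy the Origin value from the failing request in the browser's network tab and pass it to diagnose_origin together with the parsed env value.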
Email not sending (SendGrid)
Symptoms:
- No email received; backend logs show SMTP auth or TLS errors
Checks:
- SMTP_HOST=smtp.sendgrid.net, SMTP_PORT=587, SMTP_USER=apikey, SMTP_PASSWORD set to SendGrid API key
- The SMTP_FROM address or its domain is verified in SendGrid (domain authentication with SPF/DKIM for non-sandbox sending)
Fixes:
- Correct credentials and from address
- Test outbound connectivity from VM: nc -vz smtp.sendgrid.net 587
- Use a sandbox key in staging to avoid production sends
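To separate configuration mistakes from network or TLS problems, the settings can be validated first and then exercised with a single test message. A sketch using Python's smtplib, assuming the SMTP_* values are available as a dict; `check_smtp_settings`, `send_test_email` and the message text are illustrative:

```python
import smtplib
from email.message import EmailMessage

# Values this guide expects for the SendGrid SMTP relay.
EXPECTED = {"SMTP_HOST": "smtp.sendgrid.net", "SMTP_PORT": "587", "SMTP_USER": "apikey"}


def check_smtp_settings(env: dict) -> list:
    """Return configuration problems before attempting a real send."""
    problems = [f"{key} should be {value!r}" for key, value in EXPECTED.items()
                if env.get(key) != value]
    if not env.get("SMTP_PASSWORD"):
        problems.append("SMTP_PASSWORD is empty; set it to the SendGrid API key")
    if "@" not in env.get("SMTP_FROM", ""):
        problems.append("SMTP_FROM is missing or not an email address")
    return problems


def send_test_email(env: dict, to_addr: str) -> None:
    """Send one message over STARTTLS; raises smtplib/ssl errors on failure."""
    msg = EmailMessage()
    msg["Subject"] = "EQMS SMTP test"
    msg["From"] = env["SMTP_FROM"]
    msg["To"] = to_addr
    msg.set_content("Test message sent while troubleshooting EQMS email.")
    with smtplib.SMTP(env["SMTP_HOST"], int(env["SMTP_PORT"]), timeout=15) as smtp:
        smtp.starttls()  # port 587 upgrades to TLS before authenticating
        smtp.login(env["SMTP_USER"], env["SMTP_PASSWORD"])
        smtp.send_message(msg)
```

An smtplib.SMTPAuthenticationError from send_test_email points at the API key; an ssl error points at TLS or an intercepting proxy.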
File uploads/downloads failing
Symptoms:
- 500 on upload; 404 or permission denied on download
Checks:
- Volume mounted: the backend_uploads volume is mapped to /app/uploads in the backend container
- Directory permissions inside the container (writable by the app user)
- Disk space on VM: df -h
Fixes:
- Ensure the volume exists and the container user has write permission
- Recreate the backend container to pick up the correct mount:
  - docker compose up -d --force-recreate backend
- Verify file size and type limits (if enforced)
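The volume, permission and disk-space checks can be combined into one probe run inside the backend container (for example after copying it in with docker cp), assuming the image has python3 available since it runs FastAPI. `check_upload_dir` and the 500 MB free-space threshold are hypothetical; adjust them to your upload sizes:

```python
import os
import shutil
import tempfile


def check_upload_dir(path: str = "/app/uploads", min_free_mb: int = 500) -> list:
    """Probe an upload directory for existence, writability and free space."""
    if not os.path.isdir(path):
        return [f"{path} does not exist (is the volume mounted?)"]
    problems = []
    try:
        # Create and auto-delete a real file: this catches ownership problems
        # and read-only mounts that permission bits alone can hide.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"cannot write to {path}: {exc}")
    free_mb = shutil.disk_usage(path).free // (1024 * 1024)
    if free_mb < min_free_mb:
        problems.append(f"only {free_mb} MB free under {path}")
    return problems
```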
High CPU or memory usage
Symptoms:
- Containers OOMKilled or slow responses
Checks:
- docker stats
- LOG_LEVEL set to a verbose level such as debug?
- BACKEND_WORKERS too high for a small VM?
Fixes:
- Reduce BACKEND_WORKERS to 2 on a small VM
- Set LOG_LEVEL=info
- Add Compose resource limits (deploy.resources.limits, or service-level mem_limit/cpus where supported)
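To spot which container is heavy without watching docker stats interactively, a single snapshot can be parsed. This sketch relies on docker stats --no-stream with a Go-template --format; the 80% CPU and 85% memory thresholds are illustrative, and CPUPerc can exceed 100% on multi-core hosts because it is relative to one core:

```python
import subprocess

FORMAT = "{{.Name}}\t{{.CPUPerc}}\t{{.MemPerc}}"


def read_docker_stats() -> str:
    """One snapshot of docker stats (no streaming), one container per line."""
    return subprocess.run(["docker", "stats", "--no-stream", "--format", FORMAT],
                          capture_output=True, text=True, check=True).stdout


def _pct(value: str):
    """'12.5%' -> 12.5; '--' (no data yet) -> None."""
    try:
        return float(value.strip().rstrip("%"))
    except ValueError:
        return None


def parse_stats(output: str) -> list:
    """Turn tab-separated stats lines into (name, cpu%, mem%) tuples."""
    rows = []
    for line in output.strip().splitlines():
        name, cpu, mem = line.split("\t")
        rows.append((name, _pct(cpu), _pct(mem)))
    return rows


def flag_heavy(rows: list, cpu_limit: float = 80.0, mem_limit: float = 85.0) -> list:
    """Names of containers above the (illustrative) CPU or memory thresholds."""
    return [name for name, cpu, mem in rows
            if (cpu or 0) > cpu_limit or (mem or 0) > mem_limit]
```

Typical use on the VM: flag_heavy(parse_stats(read_docker_stats())).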
Login/authentication failures
Symptoms:
- 401 Unauthorized despite correct credentials
Checks:
- SECRET_KEY is set and stays the same across restarts
- Token has not expired (watch for clock skew between client and server)
- User has the required role assigned
Fixes:
- Rotate SECRET_KEY only during planned downtime: rotation invalidates existing sessions and forces all users to log in again
- Check the user record and its assigned role
- Review the backend logs for the specific cause
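If the backend issues JWT bearer tokens with exp and iat claims (common in FastAPI setups, but an assumption here), the expiry and clock-skew checks can be done by decoding the token payload. This sketch deliberately does NOT verify the signature and is for diagnostics only; `jwt_claims` and `token_status` are hypothetical helpers:

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (diagnostics only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # JWTs strip base64 padding; restore it
    return json.loads(base64.urlsafe_b64decode(payload))


def token_status(token: str, now: Optional[float] = None) -> list:
    """Report expiry and obvious clock skew, judged against this machine's clock."""
    claims = jwt_claims(token)
    now = time.time() if now is None else now
    notes = []
    iat = claims.get("iat")
    if iat is not None and iat > now:
        notes.append(f"iat is {int(iat - now)} s in the future: clocks are skewed")
    exp = claims.get("exp")
    if exp is None:
        notes.append("no exp claim")
    elif now > exp:
        notes.append(f"expired {int(now - exp)} s ago")
    else:
        notes.append(f"valid for another {int(exp - now)} s")
    return notes
```

Run it on the backend host with a token copied from the Authorization header, so the comparison uses the server's clock.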
AI assistant errors (OpenAI)
Symptoms:
- HTTP 500 (AI service error) or timeouts
Checks:
- OPENAI_API_KEY is set
- Outbound internet access from the VM
- OpenAI rate limits exceeded (the API responds with HTTP 429)
Fixes:
- Set a valid API key, or leave it empty to disable AI features
- Add client-side retries with exponential backoff
- Consider disabling AI during the pilot if it is not required
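Retries with exponential backoff can be sketched generically. This version uses "full jitter" (a random wait up to base_delay * 2**attempt, capped at max_delay); `with_backoff` is a hypothetical helper, and which exceptions to retry depends on the HTTP or OpenAI client in use (rate-limit and timeout errors are the usual candidates):

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(call: Callable[[], T], retries: int = 4, base_delay: float = 1.0,
                 max_delay: float = 20.0,
                 retry_on: tuple = (TimeoutError, ConnectionError),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on retry_on errors with jittered exponential waits."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Full jitter spreads retries out so clients do not retry in lockstep.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
```

Wrap the AI request in a zero-argument callable, for example with_backoff(lambda: ask_assistant(prompt)), where ask_assistant stands in for whatever function makes the request.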
Volumes/permissions issues
Symptoms:
- Backend cannot write to /app/uploads or the Chroma database (/app/chroma_db)
Checks:
- Volume mounts defined in docker/docker-compose.prod.yml
- Container user permissions on /app/uploads and /app/chroma_db
Fixes:
- Adjust permissions inside the container:
  - docker exec -it eqms-backend sh -c "mkdir -p /app/uploads /app/chroma_db && chmod -R 775 /app/uploads /app/chroma_db"
- Ensure the host filesystem isn't mounted read-only
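Before changing permissions, it helps to see who owns each directory and whether the mount itself is read-only. A sketch for Linux containers using os.stat and os.statvfs (ST_RDONLY); `describe_dir` is a hypothetical helper:

```python
import os
import stat


def describe_dir(path: str) -> dict:
    """Ownership, mode and mount flags that decide whether the app can write."""
    st = os.stat(path)
    vfs = os.statvfs(path)
    return {
        "path": path,
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "mode": stat.filemode(st.st_mode),  # e.g. drwxrwxr-x
        "read_only_mount": bool(vfs.f_flag & os.ST_RDONLY),
        "writable_by_me": os.access(path, os.W_OK),
        "my_uid": os.getuid(),
    }
```

Run it for /app/uploads and /app/chroma_db inside the backend container. read_only_mount=True means the fix is in the mount (a :ro suffix in Compose or a read-only host filesystem); writable_by_me=False with a different owner_uid points to ownership, which the chmod above may not be enough to fix.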
Backup/restore problems
Symptoms:
- pg_dump fails, the dump file is empty, or restore errors
Checks:
- The cron job can write to the dump location
- pg_dump is available inside the container (the official Postgres image includes it)
- Database size versus free disk space
Fixes:
- Write dumps to a dedicated directory with sufficient space
- Test restore on staging before relying on backups
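A backup job can refuse to keep empty or truncated dumps. This sketch runs pg_dump in the eqms-db container in custom format (-Fc), whose files start with the bytes PGDMP, and deletes the output if the dump fails or lacks that header. `dump_database` and `verify_dump` are hypothetical; the container, user and database names are the ones used elsewhere in this guide:

```python
import datetime
import os
import subprocess

PG_CUSTOM_MAGIC = b"PGDMP"  # custom-format (-Fc) dumps start with these bytes


def verify_dump(path: str) -> None:
    """Fail loudly on the 'empty dump' symptom instead of keeping a bad file."""
    with open(path, "rb") as f:
        head = f.read(len(PG_CUSTOM_MAGIC))
    if head != PG_CUSTOM_MAGIC:
        raise RuntimeError(f"{path} is empty or not a pg_dump custom-format file")


def dump_database(out_dir: str, container: str = "eqms-db",
                  user: str = "eqms_user", db: str = "eqms_db") -> str:
    """Run pg_dump inside the db container and write a timestamped dump file."""
    os.makedirs(out_dir, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    path = os.path.join(out_dir, f"{db}-{stamp}.dump")
    cmd = ["docker", "exec", container, "pg_dump", "-U", user, "-d", db, "-Fc"]
    try:
        with open(path, "wb") as out:
            subprocess.run(cmd, stdout=out, check=True)
        verify_dump(path)
    except (subprocess.CalledProcessError, RuntimeError, OSError):
        if os.path.exists(path):
            os.remove(path)  # do not leave a partial or empty dump behind
        raise
    return path
```

pg_restore --list <file> reads the archive's table of contents without touching a database, which is a quick sanity check before the full restore test on staging.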
Frontend cannot reach backend
Symptoms:
- 502 errors or failed fetches from frontend pages
Checks:
- Whether the frontend calls the backend through a relative /api path or directly at host:port
- In Next.js, ensure the rewrites or environment variables are correct for the production build
- Compose: frontend depends_on backend with condition: service_healthy
Fixes:
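- If hosting behind a reverse proxy later, update NEXT_PUBLIC_API_BASE_URL and CORS accordingly; NEXT_PUBLIC_* variables are inlined into the client bundle at build time, so rebuild the frontend image after changing them
- For the pilot, ensure frontend requests go to http://<host>:8000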
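A frequent cause of failed fetches is a NEXT_PUBLIC_API_BASE_URL that only works on the VM: localhost resolves to the user's own machine in the browser, and the Compose service name backend does not resolve outside the Compose network. A sketch that flags those values; `check_api_base_url` and the hostname set are illustrative:

```python
from urllib.parse import urlsplit

# Hostnames that only resolve on the VM or inside the Compose network,
# not from users' browsers on other machines.
NOT_FROM_BROWSER = {"localhost", "127.0.0.1", "0.0.0.0", "backend"}


def check_api_base_url(url: str) -> list:
    """Flag NEXT_PUBLIC_API_BASE_URL values a remote browser cannot reach."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append("missing scheme: use http://<host>:8000 or https://...")
    elif parts.hostname in NOT_FROM_BROWSER:
        problems.append(f"{parts.hostname} only resolves on the VM or inside Compose")
    return problems
```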