This repository demonstrates a production-style enterprise DevOps platform architecture built on AWS EKS using Infrastructure-as-Code, CI/CD automation, observability tooling, and disaster-recovery backup workflows.
The platform provisions Kubernetes infrastructure and deploys:
- MySQL (stateful backend)
- Keycloak (identity provider)
- Prometheus + Grafana (observability stack)
- Custom web application (Helm-based deployment)
- Automated DB backup workflows
- Cluster health monitoring utilities
The design reflects real-world Platform Engineering and SRE practices.
flowchart LR
Dev[Developer Push] --> GH[GitHub Repo]
GH --> GHA[GitHub Actions CI/CD]
GHA --> TF[Terraform]
TF --> AWS[AWS VPC + EKS Cluster]
GHA --> HELM[Helm Deployments]
HELM --> MYSQL[MySQL]
HELM --> KEYCLOAK[Keycloak]
HELM --> PROM[Prometheus]
HELM --> GRAFANA[Grafana]
HELM --> WEBAPP[Web Application]
MYSQL --> BACKUP[Automated Backup Script]
AWS --> MONITOR[Cluster Health Monitor]
| Layer | Tools |
|---|---|
| Cloud | AWS |
| IaC | Terraform |
| Container Orchestration | Kubernetes (EKS) |
| Package Manager | Helm |
| CI/CD | GitHub Actions |
| Monitoring | Prometheus + Grafana |
| Identity | Keycloak |
| Backup Automation | Python |
| Observability Scripts | Python |
enterprise-devops-platform/
│
├── terraform/ # VPC + EKS infrastructure provisioning
├── helm/ # Platform service deployment scripts
├── .github/workflows/ # CI/CD automation pipelines
├── backup/ # MySQL backup automation
├── monitoring/ # Kubernetes health monitoring toolkit
└── README.md
cd terraform
terraform init
terraform apply
Creates:
- Multi-AZ VPC
- Private/Public subnets
- Managed EKS cluster
- Managed node groups, with IRSA (IAM Roles for Service Accounts) enabled on the cluster
aws eks update-kubeconfig \
  --region ap-south-1 \
  --name enterprise-platform-cluster
Verify:
kubectl get nodes
bash helm/deploy.sh
Deploys:
- MySQL
- Keycloak
- Prometheus
- Grafana
- Web application
python monitoring/cluster_monitor.py
Detects:
- unhealthy pods
- node readiness issues
- restart loops
- PVC visibility
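The pod-level checks listed above can be sketched as follows. This is a minimal illustration, not the contents of `monitoring/cluster_monitor.py`: it assumes the input is a dict shaped like the output of `kubectl get pods -A -o json`, and the names `find_problem_pods`, `RESTART_THRESHOLD`, and `SAMPLE_POD_LIST` are hypothetical. Node readiness and PVC checks are not covered here.

```python
from typing import Any

# Hypothetical cutoff: a container restarted this many times is treated as looping.
RESTART_THRESHOLD = 5


def find_problem_pods(
    pod_list: dict[str, Any], restart_threshold: int = RESTART_THRESHOLD
) -> list[tuple[str, str]]:
    """Return (namespace/name, reason) pairs for pods that look unhealthy."""
    problems: list[tuple[str, str]] = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        name = f'{meta.get("namespace", "default")}/{meta.get("name", "?")}'
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")

        # Anything other than Running or Succeeded (Pending, Failed, Unknown) is flagged.
        if phase not in ("Running", "Succeeded"):
            problems.append((name, f"phase={phase}"))
            continue

        for cs in status.get("containerStatuses", []):
            restarts = cs.get("restartCount", 0)
            if restarts >= restart_threshold:
                problems.append((name, f"restarts={restarts}"))
            elif phase == "Running" and not cs.get("ready", False):
                problems.append((name, "container not ready"))
    return problems


# Illustrative input only; the pod names are made up.
SAMPLE_POD_LIST = {
    "items": [
        {
            "metadata": {"namespace": "default", "name": "web-1"},
            "status": {
                "phase": "Running",
                "containerStatuses": [{"ready": True, "restartCount": 0}],
            },
        },
        {
            "metadata": {"namespace": "default", "name": "mysql-0"},
            "status": {
                "phase": "Running",
                "containerStatuses": [{"ready": False, "restartCount": 12}],
            },
        },
        {
            "metadata": {"namespace": "auth", "name": "keycloak-0"},
            "status": {"phase": "Pending"},
        },
    ]
}

if __name__ == "__main__":
    for pod_name, reason in find_problem_pods(SAMPLE_POD_LIST):
        print(f"{pod_name}: {reason}")
```

Working on plain dicts keeps the logic testable without a live cluster; a real script would feed it data from the Kubernetes API or from `kubectl`.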
python backup/backup.py
Performs:
- MySQL logical dumps
- Cassandra snapshot support (extendable)
- checksum validation
- S3 archival (optional extension)
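Checksum validation for a dump file might look like the sketch below. It assumes SHA-256 and a sidecar file named `<dump>.sha256`; the actual algorithm and file layout used by `backup/backup.py` are not stated here, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large dumps are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(dump: Path) -> Path:
    """Write '<hex digest>  <file name>' next to the dump and return the sidecar path."""
    sidecar = dump.with_name(dump.name + ".sha256")
    sidecar.write_text(f"{sha256_of(dump)}  {dump.name}\n")
    return sidecar


def verify_checksum(dump: Path) -> bool:
    """Recompute the digest and compare it with the one recorded in the sidecar."""
    sidecar = dump.with_name(dump.name + ".sha256")
    expected = sidecar.read_text().split()[0]
    return sha256_of(dump) == expected
```

Calling `write_checksum` right after the dump completes and `verify_checksum` before any restore (or after an upload round trip) catches truncated or corrupted files. The sidecar line uses the same `digest  filename` layout that `sha256sum` prints.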
The Terraform remote state configuration supports:
- S3 backend storage
- DynamoDB locking
- encrypted state management
- multi-environment structure
Example:
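terraform {
  backend "s3" {
    bucket  = "enterprise-devops-tf-state"
    key     = "platform/eks/terraform.tfstate"
    region  = "ap-south-1"
    encrypt = true # server-side encryption of the state object
    # DynamoDB locking is configured via the dynamodb_table argument (not shown here).
  }
}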
Prometheus collects:
- node metrics
- pod metrics
- container metrics
- API server metrics
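Collected metrics can be read back through the Prometheus HTTP API (`/api/v1/query`). The sketch below builds an instant-query URL and flattens the JSON response into a plain mapping. The base URL (for example a local port-forward on `http://localhost:9090`) and the helper names are assumptions for illustration; the code makes no network call itself.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload: dict) -> dict[str, float]:
    """Map each series in an instant-vector response to its current value.

    Series are keyed by their 'instance' or 'pod' label when present,
    otherwise by the full label set serialized as JSON.
    """
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")

    values: dict[str, float] = {}
    for sample in data["result"]:
        labels = sample["metric"]
        key = labels.get("instance") or labels.get("pod") or json.dumps(labels, sort_keys=True)
        # Each sample value is a [unix_timestamp, "value-as-string"] pair.
        values[key] = float(sample["value"][1])
    return values
```

For example, fetching `build_query_url("http://localhost:9090", "up")` with any HTTP client and passing the decoded JSON to `parse_instant_vector` yields one entry per scrape target, where `1.0` means the target was reachable on its last scrape.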
Grafana dashboards visualize:
- cluster health
- resource usage
- workload trends
This platform demonstrates:
- Infrastructure as Code provisioning
- Helm-based service lifecycle management
- automated rollback support
- backup checksum validation
- cluster health alert hooks
- restart-loop detection logic
- PVC inventory checks
Aligned with SRE and Platform Engineering best practices.
The Terraform layout supports environment separation, with one directory per environment:
terraform/environments/dev
terraform/environments/stage
terraform/environments/prod
Each environment can deploy its own independent cluster.
Planned improvements:
- ArgoCD GitOps deployment
- Secrets encryption via Sealed Secrets
- ExternalDNS automation
- AWS Load Balancer Controller integration
- Cluster Autoscaler enablement
- Multi-region DR architecture