DevOps Team Lead
25+ Kubernetes clusters, 70+ servers spanning on-premise bare metal and AWS EKS. Built from the first node to production-grade multi-cluster operations.
NVIDIA device plugin, GPU scheduling, spot instance management for AI/ML inference pipelines. Making expensive hardware earn its keep.
Built the entire monitoring stack: Prometheus, Grafana, Loki, LibreNMS. Went from "the client told us it's down" to proactive incident detection.
ArgoCD, Helm charts, Jenkins, GitHub Actions, Bash/Python automation. If it can be automated, it should be. If it can't, rewrite it until it can.
AWS (EKS, EC2, S3, IAM, VPC), on-prem to cloud migrations, capacity planning. Moved workloads to EKS in 72 hours, zero data loss.
Deployed an AI application across 30+ on-premise servers for 3+ government projects. Built the automation and monitoring layer from scratch.
7 Kubernetes clusters across 7 physical locations with 35 servers. Each with unique configurations, all managed remotely.
4-node cluster (2 GPU + 2 storage) on limited hardware. Made it work anyway, because "get better hardware" wasn't in the budget.
Full migration from on-premise to AWS EKS in 3 days. BOQ (bill of quantities), provisioning, workload migration, DNS cutover, zero data loss.
Replaced the "client → CSM → manager → engineer" complaint chain with actual monitoring. Revolutionary concept, apparently.
I write technical articles explaining DevOps concepts the way I wish someone had explained them to me, with real scenarios, actual commands, and the occasional production horror story.
Deep-diving into networking and traffic flow in distributed systems, because "it works on localhost" isn't a valid network architecture.
SLOs, observability practices, incident response frameworks. Building systems that fail gracefully instead of spectacularly.
Open to Senior DevOps / SRE opportunities. If you need someone who builds infrastructure from scratch and keeps it running, let's connect.