Services
Five kinds of work I take on.
Every engagement is delivered by me, remotely, in your repos and your AWS accounts. You get working infrastructure and a written record of why it's shaped that way, so nothing depends on me after I leave.
platform
Cloud platform architecture
The situation
The AWS environment grew by accretion. The VPC layout predates half the team, security groups reference services that no longer exist, and the one engineer who understands the database failover left last spring. Every new workload makes it harder to change.
What I do
I rebuild environments to current standards while they keep serving traffic. At Postscript I re-architected the full AWS estate (VPC, ECS, EKS, MSK, RDS Aurora, ElastiCache) under live load measured in tens of millions of requests a day. The method is the same at any size: inventory what exists, sequence changes so nothing user-facing blinks, and put every resource in Terraform as it moves.
What you get
- An architecture and migration plan sequenced against your traffic, not a greenfield fantasy
- Terraform for every resource touched, in your workspaces, reviewed through your PRs
- Multi-region only where the failure math justifies it, with the cost modeled first
- Written handover: diagrams, runbooks, and the decisions with their reasoning
kubernetes
Kubernetes and EKS operations
The situation
The EKS upgrade is two versions overdue because nobody wants to own the blast radius. ArgoCD is half-adopted. Node groups were sized once, in 2023, and the cluster bill has crept up ever since.
What I do
I run EKS with ArgoCD, Helm, Karpenter, and KEDA as the default toolchain, and I've operated it on-call at two companies. That includes the migrations nobody volunteers for: ECS to EKS (done for business-critical services at CVS Digital), version upgrades across large fleets, and moving node groups to Bottlerocket and Spot where the workload tolerates it.
What you get
- Cluster upgrades executed and documented, with a repeatable path for the next one
- GitOps delivery through ArgoCD your team can operate without me
- Autoscaling (Karpenter, KEDA) tuned against real traffic, with the bill as a first-class metric
- A written operating model: who owns what, what pages, what the escalation path is
delivery
Terraform and CI/CD at scale
The situation
Infrastructure lives in three places: some CloudFormation, some Terraform, some console changes nobody wrote down. CI takes forty minutes and still needs a stored AWS key that security has flagged twice.
What I do
I led a migration of hundreds of resources from CloudFormation to Terraform at Aetna Digital, and I run multi-region Terraform on Terraform Cloud today. Pipelines get the same treatment: GitHub Actions with OIDC role assumption instead of long-lived keys, plans posted on PRs, applies gated and auditable.
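For readers who haven't set up OIDC role assumption before, here is a minimal sketch of the IAM trust policy that replaces a stored key, written as a Python helper that emits the policy JSON. It assumes the GitHub OIDC provider (token.actions.githubusercontent.com) is already registered in the AWS account; the helper name `github_oidc_trust_policy`, the account ID, org, repo, and branch are all placeholders, not values from any engagement.

```python
import json


def github_oidc_trust_policy(account_id: str, org: str, repo: str, branch: str = "main") -> dict:
    """Build a hypothetical IAM role trust policy for GitHub Actions OIDC.

    Only workflows running on `branch` of `org/repo` can assume the role,
    so CI never needs a long-lived AWS access key.
    """
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # The token must be minted for AWS STS...
                        f"{provider}:aud": "sts.amazonaws.com",
                        # ...and must come from the expected repo and branch.
                        f"{provider}:sub": f"repo:{org}/{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own account, org, and repo.
    print(json.dumps(github_oidc_trust_policy("123456789012", "example-org", "infra"), indent=2))
```

Scoping the `sub` condition to a GitHub environment (`repo:ORG/REPO:environment:NAME`) instead of a branch is one way to tie applies to an environment that requires reviewer approval.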
What you get
- A migration path from CloudFormation or console-managed resources into Terraform, executed in slices
- Terraform Cloud (or your backend) workspace design with state, permissions, and drift handled
- CI/CD with OIDC auth, plan-on-PR, and no stored cloud credentials anywhere
- Module patterns your engineers copy instead of reinventing
cost
Cloud cost engineering
The situation
The AWS bill grows faster than traffic and nobody can say which team, feature, or decision is responsible. Finance asks; engineering shrugs; the reserved-instance renewal is next quarter.
What I do
Cost work is where I have the longest receipts: an AWS bill cut from roughly $500K to $200K a year at MyRounding, $350K+ a year saved by moving GoPro transcode fleets to Spot, S3 lifecycle policies across petabytes of storage that saved hundreds of thousands more, and ~$4M a year of spend at Postscript held flat while traffic grew. The pattern: attribute spend first, then take the wins in order of savings per unit of effort, then build the controls that stop the regrowth.
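As a toy illustration of ranking cost wins by what each saves relative to the effort it takes, here is a short Python sketch. It assumes every candidate has an estimated annual saving and a positive effort estimate in engineer-days; the `Win` type, `prioritize`, and the example figures are hypothetical and not drawn from any engagement.

```python
from dataclasses import dataclass


@dataclass
class Win:
    name: str
    annual_savings_usd: float  # estimated yearly saving
    effort_days: float         # estimated engineer-days to execute; must be > 0


def prioritize(wins: list[Win]) -> list[Win]:
    """Order candidate wins by estimated savings per engineer-day, highest first."""
    return sorted(wins, key=lambda w: w.annual_savings_usd / w.effort_days, reverse=True)


if __name__ == "__main__":
    # Illustrative numbers only.
    candidates = [
        Win("move batch fleet to Spot", 120_000, 10),
        Win("S3 lifecycle to colder tiers", 60_000, 2),
        Win("right-size RDS instances", 40_000, 5),
    ]
    for w in prioritize(candidates):
        print(f"{w.name}: ${w.annual_savings_usd / w.effort_days:,.0f} per engineer-day")
```

In this example the smallest absolute saving is not last and the largest is not first: the two-day lifecycle change outranks the ten-day Spot migration because it returns more per day of work.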
What you get
- Spend attribution by service, team, and workload, so every line item has an owner
- Executed savings: Spot, right-sizing, storage lifecycle, purchase commitments
- Budgets and alarms that page before finance notices
- Support on the AWS contract itself; I have negotiated these on the customer side
ai ops
AI ops and readiness
The situation
Your engineers are already pointing AI agents at infrastructure, or they want to and can't risk it. The worry is legitimate: one ungoverned apply in the wrong account is a bad afternoon at best. So the throughput either goes unused or gets used with nothing watching it.
What I do
I run Claude Code every day as a force multiplier on a live platform: parallel sub-agents inventory 50+ microservices across Terraform, ArgoCD, and AWS, then drive changes from plan toward production. What makes that safe is the scaffolding around the model: an adversarial production-safety review, plus an automated create-only / no-prod-reference gate that runs before any terraform apply. A human approves every production-affecting change, and nothing reaches prod until it has proved out in dev. I build that scaffolding for teams that want the speed without betting the account on it.
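To make the gate concrete, here is a hypothetical minimal version of a create-only / no-prod-reference check in Python, run against the JSON that `terraform show -json` produces for a saved plan. It is a sketch, not the gate described above: it assumes production can be recognized by string markers in planned values, and `gate`, `ALLOWED_ACTIONS`, `PROD_MARKERS`, and the marker values are all illustrative.

```python
import json
import sys

# Action lists from `terraform show -json` that this sketch treats as safe.
# Anything else (update, delete, replace) is rejected.
ALLOWED_ACTIONS = ({"create"}, {"no-op"}, {"read"})

# Hypothetical markers that identify production; replace with your own
# account IDs, ARNs, or naming conventions.
PROD_MARKERS = ("111122223333", "-prod-", "prod.")


def gate(plan: dict, prod_markers=PROD_MARKERS) -> list[str]:
    """Return a list of violations; an empty list means the plan may proceed to review."""
    violations = []
    for rc in plan.get("resource_changes", []):
        address = rc.get("address", "<unknown>")
        change = rc.get("change", {})
        actions = set(change.get("actions", []))
        if actions not in ALLOWED_ACTIONS:
            violations.append(f"{address}: non-create action {sorted(actions)}")
        # Serialize the planned values and look for any production reference.
        after = json.dumps(change.get("after"))
        for marker in prod_markers:
            if marker in after:
                violations.append(f"{address}: references production marker {marker!r}")
    return violations


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: terraform show -json plan.tfplan > plan.json && python gate.py plan.json
    with open(sys.argv[1]) as f:
        problems = gate(json.load(f))
    for p in problems:
        print(p, file=sys.stderr)
    sys.exit(1 if problems else 0)
```

String matching is deliberately coarse: it fails closed on anything that looks like production, but it cannot see values that are only known at apply time, which is one reason the human approval step stays in the loop.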
What you get
- Guardrails as code: a create-only / no-prod-reference gate enforced before any apply, plus a production-safety review the agent has to pass
- A human-in-the-loop workflow where agents propose changes and a policy gate plus review decide whether they ship
- Read-only agent workflows for the low-risk wins first: EKS health triage, Terraform and Helm PR review
- Enablement so your team runs it after I leave: the runbook and an honest map of where not to point an agent yet
Not sure which of these you need?
Describe the problem and I'll tell you what I'd do about it, including the case where the answer is "you don't need me for this."