We have all done this at some point. You are deploying a new application, and the manager asks, "What size VM do we need?"
You don't want to be the person who crashed the production server because of low RAM. So, what do you do? You take the estimated requirement and multiply it by 2 or 4. "Just to be safe."
If the load test hits 60% CPU on 4 vCPUs, we request 8 vCPUs. The VM goes live, runs at 12% utilization, and we never look at it again.
This "safety margin" culture is the single biggest reason for cloud waste.
I am currently building CloudSavvy.io to automate this problem, but today I want to share the core engineering logic and the math you need to implement right-sizing yourself without breaking production.
Most organizations size VMs at deployment time and never revisit the decision. This is a structural issue.
Consider a D8s_v5 (8 vCPU, 32 GiB) in East US running at that kind of low utilization; it costs roughly $280/month on pay-as-you-go.
A D4s_v5 (4 vCPU, 16 GiB) costs ~$140/month and would handle the load with plenty of buffer, so each oversized VM wastes about $140/month. Across 200 VMs like this, that is 200 × $140 × 12 ≈ $336,000 of waste a year.
The problem is not that engineers over-provision deliberately. The problem is that right-sizing requires continuous, metrics-driven evaluation—and most teams lack the instrumentation to do it systematically.
Many scripts just look at "Average CPU" and suggest a downsize. This is dangerous. You need to analyze four resource dimensions over a 30-day window.
Dimension 1: CPU Utilization

Raw average is insufficient. You need three statistical views of CPU over the window:

- Average: the baseline load.
- P95: the sustained peak, ignoring rare one-off spikes.
- Variability (stddev/mean): how bursty the workload is.

The classification logic later in this post uses all three.
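Here is a minimal sketch of how those three views might be computed from a raw metric series. The `summarize` helper and the `cpu_samples` input are illustrative names, not part of any Azure SDK:

```python
from statistics import mean, pstdev

def summarize(samples: list[float]) -> dict:
    """Return the average, P95, and variability of a 30-day metric series."""
    ordered = sorted(samples)
    p95_index = min(len(ordered) - 1, round(0.95 * (len(ordered) - 1)))
    avg = mean(ordered)
    return {
        "avg": avg,                                    # baseline load
        "p95": ordered[p95_index],                     # sustained peak, ignores rare spikes
        "stddev_over_mean": pstdev(ordered) / avg if avg else 0.0,  # burstiness
    }

# Example: cpu_stats = summarize(cpu_samples)  # cpu_samples = per-hour CPU% over 30 days
```

The same summary applies to every dimension below, not just CPU.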
Dimension 2: Memory Utilization

This is the most neglected metric. A VM can run at 10% CPU while using 85% of its memory (common for databases and caching workloads).
Formula:
memory_utilization_pct = ((total_memory - available_memory) / total_memory) * 100
If average memory utilization is sustained above 80%, the VM is a candidate for upsizing or a family change (e.g., to the memory-optimized E-series), regardless of CPU. If you ignore this, you risk Out of Memory (OOM) crashes.
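As a rough sketch of how that check could look in practice. The metric source is an assumption; on Azure, the Available Memory Bytes guest metric is one possible input:

```python
# Sketch only: apply the memory formula above to a series of guest-level
# "available memory" samples. Names and thresholds are illustrative.

def memory_utilization_pct(total_memory: float, available_memory: float) -> float:
    """((total - available) / total) * 100, as in the formula above."""
    return (total_memory - available_memory) / total_memory * 100

def is_memory_constrained(total_memory_bytes: float,
                          available_memory_samples: list[float],
                          threshold_pct: float = 80.0) -> bool:
    """True if average memory utilization over the window exceeds the threshold."""
    utilizations = [memory_utilization_pct(total_memory_bytes, a)
                    for a in available_memory_samples]
    return sum(utilizations) / len(utilizations) > threshold_pct
```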
Dimension 3: Disk IOPS

Disk performance constrains VM sizing independently of CPU. Every Azure VM SKU has a hard ceiling on uncached disk IOPS and throughput.
If your workload sustains 5,800 IOPS and you downsize to a D2s because "CPU is low," you will hit I/O throttling and the application will lag. Always compare P95 IOPS against the target SKU limit.
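A hedged sketch of that comparison. The SKU limits below are illustrative figures to verify against the current Azure VM size documentation, and the 1.2× headroom factor anticipates the guardrail in Step 4 below:

```python
# Sketch: gate a downsize on the target SKU's uncached disk IOPS cap.
SKU_MAX_UNCACHED_IOPS = {
    "Standard_D2s_v5": 3_750,    # illustrative -- verify against Azure docs
    "Standard_D4s_v5": 6_400,
    "Standard_D8s_v5": 12_800,
}

def iops_allows_downsize(target_sku: str, disk_iops_p95: float,
                         headroom: float = 1.2) -> bool:
    """False means BLOCK: the target SKU cannot absorb observed P95 IOPS plus headroom."""
    limit = SKU_MAX_UNCACHED_IOPS.get(target_sku)
    if limit is None:
        return False  # unknown SKU -> be conservative and block
    return limit >= disk_iops_p95 * headroom

# The example from the text: a workload sustaining ~5,800 IOPS
# iops_allows_downsize("Standard_D2s_v5", 5_800)  -> False (blocked)
# iops_allows_downsize("Standard_D8s_v5", 5_800)  -> True
```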
Dimension 4: Network Throughput

Similar to disk, network bandwidth is SKU-dependent. If sustained network throughput exceeds 60% of the target SKU's bandwidth ceiling, block the downsize. Network-bound workloads (API gateways, for example) often show low CPU but cannot tolerate a bandwidth reduction.
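The same guardrail idea, sketched for the network dimension. Here the ceiling is passed in explicitly rather than hard-coded, since bandwidth caps vary by SKU and should come from the Azure VM size docs:

```python
def network_allows_downsize(sustained_mbps: float,
                            target_sku_max_mbps: float,
                            utilization_cap: float = 0.60) -> bool:
    """False means BLOCK: sustained traffic would exceed 60% of the target ceiling."""
    return sustained_mbps <= target_sku_max_mbps * utilization_cap
```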
You cannot rely on simple thresholds. You need a decision framework.
Here is the logic flow:
Step 1: Coverage Gate
If cpu_hours < 648 (90% of the 720 hours in a 30-day window), BLOCK. Do not guess with insufficient data.
Step 2: Classification
cpu_sustained_low = (cpu_p95 < 20%) AND (cpu_avg < 15%)
memory_low = (memory_p95 < 40%)
memory_high = (memory_p95 >= 75%)

Step 3: Action Determination
IF cpu_sustained_low AND memory_low: recommend a downsize (one SKU size down), subject to the Step 4 guardrails.
IF cpu_sustained_low AND memory_high: recommend a family change to a memory-optimized SKU (e.g., E-series) instead of a downsize.
IF cpu_high AND memory_low: do not downsize; consider a compute-optimized family or an upsize.
IF CPU variability is high (stddev/mean > 0.6): do not auto-resize; flag the VM for manual review. (A code sketch pulling Steps 1-3 together follows below.)
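To make the flow concrete, here is a sketch that ties Steps 1-3 together. The VmStats shape, the 70% cpu_high threshold, and the action strings are assumptions for illustration, not the exact logic behind CloudSavvy.io:

```python
from dataclasses import dataclass

@dataclass
class VmStats:
    cpu_avg: float       # %
    cpu_p95: float       # %
    cpu_stddev: float    # %
    memory_p95: float    # %
    cpu_hours: float     # hours of CPU data present in the 30-day window

def recommend(s: VmStats) -> str:
    # Step 1: coverage gate -- refuse to decide on thin data.
    if s.cpu_hours < 648:  # 90% of 720 hours
        return "BLOCK: insufficient metric coverage"

    # Step 2: classification
    cpu_sustained_low = s.cpu_p95 < 20 and s.cpu_avg < 15
    cpu_high = s.cpu_p95 >= 70          # assumed threshold, not given in the text
    memory_low = s.memory_p95 < 40
    memory_high = s.memory_p95 >= 75
    cpu_bursty = s.cpu_avg > 0 and (s.cpu_stddev / s.cpu_avg) > 0.6

    # Step 3: action determination. The burstiness check comes first so a
    # spiky workload is never auto-downsized, matching the intent above.
    if cpu_bursty:
        return "HOLD: high CPU variability, review manually"
    if cpu_sustained_low and memory_low:
        return "DOWNSIZE: one SKU size down, then apply the Step 4 guardrails"
    if cpu_sustained_low and memory_high:
        return "CHANGE FAMILY: memory-optimized (e.g. E-series)"
    if cpu_high and memory_low:
        return "REVIEW: compute-bound, consider a compute-optimized family or an upsize"
    return "NO ACTION"
```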
Step 4: Guardrails
target_sku_max_iops < current_disk_iops_p95 * 1.2 → BLOCK.
sustained_network_mbps > target_sku_max_mbps * 0.6 → BLOCK (per the network rule above).

Scenario A: The Memory-Bound Database
Scenario B: The GPU Mistake
If you are implementing this, keep in mind:
Right-sizing is not just about cost minimization—it is cost-to-performance optimization. The goal is to eliminate waste without introducing performance risk.
A one-time audit is not enough because workloads change. If you automate this logic effectively, you can maintain performance while significantly reducing your Azure bill.
If you are looking for a tool that automates this entire decision framework, do check out CloudSavvy.io.
Let me know in the comments if you have faced issues with IOPS throttling after resizing!