Friday, October 24, 2025

Enhancing Cost Transparency for Machine Learning Workloads on Amazon EKS with AWS Split Cost Allocation Data


We are excited to introduce split cost allocation support for accelerated workloads in Amazon Elastic Kubernetes Service (Amazon EKS). This enhancement to Split Cost Allocation Data for EKS enables customers to track container-level resource costs for accelerator-powered workloads. Split Cost Allocation Data now supports AWS Trainium, AWS Inferentia, NVIDIA GPUs, and AMD GPUs, complementing existing CPU and memory cost tracking capabilities. This cost data is available in the AWS Cost and Usage Report (legacy CUR and CUR 2.0), providing organizations with a consolidated view of their cloud expenditures. This feature is available in all AWS commercial Regions (excluding the China Regions) at no additional cost to customers.

The Challenges of Monitoring and Allocating Container Costs for Accelerated Workloads

Organizations are increasingly running accelerator-powered workloads on Amazon EKS to power Artificial Intelligence (AI) applications, including Machine Learning (ML) and Generative AI applications. These specialized workloads typically run in multi-tenant clusters, using shared Amazon Elastic Compute Cloud (Amazon EC2) instances to host multiple application containers. The high demand and value associated with accelerator resources make it essential to optimize their usage and ensure maximum return on investment.

These clusters often support application workloads spanning teams, departments, and environments. Consequently, customers require granular cost visibility and accountability to accurately allocate expenses, set budgets, and promote efficient resource utilization. Relying solely on CPU and memory metrics for accelerated workloads provides an incomplete view of infrastructure usage, which can lead to cost misallocation. Customers therefore increasingly seek detailed pod-level usage data for accelerator resources alongside traditional metrics. This need often pushes them toward homegrown solutions or costly third-party products, adding complexity to resource management.

Get Granular Cost Visibility for Accelerated Workloads Running on EKS with Split Cost Allocation Data

The newly added accelerator support in Split Cost Allocation Data for EKS provides customers with a native AWS solution for visibility into the cost and usage of Kubernetes pods, based on the actual utilization of accelerators (Trainium, Inferentia, NVIDIA GPUs, and AMD GPUs), CPUs, and memory. This capability is particularly powerful because it lets organizations use cost allocation tags, including aws:eks:cluster-name, aws:eks:namespace, aws:eks:node, aws:eks:workload-type, aws:eks:workload-name, and aws:eks:deployment. These tags, automatically enabled for accelerator-powered pods, facilitate a consolidated view of applications’ costs and resource usage in shared, multi-tenant environments.

Granular cost data allows customers to allocate Inferentia, Trainium, and GPU expenses accurately across respective cost centers. This not only fosters accountability in resource usage but also informs critical product prioritization decisions. Additionally, the Split Cost Allocation Data feature aids in identifying unused compute resources, enabling customers to optimize their cluster configurations and container reservations to minimize inefficiencies. This alleviates the need for developing custom cost management tools, which can be both resource-intensive and financially burdensome to maintain.

Customers running machine learning workloads can opt in to Split Cost Allocation Data for Amazon EKS through the AWS Billing and Cost Management console. Once opted in, the system automatically scans for clusters across all accounts in the organization, ingests accelerator, CPU, and memory reservation data for container workloads, and prepares detailed cost data for the current month. The feature automatically calculates split allocation cost metrics, including GPU usage per Kubernetes pod, accounting for the amortized costs of Amazon EC2 instances and applicable discounts. Customers can use the aforementioned cost allocation tags to categorize costs, gaining insights at hourly, daily, or monthly granularity and enabling internal chargebacks.

For specific instructions on enabling split cost allocation data for EKS, please refer to Understanding split cost allocation data.

How EKS Split Cost Allocation Works

To use this feature, customers must first activate Split Cost Allocation Data. For existing users of this capability, accelerator support is enabled automatically. The process ingests accelerator, CPU, and memory reservations along with actual utilization, and uses the greater of reservation and usage to compute the resources allocated to each pod.

To illustrate this, consider an example with a single EC2 instance running four pods across two namespaces. Suppose the instance type is a p3.16xlarge featuring 8 GPUs, 64 vCPUs, and 488 GB of RAM, with an On-Demand cost of $10 per hour. If the instance is covered by a commitment (Savings Plan or Reserved Instance), the net amortized cost is used for the calculations. Split Cost Allocation Data calculates a normalized cost per resource based on a relative ratio of GPU to CPU and memory of 9:1, meaning each GPU is weighted nine times more heavily than one unit of CPU or memory.
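
To make the walkthrough below concrete, here is a minimal sketch in Python of the example instance and pods. Only the p3.16xlarge specifications and the $10-per-hour example price come from this post; the pod names, namespaces, and per-pod request and usage numbers are hypothetical values chosen to match the outcomes described in the steps that follow.

```python
# Illustrative setup for the walkthrough. Only the p3.16xlarge specs and
# the $10/hr example price come from the post; the pods, namespaces, and
# (requested, actual) figures are hypothetical.

INSTANCE = {
    "type": "p3.16xlarge",
    "gpus": 8,
    "vcpus": 64,
    "memory_gb": 488,
    "cost_per_hour": 10.00,  # net amortized cost if covered by a Savings Plan / RI
}

# Four pods across two namespaces; each resource maps to a
# (requested, actual usage) pair for one hour.
PODS = {
    "pod-1": {"namespace": "team-a", "gpu": (2, 2), "vcpu": (16, 12), "mem_gb": (100, 90)},
    "pod-2": {"namespace": "team-a", "gpu": (2, 3), "vcpu": (8, 20),  "mem_gb": (100, 140)},
    "pod-3": {"namespace": "team-b", "gpu": (2, 1), "vcpu": (20, 14), "mem_gb": (120, 100)},
    "pod-4": {"namespace": "team-b", "gpu": (1, 1), "vcpu": (8, 6),   "mem_gb": (80, 60)},
}
```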

Step #1 – Compute the Unit Cost

Using the specified accelerator (Trainium, Inferentia, or GPU), CPU, and memory resources on the EC2 instance, Split Cost Allocation Data calculates the unit cost for GPU-hr, vCPU-hr, and GB-hr at $0.50, $0.05, and $0.005, respectively.
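
As a rough illustration of the normalization, the sketch below spreads the hourly instance cost across GPU, vCPU, and memory capacity using relative weights. The specific weights (a GPU counted as roughly ten vCPUs, and a vCPU as roughly ten GB of memory) are an assumption, not AWS's published formula; they simply produce unit costs close to the $0.50, $0.05, and $0.005 figures above.

```python
# Step 1 sketch (continues the setup above). The weights are illustrative
# assumptions, not AWS's exact formula.

def unit_costs(instance, gpu_weight=100.0, vcpu_weight=10.0, gb_weight=1.0):
    """Spread the hourly instance cost across GPU, vCPU, and memory capacity."""
    total_weight = (
        instance["gpus"] * gpu_weight
        + instance["vcpus"] * vcpu_weight
        + instance["memory_gb"] * gb_weight
    )
    per_weight_cost = instance["cost_per_hour"] / total_weight
    return {
        "gpu_hr": per_weight_cost * gpu_weight,
        "vcpu_hr": per_weight_cost * vcpu_weight,
        "gb_hr": per_weight_cost * gb_weight,
    }

print(unit_costs(INSTANCE))
# ≈ {'gpu_hr': 0.519, 'vcpu_hr': 0.052, 'gb_hr': 0.0052}
```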

Step #2 – Calculate Allocated and Unused Capacity

By assessing the GPU, vCPU, and memory requests alongside actual usage across the four Kubernetes pods, the system computes the resources allocated to each pod. For instance, if Pod 2 uses more GPU, CPU, and memory than it requested (because no limit was defined), Split Cost Allocation Data bases the allocation on the higher of actual and requested usage. In this example, the allocated values leave zero unused GPU and vCPU capacity but reveal 48 GB of unallocated memory.
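
Continuing the sketch, Step 2 can be expressed as taking the greater of each pod's request and its observed usage, then deriving the unused capacity at the instance level; with the hypothetical numbers above, this reproduces the 48 GB of unallocated memory.

```python
# Step 2 sketch (continues the setup above): allocation = max(requested, actual).

RESOURCES = ("gpu", "vcpu", "mem_gb")

def allocated(pods):
    """Per-pod allocation is the greater of the requested and actual usage."""
    return {name: {res: max(spec[res]) for res in RESOURCES} for name, spec in pods.items()}

ALLOC = allocated(PODS)

CAPACITY = {"gpu": INSTANCE["gpus"], "vcpu": INSTANCE["vcpus"], "mem_gb": INSTANCE["memory_gb"]}
UNUSED = {res: CAPACITY[res] - sum(p[res] for p in ALLOC.values()) for res in RESOURCES}
print(UNUSED)  # {'gpu': 0, 'vcpu': 0, 'mem_gb': 48} with the hypothetical pod numbers
```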

Step #3 – Compute Utilization Ratios and Split Usage Ratios

The split-usage ratio is calculated as each pod's allocated GPU, vCPU, or memory as a percentage of the total resources available on the EC2 instance. Similarly, an unused ratio is determined from each pod's allocated resources relative to the resources allocated across all pods; this ratio is later used to distribute the cost of unused capacity. For example, the 48 GB of unallocated memory is just under 10 percent of the instance's 488 GB.
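
In code, the two ratios from this step might look like the following, continuing the sketch: the split-usage ratio compares a pod's allocation to the instance's capacity, while the unused ratio compares it to the total allocated across all pods and feeds the redistribution in the next step.

```python
# Step 3 sketch (continues the setup above): per-pod ratios per resource.

TOTAL_ALLOC = {res: sum(p[res] for p in ALLOC.values()) for res in RESOURCES}

SPLIT_USAGE_RATIOS = {
    name: {res: alloc[res] / CAPACITY[res] for res in RESOURCES}
    for name, alloc in ALLOC.items()
}
UNUSED_RATIOS = {
    name: {res: alloc[res] / TOTAL_ALLOC[res] for res in RESOURCES}
    for name, alloc in ALLOC.items()
}
# The 48 GB of unallocated memory is 48 / 488, just under 10% of the instance's memory.
```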

Step #4 – Compute the Split and Unused Costs

Once the pod-level split costs are computed by multiplying the split-usage ratios by the per-resource unit costs, any unused resource costs (like the aforementioned 48 GB of memory) are redistributed to the pods based on the computed unused ratios. This yields both specific pod-level costs and aggregate costs at the namespace level, providing a comprehensive view that can be further categorized via cost allocation tags.
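
Putting the pieces together, a final sketch prices each pod's allocation with the Step 1 unit costs, redistributes the cost of the idle 48 GB of memory according to the unused ratios, and rolls the result up by namespace; the four pods sum back to the full hourly instance cost.

```python
# Step 4 sketch (continues the setup above): split costs, unused costs,
# and a namespace-level roll-up.

units = unit_costs(INSTANCE)
UNIT = {"gpu": units["gpu_hr"], "vcpu": units["vcpu_hr"], "mem_gb": units["gb_hr"]}

pod_costs = {}
for name, alloc in ALLOC.items():
    split_cost = sum(alloc[res] * UNIT[res] for res in RESOURCES)
    unused_cost = sum(UNUSED[res] * UNIT[res] * UNUSED_RATIOS[name][res] for res in RESOURCES)
    pod_costs[name] = {"split_cost": split_cost, "unused_cost": unused_cost,
                       "total": split_cost + unused_cost}

namespace_costs = {}
for name, cost in pod_costs.items():
    ns = PODS[name]["namespace"]
    namespace_costs[ns] = namespace_costs.get(ns, 0.0) + cost["total"]

print(pod_costs)
print(namespace_costs)  # the per-namespace totals add up to the $10/hr instance cost
```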

What Are the New Cost and Usage Report Columns?

For existing users of split cost allocation data, no new columns are introduced; the accelerator support uses the current column structure. New users will see Kubernetes pod-level metrics in their CUR reports, such as “SplitLineItem/SplitUsage,” which shows GPU, vCPU, or memory allocation at the pod level over the specified timeframe. For more detailed information, refer to the CUR data dictionary.

The demo CUR report shows how this data appears in these columns. You can use the Containers Cost Allocation dashboard to visualize EKS costs in Amazon QuickSight, and the CUR query library to query EKS costs with Amazon Athena.
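
If you prefer to explore the data outside of QuickSight or Athena, a small pandas sketch like the one below can aggregate the split cost columns from a locally downloaded CUR file. The file name, the EKSPod-EC2 operation filter, and the snake_case column names are assumptions based on the CUR data dictionary; adjust them to match your own export, since legacy CUR and CUR 2.0 name these fields differently (for example, splitLineItem/SplitCost and resourceTags/aws:eks:namespace in legacy CUR).

```python
# A minimal sketch of aggregating split cost data by EKS namespace with pandas.
# Column names and the operation value are assumptions; check them against
# your own CUR export and the CUR data dictionary before running.

import pandas as pd

cur = pd.read_parquet("cur-export.snappy.parquet")  # hypothetical local copy of a CUR file

# Keep only the pod-level split cost allocation line items.
eks_pods = cur[cur["line_item_operation"] == "EKSPod-EC2"]

by_namespace = (
    eks_pods
    .groupby("resource_tags_aws_eks_namespace")[
        ["split_line_item_split_cost", "split_line_item_unused_cost"]
    ]
    .sum()
)
print(by_namespace)
```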
