EC2 powerhouses: the m4.16xlarge and P2 instance family

AWS gave its M4 family of general-purpose EC2 instances a big upgrade by releasing its newest and largest size: the m4.16xlarge. It also released a new GPU-optimized family, the P2 instances. We took a close look at both from a cloud efficiency perspective.

First, a look at the m4.16xlarge

Specs and pricing

According to the recent AWS Blog update about the m4.16xlarge, the instance features Intel Xeon E5-2686 v4, 5th-generation Broadwell processors. As with other newer EC2 instances, these processors are optimized for EC2.

Here’s a high-level comparison between the m4.10xlarge, c4.8xlarge, and the m4.16xlarge, focusing on ECU, vCPU, network performance, and more:

Name | Memory | ECU | vCPU | Network performance | EBS-optimized bandwidth | Linux On-Demand cost
m4.16xlarge | 256.0 GB | 188 units | 64 vCPUs | 20 Gbps | 10,000 Mbps | $3.830 per hour
m4.10xlarge | 160.0 GB | 124.5 units | 40 vCPUs | 10 Gbps | 4,000 Mbps | $2.394 per hour
c4.8xlarge | 60.0 GB | 132 units | 36 vCPUs | 10 Gbps | 4,000 Mbps | $1.675 per hour

All three instances require EBS volumes and feature specific EBS-optimized data transfer speeds. For more on specific instance type pricing, see AWS’s official documentation.

Bringing specialized performance to the general-purpose M4

It looks like AWS is experimenting with bringing known, game-changing features from other families into the M4 line with the 16xlarge. For instance, the m4.16xlarge features the ability to configure the C- and P-state controllers. This gives users the ability to adjust the performance level of cores under various workloads and states. Prior to the launch of the m4.16xlarge, this feature could only be found in the c4.8xlarge and the x1.32xlarge instances.

Speaking of the gargantuan x1.32xlarge, the m4.16xlarge features the X1’s Elastic Network Adapter (ENA). Multiple m4.16xlarge instances within the same AWS Placement Group and Availability Zone can achieve network throughput as high as 20 Gbps. This brings incredibly fast speeds to cluster computing environments that feature multiple 16xlarge instances.

Note: ENA throughput within a group is limited by its slowest instance. For example, placing an m4.16xlarge in a Placement Group with 8xlarge instances caps throughput at the 8xlarge instances’ 10 Gbps.
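
To make the note concrete, here is a minimal Python sketch of the "slowest instance wins" rule. The network figures come from the comparison table earlier in this article; the function name and the lookup table are our own illustration, not an AWS API.

```python
# Advertised network performance in Gbps, taken from the comparison table above.
NETWORK_GBPS = {
    "m4.16xlarge": 20,
    "m4.10xlarge": 10,
    "c4.8xlarge": 10,
}


def group_throughput_cap(instance_types):
    """Return the throughput ceiling (Gbps) for a group: the slowest member's limit."""
    if not instance_types:
        raise ValueError("a group needs at least one instance")
    return min(NETWORK_GBPS[name] for name in instance_types)


# A group of only m4.16xlarge instances versus a mixed group.
print(group_throughput_cap(["m4.16xlarge", "m4.16xlarge"]))
print(group_throughput_cap(["m4.16xlarge", "c4.8xlarge"]))
```

In this sketch, adding a single 10 Gbps instance to the group drops the ceiling for the whole group, which is why homogeneous 16xlarge clusters are the case that benefits from the 20 Gbps figure.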

How cost-efficient is the new m4.16xlarge?

The best way to learn about how cost efficient an EC2 instance can be is by using it in a real-world setting. But before spinning instances up and diving in, we can look at pricing per hour of ECU and memory to get a general idea of what we’re paying for.

ECU and memory cost-efficiency breakdown using U.S. West Linux On-demand pricing:

Name | Cost per ECU per hour | Cost per GB memory per hour
m4.16xlarge | $0.020 | $0.014
m4.10xlarge | $0.019 | $0.014
c4.8xlarge | $0.012 | $0.027
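
For anyone who wants to reproduce the breakdown, the sketch below divides each hourly price by the ECU and memory figures from the spec table. It assumes the figures above were truncated (not rounded) to three decimals; that is our inference from the numbers, and the helper names are illustrative.

```python
import math

# (hourly Linux On-Demand price in USD, ECU, memory in GB) from the spec table.
INSTANCES = {
    "m4.16xlarge": (3.830, 188.0, 256.0),
    "m4.10xlarge": (2.394, 124.5, 160.0),
    "c4.8xlarge": (1.675, 132.0, 60.0),
}


def truncate(value, places=3):
    """Cut a positive value off at the given number of decimals without rounding up."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


def unit_costs(name):
    """Return (cost per ECU per hour, cost per GB of memory per hour) for an instance."""
    price, ecu, memory_gb = INSTANCES[name]
    return truncate(price / ecu), truncate(price / memory_gb)


for name in INSTANCES:
    per_ecu, per_gb = unit_costs(name)
    print(f"{name}: ${per_ecu:.3f} per ECU per hour, ${per_gb:.3f} per GB per hour")
```

Swapping in prices from another region or a Reserved Instance rate is a matter of editing the first value in each tuple.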

The m4.16xlarge matches the m4.10xlarge on cost per GB of memory and comes within a fraction of a cent on cost per ECU, while offering much more room to scale workloads. The 16xlarge also features a newer-generation CPU chipset, including the C- and P-state configuration ability described previously.

The c4.8xlarge will still have a lower price per computing power per hour, even versus the m4.16xlarge. However, the m4.16xlarge has more computing and memory capacity per instance, with more competitive costs per GB of memory.

[Edit: 11/14/2016] AWS made an EC2 pricing announcement during November 2016. It stated that as of December 1, 2016, there will be various M4, C4, and T2 price reductions (anywhere from 5 to 25 percent) within specific regions. Spinning up instances within regions that include discounts (e.g., U.S. East (Northern Virginia), EU (Ireland), etc.) can yield even more savings for M4 instance users.
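
As a rough illustration of what that range could mean, the sketch below applies the 5 and 25 percent endpoints to the m4.16xlarge hourly rate quoted earlier. The actual reduction for any given instance type and region is not stated here, so treat both results as hypothetical bounds rather than real prices.

```python
M4_16XLARGE_HOURLY = 3.830  # USD per hour, from the spec table above


def discounted_price(hourly_price, reduction_percent):
    """Apply a percentage reduction to an hourly price, rounded to four decimals."""
    return round(hourly_price * (1 - reduction_percent / 100), 4)


# Hypothetical bounds only: the announced range was "anywhere from 5 to 25 percent".
smallest_cut = discounted_price(M4_16XLARGE_HOURLY, 5)
largest_cut = discounted_price(M4_16XLARGE_HOURLY, 25)
print(f"m4.16xlarge after a 5% cut:  ${smallest_cut:.4f} per hour")
print(f"m4.16xlarge after a 25% cut: ${largest_cut:.4f} per hour")
```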

AWS also introduced the P2 instance family

For cloud environments that require the horsepower to run high-performance databases, molecular modeling, deep machine learning, computational finance, and other operations that require enormous amounts of processing power at scale, AWS recommends their new P2 family of instances. They’ve also released their own Deep Learning starter kit for Linux for folks who want to spin up a P2 and begin immediately.

Specs and pricing

The P2 family features three sizes: the p2.xlarge, p2.8xlarge, and the p2.16xlarge. The entire family comes equipped with Intel Xeon E5-2686 v4 Broadwell processors and NVIDIA Tesla K80 GPUs. The 16xlarge offers up to 16 GPUs that use GPUDirect™, a peer-to-peer GPU protocol that speeds up communication between GPUs on a single host. Storage-wise, P2 instances also provide EBS optimization at no additional cost.

Name | GPUs | vCPUs | Instance RAM (GB) | GPU RAM (GB) | Network bandwidth | Linux On-Demand cost
p2.xlarge | 1 | 4 | 61 | 12 | High | $0.900 per hour
p2.8xlarge | 8 | 32 | 488 | 96 | 10 Gbps | $7.200 per hour
p2.16xlarge | 16 | 64 | 732 | 192 | 20 Gbps | $14.400 per hour

Along with the peer-to-peer GPU communications, the P2 instances also feature ENA-based Enhanced Networking, much like the m4.16xlarge and x1.32xlarge. This supports workloads that require high-performance, low-latency environments.

How is the P2 different from the G2 family?

When AWS announced the latest G2 instance back in 2015, it focused on its GPU-optimized line to support customers with molecular modeling, video rendering, machine learning, and game streaming needs. As big data and machine learning enterprise operations continue to grow, it’s no surprise that AWS would offer a bigger lineup of GPU-optimized instances using the latest chipsets from both Intel and NVIDIA, with the compute specifications to handle enterprise-level deep machine learning and data visualization at scale.

Comparing the p2.8xlarge and the g2.8xlarge by price per GPU

Name | Linux On-Demand price | Available GPUs | Price per GPU per hour
p2.8xlarge | $7.200 per hour | 8 | $0.900
g2.8xlarge | $2.600 per hour | 4 | $0.650

On the surface, the G2 looks more affordable per GPU, but each G2 GPU offers fewer cores (1,536, compared to the P2’s 2,496 per GPU). Between the chipset, GPU, memory, and ENA/EBS optimizations and upgrades, the P2 is built for deep machine learning and data visualization workloads at scale, where the G2 family will hit its limits quickly.
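
One way to put the core-count difference in cost terms is to price each instance per 1,000 GPU cores. The sketch below does that using the core counts from the paragraph above. The g2.8xlarge inputs ($2.60 per hour, 4 GPUs) are an assumption on our part based on AWS’s published On-Demand pricing at the time, so check current pricing before relying on them.

```python
# name: (hourly Linux On-Demand price in USD, GPUs per instance, cores per GPU)
GPU_INSTANCES = {
    "p2.8xlarge": (7.200, 8, 2496),
    # Assumption: price and GPU count for the g2.8xlarge are illustrative inputs.
    "g2.8xlarge": (2.600, 4, 1536),
}


def price_per_gpu(name):
    """Hourly price divided by the number of GPUs on the instance."""
    price, gpus, _cores = GPU_INSTANCES[name]
    return price / gpus


def price_per_thousand_cores(name):
    """Hourly price per 1,000 GPU cores across all GPUs on the instance."""
    price, gpus, cores_per_gpu = GPU_INSTANCES[name]
    return price / (gpus * cores_per_gpu) * 1000


for name in GPU_INSTANCES:
    print(
        f"{name}: ${price_per_gpu(name):.3f} per GPU per hour, "
        f"${price_per_thousand_cores(name):.3f} per 1,000 cores per hour"
    )
```

Under these assumptions the P2 costs more per GPU but less per core, which is the trade-off the comparison above is pointing at. Core count alone is a crude proxy for performance, so real workloads should still be benchmarked.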

That said, the G2 instances remain a more affordable line of graphics-optimized compute for users working with gaming, streaming, and other video rendering requirements.

The bottom line between G2 and P2 instances

Go with the P2: Deep machine learning at scale, genomics, high-performance databases, seismic analysis, computational finance, and other GPU-intensive at-scale workloads.

Spin up the G2: Smaller-scale machine learning, video encoding, 3D application streaming, interactive content streaming, and other graphics-related workloads.

How to determine if any of these EC2 instances could be a fit

On paper, current m4.10xlarge users have a strong case for switching to the m4.16xlarge, as it offers competitive cost efficiency with more compute and memory horsepower for projects and environment workloads to grow into.

Meanwhile, G2 users can now upgrade to the P2 family to run deep machine learning and database tasks efficiently and at scale.

With any comparison between EC2 instances, looking at high-level specifications is only the first step. Monitoring actual instance costs and usage data is the best way to start identifying ways to optimize EC2. Doing this accurately requires a proper cloud cost management tool that ingests EC2 data and provides the analysis DevOps and operations teams need to make better cloud compute decisions.

We invite anyone interested in getting as efficient with EC2 as possible to try Cloudability for free. If you have other questions about the m4.16xlarge, the P2 family, or other instances, our EC2 experts are standing by.
