AWS offers Hadoop as a service (Amazon EMR) with either the Amazon Hadoop distribution or MapR. Let's find out how Amazon adds up the bill while you are busy running clusters and executing scripts. The common understanding is that the total EMR cost is the EC2 machine cost plus a very small Hadoop fee, and that customers do not need to pay extra for licenses, as they would with Cloudera.
In reality, the calculation is not that straightforward. Other factors matter too, such as a NAT Gateway/NAT Instance, S3, a Windows bastion host and a RHEL jump server. Most important, the EMR fee itself is not a small fraction of the cost: it adds roughly 25% on top of the on-demand EC2 price, and it does not go down when you move to reserved instances.
AWS will point you to its calculator (https://calculator.s3.amazonaws.com/index.html) to estimate EMR cost. However, the calculator does not provide any way to calculate the cost when you are using reserved instances.
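Because the calculator does not handle reserved instances, the comparison can be done with a few lines of code. The Python sketch below assumes, as this post describes, that the EMR fee is charged per instance-hour at the same rate whether the EC2 instance is on-demand or reserved. The function names, the 10-node cluster size and the 730-hour month are illustrative assumptions; the m4.large prices are the ones quoted later in this post.

```python
# Minimal sketch: monthly cost of an EMR cluster whose EC2 instances are
# on-demand vs. 1-year reserved (no upfront). Assumption (from this post):
# the EMR fee is the same per instance-hour for both purchase options.

HOURS_PER_MONTH = 730  # assumption: 8760 hours per year / 12 months


def cluster_hourly_cost(ec2_price: float, emr_fee: float, nodes: int) -> float:
    """EC2 price plus the fixed EMR fee, for `nodes` identical instances."""
    return nodes * (ec2_price + emr_fee)


def cluster_monthly_cost(ec2_price: float, emr_fee: float, nodes: int,
                         hours: float = HOURS_PER_MONTH) -> float:
    """Hourly cluster cost scaled to `hours` of continuous running."""
    return cluster_hourly_cost(ec2_price, emr_fee, nodes) * hours


if __name__ == "__main__":
    nodes = 10       # hypothetical cluster size
    emr_fee = 0.030  # m4.large EMR fee (USD/hour), unchanged when reserved
    for label, ec2_price in (("on-demand", 0.12), ("reserved", 0.083)):
        total = cluster_monthly_cost(ec2_price, emr_fee, nodes)
        emr_part = cluster_monthly_cost(0.0, emr_fee, nodes)  # EMR fee only
        print(f"{label:>9}: ${total:,.2f}/month, of which EMR fee ${emr_part:,.2f}")
```

For 10 m4.large nodes this comes to about $1,095 per month on-demand versus about $825 reserved, while the EMR fee stays at $219 per month in both cases.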
The prices below are taken from https://aws.amazon.com/emr/pricing/:
| EC2 Machine Type | On-demand EC2 Price (hourly) | Additional Fixed Amazon EMR Price (hourly) |
|---|---|---|
| m3.xlarge | $0.266 | $0.070 |
| m3.2xlarge | $0.532 | $0.140 |
| m4.large | $0.12 | $0.030 |
| m4.xlarge | $0.239 | $0.060 |
| m4.2xlarge | $0.479 | $0.120 |
| m4.4xlarge | $0.958 | $0.240 |
| m4.10xlarge | $2.394 | $0.270 |
The cost analysis is presented below:
| EC2 Machine Type | On-demand EC2 Price (hourly) | 1-Year Reserved EC2 Price, No Upfront (hourly) | Additional Fixed Amazon EMR Price (hourly) | % Increase w.r.t. On-demand | % Increase w.r.t. Reserved |
|---|---|---|---|---|---|
| m3.xlarge | $0.266 | $0.190 | $0.070 | 26 | 37 |
| m3.2xlarge | $0.532 | $0.380 | $0.140 | 26 | 37 |
| m4.large | $0.12 | $0.083 | $0.030 | 25 | 36 |
| m4.xlarge | $0.239 | $0.164 | $0.060 | 25 | 37 |
| m4.2xlarge | $0.479 | $0.329 | $0.120 | 25 | 36 |
| m4.4xlarge | $0.958 | $0.658 | $0.240 | 25 | 36 |
| m4.10xlarge | $2.394 | $1.645 | $0.270 (upper cap) | 11 | 16 |

Reserved-instance prices are taken from http://www.ec2instances.info/. Each percentage is the EMR fee divided by the corresponding EC2 price, rounded to the nearest whole number.
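The percentage columns can be reproduced directly from the hourly prices. The Python sketch below simply divides the EMR fee by the EC2 price; the dictionary layout and function name are illustrative, and the prices are the ones quoted in this post, which may no longer match current AWS pricing.

```python
# Sketch: reproduce the "% increase" columns as EMR fee / EC2 price * 100,
# using the hourly USD prices quoted in this post.

PRICES = {
    # instance: (on-demand EC2, 1-year reserved no-upfront EC2, EMR fee)
    "m3.xlarge": (0.266, 0.190, 0.070),
    "m3.2xlarge": (0.532, 0.380, 0.140),
    "m4.large": (0.12, 0.083, 0.030),
    "m4.xlarge": (0.239, 0.164, 0.060),
    "m4.2xlarge": (0.479, 0.329, 0.120),
    "m4.4xlarge": (0.958, 0.658, 0.240),
    "m4.10xlarge": (2.394, 1.645, 0.270),  # EMR fee is at the $0.270 cap
}


def pct_increase(ec2_price: float, emr_fee: float) -> float:
    """EMR fee expressed as a percentage of the EC2 price."""
    return 100.0 * emr_fee / ec2_price


if __name__ == "__main__":
    print(f"{'type':<12}{'vs on-demand':>14}{'vs reserved':>13}")
    for name, (on_demand, reserved, emr_fee) in PRICES.items():
        print(f"{name:<12}{pct_increase(on_demand, emr_fee):>13.1f}%"
              f"{pct_increase(reserved, emr_fee):>12.1f}%")
```

The m4.10xlarge row comes out at about 11% and 16%, because the EMR fee is capped at $0.270 per hour while the EC2 price keeps growing.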
- For instance types up to m4.4xlarge, the EMR fee adds roughly 25% on top of the on-demand EC2 price.
- If we move to reserved EC2 instances, only the EC2 cost goes down, not the EMR fee, because AWS keeps charging the EMR fee based on the on-demand type. For example, switching an on-demand m4.large to a reserved (1-year, no upfront) m4.large reduces the EC2 bill from $0.12 to $0.083 per hour, but the customer still pays $0.030 per hour for EMR. Compared with the reserved price, the EMR fee is now about 36% extra.
- If we increase EC2 machine power, the customer naturally pays more for the more powerful machine, but the EMR fee goes up as well, even though the same Hadoop installation runs in both cases. For example, the EMR fee is $0.060 per hour on an m4.xlarge versus $0.030 on an m4.large. Think before increasing machine power: you are also increasing the EMR cost.
- Since the EMR master node is not reliable, all data should be kept safe on S3, even for a long-running EMR cluster. Always add in the S3 cost: S3 is a cheap AWS service, but when you store terabytes of data on it, the cost becomes significant as well.
- The EMR fee increases with machine power, but it is capped at $0.270 per hour, so for m4.10xlarge the fee is only about 11% of the on-demand EC2 price ($0.270 vs. $2.394).
- If the EMR cluster runs within a private subnet of a VPC:
  - To access other AWS services or Internet services, you pay for a NAT Gateway/NAT Instance, because the EMR machines are running in a private subnet. However, the VPC S3 endpoint is free.
  - You also need a small Linux jump server.
- Plan extra for a small MySQL RDS instance if you need an external metastore for Hive/Hue to persist users, queries, etc. across cluster restarts.
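The extras in the list above are easy to forget when budgeting. The Python sketch below puts them next to the EC2 and EMR charges so the EMR fee's share of the monthly bill stays visible. The class names and the 730-hour month are illustrative assumptions. The extras are left as 0.0 placeholders, because their real amounts depend on your data volume, traffic and instance choices. The node prices are the m4.xlarge figures quoted in this post.

```python
# Sketch of a monthly EMR budget that also counts the "hidden" extras
# listed above. Node prices are USD/hour; extras are USD/month and are
# hypothetical inputs to fill in from your own bill or AWS pricing pages.
from dataclasses import dataclass, field

HOURS_PER_MONTH = 730  # assumption: 8760 hours per year / 12 months


@dataclass
class NodeGroup:
    """A group of identical EMR nodes (e.g. master or core)."""
    count: int
    ec2_price: float  # on-demand or reserved EC2 price, USD/hour
    emr_fee: float    # fixed EMR fee, USD/hour


@dataclass
class ClusterBudget:
    nodes: list[NodeGroup] = field(default_factory=list)
    extras: dict[str, float] = field(default_factory=dict)  # USD/month

    def ec2_monthly(self) -> float:
        return sum(g.count * g.ec2_price for g in self.nodes) * HOURS_PER_MONTH

    def emr_monthly(self) -> float:
        return sum(g.count * g.emr_fee for g in self.nodes) * HOURS_PER_MONTH

    def total_monthly(self) -> float:
        return self.ec2_monthly() + self.emr_monthly() + sum(self.extras.values())


if __name__ == "__main__":
    budget = ClusterBudget(
        nodes=[
            NodeGroup(count=1, ec2_price=0.239, emr_fee=0.060),  # m4.xlarge on-demand
            NodeGroup(count=4, ec2_price=0.164, emr_fee=0.060),  # m4.xlarge reserved
        ],
        # Hypothetical placeholders (0.0): replace with your real monthly figures.
        extras={
            "NAT Gateway/NAT Instance": 0.0,
            "S3 storage": 0.0,
            "Linux jump server": 0.0,
            "RDS MySQL metastore": 0.0,
        },
    )
    print(f"EC2:    ${budget.ec2_monthly():,.2f}")
    print(f"EMR:    ${budget.emr_monthly():,.2f}")
    print(f"Extras: ${sum(budget.extras.values()):,.2f}")
    print(f"Total:  ${budget.total_monthly():,.2f}")
```

Keeping EC2, EMR and extras as separate totals makes it easy to see that moving the core nodes to reserved instances shrinks only the EC2 line, not the EMR line.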
