Running Apache Spark on AWS

Spark on EMR, Glue and Fargate

When you browse the Internet for ways to run Apache Spark on AWS infrastructure, you will most likely be redirected to the documentation for AWS EMR (Elastic MapReduce), Amazon’s Hadoop distribution suited to the AWS cloud environment. It’s quite an easy way to deploy your data pipelines, but bootstrapping a huge cluster just to perform a simple ad-hoc analysis can be cumbersome.

They say that to a man with a hammer everything looks like a nail, and I fell into this trap with EMR once. I wrote an article describing two other ways of running Apache Spark jobs on AWS-managed infrastructure, AWS Glue and AWS Fargate, that I use in my projects for Acast. In it you will find the key differences between these methods in terms of flexibility and pricing, which show why there is no place for a “one service fits all” approach in the AWS world.

Check it out!

Written on December 12, 2019