Overview of Amazon Athena for Apache Spark
Introduction
- Amazon Athena for Apache Spark provides interactive analytics with sub-second startup for analyzing petabytes of data using the open-source Spark framework. Interactive Spark applications start instantly and run faster on our optimized Spark runtime, so you spend more time on insights, not waiting for results.
- Build Spark applications using the expressiveness of Python with a simplified notebook experience in the Athena console or through Athena APIs.
- With Athena's serverless, fully managed model, there are no resources to provision, configure, or manage, and there is no minimum fee or setup cost. You pay only for the queries that you run.
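The "Athena APIs" path mentioned above can be sketched with boto3. This is a sketch, not code from this post: the workgroup name matches the one created later in the demo, the DPU count is an arbitrary example, and `main()` must be invoked yourself with AWS credentials configured.

```python
# Sketch: run a Spark calculation through the Athena APIs with boto3.
# Assumptions: AWS credentials are configured, and "AthenaSparkWorkgroup"
# is a Spark-enabled workgroup (created later in this post).

def engine_configuration(max_dpus: int = 20) -> dict:
    """EngineConfiguration block for StartSession.

    20 DPUs is an arbitrary example value, not a recommendation.
    """
    return {"MaxConcurrentDpus": max_dpus}

def main() -> None:
    # Invoke main() only with AWS credentials configured.
    import boto3

    athena = boto3.client("athena")
    session = athena.start_session(
        WorkGroup="AthenaSparkWorkgroup",
        EngineConfiguration=engine_configuration(),
    )
    calc = athena.start_calculation_execution(
        SessionId=session["SessionId"],
        CodeBlock="print(spark.version)",  # runs inside the Spark session
    )
    print(calc["CalculationExecutionId"])
```

The console notebook experience described below drives these same session and calculation APIs for you.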
How it works
Benefits
Accelerate time to insights
Spend more time on insights, not on waiting for results. Interactive Spark applications start in under a second and run faster on the optimized Spark runtime.
Harness Spark for complex, powerful analytics
Use the expressiveness of Python with the popular open-source Spark framework to seek more complex insights from your data. Use notebooks to query data, chain calculations, and visualize results.
Build applications without managing resources
Run Spark applications cost-effectively, without provisioning and managing resources. Build Spark applications without worrying about Spark configurations or version upgrades.
Work with your data where it lives
Work with data in various data lakes, in open-data formats, and with your business applications without moving the data. Use data discovered and categorized by AWS Glue to build your Spark insights.
Demo
Objective:
In this blog, we will explore the features of Amazon Athena for Apache Spark and run a hands-on demo that demonstrates its features and best practices. By the end of the blog, you will be able to:
- Create an Amazon Athena workgroup with Spark as the analytics engine
- Create notebooks and run calculations in a notebook
- Use Amazon CloudWatch Logs for monitoring and debugging
Prerequisite:
- Go to the Amazon S3 console by clicking here. Select Create bucket, provide a bucket name (for example, athena-spark-datastore), and click Create bucket.
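The same bucket can be created from code instead of the console; a minimal boto3 sketch follows. The bucket name is the example from the step above, and bucket names must be globally unique, so suffix yours; `main()` must be invoked yourself with AWS credentials configured.

```python
# Sketch: create the demo S3 bucket without the console.

def bucket_params(name: str, region: str) -> dict:
    """Arguments for create_bucket.

    Per the S3 API, us-east-1 must omit the CreateBucketConfiguration
    block; every other Region passes it as a LocationConstraint.
    """
    params = {"Bucket": name}
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params

def main() -> None:
    # Invoke main() only with AWS credentials configured.
    import boto3

    region = "us-east-1"  # assumption: change to your Region
    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**bucket_params("athena-spark-datastore", region))
```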
Create Spark workgroup and Notebook:
We will create a new workgroup, which is required for Spark execution. To do so, go to the Athena console by clicking here.
Select Notebook explorer from the left panel and click Create workgroup at the top right.
Fill in Workgroup name: AthenaSparkWorkgroup and, optionally, Description: spark workgroup, as shown in the screenshot. Next, make sure you select Apache Spark as the Analytics engine. Then click Create workgroup.
The workgroup will be created successfully.
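The console steps above can also be scripted; a hedged boto3 sketch follows. The execution role ARN and result location are placeholders you must replace, the engine version string is what the Athena API calls its Spark engine, and `main()` must be invoked yourself with AWS credentials configured.

```python
# Sketch: create the Spark-enabled workgroup via the Athena API.

def spark_workgroup_config(role_arn: str, output_s3: str) -> dict:
    """Configuration block for a Spark-enabled Athena workgroup.

    The engine version string selects Athena's Spark (PySpark) engine.
    """
    return {
        "EngineVersion": {"SelectedEngineVersion": "PySpark engine version 3"},
        "ExecutionRole": role_arn,
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

def main() -> None:
    # Invoke main() only with AWS credentials configured.
    import boto3

    athena = boto3.client("athena")
    athena.create_work_group(
        Name="AthenaSparkWorkgroup",
        Description="spark workgroup",
        Configuration=spark_workgroup_config(
            "arn:aws:iam::123456789012:role/AthenaSparkExecutionRole",  # placeholder ARN
            "s3://athena-spark-datastore/results/",  # example bucket from the prerequisites
        ),
    )
```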
Download the Notebook. Click here
Import the notebook into Athena. In Notebook explorer, click Import file, and make sure you have selected AthenaSparkWorkgroup.
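The import can also be done through the Athena ImportNotebook API; a sketch follows. The file path and display name are assumptions, the Payload is the raw .ipynb JSON, and `main()` must be invoked yourself with AWS credentials configured.

```python
# Sketch: import a downloaded .ipynb into the Spark workgroup via the API.
from pathlib import Path

def notebook_payload(path: str) -> str:
    """ImportNotebook expects the notebook file contents as a string."""
    return Path(path).read_text(encoding="utf-8")

def main() -> None:
    # Invoke main() only with AWS credentials configured.
    import boto3

    athena = boto3.client("athena")
    athena.import_notebook(
        WorkGroup="AthenaSparkWorkgroup",
        Name="athena-spark-demo",  # assumption: any display name works
        Type="IPYNB",
        Payload=notebook_payload("athena-spark-demo.ipynb"),  # assumed local path
    )
```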
Please note: add the necessary S3 permissions to the IAM role used for Spark execution, as shown below.
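A minimal example of the S3 statement such a role needs (a sketch: the bucket name is the example from the prerequisites, and the exact actions your notebook needs may differ):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::athena-spark-datastore",
        "arn:aws:s3:::athena-spark-datastore/*"
      ]
    }
  ]
}
```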
Go to the Notebook editor and run all the cells in the notebook.
Data preparation and exploration:
In this section we will show how to use Amazon Athena for Apache Spark to interactively run data analytics and exploration without the need to plan for, configure, or manage resources.
Download the Notebook. Click here
In Notebook explorer, click Import file, and make sure you have selected AthenaSparkWorkgroup.
Please note: add the necessary S3 and AWS Glue permissions to the IAM role used for Spark execution, as shown below.
Go to the Notebook editor and run all the cells in the notebook.
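For the CloudWatch monitoring and debugging objective, session logs can be pulled with boto3; a sketch follows. The log group name is a placeholder (find the real one for your session in the Athena console), and `main()` must be invoked yourself with AWS credentials configured.

```python
# Sketch: fetch recent Spark log events from CloudWatch Logs.
import time

def start_time_ms(minutes_back: int) -> int:
    """Epoch-millisecond cutoff for filter_log_events."""
    return int((time.time() - minutes_back * 60) * 1000)

def main() -> None:
    # Invoke main() only with AWS credentials configured.
    import boto3

    logs = boto3.client("logs")
    resp = logs.filter_log_events(
        logGroupName="/athena/spark/placeholder",  # placeholder: use your session's log group
        startTime=start_time_ms(15),  # last 15 minutes
    )
    for event in resp.get("events", []):
        print(event["message"])
```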
Resources
- Visit this page to find the latest documentation.