Sanchit Dilip Jain/Overview of Amazon Athena for Apache Spark 💡

Created Mon, 21 Aug 2023 12:00:00 +0000 Modified Sun, 12 May 2024 01:47:18 +0000

Overview of Amazon Athena for Apache Spark

Introduction

  • Amazon Athena for Apache Spark provides interactive analytics on petabytes of data using the open-source Apache Spark framework. Interactive Spark applications start in under a second and run faster with Athena's optimized Spark runtime, so you spend more time on insights and less time waiting for results.
  • Build Spark applications using the expressiveness of Python, with a simplified notebook experience in the Athena console or through the Athena APIs.
  • With Athena's serverless, fully managed model, there are no resources to provision, configure, or manage, and no minimum fee or setup cost. You pay only for the queries that you run.
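
As a rough illustration of the "through Athena APIs" path, the sketch below starts a Spark session, submits one PySpark code block, and polls until it finishes. It assumes a boto3 Athena client is passed in; the helper name `run_calculation` and the parameters `max_dpus` and `poll_seconds` are hypothetical and chosen only for this example.

```python
import time

# States after which a calculation will not change any further.
TERMINAL_CALCULATION_STATES = {"COMPLETED", "FAILED", "CANCELED"}
# Session states that mean the session will never become usable.
BAD_SESSION_STATES = {"FAILED", "TERMINATED", "DEGRADED"}


def run_calculation(athena, workgroup, code, max_dpus=20, poll_seconds=2.0):
    """Run one PySpark code block in a new session and return its final state.

    `athena` is expected to behave like boto3.client("athena").
    `max_dpus` and `poll_seconds` are illustrative defaults, not recommendations.
    """
    session_id = athena.start_session(
        WorkGroup=workgroup,
        EngineConfiguration={"MaxConcurrentDpus": max_dpus},
    )["SessionId"]
    try:
        # Wait for the session to become idle before submitting code.
        while True:
            state = athena.get_session_status(SessionId=session_id)["Status"]["State"]
            if state == "IDLE":
                break
            if state in BAD_SESSION_STATES:
                raise RuntimeError(f"session {session_id} entered state {state}")
            time.sleep(poll_seconds)

        calculation_id = athena.start_calculation_execution(
            SessionId=session_id,
            CodeBlock=code,
        )["CalculationExecutionId"]

        # Poll the calculation until it reaches a terminal state.
        while True:
            response = athena.get_calculation_execution(
                CalculationExecutionId=calculation_id
            )
            state = response["Status"]["State"]
            if state in TERMINAL_CALCULATION_STATES:
                return state
            time.sleep(poll_seconds)
    finally:
        # Always release the session so it does not keep accruing cost.
        athena.terminate_session(SessionId=session_id)


# Example call (requires AWS credentials and an existing Spark workgroup):
#   import boto3
#   final_state = run_calculation(
#       boto3.client("athena"), "AthenaSparkWorkgroup", "print(spark.version)"
#   )
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real account.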

How it works

[Figure: how Amazon Athena for Apache Spark works]

Benefits

  • Accelerate time to insights

    Spend more time on insights, not on waiting for results. Interactive Spark applications start in under a second and run faster with an optimized Spark runtime.

  • Harness Spark for complex, powerful analytics

    Use the expressiveness of Python with the popular open-source Spark framework to seek more complex insights from your data. Use notebooks to query data, chain calculations, and visualize results.

  • Build applications without managing resources

    Run Spark applications cost-effectively, without provisioning and managing resources. Build Spark applications without worrying about Spark configurations or version upgrades.

  • Work with your data where it lives

    Work with data in various data lakes, in open-data formats, and with your business applications without moving the data. Use data discovered and categorized by AWS Glue to build your Spark insights.

Demo

  • Objective:

    • In this blog, we will explore the features of Amazon Athena for Apache Spark and run a hands-on demo that demonstrates its features and best practices. By the end of the blog, you will be able to:

      • Create an Amazon Athena workgroup with Spark as the analytics engine
      • Create notebooks and run calculations in a notebook
      • Use CloudWatch Logs for monitoring and debugging
  • Prerequisite:

    • Go to the Amazon S3 console by clicking here. Select Create bucket, provide a bucket name (for example, athena-spark-datastore), and click Create bucket.
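
If you prefer to script the prerequisite instead of using the console, the following is a minimal sketch using boto3. The helper names `create_bucket_kwargs` and `create_bucket` are hypothetical. Bucket names are globally unique, so the example name athena-spark-datastore may already be taken and you may need to add a suffix of your own.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build the arguments for S3 create_bucket.

    us-east-1 is the default location and must not be sent as a
    LocationConstraint; every other region has to be named explicitly.
    """
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(bucket_name, region):
    """Create the bucket; needs boto3 and AWS credentials at call time."""
    import boto3  # imported here so the helper above works without boto3

    s3 = boto3.client("s3", region_name=region)
    return s3.create_bucket(**create_bucket_kwargs(bucket_name, region))


# Example call (the region is an assumption; use the region of your workgroup):
#   create_bucket("athena-spark-datastore", "us-east-1")
```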
  • Create Spark workgroup and Notebook:

    • We will create the new workgroup required for Spark execution. To do that, go to the Athena console by clicking here.

    • Select Notebook explorer from the left panel and click Create workgroup at the top right.

    • Fill in Workgroup name: AthenaSparkWorkgroup and Description - optional: spark workgroup, as shown in the screenshot. Next, make sure you select Apache Spark as the Analytics engine.

    • Next, click Create workgroup.

    • The workgroup will be created successfully.

    • Download the notebook by clicking here.

    • Import the notebook into Athena: in Notebook explorer, click Import file and make sure AthenaSparkWorkgroup is selected.

    • Please note: add the necessary S3 permissions to the IAM role used for Spark execution.

    • Go to Notebook editor and run all the cells in the notebook.
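
The console steps for creating the workgroup can also be approximated through the API. The sketch below builds a request for boto3's `create_work_group`. The helper name `spark_workgroup_request`, the role ARN, and the results prefix are placeholders, and the engine version string is an assumption about what the Athena API currently accepts for Spark workgroups, so check it against the current documentation before relying on it.

```python
def spark_workgroup_request(name, description, execution_role_arn, output_location):
    """Build a create_work_group request for a Spark-enabled workgroup."""
    return {
        "Name": name,
        "Description": description,
        "Configuration": {
            # Assumed engine label; this selects Spark instead of Athena SQL.
            "EngineVersion": {"SelectedEngineVersion": "PySpark engine version 3"},
            # IAM role that the Spark sessions assume when they run.
            "ExecutionRole": execution_role_arn,
            # Where calculation results are written.
            "ResultConfiguration": {"OutputLocation": output_location},
        },
    }


request = spark_workgroup_request(
    name="AthenaSparkWorkgroup",
    description="spark workgroup",
    # Placeholder ARN and prefix; replace with values from your account.
    execution_role_arn="arn:aws:iam::123456789012:role/ExampleAthenaSparkRole",
    output_location="s3://athena-spark-datastore/results/",
)

# To send it (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("athena").create_work_group(**request)
```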

  • Data preparation and exploration:

    • In this section we will show how to use Amazon Athena for Apache Spark to interactively run data analytics and exploration without the need to plan for, configure, or manage resources.

    • Download the notebook by clicking here.

    • In Notebook explorer, click Import file and make sure AthenaSparkWorkgroup is selected.

    • Please note: add the necessary S3 and Glue permissions to the IAM role used for Spark execution.

    • Go to Notebook editor and run all the cells in the notebook.
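
The exact permissions used in the original walkthrough are not reproduced here. As a hedged starting point, the sketch below builds an IAM policy document granting the execution role access to one S3 bucket and to common AWS Glue Data Catalog actions. The helper name `execution_role_policy` and the chosen action list are assumptions; narrow them to what your notebook actually needs.

```python
import json


def execution_role_policy(bucket_name):
    """Return an illustrative IAM policy document for the Spark execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read and write objects inside the demo bucket.
                "Sid": "BucketObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # List the bucket itself.
                "Sid": "BucketListing",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Assumed set of Glue Data Catalog actions; "*" is broad and
                # should be scoped to specific catalog resources in practice.
                "Sid": "GlueCatalog",
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartitions",
                    "glue:CreateDatabase",
                    "glue:CreateTable",
                ],
                "Resource": "*",
            },
        ],
    }


policy = execution_role_policy("athena-spark-datastore")
# JSON text that can be pasted into the IAM console's policy editor.
policy_json = json.dumps(policy, indent=2)
```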

Resources

  • Visit this page to find the latest documentation.