Heroku Data Lakehouse
GRAX Data Lake automatically organizes your CRM data as Parquet files in S3 for arbitrary on-demand analytics queries.
AWS Athena lets you query your data lake with SQL. Queries run on Athena in a serverless, scalable fashion.
Heroku is a serverless PaaS (Platform as a Service) that allows you to deploy applications without worrying about the underlying infrastructure.
GRAX provides a Heroku Add-on to make querying your Athena Data Lake easy.
Connection Details
When you add the GRAX Data Lake add-on to your Heroku application, it exposes environment variables that allow you to make an AWS Athena connection.
They are:
GRAX_AWS_ACCESS_KEY_ID=
GRAX_AWS_SECRET_ACCESS_KEY=
GRAX_AWS_REGION=
GRAX_S3_STAGING_DIR=
GRAX_ATHENA_WORKGROUP=
GRAX_ATHENA_DATABASE=
Jupyter Notebook
You can deploy a configurable JupyterHub installation that reads the add-on's environment variables and automatically configures a client that returns query results as a pandas DataFrame.
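To give a sense of what such a client might look like inside a notebook, here is a minimal sketch. It is not the JupyterHub installation's actual code. It assumes the pyathena package with its pandas extra is installed, and the function name query_to_dataframe is a hypothetical helper introduced only for illustration.

```python
import os


def query_to_dataframe(sql, env=os.environ):
    """Run `sql` on Athena and return the result as a pandas DataFrame.

    Hypothetical helper: a sketch, not the client shipped with the
    JupyterHub installation.
    """
    # Read the settings first so a missing variable fails fast with a
    # KeyError, before any AWS call is attempted.
    settings = {
        "aws_access_key_id": env["GRAX_AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["GRAX_AWS_SECRET_ACCESS_KEY"],
        "region_name": env["GRAX_AWS_REGION"],
        "s3_staging_dir": env["GRAX_S3_STAGING_DIR"],
        "work_group": env["GRAX_ATHENA_WORKGROUP"],
        "schema_name": env["GRAX_ATHENA_DATABASE"],
    }

    # Imported lazily so this module loads even where pyathena is absent.
    from pyathena import connect
    from pyathena.pandas.cursor import PandasCursor

    conn = connect(cursor_class=PandasCursor, **settings)
    try:
        # PandasCursor.execute() returns the cursor; as_pandas() yields
        # the result set as a DataFrame.
        return conn.cursor().execute(sql).as_pandas()
    finally:
        conn.close()
```

In a notebook cell, query_to_dataframe("SELECT 1 AS ok") would be a quick way to confirm that the credentials and workgroup are accepted before running larger queries.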
Connecting from Python
You can use any Athena client to connect to and query the data lake. Here is an example using the pyathena client:
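A minimal sketch of such a connection, assuming the pyathena package is installed and the add-on variables listed under Connection Details are set. The helper names connect_kwargs and run_query are hypothetical, introduced for illustration. The mapping of each variable to a pyathena.connect() keyword is an assumption based on the variable names, not something this page specifies.

```python
import os

# Assumed mapping from each add-on variable to the pyathena.connect()
# keyword argument it feeds.
ENV_TO_KWARG = {
    "GRAX_AWS_ACCESS_KEY_ID": "aws_access_key_id",
    "GRAX_AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
    "GRAX_AWS_REGION": "region_name",
    "GRAX_S3_STAGING_DIR": "s3_staging_dir",
    "GRAX_ATHENA_WORKGROUP": "work_group",
    "GRAX_ATHENA_DATABASE": "schema_name",
}


def connect_kwargs(env=os.environ):
    """Translate the add-on variables into pyathena.connect() arguments."""
    missing = [name for name in ENV_TO_KWARG if not env.get(name)]
    if missing:
        raise RuntimeError("Missing add-on variables: " + ", ".join(missing))
    return {kwarg: env[name] for name, kwarg in ENV_TO_KWARG.items()}


def run_query(sql, env=os.environ):
    """Execute `sql` on Athena and return all rows as a list of tuples."""
    # Imported lazily so the configuration helper works without pyathena.
    from pyathena import connect

    conn = connect(**connect_kwargs(env))
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()
```

For example, run_query("SHOW TABLES") should list the tables available in the configured database. Checking for missing variables up front gives a clearer error than a failed AWS call when the add-on is not attached to the application.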

