# Heroku Data Lakehouse

GRAX Data Lake automatically organizes your CRM data as Parquet files in S3 for arbitrary on-demand analytics queries.

AWS Athena lets you query your data lake data by SQL queries that run on the Athena in a serverless and scalable fashion.

Heroku is a serverless PaaS (Platform as a Service) that allows you to deploy applications without worrying about the underlying infrastructure.

GRAX provides a Heroku Add-on to make querying your Athena Data Lake easy.

## Connection Details

When you add the GRAX Data Lake add-on to your Heroku application, it exposes environment variables that allow you to make an AWS Athena connection.

They are:

```bash
GRAX_AWS_ACCESS_KEY_ID=
GRAX_AWS_SECRET_ACCESS_KEY=
GRAX_AWS_REGION=
GRAX_S3_STAGING_DIR=
GRAX_ATHENA_WORKGROUP=
GRAX_ATHENA_DATABASE=
```

## Jupyter Notebook

You can deploy a configurable JupyterHub installation that expects the add-on environment variables from the add-on and automatically configures a client that returns a pandas data frame.

[![Deploy to Heroku](https://www.herokucdn.com/deploy/button.png)](https://www.heroku.com/deploy/?template=https://github.com/graxlabs/grax-jupyter/tree/main)

```python
import grax_athena
df = grax_athena.query_data_lake("SELECT COUNT(*) FROM object_account")
```

## Connecting from Python

You can use any Athena client to connect to and query the data lake. Here is an example using the `pyathena` client:

```python
import os
from pyathena import connect

# Retrieve AWS credentials from environment variables
aws_access_key_id = os.environ.get('GRAX_AWS_ACCESS_KEY_ID')
aws_secret_access_key = os.environ.get('GRAX_AWS_SECRET_ACCESS_KEY')
aws_region = os.environ.get('GRAX_AWS_REGION', 'us-east-1')

# Athena connection parameters
s3_staging_dir = os.environ.get('GRAX_S3_STAGING_DIR')
athena_database = os.environ.get('GRAX_ATHENA_DATABASE')
athena_workgroup = os.environ.get('GRAX_ATHENA_WORKGROUP')

# Establish connection to Athena
conn = connect(aws_access_key_id=aws_access_key_id,
                aws_secret_access_key=aws_secret_access_key,
                s3_staging_dir=s3_staging_dir,
                work_group=athena_workgroup,
                region_name=aws_region)
print("Connection to Athena established successfully.")

# Execute the query
cursor = conn.cursor()
query = f"SELECT COUNT(*) FROM {athena_database}.\"object_account\""
cursor.execute(query)

# Fetch and print the result
result = cursor.fetchone()
count = result[0]
print(f"Number of rows in object_account: {count}")
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://documentation.grax.com/reuse-data/data-lake/heroku-data-lakehouse.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
