# AWS Data Lakehouse IAM Role

Some use cases for Data Lake Parquet files involve consuming that Parquet with a third-party tool. Care should be taken to consume these files securely and without exposing the rest of the data in the bucket. The steps below outline how to do this within AWS's S3 and IAM services. Steps may vary for other cloud providers and services.

{% hint style="warning" %}
**Cross Account Guide**

This guide assumes that the consuming Principal is in a different AWS Account than the one that owns the S3 Bucket. If the consuming Principal is in the same AWS Account as the one that owns the S3 Bucket, then the steps below can be simplified.
{% endhint %}

## Determine the Consuming Principal

The first step is to determine the Principal that will be consuming the Parquet files. This is typically an IAM user, role, or account. For the purposes of this example, we will assume the Principal is *anything* owned by a specific AWS Account.

## Set or Modify the S3 Bucket Policy

The next step is to set or modify the S3 Bucket Policy to allow the Principal to access the Parquet files. This can be done by adding a statement to the existing Bucket Policy or by creating a new Bucket Policy. If created anew, the Policy should look something like this with `[MY_BUCKET_NAME]` and `[AWS_ACCOUNT_NUMBER]` replaced with the appropriate values:

```json
{
    "Version": "2012-10-17",
    "Id": "Policy1611277539797",
    "Statement": [
        {
            "Sid": "Parquet_Cross_Account_ListBucket",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::[AWS_ACCOUNT_NUMBER]:root"
            },
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::[MY_BUCKET_NAME]",
            "Condition": {
                "StringLike": {
                    "s3:prefix": "parquet/*"
                }
            }
        },
        {
            "Sid": "Parquet_Cross_Account_GetObject",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::[AWS_ACCOUNT_NUMBER]:root"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::[MY_BUCKET_NAME]/parquet/*"
        }
    ]
}
```
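The placeholder substitution above can also be scripted. The following is a minimal sketch, not an official GRAX tool: the function names `build_bucket_policy` and `apply_bucket_policy` are hypothetical, and `s3_client` is assumed to be a boto3 S3 client (for example `boto3.client("s3")`) created with credentials for the account that owns the bucket.

```python
import json
import re


def build_bucket_policy(bucket_name: str, account_id: str) -> dict:
    """Return the cross-account bucket policy with the placeholders filled in."""
    # AWS account IDs are 12-digit numbers; fail early on an obvious typo.
    if not re.fullmatch(r"[0-9]{12}", account_id):
        raise ValueError("account_id must be a 12-digit AWS account number")
    principal = {"AWS": f"arn:aws:iam::{account_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Parquet_Cross_Account_ListBucket",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
                # Listing is only allowed under the parquet/ prefix.
                "Condition": {"StringLike": {"s3:prefix": "parquet/*"}},
            },
            {
                "Sid": "Parquet_Cross_Account_GetObject",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/parquet/*",
            },
        ],
    }


def apply_bucket_policy(s3_client, bucket_name: str, account_id: str) -> None:
    """Set the policy on the bucket. This REPLACES any existing bucket policy."""
    policy = build_bucket_policy(bucket_name, account_id)
    s3_client.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))
```

Because `put_bucket_policy` overwrites the whole policy, a bucket that already has a policy would need these two statements merged into the existing document instead.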

## Create an IAM Policy

The next step is to create an IAM Policy that grants read access to the Parquet files. This Policy will be attached to the role created in the next step. The Policy should look something like this with `[MY_BUCKET_NAME]` replaced with the appropriate value:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::[MY_BUCKET_NAME]",
            "Condition": {
                "StringLike": {
                    "s3:prefix": "parquet/*"
                }
            }
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::[MY_BUCKET_NAME]/parquet/*"
        }
    ]
}
```
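As a rough illustration, the same Policy could be created programmatically. This sketch assumes `iam_client` is a boto3 IAM client (for example `boto3.client("iam")`); the function names and the default policy name `parquet-read-only` are hypothetical, and the `Sid` values are renamed to descriptive alphanumeric strings.

```python
import json


def build_read_policy(bucket_name: str) -> dict:
    """Return a read-only policy limited to the parquet/ prefix of the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ParquetListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
                "Condition": {"StringLike": {"s3:prefix": "parquet/*"}},
            },
            {
                "Sid": "ParquetGetObject",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/parquet/*",
            },
        ],
    }


def create_read_policy(iam_client, bucket_name: str,
                       policy_name: str = "parquet-read-only") -> str:
    """Create the managed policy and return its ARN for attaching to a role."""
    response = iam_client.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_read_policy(bucket_name)),
    )
    return response["Policy"]["Arn"]
```

The returned ARN is what gets attached to the role in the following step.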

## Create an IAM Role

The next step is to create an IAM Role. The IAM Policy created above needs to be attached, and the Trust Policy needs to be set to allow the Principal to assume the role. The Trust Policy should look something like this with `[AWS_ACCOUNT_NUMBER]` replaced with the appropriate value:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Parquet_Cross_Account",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::[AWS_ACCOUNT_NUMBER]:root"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```
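One possible way to script this step is sketched below. It assumes `iam_client` is a boto3 IAM client and `policy_arn` is the ARN of the IAM Policy created above; the function names and the role name passed in are hypothetical choices for illustration.

```python
import json


def build_trust_policy(account_id: str) -> dict:
    """Return a trust policy letting principals in account_id assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Alphanumeric Sid; the Sid is only a label for the statement.
                "Sid": "ParquetCrossAccount",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_reader_role(iam_client, role_name: str, account_id: str,
                       policy_arn: str) -> str:
    """Create the role with its trust policy, attach the read policy, return the role ARN."""
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_trust_policy(account_id)),
    )
    # Attaching the policy is what gives the role its S3 read permissions.
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return response["Role"]["Arn"]
```

The role ARN returned here is the value a consumer supplies when assuming the role.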

## Assume the Role

At this point, anything matching the Principal scope allowed by the Trust Policy can assume the role. The role must be assumed to gain access; resources in that account cannot interact with the Parquet files directly.
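From the consumer's side, assuming the role and listing the Parquet files might look like the sketch below. It assumes `sts_client` and `s3_client` are boto3 clients; the function names, the session name `parquet-reader`, and the role ARN and bucket name in the usage comment are hypothetical. For cross-account use, the calling identity typically also needs permission in its own account to call `sts:AssumeRole` on the role.

```python
def assume_parquet_role(sts_client, role_arn: str,
                        session_name: str = "parquet-reader") -> dict:
    """Assume the role and return temporary credentials as boto3 client kwargs."""
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def list_parquet_keys(s3_client, bucket_name: str, prefix: str = "parquet/") -> list:
    """List object keys under the prefix, following pagination tokens."""
    keys = []
    kwargs = {"Bucket": bucket_name, "Prefix": prefix}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


# Usage sketch (requires boto3 and valid credentials in the consuming account):
#   import boto3
#   creds = assume_parquet_role(boto3.client("sts"), "arn:aws:iam::<account>:role/<role-name>")
#   s3 = boto3.client("s3", **creds)
#   print(list_parquet_keys(s3, "<bucket-name>"))
```

Listing with the `parquet/` prefix keeps the request inside the `s3:prefix` condition of the policies above; a listing without that prefix would be expected to be denied.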


