Copyright © 2025 GRAX, Inc.

Data Lake FAQ

Some use cases for Data Lake Parquet files involve consuming them with a third-party tool. Take care to consume these files securely, without exposing the rest of the data in the bucket. The steps below outline how to do this with AWS S3 and IAM. Steps may vary for other cloud providers and services.

Cross Account Guide

This guide assumes that the consuming Principal is in a different AWS Account than the one that owns the S3 Bucket. If the consuming Principal is in the same AWS Account as the one that owns the S3 Bucket, then the steps below can be simplified.

Determine the consuming Principal

The first step is to determine the Principal that will consume the Parquet files. This is typically an IAM user, role, or account. For the purposes of this example, we will assume the Principal is any identity owned by a specific AWS Account, which the policies below express as that account's root ARN.
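The Principal element of a policy takes a different ARN depending on how broadly access is granted. As a small illustration, the three common forms can be built as follows. The helper name `principal_arn` and the account number `123456789012` are hypothetical placeholders, not values from this guide.

```python
def principal_arn(account_id, kind="root", name=None):
    """Build an IAM principal ARN for an account, a user, or a role.

    Hypothetical helper for illustration only. The ARN layouts are the
    standard ones accepted in the "Principal" element of AWS policies.
    """
    if kind == "root":
        # The whole account, as used by the policies in this guide.
        return f"arn:aws:iam::{account_id}:root"
    if kind in ("user", "role"):
        if not name:
            raise ValueError(f"a name is required for kind={kind!r}")
        return f"arn:aws:iam::{account_id}:{kind}/{name}"
    raise ValueError(f"unsupported principal kind: {kind!r}")


# Placeholder account number, not a real account.
print(principal_arn("123456789012"))
print(principal_arn("123456789012", "role", "parquet-reader"))
```

Granting the account root is the broadest of the three; a user or role ARN narrows the grant to a single identity.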

Set or Modify the S3 Bucket Policy

The next step is to set or modify the S3 Bucket Policy to allow the Principal to access the Parquet files. You can either add the statements below to the existing Bucket Policy or create a new Bucket Policy. A new Policy should look something like this, with [MY_BUCKET_NAME] and [AWS_ACCOUNT_NUMBER] replaced with the appropriate values:

{
    "Version": "2012-10-17",
    "Id": "Policy1611277539797",
    "Statement": [
        {
            "Sid": "Parquet_Cross_Account_ListBucket",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::[AWS_ACCOUNT_NUMBER]:root"
            },
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::[MY_BUCKET_NAME]",
            "Condition": {
                "StringLike": {
                    "s3:prefix": "parquet/*"
                }
            }
        },
        {
            "Sid": "Parquet_Cross_Account_GetObject",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::[AWS_ACCOUNT_NUMBER]:root"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::[MY_BUCKET_NAME]/parquet/*"
        }
    ]
}
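To avoid hand-editing the placeholders, the same Bucket Policy can be generated from parameters. The sketch below is one way to do that with the Python standard library. The function name `build_bucket_policy`, the bucket name, and the account number are assumptions made for illustration, and the optional "Id" field is left out.

```python
import json


def build_bucket_policy(bucket_name, account_id, prefix="parquet/"):
    """Return the cross-account Bucket Policy from this guide as a dict.

    Hypothetical helper: it only fills in the two placeholders and keeps
    both statements limited to objects under the given prefix.
    """
    principal = {"AWS": f"arn:aws:iam::{account_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is allowed only for keys under the prefix.
                "Sid": "Parquet_Cross_Account_ListBucket",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}*"}},
            },
            {
                # Reading is allowed only for objects under the prefix.
                "Sid": "Parquet_Cross_Account_GetObject",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/{prefix}*",
            },
        ],
    }


# Placeholder values, not real resources.
policy = build_bucket_policy("my-example-bucket", "123456789012")
print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the bucket's policy editor. When a Bucket Policy already exists, append the two statements to its "Statement" list instead of replacing the whole document.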

Create an IAM Policy

The next step is to create an IAM Policy that grants read access to the Parquet files. This Policy is attached to the role created in the next step, so whoever assumes that role receives these permissions. The Policy should look something like this with [MY_BUCKET_NAME] replaced with the appropriate value:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::[MY_BUCKET_NAME]",
            "Condition": {
                "StringLike": {
                    "s3:prefix": "parquet/*"
                }
            }
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::[MY_BUCKET_NAME]/parquet/*"
        }
    ]
}

Create an IAM Role

The next step is to create an IAM Role. Attach the IAM Policy created above to the role, and set the role's Trust Policy to allow the Principal to assume it. The Trust Policy should look something like this with [AWS_ACCOUNT_NUMBER] replaced with the appropriate value:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Parquet_Cross_Account",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::[AWS_ACCOUNT_NUMBER]:root"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
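The IAM Policy and IAM Role steps can also be scripted. The sketch below assumes a boto3 IAM client is passed in as `iam_client` and uses its `create_policy`, `create_role`, and `attach_role_policy` calls. The function name, role name, policy name, and all sample values are hypothetical, and the code has not been run against a live account.

```python
import json


def create_parquet_reader_role(iam_client, role_name, policy_name,
                               bucket_name, account_id):
    """Create the read-only policy and the role, then attach one to the other.

    Hypothetical helper; iam_client is expected to behave like
    boto3.client("iam"). Returns the ARN of the new role.
    """
    # Permissions policy: read-only access to objects under parquet/.
    permissions = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
                "Condition": {"StringLike": {"s3:prefix": "parquet/*"}},
            },
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/parquet/*",
            },
        ],
    }
    # Trust policy: who is allowed to assume the role.
    trust = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Parquet_Cross_Account",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    policy_arn = iam_client.create_policy(
        PolicyName=policy_name, PolicyDocument=json.dumps(permissions)
    )["Policy"]["Arn"]
    role_arn = iam_client.create_role(
        RoleName=role_name, AssumeRolePolicyDocument=json.dumps(trust)
    )["Role"]["Arn"]
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role_arn


# Example wiring (requires boto3 and valid AWS credentials):
#   import boto3
#   create_parquet_reader_role(boto3.client("iam"), "parquet-reader",
#                              "parquet-read-only", "my-example-bucket",
#                              "123456789012")
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real account.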

Assume the Role

At this point, anything matching the allowed Principal scope can assume the role. The role must be assumed to gain access; resources in that account cannot interact with the Parquet files directly.
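As a rough sketch of what assuming the role looks like from the consuming side, the helpers below take boto3-style STS and S3 clients as arguments and use the `assume_role` and `list_objects_v2` calls. The function names, session name, role ARN, and bucket name are hypothetical placeholders.

```python
def assume_parquet_role(sts_client, role_arn, session_name="parquet-reader"):
    """Assume the role and return its temporary credentials.

    The returned dict uses the keyword names that boto3.client(...) accepts,
    so it can be unpacked straight into a new client.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn, RoleSessionName=session_name
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def list_parquet_keys(s3_client, bucket_name, prefix="parquet/"):
    """List object keys under the prefix that the policies allow.

    A single list_objects_v2 call returns at most 1000 keys; larger
    listings need pagination, which this sketch omits.
    """
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]


# Example wiring (requires boto3 and valid AWS credentials):
#   import boto3
#   creds = assume_parquet_role(
#       boto3.client("sts"),
#       "arn:aws:iam::123456789012:role/parquet-reader",
#   )
#   s3 = boto3.client("s3", **creds)
#   print(list_parquet_keys(s3, "my-example-bucket"))
```

The listing passes a prefix of `parquet/` because the s3:ListBucket statements only allow requests whose prefix matches `parquet/*`; a listing without a prefix would be denied.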

Last updated 1 month ago