# Snowflake Tables from Data Lake

This guide sets up a Snowflake integration that reads Salesforce data from your GRAX Data Lake S3 bucket, loads it into Snowflake tables, and keeps those tables refreshed on a schedule.

Every command runs inside a Snowflake worksheet except one edit to an AWS IAM role trust policy in Step 2. All SQL is given as copy-paste blocks; run them top-to-bottom.

The example names (`GRAX_DATA`, `GRAX_WH`, `GRAX_S3_INT`, etc.) can be changed to match your naming standards, but if you change one, update every occurrence. Values written as `<ALL_CAPS_IN_ANGLE_BRACKETS>` are placeholders you must replace with your own values before running the SQL.

## Before you start

Gather the items below. If anything is missing, stop here and contact GRAX Support or your AWS administrator.

| Item                             | Details                                                                                                                                                                                                                                                                                                                                                                   |
| -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| S3 bucket name                   | The S3 bucket used by your GRAX application for the Data Lake. Available in your GRAX environment configuration.                                                                                                                                                                                                                                                          |
| Salesforce org ID                | The 15- or 18-character ID of the org whose data you want to load. Also in your GRAX environment configuration.                                                                                                                                                                                                                                                           |
| AWS IAM role ARN                 | A role Snowflake will assume to read from S3, with `s3:GetObject` and `s3:ListBucket` on the `parquet/v2/` prefix of the bucket. If you don't already have one, follow [Snowflake's S3 storage integration guide](https://docs.snowflake.com/en/user-guide/data-load-s3-config-storage-integration) (Steps 1 and 2 only; the rest is covered below), or ask GRAX Support. |
| AWS trust-policy edit access     | You will paste two values into that role's trust policy in Step 2. This is the only AWS-side change required.                                                                                                                                                                                                                                                             |
| Snowflake privileges             | `ACCOUNTADMIN`, or a role with `CREATE INTEGRATION` and `EXECUTE TASK` granted at the account level.                                                                                                                                                                                                                                                                      |
| Snowflake database and warehouse | Used to hold the tables and run the copy task. The examples below use `GRAX_DATA` and `GRAX_WH`. If you don't have them yet, create them with the block below the table.                                                                                                                                                                                                  |

```sql
CREATE DATABASE IF NOT EXISTS GRAX_DATA;
CREATE WAREHOUSE IF NOT EXISTS GRAX_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;
```

The rest of this guide uses the `PUBLIC` schema, which exists by default in every Snowflake database, so no schema creation is needed. If you want to isolate this integration from other workloads in `GRAX_DATA`, create a dedicated schema now and substitute its name for `PUBLIC` everywhere below:

```sql
CREATE SCHEMA IF NOT EXISTS GRAX;
```

## Step 1: Create the storage integration

The storage integration is Snowflake's record of which AWS role it will assume and which S3 locations it is permitted to read. Replace the three placeholders below with the role ARN, bucket name, and Org ID you gathered above.

```sql
USE ROLE ACCOUNTADMIN;

CREATE STORAGE INTEGRATION GRAX_S3_INT
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = '<YOUR_SNOWFLAKE_IAM_ROLE_ARN>'
  STORAGE_ALLOWED_LOCATIONS = ('s3://<YOUR_S3_BUCKET_NAME>/parquet/v2/org=<YOUR_SALESFORCE_ORG_ID>/');
```

## Step 2: Finish the AWS trust policy

The integration you just created has a generated IAM user and external ID. Paste both into the IAM role's trust policy so the role will accept Snowflake's assume-role calls.

```sql
DESC INTEGRATION GRAX_S3_INT;
```

From the output, note two values:

* `STORAGE_AWS_IAM_USER_ARN`, which looks like `arn:aws:iam::123456789012:user/abc1-s-v2st1234`
* `STORAGE_AWS_EXTERNAL_ID`, which looks like `ACME_SFCRole=1_abcDEF123...`

In the AWS console (or your preferred IAM tooling), edit the role's **trust policy** to match the block below, substituting both values:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<STORAGE_AWS_IAM_USER_ARN>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID>"
        }
      }
    }
  ]
}
```

Save the trust policy. Snowflake validates the connection the first time it reads from S3, at the end of Step 3.

## Step 3: Create the schema, file format, and external stage

These three objects are the plumbing that turns the storage integration into something you can query. Replace the two placeholders with the same bucket name and Org ID you used in Step 1.

```sql
USE SCHEMA GRAX_DATA.PUBLIC;

CREATE OR REPLACE FILE FORMAT GRAX_PARQUET
  TYPE = PARQUET;

CREATE OR REPLACE STAGE GRAX_STAGE
  STORAGE_INTEGRATION = GRAX_S3_INT
  URL = 's3://<YOUR_S3_BUCKET_NAME>/parquet/v2/org=<YOUR_SALESFORCE_ORG_ID>/'
  FILE_FORMAT = GRAX_PARQUET;
```

Verify the connection by listing what the stage sees:

```sql
LIST @GRAX_STAGE;
```

You should get back Parquet files organized under `object=<SalesforceObjectName>/` prefixes (for example `object=Account/`, `object=Contact/`). Do not continue until `LIST @GRAX_STAGE` returns files; every later step depends on it.

## Step 4: Load your first object

This step creates a Snowflake table for the `Account` object and loads data into it. The `INFER_SCHEMA` function reads a sample of the Parquet files and generates the correct column definitions automatically, so you don't have to hand-write the schema. `MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE` tells `COPY INTO` to map Parquet fields to Snowflake columns by name rather than position, which protects you from column-order changes in future GRAX exports.

```sql
CREATE TABLE IF NOT EXISTS GRAX_DATA.PUBLIC.ACCOUNT
USING TEMPLATE (
  SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
  FROM TABLE(
    INFER_SCHEMA(
      LOCATION => '@GRAX_STAGE/object=Account/',
      FILE_FORMAT => 'GRAX_PARQUET'
    )
  )
);

ALTER TABLE GRAX_DATA.PUBLIC.ACCOUNT SET ENABLE_SCHEMA_EVOLUTION = TRUE;

COPY INTO GRAX_DATA.PUBLIC.ACCOUNT
FROM @GRAX_STAGE/object=Account/
FILE_FORMAT = (FORMAT_NAME = GRAX_PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```

The block is safe to re-run. `CREATE TABLE IF NOT EXISTS` leaves an existing table untouched, `ALTER TABLE ... SET` is idempotent if already applied, and `COPY INTO` only loads files it hasn't loaded before. Schema evolution is enabled so new Salesforce fields appear as new columns on the next copy instead of breaking it.

To confirm it worked:

```sql
SELECT COUNT(*) FROM GRAX_DATA.PUBLIC.ACCOUNT;
SELECT * FROM GRAX_DATA.PUBLIC.ACCOUNT LIMIT 5;
```

To load another object, repeat the two statements above and change every occurrence of `Account` (case-sensitive, since it is part of the S3 path) and `ACCOUNT` (the Snowflake table name) to the Salesforce API name of the object you want, for example `Contact`, `Opportunity`, or `Lead`. The Salesforce API name is what GRAX uses as the folder name under `object=.../` in S3.

## Step 5: Automate ongoing refresh

Step 5 replaces the per-object `COPY INTO` from Step 4 with a stored procedure that loops over every object you care about, plus a scheduled task that runs the procedure on a cadence. The procedure uses `CREATE TABLE IF NOT EXISTS`, so it creates missing tables on first run and leaves existing ones alone, and it sets `ENABLE_SCHEMA_EVOLUTION = TRUE`, so new Salesforce fields appear as new columns without breaking the copy.

Edit the `ARRAY_CONSTRUCT(...)` list below to include every Salesforce object you want to keep in Snowflake.

```sql
CREATE OR REPLACE PROCEDURE GRAX_DATA.PUBLIC.INGEST()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
  stage   VARCHAR DEFAULT '@GRAX_DATA.PUBLIC.GRAX_STAGE';
  fmt     VARCHAR DEFAULT 'GRAX_DATA.PUBLIC.GRAX_PARQUET';
  tbl     VARCHAR;
  objects CURSOR FOR
    SELECT VALUE::VARCHAR AS name
    FROM TABLE(FLATTEN(ARRAY_CONSTRUCT(
      'Account',
      'Contact',
      'Opportunity'
      -- add more Salesforce object names here
    )));
BEGIN
  FOR obj IN objects DO
    tbl := 'GRAX_DATA.PUBLIC.' || obj.name;
    EXECUTE IMMEDIATE 'CREATE TABLE IF NOT EXISTS ' || tbl ||
      ' USING TEMPLATE (SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*)) FROM TABLE(INFER_SCHEMA(' ||
      'LOCATION => ''' || stage || '/object=' || obj.name || '/'',' ||
      ' FILE_FORMAT => ''' || fmt || ''')))';
    EXECUTE IMMEDIATE 'ALTER TABLE ' || tbl || ' SET ENABLE_SCHEMA_EVOLUTION = TRUE';
    EXECUTE IMMEDIATE 'COPY INTO ' || tbl ||
      ' FROM ' || stage || '/object=' || obj.name || '/' ||
      ' FILE_FORMAT = (FORMAT_NAME = ''' || fmt || ''')' ||
      ' MATCH_BY_COLUMN_NAME = ''CASE_INSENSITIVE''';
  END FOR;
  RETURN 'Success';
END;
$$;
```

Create the task that calls the procedure on a schedule. The example runs every 60 minutes; adjust `SCHEDULE` if you want a different cadence.

```sql
CREATE OR REPLACE TASK GRAX_DATA.PUBLIC.INGEST_TASK
  WAREHOUSE = GRAX_WH
  SCHEDULE = '60 MINUTES'
AS
  CALL GRAX_DATA.PUBLIC.INGEST();

ALTER TASK GRAX_DATA.PUBLIC.INGEST_TASK RESUME;
```

To trigger the task immediately instead of waiting for the first scheduled run:

```sql
EXECUTE TASK GRAX_DATA.PUBLIC.INGEST_TASK;
```

## Verify and query

Check that the task is scheduled and see recent runs:

```sql
SHOW TASKS IN SCHEMA GRAX_DATA.PUBLIC;

SELECT *
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
  SCHEDULED_TIME_RANGE_START => DATEADD('hour', -24, CURRENT_TIMESTAMP())
))
WHERE SCHEMA_NAME = 'PUBLIC'
ORDER BY SCHEDULED_TIME DESC;
```

Query your Salesforce data:

```sql
SELECT * FROM GRAX_DATA.PUBLIC.ACCOUNT LIMIT 10;
SELECT * FROM GRAX_DATA.PUBLIC.CONTACT LIMIT 10;
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://documentation.grax.com/reuse-data/data-lake/snowflake-tables-from-data-lake.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
