Azure Data Lake

GRAX Data Lake automatically organizes your CRM data as Parquet files in Azure Blob Storage for arbitrary on-demand analytics queries.

This guide demonstrates using Microsoft Synapse to query data it by SQL queries n a serverless fashion.

Configure Azure

To start you need to log into the Azure console and find your resource group, storage account and blob storage used for GRAX.

The modifications that needed to be made to the blob storage:

(REQUIRED) The user to query the data needs the RoleAssignment “Storage Blob Data Contributor”. If you are unsure how to do this, please consult Azure documentation: Assign an Azure role for access to blob data
(OPTIONAL) The IP of the computer being used needed to be added to the public access networking so the bucket could be viewed. This is not required for Synapse to view the data though.

Once this is done, log into a Synapse workspace, and attempt to run a query using the openrowset. If you don't have a Synapse workspace, you can follow these directions to create one in your resource group: QuickStart: Create a Synapse workspace.

In the example below, make sure the Account object has been turned on in Data Lake. This object could be substituted for any other object.

select *
from openrowset(
  bulk 'https://{{STORAGE_ACCOUNT}}.blob.core.windows.net/{{STORAGE_CONTAINER}}/parquet/v2/org={{SALESFORCE_ORG_ID}}/object=Account/*/*.parquet',
  format = 'parquet') as rows

BLOB_STORAGE_NAME - The name of the BLOB storage container that holds GRAX data.
YOUR_ORG_ID - The Salesforce Id of your org

Creating views

To accomplish this we need to create a database to house the views. By default Synapse will use the “master” database. Create a database for GRAX by following these steps:

Click on the data table
Click the + to add a new resource
Click SQL Database
Select Serverless and enter a name for the database

A view can now be created that abstracts away the openrowset:

CREATE OR ALTER VIEW OBJECT_ACCOUNT AS
select *
from openrowset(
  bulk 'https://{{BLOB_STORAGE_NAME}}.blob.core.windows.net/grax/parquet/v2/org={{YOUR_ORG_ID}}/object=Account/*/*.parquet',
  format = 'parquet') as rows;

BLOB_STORAGE_NAME - The name of the BLOB storage container that holds GRAX data.
YOUR_ORG_ID - The Salesforce Id of your org

The above instructions will get you started with Synapse and GRAX Data Lake. As your dataset grows in size, there will be other considerations needed, such as indexing which may necessitate a different approach or processing this data into an indexed destination table.

Last updated 3 months ago

Was this helpful?