# Blob Storage

GRAX stores record data, metadata, and file binaries using industry-standard blob storage technologies. This is a reliable, durable, scalable, and cost-effective way to retain your Salesforce data.

{% hint style="danger" %}
**Never Directly Modify a GRAX Storage Bucket**

GRAX stores record data in a proprietary format that is neither human readable nor organized in a straightforward way. Attempts to rename, remove, or modify blobs within the storage bucket **will cause data loss** and GRAX availability issues. GRAX isn't responsible for partial or complete loss of your backup dataset in the event of tampering by users.

For targeted record deletion (like GDPR compliance), see our [related documentation](/protect-data/purge.md).
{% endhint %}

## How It Works

* Metadata, files, records, and cache data are written to the bucket as part of Backup.
* The dataset is compacted and compressed as more data is added.
* The app maintains proprietary indexes on top of the dataset for performance.
* The GRAX API, Data Lake, Search, and Restore read from the dataset on demand.

{% hint style="info" %}
The Postgres database required as part of the GRAX architecture holds application metadata only. It is not used for storing backed-up data.
{% endhint %}

## How Much It Costs

Backed-up record data consumes less blob storage than the usage reported by Salesforce. Combined with the low price of blob storage services, storage costs for GRAX almost always total a small fraction of what Salesforce would charge to store the same data.

Below are some breakdowns of real-world costs of data-at-rest in AWS S3. **These don't include storage consumed by the binary component of Attachments, ContentDocuments, or EventLogFiles.** Storage usage for those binary components can currently be assumed to be 1:1 with Salesforce and estimated with published storage rates.

The extreme outlier in the table below is an international support organization that has backed up over 40,000,000,000 record versions in the last five years.

| Description      | Storage Consumed | Monthly Cost (standard S3 data-at-rest rates) |
| ---------------- | ---------------- | --------------------------------------------- |
| Average Case     | **216 GB**       | $4.97                                         |
| High End         | **5 TB**         | $117                                          |
| Extreme Outliers | **25 TB**        | $589                                          |
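
The monthly figures in the table can be approximated with simple arithmetic. The sketch below assumes a flat rate of $0.023 per GB-month and 1 TB = 1,024 GB. Both values are illustrative assumptions rather than figures published by GRAX, and the function name is hypothetical. Check your provider's current pricing for your region before relying on any estimate.

```python
# Assumed flat data-at-rest rate in USD per GB-month.
# Illustrative only: actual rates vary by region and volume tier.
ASSUMED_RATE_USD_PER_GB_MONTH = 0.023
GB_PER_TB = 1024  # assumption: binary terabytes


def monthly_storage_cost(gigabytes: float,
                         rate: float = ASSUMED_RATE_USD_PER_GB_MONTH) -> float:
    """Return the estimated monthly data-at-rest cost in USD, rounded to cents."""
    return round(gigabytes * rate, 2)


if __name__ == "__main__":
    scenarios = [
        ("Average Case", 216),
        ("High End", 5 * GB_PER_TB),
        ("Extreme Outliers", 25 * GB_PER_TB),
    ]
    for label, gigabytes in scenarios:
        print(f"{label}: ${monthly_storage_cost(gigabytes):,.2f}")
```

Under these assumptions the results land within about a dollar of the table values. The estimate covers data-at-rest only and excludes request and data-transfer charges.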

{% hint style="danger" %}
**Data Transfer Fees**

The calculations here only consider data-at-rest rates. All cloud providers bill for data-transfer or data-access operations. Those are considered variable costs based on usage of GRAX features and overall processing load. Our published [AWS estimate](/infrastructure/other/operating-costs.md) contains estimates for "normal" GET, LIST, PUT, and DELETE requests as well as overall data-transfer expectations.
{% endhint %}

## Technologies

{% tabs %}
{% tab title="AWS S3" %}
GRAX supports S3's standard storage class. Intelligent Tiering, Glacier, Outposts, Versioning, and Replication are not supported.

#### Authentication

GRAX supports the following authentication patterns for AWS S3:

* Static Access Keys
* Instance Roles
* Assume Role via Static Access Keys
* Assume Role via Instance Roles

#### Authorization

Regardless of authentication pattern, the following permissions must be granted at the bucket scope (`arn:aws:s3:::example`):

* *s3:ListBucket*

Additionally, the following permissions must be granted for all objects within the bucket (`arn:aws:s3:::example/*`):

* *s3:GetObject*
* *s3:PutObject*
* *s3:DeleteObject*
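
The bucket-scope and object-scope permissions above could be expressed as a single IAM policy document. The following is a minimal sketch that builds the policy as a Python dictionary and prints it as JSON. The function name and the `Sid` values are hypothetical, and `example` is the placeholder bucket name used in the ARNs above. Adapt the policy to your own account and review it before attaching it to a user or role.

```python
import json


def build_grax_s3_policy(bucket: str) -> dict:
    """Build an IAM policy document with the S3 permissions listed above.

    The statement IDs (Sid) are illustrative labels, not required values.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket scope: listing the bucket's contents.
                "Sid": "BucketScope",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object scope: reading, writing, and deleting objects.
                "Sid": "ObjectScope",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_grax_s3_policy("example"), indent=2))
```

Keeping the two scopes in separate statements mirrors the split described above: `s3:ListBucket` applies to the bucket ARN, while the object actions apply to the `/*` resource.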

If using KMS encryption, the following permissions must be granted at the KMS key scope:

* *kms:DescribeKey*
* *kms:Decrypt*
* *kms:Encrypt*
* *kms:GenerateDataKey*
* *kms:ReEncrypt\**
  {% endtab %}

{% tab title="Azure Blob / Data Lake (gen2)" %}
GRAX supports standard Azure Blob Storage and Azure Data Lake (gen2) Storage accounts, including hierarchical namespaces. Premium storage accounts and gen1 Data Lake accounts are not supported. All objects must be stored in the "Hot" tier.

#### Authentication

GRAX supports the following authentication patterns for Azure Storage Accounts:

* Storage Account Access Keys
* System or User-Assigned Managed Identities
* Multi-Tenant App Registration (GRAX-hosted applications only)

**Multi-Tenant App Registration**

For GRAX-hosted applications, the customer admin consents to a GRAX-owned multi-tenant app registration, creating a service principal in their tenant that they then grant RBAC access on the Storage Account. Each GRAX-hosted application has its own app registration, so consent for one doesn't extend to any other.

No tenant secret is shared with GRAX. The app registration is bound by federated identity credential to the GRAX workload's managed identity, making it non-exportable and usable only from the issuing GRAX infrastructure. Customer consent is a one-time step.

{% hint style="info" %}
To use this authentication method, contact [GRAX Support](https://documentation.grax.com/support/get-support) with your application details so it can be enabled for your environment.
{% endhint %}

#### Authorization

If using an identity-based authentication pattern (managed identity or multi-tenant app registration), the following permissions must be granted at the container scope:

* *Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read*
* *Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write*
* *Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete*
* *Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action*
* *Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action*

The "Storage Blob Data Contributor" role is sufficient, but includes permission to delete the container. Custom roles are recommended for granting minimum necessary permissions.
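
A custom role limited to the permissions listed above might look like the following sketch, which builds an Azure role definition as a Python dictionary and prints it as JSON. The role name, description, function name, and the all-zero subscription ID are hypothetical placeholders. The blob permissions are placed under `DataActions` because they are data-plane operations. Treat this as a starting point and validate it against your own tenant's requirements.

```python
import json

# Data-plane permissions listed above.
REQUIRED_DATA_ACTIONS = [
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
]


def build_custom_role(subscription_id: str,
                      role_name: str = "GRAX Blob Data Access (example)") -> dict:
    """Build a custom role definition with only the blob data actions above.

    The role name and description are hypothetical; choose your own.
    """
    return {
        "Name": role_name,
        "IsCustom": True,
        "Description": "Minimum blob data permissions for a GRAX container.",
        "Actions": [],  # no control-plane permissions, so no container deletion
        "NotActions": [],
        "DataActions": list(REQUIRED_DATA_ACTIONS),
        "NotDataActions": [],
        # Where the role may be assigned; the assignment itself should
        # still be made at the container scope.
        "AssignableScopes": [f"/subscriptions/{subscription_id}"],
    }


if __name__ == "__main__":
    placeholder_subscription = "00000000-0000-0000-0000-000000000000"
    print(json.dumps(build_custom_role(placeholder_subscription), indent=2))
```

Leaving `Actions` empty is what distinguishes this sketch from the built-in role: it omits the control-plane permission to delete the container.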

{% hint style="danger" %}
**Double Check the Scope**

Make sure you assign storage permissions at the container level, not the storage account level. Granted at the storage account level, these permissions allow the identity to interact with all data in every container.
{% endhint %}
{% endtab %}

{% tab title="GCP Cloud Storage" %}
GRAX supports GCP Cloud Storage buckets that use the Standard storage class. The `Nearline`, `Coldline`, and `Archive` storage classes are unsupported.

#### Authentication

GRAX supports the following authentication patterns for GCP Cloud Storage:

* Service Account Keys

#### Authorization

The following permissions must be granted at the bucket scope:

* *storage.objects.create*
* *storage.objects.delete*
* *storage.objects.get*
* *storage.objects.list*
* *storage.objects.update*

The built-in "Storage Object User" role grants these permissions safely.
{% endtab %}
{% endtabs %}

## Storage Replicas

GRAX can write data to multiple storage buckets simultaneously, providing additional durability and a foundation for cross-region redundancy. Each GRAX application has one *active* storage bucket (read and written) and may have one or more *passive* replicas (written only).

How replication works:

* Every write to the active bucket is also sent to all passive replicas at the same time.
* Reads are served only from the active bucket.
* When a new replica is added, a background sync backfills existing data automatically. Progress is shown in the storage configuration dialog.
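
The behavior described in the list above can be modeled with a small in-memory sketch. This is a conceptual illustration only, not GRAX's implementation: the class and method names are hypothetical, dictionaries stand in for storage buckets, and the backfill runs immediately rather than in the background.

```python
class ReplicatedStore:
    """Toy model of one active bucket plus write-only passive replicas."""

    def __init__(self) -> None:
        self.active: dict[str, bytes] = {}
        self.passives: list[dict[str, bytes]] = []

    def add_replica(self) -> dict[str, bytes]:
        """Attach a new passive replica and backfill it with existing data."""
        replica = dict(self.active)  # stand-in for the background sync
        self.passives.append(replica)
        return replica

    def write(self, key: str, blob: bytes) -> None:
        """Write to the active bucket and to every passive replica."""
        self.active[key] = blob
        for replica in self.passives:
            replica[key] = blob

    def read(self, key: str) -> bytes:
        """Serve reads from the active bucket only."""
        return self.active[key]


if __name__ == "__main__":
    store = ReplicatedStore()
    store.write("table/a", b"first")
    replica = store.add_replica()      # backfilled with "table/a"
    store.write("table/b", b"second")  # written to active and replica
    print(sorted(replica))
    print(store.read("table/b"))
```

The model also makes the cost implication visible: every blob exists once per configured bucket.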

Each replica is configured independently and has the same connection options available as the active bucket. See [Technologies](#technologies) above.

{% hint style="warning" icon="sack-dollar" %}
Each passive replica multiplies your data-at-rest costs, since every blob is written to every configured bucket. Replicas that span regions or cloud providers also incur the cross-region or egress data-transfer fees billed by the cloud provider.

Depending on the configuration, a replica may cost significantly more to operate than the active bucket and meaningfully change the total cost of ownership for GRAX. Use replicas sparingly and only where the durability or redundancy benefit justifies the additional spend.
{% endhint %}

## How It's Connected

For help connecting your GRAX application to a blob store, see our documentation for [Connecting Storage](https://documentation.grax.com/other/settings/connecting-storage).

## Frequently Asked Questions

<details>

<summary><strong>What are the data prefixes/folders written by GRAX?</strong></summary>

With the exception of the `parquet` folder, these storage locations contain data stored in a proprietary format and are designed to be read and written solely by the connected GRAX application.

<table><thead><tr><th width="112">Folder</th><th>Contents</th></tr></thead><tbody><tr><td><code>grax</code></td><td>Metadata and binary components of Salesforce Files</td></tr><tr><td><code>table</code></td><td>Primary location for object and record backups</td></tr><tr><td><code>internal</code></td><td>Data generated by the GRAX application for its own use</td></tr><tr><td><code>parquet</code></td><td>Parquet files generated by GRAX's <a href="/spaces/wHKnqFEg4DROpG3KCq3D/pages/ZGht5k0UgsGqeUoiipn3">Data Lake</a> feature</td></tr></tbody></table>

</details>

<details>

<summary><strong>Can I use Data Lake with GRAX-Hosted storage?</strong></summary>

No. For security reasons, customers must use their own storage buckets for Data Lake.

</details>

<details>

<summary><strong>How can I clean deleted data from a bucket that had versioning enabled?</strong></summary>

First, ensure that Versioning is now disabled or suspended indefinitely in the bucket. Next, use provider-specific tools to automatically remove the "non-current versions" for deleted objects from the bucket. AWS S3 supports [Lifecycle Rules](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html) that can automatically remove old versions of objects and clean up the delete markers left behind. Create a rule that does the following:

1. **Non-current Version Expiration** - This removes the non-current versions of objects after a specified number of days. The rule should be configured to remove non-current versions after 1 day.
2. **Remove Expired Object Delete Markers** - This removes the delete markers left behind after the non-current versions are removed.

An example of a Lifecycle Rule that will remove non-current versions and delete markers after 1 day is shown below:

{% hint style="danger" %}
These examples are not filtered. They will apply to the entire bucket. If you share your GRAX bucket with other applications or data, these rules may delete non-current versions of storage objects that are not related to GRAX. *Proceed with caution.*
{% endhint %}

**XML**

```xml
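<LifecycleConfiguration>
    <Rule>
        <!-- Empty prefix: the rule applies to every object in the bucket -->
        <Filter>
            <Prefix></Prefix>
        </Filter>
        <!-- A rule must be enabled to take effect -->
        <Status>Enabled</Status>
        <Expiration>
            <ExpiredObjectDeleteMarker>true</ExpiredObjectDeleteMarker>
        </Expiration>
        <NoncurrentVersionExpiration>
            <NoncurrentDays>1</NoncurrentDays>
        </NoncurrentVersionExpiration>
    </Rule>
</LifecycleConfiguration>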
```

**JSON**

```json
{
    "Rules": [
        {
            "Filter": {
                "Prefix": ""
            },
            "Status": "Enabled",
            "Expiration": {
                "ExpiredObjectDeleteMarker": true
            },
            "NoncurrentVersionExpiration": {
                "NoncurrentDays": 1
            }
        }
    ]
}
```

</details>


