Data Lake v1 Retirement

What is happening?

GRAX is retiring the Data Lake v1 functionality in favor of the newer, faster, and safer Data Lake v2.

Effective immediately, all new GRAX applications will have only Data Lake v2 available by default.
As of October 1st, 2025, GRAX users will no longer be able to enable new objects on Data Lake v1, and Data Lake v1 will be removed from GRAX applications that have no enabled Data Lake v1 objects.
As of April 1st, 2026, Data Lake v1 will be removed entirely from the GRAX product.

Why is this happening?

GRAX has invested substantial effort into optimizing Data Lake in many ways:

No missed writes
Shorter time from backup to Data Lake write
Faster backfills
Higher object concurrency
Lower resource utilization
Easier integration with downstream tools (Athena, Glue, DuckDB, etc.)
Improved field formats/types

Many of these improvements necessitated structural changes to the Data Lake product, as well as changes to the final Parquet structure and content. Pipelines ingesting and querying Data Lake v1 Parquet will require modification to properly ingest Data Lake v2.

Data Lake v1 currently provides a user and developer experience that is less reliable, slower, and more costly than Data Lake v2. To ensure that all users get the most value out of their GRAX application, we've decided to establish a deadline for migrating to Data Lake v2.

How does this impact me?

If you or your business depends on Data Lake v1 for analytics or other downstream usage, we recommend restructuring your project to use Data Lake v2 as soon as possible. You will not be able to enable new objects on Data Lake v1 as of October 1st, 2025, and any pipeline still using Data Lake v1 will halt entirely as of April 1st, 2026.

Timing

GRAX support for Data Lake v1 ends on April 1st, 2026.

Frequently Asked Questions

What are the key differences between Data Lake v1 and v2?

The key differences between Data Lake v1 and v2 are:

Reduced delay from data being added in GRAX to writing to Data Lake
Increased throughput when backfilling newly enabled objects and handling large volumes of changes
Improved writing intelligence to reliably keep objects up-to-date
System improvements to remove possibilities of missed writes observed in v1
Different path structure (v2 prefix; day=YYYY-MM-DD/hr=HH segments replaced by a batch segment such as batch=444444444)
Increased max file size (from 10 MB in v1 to 100 MB in v2)
Addition of source__modified, grax__idseq, and grax__restoredfrom fields
Addition of typed fields for non-string values
Removal of grax__added field
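
As a rough illustration of what the path and field differences mean for a downstream pipeline, the sketch below classifies an object key as v1 or v2 by its path segments and flags selected columns that v2 no longer provides. It is a minimal sketch, not GRAX tooling: the function names and the sample keys are hypothetical, and only the day=/hr= and batch= segments and the field names come from the list above. The rest of the key (prefix, file name) is deliberately not assumed.

```python
import re

# Path segments as described in the differences list; the surrounding parts
# of a key (bucket prefix, file name) are NOT assumed by these patterns.
V1_SEGMENTS = re.compile(r"(?:^|/)day=\d{4}-\d{2}-\d{2}/hr=\d{2}(?:/|$)")
V2_SEGMENTS = re.compile(r"(?:^|/)batch=\d+(?:/|$)")

# Field changes as described in the differences list.
V2_ADDED_FIELDS = {"source__modified", "grax__idseq", "grax__restoredfrom"}
V2_REMOVED_FIELDS = {"grax__added"}


def classify_layout(key: str) -> str:
    """Return "v1", "v2", or "unknown" based on the path segments in key."""
    if V1_SEGMENTS.search(key):
        return "v1"
    if V2_SEGMENTS.search(key):
        return "v2"
    return "unknown"


def fields_needing_attention(pipeline_columns):
    """Return the columns a v1 pipeline selects that v2 Parquet no longer has."""
    return sorted(set(pipeline_columns) & V2_REMOVED_FIELDS)


if __name__ == "__main__":
    # Hypothetical keys, for illustration only.
    print(classify_layout("example/day=2025-01-01/hr=05/part.parquet"))
    print(classify_layout("example/batch=444444444/part.parquet"))
    print(fields_needing_attention(["Id", "Name", "grax__added"]))
```

A check like this can be run over a pipeline's column list and a listing of the bucket before cutting over, to find queries that still reference grax__added or still expect day/hr paths.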

How do I move from Data Lake v1 to Data Lake v2?

We've created an end-to-end guide, available here, that will walk you through the entire migration process.

What happens to my old Parquet files if I previously used Data Lake v1?

Data previously written with Data Lake v1 remains usable and accessible during and after the switch to v2. Data Lake v2 will duplicate the data written with v1, so the v1 files are safe to delete once they are no longer being used and v2 has finished its initial backfill.

When deleting, be sure to only delete files under parquet/org=X/... in your bucket. Do not delete files in other parts of the bucket.
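
One way to guard against deleting outside that location is to split a bucket listing into keys that fall under the prefix and keys that do not, and to review both lists before deleting anything. The sketch below is a dry-run helper only and performs no deletion. It assumes that X in parquet/org=X/... stands for your organization ID; the function name and the sample keys are hypothetical.

```python
def split_keys_by_v1_prefix(keys, org_id):
    """Split keys into (under_prefix, elsewhere) for parquet/org=<org_id>/.

    Assumption: org_id is the value that "X" stands for in the path above.
    The trailing slash keeps org=X from also matching org=XY.
    """
    prefix = f"parquet/org={org_id}/"
    under_prefix, elsewhere = [], []
    for key in keys:
        if key.startswith(prefix):
            under_prefix.append(key)
        else:
            elsewhere.append(key)
    return under_prefix, elsewhere


if __name__ == "__main__":
    # Hypothetical listing, for illustration only.
    listing = [
        "parquet/org=X/a.parquet",
        "parquet/org=XY/b.parquet",
        "other/c.parquet",
    ]
    candidates, keep = split_keys_by_v1_prefix(listing, "X")
    print("candidates for deletion:", candidates)
    print("must not be deleted:", keep)
```

Only the first list holds candidates for deletion, and only once v2 has finished its initial backfill and nothing reads the v1 files any longer.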

How do I get more information?

If you have questions or need more information, please open a support ticket.
