Data Lake v1 Retirement
What's happening?
GRAX is retiring the Data Lake v1 functionality in favor of the newer, faster, and safer Data Lake v2.
Effective immediately, all new GRAX applications will only have Data Lake v2 available by default
As of October 1st, 2025, GRAX users will no longer be able to enable new objects on Data Lake v1, and Data Lake v1 will be removed from GRAX applications where there are no enabled Data Lake v1 objects
As of April 1st, 2026, Data Lake v1 will be removed entirely from the GRAX product
Why is this happening?
GRAX has invested substantial effort into optimizing Data Lake in many ways:
No missed writes
Shorter time from backup to Data Lake write
Faster backfills
Higher object concurrency
Lower resource utilization
Easier integration with downstream tools (Athena, Glue, DuckDB, etc.)
Improved field formats/types
Many of these improvements necessitated structural changes to the Data Lake product, as well as changes to the final Parquet structure and content. Pipelines ingesting and querying Data Lake v1 Parquet will require modification to properly ingest Data Lake v2.
Data Lake v1 currently provides a user and developer experience that is less reliable, slower, and higher cost than Data Lake v2. To ensure that all users get the most value out of their GRAX application, we've decided to establish a deadline for moving over.
How does this impact me?
If you or your business depends on Data Lake v1 for analytics or other forms of downstream usage, we recommend restructuring your project to use Data Lake v2 as soon as possible. You will not be able to enable new objects on Data Lake v1 after October 1st, 2025, and any pipeline still depending on Data Lake v1 will halt entirely after April 1st, 2026.
Frequently Asked Questions
What are the key differences between Data Lake v1 and v2?
The key differences between Data Lake v1 and v2 are:
Reduced delay from data being added in GRAX to writing to Data Lake
Increased throughput when backfilling newly enabled objects and handling large volumes of changes
Improved writing intelligence to reliably keep objects up-to-date
System improvements to remove possibilities of missed writes observed in v1
Different path structure (v2 prefix, day=YYYY-MM-DD/hr=HH to batch=444444444)
Increased max file size (10 MB for v1 to 100 MB for v2)
Addition of source__modified, grax__idseq, and grax__restoredfrom fields
Addition of typed fields for non-string values
Removal of grax__added field
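Because the field set differs between versions, a downstream pipeline can check which schema a Parquet file appears to follow before processing it. The following is a minimal sketch, not a GRAX-provided utility: it relies only on the field names listed above, and the function name `guess_schema_version` and the sample column lists are hypothetical. It takes a plain list of column names, which you would read from the file with whatever Parquet tooling you already use.

```python
# Field names taken from the list of differences above.
V2_ONLY_FIELDS = {"source__modified", "grax__idseq", "grax__restoredfrom"}
V1_ONLY_FIELDS = {"grax__added"}


def guess_schema_version(columns):
    """Guess whether a column list looks like Data Lake v1 or v2 output.

    Hypothetical helper: returns "v1", "v2", or "unknown" when the
    columns do not clearly match either version.
    """
    cols = set(columns)
    has_v1 = bool(V1_ONLY_FIELDS & cols)
    has_all_v2 = V2_ONLY_FIELDS <= cols
    has_any_v2 = bool(V2_ONLY_FIELDS & cols)

    if has_all_v2 and not has_v1:
        return "v2"
    if has_v1 and not has_any_v2:
        return "v1"
    return "unknown"


# Illustrative column lists (assumed, not real GRAX output).
print(guess_schema_version(["Id", "Name", "grax__added"]))
print(guess_schema_version(
    ["Id", "Name", "source__modified", "grax__idseq", "grax__restoredfrom"]
))
```

A check like this only distinguishes the two layouts. Pipelines that relied on grax__added or on string-only values will still need their queries updated for the v2 fields and types.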
How do I move from Data Lake v1 to Data Lake v2?
To enable Data Lake v2, navigate to the Data Lake section of your GRAX Application and click on Data Lake v2. You can then begin configuring the objects you want to write to Data Lake v2.
Once the initial backfill is complete and the object shows a Current status, you can safely disable it in Data Lake v1. Be sure to enable any processing rules and triggers for Data Lake v2 data before disabling Data Lake v1 to prevent any data loss.
What happens to my old Parquet files if I previously used Data Lake v1?
Data previously written with Data Lake v1 is still usable and accessible during the switch to v2 and afterwards. Data Lake v2 writes its own copy of the data that v1 wrote, so the v1 files are safe to delete once nothing depends on them and v2 has finished its initial backfill.
When deleting, be sure to only delete files under parquet/org=X/... in your bucket. Do not delete files in other parts of the bucket.
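To reduce the risk of removing the wrong files, you can build the deletion list with a strict prefix filter and review it before deleting anything. The sketch below is an illustration under assumptions: the function name `select_v1_keys`, the org ID placeholder, and the sample keys are hypothetical, and it assumes you have already listed the bucket's object keys with your own storage tooling. It only selects keys and never deletes anything.

```python
def select_v1_keys(keys, org_id):
    """Return only the keys that sit under parquet/org=<org_id>/.

    Hypothetical helper: org_id is your own org identifier (the "X" in
    parquet/org=X/...). Keys anywhere else in the bucket are left out.
    """
    prefix = f"parquet/org={org_id}/"
    return [key for key in keys if key.startswith(prefix)]


# Sample keys are invented for illustration; real layouts may differ.
sample_keys = [
    "parquet/org=X/object=Account/part-0001.parquet",
    "parquet/v2/org=X/object=Account/part-0001.parquet",
    "other/data/file.bin",
]

for key in select_v1_keys(sample_keys, "X"):
    print(key)
```

Review the selected keys, and confirm that v2 shows a Current status for every affected object, before passing the list to any delete operation.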
How do I get more information?
If you have questions or need more information, please open a support ticket.