Snowflake Migration for TrueAccord

Client Introduction
Problem/Client Challenges
TrueAccord’s existing data platform suffered from several issues:
- Frequent Spark Upload Failures: Spark upload jobs frequently failed, requiring re-runs and manual intervention.
- Poor Scalability: Google BigQuery struggled to scale efficiently with TrueAccord’s growing data demands.
- Rising Costs: Operating in dual cloud environments (Google Cloud and AWS) increased infrastructure and data transfer costs.
- Inefficient Data Warehouse Design: The existing design rebuilt all data in full during each cycle, consuming more resources than necessary.
Solution
TrueAccord migrated its data infrastructure to Snowflake on AWS, implementing a redesigned data warehouse with a focus on incremental data processing.
Motivation for Migration
- Eliminate dual cloud costs by consolidating on AWS.
- Move away from Spark processing to more efficient Snowflake SQL pipelines.
- Improve reliability and scalability of data processing.
New Data Warehouse Design
The migration introduced a shift from full data rebuilds to an incremental loading process, designed for efficiency and cost savings:
- Convert and Transfer Files:
  - DBDump files on S3 are converted into JSON format.
  - Converted files are copied into the Snowflake S3 bucket.
  - A dedicated job, snowpipe_incremental, manages these operations.
- Load Data into Snowflake:
  - JSON files are ingested into raw tables on Snowflake using Snowpipe.
- Create Latest Tables:
  - SQL transformations extract the latest version of each record from the raw tables.
  - Snowflake Streams capture changed data, enabling incremental updates.
- Build Derived Tables:
  - Additional SQL transformations generate derived tables.
  - These replace the outputs of the earlier Spark + SQL pipeline and are built entirely within Snowflake.
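The difference between a full rebuild and an incremental update can be illustrated with a small, self-contained Python sketch. It is not TrueAccord's pipeline or Snowflake's API: in the actual design this logic runs as Snowflake SQL over Streams. The column names (id, status, updated_at) and the function names are hypothetical, chosen only to show how "latest" tables can be kept current by applying changed rows rather than reprocessing everything.

```python
import json
from typing import Any

Row = dict[str, Any]  # hypothetical record shape: {"id", "status", "updated_at"}


def to_json_lines(records: list[Row]) -> str:
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


def apply_changes(latest: dict[int, Row], changed_rows: list[Row]) -> int:
    """Merge changed rows into the latest table in place.

    A row replaces the stored version only if it is new or strictly newer.
    Returns the number of rows that were actually inserted or updated.
    """
    applied = 0
    for row in changed_rows:
        current = latest.get(row["id"])
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["id"]] = row
            applied += 1
    return applied


def latest_versions(raw_rows: list[Row]) -> dict[int, Row]:
    """Full rebuild: scan every raw row and keep the newest version per id."""
    latest: dict[int, Row] = {}
    apply_changes(latest, raw_rows)
    return latest


# Initial load: the raw table holds every version of every record.
raw = [
    {"id": 1, "status": "open", "updated_at": 1},
    {"id": 2, "status": "open", "updated_at": 1},
    {"id": 1, "status": "paid", "updated_at": 2},
]
latest = latest_versions(raw)

# Incremental cycle: only the rows that arrived since the last run are processed.
new_batch = [
    {"id": 2, "status": "closed", "updated_at": 3},
    {"id": 3, "status": "open", "updated_at": 3},
]
applied = apply_changes(latest, new_batch)

# A stale row (older than what is stored) is ignored.
stale_applied = apply_changes(latest, [{"id": 1, "status": "open", "updated_at": 1}])

ndjson = to_json_lines(new_batch)
```

In this sketch the incremental cycle touches two rows instead of rescanning all five, which is the property the redesigned warehouse relies on as data volumes grow.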
Results
The migration to Snowflake delivered multiple benefits to TrueAccord:
- Cost Savings: Eliminated dual cloud expenses and reduced GCP data transfer/storage costs.
- Efficiency: Incremental loading significantly reduced processing time compared to full rebuilds.
- Scalability: Snowflake’s architecture provided the elasticity needed to scale with TrueAccord’s growing data volumes.
- Simplicity: Moving away from Spark simplified the data pipeline, reducing maintenance overhead.
- Future Flexibility: The new architecture positions TrueAccord for advanced analytics and future data-driven initiatives.
Conclusion
By migrating from Google BigQuery to Snowflake on AWS, TrueAccord transformed its data infrastructure into a more scalable, cost-efficient, and reliable system. The adoption of incremental data loading, Snowflake Streams, and SQL-based transformations streamlined operations, reduced costs, and removed dependencies on Spark and a dual cloud setup. This strategic move equips TrueAccord with a modernized data warehouse capable of supporting its long-term growth and innovation goals.