Snowflake Migration for TrueAccord

AllCode Solutions

 

Client Introduction

TrueAccord, a financial technology company specializing in debt collection automation, faced growing challenges with its existing data infrastructure. The platform ran on Google BigQuery, supplemented by Spark jobs, and spanned two clouds (Google Cloud and AWS), a setup that became increasingly inefficient and costly. To address these challenges, TrueAccord initiated a strategic migration to Snowflake on AWS, aiming to modernize its data warehouse, reduce costs, and improve scalability.

Problem/Client Challenges

TrueAccord’s existing data platform suffered from several issues:

  • Frequent Spark Upload Failures: Spark upload jobs often failed and had to be re-run with manual intervention.

  • Poor Scalability: Google BigQuery struggled to scale efficiently with TrueAccord’s growing data demands.

  • Rising Costs: Operating in dual cloud environments (Google Cloud and AWS) increased infrastructure and data transfer costs.

  • Inefficient Data Warehouse Design: The existing design rebuilt all data from scratch during each cycle, consuming more resources than necessary.

Solution

TrueAccord migrated its data infrastructure to Snowflake on AWS, implementing a redesigned data warehouse with a focus on incremental data processing.

Motivation for Migration

  • Eliminate dual cloud costs by consolidating on AWS.
  • Move away from Spark processing to more efficient Snowflake SQL pipelines.
  • Improve reliability and scalability of data processing.

New Data Warehouse Design

The migration introduced a shift from full data rebuilds to an incremental loading process, designed for efficiency and cost savings:

  • Convert and Transfer Files:
    • DBDump files on S3 are converted into JSON format.
    • Converted files are copied into the Snowflake S3 bucket.
    • A dedicated job, snowpipe_incremental, manages these operations.
  • Load Data into Snowflake:
    • JSON files are ingested into raw tables on Snowflake using Snowpipe.
  • Create Latest Tables:
    • SQL transformations extract the latest versions of tables.
    • Snowflake Streams capture changed data, enabling incremental updates.
  • Build Derived Tables:
    • Additional SQL transformations generate derived tables.
    • These replace the outputs of the earlier Spark + SQL pipeline and are built entirely within Snowflake.
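
To make the difference between a full rebuild and an incremental update concrete, the following is a minimal Python sketch of the idea behind the steps above. It is not TrueAccord's code: the record fields (id, version, balance) and the function names are hypothetical, and the dump format is assumed to be a list of row dictionaries. In Snowflake itself, the same logic would typically be expressed in SQL against a stream rather than in Python.

```python
import json
from typing import Dict, Iterable, List


def to_json_lines(records: Iterable[dict]) -> str:
    """Serialize rows as newline-delimited JSON, one object per line.

    Stands in for the "convert DBDump files to JSON" step; the real
    dump format is not described in the source, so rows are assumed
    to already be dictionaries.
    """
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def apply_changes(latest: Dict[int, dict], changes: List[dict]) -> Dict[int, dict]:
    """Merge a batch of changed rows into the 'latest' table.

    Only the keys present in the batch are touched. A changed row
    replaces the stored row when its (hypothetical) version is newer.
    """
    for row in changes:
        key = row["id"]
        current = latest.get(key)
        if current is None or row["version"] > current["version"]:
            latest[key] = row
    return latest


def full_rebuild(raw: List[dict]) -> Dict[int, dict]:
    """Recompute the 'latest' table from every raw row ever loaded."""
    return apply_changes({}, raw)


# Two loads: the second contains only the rows that changed.
raw_day1 = [
    {"id": 1, "version": 1, "balance": 100},
    {"id": 2, "version": 1, "balance": 50},
]
raw_day2 = [{"id": 1, "version": 2, "balance": 80}]

# Incremental path: process each batch once as it arrives.
latest = apply_changes({}, raw_day1)
apply_changes(latest, raw_day2)

# Full-rebuild path: reprocess all history to reach the same state.
rebuilt = full_rebuild(raw_day1 + raw_day2)
```

Both paths end in the same state, but the incremental path reads only the one changed row on the second day, while the full rebuild reads every row again. That difference is the source of the savings as data volumes grow.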

Results

The migration to Snowflake delivered multiple benefits to TrueAccord:

  • Cost Savings: Eliminated dual cloud expenses and reduced GCP data transfer/storage costs.

  • Efficiency: Incremental loading significantly reduced processing time compared to full rebuilds.

  • Scalability: Snowflake’s architecture provided the elasticity needed to scale with TrueAccord’s growing data volumes.

  • Simplicity: Moving away from Spark simplified the data pipeline, reducing maintenance overhead.

  • Future Flexibility: The new architecture positions TrueAccord for advanced analytics and future data-driven initiatives.

Conclusion

By migrating from Google BigQuery to Snowflake on AWS, TrueAccord transformed its data infrastructure into a more scalable, cost-efficient, and reliable system. The adoption of incremental data loading, Snowflake Streams, and SQL-based transformations streamlined operations, reduced costs, and removed dependencies on Spark and the dual cloud setup. This strategic move equips TrueAccord with a modernized data warehouse capable of supporting its long-term growth and innovation goals.