Coinbase Optimizes Complex Snowflake Queries at Scale with Single Origin

Coinbase Optimizes Complex Snowflake Queries at Scale with Single Origin

Coinbase used Single Origin to transform their Snowflake query deduplication/rewrite process from manual efforts to efficient, automated intelligence. 

At a glance

Coinbase is a secure platform for buying, selling, and storing cryptocurrencies. 



Challenge
Managing complex query deduplication optimization at scale with limited engineering resources.


Results
Automated query analysis and optimization for redundant queries. Significantly improved query efficiency with a projected 10-15% reduction in annual Snowflake costs.


As one of the world's leading cryptocurrency exchanges, Coinbase operates a sophisticated financial data lake powered by Snowflake as their data warehouse. This infrastructure supports critical functions like data analysis, metrics reporting, compliance investigation, and machine learning feature generation. With tens of thousands of tables, tasks, graphs, and dashboards, and hundreds of thousands of queries executed daily, Coinbase faced significant challenges with high-cost big data redundancy. 

"Before Single Origin, we did a lot of query optimization for Snowflake, but we needed a long-term solution for difficult scenarios," explains Xun Kang, Staff Engineer at Coinbase. "We had accumulated so many complex, duplicate queries. It became increasingly time-consuming to maintain and optimize them."

The challenge

Coinbase faced several critical challenges in query parsing and deduplication:

  1. Overwhelming query volume and complexity: With hundreds of thousands of daily queries and tens of thousands of tables, the scale of operations demanded a systematic approach to optimization.
  2. Cost inefficiency: Multiple versions of similar tables and queries (and Common Table Expressions (CTEs)) led to unnecessary computational resource consumption and elevated costs.
  3. Time-consuming data governance: Similar tables and queries created confusion about the source of truth, and users spent hours trying to identify the correct data assets.

While Coinbase had system-level optimizations like warehouse size tuning and basic query monitoring, they needed an effective way to detect highly-similar logic and deduplicate queries. Manual efforts to optimize the long-tail expensive queries were too time-consuming and inefficient at scale. 

Why Single Origin

After evaluating different solutions, including open-source offerings, Coinbase chose Single Origin for its specialized expertise in query parsing and deduplication. The decision was driven by three key factors:

  1. Unmatched accuracy and coverage: Single Origin achieved a 95% coverage rate for Coinbase's queries, with 100% parsing and optimization accuracy within that coverage.
  2. Complex query support at scale: The platform successfully handles Coinbase's most challenging scenarios, like queries over 1,000 lines and with 10-50 embedded CTEs.
  3. Actionable recommendations: Instead of just identifying issues, Single Origin provides specific recommendations for query rewrite and consolidation.

"When we looked at other third-party providers for tooling, we didn’t find one that satisfied our metrics in all these areas," says Kang. "Single Origin differentiated itself by offering all of them at scale and meeting our data security requirements."

Results

Single Origin worked closely with Coinbase to customize their solutions for large-scale and complex queries, rapidly iterating based on specific data needs. Through algorithm refinements and customization for Coinbase's Snowflake implementation, parsing rates improved from 30-40% to 95%. Single Origin determined query parsing and optimization accuracy by successful execution without errors and equivalent algebraic relations between optimized and original queries.

Key outcomes include:

  • Cost optimization: For Coinbase’s top 100-consumed tables, Single Origin found that 50% of the corresponding queries can be optimized to reduce compute cost. For one of Coinbase’s most expensive tables, Single Origin identified 95% savings for detected similar queries. 
  • Enhanced decision making: Single Origin table and column usage insights help inform Coinbase’s canonical table designs and table reduction initiatives to help users quickly find the right table for their work.

Developer productivity: Integration with Coinbase's data operations portal gives engineers and analysts immediate insights into query tuning opportunities. These insights can drive down millions of dollars of compute costs.

Coinbase hopes to see even more efficiency gains and cost savings as they integrate Single Origin’s query auto-rewrite feature.  

Advice for developers

For organizations considering Single Origin, Kang emphasizes the importance of understanding your query patterns. "It's really helpful to have a good understanding of how your queries look, what kind of aggregations or expensive operations you care about, and which schemas are most important. This knowledge helps you customize the solution to your specific needs and accelerates the optimization process."

A collaborative process 

With Single Origin, Coinbase has transformed their approach to redundant query optimization from a manual, resource-intensive process to an automated system. The Single Origin platform's accuracy in parsing complex queries, combined with its actionable recommendations, has enabled Coinbase to tackle data redundancy at scale while reducing costs and improving efficiency.

This collaboration demonstrates how specialized query intelligence can drive immediate cost savings and long-term architectural improvements in large-scale data operations.

"The Single Origin team is very collaborative," concludes Kang. "They listened to our feedback and worked hard to make the process very successful. They truly listen to us."

If you’re interested in learning more about Single Origin to quickly and accurately optimize your redundant queries and tables, email x@singleorigin.tech or check out singleorigin.tech.

Authors: Single Origin team and Coinbase engineers Xun Kang, and Eric Sun

Engineering

Engineering

Engineering @ Single Origin