Strategies for Creating Query Collections

Strategies for Creating Query Collections
Strategies for Creating Collections in Single Origin

Intro

Creating a collection in Single Origin is the first step to taming your data asset sprawl. In this post, we will discuss some tips for curating your collections. These strategies will help you create meaningful collections that reveal valuable insights when audited in Single Origin.

As a reminder, query collections are simply sets of queries whose semantics you want to audit or import into Single Origin. You have various options for sourcing the queries for a collection, including your existing Query History Logs or a curated CSV. Read on for tips on filtering your logs or compiling your CSV!

Strategy #1 - Pipeline queries

Our first strategy is to create a collection of your pipeline queries. Depending on your setup, there are a few ways to import your pipelines as a collection:

  • If a service account runs your pipelines, you can use our Query History option with a filter for the service account’s email address.
  • If various users run your pipelines, you can use our Query History option with a filter to include queries that create or insert into a non-temporary target table.

In both cases above, you can also input the desired date range, like the last day, week, or month. From here, you can audit your pipelines and see ways to streamline your operations and reduce your ongoing storage requirements.

Create a Collection of Pipelines

Strategy #2 - Team-based collections

Another useful strategy is to create a collection of a specific team’s queries. You can use our Query History option with a filter to the email addresses of all your team members. In just a few minutes, you can turn the work a team has been doing into a catalog of shareable, reusable metrics and features.

Strategy #3 - Interactive Query Collection

In this strategy, you create a collection of queries that are being run interactively. These types of queries are notoriously difficult to track because their logic is not consolidated, reviewed, and persisted in repository tools like GitHub or dashboard tools like Looker and QuickSight.

Input emails and use the advanced filter “Interactive” in the collection import form to analyze interactive queries data operators run.  If you are worried about interactive queries including a lot of “noise” (queries that generate a single row for debugging or queries that accidentally output thousands of rows), then you can remove these using the number of rows filter. You can then audit these interactive queries to see commonly referenced tables and metrics - you may learn that people are constantly looking for a key metric not available in your BI tools!

Advanced Filters for Creating Collections

Strategy #4 - Dashboard Collection

If you have a set of “hot” dashboards, you can collect the dashboard queries into a local CSV and upload them as a collection. Auditing them will reveal opportunities to optimize and pre-calculate data across multiple dashboards. For example, five dashboards join the same large tables under the hood. Consolidating these resource-intensive queries by deprecating four dashboards can reduce your compute costs.

To see the required format for the CSV, check out our documentation page here.

Strategy #5 - Table Collection

If there is a single core table that you are interested in, you can use our Query History option with a filter for queries referencing this table. You can see the types of queries people are running against this table, the metrics they are looking for, and any unexpected joins. If you have just a few canonical tables, creating a collection for each makes it easier to reason about the next steps.

Summary

These strategies give you a sense of how to use Single Origin to start organizing, optimizing, and sharing data assets across your company. We are always happy to set up a demo, so reach out today!

Engineering

Engineering

Engineering @ Single Origin