Managing your ETL platform’s costs effectively is crucial for getting the best return on investment from your data pipelines. As data volumes grow, so do the expenses related to data syncing, transformation, and storage. In this article, we’ll delve deep into several strategies for cost optimization, complete with a real-life example that resulted in 95% savings on one of our clients’ ETL platform bills.
Understand your ETL platform’s billing model
Types of billing models:
- Per row: Some platforms bill you based on the number of new or updated rows that are synced.
- Total rows: Other platforms may charge based on the total number of rows stored or processed, regardless of whether they are new or updated.
- Fixed amount: Some platforms offer tiered pricing, where you pay a fixed amount based on the range of rows processed or data storage used.
- Additional features: Add-ons like advanced transformations, real-time syncing, or premium connectors can also affect the overall cost.
- API calls: If your ETL jobs require external API calls, these could also contribute to the total cost.
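To make the differences between these models concrete, here is a minimal sketch that estimates a monthly bill under three of them. Every rate, tier, and row count below is a made-up placeholder, not any vendor’s actual pricing; plug in the numbers from your own contract.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    changed_rows: int  # new or updated rows synced this month
    total_rows: int    # all rows stored or processed this month


def per_row_cost(usage: MonthlyUsage, price_per_million: float) -> float:
    """Per-row model: pay only for new or updated rows."""
    return usage.changed_rows / 1_000_000 * price_per_million


def total_rows_cost(usage: MonthlyUsage, price_per_million: float) -> float:
    """Total-rows model: pay for every row, changed or not."""
    return usage.total_rows / 1_000_000 * price_per_million


def tiered_cost(rows: int, tiers: list[tuple[int, float]]) -> float:
    """Fixed-amount model: pay the flat fee of the first tier that fits.

    `tiers` is a list of (max_rows, flat_fee) pairs sorted by max_rows.
    """
    for max_rows, flat_fee in tiers:
        if rows <= max_rows:
            return flat_fee
    raise ValueError("usage exceeds the largest tier")


# Hypothetical usage and prices, for illustration only.
usage = MonthlyUsage(changed_rows=2_000_000, total_rows=50_000_000)
tiers = [(1_000_000, 100.0), (5_000_000, 300.0), (20_000_000, 800.0)]

print("per row:   ", per_row_cost(usage, price_per_million=200.0))
print("total rows:", total_rows_cost(usage, price_per_million=10.0))
print("tiered:    ", tiered_cost(usage.changed_rows, tiers))
```

Running the same usage through each model shows quickly which cost driver matters most for you: changed rows, total rows, or the tier boundary.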
Now that you have a solid understanding of your ETL platform’s billing model, you’re well-equipped to dive into the various strategies for reducing your Fivetran costs. Keep your billing model in mind as you read through the following tips, as some strategies will be more impactful depending on how you are billed.
So, let’s move on to the actionable steps you can take to optimize your ETL platform’s expenses.
Selective syncing
Rather than syncing all tables and columns, specify which ones are crucial for your analytics. This reduces the volume of data transferred.
Let’s take the Google Ads connector from one of our clients’ Fivetran accounts as an example.
As you can see, the first 4 tables accounted for 88% of all billable rows synced to their data warehouse. Since they didn’t need those tables, we deselected them by going to the connector schema.
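If you want to run the same analysis on your own connectors, a rough sketch looks like this. It assumes you have exported billable row counts per table from your platform’s usage report; the table names and counts below are hypothetical and only chosen so that the top four tables add up to 88%.

```python
def billable_share(rows_by_table: dict[str, int]) -> list[tuple[str, float]]:
    """Return (table, share of total billable rows), largest first."""
    total = sum(rows_by_table.values())
    if total == 0:
        return []
    ranked = sorted(rows_by_table.items(), key=lambda item: item[1], reverse=True)
    return [(table, rows / total) for table, rows in ranked]


# Hypothetical row counts; replace with figures from your own usage report.
rows_by_table = {
    "table_a": 400_000,
    "table_b": 250_000,
    "table_c": 150_000,
    "table_d": 80_000,
    "table_e": 70_000,
    "table_f": 50_000,
}

cumulative = 0.0
for table, share in billable_share(rows_by_table):
    cumulative += share
    print(f"{table:10s} {share:6.1%}  cumulative {cumulative:6.1%}")
```

Tables near the top of this list that nobody queries are the first candidates to deselect.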
You can even go further and only sync the fields you need. For example, let’s say you don’t need the “active_view” metrics from the ad_group_stats table from the Google Ads connector. Simply expand the table schema, and deselect the fields you don’t need.
Incremental updates
Ensure that only new or changed data is transferred, to avoid re-syncing the entire dataset each time.
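Most managed ETL platforms handle this for you, but the underlying idea is simple enough to sketch. The example below assumes each source row carries an `updated_at` timestamp and that you store the timestamp of the last successful sync; both the field name and the helper functions are illustrative, not any platform’s API.

```python
from datetime import datetime


def rows_to_sync(rows: list[dict], last_synced_at: datetime) -> list[dict]:
    """Keep only rows modified after the previous sync's high-water mark."""
    return [row for row in rows if row["updated_at"] > last_synced_at]


def next_cursor(rows: list[dict], last_synced_at: datetime) -> datetime:
    """Advance the cursor to the newest timestamp seen, or keep it if nothing changed."""
    return max((row["updated_at"] for row in rows), default=last_synced_at)


# Hypothetical source rows.
source_rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 2, 9, 0)},
    {"id": 3, "updated_at": datetime(2024, 1, 3, 9, 0)},
]
last_synced_at = datetime(2024, 1, 2, 0, 0)

changed = rows_to_sync(source_rows, last_synced_at)
print([row["id"] for row in changed])      # only rows 2 and 3 are transferred
print(next_cursor(changed, last_synced_at))
```

Only the rows past the cursor are transferred, so the cost of each sync tracks the amount of change rather than the size of the table.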
Sync frequency
Choose a sync frequency that matches your actual business needs. Our client’s sync frequency was set to the default, which is every 6 hours. Since our data transformations run once a day for this client, we changed the sync frequency to every 24 hours, essentially slashing their costs by 75%.
Here’s how to do it in Fivetran:
Click on the connector, then go to the Setup tab. Then, set “Sync Frequency” to the desired value.
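The arithmetic behind the 75% figure can be sketched in a few lines. Note the assumption: this computes the reduction in the number of sync runs. How closely your bill follows that number depends on your billing model, so treat it as an upper-bound estimate rather than a guarantee.

```python
def syncs_per_day(interval_hours: float) -> float:
    """Number of sync runs in 24 hours at a given interval."""
    return 24 / interval_hours


def sync_run_reduction(old_interval_hours: float, new_interval_hours: float) -> float:
    """Fraction of sync runs eliminated by moving to a longer interval."""
    old_runs = syncs_per_day(old_interval_hours)
    new_runs = syncs_per_day(new_interval_hours)
    return 1 - new_runs / old_runs


# Moving from every 6 hours (4 runs a day) to every 24 hours (1 run a day).
print(f"{sync_run_reduction(6, 24):.0%}")
```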
Set a budget and alerts
Setting up budgets and alerts is an essential aspect of managing your ETL platform’s costs effectively. By keeping a close eye on your spending, you can make timely adjustments to your usage and avoid any unwelcome surprises at the end of the billing cycle.
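If your platform does not offer built-in budget alerts, you can approximate one yourself. The sketch below assumes you can obtain month-to-date spend from an invoice or usage export; the function names, the linear projection, and the 80% warning threshold are all illustrative choices.

```python
def projected_month_end_spend(spend_to_date: float, day_of_month: int, days_in_month: int) -> float:
    """Naive linear projection: assume the daily spend so far continues all month."""
    return spend_to_date / day_of_month * days_in_month


def budget_status(
    spend_to_date: float,
    budget: float,
    day_of_month: int,
    days_in_month: int,
    warn_ratio: float = 0.8,
) -> str:
    """Classify current spend against a monthly budget."""
    if spend_to_date >= budget:
        return "over_budget"
    projected = projected_month_end_spend(spend_to_date, day_of_month, days_in_month)
    if projected >= budget:
        return "on_track_to_exceed"
    if spend_to_date >= warn_ratio * budget:
        return "warning"
    return "ok"


# Hypothetical numbers: 300 spent by day 15 of a 30-day month, 500 budget.
print(budget_status(spend_to_date=300.0, budget=500.0, day_of_month=15, days_in_month=30))
```

A check like this can run daily and post to email or chat whenever the status is anything other than "ok".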
Monitor schema changes
Changes at the source level happen all the time. Maybe someone added a new object in Salesforce, and now there’s a ton of new data flowing through your ETL platform.
To ensure you have complete visibility, set up alerts for new data so you can monitor the impact on your usage and evaluate whether those fields are needed. Not all ETL platforms have an alert system for that. If you’re using Fivetran, you can easily set up notifications in the Schema tab for each connector.
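For platforms without such notifications, one workaround is to snapshot the schema periodically and compare snapshots. This is a minimal sketch that assumes you can list each table’s columns (for example from your warehouse’s information schema); the table and column names are hypothetical.

```python
def new_columns(
    previous: dict[str, set[str]],
    current: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Return columns present now that were absent in the previous snapshot, per table.

    A table that did not exist before is reported with all of its columns.
    """
    added = {}
    for table, columns in current.items():
        diff = columns - previous.get(table, set())
        if diff:
            added[table] = diff
    return added


# Hypothetical snapshots taken a day apart.
yesterday = {"account": {"id", "name"}, "contact": {"id", "email"}}
today = {
    "account": {"id", "name", "custom_score"},
    "contact": {"id", "email"},
    "custom_object": {"id", "payload"},
}

for table, columns in new_columns(yesterday, today).items():
    print(f"{table}: {sorted(columns)}")
```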
Use multiple ETL platforms
When it comes to ETL processes, one size definitely does not fit all. Different platforms offer various features, capabilities, and pricing models. As you aim to reduce your ETL platform costs, don’t overlook the possibility of using multiple ETL platforms to meet specific needs most efficiently. Here’s why diversifying your ETL platforms could be a smart move.
Let’s say most of your data sources don’t generate many new or updated rows per month. Fivetran is then a great solution since they have usage-based pricing. However, you have this one data source that generates a ton of new rows per month and would cost you a lot using Fivetran. Then, you might want to consider something like Datadoo which has a fixed pricing model.
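To decide which sources belong on which platform, it helps to compute the break-even point between the two pricing models. The prices below are invented placeholders, not actual rates from any vendor.

```python
def usage_based_cost(monthly_rows: int, price_per_million: float) -> float:
    """Cost under usage-based pricing for a given number of new or updated rows."""
    return monthly_rows / 1_000_000 * price_per_million


def break_even_rows(price_per_million: float, fixed_monthly_fee: float) -> int:
    """Row volume at which usage-based pricing costs the same as the fixed fee."""
    return int(fixed_monthly_fee / price_per_million * 1_000_000)


def cheaper_platform(monthly_rows: int, price_per_million: float, fixed_monthly_fee: float) -> str:
    """Pick the cheaper pricing model for one data source."""
    if usage_based_cost(monthly_rows, price_per_million) < fixed_monthly_fee:
        return "usage_based"
    return "fixed"


# Hypothetical prices: 200 per million rows versus a flat 500 per month.
print(break_even_rows(price_per_million=200.0, fixed_monthly_fee=500.0))
print(cheaper_platform(500_000, 200.0, 500.0))     # low-volume source
print(cheaper_platform(10_000_000, 200.0, 500.0))  # high-volume source
```

Sources well below the break-even volume stay on the usage-based platform; the few that sit far above it are the ones worth moving.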
Another reason might be that your preferred ETL platform doesn’t offer the connector, tables, or fields you need for a given data source. In that case, there is no harm in using another ETL platform for that specific data source.
Leverage dbt packages
This is a big one. Fivetran is, in my opinion, the best ETL platform out there. Their usage-based pricing is great, the UI is easy to use, the features are fantastic, the documentation is great, and they have a free plan. That being said, the biggest advantage they have is their dbt packages.
dbt packages are pre-built data transformations. You see, when syncing data using an ETL tool, the data isn’t necessarily ready to be used for analysis yet. You will probably need a data engineer to write SQL to transform your data and implement basic data tests before it’s ready for data analysis. With Fivetran’s dbt packages, this is all taken care of automatically. You might still want a data engineer to create specific mart models, but the heavy lifting is done by the dbt package. This will save you time and money, and the best part is that those packages are also maintained by Fivetran.
Conclusion
I hope you’ve found this article to be a valuable resource for effectively managing your ETL platform costs.
If you have any questions or need further assistance, please don’t hesitate to contact us. Your effective data management is our top priority, and we’re here to help you every step of the way.