
The current data engineering ecosystem is filled with a wide range of tools, from both open-source projects and third-party vendors.

While there still isn't a consensus on which path to choose (open-source or a third-party vendor), I think it's interesting to explore the possibilities of building an open-source data stack. With the current state of the market, it's honestly the best time to reconsider how you designed your data stack and begin exploring open-source alternatives. The current consensus is to deploy your open-source data stack on Kubernetes, as it provides the ability to scale seamlessly and operate complex infrastructure using standard tooling. Having operated large-scale systems for nearly a decade, I've seen firsthand the challenges of deploying an open-source application on your own. On the other hand, I realize how valuable it is for organizations to self-host their data stacks to better navigate the privacy and security concerns relevant to their field. While this is certainly challenging for developers, I think it's reasonable for an organization to deploy a self-hosted open-source data stack with the right tooling and processes in place.

Good open-source tools are well-documented, have a strong community presence, and are actively developed. Bonus points for good community support channels and contribution programs.

Data Ingestion

For data ingestion, a popular solution is Airbyte, an open-source EL(T) platform that helps you replicate data from disparate sources to your data warehouses, data lakes, and databases. Deploying Airbyte on Kubernetes just became a whole lot easier. Due to its ease of use and low barrier to entry, it is quickly becoming the industry standard for replication jobs. Another ingestion tool I commonly see companies using is Singer, an open-source ETL tool that powers data extraction and consolidation for all of your organization's data.

Data Transformation

dbt has predominantly emerged as the way forward for transforming data. Not only is it a great open-source solution, but it's also simple to get up and running.
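The tools discussed in this section split a pipeline into ingestion (Airbyte, Singer) and transformation (dbt). That split can be sketched in a few lines of Python. This is a minimal, stdlib-only illustration under stated assumptions, not how Airbyte, Singer, or dbt are implemented: the extract step emits Singer-style SCHEMA, RECORD, and STATE messages as JSON lines (following the public Singer message format), the load step writes records untransformed into a raw table, and the transform step builds a summary table with plain SQL, loosely mirroring a dbt model. SQLite stands in for the warehouse, and the `orders` stream, its columns, and every function name are hypothetical.

```python
import json
import sqlite3

# Hypothetical source rows standing in for an upstream system.
SOURCE_ROWS = [
    {"id": 1, "customer": "acme", "amount_cents": 1250},
    {"id": 2, "customer": "globex", "amount_cents": 800},
    {"id": 3, "customer": "acme", "amount_cents": 400},
]


def extract_singer_messages(rows):
    """Extract: yield Singer-style SCHEMA, RECORD and STATE messages as JSON lines."""
    schema = {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "customer": {"type": "string"},
            "amount_cents": {"type": "integer"},
        },
    }
    yield json.dumps({"type": "SCHEMA", "stream": "orders",
                      "schema": schema, "key_properties": ["id"]})
    for row in rows:
        yield json.dumps({"type": "RECORD", "stream": "orders", "record": row})
    # Bookmark the last id so a later run could resume from there
    # (this sketch only reports the state; it does not implement resuming).
    yield json.dumps({"type": "STATE", "value": {"orders": {"last_id": rows[-1]["id"]}}})


def load_raw(conn, lines):
    """Load: write RECORD messages as-is into a raw table and return the final STATE."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_orders "
                 "(id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)")
    state = None
    for line in lines:
        msg = json.loads(line)
        if msg["type"] == "RECORD":
            r = msg["record"]
            conn.execute("INSERT OR REPLACE INTO raw_orders VALUES (?, ?, ?)",
                         (r["id"], r["customer"], r["amount_cents"]))
        elif msg["type"] == "STATE":
            state = msg["value"]
    conn.commit()
    return state


def transform(conn):
    """Transform: build a derived table inside the 'warehouse' with SQL, dbt-style."""
    conn.execute("DROP TABLE IF EXISTS customer_revenue")
    conn.execute("""
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(amount_cents) / 100.0 AS revenue
        FROM raw_orders
        GROUP BY customer
    """)
    return conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    state = load_raw(conn, extract_singer_messages(SOURCE_ROWS))
    print(state)            # {'orders': {'last_id': 3}}
    print(transform(conn))  # [('acme', 16.5), ('globex', 8.0)]
```

Keeping the raw table untouched and deriving `customer_revenue` from it is the design choice that EL(T) tools encourage: ingestion only replicates, and business logic lives in versionable SQL that can be rebuilt at any time.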

To continue our adventures with MySQL, today we investigated one more issue related to MySQL's memory consumption. It's not a very big deal, but it is worth knowing about if you are using Metabase.

Metabase is a great lightweight tool for analytics, and we can also use it to query a MySQL database (that is not its primary purpose, but it still works well as a GUI for running report queries). It supports many databases, including Google BigQuery, which is what I like most about Metabase. We received an alert from our monitoring system that one of our read replicas was consuming more memory than usual. If MySQL uses more memory, that is normally fine, since more data held in memory generally means better performance. But I wasn't convinced, because for the past 6 months its memory usage had never crossed 60%, and suddenly it reached 80%.
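The reasoning behind that alert (higher memory use by MySQL is usually fine, but a jump well above a long-standing ceiling is suspicious) can be expressed as a simple check. This is a minimal sketch, assuming usage is sampled as a percentage of total RAM; `memory_alert` and its 10-point `margin_pct` are hypothetical choices for illustration, not a feature of MySQL, Metabase, or any particular monitoring system.

```python
def memory_alert(history_pct, current_pct, margin_pct=10.0):
    """Flag a memory reading that jumps well above its historical ceiling.

    history_pct: past usage samples, as percentages of total RAM.
    current_pct: the latest sample.
    margin_pct:  hypothetical tolerance, in percentage points, above the peak.
    """
    if not history_pct:
        raise ValueError("need at least one historical sample")
    # High but stable usage is expected (MySQL caches data in memory),
    # so compare against the long-running peak instead of a fixed limit.
    ceiling = max(history_pct)
    return current_pct > ceiling + margin_pct


if __name__ == "__main__":
    # Hypothetical samples: usage stayed at or below 60% for months.
    six_months = [52.0, 55.5, 58.0, 60.0, 57.5]
    print(memory_alert(six_months, 80.0))  # True: 80 > 60 + 10
    print(memory_alert(six_months, 65.0))  # False: within the margin
```

Comparing against the historical peak rather than a fixed threshold matches the point above: a replica that had always sat near 80% would not be flagged, while one that suddenly climbs from a 60% ceiling to 80% would.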
