An Open-Source CHAI
Over the last couple of years at tea, we’ve been trying to figure out how to better support open-source maintainers. We put our heads down and arrived at a solution built around a giant data pipeline that abstracted over multiple package managers to produce a single graph view of open-source packages. It ran every single day, pulling data from npm, PyPI, RubyGems, apt, Homebrew, crates, and pkgx, and identifying new nodes and edges in the open-source graph. It was technically demanding and very expensive to maintain.
Today, we’d like to open-source it and give you a preview of what it could look like.
In this early release, the core feature we wanted to preserve was the data model itself. It standardizes the dependency graph and the relationships between packages, their versions, URLs, and users. So far, we have it running for Homebrew and crates, and the data’s already pretty interesting!
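To make the idea of a standardized data model more concrete, here is a minimal Python sketch of how packages, versions, and dependency edges could relate in a package-manager-agnostic graph. The class and field names (`Package`, `Version`, `DependsOn`, `Graph.dependents`) are hypothetical and chosen for illustration. CHAI’s actual tables are defined by its migrations and ERD, and URLs and users would be further entities in the same style.

```python
from dataclasses import dataclass, field

# Hypothetical entities sketching a package-manager-agnostic data model.
# Names and fields are illustrative assumptions, not CHAI's actual schema.


@dataclass(frozen=True)
class Package:
    id: int
    package_manager: str  # e.g. "homebrew" or "crates"
    name: str


@dataclass(frozen=True)
class Version:
    id: int
    package_id: int  # the Package this release belongs to
    version: str


@dataclass(frozen=True)
class DependsOn:
    """Edge: a specific version depends on another package."""
    version_id: int
    dependency_id: int  # Package.id of the dependency


@dataclass
class Graph:
    packages: dict[int, Package] = field(default_factory=dict)
    versions: dict[int, Version] = field(default_factory=dict)
    edges: list[DependsOn] = field(default_factory=list)

    def dependents(self, package_id: int) -> set[str]:
        """Names of packages with at least one version depending on package_id."""
        names: set[str] = set()
        for edge in self.edges:
            if edge.dependency_id == package_id:
                owner_id = self.versions[edge.version_id].package_id
                names.add(self.packages[owner_id].name)
        return names


# Toy data for illustration.
g = Graph()
g.packages[1] = Package(1, "crates", "serde")
g.packages[2] = Package(2, "crates", "serde_json")
g.versions[10] = Version(10, 2, "1.0.0")
g.edges.append(DependsOn(version_id=10, dependency_id=1))
print(g.dependents(1))  # {'serde_json'}
```

Attaching edges to a version rather than to a package reflects that dependencies can change between releases; whether CHAI models edges exactly this way is an assumption of this sketch.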
The three simple components of open-source CHAI that you can play around with today are:
- A database with tables, migrations, and an ERD (entity-relationship diagram), so you can start querying the data
- Pipelines for crates and Homebrew, so you can see different strategies for fetching and transforming data from two different sources
- A bare-bones API to query the graph
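The pipelines fetch and transform data from each source. As a rough sketch of the transform step, and of how a daily run might spot new nodes and edges, the snippet below normalizes two differently shaped inputs into shared `(package manager, name)` keys, merges them into one graph, and diffs the result against the previous run. The record shapes (a Homebrew-style `dependencies` list and a crates-style `deps` list of `{"crate": ...}` dicts) and all function names are assumptions for illustration, not the actual formats or code used by CHAI’s pipelines.

```python
# Hypothetical sketch: record shapes and function names are illustrative
# assumptions, not CHAI's actual pipeline code or source formats.

Node = tuple[str, str]    # (package manager, package name)
Edge = tuple[Node, Node]  # (dependent, dependency)
Snapshot = tuple[set[Node], set[Edge]]


def transform_homebrew(formulae: list[dict]) -> Snapshot:
    """Assumed shape: {"name": str, "dependencies": [str, ...]}."""
    nodes: set[Node] = set()
    edges: set[Edge] = set()
    for formula in formulae:
        pkg = ("homebrew", formula["name"])
        nodes.add(pkg)
        for dep_name in formula.get("dependencies", []):
            dep = ("homebrew", dep_name)
            nodes.add(dep)
            edges.add((pkg, dep))
    return nodes, edges


def transform_crates(crates: list[dict]) -> Snapshot:
    """Assumed shape: {"name": str, "deps": [{"crate": str}, ...]}."""
    nodes: set[Node] = set()
    edges: set[Edge] = set()
    for crate in crates:
        pkg = ("crates", crate["name"])
        nodes.add(pkg)
        for d in crate.get("deps", []):
            dep = ("crates", d["crate"])
            nodes.add(dep)
            edges.add((pkg, dep))
    return nodes, edges


def merge(*snapshots: Snapshot) -> Snapshot:
    """Union several per-source snapshots into one graph."""
    nodes: set[Node] = set()
    edges: set[Edge] = set()
    for n, e in snapshots:
        nodes |= n
        edges |= e
    return nodes, edges


def diff(previous: Snapshot, current: Snapshot) -> Snapshot:
    """Nodes and edges present in the current run but not in the previous one."""
    prev_nodes, prev_edges = previous
    cur_nodes, cur_edges = current
    return cur_nodes - prev_nodes, cur_edges - prev_edges


# Toy inputs standing in for two consecutive daily runs.
yesterday = transform_homebrew([{"name": "wget", "dependencies": ["openssl@3"]}])
today = merge(
    transform_homebrew([{"name": "wget", "dependencies": ["openssl@3", "libidn2"]}]),
    transform_crates([{"name": "serde_json", "deps": [{"crate": "serde"}]}]),
)
added_nodes, added_edges = diff(yesterday, today)
print(sorted(added_nodes))
# [('crates', 'serde'), ('crates', 'serde_json'), ('homebrew', 'libidn2')]
```

Keying every node on its package manager as well as its name keeps identically named packages from different ecosystems distinct, which a single cross-ecosystem graph needs.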
We want this to be a community-driven tool, and we’d love for you to join us on this journey. Try running CHAI yourself and explore its capabilities. If you have ideas for improving our ERD or want to add new examples or pipelines, we’re eager to collaborate. Join us on Discord to continue the conversation and help shape the future of CHAI together!