At Doximity, we benefit from open-source software every day, and we consider contributing back to that software a worthwhile endeavor. As we were transitioning to building application container images with Paketo Buildpacks, we discovered that the Ruby buildpack we wanted to use could not yet precompile assets at build time, so we found the open GitHub issue and decided to help by creating the rails-assets buildpack.
This post describes the process involved in developing that new buildpack, with a special focus on the unanticipated complexity in managing image layers and caching. It is not meant to be reference documentation. If you are looking to learn how to build a buildpack, I recommend this tutorial. For an explanation of when to create a buildpack, I recommend Sophie Wigmore’s blog post. This post is simply about our experience and the discoveries we made that we will be able to use during our next foray into writing a buildpack.
The RFC
Before writing any code, a significant change like introducing a new buildpack requires a Request for Comments (RFC). The RFC gives the maintainers, community, and contributors a chance to get on the same page about why this change is needed, how it will be implemented, and if it’s a new buildpack, where it will sit in relation to existing buildpacks and builders. Members of the Paketo community get together weekly to discuss RFCs face to face. By chance, the buildpack we wanted to build already had one that was accepted but not yet implemented. You can find RFCs that impact multiple buildpacks or the general Paketo organization in the RFCs repository. RFCs that only impact a given language family buildpack can be found in their respective repositories under rfcs/.
Let’s take a look at the RFC for enabling Rails Asset Pipeline support.
The RFC lays out a few things, but the critical points to consider when writing a buildpack are what the detect and build steps should do. These steps are necessary in order to abide by the Cloud Native Buildpack specification.
detect
When a buildpack runs against an application’s source code, the first step is detection, whereby the buildpack determines if it is a fit for the source code and whether it should participate in the build process. If it should participate in the build process, it will need to declare any dependencies it requires and any it provides.
For our buildpack, the detection criteria consist of:
- Is there a Gemfile, and does it contain the rails gem?
- Are there any assets directories?
This buildpack will require:
- gems
- bundler
- node
- mri
This buildpack will provide no dependencies.
build
Assuming the buildpack passes detection, it will participate in the build step. For our buildpack, the build contribution was simply to run:
bundle exec rails assets:precompile assets:clean
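As a rough sketch, a Go program could run that command with the standard os/exec package. The helper name precompileCommand and the RAILS_ENV=production setting are assumptions for illustration; the actual buildpack wires this up through its own abstractions.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// precompileCommand builds (but does not run) the asset precompilation
// command for the application in workingDir.
func precompileCommand(workingDir string) *exec.Cmd {
	cmd := exec.Command("bundle", "exec", "rails", "assets:precompile", "assets:clean")
	cmd.Dir = workingDir
	// Assumption: precompile with a production configuration.
	cmd.Env = append(os.Environ(), "RAILS_ENV=production")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	// Skip gracefully on machines without bundler installed.
	if _, err := exec.LookPath("bundle"); err != nil {
		fmt.Println("bundle not found on PATH; nothing to run")
		return
	}

	cmd := precompileCommand(".")
	fmt.Println("running:", cmd.Args)
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "asset precompilation failed:", err)
		os.Exit(1)
	}
}
```

Separating command construction from execution keeps the construction testable without a Ruby toolchain present.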
With an understanding of what our buildpack needed to do, we could begin writing some code.
The Basics
We spent time reviewing similar buildpack implementations to understand how the code was structured. While building our own, we frequently used the bundle-install buildpack as a reference because it had similar requirements of the build environment (bundler, mri, etc.).
The Paketo buildpacks are written in Go and use Go Modules for dependency management. On my team at Doximity, we write plenty of Go code, so we didn’t have the added burden of learning a new language which made contributing much easier. If you’ve never written in Go before, writing a Paketo buildpack is actually a great opportunity to learn as each one is small, test-driven, and follows a consistent pattern.
Besides the language, the buildpacks are written following a test-driven development (TDD) approach. The buildpacks have tests at the integration and unit level. The integration tests for our buildpack use the pack CLI to build a simple Rails application using our buildpack and then run it so we can verify that the assets are compiled and in the expected location. There are separate integration tests for Rails 5.0 and 6.0 since the behavior is different for each version. Building Rails application images takes more than a few seconds, so there are unit tests for faster feedback. A unit test checks for a single isolated behavior in the code and passes or fails based on the assertions. Here is an example of one of the happy-path unit tests, and here is one of the unhappy-path unit tests for the detect phase.
Now that we’ve covered the basics, let’s talk about what we found challenging when building our first Paketo buildpack.
The Complexity: Image Layers & Caching
Buildpacks create an Open Container Initiative (OCI) compliant container image (the same type of image that a build from a Dockerfile produces). OCI defines industry standards around container image formats and runtimes. When an image complies with those standards, any container runtime that implements the OCI runtime specification can read and run it.
The image itself is some metadata that references a set of layers. Those layers are included in the image as tar archives. When unarchived, each layer contains a set of files that are added to the container filesystem. The layers are applied on top of each other to create a complete filesystem.
Cloud Native Buildpacks takes a unique approach when it comes to developing and applying these layers. It enforces dedicated namespaces for the layers on the filesystem that allow those layers to be swapped out and replaced in an order-independent fashion. Think of it this way: if my layer contains files that only affect a single directory on the filesystem, and no other layer is allowed to modify that directory, then I can add, remove, or update that layer without needing to also rebuild those layers that might logically appear later in the image metadata. The advantage of this approach becomes clear when it comes to rebuilding the image. Layers that need to be updated can do so without requiring the subsequent steps in the build process to also update their layers. This is different from the naive layer caching strategy used in a simple docker build and enables some performance improvements both during build and later during deployment.
When an image is built, the buildpacks participating in the build process run independently, contributing layers and modifications to the working directory. Creating a layer is as simple as creating a directory and a TOML configuration file in a predetermined location on the filesystem. That TOML configuration file indicates to the buildpack lifecycle how the layer should be handled. Primarily, it indicates if the layer should be included in the final image, whether the layer should be made available to subsequent buildpacks, and if the layer should be restored when the application is being rebuilt. The details of these configurations can be summarized as follows:
| Key | Description |
| --- | --- |
| launch | If true, the layer directory should be included in the built container image. If false, the layer directory should not be included in the built container image. |
| build | If true, the layer directory should be available to subsequent buildpacks at build-time. If false, the layer directory should not be available to subsequent buildpacks at build-time. |
| cache | If true, the layer directory should be persisted to subsequent builds of the same OCI image. If false, the layer directory should not be persisted to subsequent builds of the same OCI image. |
| [metadata] | The metadata section contains any buildpack-specific data for that layer. |
In the case of our buildpack, we produce a single layer called rails-assets, and its TOML file looks like:
launch = true
build = false
cache = false
[metadata]
cache_sha = 123456789123456789
This indicates that the layer is included in the built image and made available to the application at run-time, that the layer is not available to subsequent buildpacks at build-time, and that the cache does not need to be persisted for subsequent builds of the same image. In the buildpack-specific metadata section, we have included a cache key. The key is a checksum of the files included in the working directory that contain assets. When the application image is rebuilt, the buildpack will compare the checksum of the current working directory assets with the metadata cache key. If they match, then the buildpack can skip the asset precompilation process and just reuse the layer. This can speed up the build process significantly. Layer reuse is complex and nuanced, so I recommend the spec for further information.
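The reuse decision described here can be sketched as a small comparison. The layer type below is a hypothetical, simplified stand-in for the layer's TOML contents, and the sketch stores cache_sha as a string, which is an assumption made for the example.

```go
package main

import "fmt"

// layer is a hypothetical, simplified model of a layer's TOML file.
type layer struct {
	Launch   bool
	Build    bool
	Cache    bool
	Metadata map[string]interface{}
}

// canReuse reports whether the checksum recorded in the layer metadata
// matches the checksum of the current working directory assets.
func canReuse(l layer, currentSHA string) bool {
	recorded, ok := l.Metadata["cache_sha"].(string)
	return ok && recorded != "" && recorded == currentSHA
}

func main() {
	// Pretend this layer was restored from the previously built image.
	restored := layer{
		Launch:   true,
		Metadata: map[string]interface{}{"cache_sha": "abc123"},
	}

	for _, sha := range []string{"abc123", "def456"} {
		if canReuse(restored, sha) {
			fmt.Println(sha, "-> reuse layer, skip precompilation")
		} else {
			fmt.Println(sha, "-> rebuild layer, run precompilation")
		}
	}
}
```

A layer with no recorded checksum, as on a first build, never matches, so the precompilation always runs in that case.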
The layers that set launch = true are referenced and included in the built container image. We can crack open a simple Rails application container image to see our layer contents:
layers/
├── config
│   └── metadata.toml
├── paketo-buildpacks_bundle-install
│   └── ...
├── paketo-buildpacks_bundler
│   └── ...
├── paketo-buildpacks_mri
│   └── ...
└── paketo-buildpacks_rails-assets
    └── assets
        ├── env.launch
        │   ├── RAILS_ENV.default
        │   └── RAILS_SERVE_STATIC_FILES.default
        ├── public-assets
        │   ├── application-04024382391bb910584145.css
        │   ├── application-04024382391bb910584145.css.gz
        │   ├── manifest-b4bf6e57a53c2bdb55b8998cc.js
        │   └── manifest-b4bf6e57a53c2bdb55b8998cc.js.gz
        └── tmp-cache-assets
            └── sprockets
                └── v4.0.0
                    ├── 0u
                    │   └── 0uAahpY5R4STSYbGEhSXFLPOde8YQb-d7IVpGQ6sPfI.cache
                    ├── ...
                    └── wV
                        └── wVWuMnOuv77UjkwDPqaPGdMXco9Sz_GGHb1q20M_410.cache
409 directories, 1858 files
The rails-assets layer that our buildpack creates contains a few directories within it. The first, env.launch, contains the default environment variable values for RAILS_ENV and RAILS_SERVE_STATIC_FILES. These ensure Rails runs using a production configuration and that it serves its own static files instead of expecting a file server to handle it.
The other two directories contain the precompiled assets generated by running the bundle command against the application’s code. The directories the application sees (public/assets and tmp/cache/assets) are actually symlinks to the two layer directories you see above (public-assets and tmp-cache-assets). This way, when the application is running, it can find its precompiled assets in the expected locations. In the built image, the application lives at /workspace.
cnb@b568cc16543c:/workspace$ ls /workspace/public/
404.html
422.html
500.html
apple-touch-icon-precomposed.png
apple-touch-icon.png
assets -> /layers/paketo-buildpacks_rails-assets/assets/public-assets
favicon.ico
robots.txt
cnb@b568cc16543c:/workspace$ ls /workspace/tmp/cache/
assets -> /layers/paketo-buildpacks_rails-assets/assets/tmp-cache-assets
webpacker
Being able to create and modify layers in a buildpack is common behavior. To make this easy programmatically, the Paketo Core Team has developed a library called packit. It provides a simple API to manage the layers for our buildpack. Our buildpack uses this library to create the layer we want if it does not exist or reuse it on subsequent builds if it’s present.
One of the best features of buildpacks is that they can choose to reuse layers if there is no work needing to be done. In our case, we can skip running rails assets:precompile if none of the application’s assets have changed. If anything has changed, the buildpack can re-run the command to create new compiled asset bundles. A quick way to verify if the application’s assets have changed is to calculate a SHA sum of the app/assets directory and include it in the layer metadata when it is created. Then on subsequent builds, the buildpack can compare that value with the latest SHA sum of that directory.
Once we had our buildpack fully functional and passing the acceptance tests, we could test it with our real applications.
Package & Release
In order to test the final product in our own builder, we packaged it and moved it to our remote image registry with skopeo so we could reference it. Since our builder uses the Paketo Ruby language-family buildpack, we needed to replace that with the individual buildpacks that make up the language-family buildpack so we could insert our rails-assets buildpack in the buildpack group ordering as specified in the RFC. With a release of our builder, we were able to test real Rails applications and confirm everything worked as expected.
After we confirmed that the buildpack was working for our applications, we pursued having the upstream Ruby Paketo buildpack adopt it.
Adoption
There are a couple of benefits to having a buildpack adopted:
- If the buildpack gets included as part of a language family buildpack (Ruby), we can ensure that it is readily available to the many developers in the buildpacks community that might benefit from that buildpack’s features. With more usage will come more improvements to the behavior and performance of the buildpack, as well as the identification and resolution of any bugs that are found. Like anything open-source, we use the community’s experience and intelligence to improve upon something that “just works for us.”
- We are not buildpack experts. If the buildpack stays in our company’s private GitHub organization or somewhere not readily accessible, then we are unlikely to see adoption by other developers and thus miss out on the benefits described in the first point. More importantly, we would also be responsible for updating any dependencies of the buildpack. However, if the Paketo team adopts the buildpack as one of their own, they can use their existing tools and processes to ensure the buildpack stays up-to-date and relevant.
The adoption process may differ depending on the buildpack. In our case, the buildpack was relatively simple, and it was critical for most Rails applications, so it was adopted into the Ruby language-family buildpack. Other buildpacks may make more sense as contributions to the paketo-community organization.
I joined one of the Paketo Working Group meetings in order to share the buildpack with the community and answer any outstanding questions. After a discussion, it was agreed that the buildpack could be adopted into the Ruby language-family buildpack so it would be readily available to all Rails developers.
Summary
With the experience of having written our buildpack, the idea of writing another in the future is far less daunting. More importantly, we developed a better understanding of what the buildpacks are actually doing such that we can understand when there is unexpected behavior or we are seeking to optimize functionality.
At Doximity, we aim to build great products for our clinicians. Internally, we are building a platform to help our teams deliver those products more efficiently. The software we build depends in many ways on open-source software. Finding opportunities to contribute back and collaborate with those communities is beneficial for everyone. I hope reading about our experience encourages you to take stock of OSS that you benefit from that could use your expertise and contributions.