Migrating a Repository to Bazel

This will be a bit of a difficult post for me to make, because the work I've been doing has all been in a private repository at work and I can't share any of the direct code snippets here. But I think the work to migrate build systems has been very interesting, and I'd like to share some lessons and tips with those following a similar path.

I won't go into "Why Bazel" or anything like that here. I'll assume you have an existing codebase in some form and want to switch it over to using Bazel, and you want to know how to make the switch easier.

Let's refactor your build tool.

1: Find your seam

This is a common step when performing refactorings, and applying it to build tools is no different. You should start by finding small parts of the repository that you can switch over to Bazel without disrupting other sections. This will allow you to slowly add new targets to your build without requiring a massive merge of all the projects at once.

The phrase "without disrupting" is potentially confusing here, because Bazel is very opinionated about where it puts your output files. We'll cover that in the next section, but it's important to pick sections of the repository that are conceptually separate steps, not just ones that are tied together by a script or similar.

For example, there are often seams where two different languages interact in your repository, or where two different services talk to each other. You may currently depend on your protobuf objects being placed in a specific directory to get picked up by another project, but building those protobuf files is an excellent seam: you just have to change the instructions to build those files first, then switch that build step over to Bazel.

When you take this approach, each change gets easier than the last. The first few seams are painful because you're breaking up your repository structure, but later ones build on them. Once the API schema is generated, you can generate the database schema. Then when you go to switch the server build to Bazel, you already have all your dependencies built in a way that makes them trivially accessible from Bazel.

2: Find the files you depend on

Bazel tries to remove the concept of file management from the build process, and instead tries to get you to think of your build as a dependency graph. If your repository is small and the dependency graph is, too, this almost certainly isn't worth your time. But in large repositories, being able to abstract away a dependency on another project is incredibly powerful. Developers can compile and test just the changes they know they made, and know that all their dependencies will be built for them without any extra thought.

However, if you don't have Bazel yet, you probably depend on files to communicate between projects. You may generate a file that several projects need to use, or that a current project uses from a very specific location. Or you might run a script that creates several files all over your repository.

Because Bazel is very opinionated about how it does things, it's very difficult to place files in exactly the same place they were before. First of all, Bazel puts output files in a directory separate from your source code, and there's no easy way to change this. Even after some experimenting with a genrule, I've had no luck placing files into the repository tree during the build process.
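To make the output location concrete, here is a minimal sketch. It assumes a hypothetical package named demo, which is not part of the repository described in this post.

```starlark
# demo/BUILD.bazel (hypothetical package, for illustration only)
genrule(
    name = "hello",
    outs = ["hello.txt"],
    # "$@" expands to the path of the single declared output,
    # which lives in Bazel's output tree, not in demo/.
    cmd = "echo hello > $@",
)
```

After running bazel build //demo:hello, the file shows up as bazel-bin/demo/hello.txt (bazel-bin is a convenience symlink into the output tree), and nothing new appears in demo/ itself.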

However, that shouldn't deter you. You can use a two-step process to create a "script" that you can run to place the files you need wherever you want in the repository. Let's go through an example.

Example seam

In this example, let's assume that:

You have a Scala project (using SBT) that relies on some XML as a resource.
That XML is currently generated by building and running a Python script (using Poetry).

So to run the current build, you might do something like:

cd poetry/
poetry run generator --output-dir=../sbt/src/main/resources/
cd ../sbt/
sbt compile

This pattern of "go to project directory, run command, back out" is common in many polyglot repositories.

We can encode this in Bazel in two steps: a genrule that creates a shell script able to run in the normal repository tree, and a sh_binary that lets you run it with bazel run. Your project's poetry/BUILD.bazel might then look like:

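genrule(
    name = "generator_script",
    srcs = glob(["**/*.py"]),
    outs = ["generator.sh"],
    # No tools attribute: the generated script looks up poetry on the PATH
    # when it runs, so the genrule itself does not need it.
    # "$$" escapes Bazel's own "$" substitution, so the script receives a
    # literal "$" and the variables are expanded when the script runs.
    cmd = """
        echo '#!/usr/bin/env bash' > "$(OUTS)"
        echo 'cd "$${BUILD_WORKSPACE_DIRECTORY}/poetry/"' >> "$(OUTS)"
        echo 'poetry run generator "$$@"' >> "$(OUTS)"
    """,
    executable = True,
)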

sh_binary(
    name = "generator",
    srcs = [":generator_script"],
)

This creates a shell script that starts from BUILD_WORKSPACE_DIRECTORY, an environment variable that bazel run sets to the root directory of your Bazel workspace (generally, the root directory of your repository). The script then cds into the poetry directory and executes poetry run generator with all the command line flags you've passed to it.

So your command line from above would turn into:

bazel run //poetry:generator -- --output-dir=../sbt/src/main/resources
cd sbt/
sbt compile

From here you can work towards making the Python generator into a Bazel target, then use that in the Scala code when that gets switched over:

scala_library(
    name = "generator_consumer",
    srcs = glob(["**/*.scala"]),
    resources = ["//poetry:generated_xml"],
)
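
Something has to produce the //poetry:generated_xml label used above. One possible shape for it is sketched below; this is not the setup from my repository. The target name generator_bin, the entry point generator.py, and the output file name generated.xml are all hypothetical. It also assumes rules_python is available in the workspace, that the generator writes generated.xml into the directory given by --output-dir, and that the Scala code lives in a package named sbt. Wiring up the Poetry-managed third-party dependencies is left out.

```starlark
# poetry/BUILD.bazel (sketch; all names below are assumptions)
load("@rules_python//python:defs.bzl", "py_binary")

py_binary(
    name = "generator_bin",
    srcs = glob(["**/*.py"]),
    main = "generator.py",  # assumed entry point
)

genrule(
    name = "generated_xml",
    outs = ["generated.xml"],  # assumed name of the file the generator writes
    tools = [":generator_bin"],
    # With a single output, "$(@D)" expands to the directory containing it,
    # so the generator writes straight into Bazel's output tree.
    cmd = "$(execpath :generator_bin) --output-dir=$(@D)",
    visibility = ["//sbt:__pkg__"],
)
```

With a target like this in place, the XML no longer needs to be copied into sbt/src/main/resources by hand: Bazel builds it whenever the Scala library that lists it in resources is built.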