How to Migrate to a Monorepo
Introduction
There are many articles that analyze the pros and cons of using a monorepo. However, once you have decided to migrate from a polyrepo to a monorepo, it can be hard to find tutorials or discussions about the process itself. I want to share my recent experience: the steps I took, the challenges I encountered during the migration, and the considerations behind each decision.
Current Status and Motivation
Our team uses .NET to develop the company's business applications. We use a polyrepo approach to manage services such as web APIs and message queue (MQ) services. These services often need to communicate with each other.
The first problem is that the external contracts of these services are hard to maintain. Whenever we modify a contract, we have to check all endpoints.
The second problem is that similar or identical business logic is scattered across repositories, so every change to that logic is costly. Moreover, the polyrepo approach makes it hard to keep the domain logic cohesive, even though these services collectively serve one product.
Challenges
The most important thing is to keep our product as available as possible. Therefore, the actual switch to the new approach must happen within a very short window. Unfortunately, our team currently uses git-flow with complex feature branch structures. Moreover, new requirements keep emerging, so we can't afford to pause development while transitioning to a new repository approach. In summary, we need a plan that lets us migrate smoothly. It may take some time, but it should be predictable and easy to roll back if we run into trouble.
Migration Plan
We break the migration down into three phases. I introduce them briefly here and discuss them in more depth in the following sections.
- Phase 1: Prepare a Designated Repository
First, we create a new repository for the monorepo and import all content from the original repositories into it. Because we use Git for version control, this step is not hard, and we want to preserve all commits in the new repository. All development continues in the original repositories; the new repository exists for CI/CD preparation. It does not need the latest commits, and it can host integration tests at the same time.
- Phase 2: Modify CI/CD Pipelines
Each original repository has its own CI/CD pipeline settings. Moving these settings to the new repository takes time. Ideally, there should be an environment available for testing the migrated pipelines.
- Phase 3: Develop in Monorepo Repository
Once the above preparations are complete, the next step is to bring the codebase in the monorepo up to date. After that, all development activities can move to the new repository.
Git
Git is a very helpful tool in this procedure. When it comes to managing several repositories, we can use submodules or subtrees. Submodules are very easy to use. However, submodules don't really incorporate code into the superrepo, and one submodule can't reference another. Our goal is to reuse the codebase across repos, so submodules are not suitable for this purpose. Subtrees, on the other hand, integrate the subrepo as part of the superrepo. The drawback is that syncing changes with the subrepo is not easy. For this migration, though, I believe a subtree is the suitable choice.
- Step 1: Add the original repositories as remotes:
Use the following Git command:
    git remote add <name> <url>
    # <name>: the repo alias you want
    # <url>: the repo you want to add
Using the repo alias <name> simplifies the subtree commands.
- Step 2: Add all subtree repos:
    git subtree add --prefix=<folder path> <name> <branch>
    # <folder path>: repository pull target path
    # <name>: repo alias
    # <branch>: the branch you want to pull
There are several things to notice. Your original repository structure may look like this:
    # projectA
    |- src
    |- projectA.sln
    |...
    |...
After pulling all repositories, the new repository structure may look like this:
    # monorepoProject
    |- src
       |- ProjectA
       |  |- src
       |  |- projectA.sln
       |  |...
       |  |...
       |- ProjectB
          |- src
          |- projectB.sln
          |...
          |...
Because our branch structure is complicated, it's important to pay close attention to the <branch> parameter.
To keep the migration plan simple, I suggest not modifying files in the subtree folders during this phase. Other team members are still developing in the original repositories, and I don't want to change their development habits yet.
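The two steps above can be wrapped in a small script when there are several repositories to import. The following is only a sketch under assumptions: the helper names (run, import_repo, sync_repo), the example URL, and the src/<name> target folder are hypothetical and not taken from our actual setup. It also includes git subtree pull, which is one way to bring a subtree folder up to date later while development continues in the original repository. Commands are printed instead of executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: helper names, URLs, and the src/<name> layout are assumptions.
set -eu

# Commands are printed by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# import_repo <name> <url> <branch>
# Registers the original repository as a remote, then adds the chosen
# branch (with its commits) under src/<name>.
import_repo() {
  run git remote add "$1" "$2"
  run git subtree add --prefix="src/$1" "$1" "$3"
}

# sync_repo <name> <branch>
# Pulls commits made in the original repository after the import
# into the same subtree folder.
sync_repo() {
  run git subtree pull --prefix="src/$1" "$1" "$2"
}

# Example (hypothetical values):
#   import_repo ProjectA https://example.com/projectA.git develop
#   sync_repo ProjectA develop
```

Running the script from the root of the monorepo in dry-run mode first makes it easy to review the <branch> argument for each repository before anything is changed.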
About CI/CD
When modifying the GitLab CI script, I encountered two problems. The first is understandable: each repository has its own settings, and after merging into one repo, it takes time to integrate all the scripts. The second problem arises because we recently changed our workflow. We used to trigger pipelines manually to build and deploy projects. Recently, we changed the setting so that pipelines are triggered automatically when a merge request is merged. After migrating to a monorepo, every project gets built and deployed whenever a merge happens, which is not the behavior we want. The solution to the second problem is to use GitLab CI rules (rules:changes) so that a project's jobs run only when files in that project's folder change.
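GitLab's rules handle this declaratively inside the CI file. To illustrate the underlying idea outside of GitLab, here is a minimal shell sketch that maps a list of changed file paths to project folders. It assumes the src/<Project> layout shown earlier; the function name affected_projects is hypothetical and is not part of GitLab or Git.

```shell
#!/bin/sh
# Sketch only: the function name and the src/<Project> layout are assumptions.
set -eu

# affected_projects: reads changed file paths (one per line) from stdin and
# prints each project folder directly under src/ once, sorted.
# Paths outside src/<Project>/... are ignored.
affected_projects() {
  grep '^src/[^/][^/]*/' | cut -d/ -f2 | sort -u
}

# Example (hypothetical): list the projects touched by the last commit.
#   git diff --name-only HEAD~1 HEAD | affected_projects
```

A change that only touches files outside the project folders produces an empty list, which corresponds to no project being built or deployed.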
Monorepo Tools
Currently, I don't plan to use a monorepo tool, as our project isn't large enough to need one. Build and deploy remain challenges, but we can solve them by modifying the CI script. Nevertheless, I've been investing some time in understanding NX.
One issue with a monorepo is that not every change affects every project in the repository. As the repository grows larger, I doubt we can manage this effectively through more CI script settings alone. NX seems to be a solution. It's easy to get started: create a workspace and install the NX/dotnet plugin. NX can then automatically detect affected projects under the workspace folder and configure builds or deployments accordingly.
However, as a .NET developer, I find that NX's developer experience is geared more toward JavaScript developers. Moreover, if we decide to use NX as our CI tool, it would be more efficient to have it handle all CI/CD steps as NX tasks. That transition could be challenging, because it means abandoning the GitLab CI script template we currently use. For now, there's no urgent need to ask our team members to adopt NX.