May 14, 2014

Getting git submodule to track a branch

Since using Git submodules in our Komodo code base, we’ve learned some
interesting things about submodules that we’d like to share. Komodo is a
powerful IDE with support for multiple languages, including Python,
PHP, Perl, Golang, Ruby. Learn more here.

Git submodules work really well for splitting out code bases into separate
repositories, but there are still a few things (quirks) you should be aware of.
There’s a lot of out-of-date information on the web, so pay attention to the
date of the articles you read, as git has been changing (getting better) all the
time.

About Submodules

A submodule in a git repository is like a sub-directory which is really a
separate git repository in its own right.
There is a special git submodule command included with git, and this command takes various arguments in order to add/update/delete your submodules.

You can set the submodule to track a particular branch (requires git 1.8.2+), which is what we are doing with Komodo, or you can reference a particular repository commit (the later requires updating the main repository whenever you want to pull in new changes from the module – i.e. updating the commit hash reference).

One special thing to note about submodules, is that by default they are first initialized into a detached head state (like an anonymous git branch), so if you want to work on this submodule code later, you’ll first need to update the information (checkout) to be able to push your changes correctly.

Using Submodules

Creating

To create a submodule, you create a new repository to host your module code and then link that into the main repository project. For example, we’ve created a trackchanges repository (a new feature coming in Komodo 9) that we’re linking into the main Komodo Edit  repository as a submodule:


cd KomodoEdit
git submodule add -b master https://github.com/Komodo/trackchanges.git src/modules/trackchanges

Using the -b argument means we want to follow the master branch of the trackchanges repository, and after running this command we’ll have an empty src/modules/trackchanges directory.

You’ll need to run a special submodule initialize command (i.e. after cloning the main repository) to fetch the code for the first time:


git submodule update --init

Updating

Now, git uses the update command to also fetch and apply updates, but this time the arguments are slightly different:


git submodule update --remote

This will update the trackchanges submodule we added earlier, to the latest
version.

Editing

Now that we have our submodule playing nicely, it would be useful if we could go and edit this submodule code in place, making commits and pushes, which is exactly what we’ll do next. It requires a few additional git commands to change from a detached head into a proper remote branch.


cd src/modules/trackchanges
git checkout master

This gets you into a state where you can edit, commit and push your changes back to the trackchanges repository.

Note that for GitHub you may wish to change from using https:// into a ssh:// remote url (to be able to authenticate using ssh) – to do that you’ll want to edit the submodule config file, which you can get from the “.git” file in your submodule:


cat .git
gitdir: ../../../.git/modules/src/modules/trackchanges

This gitdir directory holds the git config file for your remote, so update that accordingly.

One thing to note about submodule editing, if you run the git submodule update –remote command, your module will go back into being in a detached head state, so you’ll need to remember to re-checkout the next time you want to edit it again.

Status

One important thing to note, is that after you make new commits in the submodule, or have pulled in new commits from the submodule, is that ~~~git status~~~ will show the submodule as modified, like this:


# modified: src/modules/trackchanges (new commits)

which is annoying – as we thought we had told git to track the branch and the submodule is now updated to the latest branch commit… what’s going on?

This happens to be a limitation of submodule branch tracking – git submodule add -b simply adds information about a branch in the .gitmodule file and allows you the option to manually update the submodule object to the latest commit of that specified branch. Your main repository still thinks your submodule is at the original commit (i.e. when the initial git submodule add -b was run) – and when you run git submodule update (without the –remote option) your submodule will be reset back to that original commit.

You have to go and update that submodule commit reference to the latest code in the remote branch to avoid this:


git add src/modules/trackchanges
git commit -m "Update submodule tracking to the latest commit"

This additional commit step is somewhat of a pain – and I’m hopeful that future versions of git come up with a more stream-lined method to keep track of the submodule remote branch.

Summary

In summary, git submodules are a flexible way to de-couple your code base, but it does require some additional learning of git commands in order to manage and update these separate code bases. All in all I think this a worthy trade-off and much better than trying to utilize third-party alternatives (like Git Subtree or Google’s Repo).

There are a lot of git resources out there – this Git tutorial is a good
thorough reference, otherwise you’ll likely find what you need on Stack Overflow.

Activestate-platform-any-language

Automatically build Python, Perl and Tcl runtimes for Linux and Windows plus Python for macOS with the ActiveState Platform.

Todd Whiteman

Todd Whiteman

Todd Whiteman is the Development Lead for Komodo IDE. He works adeptly with Mozilla, Python and JavaScript to ensure that Komodo IDE is the IDE of choice for development teams. Todd has been working on Komodo IDE for nearly 6 years, adding components such as the Vi Key Bindings, Distributed Source Code Control, 64-bit Linux builds as well as Remote SSH support into Komodo IDE.