Cleaning a git repo

This is the nature of maintaining software.  You step into a role and are handed a repository of code.  There may be a lot of it, and its quality is questionable at best.  There may even be a ton of cruft.  I recently encountered one such repo involving a 7-year-old WordPress Multisite.  Additionally, a number of custom database tables and PHP CRUD apps had been built alongside, and intertwined with, this MU instance.  About a year before I joined the project, the old Multisite instance became the basis for a new website, and a number of themes, plugins, library code, and log files were unnecessarily added to the new repository.  Even if you run git rm, those files remain in the history and are downloaded with every new clone.  Since it was still relatively early in the project’s history (and I was the only active developer), I decided to try some more advanced git magic to purge these files.

Enter git filter-branch
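
To make the idea concrete, here is a minimal sketch of purging a directory from every commit with git filter-branch. It is not the exact command I ran on the real project: the throwaway repository, the logs/ directory, and the file names are all made up for illustration, so swap in your own paths. Newer versions of git print a deprecation warning for filter-branch; the FILTER_BRANCH_SQUELCH_WARNING variable only silences it.

```shell
#!/bin/sh
# Sketch only: every path and name below is hypothetical.
set -e

# Build a throwaway repository so the example is safe to run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

mkdir logs
echo "debug output" > logs/debug.log
echo "<?php // theme code" > index.php
git add .
git commit -q -m "Initial import, including log files"

# git rm only removes the files going forward; they stay in the history.
git rm -r -q logs
git commit -q -m "Remove log files"

# Rewrite every commit on every ref, dropping logs/ from the index each time.
# --ignore-unmatch keeps git rm from failing on commits that lack the path.
# --prune-empty drops commits that no longer change anything.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
  --index-filter 'git rm -r -q --cached --ignore-unmatch logs' \
  --prune-empty -- --all

# filter-branch keeps backups under refs/original; remove them, expire the
# reflog, and garbage-collect so the old objects are actually discarded.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --prune=now --quiet

# Count the commits that still touch logs/ after the rewrite.
remaining=$(git log --all --oneline -- logs | wc -l | tr -d ' ')
echo "commits touching logs/: $remaining"
```

Because this rewrites commit hashes, anyone else with a clone has to re-clone or hard-reset afterwards, which is why being the only active developer made this a reasonable option for me.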

An Alternate Approach

If you have more sophisticated needs, there is a tool that builds upon git-filter-branch

Another use case

I wanted to move a small repository of helper scripts and template config files to my primary GitHub account.  The problem was that the user name and email on all of my commits belonged to the source git account.  I found this GitHub help article outlining how you can change the user name and email of a committer and author within your history.
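
The approach boils down to an --env-filter that swaps the identity on any commit matching the old email. The sketch below is my own reconstruction of that idea rather than a copy of the article's script, and the names and email addresses are placeholders. It builds a throwaway repository first so it can be run without touching real work.

```shell
#!/bin/sh
# Sketch only: the names and email addresses are placeholders.
set -e

# Throwaway repository with one commit made under the old identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Old Name"
git config user.email "old@example.com"
echo '# helper script' > helper.sh
git add helper.sh
git commit -q -m "Add helper script"

# For every commit, replace the committer and author identity when the
# recorded email matches the old address; other commits are left alone.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --env-filter '
OLD_EMAIL="old@example.com"
CORRECT_NAME="New Name"
CORRECT_EMAIL="new@example.com"
if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL" ]; then
    export GIT_COMMITTER_NAME="$CORRECT_NAME"
    export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL"
fi
if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL" ]; then
    export GIT_AUTHOR_NAME="$CORRECT_NAME"
    export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL"
fi
' --tag-name-filter cat -- --branches --tags

# Read back the identity recorded on the rewritten commit.
author=$(git log -1 --format='%an <%ae>')
committer=$(git log -1 --format='%cn <%ce>')
echo "author:    $author"
echo "committer: $committer"
```

As with the purge, every rewritten commit gets a new hash, so this is best done before pushing the repository to its new home.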

