title: Why did I recently rewrite my whole blog Git history?
date: 2020-03-18 14:20
url: why-did-i-recently-rewrite-my-whole-blog-git-history
layout: post
category: Articles
image: /img/blog/why-did-i-recently-rewrite-my-whole-blog-git-history_1.png
description: A quick Git LFS tutorial (justifying an anti-pattern technique usage)

Introduction

Back in the day, this blog was hosted on GitHub Pages, which now belongs to Microsoft (as does NPM, more recently).
Today, though, we won't be talking about the whole "embrace, extend, and extinguish" capitalist strategy, but rather about Git.
You know Git, that little piece of software first written in a matter of days and now used daily all over the Earth.
We are always looking for the next cool-but-functional graphical (including Web) interface to embellish and present project sources, but more often than not the same program runs underneath.

([Microsoft's] GitHub's) Pages is cool and handy, and I actually decided to keep Jekyll as the static HTML generator engine for my blog.
Self-hosting lets you really appreciate some technical constraints, the very ones that are often hidden when services are operated by large corporate platforms.

Let's take this very blog as an example.
For some years now, I have been pushing non-diff-able objects (such as images or minified front-end assets) to the Git tree.
It might be dead simple (and everything is stored by GitHub when using Pages), but it is not viable in the long run.

Here comes the main subject of this post: Git LFS.

Git LFS

Git LFS (Large File Storage) is a Git extension that lets developers version files that Git alone handles poorly (those I called "non-diff-able" above).
By keeping short, diff-able pointers in the Git history, we can (finally) store binaries (or equivalent) without bloating the repository a little more each time we update them.
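To make the "pointer" idea more concrete, here is a rough sketch that builds by hand the small text file Git LFS commits in place of the real binary. The helper name `lfs_pointer_for` is made up for this post, and it assumes `sha256sum` from GNU coreutils is available; in real life Git LFS writes the pointer itself, you never do it manually.

```shell
# Hypothetical helper (not part of Git LFS): print the pointer text that
# would stand in for a file, following the Git LFS pointer format (v1).
# Assumes `sha256sum` from GNU coreutils is available.
lfs_pointer_for() {
    file=$1
    oid=$(sha256sum "$file" | cut -d ' ' -f 1)
    size=$(wc -c < "$file" | tr -d ' ')
    printf 'version https://git-lfs.github.com/spec/v1\n'
    printf 'oid sha256:%s\n' "$oid"
    printf 'size %s\n' "$size"
}

# Demo on a throw-away file.
demo=$(mktemp)
printf 'hello\n' > "$demo"
lfs_pointer_for "$demo"
rm -f "$demo"
```

Whatever the size of the real file, the pointer stays three short lines, so an updated image only changes a hash and a size in the Git history.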

And you know what? It's packaged in Debian Buster (and back-ported to Stretch 🎉), so:

{% highlight bash %}
apt install git-lfs
{% endhighlight %}

Git LFS is well supported by popular code hosting services (GitHub, GitLab and Gitea, among others).

Like everything else on this planet, it comes with its own limitations; before diving in, I'd advise you to read up on them and check whether they affect you.

Migrate Existing Repositories

Yeah, LFS is pretty cool, and you should think about it before creating a new project and/or pushing non-diff-able data to a remote (and often collaborative) repository.
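For a brand new project, the tracking rules simply live in `.gitattributes`. The sketch below writes by hand the line that `git lfs track "*.png"` would append, then asks plain Git which filter applies to a path. The temporary repository and the file names (`img/logo.png`, `README.md`) are placeholders, not something from this blog.

```shell
# Throw-away repository; paths and file names are placeholders.
repo=$(mktemp -d)
git init -q "$repo"

# The attribute line that `git lfs track "*.png"` appends to .gitattributes.
printf '%s\n' '*.png filter=lfs diff=lfs merge=lfs -text' > "$repo/.gitattributes"

# Ask Git which filter it would run for two candidate paths.
png_filter=$(git -C "$repo" check-attr filter -- img/logo.png)
md_filter=$(git -C "$repo" check-attr filter -- README.md)
echo "$png_filter"
echo "$md_filter"

rm -rf "$repo"
```

Paths reported with the `lfs` filter are stored as pointers on commit; everything else keeps going through regular Git.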

But what about existing projects?
What am I supposed to do if I want to keep the whole Git history AND migrate existing "binaries" to LFS?

An awesome project comes with an awesome team: they thought about it.

Below is a very simple procedure to migrate content that is already committed.
Please adapt it, 'cause you know, YMMV:

{% highlight bash %}

# When I first attempted to migrate the blog assets to LFS, I came across an open issue.
# It was (likely) related to how the project tags were named.
# See https://github.com/git-lfs/git-lfs/issues/3818.
# So, in order to move on (and take advantage of the time freed up by COVID-19), I decided to delete them.
git tag -d v1.1.0 v1.2.0 # ...
git push -d origin v1.1.0 v1.2.0 # ...

# This blog has only one branch, which (apparently) simplified the procedure drastically.
# I'd advise you to clean up your repository references too.
git branch -d feature/aint_time fix/not_a_bug # ...
git push -d origin feature/aint_time fix/not_a_bug # ...

# Now is the time to install the LFS hooks into your Git project internals.
git lfs install

# Let's go!
# The command below shows which kinds of files eat up your disk space.
git lfs migrate \
    info \
    --include-ref=refs/heads/master

# If you are more of a Bash person, this could help you too.
find . -type f -not -path './.git*' -exec file --extension -b {} ';' | sort | uniq

# Once you have identified the evil file extensions, you may run something like:
git lfs migrate \
    import \
    --include="*.jpg,*.svg,*.eot,*.ttf,*.woff,*.min.*" \
    --include-ref=refs/heads/master

# > Is it really... finished?
# Yes, and now it's verification time!
git lfs ls-files
cat .gitattributes
git log
git # ...

# If you're happy with the results, you may clean up the Git internals.
git reflog expire --expire-unreachable=now --all
git gc --prune=now

# It's time to publish these changes, so here is a checklist for you:
# [ ] Disable your CI/CD hooks;
# [ ] Tell your colleagues not to push to the remote;
# [ ] Make sure LFS is enabled on your Git server;
# [ ] Make sure the target branch is not protected upstream;
# [ ] Force push:
git push -f

# Git may have advised you to enable LFS file locking support; you should.
# See https://github.com/git-lfs/git-lfs/wiki/File-Locking.
git config lfs.https://your.code.host/owner/a-repository.git/info/lfs.locksverify true
{% endhighlight %}
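If you'd rather look at individual objects than at file extensions before picking what to migrate, plain Git can list the biggest blobs in the history. This is only a sketch of a well-known trick, not something Git LFS ships: the helper name `largest_blobs` is made up, and the demo runs in a throw-away repository with dummy files.

```shell
# Hypothetical helper: list the N largest blobs reachable from any ref,
# as "<size in bytes> <path>", biggest first. Uses plain Git only.
largest_blobs() {
    dir=$1
    n=${2:-10}
    git -C "$dir" rev-list --objects --all |
        git -C "$dir" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -rn |
        head -n "$n"
}

# Demo in a throw-away repository with one small and one larger file.
repo=$(mktemp -d)
git init -q "$repo"
printf 'hi\n' > "$repo/notes.txt"
head -c 4096 /dev/zero > "$repo/image.bin"
git -C "$repo" add .
git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid commit -q -m 'demo'
largest_blobs "$repo" 5
rm -rf "$repo"
```

On a real repository, call it as `largest_blobs . 20`; binaries near the top of the list are good candidates for the `--include` patterns.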

Wow, you're done too! Congratulations.

Your next (optional, but recommended) steps:

  • Run the garbage collector (if possible) on the remote (for example, from the Gitea administration dashboard);

  • Tell your colleagues to install Git LFS too BEFORE properly re-cloning the affected repository;

  • Apply the same operation to "read-only" mirrors (such as your production environment, for instance);

  • Re-enable your CI/CD hooks.
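A colleague who re-clones without Git LFS installed ends up with pointer text where the images should be. Here is a small, hedged sketch to spot that situation: the helper name `is_lfs_pointer` is made up, and it only checks the first line of a file against the version line of the LFS pointer format.

```shell
# Hypothetical helper: succeed if the file looks like a Git LFS pointer
# (the real content was not downloaded), fail otherwise.
is_lfs_pointer() {
    head -n 1 "$1" 2>/dev/null |
        grep -qxF 'version https://git-lfs.github.com/spec/v1'
}

# Demo: one fake pointer and one regular file in a temporary directory.
dir=$(mktemp -d)
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%064d\nsize 1\n' 0 > "$dir/pointer.png"
printf 'not a pointer\n' > "$dir/real.txt"
for f in "$dir"/*; do
    if is_lfs_pointer "$f"; then
        echo "still a pointer: ${f##*/}"
    else
        echo "real content:    ${f##*/}"
    fi
done
rm -rf "$dir"
```

If some files are still pointers once Git LFS is installed, `git lfs pull` should fetch the real content.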

Conclusion

TL;DR: No, I have not been hacked; I recently and deliberately rewrote the whole blog Git history.
