---
title: "Why did I recently rewrite my whole blog Git history ?"
date: 2020-03-18 14:20
url: why-did-i-recently-rewrite-my-whole-blog-git-history
layout: post
category: Articles
image: /img/blog/why-did-i-recently-rewrite-my-whole-blog-git-history_1.png
description: "A quick Git LFS tutorial (justifying an anti-pattern technique usage)"
---
[![A missing blog post image](/img/blog/why-did-i-recently-rewrite-my-whole-blog-git-history_1.png)](/img/blog/why-did-i-recently-rewrite-my-whole-blog-git-history_1.png)
### Introduction
Back in the day, this blog was hosted on (and by) GitHub Pages, now owned by Microsoft ([as is npm, more recently](https://github.blog/2020-03-16-npm-is-joining-github/)).
But today, we won't be talking about the whole _"embrace, extend, and extinguish"_ capitalist strategy, but rather about Git.
You know Git, this little piece of software first written in two days and now used daily all over the Earth.
We are always looking for the next cool-but-functional graphical (including Web) interface to embellish and browse project sources, but underneath, it's ~~always~~ often the same program.

(\[Microsoft's\] GitHub's) Pages is cool and handy, and when I moved to self-hosting, I actually decided to keep Jekyll as the static HTML generator for my blog.
Self-hosting lets you really appreciate some technical constraints, the very ones that often stay hidden when services are operated by large ~~corporations~~ platforms.

Basically, let's take _this_ very blog as an example.
For some years now, I have been pushing non-diff-able objects (such as images or minified front-end assets) to the Git tree.
It may be dead simple ~~(and stored by GitHub when using Pages)~~, but it is not _viable_ in the long run.
Here comes the main subject of this post : [Git LFS](https://git-lfs.github.com/).
### Git LFS
Git LFS (Large File Storage) is a project allowing developers to version files that Git alone cannot version efficiently (those I called "non-diff-able" above).
By storing short, diff-able pointers in the Git history while the actual contents live on an LFS server, we can (finally) version binaries (or equivalent) without the repository growing by a full copy of the file each time we update it.
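To make the pointer idea concrete, here is a minimal sketch of what such a pointer contains, following the v1 pointer format: a spec version line, the SHA-256 of the content, and its size in bytes. `lfs_pointer` is a hypothetical helper written for illustration, not Git LFS's own code (Git LFS ships a `git lfs pointer` subcommand for the real thing), and it assumes GNU coreutils' `sha256sum`, as found on Debian.

```shell
# Hypothetical helper: print the Git LFS (spec v1) pointer standing in for a file.
# Illustration only; assumes GNU coreutils' sha256sum is available.
lfs_pointer() (
  file=$1
  oid=$(sha256sum "$file" | cut -d ' ' -f 1)   # SHA-256 of the file contents
  size=$(wc -c < "$file" | tr -d ' ')          # size in bytes, padding stripped
  printf 'version https://git-lfs.github.com/spec/v1\n'
  printf 'oid sha256:%s\n' "$oid"
  printf 'size %s\n' "$size"
)
```

Whatever the size of the original file, the pointer stays a few lines of text, so updating an image only adds a tiny diff to the history.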
And you know what ? It's [packaged in Debian Buster](https://packages.debian.org/source/buster/git-lfs) (and [back-ported to Stretch](https://packages.debian.org/source/stretch-backports/git-lfs) :tada:), so :
{% highlight bash %}
apt install git-lfs
{% endhighlight %}
Git LFS is well supported by popular code hosting services, for example :
* GitHub, [since 2015](https://github.blog/2015-04-08-announcing-git-large-file-storage-lfs/) ;
* GitLab, [since v8.2 (2015 too)](https://about.gitlab.com/blog/2015/11/23/announcing-git-lfs-support-in-gitlab/) ;
* Gitea, [since v1.1.0 (2016)](https://github.com/go-gitea/gitea/pull/122).
Like everything else on this planet, it comes with its own limitations, so before diving in, I'd advise you to [review them](https://github.com/git-lfs/git-lfs/wiki/Limitations) to check whether they affect you.
### Migrate Existing Repositories
Yeah, LFS is pretty cool and you should think about it **before** creating a new project and/or pushing non-diff-able data to a remote (and often, collaborative) repository.
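For a new project, `git lfs track "*.jpg"` records the pattern in `.gitattributes` as a line such as `*.jpg filter=lfs diff=lfs merge=lfs -text`. Here is a minimal sketch to check which paths Git would then hand over to LFS, relying only on plain `git check-attr`; `routed_to_lfs` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if Git would send the given path
# through the "lfs" filter, according to the .gitattributes rules.
# Run it from inside a Git repository; git-lfs itself is not required.
routed_to_lfs() {
  # `git check-attr` prints "<path>: filter: <value>", e.g.
  # "img/cover.jpg: filter: lfs" or "README.md: filter: unspecified".
  [ "$(git check-attr filter -- "$1" | awk -F ': ' '{ print $NF }')" = 'lfs' ]
}
```

With the `*.jpg` rule above in place, `routed_to_lfs img/cover.jpg` succeeds while `routed_to_lfs README.md` fails. The path does not even need to exist yet, which makes it handy to double-check patterns before the first commit.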
> But what about existing projects ?
> What am I supposed to do if I want to keep the _whole_ Git history AND migrate existing "binaries" to LFS ?
An awesome project comes with an awesome team : they thought of it.
Below is a very simple procedure to migrate already-committed content.
Please adapt it, 'cause you know, **YMMV** :
{% highlight bash %}
# When I first attempted to migrate the blog assets to LFS, I came across an open issue,
# (likely) related to how the project tags were named.
# See <https://github.com/git-lfs/git-lfs/issues/3818>.
# So, in order to move on (and make the most of the spare time COVID-19 freed up), I decided to delete them.
git tag -d v1.1.0 v1.2.0 # ...
git push -d origin v1.1.0 v1.2.0 # ...
# This blog only has one branch, which (apparently) simplified the procedure drastically.
# I'd advise you to clean up your repository references too.
git branch -d feature/aint_time fix/not_a_bug # ...
git push -d origin feature/aint_time fix/not_a_bug # ...
# Now it's time to install the LFS hooks into your repository (and the LFS filters into your global Git configuration).
git lfs install
# Let's go !
# The command below shows which file types take up the most space in your history.
git lfs migrate \
info \
--include-ref=refs/heads/master
# If you are more of a Bash person, this could help you too (it only looks at the current working tree, and `file` guesses extensions from file contents).
find . -type f -not -path './.git*' -exec file --extension -b {} ';' | sort | uniq -c | sort -rn
# Once you have identified the evil file extensions, you may run something like :
git lfs migrate \
import \
--include="*.jpg,*.svg,*.eot,*.ttf,*.woff*,*.min.*" \
--include-ref=refs/heads/master
# > Is it really... finished ?
# Yes, and now it's verification time !
git lfs ls-files
cat .gitattributes
git log
git # ...
# If you're happy with the results, you may clean up Git internals (the pre-migration objects are still around locally).
git reflog expire --expire-unreachable=now --all
git gc --prune=now
# It's time to publish these changes, so here is a checklist for you :
# [ ] Disable your CI/CD hooks ;
# [ ] Tell your colleagues **not** to push to the remote ;
# [ ] Make sure LFS is enabled on your Git server ;
# [ ] Make sure the target branch is not protected upstream ;
# [ ] Force push :
git push -f
# Git may have advised you to enable LFS file locking support, and you should.
# See <https://github.com/git-lfs/git-lfs/wiki/File-Locking>.
git config lfs.https://your.code.host/owner/a-repository.git/info/lfs.locksverify true
{% endhighlight %}
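`git lfs migrate info` remains the tool of choice for spotting heavy file types. Still, as a plain-Git alternative to the `find`/`file` one-liner used in the procedure (which only looks at the current working tree), here is a sketch that sums the size of every blob ever committed, grouped by file extension. It only relies on `git rev-list --objects --all` and `git cat-file --batch-check`; the helper name and the "text after the last dot" extension rule are my own assumptions, and sizes are uncompressed, so they overestimate what the packfiles actually take on disk.

```shell
# Hypothetical helper: total size of every blob ever committed, per extension.
# Run it from inside a Git repository; prints "<extension><TAB><bytes>",
# biggest first. Sizes are uncompressed object sizes.
blob_sizes_by_extension() {
  # List every object reachable from any ref, with the path it was found at...
  git rev-list --objects --all |
    # ...then ask Git for each object's type and size, keeping the path.
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '
      $1 == "blob" {
        path = $0
        sub(/^[^ ]+ [^ ]+ /, "", path)   # drop type and size, keep the path
        n = split(path, parts, "/")
        name = parts[n]                  # basename
        ext = "(none)"
        # Extension = text after the last dot (a leading dot, as in .gitignore, does not count).
        if (match(name, /\.[^.]+$/) && RSTART > 1) ext = substr(name, RSTART + 1)
        total[ext] += $2
      }
      END { for (e in total) printf "%s\t%.0f\n", e, total[e] }
    ' |
    sort -t "$(printf '\t')" -k2,2nr
}
```

Running `blob_sizes_by_extension | head` before the migration gives a rough idea of which patterns are worth passing to `--include`.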
Wow, you're done too ! Congratulations.
Your next (optional, but recommended) steps :
* Run the garbage collector (if possible) on the remote (see the example below, from the Gitea administration dashboard) ;
[![A missing blog post image](/img/blog/why-did-i-recently-rewrite-my-whole-blog-git-history_2.png)](/img/blog/why-did-i-recently-rewrite-my-whole-blog-git-history_2.png)
* Tell your colleagues to install Git LFS too, **BEFORE** re-cloning the affected repository from scratch ;
* Apply the same operation to "read-only" clones and mirrors (your production server, for instance) ;
* Re-enable your CI/CD hooks.
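If a clone or mirror was made before Git LFS was installed on that machine, its working tree ends up holding pointer files instead of the actual images. Here is a rough sketch to spot such leftovers; `find_unsmudged_pointers` is a hypothetical helper, and the 1024-byte size cap is just my heuristic (pointers are tiny), not something taken from the LFS documentation.

```shell
# Hypothetical helper: list working-tree files that are still raw LFS pointers.
# Assumes file names do not contain newlines.
find_unsmudged_pointers() {
  # Only small files are checked (heuristic: LFS pointers are a few lines long).
  find "${1:-.}" -type f ! -path '*/.git/*' -size -1024c | while IFS= read -r f; do
    # A pointer file starts with the spec version line.
    if head -n 1 "$f" | grep -qxF 'version https://git-lfs.github.com/spec/v1'; then
      printf '%s\n' "$f"
    fi
  done
}
```

Once `git lfs install` has been run on that machine, `git lfs pull` should replace those pointers with the real contents.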
### Conclusion
**TL;DR** No, I have not been hacked : I recently [rewrote the whole blog Git history](https://git.forestier.app/HorlogeSkynet/blog/compare/42bb72dc97209b05ba198c41ecf67146b93fcac1...7849565abeeb83ba947b35f4b5764e835a361a27) on purpose.
### Sources
* [git-lfs-migrate(1) - Migrate history to or from git-lfs](https://github.com/git-lfs/git-lfs/blob/master/docs/man/git-lfs-migrate.1.ronn)
* [Migrating existing repository data to LFS](https://github.com/git-lfs/git-lfs/wiki/Tutorial#migrating-existing-repository-data-to-lfs)