---
title: "How to rewrite Git history while keeping message commit references"
date: 2022-03-26 12:41
last_modified_at: 2022-03-29 19:17
url: how-to-rewrite-git-history-while-keeping-message-commit-references
layout: post
category: Tutorials
image: /img/blog/how-to-rewrite-git-history-while-keeping-message-commit-references_1.png
description: "Mankind has a duty of memory"
---

[![A missing blog post image](/img/blog/how-to-rewrite-git-history-while-keeping-message-commit-references_1.png)](/img/blog/how-to-rewrite-git-history-while-keeping-message-commit-references_1.png)

### Introduction

Sometimes you need to clean up your Git history, for instance to remove [a redacted production secret still present in history](https://www.root-me.org/en/Challenges/Web-Server/Insecure-Code-Management) or to change an old committer identity.

> :warning: Such operations are dangerous: please read this post **fully** before running anything. As always, I decline any responsibility if something bad happens to your project.

### The problem

If you reached this page, you already know the problem: rewriting Git history changes the identifier (SHA) of the first affected commit and of every commit after it, and [you cannot do a thing about it](https://stackoverflow.com/questions/64204804/dirty-trick-to-keep-commit-hashes-when-rewriting-git-history).

If developers referenced other commits in their commit messages (like `This commit follows 40d5014 [...]`), those references will not mean anything once the rewriting is done.
Moreover, commits that "revert" others are affected as well: their messages quote the identifier of the reverted commit, and Git does not update it automatically.

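To see why every later identifier changes, here is a simplified sketch of how Git derives a commit identifier: it hashes the commit content, and that content includes the parent identifier. The `fake_commit_id` helper and its inputs are hypothetical. A real commit object also carries tree, author and committer lines, and the sketch assumes `sha1sum` (GNU coreutils) is installed.

```sh
# Simplified sketch: a real commit object also has tree, author and committer
# lines, but the principle is the same. Requires sha1sum (GNU coreutils).
fake_commit_id() {
    # $1: parent identifier, $2: commit message (both hypothetical)
    content="$(printf 'parent %s\n\n%s' "$1" "$2")"
    # Git hashes "commit <size>\0<content>"; +1 accounts for the final newline
    printf 'commit %d\0%s\n' "$(( ${#content} + 1 ))" "$content" \
        | sha1sum | cut -d" " -f1
}

fake_commit_id 1111111 "Fix typo"
fake_commit_id 2222222 "Fix typo"   # same message, other parent: other identifier
```

Both calls hash the same message, yet the identifiers differ because the parent differs. This is why every descendant of a rewritten commit gets a new identifier.
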
### The workaround

So we somehow have to "update" commit references on the fly, while rewriting the history, so that they match the new commit identifiers.

Below is a script implementing this, derived from one of the examples of the official GIT-FILTER-BRANCH(1) manual page. It replaces the `root <root@localhost>` identity with `John Doe <john@example.com>`:

{% highlight sh %}
git filter-branch \
    --env-filter '
        # Replace the old author identity
        if test "$GIT_AUTHOR_NAME" = "root"
        then
            GIT_AUTHOR_NAME="John Doe"
        fi
        if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
        then
            GIT_AUTHOR_EMAIL=john@example.com
        fi
        # Replace the old committer identity
        if test "$GIT_COMMITTER_NAME" = "root"
        then
            GIT_COMMITTER_NAME="John Doe"
        fi
        if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
        then
            GIT_COMMITTER_EMAIL=john@example.com
        fi
    ' \
    --commit-filter '
        # Record an "old_sha,new_sha" line for every rewritten commit
        printf "%s" "${GIT_COMMIT}," >> ../commits_mapping
        git commit-tree "$@" | tee -a ../commits_mapping
    ' \
    --tag-name-filter cat \
    --msg-filter '
        message="$(cat)"
        # Collect every word looking like a (possibly abbreviated) commit reference
        commit_refs="$(printf "%s\n" "$message" | LC_ALL=C grep -oE "\b[0-9a-fA-F]{7,40}\b")"
        for commit_ref in $commit_refs; do
            # The mapping file does not exist yet while filtering the very first commit
            new_sha="$(grep "^${commit_ref}" ../commits_mapping 2>/dev/null | head -n 1 | cut -d, -f2)"
            # Not a known commit: leave the word untouched
            if test -z "$new_sha"
            then
                continue
            fi
            # Keep the abbreviation length used in the original message
            new_commit_ref="$(printf "%s" "$new_sha" | cut -c "1-${#commit_ref}")"
            message="$(printf "%s\n" "$message" | sed "s/${commit_ref}/${new_commit_ref}/g")"
        done

        printf "%s\n" "$message"
    ' \
    -- --all
{% endhighlight %}

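You may have noticed that the filtering scripts stick to POSIX shell syntax, so they are _supposed_ to work in most environments (maybe even yours :wink:). One caveat: the `-o` option of `grep` and the `\b` word boundary are GNU/BSD extensions rather than POSIX features, so check that your `grep` supports them.
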
The script does a few other things too:

* Committer identities are updated as well;

* **All** branches are rewritten (this may not be what you want!);

* Tags are updated too (they will point to the same effective version of the code).

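Before touching a real repository, the message filter logic can be exercised on its own. The sketch below wraps it in a hypothetical `rewrite_refs` function that reads a message on standard input and takes the path of an `old_sha,new_sha` mapping file as its argument. The function name, the mapping file and the identifiers in it are made up for the demonstration, and the sketch assumes a `grep` supporting `-o` and `\b`, like the script above.

```sh
# Standalone version of the message filter, for dry runs outside filter-branch.
# $1: path to an "old_sha,new_sha" mapping file (one pair per line).
rewrite_refs() {
    mapping="$1"
    message="$(cat)"
    refs="$(printf '%s\n' "$message" | LC_ALL=C grep -oE "\b[0-9a-fA-F]{7,40}\b")"
    for ref in $refs; do
        new_sha="$(grep "^${ref}" "$mapping" | head -n 1 | cut -d, -f2)"
        # Unknown word: not a commit of the mapping, leave it untouched
        test -z "$new_sha" && continue
        # Keep the abbreviation length used in the message
        new_ref="$(printf '%s' "$new_sha" | cut -c "1-${#ref}")"
        message="$(printf '%s\n' "$message" | sed "s/${ref}/${new_ref}/g")"
    done
    printf '%s\n' "$message"
}

# Hypothetical mapping with a single rewritten commit (fake identifiers)
mapping_file="$(mktemp)"
echo "40d5014aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,9f3c2b1bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb" > "$mapping_file"
printf '%s\n' "This commit follows 40d5014 [...]" | rewrite_refs "$mapping_file"
# prints: This commit follows 9f3c2b1 [...]
```

Note that a word is only replaced when an entry of the mapping starts with it, so hex-looking words matching no rewritten commit are left as they are.
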
### A workaround pitfall

> **TL;DR**: beware of word collisions across commit messages.

There is one caveat to share though, caused by the use of regular expressions in the `msg-filter` script:

[![A missing blog post image](/img/blog/how-to-rewrite-git-history-while-keeping-message-commit-references_2.png)](https://www.explainxkcd.com/wiki/index.php?title=1171:_Perl_Problems)

You _might_ encounter collisions between commit references and real words of your language that happen to contain only hexadecimal characters.

For a project with commit messages written in English, the above script should be safe with references of 7 characters or more, because the dictionary contains no such word:

{% highlight bash %}
LC_ALL=C grep -oE "\b[0-9a-fA-F]{7,40}\b" /usr/share/hunspell/en_US.dic
{% endhighlight %}

If you happened to use shorter references (let's say, 6 characters long) and lowered the minimum length in the regular expression accordingly, there **are** collisions in English:

{% highlight bash %}
LC_ALL=C grep -oE "\b[0-9a-fA-F]{6,40}\b" /usr/share/hunspell/en_US.dic
accede
bedded
cabbed
dabbed
decade
efface
facade
{% endhighlight %}

For an Italian project, there **are** collisions, even with 7-character references (:fearful:):

{% highlight bash %}
LC_ALL=C grep -oE "\b[0-9a-fA-F]{7,40}\b" /usr/share/hunspell/it_IT.dic
accadde
decadde
{% endhighlight %}

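The same check can be run against your own commit messages rather than a dictionary. The sketch below defines a hypothetical `hex_words` helper that lists the distinct hex-looking words found on standard input. It is a portable variant relying on `tr` and an anchored `grep -E` instead of `grep -o` and `\b`, so it splits words slightly differently (an underscore counts as a separator here).

```sh
# List the distinct hex-looking words (7 to 40 characters) found on standard input.
hex_words() {
    # Put each alphanumeric word on its own line, then keep the hexadecimal ones
    LC_ALL=C tr -cs '0-9A-Za-z' '\n' \
        | LC_ALL=C grep -E '^[0-9a-fA-F]{7,40}$' \
        | LC_ALL=C sort -u
}

printf '%s\n' 'Revert 40d5014, see facade and accadde' | hex_words
# prints 40d5014 and accadde, one per line ("facade" is only 6 characters long)
```

To review the candidates of a whole repository before rewriting it, the helper can be fed with `git log --all --format=%B | hex_words`.
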
### Last words

Please also note that `git filter-branch` has been [deprecated since Git v2.24.0](https://github.com/git/git/commit/9df53c5de6e687df9cd7b36e633360178b65a0ef), and that [filter-repo](https://github.com/newren/git-filter-repo/) should be preferred.
~~If you managed to adapt the solution described in this post with this tool, feel free to post a comment below !~~

It actually turned out that [filter-repo supports this feature by default](https://github.com/newren/git-filter-repo/#design-rationale-behind-filter-repo)! :tada:
So it definitely should be preferred over `git filter-branch`, but sometimes only legacy tools are available...

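For reference, a rough filter-repo equivalent of the identity change could rely on its `--mailmap` option. The sketch below only writes the mailmap file, whose name `identity.mailmap` is an arbitrary choice. Unlike the script above, which tests names and emails separately, a mailmap entry of this form matches the old name and email together, so treat it as an approximation.

```sh
# Hypothetical file name; the entry syntax is "New Name <new email> Old Name <old email>"
cat > identity.mailmap <<'EOF'
John Doe <john@example.com> root <root@localhost>
EOF

# Then, from a fresh clone of the repository:
#   git filter-repo --mailmap identity.mailmap
```

The rewriting command is left as a comment on purpose: it is destructive, and should only be run once you have a backup.
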
---

> Many thanks to the co-author of this script, who will recognize himself :pray: