I never thought I'd have to remove files from a Git commit, unless I was being a complete idiot. I never really used Git in my projects before 2024. Once I was introduced to Git in a Software Engineering course at my university, it felt like a scoreboard while I wrote code for my projects: it tracked each and every change and how many commits I had made. Most importantly, before using Git I uploaded project files to GitHub manually to share my projects, which was a silly, time-consuming, and tiring task. Once I started using Git, it became easy to manage changes and versions, share and sync code with others, and push the repository to GitHub with a single command. VS Code is my go-to editor, and it has integrated Git tools with a nice UI, so using Git from VS Code is even more convenient.
Recently, I've been working on a PHP Image Management Tool, which is basically a CRUD application for managing image and media content. It has some complex features like an image duplicate checker, image hashing, a video thumbnail generator, an image frame extractor, and other tools that use vanilla JavaScript. I coded it in native PHP, just to practice PHP again. I will talk about it in a later post. For now, let's stick to my story.
While developing the PHP Image Management Tool, I uploaded 2000+ images as dummy data to test how well optimized the application was. It was working nicely as far as I could see: the image compressor and processor were compressing the images to save bandwidth and time, along with everything else I was testing. I also tweaked the code. Then I committed the changes and went to sleep.
The next afternoon, I opened VS Code and went to check the Git working tree to see what I had changed previously. And VS Code froze! I opened Task Manager, and it said VS Code was taking up 5 GB of memory. I was like, "Dude!"
I immediately pushed the code to the remote origin, but it took a long time, which was unusual. I wondered what was happening and cancelled the push.
Then I restarted VS Code and opened the tree again, but I had the same problem. I felt like something was corrupted. I checked everything, and then my gut said to check the commit list; maybe some footprint of the problem was there. And there it was! It showed "2556 files added". Git had tracked every image in the uploads directory because I had not added a .gitignore to the project directory. The images were about 2 GB in total, and the project directory, together with the Git data and project snapshots, was around 4.5 GB. I felt like a complete idiot. It was a simple .gitignore file that I missed.
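For reference, the missing piece is tiny. Here is a minimal sketch in a throwaway repository (the file names and the demo identity are made up for illustration) showing that a .gitignore entry added before the first commit keeps uploads/ out of the history:

```shell
# Work in a temporary directory so nothing here touches a real project.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.email "demo@example.com"   # hypothetical demo identity
git config user.name "Demo"

# Pretend project: one source file and one uploaded image.
mkdir uploads
echo "fake image bytes" > uploads/photo1.jpg
echo "<?php echo 'hi';" > index.php

# Ignore the uploads directory BEFORE the first commit.
echo "uploads/" > .gitignore

git add .
git commit -q -m "Initial commit"

# List what Git actually tracks; nothing under uploads/ should appear.
tracked_files=$(git ls-files)
ignored_count=$(git ls-files uploads | wc -l | tr -d ' ')
echo "$tracked_files"
echo "tracked files under uploads/: $ignored_count"
```

Note that a .gitignore only affects files that are not tracked yet. Once the images are committed, as in my case, ignoring them afterwards does not remove them from the history.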
So I had made a mistake. As far as I knew, I couldn't remove something from a commit, but I had to find a way to get the image files out of the commit history. I searched forums like Stack Overflow and Reddit for a tool that could resolve my problem and avert the crisis, and after a while I found one called git filter-repo.
The git filter-repo tool is a powerful command-line utility designed to rewrite Git history. It's often used to clean up repositories by removing large files, sensitive data, or directories while leaving the rest of the content intact. One point to note: I'm using a Windows system.
How does git filter-repo work?
git filter-repo modifies the entire commit history by analyzing every commit, applying filtering rules, and rewriting the history based on these rules. The result is a new Git history where unwanted files, folders, or changes are removed.
Key Concepts of How It Works
- Analyzing the History:
- It goes through the entire repository history, commit by commit, looking at what files, folders, or metadata (like your commit messages, author names, etc.) exist.
- You specify the filtering criteria, such as which paths or files to include or exclude, or which commits to modify.
- Applying Filters:
- Based on the specified rules (like removing a folder or a file), it applies changes to each commit where the criteria are met.
- It rewrites the affected commits. Any commit that descends from a rewritten one also gets a new hash, even if its own files are unchanged.
- Rebuilding the Repository:
- Once all commits have been processed, git filter-repo rebuilds the Git repository with the rewritten history.
- The repository's structure, including branches and tags, is preserved, but the specified data is permanently removed.
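To get a feel for the "analyzing" step, plain Git can already list every path under a folder that ever appeared in the history. This is not what git filter-repo runs internally, just a rough way to preview what a path filter would match. The sketch below uses a throwaway repository with made-up file names and a demo identity:

```shell
# Throwaway repository for the demo.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.email "demo@example.com"   # hypothetical demo identity
git config user.name "Demo"

# Commit 1: code plus an image added by mistake.
mkdir uploads
echo "image a" > uploads/a.jpg
echo "<?php" > index.php
git add .
git commit -q -m "Add code and an image by mistake"

# Commit 2: another image.
echo "image b" > uploads/b.jpg
git add .
git commit -q -m "Add another image"

# Commit 3: deleting the folder later does NOT remove it from history.
git rm -q -r uploads
git commit -q -m "Delete uploads"

# Every path under uploads/ that any commit on any branch ever touched.
paths_in_history=$(git log --all --name-only --pretty=format: -- uploads/ | sed '/^$/d' | sort -u)
path_count=$(printf '%s\n' "$paths_in_history" | wc -l | tr -d ' ')
echo "$paths_in_history"
echo "paths still stored in history: $path_count"
```

Even though the last commit deletes the folder, both images are still stored in the earlier commits, which is why the history itself has to be rewritten.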
Let's Remove some files from Git Commits!
Install Python (if not already installed):
Download and install Python for Windows. Make sure to check the option to add Python to your system's PATH during installation.
Install git-filter-repo
Once Python is installed, you can install git-filter-repo using pip (the Python package manager). As I'm on a Windows machine, I use the Git Bash shell. Open Git Bash in your project directory and run the command below.
pip install git-filter-repo
Add git-filter-repo to Your PATH:
After installation, git-filter-repo needs to be on your PATH so that Git recognizes it. The script will be in Python's Scripts folder (something like C:\Users\YourUsername\AppData\Local\Programs\Python\PythonXX\Scripts).
- To add it to your PATH, follow these steps:
- Open the Start Menu and search for "Environment Variables."
- Click on Edit the system environment variables.
- Click the Environment Variables button.
- Under User Variables, select the Path variable and click Edit.
- Add a new entry with the path to the Scripts folder where git-filter-repo was installed (e.g., C:\Users\YourUsername\AppData\Local\Programs\Python\PythonXX\Scripts).
- Click OK to save.
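Before moving on, it may help to confirm from Git Bash that the script is actually reachable. A small sketch follows; is_on_path is a helper name invented for this example, not part of Git or git-filter-repo:

```shell
# Return success if the given command can be found on PATH.
is_on_path() {
  command -v "$1" >/dev/null 2>&1
}

if is_on_path git-filter-repo; then
  echo "git-filter-repo found at: $(command -v git-filter-repo)"
else
  echo "git-filter-repo is not on PATH yet"
fi
```

If it reports that the tool is missing right after you edited the environment variables, close and reopen Git Bash first, since an already open shell keeps its old PATH.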
Remove the Unnecessary Folder from History:
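Now that git-filter-repo is installed, you can remove the mistaken folder or file with a single command. I had mistakenly added the uploads directory, so I ran the command below. The --path option selects the folder, and --invert-paths tells the tool to keep everything except that path. If git filter-repo refuses to run because the repository does not look like a fresh clone, it will say so; adding --force overrides that check, so make a backup first.
git filter-repo --path uploads/ --invert-paths
If you want more control over rewriting the history of your Git repository, visit the original git filter-repo repository by newren, who created the tool.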
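Since the rewrite is destructive, a cautious version of this step might look like the sketch below: take a backup first, rewrite, then check that nothing under uploads/ is left. It runs in a throwaway repository with made-up file names and a demo identity, and it skips the rewrite when git-filter-repo is not installed so that the rest still runs:

```shell
# Throwaway project with one image committed by mistake.
demo_dir=$(mktemp -d)
git init -q "$demo_dir/project"
cd "$demo_dir/project"
git config user.email "demo@example.com"   # hypothetical demo identity
git config user.name "Demo"
mkdir uploads
echo "image a" > uploads/a.jpg
echo "<?php" > index.php
git add .
git commit -q -m "Add code and an image by mistake"

# 1. Keep an untouched copy of the full history before rewriting anything.
git clone -q --mirror . "$demo_dir/project-backup.git"

# 2. Rewrite the history. --force is needed here because this demo repo
#    is not a fresh clone. Skipped if the tool is not installed.
if command -v git-filter-repo >/dev/null 2>&1; then
  git filter-repo --path uploads/ --invert-paths --force
fi

# 3. Count commits that still touch uploads/ (0 means the rewrite worked).
remaining=$(git log --all --oneline -- uploads/ | wc -l | tr -d ' ')
backup_count=$(git --git-dir="$demo_dir/project-backup.git" log --all --oneline -- uploads/ | wc -l | tr -d ' ')
echo "commits touching uploads/ in the project: $remaining"
echo "commits touching uploads/ in the backup:  $backup_count"
```

The backup keeps the old history on purpose, so delete it once you are sure the rewritten repository is what you want.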
Force Push to Remote Origin
That did the job. Now I wanted to push the result to the remote origin, which is my GitHub repository. But Git compares the local repository with the remote one, finds that their histories no longer match, and rejects a normal push. So we have to force push the local repository to the remote origin. By default, git filter-repo also removes the origin remote as a safety measure, so you may need to add it back first with git remote add origin YOUR_REPO_URL.
git push origin --force
Now let me give you some friendly advice: if you are working on the project alone, this will work just fine. But if you are working in a team, you should notify all the team members that you changed the history and ask them to sync their local repositories with your updated version; otherwise their work will collide with your commits. That was a hell of an experience, and I'm never ignoring the .gitignore again! Thanks to newren and all the other good people for creating this great tool. It really saved my time and my head 😂
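To see why the force is needed, here is a small sketch using a local bare repository as a stand-in for GitHub. The paths and the demo identity are made up, and git commit --amend is used as a simple substitute for a history rewrite, since it also changes the commit hash:

```shell
# A local bare repository plays the role of the remote.
demo_dir=$(mktemp -d)
git init -q --bare "$demo_dir/remote.git"
git init -q "$demo_dir/local"
cd "$demo_dir/local"
git config user.email "demo@example.com"   # hypothetical demo identity
git config user.name "Demo"

echo "v1" > index.php
git add .
git commit -q -m "First version"
git remote add origin "$demo_dir/remote.git"
branch=$(git rev-parse --abbrev-ref HEAD)
git push -q origin "$branch"

# Stand-in for a history rewrite: amending gives the commit a new hash.
git commit -q --amend -m "First version (rewritten)"

# A normal push is refused because local and remote histories diverged.
if git push -q origin "$branch" 2>/dev/null; then
  plain_push="accepted"
else
  plain_push="rejected"
fi

# The force push replaces the remote branch with the local one.
git push -q --force origin "$branch"
local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$demo_dir/remote.git" rev-parse "$branch")
echo "plain push: $plain_push"
echo "local:  $local_head"
echo "remote: $remote_head"
```

On a shared repository, git push --force-with-lease is a more cautious variant: it refuses to overwrite the remote branch if someone else pushed to it since your last fetch.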