Git Large File Storage
Although Git is well known as a version control system, many Git users are unaware of Git LFS (Large File Storage). In this post, I’ll try to explain why and when Git LFS should be used, as well as how to use it.
What is it, exactly?
Git LFS is an open-source Git extension. Its purpose is to make working with large files and binary files in your repository more efficient. Without it, such files cause a few problems:
- Every time you change a large file, it adds to the history of your repository.
- Fetching and pulling large files will take longer.
- Git treats an update to a binary file as a complete file change, whereas a change to a plain text file can be stored compactly as a difference. Your Git repository will therefore grow quickly if you modify binary files frequently, and Git commands will get slower over time as the repository grows.
When you have a lot of binaries and/or large files in your repository, Git LFS is the way to go. When files or file types are tracked by Git LFS, the repository stores small pointer files instead of the actual files. When you check out a Git LFS file in your local repository, it passes through a filter that replaces the pointer with the real file. The actual files live on the remote server, and the ones you have fetched are kept in your local repository’s cache. This keeps the size of your local repository limited, while the remote repository holds all of the files and their versions.
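To make the pointer idea more concrete, here is a minimal sketch of the text that ends up in the repository in place of a tracked file. A pointer is a small text file holding the spec version, the SHA-256 hash of the real content, and its size in bytes. The function name `lfs_pointer_for` is made up for this post and only imitates the result; Git LFS itself generates pointers through its clean filter. The sketch assumes GNU coreutils (`sha256sum`, `wc`), as found on Ubuntu.

```shell
#!/usr/bin/env bash
# Print the text of a Git LFS pointer for the given file.
# Illustration only: the function name is hypothetical.
lfs_pointer_for() {
  local file=$1
  local oid size
  # SHA-256 of the file content; sha256sum prints "<hash>  -" for stdin.
  oid=$(sha256sum < "$file" | cut -d ' ' -f 1)
  # Size of the file in bytes.
  size=$(wc -c < "$file" | tr -d ' ')
  printf 'version https://git-lfs.github.com/spec/v1\n'
  printf 'oid sha256:%s\n' "$oid"
  printf 'size %s\n' "$size"
}

# Example: lfs_pointer_for photo.jpg
```

However large the file is, the pointer stays at three short lines, which is why the Git history remains small.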
The installation is done on Ubuntu, and we assume that Git is already installed. As mentioned before, Git LFS is an extension to Git and therefore needs to be installed separately:
$ sudo apt install git-lfs
First, create a new, empty Git repository:
$ mkdir mygitlfsplanet
$ cd mygitlfsplanet
$ git init
Initialized empty Git repository in /home/user/mygitlfsplanet/.git/
Navigate to your Git repository (where the .git directory is located) and execute the following command in order to activate Git LFS:
$ git lfs install
Updated git hooks.
Git LFS initialized.
Take a look at the .gitconfig file in your home directory. The following section has been added:
[filter "lfs"]
    clean = git-lfs clean -- %f
    smudge = git-lfs smudge -- %f
    process = git-lfs filter-process
    required = true
Navigate to the directory mygitlfsplanet/.git/hooks. The following hooks have been added/updated and contain git-lfs commands which will be executed when the hook is triggered:
- post-checkout
- post-commit
- post-merge
- pre-push
In addition, a directory mygitlfsplanet/.git/lfs has been added. This is the local cache we talked about earlier.
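If you would rather check this from the command line than open the files by hand, something along these lines should do. It is a small sketch, not an official Git LFS command: the function name `lfs_setup_report` is hypothetical, and it must be run from the root of the repository.

```shell
#!/usr/bin/env bash
# Report what Git LFS set up, as seen from the current repository.
# The function name is hypothetical; run it from the repository root.
lfs_setup_report() {
  echo "filter.lfs settings:"
  # --get-regexp exits non-zero when no key matches.
  git config --get-regexp '^filter\.lfs\.' || echo "  (none found)"
  echo "hooks mentioning git-lfs:"
  # -l prints only the names of the hook files that contain a match.
  grep -l 'git-lfs' .git/hooks/* 2>/dev/null || echo "  (none found)"
}

# Example: cd mygitlfsplanet && lfs_setup_report
```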
Now that we have installed Git LFS for our repository, it is time to configure which file types we want to associate with Git LFS. This information will be added to a .gitattributes file in your repository. It is advised to commit and push this file to your repository so that every developer works with the same Git LFS configuration. The easiest way to associate a file type with Git LFS is the git lfs track command. Let’s associate all jpg files with Git LFS:
$ git lfs track "*.jpg"
The .gitattributes file is created and contains the following information:
*.jpg filter=lfs diff=lfs merge=lfs -text
What if we have a directory largefiles in our repository with large XML files, and we don’t want to associate all XML files with Git LFS but only the ones that reside in that particular directory? We can restrict the pattern to the largefiles directory so that only the XML files in that directory are associated with Git LFS:
$ git lfs track "largefiles/*.xml"
The only thing left to do is to commit the .gitattributes file to our local repository.
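Before committing, it can be reassuring to check which paths the rules in .gitattributes actually catch. The sketch below relies on `git check-attr`, a plain Git command that works even for paths that do not exist yet. The helper name `is_lfs_path` and the file names in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if the given path would be handled by Git LFS according to
# the .gitattributes rules. The function name is hypothetical.
is_lfs_path() {
  # git check-attr prints "<path>: filter: <value>"; the path need not exist.
  [ "$(git check-attr filter -- "$1")" = "$1: filter: lfs" ]
}

# Example, run from the repository root (file names are made up):
#   is_lfs_path photo.jpg           && echo "photo.jpg goes to LFS"
#   is_lfs_path largefiles/data.xml && echo "largefiles/data.xml goes to LFS"
#   is_lfs_path docs/data.xml       || echo "docs/data.xml stays in plain Git"
#   git add .gitattributes
#   git commit -m "Track jpg and large xml files with Git LFS"
```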
Git LFS With an Existing Repository
So far, we’ve shown how to enable Git LFS when creating a new repository for which we already know which files to associate with Git LFS. However, what if you wish to add Git LFS support to an existing repository? You can do this in the same way as for a new repository. Git LFS will track new files and file updates from that point forward. Files committed before you enabled Git LFS are not moved to LFS automatically. However, there is a way to migrate the existing history. The following command migrates the master branch:
$ git lfs migrate import --include="*.jpg,largefiles/*.xml" --include-ref=refs/heads/master
Run the command shown above if file types were not associated with Git LFS when the repository was created. The include option specifies which file types must be migrated, and the include-ref option specifies which branch to migrate. After this, the matching files in that branch’s history are moved to LFS. However, be aware that this operation rewrites your history! Because the commit hashes in your repository will be different afterwards, each developer should clone the repository again. Before you proceed with this migration, consider the repercussions carefully.
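To decide which patterns are worth passing to the include option, it helps to know where the large files in your history are. The sketch below uses only plain Git plumbing commands and lists the largest blobs across all branches; the function name `largest_blobs` is hypothetical. Git LFS also ships a `git lfs migrate info` subcommand that gives a similar report per file type, which may be all you need.

```shell
#!/usr/bin/env bash
# List the largest blobs anywhere in the history of the current repository
# as "<size in bytes> <path>", biggest first. Function name is hypothetical.
largest_blobs() {
  local count=${1:-10}
  # rev-list emits "<object id> <path>"; cat-file adds type and size.
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    # Keep blobs only and drop the leading "blob " (5 characters).
    awk '$1 == "blob" { print substr($0, 6) }' |
    sort -k1,1 -n -r |
    head -n "$count"
}

# Example: largest_blobs 20
```

Since this only reads the history, it is safe to run before deciding whether the rewrite is worth it.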
A few more things to keep in mind:
- The local Git LFS cache is not cleaned automatically. Just as you must prune remote branches on a regular basis, you must prune your Git LFS content with the following command: git lfs prune
- Make sure that all developers are using Git LFS. You’ll get some unusual errors if someone who doesn’t have Git LFS installed commits a file that should be tracked by Git LFS. They can be rectified, but it’s better to avoid them in the first place.
- We’ve talked about committing binaries to a Git repository, but is this something you should do? As an initial response, I’d say no. However, there are situations when you simply have no viable alternative. When considering committing binaries, ask yourself the following:
  - Is it truly required to keep versions of the binary?
  - Is there a text-based format that can be used instead of the binary one? For example, if you wish to commit MS Word files, can you convert them to plain text, or are there good reasons not to?
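One way to catch files that were committed without Git LFS is to compare the attributes with what is actually stored. The sketch below lists paths that .gitattributes assigns to LFS but whose content in the index is a full file rather than a pointer. It is only an illustration: the function name `find_unconverted_lfs_files` is hypothetical, paths with unusual characters are not handled, and it assumes, as the Git LFS pointer spec states, that pointer files are smaller than 1024 bytes.

```shell
#!/usr/bin/env bash
# Print paths that .gitattributes assigns to Git LFS but whose content in
# the index is a full file rather than a pointer.
# Sketch only: the function name is hypothetical, and paths containing
# newlines or other unusual characters are not handled.
find_unconverted_lfs_files() {
  local header='version https://git-lfs.github.com/spec/v1'
  git ls-files | while IFS= read -r path; do
    # Skip paths that are not assigned to the lfs filter.
    [ "$(git check-attr filter -- "$path")" = "$path: filter: lfs" ] || continue
    # Pointers are tiny; anything of 1024 bytes or more is a real file.
    size=$(git cat-file -s ":$path")
    if [ "$size" -lt 1024 ]; then
      # Compare the first line of the index content with the pointer header.
      first_line=$(git cat-file blob ":$path" | tr -d '\0' | head -n 1)
      [ "$first_line" = "$header" ] && continue
    fi
    printf '%s\n' "$path"
  done
}

# Example, run from the repository root: find_unconverted_lfs_files
```

An empty result means every LFS-assigned file in the index is stored as a pointer.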
In this article, we discussed what Git LFS is, how to install it, and how to use it. We also provided some pointers on how to apply Git LFS to an existing repository.