bfsync - manage git-like repository with big files
bfsync <command> <args>...
bfsync is a program that provides git-style revision control for collections of big files. The contents of the files are managed by bfsync, and a git repository is used to do version control; in this repo only the hashes of the actual data files are stored.
For transfer, bfsync needs to be installed on every system that needs to be accessed. The actual transfer is done using ssh. Transfer has only been tested with ssh keys; it’s highly recommended to use ssh-agent to avoid entering your password over and over again.
bfsync has a number of commands; the options depend on the command.
clone <repo>
Initialize new cloned bfsync repo from git repo <repo>; it’s highly recommended to always work with a bare git main repo (git init --bare).
add <file>...
Add files to the bfsync repository.
commit
Commit changes to the repository.
push
Push changes to the upstream repository.
pull
Pull changes from the upstream repository.
get <dir>|<remote dir>...
Transfer file contents from a directory - usually the directory should contain a bfsync checkout of the repository you’re working with. However, it is possible to get data from any directory (local or remote); as long as the required hash values are found, it doesn’t matter where the data comes from. A path with a ":" is interpreted as a remote path, a path without ":" as a local path. Examples of remote paths are stefan@server:/big/files and server:some/dir.
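The local/remote rule can be illustrated with a small shell sketch. The function name classify_path is hypothetical and not part of bfsync; it only mirrors the rule stated above (a ":" anywhere in the path means remote) and says nothing about how bfsync implements it. The path /mnt/backup/big is likewise just an example.

```shell
# Hypothetical helper, not part of bfsync: classify a path the way
# the "get" description does. A path containing ":" is remote,
# anything else is local.
classify_path() {
    case "$1" in
        *:*) echo remote ;;
        *)   echo local ;;
    esac
}

classify_path stefan@server:/big/files   # prints "remote"
classify_path server:some/dir            # prints "remote"
classify_path /mnt/backup/big            # prints "local"
```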
put <remote repo>
Transfer file contents from local directory to remote repo.
check
Checks the local repository - ideally, all files that are known to the git index of the repository should be found and their hash values should match the values in the git repo. If that’s not the case, put/get can be used to complete the local repository.
mv <src_file> <dest_file>
Use this for renaming a file - in the local repo, the data is automatically adapted, but for remote repositories, you’ll need to use get/put to get the data right, and delete to delete the old file.
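One possible reading of the rename workflow, as a sketch: the function names rename_and_publish and apply_remote_rename are hypothetical, and the sketch assumes that each bfsync command exits with a non-zero status on failure, which is not stated here. Only commands described in this manual are used.

```shell
# Hypothetical wrappers around the documented bfsync commands.

# On the checkout where the rename is made: rename the file, then
# record and publish the change to the main repository.
rename_and_publish() {
    bfsync mv "$1" "$2" && bfsync commit && bfsync push
}

# On every other checkout: learn about the rename, fetch the data
# from a directory that has it, then let "bfsync delete" offer the
# now-unknown old file for removal (review its list before confirming).
apply_remote_rename() {
    bfsync pull && bfsync get "$1" && bfsync delete
}

# Example use (run inside the respective checkouts):
#   client1:~/big$ rename_and_publish old.flac new.flac
#   server:~/big$  apply_remote_rename client1:big
```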
rm <file>...
Remove a file from the repository. On the local checkout the data file is removed automatically; however on remote hosts, using bfsync delete is required to remove the file.
delete
This will check for files that are present locally, but not known to the git index repository. This can occur if you create new files, or if remote renames have been done. Since bfsync cannot distinguish these two cases, you’ll have to confirm the delete list, or abort if files that you have newly created are on that list.
repo-files [-0|--null] <dir>
This searches a directory for files that are also in the repo. If you start moving data to the repo, you can clean up copies that might be present elsewhere. Using -0|--null makes the output suitable for use with xargs -0.
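As a sketch of the clean-up described above, the null-separated output can be piped into xargs -0. The function name remove_repo_copies is hypothetical, and the sketch assumes it is run from inside a bfsync checkout; it is worth replacing rm with ls -l for a first dry run.

```shell
# Hypothetical wrapper, not part of bfsync: delete files below the
# given directory that bfsync reports as already being in the repo.
# -0 keeps filenames with spaces or newlines intact.
# rm -f does nothing (and does not fail) if the list is empty.
remove_repo_copies() {
    bfsync repo-files -0 "$1" | xargs -0 rm -f --
}

# Example use, with a made-up directory name:
#   client1:~/big$ remove_repo_copies ~/old-music
```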
status
Show status information about files in repo.
First, we create and set up repositories on three computers: server, client1 and client2. The server will hold the main repository (a git bare repository containing hashes of files); this repo is the canonical index of which files are checked into the repository. It is stored under ~/repos/big.git. All computers will contain a checkout, so that the actual contents of the files can be kept there.
server:~$ mkdir -p repos/big.git
Create a directory on the server for the main git index repository.
server:~$ cd repos/big.git
Change dir.
server:~/repos/big.git$ git init --bare
Init git repo.
server:~/repos/big.git$ cd ~
Change dir.
server:~$ bfsync clone repos/big.git
Clone repository on the server.
client1:~$ bfsync clone server:repos/big.git
Clone repository on client1.
client2:~$ bfsync clone server:repos/big.git
Clone repository on client2.
As a second step, we add a music file on client1. Of course it’s possible to add more files in one step, either by passing more than one filename to bfsync add, or by calling bfsync add more than once. Whenever files are added or otherwise changed, we need to commit and push the changes to the server, so that it contains the canonical index of files.
client1:~$ cd big
Change dir.
client1:~/big$ cp ~/download/01-some-music.flac .
Copy a big file into the repository checkout.
client1:~/big$ bfsync add 01-some-music.flac
Add the file to the repository.
client1:~/big$ bfsync commit
Commit the changes to the repository.
client1:~/big$ bfsync push
Push the changes to the server.
So far, we have added the file to the repository on client1, but the contents of the file are only present on client1, and not in the other repos. To change this, we can transfer the file to the server.
server:~$ cd big
Change directory.
server:~/big$ bfsync pull
Updating the git index repository is required on the server before we can transfer the file there. By pulling, the server will have the necessary information, or in other words: the server can know that a file named 01-some-music.flac is part of the bfsync repository and should be there. Running bfsync check will report one missing file after this step.
client1:~/big$ bfsync put server:big
Now the actual transfer: after this step, both client1 and server will have a copy of 01-some-music.flac.
As the last step, we’ll transfer the file to client2. Of course we could use the same commands that we used to get the file to the server, but let’s assume that client2 is behind a firewall, and that it’s not possible to ssh to client2 directly. Fortunately, besides uploading files to another host (bfsync put), it’s also possible to download data from another host (bfsync get).
client2:~$ cd big
Change directory.
client2:~/big$ bfsync pull
Update directory information.
client2:~/big$ bfsync get server:big
Get the file from the server.
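The pull/get sequence above can be wrapped in a small function, with bfsync check added as a final verification. This is only a sketch: the name fetch_from is hypothetical, and it assumes that each bfsync command returns a non-zero exit status on failure, which this manual does not state.

```shell
# Hypothetical helper, not part of bfsync: bring a checkout up to
# date from the given directory (local or remote).
#   1. pull  - update the git index so we know which files should exist
#   2. get   - transfer the missing file contents
#   3. check - verify that all known files are present and match
fetch_from() {
    bfsync pull && bfsync get "$1" && bfsync check
}

# Example use on client2:
#   client2:~/big$ fetch_from server:big
```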
git(1), rsync(1)