BFSYNC

NAME
SYNOPSIS
DESCRIPTION
OPTIONS
COMMANDS
CONFIGURATION
MERGES
WALKTHROUGH
SEE ALSO

NAME

bfsync - big file synchronization tool

SYNOPSIS

bfsync <command> <args>...

DESCRIPTION

bfsync is a file-synchronization tool that keeps a collection of big files synchronized across many machines. There are two types of repositories: master repositories and checkouts. Master repositories contain only the history information. Usually they should be stored on a central server that all computers which need to view or edit the data can reach (at least some of the time). Master repositories are created using bfsync init. The other repository type is a checkout, which can be created using bfsync clone. Besides the history information, checkouts contain the actual file contents, so a checkout can be huge (hundreds of gigabytes), whereas a master repository is usually small.

To view/edit the data in a checked out repository, the repository must be mounted using bfsyncfs. Once mounted, bfsync can be called with the commands given below. For instance bfsync commit/push/pull can be used to modify the local history and resynchronize the history with the master history. Other commands like bfsync get/put can be used to transfer data between checkouts.

For transfer, bfsync needs to be installed on every system that needs to be accessed. The actual transfer is done using ssh. Transfer has only been tested with ssh keys; using ssh-agent is highly recommended, so that you don’t have to enter your password over and over again.

OPTIONS

bfsync has a number of commands; the available options depend on the command.

COMMANDS

clone [ -u ] <repo> [ <dest-dir> ]

Initialize a new cloned bfsync repo from the bfsync master repo <repo>. If <dest-dir> is not specified, bfsync clone will generate a directory name from the repository name; otherwise it will clone into the directory <dest-dir>. If -u is given, bfsync clone will set the "use-uid-gid" option in the config file for the cloned repository to 1. This means that upon mount, the user id and group id settings for files and directories will be taken from the repository. This only makes sense if all machines that access the data have the same uid/gid numbers. The default (without -u) is not to use the uid/gid numbers, which suits almost every use case.

commit [-m <message>] [-a <author>]

Commit changes to the repository. Normally, an editor will be started to allow the user to enter a commit message. If the -m option is used, the commit message is set to the <message> argument, and no editor will be started. The -a option specifies the author of the commit (otherwise it defaults to user@host).

push [<repo>]

Push changes to master repository. If the <repo> argument is missing, the default from the config file will be used.

pull [<repo>]

Pull changes from master repository. If there are new local commits and new commits in the master repository, pull will merge the local and master history. See section MERGES. If the <repo> argument is missing, the default from the config file is used.

get [<repo>]

Transfer file contents from <repo>, which should be a directory containing a bfsync checkout of the repository you’re working with. A path with a ":" is interpreted as a remote path; a path without ":" is a local path. Examples of remote paths are stefan@server:/repos/files.bfsync or server:some/dir. If the <repo> argument is missing, the default from the config file is used.

put [<repo>]

Transfer file contents from local checkout to <repo>. If the <repo> argument is missing, the default from the config file is used.

check

Checks the local repository. Ideally, all file contents that are known to the history of the repository are available in the objects directory of the repository (which stores one file per SHA1 hash). If that’s not the case, put/get can be used to complete the local repository.

log

Displays the history of the repository. For each commit, one version will show up in this log.

gc

This will check for objects that are present in the local repo, but not used. This can happen when using revert, or for instance if the merge algorithm needs to modify diffs due to merge decisions (then the old diff becomes unused).

repo-files [-0|--null] <dir>

This searches a directory for files that are also in the repo. If you start moving data to the repo, you can clean up copies that might be present elsewhere. Using -0|--null makes the output suitable for use with xargs -0.
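
As a sketch of how the null-separated output might be consumed, the helper below lists the copies found in a directory without deleting anything. The function name list_repo_copies is hypothetical, and the sketch assumes that repo-files prints the path of each matching file.

```shell
# Hypothetical helper, not part of bfsync: list files under directory $1
# that repo-files reports as already present in the repo.
# Assumes repo-files -0 prints the matching paths separated by NUL bytes.
list_repo_copies() {
    bfsync repo-files -0 "$1" | xargs -0 ls -ld --
}

# Usage: list_repo_copies ~/download
```

Review the list before removing anything. If nothing matches, some xargs implementations (GNU xargs, for instance) still run the command once with no arguments.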

status

Show status information for files which have been modified in the local checkout, but not committed yet.

collect <dir>

This command allows using a non-checkout directory to populate a repository with file contents. It computes the SHA1 hash of every file in the directory, and copies each file whose hash matches one required by the local repository into the repository’s object directory. This is for instance useful if you’re converting from an old bfsync format to the new one, because by using collect you don’t have to retransfer the file contents.
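
To illustrate the hashing step, the sketch below prints the SHA1 hash of every file under a directory using the standard sha1sum tool. It is not how bfsync itself is implemented; hash_tree is a hypothetical name, and sha1sum is assumed to be installed.

```shell
# Hypothetical helper: print "<sha1>  <path>" for each regular file under $1.
# This only mirrors the hashing step that collect is described as doing.
hash_tree() {
    find "$1" -type f -exec sha1sum {} +
}

# Usage: hash_tree ~/old-checkout | sort
```

Files with identical contents produce identical hashes, which is why a copy found anywhere on disk can stand in for contents that the repository is missing.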

revert [<version>]

If <version> is not specified, revert will discard all uncommitted changes, and revert to the newest version available in the history. If <version> is specified, it will go back in time to that version.

CONFIGURATION

Every bfsync checkout has a file called "config", which can be used to set configuration variables for this checkout.

sqlite-sync 0|1;

This controls whether sqlite blocks the application until all data has been written to disk. With sqlite-sync set to 1 (the default), sqlite waits for the disk, which is slower than setting it to 0. However, the sqlite documentation indicates that setting sqlite-sync to 0 might lead to a corrupt database if the system has a power failure or operating system crash while changes are being written.

use-uid-gid 0|1;

Bfsync was designed to store all file meta data, including the user id and group id of each file. These numbers will only make sense if all checkouts use the same uid/gid number to name mappings. Since for most users we cannot assume that the uid/gid numbers are the same on every system that has a checkout, bfsync defaults to ignoring the access permissions and uid/gid numbers stored in the repository. All files will appear to belong to the user that mounted the filesystem, and access rights will also not be enforced. To use the uid/gid numbers and enforce access rights, set use-uid-gid to 1. This is for instance useful if you want to copy data into the repository as root and preserve the ownership of the files.

default { get "<url>|<path>"; }

Set the default location for get (a <url> or <path>) to be used if bfsync get is called without an argument.

default { put "<url>|<path>"; }

Set the default location for put (a <url> or <path>) to be used if bfsync put is called without an argument.

default { pull "<url>|<path>"; }

Set the default location for pull (a <url> or <path>) to be used if bfsync pull is called without an argument.

default { push "<url>|<path>"; }

Set the default location for push (a <url> or <path>) to be used if bfsync push is called without an argument.

The configuration keys in the default group can be set simultaneously, by using

 default {
   get "...";
   put "...";
   push "...";
   pull "...";
 }

MERGES

bfsync allows independent modifications of the data/history contained in different checkouts. Upon push, bfsync will check that the master history doesn’t contain new commits that are unknown to the local checkout. If two clients modify the repository independently, the first client that uses bfsync push will simply reintegrate its changes into the master history, and the new master history will be this client’s history.

However, if the second client tries a bfsync push, the push will be refused. To resolve the situation, the second client can use bfsync pull. Once it is detected that merging both histories is necessary, a merge algorithm will be used. For non-conflicting changes, everything will be merged automatically. Non-conflicting changes could be:

master history has new file F - client 2 has new file G

After merging, both files will be present in the repository

master history has new dir A, with new files in it - client 2 has new dir B, with new files in it

After merging, both directories will be part of the repository

master history has renamed file F to G - client 2 has renamed dir X to Y

After merging, both renames will be done

master history has new file X - client 2 has new file X

In this case, one of the files will be renamed to X~1. Since both were added independently, it is likely that the user wants to keep both files.

However, there are situations where the merge algorithm can’t merge both histories automatically:

master history has edited file F - client 2 has edited file F

In this case, bfsync pull will ask the user to resolve the situation; it is possible to keep the master version, or the local version or both.

master history has edited file F - client 2 has deleted file F

bfsync pull will ask the user in this case; it is possible to either keep the file with changes, or delete it.

In any case, after the merge decisions are made (if any), the merge algorithm will use them to modify the local history so that it can be executed without conflicts after the master history. After this step, the modified local commits will be based on the master history. bfsync push will then succeed, and the modified changes of client 2 can be pushed to the master history.

Note that the master history is always linear, so the history branch that was present before the merge algorithm was used will no longer be visible in the history after the pull. The merged history will simply contain the old history (before client 1 and client 2 made their changes), the changes made on client 1, an extra merge commit (if necessary to resolve merge issues), and the modified changes of client 2.
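
The push, pull, push sequence described above could be wrapped in a small helper, sketched below. The name push_or_merge is hypothetical, and the sketch assumes that bfsync push returns a non-zero exit status when the push is refused, which this manual does not state.

```shell
# Hypothetical helper: push, and if the push fails, pull and retry once.
# Assumes bfsync push returns a non-zero exit status when it is refused.
push_or_merge() {
    bfsync push || { bfsync pull && bfsync push; }
}

# Usage (inside the mounted checkout): push_or_merge
```

Since bfsync pull may ask the user to resolve conflicts, a helper like this is only suitable for interactive use.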

WALKTHROUGH

First, we create and set up repositories on three computers: server, client1 and client2. The server will hold the master repository (which manages the history, but nothing else). It is stored under ~/repos/big.bfsync. All computers will contain a checkout, so that the actual contents of the files can be kept there.

server:~$ mkdir repos

Create a directory on the server for the master repository.

server:~$ cd repos

Change dir.

server:~/repos$ bfsync init big.bfsync

Init master repo.

server:~/repos$ cd ~

Change dir.

server:~$ bfsync clone repos/big.bfsync

Clone repository on the server.

server:~$ mkdir big

Create mount point on the server.

server:~$ bfsyncfs big.bfsync big

Mount repository on the server.

client1:~$ bfsync clone server:repos/big.bfsync

Clone repository on client1.

client1:~$ mkdir big

Create mount point on client1.

client1:~$ bfsyncfs big.bfsync big

Mount repository on client1.

client2:~$ bfsync clone server:repos/big.bfsync

Clone repository on client2.

client2:~$ mkdir big

Create mount point on client2.

client2:~$ bfsyncfs big.bfsync big

Mount repository on client2.

As a second step, we add a music file on client1. Of course it’s possible to add more files in one step; you can also use rsync, mc or a file manager to copy files into the repository. Whenever files are added or otherwise changed, we need to commit and push the changes to the server, so that it contains the canonical index of files.

client1:~$ cd big

Change dir.

client1:~/big$ cp ~/download/01-some-music.flac .

Copy a big file into the repository checkout.

client1:~/big$ bfsync commit

Commit the changes to the repository.

client1:~/big$ bfsync push

Push the changes to the server.
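
For scripted use, the commit and push steps can be combined, since the -m option suppresses the editor. The following is a minimal sketch; commit_and_push is a hypothetical name, not a bfsync command.

```shell
# Hypothetical helper: commit with the given message, then push to the
# default master repository from the config file.
commit_and_push() {
    bfsync commit -m "$1" && bfsync push
}

# Usage (inside the mounted checkout): commit_and_push "add 01-some-music.flac"
```

The push only runs if the commit succeeds.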

So far, we have added the file to the repository on client1, but the contents of the file are only present on client1, and not in the other repos. To change this, we can transfer the file to the server.

server:~$ cd big

Change directory.

server:~/big$ bfsync pull

Using pull is required on the server before we can transfer the file there. By pulling, the server will have the necessary information, or in other words: the server can know that a file named 01-some-music.flac is part of the bfsync repository and should be there. Running bfsync check will report one missing file after this step.

client1:~/big$ bfsync put server:big

Now the actual transfer: after this step, both client1 and server will have a copy of 01-some-music.flac.

As the last step, we’ll transfer the file to client2. Of course we could use the same commands that we used to get the file to the server, but let’s assume that client2 is behind a firewall, and that it’s not possible to ssh to client2 directly. Fortunately, besides uploading files to another host (bfsync put), it’s also possible to download data from another host (bfsync get).

client2:~$ cd big

Change directory.

client2:~/big$ bfsync pull

Update directory information.

client2:~/big$ bfsync get server:big

Get the file from the server.
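
The last two commands, followed by a bfsync check to look for file contents that are still missing, could be combined as sketched below. pull_and_get is a hypothetical name, and the argument is whatever checkout location you would pass to bfsync get.

```shell
# Hypothetical helper: update the history, download file contents from
# the checkout given as $1, then run check on the local repository.
pull_and_get() {
    bfsync pull && bfsync get "$1" && bfsync check
}

# Usage (inside the mounted checkout): pull_and_get server:big
```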