bfsync - Documentation - bfsync 0.3.4





bfsync - big file synchronization tool


bfsync <command> <args>...


bfsync is a file-synchronization tool which allows you to keep a collection of big files synchronized on many machines. There are two types of repositories: master repositories and checkouts. Master repositories only contain the history information. Usually they should be stored on a central server which all computers that view/edit this data can reach (at least some of the time). Master repositories are created using bfsync init. The other repository type is a checkout. A checkout can be created using bfsync clone. Besides the history information, checkouts contain the actual file contents, so a checkout can be huge (hundreds of gigabytes) whereas a master repository is usually small.

To view/edit the data in a checked out repository, the repository must be mounted using bfsyncfs. Once mounted, bfsync can be called with the commands given below. For instance bfsync commit/push/pull can be used to modify the local history and resynchronize the history with the master history. Other commands like bfsync get/put can be used to transfer data between checkouts.

For transfers, bfsync needs to be installed on every system that will be accessed. The actual transfer is done using ssh. Transfer has only been tested with ssh keys; it’s highly recommended to use ssh-agent to avoid entering your password over and over again.

It is possible to use bfsync for backups; in that case you use only a subset of what bfsync can do, to get a versioned FUSE filesystem with deduplication. See the section BACKUP WALKTHROUGH for a description.


bfsync has a number of commands; the available options depend on the command.


clone [-u] [-c <cache_size_mb>] [--rsh <remote_shell>] <repo> [<dest-dir>]

Initialize a new cloned bfsync repo from the bfsync master repo <repo>. If <dest-dir> is not specified, bfsync clone will generate a directory name from the repository name; otherwise it will clone into the directory <dest-dir>. If -u is given, bfsync clone will set the "use-uid-gid" option in the config file for the cloned repository to 1. This means that upon mount, the user id and group id settings for files/directories will be taken from the repository. This only makes sense if all machines that access the data have the same uid/gid numbers. The default (without -u) is not to use the uid/gid numbers, which should be perfect for almost every use case except for backups. The cache size is the amount of shared memory that will be used as database cache. It defaults to 16 MB, which is fine for repositories that contain a small number of files; for synchronizing big files between different hosts, that will usually work. If the number of files is large (> 100,000), you need a larger cache size. At least 100 MB of cache per 1,000,000 files stored should be used; more is better, if you can afford it. The --rsh option can be used to set the remote shell used to connect to the host; it defaults to ssh.
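The sizing rule can be sketched as a quick shell calculation; the file count below is an assumed example, and the result would be passed to clone via -c:

```shell
# Rule of thumb from above: at least 100 MB of cache per 1,000,000 files
files=2000000                         # assumption: repository with 2 million files
cache_mb=$(( files / 1000000 * 100 ))
echo "$cache_mb"                      # prints 200
```

The checkout would then be created with bfsync clone -c 200 ... to give it a 200 MB cache.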

commit [-m <message>] [-a <author>]

Commit changes to the repository. Normally, an editor will be started to allow the user to enter a commit message. If the -m option is used, the commit message is set to the <message> argument, and no editor will be started. The -a option allows you to specify the author of the commit (otherwise it defaults to user@host).

push [--rsh <remote_shell>] [<repo>]

Push changes to the master repository. If the <repo> argument is missing, the default from the config file will be used. The --rsh option can be used to set the remote shell used to connect to the host; it defaults to ssh.

pull [--rsh <remote_shell>] [<repo>]

Pull changes from the master repository. If there are new local commits and new commits in the master repository, pull will merge the local and master history. See section MERGES. If the <repo> argument is missing, the default from the config file is used. The --rsh option can be used to set the remote shell used to connect to the host; it defaults to ssh.

get [--rsh <remote_shell>] <repo>

Transfer file contents from the repository <repo> - the directory should contain a bfsync checkout of the repository you’re working with. A path with a ":" is interpreted as a remote path, a path without ":" is a local path. Examples for remote paths are stefan@server:/repos/files.bfsync or server:some/dir. If the <repo> argument is missing, the default from the config file is used. The --rsh option can be used to set the remote shell used to connect to the host; it defaults to ssh.

put [--rsh <remote_shell>] <repo>

Transfer file contents from the local checkout to <repo>. If the <repo> argument is missing, the default from the config file is used. The --rsh option can be used to set the remote shell used to connect to the host; it defaults to ssh.


check

Checks the local repository - ideally, all file contents that are known to the history of the repository are available in the objects directory of the repository (which stores one file per SHA1 hash). If that’s not the case, put/get can be used to complete the local repository.


log

Displays the history of the repository. For each commit, one version will show up in this log.


gc

This will check for objects that are present in the local repo, but not used. This can happen when using revert, or for instance if the merge algorithm needs to modify diffs due to merge decisions (then the old diff becomes unused).

repo-files [-0|--null] <dir>

This searches a directory for files that are also in the repo. If you start moving data to the repo, you can clean up copies that might be present elsewhere. Using -0|--null makes the output suitable for use with xargs -0.
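The NUL-separated output pairs with xargs -0 so that filenames containing spaces (or newlines) survive the pipeline intact. In this illustration, printf merely stands in for the output of bfsync repo-files -0:

```shell
# printf emits two NUL-terminated names, as repo-files -0 would;
# xargs -0 splits on NUL, so the embedded space is preserved
printf 'a b.txt\0c.txt\0' | xargs -0 -n1 echo
```

In practice the pipeline would typically end in something like xargs -0 rm to remove the duplicate copies.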


status

Show status information for files which have been modified in the local checkout, but not committed yet.

collect <dir>

This command allows using a non-checkout directory to populate a repository with file contents. It computes the SHA1 hash of all files in the directory, and copies those files with matching hash that are required for the local repository to its object directory. This is for instance useful if you’re converting from an old bfsync format to the new one, because by using collect you don’t have to retransfer the file contents.
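Matching works purely on content: a file is identified by its SHA1 hash alone, so the name and location of a copy are irrelevant. A minimal sketch of that identification step:

```shell
# Two copies with different names and paths hash identically,
# which is why collect can pick up file contents from any directory
a=$(mktemp); b=$(mktemp)
printf 'hello' > "$a"
printf 'hello' > "$b"
sha1sum "$a" "$b" | cut -d' ' -f1    # the same hash, twice
rm -f "$a" "$b"
```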

revert [<version>]

If <version> is not specified, revert will discard all uncommitted changes, and revert to the newest version available in the history. If <version> is specified, it will go back in time to that version.

continue [<repo>]

There are some operations, such as commit or revert, that must be finished completely once they are started, to ensure that the database is in a consistent state. If such an operation is interrupted, for instance because of a power failure, user abort (^C) or because the process was killed, it needs to be completed before the repository can be used for something else. A filesystem with an unfinished command can only be used in readonly mode until that operation is finished. The "continue" command can be used to finish an unfinished operation, either with the repository directory as argument or, if that is missing, the repository will be chosen according to the directory "continue" was started in.

need-continue [<repo>]

Returns true if running bfsync continue is necessary. The repository is chosen either from the <repo> argument or, if that is missing, from the directory "need-continue" was started in.

recover [<repo>]

Since bfsync uses Berkeley DB to store its data, data integrity is protected even against extreme failures, for instance if the process that accesses the database crashes or is killed, or if everything on the host running bfsync is interrupted due to power failure or kernel panic. After such failures, the database needs recovery. Trying to mount bfsyncfs in that situation or using the bfsync utility will fail. In this situation, running "recover" will restore the database to a consistent state, and after that normal work can start again. The repository is chosen either from the <repo> argument or, if that is missing, from the directory "recover" was started in. Recovery cannot be done if there are still processes that are using the database; in that case, these processes need to be terminated before recovery. Attempting recovery when it is not necessary is safe, so it won’t change the database state.

need-recover [<repo>]

Returns true if running bfsync recover is necessary. The repository is chosen either from the <repo> argument or, if that is missing, from the directory "need-recover" was started in.

disk-usage [-h]

This generates disk usage information. The -h option will use human readable sizes (like 15G for 15 gigabytes); omitting it will print the sizes in bytes. The disk usage information is designed to show what could be freed by removing older versions. Therefore, it starts with the most recent version, since you’ll want to keep this version in any case. This first size is the size required to store one full backup (for the backup use case). Then, older versions are examined: only those files are counted that need to be kept in addition, to be able to retrieve the older version completely. Since most files will be unchanged compared to the current version, the increments will usually be small.

new-files [-h] [-s] <version>

This command shows which files were added to the repository for a given version. It supports -s, which will also print the sizes of the newly added files. The -h option will use human readable sizes, omitting it will print the sizes in bytes. When the size information is requested, the files will be sorted by size, to make it easier to find out which are the biggest additions of this version.


expire

The expire command was designed mainly for deleting the contents of old backups, although it will work on any repository. During expire, the tags daily, weekly, monthly and yearly will be assigned to the versions that are daily, weekly, monthly or yearly backups. The expire configuration determines which version is - for instance - a monthly backup: it could be the first backup of the month, or the last backup of the month. After all versions have been tagged according to the expire configuration, expire will mark versions deleted that shouldn’t be kept. For each daily/weekly/monthly/yearly backup, expire will keep the N newest backups, where N is configurable. The expire/keep_daily setting defines how many daily backups expire will keep, the expire/keep_weekly setting defines how many weekly backups expire will keep, and so on. In addition, the expire/keep_most_recent setting defines how many of the most recent backups expire will keep. Every backup that is not kept due to one of these settings will be marked deleted. The settings that affect the expire command are documented in the CONFIGURATION section. Note that while expire marks versions as deleted, it doesn’t actually delete the file contents that belong to deleted versions. Running bfsync gc will delete all file contents that are only in deleted versions.


upgrade

Upgrade bfsync repository contents from an old bfsync version to a new bfsync version. The repository version of a bfsync repository is initialized to the version of bfsync that created the repository. That is true for master repositories (created with bfsync init) and checkouts (created with bfsync clone). A repository can only be accessed by bfsync if the repository version and the bfsync version match. In some cases upgrade can automatically upgrade a repository to the current bfsync version. Note that while this makes it possible to access the repository with the new bfsync, at the same time it makes it impossible to access it with the old bfsync (and there is no downgrade command). Manual repository version upgrades are described in the section UPDATING FROM AN OLD VERSION; of course, an automated upgrade with bfsync upgrade is easier and should be used if bfsync upgrade supports the old version -> new version upgrade.

delete-version <vrange1> <vrange2> ...

This command tags the version(s) <vrange1>, <vrange2>, ... as deleted, where single numbers are possible (bfsync delete-version 10) or version ranges (bfsync delete-version 10-20). Once deleted, these versions become invisible (in the .commits dir, in the bfsync log and so on), and files that are only present in deleted versions are removed during bfsync gc, and will not be retransferred for instance during bfsync get.

undelete-version <vrange1> <vrange2> ...

Untag versions that were previously deleted (see delete-version for complete description).

config-set <key> <value> ...

Modify repository configuration file. The key can either be a simple key (bfsync config-set cache-size 1024) or in <group>/<key> format (bfsync config-set expire/keep_daily 30). It is possible to assign more than one value to a key, however this only makes sense for keys that support that.

transfer-bench [--rsh <remote_shell>] <host>

Measure transfer speed when transferring data from host <host> to the local host. The --rsh option can be used to set the remote shell used to connect to the host. It defaults to ssh, but other remote shells (like rsh) can provide better performance.

sql-export [-d <database>] [-u <user>] [-w <password>] [-H <host>] [-p <port>] [-r]

Export a versioned list of files to postgres, in order to be able to browse the repository contents using standard sql queries. The export only adds the versions that are new since the last sql-export run, unless the -r option is used. In that case, it clears all database tables and starts exporting all versions from the beginning. The postgres connection arguments (user/database/...) can be specified as commandline arguments, or in the config file. See section CONFIGURATION for the config file entries that affect the sql-export command.

find-missing [-0|--null]

Show filenames of files where file contents are unavailable. This command only shows files that are present in the current version of the repository, files that have been deleted are ignored. Using -0|--null makes the output suitable for use with xargs -0.

copy-expire [--rsh <remote_shell>] <repo>

Usually, the deletion of old versions is done by the expire command. If a repository is replicated using clone/pull/get, it is often useful to run the expire command on the source repository, and copy the resulting tags to the target repositories using copy-expire. In this way, the same versions that have been deleted in the source repository will also be deleted in the target repository. The --rsh option can be used to set the remote shell used to connect to the host; it defaults to ssh.


Each bfsync repository or checkout has a unique id, which is generated upon creation of the repository and stored in the "info" file of each repository. This command will print this repository id.


Print bfsync version.


CONFIGURATION

Every bfsync checkout has a file called "config", which can be used to set configuration variables for this checkout.

Bfsync was designed to store all file meta data, including the user id and group id of each file. These numbers will only make sense if all checkouts use the same uid/gid number to name mappings. Since for most users we cannot assume that the uid/gid numbers are the same on every system that has a checkout, bfsync defaults to ignoring the access permissions and uid/gid numbers stored in the repository. All files will appear to belong to the user that mounted the filesystem, and access rights will also not be enforced. To use the uid/gid numbers and enforce access rights, set use-uid-gid to 1. This is for instance useful if you want to copy data into the repository as root and preserve the ownership of the files.
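For a backup repository where ownership must be preserved, the corresponding entry in the checkout’s "config" file would look like this (the same effect as cloning with -u):

```
use-uid-gid 1;
```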

get-rate-limit <get-limit-kb>;

Set the maximum transfer rate in kilobytes/sec that bfsync get will use. This is helpful if your internet connection has a limited speed: that way you can ensure that bfsync will not use up your line completely.

put-rate-limit <put-limit-kb>;

Set the maximum transfer rate in kilobytes/sec that bfsync put will use.
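Both limits go into the checkout’s "config" file; the numbers below are only example values:

```
get-rate-limit 700;
put-rate-limit 200;
```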

default { get "<url>|<path>"; }

Set default location for get (an <url> or <path>) to be used if bfsync get is called without an argument.

default { put "<url>|<path>"; }

Set default location for put (an <url> or <path>) to be used if bfsync put is called without an argument.

default { pull "<url>|<path>"; }

Set default location for pull (an <url> or <path>) to be used if bfsync pull is called without an argument.

default { push "<url>|<path>"; }

Set default location for push (an <url> or <path>) to be used if bfsync push is called without an argument.

default { copy-expire "<url>|<path>"; }

Set default location for copy-expire (an <url> or <path>) to be used if bfsync copy-expire is called without an argument.

The configuration keys in the default group can be set simultaneously, by using

 default {
   get "...";
   put "...";
   push "...";
   pull "...";
 }

expire { keep-most-recent <N>; }

Keep <N> most recent versions during expire.

expire { create-daily first|last; }

Tag first/last backup of the day as daily backup during expire.

expire { keep-daily <N>; }

Keep the newest <N> daily backups during expire.

expire { create-weekly <weekday>; }

Tag daily backup on <weekday> as weekly backup during expire. Possible values for <weekday> are monday, tuesday, ..., sunday.

expire { keep-weekly <N>; }

Keep the newest <N> weekly backups during expire.

expire { create-monthly first|last; }

Tag first/last daily backup of the month as monthly backup during expire.

expire { keep-monthly <N>; }

Keep the newest <N> monthly backups during expire.

expire { create-yearly first|last; }

Tag first/last daily backup of the year as yearly backup during expire.

expire { keep-yearly <N>; }

Keep the newest <N> yearly backups during expire.

The configuration keys in the expire group can be set simultaneously, for instance by using

 expire {
   keep-most-recent 30;
   keep-daily 45;
   keep-monthly 30;
 }

sql-export { database <database>; }

Use the postgres database <database> for the sql-export command.

sql-export { user <user>; }

Use the postgres user <user> for the sql-export command.

sql-export { password <password>; }

Use the postgres password <password> for the sql-export command.

sql-export { host <host>; }

Use the postgres host <host> for the sql-export command.

sql-export { port <port>; }

Use the postgres port <port> for the sql-export command.

The configuration keys in the sql-export group can be set simultaneously, for instance by using

 sql-export {
   database bfsync;
   user postgres;
   password secret;
 }


Shared memory is used by bfsync to access the Berkeley DB database contents from different processes: the bfsync FUSE filesystem process, bfsyncfs, and the python frontend, bfsync. Under Linux, the amount of shared memory is usually limited by three system-wide kernel parameters:

shmall

The maximum total amount of shared memory that can be allocated.


shmmax

The maximum size of a shared memory segment.


shmmni

The maximum number of shared memory segments.

These limits need to be large enough to allow bfsync to allocate the required amount of shared memory. The amount of shared memory required mainly depends on the cache size. Bfsync will use somewhat more shared memory than the cache size, but setting the limits too high is usually not a problem. Example: if you’re using three bfsync filesystems with 256 MB cache per filesystem, you can do so if shmall is 2 GB and shmmax is 512 MB. shmmni is usually not an issue, because bfsync doesn’t use many segments (about 4 per filesystem).

To display your current limits, you can use:
server:~$ ipcs -lm

Display the system wide shared memory limits.

To adjust shared memory settings at boot time, create a file called /etc/sysctl.d/90-bfsync-shm.conf:

# Shared memory settings for bfsync

# Maximum size of shared memory segment in bytes
# 512 MB
kernel.shmmax = 536870912

# Maximum total size of shared memory in pages (normally 4096 bytes)
# 2 GB
kernel.shmall = 524288
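The shmall value is given in pages, which is where 524288 comes from; with the usual 4096-byte page size:

```shell
# kernel.shmall is counted in pages; 2 GB in 4096-byte pages:
echo $(( 2 * 1024 * 1024 * 1024 / 4096 ))    # prints 524288
```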

Note that if you have other programs that also need shared memory, you need to coordinate the settings of all programs that use shared memory. It’s also not a problem if your limits are too high, so if the system wide limit for shmall is already 8 GB, there is no need to adjust it.

After creating this file, the settings will be loaded at boot time. To activate the shared memory configuration without rebooting, you can use
server:~$ sysctl -p /etc/sysctl.d/90-bfsync-shm.conf

Load shared memory settings (as root).


MERGES

bfsync allows independent modifications of the data/history contained in different checkouts. Upon push, bfsync will check that the master history doesn’t contain new commits that are unknown to the local checkout. If two clients modify the repository independently, the first client that uses bfsync push will simply reintegrate its changes into the master history, and the new master history will be this client’s history.

However, if the second client tries a bfsync push, the push will be refused. To resolve the situation, the second client can use bfsync pull. Once it is detected that merging both histories is necessary, a merge algorithm will be used. For non-conflicting changes, everything will be merged automatically. Non-conflicting changes could be:
master history has new file F - client 2 has new file G

After merging, both files will be present in the repository

master history has new dir A, with new files in it - client 2 has new dir B, with new files in it

After merging, both directories will be part of the repository

master history has renamed file F to G - client 2 has renamed dir X to Y
After merging, both renames will be done

master history has new file X - client 2 has new file X

In this case, one of the files will be renamed to X~1; since they were both added independently, it is likely that the user wants to keep both files.
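The renaming scheme can be sketched as picking the first free suffix; this is an illustration only, the actual suffix logic is internal to bfsync:

```shell
# In a directory that already contains X, find the first free X~<n>
d=$(mktemp -d); cd "$d"
touch X                      # the name both clients added independently
name=X; n=1
while [ -e "$name~$n" ]; do n=$(( n + 1 )); done
echo "$name~$n"              # prints X~1, the first free candidate
```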

However, there are situations where the merge algorithm can’t merge both histories automatically:
master history has edited file F - client 2 has edited file F

In this case, bfsync pull will ask the user to resolve the situation; it is possible to keep the master version, the local version, or both.

master history has edited file F - client 2 has deleted file F

bfsync pull will ask the user in this case; it is possible to either keep the file with changes, or delete it.

In any case, after the merge decisions are made (if any), the merge algorithm will use them to modify the local history so that it can be executed without conflicts after the master history. After this step, the modified local commits will be based on the master history. This means that then, bfsync push will succeed, and the modified changes of client 2 can be pushed to the master history.

Note that the master history is always linear, so the history branch that was present before the merge algorithm was used will no longer be visible in the history after the pull. The merged history will simply contain the old history (before client 1 and client 2 made their changes), the changes made on client 1, an extra merge commit (if necessary to resolve merge issues), and the modified changes of client 2.


First, we create and setup repositories on three computers: server, client1 and client2. The server will hold the master repository (which manages the history, but nothing else). It is stored under ~/repos/big.bfsync. All computers will contain a checkout, so that the actual contents of the files can be kept there.
server:~$ mkdir repos

Create a directory on the server for the master repository.

server:~$ cd repos

Change dir.

server:~/repos$ bfsync init big.bfsync

Init master repo.

server:~/repos$ cd ~

Change dir.

server:~$ bfsync clone repos/big.bfsync

Clone repository on the server.

server:~$ mkdir big

Create mount point on the server.

server:~$ bfsyncfs big.bfsync big

Mount repository on the server.

client1:~$ bfsync clone server:repos/big.bfsync

Clone repository on client1.

client1:~$ mkdir big

Create mount point on client1.

client1:~$ bfsyncfs big.bfsync big

Mount repository on client1.

client2:~$ bfsync clone server:repos/big.bfsync

Clone repository on client2.

client2:~$ mkdir big

Create mount point on client2.

client2:~$ bfsyncfs big.bfsync big

Mount repository on client2.

As a second step, we add a music file on client1. Of course it’s possible to add more files in one step; you can also use rsync, mc or a file manager to copy files into the repository. Whenever files are added or otherwise changed, we need to commit and push the changes to the server, so that it contains the canonical index of files.
client1:~$ cd big

Change dir.

client1:~/big$ cp ~/download/01-some-music.flac .

Copy a big file into the repository checkout.

client1:~/big$ bfsync commit

Commit the changes to the repository.

client1:~/big$ bfsync push

Push the changes to the server.

So far, we have added the file to the repository on client1, but the contents of the file are only present on client1, and not in the other repos. To change this, we can transfer the file to the server.
server:~$ cd big

Change directory.

server:~/big$ bfsync pull

Using pull is required on the server before we can transfer the file there. By pulling, the server will have the necessary information, or in other words: the server can know that a file named 01-some-music.flac is part of the bfsync repository and should be there. Running bfsync check will report one missing file after this step.

client1:~/big$ bfsync put server:big

Now the actual transfer: after this step, both client1 and server will have a copy of 01-some-music.flac.

As the last step, we’ll transfer the file to client2. Of course we could use the same commands that we used to get the file to the server, but let’s assume that client2 is behind a firewall, and that it’s not possible to ssh to client2 directly. Fortunately, besides uploading files to another host (bfsync put), it’s also possible to download data from another host (bfsync get).
client2:~$ cd big

Change directory

client2:~/big$ bfsync pull

Update directory information.

client2:~/big$ bfsync get server:big

Get the file from the server.


BACKUP WALKTHROUGH

Since bfsync implements file-level deduplication and versioning of files, it can be used to do backups. Backups typically contain lots of files (like 5,000,000 files). Therefore you can only use a subset of the available commands for backups, since some commands do not work well if the number of files is that large. Currently, only commit and gc have been optimized for backup usage. It is likely that get, put, check and others will be supported for backups in the future. However, advanced functions like merges might never be supported for backups - for typical backup scenarios this is not an issue.

The first step for backups is to set up repositories. All steps should be done as root. For this example, we assume that our backup harddisk is mounted to /backup.
server:/backup$ bfsync init master

Setup master repository

server:/backup$ bfsync clone -u -c 500 master repo

Clone repository, ensure uid/gid are stored and set cache size.

The cache size is important for backups: if it is too small, the backup will take a lot more time. However, since the cache is stored in shared memory, an overly large cache may use too much of the system memory. As a rule of thumb, 100 megabytes of cache should be used for every 1,000,000 files that are stored in the backup. More is better, if you can afford it.
server:/backup$ mkdir mnt

Create mount point for the backup repository.

server:/backup$ bfsyncfs repo mnt

Mount repository.

server:/backup$ cd /backup/mnt

Change dir.

Now that everything is initialized, we can backup some data. For this example we backup /home.
server:/backup/mnt$ rsync -axH --delete /home/ home

Copy everything from /home to the backup. This is the initial backup, so all files will be copied to the backup harddisk.

The rsync options we use here are -a to copy all file attributes, -x to exclude everything that is not on the filesystem that /home is on and -H to backup hardlinks as hardlinks. Using --delete deletes files in the target directory that are not in the source directory.
server:/backup/mnt$ bfsync commit -m "initial backup"

Snapshot current state, run deduplication.

server:/backup/mnt$ bfsync push

Push changes into the master repository. This is a precaution for the case that your repository gets damaged due to disk failure. Having the metadata stored twice can be used to recover your repository in that case (by cloning again from the master using bfsync clone and reassembling the data files using bfsync collect).

We have the initial full backup. Now, one day later, we only need to backup the changes (which will be a lot faster than the initial backup), like this:
server:/backup/mnt$ rsync -axH --delete /home/ home

Copy changes from /home to the backup.

server:/backup/mnt$ bfsync commit -m "first incremental backup"

Snapshot current state, run deduplication.

server:/backup/mnt$ bfsync push

Push changes into the master repository.

Now we’ve created the first incremental backup. This usually uses a lot less additional disk space than the initial full backup, since usually only a few files will have changed. To access an individual backup, you can use
server:/backup/mnt$ cd /backup/mnt/.bfsync/commits/

Access a specific version. The version log can be viewed with bfsync log.

To automate the process, a script which runs the rsync and commit steps every night can be used. Removing the contents of old backups is currently not supported, but will be available in the future.
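Such a nightly script can be wired up with cron; the schedule, mount point and commit message below are assumptions to adapt to your setup:

```
# /etc/cron.d/bfsync-backup - nightly rsync + commit + push at 02:30 (sketch)
30 2 * * * root rsync -axH --delete /home/ /backup/mnt/home && cd /backup/mnt && bfsync commit -m "nightly backup" && bfsync push
```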

The commandline for creating a backup of the root filesystem is:
server:/backup/mnt$ rsync -axH --delete / root

Copy changes from / to the backup.

If you backup more than one filesystem every day, you only need to commit once: first rsync all filesystems, then commit as the last step.


UPDATING FROM AN OLD VERSION

The repository format is not (yet) stable across bfsync versions. In some cases upgrades can be done automatically by running
server:/path/to/files.bfsync$ bfsync upgrade

Automatically upgrade repository to current bfsync version.

This is the easiest way. If it doesn’t work, you need to manually convert your old repositories to the new version. There is currently no easy way to preserve the history when manually updating from an old version of bfsync. But if you only need to preserve the repository content, you can use the following steps:
install the old version and the new version of bfsync in parallel on one machine

Use different prefixes to make this happen (configure --prefix=...).

create a new empty master repository and checkout

This will become your new repository & checkout.

copy all files from the old repository to the new repository

You’ll need to mount both the old and the new bfsync repository using bfsyncfs. Copying can be done with a file manager, cp -a or rsync. You need to copy everything except for the .bfsync directory.

commit and push in the new repository

You have a new repository now, conversion on this machine is finished.

To avoid retransfer if you have the data on other machines, the following steps can be used:
checkout the new master repository on a satellite system

Now you have a new bfsync repository, but the data is still missing.

use bfsync collect to get your data into the new repository without retransfer
Since you already have a bfsync checkout on the satellite system, you can simply get the data from there, without retransfer. Because bfsync collect automatically detects, from the file contents, whether it needs a file or not, you can simply run bfsync collect /path/to/files.bfsync to get the data from your old checkout into the new repository.

Repeat these steps on all machines that contain checkouts of your repository. You can delete the old format checkouts after verifying with bfsync check that all files you need are there. You do not need to install old and new bfsync versions on the satellite systems. Only the new bfsync is required to perform the checkout & collect steps.


SEE ALSO

bfsyncfs(1)
