BTRFS file deduplication with bedup

Bedup is a tool that can scan and deduplicate an existing btrfs filesystem.

In other words, it can find identical files on the filesystem and tell BTRFS they are the same, so you only store one copy of the data rather than N copies. It's kind of similar to manually using hard links to de-duplicate files, except that the files stay independent: because BTRFS is copy-on-write, changing one of them later doesn't change the others.
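To make "find identical files" a bit more concrete, here is a rough sketch that lists groups of byte-identical files by hashing them. This is only an illustration and not how bedup itself is implemented (the output further down suggests it groups files by size, then samples and hashes them). The helper name list_duplicates is made up for this example, and it assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch only: list groups of files under a directory whose contents are
# byte-identical, i.e. the kind of candidates a deduplicator looks for.
# list_duplicates is a hypothetical helper name; requires GNU coreutils.
list_duplicates() {
    # sha256sum prints "<64 hex chars>  <path>". Sorting puts equal hashes
    # next to each other, and uniq -w64 compares only the hash column,
    # printing every line that belongs to a repeated hash.
    find "$1" -type f -exec sha256sum {} + \
        | sort \
        | uniq -w64 --all-repeated=separate
}
```

Something like `list_duplicates /test/wordpress` would then print each group of identical files, with groups separated by a blank line. It only reports duplicates; it doesn't change anything on disk.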

Anyway, installation of bedup is straightforward enough (see its README.md file; unfortunately it’s not packaged for Debian).

The only catches I found were that by default it doesn’t try to de-duplicate files smaller than 8 MB, and that if I’d only recently copied or created files on disk I needed to either call ‘sync’ first or use the --flush argument.
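If you want a feel for how many files fall under that threshold, a quick sketch like the one below lists them. The helper name below_cutoff is made up, and the default of 8388608 bytes is my assumption that the 8 MB cutoff means 8 × 1024 × 1024 bytes; pass a different number as the second argument if that's wrong for your version.

```shell
#!/bin/sh
# Sketch only: list regular files smaller than a size cutoff given in bytes.
# below_cutoff is a hypothetical helper name. The default of 8388608 bytes
# (8 * 1024 * 1024) is an assumption about what the 8 MB default means.
below_cutoff() {
    # find's "c" suffix counts in bytes, and a leading "-" means
    # "strictly less than".
    find "$1" -type f -size "-${2:-8388608}c"
}
```

For example, `below_cutoff /test | wc -l` gives a rough count of files that would be skipped unless the cutoff is lowered.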

An example of usage:

root@box:/test/wordpress# cp wordpress-4.5.2.tar.gz wordpress-4.5.2.tar.gz.copy
root@box:/test/wordpress# ls -l
....
-rw-r--r-- 1 root root 7770470 May 22 15:12 wordpress-4.5.2.tar.gz
-rw-r--r-- 1 root root 7770470 May 22 15:12 wordpress-4.5.2.tar.gz.copy

root@box:/test/wordpress# /usr/local/bin/bedup dedup /test/ --size-cutoff=1024 --flush
...
Scanning volume /test/ generations from 473678 to 473681, with size cutoff 1024
00.01 Scanned 81 retained 2
Deduplicating filesystem 
Deduplicated:
- '/test/wordpress/wordpress-4.5.2.tar.gz'
- '/test/wordpress/wordpress-4.5.2.tar.gz.copy'
00.19 Size group 1/1 (7770470) sampled 2 hashed 2 freed 7770470
00.02 Committing tracking state

Interesting bits:

  • “freed” is in bytes, so we’ve “saved” about 7.4 MiB (roughly 7.8 MB) here.
  • “--size-cutoff=1024” means we’re only going to de-duplicate files larger than 1 KiB (1024 bytes).
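For the arithmetic on the “freed” figure, here is a small sketch that converts a byte count into MiB (1 MiB = 1048576 bytes). The function name bytes_to_mib is just something I made up for the example.

```shell
#!/bin/sh
# Sketch only: convert a byte count to MiB with two decimal places.
# bytes_to_mib is a hypothetical helper name.
bytes_to_mib() {
    # LC_ALL=C keeps the decimal separator as "." regardless of locale.
    LC_ALL=C awk -v bytes="$1" 'BEGIN { printf "%.2f\n", bytes / 1048576 }'
}
```

Running `bytes_to_mib 7770470` prints 7.41, which is where the "about 7.4 MiB" comes from.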

2 Replies to “BTRFS file deduplication with bedup”

  1. Does it also dedup individual blocks?
    That would be really nice for VM images.

  2. bedup only works at the file level, as far as I understand it.

    A BTRFS snapshot would allow for more sharing (parts within a file), but I don’t know what chunk size is used.
