From 09b0ee316c034848c3b50966e7b5e3ed720aef2b Mon Sep 17 00:00:00 2001
From: Casey Rodarmor
Date: Sat, 18 Apr 2020 23:06:21 -0700
Subject: [PATCH] Document piece length selection algorithm

Add a page to the book discussing factors in piece length selection, and
Intermodal's piece length selection algorithm.

type: documentation
pr: https://github.com/casey/intermodal/pull/392
fixes:
- https://github.com/casey/intermodal/issues/367
---
 CHANGELOG.md                                  |   3 +-
 bin/gen/templates/SUMMARY.md                  |   3 +-
 book/src/SUMMARY.md                           |   3 +-
 book/src/bittorrent/piece-length-selection.md | 129 ++++++++++++++++++
 book/src/bittorrent/piece-length.md           | 129 ++++++++++++++++++
 src/piece_length_picker.rs                    |  20 +--
 6 files changed, 266 insertions(+), 21 deletions(-)
 create mode 100644 book/src/bittorrent/piece-length-selection.md
 create mode 100644 book/src/bittorrent/piece-length.md

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a0b75de..ab2825f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,7 +4,8 @@ Changelog
 
 UNRELEASED - 2020-04-19
 -----------------------
-- :books: [`xxxxxxxxxxxx`](https://github.com/casey/intermodal/commits/master) Generate reference sections with `bin/gen` - _Casey Rodarmor _
+- :books: [`xxxxxxxxxxxx`](https://github.com/casey/intermodal/commits/master) Document piece length selection algorithm ([#392](https://github.com/casey/intermodal/pull/392)) - Fixes [#367](https://github.com/casey/intermodal/issues/367) - _Casey Rodarmor _
+- :books: [`3ed449ce9325`](https://github.com/casey/intermodal/commit/3ed449ce932509ac88bd4837d74c9cbbb0729da9) Generate reference sections with `bin/gen` - _Casey Rodarmor _
 - :art: [`a6bf75279181`](https://github.com/casey/intermodal/commit/a6bf7527918178821e080db10e65b057f427200d) Use `invariant` instead of `unwrap` and `expect` - Fixes [#167](https://github.com/casey/intermodal/issues/167) - _Casey Rodarmor _
 - :white_check_mark: [`faf46c0f0e6f`](https://github.com/casey/intermodal/commit/faf46c0f0e6fd4e4f8b504d414a3bf02d7d68e4a) Test that globs match torrent contents - Fixes [#377](https://github.com/casey/intermodal/issues/377) - _Casey Rodarmor _
 - :books: [`0a754d0bcfcf`](https://github.com/casey/intermodal/commit/0a754d0bcfcfd65127d7b6e78d41852df78d3ea2) Add manual Arch install link - Fixes [#373](https://github.com/casey/intermodal/issues/373) - _Casey Rodarmor _
diff --git a/bin/gen/templates/SUMMARY.md b/bin/gen/templates/SUMMARY.md
index f122fd9..3f8e667 100644
--- a/bin/gen/templates/SUMMARY.md
+++ b/bin/gen/templates/SUMMARY.md
@@ -6,9 +6,10 @@ Summary
 {{commands}}
 
 - [Bittorrent](./bittorrent.md)
-  - [Distributing Large Data Sets](./bittorrent/distributing-large-data-sets.md)
+  - [Piece Length Selection](./bittorrent/piece-length-selection.md)
   - [BEP Support](./bittorrent/bep-support.md)
   - [Metainfo Utilities](./bittorrent/metainfo-utilities.md)
+  - [Distributing Large Data Sets](./bittorrent/distributing-large-data-sets.md)
   - [UDP Tracker Protocol](./bittorrent/udp-tracker-protocol.md)
 
 {{references}}
diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md
index 2759636..36bc507 100644
--- a/book/src/SUMMARY.md
+++ b/book/src/SUMMARY.md
@@ -15,9 +15,10 @@ Summary
   - [`imdl torrent verify`](./commands/imdl-torrent-verify.md)
 
 - [Bittorrent](./bittorrent.md)
-  - [Distributing Large Data Sets](./bittorrent/distributing-large-data-sets.md)
+  - [Piece Length Selection](./bittorrent/piece-length-selection.md)
   - [BEP Support](./bittorrent/bep-support.md)
   - [Metainfo Utilities](./bittorrent/metainfo-utilities.md)
+  - [Distributing Large Data Sets](./bittorrent/distributing-large-data-sets.md)
   - [UDP Tracker Protocol](./bittorrent/udp-tracker-protocol.md)
 
 - [References](./references.md)
diff --git a/book/src/bittorrent/piece-length-selection.md b/book/src/bittorrent/piece-length-selection.md
new file mode 100644
index 0000000..e69511c
--- /dev/null
+++ b/book/src/bittorrent/piece-length-selection.md
@@ -0,0 +1,129 @@
+BitTorrent Piece Length Selection
+=================================
+
+BitTorrent `.torrent` files contain so-called metainfo that allows BitTorrent
+peers to locate, download, and verify the contents of a torrent.
+
+This metainfo includes the piece list, a list of SHA-1 hashes of fixed-size
+pieces of the torrent data. The size of these pieces is chosen by the torrent
+creator.
+
+Intermodal has a simple algorithm that attempts to pick a reasonable piece
+length for a torrent given the size of the contents.
+
+For compatibility with the
+[BitTorrent v2 specification](http://bittorrent.org/beps/bep_0052.html), the
+algorithm chooses piece lengths that are powers of two, and that are at least
+16 KiB.
+
+The maximum automatically chosen piece length is 16 MiB, as piece lengths larger
+than 16 MiB have been reported to cause issues for some clients.
+
+In addition to the above constraints, there are a number of additional factors
+to consider.
+
+
+Factors favoring smaller piece length
+-------------------------------------
+
+- To avoid uploading bad data, peers only upload data from full pieces, which
+  can be verified by hash. Decreasing the piece size allows peers to more
+  quickly obtain a full piece, which decreases the time before they begin
+  uploading, and receiving data in return.
+
+- Decreasing the piece size decreases the amount of data that must be thrown
+  away in case of corruption.
+
+
+Factors favoring larger piece length
+------------------------------------
+
+- Increasing the piece size decreases the protocol overhead from requesting
+  many pieces.
+
+- Increasing the piece size decreases the number of pieces, decreasing the
+  size of the metainfo.
+
+- Increasing piece length decreases the proportion of disk seeks to disk
+  reads, which can be beneficial for spinning disks.
+
+
+Intermodal's Algorithm
+----------------------
+
+In Python, the algorithm used by intermodal is:
+
+```python
+import math
+
+MIN = 16 * 1024  # 16 KiB
+MAX = 16 * 1024 * 1024  # 16 MiB
+
+def piece_length(content_length):
+    exponent = math.log2(content_length)
+    length = 1 << int(exponent / 2 + 4)  # 16 * sqrt(content_length), rounded down to a power of two
+    return min(max(length, MIN), MAX)  # clamp to [MIN, MAX]
+```
+
+Which gives the following piece lengths:
+
+```
+Content -> Piece Length x Count    = Piece List Size
+16 KiB  -> 16 KiB       x 1        = 20 bytes
+32 KiB  -> 16 KiB       x 2        = 40 bytes
+64 KiB  -> 16 KiB       x 4        = 80 bytes
+128 KiB -> 16 KiB       x 8        = 160 bytes
+256 KiB -> 16 KiB       x 16       = 320 bytes
+512 KiB -> 16 KiB       x 32       = 640 bytes
+1 MiB   -> 16 KiB       x 64       = 1.25 KiB
+2 MiB   -> 16 KiB       x 128      = 2.5 KiB
+4 MiB   -> 32 KiB       x 128      = 2.5 KiB
+8 MiB   -> 32 KiB       x 256      = 5 KiB
+16 MiB  -> 64 KiB       x 256      = 5 KiB
+32 MiB  -> 64 KiB       x 512      = 10 KiB
+64 MiB  -> 128 KiB      x 512      = 10 KiB
+128 MiB -> 128 KiB      x 1024     = 20 KiB
+256 MiB -> 256 KiB      x 1024     = 20 KiB
+512 MiB -> 256 KiB      x 2048     = 40 KiB
+1 GiB   -> 512 KiB      x 2048     = 40 KiB
+2 GiB   -> 512 KiB      x 4096     = 80 KiB
+4 GiB   -> 1 MiB        x 4096     = 80 KiB
+8 GiB   -> 1 MiB        x 8192     = 160 KiB
+16 GiB  -> 2 MiB        x 8192     = 160 KiB
+32 GiB  -> 2 MiB        x 16384    = 320 KiB
+64 GiB  -> 4 MiB        x 16384    = 320 KiB
+128 GiB -> 4 MiB        x 32768    = 640 KiB
+256 GiB -> 8 MiB        x 32768    = 640 KiB
+512 GiB -> 8 MiB        x 65536    = 1.25 MiB
+1 TiB   -> 16 MiB       x 65536    = 1.25 MiB
+2 TiB   -> 16 MiB       x 131072   = 2.5 MiB
+4 TiB   -> 16 MiB       x 262144   = 5 MiB
+8 TiB   -> 16 MiB       x 524288   = 10 MiB
+16 TiB  -> 16 MiB       x 1048576  = 20 MiB
+32 TiB  -> 16 MiB       x 2097152  = 40 MiB
+64 TiB  -> 16 MiB       x 4194304  = 80 MiB
+128 TiB -> 16 MiB       x 8388608  = 160 MiB
+256 TiB -> 16 MiB       x 16777216 = 320 MiB
+512 TiB -> 16 MiB       x 33554432 = 640 MiB
+1 PiB   -> 16 MiB       x 67108864 = 1.25 GiB
+```
+
+
+References
+----------
+
+### Articles
+
+- [Vuze Wiki](https://wiki.vuze.com/w/Torrent_Piece_Size)
+
+- [TorrentFreak](https://torrentfreak.com/how-to-make-the-best-torrents-081121/)
+
+### Implementations
+
+- [libtorrent](https://github.com/arvidn/libtorrent/blob/a3440e54bb7f65ac6100c3d993c53f887025d660/src/create_torrent.cpp#L367)
+
+- [libtransmission](https://github.com/transmission/transmission/blob/a482100f0cbae8050fd7e954af2cb1311205916e/libtransmission/makemeta.c#L89)
+
+- [dottorrent](https://github.com/kz26/dottorrent/blob/fea5714efe0cde2a55eabfb387295781a78d84bb/dottorrent/__init__.py#L154)
+
+- [Torrent File Editor](https://github.com/torrent-file-editor/torrent-file-editor/blob/811e401b38f26b6d94c4808c54ae2dcc7bbc27dd/mainwindow.cpp#L1210)
diff --git a/book/src/bittorrent/piece-length.md b/book/src/bittorrent/piece-length.md
new file mode 100644
index 0000000..620cdac
--- /dev/null
+++ b/book/src/bittorrent/piece-length.md
@@ -0,0 +1,129 @@
+Piece Length Selection
+======================
+
+BitTorrent `.torrent` files contain so-called metainfo that allows BitTorrent
+peers to locate, download, and verify the contents of a torrent.
+
+This metainfo includes the piece list, a list of SHA-1 hashes of fixed-size
+pieces of the torrent data. The size of these pieces is chosen by the torrent
+creator.
+
+Intermodal has a simple algorithm that attempts to pick a reasonable piece
+length for a torrent given the size of the contents.
+
+For compatibility with the
+[BitTorrent v2 specification](http://bittorrent.org/beps/bep_0052.html), the
+algorithm chooses piece lengths that are powers of two, and that are at least
+16 KiB.
+
+The maximum automatically chosen piece length is 16 MiB, as piece lengths larger
+than 16 MiB have been reported to cause issues for some clients.
+
+In addition to the above constraints, there are a number of additional factors
+to consider.
+
+
+Factors favoring smaller piece length
+-------------------------------------
+
+- To avoid uploading bad data, peers only upload data from full pieces, which
+  can be verified by hash. Decreasing the piece size allows peers to more
+  quickly obtain a full piece, which decreases the time before they begin
+  uploading, and receiving data in return.
+
+- Decreasing the piece size decreases the amount of data that must be thrown
+  away in case of corruption.
+
+
+Factors favoring larger piece length
+------------------------------------
+
+- Increasing the piece size decreases the protocol overhead from requesting
+  many pieces.
+
+- Increasing the piece size decreases the number of pieces, decreasing the
+  size of torrent metainfo.
+
+- Increasing piece length decreases the proportion of disk seeks to disk
+  reads, which can be beneficial for spinning disks.
+
+
+Intermodal's Algorithm
+----------------------
+
+In Python, the algorithm used by intermodal is:
+
+```python
+import math
+
+MIN = 16 * 1024  # 16 KiB
+MAX = 16 * 1024 * 1024  # 16 MiB
+
+def piece_length(content_length):
+    exponent = math.log2(content_length)
+    length = 1 << int(exponent / 2 + 4)  # 16 * sqrt(content_length), rounded down to a power of two
+    return min(max(length, MIN), MAX)  # clamp to [MIN, MAX]
+```
+
+Which gives the following piece lengths:
+
+```
+Content -> Piece Length x Count    = Piece List Size
+16 KiB  -> 16 KiB       x 1        = 20 bytes
+32 KiB  -> 16 KiB       x 2        = 40 bytes
+64 KiB  -> 16 KiB       x 4        = 80 bytes
+128 KiB -> 16 KiB       x 8        = 160 bytes
+256 KiB -> 16 KiB       x 16       = 320 bytes
+512 KiB -> 16 KiB       x 32       = 640 bytes
+1 MiB   -> 16 KiB       x 64       = 1.25 KiB
+2 MiB   -> 16 KiB       x 128      = 2.5 KiB
+4 MiB   -> 32 KiB       x 128      = 2.5 KiB
+8 MiB   -> 32 KiB       x 256      = 5 KiB
+16 MiB  -> 64 KiB       x 256      = 5 KiB
+32 MiB  -> 64 KiB       x 512      = 10 KiB
+64 MiB  -> 128 KiB      x 512      = 10 KiB
+128 MiB -> 128 KiB      x 1024     = 20 KiB
+256 MiB -> 256 KiB      x 1024     = 20 KiB
+512 MiB -> 256 KiB      x 2048     = 40 KiB
+1 GiB   -> 512 KiB      x 2048     = 40 KiB
+2 GiB   -> 512 KiB      x 4096     = 80 KiB
+4 GiB   -> 1 MiB        x 4096     = 80 KiB
+8 GiB   -> 1 MiB        x 8192     = 160 KiB
+16 GiB  -> 2 MiB        x 8192     = 160 KiB
+32 GiB  -> 2 MiB        x 16384    = 320 KiB
+64 GiB  -> 4 MiB        x 16384    = 320 KiB
+128 GiB -> 4 MiB        x 32768    = 640 KiB
+256 GiB -> 8 MiB        x 32768    = 640 KiB
+512 GiB -> 8 MiB        x 65536    = 1.25 MiB
+1 TiB   -> 16 MiB       x 65536    = 1.25 MiB
+2 TiB   -> 16 MiB       x 131072   = 2.5 MiB
+4 TiB   -> 16 MiB       x 262144   = 5 MiB
+8 TiB   -> 16 MiB       x 524288   = 10 MiB
+16 TiB  -> 16 MiB       x 1048576  = 20 MiB
+32 TiB  -> 16 MiB       x 2097152  = 40 MiB
+64 TiB  -> 16 MiB       x 4194304  = 80 MiB
+128 TiB -> 16 MiB       x 8388608  = 160 MiB
+256 TiB -> 16 MiB       x 16777216 = 320 MiB
+512 TiB -> 16 MiB       x 33554432 = 640 MiB
+1 PiB   -> 16 MiB       x 67108864 = 1.25 GiB
+```
+
+
+References
+----------
+
+### Articles
+
+- [Vuze Wiki](https://wiki.vuze.com/w/Torrent_Piece_Size)
+
+- [TorrentFreak](https://torrentfreak.com/how-to-make-the-best-torrents-081121/)
+
+### Implementations
+
+- [libtorrent](https://github.com/arvidn/libtorrent/blob/a3440e54bb7f65ac6100c3d993c53f887025d660/src/create_torrent.cpp#L367)
+
+- [libtransmission](https://github.com/transmission/transmission/blob/a482100f0cbae8050fd7e954af2cb1311205916e/libtransmission/makemeta.c#L89)
+
+- [dottorrent](https://github.com/kz26/dottorrent/blob/fea5714efe0cde2a55eabfb387295781a78d84bb/dottorrent/__init__.py#L154)
+
+- [Torrent File Editor](https://github.com/torrent-file-editor/torrent-file-editor/blob/811e401b38f26b6d94c4808c54ae2dcc7bbc27dd/mainwindow.cpp#L1210)
diff --git a/src/piece_length_picker.rs b/src/piece_length_picker.rs
index 1b4ce76..c26dd74 100644
--- a/src/piece_length_picker.rs
+++ b/src/piece_length_picker.rs
@@ -1,21 +1,5 @@
-// The piece length picker attempts to pick a reasonable piece length
-// for a torrent given the size of the torrent's contents.
-//
-// Constraints:
-// - Decreasing piece length increases protocol overhead.
-// - Decreasing piece length increases torrent metainfo size.
-// - Increasing piece length increases the amount of data that must be thrown
-//   away in case of corruption.
-// - Increasing piece length increases the amount of data that must be
-//   downloaded before it can be verified and uploaded to other peers.
-// - Decreasing piece length increases the proportion of disk seeks to disk
-//   reads. This can be an issue for spinning disks.
-// - The BitTorrent v2 specification requires that piece sizes be larger than 16
-//   KiB.
-//
-// These constraints could probably be exactly defined and optimized
-// using an integer programming solver, but instead we just copy what
-// libtorrent does.
+//! See [the book](https://imdl.io/book/bittorrent/piece-length.html) for more
+//! information on Intermodal's automatic piece length selection algorithm.
 
 use crate::common::*;
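
The piece length table added in `piece-length.md` can be spot-checked against the Python snippet from the same page. The following is a minimal sketch and not part of the patch: `piece_list_size` is a hypothetical helper introduced here for illustration, and it assumes the piece list holds one 20-byte SHA-1 hash per piece.

```python
import math

MIN = 16 * 1024          # 16 KiB lower bound, as in the documented snippet
MAX = 16 * 1024 * 1024   # 16 MiB upper bound, as in the documented snippet


def piece_length(content_length):
    # Same computation as the snippet documented in the book page.
    exponent = math.log2(content_length)
    length = 1 << int(exponent / 2 + 4)
    return min(max(length, MIN), MAX)


def piece_list_size(content_length):
    # Hypothetical helper: bytes of SHA-1 hashes needed for the piece list,
    # assuming 20 bytes per piece.
    length = piece_length(content_length)
    count = -(-content_length // length)  # ceiling division
    return count * 20


if __name__ == "__main__":
    # Print a few rows for comparison with the table (sizes in bytes).
    for exp in (14, 22, 30, 40, 50):
        size = 1 << exp
        print(size, piece_length(size), piece_list_size(size))
```

Only powers of two are checked here, matching the rows of the table; content lengths between those rows fall back to the same floor-and-clamp behaviour.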