Document piece length selection algorithm

Add a page to the book discussing factors in piece length selection, and
Intermodal's piece length selection algorithm.

type: documentation
pr: https://github.com/casey/intermodal/pull/392
fixes:
- https://github.com/casey/intermodal/issues/367
Casey Rodarmor 2020-04-18 23:06:21 -07:00
parent 3ed449ce93
commit 09b0ee316c
GPG Key ID: 556186B153EC6FE0
6 changed files with 262 additions and 21 deletions

@ -4,7 +4,8 @@ Changelog
 UNRELEASED - 2020-04-19
 -----------------------
-- :books: [`xxxxxxxxxxxx`](https://github.com/casey/intermodal/commits/master) Generate reference sections with `bin/gen` - _Casey Rodarmor <casey@rodarmor.com>_
+- :books: [`xxxxxxxxxxxx`](https://github.com/casey/intermodal/commits/master) Document piece length selection algorithm ([#392](https://github.com/casey/intermodal/pull/392)) - Fixes [#367](https://github.com/casey/intermodal/issues/367) - _Casey Rodarmor <casey@rodarmor.com>_
+- :books: [`3ed449ce9325`](https://github.com/casey/intermodal/commit/3ed449ce932509ac88bd4837d74c9cbbb0729da9) Generate reference sections with `bin/gen` - _Casey Rodarmor <casey@rodarmor.com>_
 - :art: [`a6bf75279181`](https://github.com/casey/intermodal/commit/a6bf7527918178821e080db10e65b057f427200d) Use `invariant` instead of `unwrap` and `expect` - Fixes [#167](https://github.com/casey/intermodal/issues/167) - _Casey Rodarmor <casey@rodarmor.com>_
 - :white_check_mark: [`faf46c0f0e6f`](https://github.com/casey/intermodal/commit/faf46c0f0e6fd4e4f8b504d414a3bf02d7d68e4a) Test that globs match torrent contents - Fixes [#377](https://github.com/casey/intermodal/issues/377) - _Casey Rodarmor <casey@rodarmor.com>_
 - :books: [`0a754d0bcfcf`](https://github.com/casey/intermodal/commit/0a754d0bcfcfd65127d7b6e78d41852df78d3ea2) Add manual Arch install link - Fixes [#373](https://github.com/casey/intermodal/issues/373) - _Casey Rodarmor <casey@rodarmor.com>_

@ -6,9 +6,10 @@ Summary
 {{commands}}
 - [Bittorrent](./bittorrent.md)
-- [Distributing Large Data Sets](./bittorrent/distributing-large-data-sets.md)
+- [Piece Length Selection](./bittorrent/piece-length-selection.md)
 - [BEP Support](./bittorrent/bep-support.md)
 - [Metainfo Utilities](./bittorrent/metainfo-utilities.md)
+- [Distributing Large Data Sets](./bittorrent/distributing-large-data-sets.md)
 - [UDP Tracker Protocol](./bittorrent/udp-tracker-protocol.md)
 {{references}}

@ -15,9 +15,10 @@ Summary
 - [`imdl torrent verify`](./commands/imdl-torrent-verify.md)
 - [Bittorrent](./bittorrent.md)
-- [Distributing Large Data Sets](./bittorrent/distributing-large-data-sets.md)
+- [Piece Length Selection](./bittorrent/piece-length-selection.md)
 - [BEP Support](./bittorrent/bep-support.md)
 - [Metainfo Utilities](./bittorrent/metainfo-utilities.md)
+- [Distributing Large Data Sets](./bittorrent/distributing-large-data-sets.md)
 - [UDP Tracker Protocol](./bittorrent/udp-tracker-protocol.md)
 - [References](./references.md)

@ -0,0 +1,127 @@
BitTorrent Piece Length Selection
=================================
BitTorrent `.torrent` files contain so-called metainfo that allows BitTorrent
peers to locate, download, and verify the contents of a torrent.
This metainfo includes the piece list, a list of SHA-1 hashes of fixed-size
pieces of the torrent data. The size of these pieces is chosen by the torrent
creator.
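
The piece list described above can be sketched in a few lines of Python. This
is only an illustration, not Intermodal's implementation: the helper name
`piece_hashes` and the 40 KiB zero-filled buffer are assumptions made for the
example.

```python
import hashlib

def piece_hashes(data, piece_length):
    """Split data into fixed-size pieces and return the SHA-1 digest of each.

    Hypothetical helper for illustration; the final piece may be shorter
    than piece_length.
    """
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]

# 40 KiB of zeros stands in for real torrent content.
data = bytes(40 * 1024)
hashes = piece_hashes(data, 16 * 1024)

# The piece list is the concatenation of the 20-byte digests.
piece_list = b"".join(hashes)
print(len(hashes), len(piece_list))
```

With 16 KiB pieces the 40 KiB buffer splits into two full pieces and one 8 KiB
remainder, so the piece list holds three 20-byte hashes.
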
Intermodal has a simple algorithm that attempts to pick a reasonable piece
length for a torrent given the size of the contents.
For compatibility with the
[BitTorrent v2 specification](http://bittorrent.org/beps/bep_0052.html), the
algorithm chooses piece lengths that are powers of two, and that are at least
16 KiB.
The maximum automatically chosen piece length is 16 MiB, as piece lengths larger
than 16 MiB have been reported to cause issues for some clients.
Beyond these constraints, several other factors influence the choice of piece
length.
Factors favoring smaller piece length
-------------------------------------
- To avoid uploading bad data, peers only upload data from full pieces, which
can be verified by hash. Decreasing the piece size allows peers to more
quickly obtain a full piece, which decreases the time before they begin
uploading, and receiving data in return.
- Decreasing the piece size decreases the amount of data that must be thrown
away in case of corruption.
Factors favoring larger piece length
------------------------------------
- Increasing the piece size decreases the protocol overhead from requesting
many pieces.
- Increasing the piece size decreases the number of pieces, decreasing the
size of the metainfo.
- Increasing piece length decreases the proportion of disk seeks to disk
reads, which can be beneficial for spinning disks.
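
To make the trade-off concrete, the sketch below compares the two extreme
piece lengths for 1 GiB of content. The function `tradeoff` and its field names
are hypothetical and exist only for this example; the 20-byte figure is the
size of one SHA-1 hash.

```python
HASH_SIZE = 20  # bytes in one SHA-1 digest

def tradeoff(content_length, piece_length):
    """Hypothetical helper: summarize the cost of one piece length choice."""
    count = -(-content_length // piece_length)  # ceiling division
    return {
        "pieces": count,
        # Size of the piece list stored in the metainfo.
        "piece_list_bytes": count * HASH_SIZE,
        # Upper bound on the data discarded when one piece fails its hash check.
        "max_discard_bytes": piece_length,
    }

GIB = 1024 ** 3
small = tradeoff(GIB, 16 * 1024)         # 16 KiB pieces
large = tradeoff(GIB, 16 * 1024 * 1024)  # 16 MiB pieces
print(small)
print(large)
```

With 16 KiB pieces, the piece list for 1 GiB of content takes 1.25 MiB but a
corrupt piece costs at most 16 KiB. With 16 MiB pieces, the piece list shrinks
to 1.25 KiB but a corrupt piece costs up to 16 MiB.
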
Intermodal's Algorithm
----------------------
In Python, the algorithm used by Intermodal is:
```python
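import math

MIN = 16 * 1024         # smallest piece length: 16 KiB
MAX = 16 * 1024 * 1024  # largest automatically chosen piece length: 16 MiB

def piece_length(content_length):
    # content_length is the total content size in bytes and must be positive.
    exponent = math.log2(content_length)
    # Grow the piece length with the square root of the content length.
    length = 1 << int(exponent / 2 + 4)
    return min(max(length, MIN), MAX)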
```
This gives the following piece lengths:
```
Content -> Piece Length x Count = Piece List Size
16 KiB -> 16 KiB x 1 = 20 bytes
32 KiB -> 16 KiB x 2 = 40 bytes
64 KiB -> 16 KiB x 4 = 80 bytes
128 KiB -> 16 KiB x 8 = 160 bytes
256 KiB -> 16 KiB x 16 = 320 bytes
512 KiB -> 16 KiB x 32 = 640 bytes
1 MiB -> 16 KiB x 64 = 1.25 KiB
2 MiB -> 16 KiB x 128 = 2.5 KiB
4 MiB -> 32 KiB x 128 = 2.5 KiB
8 MiB -> 32 KiB x 256 = 5 KiB
16 MiB -> 64 KiB x 256 = 5 KiB
32 MiB -> 64 KiB x 512 = 10 KiB
64 MiB -> 128 KiB x 512 = 10 KiB
128 MiB -> 128 KiB x 1024 = 20 KiB
256 MiB -> 256 KiB x 1024 = 20 KiB
512 MiB -> 256 KiB x 2048 = 40 KiB
1 GiB -> 512 KiB x 2048 = 40 KiB
2 GiB -> 512 KiB x 4096 = 80 KiB
4 GiB -> 1 MiB x 4096 = 80 KiB
8 GiB -> 1 MiB x 8192 = 160 KiB
16 GiB -> 2 MiB x 8192 = 160 KiB
32 GiB -> 2 MiB x 16384 = 320 KiB
64 GiB -> 4 MiB x 16384 = 320 KiB
128 GiB -> 4 MiB x 32768 = 640 KiB
256 GiB -> 8 MiB x 32768 = 640 KiB
512 GiB -> 8 MiB x 65536 = 1.25 MiB
1 TiB -> 16 MiB x 65536 = 1.25 MiB
2 TiB -> 16 MiB x 131072 = 2.5 MiB
4 TiB -> 16 MiB x 262144 = 5 MiB
8 TiB -> 16 MiB x 524288 = 10 MiB
16 TiB -> 16 MiB x 1048576 = 20 MiB
32 TiB -> 16 MiB x 2097152 = 40 MiB
64 TiB -> 16 MiB x 4194304 = 80 MiB
128 TiB -> 16 MiB x 8388608 = 160 MiB
256 TiB -> 16 MiB x 16777216 = 320 MiB
512 TiB -> 16 MiB x 33554432 = 640 MiB
1 PiB -> 16 MiB x 67108864 = 1.25 GiB
```
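
The rows of this table can be regenerated with a short script. This is a
sketch under two assumptions: each piece contributes one 20-byte SHA-1 hash to
the piece list, and the helper `row` is a name invented for this example. It
prints raw byte counts rather than the human-readable units used above.

```python
import math

MIN = 16 * 1024
MAX = 16 * 1024 * 1024
HASH_SIZE = 20  # bytes per SHA-1 hash in the piece list

def piece_length(content_length):
    exponent = math.log2(content_length)
    length = 1 << int(exponent / 2 + 4)
    return min(max(length, MIN), MAX)

def row(content_length):
    """Hypothetical helper: (piece length, piece count, piece list size)."""
    length = piece_length(content_length)
    count = -(-content_length // length)  # ceiling division
    return length, count, count * HASH_SIZE

# Content sizes from 16 KiB (2**14) through 1 PiB (2**50), doubling each row.
for power in range(14, 51):
    content = 1 << power
    length, count, size = row(content)
    print(f"{content} -> {length} x {count} = {size} bytes")
```
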
References
----------
### Articles
- [Vuze Wiki](https://wiki.vuze.com/w/Torrent_Piece_Size)
- [TorrentFreak](https://torrentfreak.com/how-to-make-the-best-torrents-081121/)
### Implementations
- [libtorrent](https://github.com/arvidn/libtorrent/blob/a3440e54bb7f65ac6100c3d993c53f887025d660/src/create_torrent.cpp#L367)
- [libtransmission](https://github.com/transmission/transmission/blob/a482100f0cbae8050fd7e954af2cb1311205916e/libtransmission/makemeta.c#L89)
- [dottorrent](https://github.com/kz26/dottorrent/blob/fea5714efe0cde2a55eabfb387295781a78d84bb/dottorrent/__init__.py#L154)
- [Torrent File Editor](https://github.com/torrent-file-editor/torrent-file-editor/blob/811e401b38f26b6d94c4808c54ae2dcc7bbc27dd/mainwindow.cpp#L1210)

@ -0,0 +1,127 @@
Piece Length Selection
======================
BitTorrent `.torrent` files contain so-called metainfo that allows BitTorrent
peers to locate, download, and verify the contents of a torrent.
This metainfo includes the piece list, a list of SHA-1 hashes of fixed-size
pieces of the torrent data. The size of these pieces is chosen by the torrent
creator.
Intermodal has a simple algorithm that attempts to pick a reasonable piece
length for a torrent given the size of the contents.
For compatibility with the
[BitTorrent v2 specification](http://bittorrent.org/beps/bep_0052.html), the
algorithm chooses piece lengths that are powers of two, and that are at least
16 KiB.
The maximum automatically chosen piece length is 16 MiB, as piece lengths larger
than 16 MiB have been reported to cause issues for some clients.
Beyond these constraints, several other factors influence the choice of piece
length.
Factors favoring smaller piece length
-------------------------------------
- To avoid uploading bad data, peers only upload data from full pieces, which
can be verified by hash. Decreasing the piece size allows peers to more
quickly obtain a full piece, which decreases the time before they begin
uploading, and receiving data in return.
- Decreasing the piece size decreases the amount of data that must be thrown
away in case of corruption.
Factors favoring larger piece length
------------------------------------
- Increasing the piece size decreases the protocol overhead from requesting
many pieces.
- Increasing the piece size decreases the number of pieces, decreasing the
size of torrent metainfo.
- Increasing piece length decreases the proportion of disk seeks to disk
reads, which can be beneficial for spinning disks.
Intermodal's Algorithm
----------------------
In Python, the algorithm used by Intermodal is:
```python
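import math

MIN = 16 * 1024         # smallest piece length: 16 KiB
MAX = 16 * 1024 * 1024  # largest automatically chosen piece length: 16 MiB

def piece_length(content_length):
    # content_length is the total content size in bytes and must be positive.
    exponent = math.log2(content_length)
    # Grow the piece length with the square root of the content length.
    length = 1 << int(exponent / 2 + 4)
    return min(max(length, MIN), MAX)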
```
This gives the following piece lengths:
```
Content -> Piece Length x Count = Piece List Size
16 KiB -> 16 KiB x 1 = 20 bytes
32 KiB -> 16 KiB x 2 = 40 bytes
64 KiB -> 16 KiB x 4 = 80 bytes
128 KiB -> 16 KiB x 8 = 160 bytes
256 KiB -> 16 KiB x 16 = 320 bytes
512 KiB -> 16 KiB x 32 = 640 bytes
1 MiB -> 16 KiB x 64 = 1.25 KiB
2 MiB -> 16 KiB x 128 = 2.5 KiB
4 MiB -> 32 KiB x 128 = 2.5 KiB
8 MiB -> 32 KiB x 256 = 5 KiB
16 MiB -> 64 KiB x 256 = 5 KiB
32 MiB -> 64 KiB x 512 = 10 KiB
64 MiB -> 128 KiB x 512 = 10 KiB
128 MiB -> 128 KiB x 1024 = 20 KiB
256 MiB -> 256 KiB x 1024 = 20 KiB
512 MiB -> 256 KiB x 2048 = 40 KiB
1 GiB -> 512 KiB x 2048 = 40 KiB
2 GiB -> 512 KiB x 4096 = 80 KiB
4 GiB -> 1 MiB x 4096 = 80 KiB
8 GiB -> 1 MiB x 8192 = 160 KiB
16 GiB -> 2 MiB x 8192 = 160 KiB
32 GiB -> 2 MiB x 16384 = 320 KiB
64 GiB -> 4 MiB x 16384 = 320 KiB
128 GiB -> 4 MiB x 32768 = 640 KiB
256 GiB -> 8 MiB x 32768 = 640 KiB
512 GiB -> 8 MiB x 65536 = 1.25 MiB
1 TiB -> 16 MiB x 65536 = 1.25 MiB
2 TiB -> 16 MiB x 131072 = 2.5 MiB
4 TiB -> 16 MiB x 262144 = 5 MiB
8 TiB -> 16 MiB x 524288 = 10 MiB
16 TiB -> 16 MiB x 1048576 = 20 MiB
32 TiB -> 16 MiB x 2097152 = 40 MiB
64 TiB -> 16 MiB x 4194304 = 80 MiB
128 TiB -> 16 MiB x 8388608 = 160 MiB
256 TiB -> 16 MiB x 16777216 = 320 MiB
512 TiB -> 16 MiB x 33554432 = 640 MiB
1 PiB -> 16 MiB x 67108864 = 1.25 GiB
```
References
----------
### Articles
- [Vuze Wiki](https://wiki.vuze.com/w/Torrent_Piece_Size)
- [TorrentFreak](https://torrentfreak.com/how-to-make-the-best-torrents-081121/)
### Implementations
- [libtorrent](https://github.com/arvidn/libtorrent/blob/a3440e54bb7f65ac6100c3d993c53f887025d660/src/create_torrent.cpp#L367)
- [libtransmission](https://github.com/transmission/transmission/blob/a482100f0cbae8050fd7e954af2cb1311205916e/libtransmission/makemeta.c#L89)
- [dottorrent](https://github.com/kz26/dottorrent/blob/fea5714efe0cde2a55eabfb387295781a78d84bb/dottorrent/__init__.py#L154)
- [Torrent File Editor](https://github.com/torrent-file-editor/torrent-file-editor/blob/811e401b38f26b6d94c4808c54ae2dcc7bbc27dd/mainwindow.cpp#L1210)

@ -1,21 +1,5 @@
-// The piece length picker attempts to pick a reasonable piece length
-// for a torrent given the size of the torrent's contents.
-//
-// Constraints:
-// - Decreasing piece length increases protocol overhead.
-// - Decreasing piece length increases torrent metainfo size.
-// - Increasing piece length increases the amount of data that must be thrown
-// away in case of corruption.
-// - Increasing piece length increases the amount of data that must be
-// downloaded before it can be verified and uploaded to other peers.
-// - Decreasing piece length increases the proportion of disk seeks to disk
-// reads. This can be an issue for spinning disks.
-// - The BitTorrent v2 specification requires that piece sizes be larger than 16
-// KiB.
-//
-// These constraints could probably be exactly defined and optimized
-// using an integer programming solver, but instead we just copy what
-// libtorrent does.
+//! See [the book](https://imdl.io/book/bittorrent/piece-length.html) for more
+//! information on Intermodal's automatic piece length selection algorithm.
 use crate::common::*;