doc(step-by-step): readability improvements (#820)
This diff contains readability improvements for the step-by-step design document. Co-authored-by: Simone Basso <bassosimone@gmail.com>
This commit is contained in:
parent
797dd27ffc
commit
74aebedac3
|
@ -1,16 +1,19 @@
|
||||||
# Step-by-step measurements
|
# Step-by-step measurements
|
||||||
|
|
||||||
| | |
|
| | |
|
||||||
|:-------------|:-------------|
|
|--------------|------------------------------------------------|
|
||||||
| Author | [@bassosimone](https://github.com/bassosimone) |
|
| Author | [@bassosimone](https://github.com/bassosimone) |
|
||||||
| Last-Updated | 2022-06-13 |
|
| Last-Updated | 2022-06-29 |
|
||||||
| Reviewed-by | [@hellais](https://github.com/hellais) |
|
| Reviewed-by | [@hellais](https://github.com/hellais) |
|
||||||
| Reviewed-by | [@DecFox](https://github.com/DecFox/) |
|
| Reviewed-by | [@DecFox](https://github.com/DecFox/) |
|
||||||
| Status | approved |
|
| Reviewed-by | [@ainghazal](https://github.com/ainghazal/) |
|
||||||
| Obsoletes | [dd-002-netx.md](dd-002-netx.md) |
|
| Status | approved |
|
||||||
|
| Obsoletes | [dd-002-netx.md](dd-002-netx.md) |
|
||||||
|
|
||||||
*Abstract.* The original [netx design document](dd-002-netx.md) is now two
|
## Abstract
|
||||||
years old. Since we wrote such a document, we amended the overall design
|
|
||||||
|
The original [netx design document](dd-002-netx.md) is now two
|
||||||
|
years old. Since we wrote that document, we amended the overall design
|
||||||
several times. The four major design changes were:
|
several times. The four major design changes were:
|
||||||
|
|
||||||
1. saving rather than emitting
|
1. saving rather than emitting
|
||||||
|
@ -24,50 +27,56 @@ pattern [ooni/probe-engine#522](https://github.com/ooni/probe-engine/pull/522);
|
||||||
4. measurex [ooni/probe-cli#528](https://github.com/ooni/probe-cli/pull/528).
|
4. measurex [ooni/probe-cli#528](https://github.com/ooni/probe-cli/pull/528).
|
||||||
|
|
||||||
In this (long) design document, we will revisit the original problem proposed by
|
In this (long) design document, we will revisit the original problem proposed by
|
||||||
[df-002-netx.md](df-002-netx.md), in light of what we changed and learned from the
|
[dd-002-netx.md](dd-002-netx.md), in light of all the changes and lessons
|
||||||
changes we applied. We will highlight the significant pain points of the current
|
learned since then. We will highlight the significant pain points of the
|
||||||
implementation, which are the following:
|
current implementation, which are the following:
|
||||||
|
|
||||||
1. that the measurement library API is significantly different from the Go stdlib
|
1. The measurement library API is significantly different from the Go stdlib
|
||||||
API, therefore violating this original `netx` design goal: that writing a new
|
API. This violates one of the central design goals for `netx`: that writing a new
|
||||||
experiment means using slightly different constructors that deviate from the standard
|
experiment would involve using constructors very similar to the standard
|
||||||
library only to meet specific measurement goals we have;
|
library. Such deviations were supposed to be made only to meet our specific
|
||||||
|
measurement goals;
|
||||||
|
|
||||||
2. that the decorator pattern leads to complexity in creating measurement types,
|
2. The decorator pattern has lead to complexity in creating measurement types,
|
||||||
which in turn seems to be the reason why the previous issue exists;
|
which in turn seems to be the reason why the previous issue exists;
|
||||||
|
|
||||||
3. that the decorator pattern does not allow us to precisely collect all the
|
3. The decorator pattern does not allow us to precisely collect all the
|
||||||
data that matters for events such as TCP connect and DNS round trips using
|
data that matters for certain events (such as TCP connect and DNS round trips using
|
||||||
a custom transport, thus suggesting that we should revisit our choice of using
|
a custom transport). This suggests that we should revisit our choice of using
|
||||||
decorators and revert back to some form of _constructor based injection_ to
|
decorators, and revert back to some form of _constructor based injection_ to
|
||||||
inject a data type suitable for saving events.
|
inject a data type suitable for saving events.
|
||||||
|
|
||||||
In doing that, we will also propose an incremental plan for moving the tree
|
Finally, this document also proposes an incremental plan for moving the tree
|
||||||
forward from [the current state](https://github.com/ooni/probe-cli/tree/1685ef75b5a6a0025a1fd671625b27ee989ef111)
|
forward from [the current state](https://github.com/ooni/probe-cli/tree/1685ef75b5a6a0025a1fd671625b27ee989ef111)
|
||||||
to a state where complexity is transferred from the measurement-support library to
|
to a state in which the complexity has been transferred from the
|
||||||
the implementation of each individual network experiment.
|
measurement-support library to the implementation of each individual network
|
||||||
|
experiment.
|
||||||
|
|
||||||
## Index
|
## Index
|
||||||
|
|
||||||
In [netxlite: the underlying library](#netxlite-the-underlying-network-library) we
|
There are four main sections in this document:
|
||||||
describe the current design of the underlying network library.
|
|
||||||
|
|
||||||
In [measurement tactics](#measurement-tactics) we provide an historical perspective
|
[1. Netxlite: the underlying library](#1-netxlite-the-underlying-network-library)
|
||||||
on the measurement tactics, we adopted or just tried.
|
describes the current design of the underlying network library.
|
||||||
|
|
||||||
The [step-by-step refactoring proposal](#step-by-step-refactoring-proposal)
|
[2. Measurement tactics](#2-measurement-tactics) gives an historical perspective
|
||||||
contains the main contribution of this design document: a proposal to refactor
|
on different measurement tactics we adopted or tried in the past, and reflects
|
||||||
the existing codebase to address our current measurement-code problems.
|
on their merits and downsides.
|
||||||
|
|
||||||
The [reviews](#reviews) section contains information about reviews.
|
[3. Step-by-step refactoring proposal](#3-step-by-step-refactoring-proposal)
|
||||||
|
contains the main contribution of this design document: a concrete proposal to
|
||||||
|
refactor the existing codebase to address our current measurement-code
|
||||||
|
problems.
|
||||||
|
|
||||||
## netxlite: the underlying network library
|
[4. Document reviews](#4-document-reviews) contains information about reviews of this document.
|
||||||
|
|
||||||
This section describes `netxlite`, the underlying network library, from a
|
## 1. netxlite: the underlying network library
|
||||||
|
|
||||||
|
This section describes `netxlite`, the underlying network library, from an
|
||||||
historical perspective. We explain our data collection needs, and what types
|
historical perspective. We explain our data collection needs, and what types
|
||||||
from the standard library we're using as patterns.
|
from the standard library we're using as patterns.
|
||||||
|
|
||||||
### Measurement Observations
|
### 1.1. Measurement Observations
|
||||||
|
|
||||||
Most OONI experiments need to observe and give meaning to these events:
|
Most OONI experiments need to observe and give meaning to these events:
|
||||||
|
|
||||||
|
@ -121,7 +130,7 @@ defines how we archive experiment results as a set of observations.
|
||||||
(Orthogonally, we may also want to improve the data format, but this is
|
(Orthogonally, we may also want to improve the data format, but this is
|
||||||
not under discussion now.)
|
not under discussion now.)
|
||||||
|
|
||||||
### Error Wrapping
|
### 1.2. Error Wrapping
|
||||||
|
|
||||||
The OONI data format also specifies [how we should represent
|
The OONI data format also specifies [how we should represent
|
||||||
errors](https://github.com/ooni/spec/blob/master/data-formats/df-007-errors.md).
|
errors](https://github.com/ooni/spec/blob/master/data-formats/df-007-errors.md).
|
||||||
|
@ -140,7 +149,7 @@ DNS response messages or the syscall error returned by a Read call. By
|
||||||
adding this, we would give those who analyze the data information to
|
adding this, we would give those who analyze the data information to
|
||||||
evaluate the correctness of a measurement.
|
evaluate the correctness of a measurement.
|
||||||
|
|
||||||
### Go Stdlib
|
### 1.3 Go Stdlib
|
||||||
|
|
||||||
The Go standard library provides the following structs and interfaces
|
The Go standard library provides the following structs and interfaces
|
||||||
that we can use for measuring:
|
that we can use for measuring:
|
||||||
|
@ -235,7 +244,7 @@ Apart from the stdlib and quic-go, the only other significant network code
|
||||||
dependency is [miekg/dns](https://github.com/miekg/dns)
|
dependency is [miekg/dns](https://github.com/miekg/dns)
|
||||||
for custom DNS resolvers (e.g., DNS-over-HTTPS).
|
for custom DNS resolvers (e.g., DNS-over-HTTPS).
|
||||||
|
|
||||||
### Network Extensions
|
### 1.4. Network Extensions
|
||||||
|
|
||||||
A reasonable idea is to try to use types as close as possible to the
|
A reasonable idea is to try to use types as close as possible to the
|
||||||
standard library. By following this strategy, we can compose our code
|
standard library. By following this strategy, we can compose our code
|
||||||
|
@ -281,12 +290,20 @@ netxlite as a dependency.
|
||||||
itself with measurements unlike [the original netx](df-002-netx.md), which contained
|
itself with measurements unlike [the original netx](df-002-netx.md), which contained
|
||||||
both basic networking wrappers and network measurement code.)
|
both basic networking wrappers and network measurement code.)
|
||||||
|
|
||||||
## Measurement Tactics
|
## 2. Measurement Tactics
|
||||||
|
|
||||||
Each subsection presents a different tactic for collecting measurement
|
Each subsection presents a different tactic for collecting measurement
|
||||||
observations.
|
observations, while reflecting on their pros and cons.
|
||||||
|
|
||||||
### Context-Based Tracing
|
We revisit four distinct tactics:
|
||||||
|
|
||||||
|
* [(1) Context-based tracing](#21-context-based-tracing),
|
||||||
|
* [(2) Decorator-based tracing](#22-decorator-based-tracing),
|
||||||
|
* [(3) Step-by-step measurements](#23-step-by-step-measurements), and
|
||||||
|
* [(4) Measurex: splitting DNSLookup and Endpoint Measurements](#24-measurex-splitting-dnslookup-and-endpoint-measurements).
|
||||||
|
|
||||||
|
|
||||||
|
### 2.1. Context-Based Tracing
|
||||||
|
|
||||||
This tactic is the first one we implemented. We call this approach
|
This tactic is the first one we implemented. We call this approach
|
||||||
"tracing" because it produces a trace of the events, and it's
|
"tracing" because it produces a trace of the events, and it's
|
||||||
|
@ -304,7 +321,7 @@ the stdlib allows one to use the context to perform network tracing
|
||||||
we progressively abandoned `httptrace` as our tracing needs
|
we progressively abandoned `httptrace` as our tracing needs
|
||||||
become more complex than what `httptrace` could provide us with.
|
become more complex than what `httptrace` could provide us with.
|
||||||
|
|
||||||
#### How this tactic feels like
|
#### How context-tracing feels like
|
||||||
|
|
||||||
I tried to adapt how this code would look if we used it now. As
|
I tried to adapt how this code would look if we used it now. As
|
||||||
[dd-002-netx.md](dd-002-netx.md) suggests, here I am trying to separate
|
[dd-002-netx.md](dd-002-netx.md) suggests, here I am trying to separate
|
||||||
|
@ -346,7 +363,7 @@ As you can see, I have marked with fire emojis where we
|
||||||
need to figure out what happened by reading the trace. We are going
|
need to figure out what happened by reading the trace. We are going
|
||||||
to discuss this issue in the next section.
|
to discuss this issue in the next section.
|
||||||
|
|
||||||
#### Issue #1: distance between collection and interpretation
|
#### Issue #1 with context tracing: distance between collection and interpretation
|
||||||
|
|
||||||
The nice part of this approach is that the network-interaction part of
|
The nice part of this approach is that the network-interaction part of
|
||||||
the experiment is \~easy. The bad part is that we must figure out
|
the experiment is \~easy. The bad part is that we must figure out
|
||||||
|
@ -422,7 +439,7 @@ respect and make result-determining code more obvious and closer to the code
|
||||||
that performs the measurement. We will eventually come to fix this issue later
|
that performs the measurement. We will eventually come to fix this issue later
|
||||||
in this document. For now, let us continue to analyze this tactic.
|
in this document. For now, let us continue to analyze this tactic.
|
||||||
|
|
||||||
#### Issue #2: the Context is magic and implicit
|
#### Issue #2 with context tracing: the Context is magic and implicit
|
||||||
|
|
||||||
Another pain point is that we're using the Context's magic. What happens
|
Another pain point is that we're using the Context's magic. What happens
|
||||||
there feels more obscure than explicit initialization for performing
|
there feels more obscure than explicit initialization for performing
|
||||||
|
@ -444,7 +461,7 @@ dialer := saver.WrapQUICDialer(dialer)
|
||||||
conn, err := dialer.DialContext(ctx, /* ... */)
|
conn, err := dialer.DialContext(ctx, /* ... */)
|
||||||
```
|
```
|
||||||
|
|
||||||
In the latter case, it's evident that we're *decorating* the original
|
In the later case, it's evident that we're *decorating* the original
|
||||||
dialer with an extended dialer that knows how to perform network
|
dialer with an extended dialer that knows how to perform network
|
||||||
measurements. In the former case, it feels magic that we're setting some
|
measurements. In the former case, it feels magic that we're setting some
|
||||||
value as an opaque any inside of the context, and there's a documented
|
value as an opaque any inside of the context, and there's a documented
|
||||||
|
@ -460,7 +477,7 @@ explicitly wrapping a type. I will discuss this topic when we
|
||||||
analyze the next tactic because the next tactic is all about reducing
|
analyze the next tactic because the next tactic is all about reducing
|
||||||
the cognitive burden and avoiding the context.
|
the cognitive burden and avoiding the context.
|
||||||
|
|
||||||
#### Issue #3: we obtain a flat trace
|
#### Issue #3 with context tracing: we obtain a flat trace
|
||||||
|
|
||||||
The most straightforward implementation of this approach yields a flat trace. This
|
The most straightforward implementation of this approach yields a flat trace. This
|
||||||
means that one needs to be careful to understand, say, which events are
|
means that one needs to be careful to understand, say, which events are
|
||||||
|
@ -474,7 +491,7 @@ assign numbers to distinct HTTP round trips and DNS lookups so that
|
||||||
later it is possible to make sense of the trace. This was indeed the
|
later it is possible to make sense of the trace. This was indeed the
|
||||||
approach we chose initially. However, this approach puts more pressure
|
approach we chose initially. However, this approach puts more pressure
|
||||||
on the context magic because it does not just suffice to wrap the
|
on the context magic because it does not just suffice to wrap the
|
||||||
context once with WithValue, but you need to additionally wrap it when
|
context once with `WithValue`, but you need to additionally wrap it when
|
||||||
you descend into sub-operations. (Other approaches are possible, but I
|
you descend into sub-operations. (Other approaches are possible, but I
|
||||||
will discuss this one because it looks conceptually cleaner to
|
will discuss this one because it looks conceptually cleaner to
|
||||||
create a new "node" in the trace like it was a DOM.)
|
create a new "node" in the trace like it was a DOM.)
|
||||||
|
@ -529,7 +546,7 @@ censorship. See our [DoT in Iran
|
||||||
research](https://ooni.org/post/2020-iran-dot/) for more
|
research](https://ooni.org/post/2020-iran-dot/) for more
|
||||||
information.)
|
information.)
|
||||||
|
|
||||||
### Decoration-Based Tracing
|
### 2.2. Decorator-Based Tracing
|
||||||
|
|
||||||
In [probe-engine#359](https://github.com/ooni/probe-engine/issues/359), we
|
In [probe-engine#359](https://github.com/ooni/probe-engine/issues/359), we
|
||||||
started planning refactoring of `netx` to solve the identified issues
|
started planning refactoring of `netx` to solve the identified issues
|
||||||
|
@ -903,7 +920,7 @@ the `CNAME` and getaddrinfo's return code very easily):
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
#### Concluding remarks
|
#### Concluding remarks on decorator-based tracing
|
||||||
|
|
||||||
Historically, the decorator-based approach helped simplify
|
Historically, the decorator-based approach helped simplify
|
||||||
the codebase (probably because what we had previously was
|
the codebase (probably because what we had previously was
|
||||||
|
@ -922,29 +939,29 @@ Whatever choice we make, it should probably include some form of
|
||||||
dependency injection for a trace that allows us to collect the events we
|
dependency injection for a trace that allows us to collect the events we
|
||||||
care about more precisely and with less effort.
|
care about more precisely and with less effort.
|
||||||
|
|
||||||
### Step-by-Step Measurements
|
### 2.3. Step-by-step measurements
|
||||||
|
|
||||||
We had previous conversations around simplifying how we perform
|
We've had many conversations about how to simplify the way we do measurements.
|
||||||
measurements (e.g., I remember a conversation with
|
For instance, [Vinicius](https://github.com/fortuna) at some point advocated
|
||||||
[Vinicius](https://github.com/fortuna), where he advocated
|
for decomposing measurements in simple operations. He rightfully pointed out
|
||||||
for decomposing measurements in simple operations, and he
|
that tracing is excellent for debugging, but it complicates to assign
|
||||||
rightfully pointed out that tracing is excellent for debugging
|
meaning to each measurement.
|
||||||
but complicating assigning meaning to measurements).
|
|
||||||
|
|
||||||
We also mentioned that, in the codebase, we documented that `netx` was discouraged as an
|
We had documented in the codebase that `netx` was discouraged as an approach
|
||||||
approach for new experiments. The first chance to try a different tactic
|
for new experiments. We got the first chance to try a
|
||||||
was the development of the `websteps` prototype. We tried to implement
|
different tactic while developing the `websteps` prototype.
|
||||||
step-by-step measurements, which, in its most radical form, calls for
|
|
||||||
performing each relevant step in isolation, immediately saving a
|
In `websteps`, we tried to implement step-by-step measurements: in its most
|
||||||
small trace and interpreting it before moving on to the next step
|
radical form, this calls for performing each relevant step in isolation,
|
||||||
(unless there's an error, in which case you typically stop).
|
immediately saving a small trace and interpreting it before moving on to the
|
||||||
|
next step (unless there's an error, in which case you typically stop).
|
||||||
|
|
||||||
Looking back at the Go stdlib API, the main blocker to implementing
|
Looking back at the Go stdlib API, the main blocker to implementing
|
||||||
this tactic is how to reconcile it with HTTP transports, which expects to
|
this tactic is how to reconcile it with HTTP transports, which expects to
|
||||||
dial and control their own connections. Luckily,
|
dial and control their own connections. Luckily,
|
||||||
[Kathrin](https://github.com/kelmenhorst)
|
[Kathrin](https://github.com/kelmenhorst)
|
||||||
[implemented](https://github.com/ooni/probe-cli/pull/432) the
|
[implemented](https://github.com/ooni/probe-cli/pull/432) the
|
||||||
following trick that solved such an issue:
|
following trick that allows us to solve this issue:
|
||||||
|
|
||||||
```Go
|
```Go
|
||||||
// NewSingleUseDialer returns a "single use" dialer. The first
|
// NewSingleUseDialer returns a "single use" dialer. The first
|
||||||
|
@ -978,8 +995,8 @@ func (s *dialerSingleUse) DialContext(ctx context.Context, network string, addr
|
||||||
With a "single-use" dialer, we provide an HTTPTransport with a fake
|
With a "single-use" dialer, we provide an HTTPTransport with a fake
|
||||||
dialer that provides a previously-established
|
dialer that provides a previously-established
|
||||||
connection to the transport itself. The following snippet shows
|
connection to the transport itself. The following snippet shows
|
||||||
code from my first naive attempt at writing code using this tactic,
|
code from my first naive attempt at writing code using this approach. The pain
|
||||||
where originally identified pain points have been emphasized:
|
points we had originally identified have been emphasized:
|
||||||
|
|
||||||
```Go
|
```Go
|
||||||
dialer := netxlite.NewDialerWithoutResolver()
|
dialer := netxlite.NewDialerWithoutResolver()
|
||||||
|
@ -1015,31 +1032,31 @@ if err != nil {
|
||||||
```
|
```
|
||||||
|
|
||||||
Let's discuss the good parts before moving on to the bad parts. It is
|
Let's discuss the good parts before moving on to the bad parts. It is
|
||||||
dead obvious which operation failed and why, and we know what went
|
dead obvious which operation failed and why, and **we know what went
|
||||||
wrong and can analyze the observations immediately.
|
wrong and can analyze the observations immediately**.
|
||||||
|
|
||||||
Additionally, if you ask someone who knows the standard library to write
|
Additionally, if you ask someone who knows the standard library to write
|
||||||
an experiment that provides information about TCP connect, TLS handshake, and
|
an experiment that provides information about TCP connect, TLS handshake, and
|
||||||
HTTP round trip using `netxlite`, they would probably write something
|
HTTP round trip using `netxlite`, they would probably write something
|
||||||
similar to the above code.
|
similar to the above code.
|
||||||
|
|
||||||
#### Issue #1: no persistent connections
|
#### Issue #1 with step-by-step approach: no persistent connections
|
||||||
|
|
||||||
Without adding extra complexity, we lose the possibility of using
|
Without adding extra complexity, we lose the possibility to use
|
||||||
persistent connections. This may not be a huge problem except for
|
persistent connections. This may not be a huge problem, except for
|
||||||
redirects. Each redirect will require us to set up a new connection,
|
redirects. Each redirect will require us to set up a new connection,
|
||||||
even though an ordinary http.Transport would probably have reused an
|
even though an ordinary `http.Transport` would probably have reused an
|
||||||
existing one.
|
existing one.
|
||||||
|
|
||||||
Because we're measuring censorship, I would argue it's OK to not reuse
|
Because we're measuring censorship, I would argue it's OK to not reuse
|
||||||
connections. Sure, the measurement could be slower, but we'd get
|
connections. Sure, the measurement can be slower, but we'll also get
|
||||||
more data points on TCP/IP and TLS blocking.
|
more data points on TCP/IP and TLS blocking.
|
||||||
|
|
||||||
#### Issue #2: requires manual handling of redirects
|
#### Issue #2 with step-by-step approach: requires manual handling of redirects
|
||||||
|
|
||||||
Because we cannot reuse connections easily, we cannot rely on
|
Because we cannot reuse connections easily, we cannot rely on
|
||||||
&http.Client{} to perform redirections automatically for us. This is why
|
`&http.Client{}` to perform redirections automatically for us. This is why
|
||||||
websteps implements HTTP redirection manually.
|
`websteps` implements HTTP redirection manually.
|
||||||
|
|
||||||
While it may seem that discussing redirects is out of scope,
|
While it may seem that discussing redirects is out of scope,
|
||||||
historically I had been reluctant to switch to a step-by-step model
|
historically I had been reluctant to switch to a step-by-step model
|
||||||
|
@ -1075,7 +1092,7 @@ Because this problem of redirection is fundamental to many experiments
|
||||||
(not only webconnectivity but also, e.g., whatsapp), any step-by-step
|
(not only webconnectivity but also, e.g., whatsapp), any step-by-step
|
||||||
approach library needs this functionality.
|
approach library needs this functionality.
|
||||||
|
|
||||||
#### Issue #3: DRY pressure
|
#### Issue #3 with step-by-step approach: DRY pressure
|
||||||
|
|
||||||
Before, I said that saving traces seems complicated. That is not entirely
|
Before, I said that saving traces seems complicated. That is not entirely
|
||||||
true. Depending on the extent to which we are willing to suffer the pain of
|
true. Depending on the extent to which we are willing to suffer the pain of
|
||||||
|
@ -1195,7 +1212,7 @@ some network operation. If the API is doing too much, I
|
||||||
might not have the ability to hook me into it and run the follow-up
|
might not have the ability to hook me into it and run the follow-up
|
||||||
experiment right after the operation I needed to do.
|
experiment right after the operation I needed to do.
|
||||||
|
|
||||||
#### Concluding remarks
|
#### Concluding remarks on step-by-step measurements
|
||||||
|
|
||||||
If we have a way to collect observations,
|
If we have a way to collect observations,
|
||||||
this approach certainly has the advantage of having some
|
this approach certainly has the advantage of having some
|
||||||
|
@ -1215,13 +1232,16 @@ probably want to split DNS and other operations to get a chance to
|
||||||
test all (or many) of the available IP addresses and use tracing within DNS and
|
test all (or many) of the available IP addresses and use tracing within DNS and
|
||||||
other operations.
|
other operations.
|
||||||
|
|
||||||
### DNSLookup and Endpoint Measurements
|
### 2.4. measurex: splitting DNSLookup and Endpoint Measurements
|
||||||
|
|
||||||
This approach is currently implemented as measurex. A DNSLookup
|
This fourth and last approach of the ones we'll discuss is currently
|
||||||
measurement should be an obvious concept, while it's probably less
|
implemented in `measurex`.
|
||||||
clear what's an endpoint measurement. So, let's clarify.
|
|
||||||
|
|
||||||
An endpoint measurement is one of the following operations:
|
A **DNSLookup measurement** is perhaps an obvious concept, but an **endpoint
|
||||||
|
measurement** is probably not obvious. So, let's first clarify the
|
||||||
|
terminology:
|
||||||
|
|
||||||
|
We define an **endpoint measurement** as one of the following operations:
|
||||||
|
|
||||||
1. given a TCP endpoint (IP address and port), TCP connect to it;
|
1. given a TCP endpoint (IP address and port), TCP connect to it;
|
||||||
|
|
||||||
|
@ -1249,12 +1269,12 @@ addresses, so that pattern was already in OONI somehow (at least for the
|
||||||
most important experiment measuring web censorship.)
|
most important experiment measuring web censorship.)
|
||||||
|
|
||||||
Another interesting observation about the above set of operations is
|
Another interesting observation about the above set of operations is
|
||||||
that each of them could fail exactly once. The DNSLookup could fail or
|
that each of them could fail **exactly once**. The DNSLookup could fail or
|
||||||
yield addresses, and TCP connect could fail or succeed. In the TCP connect
|
yield addresses, and TCP connect could fail or succeed. In the TCP connect
|
||||||
plus TLS handshake case, you stop there if you fail the TCP connect. And
|
plus TLS handshake case, you stop there if you fail the TCP connect. And
|
||||||
so on.
|
so on.
|
||||||
|
|
||||||
Because of this reasoning, one could say that the measurex tactic
|
Because of this reasoning, one could say that the `measurex` tactic
|
||||||
is equivalent to the previous one in relatively easily identifying
|
is equivalent to the previous one in relatively easily identifying
|
||||||
the failed operation and filling the measurement. That seems to be an argument
|
the failed operation and filling the measurement. That seems to be an argument
|
||||||
for having a library containing code to simplify measurements.
|
for having a library containing code to simplify measurements.
|
||||||
|
@ -1263,15 +1283,17 @@ an entirely new API*. After careful consideration, it seems preferable to
|
||||||
select an API that is closer to what a typical Go programmer would
|
select an API that is closer to what a typical Go programmer would
|
||||||
expect.
|
expect.
|
||||||
|
|
||||||
## Step-by-step refactoring proposal
|
## 3. Step-by-step refactoring proposal
|
||||||
|
|
||||||
|
Finally, all the discussion is in place to get to a concrete proposal.
|
||||||
|
|
||||||
I tried to reimplement the telegram experiment using a pure step-by-step approach
|
I tried to reimplement the telegram experiment using a pure step-by-step approach
|
||||||
([here's the
|
([here's the
|
||||||
gist](https://gist.github.com/bassosimone/f6e680d35805174d1f150bc15ef754af)).
|
gist](https://gist.github.com/bassosimone/f6e680d35805174d1f150bc15ef754af)).
|
||||||
It looks fine, but one ends up writing a support library such as measurex. Yet,
|
It looks fine, but one ends up writing a support library such as `measurex`. Yet,
|
||||||
as noted above, the API exposed by such a measurement
|
as noted above, the API exposed by such a measurement
|
||||||
library matters, and an API familiar to Go developers seems preferable
|
library matters, and an API familiar to Go developers seems preferable
|
||||||
to the API implemented by measurex.
|
to the API implemented by `measurex`.
|
||||||
|
|
||||||
There are two key insights I derived from my telegram PoC.
|
There are two key insights I derived from my telegram PoC.
|
||||||
|
|
||||||
|
@ -1415,7 +1437,7 @@ the PoC is that *traces are numbered*. This is not what happens
|
||||||
currently in OONI Probe, but it is *beneficial*. By numbering
|
currently in OONI Probe, but it is *beneficial*. By numbering
|
||||||
traces, we can quite easily tell which existing event belongs to which
|
traces, we can quite easily tell which existing event belongs to which
|
||||||
specific submeasurement. (If there's no need to number traces, we can
|
specific submeasurement. (If there's no need to number traces, we can
|
||||||
just set a zero index to all the traces we collect, *e passa la paura*.)
|
just set a zero index to all the traces we collect, *[e passa la paura](https://context.reverso.net/traduzione/italiano-inglese/passa+la+paura)*.)
|
||||||
|
|
||||||
One minor aspect to keep in mind in this design is that we need to
|
One minor aspect to keep in mind in this design is that we need to
|
||||||
communicate to developers that the *trace* will cause the body snapshot
|
communicate to developers that the *trace* will cause the body snapshot
|
||||||
|
@ -1423,14 +1445,14 @@ to be read as part of the round trip. This fact occurs because OONI's
|
||||||
definition of a request-response transaction includes the response body
|
definition of a request-response transaction includes the response body
|
||||||
(or a snapshot) while Go does not include a body in http.Response
|
(or a snapshot) while Go does not include a body in http.Response
|
||||||
but allows for streaming the body on demand. Because reading all the
|
but allows for streaming the body on demand. Because reading all the
|
||||||
body with netxlite.ReadAllContext without any limit bound is unsafe (as
|
body with `netxlite.ReadAllContext` without any limit bound is unsafe (as
|
||||||
it could consume lots of RAM, and we're not always running on systems
|
it could consume lots of RAM, and we're not always running on systems
|
||||||
with lots of RAM), the example was already limiting the response
|
with lots of RAM), the example was already limiting the response
|
||||||
body length before we introduced data collection. Yet, with the introduction of
|
body length before we introduced data collection. Yet, with the introduction of
|
||||||
data collection, the explicit netxlite.ReadAllContext is now reading
|
data collection, the explicit `netxlite.ReadAllContext` is now reading
|
||||||
from memory rather than from the network because the body snapshot has
|
from memory rather than from the network because the body snapshot has
|
||||||
already been read. So, we need to ensure that developers know that
|
already been read. So, we need to ensure that developers know that
|
||||||
netxlite.ReadAllContext cannot be used to measure/estimate the
|
`netxlite.ReadAllContext` cannot be used to measure/estimate the
|
||||||
download speed. (In such a case, one would need to either use a
|
download speed. (In such a case, one would need to either use a
|
||||||
different transport or not collect any snapshot and then read
|
different transport or not collect any snapshot and then read
|
||||||
the whole body directly from the network--so perhaps we
|
the whole body directly from the network--so perhaps we
|
||||||
|
@ -1453,16 +1475,16 @@ will always be some form of *limited* step-by-step, where we will always
|
||||||
split DNS lookup and endpoint measurements as we already do in
|
split DNS lookup and endpoint measurements as we already do in
|
||||||
dnscheck to ensure we measure \~all IP addresses.
|
dnscheck to ensure we measure \~all IP addresses.
|
||||||
|
|
||||||
Compared to measurex, I think step-by-step is \~better because it does
|
Compared to `measurex`, I think step-by-step is \~better because it does
|
||||||
not require anyone to learn more beyond how to use netxlite instead of
|
not require anyone to learn more beyond how to use `netxlite` instead of
|
||||||
the standard library. (BTW, we cannot really get rid of netxlite because
|
the standard library. (BTW, we cannot really get rid of `netxlite` because
|
||||||
we have measurement requirements that call for wrapping and extending
|
we have measurement requirements that call for wrapping and extending
|
||||||
the standard library or to provide enhancements beyond the stdlib
|
the standard library or to provide enhancements beyond the stdlib
|
||||||
functionality.)
|
functionality.)
|
||||||
|
|
||||||
Regarding the way to implement tracing, from the above discussion, it is
|
Regarding the way to implement tracing, from the above discussion, it is
|
||||||
clear that we should move away from the wrapping approach because it
|
clear that **we should move away from the wrapping approach because it
|
||||||
does not allow us to correctly collect specific events. (To be fair, it
|
does not allow us to correctly collect specific events**. (To be fair, it
|
||||||
could allow us to do that, but it would entail
|
could allow us to do that, but it would entail
|
||||||
significant wrapping efforts.) I would therefore recommend rewriting
|
significant wrapping efforts.) I would therefore recommend rewriting
|
||||||
tracing to use the context (ugh!) but to wrap this implementation inside
|
tracing to use the context (ugh!) but to wrap this implementation inside
|
||||||
|
@ -1538,21 +1560,21 @@ func NewArchivalTLSOrQUICHandshakeResult(index int64, started time.Time,
|
||||||
// package internal/netxlite
|
// package internal/netxlite
|
||||||
|
|
||||||
func (thx *tlsHandshakerConfigurable) Handshake(ctx context.Context,
|
func (thx *tlsHandshakerConfigurable) Handshake(ctx context.Context,
|
||||||
conn net.Conn, config \*tls.Config) (net.Conn, tls.ConnectionState, error) {
|
conn net.Conn, config \*tls.Config) (net.Conn, tls.ConnectionState, error) {
|
||||||
// ...
|
// ...
|
||||||
remoteAddr := conn.RemoteAddr().String() // +++ (i.e., added line)
|
remoteAddr := conn.RemoteAddr().String() // +++ (i.e., added line)
|
||||||
trace := ContextTraceOrDefault(ctx) // +++
|
trace := ContextTraceOrDefault(ctx) // +++
|
||||||
started := time.Now() // +++
|
started := time.Now() // +++
|
||||||
err := tlsconn.HandshakeContext(ctx)
|
err := tlsconn.HandshakeContext(ctx)
|
||||||
finished := time.Now() // +++
|
finished := time.Now() // +++
|
||||||
if err != nil {
|
if err != nil {
|
||||||
trace.OnTLSHandshake(started, remoteAddr, config, // +++
|
trace.OnTLSHandshake(started, remoteAddr, config, // +++
|
||||||
tls.ConnectionState{}, err, finished) // +++
|
tls.ConnectionState{}, err, finished) // +++
|
||||||
return nil, tls.ConnectionState{}, err
|
return nil, tls.ConnectionState{}, err
|
||||||
}
|
}
|
||||||
state := tlsconn.connectionState()
|
state := tlsconn.connectionState()
|
||||||
trace.OnTLSHandshake(started, remoteAddr, config, // +++
|
trace.OnTLSHandshake(started, remoteAddr, config, // +++
|
||||||
state, nil, finished) // +++
|
state, nil, finished) // +++
|
||||||
return tlsconn, state, nil
|
return tlsconn, state, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -1563,8 +1585,8 @@ func ContextWithTrace(ctx context.Context, trace model.Trace) context.Context {
|
||||||
// package internal/model
|
// package internal/model
|
||||||
|
|
||||||
type Trace interface {
|
type Trace interface {
|
||||||
OnTLSHandshake(started time.Time, remoteAddr string, config *tls.Config,
|
OnTLSHandshake(started time.Time, remoteAddr string, config *tls.Config,
|
||||||
state tls.ConnectionState, err error, finished time.Time)
|
state tls.ConnectionState, err error, finished time.Time)
|
||||||
|
|
||||||
// ...
|
// ...
|
||||||
}
|
}
|
||||||
|
@ -1580,7 +1602,7 @@ can stick with the current model where we have possibly unbounded
|
||||||
traces. (This is just an implementation detail that does not matter much
|
traces. (This is just an implementation detail that does not matter much
|
||||||
regarding the overall design.)
|
regarding the overall design.)
|
||||||
|
|
||||||
### Smooth transition
|
### 3.1. Smooth transition
|
||||||
|
|
||||||
We should do incremental refactoring. We should create a few issues
|
We should do incremental refactoring. We should create a few issues
|
||||||
describing these design aspects and summarize what would be the way
|
describing these design aspects and summarize what would be the way
|
||||||
|
@ -1600,7 +1622,7 @@ earlier with a simplified tree with less measurement-supporting
|
||||||
libraries. It also seems that dash, hhfm, and hirl can be migrated quite
|
libraries. It also seems that dash, hhfm, and hirl can be migrated quite
|
||||||
easily away from netx and urlgetter.
|
easily away from netx and urlgetter.
|
||||||
|
|
||||||
### Netxlite scope change
|
### 3.2. Netxlite scope change
|
||||||
|
|
||||||
If we move forward with this plan, we will slightly change the scope of
|
If we move forward with this plan, we will slightly change the scope of
|
||||||
netxlite to include lightweight support for collecting traces. We
|
netxlite to include lightweight support for collecting traces. We
|
||||||
|
@ -1613,7 +1635,7 @@ policy for saving measurements by implementing model.Trace properly. So,
|
||||||
we should also amend the documentation of netxlite to
|
we should also amend the documentation of netxlite to
|
||||||
explicitly mention support for tracing as a new concern.
|
explicitly mention support for tracing as a new concern.
|
||||||
|
|
||||||
### Cleanups
|
### 3.3. Cleanups
|
||||||
|
|
||||||
If we implement this step-by-step change, we no longer need a "flat" data
|
If we implement this step-by-step change, we no longer need a "flat" data
|
||||||
format. We use the flat data format for processing the results
|
format. We use the flat data format for processing the results
|
||||||
|
@ -1629,7 +1651,7 @@ concern is with the error wrapping, which probably should be in the same
|
||||||
place where we are using the context to inject a trace to ensure
|
place where we are using the context to inject a trace to ensure
|
||||||
that error wrapping and tracing happen together.)
|
that error wrapping and tracing happen together.)
|
||||||
|
|
||||||
## Reviews
|
## 4. Document Reviews
|
||||||
|
|
||||||
### 2022-06-09 - Review: Arturo
|
### 2022-06-09 - Review: Arturo
|
||||||
|
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
# Directory github.com/ooni/probe-engine/experiment
|
# Directory github.com/ooni/probe-cli/internal/engine/experiment
|
||||||
|
|
||||||
This directory contains the implementation of all the supported
|
This directory contains the implementation of all the supported
|
||||||
experiments, one for each directory. The [OONI spec repository
|
experiments, one for each directory. The [OONI spec repository
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
# Package github.com/ooni/probe-engine/model
|
# Package github.com/ooni/probe-cli/internal/model
|
||||||
|
|
||||||
Shared data structures and interfaces. We include in this
|
Shared data structures and interfaces. We include in this
|
||||||
package the most fundamental types. Use `go doc` to get
|
package the most fundamental types. Use `go doc` to get
|
||||||
|
|
|
@ -57,7 +57,7 @@ type Measurement struct {
|
||||||
|
|
||||||
// MeasurementStartTimeSaved is the moment in time when we
|
// MeasurementStartTimeSaved is the moment in time when we
|
||||||
// started the measurement. This is not included into the JSON
|
// started the measurement. This is not included into the JSON
|
||||||
// and is only used within probe-engine as a "zero" time.
|
// and is only used within the ./internal pkg as a "zero" time.
|
||||||
MeasurementStartTimeSaved time.Time `json:"-"`
|
MeasurementStartTimeSaved time.Time `json:"-"`
|
||||||
|
|
||||||
// Options contains command line options
|
// Options contains command line options
|
||||||
|
|
Loading…
Reference in New Issue
Block a user