doc(step-by-step): readability improvements (#820)

This diff contains readability improvements for the step-by-step design document.

Co-authored-by: Simone Basso <bassosimone@gmail.com>
This commit is contained in:
Ain Ghazal 2022-06-30 09:55:18 +02:00 committed by GitHub
parent 797dd27ffc
commit 74aebedac3
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
4 changed files with 144 additions and 122 deletions

View File

@ -1,16 +1,19 @@
# Step-by-step measurements # Step-by-step measurements
| | | | | |
|:-------------|:-------------| |--------------|------------------------------------------------|
| Author | [@bassosimone](https://github.com/bassosimone) | | Author | [@bassosimone](https://github.com/bassosimone) |
| Last-Updated | 2022-06-13 | | Last-Updated | 2022-06-29 |
| Reviewed-by | [@hellais](https://github.com/hellais) | | Reviewed-by | [@hellais](https://github.com/hellais) |
| Reviewed-by | [@DecFox](https://github.com/DecFox/) | | Reviewed-by | [@DecFox](https://github.com/DecFox/) |
| Status | approved | | Reviewed-by | [@ainghazal](https://github.com/ainghazal/) |
| Obsoletes | [dd-002-netx.md](dd-002-netx.md) | | Status | approved |
| Obsoletes | [dd-002-netx.md](dd-002-netx.md) |
*Abstract.* The original [netx design document](dd-002-netx.md) is now two ## Abstract
years old. Since we wrote such a document, we amended the overall design
The original [netx design document](dd-002-netx.md) is now two
years old. Since we wrote that document, we amended the overall design
several times. The four major design changes were: several times. The four major design changes were:
1. saving rather than emitting 1. saving rather than emitting
@ -24,50 +27,56 @@ pattern [ooni/probe-engine#522](https://github.com/ooni/probe-engine/pull/522);
4. measurex [ooni/probe-cli#528](https://github.com/ooni/probe-cli/pull/528). 4. measurex [ooni/probe-cli#528](https://github.com/ooni/probe-cli/pull/528).
In this (long) design document, we will revisit the original problem proposed by In this (long) design document, we will revisit the original problem proposed by
[df-002-netx.md](df-002-netx.md), in light of what we changed and learned from the [dd-002-netx.md](dd-002-netx.md), in light of all the changes and lessons
changes we applied. We will highlight the significant pain points of the current learned since then. We will highlight the significant pain points of the
implementation, which are the following: current implementation, which are the following:
1. that the measurement library API is significantly different from the Go stdlib 1. The measurement library API is significantly different from the Go stdlib
API, therefore violating this original `netx` design goal: that writing a new API. This violates one of the central design goals for `netx`: that writing a new
experiment means using slightly different constructors that deviate from the standard experiment would involve using constructors very similar to the standard
library only to meet specific measurement goals we have; library. Such deviations were supposed to be made only to meet our specific
measurement goals;
2. that the decorator pattern leads to complexity in creating measurement types, 2. The decorator pattern has lead to complexity in creating measurement types,
which in turn seems to be the reason why the previous issue exists; which in turn seems to be the reason why the previous issue exists;
3. that the decorator pattern does not allow us to precisely collect all the 3. The decorator pattern does not allow us to precisely collect all the
data that matters for events such as TCP connect and DNS round trips using data that matters for certain events (such as TCP connect and DNS round trips using
a custom transport, thus suggesting that we should revisit our choice of using a custom transport). This suggests that we should revisit our choice of using
decorators and revert back to some form of _constructor based injection_ to decorators, and revert back to some form of _constructor based injection_ to
inject a data type suitable for saving events. inject a data type suitable for saving events.
In doing that, we will also propose an incremental plan for moving the tree Finally, this document also proposes an incremental plan for moving the tree
forward from [the current state](https://github.com/ooni/probe-cli/tree/1685ef75b5a6a0025a1fd671625b27ee989ef111) forward from [the current state](https://github.com/ooni/probe-cli/tree/1685ef75b5a6a0025a1fd671625b27ee989ef111)
to a state where complexity is transferred from the measurement-support library to to a state in which the complexity has been transferred from the
the implementation of each individual network experiment. measurement-support library to the implementation of each individual network
experiment.
## Index ## Index
In [netxlite: the underlying library](#netxlite-the-underlying-network-library) we There are four main sections in this document:
describe the current design of the underlying network library.
In [measurement tactics](#measurement-tactics) we provide an historical perspective [1. Netxlite: the underlying library](#1-netxlite-the-underlying-network-library)
on the measurement tactics, we adopted or just tried. describes the current design of the underlying network library.
The [step-by-step refactoring proposal](#step-by-step-refactoring-proposal) [2. Measurement tactics](#2-measurement-tactics) gives an historical perspective
contains the main contribution of this design document: a proposal to refactor on different measurement tactics we adopted or tried in the past, and reflects
the existing codebase to address our current measurement-code problems. on their merits and downsides.
The [reviews](#reviews) section contains information about reviews. [3. Step-by-step refactoring proposal](#3-step-by-step-refactoring-proposal)
contains the main contribution of this design document: a concrete proposal to
refactor the existing codebase to address our current measurement-code
problems.
## netxlite: the underlying network library [4. Document reviews](#4-document-reviews) contains information about reviews of this document.
This section describes `netxlite`, the underlying network library, from a ## 1. netxlite: the underlying network library
This section describes `netxlite`, the underlying network library, from an
historical perspective. We explain our data collection needs, and what types historical perspective. We explain our data collection needs, and what types
from the standard library we're using as patterns. from the standard library we're using as patterns.
### Measurement Observations ### 1.1. Measurement Observations
Most OONI experiments need to observe and give meaning to these events: Most OONI experiments need to observe and give meaning to these events:
@ -121,7 +130,7 @@ defines how we archive experiment results as a set of observations.
(Orthogonally, we may also want to improve the data format, but this is (Orthogonally, we may also want to improve the data format, but this is
not under discussion now.) not under discussion now.)
### Error Wrapping ### 1.2. Error Wrapping
The OONI data format also specifies [how we should represent The OONI data format also specifies [how we should represent
errors](https://github.com/ooni/spec/blob/master/data-formats/df-007-errors.md). errors](https://github.com/ooni/spec/blob/master/data-formats/df-007-errors.md).
@ -140,7 +149,7 @@ DNS response messages or the syscall error returned by a Read call. By
adding this, we would give those who analyze the data information to adding this, we would give those who analyze the data information to
evaluate the correctness of a measurement. evaluate the correctness of a measurement.
### Go Stdlib ### 1.3 Go Stdlib
The Go standard library provides the following structs and interfaces The Go standard library provides the following structs and interfaces
that we can use for measuring: that we can use for measuring:
@ -235,7 +244,7 @@ Apart from the stdlib and quic-go, the only other significant network code
dependency is [miekg/dns](https://github.com/miekg/dns) dependency is [miekg/dns](https://github.com/miekg/dns)
for custom DNS resolvers (e.g., DNS-over-HTTPS). for custom DNS resolvers (e.g., DNS-over-HTTPS).
### Network Extensions ### 1.4. Network Extensions
A reasonable idea is to try to use types as close as possible to the A reasonable idea is to try to use types as close as possible to the
standard library. By following this strategy, we can compose our code standard library. By following this strategy, we can compose our code
@ -281,12 +290,20 @@ netxlite as a dependency.
itself with measurements unlike [the original netx](df-002-netx.md), which contained itself with measurements unlike [the original netx](df-002-netx.md), which contained
both basic networking wrappers and network measurement code.) both basic networking wrappers and network measurement code.)
## Measurement Tactics ## 2. Measurement Tactics
Each subsection presents a different tactic for collecting measurement Each subsection presents a different tactic for collecting measurement
observations. observations, while reflecting on their pros and cons.
### Context-Based Tracing We revisit four distinct tactics:
* [(1) Context-based tracing](#21-context-based-tracing),
* [(2) Decorator-based tracing](#22-decorator-based-tracing),
* [(3) Step-by-step measurements](#23-step-by-step-measurements), and
* [(4) Measurex: splitting DNSLookup and Endpoint Measurements](#24-measurex-splitting-dnslookup-and-endpoint-measurements).
### 2.1. Context-Based Tracing
This tactic is the first one we implemented. We call this approach This tactic is the first one we implemented. We call this approach
"tracing" because it produces a trace of the events, and it's "tracing" because it produces a trace of the events, and it's
@ -304,7 +321,7 @@ the stdlib allows one to use the context to perform network tracing
we progressively abandoned `httptrace` as our tracing needs we progressively abandoned `httptrace` as our tracing needs
become more complex than what `httptrace` could provide us with. become more complex than what `httptrace` could provide us with.
#### How this tactic feels like #### How context-tracing feels like
I tried to adapt how this code would look if we used it now. As I tried to adapt how this code would look if we used it now. As
[dd-002-netx.md](dd-002-netx.md) suggests, here I am trying to separate [dd-002-netx.md](dd-002-netx.md) suggests, here I am trying to separate
@ -346,7 +363,7 @@ As you can see, I have marked with fire emojis where we
need to figure out what happened by reading the trace. We are going need to figure out what happened by reading the trace. We are going
to discuss this issue in the next section. to discuss this issue in the next section.
#### Issue #1: distance between collection and interpretation #### Issue #1 with context tracing: distance between collection and interpretation
The nice part of this approach is that the network-interaction part of The nice part of this approach is that the network-interaction part of
the experiment is \~easy. The bad part is that we must figure out the experiment is \~easy. The bad part is that we must figure out
@ -422,7 +439,7 @@ respect and make result-determining code more obvious and closer to the code
that performs the measurement. We will eventually come to fix this issue later that performs the measurement. We will eventually come to fix this issue later
in this document. For now, let us continue to analyze this tactic. in this document. For now, let us continue to analyze this tactic.
#### Issue #2: the Context is magic and implicit #### Issue #2 with context tracing: the Context is magic and implicit
Another pain point is that we're using the Context's magic. What happens Another pain point is that we're using the Context's magic. What happens
there feels more obscure than explicit initialization for performing there feels more obscure than explicit initialization for performing
@ -444,7 +461,7 @@ dialer := saver.WrapQUICDialer(dialer)
conn, err := dialer.DialContext(ctx, /* ... */) conn, err := dialer.DialContext(ctx, /* ... */)
``` ```
In the latter case, it's evident that we're *decorating* the original In the later case, it's evident that we're *decorating* the original
dialer with an extended dialer that knows how to perform network dialer with an extended dialer that knows how to perform network
measurements. In the former case, it feels magic that we're setting some measurements. In the former case, it feels magic that we're setting some
value as an opaque any inside of the context, and there's a documented value as an opaque any inside of the context, and there's a documented
@ -460,7 +477,7 @@ explicitly wrapping a type. I will discuss this topic when we
analyze the next tactic because the next tactic is all about reducing analyze the next tactic because the next tactic is all about reducing
the cognitive burden and avoiding the context. the cognitive burden and avoiding the context.
#### Issue #3: we obtain a flat trace #### Issue #3 with context tracing: we obtain a flat trace
The most straightforward implementation of this approach yields a flat trace. This The most straightforward implementation of this approach yields a flat trace. This
means that one needs to be careful to understand, say, which events are means that one needs to be careful to understand, say, which events are
@ -474,7 +491,7 @@ assign numbers to distinct HTTP round trips and DNS lookups so that
later it is possible to make sense of the trace. This was indeed the later it is possible to make sense of the trace. This was indeed the
approach we chose initially. However, this approach puts more pressure approach we chose initially. However, this approach puts more pressure
on the context magic because it does not just suffice to wrap the on the context magic because it does not just suffice to wrap the
context once with WithValue, but you need to additionally wrap it when context once with `WithValue`, but you need to additionally wrap it when
you descend into sub-operations. (Other approaches are possible, but I you descend into sub-operations. (Other approaches are possible, but I
will discuss this one because it looks conceptually cleaner to will discuss this one because it looks conceptually cleaner to
create a new "node" in the trace like it was a DOM.) create a new "node" in the trace like it was a DOM.)
@ -529,7 +546,7 @@ censorship. See our [DoT in Iran
research](https://ooni.org/post/2020-iran-dot/) for more research](https://ooni.org/post/2020-iran-dot/) for more
information.) information.)
### Decoration-Based Tracing ### 2.2. Decorator-Based Tracing
In [probe-engine#359](https://github.com/ooni/probe-engine/issues/359), we In [probe-engine#359](https://github.com/ooni/probe-engine/issues/359), we
started planning refactoring of `netx` to solve the identified issues started planning refactoring of `netx` to solve the identified issues
@ -903,7 +920,7 @@ the `CNAME` and getaddrinfo's return code very easily):
} }
``` ```
#### Concluding remarks #### Concluding remarks on decorator-based tracing
Historically, the decorator-based approach helped simplify Historically, the decorator-based approach helped simplify
the codebase (probably because what we had previously was the codebase (probably because what we had previously was
@ -922,29 +939,29 @@ Whatever choice we make, it should probably include some form of
dependency injection for a trace that allows us to collect the events we dependency injection for a trace that allows us to collect the events we
care about more precisely and with less effort. care about more precisely and with less effort.
### Step-by-Step Measurements ### 2.3. Step-by-step measurements
We had previous conversations around simplifying how we perform We've had many conversations about how to simplify the way we do measurements.
measurements (e.g., I remember a conversation with For instance, [Vinicius](https://github.com/fortuna) at some point advocated
[Vinicius](https://github.com/fortuna), where he advocated for decomposing measurements in simple operations. He rightfully pointed out
for decomposing measurements in simple operations, and he that tracing is excellent for debugging, but it complicates to assign
rightfully pointed out that tracing is excellent for debugging meaning to each measurement.
but complicating assigning meaning to measurements).
We also mentioned that, in the codebase, we documented that `netx` was discouraged as an We had documented in the codebase that `netx` was discouraged as an approach
approach for new experiments. The first chance to try a different tactic for new experiments. We got the first chance to try a
was the development of the `websteps` prototype. We tried to implement different tactic while developing the `websteps` prototype.
step-by-step measurements, which, in its most radical form, calls for
performing each relevant step in isolation, immediately saving a In `websteps`, we tried to implement step-by-step measurements: in its most
small trace and interpreting it before moving on to the next step radical form, this calls for performing each relevant step in isolation,
(unless there's an error, in which case you typically stop). immediately saving a small trace and interpreting it before moving on to the
next step (unless there's an error, in which case you typically stop).
Looking back at the Go stdlib API, the main blocker to implementing Looking back at the Go stdlib API, the main blocker to implementing
this tactic is how to reconcile it with HTTP transports, which expects to this tactic is how to reconcile it with HTTP transports, which expects to
dial and control their own connections. Luckily, dial and control their own connections. Luckily,
[Kathrin](https://github.com/kelmenhorst) [Kathrin](https://github.com/kelmenhorst)
[implemented](https://github.com/ooni/probe-cli/pull/432) the [implemented](https://github.com/ooni/probe-cli/pull/432) the
following trick that solved such an issue: following trick that allows us to solve this issue:
```Go ```Go
// NewSingleUseDialer returns a "single use" dialer. The first // NewSingleUseDialer returns a "single use" dialer. The first
@ -978,8 +995,8 @@ func (s *dialerSingleUse) DialContext(ctx context.Context, network string, addr
With a "single-use" dialer, we provide an HTTPTransport with a fake With a "single-use" dialer, we provide an HTTPTransport with a fake
dialer that provides a previously-established dialer that provides a previously-established
connection to the transport itself. The following snippet shows connection to the transport itself. The following snippet shows
code from my first naive attempt at writing code using this tactic, code from my first naive attempt at writing code using this approach. The pain
where originally identified pain points have been emphasized: points we had originally identified have been emphasized:
```Go ```Go
dialer := netxlite.NewDialerWithoutResolver() dialer := netxlite.NewDialerWithoutResolver()
@ -1015,31 +1032,31 @@ if err != nil {
``` ```
Let's discuss the good parts before moving on to the bad parts. It is Let's discuss the good parts before moving on to the bad parts. It is
dead obvious which operation failed and why, and we know what went dead obvious which operation failed and why, and **we know what went
wrong and can analyze the observations immediately. wrong and can analyze the observations immediately**.
Additionally, if you ask someone who knows the standard library to write Additionally, if you ask someone who knows the standard library to write
an experiment that provides information about TCP connect, TLS handshake, and an experiment that provides information about TCP connect, TLS handshake, and
HTTP round trip using `netxlite`, they would probably write something HTTP round trip using `netxlite`, they would probably write something
similar to the above code. similar to the above code.
#### Issue #1: no persistent connections #### Issue #1 with step-by-step approach: no persistent connections
Without adding extra complexity, we lose the possibility of using Without adding extra complexity, we lose the possibility to use
persistent connections. This may not be a huge problem except for persistent connections. This may not be a huge problem, except for
redirects. Each redirect will require us to set up a new connection, redirects. Each redirect will require us to set up a new connection,
even though an ordinary http.Transport would probably have reused an even though an ordinary `http.Transport` would probably have reused an
existing one. existing one.
Because we're measuring censorship, I would argue it's OK to not reuse Because we're measuring censorship, I would argue it's OK to not reuse
connections. Sure, the measurement could be slower, but we'd get connections. Sure, the measurement can be slower, but we'll also get
more data points on TCP/IP and TLS blocking. more data points on TCP/IP and TLS blocking.
#### Issue #2: requires manual handling of redirects #### Issue #2 with step-by-step approach: requires manual handling of redirects
Because we cannot reuse connections easily, we cannot rely on Because we cannot reuse connections easily, we cannot rely on
&http.Client{} to perform redirections automatically for us. This is why `&http.Client{}` to perform redirections automatically for us. This is why
websteps implements HTTP redirection manually. `websteps` implements HTTP redirection manually.
While it may seem that discussing redirects is out of scope, While it may seem that discussing redirects is out of scope,
historically I had been reluctant to switch to a step-by-step model historically I had been reluctant to switch to a step-by-step model
@ -1075,7 +1092,7 @@ Because this problem of redirection is fundamental to many experiments
(not only webconnectivity but also, e.g., whatsapp), any step-by-step (not only webconnectivity but also, e.g., whatsapp), any step-by-step
approach library needs this functionality. approach library needs this functionality.
#### Issue #3: DRY pressure #### Issue #3 with step-by-step approach: DRY pressure
Before, I said that saving traces seems complicated. That is not entirely Before, I said that saving traces seems complicated. That is not entirely
true. Depending on the extent to which we are willing to suffer the pain of true. Depending on the extent to which we are willing to suffer the pain of
@ -1195,7 +1212,7 @@ some network operation. If the API is doing too much, I
might not have the ability to hook me into it and run the follow-up might not have the ability to hook me into it and run the follow-up
experiment right after the operation I needed to do. experiment right after the operation I needed to do.
#### Concluding remarks #### Concluding remarks on step-by-step measurements
If we have a way to collect observations, If we have a way to collect observations,
this approach certainly has the advantage of having some this approach certainly has the advantage of having some
@ -1215,13 +1232,16 @@ probably want to split DNS and other operations to get a chance to
test all (or many) of the available IP addresses and use tracing within DNS and test all (or many) of the available IP addresses and use tracing within DNS and
other operations. other operations.
### DNSLookup and Endpoint Measurements ### 2.4. measurex: splitting DNSLookup and Endpoint Measurements
This approach is currently implemented as measurex. A DNSLookup This fourth and last approach of the ones we'll discuss is currently
measurement should be an obvious concept, while it's probably less implemented in `measurex`.
clear what's an endpoint measurement. So, let's clarify.
An endpoint measurement is one of the following operations: A **DNSLookup measurement** is perhaps an obvious concept, but an **endpoint
measurement** is probably not obvious. So, let's first clarify the
terminology:
We define an **endpoint measurement** as one of the following operations:
1. given a TCP endpoint (IP address and port), TCP connect to it; 1. given a TCP endpoint (IP address and port), TCP connect to it;
@ -1249,12 +1269,12 @@ addresses, so that pattern was already in OONI somehow (at least for the
most important experiment measuring web censorship.) most important experiment measuring web censorship.)
Another interesting observation about the above set of operations is Another interesting observation about the above set of operations is
that each of them could fail exactly once. The DNSLookup could fail or that each of them could fail **exactly once**. The DNSLookup could fail or
yield addresses, and TCP connect could fail or succeed. In the TCP connect yield addresses, and TCP connect could fail or succeed. In the TCP connect
plus TLS handshake case, you stop there if you fail the TCP connect. And plus TLS handshake case, you stop there if you fail the TCP connect. And
so on. so on.
Because of this reasoning, one could say that the measurex tactic Because of this reasoning, one could say that the `measurex` tactic
is equivalent to the previous one in relatively easily identifying is equivalent to the previous one in relatively easily identifying
the failed operation and filling the measurement. That seems to be an argument the failed operation and filling the measurement. That seems to be an argument
for having a library containing code to simplify measurements. for having a library containing code to simplify measurements.
@ -1263,15 +1283,17 @@ an entirely new API*. After careful consideration, it seems preferable to
select an API that is closer to what a typical Go programmer would select an API that is closer to what a typical Go programmer would
expect. expect.
## Step-by-step refactoring proposal ## 3. Step-by-step refactoring proposal
Finally, all the discussion is in place to get to a concrete proposal.
I tried to reimplement the telegram experiment using a pure step-by-step approach I tried to reimplement the telegram experiment using a pure step-by-step approach
([here's the ([here's the
gist](https://gist.github.com/bassosimone/f6e680d35805174d1f150bc15ef754af)). gist](https://gist.github.com/bassosimone/f6e680d35805174d1f150bc15ef754af)).
It looks fine, but one ends up writing a support library such as measurex. Yet, It looks fine, but one ends up writing a support library such as `measurex`. Yet,
as noted above, the API exposed by such a measurement as noted above, the API exposed by such a measurement
library matters, and an API familiar to Go developers seems preferable library matters, and an API familiar to Go developers seems preferable
to the API implemented by measurex. to the API implemented by `measurex`.
There are two key insights I derived from my telegram PoC. There are two key insights I derived from my telegram PoC.
@ -1415,7 +1437,7 @@ the PoC is that *traces are numbered*. This is not what happens
currently in OONI Probe, but it is *beneficial*. By numbering currently in OONI Probe, but it is *beneficial*. By numbering
traces, we can quite easily tell which existing event belongs to which traces, we can quite easily tell which existing event belongs to which
specific submeasurement. (If there's no need to number traces, we can specific submeasurement. (If there's no need to number traces, we can
just set a zero index to all the traces we collect, *e passa la paura*.) just set a zero index to all the traces we collect, *[e passa la paura](https://context.reverso.net/traduzione/italiano-inglese/passa+la+paura)*.)
One minor aspect to keep in mind in this design is that we need to One minor aspect to keep in mind in this design is that we need to
communicate to developers that the *trace* will cause the body snapshot communicate to developers that the *trace* will cause the body snapshot
@ -1423,14 +1445,14 @@ to be read as part of the round trip. This fact occurs because OONI's
definition of a request-response transaction includes the response body definition of a request-response transaction includes the response body
(or a snapshot) while Go does not include a body in http.Response (or a snapshot) while Go does not include a body in http.Response
but allows for streaming the body on demand. Because reading all the but allows for streaming the body on demand. Because reading all the
body with netxlite.ReadAllContext without any limit bound is unsafe (as body with `netxlite.ReadAllContext` without any limit bound is unsafe (as
it could consume lots of RAM, and we're not always running on systems it could consume lots of RAM, and we're not always running on systems
with lots of RAM), the example was already limiting the response with lots of RAM), the example was already limiting the response
body length before we introduced data collection. Yet, with the introduction of body length before we introduced data collection. Yet, with the introduction of
data collection, the explicit netxlite.ReadAllContext is now reading data collection, the explicit `netxlite.ReadAllContext` is now reading
from memory rather than from the network because the body snapshot has from memory rather than from the network because the body snapshot has
already been read. So, we need to ensure that developers know that already been read. So, we need to ensure that developers know that
netxlite.ReadAllContext cannot be used to measure/estimate the `netxlite.ReadAllContext` cannot be used to measure/estimate the
download speed. (In such a case, one would need to either use a download speed. (In such a case, one would need to either use a
different transport or not collect any snapshot and then read different transport or not collect any snapshot and then read
the whole body directly from the network--so perhaps we the whole body directly from the network--so perhaps we
@ -1453,16 +1475,16 @@ will always be some form of *limited* step-by-step, where we will always
split DNS lookup and endpoint measurements as we already do in split DNS lookup and endpoint measurements as we already do in
dnscheck to ensure we measure \~all IP addresses. dnscheck to ensure we measure \~all IP addresses.
Compared to measurex, I think step-by-step is \~better because it does Compared to `measurex`, I think step-by-step is \~better because it does
not require anyone to learn more beyond how to use netxlite instead of not require anyone to learn more beyond how to use `netxlite` instead of
the standard library. (BTW, we cannot really get rid of netxlite because the standard library. (BTW, we cannot really get rid of `netxlite` because
we have measurement requirements that call for wrapping and extending we have measurement requirements that call for wrapping and extending
the standard library or to provide enhancements beyond the stdlib the standard library or to provide enhancements beyond the stdlib
functionality.) functionality.)
Regarding the way to implement tracing, from the above discussion, it is Regarding the way to implement tracing, from the above discussion, it is
clear that we should move away from the wrapping approach because it clear that **we should move away from the wrapping approach because it
does not allow us to correctly collect specific events. (To be fair, it does not allow us to correctly collect specific events**. (To be fair, it
could allow us to do that, but it would entail could allow us to do that, but it would entail
significant wrapping efforts.) I would therefore recommend rewriting significant wrapping efforts.) I would therefore recommend rewriting
tracing to use the context (ugh!) but to wrap this implementation inside tracing to use the context (ugh!) but to wrap this implementation inside
@ -1538,21 +1560,21 @@ func NewArchivalTLSOrQUICHandshakeResult(index int64, started time.Time,
// package internal/netxlite // package internal/netxlite
func (thx *tlsHandshakerConfigurable) Handshake(ctx context.Context, func (thx *tlsHandshakerConfigurable) Handshake(ctx context.Context,
conn net.Conn, config \*tls.Config) (net.Conn, tls.ConnectionState, error) { conn net.Conn, config \*tls.Config) (net.Conn, tls.ConnectionState, error) {
// ... // ...
remoteAddr := conn.RemoteAddr().String() // +++ (i.e., added line) remoteAddr := conn.RemoteAddr().String() // +++ (i.e., added line)
trace := ContextTraceOrDefault(ctx) // +++ trace := ContextTraceOrDefault(ctx) // +++
started := time.Now() // +++ started := time.Now() // +++
err := tlsconn.HandshakeContext(ctx) err := tlsconn.HandshakeContext(ctx)
finished := time.Now() // +++ finished := time.Now() // +++
if err != nil { if err != nil {
trace.OnTLSHandshake(started, remoteAddr, config, // +++ trace.OnTLSHandshake(started, remoteAddr, config, // +++
tls.ConnectionState{}, err, finished) // +++ tls.ConnectionState{}, err, finished) // +++
return nil, tls.ConnectionState{}, err return nil, tls.ConnectionState{}, err
} }
state := tlsconn.connectionState() state := tlsconn.connectionState()
trace.OnTLSHandshake(started, remoteAddr, config, // +++ trace.OnTLSHandshake(started, remoteAddr, config, // +++
state, nil, finished) // +++ state, nil, finished) // +++
return tlsconn, state, nil return tlsconn, state, nil
} }
@ -1563,8 +1585,8 @@ func ContextWithTrace(ctx context.Context, trace model.Trace) context.Context {
// package internal/model // package internal/model
type Trace interface { type Trace interface {
OnTLSHandshake(started time.Time, remoteAddr string, config *tls.Config, OnTLSHandshake(started time.Time, remoteAddr string, config *tls.Config,
state tls.ConnectionState, err error, finished time.Time) state tls.ConnectionState, err error, finished time.Time)
// ... // ...
} }
@ -1580,7 +1602,7 @@ can stick with the current model where we have possibly unbounded
traces. (This is just an implementation detail that does not matter much traces. (This is just an implementation detail that does not matter much
regarding the overall design.) regarding the overall design.)
### Smooth transition ### 3.1. Smooth transition
We should do incremental refactoring. We should create a few issues We should do incremental refactoring. We should create a few issues
describing these design aspects and summarize what would be the way describing these design aspects and summarize what would be the way
@ -1600,7 +1622,7 @@ earlier with a simplified tree with less measurement-supporting
libraries. It also seems that dash, hhfm, and hirl can be migrated quite libraries. It also seems that dash, hhfm, and hirl can be migrated quite
easily away from netx and urlgetter. easily away from netx and urlgetter.
### Netxlite scope change ### 3.2. Netxlite scope change
If we move forward with this plan, we will slightly change the scope of If we move forward with this plan, we will slightly change the scope of
netxlite to include lightweight support for collecting traces. We netxlite to include lightweight support for collecting traces. We
@ -1613,7 +1635,7 @@ policy for saving measurements by implementing model.Trace properly. So,
we should also amend the documentation of netxlite to we should also amend the documentation of netxlite to
explicitly mention support for tracing as a new concern. explicitly mention support for tracing as a new concern.
### Cleanups ### 3.3. Cleanups
If we implement this step-by-step change, we no longer need a "flat" data If we implement this step-by-step change, we no longer need a "flat" data
format. We use the flat data format for processing the results format. We use the flat data format for processing the results
@ -1629,7 +1651,7 @@ concern is with the error wrapping, which probably should be in the same
place where we are using the context to inject a trace to ensure place where we are using the context to inject a trace to ensure
that error wrapping and tracing happen together.) that error wrapping and tracing happen together.)
## Reviews ## 4. Document Reviews
### 2022-06-09 - Review: Arturo ### 2022-06-09 - Review: Arturo

View File

@ -1,4 +1,4 @@
# Directory github.com/ooni/probe-engine/experiment # Directory github.com/ooni/probe-cli/internal/engine/experiment
This directory contains the implementation of all the supported This directory contains the implementation of all the supported
experiments, one for each directory. The [OONI spec repository experiments, one for each directory. The [OONI spec repository

View File

@ -1,4 +1,4 @@
# Package github.com/ooni/probe-engine/model # Package github.com/ooni/probe-cli/internal/model
Shared data structures and interfaces. We include in this Shared data structures and interfaces. We include in this
package the most fundamental types. Use `go doc` to get package the most fundamental types. Use `go doc` to get

View File

@ -57,7 +57,7 @@ type Measurement struct {
// MeasurementStartTimeSaved is the moment in time when we // MeasurementStartTimeSaved is the moment in time when we
// started the measurement. This is not included into the JSON // started the measurement. This is not included into the JSON
// and is only used within probe-engine as a "zero" time. // and is only used within the ./internal pkg as a "zero" time.
MeasurementStartTimeSaved time.Time `json:"-"` MeasurementStartTimeSaved time.Time `json:"-"`
// Options contains command line options // Options contains command line options