OONI Network Extensions

Author Simone Basso
Last-Updated 2020-04-02
Status approved

Introduction

OONI experiments send and/or receive network traffic to determine if there is blocking. We want the implementation of OONI experiments to be as simple as possible. We also want to attribute errors to the major network or protocol operation that caused them.

At the same time, we want an experiment to collect as much low-level data as possible. For example, we want to know whether and when the TLS handshake completed; what certificates were provided by the server; what TLS version was selected; and so forth. These bits of information are very useful to analyze a measurement and better classify it.

We also want to automatically or manually run follow-up measurements where we change some configuration properties and repeat the measurement. For example, we may want to configure DNS over HTTPS (DoH) and then attempt to fetch a URL again. Or we may want to detect whether there is SNI-based blocking. This package allows us to do that in other parts of probe-engine.

Rationale

As we observed in ooni/probe-engine#13, every experiment consists of two separate phases:

  1. measurement gathering

  2. measurement analysis

During measurement gathering, we perform specific actions that cause network data to be sent and/or received. During measurement analysis, we process the measurement on the device. For some experiments (e.g., Web Connectivity), this second phase also entails contacting OONI backend services that provide data useful to complete the analysis.

This package implements measurement gathering. The analysis is performed by other packages in probe-engine. The core design idea is to provide OONI-measurements-aware replacements for Go standard library interfaces, e.g., the http.RoundTripper. On top of that, we'll create all the required interfaces to achieve the measurement goals mentioned above.

We are of course writing test templates in probe-engine anyway, because we need additional abstraction, but we can take advantage of the fact that the API exposed by this package is stable by definition, because it mimics the stdlib. Also, for many experiments we can collect information pertaining to TCP, DNS, TLS, and HTTP with a single call to netx.

This code used to live at github.com/ooni/netx. On 2020-03-02 we merged github.com/ooni/netx@4f8d645bce6466bb into probe-engine because it was more practical and enabled easier refactoring.

Definitions

Consistently with Go's terminology, we define an HTTP round trip as the process in which we get a request to send; we find a suitable connection for sending it, or we create one; we send headers and possibly a body; and we receive response headers.

We also define an HTTP transaction as the process that starts with an HTTP round trip and terminates when the full response body has been read.

We define a netx replacement as a Go struct or interface that has the same interface as a Go standard library object but additionally performs measurements.

Enhanced error handling

This library MUST wrap errors such that:

  1. we can classify all errors we care about; and

  2. we can map them to major operations.

The github.com/ooni/netx/modelx package MUST contain a wrapper for Go errors named ErrWrapper that is at least like:

type ErrWrapper struct {
    Failure    string // error classification
    Operation  string // operation that caused error
    WrappedErr error  // the original error
}

func (e *ErrWrapper) Error() string {
    return e.Failure
}

Where Failure is one of the errors we care about, i.e.:

  • connection_refused: ECONNREFUSED
  • connection_reset: ECONNRESET
  • dns_bogon_error: detected bogon in DNS reply
  • dns_nxdomain_error: NXDOMAIN in DNS reply
  • eof_error: unexpected EOF on connection
  • generic_timeout_error: some timer has expired
  • ssl_invalid_hostname: certificate not valid for SNI
  • ssl_unknown_authority: cannot find CA validating certificate
  • ssl_invalid_certificate: e.g. certificate expired
  • unknown_failure <string>: any other error

Note that we care about bogons in DNS replies because they are often used to censor specific websites.

And where Operation is one of:

  • resolve: domain name resolution
  • connect: TCP connect
  • tls_handshake: TLS handshake
  • http_round_trip: reading/writing HTTP

The code in this library MUST wrap returned errors such that we can cast back to ErrWrapper during the analysis phase, using the errors package from Go 1.13 as follows:

var wrapper *modelx.ErrWrapper
if errors.As(err, &wrapper) {
    // Do something with the error
}
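
To make the wrapping contract more concrete, here is a minimal sketch of how a returned error might be classified, wrapped, and later recovered with errors.As. ErrWrapper is redeclared locally so the example is self-contained. The Unwrap method and the helpers classify, wrap, inspect, and example are assumptions of this sketch and are not part of the design above. Also, classify covers only a few of the failure strings.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"syscall"
)

// ErrWrapper mirrors the struct described in this document.
type ErrWrapper struct {
	Failure    string // error classification
	Operation  string // operation that caused error
	WrappedErr error  // the original error
}

func (e *ErrWrapper) Error() string { return e.Failure }

// Unwrap is an assumption of this sketch: it lets errors.Is and
// errors.As reach the original error as well.
func (e *ErrWrapper) Unwrap() error { return e.WrappedErr }

// classify is a hypothetical helper that maps a few well-known
// errors to the failure strings listed above.
func classify(err error) string {
	switch {
	case errors.Is(err, syscall.ECONNREFUSED):
		return "connection_refused"
	case errors.Is(err, syscall.ECONNRESET):
		return "connection_reset"
	case errors.Is(err, io.EOF), errors.Is(err, io.ErrUnexpectedEOF):
		return "eof_error"
	case errors.Is(err, context.DeadlineExceeded):
		return "generic_timeout_error"
	default:
		return "unknown_failure " + err.Error()
	}
}

// wrap attributes err to the given major operation.
func wrap(operation string, err error) error {
	if err == nil {
		return nil
	}
	return &ErrWrapper{Failure: classify(err), Operation: operation, WrappedErr: err}
}

// inspect plays the role of the analysis phase.
func inspect(err error) (failure, operation string, ok bool) {
	var wrapper *ErrWrapper
	if errors.As(err, &wrapper) {
		return wrapper.Failure, wrapper.Operation, true
	}
	return "", "", false
}

// example returns a wrapped error with one extra layer on top, to
// show that errors.As walks the whole chain.
func example() error {
	return fmt.Errorf("measurement failed: %w", wrap("connect", syscall.ECONNREFUSED))
}

func main() {
	failure, operation, ok := inspect(example())
	fmt.Println(failure, operation, ok)
}
```

Because the wrapper is found by type rather than by comparing strings, the analysis code keeps working even when intermediate layers add their own context to the error.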

Netx replacements

We want to provide netx replacements for the following interfaces in the Go standard library:

  1. http.RoundTripper

  2. http.Client

  3. net.Dialer

  4. net.Resolver

Accordingly, we'll define the following interfaces in the github.com/ooni/probe-engine/netx/modelx package:

type DNSResolver interface {
	LookupHost(ctx context.Context, hostname string) ([]string, error)
}

type Dialer interface {
	Dial(network, address string) (net.Conn, error)
	DialContext(ctx context.Context, network, address string) (net.Conn, error)
}

type TLSDialer interface {
	DialTLS(network, address string) (net.Conn, error)
	DialTLSContext(ctx context.Context, network, address string) (net.Conn, error)
}

We won't need an interface for http.RoundTripper because it is already an interface, so we'll just use it.

Our replacements will implement these interfaces.

Using an API compatible with Go's standard library makes it possible to use, say, our net.Dialer replacement with other libraries. Both http.Transport and gorilla/websocket's websocket.Dialer have fields like Dial and DialContext that can be overridden. By overriding such function pointers, we could use our replacements instead of the standard library, and thus collect measurements while using third-party code to implement specific protocols.
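
As an illustration of the function-pointer override, the following sketch plugs a dialer with the Dial/DialContext shape into http.Transport. The recordingDialer type and the fetchWithRecordingDialer helper are hypothetical stand-ins for a netx replacement: the dialer only remembers which endpoints were dialed, and the helper talks to a local httptest server rather than to the network.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync"
)

// recordingDialer is a hypothetical dialer replacement that records
// every endpoint it is asked to dial.
type recordingDialer struct {
	mu    sync.Mutex
	dials []string
}

// DialContext has the same signature as net.Dialer.DialContext.
func (d *recordingDialer) DialContext(ctx context.Context, network, address string) (net.Conn, error) {
	conn, err := (&net.Dialer{}).DialContext(ctx, network, address)
	d.mu.Lock()
	d.dials = append(d.dials, network+" "+address)
	d.mu.Unlock()
	return conn, err
}

// Dial has the same signature as net.Dialer.Dial.
func (d *recordingDialer) Dial(network, address string) (net.Conn, error) {
	return d.DialContext(context.Background(), network, address)
}

// Dials returns a copy of the recorded endpoints.
func (d *recordingDialer) Dials() []string {
	d.mu.Lock()
	defer d.mu.Unlock()
	return append([]string(nil), d.dials...)
}

// fetchWithRecordingDialer performs one GET against a local test
// server using a transport whose DialContext has been overridden.
func fetchWithRecordingDialer() (int, []string, error) {
	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "hello")
	}))
	defer server.Close()

	dialer := &recordingDialer{}
	transport := &http.Transport{DialContext: dialer.DialContext}
	defer transport.CloseIdleConnections()
	client := &http.Client{Transport: transport}

	resp, err := client.Get(server.URL)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return 0, nil, err
	}
	return resp.StatusCode, dialer.Dials(), nil
}

func main() {
	status, dials, err := fetchWithRecordingDialer()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(status, dials)
}
```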

Also, using interfaces allows us to combine code quite easily. For example, a resolver that detects bogons is easily implemented as a wrapper around another resolver that performs the real resolution.
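
A bogon-detecting wrapper of this kind might look like the following sketch. The DNSResolver interface is copied from above. The bogonResolver and staticResolver types, the errBogon value, and the short list of address ranges are assumptions made for illustration; a real implementation would use a more complete bogon list and would wrap the error as described in the error handling section.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
)

// DNSResolver is the interface defined in this document.
type DNSResolver interface {
	LookupHost(ctx context.Context, hostname string) ([]string, error)
}

// errBogon is a simplified stand-in for the dns_bogon_error failure.
var errBogon = errors.New("dns_bogon_error")

// bogonNets is a deliberately short, illustrative list of ranges.
var bogonNets = mustParseCIDRs(
	"0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8",
	"169.254.0.0/16", "172.16.0.0/12", "192.168.0.0/16",
)

func mustParseCIDRs(cidrs ...string) []*net.IPNet {
	var out []*net.IPNet
	for _, cidr := range cidrs {
		_, ipnet, err := net.ParseCIDR(cidr)
		if err != nil {
			panic(err)
		}
		out = append(out, ipnet)
	}
	return out
}

// isBogon reports whether address falls in one of the ranges above.
func isBogon(address string) bool {
	ip := net.ParseIP(address)
	if ip == nil {
		return false
	}
	for _, ipnet := range bogonNets {
		if ipnet.Contains(ip) {
			return true
		}
	}
	return false
}

// bogonResolver wraps another resolver and fails the lookup when
// any returned address is a bogon.
type bogonResolver struct {
	DNSResolver
}

func (r bogonResolver) LookupHost(ctx context.Context, hostname string) ([]string, error) {
	addrs, err := r.DNSResolver.LookupHost(ctx, hostname)
	if err != nil {
		return nil, err
	}
	for _, addr := range addrs {
		if isBogon(addr) {
			return addrs, errBogon
		}
	}
	return addrs, nil
}

// staticResolver is a fake resolver used to keep the example offline.
type staticResolver map[string][]string

func (r staticResolver) LookupHost(_ context.Context, hostname string) ([]string, error) {
	if addrs, ok := r[hostname]; ok {
		return addrs, nil
	}
	return nil, errors.New("dns_nxdomain_error")
}

// lookup resolves hostname through a bogon-checking wrapper around
// a fake resolver with two hypothetical entries.
func lookup(hostname string) ([]string, error) {
	inner := staticResolver{
		"blocked.example": {"10.0.0.1"},
		"ok.example":      {"93.184.216.34"},
	}
	return bogonResolver{inner}.LookupHost(context.Background(), hostname)
}

func main() {
	for _, name := range []string{"blocked.example", "ok.example"} {
		addrs, err := lookup(name)
		fmt.Println(name, addrs, err)
	}
}
```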

Dispatching events

The github.com/ooni/netx/modelx package will define a handler for low-level events as:

type Handler interface {
    OnMeasurement(Measurement)
}

We will provide a mechanism to bind a specific handler to a context.Context such that the handler will receive all the measurements caused by code using such context. This mechanism is like:

type MeasurementRoot struct {
	Beginning time.Time // the "zero" time
	Handler   Handler   // the handler to use
}

You will be able to assign a MeasurementRoot to a context by using the following function:

func WithMeasurementRoot(
    ctx context.Context, root *MeasurementRoot) context.Context

which will return a clone of the original context that uses the MeasurementRoot. Pass this context to any method of our replacements to get measurements.

Given such context, or a subcontext, you can get back the original MeasurementRoot using:

func ContextMeasurementRoot(ctx context.Context) *MeasurementRoot

which will return the context MeasurementRoot or nil if none is set into the context. This is how our internal code gets access to the MeasurementRoot.
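
One plausible way to implement this binding is context.WithValue with an unexported key type, as in the sketch below. The function names and the MeasurementRoot struct come from this document. The reduced Measurement type, the collector handler, and the emit and demo helpers are assumptions of this sketch.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Measurement is reduced to a placeholder for this sketch.
type Measurement struct {
	Note string
}

// Handler is the interface defined in this document.
type Handler interface {
	OnMeasurement(Measurement)
}

// MeasurementRoot is the struct defined in this document.
type MeasurementRoot struct {
	Beginning time.Time // the "zero" time
	Handler   Handler   // the handler to use
}

// measurementRootKey is unexported so that no other package can
// collide with, or overwrite, the value stored in the context.
type measurementRootKey struct{}

// WithMeasurementRoot returns a copy of ctx that carries root.
func WithMeasurementRoot(ctx context.Context, root *MeasurementRoot) context.Context {
	return context.WithValue(ctx, measurementRootKey{}, root)
}

// ContextMeasurementRoot returns the root bound to ctx, or nil.
func ContextMeasurementRoot(ctx context.Context) *MeasurementRoot {
	root, _ := ctx.Value(measurementRootKey{}).(*MeasurementRoot)
	return root
}

// collector is a hypothetical handler that stores what it receives.
type collector struct {
	got []Measurement
}

func (c *collector) OnMeasurement(m Measurement) { c.got = append(c.got, m) }

// emit is a hypothetical helper showing what internal code would do
// when it has an event to deliver.
func emit(ctx context.Context, m Measurement) {
	if root := ContextMeasurementRoot(ctx); root != nil && root.Handler != nil {
		root.Handler.OnMeasurement(m)
	}
}

// demo emits one event through a subcontext and one through a
// context without a root, then reports what the handler saw.
func demo() (received int, rootMissing bool) {
	c := &collector{}
	root := &MeasurementRoot{Beginning: time.Now(), Handler: c}
	ctx := WithMeasurementRoot(context.Background(), root)

	sub, cancel := context.WithCancel(ctx)
	defer cancel()
	emit(sub, Measurement{Note: "delivered via subcontext"})
	emit(context.Background(), Measurement{Note: "dropped: no root"})

	return len(c.got), ContextMeasurementRoot(context.Background()) == nil
}

func main() {
	fmt.Println(demo())
}
```

Since derived contexts inherit values from their parent, code that adds its own timeout or cancellation on top of the caller's context still reaches the same handler.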

Constructing and configuring replacements

The github.com/ooni/probe-engine/netx package MUST provide an API such that you can construct and configure a net.Resolver replacement as follows:

r, err := netx.NewResolverWithoutHandler(dnsNetwork, dnsAddress)
if err != nil {
    log.Fatal("cannot configure specific resolver")
}
var resolver modelx.DNSResolver = r
// now use resolver ...

where dnsNetwork and dnsAddress configure the type of the resolver as follows:

  • when dnsNetwork is "" or "system", dnsAddress does not matter and we use the system resolver

  • when dnsNetwork is "udp", dnsAddress is the address or domain name, with optional port, of the DNS server (e.g., 8.8.8.8:53)

  • when dnsNetwork is "tcp", dnsAddress is the address or domain name, with optional port, of the DNS server (e.g., 8.8.8.8:53)

  • when dnsNetwork is "dot", dnsAddress is the address or domain name, with optional port, of the DNS server (e.g., 8.8.8.8:853)

  • when dnsNetwork is "doh", dnsAddress is the URL of the DNS server (e.g. https://cloudflare-dns.com/dns-query)
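
The rules in the list above could be validated before building a resolver, roughly as in the following sketch. The endpointFor and withDefaultPort helpers are hypothetical and are not part of netx. The sketch also assumes that a missing port defaults to 53 for "udp" and "tcp" and to 853 for "dot", which are the standard ports for those protocols.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// withDefaultPort appends port to address unless address already
// contains one. Brackets around a bare IPv6 address are tolerated.
func withDefaultPort(address, port string) string {
	if _, _, err := net.SplitHostPort(address); err == nil {
		return address
	}
	host := strings.TrimSuffix(strings.TrimPrefix(address, "["), "]")
	return net.JoinHostPort(host, port)
}

// endpointFor is a hypothetical helper that normalizes the pair
// (dnsNetwork, dnsAddress) into the endpoint a resolver would use.
// It returns an empty endpoint for the system resolver.
func endpointFor(dnsNetwork, dnsAddress string) (string, error) {
	switch dnsNetwork {
	case "", "system":
		return "", nil // the address does not matter
	case "udp", "tcp":
		return withDefaultPort(dnsAddress, "53"), nil
	case "dot":
		return withDefaultPort(dnsAddress, "853"), nil
	case "doh":
		u, err := url.Parse(dnsAddress)
		if err != nil {
			return "", err
		}
		if u.Scheme != "https" || u.Host == "" {
			return "", fmt.Errorf("doh requires an https URL, got %q", dnsAddress)
		}
		return u.String(), nil
	default:
		return "", fmt.Errorf("unsupported DNS network: %q", dnsNetwork)
	}
}

func main() {
	inputs := [][2]string{
		{"system", ""},
		{"udp", "8.8.8.8"},
		{"tcp", "8.8.8.8:53"},
		{"dot", "dns.google"},
		{"doh", "https://cloudflare-dns.com/dns-query"},
		{"carrier-pigeon", "coop"},
	}
	for _, in := range inputs {
		endpoint, err := endpointFor(in[0], in[1])
		fmt.Printf("%q %q -> %q, %v\n", in[0], in[1], endpoint, err)
	}
}
```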

When the resolver is not the system one, we'll also be able to emit events when performing resolution. Otherwise, we'll just emit the ResolveDone event defined below.

Any resolver returned by this function may be configured to return the dns_bogon_error if any LookupHost lookup returns a bogon IP.

The package will also contain this function:

func ChainResolvers(
    primary, secondary modelx.DNSResolver) modelx.DNSResolver

which creates a new resolver that invokes secondary whenever primary fails. This functionality allows us to be more resilient and to automatically bypass certain types of censorship, e.g., a resolver returning a bogon.
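
A minimal sketch of such a chain follows, assuming the simplest possible policy: any error from primary triggers a single attempt with secondary. The chainResolver and resolverFunc types and the lookupWithFallback helper are hypothetical, and the fake resolvers return made-up results.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// DNSResolver is the interface defined in this document.
type DNSResolver interface {
	LookupHost(ctx context.Context, hostname string) ([]string, error)
}

// chainResolver tries primary first and falls back to secondary.
type chainResolver struct {
	primary, secondary DNSResolver
}

func (c chainResolver) LookupHost(ctx context.Context, hostname string) ([]string, error) {
	addrs, err := c.primary.LookupHost(ctx, hostname)
	if err != nil {
		// Assumption: every primary failure triggers the fallback.
		return c.secondary.LookupHost(ctx, hostname)
	}
	return addrs, nil
}

// ChainResolvers has the signature given in this document, with the
// modelx qualifier dropped to keep the sketch self-contained.
func ChainResolvers(primary, secondary DNSResolver) DNSResolver {
	return chainResolver{primary: primary, secondary: secondary}
}

// resolverFunc adapts a plain function to the DNSResolver interface.
type resolverFunc func(ctx context.Context, hostname string) ([]string, error)

func (f resolverFunc) LookupHost(ctx context.Context, hostname string) ([]string, error) {
	return f(ctx, hostname)
}

// lookupWithFallback chains a resolver that always fails with a
// resolver that always succeeds, then performs one lookup.
func lookupWithFallback(hostname string) ([]string, error) {
	failing := resolverFunc(func(context.Context, string) ([]string, error) {
		return nil, errors.New("dns_bogon_error")
	})
	working := resolverFunc(func(context.Context, string) ([]string, error) {
		return []string{"93.184.216.34"}, nil
	})
	return ChainResolvers(failing, working).LookupHost(context.Background(), hostname)
}

func main() {
	fmt.Println(lookupWithFallback("www.example.com"))
}
```

Because the result is itself a DNSResolver, chains can be nested to express longer fallback sequences.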

The github.com/ooni/probe-engine/netx package MUST also provide an API such that you can construct and configure a net.Dialer replacement as follows:

d := netx.NewDialerWithoutHandler()
d.SetResolver(resolver)
d.ForceSpecificSNI("www.kernel.org")
d.SetCABundle("/etc/ssl/cert.pem")
d.ForceSkipVerify()
var dialer modelx.Dialer = d
// now use dialer

where SetResolver allows you to change the resolver, ForceSpecificSNI forces TLS dials to use the given SNI instead of the provided domain, SetCABundle allows you to set a specific CA bundle, and ForceSkipVerify allows you to disable certificate verification. These functions MUST NOT be invoked once you have started using the dialer.

The github.com/ooni/probe-engine/netx package MUST contain code so that we can do:

t := netx.NewHTTPTransportWithProxyFunc(
    http.ProxyFromEnvironment,
)
t.SetResolver(resolver)
t.ForceSpecificSNI("www.kernel.org")
t.SetCABundle("/etc/ssl/cert.pem")
t.ForceSkipVerify()
var transport http.RoundTripper = t
// now use transport

where the functions have the same semantics as the namesake functions described before and the same caveats.

We also have syntactic sugar on top of that and legacy methods, but this fully describes the design.

Structure of events

The github.com/ooni/probe-engine/netx/modelx package will contain the definition of low-level events. We are interested in knowing the following:

  1. the timing and result of each I/O operation.

  2. the timing of HTTP events occurring during the lifecycle of an HTTP request.

  3. the timing and result of the TLS handshake including the negotiated TLS version and other details such as what certificates the server has provided.

  4. DNS events, e.g. queries and replies, generated as part of using DoT and DoH.

We will represent time as a time.Duration since the beginning configured either in the context or when constructing an object. The modelx package will also define the Measurement event as follows:

type Measurement struct {
    Connect             *ConnectEvent
    HTTPConnectionReady *HTTPConnectionReadyEvent
    HTTPRoundTripDone   *HTTPRoundTripDoneEvent
    ResolveDone         *ResolveDoneEvent
    TLSHandshakeDone    *TLSHandshakeDoneEvent
}

The events above MUST always be present, but more events will likely be available. The structure will contain a pointer for every event that we support. The event-processing code will check which pointer or pointers are not nil to know which event or events have occurred.
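
The nil-pointer check could be written as in the sketch below. The Measurement struct matches the definition above, while the event structs are reduced to a single DurationSinceBeginning field, which is an assumption of this sketch. The occurred helper and the printingHandler type are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Each event struct is reduced to one assumed field: the time
// elapsed since the configured beginning.
type ConnectEvent struct{ DurationSinceBeginning time.Duration }
type HTTPConnectionReadyEvent struct{ DurationSinceBeginning time.Duration }
type HTTPRoundTripDoneEvent struct{ DurationSinceBeginning time.Duration }
type ResolveDoneEvent struct{ DurationSinceBeginning time.Duration }
type TLSHandshakeDoneEvent struct{ DurationSinceBeginning time.Duration }

// Measurement is the struct defined in this document.
type Measurement struct {
	Connect             *ConnectEvent
	HTTPConnectionReady *HTTPConnectionReadyEvent
	HTTPRoundTripDone   *HTTPRoundTripDoneEvent
	ResolveDone         *ResolveDoneEvent
	TLSHandshakeDone    *TLSHandshakeDoneEvent
}

// occurred returns the names of the events carried by m, found by
// checking which pointers are not nil.
func occurred(m Measurement) []string {
	var names []string
	if m.ResolveDone != nil {
		names = append(names, "resolve_done")
	}
	if m.Connect != nil {
		names = append(names, "connect")
	}
	if m.TLSHandshakeDone != nil {
		names = append(names, "tls_handshake_done")
	}
	if m.HTTPConnectionReady != nil {
		names = append(names, "http_connection_ready")
	}
	if m.HTTPRoundTripDone != nil {
		names = append(names, "http_round_trip_done")
	}
	return names
}

// printingHandler is a hypothetical handler that prints event names.
type printingHandler struct{}

func (printingHandler) OnMeasurement(m Measurement) {
	fmt.Println(occurred(m))
}

func main() {
	h := printingHandler{}
	h.OnMeasurement(Measurement{
		ResolveDone: &ResolveDoneEvent{DurationSinceBeginning: 20 * time.Millisecond},
	})
	h.OnMeasurement(Measurement{
		Connect: &ConnectEvent{DurationSinceBeginning: 45 * time.Millisecond},
	})
}
```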

To simplify joining events together the following holds:

  1. when we're establishing a new connection there is a nonzero DialID shared by Connect and ResolveDone

  2. a new connection has a nonzero ConnID that is emitted as part of a successful Connect event

  3. during an HTTP transaction there is a nonzero TransactionID shared by HTTPConnectionReady and HTTPRoundTripDone

  4. if the TLS handshake is invoked by HTTP code it will have a nonzero TransactionID, otherwise a nonzero ConnID

  5. the HTTPConnectionReady will also see the ConnID

  6. when a transaction starts dialing, it will pass its TransactionID to ResolveDone and Connect

  7. when we're dialing a connection for DoH, we pass the DialID to the HTTPConnectionReady event as well

Because of the preceding rules, it should always be possible to bind events together. We also define more events than those listed above, but they are ancillary to them. The main reason why HTTPConnectionReady is listed here is that it is the event that allows us to bind ConnID and TransactionID.
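
To show how these rules could be used during analysis, the following sketch groups events by TransactionID and uses HTTPConnectionReady to learn the ConnID of each transaction. The ID field names come from the rules above. Their int64 type, the reduced event structs, and the Transaction, transactionID, joinByTransaction, and sample names are assumptions of this sketch.

```go
package main

import "fmt"

// Event structs reduced to the ID fields named in the rules above.
type ResolveDoneEvent struct{ DialID, TransactionID int64 }
type ConnectEvent struct{ DialID, ConnID, TransactionID int64 }
type TLSHandshakeDoneEvent struct{ ConnID, TransactionID int64 }
type HTTPConnectionReadyEvent struct{ ConnID, TransactionID int64 }
type HTTPRoundTripDoneEvent struct{ TransactionID int64 }

// Measurement is the struct defined in this document.
type Measurement struct {
	Connect             *ConnectEvent
	HTTPConnectionReady *HTTPConnectionReadyEvent
	HTTPRoundTripDone   *HTTPRoundTripDoneEvent
	ResolveDone         *ResolveDoneEvent
	TLSHandshakeDone    *TLSHandshakeDoneEvent
}

// transactionID returns the TransactionID carried by m, or zero when
// the event does not belong to an HTTP transaction.
func transactionID(m Measurement) int64 {
	switch {
	case m.ResolveDone != nil:
		return m.ResolveDone.TransactionID
	case m.Connect != nil:
		return m.Connect.TransactionID
	case m.TLSHandshakeDone != nil:
		return m.TLSHandshakeDone.TransactionID
	case m.HTTPConnectionReady != nil:
		return m.HTTPConnectionReady.TransactionID
	case m.HTTPRoundTripDone != nil:
		return m.HTTPRoundTripDone.TransactionID
	}
	return 0
}

// Transaction collects the events of one HTTP transaction.
type Transaction struct {
	ConnID int64 // learned from HTTPConnectionReady
	Events []Measurement
}

// joinByTransaction groups events by their nonzero TransactionID.
func joinByTransaction(events []Measurement) map[int64]*Transaction {
	out := make(map[int64]*Transaction)
	for _, m := range events {
		id := transactionID(m)
		if id == 0 {
			continue // e.g. a dial that is not part of a transaction
		}
		tx := out[id]
		if tx == nil {
			tx = &Transaction{}
			out[id] = tx
		}
		tx.Events = append(tx.Events, m)
		if m.HTTPConnectionReady != nil {
			tx.ConnID = m.HTTPConnectionReady.ConnID
		}
	}
	return out
}

// sample returns made-up events: one transaction and one plain dial.
func sample() []Measurement {
	return []Measurement{
		{ResolveDone: &ResolveDoneEvent{DialID: 1, TransactionID: 7}},
		{Connect: &ConnectEvent{DialID: 1, ConnID: 3, TransactionID: 7}},
		{TLSHandshakeDone: &TLSHandshakeDoneEvent{TransactionID: 7}},
		{HTTPConnectionReady: &HTTPConnectionReadyEvent{ConnID: 3, TransactionID: 7}},
		{HTTPRoundTripDone: &HTTPRoundTripDoneEvent{TransactionID: 7}},
		{Connect: &ConnectEvent{DialID: 2, ConnID: 4}},
	}
}

func main() {
	for id, tx := range joinByTransaction(sample()) {
		fmt.Printf("transaction %d: conn %d, %d events\n", id, tx.ConnID, len(tx.Events))
	}
}
```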