2021-09-30 01:36:03 +02:00
# Chapter XIV: A possible rewrite of Web Connectivity

In this chapter we try to solve the exercise laid out in
the previous chapter, using `measurex` primitives.

(This file is auto-generated. Do not edit it directly! To apply
changes you need to modify `./internal/tutorial/measurex/chapter14/main.go`.)

## main.go

The beginning of the file is pretty much the same as in the previous
chapters: the package declaration, the imports, and a `print` helper
that emits JSON.

```Go
package main

import (
	"context"
	"crypto/tls"
	"encoding/json"
	"flag"
	"fmt"
	"net/http"
	"net/url"
	"time"

	"github.com/ooni/probe-cli/v3/internal/measurex"
	"github.com/ooni/probe-cli/v3/internal/netxlite"
	"github.com/ooni/probe-cli/v3/internal/runtimex"
)

// print serializes v as JSON and writes it to the standard output.
func print(v interface{}) {
	data, err := json.Marshal(v)
	runtimex.PanicOnError(err, "json.Marshal failed")
	fmt.Printf("%s\n", string(data))
}
```

## measurement type

We define a `measurement` type containing the fields
that a Web Connectivity measurement should have.

```Go
type measurement struct {
	Queries       []*measurex.ArchivalDNSLookupEvent        `json:"queries"`
	TCPConnect    []*measurex.ArchivalTCPConnect            `json:"tcp_connect"`
	TLSHandshakes []*measurex.ArchivalQUICTLSHandshakeEvent `json:"tls_handshakes"`
	Requests      []*measurex.ArchivalHTTPRoundTripEvent    `json:"requests"`
}
```

## WebConnectivity implementation

We define a function that takes as input a context and a URL to
measure, and returns either a measurement or an error.

We only return an error when the input does not allow us to
proceed (e.g., an invalid input URL).

```Go
func webConnectivity(ctx context.Context, URL string) (*measurement, error) {
```

We start by parsing the input URL. If we cannot parse it,
this is a hard error and we cannot continue.

```Go
	parsedURL, err := url.Parse(URL)
	if err != nil {
		return nil, err
	}
```

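The following standalone sketch (a separate program, not part of this chapter's `main.go`) shows which inputs `url.Parse` from the Go standard library actually rejects. It is fairly lenient: an input without a scheme, such as `www.google.com`, is accepted and parsed as a path, leaving both the scheme and the hostname empty. The sketch also shows that `Hostname()` strips the port and the square brackets around IPv6 literals. The `describe` helper is hypothetical and exists only for illustration.

```Go
package main

import (
	"fmt"
	"net/url"
)

// describe parses rawURL like webConnectivity does and reports the
// resulting scheme and hostname, or the parse error.
func describe(rawURL string) string {
	parsed, err := url.Parse(rawURL)
	if err != nil {
		return "error: " + err.Error()
	}
	return fmt.Sprintf("scheme=%q hostname=%q", parsed.Scheme, parsed.Hostname())
}

func main() {
	for _, in := range []string{
		"https://www.google.com/",
		"http://[::1]:8080/",
		"www.google.com",    // no scheme: accepted, parsed as a path
		"://missing-scheme", // rejected: missing protocol scheme
	} {
		fmt.Printf("%-22s -> %s\n", in, describe(in))
	}
}
```
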
We create an empty measurement and a measurer with
default settings, as we did in the previous chapters.

```Go
	m := &measurement{}
	mx := measurex.NewMeasurerWithDefaultSettings()
```

Now it's time to start measuring. We will address all
the points laid out in the previous chapter.

### 1. Enumerating IP addresses

Let us enumerate all the IP addresses for
the input URL's domain using the system resolver.

```Go
	dns := mx.LookupHostSystem(ctx, parsedURL.Hostname())
	m.Queries = append(
		m.Queries, measurex.NewArchivalDNSLookupEventList(dns.LookupHost)...)
```

This is code we have already seen in the previous chapters.

### 2. Building a list of endpoints

```Go
	epnts, err := measurex.AllHTTPEndpointsForURL(parsedURL, http.Header{}, dns)
	if err != nil {
		return nil, err
	}
```

This is also code we have seen in previous chapters. The only
difference is that we supply empty headers, because we are not
going to use the headers stored inside the endpoints.

### 3 and 4. Measuring each endpoint

We loop through the endpoints built in the previous step
and invoke the TCP or the TLS primitive, depending on
whether the input URL's scheme is `http` or `https`.

```Go
	for _, epnt := range epnts {
		switch parsedURL.Scheme {
		case "http":
			tcp := mx.TCPConnect(ctx, epnt.Address)
			m.TCPConnect = append(
				m.TCPConnect, measurex.NewArchivalTCPConnectList(tcp.Connect)...)
		case "https":
			config := &tls.Config{
				ServerName: parsedURL.Hostname(),
				NextProtos: []string{"h2", "http/1.1"},
				RootCAs:    netxlite.NewDefaultCertPool(),
			}
			tls := mx.TLSConnectAndHandshake(ctx, epnt.Address, config)
			m.TCPConnect = append(
				m.TCPConnect, measurex.NewArchivalTCPConnectList(tls.Connect)...)
			m.TLSHandshakes = append(m.TLSHandshakes,
				measurex.NewArchivalQUICTLSHandshakeEventList(tls.TLSHandshake)...)
		}
	}
```

At this point, we have addressed points 1-4. Let us
now focus on the last point.

### 5. HTTP measurement

We need to manually build a `MeasurementDB`. This is a
"database" where the networking code will store events.

```Go
	db := &measurex.MeasurementDB{}
```

Following the hint from the previous chapter, we use the
`NewTracingHTTPTransportWithDefaultSettings` factory
to create an `http.Transport`-like object that traces
HTTP round trip events and writes them into `db`.

```Go
	txp := measurex.NewTracingHTTPTransportWithDefaultSettings(mx.Begin, mx.Logger, db)
```

We now build an `http.Client` using the transport
we have just created and a cookie jar. We need the jar
because, as mentioned in previous chapters, some redirects
would otherwise lead to a redirect loop.

```Go
	clnt := &http.Client{
		Transport: txp,
		Jar:       measurex.NewCookieJar(),
	}
```

Now we use a method of the measurer that allows us to
perform an HTTP GET with an existing HTTP client
and a URL. This method sets a timeout and performs
the round trip. Reading a snapshot of the response
body is not implemented by this method; rather, it
is a property of the "tracing" HTTP transport we
created above (this is the type of transport we
have been using internally in all the examples
presented so far).

```Go
	resp, _ := mx.HTTPClientGET(ctx, clnt, parsedURL)
```

To be tidy, we also close the response body if we
received a response. We do not need to read the body
here because, as mentioned above, the tracing HTTP
transport already reads a snapshot of it.

```Go
	if resp != nil {
		resp.Body.Close() // tidy
	}
```

Finally, we append the round trips we performed to
the right field and return the measurement.

To this end, we use the `db.AsMeasurement` method, which
takes the events currently stored in `db` and assembles
them into the `Measurement` struct we have been using in all
the previous chapters. We then convert its `HTTPRoundTrip`
events to the archival format using `NewArchivalHTTPRoundTripEventList`.

```Go
	m.Requests = append(m.Requests, measurex.NewArchivalHTTPRoundTripEventList(
		db.AsMeasurement().HTTPRoundTrip)...)
	return m, nil
}
```

The rest of the program is pretty straightforward.

```Go
func main() {
	URL := flag.String("url", "https://www.google.com/", "URL to fetch")
	timeout := flag.Duration("timeout", 60*time.Second, "timeout to use")
	flag.Parse()
	ctx, cancel := context.WithTimeout(context.Background(), *timeout)
	defer cancel()
	m, err := webConnectivity(ctx, *URL)
	runtimex.PanicOnError(err, "invalid arguments to webConnectivity (wrong URL?)")
	print(m)
}
```

## Running the example program

Let us perform a vanilla run first:

```bash
go run -race ./internal/tutorial/measurex/chapter14 | jq
```

Take a look at the JSON.

Now try running the program with `http://gmail.com` as
input (pass it using the `-url` flag). Take note of the redirect
chain and see how the domain changes during the redirect.
Note that we are not measuring any TLS handshake, since the
TCP/TLS steps only consider the endpoints of the input URL,
whose scheme is `http`, and that we are not trying any QUIC
endpoints. These are, in fact, some of the limitations of
Web Connectivity that we were trying to address when we
wrote `measurex`.

Also, build the miniooni research client:

```bash
go build -v ./internal/cmd/miniooni
```

Run Web Connectivity with:

```bash
./miniooni -ni http://gmail.com web_connectivity
```

This writes the report to a file named `report.jsonl`.

Check the content of the file and compare it with the
output of this chapter. Are there other notable
differences between the two outputs?

### Bonus question

The solution we presented is true to the original
spirit of Web Connectivity, where we first perform
separate DNS and TCP/TLS steps, and then a separate
HTTP step. Is there an API in `measurex` that
allows you to invert the order of the operations,
that is:

1. build a full-fledged HTTP client that allows us to
   trace _any_ operation;
2. use such a client to measure the URL;
3. figure out which TCP endpoints we did not
   test for TCP/TLS during this process and run
   TCP/TLS testing only for them?

If such an API exists, can you write a simple
`main.go` client that implements points 1-3 above?

## Conclusion

We have presented the solution to the exercise
proposed in the previous chapter, i.e., how
to rewrite Web Connectivity using the `measurex` API.

You have now been exposed to some of the complexity and
the APIs involved in performing OONI measurements. You should
now be ready to help us write new network experiments and
maintain existing ones.

If you have further questions, please [contact us](https://ooni.org/about/).