Simone Basso aa27bbe33f
fix(measurex): use same keys of the OONI data format (#572)
This change should simplify the pipeline's job.

Reference issue: https://github.com/ooni/probe/issues/1817.

I previously dismissed this possibility, but now it seems clear it
is simpler to have a very tabular data format internally and to
convert such a format to OONI's data format when serializing.

The OONI data format is what the pipeline expects, but processing
is easier with a more linear/tabular format.
2021-11-05 10:46:45 +01:00

Chapter IX: Parallel HTTPEndpoint measurements

The program in this chapter is very similar to the one we discussed in the previous chapter. The main difference is that, rather than looping through the list of HTTPEndpoint ourselves, we call a function that measures the endpoints using a small pool of background workers.

There is a trade-off between quick measurements and false positives. A timeout is one of the most common ways of censoring HTTPS and HTTP3 endpoints, so if we run measurements sequentially, a whole scan could in principle take a long time. On the other hand, if we run too many measurements in parallel, we may cause congestion ourselves and some measurements may fail because of that. Our solution to this problem is to use low parallelism: at the time of writing this note, we have three workers. If you submit more than three HTTPEndpoint at a time, we service the first three immediately and queue all the other endpoints for later measurement.
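To make the idea of a small worker pool concrete, here is a minimal sketch that uses only the standard library. It is not the measurex implementation: the measureAll function, the parallelism constant and the fake measurement callback are hypothetical names chosen for illustration. The sketch shows a fixed number of workers draining a queue, with the output channel closed once every input has been processed.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// parallelism mirrors the "three workers" figure mentioned above. The
// constant name is an assumption, not a measurex identifier.
const parallelism = 3

// measureAll is a hypothetical stand-in for a parallel measurement
// function. It runs measure on every input using a fixed number of
// workers and closes the returned channel once all inputs are done.
func measureAll(inputs []string, measure func(string) string) <-chan string {
	in := make(chan string)
	out := make(chan string)

	// Producer: feeds the queue. Inputs beyond the first three block
	// here until a worker becomes free.
	go func() {
		defer close(in)
		for _, input := range inputs {
			in <- input
		}
	}()

	// Workers: each one measures inputs until the queue is closed.
	var wg sync.WaitGroup
	for i := 0; i < parallelism; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for input := range in {
				out <- measure(input)
			}
		}()
	}

	// Closer: signals the reader that no more results will arrive.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	endpoints := []string{"a", "b", "c", "d", "e"}
	// fake simulates a slow measurement such as one ending in a timeout.
	fake := func(epnt string) string {
		time.Sleep(10 * time.Millisecond)
		return "measured " + epnt
	}
	for result := range measureAll(endpoints, fake) {
		fmt.Println(result)
	}
}
```

With five inputs and three workers, the last two inputs wait in the producer until a worker finishes. Results may arrive in a different order than the inputs, so a reader should not rely on ordering in a design like this one.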

(This file is auto-generated. Do not edit it directly! To apply changes you need to modify ./internal/tutorial/measurex/chapter09/main.go.)

main.go

The beginning of the program is pretty much the same.

package main

import (
	"context"
	"encoding/json"
	"flag"
	"fmt"
	"net/url"
	"time"

	"github.com/ooni/probe-cli/v3/internal/measurex"
	"github.com/ooni/probe-cli/v3/internal/runtimex"
)

type measurement struct {
	DNS       []*measurex.DNSMeasurement
	Endpoints []*measurex.HTTPEndpointMeasurement
}

func print(v interface{}) {
	data, err := json.Marshal(v)
	runtimex.PanicOnError(err, "json.Marshal failed")
	fmt.Printf("%s\n", string(data))
}

func main() {
	URL := flag.String("url", "https://blog.cloudflare.com/", "URL to fetch")
	address := flag.String("address", "8.8.4.4:53", "DNS-over-UDP server address")
	timeout := flag.Duration("timeout", 60*time.Second, "timeout to use")
	flag.Parse()
	ctx, cancel := context.WithTimeout(context.Background(), *timeout)
	defer cancel()
	parsed, err := url.Parse(*URL)
	runtimex.PanicOnError(err, "url.Parse failed")
	mx := measurex.NewMeasurerWithDefaultSettings()
	m := &measurement{}
	m.DNS = append(m.DNS, mx.LookupHostUDP(ctx, parsed.Hostname(), *address))
	m.DNS = append(m.DNS, mx.LookupHTTPSSvcUDP(ctx, parsed.Hostname(), *address))
	headers := measurex.NewHTTPRequestHeaderForMeasuring()
	httpEndpoints, err := measurex.AllHTTPEndpointsForURL(parsed, headers, m.DNS...)
	runtimex.PanicOnError(err, "cannot get all the HTTP endpoints")

This is where the program changes. First, we need to create a jar for cookies because the API we're about to call requires a cookie jar. (We mostly use this API with redirects, and we want cookies to be carried across redirects because a small portion of the URLs we typically test require cookies to redirect properly; see https://github.com/ooni/probe/issues/1727 for more information.)
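To illustrate why a cookie jar matters for redirects, here is a minimal sketch based on the standard library's net/http/cookiejar package. Whether measurex.NewCookieJar is built on this package is an assumption on our part, and the cookiesAfterRedirect helper, the example URLs and the cookie name are hypothetical. The sketch stores a cookie as if it had been set by a first response, then asks the jar which cookies it would attach to a follow-up request.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/url"
)

// cookiesAfterRedirect is a hypothetical helper: it stores a cookie as if
// it had been set by the response for the "from" URL and returns the
// cookies that the jar would attach to a later request for the "to" URL.
func cookiesAfterRedirect(from, to string) ([]*http.Cookie, error) {
	// A nil options argument means no public suffix list is used,
	// which is good enough for this illustration.
	jar, err := cookiejar.New(nil)
	if err != nil {
		return nil, err
	}
	fromURL, err := url.Parse(from)
	if err != nil {
		return nil, err
	}
	toURL, err := url.Parse(to)
	if err != nil {
		return nil, err
	}
	// Pretend the first response carried a Set-Cookie header.
	jar.SetCookies(fromURL, []*http.Cookie{
		{Name: "session", Value: "abc", Path: "/"},
	})
	return jar.Cookies(toURL), nil
}

func main() {
	// Redirect within the same host: the jar replays the cookie.
	same, err := cookiesAfterRedirect("https://example.com/", "https://example.com/landing")
	if err != nil {
		panic(err)
	}
	fmt.Println("same host:", len(same))

	// Redirect to a different host: the cookie is not sent.
	other, err := cookiesAfterRedirect("https://example.com/", "https://example.org/")
	if err != nil {
		panic(err)
	}
	fmt.Println("other host:", len(other))
}
```

Without a jar, the cookie set by the first response would be lost, and a site that expects it on the redirected request could keep redirecting or return an error page.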

Then, we call HTTPEndpointGetParallel. The arguments are:

  • as usual, the context

  • the cookie jar

  • all the endpoints to measure

	cookies := measurex.NewCookieJar()
	for epnt := range mx.HTTPEndpointGetParallel(ctx, cookies, httpEndpoints...) {
		m.Endpoints = append(m.Endpoints, epnt)
	}

The HTTPEndpointGetParallel method returns a channel where it posts HTTPEndpointMeasurements. Once the input list has been fully measured, this method closes the returned channel.

As before, we append each resulting measurement to our m container and then print it.

Exercise: here we're not using the OONI data format and we're instead printing the internally used data structures. Can you modify the code to emit data using OONI's data format here? (Hint: there are conversion functions in measurex.)

	print(m)
}

Running the example program

Let us perform a vanilla run first:

go run -race ./internal/tutorial/measurex/chapter09 | jq

Take a look at the JSON output. Can you spot that the endpoint measurements ran in parallel?

Conclusion

We have seen how to run HTTPEndpoint measurements in parallel.