ooni-probe-cli/internal/tutorial/measurex/chapter12
Simone Basso aa27bbe33f
fix(measurex): use same keys of the OONI data format (#572)
This change should simplify the pipeline's job.

Reference issue: https://github.com/ooni/probe/issues/1817.

I previously dismissed this possibility, but now it seems clear it
is simpler to have a very tabular data format internally and to
convert such a format to OONI's data format when serializing.

The OONI data format is what the pipeline expects, but processing
is easier with a more linear/tabular format.
2021-11-05 10:46:45 +01:00

Chapter XII: Following redirections.

This program shows how to combine the URL measurement "step" introduced in the previous chapter with following redirections. If we say that the previous chapter performed a "web step", then we can say that here we're performing multiple "web steps".

(This file is auto-generated. Do not edit it directly! To apply changes you need to modify ./internal/tutorial/measurex/chapter12/main.go.)

main.go

The beginning of the program is pretty much the same, except that here we need to define a measurement container type that will contain the result of each "web step".

package main

import (
	"context"
	"encoding/json"
	"flag"
	"fmt"
	"time"

	"github.com/ooni/probe-cli/v3/internal/measurex"
	"github.com/ooni/probe-cli/v3/internal/runtimex"
)

type measurement struct {
	URLs []*measurex.ArchivalURLMeasurement
}

func print(v interface{}) {
	data, err := json.Marshal(v)
	runtimex.PanicOnError(err, "json.Marshal failed")
	fmt.Printf("%s\n", string(data))
}

func main() {
	URL := flag.String("url", "http://facebook.com/", "URL to fetch")
	timeout := flag.Duration("timeout", 60*time.Second, "timeout to use")
	flag.Parse()
	ctx, cancel := context.WithTimeout(context.Background(), *timeout)
	defer cancel()
	all := &measurement{}
	mx := measurex.NewMeasurerWithDefaultSettings()
	cookies := measurex.NewCookieJar()
	headers := measurex.NewHTTPRequestHeaderForMeasuring()

Everything above this line is like in chapter11. What changes now is that we're calling MeasureURLAndFollowRedirections instead of MeasureURL.

Rather than returning a single measurement, this function returns a channel where it posts the result of measuring the original URL along with all its redirections. Internally, MeasureURLAndFollowRedirections calls MeasureURL.

We accumulate the results in all.URLs and, once the loop is over, print all. MeasureURLAndFollowRedirections closes the channel when it is done, which is what terminates the loop.

	for m := range mx.MeasureURLAndFollowRedirections(ctx, *URL, headers, cookies) {
		all.URLs = append(all.URLs, measurex.NewArchivalURLMeasurement(m))
	}
	print(all)
}

Running the example program

Let us perform a vanilla run first:

go run -race ./internal/tutorial/measurex/chapter12 | jq

Take a look at the JSON. You should see several redirects and that we measure each endpoint of each redirect, including QUIC endpoints that we discover on the way.
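
If you prefer a quick summary over reading the whole JSON, a small helper can count how many "web steps" were recorded. This is a minimal sketch that relies only on the top-level URLs key produced by the measurement struct above; the helper name countSteps is hypothetical, and the content of each entry is deliberately left undecoded.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// countSteps returns the number of entries under the top-level "URLs"
// key. Each entry is kept as raw JSON because this sketch makes no
// assumption about the fields inside a single URL measurement.
func countSteps(data []byte) (int, error) {
	var doc struct {
		URLs []json.RawMessage
	}
	if err := json.Unmarshal(data, &doc); err != nil {
		return 0, err
	}
	return len(doc.URLs), nil
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read input:", err)
		os.Exit(1)
	}
	n, err := countSteps(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot parse input:", err)
		os.Exit(1)
	}
	fmt.Printf("web steps: %d\n", n)
}
```

To use it, save the helper in a file of your choice and pipe the output of the chapter's program into it instead of into jq.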

Exercise: remove the code that converts to the OONI data format and compare the output with that of the previous chapter. Do you see any difference?

Conclusion

We have introduced MeasureURLAndFollowRedirections, the top-level API for fully measuring a URL and all the URLs that derive from it via redirection.