1. add eof channel to event emitter and use this channel as signal
that we shouldn't be sending anymore instead of using a pattern where
we use a timer to decide sending has timed out (because we're using
a buffered channel, it is still possible for some evetns to end up
in the channel if there is space, but this is not a concern, because
the events will be deleted when the channel itself is gone);
2. refactor all tests where we assumed the output channel was closed
to actually use a parallel "eof" channel and use it as signal we
should not be sending anymore (not strictly required but still the
right thing to do in terms of using consistent patterns);
3. modify how we construct a runner so that it passes to the
event emitter an "eof" channel and closes this channel when the
main goroutine running the task is terminating;
4. modify the task to signal events such as "task goroutine started"
and "task goroutine stopped" using channels, which helps to write
much more correct tests;
5. take advantage of the previous change to improve the test that
ensures we're not blocking for a small number of events and also
improve the name of such a test to reflect what it's testing.
The related issue in term of fixing the channel usage pattern is
https://github.com/ooni/probe/issues/1438.
Regarding improving testability, instead, the correct reference
issue is https://github.com/ooni/probe/issues/1903.
There are possibly more changes to apply here to improve this package
and its testability, but let's land this diff first and then see
how much time is left for further improvements.
I've run unit and integration tests with `-race` locally.
This diff will need to be backported to `release/3.11`.
This diff forward ports 90bf0629b957c912a5a6f3bb6c98ad3abb5a2ff6 to `master`.
If we close the channel to signal the end of a task we may panic
when some background goroutine tries to post on the channel.
This bug is rare but may happen.
See for example https://github.com/ooni/probe/issues/1438.
How can we improve?
First, let us add a timeout when sending to the channel. Given that
the channel is buffered and we have a generous timeout (1/4 of a
second), it's unlikely we will really block. But, in the event in
which a late message appears, we'll eventually _unblock_ when
sending with a timeout. So, now we don't have to worry anymore
about leaking forever a goroutine.
Then, let us change the protocol with which we signal that a task
is done. We used to close the channel. Now, instead we just
synchronously post a nil on the channel when done.
In turn, we interpret this nil to mean that the task is done when
we receive messages.
The _main_ different with respect to before is that now we are
asking the consumer of our API to drain the channel. Because
before we had a blocking channel, it seems to me we were already
requiring the consumer of the API to do that. Which means, I think
in practical terms it did not change much.
Finally, acknowledge that we don't need a specific state variable
to tell us we're done and simplify a little bit the API by
just making isRunning private and using the "we're done" signal
to determine whether we've stopped running the task.
All these changes should be enough to close https://github.com/ooni/probe/issues/1438.