Bugfix: Send the TLS alert to the peer upon a ssl.CertificateError.#3413
Bugfix: Send the TLS alert to the peer upon a ssl.CertificateError.#3413sasozivanovic wants to merge 8 commits intopython-trio:mainfrom
ssl.CertificateError.#3413Conversation
This commit (partially) addresses python-trio/trio-websocket#199, which I traced to a bug in Trio itself. A corresponding bugfix relying on the present bugfix will be submitted to the Trio Websocket repo. From the user's perspective, the issue was that a TLS client hanged after submitting an invalid (e.g. expired) client certificate. Before this fix, when `SSLStream._retry` caught a `ssl.CertificateError`, the error was immediately re-raised (wrapped in a `trio.BrokenResourceError`). The TLS alert prepared by the `ssl` library and waiting in MemoryBIO was therefore never sent to the peer. Now, upon catching a `ssl.CertificateError`, we first check whether we have any pending outgoing data (`self._outgoing.pending`). If so, the exception is stashed away and only raised (again, wrapped in a `trio.BrokenResourceError`) after sending out the pending data (which should contain the alert). I had first tried an alternative implementation, where the `CertificateError` was not stashed away. Rather, the error was raised immediately, but only if there was no pending outgoing data. The idea was to rely on the loop in `SSLStream._retry`. Upon seeing `CertificateError` for the first time, there would be pending data, so the exception would not be reraised and the pending data would be sent out, while upon seeing the `CertificateError` for the second time, there would be no pending data, so the exception would be re-raised. However, it turned out that the loop did not always continue (I'm not sure why), so there was no second time in some situations. TESTS in `test_ssl.py` `test_ssl_client_basics` (modified) Here we test whether the TLS alert sent out by the client reaches the (blocking) server. The second (no CA file) and the third (wrong host name) subtest of this test were modified to check that the server encounters the correct SSL error. In the old code, the server encountered `UNEXPECTED_EOF_WHILE_READING` (protocol error) in both subtests. After the fix, it correctly receives `TLSV1_ALERT_UNKNOWN_CA` and `SSLV3_ALERT_BAD_CERTIFICATE`, respectively. To facilitate the modified test, function `ssl_echo_serve_sync` (called by `ssl_echo_server_raw` called by this test) now allows for a special value of keyword argument `expect_fail`: `"raise"`. When given this value, the error is expected but raised nevertheless, the idea being that it should be caught and inspected in the test code. `test_client_certificate` (new) Here we test whether the TLS alert sent out by the server reaches the (blocking) client. The test is modeled on `test_ssl_server_basics`. The server is configured to require client authentication (`SERVER_CTX_REQ`, defined at the top of the file). In the first subtest, the client submits a valid certificate; in the second subtest, an expired one. There is a complication with the second subtest. If the client does not send out the TLS alert (like before the bugfix), the server hangs, but we don't want the test to hang. I could think of no other way to test whether the server hangs than imposing an arbitrary (small) timeout, and there is a (very) small chance that even the correct code will not finish within the allotted time, resulting in a false negative.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3413 +/- ##
===============================================
Coverage 100.00000% 100.00000%
===============================================
Files 128 128
Lines 19417 19482 +65
Branches 1318 1321 +3
===============================================
+ Hits 19417 19482 +65
🚀 New features to boost your workflow:
|
|
On the coverage report. The missing code will never be reached assuming that every |
This commit relies on the bugfix proposed in PR python-trio/trio#3413 to address issue python-trio#199. From the user's perspective, the issue was that a TLS client hanged after submitting an invalid (e.g. expired) client certificate. The trio bugfix makes sure that the server (in general, peer) sends out the TLS alert after catching a `ssl.CertificateError`. In this commit, we do two things. First, we intercept `ssl.SSLError`s created by the alerts when receiving data from the stream (`WebSocketConnection._reader_task`) and sending data to the stream (`WebSocketConnection._send`), and wrap them into a `HandshakeError`. Second, we catch and ignore these `HandshakeError`s in `WebSocketServer._handle_connection`, so that a certificate error does not crash the server. Here, I'm unsure whether we should ignore *all* SSL errors (currently, this is the case), or maybe only those generated by alerts, or only those having to do with certificates, and neither do I know how to compile a comprehensive list of the latter, if that is what should be done.
|
(similar to you, I don't especially know much about networking....)
You can drop this -- we already have general timeouts for tests. We have other tests which do similar evil things (specifically the REPL, if I remember correctly).
Huh I'm surprised the issue isn't instead that
IMO add an Looks like CI is failing! Also please rename the newsfragment, I think |
According to the coverage report, `self._outgoing.pending` is always non-empty.
`datetime.UTC` was only introduced in Python 3.11, so the CIs were failing on setups with Python 3.10.
The additional semantics for `expect_fail` still seems less invasive than a new keyword argument like `silent`.
|
Thanks for the quick reply!
I'm not sure about this. The general timeout is not only rather long (1 minute), it also doesn't do the job: the test client works in a thread, but this thread is not abandoned upon the general timeout, so the test hangs. I followed the rest of your suggestions — thanks for the ideas and reminders! I also looked at the CI reports, and I managed to fix all issues but one (occuring twice):
I first thought this was an issue with exception group nesting, so I switched from |
|
My first guess about the 3 mypy errors (not looking at the code) is that the first is because you forgot to explicitly provide a type somewhere, and the two you mention I suspect are because the variable name is reused? (mypy is very picky that variables always mean the same type! at least by default, I forget if we changed the right settings...)
Hm, yeah, I suppose. There's other tests that hang rather than fail, though that's definitely a worse experience (but... as long as CI fails if this is broken, all's well!). I guess it's up to you, though make sure to size the timeout large enough because the free GitHub Actions runners we use are... pretty bad -- a |
A5rocks
left a comment
There was a problem hiding this comment.
Hopefully these specific suggestions help? It's possible there's a mypy bug or something weirder going on...
|
@A5rocks Many thanks, I'll look into this right away! |
|
@A5rocks Fantastic, your suggestions worked like a charm! |
I'm thinking, maybe increase the timeout from 0.1s to 1s, just to be on the safe side. The thing is, this is not a timeout on the entire test, but on a very specific part of it, between the client receiving the TLS alert and setting the |
Disclaimer: I'm new to networking, async stuff, and especially SSL (just a regular programmer who wants to inform the client that their certificate expired), while
SSLStream._retryis claimed to be "probably the single trickiest function in Trio" in the code comments. I tried my best, and learned a lot on the way, but this pull request really has to be reviewed, and thoroughly, by someone who knows very well what they are doing. 😁This commit (partially) addresses python-trio/trio-websocket#199, which I traced to a bug in Trio itself. A corresponding bugfix relying on the present bugfix will be submitted to the Trio Websocket repo.
From the user's perspective, the issue was that a TLS client hanged after submitting an invalid (e.g. expired) client certificate.
Before this fix, when
SSLStream._retrycaught assl.CertificateError, the error was immediately re-raised (wrapped in atrio.BrokenResourceError). The TLS alert prepared by thessllibrary and waiting in MemoryBIO was therefore never sent to the peer.Now, upon catching a
ssl.CertificateError, we first check whether we have any pending outgoing data (self._outgoing.pending). If so, the exception is stashed away and only raised (again, wrapped in atrio.BrokenResourceError) after sending out the pending data (which should contain the alert).(I had first tried an alternative implementation, where the
CertificateErrorwas not stashed away. Rather, the error was raised immediately, but only if there was no pending outgoing data. The idea was to rely on the loop inSSLStream._retry. Upon seeingCertificateErrorfor the first time, there would be pending data, so the exception would not be reraised and the pending data would be sent out, while upon seeing theCertificateErrorfor the second time, there would be no pending data, so the exception would be re-raised. However, it turned out that the loop did not always continue (I'm not sure why), so there was no second time in some situations.)Tests in
test_ssl.pytest_ssl_client_basics(modified)Here we test whether the TLS alert sent out by the client reaches the (blocking) server.
The second (no CA file) and the third (wrong host name) subtest of this test were modified to check that the server encounters the correct SSL error. In the old code, the server encountered
UNEXPECTED_EOF_WHILE_READING(protocol error) in both subtests. After the fix, it correctly receivesTLSV1_ALERT_UNKNOWN_CAandSSLV3_ALERT_BAD_CERTIFICATE, respectively.To facilitate the modified test, function
ssl_echo_serve_sync(called byssl_echo_server_rawcalled by this test) now allows for a special value of keyword argumentexpect_fail:"raise". When given this value, the error is expected but raised nevertheless, the idea being that it should be caught and inspected in the test code.test_client_certificate(new)Here we test whether the TLS alert sent out by the server reaches the (blocking) client. The test is modeled on
test_ssl_server_basics.The server is configured to require client authentication (
SERVER_CTX_REQ, defined at the top of the file). In the first subtest, the client submits a valid certificate; in the second subtest, an expired one.There is a complication with the second subtest. If the client does not send out the TLS alert (like before the bugfix), the server hangs, but we don't want the test to hang. I could think of no other way to test whether the server hangs than imposing an arbitrary (small) timeout, and there is a (very) small chance that even the correct code will not finish within the allotted time, resulting in a false negative.