A problem has two ends
A simple SSL issue gave me the opportunity to spread some troubleshooting love
So today I had an interesting teaching experience with someone who claimed they had a certificate problem. I needed to see if they were right. I explained my thinking to them in order to help them understand why I had come to my eventual conclusion.
I'm a visual thinker, and the best way for me to start to see any problem is by visualising the two ends of it, the cause and the effect. There's a reason the most hated question in tech support is 'is it plugged in?'. They are looking to the far-left of the problem, starting at one end. Picking one end is a good technique and one that's worked for me for decades.
My colleague believed that the effect - a broken Chrome browser session in a third-party app complaining about net::ERR_INSECURE_RESPONSE - was linked to a cause - an issue with the certificate being presented by the target site. Most often (according to Google) this type of error is caused by certificate issues like use of SHA-1, use of the wrong CN etc., so I can see why the finger is being pointed at the far-left end. No harm, no foul so far.
However, our front end is AWS CloudFront, backed by AWS-issued certificates, so I'm sceptical that it's a certificate problem :)
The simplest way for me to see if there are problems with the certs is to take a look at them. So I requested the site in a different browser, one I know to be working. This worked just fine, but I couldn't easily see the full detail (I know it's in there, but it's not all readable in one place afaik), so I shelled out to openssl to download the certs. I did this with a quick:
openssl s_client -showcerts -verify 5 -connect <domain>:443 < /dev/null
I checked the common name (CN), the SANs and the crypto methods, and everything looked fine. So could there be a problem with the certificate? No, the cert and its chain are fine.
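For anyone who wants to repeat that check without scrolling through the full dump, here is a minimal sketch that pulls out just the fields mentioned above. The function names are made up for illustration, the hostname is a placeholder, and this is not the exact command used on the day; it only relies on `openssl s_client` and `openssl x509 -noout -text`.

```shell
# Print the fields worth eyeballing from a PEM certificate on stdin:
# subject (CN), issuer, expiry, signature algorithm and SAN entries.
# (summarise_pem is a hypothetical helper name.)
summarise_pem() {
  openssl x509 -noout -text \
    | grep -E 'Subject:|Issuer:|Not After|Signature Algorithm:|DNS:' \
    | sed 's/^[[:space:]]*//' \
    | awk '!seen[$0]++'   # drop repeated lines, keep original order
}

# Fetch the leaf certificate a host presents and summarise it.
# "$1" is a placeholder hostname; -servername sends it as SNI.
summarise_host() {
  openssl s_client -servername "$1" -connect "$1:443" < /dev/null 2>/dev/null \
    | summarise_pem
}
```

Calling `summarise_host <domain>` should print a handful of lines: if the CN or SANs don't cover the hostname, the expiry date has passed, or the signature algorithm mentions SHA-1, that is the point at which the certificate theory starts to look plausible.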
I'm 99% certain at this point that it's a client problem, but I'm not strictly asking the same question the client browser was asking. Let's step right along the problem, towards the browser end, and check if anything obvious could be causing it.
I'm doing a plain HTTP/1.1 GET, whereas the browser in this instance was doing a CORS pre-flight OPTIONS request. Is this important? No, because the SSL handshake is failing at connection, before the client ever sends its OPTIONS request. Just for the hell of it I tried exactly the same request using curl, which worked fine, so it's not that.
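A pre-flight request is easy to imitate from the command line. The sketch below is an assumption about what such a curl call might look like, not the command from the original session: `preflight` is a hypothetical helper, and the URL and origin are whatever the browser trace shows.

```shell
# Send a CORS pre-flight (OPTIONS) request the way a browser would
# before a cross-origin GET. Arguments: target URL, then the Origin.
# (preflight is a hypothetical helper name.)
preflight() {
  curl --silent --show-error --include \
    --request OPTIONS \
    --header "Origin: $2" \
    --header "Access-Control-Request-Method: GET" \
    "$1"
}
```

Something like `preflight https://<domain>/ https://<origin>` prints the response status line and headers. If the TLS handshake itself were the problem, curl would fail before printing any of them, which is what makes this a useful comparison with the browser trace.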
Let's step right again to the content being asked for:
If the requested page contained an iframe pointing at an insecure site, the browser might complain about SSL errors for that embedded content. However, the trace we had was a session waterfall graph, and it clearly showed the error on the request for this specific page. So it's not this either.
But let's not forget that there are two ends. By now we're firmly beyond our own world: we know the certificates are fine and we've dug around a bit at the edge to see if we missed anything, so let's look far-right at the client. It's third party, and I have nothing but some traces. I took a look anyway to see what could be seen.
Then I noticed the user-agent string in the original request:
This is Chrome 56, which is a 2017 release. An awful lot of SSL water has passed under the bridge since 2017, so maybe that's where the issue is? There's also a note in the original email that this is 'a new release'. Wait, what?
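Reading the version out of a user-agent string by eye is error-prone, so here is a small sketch that extracts the Chrome major version. The helper name and the sample string in the usage note are illustrative only; the sample is not the string from the trace.

```shell
# Print the Chrome major version found in a user-agent string,
# or nothing at all if the string has no "Chrome/<n>." token.
# (chrome_major is a hypothetical helper name.)
chrome_major() {
  printf '%s\n' "$1" | sed -n 's/.*Chrome\/\([0-9][0-9]*\)\..*/\1/p'
}
```

For example, `chrome_major 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'` prints `56`. Embedded browsers often report a Chrome token like this, so it is a quick way to spot an old engine hiding inside a "new" release.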
One rule of break/fix is that something must change for something to break. Slow, erosive changes, environmental changes, certificate changes, software releases. At its root all things which now work will experience change, and thus will break.
So I can't see a certificate problem, and they recently changed something and deployed something old, and then something broke.
Right now the evidence points to their release. Perhaps an old version of a browser bundled into the app is hitting a known bug, or perhaps it's missing some more modern capability or something from the certificate store. I'm now confident enough to ask them to go and ask the third party to check.
I got this back:
So today I taught some people that not all SSL errors are caused by server-end certificates, and helped them figure out where the issue might be :)
Now I'm just waiting to find out what it was!