When Let's Encrypt Broke Our HL7 mTLS: A Certificate Incident Post-Mortem
You don’t expect a routine certificate renewal to take down a healthcare integration. But that’s exactly what happened when Let’s Encrypt dropped the clientAuth Extended Key Usage (EKU) from our renewed certificates as part of an industry-wide policy shift. Messages stopped flowing. No alerts fired. The automation that was supposed to keep things running is what broke them.
This post walks through the full incident, the root cause, and the seven lessons we took away from it. No customer names, but the architecture and mistakes are real.
If you’re using any public CA for mTLS today, pay attention — this isn’t a Let’s Encrypt-specific problem. Every major public CA is on a timeline to drop clientAuth by mid-2026.
The Setup
Our stack uses Mirth Connect as an HL7 integration engine, handling healthcare messages over TCP/MLLP (Minimal Lower Layer Protocol). The connection to the remote health system uses mutual TLS (mTLS) — both sides present certificates and verify each other’s identity.
Here’s what the architecture looked like before the incident:
┌──────────────────────┐      mTLS (TCP/MLLP)       ┌──────────────────────┐
│ Mirth Connect        │◄──────────────────────────►│ Remote HL7 System    │
│ (HL7 TCP Sender)     │         Port 18032         │ (Epic / Gateway)     │
│                      │                            │                      │
│ Cert: Let's Encrypt  │                            │ Cert: Enterprise CA  │
│ EKU: serverAuth +    │                            │ Trust: Our CA root   │
│      clientAuth      │                            │                      │
└──────────────────────┘                            └──────────────────────┘
The key detail: our Mirth server’s TLS certificate was issued by Let’s Encrypt via an automated ACME renewal process managed by Ansible. This had been working fine for months. The certificate carried both serverAuth and clientAuth in its Extended Key Usage — which is what the remote system required to establish the mTLS handshake.
An Ansible playbook ran daily via cron to check certificate expiry and renew when needed. Clean, automated, hands-off. Until it wasn’t.
What Happened
The Industry Shift Behind Our Outage
On February 11, 2026, Let’s Encrypt changed the default behavior of their “classic” ACME profile: certificates issued after this date no longer include the clientAuth EKU. Our automated renewal ran, got a fresh certificate, and everything looked fine — valid cert, correct CN/SAN, proper chain. Except it was missing id-kp-clientAuth, and our mTLS partner rejected the handshake.
For the vast majority of Let’s Encrypt users, this change is completely invisible. Web browsers and HTTPS clients only care about serverAuth. But for mTLS, where the server certificate is also used to authenticate as a client to the remote peer, clientAuth is mandatory.
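You can reproduce the peer's check locally. The minimal sketch below creates a throwaway CA and two leaf certificates. One has serverAuth only, like our post-February renewal, and the other has both EKUs. It then asks OpenSSL to validate each one for the TLS client purpose. That is roughly what a strict mTLS peer does with the certificate we present, although the remote system's own TLS stack may differ in detail. Hostnames and file names are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: a serverAuth-only certificate fails OpenSSL's TLS-client purpose check.
# All hostnames and file names are hypothetical.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Minimal config for a throwaway root CA
cat > ca.cnf <<'EOF'
[req]
distinguished_name = dn
x509_extensions = v3_ca
prompt = no
[dn]
CN = Demo Root CA
[v3_ca]
basicConstraints = critical, CA:TRUE
keyUsage = critical, keyCertSign, cRLSign
EOF
openssl req -x509 -new -nodes -newkey rsa:2048 -sha256 -days 30 \
  -config ca.cnf -keyout ca.key -out ca.crt 2>/dev/null

# Issue a leaf with the given EKU list, e.g. "serverAuth" or "serverAuth, clientAuth"
issue_leaf() {
  local name=$1 eku=$2
  printf 'extendedKeyUsage = %s\nsubjectAltName = DNS:hl7tls.demo.example.com\n' "$eku" > "$name.ext"
  openssl req -new -nodes -newkey rsa:2048 -subj "/CN=hl7tls.demo.example.com" \
    -config ca.cnf -keyout "$name.key" -out "$name.csr" 2>/dev/null
  openssl x509 -req -in "$name.csr" -CA ca.crt -CAkey ca.key -CAcreateserial \
    -days 30 -sha256 -extfile "$name.ext" -out "$name.crt" 2>/dev/null
}

issue_leaf server_only "serverAuth"
issue_leaf dual_use "serverAuth, clientAuth"

# Validate each leaf for client use, roughly as a strict mTLS peer would
for name in server_only dual_use; do
  if openssl verify -purpose sslclient -CAfile ca.crt "$name.crt" >/dev/null 2>&1; then
    echo "$name: accepted as client certificate"
  else
    echo "$name: rejected as client certificate"
  fi
done
```

On OpenSSL 1.1.1 or 3.x, the serverAuth-only certificate should fail with an "unsupported certificate purpose" error, and the dual-use certificate should pass. The rejected certificate is neither expired nor malformed, which is why expiry-based monitoring would not flag it.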
Why Is This Happening Across the Entire Industry?
This wasn’t a Let’s Encrypt quirk. It’s a coordinated, industry-wide policy change driven by Google’s Chrome Root Program Policy v1.6, which requires certificate hierarchies included in Chrome’s trust store to be dedicated solely to TLS server authentication by June 2026.
The rationale:
- Security risk reduction: Multipurpose certificates (with both serverAuth and clientAuth) could be misused. A compromised server cert shouldn’t also grant client authentication capabilities.
- Certificate purpose specificity: Public CAs should assert exactly what they’re vouching for. Server identity and client identity are fundamentally different trust models.
- Browsers don’t need it: No major browser checks clientAuth on a website’s certificate. Including it was a legacy behavior, not a requirement.
Google’s Chrome Root Program gave CAs a deadline, and every major public CA is complying:
| CA | Default Removal | Complete Removal |
|---|---|---|
| Let’s Encrypt | Feb 11, 2026 | May 13, 2026 |
| Sectigo | Sep 15, 2025 | May 15, 2026 |
| DigiCert | Oct 1, 2025 | May 1, 2026 |
| SSL.com | Sep 15, 2025 | — |
| Google Trust Services | Nov 10, 2025 | Apr 13, 2026 |
Let’s Encrypt did provide a temporary migration path — a tlsclient ACME profile that still includes clientAuth — but it will be retired on May 13, 2026. After that date, no Let’s Encrypt certificate will carry clientAuth, period. And after June 15, 2026, Chrome will reject any public SSL certificate that still contains clientAuth.
This is worth emphasizing: this is not something you can opt out of. If you’re using any public CA for mTLS client authentication, you are on a countdown.
Why HL7 mTLS Is Especially Vulnerable
HL7 v2.x has no built-in security — the standard explicitly states that information security is outside its scope. Messages are sent in plaintext by default over MLLP with no authentication or encryption. mTLS was bolted on as the transport-layer fix: authenticate both parties before any HL7 message exchange.
The problem is that many healthcare integration teams (including us) used the same public CA certificate for both the server-side and client-side of the TLS connection. This worked fine when public CAs included both EKUs. Now it doesn’t.
The HL7 FHIR security specification explicitly lists mutual-authenticated-TLS as a valid client authentication method. But FHIR doesn’t prescribe how you manage the certificates — and using a public CA for the client side was always a shortcut, not a best practice.
The Incident Timeline
- Automated renewal runs on schedule, obtains a new certificate from Let’s Encrypt
- New cert is issued under the updated “classic” profile, with no clientAuth EKU
- Ansible imports the new cert into Mirth’s SSL Manager (no errors, cert is valid)
- Mirth attempts to send HL7 messages to the remote system
- Remote system rejects the TLS handshake because our cert lacks clientAuth
- HL7 messages stop flowing
No monitoring caught this. Our certificate monitoring checked for expiry, not for EKU contents. The cert was valid, not expired, and had the right CN/SAN — it just lacked one critical extension.
The Fix
The emergency fix was straightforward in concept but had several gotchas in execution:
- Generate a self-signed CA (root key pair) for each environment (UAT and Prod)
- Issue leaf certificates signed by our own CA, with explicit serverAuth + clientAuth EKUs and the correct SAN
- Import the .p12 bundles into Mirth’s SSL Manager
- Coordinate with the remote system to add our new CA root to their trust stores
- Disable the Let’s Encrypt cron to prevent re-breaking on the next renewal
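With plain OpenSSL, the CA and leaf steps look roughly like the sketch below. This is a sketch, not our exact runbook. The PKI_ENV, HOST and ALIAS variables, the key sizes and the file names are assumptions, and the password is a placeholder. The key points are that the SAN comes from the same $HOST variable as the CN instead of being copy-pasted, and that both EKUs are listed explicitly.

```shell
#!/usr/bin/env bash
# Sketch of the emergency fix: one root CA per environment, then a leaf certificate with
# explicit serverAuth + clientAuth EKUs and a SAN derived from the environment's hostname.
# PKI_ENV, HOST, ALIAS and all file names are hypothetical; P12_PASS is a placeholder.
set -euo pipefail

PKI_ENV=${PKI_ENV:-uat}
HOST=${HOST:-hl7tls.$PKI_ENV.example.com}
ALIAS=${ALIAS:-hl7-mtls-$PKI_ENV}
P12_PASS=${P12_PASS:-changeit}
cd "${WORKDIR:-$(mktemp -d)}"

# 1. Self-signed root CA for this environment (10-year validity)
cat > "ca-$PKI_ENV.cnf" <<EOF
[req]
distinguished_name = dn
x509_extensions = v3_ca
prompt = no
[dn]
CN = HL7 mTLS Root CA $PKI_ENV
[v3_ca]
basicConstraints = critical, CA:TRUE
keyUsage = critical, keyCertSign, cRLSign
subjectKeyIdentifier = hash
EOF
openssl req -x509 -new -nodes -newkey rsa:4096 -sha256 -days 3650 \
  -config "ca-$PKI_ENV.cnf" -keyout "ca-$PKI_ENV.key" -out "ca-$PKI_ENV.crt" 2>/dev/null

# 2. Leaf key and CSR; the CN uses the same $HOST as the SAN below
openssl req -new -nodes -newkey rsa:2048 -subj "/CN=$HOST" -config "ca-$PKI_ENV.cnf" \
  -keyout "leaf-$PKI_ENV.key" -out "leaf-$PKI_ENV.csr" 2>/dev/null

# 3. Sign the leaf (1 year) with both EKUs and an environment-specific SAN
cat > "leaf-$PKI_ENV.ext" <<EOF
basicConstraints = CA:FALSE
keyUsage = critical, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth, clientAuth
subjectAltName = DNS:$HOST
EOF
openssl x509 -req -in "leaf-$PKI_ENV.csr" -CA "ca-$PKI_ENV.crt" -CAkey "ca-$PKI_ENV.key" \
  -CAcreateserial -days 365 -sha256 -extfile "leaf-$PKI_ENV.ext" -out "leaf-$PKI_ENV.crt" 2>/dev/null

# 4. Bundle for Mirth's SSL Manager; -name sets the alias the channels reference
openssl pkcs12 -export -inkey "leaf-$PKI_ENV.key" -in "leaf-$PKI_ENV.crt" \
  -certfile "ca-$PKI_ENV.crt" -name "$ALIAS" -passout "pass:$P12_PASS" -out "leaf-$PKI_ENV.p12"

# 5. Sanity check before importing anything; ca-$PKI_ENV.crt is what the remote party trusts
openssl verify -purpose sslclient -CAfile "ca-$PKI_ENV.crt" "leaf-$PKI_ENV.crt"
```

Running it once with PKI_ENV=uat and once with PKI_ENV=prod gives two independent hierarchies. A mistake in one environment then cannot produce a certificate that the other environment's trust store accepts.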
The permanent fix replaced the Let’s Encrypt ACME automation with a self-signed CA workflow, where the CA keys live in AWS SSM Parameter Store and leaf certificates are generated and renewed by Ansible.
Seven Lessons from the Incident
1. Public CAs Are Dropping clientAuth — This Is Industry-Wide
This is the core lesson. Let’s Encrypt dropped clientAuth from their default ACME profile on February 11, 2026, as part of compliance with Google’s Chrome Root Program Policy v1.6. Every major public CA is on the same timeline, with complete removal by mid-2026.
This wasn’t a surprise if you were watching the CA/Browser Forum, but it was invisible if you were just relying on automated renewals. The Let’s Encrypt announcement was clear, but how many ops teams are subscribed to CA policy updates?
The rule: Never rely on a public CA for clientAuth certificates used in mTLS scenarios. Public CAs are aligned to browser TLS. Healthcare HL7 mTLS needs self-managed CAs where you control the EKU.
If you’re using Let’s Encrypt or any other public CA for mTLS, audit your certs right now:
openssl x509 -in your-cert.crt -noout -text | grep -A5 "Extended Key Usage"
If you see only TLS Web Server Authentication and no TLS Web Client Authentication, your mTLS setup is one renewal away from breaking.
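To audit more than one file at a time, a small loop over a directory is enough. This is a minimal sketch that assumes your certificates are PEM files named *.crt or *.pem. Files that do not parse as certificates, such as private keys, are reported as skipped rather than missing. Note that openssl x509 reads only the first certificate in a file, so CA bundles are checked only partially.

```shell
#!/usr/bin/env bash
# Sketch: report every PEM certificate under a directory that lacks the clientAuth EKU.
# The *.crt / *.pem naming convention is an assumption; adjust to your layout.
set -uo pipefail

audit_client_eku() {
  local dir=$1 missing=0 cert text
  while IFS= read -r -d '' cert; do
    # Skip files that are not certificates (e.g. private keys saved as .pem)
    if ! text=$(openssl x509 -in "$cert" -noout -text 2>/dev/null); then
      echo "SKIPPED  $cert"
      continue
    fi
    # OpenSSL renders the clientAuth EKU as "TLS Web Client Authentication"
    if [[ $text == *"TLS Web Client Authentication"* ]]; then
      echo "OK       $cert"
    else
      echo "MISSING  $cert"
      missing=$((missing + 1))
    fi
  done < <(find "$dir" -type f \( -name '*.crt' -o -name '*.pem' \) -print0)
  # Non-zero status if anything lacks clientAuth, so cron or CI can alert on it
  (( missing == 0 ))
}
```

For example, `audit_client_eku /etc/pki/tls/certs` prints one line per file and returns non-zero if any certificate is missing clientAuth.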
2. SAN Must Match the Environment
During the emergency fix, the first UAT leaf cert was accidentally generated with a production SAN (hl7tls.prod.example.com) because the openssl command’s -extfile was copy-pasted from the prod cert generation.
Modern TLS clients validate hostnames against the Subject Alternative Name (SAN) and ignore the CN (Common Name) whenever a SAN is present. A cert with CN=hl7tls.uat.example.com but SAN=hl7tls.prod.example.com will therefore fail validation against the UAT hostname.
Always verify after generation:
openssl x509 -in cert.crt -noout -text | grep -A2 "Subject Alternative"
This should be a non-negotiable step in any cert generation runbook.
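Grepping the SAN text still depends on someone reading the output. OpenSSL's -checkhost option applies SAN-first hostname matching, so a script can fail on its own. The sketch below uses a hypothetical helper name. Because -checkhost reports its result as text, the helper inspects the printed output rather than the exit status.

```shell
#!/usr/bin/env bash
# Sketch: fail a cert-generation runbook if the SAN does not cover the intended hostname.
# check_cert_host is a hypothetical helper name.

check_cert_host() {
  local cert=$1 host=$2 result
  # -checkhost prints "Hostname <h> does match certificate" or "... does NOT match ...";
  # inspect that text rather than relying on the command's exit status.
  result=$(openssl x509 -in "$cert" -noout -checkhost "$host") || return 2
  if [[ $result == *"does match certificate"* ]]; then
    return 0
  fi
  echo "SAN mismatch: $cert is not valid for $host" >&2
  return 1
}
```

In a generation runbook, running `check_cert_host leaf.crt hl7tls.uat.example.com || exit 1` right after signing would have stopped the copy-pasted prod SAN before it reached Mirth.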
3. Multiple Trust Stores Require Individual Attention
The remote system in this case had two separate ingestion points:
| System | Protocol | Port | Trust Store |
|---|---|---|---|
| HL7 Engine | TCP/MLLP | 18032 | Separate JKS/PKCS12 |
| API Gateway | HTTPS | 2005 | Separate trust config |
Adding our CA root to one trust store didn’t fix the other. Each system maintains its own certificate trust independently.
The lesson: When coordinating cert trust changes with a remote party, always ask: “Which specific systems and ports are affected?” and confirm each one individually. Don’t assume that “we added your CA” means all paths are covered.
4. Mirth’s SSL Manager Replaces the Entire Store
This one bit us during the fix. Mirth Connect’s SSL Manager API (POST /api/extensions/ssl/all) performs a full replacement of both identity certificates and trusted CA certificates. It doesn’t merge — it wipes and replaces.
This means:
- Any CA cert added manually through the Mirth UI gets blown away on the next Ansible run
- The Ansible variable pki_trust_pem_list (in mirth/defaults/main.yml) is the single source of truth
- If a new outbound CA needs to be trusted, it must be added to pki_trust_pem_list, not just the UI
If you’re managing Mirth certificates with automation, treat the automation config as authoritative. Manual UI changes are temporary.
5. Disabling Cron Doesn’t Eliminate All Trigger Points
Our immediate response was to disable the daily renewal cron job. Problem solved, right? Not quite.
There were two entry points that triggered certificate operations:
- play-renew-mirth-certificate.yml: the daily cron job (disabled)
- roles/mirth/tasks/ssl.yml: called by play-mirth-server.yml on every fresh instance launch via cloud-init
If a Mirth EC2 instance had been replaced during the incident (auto-scaling event, instance failure, version upgrade), the cloud-init bootstrap would have run the full server playbook, which would have called Let’s Encrypt ACME again and generated another cert without clientAuth.
The lesson: When disabling an automated process during an incident, trace all code paths that invoke it. Cron is usually the obvious one, but boot scripts, CI/CD pipelines, and infrastructure-as-code provisioning are all potential triggers.
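In practice, tracing code paths mostly means searching the infrastructure repo for every file that references the certificate role or playbooks, not just the cron definition. The sketch below is rough. Its file extensions and search patterns are assumptions based on the names in this post. Cron entries and user data stored outside the repo still need a separate check.

```shell
#!/usr/bin/env bash
# Sketch: list every file in the infrastructure repo that can start a certificate operation.
# The patterns below are assumptions based on the file and role names in this post.

find_cert_triggers() {
  local repo=$1
  # Playbooks, roles, Terraform user data and bootstrap scripts can all call the
  # ACME role directly or pull it in through an include/import.
  grep -rnE \
    --include='*.yml' --include='*.yaml' --include='*.tf' --include='*.sh' --include='*.j2' \
    -e 'acme_certificate' \
    -e 'ssl\.yml' \
    -e 'play-renew-mirth-certificate' \
    -e 'play-mirth-server' \
    "$repo"
}
```

Each hit (path:line:match) is a place that could reintroduce a bad certificate. Every hit should either be covered by the toggle or be disabled explicitly.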
6. Certificate Alias vs Filename in PKCS12
During manual recovery, someone renamed the .p12 file thinking that would change the certificate alias visible in Mirth’s SSL Manager. It didn’t.
The Mirth SSL Manager alias is determined by the -name flag in the openssl pkcs12 command, not the filename:
# The alias is "hl7-letsencrypt", regardless of what you name the .p12 file
openssl pkcs12 -export \
-in cert.crt \
-inkey cert.key \
-out whatever-filename-you-want.p12 \
-name "hl7-letsencrypt" # <-- THIS sets the alias
Mirth channels reference certificates by alias. If the alias doesn’t match what the channel expects, the channel won’t find the cert — even if the .p12 file is correctly imported.
Document this in your runbooks. It’s a common point of confusion during incident response when people are under pressure.
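You can confirm this yourself, and check the alias before importing anything. The sketch below exports a throwaway certificate with -name, renames the file, and shows that the friendlyName attribute set by -name has not changed. It then re-exports the bundle under a new -name, which is the actual fix when the alias is wrong. File names, aliases and the password are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: the PKCS12 alias (friendlyName) comes from -name, not from the file name.
# File names, aliases and the password are placeholders.
set -euo pipefail
cd "$(mktemp -d)"

# Throwaway key and self-signed cert for the demonstration
printf '[req]\ndistinguished_name = dn\nprompt = no\n[dn]\nCN = alias-demo\n' > demo.cnf
openssl req -x509 -new -nodes -newkey rsa:2048 -days 1 -config demo.cnf \
  -keyout cert.key -out cert.crt 2>/dev/null

# Export with an explicit alias, then rename the file
openssl pkcs12 -export -in cert.crt -inkey cert.key -name "hl7-letsencrypt" \
  -passout pass:changeit -out original.p12
mv original.p12 renamed-for-prod.p12

# Still "hl7-letsencrypt": renaming the file does not touch the friendlyName attribute
openssl pkcs12 -in renamed-for-prod.p12 -nokeys -passin pass:changeit | grep friendlyName

# To actually change the alias, re-export with a new -name
openssl pkcs12 -in renamed-for-prod.p12 -nodes -passin pass:changeit -out bundle.pem
openssl pkcs12 -export -in bundle.pem -name "hl7-mtls-prod" \
  -passout pass:changeit -out realiased.p12
```

The same `openssl pkcs12 -nokeys ... | grep friendlyName` check works on any bundle you are about to import.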
7. Self-Signed CAs Are Appropriate for Healthcare mTLS
There’s sometimes an instinct to use a “real” CA for everything, because self-signed feels less professional or less secure. For healthcare HL7 mTLS, the opposite is true.
A self-signed CA gives you:
- Full control over EKUs — you decide what goes in the certificate
- Control over validity periods — 10-year CA certs, 1-year leaf certs, whatever your policy requires
- Control over CN and SAN — no domain validation restrictions
- No dependency on external infrastructure — no ACME challenges, no DNS verification, no rate limits
- Simplicity — a two-level hierarchy (root CA + leaf cert, no intermediate) is perfectly adequate
The trade-off is that you need to distribute your CA root to every party that needs to trust your certificates. In healthcare integrations, you’re already doing this coordination anyway — exchanging trust materials is a standard part of onboarding.
The final architecture:
┌──────────────────────────────────────────────────────────────────┐
│  Self-Signed CA (per environment)                                │
│                                                                  │
│  Root CA Key  → AWS SSM Parameter Store (SecureString)           │
│  Root CA Cert → AWS SSM Parameter Store                          │
│  Validity     → 10 years                                         │
│                                                                  │
│  ┌───────────────────┐                                           │
│  │ Leaf Cert         │                                           │
│  │ EKU: serverAuth   │                                           │
│  │      + clientAuth │                                           │
│  │ Validity: 1 year  │                                           │
│  │ Auto-renewed by   │                                           │
│  │ Ansible           │                                           │
│  └───────────────────┘                                           │
└──────────────────────────────────────────────────────────────────┘
What We Changed
Automation Updates
The permanent fix required changes across both Ansible and Terraform:
New Ansible task: renew_outbound_cert.yml
- Pulls the CA key and cert from SSM Parameter Store
- Generates a new leaf private key (PKCS#8 format for Mirth compatibility)
- Creates a CSR with correct CN, SAN, and EKUs (serverAuth + clientAuth)
- Signs the leaf cert with the CA (1-year validity)
- Assembles the certificate chain (leaf + CA)
- Persists the new key and chain back to SSM
- Cleans up the CA private key from the temp filesystem in an always block
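For readers not using Ansible, here are the same steps as a shell sketch. The SSM parameter names, output path and hostname are hypothetical. fetch_param and store_param wrap the AWS CLI and can be swapped for another secret store. The function body runs in a subshell, so its EXIT trap behaves like the always block: the temporary copy of the CA private key is removed whether the renewal succeeds or fails.

```shell
#!/usr/bin/env bash
# Shell sketch of the renewal steps (the real task is Ansible).
# All SSM parameter names, the hostname and the output directory are hypothetical.
set -euo pipefail

CA_KEY_PARAM=${CA_KEY_PARAM:-/pki/ca/private/root-key}
CA_CERT_PARAM=${CA_CERT_PARAM:-/pki/ca/root-cert}
LEAF_KEY_PARAM=${LEAF_KEY_PARAM:-/pki/mirth/outbound/key}
LEAF_CHAIN_PARAM=${LEAF_CHAIN_PARAM:-/pki/mirth/outbound/chain}

# Read / write a SecureString parameter (swap these out for another secret store)
fetch_param() {
  aws ssm get-parameter --name "$1" --with-decryption --query Parameter.Value --output text
}
store_param() {  # value comes from stdin; --value= keeps a PEM's leading dashes from parsing as a flag
  aws ssm put-parameter --name "$1" --type SecureString --overwrite --value="$(cat)" >/dev/null
}

# Subshell body: the EXIT trap fires when the function finishes, on success or failure
renew_outbound_cert() (
  host=$1 out=$2
  tmp=$(mktemp -d)
  trap 'rm -rf "$tmp"' EXIT   # removes the temporary copy of the CA private key

  # 1. Pull the CA key and cert
  fetch_param "$CA_KEY_PARAM" > "$tmp/ca.key"
  fetch_param "$CA_CERT_PARAM" > "$tmp/ca.crt"

  # 2. New leaf key; genpkey writes unencrypted PKCS#8 ("BEGIN PRIVATE KEY")
  openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out "$tmp/leaf.key" 2>/dev/null

  # 3. CSR with the right CN
  printf '[req]\ndistinguished_name = dn\nprompt = no\n[dn]\nCN = %s\n' "$host" > "$tmp/req.cnf"
  openssl req -new -key "$tmp/leaf.key" -config "$tmp/req.cnf" -out "$tmp/leaf.csr"

  # 4. Sign with the CA for 1 year, with explicit EKUs and SAN
  printf '%s\n' 'basicConstraints = CA:FALSE' \
    'keyUsage = critical, digitalSignature, keyEncipherment' \
    'extendedKeyUsage = serverAuth, clientAuth' "subjectAltName = DNS:$host" > "$tmp/leaf.ext"
  openssl x509 -req -in "$tmp/leaf.csr" -CA "$tmp/ca.crt" -CAkey "$tmp/ca.key" \
    -CAcreateserial -days 365 -sha256 -extfile "$tmp/leaf.ext" -out "$tmp/leaf.crt" 2>/dev/null

  # 5. Assemble the chain: leaf first, then the CA
  mkdir -p "$out"
  install -m 600 "$tmp/leaf.key" "$out/leaf.key"
  cat "$tmp/leaf.crt" "$tmp/ca.crt" > "$out/chain.pem"

  # 6. Persist the new key and chain back to the parameter store
  store_param "$LEAF_KEY_PARAM" < "$out/leaf.key"
  store_param "$LEAF_CHAIN_PARAM" < "$out/chain.pem"
)
```

For example, `renew_outbound_cert hl7tls.uat.example.com /opt/mirth/certs` (the target directory is a placeholder). This writes leaf.key and chain.pem there, ready for the same SSL Manager import step.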
Toggle variable: mirth_outbound_cert_provider
- Defaults to acme (preserving existing behavior for all other stacks)
- Set to self_signed_ca for the affected stack via group vars
- Both ssl.yml (full deploy path) and play-renew-mirth-certificate.yml (cron path) are conditional on this toggle
Terraform IAM update:
- Added pki/ca/private/* to the SSM GetParameter policy so the Mirth EC2 instance can read the CA private key during renewal
What We Did Not Change
Equally important: the existing acme_certificate role was left completely untouched. It’s still used by nginx-based stacks and other Mirth deployments that don’t use mTLS. The toggle variable ensures the change is scoped to only the affected stacks.
Detection Gaps and Monitoring Improvements
The fact that this incident wasn’t caught by monitoring is arguably the most important failure. Here’s what we added:
- EKU validation in cert monitoring: Don’t just check expiry. Check that the certificate contains the expected EKU extensions. A cert that’s valid for 90 more days but lacks clientAuth is just as broken as an expired one.
- Post-renewal verification: After any certificate renewal (automated or manual), verify the full certificate chain, including EKUs, SAN, and chain trust, before deploying it to the application.
- HL7 message flow monitoring: Monitor the actual message throughput, not just the cert. If messages stop flowing, alert regardless of what the certificate looks like.
# Quick EKU check you can add to any cert monitoring script
if ! openssl x509 -in /etc/pki/tls/certs/hl7-letsencrypt.crt -noout -text \
| grep -q "TLS Web Client Authentication"; then
echo "CRITICAL: Certificate missing clientAuth EKU"
exit 2
fi
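The one-line EKU check above can be extended to cover the other content failures from this incident: a SAN for the wrong environment and a chain that does not verify against the expected CA. In the sketch below, expiry is checked last. The function name, argument order and 30-day warning window are assumptions. The exit codes follow the same 0/1/2 (OK/WARNING/CRITICAL) convention as above.

```shell
#!/usr/bin/env bash
# Sketch: check what the certificate says, not just when it expires.
# Function name, argument order and the 30-day default are assumptions.
# Exit codes: 0 OK, 1 WARNING, 2 CRITICAL.

check_mtls_cert() {
  local cert=$1 ca=$2 host=$3 warn_days=${4:-30} text eku
  if ! text=$(openssl x509 -in "$cert" -noout -text 2>/dev/null); then
    echo "CRITICAL: cannot parse $cert"; return 2
  fi

  # Content: a cert that is valid for months but lacks an EKU is already broken
  for eku in "TLS Web Server Authentication" "TLS Web Client Authentication"; do
    if [[ $text != *"$eku"* ]]; then
      echo "CRITICAL: $cert missing EKU '$eku'"; return 2
    fi
  done

  # SAN must cover this environment's hostname (OpenSSL ignores CN when a DNS SAN exists)
  if [[ $(openssl x509 -in "$cert" -noout -checkhost "$host") != *"does match certificate"* ]]; then
    echo "CRITICAL: $cert SAN does not cover $host"; return 2
  fi

  # Chain: must verify against the expected CA for client use
  if ! openssl verify -purpose sslclient -CAfile "$ca" "$cert" >/dev/null 2>&1; then
    echo "CRITICAL: $cert does not verify against $ca as a client cert"; return 2
  fi

  # Expiry last: -checkend exits non-zero if the cert expires within N seconds
  if ! openssl x509 -in "$cert" -noout -checkend $(( warn_days * 86400 )) >/dev/null; then
    echo "WARNING: $cert expires within $warn_days days"; return 1
  fi

  echo "OK: $cert"
}
```

Expiry is checked last on purpose. A certificate with the wrong EKU or SAN should report CRITICAL even when it also happens to be close to expiry.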
Action Items If You’re Affected
If you’re running mTLS with certificates from any public CA, here’s your timeline:
- Now: Audit all systems using public CA certificates for mTLS/client authentication:
  # Check every cert in your infrastructure
  openssl x509 -in your-cert.crt -noout -text | grep -A5 "Extended Key Usage"
- Before May 13, 2026: Migrate client authentication to a private CA. Let’s Encrypt’s temporary tlsclient ACME profile is retired on this date.
- Before June 15, 2026: Ensure no production systems depend on public CA certificates with the clientAuth EKU. Chrome will reject them after this date.
For private CA options, consider: self-managed OpenSSL CA (what we did), AWS Private CA, HashiCorp Vault PKI, or EJBCA. For healthcare specifically, a simple two-level hierarchy (root + leaf) managed by Ansible/Terraform is often the right level of complexity.
Key Takeaways
If you’re running healthcare integrations over mTLS, here’s the summary:
- Don’t use public CAs for mTLS client certificates. They optimize for browser TLS and are actively removing clientAuth across the board by mid-2026.
- Self-signed CAs are not a compromise; they’re the correct architecture for point-to-point mTLS in healthcare. The FHIR security spec lists mTLS as a valid authentication method, but it doesn’t require a public CA.
- Monitor certificate content, not just expiry. EKU, SAN, chain length, and issuer are all things that can change on renewal.
- Trace all code paths that touch certificates. Cron jobs, boot scripts, deploy playbooks — any of them can reintroduce a bad cert.
- Document the non-obvious. PKCS12 alias semantics, trust store boundaries, SAN vs CN behavior — these are the things that trip people up during 2 AM incident response.
- HIPAA doesn’t mandate a specific CA type. Its Security Rule requires technical safeguards for access control and transmission security of PHI, which in practice means encryption in transit with TLS 1.2 or later. A private CA satisfies these requirements just as well as (arguably better than) a public CA for point-to-point integrations.
The incident cost us several hours of downtime and a lot of cross-team coordination. The permanent fix — a self-managed CA with automated renewal — is actually simpler and more reliable than what we had before. Sometimes the “less sophisticated” approach is the right one.