Hunting AitM Phishing Infrastructure using Certificate Transparency

1. How It Began

It all began when my university was attacked. I was still a student at UMBC in my final semester when I received a suspicious email over the weekend.

Initial phishing email and redirect chain

I immediately became suspicious. The email urged me to click a link regarding my university ID, but it contained grammatical errors, and its urgent tone only added to my doubts. I concluded it was a phishing attempt. The email was sent from an address within UMBC, but not one that would normally send that kind of message.

I decided to investigate, so I downloaded a copy of the phishing email and reported it to my university. I found that others had received an identical email, which suggested a large-scale phishing campaign.

Screenshot of the initial phishing email — Figure 1: The initial phishing email. Notice the grammatical errors. The sender email has been redacted for privacy.

I decided to click the link. It directed me to an intermediate Google Sites page (sites.google.com) before forwarding me to the phishing site. This two-hop structure, combined with a deceptive link label mimicking a legitimate UMBC URL, was clearly designed to evade spam filters.

The final landing page was visually identical to the real login page. However, after some fiddling, I realized it wasn’t a clone; it was actively reverse-proxying the legitimate site. I attempted to trick the phishing website into reverse-proxying an arbitrary page, but I soon discovered that it would only proxy the subdomains that were necessary for login to succeed.

I also tried reporting the phishing pages to Google Safe Browsing and Netcraft, but neither would flag the page. I believe this is because the site looks legitimate, being a proxied version of the real login page. I even tried the Web Risk Submission API, which checks URLs almost immediately, but it also failed to flag the site. Thankfully, I was able to report the URL to urlscan.io and Cloudflare Radar to keep a record of the attack. I also reported the site to GoDaddy, the hosting provider.

Identifying an Adversary-In-The-Middle Attack

I soon identified this as an Adversary-in-the-Middle (AitM) attack. Attackers favor this method because it proxies the entire authentication flow, including the 2FA challenge. Since the user successfully logs in through the proxy, the attacker can silently steal the resulting session cookie and bypass security controls.

Diagram depicting the attack as described so far — Figure 2: The AitM attack flow. Note how the attacker sits between the user and the real server to intercept the session cookie.

Pivoting to Certificate Transparency

I examined the site’s TLS certificate and noticed it contained multiple Subject Alternative Names (SANs). When I attempted to visit one of these other subdomains, the server, likely due to a misconfiguration, responded with a different certificate. This revealed a completely separate phishing attack against another university running on the same infrastructure.

This led me to Certificate Transparency (CT) logs. Since major browsers require CT for certificates to be trusted, almost all public certificates are recorded in these append-only Merkle Tree-backed logs. I used crt.sh to search for the attacker’s domain and found a treasure trove of data: the logs revealed not only the current attack certificates, but a history of past campaigns as well.
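As a rough illustration of this kind of lookup, here is a minimal Python sketch. It assumes crt.sh's JSON output (`output=json`) and its `%` wildcard, and that each entry carries a `name_value` field with newline-separated identities. The domain `attacker.example` and the sample response are made up, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode


def crtsh_url(domain: str) -> str:
    """Build a crt.sh query URL for a domain's subdomains, asking for JSON."""
    return "https://crt.sh/?" + urlencode({"q": f"%.{domain}", "output": "json"})


def names_from_crtsh(body: str) -> set[str]:
    """Collect every distinct hostname from a crt.sh-style JSON response body."""
    names = set()
    for entry in json.loads(body):
        # name_value may hold several identities separated by newlines
        for name in entry.get("name_value", "").splitlines():
            names.add(name.strip().lower())
    names.discard("")
    return names


# Made-up sample shaped like a crt.sh response (not real data)
sample = json.dumps([
    {"id": 1, "name_value": "login.attacker.example\napi-0a1b2c3d.attacker.example"},
    {"id": 2, "name_value": "LOGIN.attacker.example"},
])

print(crtsh_url("attacker.example"))
print(sorted(names_from_crtsh(sample)))
```

Fetching the URL is left out on purpose; any HTTP client would do, and the parsing step is the part that matters for pivoting between certificates.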

In the Certificate Transparency data, I repeatedly noticed an api-<hex> pattern, which I recognized from Duo Security, a SaaS 2FA provider used by many universities.

Normally, a university login page redirects to api-<hex>.duosecurity.com, where the hex string is a unique identifier for the organization. I realized the attackers were mimicking this pattern when proxying the Duo prompt.

While not every target used Duo, a significant portion of them did. The subdomains proxying the main login pages were often mangled or changing between attacks, making them harder to track. However, the Duo pattern was consistent. I knew that if I could search for api-<hex>* across the logs, I could uncover the rest of the infrastructure, but crt.sh couldn’t handle that type of query.
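The api-<hex> pattern can be expressed as a regular expression. The sketch below is an assumption on my part rather than the exact filter used in this investigation: it assumes the identifier is 8 hex characters (as in Duo's api-XXXXXXXX.duosecurity.com hostnames) and that it appears at the start of the attacker's hostname. The function name and sample hostnames are hypothetical.

```python
import re
from typing import Optional

# Assumption: the identifier is exactly 8 hex characters at the start of the name
DUO_PREFIX = re.compile(r"^api-([0-9a-f]{8})(?![0-9a-f])")


def duo_identifier(hostname: str) -> Optional[str]:
    """Return the hex identifier if a non-Duo hostname mimics the Duo pattern."""
    host = hostname.lower().rstrip(".")
    # Genuine Duo hosts are not interesting here
    if host == "duosecurity.com" or host.endswith(".duosecurity.com"):
        return None
    match = DUO_PREFIX.match(host)
    return match.group(1) if match else None


print(duo_identifier("api-0a1b2c3d.attacker.example"))
print(duo_identifier("api-0a1b2c3d.duosecurity.com"))
print(duo_identifier("api-docs.example.com"))
```

A filter like this works on hostnames you already have; it does not replace a CT search engine that can run the wildcard query across all logs.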

When I had time, I looked for tools that could search Certificate Transparency logs in the way I needed. After some searching, I finally found MerkleMap, which turned out to be exactly what I was looking for. It allows wildcard matching against certificate domains and Subject Alternative Names (SANs), which meant I could run searches like api-<hex>* to find attacks against a specific university.

While parts of MerkleMap are paid, it provides enough information for free to pivot back to crt.sh and retrieve the remaining details. I realized this could allow me to map out attacker domains and their historical infrastructure. However, I also recognized that doing so would be a significant time investment, so I paused my investigation there for the time being.

2. The Second Attack

The second attack occurred midweek and was strikingly similar to the first. Surprisingly, it used the same final destination URL, even though the intermediary Google Sites redirect had been updated (likely because the previous one was suspended). The attackers also used a different compromised UMBC account to distribute the emails.

I was shocked to see the malicious domain was still active, despite my earlier report to GoDaddy. Although they acknowledged the report in their tracker, it’s unclear what action they took; maybe they only suspended the hosting, allowing the attackers to simply set it up again.

Screenshot of the second phishing email — Figure 3: The second phishing email. Notice the improved design, but that it also has significant grammatical errors.

With more time to investigate, I wanted to specifically analyze how the kit proxied the Duo two-factor authentication prompt. By this point, the attackers had blocked my home IP address. To get around this, I tethered my laptop to my phone for a fresh IP and used the Chrome DevTools device simulator to spoof a Windows User-Agent (since I am on Linux) to avoid getting blocked again.

I also discovered a critical detail in my university’s login implementation: the initial redirect to Duo passes a JWT directly in the URL. I realized I could copy this legitimate URL and simply swap in the phishing domain. This allowed me to load the proxied Duo prompt in isolation, safely triggering the request without ever entering my password.
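The host swap itself is simple URL manipulation. A minimal sketch, assuming only the hostname changes while the path and query string stay intact; the URL, path, and token below are placeholders, not real Duo endpoints.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_host(url: str, new_host: str) -> str:
    """Return the same URL with only the host replaced."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=new_host))


# Placeholder URL; the path and token are illustrative only
original = "https://api-0a1b2c3d.duosecurity.com/example/path?token=PLACEHOLDER"
print(swap_host(original, "api-0a1b2c3d.attacker.example"))
```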

Loading the prompt revealed some interesting behaviors. First, my security key failed to work entirely, likely due to FIDO2 origin binding protections. Surprisingly, the prompt itself seemed unmodified; I expected the kit to force a downgrade to SMS, but it faithfully proxied the request (which then failed).
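To illustrate why origin binding defeats a proxy: in WebAuthn, the browser records the origin of the page that requested the assertion in clientDataJSON, and the relying party compares it with its own origin. The sketch below shows only that one comparison, with made-up origins and a hypothetical function name; a real verification performs many more checks (challenge, RP ID hash, signature).

```python
import json


def origin_is_expected(client_data_json: bytes, expected_origin: str) -> bool:
    """Check the origin the browser recorded against the relying party's origin."""
    client_data = json.loads(client_data_json)
    return client_data.get("origin") == expected_origin


# Made-up clientDataJSON as a browser on a proxied page might produce it
proxied = json.dumps({
    "type": "webauthn.get",
    "challenge": "PLACEHOLDER",
    "origin": "https://api-0a1b2c3d.attacker.example",
}).encode()

print(origin_is_expected(proxied, "https://api-0a1b2c3d.duosecurity.com"))
```

In practice the request may not even get this far, because the browser refuses a WebAuthn request whose RP ID does not match the domain of the page making it.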

I also noticed that the “Location” field in the Duo prompt changed slightly with every request. While it always showed a location near the university, the specific city would shift. Combined with the noticeably slow load times, this strongly suggests the attackers were using a rotating residential proxy to avoid being blocked.

I captured HTTP Archive (HAR) files to preserve the network traffic for later analysis. By this point, I had spent hours probing the infrastructure. Once the site was taken down by the threat actors, I decided to shelve the investigation for the moment and focus on my homework.

3. The Third Attack

A third attack launched while I was busy, and the security team managed to purge the emails before I even saw them. However, this incident served as a stark reminder that the campaign was still active.

During my earlier manual searches in the CT logs, I had spotted other universities being targeted, but I hadn’t been fast enough to warn them. I used this latest incident as an opportunity to share my findings with my university’s security team to help improve their response.

This also motivated me to fully map out the scope of the attacker’s infrastructure. I had the tools; I had just been reluctant to invest the time. Deciding it would be a worthwhile endeavor, I spent the weekend connecting the dots.

4. Mapping Out the Infrastructure

My methodology was straightforward. I used MerkleMap to search for a target’s unique Duo identifier (the api-<hex> string) and map every domain associated with it. I also searched for unique string patterns within the attacker subdomains to catch outliers that did not use Duo.

MerkleMap search results showing the attacker infrastructure — Figure 4: Searching MerkleMap for the UMBC-specific identifier revealed the attacker’s domain infrastructure.

From there, I pivoted to crt.sh to investigate those domains. This often revealed attack clusters, where single domains were used to attack multiple universities at different times. In some cases, crt.sh failed to index specific domains found on MerkleMap. When that happened, I used CIDRE to find the raw certificate logs and then linked back to the crt.sh record for the full details.

crt.sh search results showing the attacker infrastructure — Figure 5: Pivoting to crt.sh on a specific attacker domain. The ‘Matching Identities’ column reveals they were targeting multiple universities simultaneously on the same infrastructure.

If I encountered an unrecognized target, I analyzed the subdomain patterns and cross-referenced them with legitimate login portals to identify the organization. For every new target I found, I recursively mapped out their entire history of attacker-controlled domains.
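The recursive mapping amounts to a breadth-first walk between target identifiers and attacker domains. The sketch below is illustrative only: the two dictionaries are hypothetical stand-ins for results that would come from MerkleMap and crt.sh searches, and all identifiers and domains are made up.

```python
from collections import deque

# Hypothetical stand-ins for CT search results (made-up data)
DOMAINS_BY_ID = {
    "aaaa1111": ["one.example"],
    "bbbb2222": ["one.example", "two.example"],
    "cccc3333": ["three.example"],
}
IDS_BY_DOMAIN = {
    "one.example": ["aaaa1111", "bbbb2222"],
    "two.example": ["bbbb2222"],
    "three.example": ["cccc3333"],
}


def map_infrastructure(seed_id: str) -> tuple[set[str], set[str]]:
    """Walk identifier -> domains -> identifiers until nothing new turns up."""
    seen_ids, seen_domains = {seed_id}, set()
    queue = deque([seed_id])
    while queue:
        ident = queue.popleft()
        for domain in DOMAINS_BY_ID.get(ident, []):
            if domain in seen_domains:
                continue
            seen_domains.add(domain)
            # Every other target seen on this domain becomes a new starting point
            for other in IDS_BY_DOMAIN.get(domain, []):
                if other not in seen_ids:
                    seen_ids.add(other)
                    queue.append(other)
    return seen_ids, seen_domains


ids, domains = map_infrastructure("aaaa1111")
print(sorted(ids), sorted(domains))
```

Tracking both visited sets keeps the walk from looping when a domain and a target point back at each other.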

About halfway through, I attempted to speed up the analysis using GitHub Copilot with an MCP server. I had identified most targets and just needed to process the remaining domains. However, this turned into a huge time sink; I spent nearly two hours just “herding” the AI to correct its mistakes. In retrospect, manual verification would have been significantly faster, though the AI did help me churn through a small portion of the backlog before I abandoned it.

5. Finding the Latest Phishing Sites

I eventually decided to build a tool to monitor certificate issuance in real time. I deployed an instance of the open-source tool certstream-server-go to handle the heavy lifting of ingesting the stream of data from Certificate Transparency logs. This allowed me to focus entirely on filtering the resulting data.
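For a sense of what the filtering side receives, here is a minimal sketch of pulling hostnames out of one stream message. It assumes the CertStream-style JSON layout (a `message_type` of `certificate_update` with the names under `data.leaf_cert.all_domains`); the sample message is made up and trimmed to the fields used, and the function name is hypothetical.

```python
import json


def domains_from_message(raw: str) -> list[str]:
    """Extract the hostnames from one CertStream-style message, if any."""
    message = json.loads(raw)
    if message.get("message_type") != "certificate_update":
        return []  # e.g. heartbeat messages carry no certificate
    leaf = message.get("data", {}).get("leaf_cert", {})
    return [name.lower() for name in leaf.get("all_domains", [])]


# Made-up message, trimmed to the fields this sketch reads
sample_message = json.dumps({
    "message_type": "certificate_update",
    "data": {
        "leaf_cert": {
            "all_domains": ["login.attacker.example", "api-0a1b2c3d.attacker.example"],
        },
    },
})

print(domains_from_message(sample_message))
```

Connecting to the WebSocket endpoint is omitted; each received text frame would be passed to a function like this one.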

Initially, I used a simple regex to search for the api-<hex> pattern, but it generated too many false positives. I realized I needed to fingerprint the infrastructure itself. Looking back at my earlier analysis, I noticed a unique signature: the attackers almost exclusively used Cloudflare nameservers combined with GoDaddy as the registrar. Once I added these conditions, the false positives vanished. To further refine the alerts, I also checked for certificates containing multiple subdomains (SANs).

However, I soon discovered a flaw in this strict approach.

By requiring all three checks to pass (nameservers, registrar, and domain count), I accidentally filtered out genuine alerts when the attackers switched registrars. To fix this, I adjusted the logic: the script now notifies me via Discord immediately if the regex and domain count checks pass. It still checks the registrar and nameservers to add context to the alert, but it no longer suppresses the notification if they don’t match. This puts the final decision in my hands.
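The revised logic separates hard gates from soft signals. The following is a sketch under stated assumptions, not the actual script: the 8-character hex length, the minimum of two names, the substring checks for GoDaddy and Cloudflare, and all names in the code are my own illustrative choices. Registrar and nameserver lookups (WHOIS/RDAP and DNS) are assumed to happen elsewhere and are passed in as plain values.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Assumptions: 8 hex characters, and at least 2 names on the certificate
DUO_PREFIX = re.compile(r"^api-[0-9a-f]{8}(?![0-9a-f])")
MIN_NAMES = 2


@dataclass
class Alert:
    domains: list
    context: list = field(default_factory=list)


def evaluate(domains, registrar: Optional[str], nameservers) -> Optional[Alert]:
    """Return an Alert when the hard gates pass; soft signals only add context."""
    names = sorted({d.lower() for d in domains})
    # Hard gate 1: the certificate must cover multiple names
    if len(names) < MIN_NAMES:
        return None
    # Hard gate 2: some non-Duo name must mimic the Duo pattern
    candidates = [n for n in names if not n.endswith(".duosecurity.com")]
    if not any(DUO_PREFIX.match(n) for n in candidates):
        return None
    alert = Alert(domains=names)
    # Soft signals: reported in the alert, never used to suppress it
    registrar_hit = bool(registrar) and "godaddy" in registrar.lower()
    ns_hit = any(ns.lower().rstrip(".").endswith(".ns.cloudflare.com") for ns in nameservers)
    alert.context.append("registrar: " + ("matches" if registrar_hit else "differs"))
    alert.context.append("nameservers: " + ("match" if ns_hit else "differ"))
    return alert


cert_names = ["api-0a1b2c3d.attacker.example", "login.attacker.example"]
print(evaluate(cert_names, "Some Other Registrar", ["ns1.example.net"]))
```

Keeping the fingerprint checks as annotations means a registrar change shows up as a note on the alert instead of silently hiding it.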

I also implemented duplicate filtering to handle cases where the same certificate appears in multiple CT logs simultaneously. Finally, I refined the script to automatically match the target to known target organizations based on their unique identifiers. This generates a pre-written email template, allowing me to notify a university security team with a single click the moment their infrastructure is targeted.
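Duplicate filtering can be as simple as remembering a bounded set of keys already alerted on. This is a sketch of one possible approach, not the script's actual implementation; the class name, the cache size, and the choice of key (for example a certificate serial number or a sorted tuple of names) are assumptions.

```python
from collections import OrderedDict


class SeenCache:
    """Remember the most recent keys so repeated log entries alert only once."""

    def __init__(self, max_size: int = 10000):
        self._seen = OrderedDict()
        self._max_size = max_size

    def first_time(self, key) -> bool:
        """Return True only the first time a key is offered."""
        if key in self._seen:
            self._seen.move_to_end(key)  # refresh its position
            return False
        self._seen[key] = None
        if len(self._seen) > self._max_size:
            self._seen.popitem(last=False)  # evict the oldest key
        return True


cache = SeenCache(max_size=2)
results = [cache.first_time(k) for k in ["serial-a", "serial-a", "serial-b"]]
print(results)
```

Bounding the cache keeps memory flat on a long-running stream; duplicates from different logs tend to arrive close together, so a modest size is usually enough.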

Discord alert showing an attack against UMN — Figure 6: A sample Discord alert. Note the ‘58 minutes’ freshness; this is an artifact of Let’s Encrypt backdating certificates by one hour for clock skew. The alert actually fired within 2 minutes of issuance.

6. Conclusion

What started as a suspicious email in my student inbox turned into a deep dive into AitM infrastructure. By refusing to accept “no” from industry-standard reporting tools, I was able to fingerprint the attacker’s infrastructure and detect this class of attacks in real time.

Interestingly, I recently discovered a report from Infoblox that confirmed exactly what I was seeing. While they relied on passive DNS data rather than Certificate Transparency, they identified the same patterns of Evilginx phishing kits targeting universities.

For a while, it felt like I was the only one tracking these actors in such detail. Finding that report was a validating moment; it proved that my independent research was tracking the same threats as a much larger security vendor.

This cat-and-mouse game will never end, but for now, I’m happy to say I’m watching the logs. I’ll know when they strike next.

7. Indicators of Compromise (IOCs)

I have compiled my full research data into two datasets to help others analyze these campaigns.

7.1 Active Threat List (Updated Jan 2026)

A list of attacker domains I have observed recently. This includes domains found during the initial investigation as well as new infrastructure detected by my real-time monitoring tool.

Download Active Threat List (CSV)

7.2 Historical Investigation Data

The raw analysis data from the initial targeted campaign described in this post. While this infrastructure is likely offline, it serves as a reference for the behavior and patterns of AitM campaigns targeting universities.

Download Historical Analysis (CSV)
