Cognitive Research: Fake Blogs Generating Real Money
In the past several months, Cisco Cognitive Threat Analytics (CTA) researchers have observed a number of blog sites that use either fake content or content stolen from other sites to drive click traffic to ad-loaded web sites. We have observed traffic volumes of up to 10,000 requests per hour, targeting hundreds of sites. The campaign has been running for an estimated nine months or more. With a single click worth anywhere from $0.01 to $1, these scams can yield substantial returns for their owners.
Fake blogs are not new, but these actors operate with a slightly different MO. They have made an effort to evade web-reputation-based blocking and to hide from investigators. First, we observe a large number of similar sites whose domain names are generated from words and topics. At first sight, these sites look like benign travel blogs full of content. Second, most of the intermediate infrastructure redirects arbitrary requests to Google, making the investigation more difficult.
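As a rough illustration of how such topic-based names might be triaged, the sketch below flags a hostname whose second-level label contains a travel keyword. The keyword list and function names are hypothetical, chosen only to match the example domains in this post. Plain substring matching is deliberately naive: many legitimate travel sites would match too, so this could at most serve as one weak feature among many.

```python
# Hypothetical keyword list for illustration only; not taken from CTA.
TRAVEL_KEYWORDS = ("tour", "cruise", "voia", "voya", "trip", "travel", "journey")


def second_level_label(hostname: str) -> str:
    """Return the label left of the TLD, e.g. 'cruiserly' for 'cruiserly.net'.

    Simplification: multi-part suffixes such as .co.uk are not handled.
    """
    labels = hostname.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def looks_travel_themed(hostname: str) -> bool:
    """Naive heuristic: does the second-level label contain a travel keyword?"""
    label = second_level_label(hostname)
    return any(word in label for word in TRAVEL_KEYWORDS)


for host in ("cruiserly.net", "tourxperia.com", "voiage.net", "findreek.com"):
    print(host, looks_travel_themed(host))
```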
The general traffic pattern was observed as follows:
- Large numbers of requests arrive from infected clients at the fake blog sites. To look less suspicious, the requests are formatted as search queries, for example: cruiserly.net/search/q/greyhounds.
- There is a series of redirects via intermediate sites that are already associated with click fraud, for example: findreek.com.
- These redirects lead the clients to another set of fake sites with travel-related names (e.g. tourxperia.com); this time, the sites have no content.
- Finally, clients are sent to browse arbitrary web sites to generate clicks and/or revenue.
Details of the analysis follow:
Fake blog sites
The investigation started when CTA identified a number of similar-looking HTTP anomalies of the following form:
Since the beginning of our investigation, we have discovered dozens of domains following the same pattern. The domain names are continually updated and revolve around travel:
The front page of each site greets visitors with the banner “Travel worldwide with us” and a few blog posts. However, these articles are identical on all of the sites, and their content is copied word for word from Atlas Obscura, a popular travel blog.
Visiting any of the exact URLs with the “/search” component yields a page stating that 0 results were found. If these requests were human clicks, every visitor's search would come up empty, every time.
Visiting a URL with the “/go” component results in an HTTP 302 redirect to google.com. This is unusual behavior for a web service; reading between the lines, it amounts to saying “nothing to see here, move on”.
To investigate further, we needed to see what happened when infected clients visited these URLs. We did this through forensic analysis of the actual network traffic, which was available in the form of web proxy logs.
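When replaying captured responses (for example, from proxy logs or a sandbox run), the “/go” behavior can be recognized with a simple check. The sketch below assumes the status code and Location header are already available as plain values; the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def is_decoy_redirect(status: int, location: Optional[str]) -> bool:
    """True if a recorded response is an HTTP 302 whose Location header
    points at google.com or one of its subdomains."""
    if status != 302 or not location:
        return False
    # urlsplit lowercases the hostname; relative Locations have no hostname.
    host = urlsplit(location).hostname or ""
    return host == "google.com" or host.endswith(".google.com")
```

Matching on the parsed hostname rather than a substring avoids false hits such as `notgoogle.com`.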
By stitching together these logs using referrer headers and time proximity, we discovered a characteristic sequence that the infected clients perform:
Again, visiting most of these URLs from a browser leads to an HTTP 302 redirect to Google. As established above, this indicates an intent to hide the infrastructure from uninvited guests.
To determine whether these clicks are likely to come from humans or machines, we analyzed their volume:
The differently colored lines represent each customer's aggregate hourly activity towards these sites. In the marked areas, the ramp-up and/or ramp-down of activity is visibly correlated across customers, which suggests some capability to coordinate the infected user base. The volume is large, peaking on the order of 10,000 hits per hour, which is unlikely to be caused by human behavior. Putting these facts together, a botnet appears to be the primary cause of this behavior.
The final redirect in the chain can send the client to a large set of web sites. For example, tourxperia.com/search was found to be the referrer for at least 100 other sites. We took a detailed look at two representative examples:
These sites don’t look as fake as the original travel blogs, but what they have in common is a substantial number of advertisements. The access pattern in this case is not a single URL (e.g. a single advertisement) but a full load of the web page. This means the botnet is used to drive traffic to sites, which opens the door to various monetization schemes.
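The log stitching described above (referrer headers plus time proximity) can be sketched as follows. This is a minimal version under stated assumptions, not the procedure CTA uses: each proxy log entry is assumed to carry a client identifier, a timestamp in seconds, the requested URL, and the Referer header, and the five-second gap is an arbitrary illustrative threshold. A request joins an existing chain when its Referer equals the URL of that chain's last request and it arrives within the gap. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ProxyLogEntry:
    client: str                # hypothetical field names
    timestamp: float           # seconds since the epoch
    url: str
    referrer: Optional[str]


def stitch_chains(entries: List[ProxyLogEntry], max_gap: float = 5.0) -> List[List[str]]:
    """Group each client's requests into redirect chains using Referer + time proximity."""
    by_client: Dict[str, List[ProxyLogEntry]] = {}
    for entry in entries:
        by_client.setdefault(entry.client, []).append(entry)

    chains: List[List[str]] = []
    for client_entries in by_client.values():
        client_entries.sort(key=lambda e: e.timestamp)
        open_chains: List[List[ProxyLogEntry]] = []
        for entry in client_entries:
            # Prefer the most recently started chain whose last hop matches.
            for chain in reversed(open_chains):
                last = chain[-1]
                if entry.referrer == last.url and entry.timestamp - last.timestamp <= max_gap:
                    chain.append(entry)
                    break
            else:
                open_chains.append([entry])  # no match: start a new chain
        chains.extend([e.url for e in chain] for chain in open_chains)
    return chains
```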
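The coordination argument here rests on visual inspection of the chart. One way to quantify it, offered as a substitute technique rather than the method used in this analysis, is to bucket each customer's requests into hourly counts and compute the Pearson correlation between customers' series. Customers whose activity ramps up and down together score close to 1. The function names, the event format (customer, Unix timestamp), and the 0.8 threshold are assumptions for illustration.

```python
from math import sqrt
from typing import Dict, Iterable, List, Tuple


def hourly_counts(events: Iterable[Tuple[str, float]], start: float, hours: int) -> Dict[str, List[int]]:
    """Bucket (customer, unix_timestamp) events into per-customer hourly counts."""
    counts: Dict[str, List[int]] = {}
    for customer, ts in events:
        bucket = int((ts - start) // 3600)
        if 0 <= bucket < hours:  # ignore events outside the window
            counts.setdefault(customer, [0] * hours)[bucket] += 1
    return counts


def pearson(x: List[int], y: List[int]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_pairs(counts: Dict[str, List[int]], threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Return customer pairs whose hourly series correlate at or above threshold."""
    names = sorted(counts)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if pearson(counts[a], counts[b]) >= threshold
    ]
```

Correlation only captures co-movement, not absolute volume; the 10,000-hits-per-hour peak remains a separate signal.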
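The fan-out observation above (one referrer such as tourxperia.com/search leading to at least 100 other sites) can be measured directly from (URL, Referer) pairs in proxy logs. Here is a minimal sketch with hypothetical function names. Referrers are normalized to host plus path so that differing query strings do not split the count.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple
from urllib.parse import urlsplit


def host_and_path(url: str) -> str:
    """Normalize a URL to 'host/path', dropping scheme, query, and fragment."""
    parts = urlsplit(url)
    return f"{parts.hostname or ''}{parts.path}"


def referrer_fanout(pairs: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, Set[str]]:
    """Map each normalized referrer to the set of distinct destination hosts."""
    fanout: Dict[str, Set[str]] = defaultdict(set)
    for url, referrer in pairs:
        host = urlsplit(url).hostname
        if referrer and host:
            fanout[host_and_path(referrer)].add(host)
    return dict(fanout)


def high_fanout_referrers(fanout: Dict[str, Set[str]], min_sites: int = 100) -> List[str]:
    """Referrers that sent clients to at least min_sites distinct hosts."""
    return sorted(ref for ref, hosts in fanout.items() if len(hosts) >= min_sites)
```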
IOCs and beyond
From the above analysis, we can identify two HTTP destinations that are shared by all of the traffic:
We have observed no legitimate use of these domains across hundreds of additional customers. Searching AMP Threat Grid for these domains returns a large set of malware samples with a threat score of at least 90 out of 100. Combined, these two pieces of information make the domains good Indicators of Compromise (IOCs) and candidates for blocking.
To search for more IOCs and understand the dynamics of the campaign, we trained a custom classifier on the known IOC traffic and let it search the CTA telemetry data over a broader time scale to highlight further points of interest.
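Once IOCs are known, matching them against proxy logs is straightforward. The sketch below is a hypothetical helper, not part of any Cisco product API. It treats each IOC as a hostname or IP address and also matches subdomains of a hostname IOC. The caller supplies the IOC set; the tests use findreek.com only because this post names it as part of the redirect chain.

```python
from typing import Iterable, List, Set, Tuple
from urllib.parse import urlsplit


def matches_ioc(url: str, iocs: Set[str]) -> bool:
    """True if the URL's host equals an IOC or is a subdomain of one.

    Assumes IOCs are lowercase; urlsplit already lowercases hostnames.
    """
    host = urlsplit(url).hostname or ""
    return any(host == ioc or host.endswith("." + ioc) for ioc in iocs)


def flag_clients(entries: Iterable[Tuple[str, str]], iocs: Set[str]) -> List[str]:
    """Return the sorted clients that requested at least one IOC destination.

    entries are hypothetical (client, url) pairs taken from proxy logs.
    """
    normalized = {ioc.lower() for ioc in iocs}
    return sorted({client for client, url in entries if matches_ioc(url, normalized)})
```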
The results are as follows:
- The fake blog sites are rotated out and replaced with new ones, but the content of the sites is never updated. One example is voiage.net [sic], which was registered in early February 2015, yet its mock articles are still dated fall 2014.
- The IP address 188.8.131.52 has also been replaced, by the domain f.click-process.com. One of that domain's DNS address records still points to this IP address.
- The findreek.com domain has been in place for quite some time; we observed it as far back as September 2014. At that time, it replaced an even older domain, clickered.com, which exhibited the same behavior. According to passive DNS records, clickered.com has been in use since early 2013.
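This article does not describe the classifier CTA uses. Purely as an illustration of training on known IOC traffic and then scoring new requests, here is a small two-class multinomial naive Bayes model over URL tokens. It is a deliberately simple stand-in; the class name, the tokenization, and the training URLs are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Iterable, List, Tuple


def tokenize(url: str) -> List[str]:
    """Split a URL into lowercase alphanumeric tokens (scheme, host labels, path parts)."""
    return re.findall(r"[a-z0-9]+", url.lower())


class NaiveBayesUrlClassifier:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples: Iterable[Tuple[str, bool]]) -> "NaiveBayesUrlClassifier":
        self.token_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        for url, is_malicious in samples:
            self.token_counts[is_malicious].update(tokenize(url))
            self.doc_counts[is_malicious] += 1
        if not all(self.doc_counts.values()):
            raise ValueError("need at least one malicious and one benign sample")
        self.vocab = set(self.token_counts[True]) | set(self.token_counts[False])
        return self

    def log_score(self, url: str, label: bool) -> float:
        """Unnormalized log posterior of `label` for this URL."""
        counts = self.token_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        for token in tokenize(url):
            score += math.log((counts[token] + 1) / denom)
        return score

    def predict(self, url: str) -> bool:
        """True if the URL scores as more likely malicious than benign."""
        return self.log_score(url, True) > self.log_score(url, False)
```

A model like this would only pick up surface patterns such as the `/search/q/` path. A production detector would need richer behavioral features, like the redirect chains and volume patterns described above.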
In this article, we have presented a network-centric deep dive into a long-running campaign. One of its unique aspects is the use of blog sites with fake content as front ends for C&C. The analysis also shows how network forensics, advanced anomaly detection (CTA), and sandboxing (AMP Threat Grid) can work together to identify and dissect such a campaign end to end, across its whole kill chain.
Since the discovery of this campaign, the IOCs have been propagated into the respective products. In case the IOCs change, the custom-trained classifier is now part of CTA, where it provides an additional layer of robust detection.
About Cisco Cognitive Threat Analytics:
Cisco Cognitive Threat Analytics (CTA) is a cloud-based breach detection and analytics technology focused on discovering novel and emerging threats by identifying the C&C activity of malware. CTA processes web access logs from Cisco Cloud Web Security (CWS), the Cisco Web Security Appliance (WSA), or third-party web proxies such as Blue Coat ProxySG. CTA reduces the time to discovery (TTD) of threats operating inside the network. It addresses gaps in perimeter-based defenses by identifying the symptoms of a malware infection or data breach through behavioral analysis and anomaly detection. The technology relies on advanced statistical modeling and machine learning to independently identify new threats, constantly learning from what it sees and adapting over time. Through additional careful correlation, CTA presents 100% confirmed breaches, keeping security teams focused on the particular devices that require remediation. By focusing on C&C activity detection, CTA addresses a security visibility gap, discovering threats that may have bypassed the web as an infection vector entirely (infections delivered through email, infected USB sticks, or BYOD devices).