Modern bot traffic and how to detect it

Bot detection is the first step in preventing automated attacks on your websites, mobile apps, and APIs. It separates your traffic into requests coming from humans and requests coming from bots. Because at least a third of the world’s total web traffic consists of malicious bot traffic, good bot detection is vital for protecting businesses against online security threats.

But detecting bot traffic is harder than it has ever been. Bot developers are constantly finding new ways to get around the standard security solutions that most companies still rely on. These include captcha farms, artificial intelligence, and browser automation frameworks such as Puppeteer. Today, effective bot detection requires specialized know-how.

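This article covers the following topics:

What is bot traffic?
Why is bot detection important?
Why is bot detection so challenging?
How can you identify bot traffic?
Common bot detection methods & techniques (& their limitations)
What you need in an advanced bot detection solution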

What is bot traffic?

Bot traffic is the sum of all the bots browsing your websites, mobile apps, and APIs. There is good bot traffic and bad bot traffic. Good bot traffic includes search engine bots such as Googlebot and site monitoring bots such as WordPress pingbacks. Bad bot traffic includes any bot programmed to perform a task that can hurt your company or users.

Many bots fall somewhere between good and bad. For example, SEO tools such as Ahrefs and Moz use bots to crawl millions of websites, including yours, for analysis purposes. If you don’t use either service, these bots take up bandwidth that you’d probably rather save. The solution is to whitelist the good bots you want while keeping all other bots out, which you should be able to do with any advanced bot detection tool.

Why is bot detection important?

Bot detection is the first step in preventing the most severe security threats in today’s online world. Without good bot detection, you might not even know you’re under attack. Certain bot attacks, such as account takeover fraud and price scraping, can fly under the radar until it’s too late.

Good bot detection is a requirement for good bot prevention. When you block bad bots from crawling your websites, mobile apps, and APIs, you will:

Reduce your IT costs. Bad bots take up bandwidth and increase the bills from your server, API, and CDN providers. When you block these bots, your IT costs will go down.
Protect user experience. Sudden spikes of bot traffic can slow down or, worse, crash your website. This makes for bad UX, which you can prevent by blocking bad bot traffic.
Stay ahead of your competitors. Some competitors rely on bots that scrape your prices and content to underbid or republish for their own benefit. Good bot detection and prevention makes it hard for competitors to collect this information.
Spend less time putting out fires. A successful bot attack affects all your departments, from IT to customer support to marketing. Blocking bad bot traffic will ensure your departments don’t have to spend time in crisis mode because of a bot attack.
Stay compliant with data protection frameworks such as GDPR and CCPA. Regulators are waking up to the importance of data protection. They issue heavy fines, often millions of dollars, to the companies that are not compliant. Prevent bad bot traffic to protect your sensitive data.

These are only a few of the reasons why it is so crucial to identify bots. A successful bot attack is an existential threat to SMEs and can hurt even the most financially secure companies. Protect yourself with good bot detection and prevention.

Why is bot detection so challenging?

Bot detection is challenging for several reasons. First, bots now attack all endpoints. They no longer attack just websites. They also attack mobile apps, web apps, servers, and APIs. Leaving one of these endpoints unprotected is dangerous.

Second, bots now use the same technologies as humans. They use browsers that have extremely similar fingerprints to human browsers and can, for example, resort to mobile phone farms to use real devices instead of simulated ones.

Third, bot operators can easily distribute their attacks across time and space. They can attack your mobile app API for several days across a wide variety of countries, all with very little effort and for little money.

Fourth, bots can rotate through millions of clean, residential IPs. Often, each IP will send no more than one or two requests before the bot switches to another IP. Many security solutions, such as WAFs, rely heavily on IP reputation to distinguish bots from humans. IP rotation renders them ineffective, because no single IP ever sends enough requests to look suspicious.
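To make the IP rotation problem concrete, here is a minimal sketch of a request counter that flags any key seen more often than a threshold. The traffic, the threshold of 10, and the `fingerprint` field are all hypothetical; a real fingerprint would be derived from many signals, and advanced bots rotate those too.

```python
from collections import Counter


def flag_by_key(requests, key, threshold):
    """Return the values of `key` that appear in more than `threshold` requests."""
    counts = Counter(request[key] for request in requests)
    return {value for value, seen in counts.items() if seen > threshold}


# Hypothetical bot: 500 requests spread over 250 IPs (two requests per IP),
# all sharing one client fingerprint. Addresses come from documentation ranges.
bot_traffic = [
    {"ip": f"198.51.100.{i % 250}", "fingerprint": "fp-bot"} for i in range(500)
]
# Hypothetical human: 8 requests from a single IP.
human_traffic = [{"ip": "192.0.2.10", "fingerprint": "fp-human"} for _ in range(8)]
traffic = bot_traffic + human_traffic

flagged_ips = flag_by_key(traffic, "ip", threshold=10)
flagged_fingerprints = flag_by_key(traffic, "fingerprint", threshold=10)
print(flagged_ips)           # empty: no single IP exceeds the limit
print(flagged_fingerprints)  # contains only "fp-bot"
```

Counting per IP sees nothing unusual, while counting on a signal that survives rotation exposes the 500 requests. This is an illustration of the weakness, not a complete detection method.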

Fifth, the emergence of bots-as-a-service now allows anyone to launch a bot attack. These services give malicious operators the ability to set up a botnet and send bots to a particular website or app. Because customers of these services only pay for successful requests, the providers are incentivized to make their bots as advanced as possible. All these factors combined make bot detection incredibly challenging.

How can you identify bot traffic?

Despite these challenges, there are still a few indirect ways you can identify bot traffic. These are all indicators that something bad is happening on your websites, apps, and APIs:

Abnormally high pageviews. Certain bot attacks will try to overwhelm your servers. Whether it’s a DDoS attack or a large number of scrapers, this will show as a sudden, inexplicable pageview spike in your analytics software.
Abnormally high bounce rate. Every bot has a goal. Once it achieves its goal, or once it sees it cannot achieve its goal, it tends to leave immediately. Because bots can operate in milliseconds instead of seconds, this will show as an abnormally high bounce rate made up of extremely short visits.
Abnormal session durations. Sessions in the range of milliseconds are suspicious, as are abnormally long sessions. Humans tend to stay for at least a few seconds, but don’t often stay on one page for more than a few minutes. Keep an eye out for session duration outliers in your analytics software.
Spikes in traffic from unknown locations. For example, if your business doesn’t operate in Vietnam, but you suddenly receive an influx of requests from Vietnam, there’s a good chance it’s a bot attack. Requests coming from countries that don’t make sense for your business are often bot requests.
Junk conversions. Are you receiving contact form submissions that make no sense? Do certain users constantly place items in their shopping carts without buying them? Does your free newsletter suddenly have a large number of bouncebacks? All these are junk conversions that indicate bot behavior.
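Several of these indicators can be checked with a few lines of code once session data is exported from your analytics software. The sketch below is only illustrative: the `Session` fields, the duration limits, and the spike factor are assumptions, not values from any particular analytics product.

```python
from dataclasses import dataclass


@dataclass
class Session:
    country: str
    pageviews: int
    duration_seconds: float


def bounce_rate(sessions):
    """Share of sessions that viewed exactly one page."""
    if not sessions:
        return 0.0
    return sum(s.pageviews == 1 for s in sessions) / len(sessions)


def duration_outliers(sessions, min_seconds=1.0, max_seconds=1800.0):
    """Sessions shorter or longer than the (assumed) plausible human range."""
    return [
        s for s in sessions
        if s.duration_seconds < min_seconds or s.duration_seconds > max_seconds
    ]


def country_spikes(baseline, current, factor=5.0, min_sessions=50):
    """Countries whose session count grew at least `factor` times over baseline."""
    spikes = []
    for country, count in current.items():
        usual = max(baseline.get(country, 0), 1)  # avoid dividing by zero
        if count >= min_sessions and count / usual >= factor:
            spikes.append(country)
    return sorted(spikes)


sessions = [
    Session("FR", 3, 120.0),
    Session("VN", 1, 0.05),
    Session("VN", 1, 0.04),
    Session("FR", 1, 45.0),
]
print(bounce_rate(sessions))             # 0.75
print(len(duration_outliers(sessions)))  # 2
print(country_spikes({"FR": 1000, "VN": 2}, {"FR": 1100, "VN": 400}))  # ['VN']
```

None of these checks proves a bot attack on its own. They only point to traffic that deserves a closer look.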

Common bot detection methods & techniques (& their limitations)

Captchas

Captchas were created in the late 1990s to prevent bots from spamming search engines or forums. Bots weren’t so hard to filter out back then, and captchas worked reasonably well for almost two decades. Today, however, captchas have become problematic for two reasons.

First, captchas make the Internet less accessible. The audio or image recognition challenges are a nightmare for people with disabilities. They also kill your conversions, because they can be quite hard to solve and add friction at crucial points on your websites or web apps.

Second, captchas are no longer very good at identifying bots. Many bots now use an API that connects to captcha farms, which can solve any challenge in seconds for almost no money. In addition, bots can now seem so human they’re often not served a captcha in the first place.

WAFs

Web Application Firewalls are designed to protect websites or web apps from known attacks, such as SQL injections, session hijacking, and cross-site scripting. They use a set of rules to separate malicious requests from legitimate ones. In particular, WAFs look for requests that carry familiar attack signatures.

As a result, WAFs can only block familiar threats. They’re ineffective for blocking today’s ever-changing, advanced bots that don’t carry obvious attack signatures. In addition, many bot attacks, such as account takeover fraud, remain within perfectly normal business logic. It just looks like someone is trying to log in, which a WAF won’t recognize as a potential problem.
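Because each request in such an attack looks like an ordinary login, detection has to look at behavior across many requests. The following is a rough sketch, assuming you can attach some `client_id` to each attempt that survives IP rotation (for example a device fingerprint). The function name and both thresholds are hypothetical.

```python
from collections import defaultdict


def suspected_stuffing(attempts, min_usernames=20, min_failure_ratio=0.9):
    """Flag clients that try many distinct usernames and mostly fail.

    `attempts` is an iterable of (client_id, username, success) tuples.
    """
    usernames = defaultdict(set)
    failures = defaultdict(int)
    totals = defaultdict(int)
    for client_id, username, success in attempts:
        usernames[client_id].add(username)
        totals[client_id] += 1
        if not success:
            failures[client_id] += 1
    return sorted(
        client for client in totals
        if len(usernames[client]) >= min_usernames
        and failures[client] / totals[client] >= min_failure_ratio
    )


# Hypothetical log: one client tries 100 usernames and succeeds twice,
# another mistypes a password once and then logs in.
attempts = [("bot", f"user{i}", i % 50 == 0) for i in range(100)]
attempts += [("human", "alice", False), ("human", "alice", True)]
print(suspected_stuffing(attempts))  # ['bot']
```

A signature-based rule sees 102 valid login requests here. Only the aggregate view shows that one client is cycling through a credential list.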

WAFs also rely heavily on IP reputation to manage bots. If the IP reputation of a request is bad, a WAF assumes all activity from that IP will be bad. Conversely, if the IP reputation is good, it is likely to let all requests coming from that IP through. As mentioned previously, bot operators can now rotate high-quality, residential IPs cheaply and easily, making a WAF an ineffective solution to detect and prevent bots.

MFA

Multi-factor authentication is a good way to secure a user’s account. If your users have accounts on your websites or apps, you should recommend that they enable it. But you’ll quickly notice that many users won’t bother. It’s simply too much friction. This limits the ability of MFA to serve as a security solution.

On top of that, while MFA helps protect your users against credential stuffing attacks and account takeovers, it doesn’t protect your business against the other bot attacks that can still do serious damage, such as web scraping or DDoS attacks.

What you need in an advanced bot detection solution

If you want to identify bots and stop them from attacking your online properties, you need a dedicated, advanced bot detection solution. Here are a few indicators that distinguish an advanced solution from a less advanced one:

First, an advanced solution relies on data and must therefore analyze all requests. It doesn’t just analyze samples at regular intervals. It analyzes 100% of requests across all your endpoints. It must be able to process trillions of pieces of data in real time, and smoothly absorb the sudden traffic peaks often typical of bot attacks.

Second, it must use both server-side and client-side signals to identify bots. Server-side detection can be enough to identify basic bots, but it cannot identify advanced bots with consistent HTTP, TCP, and TLS fingerprints. Client-side detection uses techniques such as browser tracking, app tracking, and user event tracking to detect significantly more advanced bots. Both are required for an effective bot detection solution.
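As a rough illustration of why both kinds of signals matter, the sketch below scores a request from a handful of made-up signals. The signal names and the point values are assumptions chosen for the example. A production system would use far more signals and would not rely on fixed weights.

```python
from dataclasses import dataclass


@dataclass
class ServerSignals:
    tls_fingerprint_matches_user_agent: bool
    has_accept_language_header: bool


@dataclass
class ClientSignals:
    webdriver_flag: bool        # e.g. navigator.webdriver as reported by a script
    pointer_events_seen: int    # mouse or touch events observed on the page


def server_score(server):
    """Points (0-100) from signals visible in the request itself."""
    score = 0
    if not server.tls_fingerprint_matches_user_agent:
        score += 50
    if not server.has_accept_language_header:
        score += 20
    return score


def client_score(client):
    """Points (0-100) from signals collected in the browser or app."""
    if client is None:
        return 20  # the client never ran the detection script
    score = 0
    if client.webdriver_flag:
        score += 50
    if client.pointer_events_seen == 0:
        score += 20
    return score


def bot_score(server, client):
    return min(100, server_score(server) + client_score(client))


basic_bot = (ServerSignals(False, False), None)
advanced_bot = (ServerSignals(True, True), ClientSignals(True, 0))
human = (ServerSignals(True, True), ClientSignals(False, 12))

print(bot_score(*basic_bot))          # 90
print(server_score(advanced_bot[0]))  # 0: looks clean from the server side
print(bot_score(*advanced_bot))       # 70
print(bot_score(*human))              # 0
```

The advanced bot in this example has consistent network fingerprints, so it only stands out once client-side signals are taken into account.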

Third, an advanced solution should allow you to whitelist good bots while also blocking bad bots that masquerade as good bots. A large number of bots use crafty techniques to pretend they’re Googlebot, because that’s a bot that almost all businesses have whitelisted. An advanced bot detection solution can tell which requests are good bots and which ones only pretend to be.
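One widely documented way to check a Googlebot claim is a reverse DNS lookup on the requesting IP followed by a forward lookup on the returned hostname. The sketch below shows the idea using Python's standard `socket` module. The function name, the suffix list, and the injectable lookup parameters (there so the example runs without network access) are assumptions of this example, and `socket.gethostbyname_ex` only handles IPv4.

```python
import socket

# Hostname suffixes assumed for this sketch; check Google's current documentation.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True only if reverse and forward DNS agree on a Google hostname."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # An attacker can set any PTR record for their own IP, so the hostname
        # must also resolve back to the same address.
        return ip in forward_lookup(hostname)
    except OSError:
        return False  # lookup failed: treat the claim as unverified


# Offline demonstration with fake DNS data (all records are made up).
ptr_records = {
    "66.249.66.1": "crawl-66-249-66-1.googlebot.com",
    "203.0.113.7": "host.example.net",
    "203.0.113.8": "fake.googlebot.com",  # spoofed PTR record
}
a_records = {
    "crawl-66-249-66-1.googlebot.com": ["66.249.66.1"],
    "fake.googlebot.com": ["198.51.100.1"],
}


def check(ip):
    return is_verified_googlebot(ip, ptr_records.__getitem__, a_records.get)


print(check("66.249.66.1"))   # True
print(check("203.0.113.7"))   # False: hostname is not a Google domain
print(check("203.0.113.8"))   # False: forward lookup points elsewhere
```

The User-Agent header plays no part in the decision, since any bot can set it to whatever it likes.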

Fourth, an advanced bot detection solution will use machine learning to stay ahead of the latest bot trends. It uses ML to organize the vast amounts of data it processes and improve its prediction accuracy even for security threats that it has never seen before.

The fifth and final point is that an advanced bot detection solution will be backed up by a competent threat research team that analyzes open-source libraries and infiltrates hacker forums to understand what new technologies and techniques hackers are using. The insights from this team are then integrated into the security solution, so it can protect its users from new threats.

In conclusion

Bot detection has never been more important, but it has also never been harder. Bots use captcha farms, real devices, residential IPs, and many other techniques to seem indistinguishable from humans. Security solutions such as a WAF, MFA, and captchas no longer stop bots by themselves.

You need a dedicated, advanced bot detection solution to protect yourself fully. DataDome is that solution. It identifies and blocks the most advanced bots in real time without disrupting your users. DataDome integrates with any tech stack and can be installed within minutes.

MUGOYA DIHFAHSIH