Introduction

Tired of web crawlers hitting your API? These robots and other bad User Agents can skew your analytics, place unnecessary load on your infrastructure, and increase your AWS bill.

An API Gateway Resource Policy can be used to selectively block certain User Agents, alleviating these issues.

Using a Resource Policy

Resource Policies are one way of controlling access to your API. They are JSON policy documents that are applied to API resources either through the Console or the CLI.
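
Besides the Console and the CLI, a policy can also be attached programmatically. The following is a minimal sketch, assuming Python with the boto3 SDK (not mentioned in this post) and a REST API. `build_policy_patch` and `attach_policy` are hypothetical helper names, and the API ID you pass in is your own.

```python
import json


def build_policy_patch(policy: dict) -> list:
    """Return the patch operations that set a REST API's resource policy.

    API Gateway expects the policy as a JSON string, not a nested object.
    """
    return [{"op": "replace", "path": "/policy", "value": json.dumps(policy)}]


def attach_policy(rest_api_id: str, policy: dict) -> None:
    """Attach the policy to the API (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_policy_patch works without the SDK

    client = boto3.client("apigateway")
    client.update_rest_api(
        restApiId=rest_api_id,
        patchOperations=build_policy_patch(policy),
    )
```

Note that a changed resource policy only takes effect once the API has been redeployed.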

AWS supplies official example policies. You can combine them with the policy documents below to tighten access to your API and block bad User Agents or web crawlers.

Block Specific User Agents

The example below blocks the most popular web crawlers reported by KeyCDN by matching each User Agent string exactly:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "execute-api:Invoke",
            "Resource": "arn:aws:execute-api:{REGION}:{ACCOUNT_ID}:{API_ID}/*"
        },
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": "execute-api:Invoke",
            "Resource": "arn:aws:execute-api:{REGION}:{ACCOUNT_ID}:{API_ID}/*",
            "Condition": {
                "StringEquals": {
                    "aws:UserAgent": [
                        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
                        "Google (+https://developers.google.com/+/web/snippet/)",
                        "Mozilla/5.0 (compatible; Bingbot/2.0; +http://www.bing.com/bingbot.htm)",
                        "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
                        "DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)",
                        "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
                        "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
                        "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
                        "Mozilla/5.0 (compatible; Exabot/3.0; +http://www.exabot.com/go/robot)",
                        "facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)",
                        "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"
                    ]
                }
            }
        }
    ]
}
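
A long list like this is easy to get wrong when edited by hand, so it can help to generate the policy instead. The following is a minimal sketch, assuming Python. `build_block_policy` is a hypothetical helper, and the ARN uses made-up placeholder values for the region, account ID and API ID.

```python
import json


def build_block_policy(resource_arn: str, user_agents: list,
                       operator: str = "StringEquals") -> dict:
    """Build an allow-all policy with an explicit Deny for the given User Agents."""
    base = {"Principal": "*", "Action": "execute-api:Invoke", "Resource": resource_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", **base},
            {
                "Effect": "Deny",
                **base,
                "Condition": {operator: {"aws:UserAgent": list(user_agents)}},
            },
        ],
    }


policy = build_block_policy(
    # Placeholder ARN: substitute your own region, account ID and API ID.
    "arn:aws:execute-api:us-east-1:123456789012:abcde12345/*",
    [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)",
    ],
)
print(json.dumps(policy, indent=4))
```

Passing `operator="StringLike"` produces the wildcard variant of the same policy.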

Block User Agents with a wildcard

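You can also use wildcards with the StringLike operator if you’d like a more flexible “catch-all” approach that does not depend on the exact version number or URL in the User Agent string: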

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "execute-api:Invoke",
            "Resource": "arn:aws:execute-api:{REGION}:{ACCOUNT_ID}:{API_ID}/*"
        },
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": "execute-api:Invoke",
            "Resource": "arn:aws:execute-api:{REGION}:{ACCOUNT_ID}:{API_ID}/*",
            "Condition": {
                "StringLike": {
                    "aws:UserAgent": [
                        "*Googlebot*",
                        "*Google*",
                        "*Bingbot*",
                        "*Slurp*",
                        "*DuckDuckBot*",
                        "*Baiduspider*",
                        "*YandexBot*",
                        "*Sogou*",
                        "*Exabot*",
                        "*facebookexternalhit*",
                        "*ia_archiver*"
                    ]
                }
            }
        }
    ]
}
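
To see which User Agent strings a pattern list would catch before deploying it, the matching can be approximated locally. The sketch below assumes IAM's documented `StringLike` semantics: case-sensitive matching, `*` for any run of characters and `?` for a single character. `string_like` and `is_blocked` are hypothetical helpers written for illustration, not part of any AWS SDK.

```python
import re


def string_like(pattern: str, value: str) -> bool:
    """Approximate IAM StringLike: case-sensitive, '*' = any run, '?' = one char."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def is_blocked(user_agent: str, patterns: list) -> bool:
    """True if any deny pattern matches the User Agent."""
    return any(string_like(p, user_agent) for p in patterns)


patterns = ["*Googlebot*", "*Bingbot*", "*DuckDuckBot*"]

# Matches "*Googlebot*".
print(is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", patterns))
# Lowercase "bingbot" does not match "*Bingbot*" because matching is case-sensitive.
print(is_blocked("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)", patterns))
# An ordinary client matches none of the patterns.
print(is_blocked("curl/8.4.0", patterns))
```

Because the comparison is case-sensitive, a crawler that announces itself with a different capitalisation slips through, so it may be worth listing both spellings of a name.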