Introduction
Tired of your API being hit by web crawlers? These robots and other bad User Agents can skew your analytics, place unnecessary load on your infrastructure and inflate your AWS bill.
An API Gateway Resource Policy can be used to selectively block certain User Agents, alleviating these issues.
Using a Resource Policy
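Resource Policies are one way of controlling access to your API. They are JSON policy documents, attached to a REST API through either the API Gateway Console or the AWS CLI, that define who may invoke the API and under which conditions. After attaching or changing a Resource Policy, you need to redeploy the API for the change to take effect.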
AWS also publishes official example Resource Policies for other access-control scenarios.
You can combine those examples with the policy documents below to tighten the security of your API and block bad User Agents and web crawlers.
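If you would rather script this step than paste JSON into the Console, the sketch below uses the AWS SDK for Python (boto3) to replace a REST API's Resource Policy and then redeploy a stage. It assumes boto3 is installed and AWS credentials are configured; the helper names, `rest_api_id` and `stage_name` are hypothetical placeholders of our own. The `replace` operation on `/policy` is, to our understanding, the SDK equivalent of `aws apigateway update-rest-api --patch-operations op=replace,path=/policy,...`.

```python
import json
from typing import Any, Optional


def policy_patch_operations(policy: dict[str, Any]) -> list[dict[str, str]]:
    """Build the patch operation that replaces a REST API's resource policy.

    API Gateway expects the policy document as a JSON string in "value".
    """
    return [{"op": "replace", "path": "/policy", "value": json.dumps(policy)}]


def attach_resource_policy(
    rest_api_id: str,
    policy: dict[str, Any],
    stage_name: str,
    region_name: Optional[str] = None,
) -> None:
    """Hypothetical helper: replace the resource policy, then redeploy a stage.

    Requires boto3 and valid AWS credentials.
    """
    import boto3  # imported here so policy_patch_operations() works without boto3

    client = boto3.client("apigateway", region_name=region_name)
    client.update_rest_api(
        restApiId=rest_api_id,
        patchOperations=policy_patch_operations(policy),
    )
    # Resource policy changes only take effect once the API is redeployed.
    client.create_deployment(restApiId=rest_api_id, stageName=stage_name)
```

For example, `attach_resource_policy("a1b2c3d4e5", policy, "prod")`, where `"a1b2c3d4e5"` is a placeholder API ID and `policy` is one of the documents below parsed with `json.loads`.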
Block Specific User Agents
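In the example below, we block the most popular web crawlers as reported by KeyCDN. The first statement allows anyone to invoke the API; the second explicitly denies any request whose User-Agent header exactly matches one of the listed strings. Because an explicit Deny always overrides an Allow, only the matching requests are rejected: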
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "arn:aws:execute-api:{REGION}:{ACCOUNT_ID}:{API_ID}/*"
    },
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "arn:aws:execute-api:{REGION}:{ACCOUNT_ID}:{API_ID}/*",
      "Condition": {
        "StringEquals": {
          "aws:UserAgent": [
            "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
            "Google (+https://developers.google.com/+/web/snippet/)",
            "Mozilla/5.0 (compatible; Bingbot/2.0; +http://www.bing.com/bingbot.htm)",
            "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
            "DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)",
            "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
            "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
            "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
            "Mozilla/5.0 (compatible; Exabot/3.0; +http://www.exabot.com/go/robot)",
            "facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)",
            "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"
          ]
        }
      }
    }
  ]
}
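Rather than hand-editing the list, you can generate this document. The sketch below is a hypothetical helper of our own (`build_user_agent_deny_policy` is not an AWS API) that emits the same Allow and Deny statements for any list of User Agents. Its `operator` argument also accepts `"StringLike"`, so the same helper can produce the wildcard variant shown in the next section.

```python
import json
from typing import Any


def build_user_agent_deny_policy(
    region: str,
    account_id: str,
    api_id: str,
    user_agents: list[str],
    operator: str = "StringEquals",
) -> dict[str, Any]:
    """Return a resource policy that allows every caller except those
    whose User-Agent matches one of `user_agents` under `operator`."""
    if operator not in ("StringEquals", "StringLike"):
        raise ValueError(f"unsupported condition operator: {operator}")
    resource = f"arn:aws:execute-api:{region}:{account_id}:{api_id}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": resource,
            },
            {
                # For matching requests, this explicit Deny overrides the Allow above.
                "Effect": "Deny",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": resource,
                "Condition": {operator: {"aws:UserAgent": list(user_agents)}},
            },
        ],
    }


if __name__ == "__main__":
    policy = build_user_agent_deny_policy(
        "us-east-1",
        "123456789012",  # placeholder account ID
        "a1b2c3d4e5",  # placeholder API ID
        ["DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)"],
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the Console's Resource Policy editor or passed to the CLI.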
Block User Agents with a wildcard
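StringEquals only blocks exact matches, so a crawler that changes even its version number slips through. For a more flexible “catch-all” approach, use the StringLike operator with wildcards, where * matches any sequence of characters and ? matches a single character. Note that StringLike is case-sensitive: *Bingbot* does not match a User Agent containing bingbot, so list each casing you want to block.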
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "arn:aws:execute-api:{REGION}:{ACCOUNT_ID}:{API_ID}/*"
    },
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "arn:aws:execute-api:{REGION}:{ACCOUNT_ID}:{API_ID}/*",
      "Condition": {
        "StringLike": {
          "aws:UserAgent": [
            "*Googlebot*",
            "*Google*",
            "*Bingbot*",
            "*bingbot*",
            "*Slurp*",
            "*DuckDuckBot*",
            "*Baiduspider*",
            "*YandexBot*",
            "*Sogou*",
            "*Exabot*",
            "*facebot*",
            "*facebookexternalhit*",
            "*ia_archiver*"
          ]
        }
      }
    }
  ]
}
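Before deploying a wildcard list, it can help to check locally which User Agents it would catch. The sketch below approximates StringLike matching (* for any run of characters, ? for exactly one, case-sensitive) together with the deny-overrides-allow outcome of the two statements above. It is a simplified local approximation written for illustration, not AWS's policy evaluator; `string_like` and `is_blocked` are hypothetical names.

```python
import re


def string_like(pattern: str, value: str) -> bool:
    """Approximate IAM StringLike: '*' matches any sequence, '?' one character.
    Every other character matches literally and case-sensitively."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def is_blocked(user_agent: str, deny_patterns: list[str]) -> bool:
    """The Allow statement admits everyone, so a request is rejected only
    when the explicit Deny matches at least one pattern."""
    return any(string_like(p, user_agent) for p in deny_patterns)


if __name__ == "__main__":
    patterns = ["*Googlebot*", "*Bingbot*", "*DuckDuckBot*"]
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    ]
    for ua in samples:
        print(is_blocked(ua, patterns), ua)
```

Running it shows the lowercase bingbot sample passing *Bingbot*, which is why the case-sensitivity of StringLike matters when you build the list.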