Webhook security: a hands-on guide
By Mike Coutermarsh
We recently released webhooks for PlanetScale.
One of the more interesting parts of building a webhooks service is making it secure and protected from abuse.
As soon as we started talking about the project internally, engineers throughout PlanetScale started sharing the different ways they have seen webhooks be abused or exploited in the past.
These collective experiences gave us a good list of things to worry about while building out our own webhooks service.
In this post, we'll go through some of the primary steps we took to build our webhooks service securely.
Server-side request forgery (SSRF)
The main vulnerability in any webhooks service is server-side request forgery (SSRF). An SSRF is when an attacker causes your service to make an internal, unintended request within your own network.
Webhooks are the perfect target for this. The user provides a URL, and then triggers your application to send a request to it.
This request could be harmful by either returning private information to the attacker, or by triggering an internal service to perform some action on their behalf.
For example, if a web server is running an internal metrics endpoint that responds to HTTP POST requests, an attacker could direct the webhook service to send a request to that endpoint. If the webhook service displays the response in the UI, the attacker has now gained access to your internal metrics data.
Mitigating webhook SSRFs
When building a webhook service, there are two layers of defense to set up to protect against SSRFs. The first is limiting the URLs users are allowed to set up webhooks for. The second is limiting where your webhook service can make HTTP connections, via egress rules or a proxy.
Strict validation of the webhook URL
Adding validations for allowed URLs mainly benefits the user by quickly giving them feedback that the URL they entered won't work with your webhook service.
Since DNS records can be changed easily, URL validation alone is not enough to mitigate SSRFs.
For our service, we check for the following:
Require HTTPS
These days, running a web service without SSL is rare. We felt that making https a requirement for any webhook we send is a fair request that limits vulnerabilities and protects the potentially sensitive data being sent in our webhook payloads.
Block private and loopback IP addresses
We used Ruby's ipaddr to identify if an IP address is private (internal) or a loopback (localhost) address. If we see either of these, the URL fails validation.

    require "ipaddr"
    require "uri"

    uri = URI.parse(url)
    # A hostname (rather than an IP literal) does not parse, leaving host_ip nil.
    host_ip = begin
      IPAddr.new(uri.host)
    rescue IPAddr::Error
      nil
    end
    return false if host_ip && (host_ip.private? || host_ip.loopback?)

Block our own domains
To protect against a user sending traffic to another external service owned by PlanetScale, we set up a domain blocklist which includes all of our other public services.

    uri = URI.parse(url)
    if BLOCKED_DOMAINS.any? { |domain| uri.host&.include?(domain) }
      return false
    end

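One property of a substring check such as `include?` is that it also rejects unrelated hosts that merely contain a blocked name. That errs on the safe side, but it can surprise users. A minimal sketch of a stricter variant, which matches whole domain labels only, might look like this. The `blocked_host?` helper and the `internal.example` entry are hypothetical placeholders, not our production code or blocklist.

```ruby
# Returns true when host equals a blocked domain or is a subdomain of one.
# blocked_domains is a plain array of lowercase domain names (hypothetical).
def blocked_host?(host, blocked_domains)
  # Treat a missing host as blocked rather than letting it through.
  return true if host.nil? || host.empty?

  # Normalize case and drop the trailing dot of a fully qualified name.
  normalized = host.downcase.chomp(".")
  blocked_domains.any? do |domain|
    normalized == domain || normalized.end_with?(".#{domain}")
  end
end

blocklist = ["internal.example"]
blocked_host?("api.internal.example", blocklist)  # subdomain of a blocked domain
blocked_host?("notinternal.example", blocklist)   # different domain, not matched
```

The trade-off is that a suffix match only covers domains you list explicitly, so the blocklist has to be kept complete.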
DNS resolution test
Once the URL has passed basic tests, we then resolve the DNS to further validate it is not pointing towards any private or loopback IP addresses. Remember, the user can always update the host's DNS after this check has passed. This alone is not enough to protect from SSRFs.

    require "ipaddr"
    require "resolv"

    # True only if the host resolves and none of its addresses are blocked.
    def host_resolves_valid_ips?(host)
      ip_addresses = Resolv.getaddresses(host)
      return false if ip_addresses.none?

      if ip_addresses.any? { |ip| blocked_ip?(IPAddr.new(ip)) }
        return false
      end

      true
    end

    def blocked_ip?(ip)
      ip.private? || ip.loopback?
    end

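To show how these checks could fit together, including the HTTPS requirement, here is a minimal sketch of a single validation method. It is an illustration under assumptions rather than our actual validator: the `valid_webhook_url?` and `parse_ip` names and the injectable `resolver` parameter are hypothetical, and the `link_local?` check (which covers 169.254.0.0/16 and fe80::/10) is an addition that the snippets above do not include.

```ruby
require "ipaddr"
require "resolv"
require "uri"

# True for addresses a webhook should never target.
# link_local? is an extra check added in this sketch.
def blocked_ip?(ip)
  ip.private? || ip.loopback? || ip.link_local?
end

# Parses an IP literal, returning nil for anything that is not one.
def parse_ip(value)
  IPAddr.new(value)
rescue IPAddr::Error
  nil
end

# resolver is injectable so the check can be exercised without real DNS.
def valid_webhook_url?(url, resolver: Resolv.method(:getaddresses))
  uri = URI.parse(url)
  host = uri.hostname
  return false unless uri.is_a?(URI::HTTPS) && host && !host.empty?

  # An IP literal is judged directly; there is nothing to resolve.
  literal = parse_ip(host)
  return !blocked_ip?(literal) if literal

  addresses = resolver.call(host)
  return false if addresses.empty?

  # Every resolved address must parse and must not be blocked.
  addresses.none? do |address|
    ip = parse_ip(address)
    ip.nil? || blocked_ip?(ip)
  end
rescue URI::InvalidURIError
  false
end
```

As with the individual checks, this only describes the URL at validation time, so it gives users fast feedback but does not replace controls applied when the request is sent.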
HTTP egress rules
No matter how rigorous your URL validations are, you cannot fully trust any URL provided by a user. Because of this, it's critical to isolate and limit where the webhooks service can send HTTP requests.
How this is implemented will depend on your infrastructure. Our application is deployed using Kubernetes. We set up an isolated service dedicated to sending webhooks. This service sends all HTTP requests via an Envoy proxy, which only allows HTTP requests to destinations outside of our network. The proxy applies rules similar to the URL validations above, but they are enforced at the moment the webhook is sent.
The key rules to put in place are:
- Block any connections to internal/private IPs.
- Limit traffic to HTTPS ports.
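If a proxy or network-level egress rule is not available, a similar check can be approximated inside the sender itself. The sketch below is not our Envoy configuration. It is a hypothetical application-level complement that resolves the host at send time, refuses internal addresses, and then connects to the exact address it checked using `Net::HTTP#ipaddr=` (available in recent Ruby versions). The method names, the injectable `resolver`, and the 5-second timeouts are all assumptions made for illustration.

```ruby
require "ipaddr"
require "net/http"
require "resolv"
require "uri"

# True when the address is unparseable, private, loopback, or link-local.
def internal_address?(address)
  ip = IPAddr.new(address)
  ip.private? || ip.loopback? || ip.link_local?
rescue IPAddr::Error
  true
end

# Returns one resolved address, or nil unless every address is external.
def resolve_external_ip(host, resolver: Resolv.method(:getaddresses))
  addresses = resolver.call(host)
  return nil if addresses.empty? || addresses.any? { |a| internal_address?(a) }

  addresses.first
end

# Sends the payload to the address that was just checked, so a DNS change
# between the check and the connection cannot redirect the request.
def deliver_webhook(url, payload, resolver: Resolv.method(:getaddresses))
  uri = URI.parse(url)
  raise ArgumentError, "webhook URL must use https" unless uri.is_a?(URI::HTTPS)

  ip = resolve_external_ip(uri.hostname, resolver: resolver)
  raise ArgumentError, "webhook host resolves to a blocked address" if ip.nil?

  http = Net::HTTP.new(uri.hostname, uri.port)
  http.use_ssl = true
  http.ipaddr = ip        # connect to the vetted IP; TLS still verifies the hostname
  http.open_timeout = 5   # illustrative values, in seconds
  http.read_timeout = 5
  http.write_timeout = 5
  http.post(uri.request_uri, payload, "Content-Type" => "application/json")
end
```

An in-process check like this is weaker than network isolation, because a bug in the application bypasses it. It is best treated as an extra layer rather than a substitute for egress rules.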
Mitigating distributed denial-of-service (DDoS)
Webhook services can be manipulated to send large amounts of traffic to a URL. To carry out this attack, all an attacker needs to do is set up a webhook, and then find a way to trigger it in large quantities.
API based rate limiting
One simple way to protect against this is to set reasonable rate limits at your API layer. This restricts how many actions an attacker can take and stops them from enqueueing an unlimited number of webhooks. Our entire API service has a general rate limiter that protects all endpoints.
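As a rough illustration of the idea, the sketch below allows one request per key within a fixed interval. It is not the rate limiter we run: the `MinIntervalLimiter` class, its injectable `clock`, and the in-memory storage are assumptions, and a real deployment with several processes would need shared storage instead of a Ruby hash.

```ruby
# Allows at most one request per key every `interval` seconds.
# State lives in memory, so this only works within a single process.
class MinIntervalLimiter
  def initialize(interval:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @interval = interval
    @clock = clock
    @last_allowed = {}
    @mutex = Mutex.new
  end

  # Returns true and records the request if the key is outside its interval.
  def allow?(key)
    @mutex.synchronize do
      now = @clock.call
      last = @last_allowed[key]
      return false if last && (now - last) < @interval

      @last_allowed[key] = now
      true
    end
  end
end

limiter = MinIntervalLimiter.new(interval: 20)
limiter.allow?("webhook-123") # the first request for a key is accepted
limiter.allow?("webhook-123") # an immediate repeat for the same key is rejected
```

Keying the limiter per webhook or per user, rather than globally, keeps one noisy caller from locking out everyone else.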
For our webhooks service, we have a test endpoint that triggers a test webhook. For this endpoint specifically, we added a rate limit of 1 request per 20 seconds. This felt reasonable for users who are testing their hooks while also eliminating the risk of the test webhook being abused.
Webhook uniqueness/locking
Our webhook service uses a Sidekiq queue to process and send webhooks. With Sidekiq, we are able to set up a uniqueness check on each webhook that is added to the queue. Duplicate webhooks in quick succession get rejected, resulting in only a single unique webhook being sent out from our service, as well as limiting the number of webhooks we need to process.
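The underlying idea can be sketched in plain Ruby, independent of any queueing library. The example below is not Sidekiq's API and not our configuration. It is a hypothetical `UniquenessLock` that fingerprints a webhook and event pair and rejects the same fingerprint again until a time-to-live expires. The class name, the `ttl` parameter, and the in-memory storage are assumptions made for illustration.

```ruby
require "digest"
require "json"

# Rejects a job when an identical one was accepted within `ttl` seconds.
class UniquenessLock
  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @expires_at = {}
    @mutex = Mutex.new
  end

  # Returns true if the caller should enqueue the job, false for a duplicate.
  def acquire?(webhook_id, event)
    key = Digest::SHA256.hexdigest(JSON.generate([webhook_id, event]))
    @mutex.synchronize do
      now = @clock.call
      # Drop locks whose time-to-live has passed before checking for a duplicate.
      @expires_at.delete_if { |_key, expiry| expiry <= now }
      return false if @expires_at.key?(key)

      @expires_at[key] = now + @ttl
      true
    end
  end
end
```

A caller would check `acquire?` before enqueueing and skip the job when it returns false. In a multi-process setup the lock would need to live in shared storage so that every worker sees the same state.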
Isolated infrastructure
In the event that our other mitigations fail, we run our webhooks queue on isolated machines to protect against webhooks impacting the availability of other PlanetScale services. If our webhooks are being abused, we do not want that to impact the reliability of the rest of our systems. They can be easily paused or disabled in the event of an incident.
Set strict timeouts
Sending a webhook ties up our resources while waiting for a response. One possible attack vector is queueing many webhooks that resolve very slowly. This can be mitigated by setting a short timeout on webhook requests.
Limiting number of webhooks
We set an initial limit of 5 webhooks per database. We felt this was enough for people to automate several workflows, while also protecting us from having users trigger a large number of hooks for the same events. Starting with 5 is fairly conservative, but leaves us room to grow and allow more if people have use cases for them. Raising a limit later is always easier than lowering it.
Conclusion
Hopefully you enjoyed this overview of how we secured PlanetScale webhooks. If you haven't tried webhooks yet, you can learn more about them in our Webhooks documentation.