Wednesday, April 5, 2017

Some weird behaviour in the Google endpoints - Possible DoS by application-layer flooding?

More than a month ago I reported to Google some weird behaviour that I noticed while uploading a picture to Google Docs. I was working on a script to upload pictures and attach them to documents when I made a mistake in the document ID I was sending. The API kept the request waiting for about 3 minutes and ended up sending back an internal server error after a weird four-way handshake.

This behaviour made me think that Google Docs was trying to find the document and, since it could not find it among any sort of recently used documents, the request was forwarded to another subsystem for long-term storage. Another possibility was that, since the document was not in the DB index, it was performing a full scan.
After trying some other requests I found two other endpoints with similar behaviour, where it is easier to perform the possible DoS attack since they don't require uploading pictures or sending multiple requests. These endpoints keep the client waiting for 4 minutes instead of 3.

The endpoints are:
  • https://docs.google.com/document/d/invalid_docu_id/edit

And this is what happens when you use ApacheBench (ab) to test the docs.google.com URL:

As you can see from the ab results, in this case all 10 requests take 4 minutes.

At other times some requests are resolved in milliseconds. This was a bit disconcerting and made me think that it could be caused by some sort of rate limiter. To verify that this was not the case, I ran the same request from a server that had never executed a request like this, and the result was the same 4 minutes as before. After this, I think the most plausible explanation is that there is a crash report system, or some other sort of log collector, trying to get information from the document and ending up timing out.
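For anyone who wants to repeat the check from another machine without ab, a single request can be timed with a few lines of Python. This is only a minimal sketch using the standard library: the function name `timed_request`, the 300 second timeout and the `opener` parameter are my own choices for illustration, not something taken from the original test.

```python
import http.client
import time
import urllib.error
import urllib.request


def timed_request(url, timeout=300.0, opener=urllib.request.urlopen):
    """Send one GET request and return (status, elapsed_seconds).

    status is the HTTP status code, or None when the request timed out or
    the connection failed before a complete response arrived.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            response.read()  # wait for the whole body, not just the headers
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx answers still count as a response
    except (OSError, http.client.HTTPException):
        # URLError and socket timeouts are OSError subclasses;
        # HTTPException covers truncated or malformed responses.
        status = None
    return status, time.monotonic() - start
```

Calling `timed_request("https://docs.google.com/document/d/invalid_docu_id/edit")` once and printing the two returned values is enough to see whether the wait is still there. The timeout is deliberately longer than the 4 minutes observed, so that the server, and not the client, decides when the request ends.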

This is what we can see using Wireshark:

The transmission above goes as follows:
  1. The server starts answering but it gets stuck
  2. The client sends a TCP Window Update since the transmission is not complete
  3. The client sends a TCP Keep-Alive every minute and the server answers these keep-alive packets, which means that the server is actually listening and has the socket open
  4. The server sends the FIN, ACK to start the four-way handshake, I guess after an internal timeout
  5. The client answers with an ACK, as expected during the handshake
  6. The client sends an “Encrypted Alert”: this is the client requesting the termination of the secured TCP connection because the payload was not completely sent
  7. The client sends the FIN, ACK as expected
  8. The server, instead of the ACK, sends a RST, dropping the connection without shutting it down gracefully
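The keep-alive probes in step 3 come from the client's operating system, not from the HTTP layer. As a rough illustration of where they originate, this is how a client socket asks for them in Python. The capture does not say which setting produced the one-minute interval, so the 60 seconds here is an assumption that simply mirrors it, and `enable_keepalive` is a hypothetical helper name.

```python
import socket


def enable_keepalive(sock, idle_seconds=60):
    """Ask the OS to send TCP keep-alive probes on an idle connection."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE (seconds of idle time before the first probe) is not
    # available on every platform, so only set it where it exists.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    # A non-zero value means keep-alive is switched on for this socket.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enabled = enable_keepalive(sock)
print("keep-alive enabled:", bool(enabled))
sock.close()
```

Each probe is answered by the peer's TCP stack with an ACK for as long as the socket exists on the other side, which is why the answers in the capture show that the server still has the connection open.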

This kind of behaviour makes a DoS by application-layer flooding a lot easier. This means that, if you start sending requests to these endpoints, you could end up causing the collapse of the system.

The collapse could be produced by:
  1. Reaching the maximum number of open sockets. Since the servers keep answering the keep-alives but do not verify that the client still has the socket open, you could just send requests without waiting for any answer, causing the servers to start rejecting incoming connections. This would also consume other resources such as memory and CPU. You have 4 minutes to send as many requests as you can.
  2. If this request causes memory, I/O or CPU pressure in any of the shards, you can just send several requests using different IDs in order to flood all the shards. This would make the systems collapse, causing the DoS

Since you have 4 minutes to cause one of the situations described above, a possible DoS attack becomes a lot easier and could be launched from a single computer on a commodity network. The attack could also be triggered by mistake.
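To put a rough number on the first scenario: in a steady state, the number of connections held open at once is the request rate multiplied by the time each request stays open (Little's law). The sketch below is plain arithmetic, not a measurement. The 4 minute hold time comes from the ab results, while the request rates are hypothetical values picked only for illustration.

```python
def connections_held_open(requests_per_second, hold_seconds):
    """Steady-state count of simultaneously open connections.

    Little's law: items in the system = arrival rate * time spent in the system.
    """
    return requests_per_second * hold_seconds


HOLD_SECONDS = 4 * 60  # the ~4 minute wait observed in the ab results

for rate in (1, 10, 100):  # hypothetical request rates, not measured values
    held = connections_held_open(rate, HOLD_SECONDS)
    print(f"{rate} req/s -> {held} connections open at once")
```

For comparison, an endpoint that answers in 100 ms would have around 10 connections open at 100 requests per second. What this means for a real deployment depends on limits and load balancing that are not visible from the outside.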

Google doesn't consider this a security issue:

I understand that a generic DDoS attack shouldn't be considered a security issue, since there is very little you can do to prevent a botnet from attacking your systems. But in this specific case it is like leaving your front door wide open because you can't do anything to prevent people from breaking in.

From my point of view, this is a security issue. I answered with the e-mail below, explaining the reasons why I think so:

They answered with the following e-mail:

But after some weeks I received a generic e-mail dismissing this as a security issue. The issue is still reproducible.
