Question
I have a vanilla Cloud Function that takes 60 seconds and then returns status 200 with a simple JSON object. The timeout for the function is set to 150s. When testing locally, and when running the function via its cloudfunctions.net address, the function completes at 60s and the 200 response and body are correctly delivered to the client. So far so good.
Here's the kicker: if I run the exact same function proxied through Firebase Hosting (set up via a "target" inside firebase.json), then according to the Stackdriver logs the function is restarted almost immediately, anywhere from 1 to 3 times. When those executions finish, the function is sometimes restarted AGAIN, and the client eventually receives a 503 timeout from Varnish.
This behavior is consistently reproducible ONLY when the function is called on a domain that is proxied through Firebase Hosting. It seems to happen ONLY when the function takes ~60s or longer. It does not depend on the returned response code or response body.
You can see this behavior in a test function I have set up here: https://trellisconnect.com/testtimeout?sleepAmount=60&retCode=200
This behavior was originally identified in a function that is deployed via serverless. To rule out serverless, I created a test function that makes the behavior easy to test and verify, deployed it with regular Firebase functions, called it from its cloudfunctions.net domain, and verified that I always got a correct response at 60s. I then updated my firebase.json to add a new route that points to this function and was able to replicate the problem.
index.js
const functions = require('firebase-functions');

// Resolve after the given number of milliseconds.
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Sleeps for ?sleepAmount seconds, then responds with status ?retCode.
exports.testtimeout = functions.https.onRequest((req, res) => {
  const { sleepAmount, retCode } = req.query;
  console.log(`starting test sleeping ${sleepAmount}...`);
  return sleep(1000 * Number(sleepAmount)).then(() => {
    console.log(`Ending test func, returning ${retCode}`);
    res.status(Number(retCode)).json({ message: 'Random Response' });
  });
});
firebase.json
{
  "hosting": {
    "public": "public",
    "ignore": ["firebase.json", "**/.*", "**/node_modules/**"],
    "rewrites": [
      {
        "source": "/testtimeout",
        "function": "testtimeout"
      }
    ]
  },
  "functions": {}
}
A correct/expected response (sleepAmount=2 seconds):
zgoldberg@zgblade:~$ time curl "https://trellisconnect.com/testtimeout?sleepAmount=2&retCode=200"
{"message":"Random Response"}
real 0m2.269s
user 0m0.024s
sys 0m0.000s
And a sample of how things appear when sleepAmount is set to 60 seconds:
zgoldberg@zgblade:~$ curl -v "https://trellisconnect.com/testtimeout?sleepAmount=60&retCode=200"
* Trying 151.101.65.195...
* TCP_NODELAY set
* Connected to trellisconnect.com (151.101.65.195) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=admin.cliquefood.com.br
* start date: Oct 16 20:44:55 2019 GMT
* expire date: Jan 14 20:44:55 2020 GMT
* subjectAltName: host "trellisconnect.com" matched cert's "trellisconnect.com"
* issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x563f92bdc580)
> GET /testtimeout?sleepAmount=60&retCode=200 HTTP/2
> Host: trellisconnect.com
> User-Agent: curl/7.58.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 503
< server: Varnish
< retry-after: 0
< content-type: text/html; charset=utf-8
< accept-ranges: bytes
< date: Fri, 08 Nov 2019 03:12:08 GMT
< x-served-by: cache-bur17523-BUR
< x-cache: MISS
< x-cache-hits: 0
< x-timer: S1573182544.115433,VS0,VE184552
< content-length: 449
<
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>503 first byte timeout</title>
</head>
<body>
<h1>Error 503 first byte timeout</h1>
<p>first byte timeout</p>
<h3>Guru Mediation:</h3>
<p>Details: cache-bur17523-BUR 1573182729 2301023220</p>
<hr>
<p>Varnish cache server</p>
</body>
</html>
* Connection #0 to host trellisconnect.com left intact
real 3m3.763s
user 0m0.024s
sys 0m0.031s
Here's the crazy part. Check out the Stackdriver logs and notice how the function completes in 60s and, almost immediately after, 3 more executions are started.
Notice that the original call comes in at 19:09:04.235 and ends at 19:10:04.428, almost exactly 60s later. Almost exactly 500ms later, at 19:10:05.925, the function is restarted. I promise I did not hit my curl command again 0.5s after the initial response. None of the subsequent executions of the function here were generated by me; they all seem to be phantom retries?
Screenshot of the logs: https://i.imgur.com/WDY17pw.png
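As a cross-check on the timing, the x-timer header from the curl output can be decoded. This is a minimal sketch, assuming the header follows Fastly's `S<start epoch seconds>,VS<start offset ms>,VE<end offset ms>` layout; `parseXTimer` is a hypothetical helper written for this illustration, not part of any Firebase or Fastly API.

```javascript
// Hypothetical helper: parse an x-timer header of the assumed form
// S<start epoch seconds>,VS<start offset ms>,VE<end offset ms>.
function parseXTimer(header) {
  const match = /^S(\d+(?:\.\d+)?),VS(\d+),VE(\d+)$/.exec(header.trim());
  if (!match) {
    throw new Error(`Unrecognized x-timer value: ${header}`);
  }
  return {
    startEpochSeconds: Number(match[1]),
    // Time the edge spent on the request, in milliseconds.
    elapsedMs: Number(match[3]) - Number(match[2]),
  };
}

// Value copied from the 503 response above.
const { elapsedMs } = parseXTimer('S1573182544.115433,VS0,VE184552');
const sixtySecondWindows = elapsedMs / 60000;
console.log(
  `edge waited ${(elapsedMs / 1000).toFixed(1)}s, ` +
  `about ${sixtySecondWindows.toFixed(2)} x 60s`
);
// prints "edge waited 184.6s, about 3.08 x 60s"
```

The elapsed time of about 184.6s matches curl's 3m3.763s wall clock and is roughly three 60-second windows, which lines up with the repeated executions in the logs.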
Any thoughts or help is much appreciated
Answer 1:
From Firebase Hosting: Serving Dynamic Content with Cloud Functions for Firebase:
Note: Firebase Hosting is subject to a 60-second request timeout. Even if you configure your HTTP function with a longer request timeout, you'll still receive an HTTP status code 504 (request timeout) if your function requires more than 60 seconds to run. To support dynamic content that requires longer compute time, consider using an App Engine flexible environment.
In short, unfortunately your use-case is not supported as the CDN/Hosting instance just assumes that the connection was lost and tries again.
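If the function must stay behind Firebase Hosting, one option is to give the work its own deadline below the 60-second limit and answer explicitly when it is missed. The following is only a sketch under assumptions: `withDeadline` is a hypothetical helper, the 55-second budget mentioned in the comment is an arbitrary choice, and whether an early response actually prevents the proxy from retrying has not been verified here.

```javascript
// Hypothetical helper: resolve with `fallback` if `work` has not settled
// within `ms` milliseconds; otherwise settle with the result of `work`.
function withDeadline(work, ms, fallback) {
  let timer;
  const deadline = new Promise(resolve => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  // Clear the timer once either side wins so it does not linger.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}

// Demo with short durations: the "work" takes 200ms, the deadline is 50ms.
// In a real handler the deadline might be around 55000ms (an assumption).
const slowWork = new Promise(resolve =>
  setTimeout(() => resolve({ timedOut: false }), 200)
);
withDeadline(slowWork, 50, { timedOut: true }).then(result => {
  console.log(result.timedOut); // prints true
});
```

Inside an HTTP handler, the resolved value could be used to choose between the normal response and an explicit error status. Work that genuinely needs more than 60 seconds still has to be called directly or moved elsewhere, as the documentation quoted above suggests.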
Source: https://stackoverflow.com/questions/58759906/firebase-hosted-cloud-function-retrying-on-any-request-that-takes-60s-even-when