Speedy Screenshots, or How I Improved the Robustness of the Screenshot Service
This is a reply to my own blog post: Building an Automated Screenshot Service on Netlify in ~140 Lines of Code.
There is a limitation with the Screenshot Service: when the page you’re taking a screenshot of is slow and/or very large, the request times out. Quoth myself, from ~7 months ago:
What happens if a site is super slow or is currently down?
Netlify Functions have a 10 second execution limit. If the site doesn’t render in 10 seconds, we show a fallback image by default. Currently this is a low-contrast 11ty logo using the same image size as the requested screenshot (via SVG width and height attributes).
While this fallback behavior is okay I was starting to see it more often than I’d like. Why, you might ask? Why would it take more than 10 seconds to fetch a screenshot?
Here’s a sample OpenGraph image declaration from the <head> of one of my blog posts (the <details-utils> one):
<meta name="og:image" content="https://v1.screenshot.11ty.dev/https%3A%2F%2Fwww.zachleat.com%2Fopengraph%2Fweb%2Fdetails-utils%2F/opengraph/">
When requesting this image, the api-screenshot service loads and renders https://www.zachleat.com/opengraph/web/details-utils/ using Puppeteer to return a 1200×630 screenshot jpeg image.
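The URL in that meta tag is the target page, percent-encoded and dropped into the service’s path. Here is a minimal sketch of building it, assuming the v1 URL shape shown in the meta tag (the screenshotUrl helper name is mine, not part of the service):

```javascript
// Hypothetical helper: builds a Screenshot Service URL in the shape
// <origin>/<percent-encoded target URL>/<size>/
function screenshotUrl(targetUrl, size = "opengraph", origin = "https://v1.screenshot.11ty.dev") {
  // encodeURIComponent turns ":" into %3A and "/" into %2F so the whole
  // target URL survives as a single path segment.
  return `${origin}/${encodeURIComponent(targetUrl)}/${size}/`;
}

const ogPage = "https://www.zachleat.com/opengraph/web/details-utils/";
console.log(screenshotUrl(ogPage));
```

Running this against the dedicated /opengraph/ page should reproduce the content attribute of the meta tag.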
However, on that dedicated /opengraph/web/details-utils/ page waits a big ol’ chunk of kryptonite. Specifically, this page makes another screenshot service request 😅 to use the referenced blog post as a background image (in this case /web/details-utils/).
background-image: url('https://v1.screenshot.11ty.dev/https%3A%2F%2Fwww.zachleat.com%2Fweb%2Fdetails-utils%2F/opengraph/');
Okay, fine. Let’s admit what happened here. I flew too close to the sun. I chained too many screenshots together. This was causing timeouts for larger/weightier blog posts and pages (showing the low-contrast SVG of the default 11ty logo).
Have I overengineered it? Yes. But if we engineer it more—it will modulo back around to normal levels of engineering. Maybe even underengineering. Right? That’s how this works? I’m not willing to admit the answer to this yet.
But I do know that we can fix it. And we can fix it without removing any of the links in the chain of prized and celebrated screenshots.
Adding a new timeout
I started by adding a new timeout to the screenshot service:
- New: At 7 seconds (by default, 1.5 seconds before the timeout option), we attempt to inject a clientside JavaScript window.stop() on the page to cancel page load. The logic here is that a partially rendered page is better than the fallback 11ty logo.
  - via MDN: This method cannot interrupt its parent document's loading, but it will stop its images, new windows, and other still-loading objects.
- At 8.5 seconds (by default), we use Puppeteer’s timeout property on the goto method to stop early. You can now customize this in the Screenshot API url. We handle this error by showing the aforementioned fallback 11ty logo.
- At 10 seconds, the serverless function times out and shows a gnarly HTTP 502 with a text/plain error message. You shouldn’t see this.
  - e.g. {"errorMessage":"2022-02-20T02:00:14.320Z […truncated] Task timed out after 10.00 seconds"}
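The first two steps can be sketched as a race between the navigation and a timer that calls window.stop(). This is one way to wire it up, not necessarily how the service does it. The gotoWithEarlyStop name and its options are hypothetical, and page is assumed to behave like a Puppeteer Page (only goto and evaluate are used):

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page or anything
// with compatible goto() and evaluate() methods. Times are milliseconds.
async function gotoWithEarlyStop(page, url, { timeout = 8500, stopLeadTime = 1500 } = {}) {
  let stopTimer;

  // Resolves to "stopped" shortly before the navigation timeout would fire.
  const earlyStop = new Promise((resolve) => {
    stopTimer = setTimeout(async () => {
      try {
        // Runs inside the page: cancels still-loading images, CSS, etc.
        await page.evaluate(() => window.stop());
      } catch (e) {
        // The page may already be gone; a screenshot can still be attempted.
      }
      resolve("stopped");
    }, Math.max(0, timeout - stopLeadTime));
  });

  const navigation = page.goto(url, { timeout }).then(() => "loaded");

  try {
    // Whichever settles first wins. If goto rejects first (for example with
    // a timeout error), the rejection propagates to the caller.
    return await Promise.race([navigation, earlyStop]);
  } finally {
    clearTimeout(stopTimer);
  }
}
```

A caller would take the screenshot on either "loaded" or "stopped", and catch a rejection to serve the fallback logo instead.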
For the second rendered screenshot in my Open Graph image chain (the one of the real blog post), I’ve manually lowered the timeout option (and the clientside timeout) so that the second screenshot gives up before the first screenshot hits its own timeout.
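To make the arithmetic concrete, here is a small sketch of the schedule for one screenshot, plus a check that a nested screenshot fits inside its parent. The helper names are hypothetical, the 3 second inner timeout is only an illustrative value, and the "fits" rule is my assumption about what a safe nesting looks like:

```javascript
// Hypothetical helper. All values are milliseconds; the defaults mirror the
// numbers in this post (8.5s goto timeout, window.stop() 1.5s earlier,
// 10s function limit).
function timeoutSchedule(timeout = 8500, stopLeadTime = 1500, functionLimit = 10000) {
  if (timeout >= functionLimit) {
    throw new RangeError("timeout must be shorter than the function limit");
  }
  return {
    stopAt: Math.max(0, timeout - stopLeadTime), // inject window.stop()
    gotoTimeoutAt: timeout,                      // Puppeteer gives up, fallback logo
    functionLimit,                               // serverless function is killed
  };
}

// Assumption: a nested screenshot is "safe" when even its worst case (its own
// goto timeout) lands before the outer screenshot would call window.stop().
function fitsInside(inner, outer) {
  return inner.gotoTimeoutAt < outer.stopAt;
}

const outer = timeoutSchedule();     // the /opengraph/ page screenshot
const inner = timeoutSchedule(3000); // the nested blog post screenshot
console.log(outer.stopAt, inner.stopAt, fitsInside(inner, outer)); // 7000 1500 true
```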
You can see it in action on this 12 second blocking external CSS file demo. Note that when the page loads successfully (after 12 seconds), it has a green background.
Now check out this screenshot of the 12 second demo with a 3 second screenshot timeout:
Previously, running the screenshot service against this page would have shown the fallback 11ty logo.
Use ttl for fallback images
When an image times out and the 11ty logo fallback image is shown, we were forced to use an HTTP 200 status code for that condition, or some browsers (Firefox) wouldn’t show the fallback image at all. This was a bit of a problem because it meant that screenshots that timed out wouldn’t be retried until a new build was triggered (which could be a long time for a dedicated screenshots service).
Fortunately, Netlify has added a time to live (ttl) option to On-demand Builders that allows you to specify a fixed amount of time (minimum 60 seconds) before a request is invalidated and a new request is generated. We can now add the ttl specifically for requests that hit this timeout without invalidating any of the previously successful ones!
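Here is a sketch of how that might look in the function’s return value, assuming the handler is wrapped with builder() from @netlify/functions and that the response object accepts a ttl in seconds. The imageResponse helper and the one hour default are illustrative, not the service’s actual settings:

```javascript
// Hypothetical helper that shapes the handler's return value.
// In a real function this object would be returned from a handler wrapped
// with builder() from "@netlify/functions" (not imported here).
function imageResponse(imageBuffer, contentType, { isFallback = false, fallbackTtl = 3600 } = {}) {
  const response = {
    // Always 200, even for the fallback image, so every browser renders it.
    statusCode: 200,
    headers: { "content-type": contentType },
    body: imageBuffer.toString("base64"),
    isBase64Encoded: true,
  };
  if (isFallback) {
    // Only the fallback gets a ttl (seconds, minimum 60), so it is retried
    // later while successful screenshots stay cached.
    response.ttl = Math.max(60, fallbackTtl);
  }
  return response;
}

const fallback = imageResponse(Buffer.from("<svg/>"), "image/svg+xml", { isFallback: true });
console.log(fallback.statusCode, fallback.ttl);
```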
Other Puppeteer improvements
Next, I added a grab bag of small performance tweaks to Puppeteer:
- goto->waitUntil: Added wait option to control when a specific screenshot considers the page to be “finished.”
- screenshot->captureBeyondViewport: Used captureBeyondViewport: false (default was true), this cuts the screenshot to the viewport size. I’m not sure how this is different but I also enabled the screenshot->clip option in Puppeteer.
- I also attempted the GitHub approach to speed up Puppeteer (in the Some Performance Gotchas section) but the approach only worked with foreground images.
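Put together, the first two tweaks amount to a pair of option objects for page.goto() and page.screenshot(). This is a sketch rather than the service’s actual code: the option names (waitUntil, timeout, captureBeyondViewport, clip) are Puppeteer’s, while the puppeteerOptions helper and its defaults are hypothetical:

```javascript
// Values Puppeteer accepts for goto's waitUntil option.
const WAIT_EVENTS = ["load", "domcontentloaded", "networkidle0", "networkidle2"];

// Hypothetical helper; the option names are Puppeteer's, the function is not.
function puppeteerOptions({ width = 1200, height = 630, wait = ["load"], timeout = 8500 } = {}) {
  const unknown = wait.filter((event) => !WAIT_EVENTS.includes(event));
  if (unknown.length > 0) {
    throw new Error(`Unsupported wait value(s): ${unknown.join(", ")}`);
  }
  return {
    // When the page counts as "finished", and how long to wait for that.
    goto: { waitUntil: wait, timeout },
    screenshot: {
      type: "jpeg",
      // Capture only what fits in the viewport.
      captureBeyondViewport: false,
      // Explicitly clip to the requested size as well.
      clip: { x: 0, y: 0, width, height },
    },
  };
}

const opts = puppeteerOptions({ wait: ["domcontentloaded"] });
console.log(opts.goto, opts.screenshot);
```

With a real Puppeteer page, these would be passed as page.goto(url, opts.goto) and page.screenshot(opts.screenshot).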
Things I didn’t do but should (?)
I could have removed the api-opengraph references from my site altogether. It’s my site. I have full knowledge of where the OpenGraph images are. I don’t need an external service to tell me that.
I could make requests directly to the screenshots API. However, api-screenshot doesn’t currently support image resizing with the opengraph viewport size—but api-opengraph does resize/optimize images.
I kept api-opengraph to get image optimization for free to avoid those weighty default 1200×630 images clogging up my page. Ultimately this means I’m chaining 3 different serverless functions together under the same 10 second limit, which feels a little risky but seems okay in practice (maybe because my site renders pretty fast as-is?).
I will likely improve the screenshot service to support at least one smaller OpenGraph image size (probably 600×315) and additional image formats (png and webp) at some point. Feels like v2.screenshot.11ty.dev may be in our future.
Good enough for now
With these changes in place, I haven’t seen any fallback 11ty logo screenshots on my site in quite some time!