Speedy Screenshots, or How I Improved the Robustness of the Screenshot Service

February 18, 2022

This is a reply to my own blog post: Building an Automated Screenshot Service on Netlify in ~140 Lines of Code.

There is a limitation with the Screenshot Service: when the page you’re taking a screenshot of is slow and/or very large, the request times out. Quoth myself, from ~7 months ago:

What happens if a site is super slow or is currently down?

Netlify Functions have a 10 second execution limit. If the site doesn’t render in 10 seconds, we show a fallback image by default. Currently this is a low-contrast 11ty logo using the same image size as the requested screenshot (via SVG width and height attributes).

While this fallback behavior is okay, I was starting to see it more often than I’d like. Why, you might ask? Why would it take more than 10 seconds to fetch a screenshot?

Here’s a sample OpenGraph image declaration from the <head> of one of my blog posts (the <details-utils> one):

<meta name="og:image" content="https://v1.screenshot.11ty.dev/https%3A%2F%2Fwww.zachleat.com%2Fopengraph%2Fweb%2Fdetails-utils%2F/opengraph/">

When requesting this image, the api-screenshot service loads and renders https://www.zachleat.com/opengraph/web/details-utils/ using Puppeteer to return a 1200×630 screenshot jpeg image.
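The URL pattern in that `<meta>` tag can be reproduced in a few lines of JavaScript. This is only a sketch: the `screenshotUrl` helper and the `SCREENSHOT_ORIGIN` constant are names made up for illustration, and only the `opengraph` size that appears in this post is assumed to exist.

```javascript
// Hypothetical helper (not part of the service itself): builds a screenshot
// URL following the pattern seen in the og:image example.
const SCREENSHOT_ORIGIN = "https://v1.screenshot.11ty.dev";

function screenshotUrl(pageUrl, size = "opengraph") {
  // The target URL travels as a single path segment, so it is fully encoded.
  return `${SCREENSHOT_ORIGIN}/${encodeURIComponent(pageUrl)}/${size}/`;
}

console.log(screenshotUrl("https://www.zachleat.com/opengraph/web/details-utils/"));
```

Encoding the whole target URL with `encodeURIComponent` is what turns `://` and `/` into `%3A%2F%2F` and `%2F`, so the service can tell the page URL apart from its own path segments.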

However, on that dedicated /opengraph/web/details-utils/ page waits a big ol’ chunk of kryptonite. Specifically, this page makes another screenshot service request 😅 to use the referenced blog post as a background image (in this case /web/details-utils/).

background-image: url('https://v1.screenshot.11ty.dev/https%3A%2F%2Fwww.zachleat.com%2Fweb%2Fdetails-utils%2F/opengraph/');

Okay, fine. Let’s admit what happened here. I flew too close to the sun. I chained too many screenshots together. This was causing timeouts for larger/weightier blog posts and pages (showing the low-contrast SVG of the default 11ty logo).

Have I overengineered it? Yes. But if we engineer it more—it will modulo back around to normal levels of engineering. Maybe even underengineering. Right? That’s how this works? I’m not willing to admit the answer to this yet.

But I do know that we can fix it. And we can fix it without removing any of the links in the chain of prized and celebrated screenshots.

Adding a new timeout

I started by adding a new timeout to the screenshot service:

New: At 7 seconds (by default, 1.5 seconds before the timeout option), we attempt to inject a clientside JavaScript window.stop() on the page to cancel page load. The logic here is that a partially rendered page is better than the fallback 11ty logo.
- via MDN: This method cannot interrupt its parent document's loading, but it will stop its images, new windows, and other still-loading objects.
At 8.5 seconds (by default), we use Puppeteer’s timeout property on the goto method to stop early. You can now customize this in the Screenshot API URL. We handle this error by showing the aforementioned fallback 11ty logo.
At 10 seconds, the serverless function times out and shows a gnarly HTTP 502 with a text/plain error message. You shouldn’t see this.
- e.g. {"errorMessage":"2022-02-20T02:00:14.320Z […truncated] Task timed out after 10.00 seconds"}
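The timeout ladder above can be sketched roughly as follows. This is not the service’s actual source: `getTimeouts` and `captureWithTimeouts` are hypothetical names, and `page` is assumed to be a Puppeteer `Page` (or any object exposing the same `goto`, `evaluate`, and `screenshot` methods). The idea is to race the navigation against a timer that injects `window.stop()` 1.5 seconds before Puppeteer’s own `goto` timeout.

```javascript
// Sketch only. Times are in milliseconds; 8500 mirrors the 8.5 second default.
function getTimeouts(timeoutMs = 8500) {
  return {
    gotoTimeout: timeoutMs,
    // window.stop() is attempted 1.5 seconds before the goto timeout
    stopTimeout: Math.max(timeoutMs - 1500, 0),
  };
}

async function captureWithTimeouts(page, url, timeoutMs = 8500) {
  const { gotoTimeout, stopTimeout } = getTimeouts(timeoutMs);
  let timer;

  // Resolves to "stopped" after window.stop() has been attempted in the page.
  const stopEarly = new Promise((resolve) => {
    timer = setTimeout(async () => {
      try {
        await page.evaluate(() => window.stop());
      } catch (error) {
        // If injection fails we still try for a partial screenshot below.
      }
      resolve("stopped");
    }, stopTimeout);
  });

  try {
    const outcome = await Promise.race([
      page.goto(url, { waitUntil: "load", timeout: gotoTimeout }).then(() => "loaded"),
      stopEarly,
    ]);
    // Either the page finished loading or we cut it short: screenshot what we have.
    const image = await page.screenshot({ type: "jpeg" });
    return { outcome, image };
  } catch (error) {
    // goto (or the screenshot) failed, e.g. on timeout: the caller shows the fallback logo.
    return { outcome: "fallback", image: null };
  } finally {
    clearTimeout(timer);
  }
}
```

A caller would treat `"loaded"` and `"stopped"` the same way (return the image) and only swap in the fallback logo for `"fallback"`. The 10 second serverless limit sits outside this code entirely, which is why both timers need to finish well before it.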

For the second rendered screenshot in my Open Graph image chain (the one of the real blog post), I’ve manually lowered the timeout option (and with it the clientside timeout) so that the second screenshot gives up before the first screenshot hits its own timeout.

You can see it in action on this 12 second blocking external CSS file demo. Note that when the page loads successfully (after 12 seconds), it has a green background.

Now check out this screenshot of the 12 second demo with a 3 second screenshot timeout:

Previously, running the screenshot service against this page would have shown the fallback 11ty logo.

Use `ttl` for fallback images

When an image times out and the 11ty logo fallback image is shown, we were forced to use an HTTP 200 status code for that condition or some browsers (Firefox) wouldn’t show the fallback image at all. This was a bit of a problem because it meant that screenshots that timed out wouldn’t retry again until a new build was triggered (which could be a long time for a dedicated screenshots service).

Fortunately, Netlify has added a time to live (ttl) option to On-demand Builders that allows you to specify a fixed amount of time (minimum 60 seconds) before a request is invalidated and a new request is generated. We can now add the ttl specifically for requests that hit this timeout without invalidating any of the previously successful ones!
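As a rough illustration, the response object might be shaped like this. It is a sketch under assumptions: `toBuilderResponse`, `fallbackSvg`, and the one hour `FALLBACK_TTL_SECONDS` value are all invented for the example (the ttl value actually used is not stated here), and the plain gray rectangle is a placeholder for the real 11ty logo. In a real function the handler would be wrapped with `builder()` from `@netlify/functions`; that wrapper is left out so the snippet runs on its own.

```javascript
// Assumed value for illustration; On-demand Builders require at least 60 seconds.
const FALLBACK_TTL_SECONDS = 3600;

// Placeholder standing in for the low-contrast 11ty logo, sized to the request.
function fallbackSvg(width, height) {
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}" viewBox="0 0 ${width} ${height}"><rect width="100%" height="100%" fill="#eee"/></svg>`;
}

// `image` is a Buffer on success, or null/undefined when the screenshot timed out.
function toBuilderResponse({ image, width, height }) {
  if (image) {
    // Successful screenshots get no ttl, so they stay cached until the next deploy.
    return {
      statusCode: 200,
      headers: { "content-type": "image/jpeg" },
      body: image.toString("base64"),
      isBase64Encoded: true,
    };
  }

  // Still a 200 so every browser renders the fallback, but with a ttl so the
  // cached fallback expires and the screenshot is retried later.
  return {
    statusCode: 200,
    headers: { "content-type": "image/svg+xml" },
    body: fallbackSvg(width, height),
    ttl: Math.max(FALLBACK_TTL_SECONDS, 60),
  };
}
```

Only the fallback branch carries a `ttl`, which is the point of the change: timed-out requests retry on their own schedule while good screenshots are left alone.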

Other Puppeteer improvements

Next, I added a grab bag of small performance tweaks to Puppeteer:

goto->waitUntil: Added a wait option to control when a specific screenshot considers the page to be “finished.”
screenshot->captureBeyondViewport: Used captureBeyondViewport: false (default was true), which cuts the screenshot to the viewport size. I’m not sure how this is different, but I also enabled the screenshot->clip option in Puppeteer.
I also attempted the GitHub approach to speed up Puppeteer (in the Some Performance Gotchas section) but the approach only worked with foreground images.
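For reference, here is how those options could be passed to Puppeteer. The option keys (`waitUntil`, `timeout`, `captureBeyondViewport`, `clip`) are Puppeteer’s own; the `gotoOptions` and `screenshotOptions` helper names and the 1200×630 default viewport are assumptions made for this sketch.

```javascript
// The lifecycle events Puppeteer accepts for `waitUntil`.
const WAIT_EVENTS = ["load", "domcontentloaded", "networkidle0", "networkidle2"];

// Hypothetical helper: options for page.goto().
function gotoOptions({ wait = "load", timeoutMs = 8500 } = {}) {
  if (!WAIT_EVENTS.includes(wait)) {
    throw new Error(`Unsupported wait option: ${wait}`);
  }
  return { waitUntil: wait, timeout: timeoutMs };
}

// Hypothetical helper: options for page.screenshot().
function screenshotOptions({ width = 1200, height = 630 } = {}) {
  return {
    type: "jpeg",
    // Only capture what is inside the viewport.
    captureBeyondViewport: false,
    // Clip to the same rectangle as the viewport, starting at the top left.
    clip: { x: 0, y: 0, width, height },
  };
}

// Usage with a Puppeteer page (not executed here):
//   await page.goto(url, gotoOptions({ wait: "domcontentloaded" }));
//   const image = await page.screenshot(screenshotOptions());
console.log(gotoOptions({ wait: "domcontentloaded" }), screenshotOptions());
```

Choosing an earlier event such as `domcontentloaded` trades completeness for speed, which can matter when the whole request has to fit inside the 10 second function limit.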

Things I didn’t do but should (?)

I could have removed the api-opengraph references from my site altogether. It’s my site. I have full knowledge of where the OpenGraph images are. I don’t need an external service to tell me that.

I could make requests directly to the screenshots API. However, api-screenshot doesn’t currently support image resizing with the opengraph viewport size—but api-opengraph does resize/optimize images.

I kept api-opengraph to get image optimization for free to avoid those weighty default 1200×630 images clogging up my page. Ultimately this means I’m chaining 3 different serverless functions together under the same 10 second limit, which feels a little risky but seems okay in practice (maybe because my site renders pretty fast as-is?).

I will likely improve the screenshot service to support at least one smaller OpenGraph image size (probably 600×315) and additional image formats (png and webp) at some point. Feels like v2.screenshot.11ty.dev may be in our future.

Good enough for now

With these changes in place, I haven’t seen any fallback 11ty logo screenshots on my site in quite some time!

Zach Leatherman is a builder for the web at Font Awesome and the creator/maintainer of Eleventy (11ty), an award-winning open source site generator. At one point he became entirely too fixated on web fonts. He has given 85 talks in nine different countries at events like Beyond Tellerrand, Smashing Conference, Jamstack Conf, CSSConf, and The White House. Formerly part of CloudCannon, Netlify, Filament Group, NEJS CONF, and NebraskaJS.