Zach Leatherman

The Art of Deception, Lighthouse Score Edition

October 14, 2021 #1 Popular

A few very interesting discussions on Twitter have led me to understand that some folks are talking about Lighthouse scores in a way that is (in my opinion) not as forthright as it could be, intentionally or not. Let’s level set and talk about the different flavors of wiggle room:

Super Fast Hardware

Here’s a screenshot of a Lighthouse result from this morning, October 14th, 2021, run on an old MacBook Air (2012) using Chrome 86.

nextjs.org Lighthouse Score for Mobile on Old Hardware: 64 on Performance, 97 on Accessibility, 100 on Best Practices, 100 on SEO

Here’s the same result on my MacBook Air (M1, 2020) using Chrome 94:

nextjs.org Lighthouse Score for Mobile: 94 on Performance, 89 on Accessibility, 93 on Best Practices, 100 on SEO

It’s incredible to me how much your hardware can affect the result: from a 64 on Performance to a 94. That’s a thirty point swing!

Don’t Do This™ Evil Tip: When running Lighthouse, only use the best, beefiest, latest and greatest, most expensive hardware and network connections.

As additional context, I built a project called Speedlify, which is a self-hosted dashboard for performance monitoring and comparison. We use it on the Eleventy Leaderboards.

Speedlify has two common modes of operation: on a hosted CI/CD server or in DIY mode on your local machine. These two methods often provide different Performance scores! A hosted server is typically more resource constrained, which makes it more challenging to score well in Lighthouse’s Performance category.

For disclosure purposes, the Eleventy Leaderboards run in DIY mode, primarily because the number of sites tested goes well beyond the build-time limit of the build server. That does also mean that the scores are likely higher than if we were able to run the project in hosted mode.

Normal Statistical Variability

Network conditions can vary. Maybe your computer was doing a resource intensive task while you were completing your test. Here are two additional runs of the same site on my MacBook Air (M1, 2020) using Chrome 94. It’s notable that these results offer a slightly higher performance score compared to the first run above.

nextjs.org Lighthouse Score for Mobile: 99 on Performance, 89 on Accessibility, 93 on Best Practices, 100 on SEO
nextjs.org Lighthouse Score for Mobile: 100 on Performance, 89 on Accessibility, 93 on Best Practices, 100 on SEO

Don’t Do This™ Evil Tip: When running Lighthouse, run it a bunch of times and pick the highest score.

In Speedlify, we attempt to smooth out this issue by running each test multiple times (by default 3) and selecting the median run using an algorithm from Lighthouse based on First Contentful Paint, Time to Interactive, and Largest Contentful Paint.
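As a rough illustration of the median-run idea, here is a minimal Python sketch. It is not Speedlify's or Lighthouse's actual code: the `Run` type, its field names, the squared-distance scoring, and the timings are all assumptions made for illustration. It finds the median of each of the three metrics across the runs, then picks the run that sits closest to all three medians.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Run:
    """One Lighthouse run (hypothetical shape); timings in milliseconds."""
    performance_score: int
    fcp_ms: float  # First Contentful Paint
    tti_ms: float  # Time to Interactive
    lcp_ms: float  # Largest Contentful Paint


def pick_median_run(runs: list[Run]) -> Run:
    """Return the run whose metrics are closest to the per-metric medians."""
    if not runs:
        raise ValueError("need at least one run")

    median_fcp = median(r.fcp_ms for r in runs)
    median_tti = median(r.tti_ms for r in runs)
    median_lcp = median(r.lcp_ms for r in runs)

    def distance(run: Run) -> float:
        # Sum of squared distances from each metric's median (an assumed scoring).
        return (
            (run.fcp_ms - median_fcp) ** 2
            + (run.tti_ms - median_tti) ** 2
            + (run.lcp_ms - median_lcp) ** 2
        )

    return min(runs, key=distance)


# Three runs of the same page; the timings are invented for the example.
runs = [
    Run(94, fcp_ms=1200, tti_ms=3100, lcp_ms=2400),
    Run(99, fcp_ms=900, tti_ms=2000, lcp_ms=1500),
    Run(100, fcp_ms=800, tti_ms=1800, lcp_ms=1300),
]
chosen = pick_median_run(runs)
print(chosen.performance_score)
```

With these made-up timings the second run sits exactly on all three medians, so its score of 99 is the one reported rather than the best score of 100.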

These Speedlify improvements were sparked by a discussion on Twitter with Patrick Hulce, who works on Lighthouse.

Mobile versus Desktop

A perfect 100 is harder to achieve in Mobile mode, particularly in the Performance category. As Andy Davies states, this is “by design as mobile uses a simulated slower network, and CPU.” Importantly, and as previously discussed, the performance conditions are relative to the hardware/software of the current machine.
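To see why a fixed slowdown does not put every machine on equal footing, consider this minimal sketch. The 4x multiplier and the per-machine timings are assumed numbers for illustration, not measurements.

```python
CPU_SLOWDOWN = 4  # assumed throttling multiplier, for illustration only

# Hypothetical time (ms) to run the same script unthrottled on each machine.
unthrottled_ms = {"older laptop": 200, "newer laptop": 50}

# The slowdown multiplies whatever the host can already do.
throttled_ms = {name: ms * CPU_SLOWDOWN for name, ms in unthrottled_ms.items()}

for name, ms in throttled_ms.items():
    print(f"{name}: {ms} ms under {CPU_SLOWDOWN}x slowdown")
```

In this made-up case the throttled run on the newer laptop takes as long as the unthrottled run on the older one, so the two "mobile" results are still not comparable.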

When a user shares a screenshot of a perfect Four Hundo score, Lighthouse (as it stands) makes it impossible to visually distinguish whether that score was taken in Mobile or Desktop mode.

For example, here are two Lighthouse scores of the same site taken back-to-back on the same hardware. One is a mobile result and one is desktop (you can click through to see a broader view). Note that structurally the screenshots are the same.

gatsbyjs.org Lighthouse Score for Mobile, Performance score of 90
gatsbyjs.org Lighthouse Score for Desktop, Performance score of 100

MacBook Air (M1, 2020) using Chrome 94.

Don’t Do This™ Evil Tip: Always share your Desktop Score. Never reveal that it is a Desktop score—keep any discussion of the testing mode ambiguous.

There has been some discussion about adding a visual indicator to make the mode more obvious, which would help greatly!
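Until the report itself makes the mode obvious, anyone sharing a score can attach the test conditions to it. The sketch below assumes a Lighthouse JSON report has been loaded into a Python dict and that it exposes `configSettings.formFactor`, `categories.performance.score` (a value from 0 to 1), and `environment.benchmarkIndex`. Those field names are assumptions about the report format and may differ between Lighthouse versions; the `describe_score` helper and the sample data are hypothetical.

```python
def describe_score(report: dict) -> str:
    """Build a shareable label that states the mode next to the score."""
    # Field names below are assumptions about the Lighthouse JSON report.
    mode = report.get("configSettings", {}).get("formFactor", "unknown mode")
    score = report.get("categories", {}).get("performance", {}).get("score")
    benchmark = report.get("environment", {}).get("benchmarkIndex")

    score_text = "n/a" if score is None else str(round(score * 100))
    label = f"Performance {score_text} ({mode}"
    if benchmark is not None:
        # A rough indicator of how powerful the host machine was.
        label += f", host benchmark index {benchmark:.0f}"
    return label + ")"


# Hypothetical report fragment, not real data.
sample_report = {
    "configSettings": {"formFactor": "desktop"},
    "categories": {"performance": {"score": 1.0}},
    "environment": {"benchmarkIndex": 1500.0},
}
print(describe_score(sample_report))
```

A label like this keeps the testing mode and a hint about the host hardware attached to the number, which is the context a bare screenshot leaves out.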

See also some more related discussion with Patrick Hulce on Twitter.

I feel as though I should also mention that, in a perfect world, if a web benchmark were to start from scratch with a new Lighthouse, the slow hardware simulation, network throttling, and viewport size testing should be built into a single mode. I’d love it if the next version of Lighthouse ran Mobile mode, then Desktop mode, and displayed both scores together or combined them somehow. Get rid of the separation and it would clear up a bunch of the confusion in a very clean way.

Lab Data versus Field Data

This is perhaps the most nefarious distinction, because it is the most complex and as such offers the most effective kind of wiggle room: confusion.

Lab data is taken in a controlled environment. Field data is gathered from the recorded measurements of the performance of real visitors. Related: Why lab and field data can be different (and what to do about it).

Most of the methods we’ve talked about so far are only reporting lab data. But having field data is great, too! The caution I’d offer here is when someone focuses too closely on Field Data and never mentions Lab Data. But why? Isn’t it better to measure the real world? Why does it matter what happens in the lab?

Let us consult this classic blog post from Chris Zacharias: Page Weight Matters. In it, Chris discusses a case study on the YouTube web site where they decreased the page weight and the measured field data results got worse!

I had decreased the total page weight and number of requests to a tenth of what they were previously and somehow the numbers were showing that it was taking LONGER

Correspondingly, entire populations of people simply could not use YouTube because it took too long to see anything.

Large numbers of people who were previously unable to use YouTube before were suddenly able to.

If you have great field data, you may exist in the same realm as pre-optimized YouTube! The point is that it takes a holistic view of both field data and lab data to make good performance decisions!
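The YouTube story can be reproduced with a toy model. This is a minimal sketch under stated assumptions: the bandwidths, page weights, and abandonment threshold are invented, and load time is simplified to page weight divided by bandwidth. Visitors who would wait longer than the threshold give up and never show up in the field data.

```python
from statistics import mean

ABANDON_AFTER_S = 60  # assumed: visitors who wait longer leave unmeasured

# Hypothetical visitor bandwidths in KB/s: three fast connections, two slow.
bandwidths_kb_per_s = [1000, 800, 600, 5, 4]


def recorded_load_times(page_weight_kb: float) -> list[float]:
    """Load times (seconds) for visitors who stayed long enough to be measured."""
    times = [page_weight_kb / bw for bw in bandwidths_kb_per_s]
    return [t for t in times if t <= ABANDON_AFTER_S]


before = recorded_load_times(1200)  # heavy page
after = recorded_load_times(120)    # a tenth of the weight

print(f"before: {len(before)} visitors measured, mean {mean(before):.1f}s")
print(f"after:  {len(after)} visitors measured, mean {mean(after):.1f}s")
```

Every visitor's load time drops tenfold, yet the mean of the recorded times rises from about 1.6 seconds to about 10.9 seconds, because the two slow-connection visitors are now able to load the page and are counted for the first time.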

Don’t Do This™ Evil Tip: If you have a wealthy, first-world, limited San Francisco-heavy audience with good hardware, make sure you shout about your field data! Field data is the most important thing! Pay no attention to the mobile/throttled/average hardware Lab data hiding behind the curtain.

Another way to say it:

  • If your field data is good and your lab data is bad, you may have built yourself a site for the wealthy western web.
  • If your field data is bad and your lab data is good (and assuming you aren’t doing any of the things we discussed to fudge your lab data scores), don’t fret! Your world wide web site may be reaching a global audience!
  • If you have both good field data and good lab data then you are a unicorn—I applaud you and celebrate your success. I love that for you. Please share how you successfully banished your third-party JavaScript to the shadow realm.

Conclusion

You might walk away from this article thinking: wow, Lighthouse scoring could be improved! I agree, but I also think that it’s been a net-win for performance discussions with other stakeholders in a professional setting. I genuinely hope they solve the Performance variability problem and add visual indicators to show you the mode in which a test ran (Desktop or Mobile).

But mostly, this is a plea to y’all: please don’t game your Lighthouse score. I hope an increased awareness of these tricks will decrease the frequency at which we see them appear in the wild. Stay safe out there, y’all.

Zach is a builder for the web with Netlify. He created the Eleventy site generator and is still fixated on web fonts. His public speaking résumé includes talks in eight different countries at events like Jamstack Conf, Beyond Tellerrand, Smashing Conference, CSSConf, and The White House. He is an emeritus of Filament Group, NEJS CONF, and still helps out with NebraskaJS.

8 Replies
  1. Zach Leatherman @zachleat

    Good question! I didn’t update anything on the old MacBook before running the test, so the versions of Chrome are different. Worth re-running on a newer version of Chrome with the old hardware to test though

  2. bkardell @briankardell

    Yeah, it would be great to ack that in the document somehow - I was suprised it wasn't mentioned because they do seem to vary. I would guess the answer is "one has more DOM loaded at the time of measure" or "the screen size has diff responsive elements hidden/shown"

  3. Jonathan Holden @JonathanDHolden

    Interesting about speed... But how do you account for the different a11y scores in the first 2 images?

  4. Brett Jankord @bjankord

    I’ve been a fan of using web.dev/measure to collect metrics and averaging out scores over multiple runs. It allows us to reduce some variability around network speeds and hardware as we offload it to Google. They throttle it down to fast 3g speeds and 4x slowdown on CPU.

  5. Zach Leatherman @zachleat

    jinx! 🏆

  6. Zach Leatherman @zachleat

    Yeah! Great point. I should add a note about that and developers.google.com/speed/pagespee… as a way to avoid hardware variability (though these don’t solve the other variability issues, in my experience).

  7. Santi Cros @santi_cros

    And still there's some variation even if you run it repeatedly:s

  8. Santi Cros @santi_cros

    Great article Zach. One thing I do is to check the Lighthouse score on web.dev/measure so my computer doesn't affect the results :)
