Visual regression testing #427

Unanswered
LarsDenBakker asked this question in General
Aug 24, 2020 · 17 comments · 31 replies
Discussion options

We should support visual regression testing with the test runner. Because we run tests in a real browser, it will be relatively easy to support this within the test runner. Most of the basic building blocks are already present.

Some different topics:


Replies: 17 comments 31 replies

Comment options

Additional features (maybe via a plugin) could be:

  • Open "Compare UI" in Browser to inspect images (similar link as code coverage?)
  • Allow to "approve" a new image within the "Compare UI"
  • Allow to "approve" a new image while running within the CLI menu?
  • Save approved images not to git but to an external host
  • Have plugins for suitable image hosters (find at least one free)
  • Image Hoster Ideas
    • Google Drive
    • Amazon S3 or Amazon EFS
    • ImageShack
    • Artifactory (for corporations)
  • Integrate with existing solutions
    • Percy
    • applitools
2 replies
Comment options

Looking into a possible compare UI here: https://webcomponents.dev/edit/yXxe3Z8mQ6uCfZXTFulA/src/index.ts

Not sure how we'd want to attach "approve" workflows, especially for tests that run in CI, but would love to hear more of your thoughts on that!

Comment options

uh that looks really nice 💪

for approval, I could see a workflow similar to how we use changesets now. That could look something like this:

  1. PR with changes to the UI => so tests for it fail
  2. You add a file like imagechange/something-something.md
  3. In that md file you say which images/urls/imagestates are allowed to change and you add a description of the change
  4. PR runs again... all UI changes are approved by the added file => CI green
  5. Optionally: This could also enable a changelog for all visual changes (no images or only thumbnails to save size)
Comment options

@daKmoR what's the benefit of saving approved images not in git but to an external host?

MDC was using Google Storage (report example) for their screenshot tests before version 5.0 (they have since moved to internal infrastructure for screenshot tests).

1 reply
Comment options

There are a couple of things that can go awry when working with in-repo image caches:

  1. Screenshot churn can increase the size of a repo and make it slow to work with. I'm still unsure whether this is just the personal opinion of the colleague who brought it up or worth worrying about for real. I also work with multiple-GB repos that make even the 400-500MB of screenshots I've accumulated feel like it wouldn't be the part of the project that makes this an issue.
  2. If you check in the images, it's possible that you commit images captured in a local environment. Visual regression is very brittle to things like OS (sometimes even OS version), browser version, font availability, and processor/RAM availability, so images can shift and change locally. If you commit those changes, it can be next to impossible to recreate them. It's very important that you always capture in the same context (whether that's CI, a required OS build, Docker, etc.), and allowing those images into the repo increases the possibility that that's not true.
  3. Being in the repo means that any tooling you leverage around those screenshots also needs access to that repo. Maybe more a reality than an issue, since these tools will need access to the images wherever they live, but ensuring they don't live next to your code reduces the access those tools require to give you value.
Comment options

for a bigger list of existing solutions, you can check this:

https://github.com/mojoaxel/awesome-regression-testing

0 replies
Comment options

@MathieuPuech thanks for the links, that's really helpful.

When you store images in git, each change retains the old image in history, not the diff between the two images. This blows up repository size pretty quickly.

However, we don't force users here; they can choose how to store the images. We should just make it possible to hook it up to whatever you want.

0 replies
Comment options

When you do visual regression testing, you can run into challenges with how browsers render in different environments:

  • OS-native scrollbar differences (Puppeteer removes them by default)
  • the blinking caret (the easiest fix is setting caret-color to transparent)
  • system fonts (different on Windows, macOS, and Linux)
  • font anti-aliasing

One solution is to run the tests in Docker, but that is slower and requires the developer to have Docker installed.
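A minimal sketch of neutralizing some of these differences from inside a test, by injecting a stylesheet before the screenshot is taken (standard DOM APIs only; which overrides are safe to apply depends on what you're rendering):

function stabilizeRendering() {
  const style = document.createElement('style');
  style.textContent = `
    /* hide the blinking caret in inputs and contenteditable elements */
    * { caret-color: transparent !important; }
    /* hide OS-native scrollbars (WebKit/Blink only) */
    ::-webkit-scrollbar { display: none; }
    /* freeze animations and transitions so captures are deterministic */
    *, *::before, *::after {
      animation: none !important;
      transition: none !important;
    }
  `;
  document.head.appendChild(style);
}

// call once per test file, before taking screenshots
stabilizeRendering();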

5 replies
Comment options

these are awesome points - would have never thought about them 🙈
thx for bringing them up 🤗

removing scrollbars and using a transparent caret I think are awesome solutions 👍

the others are really tricky 🤔

  • system font => we could force a font that is available on all systems (e.g. process all CSS and then use a map of fonts to replace)
  • font aliasing/smoothing => puh, yeah, that is going to be tough... we could try to force -webkit-font-smoothing: none; 🤔
Comment options

I would recommend checking how existing libraries handle some of the problems listed above.
One good example is looks-same, used by gemini and hermione.

In particular, it has antialiasingTolerance and ignoreCaret options.
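A rough sketch of what a comparison with those options looks like (the looks-same callback/promise API and option defaults have shifted between major versions, and the file paths here are just examples, so treat this as illustrative and check the current docs):

const looksSame = require('looks-same');

looksSame(
  'screenshots/baseline/chromium/button.png',
  'screenshots/current/chromium/button.png',
  { ignoreCaret: true, ignoreAntialiasing: true, antialiasingTolerance: 4 },
  (error, { equal }) => {
    if (error) throw error;
    console.log(equal ? 'images match' : 'visual difference detected');
  }
);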

Comment options

I used this library for comparison, and in my experience the ignoreCaret option doesn't always work. But maybe the antialiasing tolerance is better.

Comment options

I've already been through the pain of finding a good VRT solution, so you might be interested in my findings so you don't have to:

  • pixelmatch with my own script launching Puppeteer - font smoothing is a problem, and there's a lot of manual setup for this solution
  • backstopjs - has its own report system, it's an all-in-one solution really, but yet again font smoothing is a problem and there's a lot of manual setup
  • looks-same with my own script launching Puppeteer - slightly better than pixelmatch alone, but unsurprisingly, even with its "antialiasing tolerance", it has the font smoothing problem
  • any of the above containerised - no font smoothing problem, everything works fine, though it seems overkill and a lot of manual setup to have a container just for consistent font smoothing

I tried fiddling with Chrome flags, matching OS versions, browser versions, etc., but there will always inevitably be some minor difference between two machines, especially a CI server and a local machine. So the font smoothing mismatch has proven to be a problem every time.

In the end, I used Percy (now owned by BrowserStack) with a Storybook plugin for it. This means it just uploads a static Storybook to Percy; Percy does the visual diffing and produces a report in their dashboard.

This is one case where doing it myself wasn't worth it tbh.

Of course it could still be worth implementing a looks-same solution here to give users the flexibility, but in the end I think a service is the right way to go, and it sits outside your test runner anyway.

Comment options

LarsDenBakker Sep 6, 2020
Maintainer Author

Thanks for sharing your experience here.

We should definitely look into Percy. I like reusing Storybook stories inside tests, but limiting this to only tests is unfortunate.

Comment options

I created a first implementation of visual regression in WTR: https://github.com/modernweb-dev/web/tree/master/packages/test-runner-visual-regression

This implements taking screenshots from the browser and failing when there are diffs. We can use this to experiment and test, and decide on follow-up actions from there. I split the discussion up into different topics, which are linked in the first post in this thread.
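For anyone landing here, the basic shape of using that package is roughly the following (based on the package README at the time; import paths, option names, and the CLI flag used here may differ, so double-check the current docs). The plugin is registered in the test runner config, and tests call visualDiff with the element to capture and a name for the screenshot:

// web-test-runner.config.mjs
import { visualRegressionPlugin } from '@web/test-runner-visual-regression/plugin';

export default {
  plugins: [
    visualRegressionPlugin({
      // refresh stored baselines when an update flag is passed on the CLI
      update: process.argv.includes('--update-visual-baseline'),
    }),
  ],
};

// my-element.test.js - runs in the browser
import { visualDiff } from '@web/test-runner-visual-regression';

it('matches the visual baseline', async () => {
  const element = document.createElement('p');
  element.textContent = 'Hello world';
  document.body.appendChild(element);

  await visualDiff(element, 'hello-world');
});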

0 replies
Comment options

Hi, I would like to see the ability to pass image data instead of a DOM node.

Currently I'm using Jest together with https://github.com/Prior99/jest-screenshot to do visual regression for screenshots of PDFs (generated client side).

This is one of the few features that keeps me from ditching Jest completely.

2 replies
Comment options

LarsDenBakker Sep 14, 2020
Maintainer Author

What do you mean by image data? The actual images? Tests are executed in the browser while the screenshots are taken server side.

Comment options

What do you mean by image data? The actual images?

Yes. The PDF is generated client side (pdfkit/pdfmake), converted to an image (PNG) using pdf.js, and compared using Jest.

Here's one example:

https://github.com/foliojs/pdfkit/blob/master/tests/visual/helpers.js#L23-L25

pdf2png returns an array of images as buffers. In the browser, the image would be a Blob or a Uint8Array.
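For context, the client-side conversion step being described looks roughly like this with pdf.js (a sketch only: the worker setup is omitted and pdf.js API details vary between versions):

// Convert the first page of a client-side-generated PDF into a PNG Blob.
import * as pdfjsLib from 'pdfjs-dist';

async function pdfPageToPngBlob(pdfBytes /* Uint8Array from pdfkit/pdfmake */) {
  const pdf = await pdfjsLib.getDocument({ data: pdfBytes }).promise;
  const page = await pdf.getPage(1);
  const viewport = page.getViewport({ scale: 2 });

  const canvas = document.createElement('canvas');
  canvas.width = viewport.width;
  canvas.height = viewport.height;
  await page.render({ canvasContext: canvas.getContext('2d'), viewport }).promise;

  // This Blob (or a Uint8Array read from it) is the "image data" a
  // visualDiff-style API would need to accept for this use case.
  return new Promise((resolve) => canvas.toBlob(resolve, 'image/png'));
}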

Comment options

Hi, we started using this in our project and discovered an issue we would like fixed. We've created a PR here: #1068

0 replies
Comment options

Created a project with an example config of running visual tests in Sauce Labs:
https://github.com/web-padawan/components-visual-tests

0 replies
Comment options

Configuration options that I'd be interested in seeing:

  • Currently, when there are "new" tests, their screenshot is taken as the baseline and considered a pass. I'd like an option that would allow me to fail the tests in this case. I'd expect these "new" screenshots to then be included with the "updated" files. This would encourage a process where new tests are regularly confirmed before being included in the baseline.
    • proposal: add strict or failOnNew or similar to the config
    • alternative: only update when the update flag is set to true. However, that would be a "breaking" change for users expecting the auto-fill functionality. Maybe instead we move update to update: boolean | 'auto';
  • Currently, only "updated" images are kept from the current pass. I work with a persistent remote cache of screenshots as a baseline and update it in full when additions/subtractions/updates are made. I'd like to be able to create a "new" cache including passes and updates that I can replace the old cache with on demand. I initially thought I could simply push the "updated" images into the "baseline" and save that as the next cache, but this doesn't fully support adding/removing tests over time and would simply grow to cover every test ever covered. It would be great to see an option to store the "updated" screenshot for all images, not just the "failed" ones.
    • question: does saveFailed require a "failed" test in order to be called?
    • proposal: add a storeAll option to the config, or always call saveFailed and allow user overrides of that gating, possibly renaming it to saveCurrent?

I'm happy to dive into adding these if they make sense (a rough sketch of what the options could look like follows). If we can coalesce around an API structure, I can start by converting this conversation into issues for tracking.
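Purely to make the proposed names concrete, such a config might look like the following (neither option exists in the plugin today; the names are placeholders from the bullets above):

visualRegressionPlugin({
  update: false,
  // proposal: fail the test when no baseline exists instead of silently
  // accepting the new screenshot as the baseline ("strict" was also suggested)
  failOnNew: true,
  // proposal: persist the current screenshot for every test, not just the
  // failing ones, so a full replacement cache can be assembled after the run
  storeAll: true,
}),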

3 replies
Comment options

I've added the following PR to further the discussion around the possible API changes/additions that would be associated with the above: #1339

Comment options

With a tool like Screener.io, new screenshots are marked as "new" and not accepted by default, which requires manual acceptance to add them to the baseline.

Comment options

Which I think is the right path: something is not nothing, and new results should fail. If you want to hard-update the baseline (which happens... new project, major upgrade, etc.), then you can do so with the update flag.

The PR above would make that the default here, too.

Comment options

I really like the idea of not storing baselines somewhere, but instead running base (e.g. master) and compare (e.g. the new feature/bugfix branch) at the same time and comparing...

This would remove all the pain points of needing multiple baselines for each system/browser combination...

It's true it would require more CPU time... but it would work out of the box, and if CPU/CI time is your issue then you can think about adding caching (e.g. storing baselines) on top 🤔

@castastrophee's suggestion of doing something like what tachometer does and actually building both sets of screenshots on demand, so there is no caching, is very intriguing. I'm not a container or bash or even CI dev, but it might be worth spending some time to work that approach out...

https://twitter.com/WestbrookJ/status/1365725266291666949?s=20

1 reply
Comment options

@daKmoR do you know of any work being done on this feature?

Comment options

Put together an update to the comparison on non-size matched screenshots, PTAL: #1352

0 replies
Comment options

I've been experimenting with adding visual regression testing and was hoping this would be the easiest solution, but it doesn't seem to work. I'm running the test from the README exactly, and it always fails with:

Error: Protocol error (Runtime.callFunctionOn): Execution context was destroyed.
 at /home/espeed/projects/rikaikun/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:208:63
 at new Promise (<anonymous>)
 at CDPSession.send (/home/espeed/projects/rikaikun/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:207:16)
 at ExecutionContext._evaluateInternal (/home/espeed/projects/rikaikun/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:201:50)
 at ExecutionContext.evaluate (/home/espeed/projects/rikaikun/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:107:27)
 at ElementHandle.evaluate (/home/espeed/projects/rikaikun/node_modules/puppeteer/lib/cjs/puppeteer/common/JSHandle.js:102:46)
 at ElementHandle._scrollIntoViewIfNeeded (/home/espeed/projects/rikaikun/node_modules/puppeteer/lib/cjs/puppeteer/common/JSHandle.js:280:34)
 at ElementHandle.screenshot (/home/espeed/projects/rikaikun/node_modules/puppeteer/lib/cjs/puppeteer/common/JSHandle.js:599:20)
 at processTicksAndRejections (internal/process/task_queues.js:93:5)
 at async Object.executeCommand (/home/espeed/projects/rikaikun/node_modules/@web/test-runner-visual-regression/dist/visualRegressionPlugin.js:40:45)
 at async TestRunnerApiPlugin._onCommand (/home/espeed/projects/rikaikun/node_modules/@web/test-runner/node_modules/@web/test-runner-core/dist/server/plugins/api/testRunnerApiPlugin.js:145:32)

The only custom part of my config for this test is the browser setup required to run tests at all:

browsers: [
  puppeteerLauncher({
    launchOptions: {
      executablePath: '/usr/bin/google-chrome',
      headless: true,
      // disable-gpu required for chrome to run for some reason.
      args: ['--disable-gpu', '--remote-debugging-port=9333'],
    },
  }),
],

I haven't been able to find many details about that error on the web but maybe others have come across it here...

7 replies
Comment options

Yeah, I pushed a branch with a real test that works except for the visualDiff part:
https://github.com/melink14/rikaikun/blob/538c7508b876cf14428bd8d55566eb8c1f155571/extension/test/rikaicontent_test.ts#L139

(configs are all at the top level if they're important)

I had uploaded it to see if it would work in the GitHub Actions environment, in case the problem was my WSL2 local env being weird. (However, for some reason even more tests failed, though it's hard to tell given the #1549 log spam.)

As I said though, it fails with just the code sample too, so I could probably make a branch for that as well.

Comment options

I did some more debugging and got it to successfully not hang in a GitHub action, though it fails with:
There was no baseline image to compare against.

I could probably figure out how to make an action which can also update images, but I'm surprised, since the documentation implied that the absence of a baseline image would lead to baseline creation:

When you run a diff test for the first time, a baseline image is saved to screenshots/baseline/${browser}/${name}.png. Afterward, every time you do a diff it is compared to this baseline image.

Locally, it times out, but the error I get is:
Error: Node is either not visible or not an HTMLElement

which is probably due to whatever limitations WSL gives me. It might be nice to fail fast with a good error message when that happens, though.

Comment options

On the first run it will fail when there is no baseline to compare to, even as it creates a baseline for the next run. This is to clarify whether the new image is approved to enter the cache or not. Once approved and saved to your cache, the test will work against that baseline going forward.

Re: "the cache", the plugin is un-opinionated as to how you manage your own cache. A naive approach is to run once, commit the baseline images to your repo, and then run again for regressions, though that means your golden images are saved to your repo, which can increase its size over time, at a speed that depends on the size of your test suite. Other options, like saving in a separate repo, saving in an archive in your CI provider, etc., need to be handled by you in your project.

The Error: Node is either not visible or not an HTMLElement seems to be coming from Puppeteer, which means it's not something we could fail early on, as it isn't resolved until quite late in the test process. Are you sure you're building the test context correctly locally? I'm not sure the platform should affect this; however, I don't know much about the file ownership situation between Windows and a related WSL install.

Comment options

Thanks, that makes sense, though I would have expected some kind of success message if it also saved the screenshots (like "No baseline detected, new baselines saved at /file/location").

Re error handling, I had assumed it was a matter of wrapping the calls to the Puppeteer page object, since it seemed to be called from the handler. Either way, WSL2 is known to have various problems when it comes to Puppeteer and browsers, so I'll either wait for more updates to the Windows Linux kernel or figure something else out. Given it seems to work on regular Linux in the GitHub CI environment, it's probably out of scope for the plugin at least.

Thanks for the help and chiming in.

Comment options

I worked on this some more and got it working locally as well:
I found that even running chrome --headless from the command line would cause Chrome to hang, so I worked on solving that problem. I unfortunately did several things before testing to see if it worked, so I haven't isolated the fix, but essentially I installed an X server and all the GUI libs needed to work with it. After that everything worked normally (and I guess as a bonus, headful mode will also work!)

My project isn't so big, so storing the images in GitHub is probably fine, but reviewing diffs sounds like it will be a pain on GitHub; my next plan is to always run this tool with the update flag and then use Percy as my remote cache. I think this could be done by just adding a command for saveBaseline which would essentially capture screenshots and upload them to Percy for diff processing.

I actually tried to integrate Percy directly, but it does a weird thing where it tries to reload the test page, causing many things to break. 😅

Thanks again!

Comment options

Hey! I've been evaluating this tool for use. I have experience using Jest/Puppeteer to create visual snapshots. One thing that I immediately miss from a Jest setup is storing the snapshots adjacent to the test file (I like both tests and snapshots to be as close as possible to my components).

So the directory structure could look like:

├── my-component
│   ├── my-component.ts
│   ├── my-component.test.ts
│   └── screenshots
│       └── [filename].png

Another aspect related to this is automatically generating image file names from some combination of test names.

e.g. given this test:

describe('my-component', () => {
  it('passes some test', async () => {
    const element = document.createElement('p');
    element.textContent = 'Hello world';
    element.style.color = 'blue';
    document.body.appendChild(element);

    await visualDiff(element);
  });
});

Then the file could be automatically named my-component-passes-some-test-1.png.

It seems the current extensibility points do not allow for this. Is there any plan to support this kind of structure/setup?

3 replies
Comment options

Hi, currently it is possible by passing import.meta.url.

Here is the code that we use to get the path to a test file from it. The test can look like this:

it('disabled', async () => {
  element.disabled = true;
  await visualDiff(div, `${import.meta.url}_disabled`);
});
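An illustrative helper along those lines (not the exact wtr-utils implementation, just a sketch) could turn the module URL into a short, stable screenshot name:

function screenshotName(metaUrl, suffix) {
  // e.g. http://localhost:8000/test/my-component.test.js -> test/my-component.test
  const path = new URL(metaUrl).pathname
    .replace(/^\//, '')
    .replace(/\.[jt]s$/, '');
  return suffix ? `${path}_${suffix}` : path;
}

// usage inside a test:
// await visualDiff(element, screenshotName(import.meta.url, 'disabled'));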
Comment options

@web-padawan interesting! This has given me an idea...

Since testFile is available in WTR plugins, I wonder whether this could be added to GetNameArgs? That way you wouldn't need to remember to pass import.meta.url in each test; it could be calculated entirely within the extension points like getBaselineName etc.

export interface GetNameArgs {
    browser: string;
    name: string;
+   testFile: string;
  }

I would be happy to make a PR for this.

Also, your linked wtr-utils is full of useful snippets! I'm going to dive into that. Thanks

Comment options

Made a PR here #1657

Comment options

It could be nice if we could pass a clip of what we want to screenshot, instead of screenshotting only the element.

Sometimes you want to add some margin around your element to capture shadows, and sometimes your element opens an overlay you want to screenshot too (for example an opened menu/select).

1 reply
Comment options

Here is a snippet I've written to get a clip around two elements (selector and endClipSelector) with an 8px margin:

async function getClip({ page, selector, endClipSelector }) {
  const clip = await page.evaluate(
    ([sel, endSel]) => {
      const el = document.querySelector(sel);
      if (!el) {
        throw new Error(`An el matching selector \`${sel}\` wasn't found`);
      }
      const range = document.createRange();
      range.setStartBefore(el);
      range.setEndAfter(endSel ? document.querySelector(endSel) : el);
      const rect = range.getBoundingClientRect();
      const margin = 8;
      const x = rect.left - margin < 0 ? 0 : rect.left - margin;
      const y = rect.top - margin < 0 ? 0 : rect.top - margin;
      return {
        x,
        y,
        width: rect.width + (rect.left - x) * 2,
        height: rect.height + (rect.top - y) * 2,
      };
    },
    [selector, endClipSelector]
  );
  return clip;
}
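For completeness, the returned clip has the shape Puppeteer's page.screenshot expects, so (with hypothetical selectors) using it could look like:

const clip = await getClip({
  page,
  selector: 'my-dropdown',      // example selectors, adjust to your DOM
  endClipSelector: 'my-overlay',
});
await page.screenshot({ path: 'dropdown-open.png', clip });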
Comment options

I really like how customizable this plugin is through its options. Using that customization, I made it so that when a screenshot comparison fails, the "baseline" screenshot gets overwritten with the new screenshot but the test still fails (the first time). This makes it very easy to update screenshots. If the new baseline is actually bad, that will get caught in the git diff or pull request code review. Since the first test run fails, CI pipelines will still fail on legitimate errors.

6 replies
Comment options

visualRegressionPlugin({
  // update is set to false so that the test fails the first time. The
  // modification to getFailedName causes the file to get overwritten even
  // though update is set to false.
  update: false,
  baseDir: 'test-screenshots',
  getBaselineName: (args) => {
    return getTestFileName(args, '');
  },
  getDiffName: (args) => {
    return getTestFileName(args, 'diff');
  },
  getFailedName: (args) => {
    return getTestFileName(args, '');
  },
}),

And then getTestFileName is this:

function getTestFileName(args, type) {
  // most of this is just picking a screenshot structure that I like
  const screenshotDir = relative(
    join(rootDir, 'src'),
    args.testFile.replace(/\.[jt]sx?$/, ''),
  );
  // the real magic is here: extension, and ultimately screenshotName, are
  // identical for both failed and baseline files
  const extension = `${type ? `${type}.` : ''}png`;
  const screenshotName = `${
    args.name
  }.${process.platform.toLowerCase()}.${args.browser.toLowerCase()}.${extension}`;
  const dirs = [
    screenshotDir,
    args.name,
    type === 'diff' ? 'failure-diff' : '',
  ].filter((a) => a);
  return join(...dirs, screenshotName);
}

Here's my source code: https://github.com/electrovir/tdd-html-challenges/blob/main/.virmator/web-test-runner.config.mjs

Comment options

That's a great insight. Having the baseline and failed names be the same is an easy way to get this functionality! Given that source control makes diffs easy to see and I never want to check in failed screenshots, it seems like there's no downside to this unless you need to do offline processing.

Comment options

Could you also share the GitHub Actions workflows you are running?

Comment options

Mine is a simple one that runs tests and then, if there's a failure, uploads new screenshots for comparison. Here's the relevant section: https://github.com/melink14/rikaikun/blob/5b835f5f40bcb01f50a47cfcc936a6eb6d35707c/.github/workflows/presubmit.yml#L42

Comment options

Thank you very much. That is enough inspiration to get me going :)

Comment options

Hi, I keep getting Error: Execution context was destroyed, most likely because of a navigation. I saw the previous comment mentioning the same problem caused by missing baseline images, but running the tests with update: true doesn't seem to change anything. Is there a known solution for that? Thanks.

0 replies