033: So, You Want an Accessibility Score with Karl Groves

This episode is a recording of a June 2022 WordPress Accessibility Meetup where Karl Groves, a widely known and respected accessibility advocate, spoke about how to score accessibility in a manner that truly reflects how usable a product is for users with disabilities. If you want to watch a video recording from the meetup, you may do so on the Equalize Digital website: So, You Want an Accessibility Score? – Karl Groves.

WordPress Accessibility Meetups take place via Zoom webinars twice a month, and anyone can attend. Learn more about WordPress Accessibility Meetup and see upcoming events.




>> CHRIS HINDS: Welcome to episode 033 of the Accessibility Craft Podcast, where we explore the art of creating accessible websites while trying out interesting craft beverages. This podcast is brought to you by the team at Equalize Digital, a WordPress accessibility company and the proud creators of the Accessibility Checker plugin.

This episode is a recording of a June 2022 WordPress Accessibility Meetup where Karl Groves, a widely known and respected accessibility advocate, spoke about how to score accessibility in a manner that truly reflects how usable a product is for users with disabilities. WordPress Accessibility Meetups take place via Zoom webinars twice a month, and anyone can attend. For show notes, a full transcript, and additional information about meetups, go to AccessibilityCraft.com/033.

And now, on to the show.

>> AMBER HINDS: Karl is Chief Innovation Officer at Level Access, and he has nearly two decades of experience doing IT consulting for the biggest companies in the world and the biggest agencies in the US government. He has spoken for us before and did a phenomenal job. I always feel like I’m learning from him on social media and everywhere else, so I decided to have him here. I am going to stop sharing and let Karl take over.

>> KARL GROVES: Thank you for the introduction. Let’s see if I can figure out this multi-screen thing like we were talking about before. Most notably, where the heck the mouse is, which is the challenge I always face. Alright, so this is actually a pretty short presentation. With my slides, there are inevitably going to be a lot of “Can I get a copy of your slides?” requests. With all of my presentations, I actually don’t have anything meaningful on the slides, as you’ll see. It’s usually a background image and then a couple of words, unless there’s data being presented, and then that’s a different story. So that’ll be the situation with the slides. I’ll explain any visuals that are necessary for understanding, or sometimes they are humorous as well. I try to inject humor into this stuff, specifically for mainstream audiences, because most mainstream audiences aren’t huge fans of accessibility, so if I can make it fun for them, then I will.

Super happy to be doing this one. This is a topic that is actually very, very interesting to me, in particular because I’ve been doing this stuff for a really long time. For those who don’t know me, my name is Karl Groves. Amber has already done a really good introduction, but I’ve been doing accessibility consulting for about 20 years now. There’s always this question from customers: “Can you give us a grade?” Really, the true answer is, “Well, if you have accessibility problems, then you fail.” That’s not a message that most people want to hear. Most people want to hear that they’re doing well; they don’t want bad news.

Most people just don’t want bad news, and unfortunately, when it comes to web accessibility consulting, the news is mostly bad. If somebody is new to accessibility, then it’s always going to be bad, because people just don’t think about it upfront. This question of “Can we get a grade? Can I get an accessibility score?” came to mind after several customers asked me if Tenon can give a grade. Tenon, philosophically, is a product whose sole job is to find problems, not to give a grade.

That’s one thing I had to mention to people: it is a diagnostic tool, not a program management tool. There are other tools out there. Level Access, for instance, has a product called AMP, which literally stands for Accessibility Management Platform. It is a more all-encompassing enterprise performance and governance type of platform. We have a new product called Elevin. Elevin is again a much more enterprisey kind of tool, and it has capabilities for scoring and things like that.

Given Tenon’s philosophy of finding problems, this question about getting a grade is perplexing in a lot of ways, because what is a grade going to be based on? Like I said, some of my background images have meaning behind them. This background image is a picture of a person assembling a car engine.

I mentioned to Amber before we started that I’m a gearhead. As a matter of fact, once I retire, I’m starting a hot rod shop. There’s a tool placed on top of the engine block; it’s a metal bar bolted to the block, and its job is to find what’s called top dead center. Top dead center, especially top dead center on cylinder one, is how you know how to time the engine’s ignition and all this other stuff. So what are we going to base our grade on? That’s question number one. Getting a grade for something is actually really simple.

You divide the passed things, passed as in the things that are good, by the total things, multiply that quotient by 100, and then apply your standard North American grading scale to it: A is 90 to 100, B is 80 to 89, C is 70 to 79, and so on and so forth. If you subject your website to 20 accessibility tests and you pass 15 of those tests, you get a 75, which is a C grade, and you’re done. That is actually the answer to how to score something for accessibility, at least when it comes to an automatic testing tool, and depending on your manual methodology, this idea works there too.
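The pass-over-total arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not any tool’s actual scoring code; the D (60 to 69) and F (below 60) bands are an assumption that fills in “so on and so forth” with the usual North American scale.

```python
def accessibility_grade(passed: int, total: int) -> tuple:
    """Return (percentage, letter) for `passed` out of `total` tests.

    A = 90-100, B = 80-89, C = 70-79 as in the talk; the D and F
    bands below are assumed from the common North American scale.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of tests")
    score = passed / total * 100
    if score >= 90:
        letter = "A"
    elif score >= 80:
        letter = "B"
    elif score >= 70:
        letter = "C"
    elif score >= 60:
        letter = "D"  # assumed band
    else:
        letter = "F"  # assumed band
    return score, letter


# The example from the talk: 15 of 20 tests passed.
print(accessibility_grade(15, 20))  # (75.0, 'C')
```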

There are a number of flaws with most manual testing methodologies that I’ll talk about as we go through this. But that’s really my answer. If you’re not going to listen to anything else, and you just want the answer to how to get a grade, it’s passed items divided by total items, and you’re done. When people talk about this concept of applying a grade, though, there are other things that they want to factor in.

One of those things could be relevance. The background image here is a protest poster that someone’s carrying, and the words on it say, “I can be persuaded by a logical argument.” Relevance is actually an interesting one. Most, if not all, automatic testing tools are unable to give a reliable score because they don’t track anything but failures. This goes back to the Tenon philosophy, but it’s also true for almost all other tools out there. If you think about it, they’re not telling you what is good. They’re telling you what is bad. They’re telling you they’ve found an issue, they’ve found a problem.

There is no concept of relevance in terms of performance. There’s no concept of passing other than by virtue of not failing. You see this a lot when people are talking about accessibility. They’re saying, “I want a clean score from WAVE,” or “a clean score from aXe,” or whatever: “We got no issues.” Just because an automatic testing tool didn’t find any issues doesn’t mean that you don’t have any issues.

What it means is that you’ve got no failures on the things that tool tests. A passed condition is created either by not failing the existing tests or by the tests not being relevant. This is actually why Tenon doesn’t give a grade: it doesn’t track which tests are relevant or passed; it only tracks failures. While there is value in getting a score based on the extent, or lack thereof, of accessibility errors, it lacks that context. Getting a really useful score requires knowing a couple of things.

First off, of the tests that were relevant, which ones passed and which ones failed? You really have to keep track of which tests were relevant in the first place. To simplify this, let’s say we’re talking about a testing tool that has tests for tables. You have no tables, and therefore those tests are not relevant. You can neither pass nor fail them. This is actually something I’ve had tough conversations about with customers who say, “Well, an irrelevant thing is a pass because it didn’t fail,” and I’m like, “No, that’s really spurious logic. An irrelevant thing can neither pass nor fail, because it doesn’t meet the criteria to do either.”

To use a computer programming analogy, I think of an irrelevant test as null. For anybody who’s done any programming, JavaScript, PHP, whatever, an irrelevant thing is null because it can be neither true nor false. That’s how I think about this. We actually built this capability into Mortise.io, which is the Tenon company’s manual testing tool. Each test has specific criteria, and those criteria determine whether that test is even applicable in the first place. It provides specific instructions for testing whether it’s applicable and whether it passed or failed.
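The null-as-irrelevant idea can be sketched like this, assuming each test reports `True` (passed), `False` (failed), or `None` (not applicable). The test names are hypothetical, and this is not how Mortise.io or Tenon is implemented; it only shows why dropping `None` results, rather than counting them as passes, changes the score.

```python
from typing import Dict, Optional

# True = passed, False = failed, None = not applicable
# (e.g. table tests on a page that has no tables).
TestResult = Optional[bool]


def relevant_score(results: Dict[str, TestResult]) -> Optional[float]:
    """Percentage of relevant tests that passed, or None if no test applied."""
    relevant = [r for r in results.values() if r is not None]
    if not relevant:
        return None  # nothing relevant, so nothing to grade
    passed = sum(1 for r in relevant if r)
    return passed / len(relevant) * 100


def naive_score(results: Dict[str, TestResult]) -> float:
    """The 'irrelevant counts as a pass' logic, for comparison only."""
    not_failed = sum(1 for r in results.values() if r is not False)
    return not_failed / len(results) * 100


# Hypothetical test names and results for one page.
results = {
    "img-alt-present": True,
    "form-field-label": False,
    "table-header-scope": None,  # no tables on this page
    "table-caption": None,       # no tables on this page
    "heading-order": True,
    "link-purpose": True,
}

print(relevant_score(results))         # 75.0
print(round(naive_score(results), 1))  # 83.3
```

Three of the four relevant tests pass, so the score is 75.0; treating the two irrelevant table tests as passes would inflate it to 83.3.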

I believe, at least in terms of scoring, this relevance part is pretty important, and it’s important to have that knowledge before we even think about any grading scheme. This is one of my favorite accessibility fail pictures. It is unfortunately highly pixelated at this resolution, but there’s a set of stairs with a ramp on it. It’s going at least a 45-degree angle, very steep, clearly not a viable ramp. It’s probably really fun to go down, well, until you get to the bottom, and impossible to get up.

Anyway, the text on the slide says, “What about user impact?” This is actually an interesting one. The argument for factoring in user impact is basically this: a raw pass-versus-fail score is fine if everything we’re testing for has the same impact, but all of us who have done accessibility for any length of time know that accessibility issues differ. Some things have very different levels of impact for different users, and this is extremely hard to assess with automation.

A lot of people know me for a lot of my content about overlays. As I often say in the context of overlays, it’s easy to find images that have no text alternatives. They’re very easy to find. There’s an algorithm on the W3C website for accessible name calculation, and you can very easily program something to find the accessible name of an element by following that algorithm. That’s a Boolean, pass-fail thing. It’s really hard, though, to determine whether a text alternative is accurate and informative.

As a matter of fact, I served as an expert witness in a legal case called Murphy versus Eyebobs. One of the big things about Murphy versus Eyebobs is that the defendant’s lawyer, Eyebobs’ lawyer, was trying to argue, “We use AccessiBe, and because we use AccessiBe, we are compliant.” What we found in our research on AccessiBe’s capabilities is that it uses some image recognition software (I don’t know which library or system), and it was often wrong.

When we looked through the machine-generated text alternatives that were provided, a lot of them were completely wrong. Sometimes they got the content of the image right, but not the context. It’s really, really hard to determine whether a text alternative, if it’s supplied, is correct in the first place, and then there’s the whole issue of how wrong it is if it is wrong. If we’re gauging our score based on user impact, how bad is it that it’s wrong?

For instance, if it’s a picture of a product and they describe the product but get its color wrong, how big of a deal is that, versus a text alternative that’s wrong enough that the user is missing important information that isn’t conveyed any other way on the page? That’s a different story. In one case, we’re talking about a nuisance, and in the other case, we’re talking about the complete failure of a core system task.

Another thing is that some issues impact multiple user types, and those impacts may vary. A missing label on a form field, or a missing programmatic association for that label, could cause an impact for a person using voice dictation software, but they could use the mouse grid or something like that to work around it, whereas for a person who’s blind, it could screw things up completely. We have a high impact for one population and a medium-to-low impact for another. How does that weigh in? For us, with Tenon, what we do is actually factor some of that stuff into the prioritization scoring.

The prioritization scoring that we use has a number of factors that contribute to the priority score, and severity of impact is combined into the prioritization, not the actual score, so to speak. Priority is simply a measure of how urgently you need to fix the issue. In other words, you fail, and the question is how badly you are failing. That being said, I remain open to the idea that severity should be its own metric, but I still don’t know how to apply that to an accessibility grade; that brings its own set of challenges.

What about volume? The picture here is actually a picture of a tulip field in the Netherlands, with presumably millions upon millions of red tulips in it. Okay.

The text on the slide here says, “Wait, what about volume?” At its most basic, the more issues a system has, the lower its quality. This is not unique to the web, and it’s not unique to web accessibility; it’s a metric that has traditionally been tracked in software QA since before the web existed. In the context of accessibility, the more accessibility issues a system has, the lower its accessibility grade should be, because it’s a lower-quality system. That’s raw issue count. The difference here is that on the web, raw issue count isn’t really useful without additional context.

This is where a concept called defect density comes in. It takes into consideration the number of issues in the code versus the size of the page. Tenon was actually the first accessibility testing tool to provide this metric, but I didn’t come up with it myself; it’s been a traditional QA thing for a long time. In traditional QA, defect density is the number of issues per 1,000 lines of code, which they call KLOC, for thousand lines of code. Because web pages have many blank lines and a lot of white space, what Tenon does instead is collapse all the white space and use kilobytes of source code as the basis for comparison.
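A rough sketch of the density idea in Python: collapse the whitespace, measure the remaining source in kilobytes, and divide the issue count by that size. Tenon’s exact formula, including how it arrives at the percentage figures mentioned later in the talk, isn’t given here, so treat this only as an illustration of issues per kilobyte; the two pages are made-up stand-ins for a simple page and a complex one.

```python
import re


def collapse_whitespace(source: str) -> str:
    """Collapse every run of spaces, tabs, and newlines to a single space."""
    return re.sub(r"\s+", " ", source).strip()


def defect_density(issue_count: int, source: str) -> float:
    """Issues per kilobyte of whitespace-collapsed source (illustrative)."""
    kilobytes = len(collapse_whitespace(source).encode("utf-8")) / 1024
    if kilobytes == 0:
        raise ValueError("source is empty after collapsing whitespace")
    return issue_count / kilobytes


# Hypothetical pages: same issue count, very different amounts of code.
simple_page = "<main>\n\n   <form><input></form>\n</main>" * 50
complex_page = "<article>\n  <p>Lots of content</p>\n</article>" * 2000

print(round(defect_density(100, simple_page), 2))
print(round(defect_density(100, complex_page), 2))
```

With the same 100 issues, the smaller page comes out with a far higher density, which is the point of comparing issues to page size rather than using the raw count.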

The logic for defect density is pretty straightforward: a simple webpage with a lot of issues is worse than a complex webpage with the same number of issues. When I talk about this with customers, what I say is, imagine that you test the Google homepage. The Google homepage has a logo, a search form, and buttons in the top and bottom corners. There are other links and settings menus and stuff like that, but feature-wise, it’s a super, super simple page. We test that and get 100 issues, and then we take the msnbc.com homepage and also get 100 issues. Msnbc.com obviously is much more complex, with lots more content and, of course, lots more code.

100 issues on Google, 100 issues on MSNBC: if we look only at the raw count, they’re going to seem equal, but we know for a fact that’s not the case. If there are 100 issues on the Google homepage, that means very simply that it’s a worse page, because it’s a much simpler page. In practice, what we’ve seen is a strong correlation between density and usability. Pages that exceed 50% density on Tenon are found to be more difficult for users to deal with in the real world. As density increases, so does the likelihood that users are going to be completely unable to use the content and features on that page.

This, in my opinion, actually raises the question of whether density is the true metric that we should base a grade on.

>> STEVE JONES: This episode of Accessibility Craft is sponsored by Equalize Digital Accessibility Checker, the WordPress plugin that helps you find accessibility problems before you hit publish. 

A WordPress native tool, Accessibility Checker provides reports directly on the post edit screen. Reports are comprehensive enough for an accessibility professional or developer, but easy enough for a content creator to understand. 

Accessibility Checker is an ideal tool to audit existing WordPress websites, find accessibility problems during new builds, or monitor accessibility and remind content creators of accessibility best practices on an ongoing basis. Scans run on your server, so there are no per-page fees or external API connections. GDPR and privacy compliant, real-time accessibility scanning.

Scan unlimited posts and pages with Accessibility Checker free. Upgrade to a paid version of Accessibility Checker to scan custom post types and password protected sites, view site wide open issue reports and more.

Download Accessibility Checker free today at equalizedigital.com/accessibility-checker. Use coupon code accessibilitycraft to save 10% on any paid plan.

>> KARL: The picture on the background of this slide is apparently a stuntman on a dirt bike who is riding, or has ridden, through something on fire, and he himself is also on fire. The text on the slide says, “Wait, what about comparing to the norm?” We hear a lot of that in the accessibility consulting field from people who are like, “Well, how do we compare to our peers?” I’ve heard that a ton from e-retailers for sure. One retailer would be like, “How do we compare against–” and they would name their direct competitors, or at least who they see as competitors.

At this point, the Tenon product has assessed millions of pages on the web and logged tens of millions of issues across those pages. This is actually more than enough data for us to calculate any data point that we want with a statistically significant sample size, a confidence level of 99%, and a confidence interval of one. What I’m saying is that we can gather any statistics we want, and it will be an accurate comparison against the web as a whole. One way to do this comparison is to provide a grade based on the norm; in other words, a comparison against all the other pages that have ever been tested.
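As a back-of-the-envelope check on the “more than enough data” claim, Cochran’s standard sample-size formula can be computed in Python. This assumes a simple random sample, the most conservative proportion p = 0.5, and that “a confidence interval of one” means a margin of error of plus or minus one percentage point; none of those assumptions is stated in the talk.

```python
import math
from statistics import NormalDist


def sample_size(confidence: float, margin_of_error: float, p: float = 0.5) -> int:
    """Cochran's sample-size formula for a very large population.

    n = z^2 * p * (1 - p) / e^2, where z is the two-sided critical value
    for the confidence level and e is the margin of error (what many
    online calculators call the "confidence interval").
    p = 0.5 is the most conservative, largest-n assumption.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


# 99% confidence, plus or minus 1 percentage point.
print(sample_size(0.99, 0.01))
```

At 99% confidence and a one-point margin, this comes out to roughly 16,600 pages, far fewer than the millions Tenon has assessed.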

This is basically like grading on a curve in college. Unfortunately, the normal webpage is really bad: the average number of errors across the web is 83 errors per page. Tenon is a little bit unique in the way it does its testing, in that we count individual instances of issues. Let’s say we have a table that has 10 columns, and none of those 10 column headers has a scope attribute on it. We would log 10 issues. Other tools might log one issue and say, “This table is messed up.”
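The difference between counting every instance and counting once per table can be sketched with Python’s built-in `html.parser`. The rule here (every `<th>` needs a `scope` attribute) is just the example from the talk, and the `ScopeAudit` class is hypothetical, not Tenon’s or any other tool’s real test.

```python
from html.parser import HTMLParser


class ScopeAudit(HTMLParser):
    """Count <th> cells missing a scope attribute, two different ways."""

    def __init__(self) -> None:
        super().__init__()
        self.instances = 0          # one issue per offending <th>
        self.tables_with_issue = 0  # one issue per offending <table>
        self._table_flagged = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table_flagged = False  # start tracking a new table
        elif tag == "th" and "scope" not in dict(attrs):
            self.instances += 1
            if not self._table_flagged:
                self.tables_with_issue += 1
                self._table_flagged = True


# A 10-column table whose header cells all lack scope.
headers = "".join(f"<th>Column {i}</th>" for i in range(1, 11))
cells = "<td>x</td>" * 10
page = f"<table><tr>{headers}</tr><tr>{cells}</tr></table>"

audit = ScopeAudit()
audit.feed(page)
print(audit.instances)          # 10
print(audit.tables_with_issue)  # 1
```

Same page, same problem: one counting approach reports 10 issues and the other reports 1, which matters when comparing per-page averages across tools.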

The average number of errors per page is 83, and the average density is 15%. That average density of 15% suggests that most pages on the web are kind of crappy. When it comes to grading for accessibility, it doesn’t really seem useful to base a grade on the norm when the norm itself is just not accessible.

So, what about scope? This image doesn’t mean anything. There are several layers to consider in a scoring scenario in terms of the scope of the thing we’re measuring. We’ll talk about three different layers. First, the component: that’s an individual feature of a page or an application screen, such as its navigation.

Then there’s the page: that’s the entire page or application screen and all of its individual components, or you can also call this a view. Then there’s the product: that’s the entire collection of pages or screens that make up the product. Getting a grade on a component is actually really useful, I think, at least for determining the urgency with which you need to make repairs. Getting a grade on a page is less useful without some specific means to identify the value of the page. In other words, a per-page grade is pretty simple, but getting an A grade on an inconsequential page is less useful than getting an A grade on a page that sees the most traffic from users or includes specific features or documentation for people with accessibility concerns. At the page level, what tends to happen is that the cumulative grade of the product can be pushed too high or too low by outliers that skew the results in one direction or another.

For instance, a great example of this would be pages that are just text: blog posts, documentation, or something like that. Let’s face it, people do screw that stuff up, but by volume, you’re only going to get things like color contrast and heading issues out of mostly text pages, whereas interactive stuff is much more likely to have errors anyway. Identifying the relative importance of a page can be useful, but what I feel in practice is that it’s probably still better left as part of the priority scoring for those issues.

In other words, we’re going to deprioritize low-traffic, low-importance, low-interactivity pages and increase the priority for the other things. I don’t think page-level scoping is really relevant to a grade; scoping grades to components is probably more useful. This background image has three slots for binders. The one on the left is numbered 403, and the one on the right is labeled 405. That means 404 is not found.

The text on this slide says, “No, but really, what about relevance?” I want to harp on this one a little bit more, because relevance matters, especially in automatic testing, though it applies to manual testing as well. No matter how we’re running this assessment, the relevance of the grade is directly tied to the completeness and relevance of the test set. The really simple point in terms of automation is that the more tests you have, the more complete your coverage is going to be.

Of course, as anyone who’s involved in automatic testing in general, unit testing and stuff like that, knows, testing irrelevant or inconsequential stuff is a problem. You don’t want to just sit there with a pile of completely irrelevant tests. Assuming, however, that we can have a very large set of relevant tests for our system, the more the merrier. It pays to use a product that has a large number of tests.

Say, for instance, you’re basing your score on something you get out of Lighthouse. Lighthouse is good, and the tests within Lighthouse are very good, but there aren’t a lot of them. They’ve chosen to use a subset of the aXe tests, and even aXe doesn’t have as many tests as, say, AMP or Elevin or Tenon. That’s a really important thing to mention, but you can close that gap, of course, with manual testing. Again, you have to have a codified set of manual tests, with complete instructions, steps, and requirements for determining relevance, accuracy, and all that stuff. You can get 100% coverage reliably if both your automated and your manual testing are, of course, complete.

Our picture here in the background is another gearhead one. This is a person who’s assembling an engine; he’s got pistons and a cylinder head on the table. Our text here on the slide is “Putting it together.” It turns out, in my exploration of this topic, that there are really two things that are most important in creating a grade: relevance and number of tests. Or really one thing: the number of relevant tests. That’s the most important part. Relevance is really vital. If you have 50 tests to do with forms, but you have no forms on the thing you’re testing, then factoring those tests into a grade makes no sense and artificially skews your data.

The number of tests is vital because if you don’t have a complete and thorough list of tests, then you may not be gathering enough data to base the grade on in the first place, and you’ll wind up with a score that is not accurate. By the way, you might remember that I talked about defect density. I mentioned that defect density itself may suffice as the only necessary metric for a grade, but what I found when I was looking at Tenon’s data is that once you start tracking relevant tests and passed tests, density is automatically figured into that. Either one of those, density or the grade itself, is a fine metric, because they’re essentially synonymous at this point.

The image in the background of this slide has nothing to do with the content, but I’ve had it sitting in my slide assets archive for a long time and I just had to use it. It’s a picture of a stormtrooper and he’s holding the sign that says, “At least we didn’t kiss our sister.” If you’re familiar with Star Wars, you’ll know where that’s coming from. If you haven’t, then just watch the first one chronologically or according to release date.

Our text says, “Our target must be an A.” Regardless of what you base your score on, your target must be an A. Going back to these requests that we’ve gotten from customers, like “How do we compare to our competitors?”: no. That’s totally the wrong question to ask. Getting a grade that you can look at and immediately know where your system stands is super useful. It’s an awesome idea, and it’s a great way of tracking your progress, provided, of course, that you are tracking your progress.

It should be relatively straightforward to get a grade that’s useful and then use that as, I don’t know, a KPI for your distance from an A. Because accessibility is a compliance domain, it’s the kind of thing that large companies want to track, and that’s fine. But in practice, at least in my history, organizations that do this end up in a race to the bottom toward whatever their bare minimum acceptable grade is going to be, and then they stop. They’ll be like, “Oh, well, a B is good enough.” Well, a B is not good enough, but that’s going to be their target, and they’re not going to look any further. That’s why the pursuit of a grade really is misleading and dangerous anyway.

I want to read you a quote from the WCAG standard. It says, “Conformance to a standard means that you meet or satisfy the requirements of the standard.” That seems pretty self-explanatory. In WCAG, the requirements are the success criteria. To conform to WCAG, you need to satisfy the success criteria; that is, there is no content which violates the success criteria.

At a glance, the ability to see a score and intuitively understand how far away you are from getting that grade is cool, but choosing a less-than-perfect grade as good enough is dangerous, especially when you are working for an organization that has a high-risk profile (e-retail, banking, travel, that sort of stuff) or that is subject to regulatory oversight, such as an organization of a certain size in Canada or Europe, or a federal agency. You have to comply with all of the success criteria at the chosen level of WCAG in order to be considered compliant. A B is not compliant; a C is not compliant.

It’s important to keep in mind, for anybody who asks you a question like “What’s our grade?”, that if you have any issues, you are non-compliant, and that’s really the message they need to get. A grade is good for determining how far away you are from compliance.

The background image here is another funny one. It’s a picture in a bathroom. Part of the sign is cut off, but it says, “If this bathroom needs service, please turn on the switch.” Below that sign is a switch plate that has no switch. The text on this slide says, “There’s only one true metric,” and this is actually one that I borrowed from a friend of mine who was doing accessibility compliance at Google. He mentioned to me that there’s really only one thing we need to track, only one real KPI: will users with disabilities want to use the product?

For this, I’ll take us back to another quote from the WCAG standard. It says, “Although these guidelines cover a wide range of issues, they are not able to address the needs of people with all types, degrees, and combinations of disability.” The real approach to scoring requires us to interact with real users, watch them use the product, and ask them one of these three questions. If you’re not a current user of our product, would you want to use it? If you are a current user of our product, would you continue to use it? And if you’re a former user of this product, would you come back to using it?

So while automated and manual testing are really useful for finding potential problems in your product, only usability testing with real users is going to tell you if you’ve gotten it right, and this one true metric is basically: will people with disabilities want to use this product? That’s the way we get our score. So that’s me, Karl Groves. You can follow me on Twitter @karlgroves, where I mainly talk about overlays and politics. On the web, it’s levelaccess.com. My email is karl.groves@levelaccess.com. Are there any questions? I see there are 17 comments in the chat that I haven’t read. Let’s take a look at some of these and see if there’s anything.

>> AMBER: There were a couple of comments, let me scroll back through, about measuring against that base of other sites, most of which have problems. I think you might have touched on that, but just in case, do you have any extra thoughts on it?

>> KARL: Yes. It’s really bad. I mean, the reality is there are a lot of bad websites out there, and especially in certain industries or certain market segments, a lot of them can be horrible. Unfortunately, K-12 is a big area where things are just horrible for people with disabilities to even try to use. You can kind of guess that the larger the company, the more likely it is they’ll have done something for accessibility. But just in general, I wouldn’t use what other people do as a good example or a good thing to follow.

>> AMBER: M had a comment: you can get a 100% accessibility score from certain automated testing tools even though the site is not accessible, or if you use workarounds to trick the tools?

>> KARL: Yes.

>> AMBER: I think a good follow-on question to that is: are there common tools that people use that you wouldn’t recommend? And I know, I’m not asking you to bash anything. Or, obviously your product is a great product, but are there other things that can be helpful in finding some of the obvious problems but that won’t create a misleading view of the accessibility status of the product or website?

>> KARL: Well, first off, I would say that any testing tool is good as a way to get started, because you’re testing. I’ve got to say that there are tools out there, certain legacy tools that I’ve seen people use, that are really old and out of date. I’m not going to mention their names because I don’t want to disparage anybody. I mentioned Lighthouse before. Lighthouse uses a subset of the aXe rule set, and the tests are good. If you’re doing testing with Lighthouse, or with Accessibility Insights from Microsoft, which also uses a subset, those are good tests. If you’re finding stuff with those and fixing it, you’re already doing a good thing.

You’re not getting a [inaudible] because, again, they’re just a subset. What I would caution against is people trusting any tool as gospel. We see this a lot with legal cases. There’s actually a law firm in Beverly Hills that fires out these demand letters, and they base all of their stuff on WAVE. WAVE is awesome; it’s the number one downloaded and used accessibility testing tool for a reason, right? But WAVE in the hands of an amateur can be deceiving, because what WAVE lists includes outright errors, color contrast problems (which you do sometimes need to verify), warnings, and informational items. The warnings could be completely irrelevant. They could just be wrong, and that’s fine, because WAVE discloses that they’re warnings.

The informational ones are just there to assist you when you’re doing your manual checking. A lot of people don’t understand that, and they’ll take it as gospel. Or my tool, or others, will have an accurate test, but there’s some conditional thing that makes it irrelevant. I used to work at a Harley dealership a long, long time ago, and there was a mechanic there we called Old Man Brian. His name was Brian and he was old. There was a younger kid who worked as a mechanic next to him who had the worst Snap-on habit. Snap-on is a tool company. They drive out to auto shops and places like that to sell you tools, and they had the best tools. Snap-on tools are phenomenal.

But this kid always had to have the latest and greatest stuff, and Brian would say, “It’s a poor mechanic who blames his tools for not getting the job done right.” I carry that over to everything else. If you don’t understand what the testing tool is doing, then you’re going to have problems. You can get misled, or you’re going to chase your tail. That’s the only thing I would say about some of that stuff: tool quality is a big deal, but understanding the tool is even bigger.

>> AMBER: For people who are just getting started, what resources do you have for learning, for better understanding the tools, and for figuring out what their testing process should be?

>> KARL: That’s a good question. For anyone involved in accessibility, and especially anyone new to accessibility, go to the WebAIM website, webaim.org. Both their website and their discussion list were instrumental for me when I was starting out, and I still point people there. Huge shout-out to the folks at WebAIM. I love them all dearly. Small plug: Level Access has released Access Academy. Access Academy is awesome because the courses are free. It talks about everything from [inaudible] to older stuff.

>> AMBER: I don’t see too many more questions coming in, so anyone, feel free to put your questions in. I have one or two others that I thought of while Karl was talking; otherwise, we might wrap a little early. One thing I was wondering, because you mentioned that some law firms just use the testing tools: do you feel like there is a good response if someone has a client who gets a demand letter? I mean, obviously the ideal response is to make the website accessible, or to do user testing and prove that it already is, and that it was a false flag on WAVE or whatever tool it might be. Do you have any thoughts on that, having been involved in some of these lawsuits as an expert?

>> KARL: Yes, I mean, it’s a huge topic, because there are trolls out there who will go away for $4,000. Not all of them; a lot of them want real money. But some of these folks will just settle for four grand and go away. You’ve got to ask yourself: before we got acquired by Level Access, my lawyer was $500 an hour. We’re not talking a lot of time before that $4,000 has run out with me paying my lawyer to fight this thing.

The other reality is they probably have a good complaint. If you have a website and somebody reaches out to you and says, “We’re going to sue you,” then unless you know for a fact that your website is accessible, they might have a point. That’s actually a huge problem, and it makes the case really hard to defend. Obviously, the bigger your company, the more money you have to fight this sort of stuff, and there you go. But there’s something to be said for just fixing your stuff.

As a matter of fact, when it comes to a legitimate complaint, all paths lead to you fixing your stuff. If you follow the possible options... we actually have a flowchart. There’s a blog post on the Tenon blog, blog.tenon.io. I forget what the post is titled, but the flowchart basically says all paths lead to fixing your stuff. If you fight the lawsuit and you win, the only way you’re winning is because you already have an accessible site. If you settle, you have to fix your site. If you lose, you have to fix your site.

There’s only one way to deal with the lawsuits, and that’s to fix your site. [coughs] That’s the one thing I would say. The other part, strategically, is that some lawsuits can be mooted. Mooting, in legal terms, is a strategy where you basically pull the rug out from under the plaintiff by making the complaint no longer an issue, in this case by fixing your site before the trial date. Then the lawsuit gets thrown out, and so on and so forth. That’s legal stuff I can’t get into any more than that, other than to say that I have seen it happen and I participated in it happening. There’s a lot of legal mumbo jumbo behind that which I’m under-qualified for.

>> AMBER: I appreciate the thoughts, and obviously that would be the ideal response: you get a complaint, you just go fix it, and you say, “Oh, thanks for letting us know. Here’s the fix on our website.”

>> KARL: Yes.

>> AMBER: Someone said they’re trying to learn web development: “I find myself paralyzed because I’m terrified of learning inaccessible practices, since accessibility is usually treated as an afterthought. Can you recommend any courses?” I think they mean web development courses that account for accessibility throughout.

>> KARL: Man, I wish there was a good answer for that.

>> AMBER: I know Joe Dolson does some LinkedIn Learning stuff, and I know he has a course on accessibility. Without going and looking, I’m not certain whether he has general web development courses, but if he does, LinkedIn Learning might be a good resource.

>> KARL: Yes. That’s true. There are a couple of LinkedIn Learning courses out there that cover some of that: Joe Dolson, and I think Marcy Sutton had some stuff out there. Gerard Cohen, I think, did one on Lynda.com or something like that. I just put a link to an Amazon listing for a book called Designing with Progressive Enhancement. [clears throat] The book is a little bit on the out-of-date side when it comes to modern practices. However, if you’re just learning web development, it’s going to cover a lot of really good stuff.

That’ll be a big deal because it talks about JavaScript. It has HTML, CSS, and CSS3 stuff. It’s semi-modern, but a lot of the JavaScript in there is jQuery rather than more modern framework stuff. It will teach a lot of the fundamentals. Also, here’s another one that’s really out of date: Beginning JavaScript with DOM Scripting and Ajax. A good friend of mine, Christian Heilmann, wrote this book.

>> AMBER: [silence] I think while you’re looking for that, Glenn commented in the chat that if you start with semantic HTML, that’s a great start for accessible practices. So maybe don’t skip those HTML basics courses.

>> KARL: The unfortunate part is that not a lot of the resources out there talk about semantic HTML. They don’t show you. For the most part, the tutorial websites out there don’t even care about semantic HTML.

>> AMBER: Christina commented that [crosstalk] college in Toronto’s web development boot camp program integrates accessibility.

>> KARL: That’s cool.

>> AMBER: She attended that on a scholarship and there was accessibility there.

>> KARL: That’s cool.

>> AMBER: Here’s a question that is very current. How does accessibility fit into the Metaverse, Web 3.0, and future emerging technologies like AR, VR, and MR? I know AR and VR; I don’t know MR. What is the scope of accessibility in these areas? Do you have thoughts on this?

>> KARL: Glenn Walker can answer that one. Thomas Logan is probably the first resource I would point anybody to for AR and VR stuff.

>> AMBER: Glenn posted in the chat, just in case anyone can’t see it: Thomas Logan of the A11yNYC meetup.

>> KARL: [clears throat]

>> AMBER: Do you know if those are virtual meetups?

>> KARL: Yes.

>> AMBER: Anyone can join? Great. There was a comment or question earlier that said, “One thing I’ve considered is using a VPAT template to develop an accessibility score: removing the non-relevant items and creating a score per accessibility level. For example, Level A only has 10 relevant items; divide that by 100, and so on for each level. Does this seem problematic?” The person added, “Keep in mind, this is for clients who just really want to have a score.”

>> KARL: I don’t really agree with the idea of using a VPAT for that. A VPAT is great for disclosing where your failings are on a success-criterion-by-success-criterion basis. I agree with that use, but as far as “Do we have an A or a B?” No. [clears throat]

>> AMBER: You actually helped me when we were working on ours. I was looking at some others, and I felt like a lot of them did what you were talking about earlier, where they just said, “Oh, it passes,” even though the criterion didn’t actually apply. I was going back and forth, and I appreciated your thoughts on that as we were writing ours. I was like, “Well, does it pass?” [clears throat] We don’t have this. There are no images of text. I guess we passed it. [chuckles]

>> KARL: Exactly. I’m not going to name them, but there’s a very, very large software company out there that says if a criterion is irrelevant, then it passes. I’m like, “No, it’s just not relevant.” Their logic for saying that is that they haven’t failed. I’m like, “That’s not the same.”

>> AMBER: I’m not seeing any other questions coming up. M said, “I agree that we should move away from scores and just fix what is wrong or build it right at the start. The point should be to create an inclusive and welcoming site for everyone.”

>> KARL: Yes.

>> AMBER: The last question that I wrote down was from when you were talking about user testing. The way we’ve approached this is that we do automated testing, and then our developers do manual testing themselves. After all those things are fixed, we bring in users. I’m wondering if you have thoughts about how people can incorporate users into their testing process. Should it be earlier? Also, for someone who’s new to that, how would they go about finding users?

>> KARL: I would strongly suggest never doing usability testing until you’ve done automated and manual testing. The reason is that you don’t want to waste your time, and you don’t want to waste your participants’ time. If there’s money involved, like paying stipends, which you should be doing, then you’re wasting money. I would strongly recommend not doing user testing until you’ve gotten that other stuff out of the way. I’ve seen this plenty of times: you’re going to have a person failing a task for an extraordinarily obvious reason. I’ll give you a great example. I was once testing a jobs website where, in order to post your resume, you have to have an account. In order to apply for a job, you have to have a resume, and so on and so forth.

If you’re testing this whole ability to upload a resume, but the fields don’t have labels or there’s a stupid focus problem or something like that, then you’re not really testing the usability of the process. You’re getting bogged down in technical problems that should have been addressed in the first place. Once you’ve done that, though, the next part is usability testing. There are a ton of ways to do it right, and a ton of ways to do it effectively that aren’t ideal. The ideal, of course, is that you do a recruit: you have a defined persona, you recruit against that persona, blah, blah, blah. That stuff can get really, really expensive.

If you don’t have the budget for that, you can grab some people who at least have some domain knowledge in the area of the website and recruit them to do the testing. There are lots of places out there for finding participants: the NFB, local lighthouses, or others like the AFB and ACB, those sorts of things. You can reach out to them to see if there are any people who would be willing to participate. That should have you covered for non-visual testers. The same thing goes for any other population.

There are plenty of disability rights or advocacy organizations that have lists of people who might want to participate or might be available to participate, or they’ll share your call for participants. Like M said, just make sure you pay them. [chuckles] Give them a stipend of some kind. The going rate for stipends is about $75 to $150. It’s well worth it once you’re ready for that testing. By the way, going back to the question about the VPAT, another thing I should have mentioned is that there’s a new organization out there trying to create an accessibility report standard. The idea, as they put it in their introduction, is something like a car’s window sticker that tells people what your accessibility performance is.

If this is a topic you’re interested in and have time for, definitely get in touch with Chris Law and try to participate and give your feedback. That is one effort to boil all of this stuff I’ve been talking about down into something actually digestible for consumers.

>> AMBER: Is that going to be self-proclaimed, or will they be an external credentialing body that people can opt into? Or do you know much about what they’re looking at for that window sticker?

>> KARL: I don’t know, actually. I guess that has yet to evolve. Step one, of course, is getting people involved in the organization and setting it up, and then there’s more to go from there. I’ve only been involved in the very, very early days, because of the acquisition of Tenon by Level Access. That took up pretty much most of my time from fall through winter and into spring this year. That’s all I’ve been able to work on. [clears throat]

>> AMBER: Awesome. Well, thank you very much. This has been fabulous. Everyone, please check out the WordPress Accessibility Day website. It’s wpaccessibility.day, which is a real URL.

Thank you so much, everyone. Have a great rest of the day.

>> CHRIS: Thanks for listening to Accessibility Craft. If you enjoyed this episode, please subscribe in your podcast app to get notified when future episodes release. You can find Accessibility Craft on Apple Podcasts, Google Podcasts, Spotify, and more. And if building accessibility awareness is important to you, please consider rating Accessibility Craft five stars on Apple Podcasts. Accessibility Craft is produced by Equalize Digital and hosted by Amber Hinds, Chris Hinds, and Steve Jones. Steve Jones composed our theme music. Learn how we helped make thousands of WordPress websites more accessible at equalizedigital.com.