AXNav: Replaying Accessibility Tests from Natural Language
Abstract
Developers and quality assurance testers often rely on manual testing to test accessibility features throughout the product lifecycle. Unfortunately, manual testing can be tedious, often has an overwhelming scope, and can be difficult to schedule amongst other development milestones. Recently, Large Language Models (LLMs) have been used for a variety of tasks including automation of UIs. However, to our knowledge, no one has yet explored the use of LLMs in controlling assistive technologies for the purposes of supporting accessibility testing. In this paper, we explore the requirements of a natural language based accessibility testing workflow, starting with a formative study. From this we build a system that takes a manual accessibility test instruction in natural language (e.g., “Search for a show in VoiceOver”) as input and uses an LLM combined with pixel-based UI Understanding models to execute the test and produce a chaptered, navigable video. In each video, to help QA testers, we apply heuristics to detect and flag accessibility issues (e.g., Text size not increasing with Large Text enabled, VoiceOver navigation loops). We evaluate this system through a 10-participant user study with accessibility QA professionals who indicated that the tool would be very useful in their current work and performed tests similarly to how they would manually test the features. The study also reveals insights for future work on using LLMs for accessibility testing.
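The abstract above outlines a pipeline in which an LLM plans test steps from a natural-language instruction, a pixel-based UI understanding model locates on-screen elements, and heuristics flag issues in the resulting recording. The sketch below illustrates that loop in Python as a rough approximation; every helper name (plan_steps, locate, the device driver, and the issue checks) is a hypothetical stand-in, not the authors' implementation.

```python
# Minimal sketch of an AXNav-style test loop (hypothetical helper names, not
# the authors' implementation): an LLM turns a natural-language instruction
# into steps, a pixel-based UI model locates elements, and simple heuristics
# flag accessibility issues found while the run is recorded.
from dataclasses import dataclass, field

@dataclass
class StepResult:
    description: str                      # chapter title for the navigable video
    issues: list = field(default_factory=list)

def run_accessibility_test(instruction, llm, ui_detector, device):
    """Execute a natural-language test, e.g. 'Search for a show in VoiceOver'."""
    results = []
    visited = []                                   # VoiceOver focus history
    for step in llm.plan_steps(instruction):       # hypothetical LLM planner
        screenshot = device.capture_screen()
        element = ui_detector.locate(screenshot, step.target)   # pixel-based model
        device.perform(step.action, element)       # e.g. VoiceOver swipe / activate
        result = StepResult(description=step.summary)

        # Heuristic: flag VoiceOver navigation loops (focus revisits an element).
        focus = device.current_voiceover_focus()
        if focus in visited:
            result.issues.append("Possible VoiceOver navigation loop")
        visited.append(focus)

        # Heuristic: flag text that does not scale when Large Text is enabled.
        if step.checks_dynamic_type and not device.text_size_increased():
            result.issues.append("Text size did not increase with Large Text")
        results.append(result)
    return results
```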
Related Papers
Navid Salehnamadi, Ziyao He, S. Malek
19 April 2023
Billions of people use smartphones on a daily basis, including 15% of the world’s population with disabilities. Mobile platforms encourage developers to manually assess their apps’ accessibility in the way disabled users interact with phones, i.e., through Assistive Technologies (AT) like screen readers. However, most developers only test their apps with touch gestures and do not have enough knowledge to use AT properly. Moreover, automated accessibility testing tools typically do not consider AT. This paper introduces a record-and-replay technique that records the developers’ touch interactions, replays the same actions with an AT, and generates a visualized report of various ways of interacting with the app using ATs. Empirical evaluation of this technique on real-world apps revealed that while user study is the most reliable way of assessing accessibility, our technique can aid developers in detecting complex accessibility issues at different stages of development.
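As a rough illustration of the record-and-replay idea described in this abstract, the following sketch maps recorded touch actions onto screen-reader-style actions and collects a simple replay report. The at_driver interface and field names are assumed for illustration only and are not the paper's actual tool.

```python
# Illustrative sketch of the record-and-replay idea: recorded touch
# interactions are replayed through an assistive-technology (AT) driver,
# and the replay log doubles as a report of how the app behaves under AT.
def replay_with_assistive_tech(recorded_touches, at_driver):
    """Replay recorded taps through a hypothetical AT driver interface."""
    report = []
    for touch in recorded_touches:
        target = at_driver.find_node_at(touch["x"], touch["y"])   # node under the tap
        if target is None or not target.is_focusable_by_at():
            report.append({"action": touch, "status": "unreachable via AT"})
            continue
        at_driver.move_focus_to(target)    # e.g. linear screen-reader navigation
        at_driver.activate(target)         # double-tap equivalent
        report.append({"action": touch, "status": "replayed",
                       "focus_steps": at_driver.steps_since_last_action()})
    return report
```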
S. Malek, Navid Salehnamadi, S. Branham + 3 others
6 May 2021
For 15% of the world population with disabilities, accessibility is arguably the most critical software quality attribute. The ever-growing reliance of users with disabilities on mobile apps further underscores the need for accessible software in this domain. Existing automated accessibility assessment techniques primarily aim to detect violations of predefined guidelines, thereby producing a massive number of accessibility warnings that often overlook the way software is actually used by users with disabilities. This paper presents a novel, high-fidelity form of accessibility testing for Android apps, called Latte, that automatically reuses tests written to evaluate an app’s functional correctness to assess its accessibility as well. Latte first extracts the use case corresponding to each test, and then executes each use case in the way disabled users would, i.e., using assistive services. Our empirical evaluation on real-world Android apps demonstrates Latte’s effectiveness in detecting substantially more useful defects than prior techniques.
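The following sketch renders the core idea of reusing a functional test as an assistive-service use case under assumed interfaces (screen_reader, step.target_label); it is not Latte's actual implementation, only a minimal version of the navigate-by-focus-then-activate behavior the abstract describes.

```python
# Sketch: each step of an existing functional test names a target widget;
# instead of tapping it directly, accessibility focus is moved element by
# element until the target is reached, then activated, mimicking a
# screen-reader user. Interfaces here are assumed for illustration.
def execute_use_case_via_assistive_service(test_steps, screen_reader, max_moves=200):
    failures = []
    for step in test_steps:                       # steps extracted from a functional test
        moves = 0
        while screen_reader.focused_label() != step.target_label:
            screen_reader.move_to_next_element()  # linear navigation
            moves += 1
            if moves > max_moves:                 # target never reachable by focus order
                failures.append(f"'{step.target_label}' unreachable via screen reader")
                break
        else:
            screen_reader.activate_focused_element()
    return failures
```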
Daniel E. Krutz, Samuel A. Malachowsky, Saad Khan + 1 other
26 June 2021
Our Accessibility Learning Labs not only inform participants about the need for accessible software, but also how to properly create and implement accessible software. These experiential browser-based labs enable participants, instructors and practitioners to engage in our material using only their browser. In the following document, we will provide a brief overview of our labs, how they may be adopted, and some of their preliminary results. Complete project material is publicly available on our project website: http://all.rit.edu
Daniel Zimmermann, A. Koziolek
1 April 2023
This paper introduces a new method for GUI-based software testing that utilizes GPT-3, a state-of-the-art language model. The approach uses GPT-3’s transformer architecture to interpret natural language test cases and programmatically navigate through the application under test. To overcome the memory limitations of the transformer architecture, we propose incorporating the current state of all GUI elements into the input prompt at each time step. Additionally, we suggest using a test automation framework to interact with the GUI elements and provide GPT-3 with information about the application’s current state. To simplify the process of acquiring training data, we also present a tool for this purpose. The proposed approach has the potential to improve the efficiency of software testing by eliminating the need for manual input and allowing non-technical users to easily input test cases for both desktop and mobile applications.
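A minimal sketch of the prompting loop this abstract describes, with hypothetical llm and automation interfaces: at every step the serialized state of the visible GUI elements is injected into the prompt so the model can choose the next action despite its limited context window.

```python
# Sketch of a state-in-the-prompt GUI testing step (assumed helper names).
import json

def gpt_driven_test_step(llm, automation, test_case_text, history):
    elements = automation.list_gui_elements()        # ids, roles, labels, values
    prompt = (
        "Test case: " + test_case_text + "\n"
        "Actions so far: " + json.dumps(history) + "\n"
        "Current GUI state: " + json.dumps(elements) + "\n"
        'Respond with JSON {"element_id": ..., "action": ...} naming the next step.'
    )
    decision = json.loads(llm.complete(prompt))      # hypothetical completion call
    automation.perform(decision["action"], decision["element_id"])
    history.append(decision)
    return decision
```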
Syed Masum Billah, Md. Touhidul Islam, Donald E. Porter
19 April 2023
Perceived accessibility of an application is a subjective measure of how well an individual with a particular disability, skills, and goals experiences the application via assistive technology. This paper first presents a study with 11 blind users to report how they perceive the accessibility of desktop applications while interacting via assistive technology such as screen readers and a keyboard. The study identifies the low navigational complexity of the user interface (UI) elements as the primary contributor to higher perceived accessibility of different applications. Informed by this study, we develop a probabilistic model that accounts for the number of user actions needed to navigate between any two arbitrary UI elements within an application. This model contributes to the area of computational interaction for non-visual interaction. Next, we derive three metrics from this model: complexity, coverage, and reachability, which reveal important statistical characteristics of an application indicative of its perceived accessibility. The proposed metrics are appropriate for comparing similar applications and can be fine-tuned for individual users to cater to their skills and goals. Finally, we present five use cases, demonstrating how blind users, application developers, and accessibility practitioners can benefit from our model and metrics.
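The metrics named in this abstract (complexity, coverage, reachability) can be approximated by treating the UI as a graph whose edges are single assistive-technology actions. The sketch below is a simplified formulation for illustration only; it is not the paper's probabilistic model.

```python
# Simplified navigation metrics over a UI graph whose edges represent one
# keyboard/screen-reader action. This is an assumed formulation, not the
# authors' model.
from collections import deque

def shortest_action_counts(adjacency):
    """BFS distances (in single actions) from every element to every other."""
    dist = {}
    for src in adjacency:
        seen, queue = {src: 0}, deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, []):
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        dist[src] = seen
    return dist

def navigation_metrics(adjacency, start):
    dist = shortest_action_counts(adjacency)
    pairs = [(a, b) for a in adjacency for b in adjacency if a != b]
    reachable = [dist[a][b] for a, b in pairs if b in dist[a]]
    return {
        # mean number of actions needed between element pairs (lower is simpler)
        "complexity": sum(reachable) / len(reachable) if reachable else 0.0,
        # fraction of ordered element pairs connected by some navigation path
        "reachability": len(reachable) / len(pairs) if pairs else 0.0,
        # fraction of elements reachable from the given start element
        "coverage": len(dist[start]) / len(adjacency),
    }

# Example: three elements navigated linearly with a "next element" action.
example = {"search": ["results"], "results": ["details"], "details": []}
print(navigation_metrics(example, start="search"))
```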
References
4 references
Assistive-Technology Aided Manual Accessibility Testing in Mobile Apps, Powered by Record-and-Replay
Navid Salehnamadi, Ziyao He + 1 other
19 April 2023
A Large-Scale Longitudinal Analysis of Missing Label Accessibility Failures in Android Apps
A. S. Ross, Mingyuan Zhong + 3 others
29 April 2022
We present the first large-scale longitudinal analysis of missing label accessibility failures in Android apps. We developed a crawler and collected monthly snapshots of 312 apps over 16 months. We use this unique dataset in empirical examinations of accessibility not possible in prior datasets. Key large-scale findings include missing label failures in 55.6% of unique image-based elements, longitudinal improvement in ImageButton elements but not in more prevalent ImageView elements, that 8.8% of unique screens are unreachable without navigating at least one missing label failure, that app failure rate does not improve with number of downloads, and that effective labeling is neither limited to nor guaranteed by large software organizations. We then examine longitudinal data in individual apps, presenting illustrative examples of accessibility impacts of systematic improvements, incomplete improvements, interface redesigns, and accessibility regressions. We discuss these findings and potential opportunities for tools and practices to improve label-based accessibility.
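For illustration, a simple check for the missing-label failure this study measures might walk an app's view hierarchy and report image-based elements without a content description; the hierarchy format assumed below is hypothetical, and this is not the study's crawler.

```python
# Illustrative missing-label check (assumed hierarchy format, not the study's
# crawler): image-based elements without a content description are announced
# poorly, or not at all, by screen readers.
IMAGE_CLASSES = {"android.widget.ImageView", "android.widget.ImageButton"}

def find_missing_label_failures(view_hierarchy):
    """view_hierarchy: nested dicts with 'class', 'content_desc', 'resource_id', 'children'."""
    failures = []
    stack = [view_hierarchy]
    while stack:
        node = stack.pop()
        if node.get("class") in IMAGE_CLASSES:
            if not (node.get("content_desc") or "").strip():
                failures.append(node.get("resource_id", "<unlabeled image element>"))
        stack.extend(node.get("children", []))
    return failures

# Example: a screen with one labeled and one unlabeled image button.
screen = {"class": "android.widget.FrameLayout", "children": [
    {"class": "android.widget.ImageButton", "content_desc": "Play", "resource_id": "btn_play"},
    {"class": "android.widget.ImageButton", "content_desc": "", "resource_id": "btn_share"},
]}
print(find_missing_label_failures(screen))   # -> ['btn_share']
```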
Latte: Use-Case and Assistive-Service Driven Automated Accessibility Testing Framework for Android
S. Malek, Navid Salehnamadi + 4 others
6 May 2021
Citing Articles
0 citations. No articles cite this paper yet.