
Towards Automated Page Object Generation for Web Testing using Large Language Models

Authors

Betül Karagöz, Filippo Ricca, Matteo Biagiola, Andrea Stocco

Abstract

Page Objects (POs) are a widely adopted design pattern for improving the maintainability and scalability of automated end-to-end web tests. However, creating and maintaining POs is still largely a manual, labor-intensive activity, while automated solutions have seen limited practical adoption. In this context, the potential of Large Language Models (LLMs) for these tasks has remained largely unexplored. This paper presents an empirical study on the feasibility of using LLMs, specifically GPT-4o and DeepSeek Coder, to automatically generate POs for web testing. We evaluate the generated artifacts on an existing benchmark of five web applications for which manually written POs are available (the ground truth), focusing on accuracy (i.e., the proportion of ground truth elements correctly identified) and element recognition rate (i.e., the proportion of ground truth elements correctly identified or marked for modification). Our results show that LLMs can generate syntactically correct and functionally useful POs, with accuracy values ranging from 32.6% to 54.0% and element recognition rates exceeding 70% in most cases. Our study contributes the first systematic evaluation of the strengths and open challenges of LLMs for automated PO generation, and provides directions for further research on integrating LLMs into practical testing workflows.
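For readers unfamiliar with the pattern, the sketch below illustrates what a Page Object looks like. The `LoginPage` class, its locators, and the stub driver are hypothetical examples for illustration only, not artifacts from the paper; a real PO would wrap a Selenium or Playwright driver instead of the stub used here.

```python
class StubDriver:
    """Stand-in for a browser driver so the sketch runs without Selenium."""
    def __init__(self):
        self.fields = {}

    def fill(self, locator, value):
        self.fields[locator] = value

    def click(self, locator):
        self.fields["clicked"] = locator


class LoginPage:
    """A Page Object encapsulates one page's locators and interactions,
    so tests call intent-level methods instead of raw selectors."""
    USERNAME = "#username"
    PASSWORD = "#password"
    SUBMIT = "button[type=submit]"

    def __init__(self, driver):
        self.driver = driver

    def login(self, user, password):
        self.driver.fill(self.USERNAME, user)
        self.driver.fill(self.PASSWORD, password)
        self.driver.click(self.SUBMIT)


driver = StubDriver()
LoginPage(driver).login("alice", "s3cret")
print(driver.fields["clicked"])  # button[type=submit]
```

Because selectors live in one class, a UI change touches the PO rather than every test; this locator-centric structure is also what the paper's accuracy and element recognition rate metrics measure against the manually written ground truth.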

Metadata

arXiv ID: 2602.19294
Provider: ARXIV
Primary Category: cs.SE
Published: 2026-02-22
Fetched: 2026-02-24 04:38

Comment

In proceedings of the 19th IEEE International Conference on Software Testing, Verification and Validation 2026 (ICST '26)