February 27, 2026

QoSFlow: Ensuring Service Quality of Distributed Workflows Using Interpretable Sensitivity Models

Authors

Md Hasanur Rashid, Jesun Firoz, Nathan R. Tallent, Luanzheng Guo, Meng Tang, Dong Dai

Abstract

With the increasing importance of distributed scientific workflows, there is a critical need to ensure Quality of Service (QoS) constraints, such as minimizing time or limiting execution to resource subsets. However, the unpredictable nature of workflow behavior, even with similar configurations, makes it difficult to provide QoS guarantees. For effective reasoning about QoS scheduling, we introduce QoSFlow, a performance modeling method that partitions a workflow's execution configuration space into regions with similar behavior. Each region groups configurations with comparable execution times according to a given statistical sensitivity, enabling efficient QoS-driven scheduling through analytical reasoning rather than exhaustive testing. Evaluation on three diverse workflows shows that QoSFlow's execution recommendations outperform the best-performing standard heuristic by 27.38%. Empirical validation confirms that QoSFlow's recommended configurations consistently match measured execution outcomes across different QoS constraints.
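To make the idea of regions with comparable execution times concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify how regions are built, so the sketch swaps in a simple greedy grouping over sorted run times. The `Run` fields, the `partition_by_sensitivity` and `recommend` helpers, the 10% tolerance, and all timing numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    nodes: int       # hypothetical configuration knob
    storage: str     # hypothetical configuration knob
    seconds: float   # execution time (made-up illustrative data)


def partition_by_sensitivity(runs, sensitivity):
    """Greedy grouping, assumed for illustration only.

    Sort runs by time and open a new region whenever a run is slower than
    the current region's fastest run by more than the relative tolerance.
    Regions come back ordered from fastest to slowest.
    """
    regions = []
    for run in sorted(runs, key=lambda r: r.seconds):
        if regions and run.seconds <= regions[-1][0].seconds * (1 + sensitivity):
            regions[-1].append(run)
        else:
            regions.append([run])
    return regions


def recommend(regions, max_nodes):
    """Return the feasible configurations of the fastest region that has any
    configuration within the node limit; an empty list if none qualifies."""
    for region in regions:
        feasible = [r for r in region if r.nodes <= max_nodes]
        if feasible:
            return feasible
    return []


RUNS = [
    Run(2, "nfs", 120.0),
    Run(2, "ssd", 110.0),
    Run(4, "nfs", 70.0),
    Run(4, "ssd", 66.0),
    Run(8, "nfs", 43.0),
    Run(8, "ssd", 40.0),
]

regions = partition_by_sensitivity(RUNS, sensitivity=0.10)
choice = recommend(regions, max_nodes=4)

for i, region in enumerate(regions):
    print(f"region {i}: {[(r.nodes, r.storage, r.seconds) for r in region]}")
print("recommended under a 4-node limit:", [(c.nodes, c.storage) for c in choice])
```

Under these assumptions, a scheduler reasons over a handful of regions instead of every configuration: a QoS constraint such as a node limit filters the regions, and any configuration in the fastest surviving region is treated as equally acceptable.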

Metadata

arXiv ID: 2602.23598
Provider: ARXIV
Primary Category: cs.DC
Published: 2026-02-27
Fetched: 2026-03-02 06:04
Other Categories: cs.PF
Comment: to be published in 40th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2026
Abstract page: https://arxiv.org/abs/2602.23598v1
PDF: https://arxiv.org/pdf/2602.23598v1