Paper
High-Fidelity Digital Twin Dataset Generation for Inverter-Based Microgrids Under Multi-Scenario Disturbances
Authors
Osasumwen Cedric Ogiesoba-Eguakun, Kaveh Ashenayi, Suman Rath
Abstract
Public power-system datasets often lack electromagnetic transient (EMT) waveforms, inverter control dynamics, and diverse disturbance coverage, which limits their usefulness for training surrogate models and for studying cyber-physical behavior in inverter-based microgrids. This paper presents a high-fidelity digital twin dataset generated from a MATLAB/Simulink EMT model of a low-voltage AC microgrid with ten inverter-based distributed generators (DGs). The dataset records synchronized three-phase point-of-common-coupling (PCC) voltages and currents, per-DG active power, reactive power, and frequency, together with embedded scenario labels, producing 38 aligned channels sampled at $\Delta t = 2~\mu\mathrm{s}$ over $T = 1~\mathrm{s}$ ($N = T/\Delta t + 1 = 500{,}001$ samples) per scenario. Eleven operating and disturbance scenarios are included: normal operation, load step, voltage sag (temporary three-phase fault), load ramp, frequency ramp, DG trip, tie-line trip, reactive power step, single-line-to-ground faults, measurement noise injection, and communication delay. To ensure numerical stability without altering sequence length, invalid samples (NaN, Inf, and extreme outliers) are repaired by linear interpolation. Each scenario is further validated against system-level evidence (mean frequency, PCC voltage magnitude, total active power, voltage unbalance, and zero-sequence current) to confirm that the disturbance is physically observable and occurs at the intended time. The resulting dataset provides a consistent, labeled EMT benchmark for surrogate modeling, disturbance classification, robustness testing under noise and delay, and cyber-physical resilience analysis in inverter-dominated microgrids. The dataset and processing scripts will be released upon acceptance.
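The abstract names three mechanical steps: a fixed sampling grid ($N = T/\Delta t + 1$), repair of invalid samples by linear interpolation that keeps the sequence length unchanged, and validation using zero-sequence current and voltage unbalance. The stdlib-only Python sketch below illustrates these steps. It is not the authors' processing script. All function names are hypothetical, and three choices are assumptions: the outlier rule (a fixed absolute threshold `outlier_limit`), the edge handling (invalid runs at either end are held at the nearest valid value), and the unbalance metric (an IEEE-style phase voltage unbalance rate computed from phase RMS values rather than a symmetrical-component ratio $|V_2|/|V_1|$).

```python
import math
from typing import List, Sequence

DT = 2e-6    # sample period stated in the abstract (2 microseconds)
T_END = 1.0  # simulated duration stated in the abstract (1 s)


def sample_count(t_end: float, dt: float) -> int:
    """Samples on the closed grid 0, dt, ..., t_end (both endpoints included)."""
    return round(t_end / dt) + 1


def repair_invalid(x: Sequence[float], outlier_limit: float) -> List[float]:
    """Replace NaN, +/-Inf and |x| > outlier_limit by linear interpolation.

    Assumption: outlier_limit is a hypothetical absolute threshold; the paper's
    outlier rule is not specified. Length is preserved, valid samples untouched.
    """
    n = len(x)
    good = [i for i, v in enumerate(x) if math.isfinite(v) and abs(v) <= outlier_limit]
    if not good:
        raise ValueError("no valid samples to interpolate from")
    y = list(x)
    # Assumption: hold the nearest valid value across leading/trailing invalid runs.
    for i in range(good[0]):
        y[i] = x[good[0]]
    for i in range(good[-1] + 1, n):
        y[i] = x[good[-1]]
    # Interior gaps: straight line between the valid neighbours a and b.
    for a, b in zip(good, good[1:]):
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            y[i] = x[a] + w * (x[b] - x[a])
    return y


def zero_sequence(ia: Sequence[float], ib: Sequence[float], ic: Sequence[float]) -> List[float]:
    """Instantaneous zero-sequence current i0 = (ia + ib + ic) / 3."""
    return [(a + b + c) / 3.0 for a, b, c in zip(ia, ib, ic)]


def rms(x: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x))


def phase_voltage_unbalance_pct(va, vb, vc) -> float:
    """Max deviation of phase RMS from the mean phase RMS, in percent of the mean.

    Assumption: IEEE-style phase voltage unbalance rate; the paper may use another definition.
    """
    r = [rms(va), rms(vb), rms(vc)]
    mean = sum(r) / 3.0
    return 100.0 * max(abs(v - mean) for v in r) / mean


if __name__ == "__main__":
    print(sample_count(T_END, DT))  # 500001
    raw = [1.0, float("nan"), 3.0, float("inf"), 1e9, 6.0]
    print(repair_invalid(raw, outlier_limit=1e6))
```

With a threshold of $10^6$, the NaN, Inf and $10^9$ entries in the example are filled from their valid neighbours, and the list keeps its six samples. Interpolating, rather than dropping, the bad samples keeps every channel on the shared $\Delta t$ grid, which is what lets the 38 channels stay aligned.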
Metadata
arXiv ID: 2603.10262v1 (https://arxiv.org/abs/2603.10262v1)
PDF: https://arxiv.org/pdf/2603.10262v1
Published: 2026-03-10T22:48:38Z (last updated 2026-03-10T22:48:38Z)
Primary category: eess.SY
Comment: 12 pages