Abstract:
This study addresses two persistent limitations in photovoltaic (PV) power forecasting, namely insufficient predictive accuracy and excessive computational cost, by presenting a lightweight network, ETRCN, which couples ETALinear with a two-stage residual correction strategy. The design begins with a one-dimensional convolutional front end whose parameters are frozen. This component performs a trend–residual decomposition of the raw multivariate time series, isolating slowly varying diurnal patterns from high-frequency fluctuations driven by transient weather. On top of this decomposition, an improved, lightweight temporal attention mechanism computes dynamic feature weights that rescale inputs across time, so that informative lags and meteorological channels receive proportionally greater influence while noisy or less relevant inputs are attenuated. The attention-refined trend and residual streams are additively fused to produce an initial estimate. To further reduce bias without inflating model size, the framework adopts hierarchical error mitigation: a convolutional gated unit first corrects temporally structured long-horizon residuals, and a compact multilayer perceptron subsequently calibrates remaining system-level errors. The final prediction is the sum of the initial estimate and both corrected residual terms. This modular pipeline of decomposition, attention-based feature scaling, and staged error correction prioritizes interpretability, stable optimization, and efficiency.
Methodologically, each design choice targets a known bottleneck in PV forecasting. Freezing the parameters of the decomposition CNN eliminates backpropagation overhead and stabilizes training in the earliest stage; the temporal attention block is intentionally lightweight to limit latency on commodity hardware; and the two corrective modules are narrow and task-specific, improving accuracy with minimal growth in parameter count. Because PV generation is shaped by periodic solar geometry, intermittent cloud cover, and abrupt ramp events, a decomposition-first approach separates the regular from the irregular and simplifies the learning problem for the subsequent modules. The attention mechanism then aligns model capacity with the most informative time intervals. Finally, the residual correctors capture local autocorrelation and global nonlinear biases that remain after additive fusion, but do so in a computationally frugal manner.
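To make the described data flow concrete, the following is a minimal PyTorch sketch of the pipeline. All layer sizes, module names (TrendDecomposer, TemporalAttention), the uniform smoothing kernel, and the corrector widths are illustrative assumptions rather than the authors' exact ETRCN implementation; the sketch only mirrors the stated structure of frozen convolutional decomposition, attention-weighted additive fusion, and two staged residual corrections summed into the final prediction.

```python
# Illustrative sketch of the described pipeline; sizes and names are assumptions.
import torch
import torch.nn as nn

class TrendDecomposer(nn.Module):
    """Frozen depthwise 1D convolution acting as a moving-average trend extractor."""
    def __init__(self, channels: int, kernel_size: int = 25):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2, groups=channels, bias=False)
        nn.init.constant_(self.conv.weight, 1.0 / kernel_size)  # uniform smoothing
        for p in self.conv.parameters():
            p.requires_grad = False  # frozen: no backpropagation through this stage

    def forward(self, x):                     # x: (batch, channels, time)
        trend = self.conv(x)
        residual = x - trend                   # high-frequency fluctuations
        return trend, residual

class TemporalAttention(nn.Module):
    """Lightweight attention producing one weight per time step."""
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv1d(channels, 1, kernel_size=1)

    def forward(self, x):                      # x: (batch, channels, time)
        w = torch.softmax(self.score(x), dim=-1)   # (batch, 1, time)
        return x * w                               # rescale inputs across time

class ETRCN(nn.Module):
    def __init__(self, channels: int, seq_len: int, horizon: int):
        super().__init__()
        self.decompose = TrendDecomposer(channels)
        self.attn_trend = TemporalAttention(channels)
        self.attn_resid = TemporalAttention(channels)
        self.head = nn.Linear(channels * seq_len, horizon)   # initial estimate
        # Stage 1: convolutional gated unit over the residual stream
        self.gate_conv = nn.Conv1d(channels, 2 * channels, kernel_size=3, padding=1)
        self.corr1 = nn.Linear(channels * seq_len, horizon)
        # Stage 2: compact MLP calibrating remaining system-level error
        self.corr2 = nn.Sequential(nn.Linear(horizon, 32), nn.ReLU(),
                                   nn.Linear(32, horizon))

    def forward(self, x):                      # x: (batch, channels, time)
        trend, resid = self.decompose(x)
        fused = self.attn_trend(trend) + self.attn_resid(resid)  # additive fusion
        y0 = self.head(fused.flatten(1))
        a, b = self.gate_conv(resid).chunk(2, dim=1)
        gated = torch.tanh(a) * torch.sigmoid(b)                  # gated unit
        r1 = self.corr1(gated.flatten(1))
        r2 = self.corr2(y0 + r1)
        return y0 + r1 + r2                    # initial estimate plus both corrections
```

For instance, `ETRCN(channels=6, seq_len=96, horizon=1)` would map a window of 96 multivariate readings to a one-step forecast; the actual dimensions used in the study are not specified in this section.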
Extensive experiments confirm concurrent gains in accuracy and efficiency. Using root mean square error (RMSE) as the primary metric, the proposed model attains 82.54 kW. Relative comparisons indicate substantial error reductions: approximately 23.1% against CNN-BiLSTM-AM, 17.1% against LSTM, 22.6% against GWO-GRU, and 42.8% against TCN-BiGRU. Goodness of fit is similarly strong, with an R² of 0.922; the associated increments in R² are about 1.1% over CNN-BiLSTM, 6.9% over LSTM, 10.3% over GWO-GRU, and 8.3% over TCN-BiGRU. In a further set of comparisons, the proposed model again leads by a clear margin in both error reduction and model fit: an RMSE of 99.348 kW represents reductions of 31.8% relative to CNN-BiLSTM, 25.0% relative to TCN-BiGRU, 21.7% relative to LSTM, and 16.4% relative to GWO-GRU, and the corresponding R² of 0.876 represents improvements of 5.7%, 3.5%, 2.7%, and 2.3% over CNN-BiLSTM, TCN-BiGRU, LSTM, and GWO-GRU, respectively. These accuracy gains are achieved with a parameter budget of only 17.68 K, reductions of roughly 98.0%, 99.6%, 99.7%, and 48.6% relative to CNN-BiLSTM-AM, LSTM, TCN-BiGRU, and GWO-GRU, respectively. Inference is also efficient: a single forward pass completes in 0.235 s, a reduction of about 81.6% in inference time relative to the slowest baseline (LSTM). Taken together, the results show that the proposed architecture improves both predictive fidelity and computational frugality, a combination that is seldom achieved simultaneously.
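For transparency about how the relative figures above relate to the absolute RMSE, the short snippet below assumes the standard convention that a reduction is computed as (RMSE_baseline − RMSE_proposed) / RMSE_baseline; this formula is an assumption, since the text does not state it explicitly. Under it, the implied baseline RMSEs can be recovered from the reported numbers.

```python
# Recover implied baseline RMSEs from the reported reductions, assuming
# reduction = (rmse_baseline - rmse_proposed) / rmse_baseline.
rmse_proposed = 82.54  # kW, as reported in the text
for name, reduction in [("CNN-BiLSTM-AM", 0.231), ("LSTM", 0.171),
                        ("GWO-GRU", 0.226), ("TCN-BiGRU", 0.428)]:
    implied_baseline = rmse_proposed / (1.0 - reduction)
    print(f"{name}: implied baseline RMSE = {implied_baseline:.1f} kW")
```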
Ablation and component-wise analyses reinforce the contribution of each module. The frozen CNN decomposition yields a cleaner separation between low-frequency seasonality and high-frequency disturbances, which translates into more stable gradients and faster convergence downstream. The lightweight temporal attention layer consistently sharpens calibration by elevating the contribution of time steps that align with known PV dynamics while reducing the influence of erratic inputs. The first residual corrector based on convolutional gating efficiently models short-range temporal dependencies in the residual sequence, and the second corrector, a small multilayer perceptron, removes residual systematic bias without incurring significant parameter overhead. Notably, these improvements do not rely on deep stacks or large hidden dimensions; rather, they emerge from architectural parsimony and a careful division of labor among specialized blocks. From an engineering perspective, the framework is readily deployable. The small parameter footprint reduces memory usage and energy consumption, enabling on-premise inference at PV plants or substation controllers. The short prediction latency supports quasi-real-time tasks such as dispatch scheduling, reserve allocation, and curtailment decisions. Moreover, the modular structure facilitates operational flexibility: operators can disable the second-stage corrector to prioritize ultra-low latency, or replace the attention block with a static weighting scheme for hardware-constrained environments, without retraining the entire model. Overall, by uniting decomposition-first modeling, lightweight temporal attention, and hierarchical error correction within a rigorously efficient implementation, the proposed method delivers a practically meaningful balance of accuracy, speed, and interpretability for modern PV power forecasting.
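As an illustration of the operational flexibility noted above, the following sketch reuses the illustrative ETRCN class from the earlier example; the toggling interface is hypothetical, but it shows how the second-stage corrector could be skipped at inference time without retraining the rest of the model.

```python
# Hypothetical inference helper: optionally bypass the second-stage corrector
# for ultra-low-latency operation, reusing the illustrative ETRCN modules.
import torch

@torch.no_grad()
def predict(model, x: torch.Tensor, use_stage2: bool = True) -> torch.Tensor:
    trend, resid = model.decompose(x)
    fused = model.attn_trend(trend) + model.attn_resid(resid)
    y0 = model.head(fused.flatten(1))                      # initial estimate
    a, b = model.gate_conv(resid).chunk(2, dim=1)
    r1 = model.corr1((torch.tanh(a) * torch.sigmoid(b)).flatten(1))
    if not use_stage2:
        return y0 + r1       # ultra-low-latency mode: skip the MLP calibrator
    return y0 + r1 + model.corr2(y0 + r1)
```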