Abstract:
To address two persistent limitations in photovoltaic (PV) power forecasting, namely insufficient predictive accuracy and excessive computational cost, this study presents a lightweight network, ETRCN, which couples ETALinear with a two-stage residual correction strategy. The design features a one-dimensional convolutional front end with frozen parameters. This component performs a trend–residual decomposition of the raw multivariate time series, isolating slowly varying diurnal patterns from high-frequency fluctuations driven by transient weather. Complementing this decomposition, an improved lightweight temporal attention mechanism computes dynamic feature weights that rescale the inputs across time, ensuring that informative lags and meteorological channels receive proportionally greater influence while noisy or less relevant inputs are attenuated. The attention-refined trend and residual streams are additively fused to produce an initial estimate. To further reduce bias without increasing model size, the framework adopts hierarchical error mitigation: a convolutional gated unit first corrects temporally structured long-horizon residuals, and a compact multilayer perceptron subsequently calibrates the remaining system-level errors. The final prediction is the sum of the initial estimate and the corrected residual terms. This modular pipeline, consisting of decomposition, attention-based feature scaling, and staged error correction, prioritizes interpretability, stable optimization, and efficiency.

Methodologically, each design choice targets a known bottleneck in PV forecasting. Freezing the parameters of the decomposition convolutional neural network (CNN) eliminates backpropagation overhead and stabilizes training in the initial stage. The temporal attention block is intentionally lightweight to limit latency on commodity hardware, and the two corrective modules are narrow and task-specific, improving accuracy with a minimal increase in parameter count.
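The decomposition–attention–fusion pipeline described above can be illustrated with a minimal NumPy sketch. This is not the paper's ETRCN implementation: the frozen one-dimensional convolution is stood in for by a fixed moving-average kernel, and the lightweight temporal attention by a simple softmax rescaling over time steps; both are assumptions made for illustration only.

```python
import numpy as np

def frozen_decompose(x, k=5):
    """Trend-residual split via a frozen (non-trainable) averaging kernel.

    x: (T,) series. The fixed moving-average kernel stands in for the
    paper's frozen 1-D convolutional front end (illustrative choice).
    """
    kernel = np.ones(k) / k                        # fixed weights, never updated
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    trend = np.convolve(xp, kernel, mode="valid")  # slowly varying diurnal part
    resid = x - trend                              # high-frequency weather-driven part
    return trend, resid

def temporal_attention(x):
    """Lightweight attention: softmax scores over time rescale each lag."""
    scores = np.exp(x - x.max())
    w = scores / scores.sum()         # attention weights sum to 1
    return x * w * len(x)             # rescale, roughly preserving overall scale

# Additive fusion of the attention-refined streams into an initial estimate
T = 64
t = np.arange(T)
x = np.sin(2 * np.pi * t / 24) + 0.1 * np.random.default_rng(0).standard_normal(T)
trend, resid = frozen_decompose(x)
initial = temporal_attention(trend) + temporal_attention(resid)
```

Because the kernel is frozen, the decomposition contributes no trainable parameters and no backpropagation cost, which is the efficiency argument made above.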
Because PV generation is shaped by periodic solar geometry, intermittent cloud cover, and abrupt ramp events, a decomposition-first approach separates the regular from the irregular, thereby simplifying the learning problem for the subsequent modules. The attention mechanism aligns the model capacity with the most informative time intervals. Finally, the residual correctors capture, in a computationally efficient manner, the local autocorrelation and global nonlinear biases that remain after additive fusion.

Extensive experiments confirm concurrent gains in accuracy and efficiency. Using the root mean square error (RMSE) as the primary metric, the proposed model attains 82.54 kW in comparisons with the convolutional neural network–bidirectional long short-term memory–attention mechanism (CNN–BiLSTM–AM), long short-term memory (LSTM), grey wolf optimizer–gated recurrent unit (GWO–GRU), and temporal convolutional network–bidirectional gated recurrent unit (TCN–BiGRU) baselines, corresponding to error reductions of approximately 23.1%, 17.1%, 22.6%, and 42.8%, respectively. The goodness of fit is similarly strong, with an R² of 0.922; the associated increments in R² are approximately 1.1% for CNN–BiLSTM, 6.9% for LSTM, 10.3% for GWO–GRU, and 8.3% for TCN–BiGRU. Furthermore, the proposed model outperforms the other methods by a significant margin, in terms of both error reduction and model fit, under complicated weather conditions: an RMSE of 99.348 kW represents reductions of 31.8% relative to CNN–BiLSTM, 25.0% relative to TCN–BiGRU, 21.7% relative to LSTM, and 16.4% relative to GWO–GRU, while an R² of 0.876 represents improvements of 5.7%, 3.5%, 2.7%, and 2.3% over the same four baselines, respectively. These accuracy gains are achieved with a parameter budget of only 17.68 K, corresponding to reductions of approximately 98.0%, 99.6%, 99.7%, and 48.6% relative to CNN–BiLSTM–AM, LSTM, TCN–BiGRU, and GWO–GRU, respectively. Inference is also efficient: a single forward pass completes in 0.235 s, an acceleration of approximately 81.6% relative to the slowest baseline (LSTM). Taken together, these results show that the proposed architecture improves predictive fidelity and computational efficiency simultaneously, a combination that is seldom achieved.

Ablation and component-wise analyses confirm the contribution of each module. The frozen CNN decomposition yields a cleaner separation between low-frequency seasonality and high-frequency disturbances, which translates into more stable gradients and faster downstream convergence. The lightweight temporal attention layer consistently improves calibration by increasing the contribution of time steps that align with known PV dynamics while reducing the influence of erratic inputs. The first residual corrector, based on convolutional gating, efficiently models short-range temporal dependencies in the residual sequence, whereas the second corrector, a small multilayer perceptron, removes the remaining systematic bias with negligible parameter overhead. Notably, these improvements do not rely on deep stacks or large hidden dimensions; rather, they emerge from architectural parsimony and a careful division of labor among specialized blocks.

From an engineering perspective, the framework is readily deployable. The small parameter footprint reduces memory usage and energy consumption, enabling on-premise inference at PV plants or substation controllers.
The short prediction latency supports quasi-real-time tasks such as dispatch scheduling, reserve allocation, and curtailment decisions. Moreover, the modular structure offers operational flexibility: operators can disable the second-stage corrector to prioritize ultra-low latency, or replace the attention block with a static weighting scheme for hardware-constrained environments, without retraining the entire model. Overall, by combining decomposition-first modeling, lightweight temporal attention, and hierarchical error correction in a rigorously efficient implementation, the proposed method delivers a practically meaningful balance of accuracy, speed, and interpretability for modern PV power forecasting.
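The two-stage residual correction and the reported metrics (RMSE and R²) can be sketched in NumPy as follows. The gating and MLP weights here are random stand-ins for learned parameters, so the sketch illustrates the data flow of the hierarchical correction, not the trained model; all function names and sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def conv_gate(err, k=3):
    """Stage 1: convolutional gating over the error sequence.

    A causally padded convolution is modulated by a sigmoid gate; weights
    are random here (in the paper they would be learned).
    """
    w_f = rng.normal(scale=0.5, size=k)                 # filter branch
    w_g = rng.normal(scale=0.5, size=k)                 # gate branch
    e = np.pad(err, (k - 1, 0))                         # causal left-padding
    f = np.convolve(e, w_f, mode="valid")
    g = 1.0 / (1.0 + np.exp(-np.convolve(e, w_g, mode="valid")))
    return f * g                                        # gated short-range correction

def mlp_calibrate(err, hidden=8):
    """Stage 2: compact per-step MLP removing remaining system-level bias."""
    w1 = rng.normal(scale=0.5, size=(1, hidden))
    w2 = rng.normal(scale=0.5, size=(hidden, 1))
    h = np.maximum(err[:, None] @ w1, 0.0)              # ReLU hidden layer
    return (h @ w2).ravel()

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def r2(y, yhat):
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Final prediction = initial estimate + staged residual corrections
T = 48
initial = rng.normal(size=T)                            # synthetic initial estimate
target = initial + 0.3 * rng.normal(size=T)             # synthetic ground truth
err1 = target - initial
c1 = conv_gate(err1)                                    # corrects structured residuals
c2 = mlp_calibrate(err1 - c1)                           # calibrates what remains
final = initial + c1 + c2
```

Keeping both correctors narrow (a k-tap gate and a single small hidden layer) mirrors the parsimony argument above: the correction stages add accuracy with only a handful of extra parameters.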