Forecasting Smarter: How I Tuned My GTI-Based Solar Production Model
Over the past few weeks, I've been refining my GTI-based solar production forecast model to better align with real-world results from my PV system. While the model was already grounded in sound physics and weather data, actual production data revealed one recurring issue: overestimation – sometimes by more than 100%.
So, I dug into the details. Here's what I changed, and why.
The Problem: Overconfident Forecasts
Using data from Open-Meteo and GTI (Global Tilted Irradiance) values, the model calculated expected energy yields for each hour of the day. But reality didn't match theory - especially on days with variable cloud cover or full sun. The discrepancy was often due to two things:
- Unrealistic cloud correction (clouds cut production more than the model assumed)
- No modeling of inverter clipping (my inverter caps output at 7.0 kW, no matter how much sun hits the panels)
The original GTI_CORRECTION_FACTOR was 3.0 - an aggressive multiplier that inflated all GTI values. It worked well on paper, but poorly in practice.
What Changed: Smarter Assumptions
Here’s what I updated in my Python-based forecasting script:
GTI_CORRECTION_FACTOR = 1.35
I ran a retrospective analysis of the past two months, comparing forecasted vs. actual daily energy production. By minimizing the mean absolute error (MAE), I found that a factor of 1.35 yielded the best results. This means I now treat GTI data more conservatively — closer to what actually hits the inverter.
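For reference, here's a minimal sketch of what such a retrospective calibration can look like. It assumes a hypothetical forecast_daily_kwh(factor) helper that re-runs the hourly forecast for the evaluation window with a given correction factor, plus an array of measured daily totals; the same scan extends naturally to the multi-parameter grid search planned further below.

```python
import numpy as np

def calibrate_gti_factor(forecast_daily_kwh, actual_kwh,
                         candidates=np.arange(1.0, 3.05, 0.05)):
    """Try candidate GTI correction factors and return the one that minimizes
    the mean absolute error against measured daily production.

    forecast_daily_kwh(factor) -> np.ndarray of predicted daily kWh (hypothetical helper)
    actual_kwh                 -> np.ndarray of measured daily kWh, same day order
    """
    best_factor, best_mae = None, float("inf")
    for factor in candidates:
        predicted = forecast_daily_kwh(factor)
        mae = float(np.mean(np.abs(predicted - actual_kwh)))
        if mae < best_mae:
            best_factor, best_mae = factor, mae
    return best_factor, best_mae
```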
CLOUD_REDUCTION_FACTOR = 0.35
Clouds reduce production significantly. The old setting of 0.5 was too lenient. At full cloud cover (100%), GTI is now scaled down to 35% of its clear-sky value, reflecting real performance on overcast days.
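A minimal sketch of how this factor can enter the hourly loop, assuming it is the fraction of GTI left at full cloud cover and that the reduction blends linearly with cloud cover (both are my reading of the model, not necessarily how the actual script does it):

```python
CLOUD_REDUCTION_FACTOR = 0.35  # assumed: fraction of GTI remaining at 100% cloud cover

def cloud_adjusted_gti(gti_wm2: float, cloud_cover_pct: float) -> float:
    """Scale GTI with cloud cover: untouched at 0% clouds, down to 35% of the
    clear value at 100% clouds, interpolated linearly in between (assumption)."""
    cloud_fraction = cloud_cover_pct / 100.0
    scale = 1.0 - cloud_fraction * (1.0 - CLOUD_REDUCTION_FACTOR)
    return gti_wm2 * scale
```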
TEMPERATURE_COEFFICIENT = 0.005
PV panels lose efficiency when they get hot. Most silicon modules lose about 0.4–0.5% per °C above 25°C. So I adjusted the coefficient from an overly pessimistic 0.01 down to 0.005.
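In code, the derating can be as simple as the sketch below; how cell temperature is estimated from the weather data, and whether any gain below 25 °C is modeled, is left out here.

```python
TEMPERATURE_COEFFICIENT = 0.005  # 0.5% output loss per °C above 25 °C

def temperature_derate(power_kw: float, cell_temp_c: float) -> float:
    """Reduce output for cell temperatures above the 25 °C STC reference
    (sketch: no gain is modeled below 25 °C)."""
    delta_t = max(cell_temp_c - 25.0, 0.0)
    return power_kw * (1.0 - TEMPERATURE_COEFFICIENT * delta_t)
```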
Inverter Clipping: Modeled Explicitly
This was the big one.
My solar array has a DC capacity of 8.88 kWp (24 × 370 W panels), but the inverter is limited to 7.0 kW. On sunny days, the output plateaus at 7.0 kW, no matter how much energy is available.
So I added explicit clipping. This small but crucial change dramatically improved forecast realism. Gone are the midday spikes that used to suggest 9–10 kWh output per hour. Now the system reflects what’s physically possible — not just what the sun could theoretically deliver.
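The clipping itself is a one-liner; the constant names below are illustrative, not necessarily the ones used in my script.

```python
DC_CAPACITY_KWP = 8.88     # 24 x 370 W panels
INVERTER_LIMIT_KW = 7.0    # AC output cap of the Fronius inverter

def clip_to_inverter(dc_power_kw: float) -> float:
    """Cap instantaneous output at the inverter limit; any surplus DC power
    above 7.0 kW is simply lost to clipping."""
    return min(dc_power_kw, INVERTER_LIMIT_KW)
```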
Testing the New Model: Reality Check
Of course, no model is complete without validation.
I compared the updated GTI forecasts against my Fronius inverter’s real-world production data over the past two months. Here’s what I found:
| Metric | Value |
|--------|-------|
| MAE (Mean Absolute Error) | 14.67 kWh |
| RMSE (Root Mean Square Error) | 19.18 kWh |
| MAPE (Mean Absolute Percentage Error) | 39.83% |
| R² Score | –0.36 |
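For transparency, these metrics are straightforward to compute from two arrays of daily totals; a plain NumPy version (scikit-learn offers equivalents) looks roughly like this:

```python
import numpy as np

def forecast_metrics(predicted: np.ndarray, actual: np.ndarray) -> dict:
    """Compute MAE, RMSE, MAPE and R² on daily kWh totals."""
    err = predicted - actual
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mape = float(np.mean(np.abs(err) / actual) * 100.0)  # assumes no zero-production days
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((actual - np.mean(actual)) ** 2))
    return {"MAE_kWh": mae, "RMSE_kWh": rmse, "MAPE_%": mape, "R2": 1.0 - ss_res / ss_tot}
```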
Not There Yet
While these results are a big step forward from the overblown forecasts I started with, they also show I’m not done yet. A negative R² means the model still performs worse than simply predicting the average daily production for every day. Clearly, the cloud and GTI corrections need more work, possibly with a non-linear approach.
But: we’re finally in the right order of magnitude.
Lessons Learned
- Physics-based models are powerful, but they need tuning. Defaults are rarely optimal.
- Real-world constraints like inverter clipping must be modeled — or they will skew your data.
- Validation is everything. Always measure forecast accuracy with metrics like MAE, RMSE, MAPE, and R².
What’s Next?
- I plan to auto-tune CLOUD_REDUCTION_FACTOR and GTI_CORRECTION_FACTOR using grid search over the historical dataset.
- I’ll test AI-enhanced forecasts that learn correction layers from error patterns, and may incorporate techniques already used in my separate AI model.
- In fact, since I’ve already built a complete machine learning pipeline for solar forecasting, I might use some of that logic (e.g. residual learning or model stacking) to fine-tune the GTI-based approach.
- I’ll also explore generating confidence intervals, so forecasts are not just single numbers but come with error bars and estimated uncertainty.
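As a first cut, those error bars could come straight from historical residuals rather than a full probabilistic model. A distribution-free sketch of that idea (the helper name and the 80% coverage are just placeholders):

```python
import numpy as np

def empirical_interval(point_forecast_kwh, past_predicted, past_actual, coverage=0.8):
    """Wrap a daily point forecast in an error bar derived from past residuals:
    take empirical quantiles of (actual - predicted) and shift the forecast by them."""
    residuals = np.asarray(past_actual) - np.asarray(past_predicted)
    lo, hi = np.quantile(residuals, [(1 - coverage) / 2, (1 + coverage) / 2])
    return point_forecast_kwh + lo, point_forecast_kwh + hi
```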