Smarter Forecasts: Fixing Feature Selection and Calibration in My AI Model
For a while now, my AI-based solar forecasts have performed reasonably well, but with an odd quirk: the predictions sometimes shot far above what the system could realistically produce, or missed the actual output by a wide margin. On closer inspection, two underlying issues came to light. The model was not being trained on all available weather features, and the calibration logic that ties model output back to reality had never been implemented, simply for lack of time. Both have now been fixed, and the results speak for themselves.
The Feature Gap: Why Partial Data Leads to Partial Insights
When I first built my dataset, I included a rich set of weather and production variables:
Temperatures: temp_max, temp_min, temp_mean
Precipitation: precip
Solar irradiance: solar_ghi
Wind speed: windspeed
Cloud cover: cloud_cover
That is a fairly complete picture of the conditions that drive photovoltaic output. Yet, due to a subtle coding oversight, the AI model was not being trained on all of them.
What surprised me was how accurate the results were even with this restricted feature set. On many days the forecasts came close enough to be useful, which shows how much information is already carried in temperature extremes and cloud cover alone. Still, the model was effectively guessing whenever conditions deviated from the norm, for instance when irradiance was nominally high but the sun sat behind haze, or when rain reduced effective output in ways temperature alone could not capture.
This gap has now been closed. The model is trained on the full set of features from weather_energy_dataset.csv, as sketched below. With all relevant signals in play it no longer guesses blindly; instead it takes a holistic view of the weather system, balancing sunshine, rain, wind, and clouds in its predictions. The result is not only more robust but also far more resilient to unusual weather conditions.
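Here is a minimal sketch of what the fixed training step looks like, assuming a scikit-learn style regressor. The GradientBoostingRegressor and the target column name production_kwh are stand-ins of mine, not necessarily what the real pipeline uses; the feature columns are the ones listed above.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor  # stand-in model choice

# The complete feature set; the earlier bug trained on only a subset of this list.
FEATURES = [
    "temp_max", "temp_min", "temp_mean",  # temperatures
    "precip",                             # precipitation
    "solar_ghi",                          # solar irradiance (GHI)
    "windspeed",                          # wind speed
    "cloud_cover",                        # cloud cover
]

df = pd.read_csv("weather_energy_dataset.csv")

X = df[FEATURES]
y = df["production_kwh"]  # assumed name of the measured-production column

model = GradientBoostingRegressor()
model.fit(X, y)
```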
Calibration: Anchoring the Model to Reality
Even with better features, any machine learning model can drift. Solar production is capped by hard physical limits, in my case the inverter's maximum of about 60 kWh per day, and yet the raw AI model knows nothing about that cap. This is where calibration comes in.
Until now, calibration was missing altogether, not because it was overlooked but simply because there had not been enough time to implement it. The forecasts were therefore passed through unchanged: if the model overshot by 20 percent, the forecasts overshot by 20 percent.
I have now implemented proper ratio-based calibration logic. It works like this: for each of the last 14 days, I compare the model's prediction against the actual measured production. From these comparisons I compute correction factors, the ratios of actual to predicted output, that describe how far off the model tends to be. The median of these ratios is then applied as a scaling adjustment.
If the AI has been consistently overshooting, forecasts are scaled down; if it has been undershooting, they are nudged upward. The calibration is dynamic: it adapts to recent trends and keeps forecasts anchored to reality as conditions change with the seasons.
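A minimal sketch of that logic follows. The function name, the guard against near-zero predictions, and the explicit clamp to the 60 kWh inverter limit are my own framing; the core is the 14-day median ratio described above.

```python
import numpy as np

MAX_DAILY_KWH = 60.0  # hard physical cap set by the inverter

def calibrate(raw_forecast_kwh, predicted_14d, actual_14d):
    """Scale a raw forecast by the median actual/predicted ratio
    of the last 14 days, then clamp to the physical limit."""
    predicted = np.asarray(predicted_14d, dtype=float)
    actual = np.asarray(actual_14d, dtype=float)

    # Skip days with near-zero predictions so the ratios cannot explode.
    mask = predicted > 1e-6
    ratios = actual[mask] / predicted[mask]

    # Median rather than mean: one freak day cannot skew the correction.
    factor = np.median(ratios) if ratios.size else 1.0

    return min(raw_forecast_kwh * factor, MAX_DAILY_KWH)
```

Choosing the median over the mean is deliberate: a single outlier day, say a briefly snow-covered panel, would drag a mean-based correction far more than a median-based one.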
Why This Matters
These two fixes, full-feature training and proper calibration, may seem like technical details, but together they make a dramatic difference. Instead of forecasts that sometimes spiraled into implausible territory, I now have predictions that stay well within the physical limits of the system while adapting smoothly to current weather dynamics.
For me, watching the dashboards every morning, this means a forecast I can finally trust as more than a rough guess. It means better planning for when to charge the car, run appliances, or store excess production. And it means the AI system is doing what it was always meant to do: learning not just from past data but also from its own recent mistakes.
In short, my forecasts are now smarter, steadier, and more reliable than ever.
What Does Hybrid Forecast Mean?
The Hybrid forecast shown in the tables is not a separate model in its own right but a weighted blend of the two forecasts I already calculate: the AI forecast, trained on historical production and weather data, and the GTI forecast, derived from irradiance and system geometry. Rather than relying on either one alone, the Hybrid combines both in a 70/30 ratio: seventy percent of the Hybrid value comes from the AI model and thirty percent from the GTI model.
The choice of 70/30 is deliberate. Over time I noticed that the AI forecast tends to be more accurate on most days, since it has learned from years of real-world production data. At the same time, the AI model can sometimes overreact under unusual weather conditions and produce forecasts that swing too far up or down. The GTI forecast, while often less precise in absolute terms, remains more physically grounded and stable. Giving it a thirty percent influence helps anchor the AI’s output and prevents the Hybrid from drifting too far when the machine-learning model misfires.
An example makes this clear. On one recent day the AI forecast predicted 58.1 kWh, while the GTI forecast suggested just 39.9 kWh. Actual production ended up at 50.0 kWh. On their own, both models were off the mark, one overshooting and the other undershooting. The Hybrid forecast, however, landed at 52.6 kWh, much closer to reality. By letting the two approaches balance each other out, the Hybrid provides a more reliable day-to-day guide, especially when weather conditions deviate from the norm.
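The blend itself is simple enough to verify by hand. A tiny sketch, with the function name and signature as my own shorthand:

```python
def hybrid_forecast(ai_kwh: float, gti_kwh: float, ai_weight: float = 0.7) -> float:
    """Weighted 70/30 blend of the AI and GTI daily forecasts."""
    return ai_weight * ai_kwh + (1.0 - ai_weight) * gti_kwh

# The day described above: AI 58.1 kWh, GTI 39.9 kWh, actual 50.0 kWh.
print(hybrid_forecast(58.1, 39.9))  # 52.64, reported as 52.6 kWh
```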