A New Milestone: 1,405 Data Entries and My Best Solar Forecasting Model Yet
Today marks a significant milestone in my solar forecasting project: the energy_weather_dataset has grown to 1,405 entries, each one a daily snapshot combining weather conditions with actual solar energy production.
This growing archive not only deepens the historical view of solar performance but also powers increasingly precise forecasts from tree-based ensemble models.
Model Evaluation: Random Forest vs. Gradient Boosting
In the latest round of training and evaluation, two models were put to the test:
| Model             | R² Score | MAE (kWh) | Training Time (s) |
|-------------------|----------|-----------|-------------------|
| Random Forest     | 0.9209   | 4.22      | 1.67              |
| Gradient Boosting | 0.9601   | 2.79      | 0.75              |
The clear winner: Gradient Boosting. It not only has a higher R² (meaning it explains more of the variance in the actual production values), but it also cuts mean absolute error by 1.43 kWh compared to Random Forest (from 4.22 to 2.79 kWh, roughly a third) and trains in less than half the time (0.75 vs. 1.67 seconds).
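A head-to-head comparison like this can be sketched with scikit-learn. The snippet below is a minimal illustration, not the project's actual training code: the data is synthetic, the six feature columns and the target formula are made up, and the hyperparameters are library defaults, so the printed numbers will not match the table above.

```python
import time

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: NOT the real energy_weather_dataset.
rng = np.random.default_rng(42)
n_rows = 1405
X = rng.uniform(0.0, 1.0, size=(n_rows, 6))  # six made-up weather features
# Hypothetical target: driven by the first two features plus noise.
y = 40.0 * X[:, 0] * (1.0 - 0.6 * X[:, 1]) + rng.normal(0.0, 2.0, n_rows)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

results = {}
for name, model in models.items():
    # Time only the fit step, mirroring the "Training Time" column.
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start

    # Score on the held-out 20% that the model never saw during training.
    predictions = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, predictions),
        "mae": mean_absolute_error(y_test, predictions),
        "train_seconds": train_seconds,
    }

for name, scores in results.items():
    print(
        f"{name}: R2={scores['r2']:.4f}  "
        f"MAE={scores['mae']:.2f}  train={scores['train_seconds']:.2f}s"
    )
```

Scoring both models on the same held-out split keeps the comparison fair. Training time depends heavily on hardware and hyperparameters, so it is best read as a relative measure.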
Why This Matters
Forecasting solar energy production isn't just about sunshine. My models take into account:
- Cloud cover
- Global radiation
- Max/min temperatures
- Wind speed
- Precipitation
By feeding these variables into a machine learning pipeline, I can predict daily production with a mean absolute error of 2.79 kWh (R² of 0.96 with Gradient Boosting). And because I retrain regularly on new data, the forecasts should keep improving as the archive grows.
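The retrain-on-new-data loop described above could look roughly like the following. This is a sketch under assumptions: the column names in `FEATURES` and `TARGET` are hypothetical (the real schema is not shown in this post), the helper `make_fake_days` only generates synthetic rows so the example runs, and retraining from scratch on the full archive is just one simple strategy.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical column names; the real dataset's schema may differ.
FEATURES = [
    "cloud_cover",
    "global_radiation",
    "temp_max",
    "temp_min",
    "wind_speed",
    "precipitation",
]
TARGET = "production_kwh"


def retrain(history: pd.DataFrame) -> GradientBoostingRegressor:
    """Fit a fresh model on every row collected so far."""
    model = GradientBoostingRegressor(random_state=0)
    model.fit(history[FEATURES], history[TARGET])
    return model


def make_fake_days(n_days: int, seed: int) -> pd.DataFrame:
    """Generate synthetic rows purely so the sketch runs end to end."""
    rng = np.random.default_rng(seed)
    frame = pd.DataFrame(
        rng.uniform(0.0, 1.0, size=(n_days, len(FEATURES))), columns=FEATURES
    )
    # Made-up relationship between weather and production.
    frame[TARGET] = (
        40.0 * frame["global_radiation"] * (1.0 - 0.5 * frame["cloud_cover"])
    )
    return frame


history = make_fake_days(200, seed=1)
model = retrain(history)

# New daily snapshots arrive: append them and retrain on the larger archive.
new_days = make_fake_days(30, seed=2)
history = pd.concat([history, new_days], ignore_index=True)
model = retrain(history)

# Forecast one day from its (synthetic) weather features.
tomorrow = make_fake_days(1, seed=3)[FEATURES]
forecast_kwh = float(model.predict(tomorrow)[0])
print(f"rows in archive: {len(history)}, forecast: {forecast_kwh:.1f} kWh")
```

With a dataset of this size, a full refit takes about a second, so retraining from scratch is cheap and avoids the bookkeeping of incremental updates.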
What's Next?
Here’s what’s in the pipeline:
- Hourly forecasts: Extending the model from daily to intra-day granularity
- Model performance tracking: Logging accuracy over time to evaluate seasonal effects
- Open-source release: Making the dataset and modeling pipeline available to the community
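For the performance-tracking item, one possible approach is to log each day's prediction next to the measured value and aggregate the absolute error by month. The sketch below assumes a hypothetical log layout (`date`, `actual_kwh`, `predicted_kwh`) and fills it with synthetic numbers; it illustrates the idea and is not a planned implementation.

```python
import numpy as np
import pandas as pd

# Synthetic forecast log; real entries would come from the daily predictions.
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=366, freq="D")
log = pd.DataFrame(
    {"date": dates, "actual_kwh": rng.uniform(5.0, 40.0, len(dates))}
)
log["predicted_kwh"] = log["actual_kwh"] + rng.normal(0.0, 3.0, len(dates))

# Absolute error per day, then averaged per calendar month.
log["abs_error_kwh"] = (log["predicted_kwh"] - log["actual_kwh"]).abs()
monthly_mae = log.groupby(log["date"].dt.to_period("M"))["abs_error_kwh"].mean()

print(monthly_mae.round(2))
```

On real data, a monthly MAE series like this would make seasonal effects visible, for example if errors grow in months with more variable cloud cover.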
I'm not just building a forecast — I’m building a smarter home energy system, one prediction at a time.
Whether you’re managing your own solar installation or just curious about AI in sustainability, stay tuned. I’m only just getting started.