A New Milestone: 1,405 Data Entries and My Best Solar Forecasting Model Yet
Today marks a significant milestone in my solar forecasting project: the energy_weather_dataset has grown to 1,405 entries, each a daily snapshot combining weather conditions with actual solar energy production.
This growing archive not only deepens the historical view of solar performance but also powers increasingly precise forecasts from ensemble machine learning models.
Model Evaluation: Random Forest vs. Gradient Boosting
In the latest round of training and evaluation, two models were put to the test:
| Model             | R² Score | MAE (kWh) | Training Time (s) |
|-------------------|----------|-----------|-------------------|
| Random Forest     | 0.9209   | 4.22      | 1.67              |
| Gradient Boosting | 0.9601   | 2.79      | 0.75              |
The clear winner: Gradient Boosting. It achieves a higher R² (0.9601 vs. 0.9209, meaning it explains more of the variance in actual production), cuts the mean absolute error by 1.43 kWh (from 4.22 to 2.79 kWh), and trains in less than half the time (0.75 vs. 1.67 seconds).
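To make the two metrics in the comparison concrete, here is a minimal sketch of how R² and MAE are computed from actual and predicted daily production. The sample values are made up for illustration and are not taken from the energy_weather_dataset.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values (kWh)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Share of the variance in `actual` that the predictions explain."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily production values in kWh (illustrative only).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

mae = mean_absolute_error(actual, predicted)
r2 = r2_score(actual, predicted)
print(f"MAE: {mae:.2f} kWh, R^2: {r2:.3f}")  # MAE: 2.00 kWh, R^2: 0.964
```

An R² of 1.0 would mean the predictions match every actual value exactly, while an MAE of 2.00 kWh means the forecast is off by 2 kWh on a typical day.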
Why This Matters
Forecasting solar energy production isn't just about sunshine. My models take into account:
- Cloud cover
- Global radiation
- Max/min temperatures
- Wind speed
- Precipitation
By feeding these variables into a machine learning pipeline, I can predict daily production to within a few kWh on average (an MAE of 2.79 kWh for the best model). And because I retrain regularly on new data, the forecasts should keep improving as the dataset grows.
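A rough sketch of what such a training and evaluation run could look like with scikit-learn is shown below. Everything here is an assumption for illustration: the column names are hypothetical, the data is synthetic rather than the real energy_weather_dataset, and both models use default hyperparameters, so the scores will not match the table above.

```python
import time

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical column names for the weather inputs listed above.
FEATURES = ["cloud_cover", "global_radiation", "temp_max", "temp_min",
            "wind_speed", "precipitation"]

# Synthetic stand-in data: 1,405 rows, one per day. Not real measurements.
rng = np.random.default_rng(42)
X = rng.random((1405, len(FEATURES)))
# Toy target: production rises with radiation and falls with cloud cover.
y = 40 * X[:, 1] - 15 * X[:, 0] + rng.normal(0, 2, size=1405)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)


def evaluate(model):
    """Fit a model and return its R^2, MAE, and training time in seconds."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    predictions = model.predict(X_test)
    return {
        "r2": r2_score(y_test, predictions),
        "mae": mean_absolute_error(y_test, predictions),
        "train_seconds": elapsed,
    }


results = {
    "Random Forest": evaluate(RandomForestRegressor(random_state=42)),
    "Gradient Boosting": evaluate(GradientBoostingRegressor(random_state=42)),
}
for name, scores in results.items():
    print(name, scores)
```

In a setup like this, retraining simply means rerunning the script after new daily rows have been appended, so each run sees the full history to date.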
What's Next?
Here’s what’s in the pipeline:
- Hourly forecasts: Extending the model from daily to intra-day granularity
- Model performance tracking: Logging accuracy over time to evaluate seasonal effects
- Open-source release: Making the dataset and modeling pipeline available to the community
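For the performance tracking item, one possible starting point is to log each day's actual and predicted production and average the absolute error per month. The sketch below is only an assumption about how that could look: the log structure and the `monthly_mae` helper are hypothetical, and the values are invented.

```python
from collections import defaultdict
from datetime import date

# Hypothetical log of (day, actual kWh, predicted kWh); values are made up.
prediction_log = [
    (date(2024, 1, 10), 5.0, 7.0),
    (date(2024, 1, 20), 6.0, 5.0),
    (date(2024, 7, 5), 42.0, 40.0),
    (date(2024, 7, 15), 38.0, 41.0),
]


def monthly_mae(log):
    """Group absolute errors by (year, month) and average each group."""
    errors = defaultdict(list)
    for day, actual, predicted in log:
        errors[(day.year, day.month)].append(abs(actual - predicted))
    return {month: sum(vals) / len(vals) for month, vals in sorted(errors.items())}


for (year, month), mae in monthly_mae(prediction_log).items():
    print(f"{year}-{month:02d}: MAE {mae:.2f} kWh")
```

Comparing the monthly figures across a full year would show whether the model struggles more in particular seasons.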
I'm not just building a forecast; I'm building a smarter home energy system, one prediction at a time.
Whether you’re managing your own solar installation or just curious about AI in sustainability, stay tuned. I’m only just getting started.