A New Milestone: 1,405 Data Entries and My Best Solar Forecasting Model Yet

Today marks a significant milestone in my solar forecasting project: the energy_weather_dataset has grown to 1,405 entries, each one a daily snapshot combining weather conditions with actual solar energy production.

This growing archive not only deepens the historical record of solar performance but also powers increasingly precise forecasts from ensemble machine learning models.

Model Evaluation: Random Forest vs. Gradient Boosting

In the latest round of training and evaluation, two models were put to the test:

| Model             | R² Score | MAE (kWh) | Training Time (s) |
|-------------------|----------|-----------|-------------------|
| Random Forest     | 0.9209   | 4.22      | 1.67              |
| Gradient Boosting | 0.9601   | 2.79      | 0.75              |

The clear winner: Gradient Boosting. It not only offers a higher R² (meaning it explains more of the variance in the actual production values), but it also cuts mean absolute error by 1.43 kWh compared to Random Forest (from 4.22 to 2.79 kWh) and trains in less than half the time.
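For readers less familiar with the two metrics, here is a minimal sketch of how R² and MAE are computed. The function names and the four sample values are made up for illustration; they are not taken from the energy_weather_dataset.

```python
def mae_kwh(actual, predicted):
    """Mean absolute error: the average size of the miss, in kWh."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """R²: 1 minus (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily production values in kWh (illustrative only).
actual = [30.0, 42.0, 18.0, 50.0]
predicted = [28.0, 45.0, 20.0, 47.0]

mae = mae_kwh(actual, predicted)
r2 = r_squared(actual, predicted)
print(f"MAE: {mae:.2f} kWh, R²: {r2:.4f}")  # MAE: 2.50 kWh, R²: 0.9558
```

An R² of 1.0 would mean the predictions match the actual values exactly, while 0.0 would mean the model does no better than always predicting the average day.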

Why This Matters

Forecasting solar energy production isn't just about sunshine. My models take into account:

By fusing these variables into a machine learning pipeline, I can predict daily production to within about 2.8 kWh on average with the best model. And because I retrain regularly on new data, the forecasts keep improving as the dataset grows.
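A single retraining pass could look roughly like the sketch below, assuming scikit-learn is the library in use (the post does not say). The data is synthetic: 1,405 rows with four made-up feature columns standing in for the real weather variables, so the printed numbers will not match the table above.

```python
import time

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption): 4 generic features
# in [0, 1] and a noisy, nonlinear "daily production" target in kWh.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(1405, 4))
y = (
    60.0 * X[:, 0] * (1.0 - 0.7 * X[:, 1])
    + 5.0 * X[:, 2]
    + rng.normal(0.0, 2.0, size=1405)
)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hyperparameters here are illustrative defaults, not the author's settings.
models = {
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start

    predictions = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, predictions),
        "mae_kwh": mean_absolute_error(y_test, predictions),
        "train_seconds": train_seconds,
    }

for name, metrics in results.items():
    print(
        f"{name}: R2={metrics['r2']:.4f}, "
        f"MAE={metrics['mae_kwh']:.2f} kWh, "
        f"train={metrics['train_seconds']:.2f} s"
    )
```

The random split keeps the sketch short. For daily data, a chronological split (train on older days, test on the most recent ones) is usually a more honest check of how the model will behave on tomorrow's forecast.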

What's Next?

Here’s what’s in the pipeline:

I'm not just building a forecast; I’m building a smarter home energy system, one prediction at a time.


Whether you’re managing your own solar installation or just curious about AI in sustainability, stay tuned. I’m only just getting started.