A New Milestone: 1405 Data Entries and My Best Solar Forecasting Model Yet

Today marks a significant milestone in my solar forecasting project: the energy_weather_dataset has grown to 1,405 entries, each a daily snapshot that pairs the day's weather conditions with the solar energy actually produced.

This growing archive not only extends the historical record of solar performance but also supports increasingly accurate forecasts from tree-based machine learning models.

Model Evaluation: Random Forest vs. Gradient Boosting

In the latest round of training and evaluation, two models were put to the test:

| Model             | R² Score | MAE (kWh) | Training Time (s) |
|-------------------|----------|-----------|-------------------|
| Random Forest     | 0.9209   | 4.22      | 1.67              |
| Gradient Boosting | 0.9601   | 2.79      | 0.75              |

In this round, the clear winner is Gradient Boosting. It achieves a higher R² (0.9601 vs. 0.9209, meaning it explains more of the variance in actual production), cuts the mean absolute error by about 1.4 kWh (from 4.22 to 2.79 kWh, roughly a third lower), and trains in less than half the time.
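For reference, both metrics in the table can be computed from paired lists of actual and predicted daily production. The sketch below uses plain Python rather than any particular library; the function names are illustrative, and the last lines simply reproduce the arithmetic behind the comparison using the table's MAE values.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values (same unit as the data, here kWh)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)            # total variation around the mean
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are identical")
    return 1.0 - ss_res / ss_tot


# Arithmetic behind the comparison, using the MAE values from the table.
mae_rf, mae_gb = 4.22, 2.79
absolute_gain = mae_rf - mae_gb         # ≈ 1.43 kWh
relative_gain = absolute_gain / mae_rf  # ≈ 0.34, i.e. about a third lower error
```

R² rewards capturing day-to-day variation, while MAE stays directly interpretable in kWh, which is why it helps to report both.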

Why This Matters

Forecasting solar energy production isn't just about sunshine. My models take into account:

Cloud cover
Global radiation
Max/min temperatures
Wind speed
Precipitation
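Concretely, each daily entry can be thought of as one feature vector plus a target value. The sketch below is a hypothetical layout: the `DailyRecord` type, its field names, and the units are assumptions for illustration, not the actual schema of energy_weather_dataset.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DailyRecord:
    # Hypothetical schema: field names and units are illustrative assumptions.
    day: date
    cloud_cover_pct: float    # average cloud cover, percent
    global_radiation: float   # daily global radiation (unit assumed)
    temp_max_c: float         # maximum air temperature, °C
    temp_min_c: float         # minimum air temperature, °C
    wind_speed_ms: float      # mean wind speed, m/s
    precipitation_mm: float   # total precipitation, mm
    production_kwh: float     # measured solar production (the prediction target)


# Order of the input columns fed to the model.
FEATURES = ("cloud_cover_pct", "global_radiation", "temp_max_c",
            "temp_min_c", "wind_speed_ms", "precipitation_mm")


def to_xy(records):
    """Split records into a feature matrix X (list of rows) and a target vector y."""
    X = [[getattr(r, name) for name in FEATURES] for r in records]
    y = [r.production_kwh for r in records]
    return X, y
```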

By combining these variables in a machine learning pipeline, I can predict daily production with an R² of about 0.96 and a mean absolute error of 2.79 kWh. And because I retrain regularly as new data arrives, the model keeps learning from a longer and more varied history.
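To make the boosting idea concrete, here is a deliberately tiny from-scratch gradient boosting regressor for squared error, using depth-one trees (stumps) as weak learners. It is an illustrative sketch, not the implementation behind the results above (the post does not specify one); the class name and the `n_rounds` and `learning_rate` parameters are assumptions. In this setup, retraining on new data simply means calling `fit` again on the enlarged dataset.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimizes squared error on the residuals."""
    best = None  # (sse, feature_index, threshold, left_value, right_value)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    return best


class TinyGradientBoosting:
    """Gradient boosting with squared loss: each stump is fit to the current residuals,
    which are the negative gradient of 0.5 * (y - prediction)^2."""

    def __init__(self, n_rounds=100, learning_rate=0.1):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate

    def fit(self, X, y):
        self.base = sum(y) / len(y)  # start from the mean prediction
        self.stumps = []
        predictions = [self.base] * len(y)
        for _ in range(self.n_rounds):
            residuals = [t - p for t, p in zip(y, predictions)]
            stump = fit_stump(X, residuals)
            if stump is None:  # no valid split, e.g. every feature is constant
                break
            _, j, threshold, left_value, right_value = stump
            self.stumps.append((j, threshold, left_value, right_value))
            # Move each prediction a small step toward its residual's group mean.
            predictions = [p + self.learning_rate * (left_value if row[j] <= threshold else right_value)
                           for p, row in zip(predictions, X)]
        return self

    def predict(self, X):
        out = []
        for row in X:
            value = self.base
            for j, threshold, left_value, right_value in self.stumps:
                value += self.learning_rate * (left_value if row[j] <= threshold else right_value)
            out.append(value)
        return out
```

Each round scans every feature and every candidate threshold, so this version is roughly quadratic in the number of rows and only meant to show the mechanics; practical implementations use sorted or histogram-based split search and deeper trees.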

What's Next?

Here’s what’s in the pipeline:

Hourly forecasts: Extending the model from daily to intra-day granularity
Model performance tracking: Logging accuracy over time to evaluate seasonal effects
Open-source release: Making the dataset and modeling pipeline available to the community
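As a sketch of what the planned performance tracking could look like, the snippet below groups logged absolute errors by meteorological season and reports a per-season MAE. The log format (a list of `(date, actual, predicted)` tuples), the function name, and the Northern Hemisphere season mapping are all hypothetical assumptions.

```python
from collections import defaultdict
from datetime import date

# Meteorological seasons for the Northern Hemisphere (an assumption about the site's location).
SEASON_BY_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}


def seasonal_mae(log):
    """log: iterable of (day, actual_kwh, predicted_kwh) tuples -> {season: MAE in kWh}."""
    errors = defaultdict(list)
    for day, actual, predicted in log:
        errors[SEASON_BY_MONTH[day.month]].append(abs(actual - predicted))
    return {season: sum(errs) / len(errs) for season, errs in errors.items()}
```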

I’m not just building a forecast; I’m building a smarter home energy system, one prediction at a time.

Whether you’re managing your own solar installation or just curious about AI in sustainability, stay tuned. I’m only just getting started.
