The following is the data validation report that the seller chose to provide:
Data description
Context
These datasets are framed on predicting short-term electricity load, a problem known in the research field as short-term load forecasting (STLF). They address the STLF problem for the Panama power system, with a forecasting horizon of one week in hourly steps, for a total of 168 hours. The datasets are useful for training and testing forecasting models and for comparing their results with the power system operator's official forecast (take a look at real-time electricity load). They include historical load, a broad set of weather variables, holidays, and historical weekly load forecast features. More information on the context of these datasets, a literature review of forecasting techniques suitable for them, and the results of testing a set of machine learning models are available in the article Short-Term Electricity Load Forecasting with Machine Learning (Aguilar Madrid, E.; Antonio, N. Short-Term Electricity Load Forecasting with Machine Learning. Information 2021, 12, 50. https://doi.org/10.3390/info12020050).
Objectives
The main objectives around these datasets are:
- Evaluate the power system operator's official forecasts (weekly pre-dispatch forecast) against the real load, on a weekly basis.
- Develop, train, and test forecasting models to improve on the operator's official weekly forecasts (168 hours), in different scenarios.
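As a rough illustration of the first objective, the sketch below scores one week of the official forecast against the real load using the mean absolute percentage error (MAPE). The choice of MAPE, the function name `weekly_mape`, and the toy values are assumptions made for illustration; the dataset description does not prescribe a metric.

```python
HORIZON_HOURS = 168  # one week in hourly steps, as stated in the description


def weekly_mape(actual, forecast):
    """Return the MAPE (in percent) between two equally long hourly series.

    Assumption: MAPE is only an example metric here, and every actual
    load value must be non-zero.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if not actual:
        raise ValueError("series must not be empty")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Toy values, not taken from the dataset: a flat load with a 5% over-forecast.
actual_week = [1000.0] * HORIZON_HOURS
official_week = [1050.0] * HORIZON_HOURS
error = weekly_mape(actual_week, official_week)
```

The same function can be applied to a model's 168-hour forecast for the same week, so that both errors are computed over an identical window.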
Considerations to compare results
The following considerations should be observed when comparing forecasting results with the weekly pre-dispatch forecast:
- Saturday is the first day of each weekly forecast; consequently, Friday is the last day.
- To number the weeks, the first full week starting on a Saturday should be considered the first week of the year.
- A 72-hour gap of unseen records should be left before the first day to forecast. In other words, the forecast for the next week should use records only up to the last hour of each Tuesday.
- Make sure to train and test keeping the chronological order of records.
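The considerations above can be sketched with the standard library as follows. All function names are hypothetical. Two assumptions are made that the description does not state: each record's timestamp marks the start of its hour (so Tuesday's last hour is 23:00), and days falling before the first Saturday of a year are counted in the last week of the previous year.

```python
from datetime import date, datetime, timedelta

SATURDAY = 5         # date.weekday(): Monday is 0, so Saturday is 5
HORIZON_HOURS = 168  # one week of hourly steps
GAP_HOURS = 72       # unseen records before the first forecast hour


def week_start(day):
    """Return the Saturday that opens the forecast week containing `day`."""
    return day - timedelta(days=(day.weekday() - SATURDAY) % 7)


def first_week_start(year):
    """Return the first Saturday of `year`, i.e. the start of week 1."""
    jan1 = date(year, 1, 1)
    return jan1 + timedelta(days=(SATURDAY - jan1.weekday()) % 7)


def week_number(day):
    """Return (year, week), using the year of the Saturday opening the week."""
    start = week_start(day)
    return start.year, (start - first_week_start(start.year)).days // 7 + 1


def forecast_window(start):
    """Return (last_train_hour, first_forecast_hour, last_forecast_hour)."""
    if start.weekday() != SATURDAY:
        raise ValueError("a forecast week must start on a Saturday")
    first = datetime.combine(start, datetime.min.time())
    # Skip the 72-hour gap, then step back one more hour to Tuesday 23:00.
    last_train = first - timedelta(hours=GAP_HOURS + 1)
    last = first + timedelta(hours=HORIZON_HOURS - 1)
    return last_train, first, last


def chronological_split(records, start):
    """Split (timestamp, value) pairs into train and test, in time order."""
    last_train, first, last = forecast_window(start)
    ordered = sorted(records, key=lambda r: r[0])
    train = [r for r in ordered if r[0] <= last_train]
    test = [r for r in ordered if first <= r[0] <= last]
    return train, test
```

For the week starting on Saturday 2015-01-10, `forecast_window` gives Tuesday 2015-01-06 23:00 as the last training hour and Friday 2015-01-16 23:00 as the last forecast hour; the records from Wednesday to Friday in between form the 72-hour gap and are used in neither set.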
Data sources
Data sources provide hourly records, from January 2015 until June 2020. The data composition is the following:
- Historical electricity load, available in daily post-dispatch reports from the grid operator (ETESA, CND).
- Historical weekly forecasts, available in weekly pre-dispatch reports, also from ETESA, CND.
- Calendar information related to school periods, from Panama's Ministry of Education, published in the official gazette.
- Calendar information related to holidays, from the "When on Earth?" website.
- Weather variables, such as temperature, relative humidity, precipitation, and wind speed, for three main cities in Panama, from Earthdata.
The original data sources provide the post-dispatch electricity load in individual Excel files on a daily basis and the weekly pre-dispatch electricity load forecast in individual Excel files on a weekly basis, both with hourly granularity. Holiday and school period data is scattered across websites and PDF files. Weather data is available in daily NetCDF files.
Datasets
For simplicity, the published datasets are already pre-processed by merging all data sources on the date-time index:
- A CSV file containing all records in a single continuous dataset with all variables.
- A CSV file containing the load forecast from weekly pre-dispatch reports.
- Two Excel files containing suggested regressors and 14 pairs of training/testing datasets as described in the PDF file.
These 14 pairs of training/testing datasets are selected according to these testing criteria:
- A testing week for each month before the lockdown due to COVID-19.
- Select testing weeks containing holidays.
- Plus, two testing weeks during the lockdown.
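The merge on the date-time index mentioned for the published datasets can be pictured with the minimal sketch below. It is not the authors' pre-processing code: the function name `merge_on_timestamp`, the column names, the toy values, and the choice of an inner join (keeping only timestamps present in every source) are all assumptions for illustration.

```python
from datetime import datetime


def merge_on_timestamp(*sources):
    """Inner-join sources keyed by timestamp.

    Each source maps a timestamp to a dict of column name -> value.
    Assumption: only timestamps present in every source are kept.
    """
    if not sources:
        raise ValueError("at least one source is required")
    common = set(sources[0]).intersection(*sources[1:])
    merged = {}
    for ts in sorted(common):
        row = {}
        for src in sources:
            row.update(src[ts])
        merged[ts] = row
    return merged


# Toy values with hypothetical column names, not taken from the dataset.
load = {
    datetime(2015, 1, 3, 0): {"load": 1.0},
    datetime(2015, 1, 3, 1): {"load": 2.0},
}
weather = {
    datetime(2015, 1, 3, 1): {"temperature": 25.0},
}
merged = merge_on_timestamp(load, weather)
```

Sorting the shared timestamps keeps the merged records in chronological order, which matters for the train/test considerations listed earlier.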
Less pre-processed data
- Less pre-processed data regarding these datasets can be found in this data repository.
