🌸叶

Subset of house prices and images socal

united states, housing, tabular, image


Sold: 0
Size: 72.62MB

Data ID: D17222538073697357

Release date: 2024/07/29

The following is the data verification report the seller chose to provide:

Data description

Context

Wanting to further explore hybrid CNN + MLP models for predicting housing prices, I cleaned (to a reasonable degree) and took a subset of the SoCal housing data made available by ted8080 at:

https://www.kaggle.com/ted8080/house-prices-and-images-socal/

His (ted8080's) CSV cleaning code proved helpful, as did the large list he had compiled of images that needed to be cleaned. I wrote my own image-cleaning code in Python, but used his list of bad images to decide which files my code should clean.
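The image-cleaning code itself is not included here. As a minimal sketch of how a list of bad images can drive the cleanup, assuming the images are stored as `<id>.jpg` files in one directory and the bad-image list is a text file with one file name per line (both assumptions, as are the hypothetical helper names `load_bad_list` and `remove_bad_images`):

```python
from __future__ import annotations

from pathlib import Path


def load_bad_list(list_path: Path) -> set[str]:
    """Read a bad-image list, assumed to hold one file name per line."""
    lines = list_path.read_text().splitlines()
    # Skip blank lines and stray whitespace.
    return {line.strip() for line in lines if line.strip()}


def remove_bad_images(image_dir: Path, bad_names: set[str], dry_run: bool = True) -> list[str]:
    """Delete images whose names appear on the bad list.

    With dry_run=True (the default) nothing is deleted; the function only
    reports which files would be removed.
    """
    removed = []
    for path in sorted(image_dir.glob("*.jpg")):
        if path.name in bad_names:
            removed.append(path.name)
            if not dry_run:
                path.unlink()
    return removed
```

Running it once with `dry_run=True` and checking the returned names against the list before deleting anything is a cheap safeguard against a mismatched naming scheme.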

Content

The data set contains images and numeric data (including prices) for 2,000 training and 1,000 validation samples. This split is not ideal: the validation set holds 1/3 of the data, which is larger than usual, but it was made larger on purpose because the data set itself is deliberately small. Even so, training on it shows clear learning and produces not unreasonable price predictions. Because the data set is small, k-fold cross-validation is also a practical option.
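How the author assigned houses to the training and validation sets is not stated. A minimal sketch, assuming a seeded random shuffle, of both the 2,000 / 1,000 hold-out split and a k-fold alternative (the helper names `train_val_split` and `k_fold`, the seed, and `k=5` are illustrative choices):

```python
import random


def train_val_split(ids, n_train=2000, seed=0):
    """Shuffle IDs with a fixed seed, then cut them into train/validation lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def k_fold(ids, k=5, seed=0):
    """Yield (train, val) pairs; every ID lands in exactly one validation fold."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    # Striding deals IDs round-robin, so fold sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]
```

Here `ids` would be the 3,000 house IDs actually present, not `range(3000)`, since the IDs are not contiguous. With k-fold, every house serves as validation data exactly once, which makes better use of a small data set than a single fixed split.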

NOTE: After cleaning and subsetting, the images retain their original numeric labels, so the file names may range from 0 to 3000+ even though there are only 2,000 training and 1,000 validation images. The same applies to the accompanying CSV file of additional features: its house IDs range from 0 to 15000+, but only 3,000 of them are actually used in the code. As a result, this labeling throws off the column statistics shown in Kaggle's file previewer.
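Because the labels are not contiguous, images and CSV rows have to be matched by ID rather than by position. A minimal sketch, assuming image files are named `<id>.jpg` and the CSV stores the ID in a column called `image_id` (the column name and the helper `align_images_with_csv` are assumptions for illustration):

```python
import csv
from pathlib import Path


def align_images_with_csv(image_names, csv_file, id_column="image_id"):
    """Return (image_name, row) pairs matched on the original numeric ID.

    The IDs have gaps after subsetting, so they are used as dictionary keys
    instead of list positions. CSV rows without a matching image are ignored,
    as are images without a matching row.
    """
    rows = {int(row[id_column]): row for row in csv.DictReader(csv_file)}
    pairs = []
    # Sort numerically so "42.jpg" comes before "3001.jpg".
    for name in sorted(image_names, key=lambda n: int(Path(n).stem)):
        house_id = int(Path(name).stem)
        if house_id in rows:
            pairs.append((name, rows[house_id]))
    return pairs
```

Enumerating the returned pairs then gives contiguous positions 0 to N-1 that can index model inputs, while the original IDs stay available for tracing a prediction back to its house.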

Acknowledgements

Again, the original data are made available by ted8080 at the Kaggle URL above. I also acknowledge the helpful content on the PyTorch forum, which included a good discussion of hybrid CNN + MLP architectures that proved useful for this work.
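The model used for this data set is not included. For orientation, here is a minimal PyTorch sketch of one common hybrid layout: a small CNN embeds the house image, an MLP embeds the numeric features, and the two embeddings are concatenated before a regression head. The class name `HybridPriceModel`, all layer sizes, and the placeholder of 4 numeric features are assumptions, not the author's architecture.

```python
import torch
from torch import nn


class HybridPriceModel(nn.Module):
    """CNN branch for the image + MLP branch for numeric features, fused by concatenation."""

    def __init__(self, num_tabular: int, img_channels: int = 3):
        super().__init__()
        # Small CNN; adaptive pooling yields a fixed 32-dim vector for any image size.
        self.cnn = nn.Sequential(
            nn.Conv2d(img_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (batch, 32, 1, 1)
            nn.Flatten(),             # -> (batch, 32)
        )
        # MLP branch for the numeric (tabular) features.
        self.mlp = nn.Sequential(nn.Linear(num_tabular, 16), nn.ReLU())
        # Head maps the fused 48-dim embedding to a single price estimate.
        self.head = nn.Sequential(nn.Linear(32 + 16, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, image: torch.Tensor, tabular: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.cnn(image), self.mlp(tabular)], dim=1)
        return self.head(fused).squeeze(1)  # -> (batch,)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = HybridPriceModel(num_tabular=4)
    images = torch.randn(8, 3, 64, 64)  # random stand-ins for real photos
    tabular = torch.randn(8, 4)
    prices = torch.randn(8)             # assumes prices are standardized
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # One training step with mean squared error.
    loss = nn.functional.mse_loss(model(images, tabular), prices)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(model(images, tabular).shape)  # torch.Size([8])
```

Keeping both branches small suits the stated goal of getting the most out of a small data set and model; with only 2,000 training samples, a larger CNN would overfit quickly.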

Inspiration

More detailed housing data sets exist, with many more features/variables, but the purpose of this work was to extract as much performance as possible from a small data set and a small model.
