The following is the data validation report that the seller has chosen to provide:
Data Description
Context
This project is my first database creation. Using real-life listings from TrueCar.com, scraped and posted publicly by another Kaggle user, I set out on my own to create, preprocess, and scrutinize the data. I first built a schema to structure a database in PostgreSQL 13 and ran several queries based on self-designated questions. Using Jupyter Notebook, I then ran the data through Python's pandas and scikit-learn packages for basic regression analysis. Finally, I created a dashboard in Tableau Public for helpful visualizations.
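As a rough illustration of what a basic regression on this kind of data involves, the sketch below fits a straight line of price against mileage. It is an assumption-laden stand-in: the text does not say which variables were regressed, the rows are made up rather than taken from the dataset, and the fit uses a hand-written least-squares formula in place of scikit-learn so that it runs without extra packages. The name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y cross-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up rows for illustration only; these are NOT values from the dataset.
mileage = [10_000, 20_000, 30_000, 40_000]
price = [29_000, 28_000, 27_000, 26_000]

slope, intercept = fit_line(mileage, price)
print(f"price is roughly {intercept:.0f} + ({slope:.2f} * mileage)")
```

With real listings the relationship would be noisier, and year, make, and model would likely matter as well, which is where a library such as scikit-learn becomes more convenient than a hand-rolled fit.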
Content
The dataset has the same columns as its original, plus one added column: Region. The original columns are id, price, year, mileage, city, state, vin, make, and model. Adding the Region column was a self-assigned SQL task: after the original file was loaded into the database, I created a new table, "Regions". This data is used to visualize sales across six regions of the U.S.: Pacific, Rockies, Southwest, Midwest, Southeast, and Northeast. City and State were also combined in a new column so that each city can be identified uniquely when it shares its name with others (e.g. Pasadena, Arlington).
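The two derived columns can be sketched in Python as follows. This is a minimal illustration, not the SQL actually used: the state-to-region assignments in `STATE_TO_REGION` are assumed (the text names the six regions but not their member states), the lookup covers only a handful of states, and the names `add_region_and_city_state`, `region`, and `city_state` are hypothetical.

```python
# Hypothetical, partial lookup: one example state per region.
# The real "Regions" table is not shown in the text, so these pairings are assumed.
STATE_TO_REGION = {
    "CA": "Pacific",
    "CO": "Rockies",
    "TX": "Southwest",
    "IL": "Midwest",
    "GA": "Southeast",
    "NY": "Northeast",
}


def add_region_and_city_state(rows):
    """Return copies of the rows with 'region' and 'city_state' keys added."""
    enriched_rows = []
    for row in rows:
        state = row["state"].strip().upper()
        enriched = dict(row)  # copy so the input rows are left unchanged
        enriched["region"] = STATE_TO_REGION.get(state, "Unknown")
        # Combining city and state keeps same-named cities apart.
        enriched["city_state"] = f"{row['city'].strip()}, {state}"
        enriched_rows.append(enriched)
    return enriched_rows


# Two listings in cities that share a name.
listings = [
    {"city": "Pasadena", "state": "CA"},
    {"city": "Pasadena", "state": "TX"},
]
result = add_region_and_city_state(listings)
for row in result:
    print(row["city_state"], "->", row["region"])
```

Grouping by the combined column rather than by city alone keeps the two Pasadenas as separate entries in any per-city summary.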
PostgreSQL | See my Database Creation Notes here. Python | See my notebook for performing simple analysis. Tableau | A dashboard can be found in my Tableau Public profile.
Acknowledgements
The dataset utilizes a .csv file extracted from www.TrueCar.com, scraped by Kaggle user Evan Payne (https://www.kaggle.com/jpayne/852k-used-car-listings/data?select=tc20171021.csv).
