
Amazon Best Sellers

music, business, education, computer science, food


Sold: 0
Size: 62.61 MB

Data ID: D17220735014219155

Published: 2024/07/27

The following data verification report was provided by the seller:

Data Description

The reason behind these datasets.

When I teach a class, I always try to use real-world data from my other jobs (censored, of course), so that my students can see how the tools from class are used outside the classroom. We have all been in the Statistics class where we had to count our ages, weights, or heights, but how many times did the teacher give us a real-world example? What if we change the game with real information? Tell your students that they can plan the menu for their restaurant, prepare a speech for an audience, or choose the launch day of their product with the same tools. As a way of thanking and giving back to all my teachers and mentors, who gave me everything I needed to grow and more, I leave these datasets here.

About the info.

  1. Here you will find 22 categories for Mexico and 25 for Brazil, in 3 different formats: CSV, Excel, and Parquet (choose your favorite). Each dataset contains the Top 50 Best Sellers for a given hour (the list changes every hour, according to the page footer); all of it was obtained with a scraper built with Python’s Beautiful Soup library. If you want to know the original names of the categories, here is the landing page:
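Since the three formats are offered as alternatives, the choice mostly affects how the files are read. Below is a minimal sketch that picks a pandas reader from the file extension. It assumes pandas is installed (Parquet also needs pyarrow or fastparquet, and .xlsx files need openpyxl). The helper name `load_category` and any file names are hypothetical, not part of the dataset.

```python
from pathlib import Path
from typing import Union

import pandas as pd


def load_category(path: Union[str, Path]) -> pd.DataFrame:
    """Load one category file, choosing the pandas reader from its extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in (".xlsx", ".xls"):
        return pd.read_excel(path)  # needs openpyxl (.xlsx) or xlrd (.xls)
    if suffix == ".parquet":
        return pd.read_parquet(path)  # needs pyarrow or fastparquet
    raise ValueError(f"Unsupported file format: {suffix}")
```

Usage would look like `df = load_category("some_category.parquet")`, where the file name is a placeholder for whichever file you downloaded.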

Types and context.

The datasets contain:

  1. Time: (“%Y-%m-%d %H-%M”) date and time when the request was made.
  2. Rank: (Float) Rank position.
  3. Product Names: (String) The complete name that the seller gives the product.
  4. Stars: (Float) Average star rating across all reviews of the product.
  5. Reviews: (Int) Total number of reviews since the product went on sale at Amazon.
  6. Authors/Company: (String) For books, Kindle, and music, this column contains the author of the work; for the other categories (where used), it contains the company that made the product.
  7. Edition/Console: (String) For video games, this column contains the console on which the game can be played or the product can be used (e.g., headsets, keyboards, or controllers); for books, the edition (hardcover, digital, Kindle); and for music, the format (CD, vinyl, or box set).
  8. Pricestdormin and Maxprices: (Float) If there is no max price (next column), Pricestdormin simply contains the price of the product; otherwise, it contains the lowest price of the range given by Amazon’s algorithm, and Maxprices contains the highest.
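The column descriptions above translate into a small parsing step. Below is a stdlib-only sketch for one row read with `csv.DictReader`. It assumes the headers are spelled exactly as listed (`Time`, `Rank`, `Stars`, `Reviews`, `Pricestdormin`, `Maxprices`), which may not match the real files. The function names `parse_row` and `_to_float` are hypothetical. The code reads `Time` with the documented `%Y-%m-%d %H-%M` format. It also follows the price rule above: when `Maxprices` is empty, the low and high ends of the price are the same value.

```python
from datetime import datetime
from typing import Optional


def _to_float(value: str) -> Optional[float]:
    """Return None for an empty cell, otherwise the cell as a float."""
    value = value.strip()
    return float(value) if value else None


def parse_row(row: dict) -> dict:
    """Convert one raw CSV row (all strings) into typed values."""
    min_price = _to_float(row["Pricestdormin"])
    max_price = _to_float(row["Maxprices"])
    reviews = _to_float(row["Reviews"])
    return {
        # Hour and minute are separated by a dash, e.g. "2020-08-01 13-00"
        "time": datetime.strptime(row["Time"], "%Y-%m-%d %H-%M"),
        "rank": int(float(row["Rank"])),  # stored as a float, e.g. "1.0"
        "stars": _to_float(row["Stars"]),
        "reviews": int(reviews) if reviews is not None else None,
        # No max price: Pricestdormin is the price itself.
        # Otherwise: Pricestdormin is the low end of the range.
        "price_min": min_price,
        "price_max": max_price if max_price is not None else min_price,
    }
```

This sketch does not handle the Brazil price issue described under Challenges. Values written with a different thousands or decimal convention will still parse, but to the wrong number.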

Challenges

There are also some “issues” left in on purpose for beginners (like me):

  • Some empty rows
  • In one country, values are missing from only one column, and only for a short period.
  • (Hint) The same country has an issue with the prices in the same period.
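A reasonable first step with these issues is simply to count them. The sketch below uses only the `csv` module to count fully empty rows and empty cells per column in one file. The function name `audit_csv` is hypothetical. The code assumes a missing value shows up as an empty cell, which may not match every export. Note that `csv.DictReader` silently skips truly blank lines, so only rows like `,,,` are counted as empty.

```python
import csv
from collections import Counter
from typing import Iterable, Tuple


def audit_csv(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Return (number of all-empty rows, Counter of empty cells per column)."""
    reader = csv.DictReader(lines)
    empty_rows = 0
    missing: Counter = Counter()
    for row in reader:
        # Short rows give None for absent fields; overflow cells sit under key None.
        cells = {k: (v or "").strip() for k, v in row.items() if k is not None}
        if not any(cells.values()):
            empty_rows += 1
            continue  # don't count a blank row against every column
        for column, value in cells.items():
            if not value:
                missing[column] += 1
    return empty_rows, missing
```

For a real file, pass an open handle, e.g. `with open(path, newline="", encoding="utf-8") as f: audit_csv(f)`. The encoding is an assumption.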

The missing data is not an error: it was deleted on purpose for students who are learning how to fill in missing values. Trust me, it can be filled in with simple data exploration. The price issue was left in the same way. The idea came from a Kaggle dataset of beer sales with a similar problem.
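One possible exploration-based fill, not necessarily the author’s intended one, rests on an observation. A product that stays in the Top 50 appears in many hourly snapshots, and its stars and review count change slowly. So a gap can be borrowed from the same product’s neighbouring observations. The sketch below assumes several things: rows are dicts already sorted by `Time`, missing cells are `None`, and products are matched by the `Product Names` column. The function name `fill_by_product` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def fill_by_product(rows: List[dict],
                    columns: Sequence[str] = ("Stars", "Reviews"),
                    key: str = "Product Names") -> List[dict]:
    """Fill None values in `columns` from the same product's other rows.

    Rows must be sorted by time. Each gap takes the most recent earlier value
    for that product (forward fill); gaps before the first known value take
    the next later value (backward fill). Products never observed stay None.
    """
    filled = [dict(r) for r in rows]  # copy so the input is not modified
    by_product: Dict[str, List[dict]] = defaultdict(list)
    for r in filled:
        by_product[r[key]].append(r)
    for history in by_product.values():
        for col in columns:
            last = None
            for r in history:  # forward pass
                if r[col] is None:
                    r[col] = last
                else:
                    last = r[col]
            nxt = None
            for r in reversed(history):  # backward pass for leading gaps
                if r[col] is None:
                    r[col] = nxt
                else:
                    nxt = r[col]
    return filled
```

Review counts usually grow over time, so a forward fill slightly underestimates them. Interpolating between the values just before and after the gap is a natural refinement.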

Acknowledgments

I cannot thank Platzi enough for its Master program, or my coach Cesar for giving me all the challenges from Kaggle. Three months ago, I did not know a thing about Python or Kaggle, and now, look at this!

Tasks left

  • Design dashboards.
  • Choose a product and predict when it will leave the Top 50.
  • Scrape the Amazon product pages and obtain more information about the products.
  • Propose a hypothesis that explains why a product reaches the Top 50, and why it stays there as long as it does.
  • Which variable has the most influence within a category: price, stars, reviews, or a combination?
  • Add an API.
  • Compare the two countries’ markets.
  • (Spoiler) Fill in the stars and reviews for Brazil from 2020-08-01 to 2020-08-17.
  • (Spoiler) Correct the Brazil prices for the same period, 2020-08-01 to 2020-08-17. Hint: thousands separator = ‘.’, decimal separator = ‘,’
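The price hint describes the Brazilian number notation. In that notation, “1.299,90” means 1299.90. Below is a stdlib-only sketch of the conversion, plus a check for whether a `Time` value falls in the affected window. It assumes the affected prices can be read as text in that notation, and that the window 2020-08-01 to 2020-08-17 is inclusive. The names `parse_br_number` and `in_fix_window` are hypothetical.

```python
from datetime import date, datetime

# Affected period from the task list (assumed inclusive on both ends).
FIX_START = date(2020, 8, 1)
FIX_END = date(2020, 8, 17)


def parse_br_number(text: str) -> float:
    """Parse a number written with '.' for thousands and ',' for decimals."""
    # Drop thousands separators first, then turn the decimal comma into a dot.
    return float(text.strip().replace(".", "").replace(",", "."))


def in_fix_window(time_text: str) -> bool:
    """True if a Time value ("%Y-%m-%d %H-%M") falls inside the affected period."""
    day = datetime.strptime(time_text, "%Y-%m-%d %H-%M").date()
    return FIX_START <= day <= FIX_END
```

If the whole raw file uses this notation, pandas can do the same conversion at load time with `pd.read_csv(path, thousands=".", decimal=",")`. Because only one period is affected, it is probably safer to convert just the rows for which `in_fix_window` is true.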
