麻酱

Anime Images Dataset

Tags: classification, deep learning, anime and manga, image style transfer, segmentation

Size: 868.32 MB

Data ID: D17171363569827351

Published: 2024/05/31

Context

This dataset contains images for 231 different anime, with approximately 380 images per anime. Please note that you might need to clean the image directories a bit, since they may contain merchandise and live-action photos in addition to frames and artwork from the anime itself.
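Before cleaning, it can help to see how many images each anime actually has. The following is a minimal sketch that assumes the images are stored as one subfolder per anime under a root folder; that layout, the function name `count_images_per_anime`, and the list of file extensions are assumptions for illustration, not something the dataset description confirms.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the folders contain.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def count_images_per_anime(root):
    """Return {subfolder name: number of image files} for each subfolder of root.

    Assumes one subfolder per anime (hypothetical layout).
    """
    counts = {}
    for folder in Path(root).iterdir():
        if not folder.is_dir():
            continue
        counts[folder.name] = sum(
            1
            for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `count_images_per_anime("anime_images")` on the extracted dataset would then show which folders are far from the expected ~380 images and may deserve a closer look. Spotting merchandise or live-action photos still needs a manual pass or a separate classifier.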

Scripts

If you'd like to take a look at the scripts used to make this dataset, you can find them on this GitHub repo. Feel free to extend them, scrape your own images, and so on.

Inspiration

As a big anime fan, I found a lot of anime-related datasets on Kaggle. I was, however, disappointed to find no dataset containing anime-specific images for popular anime. Some other great datasets that I've been inspired by include:

Process

  1. You need a list of anime to scrape. You can either:
    • Make your own list. This is what I do in the directory called "scraped_anime_list".
    • Use someone else's list. This is what I do in the directories called "kaggle_anime_list" and "top_anime_list".
  2. To be honest, I wanted to make my own list. To do so, I used JikanPy, the Python wrapper for Jikan, the unofficial MAL (MyAnimeList) API, which works by scraping MAL.
  3. Anime on MAL have a unique identifier called the anime id; think of it as a unique number for each anime. The ids are supposed to be sequential, but there are a lot of gaps between one valid anime id and the next, which I discovered based on this post.
  4. These ids can go from 1 to 100,000 and maybe beyond. However, I decided to go through the anime ids one by one from 1 to 50,000 and retrieve the id, rank and anime_name. This is what you will find in the folder called "scraped_anime_list". Note that I prefer the English name of the anime if it exists; if it doesn't, I use the Japanese name. Please use this list to obtain the anime ids if you intend to scrape MAL yourself; it will save you a LOT of time.
  5. I thought that someone else might have gone through the same process and, voila, I found the MyAnimeList Dataset on Kaggle. I didn't want to wait for my scraper to finish, so I decided to use the "anime_cleaned.csv" version of that list. The lists from that dataset are what you find in the "kaggle_anime_list" folder.
  6. Cleaning anime names is a task in and of itself. Within the GitHub repo, refer to the file called "notes_and_todo.md" to see all the cleaning troubles. I tried my best to remove all:
    • Anime movies: you have, for instance, One Piece (the anime) as well as One Piece Movie 1, One Piece Movie 2, and so on.
    • Seasons: MAL is an anime ranker, so different seasons of the same anime can show up on the list with different ranks. I retain only the most basic anime name (for instance, just "Gintama" instead of "Gintama Season 4").
  7. Ultimately, I manually curated around 300 anime names, which came down to 231 after removing duplicates, since after the curation "Gintama" and "Gintama: Enchousen" would both be named "Gintama". The list with the duplicates still in it is what you find in the file called "UsableAnimeList.xlsx" within the "top_anime_list" folder.
  8. This list is then rid of the duplicates and used to scrape the image URLs for each anime, found in the folder called "anime_img_urls".
  9. These URLs are then used to scrape the anime images themselves, found in the folder called "anime_images".
  10. Also, the tags are only a guide; feel free to use this dataset for any Deep Learning task.
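
The movie, season and duplicate handling described in the steps above could be approximated along the following lines. This is only a rough sketch: the author's curation was manual, and the regular expressions and the function names `base_name` and `curate` are hypothetical, chosen to reproduce the "Gintama" and "One Piece" examples from the text.

```python
import re

# Hypothetical heuristics, not the author's actual rules.
MOVIE_RE = re.compile(r"\bMovie\b", re.IGNORECASE)
SEASON_RE = re.compile(
    r"\s+(?:Season\s+\d+|\d+(?:st|nd|rd|th)\s+Season)\s*$", re.IGNORECASE
)
# Strip a subtitle only when the colon is followed by whitespace,
# so a title such as "Re:Zero" is left untouched.
SUBTITLE_RE = re.compile(r":\s+.*$")


def base_name(title):
    """Reduce a title to its most basic name, e.g. 'Gintama Season 4' -> 'Gintama'."""
    name = SUBTITLE_RE.sub("", title)
    name = SEASON_RE.sub("", name)
    return name.strip()


def curate(titles):
    """Drop movie entries, reduce titles to base names, remove duplicates in order."""
    seen = set()
    result = []
    for title in titles:
        if MOVIE_RE.search(title):
            continue  # skip entries such as "One Piece Movie 1"
        name = base_name(title)
        key = name.casefold()
        if name and key not in seen:
            seen.add(key)
            result.append(name)
    return result
```

For example, `curate(["Gintama", "Gintama Season 4", "Gintama: Enchousen", "One Piece", "One Piece Movie 1"])` keeps only "Gintama" and "One Piece". Heuristics like these will also merge titles that arguably should stay separate (any "Name: Subtitle" pair collapses to "Name"), which is one reason a manual pass is still needed.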

Sources
