The following is the data validation report that the seller has chosen to provide:
Data description
The Unsplash Dataset
The Unsplash Dataset draws on more than 250,000 contributing global photographers and on data sourced from hundreds of millions of searches across a nearly unlimited number of uses and contexts. Because of the breadth of intent and semantics it contains, the Unsplash Dataset opens new opportunities for research and learning.
The Unsplash Dataset is offered in two versions:
- the Lite dataset: available for commercial and noncommercial usage, containing 25k nature-themed Unsplash photos, 25k keywords, and 1M searches
- the Full dataset: available for noncommercial usage, containing 3M+ high-quality Unsplash photos, 5M keywords, and over 250M searches
As the Unsplash library continues to grow, we’ll release updates to the dataset with new fields and new images, with each subsequent release being semantically versioned.
We welcome any feedback regarding the content of the datasets or their format. With your input, we hope to close the gap between the data we provide and the data that you would like to leverage. You can open an issue to report a problem or to let us know what you would like to see in the next release of the datasets.
For more on the Unsplash Dataset, see our announcement and site.
The Unsplash Dataset is made available for research purposes. It cannot be used to redistribute the images contained within. To use the Unsplash library in a product, see the Unsplash API.
Unsplash Dataset Documentation
The Unsplash Dataset is composed of multiple CSV files:
1 - photos.csv
The photos.csv dataset has one row per photo. It contains properties of the photo, the name of the contributor, the image URL, and overall stats.
Field | Description |
---|---|
photo_id | ID of the Unsplash photo |
photo_url | Permalink URL to the photo page on unsplash.com |
photo_image_url | URL of the image file. Note: this is a dynamic URL, so you can apply resizing and customization operations directly on the image |
photo_submitted_at | Timestamp of when the photo was submitted to Unsplash |
photo_featured | Whether the photo was promoted to the Editorial feed or not |
photo_width | Width of the photo in pixels |
photo_height | Height of the photo in pixels |
photo_aspect_ratio | Aspect ratio of the photo |
photo_description | Description of the photo written by the photographer |
photographer_username | Username of the photographer on Unsplash |
photographer_first_name | First name of the photographer |
photographer_last_name | Last name of the photographer |
exif_camera_make | Camera make (brand) extracted from the EXIF data |
exif_camera_model | Camera model extracted from the EXIF data |
exif_iso | ISO setting of the camera, extracted from the EXIF data |
exif_aperture_value | Aperture setting of the camera, extracted from the EXIF data |
exif_focal_length | Focal length setting of the camera, extracted from the EXIF data |
exif_exposure_time | Exposure time setting of the camera, extracted from the EXIF data |
photo_location_name | Location of the photo |
photo_location_latitude | Latitude of the photo |
photo_location_longitude | Longitude of the photo |
photo_location_country | Country where the photo was made |
photo_location_city | City where the photo was made |
stats_views | Total # of times that a photo has been viewed on the Unsplash platform |
stats_downloads | Total # of times that a photo has been downloaded via the Unsplash platform |
ai_description | Textual description of the photo, generated by a 3rd party AI |
ai_primary_landmark_name | Landmark present in the photo, generated by a 3rd party AI |
ai_primary_landmark_latitude | Latitude of the landmark, generated by a 3rd party AI |
ai_primary_landmark_longitude | Longitude of the landmark, generated by a 3rd party AI |
ai_primary_landmark_confidence | Landmark confidence of the 3rd party AI |
blur_hash | BlurHash hash of the photo |
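
Because photo_image_url is described above as a dynamic URL, resizing can be requested by adding query parameters to it. The sketch below is a minimal illustration, not an official client: the parameter names `w` (width) and `q` (quality) are assumptions that should be checked against the Unsplash image URL documentation, the helper name `resized_image_url` is hypothetical, and the example URL is a placeholder.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def resized_image_url(photo_image_url: str, width: int, quality: int = 80) -> str:
    """Return photo_image_url with width and quality query parameters set."""
    parts = urlsplit(photo_image_url)
    # Keep any parameters already present on the URL, then override ours.
    params = dict(parse_qsl(parts.query))
    params["w"] = str(width)    # assumed parameter name for the target width
    params["q"] = str(quality)  # assumed parameter name for compression quality
    return urlunsplit(parts._replace(query=urlencode(params)))


# Placeholder URL, for illustration only.
print(resized_image_url("https://images.unsplash.com/photo-example", 640))
# → https://images.unsplash.com/photo-example?w=640&q=80
```

Existing query parameters on the URL are preserved, so the helper can be applied to a value read straight from the photo_image_url column.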
2 - keywords.csv
The keywords.csv dataset has one row per photo-keyword pair. It contains data about how a keyword is connected to a photo and the conversions of the photo in our search engine for a particular keyword.
Field | Description |
---|---|
photo_id | ID of the Unsplash photo |
keyword | Keyword or search term |
ai_service_1_confidence | Confidence for the keyword from a 3rd party AI (0-100) |
ai_service_2_confidence | Confidence for the keyword from another 3rd party AI (0-100) |
suggested_by_user | Whether the keyword was added by a user (human) |
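
The confidence columns above can be used to keep only the keywords that at least one AI service is reasonably sure about. The following is a minimal sketch with made-up sample rows: the function name `confident_keywords`, the threshold of 50, and the assumption that a missing confidence appears as an empty field are all illustrative. It does not interpret suggested_by_user, since the encoding of that flag is not stated here.

```python
import csv
import io

# Made-up rows that follow the keywords.csv field names documented above.
SAMPLE = """photo_id,keyword,ai_service_1_confidence,ai_service_2_confidence,suggested_by_user
p1,forest,92.5,,f
p1,fog,,31.0,f
p2,ocean,,,t
p2,wave,48.0,77.2,f
"""


def confident_keywords(rows, threshold=50.0):
    """Yield (photo_id, keyword) pairs where either AI confidence meets the threshold."""
    for row in rows:
        scores = (row.get("ai_service_1_confidence"), row.get("ai_service_2_confidence"))
        # Assumption: a missing confidence is stored as an empty field.
        values = [float(s) for s in scores if s not in (None, "")]
        if values and max(values) >= threshold:
            yield row["photo_id"], row["keyword"]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(list(confident_keywords(rows)))
# → [('p1', 'forest'), ('p2', 'wave')]
```

When reading the real file, replace the in-memory sample with an opened file handle and pass the appropriate `delimiter` to `csv.DictReader` if the file is not comma-separated.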
3 - collections.csv
Note: A collection on Unsplash is a user-created grouping of photos. Collections are similar to boards on Pinterest and can often group photos in complex and creative ways.
The collections.csv dataset has one row per photo-collection pair. Whenever a photo belongs to a collection created by a user, it will appear as one row. Each row describes when the photo was added to the collection and gives the title of the collection.
Field | Description |
---|---|
photo_id | ID of the Unsplash photo |
collection_id | ID of the Unsplash collection containing the photo |
collection_title | Title of the collection containing the photo |
photo_collected_at | Timestamp of when the photo was added to the collection |
4 - conversions.csv
Note: A conversion is currently defined as a user selecting an image to download it.
The conversions.csv dataset has one row per search conversion. The dataset tells you which photo has been downloaded for a search, the country of origin, and an anonymous identifier to indicate the unique users. The data goes back up to 1 year before the release of each version of the dataset.
Field | Description |
---|---|
converted_at | Timestamp of the conversion event |
conversion_type | Type of conversion (download only for now) |
keyword | Keyword that was searched and led to the conversion |
photo_id | Photo ID of the photo that converted |
anonymous_user_id | Anonymous user ID |
conversion_country | Country code of the device geolocation |
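
Since each row is one search conversion, counting rows per keyword gives the number of downloads that each search term led to. The sketch below uses made-up sample rows; the function name `top_keywords` and the timestamp format in the sample are assumptions, and the timestamps are not parsed.

```python
import csv
import io
from collections import Counter

# Made-up rows that follow the conversions.csv field names documented above.
SAMPLE = """converted_at,conversion_type,keyword,photo_id,anonymous_user_id,conversion_country
2020-01-01 10:00:00,download,forest,p1,u1,US
2020-01-01 10:05:00,download,forest,p2,u2,FR
2020-01-02 09:00:00,download,ocean,p3,u1,US
"""


def top_keywords(rows, n=10):
    """Return the n keywords with the most conversions as (keyword, count) pairs."""
    counts = Counter(row["keyword"] for row in rows)
    return counts.most_common(n)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(top_keywords(rows))
# → [('forest', 2), ('ocean', 1)]

# The anonymous identifier lets you count distinct users as well.
print(len({row["anonymous_user_id"] for row in rows}))
# → 2
```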
5 - colors.csv
Note: The coverage and score data come from a 3rd party AI.
The colors.csv dataset has one row per major color present in the photo. The dataset tells you which colors are contained within a photo, their coverage as a percentage, and a score for how in focus the color is.
Field | Description |
---|---|
photo_id | ID of the Unsplash photo |
hex | Hexadecimal representation of the color |
red | Red component of the color in the RGB system |
green | Green component of the color in the RGB system |
blue | Blue component of the color in the RGB system |
keyword | Name of the closest color as a CSS color keyword |
coverage | Pixel coverage of the color as a percentage |
score | Score of the color in the photo (including the notion of focus) |
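
With one row per major color, the color with the largest coverage can be picked for each photo. This is a minimal sketch over made-up sample rows: the function name `dominant_colors` is hypothetical, and the scale of the coverage and score values in the sample (fractions rather than 0-100) is an assumption that does not affect the comparison.

```python
import csv
import io

# Made-up rows that follow the colors.csv field names documented above.
SAMPLE = """photo_id,hex,red,green,blue,keyword,coverage,score
p1,2F4F4F,47,79,79,darkslategray,0.42,0.31
p1,F5F5DC,245,245,220,beige,0.18,0.22
p2,4682B4,70,130,180,steelblue,0.35,0.40
p2,000080,0,0,128,navy,0.51,0.12
"""


def dominant_colors(rows):
    """Map each photo_id to the CSS keyword of its highest-coverage color."""
    best = {}
    for row in rows:
        photo_id = row["photo_id"]
        coverage = float(row["coverage"])
        # Keep the row with the largest coverage seen so far for this photo.
        if photo_id not in best or coverage > float(best[photo_id]["coverage"]):
            best[photo_id] = row
    return {photo_id: row["keyword"] for photo_id, row in best.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(dominant_colors(rows))
# → {'p1': 'darkslategray', 'p2': 'navy'}
```

Ranking by score instead of coverage would favor colors that are in focus; which of the two is more useful depends on the application.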
Combining datasets
You can merge the different datasets through the primary key ID fields (usually the photo_id field). With this, you'll be able to cross-reference properties from the photos dataset with data from the keywords or conversions dataset.
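
As a rough illustration of such a merge, the sketch below attaches the keywords of each photo to its row from the photos dataset by matching on photo_id. The sample rows are made up and include only a few of the documented columns, and the function name `join_on_photo_id` is hypothetical. It uses only the Python standard library; a dataframe library would work equally well.

```python
import csv
import io
from collections import defaultdict

# Made-up rows using a subset of the documented photos.csv and keywords.csv fields.
PHOTOS = """photo_id,photographer_username,stats_downloads
p1,alice,120
p2,bob,45
"""
KEYWORDS = """photo_id,keyword
p1,forest
p1,fog
p2,ocean
"""


def join_on_photo_id(photo_rows, keyword_rows):
    """Return photo rows, each extended with a list of its keywords."""
    keywords_by_photo = defaultdict(list)
    for row in keyword_rows:
        keywords_by_photo[row["photo_id"]].append(row["keyword"])
    # Photos without any keyword get an empty list.
    return [
        dict(photo, keywords=keywords_by_photo.get(photo["photo_id"], []))
        for photo in photo_rows
    ]


photos = list(csv.DictReader(io.StringIO(PHOTOS)))
keywords = list(csv.DictReader(io.StringIO(KEYWORDS)))
merged = join_on_photo_id(photos, keywords)
print(merged[0]["photographer_username"], merged[0]["keywords"])
# → alice ['forest', 'fog']
```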
