Data Description
The dataset consists of Haematoxylin and Eosin stained histology images at 20x objective magnification (~0.5 microns/pixel) from 6 different data sources. For each image, an instance segmentation mask and a classification mask are provided. Within the dataset, each nucleus is assigned to one of the following categories:
- Epithelial
- Lymphocyte
- Plasma
- Eosinophil
- Neutrophil
- Connective tissue
For more information on the dataset and the associated categories, we encourage participants to read the original dataset paper.
Data Format
Our provided patch-level dataset contains 4,981 non-overlapping images of size 256x256 provided in the following format:
- RGB images
- Segmentation & classification maps
- Nuclei counts
The RGB images and segmentation/classification maps are each stored as a single NumPy array. The RGB image array has dimensions 4981x256x256x3, whereas the segmentation & classification map array has dimensions 4981x256x256x2. Here, the first channel is the instance segmentation map and the second channel is the classification map.
For the nuclei counts, we provide a single csv file, where each row corresponds to a given patch and the columns give the count for each type of nucleus. The row ordering matches the order of patches within the NumPy files. A given nucleus is considered present in the image if any part of it lies within the central 224x224 region of the patch. This ensures that a nucleus is only considered for counting if it lies completely within the original 256x256 image.
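The counting rule can be sketched in NumPy as below. This is a minimal illustration, not the organizers' code: the helper names (`split_maps`, `count_patch`, `count_all`, `CATEGORIES`) are hypothetical, and several details are assumptions not stated in this description, namely that 0 means background in both channels, that class values 1 to 6 map to the categories in the order listed in `CATEGORIES`, and that a nucleus whose pixels carry mixed class labels takes the most frequent one. Check these against the original dataset paper before comparing the output with the provided csv.

```python
import numpy as np

# ASSUMPTIONS (not stated in the dataset description):
# - class-map values 1..6 map to CATEGORIES in this order, 0 = background
# - instance id 0 = background in the instance map
CATEGORIES = ["neutrophil", "epithelial", "lymphocyte",
              "plasma", "eosinophil", "connective"]
PATCH = 256                       # patch side length in pixels
CENTRE = 224                      # side length of the central counting region
MARGIN = (PATCH - CENTRE) // 2    # 16-pixel border excluded on each side


def split_maps(labels: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split an (N, 256, 256, 2) label array into instance and class maps."""
    if labels.ndim != 4 or labels.shape[1:] != (PATCH, PATCH, 2):
        raise ValueError(f"unexpected label shape {labels.shape}")
    return labels[..., 0], labels[..., 1]


def count_patch(inst_map: np.ndarray, class_map: np.ndarray) -> np.ndarray:
    """Count nuclei per category in one 256x256 patch.

    A nucleus is counted if any of its pixels lies in the central 224x224
    region. Its category is the most frequent non-zero class among its
    pixels (a hypothetical majority-vote rule).
    """
    counts = np.zeros(len(CATEGORIES), dtype=np.int64)
    centre = inst_map[MARGIN:MARGIN + CENTRE, MARGIN:MARGIN + CENTRE]
    for inst_id in np.unique(centre):
        if inst_id == 0:  # skip background
            continue
        classes = class_map[inst_map == inst_id].astype(np.int64)
        classes = classes[classes > 0]
        if classes.size == 0:  # instance with no class label
            continue
        counts[np.bincount(classes).argmax() - 1] += 1
    return counts


def count_all(labels: np.ndarray) -> np.ndarray:
    """Return an (N, 6) count array; row i corresponds to patch i."""
    inst, cls = split_maps(labels)
    return np.stack([count_patch(i, c) for i, c in zip(inst, cls)])


if __name__ == "__main__":
    # Synthetic demo patch, not real data.
    demo = np.zeros((1, PATCH, PATCH, 2), dtype=np.int32)
    demo[0, 100:110, 100:110] = (1, 2)  # nucleus 1, class 2, inside the centre
    demo[0, 0:10, 0:10] = (2, 3)        # nucleus 2, class 3, only in the border
    print(count_all(demo))  # [[0 1 0 0 0 0]]
```

With the real data, the label array would be loaded with `np.load` and the counts file with a csv reader such as `pandas.read_csv`; the file names depend on the download and are not given here. Because the csv rows follow the patch order in the NumPy files, row `i` of `count_all(labels)` lines up with row `i` of the csv, although the csv column order may differ from the assumed `CATEGORIES` order.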
Acknowledgements
This dataset was provided by the organizers of the CoNIC Challenge:
- Simon Graham (TIA, PathLAKE)
- Mostafa Jahanifar (TIA, PathLAKE)
- Dang Vu (TIA)
- Giorgos Hadjigeorghiou (TIA, PathLAKE)
- Thomas Leech (TIA, PathLAKE)
- David Snead (UHCW, PathLAKE)
- Shan Raza (TIA, PathLAKE)
- Fayyaz Minhas (TIA, PathLAKE)
- Nasir Rajpoot (TIA, PathLAKE)
TIA: Tissue Image Analytics Centre, Department of Computer Science, University of Warwick, United Kingdom
UHCW: Department of Pathology, University Hospitals Coventry and Warwickshire, United Kingdom
PathLAKE: Pathology Image Data Lake for Analytics Knowledge & Education, University Hospitals Coventry and Warwickshire, United Kingdom
