The following is the data validation report provided by the seller:
Data Description
Context
This dataset contains photographs of Sudoku puzzles from various newspapers, taken with smartphone cameras. It consists of 200 images, split into a training set of 160 images and a test set of 40 images.
The outline of each Sudoku grid is recorded in outlines_sorted.csv, so you can also train a model to detect the grid itself.
Versions
There are three versions of the dataset:
- V2: The complete dataset with 200 images (160 for training and 40 for testing)
- mixed: The same images as V2, but every puzzle was completed artificially, so all 81 cells of each grid are filled in.
- V1: The old version with 160 images. It is deprecated and should no longer be used.
Citation
>@inproceedings{wicht2014camera, title={Camera-based Sudoku recognition with deep belief network}, author={Wicht, Baptiste and Hennebert, Jean}, booktitle={Soft Computing and Pattern Recognition (SoCPaR), 2014 6th International Conference of}, pages={83--88}, year={2014}, organization={IEEE} }
>@inproceedings{wicht2015mixed, title={Mixed handwritten and printed digit recognition in Sudoku with Convolutional Deep Belief Network}, author={Wicht, Baptiste and Hennebert, Jean}, booktitle={Document Analysis and Recognition (ICDAR), 2015 13th International Conference on}, pages={861--865}, year={2015}, organization={IEEE} }
Format
The format of the dataset is straightforward. For each imageX.jpg file, there is a corresponding imageX.dat file containing the metadata for that image. Here is an example of such a file:
sonyEricsson s500i
640x480:24 JPG
0 0 0 7 0 0 0 8 0
0 9 0 0 0 3 1 0 0
0 0 6 8 0 5 0 7 0
0 2 0 6 0 0 0 4 9
0 0 0 2 0 0 0 5 0
0 0 8 0 4 0 0 0 7
0 0 0 9 0 0 0 3 0
3 7 0 0 0 0 0 0 6
1 0 5 0 0 4 0 0 0
The first line contains the brand and model of the phone that took the picture. The second line describes the image format (for example, its resolution and file type). The rest of the file describes the Sudoku grid row by row as 81 values, with 0 indicating an empty cell.
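As a minimal sketch, assuming the layout described above (phone model on the first line, image format on the second, then the 81 grid values in row-major order with 0 for an empty cell), an imageX.dat file could be parsed in Python as follows. The names `SudokuMetadata`, `parse_dat`, `load_dat`, and `EXAMPLE_DAT` are hypothetical helpers for illustration; they are not shipped with the dataset.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List

# Hypothetical helpers for illustration; not part of the dataset itself.


@dataclass
class SudokuMetadata:
    phone: str             # line 1, e.g. "sonyEricsson s500i"
    image_format: str      # line 2, e.g. "640x480:24 JPG"
    grid: List[List[int]]  # 9x9 cell values, 0 = empty cell


def parse_dat(text: str) -> SudokuMetadata:
    """Parse the text of an imageX.dat file.

    Assumes: a phone line, an image-format line, then 81 whitespace-separated
    digits in row-major order, however they are wrapped across lines.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 3:
        raise ValueError("expected a phone line, a format line and grid digits")
    phone, image_format = lines[0], lines[1]
    # Join the remaining lines so the grid parses regardless of line wrapping.
    digits = [int(tok) for tok in " ".join(lines[2:]).split()]
    if len(digits) != 81:
        raise ValueError(f"expected 81 grid values, got {len(digits)}")
    if any(not 0 <= d <= 9 for d in digits):
        raise ValueError("grid values must be digits 0-9")
    # Slice the flat list into 9 rows of 9 cells.
    grid = [digits[row * 9:(row + 1) * 9] for row in range(9)]
    return SudokuMetadata(phone, image_format, grid)


def load_dat(path: Path) -> SudokuMetadata:
    """Read and parse one imageX.dat file from disk."""
    return parse_dat(path.read_text())


# The example file from this section.
EXAMPLE_DAT = """sonyEricsson s500i
640x480:24 JPG
0 0 0 7 0 0 0 8 0
0 9 0 0 0 3 1 0 0
0 0 6 8 0 5 0 7 0
0 2 0 6 0 0 0 4 9
0 0 0 2 0 0 0 5 0
0 0 8 0 4 0 0 0 7
0 0 0 9 0 0 0 3 0
3 7 0 0 0 0 0 0 6
1 0 5 0 0 4 0 0 0
"""

if __name__ == "__main__":
    meta = parse_dat(EXAMPLE_DAT)
    filled = sum(1 for row in meta.grid for value in row if value != 0)
    print(meta.phone, "|", meta.image_format, "|", filled, "filled cells")
```

Joining all remaining lines before splitting keeps the parser tolerant of how the 81 values are wrapped, while the two checks reject truncated or malformed files early. For the mixed version, where every cell is filled in, a parsed grid would be expected to contain no 0 values.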
Acknowledgement
Photo by John Morgan on Unsplash
