The following is the data validation report that the seller has chosen to provide:
DATA DESCRIPTION
All data in this dataset was gathered from PUBLICLY accessible websites or databases. This dataset consists of two classes, savory and unsavory. The unsavory class is populated with facial images of convicted felons. The savory class is populated with facial images of "ordinary" people. Granted, some "ordinary" people may be convicted felons, but I expect the percentage is very low. All downloaded images were processed by a custom duplicate image detector before being split into a train set, a validation set and a test set. This is meant to prevent any image from appearing in more than one of these sets. All images were cropped from the original downloaded image to just a facial image using the MTCNN crop module (a minimal sketch of this kind of crop appears after the bias notes below). The crop is such that very little extraneous background is included in the cropped image. This is to prevent the CNN classifier from extracting background features that are not relevant to classifying a facial image. The train set has 5610 images in the savory class and 5610 images in the unsavory class. The test set has 300 images in the savory class and 300 images in the unsavory class, as does the validation set.

> BIASES IN THE DATASET

Early runs of a CNN classifier led to the discovery of unwanted biases in the dataset which incorrectly influenced classifications. In effect, the CNN was a smile classifier rather than a personality classifier.

> EMOTIONAL BIAS

Images of ordinary citizens had a preponderance of "smiling" faces. People naturally smile when a photo is taken. Conversely, booking images of felons have few if any smiles. Consequently, the dataset biased the classifier to predict images with smiling faces as savory and non-smiling images as unsavory. To fix that, I went through the savory dataset and tried to replace the majority of smiling images with emotionally neutral faces.

> GLASSES BIAS

I noticed that a much higher percentage of images in the savory class wore glasses than was the case for the unsavory images. Some images in the unsavory class do wear glasses. I went through the savory class images and tried to replace most of those wearing glasses, but did leave some with glasses.

> RACIAL BIAS

From the outset I was aware of the potential for embedding racial bias in the dataset. While not counting each case, I tried, for example, to include as many African Americans in the savory class as in the unsavory class. The last thing I want to build is a racist classifier!

> OTHER POTENTIAL BIASES

I am aware there may well be additional biases built into the dataset. For example, I notice that there is a higher percentage of long-haired images in the unsavory class than in the savory class. I also believe more of the images in the unsavory class sport beards than in the savory class. I did not deal with these potential problems. There may well be other, less apparent biases in the dataset. If you observe any, please advise.
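For reference, the following is a minimal sketch of the kind of MTCNN-based face crop described in the data description above. It is not the script actually used to build this dataset; the 5% margin and the choice to keep only the largest detected face are assumptions.

```python
# A minimal sketch of MTCNN-based face cropping, assuming the `mtcnn` and
# `opencv-python` packages. The margin value is an assumption, not the
# setting actually used for this dataset.
import cv2
from mtcnn import MTCNN

detector = MTCNN()

def crop_face(image_path, margin=0.05):
    """Return the largest detected face cropped with a small margin, or None."""
    bgr = cv2.imread(image_path)
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)        # MTCNN expects RGB input
    detections = detector.detect_faces(rgb)
    if not detections:
        return None                                   # no face found
    # Keep only the largest detection so very little background survives.
    x, y, w, h = max(detections, key=lambda d: d["box"][2] * d["box"][3])["box"]
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(x - dx, 0), max(y - dy, 0)
    return rgb[y0:y + h + dy, x0:x + w + dx]
```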
> INTERPRETATIONS

I believe this dataset provides the basis for a CNN classifier to predict, with reasonably high probability, the genetic predisposition of an individual from a facial image. Results of runs on the test set support this conclusion. However, please note the term genetic predisposition. That does not imply that this is the individual's actual persona, but rather a predisposition toward one class extreme or the other. Of course persona is NOT a binary situation, but rather a spectrum with a wide range of characteristics. However, one is forced into a binary situation because there is no way to search for "half savory" facial images. As the old argument goes: nature vs. nurture. Individuals with a predisposition toward the unsavory class can of course be highly savory individuals if a sufficient level of positive life experience is applied to overcome the predisposition. Conversely, individuals with a savory predisposition can become unsavory due to negative influences within their lifetime environment. I personally know several individuals where this has taken place. Politicians come to mind: started out as good guys and ended up as bad guys. So I strongly urge that if you use this dataset and develop a strong classifier, you DO NOT make the mistake of treating the classification as that individual's true personality.

> LICENSE

The dataset is limited specifically to personal use only. No use by governments at any level or by commercial entities is allowed. If you derive work from this data, you are required to post this license as part of the work. You may not limit or extend the terms of use. I have sufficient resources and will use them to take legal action if I discover violations of the terms of this license.

> FACIAL MORPHOLOGY AND BRAIN FUNCTION

It is a well established fact that a correlation exists between a facial image and certain features of brain function. The clearest example is the unfortunate affliction of Down syndrome, clearly a very obvious case. Other medical studies have demonstrated similar correlations of facial morphology with other afflictions such as autism and related conditions. I built a dataset of about 2000 facial images of children diagnosed as autistic and 2000 facial images of typical children (note it is estimated that 3% of children are undiagnosed autistics). I built a classifier, and the resultant accuracy on the test set was about 90%. I believe that each of us has a built-in "character classifier". It makes evolutionary sense to have this capability of pattern recognition. It is a genetic advantage to be able to discern someone who may be a threat from someone who is not, and to be able to do so at a distance. The term "bad vibes" illustrates my point. I find that at least 95% of the time I agree with my classifier's conclusion on any given image. I have had other individuals do the same test with similar results, although, as in all cases of inherited brain performance capabilities, some people are better at it than others. In my case, the instant I meet someone I form an instant impression of that person: savory, unsavory, intelligent or not, extrovert or introvert, etc. I got to thinking: are my impressions simply the result of biases, or even sheer nonsense? That is why I created this dataset. I wanted to evaluate whether a non-biased AI network could really use facial images to identify genetic persona predispositions, and whether those predispositions agreed with mine. They do.
> ANOMALIES

Despite the very high test set classification accuracy, I am suspicious. To me the results are almost too good. I have no idea what the CNN is using to make classifications; I doubt anyone does. I still think the biggest unknown is the emotional appearance of a person in an image. In the kernel of my notebook I include a predictor function. It makes predictions on images in a directory. The function has a parameter, average. If average is True, the function makes a prediction on each image and sums the probabilities for both savory and unsavory. At the conclusion, the class for ALL the images is determined as the class with the highest summed probability (see the sketch below for the general idea). You can classify an individual using just a single image, but it is far more accurate to include, say, 10 different images of that individual and use the predictor to get the averaged prediction. I did that, for example, with the images in the images_to_predict directory, which has 10 images of Dr. Fauci and 1 image of a felon. You can see the results in my notebook. I have done the same with people like Saddam Hussein and others. What bothers me is that the misclassification rate for individual images is significantly higher than one would expect based on the accuracy of the test set results. So something is going on, but I don't know what. I thought it might be the emotion shown in the image, but tests showed no clear correlation. If you run similar tests and find some correlation, please advise.
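To illustrate the averaging scheme described above, here is a minimal sketch. It is not the notebook's actual predictor function; the model type, image size, rescaling and class order are assumptions.

```python
# A minimal sketch of directory-level prediction averaging, assuming a Keras
# model with a 2-unit softmax output. Image size, rescaling and class order
# are assumptions, not the notebook's actual settings.
import os
import numpy as np
from tensorflow import keras

CLASS_NAMES = ["savory", "unsavory"]          # assumed output order

def predict_directory(model, directory, img_size=(224, 224), average=True):
    probs = []
    for name in sorted(os.listdir(directory)):
        img = keras.utils.load_img(os.path.join(directory, name), target_size=img_size)
        x = keras.utils.img_to_array(img)[np.newaxis] / 255.0    # assumed rescaling
        probs.append(model.predict(x, verbose=0)[0])             # per-image class probabilities
    probs = np.asarray(probs)
    if average:
        # Sum the per-image probabilities and label ALL images with the class
        # whose summed probability is highest.
        return CLASS_NAMES[int(np.argmax(probs.sum(axis=0)))]
    # Otherwise return one label per image.
    return [CLASS_NAMES[int(i)] for i in probs.argmax(axis=1)]
```

Summing and averaging pick the same class; the point is simply that pooling several images of the same individual smooths out per-image variation before the final decision.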
