The following is the data validation report that the seller chose to provide:
Data description
This project uses a dataset of movies and their plots. The original dataset is available at https://www.kaggle.com/datasets/gabrieltardochi/wikipedia-movie-plots-with-plot-summaries
The plots were scraped from Wikipedia by jrobischon and then summarized by gabrieltardochi using the DistilBART-CNN-12-6 model.
Each movie has two plot texts: the full plot and a shortened summary. I used CO.HERE AI to vectorize both. The processed dataset was published on Kaggle with two extra columns:
plot_vector_1024: the vectorized full plot, in 1024 dimensions (a vector of 1024 floating-point numbers)
plot_summary_vector_1024: the vectorized summarized plot, in 1024 dimensions (a vector of 1024 floating-point numbers)
Details of the process are at https://github.com/linhhlp/Machine-Learning-Applications/Text-2-Vect-Vector-Search
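
As a rough illustration of how the two vector columns might be used, the sketch below ranks movies by cosine similarity between a query vector and the stored plot vectors. It is not taken from the linked repository. It assumes that each vector is stored in the file as text such as "[0.1, 0.2, ...]" and that a column named "title" exists; both are assumptions, as are the helper names parse_vector, cosine_similarity and most_similar. Toy 3-dimensional vectors stand in for the real 1024-dimensional ones.

```python
import json
import math


def parse_vector(cell):
    """Parse a vector stored as text like "[0.1, 0.2]" (assumed storage format)."""
    return [float(x) for x in json.loads(cell)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, rows, column="plot_summary_vector_1024"):
    """Return the (title, score) pair of the row whose vector is closest to the query."""
    scored = [
        (row["title"], cosine_similarity(query, parse_vector(row[column])))
        for row in rows
    ]
    return max(scored, key=lambda pair: pair[1])


# Toy rows: the "title" column and the 3-dimensional vectors are illustrative only.
rows = [
    {"title": "Movie A", "plot_summary_vector_1024": "[1.0, 0.0, 0.0]"},
    {"title": "Movie B", "plot_summary_vector_1024": "[0.0, 1.0, 0.0]"},
]
best_title, best_score = most_similar([0.9, 0.1, 0.0], rows)
```

To compare against full plots instead of summaries, pass column="plot_vector_1024". In practice the query vector would need to come from the same embedding model that produced the stored vectors, since similarity scores are only meaningful within one embedding space.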
