淡然若水

Lung Cancer Patients - mRNA information

genetics, biology, cancer


Sold: 0
124.85MB

Data ID: D17222415507092843

Published: 2024/07/29


Data description

Introduction & Content

Lung cancer is currently the deadliest cancer, so studying its prognosis is worthwhile. This dataset contains patients' mRNA expression (microarray) together with a good deal of clinical information. Most of the clinical column names are self-explanatory; the genetic data consists mainly of probe codes with their expression values. Microarrays estimate gene expression from the light intensity of fluorescently labeled samples bound to the chip, so I strongly suggest studying microarrays a little before using this dataset.

There are two directories. The Raw directory holds the raw data: all the information is split across separate tables, so it can be hard to work with. The Clean directory holds the dataset I built by joining all the tables and cleaning the result. Even so, I encourage you to try working with the raw data.

Refer to this notebook when manipulating the raw data.

Important: complete_dataframe contains a duplicated column: target and high_risk hold the same values. I was studying a clinical survivability model with this data, and I defined as high risk any patient with more than 18 months of life.
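A minimal sketch of how the duplicated column could be checked and dropped before modeling. The column names target and high_risk come from the description above; the toy table, its patient_id column, and the helper name drop_duplicate_label are assumptions made for illustration, not part of the dataset.

```python
import pandas as pd

# Toy stand-in for complete_dataframe; every value here is invented.
toy_frame = pd.DataFrame({
    "patient_id": ["p1", "p2", "p3", "p4"],  # hypothetical column
    "target": [1, 0, 1, 0],
    "high_risk": [1, 0, 1, 0],
})


def drop_duplicate_label(df, keep="target", duplicate="high_risk"):
    """Drop `duplicate` only if it matches `keep` row for row."""
    if not df[keep].equals(df[duplicate]):
        raise ValueError(f"{keep!r} and {duplicate!r} differ; inspect before dropping")
    return df.drop(columns=[duplicate])


deduplicated = drop_duplicate_label(toy_frame)
print(list(deduplicated.columns))
```

Verifying equality before dropping guards against silently discarding a column that only looks like a copy.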

Why use this dataset?

As you might notice, this dataset has over 23k columns for only 442 samples, so any kind of model needs a powerful feature selection or clustering technique; otherwise it will completely overfit. This dataset can be useful for studying the relationship between feature count and sample size, and how to handle it.
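As one possible starting point for the feature selection mentioned above, here is a small sketch of an unsupervised variance filter, which keeps only the k most variable columns. It is an illustration on synthetic numbers, not the method used to build this dataset; the function name top_variance_features and the matrix shape (442 rows, 2000 columns instead of 23k, for speed) are assumptions.

```python
import numpy as np


def top_variance_features(expression, k):
    """Return the sorted column indices of the k highest-variance columns."""
    variances = expression.var(axis=0)
    top = np.argsort(variances)[-k:]  # argsort is ascending, so take the tail
    return np.sort(top)


rng = np.random.default_rng(0)
# Synthetic stand-in for the expression matrix: 442 samples, 2000 features.
expression = rng.normal(size=(442, 2000))
expression[:, :5] *= 10.0  # make the first five columns clearly more variable

selected = top_variance_features(expression, k=5)
reduced = expression[:, selected]
print(selected, reduced.shape)
```

A variance filter ignores the target entirely, so it is only a first pass; any supervised selection step should be fitted on the training folds alone to avoid leaking label information.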

This dataset does not have a clear target. I built one, the patient's "survivability" over the next 18 months, to frame this as a classification problem, but you could use the raw column to frame it as a regression problem instead! Either way, it is a hard dataset to deal with.
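The two framings can be sketched as follows. The 18-month cutoff comes from the description; the column name survival_months is a hypothetical placeholder for the raw survival column, and the values are invented.

```python
import pandas as pd

THRESHOLD_MONTHS = 18  # cutoff stated in the description

# Toy stand-in; "survival_months" is a hypothetical name for the raw column.
survival_frame = pd.DataFrame({"survival_months": [3.0, 18.0, 24.5, 60.0]})

# Classification framing: 1 if the patient lived more than 18 months, else 0.
survival_frame["target"] = (
    survival_frame["survival_months"] > THRESHOLD_MONTHS
).astype(int)
y_classification = survival_frame["target"]

# Regression framing: predict the raw number of months directly.
y_regression = survival_frame["survival_months"]

print(y_classification.tolist())
```

A patient at exactly 18 months falls on the 0 side here because the comparison is strict; whether that matches the original target is an assumption.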

Acknowledgements

I did not take part in gathering these data samples or in publishing the original dataset. Please refer to the following link for more information: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE68465

If you plan to use this data for academic purposes, please cite the following article: https://pubmed.ncbi.nlm.nih.gov/18641660/
