Data Description
This dataset, originally from CMS, contains a range of information, including diagnoses, procedures, prescriptions, and financial data. It represents about 5% of the full CMS SynPUF data and has 55M claims, which is too large to open with pandas in a Jupyter notebook. You can load a smaller subset by passing nrows to pd.read_csv (e.g., nrows=20000000).
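A minimal sketch of the two usual ways to keep memory in check with pd.read_csv: capping the number of rows with nrows, or streaming the file with chunksize. The helper names (load_subset, count_rows_in_chunks) and the tiny in-memory sample are hypothetical stand-ins so the snippet runs without the real file; with the actual data you would pass the CSV path instead.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the real CSV (hypothetical sample rows).
SAMPLE_CSV = io.StringIO(
    "DESYNPUF_ID,CLM_ID,ICD9_DGNS_CD_1,HCPCS_CD_1,LINE_NCH_PMT_AMT_1\n"
    "A001,1,4019,99213,50.0\n"
    "A001,2,25000,99214,80.0\n"
    "B002,3,4019,99213,40.0\n"
    "C003,4,V5869,36415,10.0\n"
)

# Code-like columns are read as strings so values such as "4019" are not
# turned into numbers.
CODE_COLUMNS = {"DESYNPUF_ID": str, "ICD9_DGNS_CD_1": str, "HCPCS_CD_1": str}


def load_subset(source, nrows):
    """Read only the first `nrows` data rows of the CSV."""
    return pd.read_csv(source, nrows=nrows, dtype=CODE_COLUMNS)


def count_rows_in_chunks(source, chunksize):
    """Stream the CSV in chunks so the full table is never held in memory."""
    total = 0
    for chunk in pd.read_csv(source, chunksize=chunksize, dtype=CODE_COLUMNS):
        total += len(chunk)
    return total


subset = load_subset(SAMPLE_CSV, nrows=3)
SAMPLE_CSV.seek(0)  # rewind the in-memory buffer before reading it again
total_rows = count_rows_in_chunks(SAMPLE_CSV, chunksize=2)
```

With the real file, load_subset(path, nrows=20000000) would correspond to the 20M-row subset described here, while the chunked variant suits aggregations that do not need every row in memory at once.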
Column description:
RangeIndex: 20000000 entries, 0 to 19999999
Data columns (total 43 columns):
 #   Column                      Dtype
 0   DESYNPUF_ID                 object
 1   BENE_BIRTH_DT               int64
 2   BENE_DEATH_DT               int64
 3   BENE_SEX_IDENT_CD           int64
 4   BENE_RACE_CD                int64
 5   BENE_ESRD_IND               object
 6   SP_STATE_CODE               int64
 7   BENE_COUNTY_CD              int64
 8   BENE_HI_CVRAGE_TOT_MONS     int64
 9   BENE_SMI_CVRAGE_TOT_MONS    int64
 10  BENE_HMO_CVRAGE_TOT_MONS    int64
 11  PLAN_CVRG_MOS_NUM           int64
 12  SP_ALZHDMTA                 int64
 13  SP_CHF                      int64
 14  SP_CHRNKIDN                 int64
 15  SP_CNCR                     int64
 16  SP_COPD                     int64
 17  SP_DEPRESSN                 int64
 18  SP_DIABETES                 int64
 19  SP_ISCHMCHT                 int64
 20  SP_OSTEOPRS                 int64
 21  SP_RA_OA                    int64
 22  SP_STRKETIA                 int64
 23  MEDREIMB_IP                 float64
 24  BENRES_IP                   float64
 25  PPPYMT_IP                   float64
 26  MEDREIMB_OP                 float64
 27  BENRES_OP                   float64
 28  PPPYMT_OP                   float64
 29  MEDREIMB_CAR                float64
 30  BENRES_CAR                  float64
 31  PPPYMT_CAR                  float64
 32  CLM_ID                      int64
 33  CLM_FROM_DT                 int64
 34  CLM_THRU_DT                 int64
 35  ICD9_DGNS_CD_1              object
 36  PRF_PHYSN_NPI_1             float64
 37  HCPCS_CD_1                  object
 38  LINE_NCH_PMT_AMT_1          float64
 39  LINE_BENE_PTB_DDCTBL_AMT_1  float64
 40  LINE_COINSRNC_AMT_1         float64
 41  LINE_PRCSG_IND_CD_1         object
 42  LINE_ICD9_DGNS_CD_1         object
dtypes: float64(13), int64(24), object(6)
memory usage: 6.4+ GB
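Since the default int64/float64/object dtypes account for the 6.4+ GB footprint, one possible way to shrink it is to read only the columns you need and request narrower dtypes up front. The sketch below assumes the flag-like columns hold small integers that fit in int8 and that float32 precision is acceptable for the reimbursement amounts; neither assumption is confirmed by the listing, and the sample rows are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the real CSV.
csv_text = (
    "DESYNPUF_ID,BENE_SEX_IDENT_CD,SP_CHF,MEDREIMB_IP,CLM_ID\n"
    "A001,1,2,0.0,1\n"
    "A001,1,2,0.0,2\n"
    "B002,2,1,1500.0,3\n"
)

# Columns to keep, and the narrower dtype requested for each one.
# ASSUMPTION: these value ranges fit the chosen dtypes in the real data.
compact_dtypes = {
    "DESYNPUF_ID": "category",    # repeated IDs compress well as categories
    "BENE_SEX_IDENT_CD": "int8",
    "SP_CHF": "int8",
    "MEDREIMB_IP": "float32",
}
wanted = list(compact_dtypes)

compact_df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=wanted,        # skip columns that are not needed
    dtype=compact_dtypes,  # avoid the default 64-bit and object dtypes
)

# Per-frame memory in bytes; deep=True also counts string/category storage.
compact_bytes = int(compact_df.memory_usage(deep=True).sum())
```

On a three-row sample the saving is not meaningful; the point is the pattern, which you would verify against the real data with memory_usage(deep=True) before relying on it.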
The 20M-claim subset covers about 141k unique individuals, about 12k unique ICD-9 diagnosis codes, and 7k unique HCPCS procedure codes.
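Counts like these can be reproduced with Series.nunique(). The sketch below is illustrative only: the listing does not say which columns the counts were taken from, so using DESYNPUF_ID, ICD9_DGNS_CD_1, and HCPCS_CD_1 is an assumption, and unique_counts is a hypothetical helper run on made-up sample rows.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real data would come from the CSV file.
sample = pd.read_csv(
    io.StringIO(
        "DESYNPUF_ID,ICD9_DGNS_CD_1,HCPCS_CD_1\n"
        "A001,4019,99213\n"
        "A001,25000,99214\n"
        "B002,4019,99213\n"
        "C003,V5869,36415\n"
    ),
    dtype=str,  # keep identifiers and codes as strings
)


def unique_counts(df):
    """Count distinct beneficiaries, diagnosis codes, and procedure codes.

    ASSUMPTION: the column choice is a guess at how the listing's figures
    were derived. nunique() ignores missing values by default.
    """
    return {
        "individuals": df["DESYNPUF_ID"].nunique(),
        "icd9_diagnoses": df["ICD9_DGNS_CD_1"].nunique(),
        "hcpcs_codes": df["HCPCS_CD_1"].nunique(),
    }


counts = unique_counts(sample)
```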
Validation Report
The following data validation report was provided at the seller's discretion:
