
Data Description
About Dataset
The dataset contains the files required for training and testing, split accordingly.
There are two groups of features that you can use for prediction:
- Fundamentals and ratios: Values collected from the financial statements and balance sheets of each ticker.
- Technical indicators and strategy flags: Technical indicators calculated on each day's closing price, plus buy and sell signals generated by some commonly used trading strategies.
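To make the idea of a strategy flag concrete, here is a minimal sketch of one commonly used strategy, a simple moving average crossover. It is only an illustration: the function names, the window lengths, and the +1/-1/0 encoding are assumptions, and the dataset's own flags may be produced by different strategies and encoded differently.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def crossover_flags(close, fast=2, slow=3):
    """Return +1 (buy) when the fast SMA crosses above the slow SMA,
    -1 (sell) when it crosses below, and 0 otherwise.
    Window lengths here are illustrative, not the dataset's."""
    fast_sma, slow_sma = sma(close, fast), sma(close, slow)
    flags = [0] * len(close)
    for i in range(1, len(close)):
        needed = (fast_sma[i - 1], slow_sma[i - 1], fast_sma[i], slow_sma[i])
        if None in needed:
            continue  # not enough history yet
        prev_diff = fast_sma[i - 1] - slow_sma[i - 1]
        diff = fast_sma[i] - slow_sma[i]
        if prev_diff <= 0 < diff:
            flags[i] = 1
        elif prev_diff >= 0 > diff:
            flags[i] = -1
    return flags


# Hypothetical closing prices, not taken from the dataset.
close = [10, 9, 8, 9, 11, 12, 10, 8]
flags = crossover_flags(close)
```

With these made-up prices, the sketch raises a buy flag on the fifth day and a sell flag on the last day.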
Files in the Fundamentals folder are a processed version of the files in the raw folder. Ratios and other values are stretched (repeated forward) to match the length of the closing price column, so that, for example, the value in the pe_ratio column on any given day is the PE ratio from the most recent quarter. The same applies to every column.
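The stretching described above amounts to a forward fill of quarterly values onto daily rows. The following is a rough standard-library sketch of that idea, not the code from the author's notebooks. The function name, the sample dates, and the choice to treat a report as available from its report date onward are all assumptions.

```python
from bisect import bisect_right
from datetime import date


def forward_fill(report_dates, values, trading_days):
    """For each trading day, return the value from the most recent report
    dated on or before that day (None if no report exists yet).
    `report_dates` must be sorted in ascending order."""
    filled = []
    for day in trading_days:
        idx = bisect_right(report_dates, day) - 1
        filled.append(values[idx] if idx >= 0 else None)
    return filled


# Hypothetical quarterly PE ratios and trading days.
report_dates = [date(2021, 3, 31), date(2021, 6, 30)]
pe_values = [18.5, 21.0]
trading_days = [
    date(2021, 3, 30),
    date(2021, 3, 31),
    date(2021, 4, 1),
    date(2021, 6, 30),
    date(2021, 7, 1),
]

pe_ratio = forward_fill(report_dates, pe_values, trading_days)
```

Days before the first report have no value to inherit, which is why the sketch returns None for them instead of guessing.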
Technical indicators are calculated with the default parameters used in the Pandas_TA package.
Data is collected from finance.yahoo.com and macrotrends.net.
The timeframe covered differs from one ticker to another because some stocks are unavailable for certain periods on one or both websites.
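Because the covered period varies by ticker, comparing or combining several tickers usually means restricting them to the dates they share. A small sketch of that step follows, assuming you have already read the first and last date of each file. The ticker names, the dates, and the function name are hypothetical.

```python
from datetime import date


def common_window(ranges):
    """Given {ticker: (first_date, last_date)}, return the overlapping
    (start, end) window, or None if the tickers share no dates."""
    start = max(first for first, _ in ranges.values())
    end = min(last for _, last in ranges.values())
    return (start, end) if start <= end else None


# Hypothetical coverage per ticker, not taken from the dataset.
coverage = {
    "AAA": (date(2010, 1, 4), date(2021, 12, 31)),
    "BBB": (date(2013, 6, 3), date(2022, 3, 31)),
    "CCC": (date(2009, 3, 2), date(2020, 9, 30)),
}

window = common_window(coverage)
```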
All code required to collect the data and perform preprocessing and feature engineering to get the data in the given format can be found in the following notebooks:
- https://www.kaggle.com/code/mohammedobeidat/us-stocks-data-collection
- https://www.kaggle.com/code/mohammedobeidat/us-stocks-technicals-feature-engineering-and-eda
- https://www.kaggle.com/code/mohammedobeidat/us-stocks-fundamentals-preprocessing-and-eda
Files
- {<>_ticker_train}.csv - the training set
- {<>_ticker_test}.csv - the test set
Columns
Column names are meant to be self-explanatory, assuming you are familiar with the stock market.
Some acronyms you may encounter:
- ttm is short for Trailing Twelve Months
- pe is short for Price to Earnings
- pb is short for Price to Book Value
- ps is short for Price to Sales
- fcf is short for Free Cash Flow
- eps is short for Earnings per Share
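For readers less familiar with these acronyms, the sketch below shows how such ratios are conventionally computed on a per-share basis. The numbers and variable names are invented for illustration, and the dataset's columns may be derived from slightly different inputs or definitions.

```python
# Hypothetical per-share figures for a single ticker.
price = 150.0                        # latest closing price
quarterly_eps = [1.0, 1.5, 1.5, 2.0] # last four quarters of earnings per share
book_value_per_share = 30.0
sales_per_share = 50.0

# Trailing twelve months EPS: the sum of the four most recent quarters.
eps_ttm = sum(quarterly_eps)

pe_ratio = price / eps_ttm               # Price to Earnings
pb_ratio = price / book_value_per_share  # Price to Book Value
ps_ratio = price / sales_per_share       # Price to Sales
```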
