Add CSI500 Feature (Support CSI500 Data) #152

Closed
wants to merge 7 commits into from
Binary file modified docs/_static/img/analysis/analysis_model_IC.png
Binary file modified docs/_static/img/analysis/analysis_model_NDQ.png
Binary file modified docs/_static/img/analysis/analysis_model_auto_correlation.png
Binary file modified docs/_static/img/analysis/analysis_model_cumulative_return.png
Binary file modified docs/_static/img/analysis/analysis_model_long_short.png
Binary file modified docs/_static/img/analysis/analysis_model_monthly_IC.png
Binary file modified docs/_static/img/analysis/cumulative_return_buy.png
Binary file modified docs/_static/img/analysis/cumulative_return_buy_minus_sell.png
Binary file modified docs/_static/img/analysis/cumulative_return_hold.png
Binary file modified docs/_static/img/analysis/cumulative_return_sell.png
Binary file modified docs/_static/img/analysis/rank_label_buy.png
Binary file modified docs/_static/img/analysis/rank_label_hold.png
Binary file modified docs/_static/img/analysis/rank_label_sell.png
Binary file modified docs/_static/img/analysis/report.png
Binary file modified docs/_static/img/analysis/risk_analysis_annualized_return.png
Binary file modified docs/_static/img/analysis/risk_analysis_bar.png
Binary file modified docs/_static/img/analysis/risk_analysis_information_ratio.png
Binary file modified docs/_static/img/analysis/risk_analysis_max_drawdown.png
Binary file modified docs/_static/img/analysis/risk_analysis_std.png
Binary file modified docs/_static/img/analysis/score_ic.png
Binary file modified docs/_static/img/framework.png
Binary file modified docs/_static/img/logo/1.png
Binary file modified docs/_static/img/logo/2.png
Binary file modified docs/_static/img/logo/3.png
Binary file modified docs/_static/img/logo/white_bg_rec+word.png
Binary file modified docs/_static/img/logo/yel_bg_rec+word.png
Binary file modified docs/_static/img/logo/yellow_bg_rec+word .png
Binary file modified docs/_static/img/logo/yellow_bg_rec.png
Binary file modified docs/_static/img/topk_drop.png
4 changes: 2 additions & 2 deletions scripts/data_collector/cn_index/README.md
@@ -1,4 +1,4 @@
# CSI300/CSI100 History Companies Collection
# CSI300/CSI100/CSI500 History Companies Collection

## Requirements

@@ -15,7 +15,7 @@ python collector.py --index_name CSI300 --qlib_dir ~/.qlib/qlib_data/cn_data --m
# parse new companies
python collector.py --index_name CSI300 --qlib_dir ~/.qlib/qlib_data/cn_data --method save_new_companies

# index_name support: CSI300, CSI100
# index_name support: CSI300, CSI100, CSI500
# help
python collector.py --help
```
125 changes: 85 additions & 40 deletions scripts/data_collector/cn_index/collector.py
@@ -5,13 +5,15 @@
import abc
import sys
import importlib
from tqdm import tqdm
from io import BytesIO
from typing import List
from pathlib import Path

import fire
import requests
import pandas as pd
import baostock as bs
from lxml import etree
from loguru import logger

@@ -21,7 +23,6 @@
from data_collector.index import IndexBase
from data_collector.utils import get_calendar_list, get_trading_date_by_shift, deco_retry


NEW_COMPANIES_URL = "http://www.csindex.com.cn/uploads/file/autofile/cons/{index_code}cons.xls"


@@ -53,7 +54,7 @@ def calendar_list(self) -> List[pd.Timestamp]:

Returns
-------
calendar list
"""
return get_calendar_list(bench_code=self.index_name.upper())

@@ -71,7 +72,7 @@ def bench_start_date(self) -> pd.Timestamp:
"""
Returns
-------
index start date
"""
raise NotImplementedError("rewrite bench_start_date")

@@ -81,7 +82,7 @@ def index_code(self) -> str:
"""
Returns
-------
index code
"""
raise NotImplementedError("rewrite index_code")

@@ -101,14 +102,14 @@ def get_changes(self) -> pd.DataFrame:

Returns
-------
pd.DataFrame:
symbol date type
SH600000 2019-11-11 add
SH600000 2020-11-10 remove
dtypes:
symbol: str
date: pd.Timestamp
type: str, value from ["add", "remove"]
"""
logger.info("get companies changes......")
res = []
@@ -125,11 +126,11 @@ def normalize_symbol(symbol: str) -> str:
Parameters
----------
symbol: str
symbol

Returns
-------
symbol
"""
symbol = f"{int(symbol):06}"
return f"SH{symbol}" if symbol.startswith("60") else f"SZ{symbol}"
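Review note on `normalize_symbol`: it zero-pads the code to six digits (presumably because codes read from spreadsheets can arrive as plain integers without their leading zeros) and picks the exchange prefix from the first two digits only. A standalone copy of that logic, runnable without the collector, makes the mapping easy to check:

```python
def normalize_symbol(symbol) -> str:
    """Standalone copy of the normalize_symbol logic shown in the diff."""
    # Zero-pad to six digits; accepts "1", "000001" or the integer 1
    symbol = f"{int(symbol):06}"
    # Only codes starting with "60" are treated as Shanghai; everything else gets "SZ"
    return f"SH{symbol}" if symbol.startswith("60") else f"SZ{symbol}"


if __name__ == "__main__":
    for code in ["600000", "1", 2, "688001"]:
        print(code, "->", normalize_symbol(code))
```

The last case shows a limitation of the rule as written: Shanghai codes that do not start with `60` (for example STAR Market codes beginning with `688`) would be labeled `SZ`. `CSI500.get_history_companies` in this PR builds its symbols from baostock codes, which already carry the exchange, so that method does not rely on this prefix rule.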
@@ -140,18 +141,18 @@ def _read_change_from_url(self, url: str) -> pd.DataFrame:
Parameters
----------
url : str
change url

Returns
-------
pd.DataFrame:
symbol date type
SH600000 2019-11-11 add
SH600000 2020-11-10 remove
dtypes:
symbol: str
date: pd.Timestamp
type: str, value from ["add", "remove"]
"""
resp = retry_request(url)
_text = resp.text
@@ -188,9 +189,11 @@ def _read_change_from_url(self, url: str) -> pd.DataFrame:
for _df in pd.read_html(resp.content):
if _df.shape[-1] != 4:
continue

_tmp_count += 1
if self.html_table_index + 1 > _tmp_count:
continue

tmp = []
for _s, _type, _date in [
(_df.iloc[2:, 0], self.REMOVE, remove_date),
@@ -210,14 +213,15 @@ def _read_change_from_url(self, url: str) -> pd.DataFrame:
)
)
break

return df

def _get_change_notices_url(self) -> List[str]:
"""get change notices url

Returns
-------
[url1, url2]
"""
resp = retry_request(self.changes_url)
html = etree.HTML(resp.text)
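Review note on the `html_table_index` check in `_read_change_from_url`: the loop ignores every table that does not have exactly four columns, counts the remaining ones, and processes only the `(html_table_index + 1)`-th of them; judging by the `break` at the end of the loop body visible in the diff, it then stops. So `CSI500.html_table_index = 0` selects the first four-column table on the notice page, while the value 1 returned by the class just above `CSI500` selects the second. A minimal sketch of only that selection step, using a hypothetical `select_change_table` helper and toy tables in place of `pd.read_html` output:

```python
from typing import List, Optional

import pandas as pd


def select_change_table(tables: List[pd.DataFrame], html_table_index: int) -> Optional[pd.DataFrame]:
    """Return the (html_table_index + 1)-th table with exactly 4 columns, or None."""
    tmp_count = 0
    for df in tables:
        if df.shape[-1] != 4:
            continue  # not a change table: skip it without counting
        tmp_count += 1
        if html_table_index + 1 > tmp_count:
            continue  # an earlier four-column table that should be skipped
        return df  # stands in for processing the table and then `break`
    return None


if __name__ == "__main__":
    tables = [
        pd.DataFrame([[1, 2]]),  # 2 columns: ignored
        pd.DataFrame([[1, 2, 3, 4]]),  # first four-column table
        pd.DataFrame([[5, 6, 7, 8]]),  # second four-column table
    ]
    print(select_change_table(tables, 0).iloc[0, 0])  # 1
    print(select_change_table(tables, 1).iloc[0, 0])  # 5
```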
@@ -228,15 +232,15 @@ def get_new_companies(self) -> pd.DataFrame:

Returns
-------
pd.DataFrame:

symbol start_date end_date
SH600000 2000-01-01 2099-12-31

dtypes:
symbol: str
start_date: pd.Timestamp
end_date: pd.Timestamp
"""
logger.info("get new companies......")
context = retry_request(self.new_companies_url).content
@@ -283,6 +287,47 @@ def html_table_index(self):
return 1


class CSI500(CSIIndex):
@property
def index_code(self):
return "000905"

@property
def bench_start_date(self) -> pd.Timestamp:
return pd.Timestamp("2007-01-15")

@property
def html_table_index(self):
return 0

def get_changes(self):
return self.get_changes_with_history_companies(self.get_history_companies())

def get_history_companies(self):
"""Get the historical CSI500 constituents from baostock.

Data source: http://baostock.com/baostock/index.php/%E4%B8%AD%E8%AF%81500%E6%88%90%E5%88%86%E8%82%A1
Avoid issuing a large number of parallel requests (for example, 1000 concurrent queries);
otherwise the IP will be blocked.

Returns
-------
pd.DataFrame:
columns: ["date", "symbol"]; symbol uses the qlib style, e.g. "SH600000"
"""
lg = bs.login()
today = pd.Timestamp.now()
date_range = pd.date_range(start=self.bench_start_date, end=today, freq="7D").date
ret_list = []
col = ["date", "symbol", "code_name"]
for date in tqdm(date_range, desc="Download CSI500"):
rs = bs.query_zz500_stocks(date=str(date))
zz500_stocks = []
while (rs.error_code == "0") and rs.next():
zz500_stocks.append(rs.get_row_data())
result = pd.DataFrame(zz500_stocks, columns=col)
result["symbol"] = result["symbol"].apply(lambda x: x.replace(".", "").upper())
ret_list.append(result[["date", "symbol"]])
bs.logout()
return pd.concat(ret_list, sort=False)
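Review note on `CSI500.get_history_companies`: it does not download a change log; it queries the full constituent list once every seven days from 2007-01-15 onward and rewrites baostock codes such as `sh.600000` into qlib-style symbols. That amounts to several hundred sequential requests, which is why the docstring warns against parallelizing it. The offline sketch below reproduces only the query schedule and the symbol rewrite; the helper names are hypothetical and no baostock call is made:

```python
from typing import List

import pandas as pd


def weekly_query_dates(start: str, end: str) -> List[str]:
    """Query dates every 7 days from `start` (inclusive) up to `end`, as "YYYY-MM-DD"."""
    return [str(d) for d in pd.date_range(start=start, end=end, freq="7D").date]


def baostock_to_qlib_symbol(code: str) -> str:
    """Rewrite a baostock code such as "sh.600000" into "SH600000"."""
    return code.replace(".", "").upper()


if __name__ == "__main__":
    print(weekly_query_dates("2007-01-15", "2007-02-05"))
    # ['2007-01-15', '2007-01-22', '2007-01-29', '2007-02-05']
    print(baostock_to_qlib_symbol("sh.600000"))  # SH600000
    print(baostock_to_qlib_symbol("sz.000001"))  # SZ000001
```

With weekly sampling, a constituent change can only be detected at the first query date on or after it takes effect.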


def get_instruments(
qlib_dir: str, index_name: str, method: str = "parse_instruments", request_retry: int = 5, retry_sleep: int = 3
):
@@ -291,23 +336,23 @@ def get_instruments(
Parameters
----------
qlib_dir: str
qlib data dir, default "Path(__file__).parent/qlib_data"
index_name: str
index name, value from ["csi100", "csi300", "csi500"]
method: str
method, value from ["parse_instruments", "save_new_companies"]
request_retry: int
request retry, by default 5
retry_sleep: int
request sleep, by default 3

Examples
-------
# parse instruments
$ python collector.py --index_name CSI300 --qlib_dir ~/.qlib/qlib_data/cn_data --method parse_instruments

# parse new companies
$ python collector.py --index_name CSI300 --qlib_dir ~/.qlib/qlib_data/cn_data --method save_new_companies

"""
_cur_module = importlib.import_module("data_collector.cn_index.collector")
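Review note on `CSI500.get_changes`: it passes the weekly snapshots from `get_history_companies` to `get_changes_with_history_companies`, whose body is not part of the visible diff. One way to turn such snapshots into the `symbol / date / type` frame documented on `get_changes` is to compare consecutive snapshots as sets. The sketch below illustrates that idea with a hypothetical `changes_from_snapshots` helper; it is not the PR's implementation:

```python
import pandas as pd


def changes_from_snapshots(snapshots: pd.DataFrame) -> pd.DataFrame:
    """Derive add/remove events from periodic constituent snapshots.

    `snapshots` has columns ["date", "symbol"]; each date holds one full constituent list.
    Returns columns ["symbol", "date", "type"] with type in {"add", "remove"}.
    """
    snapshots = snapshots.assign(date=pd.to_datetime(snapshots["date"]))
    rows = []
    previous = set()
    for date, group in snapshots.groupby("date", sort=True):
        current = set(group["symbol"])
        # In this snapshot but not in the previous one: added
        rows.extend((s, date, "add") for s in sorted(current - previous))
        # In the previous snapshot but no longer present: removed
        rows.extend((s, date, "remove") for s in sorted(previous - current))
        previous = current
    return pd.DataFrame(rows, columns=["symbol", "date", "type"])


if __name__ == "__main__":
    snaps = pd.DataFrame(
        {
            "date": ["2020-01-06", "2020-01-06", "2020-01-13", "2020-01-13"],
            "symbol": ["SH600000", "SZ000001", "SH600000", "SZ000002"],
        }
    )
    print(changes_from_snapshots(snaps))
```

In this sketch the first snapshot yields an `add` for every initial member, and an empty snapshot (for example, a query that silently returned no rows) would show up as a mass `remove` followed by a mass `add`, so failed queries are worth filtering out before diffing.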
2 changes: 2 additions & 0 deletions scripts/data_collector/cn_index/requirements.txt
@@ -1,6 +1,8 @@
baostock
logure
fire
requests
pandas
lxml
loguru
tqdm