sitemap

Crawl a website and generate sitemap.xml


zeek 83ba655383 Add Known Issues section		2020-03-22 22:01:51 +08:00
.gitignore	Update ignore file	2020-03-22 11:33:22 +08:00
get_url.py	Update get_url	2020-03-22 14:13:23 +08:00
README.md	Add Known Issues section	2020-03-22 22:01:51 +08:00
requirement.txt	Crawl website URLs	2020-03-21 22:09:14 +08:00
sitemap.py	Show total URL count	2020-03-22 22:00:26 +08:00

README.md

Introduction

Crawls the website and generates sitemap.xml so that search engines can easily index the site's links.

Usage

Install the dependencies:

pip3 install -r requirement.txt

Edit get_url.py:

# Base HTTP(S) URL of the current domain
url_root = 'https://git.zeekling.cn'
# Root URLs to start crawling from; you can list several
url_mine_list = [
    'https://git.zeekling.cn/',
    'https://git.zeekling.cn/zeekling'
]
# Maximum crawl (stack) depth; defaults to 2
max_depth = 2
# Links that should not be written to sitemap.xml
url_robot_arr = [
    '/user/sign_up',
    '/user/login',
    '/user/forgot_password'
]
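The settings above drive the crawl. As a rough illustration of how they could fit together, here is a minimal sketch of a depth-limited crawler. It is not the code in get_url.py: the names `crawl`, `LinkParser` and `fetch_html` are hypothetical, it uses a breadth-first traversal from the standard library, and it assumes that only links on the `url_root` host are kept, that `url_robot_arr` entries are matched exactly against the URL path, and that pages at `max_depth` are recorded but not expanded further.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page and return it as text (hypothetical helper)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(url_root, url_mine_list, max_depth=2, url_robot_arr=(), fetch=fetch_html):
    """Breadth-first crawl limited to max_depth; returns the sorted URLs found."""
    root_host = urlparse(url_root).netloc
    seen = set()
    queue = deque((url, 0) for url in url_mine_list)
    while queue:
        url, depth = queue.popleft()
        url = urldefrag(url)[0]  # drop "#fragment" so anchors don't duplicate pages
        if url in seen:
            continue
        parsed = urlparse(url)
        # Assumption: skip other hosts and excluded paths entirely.
        if parsed.netloc != root_host or parsed.path in url_robot_arr:
            continue
        seen.add(url)
        if depth >= max_depth:
            continue  # record the page, but do not follow its links
        try:
            html = fetch(url)
        except OSError:  # network errors (URLError is a subclass of OSError)
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            queue.append((urljoin(url, href), depth + 1))
    return sorted(seen)
```

Passing `fetch` as a parameter lets the traversal be tried offline with a dictionary of fake pages; a breadth-first order also guarantees each page is reached at its shallowest depth, which a depth-first "stack" crawl with a visited set does not.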

Set the output location of sitemap.xml in sitemap.py:

# The first argument is the path of sitemap.xml
create_xml('sitemap.xml', get_url.url_res_final)
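For reference, a file in the sitemaps.org format is a `<urlset>` element containing one `<url><loc>...</loc></url>` entry per page. The sketch below shows one possible `create_xml(path, urls)` with the same call shape as above, built on the standard library's `xml.etree.ElementTree`; it is an assumption, not the repository's implementation, and returning the URL count is a hypothetical choice.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def create_xml(path, urls):
    """Write urls to path as a sitemaps.org <urlset>; return how many were written."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&", as the sitemap format requires.
        ET.SubElement(url_el, "loc").text = url
    ET.ElementTree(urlset).write(path, encoding="utf-8", xml_declaration=True)
    return len(urls)
```

Usage: `create_xml('sitemap.xml', ['https://git.zeekling.cn/'])` writes a one-entry sitemap and returns 1.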

After making these changes, run:

./sitemap.py

Known issues

Crawling becomes slow when the maximum crawl depth is set to a large value.
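A back-of-the-envelope model shows why depth matters so much. Assuming, purely for illustration, that every page links to `b` new, distinct pages, a crawl from `r` root URLs reaches at most `r * (1 + b + b**2 + ... + b**D)` pages within depth `D`, so the number of requests grows exponentially with depth. The function name `max_pages` is hypothetical, and real sites share many links, so actual counts are lower.

```python
def max_pages(num_roots, links_per_page, max_depth):
    """Upper bound on pages reached within max_depth, assuming every page
    has links_per_page new, distinct links (a simplified model)."""
    return num_roots * sum(links_per_page ** d for d in range(max_depth + 1))
```

With 2 roots and 50 links per page, `max_pages(2, 50, 2)` is 5102, while `max_pages(2, 50, 3)` is 255102: raising the depth by one multiplies the worst case by roughly 50.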
