
kcrawler: a Python crawler authored by Ken.
python -V
pip install --upgrade pip
pip -V
pip search kcrawler  # note: PyPI has since disabled the XML-RPC search API, so this may fail
pip install kcrawler
# or
pip install --index-url https://pypi.org/simple kcrawler
pip install --upgrade kcrawler
# or
pip install --upgrade --index-url https://pypi.org/simple kcrawler
pip uninstall -y kcrawler
After a successful pip install, executables are created on the system search path: kcrawler, kcanjuke, and kcjuejin. They usually live in the bin subdirectory of the Python or conda installation, e.g. /anaconda3/bin/kcrawler. On Windows, .exe files are created instead.
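To confirm where the executables landed, the standard library's shutil.which can resolve them on the search path. This is just a quick check, not part of kcrawler itself:

```python
import shutil

# Print the resolved path of each installed entry point (None if not on PATH).
for exe in ("kcrawler", "kcanjuke", "kcjuejin"):
    print(exe, "->", shutil.which(exe))
```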
kcrawler is the entry point for crawling every supported web app. The command format is:
kcrawler <webapp> [webapp-data] [--options]
which is equivalent to:
kc<webapp> [webapp-data] [--options]
For example:
kcrawler juejin books --url "https://..."
kcjuejin books --url "https://..."
The examples below use the kcrawler <webapp> [webapp-data] [--options] form.
Run the following command:
kcrawler juejin books
When the command succeeds, it displays a statistics chart,
and saves the detailed data in the current directory as both a .csv and an .xls file, named as follows:
juejin_books_YYYY-MM-DD.csv
juejin_books_YYYY-MM-DD.xls
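The dated naming scheme above can be reproduced with a small helper, e.g. when locating today's output file programmatically. The function name here is my own, not part of kcrawler:

```python
from datetime import date

def juejin_books_filename(ext, day=None):
    """Build a filename like juejin_books_2019-11-04.csv (kcrawler's naming scheme)."""
    day = day or date.today()
    return "juejin_books_{}.{}".format(day.isoformat(), ext)

print(juejin_books_filename("csv", date(2019, 11, 4)))  # juejin_books_2019-11-04.csv
```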
Command format:
kcrawler juejin post --name <username> --limit 100 --url '<user_post_url>'
The url is the address of the user's juejin post list page. To try crawling quickly, a default url is also supported; the following crawls the posts of user ken:
kcrawler juejin post --name ken --limit 100
The detailed data is saved under a ken directory, in .csv and .xls files named with the crawl date and time.
First obtain the site cookie; see section 2.4 of the article "Automatically scraping and analyzing house prices with Python: Anjuke edition" ("python 自动抓取分析房价数据——安居客版"). Replace <anjuke_cookie> with your own cookie and run:
kcrawler anjuke --city shenzhen --limit 50 --cookie "<anjuke_cookie>"
Alternatively, save the cookie in a file named anjuke_cookie (no extension) in the current directory and run:
kcrawler anjuke --city shenzhen --limit 50
After the command runs successfully, it prints the average, maximum, and minimum house prices and draws a histogram of the price distribution. Once the histogram window is closed, the detailed data is saved in the current directory as a file like anjuke_shenzhen_community_price_20xx-xx-xx.csv.
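The same summary statistics can be recomputed later from the saved CSV. A minimal standard-library sketch, assuming a price column; the actual column names in kcrawler's output may differ:

```python
import csv
import io
import statistics

# Stand-in for the saved anjuke_shenzhen_community_price_*.csv file;
# the column names here are assumptions.
sample = io.StringIO("community,price\nA,52000\nB,48000\nC,61000\n")
prices = [float(row["price"]) for row in csv.DictReader(sample)]
print(statistics.mean(prices), max(prices), min(prices))
```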
To fetch house prices for other cities, set the city parameter to the pinyin name of any city Anjuke covers. Open https://www.anjuke.com/sy-city.html, click the city you want, and take the city part of the subdomain from the browser address bar; e.g. for beijing.anjuke.com, use beijing as the city parameter.
from kcrawler import Boss

boss = Boss()

# Reference data exposed by the Boss site
boss_positions = boss.position()    # position (job category) list
boss_cities = boss.city()           # full city list
boss_hotcities = boss.hotcity()     # hot cities
boss_industries = boss.industry()   # industry list
boss_user_city = boss.userCity()    # current user's city
boss_expects = boss.expect()        # the user's expected-job entries

# Recommended jobs for the first expected-job entry, page 1
jobs = boss.job(0, 1)

# Search jobs by keyword, city code, industry, and position code
tencent_jobs = boss.queryjob(query='腾讯', city=101280600, industry=None, position=101301)
# Same search, fetching a specific result page
tencent_jobs = boss.queryjobpage(query='腾讯', city=101280600, industry=None, position=101301, page=2)

# Job detail card for a given job id and lid
jobcard = boss.jobcard('3c2016bbf8413f3b1XR63t-1FVI~', '505ee74b-504b-4aea-921c-a3dc2016be80.f1:common-155-GroupA--157-GroupA.15')
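The results of these calls can be persisted however you like. As a hedged sketch, assuming the results behave like a list of same-keyed dicts (the real return types of Boss may differ), a generic CSV writer would look like:

```python
import csv

def save_rows(rows, path):
    """Write a list of same-keyed dicts to a CSV file."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

# Hypothetical rows standing in for boss.queryjob(...) results:
save_rows([{"title": "后端工程师", "company": "腾讯", "salary": "25-40K"}], "tencent_jobs.csv")
```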
Release history: https://pypi.org/project/kcrawler/#history
Copyright (c) 2019 kenblikylee