easy-twitter-crawler

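A simple, powerful Twitter crawler that supports collecting keyword (meta) search results, users, followers, following, posts, replies, and comments.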

Version: 1.0.4
Maintainers: 1

easy_twitter_crawler

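A Twitter crawler that supports collecting users, posts, and comments. I hope it proves useful. If you would like to contribute a good code snippet, please email the code and a description to xinkonghan@gmail.com. The code style follows my own preferences; if you spot any shortcomings, please point them out.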

Twitter three-piece suite (install the companion packages yourself if you need them)

Installation

pip install easy-twitter-crawler

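Main features

  • search_crawler: keyword search collection (Top, People, Latest, Videos, Photos; supports filter conditions)
  • user_crawler: user collection (user profile, followers and following, user posts, user replies)
  • common_crawler: common collection (posts and comments)

Basic usage

Set the proxy and cookie (keyword search, user posts, user replies, and comments all require a cookie)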

proxy = {
    'http': 'http://127.0.0.1:10808',
    'https': 'http://127.0.0.1:10808'
}
cookie = 'auth_token=686fa28f49400698820d0a3c344c51efdeeaf73a; ct0=5bed99b7faad9dcc742eda564ddbcf37888f8794abd6d4d736919234440be2172da1e9a9fc48bb068db1951d1748ba5467db2bc3e768f122794265da0a9fa6135b4ef40763e7fd91f730d0bb1298136b'
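The cookie above is a single string of `name=value` pairs separated by semicolons, which the examples later convert to a dictionary with `cookie_to_dic`. As a rough illustration of that conversion, here is a minimal standard-library sketch. The helper names `parse_cookie_string` and `missing_cookie_keys` are hypothetical and are not part of easy_spider_tool or easy_twitter_crawler, and treating `auth_token` and `ct0` as the required keys is an assumption based only on the sample cookie.

```python
from typing import Dict, List, Sequence


def parse_cookie_string(raw: str) -> Dict[str, str]:
    """Split 'k1=v1; k2=v2' into {'k1': 'v1', 'k2': 'v2'} (hypothetical helper)."""
    pairs: Dict[str, str] = {}
    for part in raw.split(';'):
        part = part.strip()
        if not part or '=' not in part:
            continue  # skip empty or malformed fragments
        name, _, value = part.partition('=')
        pairs[name.strip()] = value.strip()
    return pairs


def missing_cookie_keys(raw: str,
                        required: Sequence[str] = ('auth_token', 'ct0')) -> List[str]:
    """List the assumed-required cookie names that are absent or empty."""
    parsed = parse_cookie_string(raw)
    return [key for key in required if not parsed.get(key)]
```

Calling `missing_cookie_keys(cookie)` before a crawl is one way to catch a cookie that was copied incompletely from the browser.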

Keyword search example (collect 10 results for a keyword with the given conditions)

from easy_spider_tool import cookie_to_dic, format_json
from easy_twitter_crawler import set_proxy, set_cookie, search_crawler, TwitterFilter

key_word = 'elonmusk'

# Build the search conditions (only the language is set here; the rest are left empty).
twitter_filter = TwitterFilter(key_word)
twitter_filter.word_category(lang='en')
twitter_filter.account_category(filter_from='', to='', at='')
twitter_filter.filter_category(only_replies=None, only_links=None, exclude_replies=None, exclude_links=None)
twitter_filter.interact_category(min_replies='', min_faves='', min_retweets='')
twitter_filter.date_category(since='', until='')
# Join the keyword and conditions into the search string passed to the crawler.
key_word = twitter_filter.filter_join()

# Apply the proxy and cookie defined above.
set_proxy(proxy)
set_cookie(cookie_to_dic(cookie))

for info in search_crawler(
        key_word,
        data_type='Top',  # section to collect
        count=10,  # number of results to collect
):
    set_proxy(proxy)
    set_cookie(cookie_to_dic(cookie))
    print(format_json(info))

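Keyword search parameters

| Field | Type | Required | Description |
|---|---|---|---|
| key_word | string | | Search keyword |
| data_type | string | | Section to collect, case-insensitive (Top, People, Latest, Videos, Photos) |
| count | int | | Number of items to collect (-1: collect nothing, the default; 0: collect everything; >0: collect that many) |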
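The `count` convention (-1 collects nothing, 0 collects everything, a positive number sets a cap) recurs in every crawler's parameters. The sketch below shows one way such a convention can be applied to any iterable. `apply_count` is a hypothetical helper written for illustration, not the library's internal code.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar('T')


def apply_count(items: Iterable[T], count: int = -1) -> Iterator[T]:
    """Apply the -1 / 0 / >0 convention to an iterable (hypothetical helper)."""
    if count < 0:
        return iter(())            # -1: collect nothing
    if count == 0:
        return iter(items)         # 0: collect everything
    return islice(items, count)    # >0: stop after `count` items
```

Because `islice` is lazy, a positive `count` stops pulling from the source as soon as the cap is reached, which matters when each item costs a network request.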

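Keyword filter parameters (these mirror Twitter's search feature; separate multiple values for the same parameter with spaces)

| Category | Field | Type | Required | Description |
|---|---|---|---|---|
| word_category | exact | string | | Exact phrase |
| word_category | filter_any | string | | Any of these words (multiple allowed) |
| word_category | exclude | string | | Exclude these words (multiple allowed), e.g. dog cat |
| word_category | tab | string | | These hashtags (multiple allowed) |
| word_category | lang | string | | Language (see the language table at the end of this document) |
| account_category | filter_from | string | | From these accounts (multiple allowed) |
| account_category | to | string | | To these accounts (multiple allowed) |
| account_category | at | string | | Mentioning these accounts (multiple allowed) |
| filter_category | only_replies | bool | | Replies only |
| filter_category | only_links | bool | | Links only |
| filter_category | exclude_replies | bool | | Exclude replies |
| filter_category | exclude_links | bool | | Exclude links |
| interact_category | min_replies | int | | Minimum number of replies |
| interact_category | min_faves | int | | Minimum number of likes |
| interact_category | min_retweets | int | | Minimum number of retweets |
| date_category | since | string | | Start date ('2023-07-20') |
| date_category | until | string | | End date ('2023-08-20') |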
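Since these parameters mirror Twitter's own search, it can help to see how they might map onto a search query string. The sketch below assembles one using the commonly documented Twitter search operators (`from:`, `to:`, `lang:`, `filter:`, `min_faves:`, `since:`, `until:` and so on). It is an illustration only: `build_query` is a hypothetical function, and the string that `TwitterFilter.filter_join()` actually produces may differ.

```python
from typing import List, Optional


def _any_of(terms: List[str]) -> str:
    """Join terms as '(a OR b)'; an empty list yields an empty string."""
    return '(' + ' OR '.join(terms) + ')' if terms else ''


def build_query(keyword: str, *, exact: str = '', filter_any: str = '',
                exclude: str = '', tab: str = '', lang: str = '',
                filter_from: str = '', to: str = '', at: str = '',
                only_replies: Optional[bool] = None,
                only_links: Optional[bool] = None,
                exclude_replies: Optional[bool] = None,
                exclude_links: Optional[bool] = None,
                min_replies='', min_faves='', min_retweets='',
                since: str = '', until: str = '') -> str:
    """Assemble a search string from the filter fields (hypothetical sketch)."""
    parts = [
        keyword,
        f'"{exact}"' if exact else '',
        _any_of(filter_any.split()),
        ' '.join('-' + word for word in exclude.split()),
        _any_of(['#' + t.lstrip('#') for t in tab.split()]),
        f'lang:{lang}' if lang else '',
        _any_of(['from:' + a.lstrip('@') for a in filter_from.split()]),
        _any_of(['to:' + a.lstrip('@') for a in to.split()]),
        _any_of(['@' + a.lstrip('@') for a in at.split()]),
        'filter:replies' if only_replies else '',
        'filter:links' if only_links else '',
        '-filter:replies' if exclude_replies else '',
        '-filter:links' if exclude_links else '',
        f'min_replies:{min_replies}' if min_replies else '',
        f'min_faves:{min_faves}' if min_faves else '',
        f'min_retweets:{min_retweets}' if min_retweets else '',
        f'since:{since}' if since else '',
        f'until:{until}' if until else '',
    ]
    # Unset fields produce empty strings, which are dropped here.
    return ' '.join(part for part in parts if part)
```

Space-separated values become an OR group for accounts and hashtags and a run of negated terms for `exclude`, matching the "multiple allowed" notes in the table.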

User collection example (collect this user's profile plus 10 posts, 10 replies, 10 followers, and 10 followed accounts)

from easy_spider_tool import cookie_to_dic, format_json
from easy_twitter_crawler import set_proxy, set_cookie, user_crawler

set_proxy(proxy)
set_cookie(cookie_to_dic(cookie))

for info in user_crawler(
        'elonmusk',
        article_count=10,
        reply_count=10,
        following_count=10,
        followers_count=10,
        # start_time='2023-07-20 00:00:00',
        # end_time='2023-07-27 00:00:00',
):
    set_proxy(proxy)
    set_cookie(cookie_to_dic(cookie))
    print(format_json(info))
    print(f"Posts: {len(info.get('article', []))}")
    print(f"Followers: {len(info.get('followers', []))}")
    print(f"Following: {len(info.get('following', []))}")
    print(f"Replies: {len(info.get('reply', []))}")

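User collection parameters

| Field | Type | Required | Description |
|---|---|---|---|
| user_id | string | | User name (the elonmusk in https://twitter.com/elonmusk) |
| article_count | int | | Number of posts to collect (-1: collect nothing, the default; 0: collect everything; >0: collect that many) |
| reply_count | int | | Number of replies to collect (-1: collect nothing, the default; 0: collect everything; >0: collect that many) |
| following_count | int | | Number of followed accounts to collect (-1: collect nothing, the default; 0: collect everything; >0: collect that many) |
| followers_count | int | | Number of followers to collect (-1: collect nothing, the default; 0: collect everything; >0: collect that many) |
| start_time | string | | Start of the time window (only applies when collecting posts or replies) |
| end_time | string | | End of the time window (only applies when collecting posts or replies) |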
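The `start_time` and `end_time` parameters restrict posts and replies to a time window, written in the usage example as `'2023-07-20 00:00:00'`. The sketch below shows what such a window check can look like. `in_window` is a hypothetical helper, and two points are assumptions rather than documented behaviour: that both bounds are inclusive, and that the record's timestamp uses the same string format as the parameters.

```python
from datetime import datetime
from typing import Optional

# Format taken from the commented-out start_time/end_time values in the example.
TIME_FORMAT = '%Y-%m-%d %H:%M:%S'


def in_window(created_at: str,
              start_time: Optional[str] = None,
              end_time: Optional[str] = None) -> bool:
    """Return True if created_at lies within [start_time, end_time] (assumed inclusive)."""
    moment = datetime.strptime(created_at, TIME_FORMAT)
    if start_time and moment < datetime.strptime(start_time, TIME_FORMAT):
        return False
    if end_time and moment > datetime.strptime(end_time, TIME_FORMAT):
        return False
    return True
```

Leaving either bound unset makes the window open on that side, so `in_window(ts)` with no bounds accepts every record.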

Common collection example (given a post ID, collect that post's information)

from easy_spider_tool import cookie_to_dic, format_json
from easy_twitter_crawler import set_proxy, set_cookie, common_crawler

set_proxy(proxy)
set_cookie(cookie_to_dic(cookie))

for info in common_crawler(
        '1684447438864785409',
        data_type='article',
):
    set_proxy(proxy)
    set_cookie(cookie_to_dic(cookie))
    print(format_json(info))

Common collection example (given a post ID, collect 10 comments under that post)

from easy_spider_tool import cookie_to_dic, format_json
from easy_twitter_crawler import set_proxy, set_cookie, common_crawler

set_proxy(proxy)
set_cookie(cookie_to_dic(cookie))

for info in common_crawler(
        '1684447438864785409',
        data_type='comment',
        comment_count=10,
):
    set_proxy(proxy)
    set_cookie(cookie_to_dic(cookie))
    print(format_json(info))

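Common collection parameters

| Field | Type | Required | Description |
|---|---|---|---|
| task_id | string | | Post ID (the 1690164670441586688 in https://twitter.com/elonmusk/status/1690164670441586688) |
| data_type | string | | Collection type (post: article; comment: comment) |
| comment_count | int | | Number of comments to collect (only applies when data_type is comment; -1: collect nothing, the default; 0: collect everything; >0: collect that many) |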
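Both `user_id` and `task_id` are described in terms of the URL they come from, so a small helper can pull them out of a pasted link. The sketch below uses only the standard library. `split_status_url` is a hypothetical name and is not provided by easy_twitter_crawler.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def split_status_url(url: str) -> Tuple[str, Optional[str]]:
    """Return (user_id, task_id) from a profile or status URL (hypothetical helper).

    task_id is None when the URL points at a profile rather than a post.
    """
    segments = [s for s in urlparse(url).path.split('/') if s]
    if not segments:
        raise ValueError(f'no user name in URL: {url!r}')
    user_id = segments[0]
    task_id = None
    # Status URLs have the shape /<user>/status/<numeric id>.
    if len(segments) >= 3 and segments[1] == 'status' and segments[2].isdigit():
        task_id = segments[2]
    return user_id, task_id
```

For the URL in the table this yields `'elonmusk'` and `'1690164670441586688'`, which can be passed to `user_crawler` and `common_crawler` respectively.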

Language table

| Code | Language |
|---|---|
| aa | Afar |
| ab | Abkhaz |
| ae | Avestan |
| af | Afrikaans |
| ak | Akan |
| am | Amharic |
| an | Aragonese |
| ar | Arabic |
| as | Assamese |
| av | Avar |
| ay | Aymara |
| az | Azerbaijani |
| ba | Bashkir |
| be | Belarusian |
| bg | Bulgarian |
| bh | Bihari |
| bi | Bislama |
| bm | Bambara |
| bn | Bengali |
| bo | Tibetan |
| br | Breton |
| bs | Bosnian |
| ca | Catalan |
| ce | Chechen |
| ch | Chamorro |
| co | Corsican |
| cr | Cree |
| cs | Czech |
| cu | Old Church Slavonic |
| cv | Chuvash |
| cy | Welsh |
| da | Danish |
| de | German |
| dv | Dhivehi |
| dz | Dzongkha |
| ee | Ewe |
| el | Modern Greek |
| en | English |
| eo | Esperanto |
| es | Spanish |
| et | Estonian |
| eu | Basque |
| fa | Persian |
| ff | Fula |
| fi | Finnish |
| fj | Fijian |
| fo | Faroese |
| fr | French |
| fy | Frisian |
| ga | Irish |
| gd | Scottish Gaelic |
| gl | Galician |
| gn | Guarani |
| gu | Gujarati |
| gv | Manx |
| ha | Hausa |
| he | Hebrew |
| hi | Hindi |
| ho | Hiri Motu |
| hr | Croatian |
| ht | Haitian Creole |
| hu | Hungarian |
| hy | Armenian |
| hz | Herero |
| ia | Interlingua |
| id | Indonesian |
| ie | Interlingue |
| ig | Igbo |
| ii | Sichuan Yi (Nuosu) |
| ik | Inupiaq |
| io | Ido |
| is | Icelandic |
| it | Italian |
| iu | Inuktitut |
| ja | Japanese |
| jv | Javanese |
| ka | Georgian |
| kg | Kongo |
| ki | Kikuyu |
| kj | Kwanyama |
| kk | Kazakh |
| kl | Greenlandic |
| km | Khmer |
| kn | Kannada |
| ko | Korean |
| kr | Kanuri |
| ks | Kashmiri |
| ku | Kurdish |
| kv | Komi |
| kw | Cornish |
| ky | Kyrgyz |
| la | Latin |
| lb | Luxembourgish |
| lg | Luganda |
| li | Limburgish |
| ln | Lingala |
| lo | Lao |
| lt | Lithuanian |
| lu | Luba-Katanga |
| lv | Latvian |
| mg | Malagasy |
| mh | Marshallese |
| mi | Maori |
| mk | Macedonian |
| ml | Malayalam |
| mn | Mongolian |
| mo | Moldavian |
| mr | Marathi |
| ms | Malay |
| mt | Maltese |
| my | Burmese |
| na | Nauruan |
| nb | Norwegian Bokmål |
| nd | North Ndebele |
| ne | Nepali |
| ng | Ndonga |
| nl | Dutch |
| nn | Norwegian Nynorsk |
| no | Norwegian |
| nr | South Ndebele |
| nv | Navajo |
| ny | Nyanja |
| oc | Occitan |
| oj | Ojibwe |
| om | Oromo |
| or | Oriya |
| os | Ossetian |
| pa | Punjabi |
| pi | Pali |
| pl | Polish |
| ps | Pashto |
| pt | Portuguese |
| qu | Quechua |
| rm | Romansh |
| rn | Kirundi |
| ro | Romanian |
| ru | Russian |
| rw | Kinyarwanda |
| sa | Sanskrit |
| sc | Sardinian |
| sd | Sindhi |
| se | Northern Sami |
| sg | Sango |
| sh | Serbo-Croatian |
| si | Sinhala |
| sk | Slovak |
| sl | Slovenian |
| sm | Samoan |
| sn | Shona |
| so | Somali |
| sq | Albanian |
| sr | Serbian |
| ss | Swati |
| st | Southern Sotho |
| su | Sundanese |
| sv | Swedish |
| sw | Swahili |
| ta | Tamil |
| te | Telugu |
| tg | Tajik |
| th | Thai |
| ti | Tigrinya |
| tk | Turkmen |
| tl | Tagalog |
| tn | Tswana |
| to | Tongan |
| tr | Turkish |
| ts | Tsonga |
| tt | Tatar |
| tw | Twi |
| ty | Tahitian |
| ug | Uyghur |
| uk | Ukrainian |
| ur | Urdu |
| uz | Uzbek |
| ve | Venda |
| vi | Vietnamese |
| vo | Volapük |
| wa | Walloon |
| wo | Wolof |
| xh | Xhosa |
| yi | Yiddish |
| yo | Yoruba |
| za | Zhuang |
| zh | Chinese |
| zu | Zulu |

Links

GitHub: https://github.com/hanxinkong/easy-twitter-crawler

Online documentation: https://easy-twitter-crawler.xink.top

Contributors
