ease-crawler
More features will be added later...
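const EaseCrawler = require('ease-crawler')
const crawler = new EaseCrawler({
  disable: false, // set to true to block CSS and images
  args: {
    // options passed through to puppeteer.launch()
    headless: false,
    ignoreHTTPSErrors: true
  }
})
// Because ease-crawler uses puppeteer-pool, it must finish initializing before you can use it
crawler.init().then(() => {
  crawler.crawl('https://example.com', async (browser, curPage) => {
    // $ is jQuery inside the page context
    const content = await curPage.evaluate(() => {
      return $('div').html()
    })
    // content holds the inner HTML of the first matched <div>
    console.log(content)
  }, (err) => { console.log('err', err) })
})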
Parameter | Type | Description | Default |
---|---|---|---|
disable | boolean | Whether to disable CSS and images | false |
args | object | Options passed to puppeteer.launch() | {} |
Scrapes data from a page.
Some pages can only be scraped after their AJAX requests have finished.
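In plain Puppeteer, one way to wait for such a request is `page.waitForResponse()` with a predicate, started together with navigation so a fast response is not missed. This is a minimal sketch under that assumption, not ease-crawler's own implementation; the helper name `gotoAndWaitForAjax` and the `apiFragment` matching rule are hypothetical.

```javascript
// Hypothetical helper (not part of ease-crawler's API): navigate to `url` and
// resolve with the first successful response whose URL contains `apiFragment`.
async function gotoAndWaitForAjax (page, url, apiFragment, timeout = 30000) {
  // Start waiting and navigating together so the response cannot slip past
  const [response] = await Promise.all([
    page.waitForResponse(
      (res) => res.url().includes(apiFragment) && res.ok(),
      { timeout }
    ),
    page.goto(url)
  ])
  return response
}
```

Once it resolves, the AJAX-rendered content should be in the DOM, or the data can be read directly with `await response.json()`.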
Disables the page's CSS and images.
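Blocking CSS and images is commonly done with Puppeteer's request interception. The following is a minimal sketch using standard Puppeteer calls (`setRequestInterception`, the `request` event, `resourceType()`), not necessarily how ease-crawler does it; the helper name `disableCssAndImages` is hypothetical.

```javascript
// Hypothetical helper (not part of ease-crawler's API): abort stylesheet and
// image requests on a Puppeteer page and let every other request through.
const BLOCKED_TYPES = new Set(['stylesheet', 'image'])

async function disableCssAndImages (page) {
  // Interception must be enabled before requests can be aborted or continued
  await page.setRequestInterception(true)
  page.on('request', (request) => {
    if (BLOCKED_TYPES.has(request.resourceType())) {
      request.abort()
    } else {
      request.continue()
    }
  })
}
```

Skipping these resources usually makes pages load faster, at the cost of layout-dependent content possibly rendering differently.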
Checks whether the page already has jQuery and injects it if not.
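A minimal sketch of that check-then-inject step with plain Puppeteer: `page.evaluate()` tests for a global `jQuery`, and `page.addScriptTag({ url })` loads it when missing. The helper name `ensureJQuery` and the CDN URL are assumptions for illustration, not ease-crawler's actual code or configuration.

```javascript
// Hypothetical helper (not part of ease-crawler's API).
// JQUERY_URL is an assumed CDN address; substitute whichever copy you trust.
const JQUERY_URL = 'https://code.jquery.com/jquery-3.7.1.min.js'

async function ensureJQuery (page, url = JQUERY_URL) {
  // Runs in the browser context: true if a global jQuery function exists
  const hasJQuery = await page.evaluate(() => typeof window.jQuery === 'function')
  if (!hasJQuery) {
    // addScriptTag resolves once the injected script has loaded
    await page.addScriptTag({ url })
  }
  // true when this call injected jQuery, false when it was already present
  return !hasJQuery
}
```

Checking first avoids loading a second copy of jQuery over a version the page already depends on.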
Time-slicing function that spreads requests out over time, so crawling too fast does not get your IP banned.
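The idea behind a time-slicing function can be sketched as: process a small batch of items, pause, then process the next batch. This is a generic Promise-based version under that assumption, not ease-crawler's own function; the name `timeChunk` and its parameters are hypothetical.

```javascript
// Hypothetical time-slicing helper (not part of ease-crawler's API): run `task`
// on at most `batchSize` items, then pause `intervalMs` before the next batch.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function timeChunk (items, task, batchSize = 1, intervalMs = 1000) {
  const results = []
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize)
    // Items within one batch run concurrently; batches run one after another
    results.push(...(await Promise.all(batch.map((item) => task(item)))))
    // Pause before the next batch, but not after the last one
    if (i + batchSize < items.length) await sleep(intervalMs)
  }
  return results
}
```

For example, `timeChunk(urls, (u) => fetchOne(u), 2, 3000)` would issue at most two requests every three seconds, where `fetchOne` stands for whatever per-URL scraping function you use.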
Saves data to a CSV file:
crawler.saveToCsv(['city', 'name'], {city: '杭州', name: '最强'}, 'book')
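After this call, a CSV file is created in the project's root directory. The file is GBK-encoded, so on Windows its text may appear garbled.
To fix this, open the CSV in Sublime Text and choose File > Save with Encoding > UTF-8 with BOM.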
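The manual fix above amounts to writing UTF-8 with a byte order mark. A minimal sketch of producing such a file directly in Node, assuming you build the CSV yourself instead of calling `saveToCsv`; the helpers `escapeField`, `toCsv`, and `writeCsvWithBom` are hypothetical, not part of ease-crawler.

```javascript
const fs = require('fs')

// Quote a field when it contains a comma, quote, or newline (RFC 4180 style)
function escapeField (value) {
  const s = String(value ?? '')
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s
}

// Build CSV text from a header list and an array of row objects keyed by those headers
function toCsv (headers, rows) {
  const lines = [headers.map(escapeField).join(',')]
  for (const row of rows) lines.push(headers.map((h) => escapeField(row[h])).join(','))
  return lines.join('\r\n') + '\r\n'
}

// Prefix a UTF-8 byte order mark so spreadsheet apps on Windows detect the encoding
function writeCsvWithBom (filePath, headers, rows) {
  fs.writeFileSync(filePath, '\ufeff' + toCsv(headers, rows), 'utf8')
}
```

With the BOM in place, `writeCsvWithBom('book.csv', ['city', 'name'], [{ city: '杭州', name: '最强' }])` should open without re-saving.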
Takes a screenshot. The URL you provide must be one that can be opened in a browser.
crawler.screenShot({
url: xxx,
width: xxx,
height: xxx,
onSuccess: (url) => {
console.log(url)
},
onError: (err) => {
console.error(`Server error: ${err}`)
}
})
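A screenshot step of this kind maps onto a few standard Puppeteer calls: open a page, set the viewport, navigate, capture. This is a minimal sketch under that assumption, not ease-crawler's implementation; the helper `screenshotPage` and its `path` option are hypothetical.

```javascript
// Hypothetical helper (not part of ease-crawler's API). `browser` is any
// Puppeteer Browser; `path` is the file the PNG is written to.
async function screenshotPage (browser, { url, width, height, path }) {
  const page = await browser.newPage()
  try {
    // Set the viewport first so the capture has the requested size
    await page.setViewport({ width, height })
    // Wait until network activity settles so late-loading content is rendered
    await page.goto(url, { waitUntil: 'networkidle2' })
    await page.screenshot({ path })
    return path
  } finally {
    // Always release the tab, even when navigation or capture fails
    await page.close()
  }
}
```

Callbacks in the same shape as the `onSuccess`/`onError` options can be attached with `screenshotPage(browser, opts).then(onSuccess, onError)`.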
A Puppeteer-based crawler.