@waynechang65/ptt-crawler
ptt-crawler is a web crawler module designed to scrape data from the boards of Ptt.
Ptt is the largest BBS (Bulletin Board System) in Taiwan and an important reference source for big data analysis in Taiwan. However, most Ptt crawlers are written in Python. To scrape Ptt data from Node.js, I built a simple Ptt crawler module in JavaScript and am sharing it for everyone to use.
npm install --save @waynechang65/ptt-crawler
const ptt_crawler = require('@waynechang65/ptt-crawler');

// await is only valid inside an async function in a CommonJS module
(async () => {
    // *** Initialize ***
    await ptt_crawler.initialize();
    // *** GetResult ***
    // Ptt PokemonGO board, 3 pages, skip pinned bottom posts, scrape content of posts
    const ptt = await ptt_crawler.getResults({
        board: 'PokemonGO',
        pages: 3,
        skipPBs: true,
        getContents: true
    });
    console.log(ptt);
    // *** Close ***
    await ptt_crawler.close();
})();
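Because close() should still run when getResults() fails, the initialize/scrape/close sequence can be wrapped in try/finally. The sketch below is a hypothetical helper, `withCrawler`, which is not part of ptt-crawler's API; it only assumes an object exposing the initialize() and close() methods used in the example above, so it can be exercised with a stand-in object instead of the real crawler.

```javascript
/**
 * Hypothetical helper (not part of ptt-crawler): calls crawler.initialize(),
 * runs `work`, then calls crawler.close() whether `work` resolves or throws.
 * @param {{initialize: Function, close: Function}} crawler
 * @param {(crawler: object) => any} work
 */
async function withCrawler(crawler, work) {
    await crawler.initialize();
    try {
        // `return await` so a rejection from `work` is handled by this try/finally
        return await work(crawler);
    } finally {
        // Runs on both the success and the failure path of `work`
        await crawler.close();
    }
}

// Usage with the real package (needs network access):
// const ptt_crawler = require('@waynechang65/ptt-crawler');
// withCrawler(ptt_crawler, (c) => c.getResults({
//     board: 'PokemonGO', pages: 1, skipPBs: true, getContents: false
// })).then((ptt) => console.log(ptt.titles));
```

The design choice is the finally block: close() is awaited after `work` on every path, while an error thrown by initialize() itself still propagates without calling close().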
The object returned by getResults() has the following shape, where each field is an array: { titles[], urls[], rates[], authors[], dates[], marks[], contents[] }
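If the arrays are index-aligned, meaning element i of every array describes the same post (an assumption; the README lists the fields but does not say so), the column-style result can be turned into one object per post. `toPosts` and the sample data below are hypothetical and only illustrate the documented field names.

```javascript
/**
 * Hypothetical helper: converts the column-style result of getResults()
 * into an array of per-post objects. ASSUMES the arrays are index-aligned.
 * The post count is taken from titles[]; a field whose array is missing or
 * shorter (for example, if contents[] is empty) yields undefined.
 */
function toPosts(result) {
    const fields = ['titles', 'urls', 'rates', 'authors', 'dates', 'marks', 'contents'];
    const count = Array.isArray(result.titles) ? result.titles.length : 0;
    const posts = [];
    for (let i = 0; i < count; i++) {
        const post = {};
        for (const field of fields) {
            const column = result[field];
            // Singular key: 'titles' -> 'title', 'urls' -> 'url', ...
            post[field.slice(0, -1)] = Array.isArray(column) ? column[i] : undefined;
        }
        posts.push(post);
    }
    return posts;
}

// Hypothetical sample data with the documented shape (not real Ptt output):
const sample = {
    titles: ['Post A', 'Post B'],
    urls: ['https://example.com/a', 'https://example.com/b'],
    rates: ['10', '3'],
    authors: ['alice', 'bob'],
    dates: ['1/01', '1/02'],
    marks: ['', 'm'],
    contents: [],
};
console.log(toPosts(sample));
```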
To run the project from source:
git clone https://github.com/WayneChang65/ptt-crawler.git
npm install
npm start
options.board: name of the Ptt board to scrape
options.pages: number of pages to scrape
options.skipPBs: whether to skip the posts pinned to the bottom of the board
options.getContents: whether to scrape the contents of each post (this takes considerably more time)
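Before calling getResults(), a caller may want to catch malformed options early. The sketch below is a hypothetical pre-flight check, `checkOptions`, that is not part of ptt-crawler. The expected types are assumptions inferred from the usage example (board a string, pages a positive integer, skipPBs and getContents booleans); the README does not say whether any option is optional or has a default, so this check may be stricter than the library itself.

```javascript
/**
 * Hypothetical pre-flight check for the four documented options.
 * Types are ASSUMED from the README example, not taken from the library.
 * Returns a list of problems; an empty list means the options look valid.
 */
function checkOptions(options) {
    const problems = [];
    if (typeof options.board !== 'string' || options.board.trim() === '') {
        problems.push('board must be a non-empty string, e.g. "PokemonGO"');
    }
    if (!Number.isInteger(options.pages) || options.pages < 1) {
        problems.push('pages must be a positive integer');
    }
    for (const flag of ['skipPBs', 'getContents']) {
        if (typeof options[flag] !== 'boolean') {
            problems.push(`${flag} must be true or false`);
        }
    }
    return problems;
}

console.log(checkOptions({ board: 'PokemonGO', pages: 3, skipPBs: true, getContents: true })); // []
console.log(checkOptions({ board: '', pages: 0, skipPBs: 'yes', getContents: true }));
```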
Although ptt-crawler is a small module, I hope the project keeps improving. If you find a bug or other problem, please open an Issue describing it in detail. Contributions are welcome: feel free to fork the repository and send a Pull Request. Thanks. :)
The npm package @waynechang65/ptt-crawler receives about 12 downloads per week, so its popularity is classified as not popular.
We found that @waynechang65/ptt-crawler has a healthy version release cadence and project activity, since the last version was released less than a year ago. It has 1 open source maintainer.