
github.com/TTingChen/Golang-WebCrawler


v0.0.0-20210910002209-dfc469df8959


Version published: 4 years ago

Created: 4 years ago


webcrawler

Introduction

A web service that crawls two shopping sites (Watsons and eBay) for a given keyword and presents the results.

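Basic architecture

Builds a basic Web API on top of HTTP handlers
Uses the third-party crawling framework Colly for the core crawling work
Stores crawl results as JSON, which keeps the data highly compatible and makes future extension easier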
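
As a rough illustration of how an HTTP handler and JSON output can fit together, here is a minimal sketch using only the standard library. It is not the repository's code: the `Product` field names, the JSON keys, `encodeProducts`, `searchHandler`, and the injected `crawl` function are all assumptions made for this example, and the crawler is replaced by a stub.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Product holds one scraped item. The Go field names and JSON keys are
// assumptions for this sketch, not taken from the repository.
type Product struct {
	Name     string `json:"name"`
	Price    string `json:"price"`
	ImageURL string `json:"image_url"`
	URL      string `json:"url"`
}

// encodeProducts renders a slice of products as a JSON array.
func encodeProducts(products []Product) (string, error) {
	b, err := json.Marshal(products)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// searchHandler builds a handler for /search. The crawl function is
// injected so the handler can be exercised without touching the network.
func searchHandler(crawl func(keyword string) []Product) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		keyword := r.URL.Query().Get("keyword")
		if keyword == "" {
			http.Error(w, "missing keyword", http.StatusBadRequest)
			return
		}
		body, err := encodeProducts(crawl(keyword))
		if err != nil {
			http.Error(w, "encoding failed", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(body))
	}
}

func main() {
	// Stub crawler standing in for the real Colly-based one.
	stub := func(keyword string) []Product {
		return []Product{{Name: keyword + " sample", Price: "100"}}
	}
	// Exercise the handler in-process. A real server would instead register
	// it with http.HandleFunc("/search", ...) and call http.ListenAndServe.
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/search?keyword=soap", nil)
	searchHandler(stub)(rec, req)
	fmt.Println(rec.Code, rec.Body.String())
}
```

Passing the crawl function in as a parameter keeps the handler independent of any particular crawler, so a stub can replace it in tests.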

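Other details

Scraped product information includes: name, price, image link, and product link
Uses Colly's Parallelism and Async options to implement workers
Uses Colly's Limit, UserAgent, and related options to imitate real user behavior and avoid being blocked by the sites
Uses interfaces to swap out the underlying implementation, which makes the code extensible and easier to test
Uses context to implement graceful shutdown
When search results span multiple pages, automatically computes how many pages to crawl from the configured number of products
When the program is interrupted, workers finish the tasks they already hold before exiting
Uses a mutex to avoid race conditions caused by the HTTP writer
Combines the HTTP writer with Colly to achieve real-time rendering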
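
Several of the details above (page-count calculation, an interface in front of the crawler, a mutex-guarded writer, and workers that finish their current task on cancellation) can be sketched with the standard library alone. This is an illustrative stand-in, not the project's Colly-based implementation: `pagesNeeded`, `PageCrawler`, `lockedWriter`, `crawlAll`, and `fakeCrawler` are hypothetical names, and plain goroutines replace Colly's async workers.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"os"
	"sync"
)

// pagesNeeded returns how many result pages must be fetched to collect
// `want` products when each page lists `perPage` products (ceiling division).
func pagesNeeded(want, perPage int) int {
	if want <= 0 || perPage <= 0 {
		return 0
	}
	return (want + perPage - 1) / perPage
}

// PageCrawler is a hypothetical interface; a Colly-backed type and a test
// fake could both satisfy it.
type PageCrawler interface {
	CrawlPage(keyword string, page int) []string
}

// lockedWriter serializes writes so concurrent workers cannot interleave
// output on the shared writer. If the writer is an http.Flusher, each write
// is flushed so the client sees results as they arrive.
type lockedWriter struct {
	mu sync.Mutex
	w  io.Writer
}

func (l *lockedWriter) Write(p []byte) (int, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	n, err := l.w.Write(p)
	if f, ok := l.w.(http.Flusher); ok {
		f.Flush()
	}
	return n, err
}

// crawlAll starts `workers` goroutines that pull page numbers from a channel.
// Once ctx is cancelled, the dispatch loop stops handing out pages, but a
// page already being crawled is finished and written before its worker exits.
func crawlAll(ctx context.Context, c PageCrawler, keyword string, pages, workers int, out io.Writer) {
	lw := &lockedWriter{w: out}
	tasks := make(chan int)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for page := range tasks {
				for _, item := range c.CrawlPage(keyword, page) {
					fmt.Fprintln(lw, item)
				}
			}
		}()
	}
	for p := 1; p <= pages && ctx.Err() == nil; p++ {
		select {
		case <-ctx.Done():
		case tasks <- p:
		}
	}
	close(tasks)
	wg.Wait() // block until in-flight pages are done
}

// fakeCrawler is a stand-in used only for this demo.
type fakeCrawler struct{}

func (fakeCrawler) CrawlPage(keyword string, page int) []string {
	return []string{fmt.Sprintf("%s: item from page %d", keyword, page)}
}

func main() {
	pages := pagesNeeded(45, 20) // 45 products at 20 per page
	crawlAll(context.Background(), fakeCrawler{}, "soap", pages, 2, os.Stdout)
}
```

In a server, the context would typically come from the request or from `signal.NotifyContext`, so an interrupt stops new pages from being dispatched while `wg.Wait()` lets the workers drain.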

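Possible improvements

Use a database to build a cache, so that when a user searches for the same keyword again within a set period the sites do not have to be crawled again. The DB connection information should not be hard-coded.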
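
The caching idea could look roughly like the sketch below. To stay self-contained it swaps the database for an in-memory map with a time-to-live, and it reads the connection string from an environment variable rather than hard-coding it. `ttlCache`, `store`, `lookup`, `dsnFromEnv`, and the variable name `CRAWLER_DB_DSN` are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sync"
	"time"
)

// entry is one cached search result with its expiry time in Unix seconds.
type entry struct {
	value     string
	expiresAt int64
}

// ttlCache is an in-memory stand-in for a database-backed cache.
type ttlCache struct {
	mu      sync.Mutex
	ttl     int64 // lifetime of an entry, in seconds
	entries map[string]entry
}

func newTTLCache(ttlSeconds int64) *ttlCache {
	return &ttlCache{ttl: ttlSeconds, entries: make(map[string]entry)}
}

// store saves value under keyword. The current time is passed in explicitly
// so the expiry logic can be tested without sleeping.
func (c *ttlCache) store(keyword, value string, now int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[keyword] = entry{value: value, expiresAt: now + c.ttl}
}

// lookup returns the cached value if it exists and has not expired.
// Expired entries are removed on access.
func (c *ttlCache) lookup(keyword string, now int64) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[keyword]
	if !ok || now >= e.expiresAt {
		delete(c.entries, keyword)
		return "", false
	}
	return e.value, true
}

// dsnFromEnv reads the database connection string from the environment.
// CRAWLER_DB_DSN is a hypothetical variable name chosen for this sketch.
func dsnFromEnv() (string, error) {
	dsn := os.Getenv("CRAWLER_DB_DSN")
	if dsn == "" {
		return "", errors.New("CRAWLER_DB_DSN is not set")
	}
	return dsn, nil
}

func main() {
	if _, err := dsnFromEnv(); err != nil {
		fmt.Println("no database configured:", err)
	}
	cache := newTTLCache(600) // keep results for ten minutes
	now := time.Now().Unix()
	cache.store("soap", `[{"name":"soap"}]`, now)
	v, ok := cache.lookup("soap", now+60)
	fmt.Println(v, ok)
}
```

A handler would call `lookup` before crawling and `store` after a successful crawl. With a real database, the same two operations would become a query and an upsert keyed on the keyword.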

Usage

Open a terminal in the program's directory and run

go run main.go
Open any browser on the local machine and enter the following in the address bar

localhost:<port number>/search?keyword=<your keyword>
Or open a terminal and run

curl 'localhost:<port number>/search?keyword=<your keyword>'
Unit tests

go test -v ./...
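
Keywords that contain spaces or non-ASCII characters need to be URL-encoded before they go into the search request shown in the usage steps. A small sketch of building that URL with `net/url` follows. `buildSearchURL` is a hypothetical helper, and `localhost:8080` is only a placeholder, since the port the server listens on is not stated here.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildSearchURL assembles the /search URL for a host:port and keyword,
// letting net/url take care of query escaping.
func buildSearchURL(hostPort, keyword string) string {
	u := url.URL{
		Scheme:   "http",
		Host:     hostPort,
		Path:     "/search",
		RawQuery: url.Values{"keyword": {keyword}}.Encode(),
	}
	return u.String()
}

func main() {
	// Placeholder host and port; substitute the port the server uses.
	fmt.Println(buildSearchURL("localhost:8080", "hand cream"))
	// prints http://localhost:8080/search?keyword=hand+cream
}
```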

Crawl results

Client side
Server side


Package last updated on 10 Sep 2021
