github.com/vedhavyas/sitemap-generator
Scrape is a minimalistic, depth-controlled web scraping project. It can be used as a command-line tool or integrated into your own project. Scrape also supports sitemap generation as an output.
Once scraping of the given URL is complete, the API returns the following structure.
package scrape

import (
	"net/url"
	"regexp"
)

// Response holds the scraped response
type Response struct {
	BaseURL      *url.URL            // starting url at maxDepth 0
	UniqueURLs   map[string]int      // unique urls crawled and the number of times each was seen
	URLsPerDepth map[int][]*url.URL  // urls found at each depth
	SkippedURLs  map[string][]string // urls extracted from source urls that failed domainRegex (if given) or were invalid
	ErrorURLs    map[string]error    // reason each url could not be crawled
	DomainRegex  *regexp.Regexp      // restricts crawling to urls matching the given domain
	MaxDepth     int                 // max depth of the crawl; -1 means no limit
	Interrupted  bool                // true if the scraping was interrupted
}
go get github.com/vedhavyas/scrape/cmd/scrape/
Usage of ./scrape:
  -domain-regex string (optional)
        Domain regex to limit crawls to. Defaults to the base url domain
  -max-depth int (optional)
        Max depth to crawl (default -1)
  -sitemap string (optional)
        File location to write the sitemap to
  -url string (required)
        Starting URL (default "https://vedhavyas.com")
Scrape supports two types of output:
stdout, printed from the Response
sitemap, an xml file (if a file path is passed) generated from the Response
Scrape can be integrated into any Go project through the given APIs.
As a package, you will have access to the above-mentioned Response
and all the data in it.
At this point, the following APIs are available.
func Start(ctx context.Context, url string) (resp *Response, err error)
Start will start the scraping with no depth limit (-1) and the base url domain

func StartWithDepth(ctx context.Context, url string, maxDepth int) (resp *Response, err error)
StartWithDepth will start the scraping with the given max depth and the base url domain

func StartWithDepthAndDomainRegex(ctx context.Context, url string, maxDepth int, domainRegex string) (resp *Response, err error)
StartWithDepthAndDomainRegex will start the scraping with the given max depth and domain regex

func StartWithDomainRegex(ctx context.Context, url, domainRegex string) (resp *Response, err error)
StartWithDomainRegex will start the scraping with no depth limit (-1) and the given domain regex

func Sitemap(resp *Response, file string) error
Sitemap generates a sitemap from the given response