
github.com/tomstuart92/web-crawler
We'd like you to write a simple web crawler in a programming language of your choice. Feel free to either choose one you're very familiar with or, if you'd like to learn some Go, you can also make this your first Go program! The crawler should be limited to one domain - so when you start with https://monzo.com/, it would crawl all pages within monzo.com, but not follow external links, for example to the Facebook and Twitter accounts. Given a URL, it should print a simple site map, showing the links between pages.
Ideally, write it as you would a production piece of code. Bonus points for tests and making it as fast as possible!
go run main.go --target=https://jigsaw.xyz --concurrency=1 --singleDomain=true
The crawler uses channels and goroutines. It starts a fixed pool of goroutines, sized by the --concurrency flag, that are responsible for fetching URLs. The fetched pages are sent to a second set of goroutines, which is allowed to scale as wide as needed, to tokenise the HTML and extract the links. This design caps the number of concurrent connections to the target site while letting the tokenisation (which can be more expensive than the fetch) run with as much parallelism as the workload needs.
Once the links have been extracted, they are returned to the main goroutine, which maintains the set of links already seen and the relationships between pages in a graph data structure. New links are sent to the fetch workers to be scraped, and the process continues until no pages remain to scrape.
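The coordinating loop might look roughly like the sketch below. The names `crawl`, `linksOf`, and `inFlight` are hypothetical, `linksOf` collapses fetching and extraction into one stand-in function, and detecting completion by counting in-flight jobs is an assumption about how "no longer have any pages to scrape" is decided.

```go
package main

import "fmt"

type result struct {
	from  string
	links []string
}

// crawl runs the coordinator loop in the calling goroutine. Only this
// goroutine touches seen, graph and queue, so no locking is needed.
// linksOf is a stand-in for fetch-and-extract (assumption).
func crawl(start string, workers int, linksOf func(url string) []string) map[string][]string {
	jobs := make(chan string)
	results := make(chan result)
	for i := 0; i < workers; i++ {
		go func() {
			for u := range jobs {
				results <- result{from: u, links: linksOf(u)}
			}
		}()
	}

	graph := map[string][]string{}       // page -> outgoing links
	seen := map[string]bool{start: true} // every URL ever queued
	queue := []string{start}             // discovered, not yet handed to a worker
	inFlight := 0                        // handed to a worker, result not yet back

	for len(queue) > 0 || inFlight > 0 {
		// Offer a job only when one is queued: a send on a nil channel
		// blocks forever, so that select case is simply never chosen.
		var send chan string
		var next string
		if len(queue) > 0 {
			send, next = jobs, queue[0]
		}
		select {
		case send <- next:
			queue = queue[1:]
			inFlight++
		case r := <-results:
			inFlight--
			graph[r.from] = r.links
			for _, l := range r.links {
				if !seen[l] {
					seen[l] = true
					queue = append(queue, l)
				}
			}
		}
	}
	close(jobs) // lets the workers exit
	return graph
}

func main() {
	site := map[string][]string{"/": {"/a", "/b"}, "/a": {"/", "/b"}, "/b": {"/c"}, "/c": nil}
	graph := crawl("/", 2, func(u string) []string { return site[u] })
	fmt.Println(len(graph), "pages crawled") // prints "4 pages crawled"
}
```

Selecting on both the send and the receive keeps the coordinator from deadlocking when every worker is blocked waiting to hand back a result.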
At that point a sitemap is printed, via a BFS traversal of the graph data structure.
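As an illustration of the BFS step, the sketch below walks a graph stored as a map from each page to its outgoing links. The map representation, the `sitemap` function name, and the output layout are assumptions, not the repository's actual format.

```go
package main

import (
	"fmt"
	"strings"
)

// sitemap renders the graph breadth-first from root: each page is listed
// once, followed by the links found on it.
func sitemap(graph map[string][]string, root string) string {
	var b strings.Builder
	visited := map[string]bool{root: true}
	queue := []string{root}
	for len(queue) > 0 {
		page := queue[0]
		queue = queue[1:]
		fmt.Fprintf(&b, "%s\n", page)
		for _, link := range graph[page] {
			fmt.Fprintf(&b, "  -> %s\n", link)
			if !visited[link] {
				visited[link] = true // mark on enqueue so a page is queued once
				queue = append(queue, link)
			}
		}
	}
	return b.String()
}

func main() {
	graph := map[string][]string{
		"/":  {"/a", "/b"},
		"/a": {"/b"},
		"/b": {"/"},
	}
	fmt.Print(sitemap(graph, "/"))
}
```

Marking a page as visited when it is enqueued, rather than when it is dequeued, keeps cycles such as "/" and "/b" linking to each other from producing duplicate entries.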