
Grab and recursively parse website sitemaps, robots.txt, and other related files.
A Python-based utility suite designed to fetch and analyze sitemaps and other well-known files from websites.
Clone the repository:
git clone https://github.com/yourusername/sitemap-utility-suite.git
cd sitemap-utility-suite
Install the required dependencies:
pip install -r requirements.txt
To install Sitemap Grabber with documentation dependencies:
Ensure you have flit installed:
pip install flit
Install the package with documentation dependencies:
flit install --deps develop --extras docs
For developers who want to contribute to the project:
flit install --deps develop --extras "docs,dev"
Alternatively, if you prefer using pip:
pip install -e ".[docs,dev]"
This will install the package in editable mode along with all necessary dependencies for building the documentation and development tools.
Here's a basic example of how to use the SitemapGrabber:
from sitemap_grabber import SitemapGrabber
# Initialize the SitemapGrabber with a website URL
grabber = SitemapGrabber("https://example.com")
# Fetch all sitemaps
grabber.get_all_sitemaps()
# Print the discovered sitemap URLs
for url in grabber.sitemap_urls:
    print(url)
To fetch well-known files:
from well_known_files import WellKnownFiles
# Initialize the WellKnownFiles with a website URL
wkf = WellKnownFiles("https://example.com")
# Fetch robots.txt
robots_txt = wkf.fetch("robots.txt")
print(robots_txt)
# Fetch security.txt
security_txt = wkf.fetch("security.txt")
print(security_txt)
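A robots.txt file often points at a site's sitemaps through `Sitemap:` lines, which is one way the two kinds of files relate. The sketch below shows how such lines could be pulled out of the text returned above. The helper name `extract_sitemap_urls` is hypothetical and is not part of this package's API; how the package itself does this is not documented here.

```python
from typing import List


def extract_sitemap_urls(robots_txt: str) -> List[str]:
    """Return the URLs named by `Sitemap:` lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split on the first colon only,
        # because the URL itself contains colons.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


sample = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/news-sitemap.xml  # news section
"""
print(extract_sitemap_urls(sample))
```

The field name is matched case-insensitively, so both `Sitemap:` and `sitemap:` are picked up.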
The SitemapGrabber class is responsible for discovering and fetching XML sitemaps from a given website.
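As an illustration of what recursive sitemap discovery involves, here is a minimal sketch that follows sitemap index files down to the sitemaps they reference. It is not the package's implementation: the function `collect_sitemaps`, the injected `fetch` callable, and the canned `PAGES` responses are all assumptions made so the example runs without network access.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional, Set

# Namespace prefix used by the sitemaps.org protocol, in ElementTree's {uri}tag form.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def collect_sitemaps(url: str, fetch: Callable[[str], str],
                     seen: Optional[Set[str]] = None) -> Set[str]:
    """Return `url` plus every sitemap reachable through sitemap index files."""
    seen = set() if seen is None else seen
    if url in seen:  # guard against index files that reference each other
        return seen
    seen.add(url)

    root = ET.fromstring(fetch(url))
    if root.tag == SITEMAP_NS + "sitemapindex":
        # An index lists child sitemaps as <sitemap><loc>...</loc></sitemap>.
        for loc in root.iter(SITEMAP_NS + "loc"):
            if loc.text:
                collect_sitemaps(loc.text.strip(), fetch, seen)
    return seen


# Stand-in for HTTP: canned responses keyed by URL (hypothetical data).
PAGES = {
    "https://example.com/sitemap.xml": (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<sitemap><loc>https://example.com/posts.xml</loc></sitemap>"
        "<sitemap><loc>https://example.com/pages.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/posts.xml": (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/posts/1</loc></url>"
        "</urlset>"
    ),
    "https://example.com/pages.xml": (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/about</loc></url>"
        "</urlset>"
    ),
}

found = collect_sitemaps("https://example.com/sitemap.xml", PAGES.__getitem__)
print(sorted(found))
```

Passing the fetch function in as a parameter keeps the traversal logic separate from HTTP handling, which also makes it easy to test.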
The WellKnownFiles class fetches common well-known files from websites, such as robots.txt and security.txt.
It includes caching to avoid redundant requests and can handle various edge cases in HTTP responses.
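The caching idea can be sketched as a small wrapper that remembers the result for each URL. This is a generic illustration under stated assumptions, not the class's actual code: `CachedFetcher`, `fake_fetch`, and the URLs are hypothetical, and a real fetcher would issue HTTP requests instead of returning canned values.

```python
from typing import Callable, Dict, List, Optional


class CachedFetcher:
    """Memoize fetch results per URL so each URL is requested at most once."""

    def __init__(self, fetch: Callable[[str], Optional[str]]):
        self._fetch = fetch
        self._cache: Dict[str, Optional[str]] = {}

    def get(self, url: str) -> Optional[str]:
        # Misses (None) are cached too, so a missing file is not re-requested.
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]


calls: List[str] = []


def fake_fetch(url: str) -> Optional[str]:
    """Hypothetical fetcher: records each call and only 'finds' robots.txt."""
    calls.append(url)
    return "User-agent: *" if url.endswith("/robots.txt") else None


fetcher = CachedFetcher(fake_fetch)
fetcher.get("https://example.com/robots.txt")
fetcher.get("https://example.com/robots.txt")  # served from the cache
fetcher.get("https://example.com/.well-known/security.txt")
fetcher.get("https://example.com/.well-known/security.txt")  # cached miss
print(len(calls))
```

Caching negative results is a design choice: it saves requests for files a site does not publish, at the cost of not noticing if one appears later in the same session.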
Contributions are welcome! Please feel free to submit a Pull Request.
git checkout -b feature/AmazingFeature
git commit -m 'Add some AmazingFeature'
git push origin feature/AmazingFeature
This project is licensed under the MIT License - see the LICENSE file for details.
For more information or to report issues, please visit the GitHub repository.