sec-web-scraper

SEC Web Scraper for the EDGAR API

0.1.1

sec-web-scraper

A Python-based web scraper for the SEC EDGAR database


Overview

This library is for scraping certain financial documents from the EDGAR database, such as the 10-K (and its variants, such as 10-K405 and 10-KSB), 20-F, and 40-F.

The two main features of the library will be:

  • A document downloader portion that will fetch documents from the EDGAR database based on parameters such as a text query, time period, company ticker, and file type.
  • A scraper that will parse sections and information from the retrieved files.
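To make the downloader idea concrete, here is a minimal sketch of how filings could be located by year range and form type using EDGAR's quarterly `master.idx` index files, whose data rows are pipe-delimited (`CIK|Company Name|Form Type|Date Filed|Filename`). It is not this library's implementation: the helper names `quarterly_index_urls`, `parse_master_index`, and `filter_by_form` are hypothetical, the year range is assumed to be inclusive, and prefix matching on the form type is a deliberately crude way to catch variants such as 10-K405 and 10-KSB.

```python
from typing import Dict, Iterable, Iterator, List

# Base URL of EDGAR's quarterly full-index files.
FULL_INDEX = "https://www.sec.gov/Archives/edgar/full-index"


def quarterly_index_urls(start_year: int, end_year: int) -> List[str]:
    """Return the master.idx URL for every quarter in [start_year, end_year]."""
    return [
        f"{FULL_INDEX}/{year}/QTR{quarter}/master.idx"
        for year in range(start_year, end_year + 1)  # assumption: inclusive range
        for quarter in range(1, 5)
    ]


def parse_master_index(text: str) -> Iterator[Dict[str, str]]:
    """Yield one record per filing row of a master.idx file.

    Header and separator lines are skipped because they either lack
    five pipe-delimited fields or do not start with a numeric CIK.
    """
    for line in text.splitlines():
        fields = line.strip().split("|")
        if len(fields) == 5 and fields[0].isdigit():
            cik, company, form_type, date_filed, filename = fields
            yield {
                "cik": cik,
                "company": company,
                "form_type": form_type,
                "date_filed": date_filed,
                "filename": filename,
            }


def filter_by_form(records: Iterable[Dict[str, str]], base_form: str) -> List[Dict[str, str]]:
    """Keep filings whose form type is base_form or starts with it (e.g. 10-K405)."""
    return [r for r in records if r["form_type"].startswith(base_form)]
```

Downloading the index files themselves is left out on purpose; any HTTP client would do, and the parsing above works on the decoded text of one index file.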

Installation

Please make sure you have Python 3.7 or higher.

You can check your Python version with:

python --version

Then install the package with pip:

pip install sec-web-scraper
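The version requirement above can also be checked from inside Python, for example at the top of a script that uses the library. This is a small sketch using only the standard library; `meets_minimum` is a hypothetical helper name, and the `(3, 7)` minimum is taken from the requirement stated in this section.

```python
import sys
from typing import Sequence

# Minimum version stated in the installation notes.
MINIMUM = (3, 7)


def meets_minimum(version: Sequence[int] = sys.version_info,
                  minimum: Sequence[int] = MINIMUM) -> bool:
    """Return True if the (major, minor) part of version is at least minimum."""
    return tuple(version[:2]) >= tuple(minimum)
```

Calling `meets_minimum()` with no arguments checks the running interpreter; passing a tuple such as `(3, 6, 9)` checks an arbitrary version.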

Usage

# Downloader
from sec_web_scraper.Downloader import Downloader

# Create a new downloader object
d = Downloader()

# Build the filing index for a range of years
d.build_index_sec(2000, 2002)

# After the index is built, list all form types filed in that period
d.get_forms()

# To find a company's CIK, provide its name (fuzzy match). Returns a list
d.get_company_info('apple')

# Get all 8-Ks filed in the range above. The result is a DataFrame
res = d.find_files_by_type('8-K')

# More features to be added!

# Scraper
from sec_web_scraper.Scraper import get_document_given_link, get_document_tags

# Start from the URL of a particular filing
sample_10k = "https://www.sec.gov/Archives/edgar/data/20/0000893220-96-000500.txt"

# Get the raw text
raw_txt = get_document_given_link(sample_10k)

# Get the sections in the document
doc_tags = get_document_tags(raw_txt)

# More features to be added!
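
What `get_document_tags` returns is not described here, so the following is an independent sketch of one way to split a raw filing into its parts. It assumes the raw text follows EDGAR's full-submission layout, in which each attached file is wrapped in a `<DOCUMENT>` ... `</DOCUMENT>` block that carries a `<TYPE>` line. `split_documents` is a hypothetical helper and is not part of `sec_web_scraper`.

```python
import re
from typing import Dict, List

# Each attached file in a full-submission text file sits in its own block.
DOCUMENT_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL | re.IGNORECASE)
# The form or exhibit type follows <TYPE> up to the end of that line.
TYPE_RE = re.compile(r"<TYPE>([^\n<]+)", re.IGNORECASE)


def split_documents(raw_txt: str) -> List[Dict[str, str]]:
    """Return one {"type", "body"} dict per <DOCUMENT> block in raw_txt."""
    docs = []
    for match in DOCUMENT_RE.finditer(raw_txt):
        body = match.group(1)
        type_match = TYPE_RE.search(body)
        docs.append({
            # An empty string marks a block with no <TYPE> line.
            "type": type_match.group(1).strip() if type_match else "",
            "body": body.strip(),
        })
    return docs
```

With the raw text from the usage example, `[doc["type"] for doc in split_documents(raw_txt)]` would list the type of each block found, provided the filing uses this layout.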
