
A tool that replicates the quarterly Financial Statement Datasets from the SEC (https://www.sec.gov/dera/data/financial-statement-data-sets), but on a daily basis.
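The secdaily package replicates the quarterly Financial Statement Datasets from the SEC, but on a daily basis. The SEC only provides these datasets once per quarter; this tool lets you download the XML files that become available each day and transform them into the same format as the SEC quarterly files.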
This enables financial analysts, researchers, and developers to access structured financial statement data without waiting for the quarterly releases.
The package requires Python 3.10 or higher. Install using pip:
pip install secdaily
The main entry point is the SecDailyOrchestrator class. Here's a basic example:
from secdaily.SecDaily import SecDailyOrchestrator, Configuration
# Working directory for all data (example value; adjust to your setup)
workdir_default = "./"
# Create the configuration
configuration = Configuration(workdir=workdir_default)
# Initialize the orchestrator
orchestrator = SecDailyOrchestrator(configuration=configuration)
# Run the full process
orchestrator.process(
    start_year=2025,  # Optional: specify starting year (defaults to current year)
    start_qrtr=1,     # Optional: specify starting quarter (defaults to current quarter)
)
The Configuration class provides the following parameters:
user_agent_def: User agent string for SEC.gov requests. If not provided, a default string will be generated. Must follow the format specified in SEC's EDGAR access requirements: "Company Name contact@company.com"
workdir: Working directory for storing all data. Defaults to current directory.
xmldir: Directory for storing XML files. If not provided, defaults to '_1_xml/' under workdir.
csvdir: Directory for storing CSV files. If not provided, defaults to '_2_csv/' under workdir.
formatdir: Directory for storing SEC-style formatted files. If not provided, defaults to '_3_secstyle/' under workdir.
dailyzipdir: Directory for storing daily zip files. If not provided, defaults to '_4_daily/' under workdir.
quarterzipdir: Directory for storing quarterly zip files. If not provided, defaults to '_5_quarter/' under workdir.
clean_intermediate_files: Flag to clean up intermediate files during housekeeping. Defaults to False.
clean_db_entries: Flag to clean up database entries during housekeeping. Defaults to False.
clean_daily_zip_files: Flag to clean up daily zip files during housekeeping. Defaults to False.
clean_quarter_zip_files: Flag to clean up quarterly zip files during housekeeping. Defaults to False.
Normally, you will use the original quarterly files from the SEC Financial Statement Datasets as a starting point. You therefore set the start_year and start_qrtr parameters to the first quarter that is not yet available from the SEC. For example, if quarterly files up to 2024Q4 are available on the SEC site, set start_year to 2025 and start_qrtr to 1 to download and process the daily available XML files and transform them into the same format as the SEC quarterly files.
The quarterly zip file from the SEC is usually available two to three weeks after the quarter ends.
As soon as a new quarterly zip file is available from the SEC, you can move the start_year and start_qrtr parameters to the next quarter. Depending on the configuration, intermediate files, database entries, and zip files can then be cleaned up.
Since reports are filed with the SEC every day, run the process daily to stay up to date with the latest available reports.
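The choice of start quarter described above is simple arithmetic on the last quarter the SEC has published. The following is a minimal sketch of that calculation; first_missing_quarter is a hypothetical helper written for illustration and is not part of the secdaily package.

```python
def first_missing_quarter(last_year: int, last_qrtr: int) -> tuple[int, int]:
    """Return the (year, quarter) that follows the last quarter published by the SEC.

    Hypothetical helper for illustration; not part of secdaily.
    """
    if last_qrtr not in (1, 2, 3, 4):
        raise ValueError("quarter must be between 1 and 4")
    if last_qrtr == 4:
        # Q4 rolls over into Q1 of the following year
        return last_year + 1, 1
    return last_year, last_qrtr + 1


# Quarterly files up to 2024Q4 are available, so processing starts at 2025Q1
start_year, start_qrtr = first_missing_quarter(2024, 4)
print(start_year, start_qrtr)  # 2025 1
```

The resulting values can be passed as start_year and start_qrtr to orchestrator.process().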
num.txt, pre.txt, lab.txt)
You can also run individual parts of the process:
# Only process index data
orchestrator.process_index_data()
# Only process XML data
orchestrator.process_xml_data()
# Only create SEC-style formatted files
orchestrator.create_sec_style()
# Only create daily zip files
orchestrator.create_daily_zip()
# Only create quarter zip files
orchestrator.create_quarter_zip()
# Only perform housekeeping: cleans up everything before the start quarter
# (QuarterInfo must be imported first; its import path is not shown here)
orchestrator.housekeeping(start_qrtr_info=QuarterInfo(year=2025, qrtr=1))
The tool creates the following directory structure in your specified workdir:
workdir/
├── sec_processing.db        # SQLite database for tracking processing
├── _1_xml/                  # Downloaded XML files
│   ├── 2024q4/
│   │   ├── 2024-10-01/
│   │   │   ├── xyz_htm.xml.zip
│   │   │   ├── xyz_pre.xml.zip
│   │   │   ├── xyz_lab.xml.zip
│   │   │   └── ...
│   │   └── ...
│   └── ...
├── _2_csv/                  # Parsed CSV files
│   ├── 2024q4/
│   │   ├── 2024-10-01/
│   │   │   ├── xyz_num.csv.zip
│   │   │   ├── xyz_pre.csv.zip
│   │   │   ├── xyz_lab.csv.zip
│   │   │   └── ...
│   │   └── ...
│   └── ...
├── _3_secstyle/             # SEC-style formatted files
│   ├── 2024q4/
│   │   ├── 2024-10-01/
│   │   │   ├── xyz_num.csv.zip
│   │   │   ├── xyz_pre.csv.zip
│   │   │   └── ...
│   │   └── ...
│   └── ...
├── _4_daily/                # Daily zip files
│   ├── 2024q4/
│   │   ├── 20241001.zip
│   │   ├── 20241002.zip
│   │   └── ...
│   └── ...
└── _5_quarter/              # Quarterly zip files
    ├── 2024q4.zip
    ├── 2025q1.zip
    └── ...
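Given the layout above, the location of a daily or quarterly zip file can be derived from a date. The following is a rough sketch under two assumptions: the default directory names (_4_daily, _5_quarter) are in use, and each daily file sits under the calendar quarter containing its date, as the 2024q4 entries in the tree suggest. The functions quarter_label, daily_zip_path, and quarter_zip_path are hypothetical helpers, not part of secdaily.

```python
from datetime import date
from pathlib import Path


def quarter_label(day: date) -> str:
    """Return a label such as '2024q4' for the calendar quarter containing day."""
    return f"{day.year}q{(day.month - 1) // 3 + 1}"


def daily_zip_path(workdir: str, day: date) -> Path:
    """Path of the daily zip for day, assuming the default '_4_daily' directory."""
    return Path(workdir) / "_4_daily" / quarter_label(day) / f"{day:%Y%m%d}.zip"


def quarter_zip_path(workdir: str, year: int, qrtr: int) -> Path:
    """Path of a quarterly zip, assuming the default '_5_quarter' directory."""
    return Path(workdir) / "_5_quarter" / f"{year}q{qrtr}.zip"


print(daily_zip_path("workdir", date(2024, 10, 1)).as_posix())
print(quarter_zip_path("workdir", 2025, 1).as_posix())
```

If the directories were overridden in the configuration (for example via dailyzipdir), the hard-coded folder names in this sketch would need to change accordingly.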
Each daily and quarterly zip file contains:
sub.txt - Submission information
pre.txt - Presentation information
num.txt - Numeric data
num.txt doesn't contain content for the segments column
pre.txt may not be the same as in the quarterly files, but the order should be the same
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Also check out the SEC Financial Statement Data Sets Tools project.