codebase-to-text

A Python package to convert codebase to text

Registry: PyPI (pip)
Version: 1.2
Maintainers: 1

Codebase to Text Converter

A Python tool that converts a codebase (a folder tree and its files) into a single text file or Microsoft Word document (.docx), preserving the folder structure and file contents. It is intended for AI/LLM processing, documentation generation, and code analysis.

✨ Features

  • Multi-source input: Local directories and GitHub repositories
  • Flexible output: Text files (.txt) and Microsoft Word documents (.docx)
  • Smart exclusions: Advanced pattern matching for files and directories
  • Performance optimized: Efficient traversal of large codebases
  • Comprehensive logging: Detailed verbose mode for transparency
  • Encoding support: Handles various file encodings gracefully
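
The README does not describe how mixed encodings are handled. As a rough sketch only, and not the package's actual implementation, one common approach is to try a short list of encodings and fall back to replacement characters. The helper names `decode_lenient` and `read_text_lenient` and the encoding list are assumptions made for illustration.

```python
from pathlib import Path


def decode_lenient(data, encodings=("utf-8-sig", "cp1252")):
    """Decode bytes by trying each encoding in order (hypothetical helper)."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: never fail, mark undecodable bytes with U+FFFD instead.
    return data.decode("utf-8", errors="replace")


def read_text_lenient(path):
    """Read a file as text without raising on unexpected encodings."""
    return decode_lenient(Path(path).read_bytes())
```

Trying strict encodings first keeps well-formed UTF-8 files intact; the replacement fallback only applies when every listed encoding rejects the bytes.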

🚀 Installation

pip install codebase-to-text

📖 Usage

Command Line Interface (CLI)

Basic Usage

codebase-to-text --input "path_or_github_url" --output "output_path" --output_type "txt"

Advanced Usage with Exclusions

# Exclude specific patterns
codebase-to-text --input "./my_project" --output "output.txt" --output_type "txt" --exclude "*.log,temp/,**/__pycache__/**"

# Multiple exclude arguments
codebase-to-text --input "./my_project" --output "output.txt" --output_type "txt" --exclude "*.pyc" --exclude "build/" --exclude "venv/"

# Exclude hidden files
codebase-to-text --input "./my_project" --output "output.txt" --output_type "txt" --exclude_hidden

# Verbose mode for detailed logging
codebase-to-text --input "./my_project" --output "output.txt" --output_type "txt" --verbose
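
The examples above pass exclusions both as a comma-separated value and as repeated `--exclude` flags. The sketch below shows one way such input could be normalized with `argparse`. It is an illustration under assumptions, not the tool's real parser; `build_parser` and `flatten_patterns` are hypothetical names.

```python
import argparse


def build_parser():
    """Build a parser that mimics only the flags documented above."""
    parser = argparse.ArgumentParser(prog="codebase-to-text")
    # action="append" collects one list entry per --exclude occurrence.
    parser.add_argument("--exclude", action="append", default=None)
    parser.add_argument("--exclude_hidden", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    return parser


def flatten_patterns(values):
    """Split comma-separated entries and drop empty pieces."""
    patterns = []
    for value in values or []:
        patterns.extend(p.strip() for p in value.split(",") if p.strip())
    return patterns


args = build_parser().parse_args(
    ["--exclude", "*.log,temp/", "--exclude", "build/", "--verbose"]
)
patterns = flatten_patterns(args.exclude)
```

With this approach both styles end up as the same flat list of patterns, so later matching code does not need to care how the user supplied them.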

Python API

from codebase_to_text import CodebaseToText

# Basic usage
converter = CodebaseToText(
    input_path="path_or_github_url",
    output_path="output_path",
    output_type="txt"
)
converter.get_file()

# Advanced usage with exclusions
converter = CodebaseToText(
    input_path="./my_project",
    output_path="./output.txt",
    output_type="txt",
    exclude=["*.log", "temp/", "**/__pycache__/**"],
    exclude_hidden=True,
    verbose=True
)
converter.get_file()

# Get text content without saving to file
text_content = converter.get_text()
print(text_content)

🎯 Exclusion Patterns

The tool supports powerful exclusion patterns to filter out unwanted files and directories:

Pattern Types

  • Exact filename: README.md, config.yaml
  • Wildcard patterns: *.log, *.tmp, test_*
  • Directory patterns: __pycache__/, .git/, node_modules/
  • Recursive patterns: **/__pycache__/**, **/node_modules/**
  • Path-based patterns: src/temp/, docs/build/
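
To make the pattern types above concrete, here is a minimal matcher built on the standard library's `fnmatch`. It is a sketch of plausible semantics, assuming relative POSIX-style file paths; the package's real matching rules may differ, and `is_excluded` is a hypothetical name.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def _dir_parts_match(dir_parts, target):
    """True if `target` matches a run of consecutive directory components."""
    n = len(target)
    return any(
        all(fnmatch(d, t) for d, t in zip(dir_parts[i:i + n], target))
        for i in range(len(dir_parts) - n + 1)
    )


def is_excluded(rel_path, patterns):
    """Decide whether a file's relative path matches any exclusion pattern."""
    path = PurePosixPath(rel_path)
    dir_parts = path.parts[:-1]  # every component except the file name
    for pattern in patterns:
        # Treat "**/name/**" as the directory pattern "name/".
        if pattern.startswith("**/") and pattern.endswith("/**"):
            pattern = pattern[3:-2]
        if pattern.endswith("/"):
            target = PurePosixPath(pattern.rstrip("/")).parts
            if target and _dir_parts_match(dir_parts, target):
                return True
        # Exact names and wildcards: test the file name, then the whole path.
        elif fnmatch(path.name, pattern) or fnmatch(str(path), pattern):
            return True
    return False
```

Directory patterns are compared against directory components only, so `temp/` does not exclude a file that is merely named `temp.txt`.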

Exclusion Sources

  • CLI Arguments: Use --exclude flag (can be used multiple times)
  • .exclude file: Place in your project root (see example below)
  • Default patterns: Common files/folders are excluded automatically

Default Exclusions

The tool automatically excludes common development files:

  • .git/, __pycache__/, *.pyc, *.pyo
  • node_modules/, .venv/, venv/, env/
  • *.log, *.tmp, .DS_Store
  • .pytest_cache/, build/, dist/

📝 .exclude File Example

Create a .exclude file in your project root:

# .exclude file - Patterns for files/folders to exclude

# Version control
.git/
.gitignore

# Python
__pycache__/
*.pyc
venv/
.pytest_cache/

# Node.js
node_modules/
*.log

# IDE files
.vscode/
.idea/

# Project specific
config/secrets.yaml
data/large_files/
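
A file in this format is straightforward to read: blank lines and lines starting with `#` are skipped, and every other line is one pattern. The sketch below assumes exactly those rules (the README does not state them formally); `parse_exclude_text` and `load_exclude_file` are hypothetical helper names.

```python
from pathlib import Path


def parse_exclude_text(text):
    """Return the patterns in .exclude-style text, skipping comments and blanks."""
    patterns = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        patterns.append(line)
    return patterns


def load_exclude_file(project_root):
    """Read <project_root>/.exclude if it exists, else return no patterns."""
    exclude_path = Path(project_root) / ".exclude"
    if not exclude_path.is_file():
        return []
    return parse_exclude_text(exclude_path.read_text(encoding="utf-8"))


sample = "# Python\n__pycache__/\n*.pyc\n\n# Project specific\nconfig/secrets.yaml\n"
parsed = parse_exclude_text(sample)
```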

🔧 CLI Parameters

Parameter | Description | Example
--input | Input path (local folder or GitHub URL) | ./my_project or https://github.com/user/repo
--output | Output file path | ./output.txt
--output_type | Output format (txt or docx) | txt
--exclude | Exclusion patterns (repeatable) | --exclude "*.log" --exclude "temp/"
--exclude_hidden | Exclude hidden files/folders | Flag (no value)
--verbose | Enable detailed logging | Flag (no value)
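
When the CLI is driven from a script, it can help to assemble the argument list in one place. The sketch below uses only the flags from the table above; `build_command` is a hypothetical helper, and actually running the command assumes the package is installed and on `PATH`.

```python
def build_command(input_path, output_path, output_type="txt",
                  exclude=(), exclude_hidden=False, verbose=False):
    """Assemble an argv list for the codebase-to-text CLI."""
    cmd = [
        "codebase-to-text",
        "--input", input_path,
        "--output", output_path,
        "--output_type", output_type,
    ]
    # One --exclude flag per pattern, matching the repeatable form.
    for pattern in exclude:
        cmd += ["--exclude", pattern]
    if exclude_hidden:
        cmd.append("--exclude_hidden")
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_command("./my_project", "./output.txt",
                    exclude=["*.log", "temp/"], verbose=True)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with patterns such as `*.log`.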

💡 Examples

Convert Local Project

# Basic conversion ($HOME is used because the shell does not expand ~ inside quotes)
codebase-to-text --input "$HOME/projects/my_app" --output "my_app_code.txt" --output_type "txt"

# With custom exclusions
codebase-to-text --input "$HOME/projects/my_app" --output "my_app_code.txt" --output_type "txt" --exclude "*.log,build/,dist/" --verbose

Convert GitHub Repository

# Public repository
codebase-to-text --input "https://github.com/username/repo" --output "repo_analysis.docx" --output_type "docx"

# With exclusions for cleaner output
codebase-to-text --input "https://github.com/username/repo" --output "repo_clean.txt" --output_type "txt" --exclude "*.md,docs/,examples/"

Python Integration

# Analyze a codebase programmatically
from codebase_to_text import CodebaseToText

def analyze_codebase(project_path):
    converter = CodebaseToText(
        input_path=project_path,
        output_path="analysis.txt",
        output_type="txt",
        exclude=["*.log", "test/", "**/__pycache__/**"],
        verbose=True
    )
    
    # Get the content
    content = converter.get_text()
    
    # Process with your preferred LLM/AI tool
    # analysis_result = your_ai_tool.analyze(content)
    
    return content

# Usage
code_content = analyze_codebase("./my_project")
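
A whole codebase rendered as one string is often larger than a single model request allows. As an optional follow-on step, and not part of the package, the text can be split into size-limited chunks before it is sent to a tool. `chunk_text` and the `max_chars` default are illustrative assumptions.

```python
def chunk_text(text, max_chars=8000):
    """Split text into chunks of at most max_chars, preferring line boundaries."""
    chunks = []
    current = []
    size = 0
    for line in text.splitlines(keepends=True):
        # Hard-split any single line that is longer than the limit.
        while len(line) > max_chars:
            if current:
                chunks.append("".join(current))
                current, size = [], 0
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when adding this line would exceed the limit.
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks
```

Joining the chunks reproduces the original text exactly, so nothing is lost by splitting.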

🎯 Use Cases

  • AI/LLM Training: Prepare codebases for language model training
  • Code Review: Generate comprehensive code overviews for review
  • Documentation: Create single-file documentation from projects
  • Analysis: Feed entire codebases to AI tools for analysis
  • Migration: Document legacy codebases before migration
  • Learning: Study open-source projects more effectively

🔄 Output Format

The generated output includes:

  • Folder Structure: Tree-like representation of the directory structure
  • File Contents: Full content of each file with metadata
  • Clear Separators: Distinct sections for easy navigation
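
The exact layout of the generated file is not shown in this README. Purely as an illustration of the three parts listed above, the sketch below renders a folder tree followed by each file's contents with separator lines. The headings, separators, and function names are assumptions and will not match the tool's real output.

```python
def build_tree(paths):
    """Turn 'a/b/c.py'-style paths into a nested dict of path components."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return tree


def render_tree(tree, depth=0):
    """Return indented lines; entries that have children are shown with '/'."""
    lines = []
    for name, children in sorted(tree.items()):
        lines.append("    " * depth + name + ("/" if children else ""))
        lines.extend(render_tree(children, depth + 1))
    return lines


def render_codebase(files):
    """Render {relative_path: text} as a tree plus per-file sections."""
    out = ["Folder Structure", "-" * 40]
    out.extend(render_tree(build_tree(files)))
    out += ["", "File Contents", "-" * 40]
    for path in sorted(files):
        out += [f"File: {path}", files[path].rstrip("\n"), "=" * 40]
    return "\n".join(out) + "\n"


files = {"src/main.py": "print('hi')\n", "README.md": "# Demo\n"}
report = render_codebase(files)
```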

✒️ License

This project is licensed under the MIT License; see the LICENSE file for details.
