shlesha

High-performance extensible transliteration library with hub-and-spoke architecture

Version: 0.5.4 (latest) · Source: npm · Maintainers: 1

Shlesha - Schema-Driven Transliteration Library

A transliteration library for Sanskrit and Indic scripts using schema-driven architecture. Built with compile-time optimization and runtime schema loading.

Quick Start

Setup command:

./scripts/quick-start.sh

This sets up the Rust environment, Python bindings, and WASM support, then runs all tests.

For detailed setup instructions, see DEVELOPER_SETUP.md.

Documentation: See DOCUMENTATION_INDEX.md for guides and references.

Architecture Features

  • Schema-generated converters with compile-time optimization
  • Zero runtime overhead from code generation
  • Token-based conversion system for memory efficiency

Schema-Based Architecture

Compile-Time Code Generation

Converters are generated at compile-time from declarative schemas:

# schemas/slp1.yaml - Generates optimized SLP1 converter
metadata:
  name: "slp1"
  script_type: "roman"
  description: "Sanskrit Library Phonetic Basic"

target: "iso15919"

mappings:
  vowels:
    "A": "ā"
    "I": "ī" 
    "U": "ū"
    # ... more mappings

# schemas/bengali.yaml - Generates optimized Bengali converter
metadata:
  name: "bengali"
  script_type: "brahmic"
  description: "Bengali/Bangla script"

mappings:
  vowels:
    "অ": "अ"    # Bengali A → Devanagari A
    "আ": "आ"    # Bengali AA → Devanagari AA
    # ... more mappings
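
As a rough illustration of what schema-driven generation amounts to, the sketch below flattens a schema-shaped dictionary into forward and reverse lookup tables. It is written in Python for brevity and is not Shlesha's build code: `build_tables` and the in-memory dict (standing in for a parsed YAML file) are assumptions made for this example.

```python
# Hypothetical sketch: a parsed schema (a plain dict standing in for YAML)
# is flattened into forward and reverse lookup tables.
BENGALI_SCHEMA = {
    "metadata": {"name": "bengali", "script_type": "brahmic"},
    "mappings": {
        "vowels": {"অ": "अ", "আ": "आ"},
    },
}


def build_tables(schema):
    """Return (to_hub, from_hub) dicts built from every mapping category."""
    to_hub = {}
    for category in schema["mappings"].values():
        to_hub.update(category)
    # Reverse table for hub -> script conversion.
    # Assumes the mappings are one-to-one.
    from_hub = {target: source for source, target in to_hub.items()}
    return to_hub, from_hub


to_deva, from_deva = build_tables(BENGALI_SCHEMA)
print(to_deva["আ"])    # Bengali AA looked up in the forward table
print(from_deva["अ"])  # Devanagari A looked up in the reverse table
```

A real generator would emit these tables as source code at build time rather than build them at runtime; the dictionary form only shows the shape of the data.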

Build-Time Optimization

The build system automatically generates highly optimized converters:

# Build output showing schema processing
warning: Processing YAML schemas...
warning: Generating optimized converters with Handlebars templates...
warning: Created 18 schema-generated converters with O(1) lookups

Hub-and-Spoke Architecture

Multi-Hub Design

  • Devanagari Hub: Central format for Indic scripts (तमिल → देवनागरी → गुजराती)
  • ISO-15919 Hub: Central format for romanization schemes (ITRANS → ISO → IAST)
  • Cross-Hub Conversion: Seamless Indic ↔ Roman via both hubs
  • Direct Conversion: Bypass hubs when possible for maximum performance

Routing

The system determines the conversion path:

// Direct passthrough - zero conversion cost
transliterator.transliterate("धर्म", "devanagari", "devanagari")?; // instant

// Single hub - one conversion 
transliterator.transliterate("धर्म", "devanagari", "iso")?; // deva→iso

// Cross-hub - optimized path
transliterator.transliterate("dharma", "itrans", "bengali")?; // itrans→iso→deva→bengali
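
The path selection described above can be sketched as follows. This is a minimal Python model of the routing idea, not the library's implementation: `plan_route`, `hub_for`, and the `SCRIPT_TYPE` table are hypothetical names, and canonical script names are used instead of aliases such as `iso`.

```python
# Hypothetical routing sketch; not Shlesha's actual implementation.
ROMAN_HUB = "iso15919"
INDIC_HUB = "devanagari"

# Script families for a few scripts named in this README.
SCRIPT_TYPE = {
    "devanagari": "brahmic",
    "bengali": "brahmic",
    "gujarati": "brahmic",
    "iso15919": "roman",
    "itrans": "roman",
    "slp1": "roman",
}


def hub_for(script):
    """Pick the hub that serves a script's family."""
    return ROMAN_HUB if SCRIPT_TYPE[script] == "roman" else INDIC_HUB


def plan_route(source, target):
    """Return the scripts the text passes through, source first."""
    if source == target:
        return [source]  # passthrough: no conversion step at all
    route = [source]
    for stop in (hub_for(source), hub_for(target), target):
        if stop != route[-1]:  # skip hops that would be no-ops
            route.append(stop)
    return route


print(plan_route("devanagari", "iso15919"))  # ['devanagari', 'iso15919']
print(plan_route("itrans", "bengali"))
# ['itrans', 'iso15919', 'devanagari', 'bengali']
```

The "direct conversion" shortcut mentioned in the hub list is not modelled here; a fuller version would consult a table of direct converters before falling back to the hubs.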

Supported Scripts

Indic Scripts (Schema-Generated)

  • Devanagari (devanagari, deva) - Sanskrit, Hindi, Marathi
  • Bengali (bengali, bn) - Bengali/Bangla script
  • Tamil (tamil, ta) - Tamil script
  • Telugu (telugu, te) - Telugu script
  • Gujarati (gujarati, gu) - Gujarati script
  • Kannada (kannada, kn) - Kannada script
  • Malayalam (malayalam, ml) - Malayalam script
  • Odia (odia, od) - Odia/Oriya script
  • Gurmukhi (gurmukhi, pa) - Punjabi script
  • Sinhala (sinhala, si) - Sinhala script
  • Sharada (sharada, shrd) - Historical script of Kashmir, crucial for Vedic manuscripts
  • Tibetan (tibetan, tibt, bo) - Important for Buddhist Vedic transmission
  • Thai (thai, th) - Adapted from Grantha for Buddhist Vedic texts

Romanization Schemes (Schema-Generated)

  • ISO-15919 (iso15919, iso) - International standard
  • ITRANS (itrans) - Indian languages TRANSliteration
  • SLP1 (slp1) - Sanskrit Library Phonetic Basic
  • Harvard-Kyoto (harvard_kyoto, hk) - ASCII-based scheme
  • Velthuis (velthuis) - TeX-compatible scheme
  • WX (wx) - ASCII-based notation

Hand-Coded Scripts

  • IAST (iast) - International Alphabet of Sanskrit Transliteration
  • Kolkata (kolkata) - Regional romanization scheme
  • Grantha (grantha) - Classical Sanskrit script
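
Each entry above pairs a canonical name with optional short aliases. A lookup of that kind might work as in the Python sketch below; `resolve_script` and the alias table are hypothetical, and the case-insensitive matching is an assumption rather than documented library behavior.

```python
# Hypothetical alias table built from a few entries in the lists above.
ALIASES = {
    "devanagari": ["deva"],
    "bengali": ["bn"],
    "tibetan": ["tibt", "bo"],
    "iso15919": ["iso"],
    "harvard_kyoto": ["hk"],
}


def build_alias_index(aliases):
    """Map every accepted name (canonical or alias) to its canonical name."""
    index = {}
    for canonical, alts in aliases.items():
        for name in (canonical, *alts):
            if name in index:
                raise ValueError(f"duplicate script name: {name}")
            index[name] = canonical
    return index


ALIAS_INDEX = build_alias_index(ALIASES)


def resolve_script(name):
    """Return the canonical script name, or raise ValueError if unknown."""
    key = name.strip().lower()  # assumption: names are matched case-insensitively
    try:
        return ALIAS_INDEX[key]
    except KeyError:
        raise ValueError(f"unknown script: {name!r}") from None


print(resolve_script("deva"))  # devanagari
print(resolve_script("hk"))    # harvard_kyoto
```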

Usage Examples

Rust Library

use shlesha::Shlesha;

let transliterator = Shlesha::new();

// High-performance cross-script conversion
let result = transliterator.transliterate("धर्म", "devanagari", "gujarati")?;
println!("{}", result); // "ધર્મ"

// Roman to Indic conversion (the input uses diacritics, so the source is IAST)
let result = transliterator.transliterate("dharmakṣetra", "iast", "tamil")?;
println!("{}", result); // "தர்மக்ஷேத்ர"

// Schema-generated converters in action (SLP1 writes dh as D and ṣ as z)
let result = transliterator.transliterate("Darmakzetra", "slp1", "iast")?;
println!("{}", result); // "dharmakṣetra"

Python Bindings (PyO3)

import shlesha

# Create transliterator with all schema-generated converters
transliterator = shlesha.Shlesha()

# Fast schema-based conversion
result = transliterator.transliterate("ধর্ম", "bengali", "telugu")
print(result)  # "ధర్మ"

# Conversion with metadata tracking (reports unknown tokens)
result = transliterator.transliterate_with_metadata("धर्मkr", "devanagari", "iast")
print(f"Output: {result.output}")  # "dharmakr"
print(f"Unknown tokens: {len(result.metadata.unknown_tokens)}")

# Runtime extensibility
scripts = shlesha.get_supported_scripts()
print(f"Supports {len(scripts)} scripts: {scripts}")

Command Line Interface

# Schema-generated high-performance conversion
shlesha transliterate --from slp1 --to devanagari "Darmakzetra"
# Output: धर्मक्षेत्र

# Cross-script conversion via dual hubs  
shlesha transliterate --from itrans --to tamil "dharma"
# Output: தர்ம

# List all schema-generated + hand-coded scripts
shlesha scripts
# Output: bengali, devanagari, gujarati, harvard_kyoto, iast, iso15919, itrans, ...

WebAssembly (Browser/Node.js)

import init, { WasmShlesha } from './pkg/shlesha.js';

async function demo() {
    await init();
    const transliterator = new WasmShlesha();
    
    // Schema-generated converter performance in browser
    const result = transliterator.transliterate("કર્મ", "gujarati", "devanagari");
    console.log(result); // "कर्म"
    
    // Runtime script discovery
    const scripts = transliterator.listSupportedScripts();
    console.log(`${scripts.length} scripts available`);
}

Runtime Schema Loading

Schemas can be loaded at runtime through every API (Rust, Python, and WASM), so custom scripts can be added without recompiling.

Rust API

use shlesha::Shlesha;

let mut transliterator = Shlesha::new();

// Load custom schema from YAML content
let custom_schema = r#"
metadata:
  name: "my_custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "My custom transliteration scheme"

target: "iso15919"

mappings:
  vowels:
    "a": "a"
    "e": "ē"
  consonants:
    "k": "k"
    "t": "ṭ"
"#;

// Load the schema at runtime
transliterator.load_schema_from_string(custom_schema, "my_custom_script")?;

// Use immediately without recompilation
let result = transliterator.transliterate("kate", "my_custom_script", "devanagari")?;
println!("{}", result); // "कटे" (k+a → क, ṭ+ē → टे)

// Schema management
let info = transliterator.get_schema_info("my_custom_script").unwrap();
println!("Loaded {} with {} mappings", info.name, info.mapping_count);

Python API

import shlesha

transliterator = shlesha.Shlesha()

# Load schema from YAML string
yaml_content = """
metadata:
  name: "custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "Custom transliteration"

target: "iso15919"

mappings:
  vowels:
    "a": "a"
  consonants:
    "k": "k"
"""

# Runtime loading
transliterator.load_schema_from_string(yaml_content, "custom_script")

# Immediate usage
result = transliterator.transliterate("ka", "custom_script", "devanagari")
print(result)  # "क"

# Schema info
info = transliterator.get_schema_info("custom_script")
print(f"Script: {info['name']}, Mappings: {info['mapping_count']}")

# Schema management
transliterator.remove_schema("custom_script")
transliterator.clear_runtime_schemas()

JavaScript/WASM API

import init, { WasmShlesha } from './pkg/shlesha.js';

async function loadCustomScript() {
    await init();
    const transliterator = new WasmShlesha();
    
    // Define custom schema
    const yamlContent = `
metadata:
  name: "custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "Custom script"

target: "iso15919"

mappings:
  vowels:
    "a": "a"
  consonants:
    "k": "k"
`;
    
    // Load at runtime
    transliterator.loadSchemaFromString(yamlContent, "custom_script");
    
    // Use immediately
    const result = transliterator.transliterate("ka", "custom_script", "devanagari");
    console.log(result); // "क"
    
    // Get schema information
    const info = transliterator.getSchemaInfo("custom_script");
    console.log(`Name: ${info.name}, Mappings: ${info.mapping_count}`);
}

Key Runtime Features

  • Load from YAML strings - No file system required
  • Load from file paths - For development workflows
  • Schema validation - Automatic error checking
  • Hot reloading - Add/remove schemas dynamically
  • Schema introspection - Get metadata about loaded schemas
  • Memory management - Clear schemas when done
  • Cross-platform - Identical API across Rust, Python, WASM
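
The load, validate, inspect, and remove cycle listed above can be pictured with a small in-memory registry. The Python sketch below is an illustration only: `SchemaRegistry` and its validation rules are assumptions, and YAML parsing is left out (a dict stands in for the parsed schema).

```python
# Hypothetical in-memory registry; method names only loosely mirror the README.
REQUIRED_METADATA = ("name", "script_type")


class SchemaRegistry:
    def __init__(self):
        self._schemas = {}

    def load(self, schema, name):
        """Validate a parsed schema dict and register it under `name`."""
        metadata = schema.get("metadata", {})
        missing = [key for key in REQUIRED_METADATA if key not in metadata]
        if missing:
            raise ValueError(f"schema {name!r} is missing metadata: {missing}")
        if not schema.get("mappings"):
            raise ValueError(f"schema {name!r} defines no mappings")
        self._schemas[name] = schema

    def info(self, name):
        """Return the schema name and its total number of mappings."""
        schema = self._schemas[name]
        count = sum(len(group) for group in schema["mappings"].values())
        return {"name": name, "mapping_count": count}

    def remove(self, name):
        """Drop one schema; return True if it was loaded."""
        return self._schemas.pop(name, None) is not None

    def clear(self):
        """Drop every runtime schema."""
        self._schemas.clear()


registry = SchemaRegistry()
registry.load(
    {
        "metadata": {"name": "custom_script", "script_type": "roman"},
        "mappings": {"vowels": {"a": "a"}, "consonants": {"k": "k"}},
    },
    "custom_script",
)
print(registry.info("custom_script"))
# {'name': 'custom_script', 'mapping_count': 2}
```

Validating at load time means a malformed schema is rejected before any text is converted with it.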

Use Cases

Development & Testing

// Test schema variations quickly
transliterator.load_schema_from_string(variant_a, "test_a")?;
transliterator.load_schema_from_string(variant_b, "test_b")?;
// Compare results immediately

Dynamic Applications

# User uploads custom transliteration scheme
user_schema = request.files['schema'].read().decode('utf-8')
transliterator.load_schema_from_string(user_schema, user_id)
# Use immediately in application

Configuration-Driven Systems

// Load schemas from configuration
config.schemas.forEach(schema => {
    transliterator.loadSchemaFromString(schema.content, schema.name);
});

Performance & Benchmarks

Performance Analysis

Shlesha uses a hub-and-spoke architecture with schema-generated converters, trading some performance for extensibility compared to direct conversion approaches.

Performance Characteristics

  • Competitive with other transliteration libraries
  • Schema-generated converters match hand-coded performance
  • Optimized for both short and long text processing

Architecture Trade-offs

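| Aspect         | Shlesha                 | Vidyut            |
|----------------|-------------------------|-------------------|
| Performance    | Hub-based               | Direct conversion |
| Extensibility  | Runtime schemas         | Compile-time only |
| Script Support | 15+ (easily expandable) | Limited           |
| Architecture   | Hub-and-spoke           | Direct conversion |
| Bindings       | Rust/Python/WASM/CLI    | Rust only         |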
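
One way to see the extensibility side of this trade-off is to count converters. The Python sketch below uses a simplified counting model that is an assumption of this example, not a figure from the project: a direct design needs one converter per ordered script pair, while a two-hub design needs one converter into and one out of a hub for each non-hub script, plus the two hub-to-hub converters.

```python
def direct_converters(n_scripts):
    """One converter per ordered (source, target) pair."""
    return n_scripts * (n_scripts - 1)


def hub_converters(n_scripts, n_hubs=2):
    """Two converters per non-hub script, plus one per ordered hub pair."""
    spokes = 2 * (n_scripts - n_hubs)
    between_hubs = n_hubs * (n_hubs - 1)
    return spokes + between_hubs


for n in (5, 15, 22):
    print(n, direct_converters(n), hub_converters(n))
# 5 20 8
# 15 210 28
# 22 462 42
```

Under this model, adding one script to a hub design adds two converters, whereas a direct design needs two new converters for every existing script. The cost is the extra hop through one or both hubs on each conversion.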

Schema-Driven Development

Adding New Scripts

To add a new script, describe it in a schema file and rebuild:

# schemas/new_script.yaml
metadata:
  name: "NewScript"
  description: "Description of the script"
  unicode_block: "NewScript"
  has_implicit_vowels: true

mappings:
  vowels:
    - source: "𑀅"  # New script character
      target: "अ"   # Devanagari equivalent
    # ... add more mappings

# Rebuild to include new script
cargo build
# New script automatically available!

Template-Based Generation

Converters are generated using Handlebars templates for consistency:

{{!-- templates/indic_converter.hbs --}}
/// {{metadata.description}} converter generated from schema
pub struct {{pascal_case metadata.name}}Converter {
    {{snake_case metadata.name}}_to_deva_map: HashMap<char, char>,
    deva_to_{{snake_case metadata.name}}_map: HashMap<char, char>,
}

impl {{pascal_case metadata.name}}Converter {
    pub fn new() -> Self {
        // Generated O(1) lookup tables
        let mut {{snake_case metadata.name}}_to_deva = HashMap::new();
        {{#each character_mappings}}
        {{snake_case ../metadata.name}}_to_deva.insert('{{this.source}}', '{{this.target}}');
        {{/each}}
        // ... template continues
    }
}
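
To make the template step concrete, the Python sketch below renders Rust-like source text from a mapping table using only the standard library. It is a stand-in for the Handlebars step, not the project's generator: `render_converter` and `pascal_case` are hypothetical helpers, and the emitted text is deliberately abbreviated.

```python
# Hypothetical stand-in for the Handlebars step, using only the standard library.
def pascal_case(name):
    """Convert a snake_case schema name to PascalCase."""
    return "".join(part.capitalize() for part in name.split("_"))


def render_converter(name, mappings):
    """Emit Rust-like source text with one insert line per mapping."""
    lines = [
        f"pub struct {pascal_case(name)}Converter;",
        f"// {len(mappings)} generated entries",
    ]
    for source, target in mappings.items():
        lines.append(f"{name}_to_deva.insert('{source}', '{target}');")
    return "\n".join(lines)


print(render_converter("bengali", {"অ": "अ", "আ": "आ"}))
```

Because every converter comes from the same template, a fix to the template reaches all generated converters on the next build.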

Quality Assurance

Test Suite

  • 127 tests covering all functionality
  • Schema-generated converter tests for all 14 generated converters
  • Performance regression tests ensuring schema-generated converters match hand-coded speed
  • Cross-script conversion matrix testing all 210+ pairs
  • Unknown character handling

Build System Validation

# Test schema-generated converters maintain performance
cargo test --lib

# Verify all conversions work
cargo test comprehensive_bidirectional_tests

# Performance benchmarks
cargo run --example shlesha_vs_vidyut_benchmark

Build Configuration & Features

Schema Processing Features

# Default: Schema-generated + hand-coded converters
cargo build

# Development mode with schema recompilation
cargo build --features "schema-dev"

# Minimal build (hand-coded only)
cargo build --no-default-features --features "hand-coded-only"

# All features (Python + WASM + CLI)
cargo build --features "python,wasm,cli"

Runtime Extensibility

let mut transliterator = Shlesha::new();

// Load an additional schema from a file path at runtime
transliterator.load_schema("path/to/new_script.yaml")?;

// Schema registry access
let scripts = transliterator.list_supported_scripts();
println!("Dynamically loaded: {:?}", scripts);

Advanced Features

Metadata Collection

// Track unknown characters and conversion details
let result = transliterator.transliterate_with_metadata("धर्मkr", "devanagari", "iast")?;

if let Some(metadata) = result.metadata {
    println!("Conversion: {} → {}", metadata.source_script, metadata.target_script);
    for unknown in metadata.unknown_tokens {
        println!("Unknown '{}' at position {}", unknown.token, unknown.position);
    }
}
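
The idea behind metadata collection can be sketched in a few lines of Python: convert what the table knows, pass everything else through unchanged, and record where the unknown characters were. The names below (`convert_with_metadata`, `UnknownToken`, `Result`) are hypothetical, the table borrows three SLP1 vowel mappings from the schema example earlier in this README, and positions are counted in code points, which may differ from how the library reports them.

```python
from dataclasses import dataclass, field


@dataclass
class UnknownToken:
    token: str
    position: int


@dataclass
class Result:
    output: str
    unknown_tokens: list = field(default_factory=list)


# Tiny table taken from the SLP1 schema example (SLP1 -> ISO-15919 vowels).
TABLE = {"A": "ā", "I": "ī", "U": "ū"}


def convert_with_metadata(text, table):
    """Convert known characters and record unknown ones with their index."""
    out, unknown = [], []
    for position, ch in enumerate(text):
        if ch in table:
            out.append(table[ch])
        else:
            out.append(ch)  # pass unknown characters through unchanged
            unknown.append(UnknownToken(ch, position))
    return Result("".join(out), unknown)


result = convert_with_metadata("A1I", TABLE)
print(result.output)  # ā1ī
for item in result.unknown_tokens:
    print(f"Unknown {item.token!r} at position {item.position}")
```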

Script Characteristics

// Schema-aware script properties
let registry = ScriptConverterRegistry::default();

// Indic scripts have implicit vowels
assert!(registry.script_has_implicit_vowels("bengali").unwrap());
assert!(registry.script_has_implicit_vowels("devanagari").unwrap());

// Roman schemes don't
assert!(!registry.script_has_implicit_vowels("itrans").unwrap());
assert!(!registry.script_has_implicit_vowels("slp1").unwrap());
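
The implicit-vowel property matters because an Indic consonant letter carries an inherent "a" unless a virama or a vowel sign follows it, whereas roman schemes spell every vowel out. The Python sketch below shows that rule on a handful of characters; the tables and `deva_to_roman` are assumptions made for illustration and cover only the letters used here.

```python
# Tiny hypothetical tables; a real schema would cover the whole script.
CONSONANTS = {"ध": "dh", "र": "r", "म": "m", "क": "k"}
VOWEL_SIGNS = {
    "\u093e": "ā",  # DEVANAGARI VOWEL SIGN AA
    "\u093f": "i",  # DEVANAGARI VOWEL SIGN I
    "\u0947": "e",  # DEVANAGARI VOWEL SIGN E
}
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA


def deva_to_roman(text):
    """Romanize text, making the inherent vowel explicit."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt == VIRAMA:
                i += 1  # virama suppresses the inherent vowel
            elif nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])
                i += 1  # vowel sign replaces the inherent vowel
            else:
                out.append("a")  # inherent vowel written out
        else:
            out.append(ch)  # pass anything else through
        i += 1
    return "".join(out)


print(deva_to_roman("धर्म"))  # dharma
```

A converter between two Indic scripts can skip this step, since both sides share the inherent-vowel convention; it is needed only when crossing between the Indic and roman families.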

Hub Processing Control

// Fine-grained control over conversion paths
let hub = Hub::new();

// Direct hub operations
let iso_text = hub.deva_to_iso("धर्म")?;  // Devanagari → ISO
let deva_text = hub.iso_to_deva("dharma")?;  // ISO → Devanagari

// Cross-hub conversion with metadata
let result = hub.deva_to_iso_with_metadata("धर्म")?;

Documentation

Quick Reference

# Generate documentation
cargo doc --open

# Run all examples
cargo run --example shlesha_vs_vidyut_benchmark
cargo run --example roman_allocation_analysis  

# Performance testing
cargo bench

Releases

Shlesha uses an automated release system to publish to package registries.

Quick Release

# Guided release process
./scripts/release.sh

Package Installation

# Python (PyPI)
pip install shlesha

# WASM (npm)  
npm install shlesha-wasm

# Rust (crates.io)
cargo add shlesha

See DEPLOYMENT.md for complete release documentation.

Contributing

Contributions are welcome. The schema-driven architecture simplifies adding new scripts:

  • Add Schema: Create TOML/YAML mapping file
  • Test: Run test suite to verify
  • Benchmark: Ensure performance maintained
  • Submit: Open PR with schema and tests

See CONTRIBUTING.md for detailed guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Unicode Consortium for Indic script standards
  • ISO-15919 for romanization standardization
  • Sanskrit Library for SLP1 encoding schemes
  • Vidyut Project for performance benchmarking standards
  • Rust Community for excellent tools (PyO3, wasm-pack, handlebars)


Package last updated on 07 Dec 2025
