octocode-data-masker

A TypeScript library for masking sensitive data in strings, including PII, tokens, API keys, and more

Version: 1.0.0 (latest)
Source: npm
Weekly downloads: 48 (-52.48%)
Maintainers: 1

sensitive-data-masker

A high-performance TypeScript library for detecting and masking sensitive data in strings. Protect PII, API keys, tokens, credentials, and other confidential information with intelligent masking algorithms and configurable accuracy levels.

npm version License: MIT TypeScript Node.js

Features

  • 🛡️ 200+ Detection Patterns: Comprehensive coverage for modern security needs
  • ⚡ High Performance: Optimized regex engine with pattern caching
  • 🎯 Accuracy Control: Configure detection sensitivity (high/medium/low)
  • 🔧 Flexible Masking: Smart partial masking that preserves readability
  • 📦 Zero Dependencies: Lightweight and secure
  • 🌍 International Support: Handles US, UK, Canadian, and international formats
  • 🔍 Pattern Filtering: Include or exclude specific pattern types
  • 📊 Detailed Results: Get match counts, positions, and masked values

Installation

npm install sensitive-data-masker
# or
yarn add sensitive-data-masker

Quick Start

import { mask, hasSensitiveContent, getPatternMatches } from 'sensitive-data-masker';

// Basic usage - intelligent partial masking
const text = 'My email is john@example.com and my SSN is 123-45-6789';
const result = mask(text);
console.log(result.output);
// "My email is **hn@example.c** and my SSN is **3-45-67**"

console.log(result.found);
// { email: 1, ssn: 1 }

// Check if content contains sensitive data
const isSensitive = hasSensitiveContent(text);
console.log(isSensitive); // true

// Get detailed pattern matches with positions
const matches = getPatternMatches(text);
console.log(matches);
// [
//   {
//     pattern: 'email',
//     matches: [{ match: 'john@example.com', startIndex: 12, endIndex: 27 }]
//   },
//   {
//     pattern: 'ssn',
//     matches: [{ match: '123-45-6789', startIndex: 43, endIndex: 53 }]
//   }
// ]

API Reference

mask(input: string, options?: MaskingOptions): MaskResult

Masks sensitive content in a string using intelligent partial masking.
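
The example outputs in this README suggest that partial masking hides the first two and last two characters of a match and leaves the middle readable. The following is a minimal sketch of that idea, not the library's implementation; `partialMask` and its `edge` parameter are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper, not part of the library's API.
// Assumption: partial masking hides the first and last `edge` characters.
function partialMask(value: string, maskChar: string = '*', edge: number = 2): string {
  // Values too short to keep a visible middle are masked entirely.
  if (value.length <= edge * 2) {
    return maskChar.repeat(value.length);
  }
  const visibleMiddle = value.slice(edge, value.length - edge);
  return maskChar.repeat(edge) + visibleMiddle + maskChar.repeat(edge);
}

console.log(partialMask('john@example.com')); // "**hn@example.c**"
console.log(partialMask('123-45-6789'));      // "**3-45-67**"
console.log(partialMask('abc'));              // "***"
```

Masking short values completely is a design choice in this sketch: with only a few characters, keeping any of them visible would reveal most of the secret.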

Options

interface MaskingOptions {
  maskChar?: string;                    // Character used for masking (default: '*')
  preserveLength?: boolean;             // Preserve original length (default: false)
  excludePatterns?: string[];           // Patterns to exclude from masking
  onlyPatterns?: string[];              // Only mask these patterns
  matchAccuracy?: 'high' | 'medium' | 'low'; // Detection sensitivity
}

Returns

interface MaskResult {
  output: string;                       // Masked string
  found: { [name: string]: number };    // Count of each pattern found
  matches: string[];                    // Original matched values
  masked: string[];                     // Masked versions of matches
}
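
To make the relationship between the four result fields concrete, here is a toy masker that fills in the same shape for two patterns. It is a sketch under stated assumptions: `toyMask`, `toyPatterns`, and the two regexes are hypothetical and far simpler than whatever the library uses.

```typescript
// Same shape as MaskResult above, redeclared so the sketch is self-contained.
interface ToyMaskResult {
  output: string;
  found: { [name: string]: number };
  matches: string[];
  masked: string[];
}

// Hypothetical, deliberately simple patterns.
const toyPatterns: { name: string; regex: RegExp }[] = [
  { name: 'email', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { name: 'ssn', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
];

function toyMask(input: string): ToyMaskResult {
  const result: ToyMaskResult = { output: input, found: {}, matches: [], masked: [] };
  for (const { name, regex } of toyPatterns) {
    result.output = result.output.replace(regex, (match: string) => {
      const hidden = '**' + match.slice(2, -2) + '**';
      result.found[name] = (result.found[name] || 0) + 1; // per-pattern count
      result.matches.push(match);                         // original value
      result.masked.push(hidden);                         // masked counterpart
      return hidden;
    });
  }
  return result;
}

const toyResult = toyMask('My email is john@example.com and my SSN is 123-45-6789');
console.log(toyResult.output); // "My email is **hn@example.c** and my SSN is **3-45-67**"
console.log(toyResult.found);  // { email: 1, ssn: 1 }
```

In this sketch `matches[i]` and `masked[i]` describe the same hit, which is why `matches` has to be treated as sensitive: it holds the unmasked originals.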

hasSensitiveContent(input: string, options?): boolean

Quickly check if a string contains sensitive data without performing masking.

import { hasSensitiveContent } from 'sensitive-data-masker';

hasSensitiveContent('user@example.com'); // true
hasSensitiveContent('hello world');      // false

// With options
hasSensitiveContent('sk-1234567890abcdef', { 
  matchAccuracy: 'high',
  excludePatterns: ['genericId']
}); // true

getPatternMatches(input: string, options?): PatternMatch[]

Get detailed information about all pattern matches including their positions.

import { getPatternMatches } from 'sensitive-data-masker';

const matches = getPatternMatches('Contact: admin@test.com and key: sk-123abc');
console.log(matches);
// [
//   {
//     pattern: 'email',
//     matches: [{ match: 'admin@test.com', startIndex: 9, endIndex: 22 }]
//   },
//   {
//     pattern: 'openaiApiKey',
//     matches: [{ match: 'sk-123abc', startIndex: 33, endIndex: 41 }]
//   }
// ]
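
The positions in the example above are consistent with zero-based indices where `endIndex` points at the last character of the match (inclusive). A minimal sketch of how such positions can be computed with a plain `RegExp`, assuming that convention; `findPositions` is a hypothetical helper, not a library export.

```typescript
interface PositionedMatch {
  match: string;
  startIndex: number;
  endIndex: number; // inclusive, by assumption
}

function findPositions(input: string, pattern: RegExp): PositionedMatch[] {
  // Work on a global copy so exec() advances and the caller's regex state is untouched.
  const flags = pattern.flags.indexOf('g') === -1 ? pattern.flags + 'g' : pattern.flags;
  const regex = new RegExp(pattern.source, flags);
  const found: PositionedMatch[] = [];
  let m: RegExpExecArray | null;
  while ((m = regex.exec(input)) !== null) {
    if (m[0].length === 0) {
      regex.lastIndex++; // avoid an infinite loop on zero-length matches
      continue;
    }
    found.push({ match: m[0], startIndex: m.index, endIndex: m.index + m[0].length - 1 });
  }
  return found;
}

const sample = 'Contact: admin@test.com and key: sk-123abc';
console.log(findPositions(sample, /[\w.+-]+@[\w.-]+\.\w+/));
// [ { match: 'admin@test.com', startIndex: 9, endIndex: 22 } ]
console.log(findPositions(sample, /sk-[A-Za-z0-9]+/));
// [ { match: 'sk-123abc', startIndex: 33, endIndex: 41 } ]
```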

Advanced Usage

Custom Masking Options

import { mask } from 'sensitive-data-masker';

// Custom masking character
const result = mask('API key: sk-1234567890abcdef', { maskChar: '#' });
console.log(result.output);
// "API key: ##-1234567890ab##"

// Preserve original length
const result2 = mask('secret123', { preserveLength: true });
console.log(result2.output);
// "*********" (full length masked)

// Use high accuracy mode (fewer false positives)
const result3 = mask('sk-1234567890abcdef', { matchAccuracy: 'high' });
console.log(result3.output);
// "**-1234567890ab**"

Pattern Filtering

import { mask } from 'sensitive-data-masker';

// Only mask specific patterns
const result = mask('Email: user@test.com, API: sk-123', { 
  onlyPatterns: ['email', 'openaiApiKey'] 
});

// Exclude certain patterns
const result2 = mask('Email: user@test.com, UUID: 123e4567-e89b-12d3-a456-426614174000', { 
  excludePatterns: ['uuid', 'genericId']
});

// Combine with accuracy control (sensitiveText stands for any input string)
const result3 = mask(sensitiveText, {
  matchAccuracy: 'high',
  excludePatterns: ['uuid']
});
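
One way to picture how `onlyPatterns` and `excludePatterns` narrow the active set is as a filter over pattern names. The sketch below is an assumption about the semantics, not the library's code; in particular, letting exclusion win when a name appears in both lists is a choice made here for illustration, and `selectPatternNames` is a hypothetical helper.

```typescript
interface NameFilter {
  onlyPatterns?: string[];
  excludePatterns?: string[];
}

function selectPatternNames(all: string[], filter: NameFilter = {}): string[] {
  const only = filter.onlyPatterns;
  const exclude = filter.excludePatterns || [];
  return all.filter((name) => {
    // An empty or missing allow-list means "everything is allowed".
    if (only && only.length > 0 && only.indexOf(name) === -1) {
      return false;
    }
    // Assumption: exclusion wins over inclusion.
    return exclude.indexOf(name) === -1;
  });
}

const available = ['email', 'openaiApiKey', 'uuid', 'genericId'];
console.log(selectPatternNames(available, { onlyPatterns: ['email'] }));
// [ 'email' ]
console.log(selectPatternNames(available, { excludePatterns: ['uuid', 'genericId'] }));
// [ 'email', 'openaiApiKey' ]
```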

Supported Pattern Categories

The library ships 200+ patterns across 25 categories. The main groups are:

🆔 Personally Identifiable Information (PII)

  • Email addresses (multiple formats)
  • Phone numbers (US, International, E.164)
  • Social Security Numbers (US with various formats)
  • Driver's license numbers, Medical record numbers
  • Tax IDs (TIN/EIN), Canadian SIN, UK National Insurance Numbers

☁️ Cloud Provider Credentials

  • AWS: Access keys, secret keys, session tokens, account IDs
  • AWS Resources: EC2, S3, RDS, Lambda ARNs, VPC IDs
  • Azure: Subscription IDs, client secrets, resource IDs
  • Google Cloud: API keys, service account keys, project IDs

💳 Financial & Payment Services

  • Credit card numbers (Visa, MasterCard, Amex, Discover)
  • Stripe: Secret keys, publishable keys, webhook secrets
  • PayPal: Access tokens, client IDs
  • Square: Access tokens, application IDs
  • Bank account numbers (US routing numbers, IBAN)

🤖 AI Provider Credentials

  • OpenAI: API keys, organization IDs
  • Anthropic/Claude: API keys
  • Google AI: Gemini API keys, Vertex AI tokens
  • Hugging Face: Access tokens, API keys
  • Other AI: Groq, Perplexity, Replicate, Together AI

🔐 Authentication & Security

  • JWT tokens, Bearer tokens
  • OAuth access tokens, refresh tokens
  • API keys in headers (X-API-Key, Authorization)
  • Session IDs, CSRF tokens
  • Generic secret patterns in environment variables

🔧 Developer Tools & Services

  • GitHub: Personal access tokens, app tokens
  • Slack: Bot tokens, webhook URLs, app secrets
  • Discord: Bot tokens, webhook URLs
  • Analytics: Google Analytics, Mixpanel, Amplitude
  • Monitoring: Datadog, New Relic, Sentry keys

🗄️ Database & Storage

  • Database connection strings (PostgreSQL, MySQL, MongoDB)
  • File Storage: S3 bucket URLs, Azure Blob Storage
  • CDN: CloudFront URLs, Azure CDN
  • Redis connection strings, Elasticsearch URLs

🔑 Cryptographic Materials

  • RSA private keys, SSH private keys
  • EC private keys, DSA private keys
  • X.509 certificates, PGP private key blocks
  • JSON Web Keys (JWK), PKCS#8 keys

🌐 Network & Location

  • IPv4/IPv6 addresses, MAC addresses
  • Geographic coordinates (latitude/longitude)
  • Private network ranges, subnet masks
  • URL patterns with embedded secrets

📱 Communication Services

  • Messaging: Twilio, SendGrid, Mailgun keys
  • Social Media: Twitter, Facebook, Instagram tokens
  • Email Services: Mailchimp, Postmark, SparkPost
  • SMS/Voice: Nexmo, Plivo, MessageBird

🛠️ Infrastructure & DevOps

  • Container Registries: Docker Hub, ECR, GCR tokens
  • CI/CD: Jenkins, GitLab CI, CircleCI tokens
  • Deployment: Vercel, Netlify, Heroku tokens
  • Monitoring: PagerDuty, Datadog, New Relic

🏒 Enterprise & Business

  • CRM: Salesforce, HubSpot tokens
  • E-commerce: Shopify, WooCommerce keys
  • Business Tools: Slack, Microsoft Teams tokens
  • Analytics: Google Analytics, Adobe Analytics

🎯 Generic Patterns

  • UUID v4, Generic IDs
  • Base64 encoded secrets
  • Hex-encoded keys (32, 64, 128 bit)
  • Custom secret patterns in configuration files

🔍 URL & Reference Patterns

  • URLs with embedded tokens
  • Database connection URIs
  • API endpoints with keys
  • Webhook URLs with secrets

💾 Version Control & Code

  • Git repository URLs with tokens
  • Package manager tokens (npm, PyPI)
  • Container registry credentials
  • Code hosting platform tokens

Pattern Accuracy Levels

Control detection sensitivity to trade off coverage against false positives:

High Accuracy

  • Most specific patterns with minimal false positives
  • Examples: AWS access keys with AKIA prefix, specific API key formats
  • Best for production environments

Medium Accuracy (Default)

  • Balanced detection with reasonable false positive rates
  • Examples: Generic API keys, common secret patterns
  • Good for most use cases

Low Accuracy

  • Broadest detection, may have higher false positive rates
  • Examples: Generic IDs, loose pattern matching
  • Useful for comprehensive scanning

// Use high accuracy for production
const prodResult = mask(text, { matchAccuracy: 'high' });

// Use medium accuracy for development  
const devResult = mask(text, { matchAccuracy: 'medium' });

// Use low accuracy for comprehensive scanning
const scanResult = mask(text, { matchAccuracy: 'low' });
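
The three levels can be read as a threshold: each pattern carries its own accuracy rating, and the requested level decides which ratings are allowed to run. The following sketch assumes exactly that model, plus a default rating of `'medium'` for unrated patterns. `patternsFor`, `accuracyRank`, and the names `awsAccessKey` and `genericApiKey` are hypothetical; only `genericId` appears elsewhere in this README.

```typescript
type Accuracy = 'high' | 'medium' | 'low';

interface RatedPattern {
  name: string;
  matchAccuracy?: Accuracy; // assumption: unrated patterns count as 'medium'
}

const accuracyRank: { [level in Accuracy]: number } = { low: 0, medium: 1, high: 2 };

// Keep every pattern rated at least as specific as the requested level.
function patternsFor(requested: Accuracy, patterns: RatedPattern[]): RatedPattern[] {
  return patterns.filter(
    (p) => accuracyRank[p.matchAccuracy || 'medium'] >= accuracyRank[requested]
  );
}

const demoPatterns: RatedPattern[] = [
  { name: 'awsAccessKey', matchAccuracy: 'high' },
  { name: 'genericApiKey' },
  { name: 'genericId', matchAccuracy: 'low' },
];

console.log(patternsFor('high', demoPatterns).map((p) => p.name));   // [ 'awsAccessKey' ]
console.log(patternsFor('medium', demoPatterns).map((p) => p.name)); // [ 'awsAccessKey', 'genericApiKey' ]
console.log(patternsFor('low', demoPatterns).map((p) => p.name));    // all three
```

Under this model, `'high'` runs the fewest patterns and `'low'` runs all of them, which matches the descriptions above.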

TypeScript Support

Full TypeScript support with complete type definitions:

import { mask, hasSensitiveContent, getPatternMatches } from 'sensitive-data-masker';
import type { MaskResult, MaskingOptions } from 'sensitive-data-masker';

// Type-safe masking options
const options: MaskingOptions = {
  maskChar: '#',
  matchAccuracy: 'high',
  excludePatterns: ['uuid']
};

const result: MaskResult = mask(text, options);

Real-World Examples

Log File Sanitization

import { mask } from 'sensitive-data-masker';

const logEntry = `
[2024-01-15 10:30:45] INFO User john@company.com logged in
[2024-01-15 10:31:12] DEBUG API call with key sk-1234567890abcdef
[2024-01-15 10:31:15] ERROR Payment failed for card 4111-1111-1111-1111
[2024-01-15 10:31:20] WARN SSN in request: 123-45-6789
`;

const sanitized = mask(logEntry);
console.log(sanitized.output);
// [2024-01-15 10:30:45] INFO User **hn@company.c** logged in
// [2024-01-15 10:31:12] DEBUG API call with key **-1234567890ab**
// [2024-01-15 10:31:15] ERROR Payment failed for card **11-1111-1111-11**
// [2024-01-15 10:31:20] WARN SSN in request: **3-45-67**

console.log(sanitized.found);
// { email: 1, openaiApiKey: 1, creditCard: 1, ssn: 1 }

Configuration File Security

import { mask } from 'sensitive-data-masker';

const config = `
DATABASE_URL=postgresql://user:password123@localhost:5432/db
OPENAI_API_KEY=sk-1234567890abcdef1234567890abcdef
STRIPE_SECRET_KEY=sk_live_abcdef123456
ADMIN_EMAIL=admin@company.com
JWT_SECRET=super-secret-key-123
`;

const result = mask(config);
console.log(result.output);
// DATABASE_URL=postgresql://user:**ssword1**@localhost:5432/db
// OPENAI_API_KEY=**-1234567890abcdef1234567890ab**
// STRIPE_SECRET_KEY=**_live_abcdef12**
// ADMIN_EMAIL=**min@company.c**
// JWT_SECRET=**per-secret-key-1**

Multi-Environment Setup

import { mask } from 'sensitive-data-masker';

// Production: Mask everything with high accuracy
const prodResult = mask(sensitiveData, { matchAccuracy: 'high' });

// Development: Allow test emails but mask real API keys
const devResult = mask(sensitiveData, { 
  matchAccuracy: 'medium',
  excludePatterns: ['email'] 
});

// Testing: Only mask financial data
const testResult = mask(sensitiveData, { 
  onlyPatterns: ['creditCard', 'bankAccount', 'ssn'],
  matchAccuracy: 'high'
});
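
The three configurations above can be centralized in one lookup so that call sites do not repeat them. This is a sketch only: `optionsForEnv` and `EnvMaskingOptions` are hypothetical names, the option values are copied from the snippet above, and falling back to the production settings for unknown environments is an assumption.

```typescript
// Subset of MaskingOptions, redeclared so the sketch is self-contained.
interface EnvMaskingOptions {
  matchAccuracy?: 'high' | 'medium' | 'low';
  excludePatterns?: string[];
  onlyPatterns?: string[];
}

function optionsForEnv(env: string): EnvMaskingOptions {
  switch (env) {
    case 'development':
      // Allow test emails but keep masking API keys.
      return { matchAccuracy: 'medium', excludePatterns: ['email'] };
    case 'test':
      // Only mask financial data.
      return { matchAccuracy: 'high', onlyPatterns: ['creditCard', 'bankAccount', 'ssn'] };
    default:
      // Unknown or missing environment: use the production settings.
      return { matchAccuracy: 'high' };
  }
}

console.log(optionsForEnv('development'));
// { matchAccuracy: 'medium', excludePatterns: [ 'email' ] }
```

A call site would then pass the result straight to `mask`, for example `mask(sensitiveData, optionsForEnv(currentEnv))`, where `currentEnv` comes from wherever the application reads its environment name.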

Data Pipeline Processing

import { hasSensitiveContent, mask } from 'sensitive-data-masker';

// Check if data needs processing
function processBatch(records: string[]) {
  const results = records.map(record => {
    if (hasSensitiveContent(record)) {
      const masked = mask(record, { matchAccuracy: 'high' });
      return {
        data: masked.output,
        hadSensitiveData: true,
        patternsFound: Object.keys(masked.found)
      };
    }
    return { data: record, hadSensitiveData: false };
  });
  
  return results;
}
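
For batch jobs it can be useful to report how many hits of each pattern were seen overall without keeping any original values. A minimal sketch that adds up per-record `found` objects; `mergeFound` is a hypothetical helper and works on plain count objects, so it does not depend on the library.

```typescript
type FoundCounts = { [name: string]: number };

// Sum per-pattern counts across many records.
function mergeFound(counts: FoundCounts[]): FoundCounts {
  const total: FoundCounts = {};
  for (const found of counts) {
    for (const name of Object.keys(found)) {
      total[name] = (total[name] || 0) + found[name];
    }
  }
  return total;
}

console.log(mergeFound([{ email: 1, ssn: 1 }, { email: 2 }, {}]));
// { email: 3, ssn: 1 }
```

In `processBatch` above, collecting `masked.found` for each record and passing the list to a helper like this would give a batch-level summary that is safe to log.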

Performance Considerations

  • Optimized Regex Engine: Patterns are compiled and cached on first use
  • Single-Pass Processing: Efficient string traversal with minimal overhead
  • Memory Efficient: No unnecessary string copies or allocations
  • Pattern Filtering: Use onlyPatterns when you know which types to look for
  • Accuracy Optimization: Higher accuracy modes are faster due to more specific patterns

// Optimize for specific use cases
const emailsOnly = mask(text, { onlyPatterns: ['email'] }); // Faster
const highAccuracy = mask(text, { matchAccuracy: 'high' }); // Faster, fewer false positives
const comprehensive = mask(text, { matchAccuracy: 'low' }); // Slower, more thorough
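
"Compiled and cached on first use" can be illustrated with a small lazy cache keyed by pattern source and flags. This is a generic sketch of the technique, not the library's internals; `getRegex`, `compiledCache`, and `compileCount` are hypothetical names.

```typescript
const compiledCache = new Map<string, RegExp>();
let compileCount = 0; // only here to make the caching visible

function getRegex(source: string, flags: string = 'g'): RegExp {
  const key = flags + ':' + source;
  let regex = compiledCache.get(key);
  if (!regex) {
    regex = new RegExp(source, flags); // compiled once per (source, flags) pair
    compiledCache.set(key, regex);
    compileCount++;
  }
  // Global regexes keep state in lastIndex; reset it before handing the object out again.
  regex.lastIndex = 0;
  return regex;
}

const first = getRegex('\\d{3}-\\d{2}-\\d{4}');
const second = getRegex('\\d{3}-\\d{2}-\\d{4}');
console.log(first === second); // true
console.log(compileCount);     // 1
```

Resetting `lastIndex` matters because a shared global `RegExp` otherwise resumes from where the previous `exec` or `test` call stopped, which can make a cached pattern miss matches.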

Security Best Practices

  • Always mask before logging: Ensure sensitive data is masked before writing to logs
  • Use appropriate accuracy: Higher accuracy for production, lower for development/testing
  • Store results securely: The matches array contains original sensitive values
  • Regular updates: Keep the library updated for new pattern definitions
  • Test your patterns: Verify masking works correctly with your specific data formats
  • Environment-specific config: Use different settings for dev/staging/production
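
The first and third points can be combined in a thin logging wrapper that masks every message and forwards only the masked text and pattern names, never the `matches` array. The sketch below takes the masking function as a parameter so it stays self-contained; `createSafeLogger`, `Masker`, and `demoMasker` are hypothetical, and the stand-in masker is much cruder than the library.

```typescript
// Minimal view of what the wrapper needs from a masking function.
type Masker = (input: string) => { output: string; found: { [name: string]: number } };

function createSafeLogger(maskFn: Masker, sink: (line: string) => void) {
  return (message: string): void => {
    const { output, found } = maskFn(message);
    const names = Object.keys(found);
    // Forward masked text plus pattern names only; original values never reach the sink.
    sink(names.length > 0 ? output + ' [masked: ' + names.join(', ') + ']' : output);
  };
}

// Stand-in masker for the demo; it fully masks US SSNs.
const demoMasker: Masker = (input) => {
  const found: { [name: string]: number } = {};
  const output = input.replace(/\b\d{3}-\d{2}-\d{4}\b/g, () => {
    found['ssn'] = (found['ssn'] || 0) + 1;
    return '***-**-****';
  });
  return { output, found };
};

const captured: string[] = [];
const safeLog = createSafeLogger(demoMasker, (line) => { captured.push(line); });
safeLog('SSN in request: 123-45-6789');
safeLog('health check ok');
console.log(captured);
// [ 'SSN in request: ***-**-**** [masked: ssn]', 'health check ok' ]
```

Since `MaskResult` includes `output` and `found`, the library's `mask` function should fit the `Masker` type, so `createSafeLogger(mask, console.log)` is the intended real-world wiring.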

Development

Prerequisites

  • Node.js >= 18.12.0
  • Yarn or npm

Setup

git clone https://github.com/bgauryy/sensitive-data-mask.git
cd sensitive-data-mask
yarn install

Commands

yarn build          # Build the library
yarn dev           # Build in watch mode
yarn lint          # Run ESLint
yarn test          # Run tests
yarn typecheck     # Run TypeScript compiler checks

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Adding New Patterns

  • Choose the appropriate category file in src/regexes/
  • Add your pattern following the existing structure:
{
  name: 'myPattern',
  regex: /your-regex-here/gi,
  description: 'Description of what this detects',
  matchAccuracy: 'medium' // optional: 'high', 'medium', or 'low'
}
  • Run tests to ensure no regressions
  • Submit a PR with a clear description
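
Before opening a PR, a new pattern can be sanity-checked against samples it should and should not match. The sketch below uses the pattern shape shown above; `checkPattern` and the `exampleTicketId` pattern are hypothetical and exist only for illustration, and the repository's own test setup may look different.

```typescript
interface PatternDefinition {
  name: string;
  regex: RegExp;
  description: string;
  matchAccuracy?: 'high' | 'medium' | 'low';
}

// Returns a list of problems; an empty list means all samples behaved as expected.
function checkPattern(def: PatternDefinition, shouldMatch: string[], shouldNotMatch: string[]): string[] {
  const failures: string[] = [];
  const matches = (sample: string): boolean => {
    def.regex.lastIndex = 0; // regexes with the g flag remember position between test() calls
    return def.regex.test(sample);
  };
  for (const sample of shouldMatch) {
    if (!matches(sample)) failures.push('missed: ' + sample);
  }
  for (const sample of shouldNotMatch) {
    if (matches(sample)) failures.push('false positive: ' + sample);
  }
  return failures;
}

// Hypothetical pattern, not one shipped by the library.
const examplePattern: PatternDefinition = {
  name: 'exampleTicketId',
  regex: /\bTICKET-\d{6}\b/gi,
  description: 'Hypothetical internal ticket ID',
  matchAccuracy: 'high',
};

console.log(checkPattern(examplePattern, ['see TICKET-123456', 'ticket-000001'], ['TICKET-12', 'TICKETS-123456']));
// []
```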

License

MIT © guybary

Security

If you discover a security vulnerability, please email guybary@wix.com instead of using the issue tracker.

Made with ❤️ for developers who care about data security

Keywords

sensitive-data

Package last updated on 11 Jul 2025
