
@morgan-stanley/url-detector
A command-line tool that scans source code and text files and reports every URL it finds.
A URL detection tool that scans files using Tree-sitter parsers for accurate URL discovery across 20+ programming languages. Instead of simple regex matching, this tool performs AST (Abstract Syntax Tree) parsing to precisely locate URLs in strings, comments, and other appropriate contexts.
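To illustrate why context matters, here is a minimal sketch of classifying a URL's position as string or comment. It is not Tree-sitter and not the tool's implementation: `classifyPosition` is a hypothetical hand-rolled scanner that handles a single line of JavaScript-like code and ignores block comments and multi-line strings. It shows that the `//` inside `"https://..."` must not be mistaken for a comment start, which is the kind of ambiguity a real parser resolves.

```typescript
type SourceType = "string" | "comment" | "unknown";

// Hypothetical helper: classify the character at `index` within one line of
// JavaScript-like code by scanning everything that precedes it.
function classifyPosition(line: string, index: number): SourceType {
  let quote: string | null = null; // the quote character currently open, if any
  for (let i = 0; i < index; i++) {
    const ch = line[i];
    if (quote !== null) {
      if (ch === "\\") {
        i++; // skip the escaped character
      } else if (ch === quote) {
        quote = null; // string literal closed
      }
    } else if (ch === '"' || ch === "'" || ch === "`") {
      quote = ch; // string literal opened
    } else if (ch === "/" && line[i + 1] === "/") {
      return "comment"; // a line comment started before `index`
    }
  }
  return quote !== null ? "string" : "unknown";
}

const codeLine = 'const apiUrl = "https://api.example.com/v1/users";';
console.log(classifyPosition(codeLine, codeLine.indexOf("https://"))); // "string"

const commentLine = "// Documentation: https://docs.example.com";
console.log(classifyPosition(commentLine, commentLine.indexOf("https://"))); // "comment"
```

A regex alone would report both URLs identically; the scanner (or, in the real tool, the syntax tree) is what distinguishes them.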
Software Bill of Materials (SBOM) generation has become critical for security and compliance, but traditional SBOM tools miss a significant category of external dependencies: URLs embedded directly in source code.
Modern package managers and dependency scanners excel at tracking managed dependencies (npm packages, Maven artifacts, etc.), but they can't detect legacy patterns like:
<script src="https://cdn.jsdelivr.net/npm/lodash@4.17.21/lodash.min.js"></script>
<link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=Roboto">
const API_ENDPOINT = "https://api.thirdparty.com/v1";
fetch("https://analytics.example.com/track", { ... });
These URLs are real external dependencies that can affect security, availability, and compliance, but they won't appear in any SBOM generated from package metadata. URL Detector fills this gap by providing a comprehensive URL inventory that complements traditional dependency tracking tools.
To suppress warnings from Tree-sitter transitive dependencies, you can optionally run any of the following commands with the --loglevel=error flag.
npm install -g @morgan-stanley/url-detector
npm install @morgan-stanley/url-detector
npx @morgan-stanley/url-detector --scan "src/**/*.js" --format table
# Scan all files in current directory
url-detector
# Scan specific files/patterns
url-detector --scan "src/**/*.{js,ts}" --format table
# Exclude directories and ignore domains
url-detector --scan "**/*" --exclude "**/node_modules" --ignore-domains "*example.com"
# Export results to CSV
url-detector --scan "src/**/*" --format csv --output urls.csv
# Run in CI/CD (fail if URLs found)
url-detector --scan "**/*.js" --fail-on-error --results-only
import { URLDetector, LanguageManager } from '@morgan-stanley/url-detector';
// Basic usage
const detector = new URLDetector();
const sourceCode = `
const apiUrl = "https://api.example.com/v1/users";
// Documentation: https://docs.example.com
`;
const urls = detector.detectURLs(sourceCode, 'javascript', 'app.js');
console.log(urls);
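// Advanced usage with custom options
// (named differently from `detector` above so both snippets can share one file;
// ConsoleLogger must also be imported, assuming the package exports it)
const advancedDetector = new URLDetector({
  includeComments: true,
  ignoreDomains: ['*.example.com', 'localhost'],
  protocol: ['https'],
  unique: true,
  logger: new ConsoleLogger()
});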
// Custom language configurations
const customLanguageManager = new LanguageManager(undefined, [
{ name: 'mylang', module: 'tree-sitter-mylang', extensions: ['.ml'] }
]);
Option | Description | Default |
---|---|---|
-s, --scan <patterns...> | Glob patterns for files to scan | ["**/*"] |
-e, --exclude <patterns...> | Glob patterns for files to exclude | [] |
-i, --ignore-domains <domains...> | Additional domains to ignore (supports wildcards, always includes www.w3.org) | [] |
--include-comments | Also scan commented-out lines for URLs | false |
--include-non-fqdn | Include non-fully qualified domain names like "localhost" | false |
-f, --format <format> | Output format: table, json, or csv | "table" |
-o, --output <file> | Output file path (stdout if not specified) | null |
-q, --quiet | Run in quiet mode with no console output | false |
--results-only | Show only results, suppressing progress and info messages | false |
--fail-on-error | Exit with non-zero code if any URLs are found | false |
--concurrency <number> | Maximum number of files to scan concurrently | 10 |
--scan-file <file> | File containing glob patterns to scan (one per line) | null |
--exclude-file <file> | File containing glob patterns to exclude (one per line) | null |
Language | Extensions | Tree-sitter Parser |
---|---|---|
JavaScript | .js, .mjs | tree-sitter-javascript |
TypeScript | .ts, .tsx | tree-sitter-typescript |
Java | .java | tree-sitter-java |
C | .c, .h | tree-sitter-c |
C++ | .cpp, .cc, .cxx, .hpp, .hh, .hxx | tree-sitter-cpp |
C# | .cs | tree-sitter-c-sharp |
Python | .py, .pyw | tree-sitter-python |
PHP | .php, .phtml | tree-sitter-php |
Ruby | .rb, .rake, .gemspec | tree-sitter-ruby |
Go | .go | tree-sitter-go |
Swift | .swift | tree-sitter-swift |
Kotlin | .kt, .kts | @tree-sitter-grammars/tree-sitter-kotlin |
Scala | .scala, .sc | tree-sitter-scala |
HTML | .html, .htm | tree-sitter-html |
CSS | .css | tree-sitter-css |
JSON | .json, .jsonc | tree-sitter-json |
XML | .xml, .xsd, .xsl, .xslt | @tree-sitter-grammars/tree-sitter-xml |
TOML | .toml | @tree-sitter-grammars/tree-sitter-toml |
Bash | .sh, .bash, .zsh, .fish | tree-sitter-bash |
YAML | .yaml, .yml | @tree-sitter-grammars/tree-sitter-yaml |
Note: For unsupported file types, the tool automatically falls back to regex-based detection.
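As a rough illustration of what regex-based detection can look like, the sketch below scans plain text for `http`/`https` URLs and reports 1-based line and column numbers. The pattern, the `SketchMatch` type, and `findUrlsWithRegex` are assumptions made for this example; the tool's actual fallback pattern and output are not documented here and may differ.

```typescript
// Hypothetical result shape with 1-based line and column positions.
interface SketchMatch {
  url: string;
  start: number;  // 0-based character offset where the URL begins (assumption)
  end: number;    // offset just past the last URL character (assumption)
  line: number;   // 1-based line number
  column: number; // 1-based column number
}

function findUrlsWithRegex(source: string): SketchMatch[] {
  // Deliberately simple: scheme, "://", then everything up to whitespace,
  // a quote, or a bracket. Real-world patterns are usually stricter.
  const pattern = /https?:\/\/[^\s"'`<>()]+/g;
  const matches: SketchMatch[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(source)) !== null) {
    const start = m.index;
    const before = source.slice(0, start);
    const line = before.split("\n").length;             // newlines seen so far, plus one
    const column = start - (before.lastIndexOf("\n") + 1) + 1;
    matches.push({ url: m[0], start, end: start + m[0].length, line, column });
  }
  return matches;
}

const text = 'see https://docs.example.com\nfetch("https://api.example.com/v1")';
console.log(findUrlsWithRegex(text));
```

Unlike AST-based detection, this approach cannot tell whether a URL sits in a string, a comment, or elsewhere.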
# Scan all JavaScript and TypeScript files
url-detector --scan "**/*.{js,ts}" --format table
# Scan source code only, exclude build artifacts
url-detector --scan "src/**/*" --exclude "build/**" "dist/**" "**/node_modules"
The tool automatically ignores common non-meaningful domains found in code (like www.w3.org in XML namespaces). You can add additional domains to ignore:
# Ignore all example.com subdomains
url-detector --ignore-domains "*.example.com"
# Ignore multiple domain patterns
url-detector --ignore-domains "*.example.com" "localhost" "*.local"
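One plausible way to interpret wildcard patterns like these is to translate each pattern into an anchored regular expression and test it against the URL's hostname. The sketch below does that; `domainPatternToRegExp` and `isIgnored` are hypothetical names, and the matching semantics are an assumption rather than the tool's documented behavior.

```typescript
// Hypothetical: turn "*.example.com" into /^.*\.example\.com$/i.
function domainPatternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
    .join(".*");                                               // each "*" matches any run of characters
  return new RegExp("^" + escaped + "$", "i");
}

// Hypothetical: true when the URL's hostname matches any ignore pattern.
function isIgnored(url: string, patterns: string[]): boolean {
  let host: string;
  try {
    host = new URL(url).hostname;
  } catch {
    return false; // not an absolute URL, so nothing to compare
  }
  return patterns.some((p) => domainPatternToRegExp(p).test(host));
}

console.log(isIgnored("https://api.example.com/v1", ["*.example.com"])); // true
console.log(isIgnored("https://example.com", ["*.example.com"]));        // false
console.log(isIgnored("https://example.com", ["*example.com"]));         // true
```

Under this interpretation, `*.example.com` covers subdomains only, while `*example.com` also covers the bare domain (and any other host ending in `example.com`).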
# Table output (default)
url-detector --scan "src/**/*" --format table
# JSON output for programmatic processing
url-detector --scan "src/**/*" --format json --output results.json
# CSV output for spreadsheet analysis
url-detector --scan "src/**/*" --format csv --output urls.csv
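URLs often contain commas (for example in query strings), so CSV output needs proper field quoting. The sketch below shows one conventional way to serialize rows; the `CsvRow` column layout and the function names are assumptions for illustration, since the tool's actual CSV columns are not documented here.

```typescript
// Hypothetical row layout; the tool's real CSV columns may differ.
interface CsvRow {
  file: string;
  line: number;
  column: number;
  url: string;
}

// Quote a field only when it contains a comma, quote, or line break,
// doubling any embedded quotes.
function toCsvField(value: string | number): string {
  const text = String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

function toCsv(rows: CsvRow[]): string {
  const header = "file,line,column,url";
  const body = rows.map((r) =>
    [r.file, r.line, r.column, r.url].map((v) => toCsvField(v)).join(",")
  );
  return [header].concat(body).join("\n");
}

console.log(
  toCsv([{ file: "src/app.js", line: 2, column: 16, url: "https://a.example.com/?ids=1,2" }])
);
```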
# Fail build if any URLs are found
url-detector --scan "**/*" --exclude "**/node_modules" --fail-on-error
# Quiet mode for CI logs
url-detector --scan "src/**/*" --quiet --format json --output scan-results.json
# Results-only mode (no progress messages)
url-detector --scan "**/*" --results-only --format table
class URLDetector {
  constructor(options?: DetectorOptionsConfig, logger?: Logger);
  detectURLs(sourceCode: string, language: string, filePath?: string): URLMatch[];
  process(): Promise<FileResult[]>;
}
interface DetectorOptionsConfig {
  // File scanning options
  scan?: string[]; // Glob patterns for files to scan (default: ["**/*"])
  exclude?: string[]; // Glob patterns to exclude (default: [])
  // Filtering options
  ignoreDomains?: string[]; // Additional domains to ignore (default: [], always includes `www.w3.org`)
  includeComments?: boolean; // Include URLs from comments (default: false)
  includeNonFqdn?: boolean; // Include non-FQDN domains like "localhost" (default: false)
  // Output options
  format?: 'table' | 'json' | 'csv'; // Output format (default: "table")
  output?: string | null; // Output file path (default: null)
  // Control options
  resultsOnly?: boolean; // Results only mode (default: false)
  failOnError?: boolean; // Exit with error if URLs found (default: false)
  // Performance options
  concurrency?: number; // Max concurrent files (default: 10)
  // Advanced options (programmatic only)
  fallbackRegex?: boolean; // Use regex fallback when tree-sitter fails (default: true)
  context?: number; // Lines of context to include (default: 0)
  maxDepth?: number; // Max directory depth (default: Infinity)
  quiet?: boolean; // Suppress informational output (default: false)
}
interface URLMatch {
  url: string; // The detected URL
  start: number; // Start character position
  end: number; // End character position
  line: number; // Line number (1-based)
  column: number; // Column number (1-based)
  sourceType: 'string' | 'comment' | 'unknown'; // Context type
  context?: string[]; // Surrounding lines (if requested)
}
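Because each match carries the URL and its position, results can be rolled up into a per-host inventory of external dependencies. The sketch below groups match-like records by host; `InventoryMatch` is a hypothetical subset of the `URLMatch` fields and `groupByHost` is not part of the library.

```typescript
// Hypothetical subset of the URLMatch fields, enough for an inventory.
interface InventoryMatch {
  url: string;
  line: number;
  column: number;
}

// Group matches by host so each external dependency appears once,
// with every location where it is referenced.
function groupByHost(matches: InventoryMatch[]): Record<string, InventoryMatch[]> {
  // Object.create(null) avoids collisions with inherited keys such as "constructor".
  const byHost: Record<string, InventoryMatch[]> = Object.create(null);
  for (const match of matches) {
    let host: string;
    try {
      host = new URL(match.url).host;
    } catch {
      host = "(unparseable)"; // keep the entry rather than dropping it
    }
    (byHost[host] = byHost[host] || []).push(match);
  }
  return byHost;
}

const inventory = groupByHost([
  { url: "https://cdn.jsdelivr.net/npm/lodash@4.17.21/lodash.min.js", line: 1, column: 14 },
  { url: "https://api.thirdparty.com/v1", line: 3, column: 23 },
  { url: "https://api.thirdparty.com/v2", line: 9, column: 1 },
]);
console.log(Object.keys(inventory)); // [ 'cdn.jsdelivr.net', 'api.thirdparty.com' ]
```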
import { LanguageManager, LanguageConfig } from '@morgan-stanley/url-detector';
// Add custom language support
const customLanguages: LanguageConfig[] = [
  {
    name: 'mylang',
    module: 'tree-sitter-mylang',
    extensions: ['.ml', '.mylang'],
    filenames: ['Mylangfile']
  }
];
const languageManager = new LanguageManager(undefined, customLanguages);
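How `LanguageManager` maps a file to a configuration is not described here. As a sketch under that caveat, one natural scheme is to check exact filenames first and fall back to the file extension. `SketchLanguageConfig` mirrors the fields used in the example above, and `resolveLanguage` is a hypothetical helper, not a library function.

```typescript
// Mirrors the fields used in the configuration example above.
interface SketchLanguageConfig {
  name: string;
  module: string;
  extensions: string[];
  filenames?: string[];
}

// Hypothetical resolution order: exact filename first, then extension.
function resolveLanguage(
  filePath: string,
  configs: SketchLanguageConfig[]
): string | undefined {
  const base = filePath.split(/[\\/]/).pop() || filePath; // strip directories
  const dot = base.lastIndexOf(".");
  const ext = dot > 0 ? base.slice(dot).toLowerCase() : ""; // "" for dotfiles and extensionless names
  for (const config of configs) {
    if (config.filenames && config.filenames.indexOf(base) !== -1) return config.name;
  }
  for (const config of configs) {
    if (ext !== "" && config.extensions.indexOf(ext) !== -1) return config.name;
  }
  return undefined; // a caller could fall back to regex detection here
}

const configs: SketchLanguageConfig[] = [
  { name: "mylang", module: "tree-sitter-mylang", extensions: [".ml", ".mylang"], filenames: ["Mylangfile"] },
];
console.log(resolveLanguage("src/app.ml", configs));       // "mylang"
console.log(resolveLanguage("build/Mylangfile", configs)); // "mylang"
console.log(resolveLanguage("README.md", configs));        // undefined
```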
# Run all tests
npm test
# Run tests in watch mode
npm run test:watch
# Run with coverage
npm test -- --coverage
src/
├── index.ts # Main library entry point
├── cli.ts # Command-line interface
├── urlDetector.ts # Core URL detection logic
├── languageManager.ts # Language/parser management
├── urlFilter.ts # URL filtering and validation
├── outputFormatter.ts # Output formatting (table/json/csv)
├── options.ts # Configuration options
└── logger.ts # Logging interfaces
tests/
├── urlDetector.test.ts
├── languageManager.test.ts
└── integration.test.ts
When cloning this project for local development, you'll need to use the --legacy-peer-deps flag due to complex peer dependencies across Tree-sitter packages:
# Clone the repository
git clone https://github.com/morgan-stanley/url-detector.git
cd url-detector
# Install dependencies with legacy peer deps support
npm install --legacy-peer-deps
# Build TypeScript to JavaScript
npm run build
# Build and watch for changes
npm run dev
# Clean build artifacts
npm run clean
# Check code style
npm run lint
# Fix auto-fixable issues
npm run lint:fix
Apache License 2.0 - see LICENSE file for details.
FAQs
The npm package @morgan-stanley/url-detector receives a total of 10 weekly downloads. As such, @morgan-stanley/url-detector popularity was classified as not popular.
We found that @morgan-stanley/url-detector demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 5 open source maintainers collaborating on the project.