
A React hook for speech-to-text using multiple STT providers.
The hooks (`useSTT`, `useFFmpeg`) run on the client side and must be marked with the `'use client'` directive in Next.js 13+.

WebAssembly Loading:
```typescript
// FFmpeg core files are loaded from a CDN
const baseURL = 'https://unpkg.com/@ffmpeg/core@0.12.10/dist/umd';
await ffmpeg.load({
  coreURL: `${baseURL}/ffmpeg-core.js`,  // JavaScript interface
  wasmURL: `${baseURL}/ffmpeg-core.wasm` // WebAssembly binary
});
```
Processing Flow:
```typescript
// 1. Record audio in the browser
// 2. Process with FFmpeg (client-side)
const processedBlob = await convertAudioToWebM(ffmpeg, audioBlob);

// 3. Send to the server for transcription
const formData = new FormData();
formData.append('file', processedBlob);
```
The library automatically handles audio format compatibility across different devices.

Why Format Conversion?

Browsers record in different formats: iOS Safari produces M4A, while Chrome on Android and desktop produces WebM. Converting everything to a single format gives the transcription API a consistent input.

Format Strategy:
```typescript
// FFmpeg is always loaded, regardless of device type
await ffmpeg.load({
  coreURL: `${baseURL}/ffmpeg-core.js`,
  wasmURL: `${baseURL}/ffmpeg-core.wasm`
});

// Then check if conversion is needed
let processedBlob: Blob;
const isWebM = audioBlob.type.includes('webm');
if (!isWebM) {
  // Convert non-WebM formats (e.g., iOS M4A) to WebM
  processedBlob = await convertAudioToWebM(ffmpeg, audioBlob);
} else {
  // Android WebM recordings can be used as-is,
  // but FFmpeg is still available for other audio processing
  // such as normalization, noise reduction, etc.
  processedBlob = audioBlob;
}
```
Why WebM?

WebM with the Opus codec keeps speech recordings small at low bitrates and is accepted directly by the Whisper API, which makes it a good common target format.
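To see the platform differences yourself, you can ask `MediaRecorder` which container the current browser will actually produce. A minimal sketch using the standard `MediaRecorder.isTypeSupported` check (the candidate list is illustrative, not part of use-stt):

```typescript
// Pick the first container the current browser can record.
// iOS Safari typically rejects 'audio/webm' and records MP4/M4A instead.
const candidates = ['audio/webm;codecs=opus', 'audio/webm', 'audio/mp4'];
const mimeType = candidates.find((t) => MediaRecorder.isTypeSupported(t));
console.log('Recording format:', mimeType ?? 'browser default');
```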
Format Flow:

```
iOS Recording (M4A) ──► FFmpeg Convert ──┐
                                         ├──► WebM/Opus ──► Optional Processing ──► Whisper API
Android (WebM) ──────────────────────────┘                  (normalize, denoise, etc.)
```
Note: FFmpeg is always loaded because it's needed for audio processing features (normalization, noise reduction, etc.) even when format conversion isn't required. The ~31MB WebAssembly load happens on all devices, but this enables consistent audio processing capabilities across platforms.
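Because that load is large, it's worth ensuring it happens only once even if several components ask for FFmpeg concurrently. One way is to cache the load promise (a sketch, not part of the use-stt API; the CDN URL matches the one used above):

```typescript
import { FFmpeg } from '@ffmpeg/ffmpeg';

const baseURL = 'https://unpkg.com/@ffmpeg/core@0.12.10/dist/umd';
let ffmpegPromise: Promise<FFmpeg> | null = null;

// Returns a single shared FFmpeg instance; concurrent callers
// await the same in-flight load instead of downloading the core twice.
export function getFFmpeg(): Promise<FFmpeg> {
  if (!ffmpegPromise) {
    ffmpegPromise = (async () => {
      const ffmpeg = new FFmpeg();
      await ffmpeg.load({
        coreURL: `${baseURL}/ffmpeg-core.js`,
        wasmURL: `${baseURL}/ffmpeg-core.wasm`
      });
      return ffmpeg;
    })();
  }
  return ffmpegPromise;
}
```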
```bash
npm install use-stt
```
```typescript
// app/actions/transcribe.ts
'use server';

export async function transcribe(formData: FormData) {
  if (!process.env.OPENAI_API_KEY) {
    throw new Error('OPENAI_API_KEY is not configured');
  }
  try {
    const file = formData.get('file') as File;
    if (!file) {
      throw new Error('No file provided');
    }

    // Prepare form data for the Whisper API
    const whisperFormData = new FormData();
    whisperFormData.append('file', file);
    whisperFormData.append('model', 'whisper-1');
    whisperFormData.append('response_format', 'json');

    const response = await fetch('https://api.openai.com/v1/audio/transcriptions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`
      },
      body: whisperFormData
    });
    if (!response.ok) {
      throw new Error(`Whisper API error: ${response.status} ${await response.text()}`);
    }

    const result = await response.json();
    return {
      transcript: result.text,
      confidence: result.confidence // not part of whisper-1 JSON responses; may be undefined
    };
  } catch (error) {
    console.error('Transcription error:', error);
    throw error;
  }
}
```
```tsx
// components/SpeechToText.tsx
'use client';
import React, { useState, useCallback } from 'react';
import { useSTT } from 'use-stt';
import { FFmpeg } from '@ffmpeg/ffmpeg';
import { transcribe } from '@/app/actions/transcribe';
import type { FFmpegConfig } from 'use-stt';
import { convertAudioToWebM } from 'use-stt/utils';

// Initialize FFmpeg (do this once, shared across renders)
let ffmpeg: FFmpeg | null = null;

// Default audio processing config
const defaultConfig: FFmpegConfig = {
  outputSampleRate: 16000,
  outputChannels: 1,
  bitrate: '24k',
  normalize: true,
  normalizationLevel: -16,
  denoise: false,
  vad: false,
  vadLevel: 1,
  compressionLevel: 10
};

export default function SpeechToTextDemo() {
  // Audio processing configuration state
  const [config, setConfig] = useState<FFmpegConfig>(defaultConfig);

  // Wrapper function to handle audio processing and transcription
  const transcribeAudio = useCallback(async (audioBlob: Blob) => {
    try {
      // Initialize FFmpeg if needed (load() with no arguments uses
      // the library's default CDN URLs for the core files)
      if (!ffmpeg) {
        console.log('Initializing FFmpeg...');
        ffmpeg = new FFmpeg();
        await ffmpeg.load();
      }

      // Process audio with FFmpeg
      console.log('Converting audio with config:', config);
      const processedBlob = await convertAudioToWebM(ffmpeg, audioBlob, config);

      // Send to server for transcription
      const formData = new FormData();
      formData.append('file', processedBlob, 'audio.webm');
      return transcribe(formData);
    } catch (error) {
      console.error('Audio processing error:', error);
      throw error;
    }
  }, [config]);

  const {
    isRecording,
    isProcessing,
    transcript,
    error,
    startRecording,
    stopRecording,
    pauseRecording,
    resumeRecording
  } = useSTT({
    provider: 'whisper',
    transcribe: transcribeAudio
  });

  return (
    <div>
      {/* Audio Processing Configuration */}
      <div className="mb-4">
        <h3>Audio Processing Options</h3>
        <div>
          <label>
            <input
              type="checkbox"
              checked={config.normalize}
              onChange={(e) => setConfig(prev => ({
                ...prev,
                normalize: e.target.checked
              }))}
            />
            Normalize Volume
          </label>
          <label>
            <input
              type="checkbox"
              checked={config.denoise}
              onChange={(e) => setConfig(prev => ({
                ...prev,
                denoise: e.target.checked
              }))}
            />
            Reduce Background Noise
          </label>
          <select
            value={config.bitrate}
            onChange={(e) => setConfig(prev => ({
              ...prev,
              bitrate: e.target.value
            }))}
          >
            <option value="16k">16 kbps (Low)</option>
            <option value="24k">24 kbps (Default)</option>
            <option value="32k">32 kbps (Better)</option>
          </select>
        </div>
      </div>

      {/* Recording Controls */}
      <div>
        <button
          onClick={startRecording}
          disabled={isRecording || isProcessing}
        >
          Start Recording
        </button>
        <button
          onClick={stopRecording}
          disabled={!isRecording}
        >
          Stop Recording
        </button>
        <button
          onClick={pauseRecording}
          disabled={!isRecording}
        >
          Pause
        </button>
        <button
          onClick={resumeRecording}
          disabled={!isRecording}
        >
          Resume
        </button>
      </div>

      {isProcessing && <p>Processing audio...</p>}

      <div>
        <h3>Transcript:</h3>
        <p>{transcript || 'No transcript yet'}</p>
      </div>

      {error && (
        <div>
          <h3>Error:</h3>
          <p>{error.message}</p>
        </div>
      )}
    </div>
  );
}
```
## Audio Processing Options
The library supports various audio processing options through the `FFmpegConfig` interface:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| bitrate | string | '24k' | Audio bitrate (e.g., '16k', '24k', '32k') |
| normalize | boolean | true | Enable volume normalization |
| normalizationLevel | number | -16 | Target normalization level in dB |
| denoise | boolean | false | Apply noise reduction |
| vad | boolean | false | Enable Voice Activity Detection |
| vadLevel | number | 1 | VAD sensitivity (0-3) |
| compressionLevel | number | 10 | Opus compression level (0-10) |
See the [examples](./examples) directory for a complete demo with audio processing controls.
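As a quick illustration of these options, here is one way a low-bandwidth configuration might be assembled and passed to the converter (the specific values are illustrative, not recommendations):

```typescript
import type { FFmpegConfig } from 'use-stt';
import { convertAudioToWebM } from 'use-stt/utils';

// Illustrative low-bandwidth config: mono 16 kHz speech at 16 kbps,
// normalized to -16 dB, with noise reduction enabled.
const lowBandwidthConfig: FFmpegConfig = {
  outputSampleRate: 16000,
  outputChannels: 1,
  bitrate: '16k',
  normalize: true,
  normalizationLevel: -16,
  denoise: true,
  vad: false,
  vadLevel: 1,
  compressionLevel: 10
};

// const processed = await convertAudioToWebM(ffmpeg, audioBlob, lowBandwidthConfig);
```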
## API Reference
### useSTT Options
| Option | Type | Required | Description |
|--------|------|----------|-------------|
| provider | 'whisper' \| 'azure' \| 'google' | Yes | The STT provider to use |
| transcribe | (blob: Blob) => Promise<{transcript: string, confidence?: number}> | Yes | Function to handle transcription (typically a server action) |
| language | string | No | Language code (e.g., 'en', 'es') |
| model | string | No | Model name (provider-specific) |
### Return Value
| Property | Type | Description |
|----------|------|-------------|
| isRecording | boolean | Whether recording is in progress |
| isProcessing | boolean | Whether audio is being processed |
| transcript | string | The current transcript text |
| error | Error \| null | Any error that occurred |
| startRecording | () => Promise<void> | Start recording |
| stopRecording | () => Promise<void> | Stop recording and process audio |
| pauseRecording | () => void | Pause recording |
| resumeRecording | () => void | Resume recording |
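Putting the options and return values together, a minimal component might look like this (a sketch that reuses the `transcribeAudio` wrapper from the example above, passed in as a prop):

```tsx
'use client';
import { useSTT } from 'use-stt';

type Props = {
  transcribeAudio: (blob: Blob) => Promise<{ transcript: string }>;
};

function MiniRecorder({ transcribeAudio }: Props) {
  const { isRecording, transcript, error, startRecording, stopRecording } = useSTT({
    provider: 'whisper',
    transcribe: transcribeAudio
  });

  return (
    <div>
      <button onClick={isRecording ? stopRecording : startRecording}>
        {isRecording ? 'Stop' : 'Record'}
      </button>
      <p>{error ? error.message : transcript}</p>
    </div>
  );
}
```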
### Controlling Hook Activity (Disabling/Pausing)
In some scenarios, you might want to temporarily "pause" or "disable" the `useSTT` hook, for example, when a parent form is submitting or when another UI element takes precedence. This prevents the hook from starting new recordings, processing audio, or updating its state while it's meant to be inactive.
You can achieve this by adding a `disabled` option to your `useSTT` hook.
**1. Modify `UseSTTOptions` in `useSTT`:**
Add a `disabled` option to your hook's options interface:
```typescript
// In your custom useSTT.ts or equivalent
export interface UseSTTOptions {
  // ... existing options like provider, language, transcribe ...
  disabled?: boolean; // New option to disable the hook
}
```
**2. Implement `disabled` logic within `useSTT`:**

When the `disabled` option is `true`, your `useSTT` hook should stop any in-progress work and avoid updating its state (`isRecording`, `isProcessing`, `transcript`, `error`). Here's a conceptual example of how to use the `disabled` flag within `useSTT`:
```typescript
// In your custom useSTT.ts or equivalent
import { useState, useEffect, useCallback } from 'react';

export function useSTT(options: UseSTTOptions) {
  const {
    provider,
    language,
    transcribe,
    disabled = false // Default to false if not provided
  } = options;

  const [transcript, setTranscript] = useState('');
  const [isRecording, setIsRecording] = useState(false);
  const [isProcessing, setIsProcessing] = useState(false);
  const [error, setError] = useState<Error | null>(null);
  // ... other internal state and refs ...

  // Example: Guarding an effect
  useEffect(() => {
    if (disabled) {
      // If disabled, ensure no processing happens and clean up.
      // This might involve stopping any active recognition,
      // clearing timeouts, removing event listeners, etc.
      setIsRecording(false);
      setIsProcessing(false);
      // setTranscript(''); // Optionally reset transcript
      // setError(null);    // Optionally clear errors

      // Example: If using the Web Speech API directly
      // if (speechRecognitionInstanceRef.current) {
      //   speechRecognitionInstanceRef.current.stop();
      //   speechRecognitionInstanceRef.current.onresult = null;
      //   // ... remove other listeners ...
      // }
      return; // Bail out of the effect
    }
    // ... otherwise, proceed with normal effect logic ...
    // (e.g., setting up speech recognition, handling events)
  }, [disabled /* , ...other relevant dependencies */]);

  // Example: Guarding a callback
  const startRecording = useCallback(async () => {
    if (disabled) {
      console.log('useSTT is disabled, startRecording aborted.');
      return;
    }
    // ... existing startRecording logic ...
  }, [disabled /* , ...other dependencies for startRecording */]);

  const stopRecording = useCallback(async () => {
    if (disabled) {
      console.log('useSTT is disabled, stopRecording called.');
      // Even when disabled, you may want to allow stopping a recording that
      // started *before* the hook was disabled; this depends on the desired
      // behavior. If the hook should be truly dormant, make this a no-op.
      // Here we assume it should still try to stop a recording in progress.
    }
    // ... existing stopRecording logic ...
    // Consider whether isRecording should be set to false here when disabled.
  }, [disabled /* , ...other dependencies for stopRecording */]);

  // Ensure all other effects, callbacks, and internal processes
  // similarly respect the 'disabled' flag.

  return {
    transcript,
    isRecording,
    isProcessing,
    error,
    startRecording,
    stopRecording,
    // ... other returned values
  };
}
```
**Important:** The exact implementation details within `useSTT` (such as how to stop underlying speech APIs or clear resources) will depend on your specific hook's internal architecture. The key is to make the hook as quiescent as possible when `disabled` is `true`.
**3. Using the `disabled` option in a parent component:**

Now, your parent component can pass the `disabled` option to `useSTT`:
```tsx
// Example: In a component like VoiceInput.tsx
// Assume 'isSubmitting' is a state variable in this parent component.
const {
  transcript: sttTranscript,
  isRecording,
  isProcessing: isProcessingSTT,
  error: sttError,
  startRecording,
  stopRecording
} = useSTT({
  provider: 'whisper',
  // ... other options ...
  disabled: isSubmitting, // Pass the parent's state here
});

// UI elements can also use 'isSubmitting' to disable themselves:
// <button onClick={startRecording} disabled={isSubmitting || isRecording}>
//   Record
// </button>
```
By implementing this pattern, you gain fine-grained control over when useSTT is active, helping to prevent unexpected behavior and state updates during critical periods like form submissions.
See the [examples](./examples) directory for working examples.
When using this library with Next.js 13+ (App Router), ensure your components are properly marked:
```tsx
// components/AudioRecorder.tsx
'use client'; // Required because this component uses browser APIs
import { useSTT } from 'use-stt';

export function AudioRecorder() {
  const { transcript, startRecording } = useSTT({ /* ... */ });
  // ...
}
```

```typescript
// app/api/transcribe/route.ts
// Server-side API route for transcription
export async function POST(request: Request) {
  const formData = await request.formData();
  const file = formData.get('file');
  // Process with Whisper API...
}
```
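If you prefer the API-route approach over a server action, the body of that handler can mirror the server action shown earlier. A minimal sketch (the error handling and response shape here are assumptions, not part of use-stt):

```typescript
// app/api/transcribe/route.ts (sketch)
export async function POST(request: Request) {
  const formData = await request.formData();
  const file = formData.get('file');
  if (!(file instanceof File)) {
    return Response.json({ error: 'No file provided' }, { status: 400 });
  }

  // Forward the audio to the Whisper API, as in the server action above
  const whisperFormData = new FormData();
  whisperFormData.append('file', file);
  whisperFormData.append('model', 'whisper-1');

  const response = await fetch('https://api.openai.com/v1/audio/transcriptions', {
    method: 'POST',
    headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
    body: whisperFormData
  });
  if (!response.ok) {
    return Response.json({ error: 'Transcription failed' }, { status: 502 });
  }

  const result = await response.json();
  return Response.json({ transcript: result.text });
}
```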
The library uses FFmpeg-WASM from a CDN (unpkg.com). This means the first load pulls the ~31MB core from a third-party host, and your app depends on that CDN being reachable.

To optimize for production, you can self-host the FFmpeg core files:
```typescript
// Example of custom FFmpeg URL configuration
const ffmpeg = new FFmpeg();
await ffmpeg.load({
  coreURL: '/ffmpeg/ffmpeg-core.js',  // Self-hosted
  wasmURL: '/ffmpeg/ffmpeg-core.wasm' // Self-hosted
});
```
The library relies on browser APIs: the MediaRecorder API for capturing audio and the AudioContext API for audio handling. To hide the FFmpeg download latency, you can preload the core files:

```typescript
// Preload FFmpeg in a low-priority way
const preloadFFmpeg = () => {
  const link = document.createElement('link');
  link.rel = 'preload';
  link.as = 'fetch';
  link.crossOrigin = 'anonymous'; // preloading a cross-origin fetch needs a matching CORS mode
  link.href = 'https://unpkg.com/@ffmpeg/core@0.12.10/dist/umd/ffmpeg-core.wasm';
  document.head.appendChild(link);
};
```
"FFmpeg not loaded" Error
```typescript
// Ensure FFmpeg is loaded before use
if (!ffmpeg) {
  await loadFFmpeg(); // your own loader, e.g. new FFmpeg() + ffmpeg.load(...)
}
```
iOS Audio Format Issues

iOS records M4A rather than WebM, so avoid hard-coding `type: 'audio/webm'` in the recorder options; let the library detect the format and convert it.

Memory Issues

Call `URL.revokeObjectURL()` on audio URLs you no longer need (see the cleanup example below).

CORS Issues with FFmpeg Loading

If the FFmpeg files fail to load cross-origin, check the headers described in the self-hosting section below.

Enable debug logging:
```typescript
const { transcript, error } = useSTT({
  provider: 'whisper',
  transcribe: transcribeAudio,
  debug: true // Enables detailed logging
});
```
To release memory, revoke object URLs and drop references to processed blobs:

```typescript
// Clear processed audio data
const cleanup = () => {
  if (audioUrl) {
    URL.revokeObjectURL(audioUrl);
  }
  // Clear any stored blobs
  processedBlob = null;
};
```
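In a React component, one way to wire this up is to revoke the URL whenever the blob changes or the component unmounts (a sketch using standard React and browser APIs, not part of use-stt):

```tsx
import { useEffect, useState } from 'react';

// Creates an object URL for a blob and revokes it automatically.
function useAudioUrl(blob: Blob | null) {
  const [audioUrl, setAudioUrl] = useState<string | null>(null);

  useEffect(() => {
    if (!blob) return;
    const url = URL.createObjectURL(blob);
    setAudioUrl(url);
    // Revoke the URL when the blob changes or the component unmounts
    return () => URL.revokeObjectURL(url);
  }, [blob]);

  return audioUrl;
}
```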
If self-hosting FFmpeg files, set appropriate headers:
```nginx
# Nginx configuration example
location /ffmpeg/ {
    add_header Cross-Origin-Resource-Policy cross-origin;
    add_header Cross-Origin-Embedder-Policy require-corp;
}
```
Add the required cross-origin isolation headers (FFmpeg's multithreaded WASM builds rely on SharedArrayBuffer, which requires cross-origin isolation):

```javascript
// next.config.js
const nextConfig = {
  headers: async () => [{
    source: '/:path*',
    headers: [
      {
        key: 'Cross-Origin-Embedder-Policy',
        value: 'require-corp'
      },
      {
        key: 'Cross-Origin-Resource-Policy',
        value: 'cross-origin'
      }
    ]
  }]
};

module.exports = nextConfig;
```
MIT License - see LICENSE file for details.
Contributions are welcome! Please open an issue or submit a pull request.
To publish a new version of the use-stt package to npm, follow these steps:

**1. Ensure all changes are committed:** Make sure your working directory is clean and all desired changes are included in your Git history.

**2. Update the package version:** Decide on the appropriate version bump (patch, minor, or major) according to Semantic Versioning (SemVer). The `npm version` command updates the version in `package.json` and creates a version commit and tag:

```bash
npm version patch   # or: npm version minor / npm version major
```

**3. Push Git commits and tags** (if you used `npm version`, which creates them):

```bash
git push && git push --tags
```

**4. Publish to npm:**

```bash
npm publish
```

The `prepublishOnly` script in `package.json` will automatically run `npm run build` before publishing, ensuring that the latest code is compiled.