@memberjunction/ai-groq
A comprehensive wrapper for Groq's LPU (Language Processing Unit) inference engine, providing high-performance AI model access within the MemberJunction framework.
Features
- High-Performance Integration: Connect to Groq's ultra-fast LPU inference API
- Standardized Interface: Implements MemberJunction's BaseLLM abstract class
- Streaming Support: Full streaming support for real-time interactions
- Message Formatting: Handles conversion between MemberJunction and Groq message formats
- Error Handling: Robust error handling with detailed reporting
- Token Usage Tracking: Track token consumption for monitoring
- Chat Completion: Interactive chat completions with various LLMs hosted on Groq
- Model Support: Compatible with Llama, Mixtral, Gemma, and other models hosted on Groq
- Response Format Control: Support for JSON, text, and model-specific response formats
Installation
npm install @memberjunction/ai-groq
Requirements
- Node.js 16+
- A Groq API key
- MemberJunction Core libraries
Usage
Basic Setup
import { GroqLLM } from '@memberjunction/ai-groq';
const groqLLM = new GroqLLM('your-groq-api-key');
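If you prefer not to hard-code the key, it can be read from configuration at startup; the snippet below assumes an environment variable named GROQ_API_KEY (the variable name is a convention, not something the wrapper requires).
import { GroqLLM } from '@memberjunction/ai-groq';

// Assumes the API key is exposed via a GROQ_API_KEY environment variable
const groqLLM = new GroqLLM(process.env.GROQ_API_KEY ?? '');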
Chat Completion
import { ChatParams } from '@memberjunction/ai';
const chatParams: ChatParams = {
model: 'llama2-70b-4096',
messages: [
{ role: 'system', content: 'You are a helpful AI assistant.' },
{ role: 'user', content: 'Explain how LPUs differ from traditional GPUs for AI inference.' }
],
temperature: 0.7,
maxOutputTokens: 1000
};
try {
const response = await groqLLM.ChatCompletion(chatParams);
if (response.success) {
console.log('Response:', response.data.choices[0].message.content);
console.log('Token Usage:', response.data.usage);
console.log('Time Elapsed (ms):', response.timeElapsed);
} else {
console.error('Error:', response.errorMessage);
}
} catch (error) {
console.error('Exception:', error);
}
Streaming Responses
import { ChatParams, ChatResult } from '@memberjunction/ai';
const streamingParams: ChatParams = {
model: 'llama3-70b-8192',
messages: [
{ role: 'user', content: 'Write a short story about AI.' }
],
stream: true,
onStream: (content: string) => {
process.stdout.write(content);
},
maxOutputTokens: 2000
};
const response = await groqLLM.ChatCompletion(streamingParams);
console.log('\n\nFinal response:', response.data.choices[0].message.content);
Response Format Control
const jsonParams: ChatParams = {
model: 'mixtral-8x7b-32768',
messages: [
{ role: 'system', content: 'You are a helpful assistant that responds in JSON format.' },
{ role: 'user', content: 'List 3 benefits of using Groq in JSON format with keys: benefit, description' }
],
responseFormat: 'JSON',
maxOutputTokens: 1000
};
const jsonResponse = await groqLLM.ChatCompletion(jsonParams);
const benefits = JSON.parse(jsonResponse.data.choices[0].message.content);
Direct Access to Groq Client
// The GroqClient property (aliased as client) exposes the underlying groq-sdk instance
const groqClient = groqLLM.GroqClient;
const customResponse = await groqClient.chat.completions.create({
model: 'mixtral-8x7b-32768',
messages: [{ role: 'user', content: 'Hello!' }],
max_tokens: 500
});
Supported Models
Groq provides access to various open models with optimized inference:
- Llama Models:
  - llama3-70b-8192 - Llama 3 70B with 8K context
  - llama3-8b-8192 - Llama 3 8B with 8K context
  - llama2-70b-4096 - Llama 2 70B with 4K context
- Mixtral Models:
  - mixtral-8x7b-32768 - Mixtral 8x7B with 32K context
- Gemma Models:
  - gemma-7b-it - Gemma 7B Instruct
  - gemma2-9b-it - Gemma 2 9B Instruct
Check the Groq documentation for the latest list of supported models and their capabilities.
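If you want the current model list at runtime rather than from the docs, you can query it through the underlying groq-sdk client exposed by the wrapper; this sketch assumes the SDK's models.list() endpoint and simply prints the model IDs.
// Query Groq's models endpoint through the underlying client (see Direct Access above)
const modelList = await groqLLM.GroqClient.models.list();
for (const model of modelList.data) {
  console.log(model.id);
}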
API Reference
GroqLLM Class
A class that extends BaseLLM to provide Groq-specific functionality.
Constructor
new GroqLLM(apiKey: string)
Properties
GroqClient: (read-only) Returns the underlying Groq client instance
client: (read-only) Alias for GroqClient
SupportsStreaming: (read-only) Returns true - Groq supports streaming responses
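Because streaming capability is surfaced as a property, a caller can gate streaming on it; a minimal sketch reusing the streaming parameters shown earlier:
// Enable streaming only when the provider reports support for it
const params: ChatParams = {
  model: 'llama3-70b-8192',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: groqLLM.SupportsStreaming,
  onStream: (content: string) => process.stdout.write(content),
  maxOutputTokens: 200
};
const result = await groqLLM.ChatCompletion(params);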
Methods
ChatCompletion
ChatCompletion(params: ChatParams): Promise<ChatResult>
Perform a chat completion with support for both streaming and non-streaming responses.
SummarizeText
SummarizeText(params: SummarizeParams): Promise<SummarizeResult>
Note: Not yet implemented
ClassifyText
ClassifyText(params: ClassifyParams): Promise<ClassifyResult>
Note: Not yet implemented
Performance Considerations
Groq is known for its extremely fast inference times:
- Response generation is typically 5-10x faster than traditional GPU-based inference
- Lower latency means better interactive experiences
- Benchmark different models to find the best performance/quality balance for your use case
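Because ChatResult reports timeElapsed and token usage, a simple loop is enough for a first-pass benchmark; the model IDs below are placeholders you should swap for models currently available on your account.
// Compare latency across candidate models (model IDs are examples only)
const candidates = ['llama3-8b-8192', 'llama3-70b-8192'];
for (const model of candidates) {
  const params: ChatParams = {
    model,
    messages: [{ role: 'user', content: 'Summarize the benefits of LPU inference in two sentences.' }],
    maxOutputTokens: 200
  };
  const result = await groqLLM.ChatCompletion(params);
  if (result.success) {
    console.log(`${model}: ${result.timeElapsed} ms`, result.data.usage);
  }
}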
Error Handling
The wrapper provides detailed error information:
try {
const response = await groqLLM.ChatCompletion(params);
if (!response.success) {
console.error('Error:', response.errorMessage);
console.error('Status:', response.statusText);
console.error('Time Elapsed:', response.timeElapsed, 'ms');
}
} catch (error) {
console.error('Exception occurred:', error);
}
Integration with MemberJunction
This package seamlessly integrates with the MemberJunction AI framework:
import { MJGlobal } from '@memberjunction/global';
import { BaseLLM } from '@memberjunction/ai';
import { GroqLLM } from '@memberjunction/ai-groq'; // importing the package ensures GroqLLM registers itself
// Instantiate the registered provider through the MemberJunction class factory
const llm = MJGlobal.Instance.ClassFactory.CreateInstance<BaseLLM>(BaseLLM, 'GroqLLM', 'your-api-key');
Advanced Features
Effort Level Support
For models that support reasoning effort levels (experimental):
const params: ChatParams = {
model: 'llama3-70b-8192',
messages: [{ role: 'user', content: 'Solve this complex problem...' }],
effortLevel: 'high',
maxOutputTokens: 2000
};
Handling Groq-Specific Requirements
The wrapper automatically handles Groq's requirement that the last message must be from a user. If your message chain ends with an assistant message, the wrapper will automatically append a dummy user message to satisfy this requirement.
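For illustration, here is a sketch of a message chain that ends with an assistant turn; the wrapper detects this and appends a trailing user message internally (the placeholder text it uses is an implementation detail), so the request sent to Groq still ends with a user message.
const params: ChatParams = {
  model: 'llama3-70b-8192',
  messages: [
    { role: 'user', content: 'Draft an outline for a post about LPUs.' },
    // The chain ends with an assistant message...
    { role: 'assistant', content: '1. What is an LPU? 2. LPU vs GPU. 3. Benchmarks.' }
  ],
  maxOutputTokens: 500
};
// ...the wrapper appends a dummy user message before calling Groq, so this request still succeeds.
const result = await groqLLM.ChatCompletion(params);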
Dependencies
- groq-sdk (0.21.0): Official Groq SDK
- @memberjunction/ai (2.43.0): MemberJunction AI core framework
- @memberjunction/global (2.43.0): MemberJunction global utilities
Development
Building
npm run build
Development Mode
npm start
Contributing
When contributing to this package:
- Follow the MemberJunction coding standards
- Ensure all TypeScript types are properly defined
- Update tests when adding new features
- Document any Groq-specific behaviors or limitations
Supported Parameters
The Groq provider supports the following LLM parameters (a combined example follows the lists):
Supported:
- temperature - Controls randomness in the output (0.0-2.0)
- maxOutputTokens - Maximum number of tokens to generate
- topP - Nucleus sampling threshold (0.0-1.0)
- seed - For deterministic outputs
- stopSequences - Array of sequences where the API will stop generating
- frequencyPenalty - Penalizes tokens in proportion to how often they have already appeared (-2.0 to 2.0)
- presencePenalty - Penalizes tokens that have already appeared, encouraging new topics (-2.0 to 2.0)
- responseFormat - Output format (Text, JSON, etc.)
Not Supported:
- topK - Not available in the Groq API
- minP - Not available in the Groq API
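Putting the supported parameters together, a single request might look like the sketch below; the values are arbitrary examples, not recommendations.
const params: ChatParams = {
  model: 'llama3-70b-8192',
  messages: [{ role: 'user', content: 'Give me three taglines for a developer tool.' }],
  temperature: 0.8,
  maxOutputTokens: 300,
  topP: 0.9,
  seed: 42,
  stopSequences: ['###'],
  frequencyPenalty: 0.2,
  presencePenalty: 0.1,
  responseFormat: 'Text'
};
const result = await groqLLM.ChatCompletion(params);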
License
ISC