@synsci/cli-linux-x64

npm

Version: 1.1.57

Version published: 4 months ago

Maintainers: 1

Created: 4 months ago

GRPO/RL Training Skill

Expert-level guidance for Group Relative Policy Optimization with TRL

📁 Skill Structure

grpo-rl-training/
├── SKILL.md                              # Main skill documentation (READ THIS FIRST)
├── README.md                             # This file
├── templates/
│   └── basic_grpo_training.py            # Production-ready training template
└── examples/
    └── reward_functions_library.py       # 20+ reward function examples

🚀 Quick Start

1. Read SKILL.md - Comprehensive guide with all concepts and patterns
2. Copy templates/basic_grpo_training.py - Start with working code
3. Browse examples/reward_functions_library.py - Pick reward functions for your task
4. Modify for your use case - Adapt dataset, rewards, and config

💡 What's Inside

SKILL.md (Main Documentation)

- Core GRPO concepts and algorithm fundamentals
- Complete implementation workflow (dataset → rewards → training → deployment)
- 10+ reward function examples with code
- Hyperparameter tuning guide
- Training insights (loss behavior, metrics, debugging)
- Troubleshooting guide
- Production best practices

Templates

basic_grpo_training.py: Minimal, production-ready training script
- Uses Qwen 2.5 1.5B Instruct
- 3 reward functions (format + correctness)
- LoRA for efficient training
- Fully documented and ready to run

Examples

reward_functions_library.py: 20+ battle-tested reward functions
- Correctness rewards (exact match, fuzzy match, numeric, code execution)
- Format rewards (XML, JSON, strict/soft)
- Length rewards (ideal length, min/max)
- Style rewards (reasoning quality, citations, repetition penalty)
- Combined rewards (multi-objective optimization)
- Preset collections for common tasks
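As a rough illustration of the first two categories, here is a minimal sketch of a format reward and a correctness reward. It assumes the TRL convention that a reward function receives a batch of `completions` plus dataset columns as keyword arguments and returns one float per completion. The function names, the `<reasoning>`/`<answer>` tag layout, the `answer` column, and the reward values are hypothetical choices for this sketch, not the contents of reward_functions_library.py.

```python
import re

# Hypothetical tag layout for this sketch: <reasoning>...</reasoning><answer>...</answer>
FORMAT_PATTERN = re.compile(
    r"^\s*<reasoning>.*?</reasoning>\s*<answer>(.*?)</answer>\s*$",
    re.DOTALL,
)


def _text(completion):
    """Return the text of a plain-string or chat-style completion."""
    if isinstance(completion, str):
        return completion
    # Chat-style completions are a list of message dicts.
    return completion[0]["content"]


def strict_format_reward(completions, **kwargs):
    """1.0 when the whole completion follows the tag layout, else 0.0."""
    return [1.0 if FORMAT_PATTERN.match(_text(c)) else 0.0 for c in completions]


def exact_match_reward(completions, answer, **kwargs):
    """2.0 when the extracted <answer> text equals the reference, else 0.0.

    `answer` is assumed to be a dataset column holding one reference per completion.
    """
    rewards = []
    for completion, reference in zip(completions, answer):
        match = re.search(r"<answer>(.*?)</answer>", _text(completion), re.DOTALL)
        predicted = match.group(1).strip() if match else None
        rewards.append(2.0 if predicted == str(reference).strip() else 0.0)
    return rewards
```

Keeping format and correctness in separate functions makes each one easy to test on its own and lets the training logs show which signal is driving the total reward.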

📖 Usage for Agents

When this skill is loaded in your agent's context:

1. Always read SKILL.md first before implementing
2. Start simple - Use length-based reward to validate setup
3. Build incrementally - Add one reward function at a time
4. Reference examples - Copy patterns from reward_functions_library.py
5. Monitor training - Watch reward metrics (not loss!)
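The "start simple" step might look like the sketch below: a length-based reward that needs no labels, so it can confirm the training loop runs end to end before task-specific rewards are added. The name `length_reward`, the `ideal_length` parameter, and its default of 200 characters are assumptions made for illustration.

```python
def length_reward(completions, ideal_length=200, **kwargs):
    """Reward in [0, 1] that peaks when a completion has `ideal_length` characters."""
    rewards = []
    for completion in completions:
        # Accept plain strings as well as chat-style lists of message dicts.
        text = completion if isinstance(completion, str) else completion[0]["content"]
        distance = abs(len(text) - ideal_length)
        # Linear falloff, clamped at zero for very long completions.
        rewards.append(max(0.0, 1.0 - distance / ideal_length))
    return rewards
```

If the mean of this reward does not rise during a short run, the problem is more likely in the setup (dataset, generation settings, config) than in the reward design.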

🎯 Common Use Cases

| Task Type | Recommended Rewards | Template |
|---|---|---|
| Math reasoning | `MATH_REASONING_REWARDS` preset | basic_grpo_training.py |
| Code generation | `CODE_GENERATION_REWARDS` preset | Modify dataset in template |
| Summarization | `SUMMARIZATION_REWARDS` preset | Adjust prompts + rewards |
| Q&A | `QA_REWARDS` preset | Use fuzzy match + citations |

⚠️ Critical Reminders

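- Loss goes UP during training - This is normal: the reported loss largely tracks the KL divergence from the reference model, which grows as the policy learns
- Use 3-5 reward functions - Single rewards often fail
- Test rewards before training - Debug each function independently
- Monitor reward_std - Should stay > 0.1; a value near zero means the completions in a group score alike and the advantage signal vanishes (mode collapse)
- Start with num_generations between 4 and 8 - Scale up if GPU allows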
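The reward_std reminder follows from how GRPO scores completions relative to their group. The sketch below normalizes rewards within one group and checks the spread against the 0.1 threshold mentioned above. The function names and the `eps` value are hypothetical, and it uses the population standard deviation for simplicity; a given library may compute the statistic differently.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-4):
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std, an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def has_learning_signal(rewards, min_std=0.1):
    """True when the rewards in a group are spread out enough to rank completions."""
    return pstdev(rewards) > min_std
```

When every completion for a prompt earns the same reward, all advantages are zero and that prompt contributes no policy gradient, which is why a collapsing reward_std is worth catching early.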

🔗 External Resources

📝 Version

v1.0.0 - Initial release (January 2025)

👨‍💻 Maintained By

Synthetic Sciences

For questions or improvements, see https://orchestra.com

License: MIT
Last Updated: January 2025

Package last updated on 11 Feb 2026
