The package manager for AI skills.
Scan. Verify. Install. Across 49 agent platforms.
npx vskill install remotion-best-practices
36.82% of AI skills have security flaws (Snyk ToxicSkills).
When you install a skill today, you're trusting it blindly: no scan, no provenance check, no version pinning. vskill fixes all of this.
```
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Source  │────>│   Scan   │────>│  Verify  │────>│ Install  │
│          │     │          │     │          │     │          │
│  GitHub  │     │ 38 rules │     │   LLM    │     │ Pin SHA  │
│ Registry │     │ Blocklist│     │ analysis │     │ Lock ver │
│  Local   │     │ Patterns │     │  Intent  │     │ Symlink  │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
```
Every install goes through the security pipeline. No exceptions. No --skip-scan.
```bash
# Install from any GitHub repo
npx vskill install remotion-dev/skills/remotion-best-practices

# Browse a repo and pick interactively
npx vskill install remotion-dev/skills

# Install a plugin (Claude Code)
npx vskill install --repo anton-abyzov/vskill --plugin mobile
```
Or install globally: npm install -g vskill
Getting E401 errors? If your project has a .npmrc pointing to a private registry (e.g. AWS CodeArtifact, GitHub Packages), npx may fail with npm error code E401. Fix it by overriding the registry:

```bash
npx --registry https://registry.npmjs.org vskill install <skill>
```

Or install globally once to avoid this entirely:

```bash
npm i -g vskill --registry https://registry.npmjs.org
```
| Tier | How | Trust Level |
|---|---|---|
| Scanned | 38 deterministic pattern checks against known attack vectors | Baseline |
| Verified | Pattern scan + LLM-based intent analysis for subtle threats | Recommended |
| Certified | Full manual security review by the vskill team | Highest |
Every install is at minimum Scanned. The vskill.lock file tracks the SHA-256 hash, scan date, and tier for every installed skill. Running vskill update diffs against the locked version and re-scans before applying.
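As a rough illustration, a locked entry might carry those three fields like this (the field names and layout here are hypothetical, not the actual vskill.lock schema):

```json
{
  "skills": {
    "remotion-best-practices": {
      "source": "remotion-dev/skills/remotion-best-practices",
      "sha256": "4f2a…",
      "scannedAt": "2025-01-15T10:30:00Z",
      "tier": "verified"
    }
  }
}
```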
vskill auto-detects your installed agents and installs skills to all of them at once.
CLI & Terminal — Claude Code, Cursor, GitHub Copilot, Windsurf, Codex, Gemini CLI, Amp, Cline, Roo Code, Goose, Aider, Kilo, Devin, OpenHands, Qwen Code, Trae, and more
IDE Extensions — VS Code, JetBrains, Zed, Neovim, Emacs, Sublime Text, Xcode
Cloud & Hosted — Replit, Bolt, v0, GPT Pilot, Plandex, Sweep
vskill ships 7 expert skills organized into 5 domain plugins. Each plugin has its own namespace — install only what you need.
```bash
npx vskill install --repo anton-abyzov/vskill --plugin mobile
npx vskill install --repo anton-abyzov/vskill --plugin marketing
```

Then invoke as /plugin:skill in your agent:

```
/mobile:appstore          /marketing:social-media-posting
/google-workspace:gws     /skills:scout
```
| Plugin | Description | Skills |
|---|---|---|
| mobile | React Native, Expo, Flutter, SwiftUI, Jetpack Compose, app store | appstore |
| marketing | Social media content creation, posting, and engagement across 11 platforms, plus Slack messaging | social-media-posting slack-messaging |
| google-workspace | Google Workspace CLI (gws) for Drive, Sheets, Docs, Calendar, Chat, Admin | gws |
| skills | Skill discovery and recommendations | scout |
| productivity | Expert network survey completion and paid expertise sharing | survey-passing |
| Command | Description |
|---|---|
| vskill install <source> | Install skill after security scan |
| vskill find <query> | Search the verified-skill.com registry |
| vskill scan <path> | Run security scan without installing |
| vskill list | Show installed skills with status |
| vskill remove <skill> | Remove an installed skill |
| vskill update [skill] | Update with diff scanning (--all for everything) |
| vskill audit [path] | Full project security audit with LLM analysis |
| vskill info <skill> | Show detailed skill information |
| vskill submit <source> | Submit a skill for verification |
| vskill blocklist | Manage blocked malicious skills |
| vskill init | Initialize vskill in a project |
| Flag | Description |
|---|---|
| --yes, -y | Accept defaults, no prompts |
| --global, -g | Install to global scope |
| --copy | Copy files instead of symlinking |
| --skill <name> | Pick a specific skill from a multi-skill repo |
| --plugin <name> | Pick a plugin by name (checks marketplace, then plugins/ folder) |
| --plugin-dir <path> | Local directory as plugin source |
| --repo <owner/repo> | Remote GitHub repo as plugin source |
| --agent <id> | Target a specific agent (e.g., cursor) |
| --force | Install even if blocklisted |
| --cwd <path> | Override project root |
| --all | Install all skills from a repo |
Scan entire projects for security issues — not just skills:
```bash
vskill audit                             # scan current directory
vskill audit --ci --report sarif         # CI-friendly SARIF output
vskill audit --severity high,critical    # filter by severity
```
Skills are single SKILL.md files that work with any of the 49 supported agents. They follow the Agent Skills Standard — drop a SKILL.md into the agent's commands directory.
Plugins are multi-component containers for Claude Code. They bundle skills, hooks, commands, and agents under a single namespace with enable/disable support and marketplace integration.
Even Anthropic ships the same skill in two places:
- anthropics/skills/frontend-design (standalone)
- anthropics/claude-code/.../frontend-design (plugin)

Install both? Duplicates. They diverge? Inconsistencies. vskill gives you one install path with version pinning and dedup, regardless of source.
Every skill can include evaluations — standardized test cases that verify the skill actually improves LLM output. Skills with evals get quality scores on verified-skill.com and regression tracking across versions.
The eval system tests the skill's plan, not its execution. It doesn't post to social media, generate images, or call external APIs. Instead, it measures whether your SKILL.md successfully teaches an LLM the correct behavior.
The algorithm: the SKILL.md body becomes the system prompt, each eval prompt is sent to the LLM, and an LLM judge grades the response against the eval's assertions, pass or fail.
For example, a social media posting skill's eval might check: does the LLM mention checking for duplicate posts? Does it use the correct aspect ratios per platform? Does it wait for user approval? If the skill description is clear about these behaviors, the LLM will demonstrate them in its response. If it's vague, assertions fail — telling you exactly what to improve.
Think of it like testing a recipe book: you don't cook the food, you check whether someone reading your recipe would know the right steps, quantities, and order.
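The grading loop described above can be sketched as follows. Note that `callLLM` and `judge` are hypothetical stand-ins for the eval runner's internals, which aren't shown in this document:

```javascript
// Sketch of one benchmark case: the skill is the system prompt,
// the eval prompt is run once, and a judge grades each assertion.
async function runEval(skillMd, evalCase, callLLM, judge) {
  // 1. The SKILL.md body becomes the system prompt.
  const response = await callLLM({ system: skillMd, prompt: evalCase.prompt });

  // 2. An LLM judge checks each assertion against the response text.
  const results = [];
  for (const assertion of evalCase.assertions) {
    results.push({ id: assertion.id, pass: await judge(response, assertion.text) });
  }

  // 3. The case passes only if every assertion holds.
  return { passed: results.every((r) => r.pass), results };
}
```

In a real run the judge is itself an LLM call; here it is just any async predicate over the response text.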
| Mode | What it does | When to use |
|---|---|---|
| Benchmark | Runs prompts WITH skill, grades assertions | Measure pass rate after edits |
| A/B Comparison | Runs each prompt WITH and WITHOUT skill, blind-judges both | Prove the skill adds value |
| Activation Test | Tests whether the skill correctly triggers on relevant prompts | Reduce false positives/negatives |
The A/B comparison randomly shuffles outputs as "Response A" and "Response B" before scoring, so the judge can't tell which used the skill. Each response is scored on content (1-5) and structure (1-5). The delta between skill and baseline averages produces a verdict: EFFECTIVE, MARGINAL, INEFFECTIVE, or DEGRADING.
Skill evals are unit tests — they verify the skill's teaching quality in isolation, without calling external tools or APIs. This is a deliberate design choice:
| | Unit Tests (current) | Integration Tests |
|---|---|---|
| What | Does the SKILL.md teach the right workflow? | Does the end-to-end tool execution work? |
| Speed | ~30s per case | ~3min per case |
| Infrastructure | None — any LLM provider | Real MCP servers, auth tokens, test data |
| CI/CD | Runs anywhere | Needs secrets, test workspaces |
| Flakiness | Low (deterministic text) | High (external APIs, rate limits) |
| Coverage | Workflow, tool selection, formatting, parameters | API compatibility, auth, error recovery |
Why unit tests are sufficient for most skills: The eval doesn't test whether Slack's API works — it tests whether your SKILL.md correctly teaches an LLM to use slack_search_channels before slack_read_channel, to use thread_ts for replies, and to format messages with *bold* instead of **bold**. If the teaching is correct, the execution follows.
Skills that reference MCP tools automatically get simulation mode during evals. The eval system detects MCP tool references in your SKILL.md and instructs the LLM to demonstrate the complete workflow with simulated tool responses. This means your assertions can test tool selection, parameter correctness, and workflow order — even without a real MCP connection.
```
Standard skill eval:                 MCP skill eval (automatic):

┌──────────┐                         ┌──────────┐
│ SKILL.md │ → system prompt         │ SKILL.md │ → system prompt
└──────────┘                         └──────────┘   + simulation instructions
      ↓                                    ↓
┌──────────┐                         ┌──────────┐
│   LLM    │ → text response         │   LLM    │ → simulated workflow
└──────────┘                         └──────────┘   (tool calls + mock responses)
      ↓                                    ↓
┌──────────┐                         ┌──────────┐
│  Judge   │ → pass/fail             │  Judge   │ → pass/fail
└──────────┘                         └──────────┘
```
No configuration needed — if your SKILL.md mentions slack_*, github_*, linear_*, or gws_* tools, simulation mode activates automatically.
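A hedged sketch of how that detection could work; the prefix list mirrors the examples named above, and the real vskill implementation may recognize more patterns or use different logic:

```javascript
// Detect MCP-style tool references (slack_*, github_*, linear_*, gws_*)
// in a SKILL.md body to decide whether simulation mode applies.
const MCP_TOOL_PATTERN = /\b(slack|github|linear|gws)_[a-z_]+\b/g;

function needsSimulationMode(skillMd) {
  const tools = skillMd.match(MCP_TOOL_PATTERN) || [];
  // Deduplicate so each referenced tool is reported once.
  return { simulate: tools.length > 0, tools: [...new Set(tools)] };
}
```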
Skills can also include trigger accuracy tests in evals/activation-prompts.json:
```json
{
  "prompts": [
    { "prompt": "check what's new in #engineering", "expected": "should_activate" },
    { "prompt": "send an email to the team", "expected": "should_not_activate" }
  ]
}
```
This tests whether your skill's description field in SKILL.md causes the skill to trigger on the right prompts (precision) and not miss relevant ones (recall). Results show TP/TN/FP/FN classification with precision, recall, and reliability metrics.
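The precision and recall figures follow the standard definitions. A small sketch of that classification (not the actual vskill code; the input shape is assumed):

```javascript
// Tally TP/TN/FP/FN from activation results and derive
// precision (of activations, how many were right) and
// recall (of relevant prompts, how many triggered).
function activationMetrics(cases) {
  let TP = 0, TN = 0, FP = 0, FN = 0;
  for (const { expected, activated } of cases) {
    const should = expected === "should_activate";
    if (should && activated) TP++;
    else if (!should && !activated) TN++;
    else if (!should && activated) FP++;
    else FN++;
  }
  return {
    TP, TN, FP, FN,
    precision: TP / (TP + FP || 1), // guard divide-by-zero
    recall: TP / (TP + FN || 1),
  };
}
```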
The eval system supports Claude CLI, the Anthropic API, and Ollama. Running the same evals across models reveals whether your skill teaches robustly or only works with a single provider.
```bash
# Test with Opus (high-end)
VSKILL_EVAL_MODEL=opus npx vskill eval run my-skill

# Test with Ollama (open-source)
VSKILL_EVAL_PROVIDER=ollama VSKILL_EVAL_MODEL=llama3.1:8b npx vskill eval run my-skill
```
```
your-skill/
├── SKILL.md                     # The skill definition
└── evals/
    ├── evals.json               # Test cases + assertions
    ├── activation-prompts.json  # Trigger accuracy tests (optional)
    └── benchmark.json           # Latest benchmark results (auto-generated)
```
```json
{
  "skill_name": "your-skill",
  "evals": [
    {
      "id": 1,
      "name": "Descriptive test name",
      "prompt": "Realistic user prompt that tests the skill",
      "expected_output": "Reference output (not graded, for human context)",
      "files": [],
      "assertions": [
        { "id": "a1", "text": "Output includes specific technique X", "type": "boolean" },
        { "id": "a2", "text": "Code example compiles without errors", "type": "boolean" }
      ]
    }
  ]
}
```
```bash
npx vskill eval serve              # Open visual eval UI (benchmark, compare, history)
npx vskill eval init <skill-dir>   # Scaffold evals.json from SKILL.md via LLM
npx vskill eval run <skill-dir>    # Run evals and grade assertions (CLI output)
npx vskill eval coverage           # Show eval status for all skills
npx vskill eval generate-all       # Batch-generate for all skills
```
vskill eval serve launches a local web UI where you can run benchmarks, A/B comparisons, and activation tests, and browse result history.
Previous benchmark results are displayed on the skill detail page without re-running. Per-case pass/fail status, time, and token usage are shown inline.
The eval system supports multiple LLM providers. Switch between them in the eval UI dropdown or via environment variables.
| Provider | Models | Requirements |
|---|---|---|
| Claude CLI | Sonnet, Opus, Haiku | Claude Max/Pro subscription + claude CLI installed |
| Anthropic API | Claude Sonnet 4.6, Opus 4.6, Haiku 4.5 | ANTHROPIC_API_KEY env var |
| Ollama | Any locally installed model | Ollama running at localhost:11434 |
```bash
# Use Anthropic API with Opus
VSKILL_EVAL_PROVIDER=anthropic VSKILL_EVAL_MODEL=claude-opus-4-6 npx vskill eval run my-skill

# Use Ollama with a local model
VSKILL_EVAL_PROVIDER=ollama VSKILL_EVAL_MODEL=qwen2.5:32b npx vskill eval run my-skill

# Custom Ollama server
OLLAMA_BASE_URL=http://gpu-server:11434 VSKILL_EVAL_PROVIDER=ollama npx vskill eval run my-skill
```
Which model for what? Use a high-end model like Opus when you want trustworthy benchmark numbers, and a local Ollama model for fast, free iteration while editing a skill.
Skills with evals/evals.json get dedicated pages on the registry:

- /skills/[name]/evals
- /admin/evals (admin-only)

Browse and search verified skills at verified-skill.com.
```bash
vskill find "react native"            # search from CLI
vskill info remotion-best-practices   # skill details
```

Submit your skill for verification:

```bash
vskill submit your-org/your-repo/your-skill
```