Big News: Socket raises $60M Series C at a $1B valuation to secure software supply chains for AI-driven development.Announcement
Sign In

@jackwener/opencli

Package Overview
Dependencies
Maintainers
1
Versions
106
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

@jackwener/opencli - npm Package Compare versions

Comparing version
0.1.0
to
0.1.1
+594
CLI-CREATOR.md
# CLI-CREATOR — 适配器开发完全指南
> 本文档教你(或 AI Agent)如何为 OpenCLI 添加一个新网站的命令。
> 从零到发布,覆盖 API 发现、方案选择、适配器编写、测试验证全流程。
## 核心流程
```
┌─────────────┐ ┌─────────────┐ ┌──────────────┐ ┌────────┐
│ 1. 发现 API │ ──▶ │ 2. 选择策略 │ ──▶ │ 3. 写适配器 │ ──▶ │ 4. 测试 │
└─────────────┘ └─────────────┘ └──────────────┘ └────────┘
explore cascade YAML / TS run + verify
```
---
## Step 1: 发现 API
### 1a. 自动化发现(推荐)
OpenCLI 内置 Deep Explore,自动分析网站网络请求:
```bash
opencli explore https://www.example.com --site mysite
```
输出到 `.opencli/explore/mysite/`:
| 文件 | 内容 |
|------|------|
| `manifest.json` | 站点元数据、框架检测(Vue2/3、React、Next.js、Pinia、Vuex) |
| `endpoints.json` | 已发现的 API 端点,按评分排序,含 URL pattern、方法、响应类型 |
| `capabilities.json` | 推理出的功能(`hot`、`search`、`feed`…),含置信度和推荐参数 |
| `auth.json` | 认证方式检测(Cookie/Header/无认证),策略候选列表 |
### 1b. 手动抓包验证
Explore 的自动分析可能不完美,用 verbose 模式手动确认:
```bash
# 在浏览器中打开目标页面,观察网络请求
opencli explore https://www.example.com --site mysite -v
# 或直接用 evaluate 测试 API
opencli bilibili hot -v # 查看已有命令的 pipeline 每步数据流
```
关注抓包结果中的关键信息:
- **URL pattern**: `/api/v2/hot?limit=20` → 这就是你要调用的端点
- **Method**: `GET` / `POST`
- **Request Headers**: Cookie? Bearer? 自定义签名头(X-s、X-t)?
- **Response Body**: JSON 结构,特别是数据在哪个路径(`data.items`、`data.list`)
### 1c. 框架检测
Explore 自动检测前端框架。如果需要手动确认:
```bash
# 在已打开目标网站的情况下
opencli evaluate "(()=>{
const vue3 = !!document.querySelector('#app')?.__vue_app__;
const vue2 = !!document.querySelector('#app')?.__vue__;
const react = !!window.__REACT_DEVTOOLS_GLOBAL_HOOK__;
const pinia = vue3 && !!document.querySelector('#app').__vue_app__.config.globalProperties.\$pinia;
return JSON.stringify({vue3, vue2, react, pinia});
})()"
```
Vue + Pinia 的站点(如小红书)可以直接通过 Store Action 绕过签名。
---
## Step 2: 选择认证策略
OpenCLI 提供 5 级认证策略。使用 `cascade` 命令自动探测:
```bash
opencli cascade https://api.example.com/hot
```
### 策略决策树
```
直接 fetch(url) 能拿到数据?
→ ✅ Tier 1: public(公开 API,不需要浏览器)
→ ❌ fetch(url, {credentials:'include'}) 带 Cookie 能拿到?
→ ✅ Tier 2: cookie(最常见,evaluate 步骤内 fetch)
→ ❌ → 加上 Bearer / CSRF header 后能拿到?
→ ✅ Tier 3: header(如 Twitter ct0 + Bearer)
→ ❌ → 网站有 Pinia/Vuex Store?
→ ✅ Tier 4: intercept(Store Action + XHR 拦截)
→ ❌ Tier 5: ui(UI 自动化,最后手段)
```
### 各策略对比
| Tier | 策略 | 速度 | 复杂度 | 适用场景 | 实例 |
|------|------|------|--------|---------|------|
| 1 | `public` | ⚡ ~1s | 最简 | 公开 API,无需登录 | Hacker News, V2EX |
| 2 | `cookie` | 🔄 ~7s | 简单 | Cookie 认证即可 | Bilibili, Zhihu, Reddit |
| 3 | `header` | 🔄 ~7s | 中等 | 需要 CSRF token 或 Bearer | Twitter GraphQL |
| 4 | `intercept` | 🔄 ~10s | 较高 | 请求有复杂签名 | 小红书 (Pinia + XHR) |
| 5 | `ui` | 🐌 ~15s+ | 最高 | 无 API,纯 DOM 解析 | 遗留网站 |
---
## Step 3: 编写适配器
### YAML vs TS?先看决策树
```
你的 pipeline 里有 evaluate 步骤(内嵌 JS 代码)?
→ ✅ 用 TypeScript (src/clis/<site>/<name>.ts),需在 index.ts 注册
→ ❌ 纯声明式(navigate + tap + map + limit)?
→ ✅ 用 YAML (src/clis/<site>/<name>.yaml),放入即自动注册
```
| 场景 | 选择 | 示例 |
|------|------|------|
| 纯 fetch/select/map/limit | YAML | `v2ex/hot.yaml`, `hackernews/top.yaml` |
| navigate + evaluate(fetch) + map | YAML(评估复杂度) | `zhihu/hot.yaml` |
| navigate + tap + map | YAML ✅ | `xiaohongshu/feed.yaml`, `xiaohongshu/notifications.yaml` |
| 有复杂 JS 逻辑(Pinia state 读取、条件分支) | TS | `xiaohongshu/me.ts`, `bilibili/me.ts` |
| XHR 拦截 + 签名 | TS | `xiaohongshu/search.ts` |
| GraphQL / 分页 / Wbi 签名 | TS | `bilibili/search.ts`, `twitter/search.ts` |
> **经验法则**:如果你发现 YAML 里嵌了超过 10 行 JS,改用 TS 更可维护。
### 方式 A: YAML Pipeline(声明式,推荐)
文件路径: `src/clis/<site>/<name>.yaml`,放入即自动注册。
#### Tier 1 — 公开 API 模板
```yaml
# src/clis/v2ex/hot.yaml
site: v2ex
name: hot
description: V2EX 热门话题
domain: www.v2ex.com
strategy: public
browser: false
args:
limit:
type: int
default: 20
pipeline:
- fetch:
url: https://www.v2ex.com/api/topics/hot.json
- map:
rank: ${{ index + 1 }}
title: ${{ item.title }}
replies: ${{ item.replies }}
- limit: ${{ args.limit }}
columns: [rank, title, replies]
```
#### Tier 2 — Cookie 认证模板(最常用)
```yaml
# src/clis/zhihu/hot.yaml
site: zhihu
name: hot
description: 知乎热榜
domain: www.zhihu.com
pipeline:
- navigate: https://www.zhihu.com # 先加载页面建立 session
- evaluate: | # 在浏览器内发请求,自动带 Cookie
(async () => {
const res = await fetch('/api/v3/feed/topstory/hot-lists/total?limit=50', {
credentials: 'include'
});
const d = await res.json();
return (d?.data || []).map(item => {
const t = item.target || {};
return {
title: t.title,
heat: item.detail_text || '',
answers: t.answer_count,
};
});
})()
- map:
rank: ${{ index + 1 }}
title: ${{ item.title }}
heat: ${{ item.heat }}
answers: ${{ item.answers }}
- limit: ${{ args.limit }}
columns: [rank, title, heat, answers]
```
> **关键**: `evaluate` 步骤内的 `fetch` 运行在浏览器页面内,自动携带 `credentials: 'include'`,无需手动处理 Cookie。
#### 进阶 — 带搜索参数
```yaml
# src/clis/zhihu/search.yaml
site: zhihu
name: search
description: 知乎搜索
args:
keyword:
type: str
required: true
description: Search keyword
limit:
type: int
default: 10
pipeline:
- navigate: https://www.zhihu.com
- evaluate: |
(async () => {
const q = encodeURIComponent('${{ args.keyword }}');
const res = await fetch('/api/v4/search_v3?q=' + q + '&t=general&limit=${{ args.limit }}', {
credentials: 'include'
});
const d = await res.json();
return (d?.data || [])
.filter(item => item.type === 'search_result')
.map(item => ({
title: (item.object?.title || '').replace(/<[^>]+>/g, ''),
type: item.object?.type || '',
author: item.object?.author?.name || '',
votes: item.object?.voteup_count || 0,
}));
})()
- map:
rank: ${{ index + 1 }}
title: ${{ item.title }}
type: ${{ item.type }}
author: ${{ item.author }}
votes: ${{ item.votes }}
- limit: ${{ args.limit }}
columns: [rank, title, type, author, votes]
```
#### Tier 4 — Store Action Bridge(`tap` 步骤,intercept 策略推荐)
适用于 Vue + Pinia/Vuex 的网站(如小红书),无须手动写 XHR 拦截代码:
```yaml
# src/clis/xiaohongshu/notifications.yaml
site: xiaohongshu
name: notifications
description: "小红书通知"
domain: www.xiaohongshu.com
strategy: intercept
browser: true
args:
type:
type: str
default: mentions
description: "Notification type: mentions, likes, or connections"
limit:
type: int
default: 20
columns: [rank, user, action, content, note, time]
pipeline:
- navigate: https://www.xiaohongshu.com/notification
- wait: 3
- tap:
store: notification # Pinia store name
action: getNotification # Store action to call
args: # Action arguments
- ${{ args.type | default('mentions') }}
capture: /you/ # URL pattern to capture response
select: data.message_list # Extract sub-path from response
timeout: 8
- map:
rank: ${{ index + 1 }}
user: ${{ item.user_info.nickname }}
action: ${{ item.title }}
content: ${{ item.comment_info.content }}
- limit: ${{ args.limit | default(20) }}
```
> **`tap` 步骤自动完成**:注入 fetch+XHR 双拦截 → 查找 Pinia/Vuex store → 调用 action → 捕获匹配 URL 的响应 → 清理拦截。
> 如果 store 或 action 找不到,会返回 `hint` 列出所有可用的 store actions,方便调试。
| tap 参数 | 必填 | 说明 |
|---------|------|------|
| `store` | ✅ | Pinia store 名称(如 `feed`, `search`, `notification`) |
| `action` | ✅ | Store action 方法名 |
| `capture` | ✅ | URL 子串匹配(匹配网络请求 URL) |
| `args` | ❌ | 传给 action 的参数数组 |
| `select` | ❌ | 从 captured JSON 中提取的路径(如 `data.items`) |
| `timeout` | ❌ | 等待网络响应的超时秒数(默认 5s) |
| `framework` | ❌ | `pinia` 或 `vuex`(默认自动检测) |
### 方式 B: TypeScript 适配器(编程式)
适用于需要嵌入 JS 代码读取 Pinia state、XHR 拦截、GraphQL、分页、复杂数据转换等场景。
文件路径: `src/clis/<site>/<name>.ts`,还需要在 `src/clis/index.ts` 中 import 注册。
#### Tier 3 — Header 认证(Twitter)
```typescript
// src/clis/twitter/search.ts
import { cli, Strategy } from '../../registry.js';
cli({
site: 'twitter',
name: 'search',
description: 'Search tweets',
strategy: Strategy.HEADER,
args: [{ name: 'keyword', required: true }],
columns: ['rank', 'author', 'text', 'likes'],
func: async (page, kwargs) => {
await page.goto('https://x.com');
const data = await page.evaluate(`
(async () => {
// 从 Cookie 提取 CSRF token
const ct0 = document.cookie.split(';')
.map(c => c.trim())
.find(c => c.startsWith('ct0='))?.split('=')[1];
if (!ct0) return { error: 'Not logged in' };
const bearer = 'AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D...';
const headers = {
'Authorization': 'Bearer ' + decodeURIComponent(bearer),
'X-Csrf-Token': ct0,
'X-Twitter-Auth-Type': 'OAuth2Session',
};
const variables = JSON.stringify({ rawQuery: '${kwargs.keyword}', count: 20 });
const url = '/i/api/graphql/xxx/SearchTimeline?variables=' + encodeURIComponent(variables);
const res = await fetch(url, { headers, credentials: 'include' });
return await res.json();
})()
`);
// ... 解析 data
},
});
```
#### Tier 4 — Store Action + XHR 拦截(小红书)
```typescript
// src/clis/xiaohongshu/search.ts
import { cli, Strategy } from '../../registry.js';
cli({
site: 'xiaohongshu',
name: 'search',
description: '搜索小红书笔记',
strategy: Strategy.COOKIE, // 实际是 intercept 模式
args: [{ name: 'keyword', required: true }],
columns: ['rank', 'title', 'author', 'likes', 'type'],
func: async (page, kwargs) => {
await page.goto('https://www.xiaohongshu.com');
await page.wait(2);
const data = await page.evaluate(`
(async () => {
const app = document.querySelector('#app')?.__vue_app__;
const pinia = app?.config?.globalProperties?.$pinia;
if (!pinia?._s) return { error: 'Page not ready' };
const searchStore = pinia._s.get('search');
if (!searchStore) return { error: 'Search store not found' };
// XHR 拦截:捕获 store action 发出的请求
let captured = null;
const origOpen = XMLHttpRequest.prototype.open;
const origSend = XMLHttpRequest.prototype.send;
XMLHttpRequest.prototype.open = function(m, u) {
this.__url = u;
return origOpen.apply(this, arguments);
};
XMLHttpRequest.prototype.send = function(b) {
if (this.__url?.includes('search/notes')) {
const x = this;
const orig = x.onreadystatechange;
x.onreadystatechange = function() {
if (x.readyState === 4 && !captured) {
try { captured = JSON.parse(x.responseText); } catch {}
}
if (orig) orig.apply(this, arguments);
};
}
return origSend.apply(this, arguments);
};
try {
// 触发 Store Action,让网站自己签名发请求
searchStore.mutateSearchValue('${kwargs.keyword}');
await searchStore.loadMore();
await new Promise(r => setTimeout(r, 800));
} finally {
// 恢复原始 XHR
XMLHttpRequest.prototype.open = origOpen;
XMLHttpRequest.prototype.send = origSend;
}
if (!captured?.success) return { error: captured?.msg || 'Search failed' };
return (captured.data?.items || []).map(i => ({
title: i.note_card?.display_title || '',
author: i.note_card?.user?.nickname || '',
likes: i.note_card?.interact_info?.liked_count || '0',
type: i.note_card?.type || '',
}));
})()
`);
if (!Array.isArray(data)) return [];
return data.slice(0, kwargs.limit || 20).map((item, i) => ({
rank: i + 1, ...item,
}));
},
});
```
> **XHR 拦截核心思路**:不自己构造签名,而是劫持网站自己的 `XMLHttpRequest`,让网站的 Store Action 发出正确签名的请求,我们只是"窃听"响应。用完后必须恢复原始方法。
---
## Step 4: 测试
> **⚠️ 构建通过 ≠ 功能正常**。`npm run build` 只验证 TypeScript / YAML 语法,不验证运行时行为。
> 每个新命令 **必须实际运行** 并确认输出正确后才算完成。
### 必做清单
```bash
# 1. 构建(确认语法无误)
npm run build
# 2. 确认命令已注册
opencli list | grep mysite
# 3. 实际运行命令(最关键!)
opencli mysite hot --limit 3 -v # verbose 查看每步数据流
opencli mysite hot --limit 3 -f json # JSON 输出确认字段完整
```
### tap 步骤调试(intercept 策略专用)
> **⚠️ 不要猜 store name / action name**。先用 evaluate 探索,再写 YAML。
#### Step 1: 列出所有 Pinia store
在浏览器中打开目标网站后:
```bash
opencli evaluate "(() => {
const app = document.querySelector('#app')?.__vue_app__;
const pinia = app?.config?.globalProperties?.\$pinia;
return [...pinia._s.keys()];
})()"
# 输出: ["user", "feed", "search", "notification", ...]
```
#### Step 2: 查看 store 的 action 名称
故意写一个错误 action 名,tap 会返回所有可用 actions:
```
⚠ tap: Action not found: wrongName on store notification
💡 Available: getNotification, replyComment, getNotificationCount, reset
```
#### Step 3: 用 network requests 确认 capture 模式
```bash
# 在浏览器打开目标页面,查看网络请求
# 找到目标 API 的 URL 特征(如 "/you/mentions"、"homefeed")
```
#### 完整流程
```
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────┐
│ 1. navigate │ ──▶ │ 2. 探索 store │ ──▶ │ 3. 写 YAML │ ──▶ │ 4. 测试 │
│ 到目标页面 │ │ name/action │ │ tap 步骤 │ │ 运行验证 │
└──────────────┘ └──────────────┘ └──────────────┘ └────────┘
```
### Verbose 模式
```bash
# 查看 pipeline 每步的输入输出
opencli bilibili hot --limit 1 -v
```
输出示例:
```
[1/4] navigate → https://www.bilibili.com
→ (no data)
[2/4] evaluate → (async () => { const res = await fetch(…
→ [{title: "…", author: "…", play: 230835}]
[3/4] map (rank, title, author, play, danmaku)
→ [{rank: 1, title: "…", author: "…"}]
[4/4] limit → 1
→ [{rank: 1, title: "…"}]
```
### 输出格式验证
```bash
# 确认表格渲染正确
opencli mysite hot -f table
# 确认 JSON 可被 jq 解析
opencli mysite hot -f json | jq '.[0]'
# 确认 CSV 可被导入
opencli mysite hot -f csv > data.csv
```
---
## Step 5: 注册 & 发布
### YAML 适配器
放入 `src/clis/<site>/<name>.yaml` 即自动注册,无需额外操作。
### TS 适配器
在 `src/clis/index.ts` 添加 import:
```typescript
import './mysite/search.js';
```
### 验证注册
```bash
opencli list # 确认新命令出现
opencli validate mysite # 校验定义完整性
```
### 提交
```bash
git add src/clis/mysite/
git commit -m "feat(mysite): add hot and search adapters"
git push
```
---
## 常见陷阱
| 陷阱 | 表现 | 解决方案 |
|------|------|---------|
| 缺少 `navigate` | evaluate 报 `Target page context` 错误 | 在 evaluate 前加 `navigate:` 步骤 |
| 嵌套字段访问 | `${{ item.node?.title }}` 不工作 | 在 evaluate 中 flatten 数据,不在模板中用 optional chaining |
| 缺少 `strategy: public` | 公开 API 也启动浏览器,7s → 1s | 公开 API 加上 `strategy: public` + `browser: false` |
| evaluate 返回字符串 | map 步骤收到 `""` 而非数组 | pipeline 有 auto-parse,但建议在 evaluate 内 `.map()` 整形 |
| 搜索参数被 URL 编码 | `${{ args.keyword }}` 被浏览器二次编码 | 在 evaluate 内用 `encodeURIComponent()` 手动编码 |
| Cookie 过期 | 返回 401 / 空数据 | 在浏览器里重新登录目标站点 |
| Extension tab 残留 | Chrome 多出 `chrome-extension://` tab | 已自动清理;若残留,手动关闭即可 |
| TS evaluate 格式 | `() => {}` 报 `result is not a function` | TS 中 `page.evaluate()` 必须用 IIFE:`(async () => { ... })()` |
| 页面异步加载 | evaluate 拿到空数据(store state 还没更新) | 在 evaluate 内用 polling 等待数据出现,或增加 `wait` 时间 |
| YAML 内嵌大段 JS | 调试困难,字符串转义问题 | 超过 10 行 JS 的命令改用 TS adapter |
---
## 用 AI Agent 自动生成适配器
最快的方式是让 AI Agent 完成全流程:
```bash
# 一键:探索 → 分析 → 合成 → 注册
opencli generate https://www.example.com --goal "hot"
# 或分步执行:
opencli explore https://www.example.com --site mysite # 发现 API
opencli synthesize mysite # 生成候选 YAML
opencli verify mysite/hot --smoke # 冒烟测试
```
生成的候选 YAML 保存在 `.opencli/explore/mysite/candidates/`,可直接复制到 `src/clis/mysite/` 并微调。
/**
* Strategy Cascade: automatic strategy downgrade chain.
*
* Probes an API endpoint starting from the simplest strategy (PUBLIC)
* and automatically downgrades through the strategy tiers until one works:
*
* PUBLIC → COOKIE → HEADER → INTERCEPT → UI
*
* This eliminates the need for manual strategy selection — the system
* automatically finds the minimum-privilege strategy that works.
*/
import { Strategy } from './registry.js';
interface ProbeResult {
strategy: Strategy;
success: boolean;
statusCode?: number;
hasData?: boolean;
error?: string;
responsePreview?: string;
}
interface CascadeResult {
bestStrategy: Strategy;
probes: ProbeResult[];
confidence: number;
}
/**
* Probe an endpoint with a specific strategy.
* Returns whether the probe succeeded and basic response info.
*/
export declare function probeEndpoint(page: any, url: string, strategy: Strategy, opts?: {
timeout?: number;
}): Promise<ProbeResult>;
/**
* Run the cascade: try each strategy in order until one works.
* Returns the simplest working strategy.
*/
export declare function cascadeProbe(page: any, url: string, opts?: {
maxStrategy?: Strategy;
timeout?: number;
}): Promise<CascadeResult>;
/**
* Render cascade results for display.
*/
export declare function renderCascadeResult(result: CascadeResult): string;
export {};
/**
* Strategy Cascade: automatic strategy downgrade chain.
*
* Probes an API endpoint starting from the simplest strategy (PUBLIC)
* and automatically downgrades through the strategy tiers until one works:
*
* PUBLIC → COOKIE → HEADER → INTERCEPT → UI
*
* This eliminates the need for manual strategy selection — the system
* automatically finds the minimum-privilege strategy that works.
*/
import { Strategy } from './registry.js';
/** Strategy cascade order (simplest → most complex) */
const CASCADE_ORDER = [
Strategy.PUBLIC,
Strategy.COOKIE,
Strategy.HEADER,
Strategy.INTERCEPT,
Strategy.UI,
];
/**
* Probe an endpoint with a specific strategy.
* Returns whether the probe succeeded and basic response info.
*/
export async function probeEndpoint(page, url, strategy, opts = {}) {
const result = { strategy, success: false };
try {
switch (strategy) {
case Strategy.PUBLIC: {
// Try direct fetch without browser (no credentials)
const js = `
async () => {
try {
const resp = await fetch(${JSON.stringify(url)});
const status = resp.status;
if (!resp.ok) return { status, ok: false };
const text = await resp.text();
let hasData = false;
try {
const json = JSON.parse(text);
hasData = !!json && (Array.isArray(json) ? json.length > 0 :
typeof json === 'object' && Object.keys(json).length > 0);
} catch {}
return { status, ok: true, hasData, preview: text.slice(0, 200) };
} catch (e) { return { ok: false, error: e.message }; }
}
`;
const resp = await page.evaluate(js);
result.statusCode = resp?.status;
result.success = resp?.ok && resp?.hasData;
result.hasData = resp?.hasData;
result.responsePreview = resp?.preview;
break;
}
case Strategy.COOKIE: {
// Fetch with credentials: 'include' (uses browser cookies)
const js = `
async () => {
try {
const resp = await fetch(${JSON.stringify(url)}, { credentials: 'include' });
const status = resp.status;
if (!resp.ok) return { status, ok: false };
const text = await resp.text();
let hasData = false;
try {
const json = JSON.parse(text);
hasData = !!json && (Array.isArray(json) ? json.length > 0 :
typeof json === 'object' && Object.keys(json).length > 0);
// Check for API-level error codes (common in Chinese sites)
if (json.code !== undefined && json.code !== 0) hasData = false;
} catch {}
return { status, ok: true, hasData, preview: text.slice(0, 200) };
} catch (e) { return { ok: false, error: e.message }; }
}
`;
const resp = await page.evaluate(js);
result.statusCode = resp?.status;
result.success = resp?.ok && resp?.hasData;
result.hasData = resp?.hasData;
result.responsePreview = resp?.preview;
break;
}
case Strategy.HEADER: {
// Fetch with credentials + try to extract common auth headers
const js = `
async () => {
try {
// Try to extract CSRF tokens from cookies
const cookies = document.cookie.split(';').map(c => c.trim());
const csrf = cookies.find(c => c.startsWith('ct0=') || c.startsWith('csrf_token=') || c.startsWith('_csrf='))?.split('=').slice(1).join('=');
const headers = {};
if (csrf) {
headers['X-Csrf-Token'] = csrf;
headers['X-XSRF-Token'] = csrf;
}
const resp = await fetch(${JSON.stringify(url)}, {
credentials: 'include',
headers
});
const status = resp.status;
if (!resp.ok) return { status, ok: false };
const text = await resp.text();
let hasData = false;
try {
const json = JSON.parse(text);
hasData = !!json && (Array.isArray(json) ? json.length > 0 :
typeof json === 'object' && Object.keys(json).length > 0);
if (json.code !== undefined && json.code !== 0) hasData = false;
} catch {}
return { status, ok: true, hasData, preview: text.slice(0, 200) };
} catch (e) { return { ok: false, error: e.message }; }
}
`;
const resp = await page.evaluate(js);
result.statusCode = resp?.status;
result.success = resp?.ok && resp?.hasData;
result.hasData = resp?.hasData;
result.responsePreview = resp?.preview;
break;
}
case Strategy.INTERCEPT:
case Strategy.UI:
// These require specific implementation per-site
// Mark as needing manual implementation
result.success = false;
result.error = `Strategy ${strategy} requires site-specific implementation`;
break;
}
}
catch (err) {
result.success = false;
result.error = err.message ?? String(err);
}
return result;
}
/**
* Run the cascade: try each strategy in order until one works.
* Returns the simplest working strategy.
*/
export async function cascadeProbe(page, url, opts = {}) {
const maxIdx = opts.maxStrategy
? CASCADE_ORDER.indexOf(opts.maxStrategy)
: CASCADE_ORDER.indexOf(Strategy.HEADER); // Don't auto-try INTERCEPT/UI
const probes = [];
for (let i = 0; i <= Math.min(maxIdx, CASCADE_ORDER.length - 1); i++) {
const strategy = CASCADE_ORDER[i];
const probe = await probeEndpoint(page, url, strategy, opts);
probes.push(probe);
if (probe.success) {
return {
bestStrategy: strategy,
probes,
confidence: 1.0 - (i * 0.1), // Higher confidence for simpler strategies
};
}
}
// None worked — default to COOKIE (most common for logged-in sites)
return {
bestStrategy: Strategy.COOKIE,
probes,
confidence: 0.3,
};
}
/**
* Render cascade results for display.
*/
export function renderCascadeResult(result) {
const lines = [
`Strategy Cascade: ${result.bestStrategy} (${(result.confidence * 100).toFixed(0)}% confidence)`,
];
for (const probe of result.probes) {
const icon = probe.success ? '✅' : '❌';
const status = probe.statusCode ? ` [${probe.statusCode}]` : '';
const err = probe.error ? ` — ${probe.error}` : '';
lines.push(` ${icon} ${probe.strategy}${status}${err}`);
}
return lines.join('\n');
}
site: bilibili
name: hot
description: B站热门视频
domain: www.bilibili.com
args:
limit:
type: int
default: 20
description: Number of videos
pipeline:
- navigate: https://www.bilibili.com
- evaluate: |
(async () => {
const res = await fetch('https://api.bilibili.com/x/web-interface/popular?ps=${{ args.limit }}&pn=1', {
credentials: 'include'
});
const data = await res.json();
return (data?.data?.list || []).map((item) => ({
title: item.title,
author: item.owner?.name,
play: item.stat?.view,
danmaku: item.stat?.danmaku,
}));
})()
- map:
rank: ${{ index + 1 }}
title: ${{ item.title }}
author: ${{ item.author }}
play: ${{ item.play }}
danmaku: ${{ item.danmaku }}
- limit: ${{ args.limit }}
columns: [rank, title, author, play, danmaku]
site: github
name: trending
description: GitHub trending repositories
domain: github.com
args:
language:
type: str
default: ""
description: "Programming language filter (e.g. python, rust)"
since:
type: str
default: daily
description: "Time range: daily, weekly, monthly"
limit:
type: int
default: 20
description: Number of repos
pipeline:
- evaluate: |
(async () => {
const lang = '${{ args.language }}' ? '/${{ args.language }}' : '';
const res = await fetch(`https://github.com/trending${lang}?since=${{ args.since }}`, {
headers: { 'Accept': 'text/html' }
});
const html = await res.text();
const parser = new DOMParser();
const doc = parser.parseFromString(html, 'text/html');
const rows = doc.querySelectorAll('article.Box-row');
return Array.from(rows).map(row => {
const nameEl = row.querySelector('h2 a');
const descEl = row.querySelector('p');
const langEl = row.querySelector('[itemprop="programmingLanguage"]');
const starsEl = row.querySelectorAll('a.Link--muted');
const todayEl = row.querySelector('span.d-inline-block.float-sm-right');
return {
repo: nameEl?.textContent?.trim()?.replace(/\s+/g, '') || '',
description: descEl?.textContent?.trim() || '',
language: langEl?.textContent?.trim() || '',
stars: starsEl[0]?.textContent?.trim() || '',
forks: starsEl[1]?.textContent?.trim() || '',
today: todayEl?.textContent?.trim() || ''
};
});
})()
- map:
rank: ${{ index + 1 }}
repo: ${{ item.repo }}
description: ${{ item.description }}
language: ${{ item.language }}
stars: ${{ item.stars }}
today: ${{ item.today }}
- limit: ${{ args.limit }}
columns: [rank, repo, language, stars, today]
site: hackernews
name: top
description: Hacker News top stories
domain: news.ycombinator.com
strategy: public
browser: false
args:
limit:
type: int
default: 20
description: Number of stories
pipeline:
- fetch:
url: https://hacker-news.firebaseio.com/v0/topstories.json
- limit: 30
- map:
id: ${{ item }}
- fetch:
url: https://hacker-news.firebaseio.com/v0/item/${{ item.id }}.json
- map:
rank: ${{ index + 1 }}
title: ${{ item.title }}
score: ${{ item.score }}
author: ${{ item.by }}
comments: ${{ item.descendants }}
url: ${{ item.url }}
- limit: ${{ args.limit }}
columns: [rank, title, score, author, comments]
site: reddit
name: hot
description: Reddit 热门帖子
domain: www.reddit.com
args:
subreddit:
type: str
default: ""
description: "Subreddit name (e.g. programming). Empty for frontpage"
limit:
type: int
default: 20
description: Number of posts
pipeline:
- navigate: https://www.reddit.com
- evaluate: |
(async () => {
const sub = '${{ args.subreddit }}';
const path = sub ? '/r/' + sub + '/hot.json' : '/hot.json';
const res = await fetch(path + '?limit=${{ args.limit }}&raw_json=1', {
credentials: 'include'
});
const d = await res.json();
return (d?.data?.children || []).map(c => ({
title: c.data.title,
subreddit: c.data.subreddit_name_prefixed,
score: c.data.score,
comments: c.data.num_comments,
author: c.data.author,
url: 'https://www.reddit.com' + c.data.permalink,
}));
})()
- map:
rank: ${{ index + 1 }}
title: ${{ item.title }}
subreddit: ${{ item.subreddit }}
score: ${{ item.score }}
comments: ${{ item.comments }}
- limit: ${{ args.limit }}
columns: [rank, title, subreddit, score, comments]
site: twitter
name: trending
description: Twitter/X trending topics
domain: x.com
args:
limit:
type: int
default: 20
description: Number of trends to show
pipeline:
- navigate: https://x.com/explore/tabs/trending
- evaluate: |
(async () => {
const cookies = document.cookie.split(';').reduce((acc, c) => {
const [k, v] = c.trim().split('=');
acc[k] = v;
return acc;
}, {});
const csrfToken = cookies['ct0'] || '';
const bearerToken = 'AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA';
const res = await fetch('/i/api/2/guide.json?include_page_configuration=true', {
credentials: 'include',
headers: { 'x-twitter-active-user': 'yes', 'x-csrf-token': csrfToken, 'authorization': 'Bearer ' + bearerToken }
});
const data = await res.json();
const trends = data?.timeline?.instructions?.[1]?.addEntries?.entries || [];
return trends.filter(e => e.content?.timelineModule).flatMap(e => e.content.timelineModule.items || []).map(t => t?.item?.content?.trend).filter(Boolean);
})()
- map:
rank: ${{ index + 1 }}
topic: ${{ item.name }}
tweets: ${{ item.tweetCount || 'N/A' }}
- limit: ${{ args.limit }}
columns: [rank, topic, tweets]
site: v2ex
name: hot
description: V2EX 热门话题
domain: www.v2ex.com
strategy: public
browser: false
args:
limit:
type: int
default: 20
description: Number of topics
pipeline:
- fetch:
url: https://www.v2ex.com/api/topics/hot.json
- map:
rank: ${{ index + 1 }}
title: ${{ item.title }}
replies: ${{ item.replies }}
- limit: ${{ args.limit }}
columns: [rank, title, replies]
site: v2ex
name: latest
description: V2EX 最新话题
domain: www.v2ex.com
strategy: public
browser: false
args:
limit:
type: int
default: 20
description: Number of topics
pipeline:
- fetch:
url: https://www.v2ex.com/api/topics/latest.json
- map:
rank: ${{ index + 1 }}
title: ${{ item.title }}
replies: ${{ item.replies }}
- limit: ${{ args.limit }}
columns: [rank, title, replies]
site: v2ex
name: topic
description: V2EX 主题详情和回复
domain: www.v2ex.com
strategy: public
browser: false
args:
id:
type: str
required: true
description: Topic ID
pipeline:
- fetch:
url: https://www.v2ex.com/api/topics/show.json
params:
id: ${{ args.id }}
- map:
title: ${{ item.title }}
replies: ${{ item.replies }}
url: ${{ item.url }}
- limit: 1
columns: [title, replies, url]
site: xiaohongshu
name: feed
description: "小红书首页推荐 Feed (via Pinia Store Action)"
domain: www.xiaohongshu.com
strategy: intercept
browser: true
args:
limit:
type: int
default: 20
description: Number of items to return
columns: [title, author, likes, type, url]
pipeline:
- navigate: https://www.xiaohongshu.com/explore
- wait: 3
- tap:
store: feed
action: fetchFeeds
capture: homefeed
select: data.items
timeout: 8
- map:
id: ${{ item.id }}
title: ${{ item.note_card.display_title }}
type: ${{ item.note_card.type }}
author: ${{ item.note_card.user.nickname }}
likes: ${{ item.note_card.interact_info.liked_count }}
url: https://www.xiaohongshu.com/explore/${{ item.id }}
- limit: ${{ args.limit | default(20) }}
site: xiaohongshu
name: notifications
description: "小红书通知 (mentions/likes/connections)"
domain: www.xiaohongshu.com
strategy: intercept
browser: true
args:
type:
type: str
default: mentions
description: "Notification type: mentions, likes, or connections"
limit:
type: int
default: 20
description: Number of notifications to return
columns: [rank, user, action, content, note, time]
pipeline:
- navigate: https://www.xiaohongshu.com/notification
- wait: 3
- tap:
store: notification
action: getNotification
args:
- ${{ args.type | default('mentions') }}
capture: /you/
select: data.message_list
timeout: 8
- map:
rank: ${{ index + 1 }}
user: ${{ item.user_info.nickname }}
action: ${{ item.title }}
content: ${{ item.comment_info.content }}
note: ${{ item.item_info.content }}
time: ${{ item.time }}
- limit: ${{ args.limit | default(20) }}
/**
* Xiaohongshu search — trigger search via Pinia store + XHR interception.
* Inspired by bb-sites/xiaohongshu/search.js but adapted for opencli pipeline.
*/
export {};
/**
* Xiaohongshu search — trigger search via Pinia store + XHR interception.
* Inspired by bb-sites/xiaohongshu/search.js but adapted for opencli pipeline.
*/
import { cli, Strategy } from '../../registry.js';
cli({
site: 'xiaohongshu',
name: 'search',
description: '搜索小红书笔记',
domain: 'www.xiaohongshu.com',
strategy: Strategy.COOKIE,
args: [
{ name: 'keyword', required: true, help: 'Search keyword' },
{ name: 'limit', type: 'int', default: 20, help: 'Number of results' },
],
columns: ['rank', 'title', 'author', 'likes', 'type'],
func: async (page, kwargs) => {
await page.goto('https://www.xiaohongshu.com');
await page.wait(2);
const data = await page.evaluate(`
(async () => {
const app = document.querySelector('#app')?.__vue_app__;
const pinia = app?.config?.globalProperties?.$pinia;
if (!pinia?._s) return {error: 'Page not ready'};
const searchStore = pinia._s.get('search');
if (!searchStore) return {error: 'Search store not found'};
let captured = null;
const origOpen = XMLHttpRequest.prototype.open;
const origSend = XMLHttpRequest.prototype.send;
XMLHttpRequest.prototype.open = function(m, u) { this.__url = u; return origOpen.apply(this, arguments); };
XMLHttpRequest.prototype.send = function(b) {
if (this.__url?.includes('search/notes')) {
const x = this;
const orig = x.onreadystatechange;
x.onreadystatechange = function() { if (x.readyState === 4 && !captured) { try { captured = JSON.parse(x.responseText); } catch {} } if (orig) orig.apply(this, arguments); };
}
return origSend.apply(this, arguments);
};
try {
searchStore.mutateSearchValue('${kwargs.keyword}');
await searchStore.loadMore();
await new Promise(r => setTimeout(r, 800));
} finally {
XMLHttpRequest.prototype.open = origOpen;
XMLHttpRequest.prototype.send = origSend;
}
if (!captured?.success) return {error: captured?.msg || 'Search failed'};
return (captured.data?.items || []).map(i => ({
title: i.note_card?.display_title || '',
type: i.note_card?.type || '',
url: 'https://www.xiaohongshu.com/explore/' + i.id,
author: i.note_card?.user?.nickname || '',
likes: i.note_card?.interact_info?.liked_count || '0',
}));
})()
`);
if (!Array.isArray(data))
return [];
return data.slice(0, kwargs.limit).map((item, i) => ({
rank: i + 1,
...item,
}));
},
});
site: zhihu
name: hot
description: 知乎热榜
domain: www.zhihu.com
args:
limit:
type: int
default: 20
description: Number of items to return
pipeline:
- navigate: https://www.zhihu.com
- evaluate: |
(async () => {
const res = await fetch('https://www.zhihu.com/api/v3/feed/topstory/hot-lists/total?limit=50', {
credentials: 'include'
});
const d = await res.json();
return (d?.data || []).map((item) => {
const t = item.target || {};
return {
title: t.title,
url: 'https://www.zhihu.com/question/' + t.id,
answer_count: t.answer_count,
follower_count: t.follower_count,
heat: item.detail_text || '',
};
});
})()
- map:
rank: ${{ index + 1 }}
title: ${{ item.title }}
heat: ${{ item.heat }}
answers: ${{ item.answer_count }}
url: ${{ item.url }}
- limit: ${{ args.limit }}
columns: [rank, title, heat, answers]
import { cli, Strategy } from '../../registry.js';
cli({
site: 'zhihu',
name: 'question',
description: '知乎问题详情和回答',
domain: 'www.zhihu.com',
strategy: Strategy.COOKIE,
args: [
{ name: 'id', required: true, help: 'Question ID (numeric)' },
{ name: 'limit', type: 'int', default: 5, help: 'Number of answers' },
],
columns: ['rank', 'author', 'votes', 'content'],
func: async (page, kwargs) => {
const { id, limit = 5 } = kwargs;
const stripHtml = (html) => (html || '').replace(/<[^>]+>/g, '').replace(/&nbsp;/g, ' ').replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&').trim();
// Fetch question detail and answers in parallel via evaluate
const result = await page.evaluate(`
async () => {
const [qResp, aResp] = await Promise.all([
fetch('https://www.zhihu.com/api/v4/questions/${id}?include=data[*].detail,excerpt,answer_count,follower_count,visit_count', {credentials: 'include'}),
fetch('https://www.zhihu.com/api/v4/questions/${id}/answers?limit=${limit}&offset=0&sort_by=default&include=data[*].content,voteup_count,comment_count,author', {credentials: 'include'})
]);
if (!qResp.ok || !aResp.ok) return { error: true };
const q = await qResp.json();
const a = await aResp.json();
return { question: q, answers: a.data || [] };
}
`);
if (!result || result.error)
throw new Error('Failed to fetch question. Are you logged in?');
const answers = (result.answers ?? []).slice(0, Number(limit)).map((a, i) => ({
rank: i + 1,
author: a.author?.name ?? 'anonymous',
votes: a.voteup_count ?? 0,
content: stripHtml(a.content ?? '').slice(0, 200),
}));
return answers;
},
});
site: zhihu
name: search
description: 知乎搜索
domain: www.zhihu.com
args:
keyword:
type: str
required: true
description: Search keyword
limit:
type: int
default: 10
description: Number of results
pipeline:
- navigate: https://www.zhihu.com
- evaluate: |
(async () => {
const strip = (html) => (html || '').replace(/<[^>]+>/g, '').replace(/&nbsp;/g, ' ').replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&').replace(/<em>/g, '').replace(/<\/em>/g, '').trim();
const res = await fetch('https://www.zhihu.com/api/v4/search_v3?q=' + encodeURIComponent('${{ args.keyword }}') + '&t=general&offset=0&limit=${{ args.limit }}', {
credentials: 'include'
});
const d = await res.json();
return (d?.data || [])
.filter(item => item.type === 'search_result')
.map(item => {
const obj = item.object || {};
const q = obj.question || {};
return {
type: obj.type,
title: strip(obj.title || q.name || ''),
excerpt: strip(obj.excerpt || '').substring(0, 100),
author: obj.author?.name || '',
votes: obj.voteup_count || 0,
url: obj.type === 'answer'
? 'https://www.zhihu.com/question/' + q.id + '/answer/' + obj.id
: obj.type === 'article'
? 'https://zhuanlan.zhihu.com/p/' + obj.id
: 'https://www.zhihu.com/question/' + obj.id,
};
});
})()
- map:
rank: ${{ index + 1 }}
title: ${{ item.title }}
type: ${{ item.type }}
author: ${{ item.author }}
votes: ${{ item.votes }}
- limit: ${{ args.limit }}
columns: [rank, title, type, author, votes]
# OpenCLI
> **把任何网站变成你的命令行工具。**
> 零风控 · 复用 Chrome 登录 · AI 自动发现接口
[English](./README.md)
[![npm](https://img.shields.io/npm/v/@jackwener/opencli)](https://www.npmjs.com/package/@jackwener/opencli)
OpenCLI 通过 Chrome 浏览器 + [Playwright MCP Bridge](https://github.com/nichochar/playwright-mcp) 扩展,将任何网站变成命令行工具。不存密码、不泄 token,直接复用浏览器已登录状态。
## ✨ 亮点
- 🌐 **25+ 命令,8 个站点** — B站、知乎、GitHub、Twitter/X、Reddit、V2EX、小红书、Hacker News
- 🔐 **零风控** — 复用 Chrome 登录态,无需存储任何凭证
- 🤖 **AI 原生** — `explore` 自动发现 API,`synthesize` 生成适配器,`cascade` 探测认证策略
- 📝 **声明式 YAML** — 大部分适配器只需 ~30 行 YAML
- 🔌 **TypeScript 扩展** — 复杂场景(XHR 拦截、GraphQL)可用 TS 编写
## 🚀 快速开始
### npm 全局安装(推荐)
```bash
npm install -g @jackwener/opencli
```
直接使用:
```bash
opencli list # 查看所有命令
opencli hackernews top --limit 5 # 公共 API,无需浏览器
opencli bilibili hot --limit 5 # 浏览器命令
opencli zhihu hot -f json # JSON 输出
```
### 从源码安装
```bash
git clone git@github.com:jackwener/opencli.git
cd opencli && npm install
npx tsx src/main.ts list
```
### 更新
```bash
# npm 全局更新
npm update -g @jackwener/opencli
# 或直接安装最新版
npm install -g @jackwener/opencli@latest
```
## 📋 前置要求
浏览器命令需要:
1. **Chrome** 浏览器正在运行,且**已登录目标网站**(如 bilibili.com、zhihu.com、xiaohongshu.com)
2. 安装 **[Playwright MCP Bridge](https://chromewebstore.google.com/detail/playwright-mcp-bridge/mmlmfjhmonkocbjadbfplnigmagldckm)** 扩展
3. 在 MCP 配置中设置 `PLAYWRIGHT_MCP_EXTENSION_TOKEN`(从扩展设置页获取):
```json
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp@latest", "--extension"],
"env": {
"PLAYWRIGHT_MCP_EXTENSION_TOKEN": "<your-token>"
}
}
}
}
```
公共 API 命令(`hackernews`、`github search`、`v2ex`)无需浏览器。
> **⚠️ 重要**:浏览器命令复用你的 Chrome 登录状态。运行命令前,你必须已在 Chrome 中登录目标网站。如果获取到空数据或报错,请先检查登录状态。
## 📦 内置命令
| 站点 | 命令 | 模式 |
|------|------|------|
| **bilibili** | `hot` `search` `me` `favorite` `history` `feed` `user-videos` | 🔐 浏览器 |
| **zhihu** | `hot` `search` `question` | 🔐 浏览器 |
| **xiaohongshu** | `search` `notifications` `feed` | 🔐 浏览器 |
| **twitter** | `trending` | 🔐 浏览器 |
| **reddit** | `hot` | 🔐 浏览器 |
| **github** | `trending` `search` | 🔐 / 🌐 |
| **v2ex** | `hot` `latest` `topic` | 🌐 公共 API |
| **hackernews** | `top` | 🌐 公共 API |
## 🎨 输出格式
```bash
opencli bilibili hot -f table # 默认:表格
opencli bilibili hot -f json # JSON(可 pipe 给 jq 或 AI agent)
opencli bilibili hot -f md # Markdown
opencli bilibili hot -f csv # CSV
opencli bilibili hot -v # 详细模式:展示 pipeline 每步数据
```
## 🧠 AI Agent 工作流
```bash
# 1. Deep Explore — 网络拦截 → 响应分析 → 能力推理 → 框架检测
opencli explore https://example.com --site mysite
# 2. Synthesize — 从探索成果物生成 evaluate-based YAML 适配器
opencli synthesize mysite
# 3. Generate — 一键完成:探索 → 合成 → 注册
opencli generate https://example.com --goal "hot"
# 4. Strategy Cascade — 自动降级探测:PUBLIC → COOKIE → HEADER
opencli cascade https://api.example.com/data
```
探索结果输出到 `.opencli/explore/<site>/`:
- `manifest.json` — 站点元数据、框架检测结果
- `endpoints.json` — 评分排序的 API 端点,含响应 schema
- `capabilities.json` — 推理出的能力及置信度
- `auth.json` — 认证策略建议
## 🔧 创建新命令
查看 **[SKILL.md](./SKILL.md)** 了解完整的适配器开发指南(YAML pipeline + TypeScript)。
## 版本发布
```bash
# 升级版本号
npm version patch # 0.1.0 → 0.1.1
npm version minor # 0.1.0 → 0.2.0
npm version major # 0.1.0 → 1.0.0
# 推送 tag,GitHub Actions 自动发 release 并发布到 npm
git push --follow-tags
```
## 📄 License
MIT
/**
* Strategy Cascade: automatic strategy downgrade chain.
*
* Probes an API endpoint starting from the simplest strategy (PUBLIC)
* and automatically downgrades through the strategy tiers until one works:
*
* PUBLIC → COOKIE → HEADER → INTERCEPT → UI
*
* This eliminates the need for manual strategy selection — the system
* automatically finds the minimum-privilege strategy that works.
*/
import { Strategy } from './registry.js';
/** Strategy cascade order (simplest → most complex) */
const CASCADE_ORDER: Strategy[] = [
Strategy.PUBLIC,
Strategy.COOKIE,
Strategy.HEADER,
Strategy.INTERCEPT,
Strategy.UI,
];
interface ProbeResult {
strategy: Strategy;
success: boolean;
statusCode?: number;
hasData?: boolean;
error?: string;
responsePreview?: string;
}
interface CascadeResult {
bestStrategy: Strategy;
probes: ProbeResult[];
confidence: number;
}
/**
* Probe an endpoint with a specific strategy.
* Returns whether the probe succeeded and basic response info.
*/
export async function probeEndpoint(
page: any,
url: string,
strategy: Strategy,
opts: { timeout?: number } = {},
): Promise<ProbeResult> {
const result: ProbeResult = { strategy, success: false };
try {
switch (strategy) {
case Strategy.PUBLIC: {
// Try direct fetch without browser (no credentials)
const js = `
async () => {
try {
const resp = await fetch(${JSON.stringify(url)});
const status = resp.status;
if (!resp.ok) return { status, ok: false };
const text = await resp.text();
let hasData = false;
try {
const json = JSON.parse(text);
hasData = !!json && (Array.isArray(json) ? json.length > 0 :
typeof json === 'object' && Object.keys(json).length > 0);
} catch {}
return { status, ok: true, hasData, preview: text.slice(0, 200) };
} catch (e) { return { ok: false, error: e.message }; }
}
`;
const resp = await page.evaluate(js);
result.statusCode = resp?.status;
result.success = resp?.ok && resp?.hasData;
result.hasData = resp?.hasData;
result.responsePreview = resp?.preview;
break;
}
case Strategy.COOKIE: {
// Fetch with credentials: 'include' (uses browser cookies)
const js = `
async () => {
try {
const resp = await fetch(${JSON.stringify(url)}, { credentials: 'include' });
const status = resp.status;
if (!resp.ok) return { status, ok: false };
const text = await resp.text();
let hasData = false;
try {
const json = JSON.parse(text);
hasData = !!json && (Array.isArray(json) ? json.length > 0 :
typeof json === 'object' && Object.keys(json).length > 0);
// Check for API-level error codes (common in Chinese sites)
if (json.code !== undefined && json.code !== 0) hasData = false;
} catch {}
return { status, ok: true, hasData, preview: text.slice(0, 200) };
} catch (e) { return { ok: false, error: e.message }; }
}
`;
const resp = await page.evaluate(js);
result.statusCode = resp?.status;
result.success = resp?.ok && resp?.hasData;
result.hasData = resp?.hasData;
result.responsePreview = resp?.preview;
break;
}
case Strategy.HEADER: {
// Fetch with credentials + try to extract common auth headers
const js = `
async () => {
try {
// Try to extract CSRF tokens from cookies
const cookies = document.cookie.split(';').map(c => c.trim());
const csrf = cookies.find(c => c.startsWith('ct0=') || c.startsWith('csrf_token=') || c.startsWith('_csrf='))?.split('=').slice(1).join('=');
const headers = {};
if (csrf) {
headers['X-Csrf-Token'] = csrf;
headers['X-XSRF-Token'] = csrf;
}
const resp = await fetch(${JSON.stringify(url)}, {
credentials: 'include',
headers
});
const status = resp.status;
if (!resp.ok) return { status, ok: false };
const text = await resp.text();
let hasData = false;
try {
const json = JSON.parse(text);
hasData = !!json && (Array.isArray(json) ? json.length > 0 :
typeof json === 'object' && Object.keys(json).length > 0);
if (json.code !== undefined && json.code !== 0) hasData = false;
} catch {}
return { status, ok: true, hasData, preview: text.slice(0, 200) };
} catch (e) { return { ok: false, error: e.message }; }
}
`;
const resp = await page.evaluate(js);
result.statusCode = resp?.status;
result.success = resp?.ok && resp?.hasData;
result.hasData = resp?.hasData;
result.responsePreview = resp?.preview;
break;
}
case Strategy.INTERCEPT:
case Strategy.UI:
// These require specific implementation per-site
// Mark as needing manual implementation
result.success = false;
result.error = `Strategy ${strategy} requires site-specific implementation`;
break;
}
} catch (err: any) {
result.success = false;
result.error = err.message ?? String(err);
}
return result;
}
/**
* Run the cascade: try each strategy in order until one works.
* Returns the simplest working strategy.
*/
export async function cascadeProbe(
page: any,
url: string,
opts: { maxStrategy?: Strategy; timeout?: number } = {},
): Promise<CascadeResult> {
const maxIdx = opts.maxStrategy
? CASCADE_ORDER.indexOf(opts.maxStrategy)
: CASCADE_ORDER.indexOf(Strategy.HEADER); // Don't auto-try INTERCEPT/UI
const probes: ProbeResult[] = [];
for (let i = 0; i <= Math.min(maxIdx, CASCADE_ORDER.length - 1); i++) {
const strategy = CASCADE_ORDER[i];
const probe = await probeEndpoint(page, url, strategy, opts);
probes.push(probe);
if (probe.success) {
return {
bestStrategy: strategy,
probes,
confidence: 1.0 - (i * 0.1), // Higher confidence for simpler strategies
};
}
}
// None worked — default to COOKIE (most common for logged-in sites)
return {
bestStrategy: Strategy.COOKIE,
probes,
confidence: 0.3,
};
}
/**
* Render cascade results for display.
*/
export function renderCascadeResult(result: CascadeResult): string {
const lines = [
`Strategy Cascade: ${result.bestStrategy} (${(result.confidence * 100).toFixed(0)}% confidence)`,
];
for (const probe of result.probes) {
const icon = probe.success ? '✅' : '❌';
const status = probe.statusCode ? ` [${probe.statusCode}]` : '';
const err = probe.error ? ` — ${probe.error}` : '';
lines.push(` ${icon} ${probe.strategy}${status}${err}`);
}
return lines.join('\n');
}
site: reddit
name: hot
description: Reddit 热门帖子
domain: www.reddit.com
args:
subreddit:
type: str
default: ""
description: "Subreddit name (e.g. programming). Empty for frontpage"
limit:
type: int
default: 20
description: Number of posts
pipeline:
- navigate: https://www.reddit.com
- evaluate: |
(async () => {
const sub = '${{ args.subreddit }}';
const path = sub ? '/r/' + sub + '/hot.json' : '/hot.json';
const res = await fetch(path + '?limit=${{ args.limit }}&raw_json=1', {
credentials: 'include'
});
const d = await res.json();
return (d?.data?.children || []).map(c => ({
title: c.data.title,
subreddit: c.data.subreddit_name_prefixed,
score: c.data.score,
comments: c.data.num_comments,
author: c.data.author,
url: 'https://www.reddit.com' + c.data.permalink,
}));
})()
- map:
rank: ${{ index + 1 }}
title: ${{ item.title }}
subreddit: ${{ item.subreddit }}
score: ${{ item.score }}
comments: ${{ item.comments }}
- limit: ${{ args.limit }}
columns: [rank, title, subreddit, score, comments]
site: v2ex
name: topic
description: V2EX 主题详情和回复
domain: www.v2ex.com
strategy: public
browser: false
args:
id:
type: str
required: true
description: Topic ID
pipeline:
- fetch:
url: https://www.v2ex.com/api/topics/show.json
params:
id: ${{ args.id }}
- map:
title: ${{ item.title }}
replies: ${{ item.replies }}
url: ${{ item.url }}
- limit: 1
columns: [title, replies, url]
site: xiaohongshu
name: feed
description: "小红书首页推荐 Feed (via Pinia Store Action)"
domain: www.xiaohongshu.com
strategy: intercept
browser: true
args:
limit:
type: int
default: 20
description: Number of items to return
columns: [title, author, likes, type, url]
pipeline:
- navigate: https://www.xiaohongshu.com/explore
- wait: 3
- tap:
store: feed
action: fetchFeeds
capture: homefeed
select: data.items
timeout: 8
- map:
id: ${{ item.id }}
title: ${{ item.note_card.display_title }}
type: ${{ item.note_card.type }}
author: ${{ item.note_card.user.nickname }}
likes: ${{ item.note_card.interact_info.liked_count }}
url: https://www.xiaohongshu.com/explore/${{ item.id }}
- limit: ${{ args.limit | default(20) }}
site: xiaohongshu
name: notifications
description: "小红书通知 (mentions/likes/connections)"
domain: www.xiaohongshu.com
strategy: intercept
browser: true
args:
type:
type: str
default: mentions
description: "Notification type: mentions, likes, or connections"
limit:
type: int
default: 20
description: Number of notifications to return
columns: [rank, user, action, content, note, time]
pipeline:
- navigate: https://www.xiaohongshu.com/notification
- wait: 3
- tap:
store: notification
action: getNotification
args:
- ${{ args.type | default('mentions') }}
capture: /you/
select: data.message_list
timeout: 8
- map:
rank: ${{ index + 1 }}
user: ${{ item.user_info.nickname }}
action: ${{ item.title }}
content: ${{ item.comment_info.content }}
note: ${{ item.item_info.content }}
time: ${{ item.time }}
- limit: ${{ args.limit | default(20) }}
/**
* Xiaohongshu search — trigger search via Pinia store + XHR interception.
* Inspired by bb-sites/xiaohongshu/search.js but adapted for opencli pipeline.
*/
import { cli, Strategy } from '../../registry.js';
cli({
site: 'xiaohongshu',
name: 'search',
description: '搜索小红书笔记',
domain: 'www.xiaohongshu.com',
strategy: Strategy.COOKIE,
args: [
{ name: 'keyword', required: true, help: 'Search keyword' },
{ name: 'limit', type: 'int', default: 20, help: 'Number of results' },
],
columns: ['rank', 'title', 'author', 'likes', 'type'],
func: async (page, kwargs) => {
await page.goto('https://www.xiaohongshu.com');
await page.wait(2);
const data = await page.evaluate(`
(async () => {
const app = document.querySelector('#app')?.__vue_app__;
const pinia = app?.config?.globalProperties?.$pinia;
if (!pinia?._s) return {error: 'Page not ready'};
const searchStore = pinia._s.get('search');
if (!searchStore) return {error: 'Search store not found'};
let captured = null;
const origOpen = XMLHttpRequest.prototype.open;
const origSend = XMLHttpRequest.prototype.send;
XMLHttpRequest.prototype.open = function(m, u) { this.__url = u; return origOpen.apply(this, arguments); };
XMLHttpRequest.prototype.send = function(b) {
if (this.__url?.includes('search/notes')) {
const x = this;
const orig = x.onreadystatechange;
x.onreadystatechange = function() { if (x.readyState === 4 && !captured) { try { captured = JSON.parse(x.responseText); } catch {} } if (orig) orig.apply(this, arguments); };
}
return origSend.apply(this, arguments);
};
try {
searchStore.mutateSearchValue('${kwargs.keyword}');
await searchStore.loadMore();
await new Promise(r => setTimeout(r, 800));
} finally {
XMLHttpRequest.prototype.open = origOpen;
XMLHttpRequest.prototype.send = origSend;
}
if (!captured?.success) return {error: captured?.msg || 'Search failed'};
return (captured.data?.items || []).map(i => ({
title: i.note_card?.display_title || '',
type: i.note_card?.type || '',
url: 'https://www.xiaohongshu.com/explore/' + i.id,
author: i.note_card?.user?.nickname || '',
likes: i.note_card?.interact_info?.liked_count || '0',
}));
})()
`);
if (!Array.isArray(data)) return [];
return data.slice(0, kwargs.limit).map((item: any, i: number) => ({
rank: i + 1,
...item,
}));
},
});
import { cli, Strategy } from '../../registry.js';
cli({
site: 'zhihu',
name: 'question',
description: '知乎问题详情和回答',
domain: 'www.zhihu.com',
strategy: Strategy.COOKIE,
args: [
{ name: 'id', required: true, help: 'Question ID (numeric)' },
{ name: 'limit', type: 'int', default: 5, help: 'Number of answers' },
],
columns: ['rank', 'author', 'votes', 'content'],
func: async (page, kwargs) => {
const { id, limit = 5 } = kwargs;
const stripHtml = (html: string) =>
(html || '').replace(/<[^>]+>/g, '').replace(/&nbsp;/g, ' ').replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&').trim();
// Fetch question detail and answers in parallel via evaluate
const result = await page.evaluate(`
async () => {
const [qResp, aResp] = await Promise.all([
fetch('https://www.zhihu.com/api/v4/questions/${id}?include=data[*].detail,excerpt,answer_count,follower_count,visit_count', {credentials: 'include'}),
fetch('https://www.zhihu.com/api/v4/questions/${id}/answers?limit=${limit}&offset=0&sort_by=default&include=data[*].content,voteup_count,comment_count,author', {credentials: 'include'})
]);
if (!qResp.ok || !aResp.ok) return { error: true };
const q = await qResp.json();
const a = await aResp.json();
return { question: q, answers: a.data || [] };
}
`);
if (!result || result.error) throw new Error('Failed to fetch question. Are you logged in?');
const answers = (result.answers ?? []).slice(0, Number(limit)).map((a: any, i: number) => ({
rank: i + 1,
author: a.author?.name ?? 'anonymous',
votes: a.voteup_count ?? 0,
content: stripHtml(a.content ?? '').slice(0, 200),
}));
return answers;
},
});
site: zhihu
name: search
description: 知乎搜索
domain: www.zhihu.com
args:
keyword:
type: str
required: true
description: Search keyword
limit:
type: int
default: 10
description: Number of results
pipeline:
- navigate: https://www.zhihu.com
- evaluate: |
(async () => {
const strip = (html) => (html || '').replace(/<[^>]+>/g, '').replace(/&nbsp;/g, ' ').replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&').replace(/<em>/g, '').replace(/<\/em>/g, '').trim();
const res = await fetch('https://www.zhihu.com/api/v4/search_v3?q=' + encodeURIComponent('${{ args.keyword }}') + '&t=general&offset=0&limit=${{ args.limit }}', {
credentials: 'include'
});
const d = await res.json();
return (d?.data || [])
.filter(item => item.type === 'search_result')
.map(item => {
const obj = item.object || {};
const q = obj.question || {};
return {
type: obj.type,
title: strip(obj.title || q.name || ''),
excerpt: strip(obj.excerpt || '').substring(0, 100),
author: obj.author?.name || '',
votes: obj.voteup_count || 0,
url: obj.type === 'answer'
? 'https://www.zhihu.com/question/' + q.id + '/answer/' + obj.id
: obj.type === 'article'
? 'https://zhuanlan.zhihu.com/p/' + obj.id
: 'https://www.zhihu.com/question/' + obj.id,
};
});
})()
- map:
rank: ${{ index + 1 }}
title: ${{ item.title }}
type: ${{ item.type }}
author: ${{ item.author }}
votes: ${{ item.votes }}
- limit: ${{ args.limit }}
columns: [rank, title, type, author, votes]
+1
-0

@@ -42,2 +42,3 @@ /**

private _initialTabCount;
private _page;
connect(opts?: {

@@ -44,0 +45,0 @@ timeout?: number;

@@ -40,3 +40,14 @@ /**

if (textParts.length === 1) {
const text = textParts[0].text;
let text = textParts[0].text;
// MCP browser_evaluate returns: "[JSON]\n### Ran Playwright code\n```js\n...\n```"
// Strip the "### Ran Playwright code" suffix to get clean JSON
const codeMarker = text.indexOf('### Ran Playwright code');
if (codeMarker !== -1) {
text = text.slice(0, codeMarker).trim();
}
// Also handle "### Result\n[JSON]" format (some MCP versions)
const resultMarker = text.indexOf('### Result\n');
if (resultMarker !== -1) {
text = text.slice(resultMarker + '### Result\n'.length).trim();
}
try {

@@ -110,2 +121,3 @@ return JSON.parse(text);

_initialTabCount = 0;
_page = null;
async connect(opts = {}) {

@@ -129,2 +141,3 @@ await this._acquireLock();

this._proc.stdin.write(msg); }, () => new Promise((res) => { this._waiters.push(res); }));
this._page = page;
this._proc.stdout?.on('data', (chunk) => {

@@ -180,2 +193,22 @@ this._buffer += chunk.toString();

try {
// Close tabs opened during this session (site tabs + extension tabs)
if (this._page && this._proc && !this._proc.killed) {
try {
const tabs = await this._page.tabs();
const tabStr = typeof tabs === 'string' ? tabs : JSON.stringify(tabs);
const allTabs = tabStr.match(/Tab (\d+)/g) || [];
const currentTabCount = allTabs.length;
// Close tabs in reverse order to avoid index shifting issues
// Keep the original tabs that existed before the command started
if (currentTabCount > this._initialTabCount && this._initialTabCount > 0) {
for (let i = currentTabCount - 1; i >= this._initialTabCount; i--) {
try {
await this._page.closeTab(i);
}
catch { }
}
}
}
catch { }
}
if (this._proc && !this._proc.killed) {

@@ -187,2 +220,3 @@ this._proc.kill('SIGTERM');

finally {
this._page = null;
this._releaseLock();

@@ -189,0 +223,0 @@ }

+2
-1

@@ -13,2 +13,3 @@ /**

import './github/search.js';
import './zhihu/search.js';
import './zhihu/question.js';
import './xiaohongshu/search.js';

@@ -16,2 +16,4 @@ /**

// zhihu
import './zhihu/search.js';
import './zhihu/question.js';
// xiaohongshu
import './xiaohongshu/search.js';
/**
* Deep Explore: intelligent API discovery with response analysis.
*
* Unlike simple page snapshots, Deep Explore intercepts network traffic,
* analyzes response schemas, and automatically infers capabilities that
* can be turned into CLI commands.
*
* Flow:
* 1. Navigate to target URL
* 2. Auto-scroll to trigger lazy loading
* 3. Capture network requests (with body analysis)
* 4. For each JSON response: detect list fields, infer columns, analyze auth
* 5. Detect frontend framework (Vue/React/Pinia/Next.js)
* 6. Generate structured capabilities.json
* Navigates to the target URL, auto-scrolls to trigger lazy loading,
* captures network traffic, analyzes JSON responses, and automatically
* infers CLI capabilities from discovered API endpoints.
*/
export declare function exploreUrl(url: string, opts?: any): Promise<any>;
export declare function renderExploreSummary(result: any): string;
export declare function detectSiteName(url: string): string;
export declare function slugify(value: string): string;
export interface DiscoveredStore {
type: 'pinia' | 'vuex';
id: string;
actions: string[];
stateKeys: string[];
}
export declare function exploreUrl(url: string, opts: {
BrowserFactory: new () => any;
site?: string;
goal?: string;
authenticated?: boolean;
outDir?: string;
waitSeconds?: number;
query?: string;
clickLabels?: string[];
auto?: boolean;
}): Promise<Record<string, any>>;
export declare function renderExploreSummary(result: Record<string, any>): string;
/**
* Deep Explore: intelligent API discovery with response analysis.
*
* Unlike simple page snapshots, Deep Explore intercepts network traffic,
* analyzes response schemas, and automatically infers capabilities that
* can be turned into CLI commands.
*
* Flow:
* 1. Navigate to target URL
* 2. Auto-scroll to trigger lazy loading
* 3. Capture network requests (with body analysis)
* 4. For each JSON response: detect list fields, infer columns, analyze auth
* 5. Detect frontend framework (Vue/React/Pinia/Next.js)
* 6. Generate structured capabilities.json
* Navigates to the target URL, auto-scrolls to trigger lazy loading,
* captures network traffic, analyzes JSON responses, and automatically
* infers CLI capabilities from discovered API endpoints.
*/
import * as fs from 'node:fs';
import * as path from 'node:path';
import { browserSession, DEFAULT_BROWSER_EXPLORE_TIMEOUT, runWithTimeout } from './runtime.js';
import { DEFAULT_BROWSER_EXPLORE_TIMEOUT, browserSession, runWithTimeout } from './runtime.js';
// ── Site name detection ────────────────────────────────────────────────────
const KNOWN_ALIASES = {
const KNOWN_SITE_ALIASES = {
'x.com': 'twitter', 'twitter.com': 'twitter',
'news.ycombinator.com': 'hackernews',
'www.zhihu.com': 'zhihu', 'www.bilibili.com': 'bilibili',
'search.bilibili.com': 'bilibili',
'www.v2ex.com': 'v2ex', 'www.reddit.com': 'reddit',
'www.xiaohongshu.com': 'xiaohongshu', 'www.douban.com': 'douban',
'www.weibo.com': 'weibo', 'search.bilibili.com': 'bilibili',
'www.weibo.com': 'weibo', 'www.bbc.com': 'bbc',
};
function detectSiteName(url) {
export function detectSiteName(url) {
try {
const host = new URL(url).hostname.toLowerCase();
if (host in KNOWN_ALIASES)
return KNOWN_ALIASES[host];
if (host in KNOWN_SITE_ALIASES)
return KNOWN_SITE_ALIASES[host];
const parts = host.split('.').filter(p => p && p !== 'www');
if (parts.length >= 2) {
if (['uk', 'jp', 'cn', 'com'].includes(parts[parts.length - 1]) && parts.length >= 3) {
return parts[parts.length - 3].replace(/[^a-z0-9]/g, '');
return slugify(parts[parts.length - 3]);
}
return parts[parts.length - 2].replace(/[^a-z0-9]/g, '');
return slugify(parts[parts.length - 2]);
}
return parts[0]?.replace(/[^a-z0-9]/g, '') ?? 'site';
return parts[0] ? slugify(parts[0]) : 'site';
}

@@ -46,12 +39,11 @@ catch {

}
export function slugify(value) {
return value.trim().toLowerCase().replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-|-$/g, '') || 'site';
}
// ── Field & capability inference ───────────────────────────────────────────
/**
* Common field names grouped by semantic role.
* Used to auto-detect which response fields map to which columns.
*/
const FIELD_ROLES = {
title: ['title', 'name', 'text', 'content', 'desc', 'description', 'headline', 'subject'],
url: ['url', 'uri', 'link', 'href', 'permalink', 'jump_url', 'web_url', 'short_link', 'share_url'],
url: ['url', 'uri', 'link', 'href', 'permalink', 'jump_url', 'web_url', 'share_url'],
author: ['author', 'username', 'user_name', 'nickname', 'nick', 'owner', 'creator', 'up_name', 'uname'],
score: ['score', 'hot', 'heat', 'likes', 'like_count', 'view_count', 'views', 'stat', 'play', 'favorite_count', 'reply_count'],
score: ['score', 'hot', 'heat', 'likes', 'like_count', 'view_count', 'views', 'play', 'favorite_count', 'reply_count'],
time: ['time', 'created_at', 'publish_time', 'pub_time', 'date', 'ctime', 'mtime', 'pubdate', 'created'],

@@ -62,48 +54,23 @@ id: ['id', 'aid', 'bvid', 'mid', 'uid', 'oid', 'note_id', 'item_id'],

};
/** Param names that indicate searchable APIs */
const SEARCH_PARAMS = new Set(['q', 'query', 'keyword', 'search', 'wd', 'kw', 'search_query', 'w']);
/** Param names that indicate pagination */
const PAGINATION_PARAMS = new Set(['page', 'pn', 'offset', 'cursor', 'next', 'page_num']);
/** Param names that indicate limit control */
const LIMIT_PARAMS = new Set(['limit', 'count', 'size', 'per_page', 'page_size', 'ps', 'num']);
/** Content types to ignore */
const IGNORED_CONTENT_TYPES = new Set(['image/', 'font/', 'text/css', 'text/javascript', 'application/javascript', 'application/wasm']);
/** Volatile query params to strip from patterns */
const VOLATILE_PARAMS = new Set(['w_rid', 'wts', '_', 'callback', 'timestamp', 't', 'nonce', 'sign']);
/**
* Parse raw network output from Playwright MCP into structured entries.
* Handles both text format ([GET] url => [200]) and structured JSON.
* Parse raw network output from Playwright MCP.
* Handles text format: [GET] url => [200]
*/
function parseNetworkOutput(raw) {
function parseNetworkRequests(raw) {
if (typeof raw === 'string') {
// Playwright MCP returns network as text lines like:
// "[GET] https://api.example.com/xxx => [200] "
// May also have markdown headers like "### Result"
const entries = [];
const lines = raw.split('\n').filter((l) => l.trim());
for (const line of lines) {
// Format: [METHOD] URL => [STATUS] optional_extra
const bracketMatch = line.match(/^\[?(GET|POST|PUT|DELETE|PATCH|OPTIONS)\]?\s+(\S+)\s*(?:=>|→)\s*\[?(\d+)\]?/i);
if (bracketMatch) {
const [, method, url, status] = bracketMatch;
for (const line of raw.split('\n')) {
// Format: [GET] URL => [200]
const m = line.match(/\[?(GET|POST|PUT|DELETE|PATCH|OPTIONS)\]?\s+(\S+)\s*(?:=>|→)\s*\[?(\d+)\]?/i);
if (m) {
const [, method, url, status] = m;
entries.push({
method: method.toUpperCase(),
url,
status: status ? parseInt(status) : null,
contentType: url.endsWith('.json') ? 'application/json' :
(url.includes('/api/') || url.includes('/x/')) ? 'application/json' : '',
method: method.toUpperCase(), url, status: status ? parseInt(status) : null,
contentType: (url.includes('/api/') || url.includes('/x/') || url.endsWith('.json')) ? 'application/json' : '',
});
continue;
}
// Legacy format: GET url → 200 (application/json)
const legacyMatch = line.match(/^(GET|POST|PUT|DELETE|PATCH|OPTIONS)\s+(\S+)\s*→?\s*(\d+)?\s*(?:\(([^)]*)\))?/i);
if (legacyMatch) {
const [, method, url, status, ct] = legacyMatch;
entries.push({
method: method.toUpperCase(),
url,
status: status ? parseInt(status) : null,
contentType: ct ?? '',
});
}
}

@@ -113,9 +80,8 @@ return entries;

if (Array.isArray(raw)) {
return raw.map((e) => ({
return raw.filter(e => e && typeof e === 'object').map(e => ({
method: (e.method ?? 'GET').toUpperCase(),
url: e.url ?? e.request?.url ?? '',
url: String(e.url ?? e.request?.url ?? e.requestUrl ?? ''),
status: e.status ?? e.statusCode ?? null,
contentType: e.contentType ?? e.mimeType ?? '',
responseBody: e.responseBody ?? e.body,
requestHeaders: e.requestHeaders ?? e.headers,
contentType: e.contentType ?? e.response?.contentType ?? '',
responseBody: e.responseBody, requestHeaders: e.requestHeaders,
}));

@@ -125,19 +91,10 @@ }

}
/**
* Normalize a URL into a pattern by replacing IDs with placeholders.
*/
function urlToPattern(url) {
try {
const parsed = new URL(url);
const pathNorm = parsed.pathname
.replace(/\/\d+/g, '/{id}')
.replace(/\/[0-9a-fA-F]{8,}/g, '/{hex}')
.replace(/\/BV[a-zA-Z0-9]{10}/g, '/{bvid}');
const p = new URL(url);
const pathNorm = p.pathname.replace(/\/\d+/g, '/{id}').replace(/\/[0-9a-fA-F]{8,}/g, '/{hex}').replace(/\/BV[a-zA-Z0-9]{10}/g, '/{bvid}');
const params = [];
parsed.searchParams.forEach((_v, k) => {
if (!VOLATILE_PARAMS.has(k))
params.push(k);
});
const qs = params.length ? '?' + params.sort().map(k => `${k}={}`).join('&') : '';
return `${parsed.host}${pathNorm}${qs}`;
p.searchParams.forEach((_v, k) => { if (!VOLATILE_PARAMS.has(k))
params.push(k); });
return `${p.host}${pathNorm}${params.length ? '?' + params.sort().map(k => `${k}={}`).join('&') : ''}`;
}

@@ -148,18 +105,2 @@ catch {

}
/**
* Extract query params from a URL.
*/
function extractQueryParams(url) {
try {
const params = {};
new URL(url).searchParams.forEach((v, k) => { params[k] = v; });
return params;
}
catch {
return {};
}
}
/**
* Detect auth indicators from request headers.
*/
function detectAuthIndicators(headers) {

@@ -176,28 +117,17 @@ if (!headers)

indicators.push('signature');
if (keys.some(k => k === 'x-client-transaction-id'))
indicators.push('transaction');
return indicators;
}
/**
* Analyze a JSON response to find list data and field mappings.
*/
function analyzeResponseBody(body) {
if (!body || typeof body !== 'object')
return null;
// Try to find the main list in the response
const candidates = [];
function findArrays(obj, currentPath, depth) {
function findArrays(obj, path, depth) {
if (depth > 4)
return;
if (Array.isArray(obj) && obj.length >= 2) {
// Check if items are objects (not primitive arrays)
if (obj.some(item => item && typeof item === 'object' && !Array.isArray(item))) {
candidates.push({ path: currentPath, items: obj });
}
if (Array.isArray(obj) && obj.length >= 2 && obj.some(item => item && typeof item === 'object' && !Array.isArray(item))) {
candidates.push({ path, items: obj });
}
if (obj && typeof obj === 'object' && !Array.isArray(obj)) {
for (const [key, val] of Object.entries(obj)) {
const nextPath = currentPath ? `${currentPath}.${key}` : key;
findArrays(val, nextPath, depth + 1);
}
for (const [key, val] of Object.entries(obj))
findArrays(val, path ? `${path}.${key}` : key, depth + 1);
}

@@ -208,17 +138,11 @@ }

return null;
// Pick the largest array as the main list
candidates.sort((a, b) => b.items.length - a.items.length);
const best = candidates[0];
// Analyze field names in the first item
const sampleItem = best.items[0];
const sampleFieldNames = sampleItem && typeof sampleItem === 'object'
? flattenFieldNames(sampleItem, '', 2)
: [];
// Match fields to semantic roles
const sample = best.items[0];
const sampleFields = sample && typeof sample === 'object' ? flattenFields(sample, '', 2) : [];
const detectedFields = {};
for (const [role, aliases] of Object.entries(FIELD_ROLES)) {
for (const fieldName of sampleFieldNames) {
const basename = fieldName.split('.').pop()?.toLowerCase() ?? '';
if (aliases.includes(basename)) {
detectedFields[role] = fieldName;
for (const f of sampleFields) {
if (aliases.includes(f.split('.').pop()?.toLowerCase() ?? '')) {
detectedFields[role] = f;
break;

@@ -228,13 +152,5 @@ }

}
return {
itemPath: best.path || null,
itemCount: best.items.length,
detectedFields,
sampleFieldNames,
};
return { itemPath: best.path || null, itemCount: best.items.length, detectedFields, sampleFields };
}
/**
* Flatten nested object field names for analysis.
*/
function flattenFieldNames(obj, prefix, maxDepth) {
function flattenFields(obj, prefix, maxDepth) {
if (maxDepth <= 0 || !obj || typeof obj !== 'object')

@@ -244,80 +160,38 @@ return [];

for (const key of Object.keys(obj)) {
const fullKey = prefix ? `${prefix}.${key}` : key;
names.push(fullKey);
if (obj[key] && typeof obj[key] === 'object' && !Array.isArray(obj[key])) {
names.push(...flattenFieldNames(obj[key], fullKey, maxDepth - 1));
}
const full = prefix ? `${prefix}.${key}` : key;
names.push(full);
if (obj[key] && typeof obj[key] === 'object' && !Array.isArray(obj[key]))
names.push(...flattenFields(obj[key], full, maxDepth - 1));
}
return names;
}
/**
* Analyze a list of network entries into structured endpoints.
*/
function analyzeEndpoints(entries, siteHost) {
const seen = new Map();
for (const entry of entries) {
if (!entry.url)
continue;
// Skip static resources
const ct = entry.contentType.toLowerCase();
if (IGNORED_CONTENT_TYPES.has(ct.split(';')[0]?.trim() ?? '') ||
ct.includes('image/') || ct.includes('font/') || ct.includes('css') ||
ct.includes('javascript') || ct.includes('wasm'))
continue;
// Skip non-JSON and failed responses
if (entry.status && entry.status >= 400)
continue;
const pattern = urlToPattern(entry.url);
const queryParams = extractQueryParams(entry.url);
const paramNames = Object.keys(queryParams).filter(k => !VOLATILE_PARAMS.has(k));
const key = `${entry.method}:${pattern}`;
if (seen.has(key))
continue;
const endpoint = {
pattern,
method: entry.method,
url: entry.url,
status: entry.status,
contentType: ct,
queryParams: paramNames,
hasSearchParam: paramNames.some(p => SEARCH_PARAMS.has(p)),
hasPaginationParam: paramNames.some(p => PAGINATION_PARAMS.has(p)),
hasLimitParam: paramNames.some(p => LIMIT_PARAMS.has(p)),
authIndicators: detectAuthIndicators(entry.requestHeaders),
responseAnalysis: entry.responseBody ? analyzeResponseBody(entry.responseBody) : null,
};
seen.set(key, endpoint);
function scoreEndpoint(ep) {
let s = 0;
if (ep.contentType.includes('json'))
s += 10;
if (ep.responseAnalysis) {
s += 5;
s += Math.min(ep.responseAnalysis.itemCount, 10);
s += Object.keys(ep.responseAnalysis.detectedFields).length * 2;
}
return [...seen.values()];
if (ep.pattern.includes('/api/') || ep.pattern.includes('/x/'))
s += 3;
if (ep.hasSearchParam)
s += 3;
if (ep.hasPaginationParam)
s += 2;
if (ep.hasLimitParam)
s += 2;
if (ep.status === 200)
s += 2;
return s;
}
/**
* Infer what strategy to use based on endpoint analysis.
*/
function inferStrategy(endpoint) {
if (endpoint.authIndicators.includes('signature'))
return 'intercept';
if (endpoint.authIndicators.includes('transaction'))
return 'header';
if (endpoint.authIndicators.includes('bearer') || endpoint.authIndicators.includes('csrf'))
return 'header';
// Check if the URL is a public API (no auth indicators)
if (endpoint.authIndicators.length === 0) {
// If it's the same domain, likely cookie auth
return 'cookie';
}
return 'cookie';
}
/**
* Infer the capability name from an endpoint pattern.
*/
function inferCapabilityName(endpoint, goal) {
function inferCapabilityName(url, goal) {
if (goal)
return goal;
const u = endpoint.url.toLowerCase();
const p = endpoint.pattern.toLowerCase();
// Match common patterns
if (endpoint.hasSearchParam)
return 'search';
const u = url.toLowerCase();
if (u.includes('hot') || u.includes('popular') || u.includes('ranking') || u.includes('trending'))
return 'hot';
if (u.includes('search'))
return 'search';
if (u.includes('feed') || u.includes('timeline') || u.includes('dynamic'))

@@ -329,16 +203,10 @@ return 'feed';

return 'history';
if (u.includes('profile') || u.includes('userinfo') || u.includes('/me') || u.includes('myinfo'))
if (u.includes('profile') || u.includes('userinfo') || u.includes('/me'))
return 'me';
if (u.includes('video') || u.includes('article') || u.includes('detail') || u.includes('view'))
return 'detail';
if (u.includes('favorite') || u.includes('collect') || u.includes('bookmark'))
return 'favorite';
if (u.includes('notification') || u.includes('notice'))
return 'notifications';
// Fallback: try to extract from path
try {
const pathname = new URL(endpoint.url).pathname;
const segments = pathname.split('/').filter(s => s && !s.match(/^\d+$/) && !s.match(/^[0-9a-f]{8,}$/i));
if (segments.length)
return segments[segments.length - 1].replace(/[^a-z0-9]/gi, '_').toLowerCase();
const segs = new URL(url).pathname.split('/').filter(s => s && !s.match(/^\d+$/) && !s.match(/^[0-9a-f]{8,}$/i));
if (segs.length)
return segs[segs.length - 1].replace(/[^a-z0-9]/gi, '_').toLowerCase();
}

@@ -348,176 +216,159 @@ catch { }

}
/**
* Build recommended columns from response analysis.
*/
function buildRecommendedColumns(analysis) {
if (!analysis)
return ['title', 'url'];
const cols = [];
// Prioritize: title → url → author → score → time
const priority = ['title', 'url', 'author', 'score', 'time'];
for (const role of priority) {
if (analysis.detectedFields[role])
cols.push(role);
}
return cols.length ? cols : ['title', 'url'];
function inferStrategy(authIndicators) {
if (authIndicators.includes('signature'))
return 'intercept';
if (authIndicators.includes('bearer') || authIndicators.includes('csrf'))
return 'header';
return 'cookie';
}
/**
* Build recommended args from endpoint query params.
*/
function buildRecommendedArgs(endpoint) {
const args = [];
if (endpoint.hasSearchParam) {
const paramName = endpoint.queryParams.find(p => SEARCH_PARAMS.has(p)) ?? 'keyword';
args.push({ name: 'keyword', type: 'str', required: true });
}
// Always add limit
args.push({ name: 'limit', type: 'int', required: false, default: 20 });
if (endpoint.hasPaginationParam) {
args.push({ name: 'page', type: 'int', required: false, default: 1 });
}
return args;
}
/**
* Score an endpoint's interest level for capability generation.
* Higher score = more likely to be a useful API endpoint.
*/
function scoreEndpoint(ep) {
let score = 0;
// JSON content type is strongly preferred
if (ep.contentType.includes('json'))
score += 10;
// Has response analysis with items
if (ep.responseAnalysis) {
score += 5;
score += Math.min(ep.responseAnalysis.itemCount, 10);
score += Object.keys(ep.responseAnalysis.detectedFields).length * 2;
}
// API-like path patterns
if (ep.pattern.includes('/api/') || ep.pattern.includes('/x/'))
score += 3;
// Has search/pagination params
if (ep.hasSearchParam)
score += 3;
if (ep.hasPaginationParam)
score += 2;
if (ep.hasLimitParam)
score += 2;
// 200 OK
if (ep.status === 200)
score += 2;
return score;
}
// ── Framework detection ────────────────────────────────────────────────────
const FRAMEWORK_DETECT_JS = `
(() => {
const result = {};
try {
const app = document.querySelector('#app');
result.vue3 = !!(app && app.__vue_app__);
result.vue2 = !!(app && app.__vue__);
result.react = !!window.__REACT_DEVTOOLS_GLOBAL_HOOK__ || !!document.querySelector('[data-reactroot]');
result.nextjs = !!window.__NEXT_DATA__;
result.nuxt = !!window.__NUXT__;
if (result.vue3 && app.__vue_app__) {
() => {
const r = {};
try {
const app = document.querySelector('#app');
r.vue3 = !!(app && app.__vue_app__);
r.vue2 = !!(app && app.__vue__);
r.react = !!window.__REACT_DEVTOOLS_GLOBAL_HOOK__ || !!document.querySelector('[data-reactroot]');
r.nextjs = !!window.__NEXT_DATA__;
r.nuxt = !!window.__NUXT__;
if (r.vue3 && app.__vue_app__) { const gp = app.__vue_app__.config?.globalProperties; r.pinia = !!(gp && gp.$pinia); r.vuex = !!(gp && gp.$store); }
} catch {}
return r;
}
`;
// ── Store discovery ────────────────────────────────────────────────────────
const STORE_DISCOVER_JS = `
() => {
const stores = [];
try {
const app = document.querySelector('#app');
if (!app?.__vue_app__) return stores;
const gp = app.__vue_app__.config?.globalProperties;
result.pinia = !!(gp && gp.$pinia);
result.vuex = !!(gp && gp.$store);
}
} catch {}
return JSON.stringify(result);
})()
// Pinia stores
const pinia = gp?.$pinia;
if (pinia?._s) {
pinia._s.forEach((store, id) => {
const actions = [];
const stateKeys = [];
for (const k in store) {
try {
if (k.startsWith('$') || k.startsWith('_')) continue;
if (typeof store[k] === 'function') actions.push(k);
else stateKeys.push(k);
} catch {}
}
stores.push({ type: 'pinia', id, actions: actions.slice(0, 20), stateKeys: stateKeys.slice(0, 15) });
});
}
// Vuex store modules
const vuex = gp?.$store;
if (vuex?._modules?.root?._children) {
const children = vuex._modules.root._children;
for (const [modName, mod] of Object.entries(children)) {
const actions = Object.keys(mod._rawModule?.actions ?? {}).slice(0, 20);
const stateKeys = Object.keys(mod.state ?? {}).slice(0, 15);
stores.push({ type: 'vuex', id: modName, actions, stateKeys });
}
}
} catch {}
return stores;
}
`;
// ── Main explore function ──────────────────────────────────────────────────
export async function exploreUrl(url, opts = {}) {
const site = opts.site ?? detectSiteName(url);
const outDir = opts.outDir ?? path.join('.opencli', 'explore', site);
fs.mkdirSync(outDir, { recursive: true });
const result = await browserSession(opts.BrowserFactory, async (page) => {
export async function exploreUrl(url, opts) {
const waitSeconds = opts.waitSeconds ?? 3.0;
const exploreTimeout = Math.max(DEFAULT_BROWSER_EXPLORE_TIMEOUT, 45.0 + waitSeconds * 8.0);
return browserSession(opts.BrowserFactory, async (page) => {
return runWithTimeout((async () => {
// Step 1: Navigate
await page.goto(url);
await page.wait(opts.waitSeconds ?? 3);
// Step 2: Auto-scroll to trigger lazy loading
await page.wait(waitSeconds);
// Step 2: Auto-scroll to trigger lazy loading (use keyboard since page.scroll may not exist)
for (let i = 0; i < 3; i++) {
await page.scroll('down');
try {
await page.pressKey('End');
}
catch { }
await page.wait(1);
}
// Step 3: Capture network traffic
// Step 3: Read page metadata
const metadata = await readPageMetadata(page);
// Step 4: Capture network traffic
const rawNetwork = await page.networkRequests(false);
const networkEntries = parseNetworkOutput(rawNetwork);
// Step 4: For JSON endpoints, try to fetch response body in-browser
const networkEntries = parseNetworkRequests(rawNetwork);
// Step 5: For JSON endpoints, re-fetch response body in-browser
const jsonEndpoints = networkEntries.filter(e => e.contentType.includes('json') && e.method === 'GET' && e.status === 200);
for (const ep of jsonEndpoints.slice(0, 10)) {
// Only fetch body for promising-looking API endpoints
if (ep.url.includes('/api/') || ep.url.includes('/x/') || ep.url.includes('/web/') ||
ep.contentType.includes('json')) {
try {
const bodyResult = await page.evaluate(`
async () => {
try {
const resp = await fetch(${JSON.stringify(ep.url)}, { credentials: 'include' });
if (!resp.ok) return null;
const data = await resp.json();
return JSON.stringify(data).slice(0, 10000);
} catch { return null; }
}
`);
if (bodyResult && typeof bodyResult === 'string') {
try {
ep.responseBody = JSON.parse(bodyResult);
}
catch { }
const body = await page.evaluate(`async () => { try { const r = await fetch(${JSON.stringify(ep.url)}, {credentials:'include'}); if (!r.ok) return null; const d = await r.json(); return JSON.stringify(d).slice(0,10000); } catch { return null; } }`);
if (body && typeof body === 'string') {
try {
ep.responseBody = JSON.parse(body);
}
else if (bodyResult && typeof bodyResult === 'object') {
ep.responseBody = bodyResult;
}
catch { }
}
catch { }
else if (body && typeof body === 'object')
ep.responseBody = body;
}
catch { }
}
// Step 5: Detect frontend framework
// Step 6: Detect framework
let framework = {};
try {
const fwResult = await page.evaluate(FRAMEWORK_DETECT_JS);
if (typeof fwResult === 'string')
framework = JSON.parse(fwResult);
else if (typeof fwResult === 'object')
framework = fwResult;
const fw = await page.evaluate(FRAMEWORK_DETECT_JS);
if (fw && typeof fw === 'object')
framework = fw;
}
catch { }
// Step 6: Get page metadata
let title = '', finalUrl = '';
try {
const meta = await page.evaluate(`
() => JSON.stringify({ url: window.location.href, title: document.title || '' })
`);
if (typeof meta === 'string') {
const parsed = JSON.parse(meta);
title = parsed.title;
finalUrl = parsed.url;
// Step 6.5: Discover stores (Pinia / Vuex)
let stores = [];
if (framework.pinia || framework.vuex) {
try {
const raw = await page.evaluate(STORE_DISCOVER_JS);
if (Array.isArray(raw))
stores = raw;
}
else if (typeof meta === 'object') {
title = meta.title;
finalUrl = meta.url;
}
catch { }
}
catch { }
// Step 7: Analyze endpoints
let siteHost = '';
try {
siteHost = new URL(url).hostname;
const seen = new Map();
for (const entry of networkEntries) {
if (!entry.url)
continue;
const ct = entry.contentType.toLowerCase();
if (ct.includes('image/') || ct.includes('font/') || ct.includes('css') || ct.includes('javascript') || ct.includes('wasm'))
continue;
if (entry.status && entry.status >= 400)
continue;
const pattern = urlToPattern(entry.url);
const key = `${entry.method}:${pattern}`;
if (seen.has(key))
continue;
const qp = [];
try {
new URL(entry.url).searchParams.forEach((_v, k) => { if (!VOLATILE_PARAMS.has(k))
qp.push(k); });
}
catch { }
const ep = {
pattern, method: entry.method, url: entry.url, status: entry.status, contentType: ct,
queryParams: qp, hasSearchParam: qp.some(p => SEARCH_PARAMS.has(p)),
hasPaginationParam: qp.some(p => PAGINATION_PARAMS.has(p)),
hasLimitParam: qp.some(p => LIMIT_PARAMS.has(p)),
authIndicators: detectAuthIndicators(entry.requestHeaders),
responseAnalysis: entry.responseBody ? analyzeResponseBody(entry.responseBody) : null,
score: 0,
};
ep.score = scoreEndpoint(ep);
seen.set(key, ep);
}
catch { }
const analyzedEndpoints = analyzeEndpoints(networkEntries, siteHost);
// Step 8: Score and rank endpoints
const scoredEndpoints = analyzedEndpoints
.map(ep => ({ ...ep, score: scoreEndpoint(ep) }))
.filter(ep => ep.score >= 5)
.sort((a, b) => b.score - a.score);
// Step 9: Infer capabilities from top endpoints
const analyzedEndpoints = [...seen.values()].filter(ep => ep.score >= 5).sort((a, b) => b.score - a.score);
// Step 8: Infer capabilities
const capabilities = [];
const usedNames = new Set();
for (const ep of scoredEndpoints.slice(0, 8)) {
let capName = inferCapabilityName(ep, opts.goal);
// Deduplicate names
for (const ep of analyzedEndpoints.slice(0, 8)) {
let capName = inferCapabilityName(ep.url, opts.goal);
if (usedNames.has(capName)) {

@@ -528,76 +379,79 @@ const suffix = ep.pattern.split('/').filter(s => s && !s.startsWith('{') && !s.includes('.')).pop();

usedNames.add(capName);
const cols = [];
if (ep.responseAnalysis) {
for (const role of ['title', 'url', 'author', 'score', 'time']) {
if (ep.responseAnalysis.detectedFields[role])
cols.push(role);
}
}
const args = [];
if (ep.hasSearchParam)
args.push({ name: 'keyword', type: 'str', required: true });
args.push({ name: 'limit', type: 'int', required: false, default: 20 });
if (ep.hasPaginationParam)
args.push({ name: 'page', type: 'int', required: false, default: 1 });
// Link store actions to capabilities when store-action strategy is recommended
const epStrategy = inferStrategy(ep.authIndicators);
let storeHint;
if ((epStrategy === 'intercept' || ep.authIndicators.includes('signature')) && stores.length > 0) {
// Try to find a store/action that matches this endpoint's purpose
for (const s of stores) {
const matchingAction = s.actions.find(a => capName.split('_').some(part => a.toLowerCase().includes(part)) ||
a.toLowerCase().includes('fetch') || a.toLowerCase().includes('get'));
if (matchingAction) {
storeHint = { store: s.id, action: matchingAction };
break;
}
}
}
capabilities.push({
name: capName,
description: `${site} ${capName}`,
strategy: inferStrategy(ep),
confidence: Math.min(ep.score / 20, 1.0),
endpoint: ep.pattern,
name: capName, description: `${opts.site ?? detectSiteName(url)} ${capName}`,
strategy: storeHint ? 'store-action' : epStrategy,
confidence: Math.min(ep.score / 20, 1.0), endpoint: ep.pattern,
itemPath: ep.responseAnalysis?.itemPath ?? null,
recommendedColumns: buildRecommendedColumns(ep.responseAnalysis),
recommendedArgs: buildRecommendedArgs(ep),
recommendedColumns: cols.length ? cols : ['title', 'url'],
recommendedArgs: args,
...(storeHint ? { storeHint } : {}),
});
}
// Step 10: Determine auth strategy
const allAuthIndicators = new Set(analyzedEndpoints.flatMap(ep => ep.authIndicators));
let topStrategy = 'cookie';
if (allAuthIndicators.has('signature'))
topStrategy = 'intercept';
else if (allAuthIndicators.has('transaction') || allAuthIndicators.has('bearer'))
topStrategy = 'header';
else if (allAuthIndicators.size === 0 && scoredEndpoints.some(ep => ep.contentType.includes('json')))
topStrategy = 'public';
return {
site,
target_url: url,
final_url: finalUrl,
title,
framework,
top_strategy: topStrategy,
endpoint_count: analyzedEndpoints.length,
api_endpoint_count: scoredEndpoints.length,
capabilities,
endpoints: scoredEndpoints.map(ep => ({
pattern: ep.pattern,
method: ep.method,
url: ep.url,
status: ep.status,
contentType: ep.contentType,
score: ep.score,
queryParams: ep.queryParams,
itemPath: ep.responseAnalysis?.itemPath ?? null,
itemCount: ep.responseAnalysis?.itemCount ?? 0,
detectedFields: ep.responseAnalysis?.detectedFields ?? {},
authIndicators: ep.authIndicators,
})),
auth_indicators: [...allAuthIndicators],
// Step 9: Determine overall auth strategy
const allAuth = new Set(analyzedEndpoints.flatMap(ep => ep.authIndicators));
const topStrategy = allAuth.has('signature') ? 'intercept' : allAuth.has('bearer') || allAuth.has('csrf') ? 'header' : allAuth.size === 0 ? 'public' : 'cookie';
const siteName = opts.site ?? detectSiteName(metadata.url || url);
const targetDir = opts.outDir ?? path.join('.opencli', 'explore', siteName);
fs.mkdirSync(targetDir, { recursive: true });
const result = {
site: siteName, target_url: url, final_url: metadata.url, title: metadata.title,
framework, stores, top_strategy: topStrategy,
endpoint_count: analyzedEndpoints.length + [...seen.values()].filter(ep => ep.score < 5).length,
api_endpoint_count: analyzedEndpoints.length,
capabilities, auth_indicators: [...allAuth],
};
})(), { timeout: DEFAULT_BROWSER_EXPLORE_TIMEOUT, label: 'explore' });
// Write artifacts
fs.writeFileSync(path.join(targetDir, 'manifest.json'), JSON.stringify({
site: siteName, target_url: url, final_url: metadata.url, title: metadata.title,
framework, stores: stores.map(s => ({ type: s.type, id: s.id, actions: s.actions })),
top_strategy: topStrategy, explored_at: new Date().toISOString(),
}, null, 2));
fs.writeFileSync(path.join(targetDir, 'endpoints.json'), JSON.stringify(analyzedEndpoints.map(ep => ({
pattern: ep.pattern, method: ep.method, url: ep.url, status: ep.status,
contentType: ep.contentType, score: ep.score, queryParams: ep.queryParams,
itemPath: ep.responseAnalysis?.itemPath ?? null, itemCount: ep.responseAnalysis?.itemCount ?? 0,
detectedFields: ep.responseAnalysis?.detectedFields ?? {}, authIndicators: ep.authIndicators,
})), null, 2));
fs.writeFileSync(path.join(targetDir, 'capabilities.json'), JSON.stringify(capabilities, null, 2));
fs.writeFileSync(path.join(targetDir, 'auth.json'), JSON.stringify({
top_strategy: topStrategy, indicators: [...allAuth], framework,
}, null, 2));
if (stores.length > 0) {
fs.writeFileSync(path.join(targetDir, 'stores.json'), JSON.stringify(stores, null, 2));
}
return { ...result, out_dir: targetDir };
})(), { timeout: exploreTimeout, label: `Explore ${url}` });
});
// Write artifacts
const manifest = {
site: result.site,
target_url: result.target_url,
final_url: result.final_url,
title: result.title,
framework: result.framework,
top_strategy: result.top_strategy,
explored_at: new Date().toISOString(),
};
fs.writeFileSync(path.join(outDir, 'manifest.json'), JSON.stringify(manifest, null, 2));
fs.writeFileSync(path.join(outDir, 'endpoints.json'), JSON.stringify(result.endpoints ?? [], null, 2));
fs.writeFileSync(path.join(outDir, 'capabilities.json'), JSON.stringify(result.capabilities ?? [], null, 2));
fs.writeFileSync(path.join(outDir, 'auth.json'), JSON.stringify({
top_strategy: result.top_strategy,
indicators: result.auth_indicators ?? [],
framework: result.framework ?? {},
}, null, 2));
return { ...result, out_dir: outDir };
}
export function renderExploreSummary(result) {
const lines = [
'opencli explore: OK',
`Site: ${result.site}`,
`URL: ${result.target_url}`,
`Title: ${result.title || '(none)'}`,
`Strategy: ${result.top_strategy}`,
'opencli probe: OK', `Site: ${result.site}`, `URL: ${result.target_url}`,
`Title: ${result.title || '(none)'}`, `Strategy: ${result.top_strategy}`,
`Endpoints: ${result.endpoint_count} total, ${result.api_endpoint_count} API`,

@@ -607,3 +461,4 @@ `Capabilities: ${result.capabilities?.length ?? 0}`,

for (const cap of (result.capabilities ?? []).slice(0, 5)) {
lines.push(` • ${cap.name} (${cap.strategy}, confidence: ${(cap.confidence * 100).toFixed(0)}%)`);
const storeInfo = cap.storeHint ? ` → ${cap.storeHint.store}.${cap.storeHint.action}()` : '';
lines.push(` • ${cap.name} (${cap.strategy}, ${(cap.confidence * 100).toFixed(0)}%)${storeInfo}`);
}

@@ -614,4 +469,20 @@ const fw = result.framework ?? {};

lines.push(`Framework: ${fwNames.join(', ')}`);
const stores = result.stores ?? [];
if (stores.length) {
lines.push(`Stores: ${stores.length}`);
for (const s of stores.slice(0, 5)) {
lines.push(` • ${s.type}/${s.id}: ${s.actions.slice(0, 5).join(', ')}${s.actions.length > 5 ? '...' : ''}`);
}
}
lines.push(`Output: ${result.out_dir}`);
return lines.join('\n');
}
async function readPageMetadata(page) {
try {
const result = await page.evaluate(`() => ({ url: window.location.href, title: document.title || '' })`);
if (result && typeof result === 'object')
return { url: String(result.url ?? ''), title: String(result.title ?? '') };
}
catch { }
return { url: '', title: '' };
}

@@ -58,2 +58,4 @@ #!/usr/bin/env node

.action(async (url, opts) => { const { exploreUrl, renderExploreSummary } = await import('./explore.js'); console.log(renderExploreSummary(await exploreUrl(url, { BrowserFactory: PlaywrightMCP, site: opts.site, goal: opts.goal, waitSeconds: parseFloat(opts.wait) }))); });
program.command('probe').description('Probe a website: discover APIs, stores, and recommend strategies').argument('<url>').option('--site <name>').option('--goal <text>').option('--wait <s>', '', '3')
.action(async (url, opts) => { const { exploreUrl, renderExploreSummary } = await import('./explore.js'); console.log(renderExploreSummary(await exploreUrl(url, { BrowserFactory: PlaywrightMCP, site: opts.site, goal: opts.goal, waitSeconds: parseFloat(opts.wait) }))); });
program.command('synthesize').description('Synthesize CLIs from explore').argument('<target>').option('--top <n>', '', '3')

@@ -63,2 +65,17 @@ .action(async (target, opts) => { const { synthesizeFromExplore, renderSynthesizeSummary } = await import('./synthesize.js'); console.log(renderSynthesizeSummary(synthesizeFromExplore(target, { top: parseInt(opts.top) }))); });

.action(async (url, opts) => { const { generateCliFromUrl, renderGenerateSummary } = await import('./generate.js'); const r = await generateCliFromUrl({ url, BrowserFactory: PlaywrightMCP, builtinClis: BUILTIN_CLIS, userClis: USER_CLIS, goal: opts.goal, site: opts.site }); console.log(renderGenerateSummary(r)); process.exitCode = r.ok ? 0 : 1; });
program.command('cascade').description('Strategy cascade: find simplest working strategy').argument('<url>').option('--site <name>')
.action(async (url, opts) => {
const { cascadeProbe, renderCascadeResult } = await import('./cascade.js');
const result = await browserSession(PlaywrightMCP, async (page) => {
// Navigate to the site first for cookie context
try {
const siteUrl = new URL(url);
await page.goto(`${siteUrl.protocol}//${siteUrl.host}`);
await page.wait(2);
}
catch { }
return cascadeProbe(page, url);
});
console.log(renderCascadeResult(result));
});
// ── Dynamic site commands ──────────────────────────────────────────────────

@@ -65,0 +82,0 @@ const registry = getRegistry();

@@ -19,2 +19,8 @@ /**

data = await executeStep(page, op, params, data, args);
// Detect error objects returned by steps (e.g. tap store not found)
if (data && typeof data === 'object' && !Array.isArray(data) && data.error) {
process.stderr.write(` ${chalk.yellow('⚠')} ${chalk.yellow(op)}: ${data.error}\n`);
if (data.hint)
process.stderr.write(` ${chalk.dim('💡')} ${chalk.dim(data.hint)}\n`);
}
if (debug)

@@ -140,3 +146,14 @@ debugStepResult(op, data);

const js = String(render(params, { args, data }));
return page.evaluate(normalizeEvaluateSource(js));
let result = await page.evaluate(normalizeEvaluateSource(js));
// MCP may return JSON as a string — auto-parse it
if (typeof result === 'string') {
const trimmed = result.trim();
if ((trimmed.startsWith('[') && trimmed.endsWith(']')) || (trimmed.startsWith('{') && trimmed.endsWith('}'))) {
try {
result = JSON.parse(trimmed);
}
catch { }
}
}
return result;
}

@@ -219,3 +236,222 @@ case 'snapshot': {

}
case 'intercept': return data;
case 'intercept': {
// Declarative XHR interception step
// Usage:
// intercept:
// trigger: "navigate:https://..." | "evaluate:store.note.fetch()" | "click:ref"
// capture: "api/pattern" # URL substring to match
// timeout: 5 # seconds to wait for matching request
// select: "data.items" # optional: extract sub-path from response
const cfg = typeof params === 'object' ? params : {};
const trigger = cfg.trigger ?? '';
const capturePattern = cfg.capture ?? '';
const timeout = cfg.timeout ?? 8;
const selectPath = cfg.select ?? null;
if (!capturePattern)
return data;
// Step 1: Execute the trigger action
if (trigger.startsWith('navigate:')) {
const url = render(trigger.slice('navigate:'.length), { args, data });
await page.goto(String(url));
}
else if (trigger.startsWith('evaluate:')) {
const js = trigger.slice('evaluate:'.length);
await page.evaluate(normalizeEvaluateSource(render(js, { args, data })));
}
else if (trigger.startsWith('click:')) {
const ref = render(trigger.slice('click:'.length), { args, data });
await page.click(String(ref).replace(/^@/, ''));
}
else if (trigger === 'scroll') {
await page.scroll('down');
}
// Step 2: Wait a bit for network requests to fire
await page.wait(Math.min(timeout, 3));
// Step 3: Get network requests and find matching ones
const rawNetwork = await page.networkRequests(false);
const matchingResponses = [];
if (typeof rawNetwork === 'string') {
// Parse the network output to find matching URLs
const lines = rawNetwork.split('\n');
for (const line of lines) {
const match = line.match(/\[?(GET|POST)\]?\s+(\S+)\s*(?:=>|→)\s*\[?(\d+)\]?/i);
if (match) {
const [, method, url, status] = match;
if (url.includes(capturePattern) && status === '200') {
// Re-fetch the matching URL to get the response body
try {
const body = await page.evaluate(`
async () => {
try {
const resp = await fetch(${JSON.stringify(url)}, { credentials: 'include' });
if (!resp.ok) return null;
return await resp.json();
} catch { return null; }
}
`);
if (body)
matchingResponses.push(body);
}
catch { }
}
}
}
}
// Step 4: Select from response if specified
let result = matchingResponses.length === 1 ? matchingResponses[0] :
matchingResponses.length > 1 ? matchingResponses : data;
if (selectPath && result) {
let current = result;
for (const part of String(selectPath).split('.')) {
if (current && typeof current === 'object' && !Array.isArray(current)) {
current = current[part];
}
else
break;
}
result = current ?? result;
}
return result;
}
case 'tap': {
// ── Declarative Store Action Bridge ──────────────────────────────────
// Usage:
// tap:
// store: feed # Pinia/Vuex store name
// action: fetchFeeds # Store action to call
// args: [] # Optional args to pass to action
// capture: homefeed # URL pattern to capture response
// timeout: 5 # Seconds to wait for network (default: 5)
// select: data.items # Optional: extract sub-path from response
// framework: pinia # Optional: pinia | vuex (auto-detected if omitted)
//
// Generates a self-contained IIFE that:
// 1. Injects fetch + XHR dual interception proxy
// 2. Finds the Pinia/Vuex store and calls the action
// 3. Captures the response matching the URL pattern
// 4. Auto-cleans up interception in finally block
// 5. Returns the captured data (optionally sub-selected)
const cfg = typeof params === 'object' ? params : {};
const storeName = String(render(cfg.store ?? '', { args, data }));
const actionName = String(render(cfg.action ?? '', { args, data }));
const capturePattern = String(render(cfg.capture ?? '', { args, data }));
const timeout = cfg.timeout ?? 5;
const selectPath = cfg.select ? String(render(cfg.select, { args, data })) : null;
const framework = cfg.framework ?? null; // auto-detect if null
const actionArgs = cfg.args ?? [];
if (!storeName || !actionName)
throw new Error('tap: store and action are required');
// Build select chain for the captured response
const selectChain = selectPath
? selectPath.split('.').map((p) => `?.[${JSON.stringify(p)}]`).join('')
: '';
// Serialize action arguments
const actionArgsRendered = actionArgs.map((a) => {
const rendered = render(a, { args, data });
return JSON.stringify(rendered);
});
const actionCall = actionArgsRendered.length
? `store[${JSON.stringify(actionName)}](${actionArgsRendered.join(', ')})`
: `store[${JSON.stringify(actionName)}]()`;
const js = `
async () => {
// ── 1. Setup capture proxy (fetch + XHR dual interception) ──
let captured = null;
const capturePattern = ${JSON.stringify(capturePattern)};
// Intercept fetch API
const origFetch = window.fetch;
window.fetch = async function(...fetchArgs) {
const resp = await origFetch.apply(this, fetchArgs);
try {
const url = typeof fetchArgs[0] === 'string' ? fetchArgs[0]
: fetchArgs[0] instanceof Request ? fetchArgs[0].url : String(fetchArgs[0]);
if (capturePattern && url.includes(capturePattern) && !captured) {
try { captured = await resp.clone().json(); } catch {}
}
} catch {}
return resp;
};
// Intercept XMLHttpRequest
const origXhrOpen = XMLHttpRequest.prototype.open;
const origXhrSend = XMLHttpRequest.prototype.send;
XMLHttpRequest.prototype.open = function(method, url) {
this.__tapUrl = String(url);
return origXhrOpen.apply(this, arguments);
};
XMLHttpRequest.prototype.send = function(body) {
if (capturePattern && this.__tapUrl?.includes(capturePattern)) {
const xhr = this;
const origHandler = xhr.onreadystatechange;
xhr.onreadystatechange = function() {
if (xhr.readyState === 4 && !captured) {
try { captured = JSON.parse(xhr.responseText); } catch {}
}
if (origHandler) origHandler.apply(this, arguments);
};
// Also handle onload
const origOnload = xhr.onload;
xhr.onload = function() {
if (!captured) { try { captured = JSON.parse(xhr.responseText); } catch {} }
if (origOnload) origOnload.apply(this, arguments);
};
}
return origXhrSend.apply(this, arguments);
};
try {
// ── 2. Find store ──
let store = null;
const storeName = ${JSON.stringify(storeName)};
const fw = ${JSON.stringify(framework)};
// Auto-detect framework if not specified
const app = document.querySelector('#app');
if (!fw || fw === 'pinia') {
// Try Pinia (Vue 3)
try {
const pinia = app?.__vue_app__?.config?.globalProperties?.$pinia;
if (pinia?._s) store = pinia._s.get(storeName);
} catch {}
}
if (!store && (!fw || fw === 'vuex')) {
// Try Vuex (Vue 2/3)
try {
const vuexStore = app?.__vue_app__?.config?.globalProperties?.$store
?? app?.__vue__?.$store;
if (vuexStore) {
// Vuex doesn't have named stores like Pinia, dispatch action
store = { [${JSON.stringify(actionName)}]: (...a) => vuexStore.dispatch(storeName + '/' + ${JSON.stringify(actionName)}, ...a) };
}
} catch {}
}
if (!store) return { error: 'Store not found: ' + storeName, hint: 'Page may not be fully loaded or store name may be incorrect' };
if (typeof store[${JSON.stringify(actionName)}] !== 'function') {
return { error: 'Action not found: ' + ${JSON.stringify(actionName)} + ' on store ' + storeName,
hint: 'Available: ' + Object.keys(store).filter(k => typeof store[k] === 'function' && !k.startsWith('$') && !k.startsWith('_')).join(', ') };
}
// ── 3. Call store action ──
await ${actionCall};
// ── 4. Wait for network response ──
const deadline = Date.now() + ${timeout} * 1000;
while (!captured && Date.now() < deadline) {
await new Promise(r => setTimeout(r, 200));
}
} finally {
// ── 5. Always restore originals ──
window.fetch = origFetch;
XMLHttpRequest.prototype.open = origXhrOpen;
XMLHttpRequest.prototype.send = origXhrSend;
}
if (!captured) return { error: 'No matching response captured for pattern: ' + capturePattern };
return captured${selectChain} ?? captured;
}
`;
return page.evaluate(js);
}
default: return data;

@@ -222,0 +458,0 @@ }

/**
* Synthesize: turn explore capabilities into ready-to-use CLI definitions.
*
* Takes the structured capabilities from Deep Explore and generates
* YAML pipeline files that can be directly registered as CLI commands.
*
* This is the bridge between discovery (explore) and usability (CLI).
* Synthesize candidate CLIs from explore artifacts.
* Generates evaluate-based YAML pipelines (matching hand-written adapter patterns).
*/
export declare function synthesizeFromExplore(target: string, opts?: any): any;
export declare function renderSynthesizeSummary(r: any): string;
export declare function synthesizeFromExplore(target: string, opts?: {
outDir?: string;
top?: number;
}): Record<string, any>;
export declare function renderSynthesizeSummary(result: Record<string, any>): string;
export declare function resolveExploreDir(target: string): string;
export declare function loadExploreBundle(exploreDir: string): Record<string, any>;
/** Backward-compatible export for scaffold.ts */
export declare function buildCandidate(site: string, targetUrl: string, cap: any, endpoint: any): any;
/**
* Synthesize: turn explore capabilities into ready-to-use CLI definitions.
*
* Takes the structured capabilities from Deep Explore and generates
* YAML pipeline files that can be directly registered as CLI commands.
*
* This is the bridge between discovery (explore) and usability (CLI).
* Synthesize candidate CLIs from explore artifacts.
* Generates evaluate-based YAML pipelines (matching hand-written adapter patterns).
*/

@@ -12,64 +8,65 @@ import * as fs from 'node:fs';

import yaml from 'js-yaml';
/** Volatile params to strip from generated URLs */
const VOLATILE_PARAMS = new Set(['w_rid', 'wts', 'callback', '_', 'timestamp', 't', 'nonce', 'sign']);
const SEARCH_PARAM_NAMES = new Set(['q', 'query', 'keyword', 'search', 'wd', 'kw', 'w', 'search_query']);
const LIMIT_PARAM_NAMES = new Set(['ps', 'page_size', 'limit', 'count', 'per_page', 'size', 'num']);
const PAGE_PARAM_NAMES = new Set(['pn', 'page', 'page_num', 'offset', 'cursor']);
export function synthesizeFromExplore(target, opts = {}) {
const exploreDir = fs.existsSync(target) ? target : path.join('.opencli', 'explore', target);
if (!fs.existsSync(exploreDir))
throw new Error(`Explore dir not found: ${target}`);
const manifest = JSON.parse(fs.readFileSync(path.join(exploreDir, 'manifest.json'), 'utf-8'));
const capabilities = JSON.parse(fs.readFileSync(path.join(exploreDir, 'capabilities.json'), 'utf-8'));
const endpoints = JSON.parse(fs.readFileSync(path.join(exploreDir, 'endpoints.json'), 'utf-8'));
const auth = JSON.parse(fs.readFileSync(path.join(exploreDir, 'auth.json'), 'utf-8'));
const exploreDir = resolveExploreDir(target);
const bundle = loadExploreBundle(exploreDir);
const targetDir = opts.outDir ?? path.join(exploreDir, 'candidates');
fs.mkdirSync(targetDir, { recursive: true });
const site = manifest.site;
const topN = opts.top ?? 5;
const site = bundle.manifest.site;
const capabilities = (bundle.capabilities ?? [])
.sort((a, b) => (b.confidence ?? 0) - (a.confidence ?? 0))
.slice(0, opts.top ?? 3);
const candidates = [];
// Sort capabilities by confidence
const sortedCaps = [...capabilities]
.sort((a, b) => (b.confidence ?? 0) - (a.confidence ?? 0))
.slice(0, topN);
for (const cap of sortedCaps) {
// Find the matching endpoint for more detail
const endpoint = endpoints.find((ep) => ep.pattern === cap.endpoint) ??
endpoints[0];
const candidate = buildCandidateYaml(site, manifest, cap, endpoint);
const fileName = `${cap.name}.yaml`;
const filePath = path.join(targetDir, fileName);
for (const cap of capabilities) {
const endpoint = chooseEndpoint(cap, bundle.endpoints);
if (!endpoint)
continue;
const candidate = buildCandidateYaml(site, bundle.manifest, cap, endpoint);
const filePath = path.join(targetDir, `${candidate.name}.yaml`);
fs.writeFileSync(filePath, yaml.dump(candidate.yaml, { sortKeys: false, lineWidth: 120 }));
candidates.push({
name: cap.name,
path: filePath,
strategy: cap.strategy,
endpoint: cap.endpoint,
confidence: cap.confidence,
columns: candidate.yaml.columns,
});
candidates.push({ name: candidate.name, path: filePath, strategy: cap.strategy, confidence: cap.confidence });
}
const index = {
site,
target_url: manifest.target_url,
generated_from: exploreDir,
candidate_count: candidates.length,
candidates,
};
const index = { site, target_url: bundle.manifest.target_url, generated_from: exploreDir, candidate_count: candidates.length, candidates };
fs.writeFileSync(path.join(targetDir, 'candidates.json'), JSON.stringify(index, null, 2));
return { site, explore_dir: exploreDir, out_dir: targetDir, candidate_count: candidates.length, candidates };
}
export function renderSynthesizeSummary(result) {
const lines = ['opencli synthesize: OK', `Site: ${result.site}`, `Source: ${result.explore_dir}`, `Candidates: ${result.candidate_count}`];
for (const c of result.candidates ?? [])
lines.push(` • ${c.name} (${c.strategy}, ${((c.confidence ?? 0) * 100).toFixed(0)}% confidence) → ${c.path}`);
return lines.join('\n');
}
export function resolveExploreDir(target) {
if (fs.existsSync(target))
return target;
const candidate = path.join('.opencli', 'explore', target);
if (fs.existsSync(candidate))
return candidate;
throw new Error(`Explore directory not found: ${target}`);
}
export function loadExploreBundle(exploreDir) {
return {
site,
explore_dir: exploreDir,
out_dir: targetDir,
candidate_count: candidates.length,
candidates,
manifest: JSON.parse(fs.readFileSync(path.join(exploreDir, 'manifest.json'), 'utf-8')),
endpoints: JSON.parse(fs.readFileSync(path.join(exploreDir, 'endpoints.json'), 'utf-8')),
capabilities: JSON.parse(fs.readFileSync(path.join(exploreDir, 'capabilities.json'), 'utf-8')),
auth: JSON.parse(fs.readFileSync(path.join(exploreDir, 'auth.json'), 'utf-8')),
};
}
/** Volatile params to strip from generated URLs */
const VOLATILE_PARAMS = new Set(['w_rid', 'wts', 'callback', '_', 'timestamp', 't', 'nonce', 'sign']);
const SEARCH_PARAM_NAMES = new Set(['q', 'query', 'keyword', 'search', 'wd', 'kw', 'w', 'search_query']);
const LIMIT_PARAM_NAMES = new Set(['ps', 'page_size', 'limit', 'count', 'per_page', 'size', 'num']);
const PAGE_PARAM_NAMES = new Set(['pn', 'page', 'page_num', 'offset', 'cursor']);
/**
* Build a clean templated URL from a raw API URL.
* - Strips volatile params (w_rid, wts, etc.)
* - Templates search, limit, and pagination params
* - Builds URL string manually to avoid URL encoding of ${{ }} expressions
*/
function buildTemplatedUrl(rawUrl, cap, endpoint) {
function chooseEndpoint(cap, endpoints) {
if (!endpoints.length)
return null;
// Match by endpoint pattern from capability
if (cap.endpoint) {
const match = endpoints.find((e) => e.pattern === cap.endpoint || e.url?.includes(cap.endpoint));
if (match)
return match;
}
return endpoints.sort((a, b) => (b.score ?? 0) - (a.score ?? 0))[0];
}
// ── URL templating ─────────────────────────────────────────────────────────
function buildTemplatedUrl(rawUrl, cap, _endpoint) {
try {

@@ -81,22 +78,14 @@ const u = new URL(rawUrl);

u.searchParams.forEach((v, k) => {
// Skip volatile params
if (VOLATILE_PARAMS.has(k))
return;
// Template known param types
if (hasKeyword && SEARCH_PARAM_NAMES.has(k)) {
if (hasKeyword && SEARCH_PARAM_NAMES.has(k))
params.push([k, '${{ args.keyword }}']);
}
else if (LIMIT_PARAM_NAMES.has(k)) {
else if (LIMIT_PARAM_NAMES.has(k))
params.push([k, '${{ args.limit | default(20) }}']);
}
else if (PAGE_PARAM_NAMES.has(k)) {
else if (PAGE_PARAM_NAMES.has(k))
params.push([k, '${{ args.page | default(1) }}']);
}
else {
else
params.push([k, v]);
}
});
if (params.length === 0)
return base;
return base + '?' + params.map(([k, v]) => `${k}=${v}`).join('&');
return params.length ? base + '?' + params.map(([k, v]) => `${k}=${v}`).join('&') : base;
}

@@ -108,41 +97,86 @@ catch {

/**
* Build a YAML pipeline definition from a capability + endpoint.
* Build inline evaluate script for browser-based fetch+parse.
* Follows patterns from bilibili/hot.yaml and twitter/trending.yaml.
*/
function buildEvaluateScript(url, itemPath, endpoint) {
const pathChain = itemPath.split('.').map((p) => `?.${p}`).join('');
const detectedFields = endpoint?.detectedFields ?? {};
const hasFields = Object.keys(detectedFields).length > 0;
let mapCode = '';
if (hasFields) {
const mappings = Object.entries(detectedFields)
.map(([role, field]) => ` ${role}: item${String(field).split('.').map(p => `?.${p}`).join('')}`)
.join(',\n');
mapCode = `.map((item) => ({\n${mappings}\n }))`;
}
return [
'(async () => {',
` const res = await fetch('${url}', {`,
` credentials: 'include'`,
' });',
' const data = await res.json();',
` return (data${pathChain} || [])${mapCode};`,
'})()\n',
].join('\n');
}
// ── YAML pipeline generation ───────────────────────────────────────────────
function buildCandidateYaml(site, manifest, cap, endpoint) {
const needsBrowser = cap.strategy !== 'public';
const pipeline = [];
// Step 1: Navigate (if browser-based)
if (needsBrowser) {
const templatedUrl = buildTemplatedUrl(endpoint?.url ?? manifest.target_url, cap, endpoint);
let domain = '';
try {
domain = new URL(manifest.target_url).hostname;
}
catch { }
if (cap.strategy === 'store-action' && cap.storeHint) {
// Store Action: navigate + wait + tap (declarative, clean)
pipeline.push({ navigate: manifest.target_url });
pipeline.push({ wait: 3 });
const tapStep = {
store: cap.storeHint.store,
action: cap.storeHint.action,
timeout: 8,
};
// Infer capture pattern from endpoint URL
if (endpoint?.url) {
try {
const epUrl = new URL(endpoint.url);
const pathParts = epUrl.pathname.split('/').filter((p) => p);
// Use last meaningful path segment as capture pattern
const capturePart = pathParts.filter((p) => !p.match(/^v\d+$/)).pop();
if (capturePart)
tapStep.capture = capturePart;
}
catch { }
}
if (cap.itemPath)
tapStep.select = cap.itemPath;
pipeline.push({ tap: tapStep });
}
// Step 2: Fetch the API — build a clean URL with templates
const rawUrl = endpoint?.url ?? manifest.target_url;
const fetchStep = { url: buildTemplatedUrl(rawUrl, cap, endpoint) };
pipeline.push({ fetch: fetchStep });
// Step 3: Select the item path
if (cap.itemPath) {
pipeline.push({ select: cap.itemPath });
else if (needsBrowser) {
// Browser-based: navigate + evaluate (like bilibili/hot.yaml, twitter/trending.yaml)
pipeline.push({ navigate: manifest.target_url });
const itemPath = cap.itemPath ?? 'data.data.list';
pipeline.push({ evaluate: buildEvaluateScript(templatedUrl, itemPath, endpoint) });
}
// Step 4: Map fields to columns
else {
// Public API: direct fetch (like hackernews/top.yaml)
pipeline.push({ fetch: { url: templatedUrl } });
if (cap.itemPath)
pipeline.push({ select: cap.itemPath });
}
// Map fields
const mapStep = {};
const columns = cap.recommendedColumns ?? ['title', 'url'];
// Add a rank column if not doing search
if (!cap.recommendedArgs?.some((a) => a.name === 'keyword')) {
if (!cap.recommendedArgs?.some((a) => a.name === 'keyword'))
mapStep['rank'] = '${{ index + 1 }}';
}
// Build field mappings from the endpoint's detected fields
const detectedFields = endpoint?.detectedFields ?? {};
for (const col of columns) {
const fieldPath = detectedFields[col];
if (fieldPath) {
mapStep[col] = `\${{ item.${fieldPath} }}`;
}
else {
mapStep[col] = `\${{ item.${col} }}`;
}
mapStep[col] = fieldPath ? `\${{ item.${fieldPath} }}` : `\${{ item.${col} }}`;
}
pipeline.push({ map: mapStep });
// Step 5: Limit
pipeline.push({ limit: '${{ args.limit | default(20) }}' });
// Build args definition
// Args
const argsDef = {};

@@ -163,33 +197,23 @@ for (const arg of cap.recommendedArgs ?? []) {

}
// Ensure limit arg always exists
if (!argsDef['limit']) {
if (!argsDef['limit'])
argsDef['limit'] = { type: 'int', default: 20, description: 'Number of items to return' };
}
const allColumns = Object.keys(mapStep);
return {
name: cap.name,
yaml: {
site,
name: cap.name,
description: `${site} ${cap.name} (auto-generated)`,
domain: manifest.final_url ? new URL(manifest.final_url).hostname : undefined,
strategy: cap.strategy,
browser: needsBrowser,
args: argsDef,
pipeline,
columns: allColumns,
site, name: cap.name, description: `${cap.description || site + ' ' + cap.name} (auto-generated)`,
domain, strategy: cap.strategy, browser: needsBrowser,
args: argsDef, pipeline, columns: Object.keys(mapStep),
},
};
}
export function renderSynthesizeSummary(r) {
const lines = [
'opencli synthesize: OK',
`Site: ${r.site}`,
`Source: ${r.explore_dir}`,
`Candidates: ${r.candidate_count}`,
];
for (const c of r.candidates ?? []) {
lines.push(` • ${c.name} (${c.strategy}, ${(c.confidence * 100).toFixed(0)}% confidence) → ${c.path}`);
}
return lines.join('\n');
/** Backward-compatible export for scaffold.ts */
export function buildCandidate(site, targetUrl, cap, endpoint) {
// Map old-style field names to new ones
const normalizedCap = {
...cap,
recommendedArgs: cap.recommendedArgs ?? cap.recommended_args,
recommendedColumns: cap.recommendedColumns ?? cap.recommended_columns,
};
const manifest = { target_url: targetUrl, final_url: targetUrl };
return buildCandidateYaml(site, manifest, normalizedCap, endpoint);
}
{
"name": "@jackwener/opencli",
"version": "0.1.0",
"version": "0.1.1",
"publishConfig": {

@@ -15,3 +15,5 @@ "access": "public"

"dev": "tsx src/main.ts",
"build": "tsc",
"build": "tsc && npm run clean-yaml && npm run copy-yaml",
"clean-yaml": "find dist/clis -name '*.yaml' -o -name '*.yml' 2>/dev/null | xargs rm -f",
"copy-yaml": "find src/clis -name '*.yaml' -o -name '*.yml' | while read f; do d=\"dist/${f#src/}\"; mkdir -p \"$(dirname \"$d\")\"; cp \"$f\" \"$d\"; done",
"start": "node dist/main.js",

@@ -18,0 +20,0 @@ "typecheck": "tsc --noEmit",

+116
-38
# OpenCLI
> **Make any website your CLI.** 操控 Chrome 无风控风险,复用登录,CLI 化全部网站。
> **Make any website your CLI.**
> Zero risk · Reuse Chrome login · AI-powered discovery
OpenCLI 是一个 AI Native 的 CLI 工具,通过 Chrome 浏览器 + Playwright MCP Bridge 扩展,将任何网站变成命令行工具。
[中文文档](./README.zh-CN.md)
## ✨ 特性
[![npm](https://img.shields.io/npm/v/@jackwener/opencli)](https://www.npmjs.com/package/@jackwener/opencli)
- 🌐 **CLI 化全部网站** — 支持 Bilibili、知乎、GitHub、Twitter、V2EX、Hacker News 等
- 🔐 **零风控风险** — 复用 Chrome 已登录状态,无需存储密码
- 🤖 **AI Native** — AI agent 可直接探索网站并自动生成新命令
- 📝 **声明式 YAML** — 用 YAML 定义 pipeline,无需写代码
- 🔌 **TypeScript 扩展** — 复杂场景用 TS 编写适配器
OpenCLI turns any website into a command-line tool by bridging your Chrome browser through [Playwright MCP](https://github.com/nichochar/playwright-mcp). No passwords stored, no tokens leaked — it just rides your existing browser session.
## 🚀 快速开始
## ✨ Highlights
- 🌐 **25+ commands, 8 sites** — Bilibili, Zhihu, GitHub, Twitter/X, Reddit, V2EX, Xiaohongshu, Hacker News
- 🔐 **Account-safe** — Reuses Chrome's logged-in state; your credentials never leave the browser
- 🤖 **AI Agent ready** — `explore` discovers APIs, `synthesize` generates adapters, `cascade` finds auth strategies
- 📝 **Declarative YAML** — Most adapters are ~30 lines of YAML pipeline
- 🔌 **TypeScript escape hatch** — Complex adapters (XHR interception, GraphQL) in TS
## 🚀 Quick Start
### Install via npm (recommended)
```bash
# 安装依赖
cd ~/code/ai-native-cli && npm install
npm install -g @jackwener/opencli
```
# 列出所有命令
Then use directly:
```bash
opencli list # See all commands
opencli hackernews top --limit 5 # Public API, no browser
opencli bilibili hot --limit 5 # Browser command
opencli zhihu hot -f json # JSON output
```
### Install from source
```bash
git clone git@github.com:jackwener/opencli.git
cd opencli && npm install
npx tsx src/main.ts list
```
# 公共 API(无需浏览器)
npx tsx src/main.ts hackernews top --limit 10
npx tsx src/main.ts github search --keyword "typescript"
### Update
# 浏览器命令(需要 Chrome + Playwright MCP Bridge 扩展)
npx tsx src/main.ts bilibili hot --limit 10
npx tsx src/main.ts zhihu hot --limit 10
npx tsx src/main.ts twitter trending --limit 10
```bash
# npm global
npm update -g @jackwener/opencli
# Or reinstall to latest
npm install -g @jackwener/opencli@latest
```
## 📋 前置要求
## 📋 Prerequisites
浏览器命令需要:
1. Chrome 浏览器正在运行
2. 安装 [Playwright MCP Bridge](https://chromewebstore.google.com/detail/playwright-mcp-bridge/mmlmfjhmonkocbjadbfplnigmagldckm) 扩展
3. 首次使用时点击扩展图标批准连接
Browser commands need:
1. **Chrome** running **and logged into the target site** (e.g. bilibili.com, zhihu.com, xiaohongshu.com)
2. **[Playwright MCP Bridge](https://chromewebstore.google.com/detail/playwright-mcp-bridge/mmlmfjhmonkocbjadbfplnigmagldckm)** extension installed
3. Configure `PLAYWRIGHT_MCP_EXTENSION_TOKEN` (from the extension settings page) in your MCP config:
## 📦 内置命令
```json
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp@latest", "--extension"],
"env": {
"PLAYWRIGHT_MCP_EXTENSION_TOKEN": "<your-token>"
}
}
}
}
```
| 站点 | 命令 | 说明 | 模式 |
|------|------|------|------|
| bilibili | hot, search, me, favorite, history, feed, user-videos | 热门 / 搜索 / 个人 / 收藏 / 历史 / 动态 / 投稿 | 🔐 浏览器 |
| zhihu | hot, search | 热榜 / 搜索 | 🔐 浏览器 |
| github | trending, search | Trending / 搜索 | 🔐 / 🌐 公共 |
| twitter | trending | 热门话题 | 🔐 浏览器 |
| v2ex | hot, latest | 热门 / 最新 | 🔐 浏览器 |
| hackernews | top | 热门故事 | 🌐 公共 API |
Public API commands (`hackernews`, `github search`, `v2ex`) need no browser at all.
## 🎨 输出格式
> **⚠️ Important**: Browser commands reuse your Chrome login session. You must be logged into the target website in Chrome before running commands. If you get empty data or errors, check your login status first.
## 📦 Built-in Commands
| Site | Commands | Mode |
|------|----------|------|
| **bilibili** | `hot` `search` `me` `favorite` `history` `feed` `user-videos` | 🔐 Browser |
| **zhihu** | `hot` `search` `question` | 🔐 Browser |
| **xiaohongshu** | `search` `notifications` `feed` | 🔐 Browser |
| **twitter** | `trending` | 🔐 Browser |
| **reddit** | `hot` | 🔐 Browser |
| **github** | `trending` `search` | 🔐 / 🌐 |
| **v2ex** | `hot` `latest` `topic` | 🌐 Public |
| **hackernews** | `top` | 🌐 Public |
## 🎨 Output Formats
```bash
opencli bilibili hot -f table # 默认表格
opencli bilibili hot -f json # JSON(适合管道和 AI agent)
opencli bilibili hot -f table # Default: rich table
opencli bilibili hot -f json # JSON (pipe to jq, feed to AI)
opencli bilibili hot -f md # Markdown
opencli bilibili hot -f csv # CSV
opencli bilibili hot -v # Verbose: show pipeline steps
```
## 🔧 创建新命令
## 🧠 AI Agent Workflow
参考 [SKILL.md](./SKILL.md) 了解 YAML 和 TypeScript 两种方式创建新的 CLI 适配器。
```bash
# 1. Deep Explore — discover APIs, infer capabilities, detect framework
opencli explore https://example.com --site mysite
# 2. Synthesize — generate YAML adapters from explore artifacts
opencli synthesize mysite
# 3. Generate — one-shot: explore → synthesize → register
opencli generate https://example.com --goal "hot"
# 4. Strategy Cascade — auto-probe: PUBLIC → COOKIE → HEADER
opencli cascade https://api.example.com/data
```
Explore outputs to `.opencli/explore/<site>/`:
- `manifest.json` — site metadata, framework detection
- `endpoints.json` — scored API endpoints with response schemas
- `capabilities.json` — inferred capabilities with confidence scores
- `auth.json` — authentication strategy recommendations
## 🔧 Create New Commands
See **[SKILL.md](./SKILL.md)** for the full adapter guide (YAML pipeline + TypeScript).
## Releasing New Versions
```bash
# Bump version
npm version patch # 0.1.0 → 0.1.1
npm version minor # 0.1.0 → 0.2.0
npm version major # 0.1.0 → 1.0.0
# Push tag to trigger GitHub Actions auto-release
git push --follow-tags
```
The CI will automatically build, create a GitHub release, and publish to npm.
## 📄 License
MIT
+148
-96
---
name: opencli
description: "OpenCLI — Make any website your CLI. Zero setup, AI-powered. Turn any website into CLI commands via Chrome browser."
description: "OpenCLI — Make any website your CLI. Zero risk, AI-powered, reuse Chrome login."
version: 0.1.0
author: jackwener
tags: [cli, browser, web, mcp, playwright, bilibili, zhihu, twitter, github, v2ex, hackernews, 哔哩哔哩, 知乎, AI, agent]
tags: [cli, browser, web, mcp, playwright, bilibili, zhihu, twitter, github, v2ex, hackernews, reddit, xiaohongshu, AI, agent]
---

@@ -11,36 +11,36 @@

> Make any website your CLI. 操控 Chrome 无风控风险,复用登录,CLI 化全部网站。
> Make any website your CLI. Reuse Chrome login, zero risk, AI-powered discovery.
## 安装
## Install & Run
```bash
cd ~/code/ai-native-cli
npm install
```
# npm global install (recommended)
npm install -g @jackwener/opencli
opencli <command>
## 使用方式
```bash
# 通过 npx 运行(推荐)
# Or from source
cd ~/code/opencli && npm install
npx tsx src/main.ts <command>
# 或者构建后运行
npm run build && node dist/main.js <command>
# Update to latest
npm update -g @jackwener/opencli
```
## 前置要求
## Prerequisites
浏览器命令需要:
1. Chrome 浏览器正在运行
2. 安装 [Playwright MCP Bridge](https://chromewebstore.google.com/detail/playwright-mcp-bridge/mmlmfjhmonkocbjadbfplnigmagldckm) 扩展
3. 点击扩展图标批准连接
Browser commands require:
1. Chrome browser running **(logged into target sites)**
2. [Playwright MCP Bridge](https://chromewebstore.google.com/detail/playwright-mcp-bridge/mmlmfjhmonkocbjadbfplnigmagldckm) extension
3. Configure `PLAYWRIGHT_MCP_EXTENSION_TOKEN` (from extension settings) in your MCP config
公共 API 命令(hackernews、github search)无需浏览器。
> **Note**: You must be logged into the target website in Chrome before running commands. Tabs opened during command execution are auto-closed afterwards.
## 内置命令
Public API commands (`hackernews`, `github search`, `v2ex`) need no browser.
### 数据查询
## Commands Reference
### Data Commands
```bash
# Bilibili
# Bilibili (browser)
opencli bilibili hot --limit 10 # B站热门视频

@@ -54,46 +54,67 @@ opencli bilibili search --keyword "rust" # 搜索视频

# 知乎
# 知乎 (browser)
opencli zhihu hot --limit 10 # 知乎热榜
opencli zhihu search --keyword "AI" # 搜索
opencli zhihu question --id 34816524 # 问题详情和回答
# GitHub
# 小红书 (browser)
opencli xiaohongshu search --keyword "美食" # 搜索笔记
opencli xiaohongshu notifications # 通知(mentions/likes/connections)
opencli xiaohongshu feed --limit 10 # 推荐 Feed
# GitHub (trending=browser, search=public)
opencli github trending --limit 10 # GitHub Trending
opencli github search --keyword "cli" # 搜索仓库(无需浏览器)
opencli github search --keyword "cli" # 搜索仓库
# Twitter/X
# Twitter/X (browser)
opencli twitter trending --limit 10 # 热门话题
# V2EX
# Reddit (browser)
opencli reddit hot --limit 10 # 热门帖子
opencli reddit hot --subreddit programming # 指定子版块
# V2EX (public)
opencli v2ex hot --limit 10 # 热门话题
opencli v2ex latest --limit 10 # 最新话题
opencli v2ex topic --id 1024 # 主题详情
# Hacker News
opencli hackernews top --limit 10 # 热门故事(无需浏览器)
# Hacker News (public)
opencli hackernews top --limit 10 # Top stories
```
### 管理命令
### Management Commands
```bash
opencli list # 列出所有可用命令
opencli list --json # JSON 格式输出
opencli validate # 验证所有 CLI 定义
opencli validate bilibili # 验证指定站点
opencli list # List all commands
opencli list --json # JSON output
opencli validate # Validate all CLI definitions
opencli validate bilibili # Validate specific site
```
### AI 工作流(为 AI Agent 设计)
### AI Agent Workflow
```bash
opencli explore <url> # 探索网站,生成 API 发现成果物
opencli synthesize <site> # 从探索成果物合成候选 CLI
opencli generate <url> --goal "hot" # 一键:探索 → 合成 → 注册
opencli verify <site/name> --smoke # 验证 + Smoke 测试
# Deep Explore: network intercept → response analysis → capability inference
opencli explore <url> --site <name>
# Synthesize: generate evaluate-based YAML pipelines from explore artifacts
opencli synthesize <site>
# Generate: one-shot explore → synthesize → register
opencli generate <url> --goal "hot"
# Strategy Cascade: auto-probe PUBLIC → COOKIE → HEADER
opencli cascade <api-url>
# Verify: smoke-test a generated adapter
opencli verify <site/name> --smoke
```
## 输出格式
## Output Formats
所有命令支持 `--format` / `-f` 选项:
All commands support `--format` / `-f`:
```bash
opencli bilibili hot -f table # 默认表格
opencli bilibili hot -f json # JSON
opencli bilibili hot -f table # Default: rich table
opencli bilibili hot -f json # JSON (pipe to jq, feed to AI agent)
opencli bilibili hot -f md # Markdown

@@ -103,13 +124,13 @@ opencli bilibili hot -f csv # CSV

## 调试
## Verbose Mode
```bash
opencli bilibili hot -v # 显示 pipeline 每步详情
opencli bilibili hot -v # Show each pipeline step and data flow
```
## 创建新的 CLI 适配器
## Creating Adapters
### YAML 方式(声明式,推荐)
### YAML Pipeline (declarative, recommended)
在 `src/clis/<site>/<name>.yaml` 创建文件:
Create `src/clis/<site>/<name>.yaml`:

@@ -119,3 +140,3 @@ ```yaml

name: hot
description: Hot topics on mysite
description: Hot topics
domain: www.mysite.com

@@ -137,7 +158,9 @@ strategy: cookie # public | cookie | header | intercept | ui

const res = await fetch('/api/hot', { credentials: 'include' });
return await res.json();
const d = await res.json();
return d.data.items.map(item => ({
title: item.title,
score: item.score,
}));
})()
- select: data.items
- map:

@@ -153,6 +176,21 @@ rank: ${{ index + 1 }}

### TypeScript 方式(编程式,更灵活)
For public APIs (no browser):
在 `src/clis/<site>/<name>.ts` 创建并在 `clis/index.ts` 中 import:
```yaml
strategy: public
browser: false
pipeline:
- fetch:
url: https://api.example.com/hot.json
- select: data.items
- map:
title: ${{ item.title }}
- limit: ${{ args.limit }}
```
### TypeScript Adapter (programmatic)
Create `src/clis/<site>/<name>.ts` and import in `clis/index.ts`:
```typescript

@@ -168,12 +206,13 @@ import { cli, Strategy } from '../../registry.js';

func: async (page, kwargs) => {
await page.goto('https://www.mysite.com');
const data = await page.evaluate(`
async () => {
const res = await fetch('/api/search?q=${kwargs.keyword}', { credentials: 'include' });
(async () => {
const res = await fetch('/api/search?q=${kwargs.keyword}', {
credentials: 'include'
});
return await res.json();
}
})()
`);
return data.items.map((item, i) => ({
rank: i + 1,
title: item.title,
url: item.url,
rank: i + 1, title: item.title, url: item.url,
}));

@@ -184,34 +223,36 @@ },

## Pipeline 步骤参考
**When to use TS**: XHR interception (小红书), cookie extraction (Twitter ct0), Wbi signing (Bilibili), auto-pagination, complex data transforms.
| 步骤 | 说明 | 示例 |
|------|------|------|
| `navigate` | 导航到 URL | `navigate: https://example.com` |
| `fetch` | HTTP 请求(使用浏览器 cookie) | `fetch: { url: "...", params: { q: "${{ args.keyword }}" } }` |
| `evaluate` | 执行 JavaScript | `evaluate: \| (async () => { ... })()` |
| `select` | 选取 JSON 路径 | `select: data.items` |
| `map` | 映射字段 | `map: { title: "${{ item.title }}" }` |
| `filter` | 过滤 | `filter: item.score > 100` |
| `sort` | 排序 | `sort: { by: score, order: desc }` |
| `limit` | 限制数量 | `limit: ${{ args.limit }}` |
| `snapshot` | 获取页面快照 | `snapshot: { interactive: true }` |
| `click` | 点击元素 | `click: ${{ ref }}` |
| `type` | 输入文本 | `type: { ref: "@1", text: "hello" }` |
| `wait` | 等待 | `wait: 2` 或 `wait: { text: "loaded" }` |
| `press` | 按键 | `press: Enter` |
## Pipeline Steps
## 模板语法
| Step | Description | Example |
|------|-------------|---------|
| `navigate` | Go to URL | `navigate: https://example.com` |
| `fetch` | HTTP request (browser cookies) | `fetch: { url: "...", params: { q: "..." } }` |
| `evaluate` | Run JavaScript in page | `evaluate: \| (async () => { ... })()` |
| `select` | Extract JSON path | `select: data.items` |
| `map` | Map fields | `map: { title: "${{ item.title }}" }` |
| `filter` | Filter items | `filter: item.score > 100` |
| `sort` | Sort items | `sort: { by: score, order: desc }` |
| `limit` | Cap result count | `limit: ${{ args.limit }}` |
| `intercept` | Declarative XHR capture | `intercept: { trigger: "navigate:...", capture: "api/hot" }` |
| `tap` | Store action + XHR capture | `tap: { store: "feed", action: "fetchFeeds", capture: "homefeed" }` |
| `snapshot` | Page accessibility tree | `snapshot: { interactive: true }` |
| `click` | Click element | `click: ${{ ref }}` |
| `type` | Type text | `type: { ref: "@1", text: "hello" }` |
| `wait` | Wait for time/text | `wait: 2` or `wait: { text: "loaded" }` |
| `press` | Press key | `press: Enter` |
使用 `${{ expression }}` 进行模板替换:
## Template Syntax
```yaml
# 引用参数
# Arguments with defaults
${{ args.keyword }}
${{ args.limit | default(20) }}
# 引用当前 item(在 map/filter 中)
# Current item (in map/filter)
${{ item.title }}
${{ item.data.nested.field }}
# 索引(从 0 开始)
# Index (0-based)
${{ index }}

@@ -221,19 +262,30 @@ ${{ index + 1 }}

## 环境变量
## 5-Tier Authentication Strategy
| 变量 | 默认值 | 说明 |
|------|--------|------|
| `OPENCLI_BROWSER_CONNECT_TIMEOUT` | 30 | 浏览器连接超时(秒) |
| `OPENCLI_BROWSER_COMMAND_TIMEOUT` | 45 | 命令执行超时(秒) |
| `OPENCLI_BROWSER_EXPLORE_TIMEOUT` | 120 | Explore 超时(秒) |
| `OPENCLI_EXTENSION_LOCK_TIMEOUT` | 120 | 扩展锁超时(秒) |
| `PLAYWRIGHT_MCP_EXTENSION_TOKEN` | — | 自动批准扩展连接 |
| Tier | Name | Method | Example |
|------|------|--------|---------|
| 1 | `public` | No auth, Node.js fetch | Hacker News, V2EX |
| 2 | `cookie` | Browser fetch with `credentials: include` | Bilibili, Zhihu |
| 3 | `header` | Custom headers (ct0, Bearer) | Twitter GraphQL |
| 4 | `intercept` | XHR interception + store mutation | 小红书 Pinia |
| 5 | `ui` | Full UI automation (click/type/scroll) | Last resort |
## 错误排查
## Environment Variables
| 错误 | 解决方案 |
|------|----------|
| `npx not found` | 安装 Node.js: `brew install node` |
| `Timed out connecting to browser` | 1) 确认 Chrome 已打开 2) 安装 Playwright MCP Bridge 扩展 3) 点击扩展图标批准 |
| `Extension lock timed out` | 等待其他 opencli 命令完成,浏览器命令需串行运行 |
| `Request timed out` | 增大 `OPENCLI_BROWSER_COMMAND_TIMEOUT` 或检查网络 |
| Variable | Default | Description |
|----------|---------|-------------|
| `OPENCLI_BROWSER_CONNECT_TIMEOUT` | 30 | Browser connection timeout (sec) |
| `OPENCLI_BROWSER_COMMAND_TIMEOUT` | 45 | Command execution timeout (sec) |
| `OPENCLI_BROWSER_EXPLORE_TIMEOUT` | 120 | Explore timeout (sec) |
| `OPENCLI_EXTENSION_LOCK_TIMEOUT` | 120 | Extension lock timeout (sec) |
| `PLAYWRIGHT_MCP_EXTENSION_TOKEN` | — | Auto-approve extension connection |
## Troubleshooting
| Issue | Solution |
|-------|----------|
| `npx not found` | Install Node.js: `brew install node` |
| `Timed out connecting to browser` | 1) Chrome must be open 2) Install MCP Bridge extension 3) Click to approve |
| `Extension lock timed out` | Another opencli command is running; browser commands run serially |
| `Target page context` error | Add `navigate:` step before `evaluate:` in YAML |
| Empty table data | Check if evaluate returns JSON string (MCP parsing) or data path is wrong |

@@ -39,3 +39,14 @@ /**

if (textParts.length === 1) {
const text = textParts[0].text;
let text = textParts[0].text;
// MCP browser_evaluate returns: "[JSON]\n### Ran Playwright code\n```js\n...\n```"
// Strip the "### Ran Playwright code" suffix to get clean JSON
const codeMarker = text.indexOf('### Ran Playwright code');
if (codeMarker !== -1) {
text = text.slice(0, codeMarker).trim();
}
// Also handle "### Result\n[JSON]" format (some MCP versions)
const resultMarker = text.indexOf('### Result\n');
if (resultMarker !== -1) {
text = text.slice(resultMarker + '### Result\n'.length).trim();
}
try { return JSON.parse(text); } catch { return text; }

@@ -119,2 +130,4 @@ }

private _page: Page | null = null;
async connect(opts: { timeout?: number } = {}): Promise<Page> {

@@ -142,2 +155,3 @@ await this._acquireLock();

);
this._page = page;

@@ -191,2 +205,19 @@ this._proc.stdout?.on('data', (chunk: Buffer) => {

try {
// Close tabs opened during this session (site tabs + extension tabs)
if (this._page && this._proc && !this._proc.killed) {
try {
const tabs = await this._page.tabs();
const tabStr = typeof tabs === 'string' ? tabs : JSON.stringify(tabs);
const allTabs = tabStr.match(/Tab (\d+)/g) || [];
const currentTabCount = allTabs.length;
// Close tabs in reverse order to avoid index shifting issues
// Keep the original tabs that existed before the command started
if (currentTabCount > this._initialTabCount && this._initialTabCount > 0) {
for (let i = currentTabCount - 1; i >= this._initialTabCount; i--) {
try { await this._page.closeTab(i); } catch {}
}
}
} catch {}
}
if (this._proc && !this._proc.killed) {

@@ -197,2 +228,3 @@ this._proc.kill('SIGTERM');

} finally {
this._page = null;
this._releaseLock();

@@ -199,0 +231,0 @@ }

@@ -19,2 +19,5 @@ /**

// zhihu
import './zhihu/search.js';
import './zhihu/question.js';
// xiaohongshu
import './xiaohongshu/search.js';

@@ -5,2 +5,4 @@ site: v2ex

domain: www.v2ex.com
strategy: public
browser: false

@@ -14,7 +16,4 @@ args:

pipeline:
- evaluate: |
(async () => {
const res = await fetch('https://www.v2ex.com/api/topics/hot.json');
return await res.json();
})()
- fetch:
url: https://www.v2ex.com/api/topics/hot.json

@@ -24,9 +23,6 @@ - map:

title: ${{ item.title }}
node: ${{ item.node?.title }}
author: ${{ item.member?.username }}
replies: ${{ item.replies }}
url: ${{ item.url }}
- limit: ${{ args.limit }}
columns: [rank, title, node, author, replies]
columns: [rank, title, replies]

@@ -5,2 +5,4 @@ site: v2ex

domain: www.v2ex.com
strategy: public
browser: false

@@ -14,7 +16,4 @@ args:

pipeline:
- evaluate: |
(async () => {
const res = await fetch('https://www.v2ex.com/api/topics/latest.json');
return await res.json();
})()
- fetch:
url: https://www.v2ex.com/api/topics/latest.json

@@ -24,4 +23,2 @@ - map:

title: ${{ item.title }}
node: ${{ item.node?.title }}
author: ${{ item.member?.username }}
replies: ${{ item.replies }}

@@ -31,2 +28,2 @@

columns: [rank, title, node, author, replies]
columns: [rank, title, replies]

@@ -15,15 +15,29 @@ site: zhihu

- fetch:
url: https://www.zhihu.com/api/v4/search/top_search
params:
limit: ${{ args.limit }}
- evaluate: |
(async () => {
const res = await fetch('https://www.zhihu.com/api/v3/feed/topstory/hot-lists/total?limit=50', {
credentials: 'include'
});
const d = await res.json();
return (d?.data || []).map((item) => {
const t = item.target || {};
return {
title: t.title,
url: 'https://www.zhihu.com/question/' + t.id,
answer_count: t.answer_count,
follower_count: t.follower_count,
heat: item.detail_text || '',
};
});
})()
- select: top_search.words
- map:
rank: ${{ index + 1 }}
title: ${{ item.query }}
title: ${{ item.title }}
heat: ${{ item.heat }}
answers: ${{ item.answer_count }}
url: ${{ item.url }}
- limit: ${{ args.limit }}
columns: [rank, title]
columns: [rank, title, heat, answers]
/**
* Deep Explore: intelligent API discovery with response analysis.
*
* Unlike simple page snapshots, Deep Explore intercepts network traffic,
* analyzes response schemas, and automatically infers capabilities that
* can be turned into CLI commands.
*
* Flow:
* 1. Navigate to target URL
* 2. Auto-scroll to trigger lazy loading
* 3. Capture network requests (with body analysis)
* 4. For each JSON response: detect list fields, infer columns, analyze auth
* 5. Detect frontend framework (Vue/React/Pinia/Next.js)
* 6. Generate structured capabilities.json
* Navigates to the target URL, auto-scrolls to trigger lazy loading,
* captures network traffic, analyzes JSON responses, and automatically
* infers CLI capabilities from discovered API endpoints.
*/

@@ -19,56 +11,51 @@

import * as path from 'node:path';
import { browserSession, DEFAULT_BROWSER_EXPLORE_TIMEOUT, runWithTimeout } from './runtime.js';
import { DEFAULT_BROWSER_EXPLORE_TIMEOUT, browserSession, runWithTimeout } from './runtime.js';
// ── Site name detection ────────────────────────────────────────────────────
const KNOWN_ALIASES: Record<string, string> = {
const KNOWN_SITE_ALIASES: Record<string, string> = {
'x.com': 'twitter', 'twitter.com': 'twitter',
'news.ycombinator.com': 'hackernews',
'www.zhihu.com': 'zhihu', 'www.bilibili.com': 'bilibili',
'search.bilibili.com': 'bilibili',
'www.v2ex.com': 'v2ex', 'www.reddit.com': 'reddit',
'www.xiaohongshu.com': 'xiaohongshu', 'www.douban.com': 'douban',
'www.weibo.com': 'weibo', 'search.bilibili.com': 'bilibili',
'www.weibo.com': 'weibo', 'www.bbc.com': 'bbc',
};
function detectSiteName(url: string): string {
export function detectSiteName(url: string): string {
try {
const host = new URL(url).hostname.toLowerCase();
if (host in KNOWN_ALIASES) return KNOWN_ALIASES[host];
if (host in KNOWN_SITE_ALIASES) return KNOWN_SITE_ALIASES[host];
const parts = host.split('.').filter(p => p && p !== 'www');
if (parts.length >= 2) {
if (['uk', 'jp', 'cn', 'com'].includes(parts[parts.length - 1]) && parts.length >= 3) {
return parts[parts.length - 3].replace(/[^a-z0-9]/g, '');
return slugify(parts[parts.length - 3]);
}
return parts[parts.length - 2].replace(/[^a-z0-9]/g, '');
return slugify(parts[parts.length - 2]);
}
return parts[0]?.replace(/[^a-z0-9]/g, '') ?? 'site';
return parts[0] ? slugify(parts[0]) : 'site';
} catch { return 'site'; }
}
export function slugify(value: string): string {
return value.trim().toLowerCase().replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-|-$/g, '') || 'site';
}
// ── Field & capability inference ───────────────────────────────────────────
/**
* Common field names grouped by semantic role.
* Used to auto-detect which response fields map to which columns.
*/
const FIELD_ROLES: Record<string, string[]> = {
title: ['title', 'name', 'text', 'content', 'desc', 'description', 'headline', 'subject'],
url: ['url', 'uri', 'link', 'href', 'permalink', 'jump_url', 'web_url', 'short_link', 'share_url'],
author: ['author', 'username', 'user_name', 'nickname', 'nick', 'owner', 'creator', 'up_name', 'uname'],
score: ['score', 'hot', 'heat', 'likes', 'like_count', 'view_count', 'views', 'stat', 'play', 'favorite_count', 'reply_count'],
time: ['time', 'created_at', 'publish_time', 'pub_time', 'date', 'ctime', 'mtime', 'pubdate', 'created'],
id: ['id', 'aid', 'bvid', 'mid', 'uid', 'oid', 'note_id', 'item_id'],
cover: ['cover', 'pic', 'image', 'thumbnail', 'poster', 'avatar'],
category: ['category', 'tag', 'type', 'tname', 'channel', 'section'],
title: ['title', 'name', 'text', 'content', 'desc', 'description', 'headline', 'subject'],
url: ['url', 'uri', 'link', 'href', 'permalink', 'jump_url', 'web_url', 'share_url'],
author: ['author', 'username', 'user_name', 'nickname', 'nick', 'owner', 'creator', 'up_name', 'uname'],
score: ['score', 'hot', 'heat', 'likes', 'like_count', 'view_count', 'views', 'play', 'favorite_count', 'reply_count'],
time: ['time', 'created_at', 'publish_time', 'pub_time', 'date', 'ctime', 'mtime', 'pubdate', 'created'],
id: ['id', 'aid', 'bvid', 'mid', 'uid', 'oid', 'note_id', 'item_id'],
cover: ['cover', 'pic', 'image', 'thumbnail', 'poster', 'avatar'],
category: ['category', 'tag', 'type', 'tname', 'channel', 'section'],
};
/** Param names that indicate searchable APIs */
const SEARCH_PARAMS = new Set(['q', 'query', 'keyword', 'search', 'wd', 'kw', 'search_query', 'w']);
/** Param names that indicate pagination */
const PAGINATION_PARAMS = new Set(['page', 'pn', 'offset', 'cursor', 'next', 'page_num']);
/** Param names that indicate limit control */
const LIMIT_PARAMS = new Set(['limit', 'count', 'size', 'per_page', 'page_size', 'ps', 'num']);
/** Content types to ignore */
const IGNORED_CONTENT_TYPES = new Set(['image/', 'font/', 'text/css', 'text/javascript', 'application/javascript', 'application/wasm']);
/** Volatile query params to strip from patterns */
const VOLATILE_PARAMS = new Set(['w_rid', 'wts', '_', 'callback', 'timestamp', 't', 'nonce', 'sign']);

@@ -79,39 +66,17 @@

interface NetworkEntry {
method: string;
url: string;
status: number | null;
contentType: string;
responseBody?: any;
requestHeaders?: Record<string, string>;
queryParams?: Record<string, string>;
method: string; url: string; status: number | null;
contentType: string; responseBody?: any; requestHeaders?: Record<string, string>;
}
interface AnalyzedEndpoint {
pattern: string;
method: string;
url: string;
status: number | null;
contentType: string;
queryParams: string[];
hasSearchParam: boolean;
hasPaginationParam: boolean;
hasLimitParam: boolean;
pattern: string; method: string; url: string; status: number | null;
contentType: string; queryParams: string[]; score: number;
hasSearchParam: boolean; hasPaginationParam: boolean; hasLimitParam: boolean;
authIndicators: string[];
responseAnalysis: ResponseAnalysis | null;
responseAnalysis: { itemPath: string | null; itemCount: number; detectedFields: Record<string, string>; sampleFields: string[] } | null;
}
interface ResponseAnalysis {
itemPath: string | null;
itemCount: number;
detectedFields: Record<string, string>; // role → actual field name
sampleFieldNames: string[];
}
interface InferredCapability {
name: string;
description: string;
strategy: string;
confidence: number;
endpoint: string;
itemPath: string | null;
name: string; description: string; strategy: string; confidence: number;
endpoint: string; itemPath: string | null;
recommendedColumns: string[];

@@ -122,37 +87,18 @@ recommendedArgs: Array<{ name: string; type: string; required: boolean; default?: any }>;

/**
* Parse raw network output from Playwright MCP into structured entries.
* Handles both text format ([GET] url => [200]) and structured JSON.
* Parse raw network output from Playwright MCP.
* Handles text format: [GET] url => [200]
*/
function parseNetworkOutput(raw: any): NetworkEntry[] {
function parseNetworkRequests(raw: any): NetworkEntry[] {
if (typeof raw === 'string') {
// Playwright MCP returns network as text lines like:
// "[GET] https://api.example.com/xxx => [200] "
// May also have markdown headers like "### Result"
const entries: NetworkEntry[] = [];
const lines = raw.split('\n').filter((l: string) => l.trim());
for (const line of lines) {
// Format: [METHOD] URL => [STATUS] optional_extra
const bracketMatch = line.match(/^\[?(GET|POST|PUT|DELETE|PATCH|OPTIONS)\]?\s+(\S+)\s*(?:=>|→)\s*\[?(\d+)\]?/i);
if (bracketMatch) {
const [, method, url, status] = bracketMatch;
for (const line of raw.split('\n')) {
// Format: [GET] URL => [200]
const m = line.match(/\[?(GET|POST|PUT|DELETE|PATCH|OPTIONS)\]?\s+(\S+)\s*(?:=>|→)\s*\[?(\d+)\]?/i);
if (m) {
const [, method, url, status] = m;
entries.push({
method: method.toUpperCase(),
url,
status: status ? parseInt(status) : null,
contentType: url.endsWith('.json') ? 'application/json' :
(url.includes('/api/') || url.includes('/x/')) ? 'application/json' : '',
method: method.toUpperCase(), url, status: status ? parseInt(status) : null,
contentType: (url.includes('/api/') || url.includes('/x/') || url.endsWith('.json')) ? 'application/json' : '',
});
continue;
}
// Legacy format: GET url → 200 (application/json)
const legacyMatch = line.match(/^(GET|POST|PUT|DELETE|PATCH|OPTIONS)\s+(\S+)\s*→?\s*(\d+)?\s*(?:\(([^)]*)\))?/i);
if (legacyMatch) {
const [, method, url, status, ct] = legacyMatch;
entries.push({
method: method.toUpperCase(),
url,
status: status ? parseInt(status) : null,
contentType: ct ?? '',
});
}
}

@@ -162,9 +108,8 @@ return entries;

if (Array.isArray(raw)) {
return raw.map((e: any) => ({
return raw.filter(e => e && typeof e === 'object').map(e => ({
method: (e.method ?? 'GET').toUpperCase(),
url: e.url ?? e.request?.url ?? '',
url: String(e.url ?? e.request?.url ?? e.requestUrl ?? ''),
status: e.status ?? e.statusCode ?? null,
contentType: e.contentType ?? e.mimeType ?? '',
responseBody: e.responseBody ?? e.body,
requestHeaders: e.requestHeaders ?? e.headers,
contentType: e.contentType ?? e.response?.contentType ?? '',
responseBody: e.responseBody, requestHeaders: e.requestHeaders,
}));

@@ -175,37 +120,12 @@ }

/**
* Normalize a URL into a pattern by replacing IDs with placeholders.
*/
function urlToPattern(url: string): string {
try {
const parsed = new URL(url);
const pathNorm = parsed.pathname
.replace(/\/\d+/g, '/{id}')
.replace(/\/[0-9a-fA-F]{8,}/g, '/{hex}')
.replace(/\/BV[a-zA-Z0-9]{10}/g, '/{bvid}');
const p = new URL(url);
const pathNorm = p.pathname.replace(/\/\d+/g, '/{id}').replace(/\/[0-9a-fA-F]{8,}/g, '/{hex}').replace(/\/BV[a-zA-Z0-9]{10}/g, '/{bvid}');
const params: string[] = [];
parsed.searchParams.forEach((_v, k) => {
if (!VOLATILE_PARAMS.has(k)) params.push(k);
});
const qs = params.length ? '?' + params.sort().map(k => `${k}={}`).join('&') : '';
return `${parsed.host}${pathNorm}${qs}`;
p.searchParams.forEach((_v, k) => { if (!VOLATILE_PARAMS.has(k)) params.push(k); });
return `${p.host}${pathNorm}${params.length ? '?' + params.sort().map(k => `${k}={}`).join('&') : ''}`;
} catch { return url; }
}
/**
* Extract query params from a URL.
*/
function extractQueryParams(url: string): Record<string, string> {
try {
const params: Record<string, string> = {};
new URL(url).searchParams.forEach((v, k) => { params[k] = v; });
return params;
} catch { return {}; }
}
/**
* Detect auth indicators from request headers.
*/
function detectAuthIndicators(headers?: Record<string, string>): string[] {

@@ -218,76 +138,43 @@ if (!headers) return [];

if (keys.some(k => k.startsWith('x-s') || k === 'x-t' || k === 'x-s-common')) indicators.push('signature');
if (keys.some(k => k === 'x-client-transaction-id')) indicators.push('transaction');
return indicators;
}
/**
* Analyze a JSON response to find list data and field mappings.
*/
function analyzeResponseBody(body: any): ResponseAnalysis | null {
function analyzeResponseBody(body: any): AnalyzedEndpoint['responseAnalysis'] {
if (!body || typeof body !== 'object') return null;
// Try to find the main list in the response
const candidates: Array<{ path: string; items: any[] }> = [];
function findArrays(obj: any, currentPath: string, depth: number) {
function findArrays(obj: any, path: string, depth: number) {
if (depth > 4) return;
if (Array.isArray(obj) && obj.length >= 2) {
// Check if items are objects (not primitive arrays)
if (obj.some(item => item && typeof item === 'object' && !Array.isArray(item))) {
candidates.push({ path: currentPath, items: obj });
}
if (Array.isArray(obj) && obj.length >= 2 && obj.some(item => item && typeof item === 'object' && !Array.isArray(item))) {
candidates.push({ path, items: obj });
}
if (obj && typeof obj === 'object' && !Array.isArray(obj)) {
for (const [key, val] of Object.entries(obj)) {
const nextPath = currentPath ? `${currentPath}.${key}` : key;
findArrays(val, nextPath, depth + 1);
}
for (const [key, val] of Object.entries(obj)) findArrays(val, path ? `${path}.${key}` : key, depth + 1);
}
}
findArrays(body, '', 0);
if (!candidates.length) return null;
// Pick the largest array as the main list
candidates.sort((a, b) => b.items.length - a.items.length);
const best = candidates[0];
const sample = best.items[0];
const sampleFields = sample && typeof sample === 'object' ? flattenFields(sample, '', 2) : [];
// Analyze field names in the first item
const sampleItem = best.items[0];
const sampleFieldNames = sampleItem && typeof sampleItem === 'object'
? flattenFieldNames(sampleItem, '', 2)
: [];
// Match fields to semantic roles
const detectedFields: Record<string, string> = {};
for (const [role, aliases] of Object.entries(FIELD_ROLES)) {
for (const fieldName of sampleFieldNames) {
const basename = fieldName.split('.').pop()?.toLowerCase() ?? '';
if (aliases.includes(basename)) {
detectedFields[role] = fieldName;
break;
}
for (const f of sampleFields) {
if (aliases.includes(f.split('.').pop()?.toLowerCase() ?? '')) { detectedFields[role] = f; break; }
}
}
return {
itemPath: best.path || null,
itemCount: best.items.length,
detectedFields,
sampleFieldNames,
};
return { itemPath: best.path || null, itemCount: best.items.length, detectedFields, sampleFields };
}
/**
* Flatten nested object field names for analysis.
*/
function flattenFieldNames(obj: any, prefix: string, maxDepth: number): string[] {
function flattenFields(obj: any, prefix: string, maxDepth: number): string[] {
if (maxDepth <= 0 || !obj || typeof obj !== 'object') return [];
const names: string[] = [];
for (const key of Object.keys(obj)) {
const fullKey = prefix ? `${prefix}.${key}` : key;
names.push(fullKey);
if (obj[key] && typeof obj[key] === 'object' && !Array.isArray(obj[key])) {
names.push(...flattenFieldNames(obj[key], fullKey, maxDepth - 1));
}
const full = prefix ? `${prefix}.${key}` : key;
names.push(full);
if (obj[key] && typeof obj[key] === 'object' && !Array.isArray(obj[key])) names.push(...flattenFields(obj[key], full, maxDepth - 1));
}

@@ -297,266 +184,192 @@ return names;

/**
* Analyze a list of network entries into structured endpoints.
*/
function analyzeEndpoints(entries: NetworkEntry[], siteHost: string): AnalyzedEndpoint[] {
const seen = new Map<string, AnalyzedEndpoint>();
for (const entry of entries) {
if (!entry.url) continue;
// Skip static resources
const ct = entry.contentType.toLowerCase();
if (IGNORED_CONTENT_TYPES.has(ct.split(';')[0]?.trim() ?? '') ||
ct.includes('image/') || ct.includes('font/') || ct.includes('css') ||
ct.includes('javascript') || ct.includes('wasm')) continue;
// Skip non-JSON and failed responses
if (entry.status && entry.status >= 400) continue;
const pattern = urlToPattern(entry.url);
const queryParams = extractQueryParams(entry.url);
const paramNames = Object.keys(queryParams).filter(k => !VOLATILE_PARAMS.has(k));
const key = `${entry.method}:${pattern}`;
if (seen.has(key)) continue;
const endpoint: AnalyzedEndpoint = {
pattern,
method: entry.method,
url: entry.url,
status: entry.status,
contentType: ct,
queryParams: paramNames,
hasSearchParam: paramNames.some(p => SEARCH_PARAMS.has(p)),
hasPaginationParam: paramNames.some(p => PAGINATION_PARAMS.has(p)),
hasLimitParam: paramNames.some(p => LIMIT_PARAMS.has(p)),
authIndicators: detectAuthIndicators(entry.requestHeaders),
responseAnalysis: entry.responseBody ? analyzeResponseBody(entry.responseBody) : null,
};
seen.set(key, endpoint);
}
return [...seen.values()];
function scoreEndpoint(ep: { contentType: string; responseAnalysis: any; pattern: string; status: number | null; hasSearchParam: boolean; hasPaginationParam: boolean; hasLimitParam: boolean }): number {
let s = 0;
if (ep.contentType.includes('json')) s += 10;
if (ep.responseAnalysis) { s += 5; s += Math.min(ep.responseAnalysis.itemCount, 10); s += Object.keys(ep.responseAnalysis.detectedFields).length * 2; }
if (ep.pattern.includes('/api/') || ep.pattern.includes('/x/')) s += 3;
if (ep.hasSearchParam) s += 3;
if (ep.hasPaginationParam) s += 2;
if (ep.hasLimitParam) s += 2;
if (ep.status === 200) s += 2;
return s;
}
/**
* Infer what strategy to use based on endpoint analysis.
*/
function inferStrategy(endpoint: AnalyzedEndpoint): string {
if (endpoint.authIndicators.includes('signature')) return 'intercept';
if (endpoint.authIndicators.includes('transaction')) return 'header';
if (endpoint.authIndicators.includes('bearer') || endpoint.authIndicators.includes('csrf')) return 'header';
// Check if the URL is a public API (no auth indicators)
if (endpoint.authIndicators.length === 0) {
// If it's the same domain, likely cookie auth
return 'cookie';
}
return 'cookie';
}
/**
* Infer the capability name from an endpoint pattern.
*/
function inferCapabilityName(endpoint: AnalyzedEndpoint, goal?: string): string {
function inferCapabilityName(url: string, goal?: string): string {
if (goal) return goal;
const u = endpoint.url.toLowerCase();
const p = endpoint.pattern.toLowerCase();
// Match common patterns
if (endpoint.hasSearchParam) return 'search';
const u = url.toLowerCase();
if (u.includes('hot') || u.includes('popular') || u.includes('ranking') || u.includes('trending')) return 'hot';
if (u.includes('search')) return 'search';
if (u.includes('feed') || u.includes('timeline') || u.includes('dynamic')) return 'feed';
if (u.includes('comment') || u.includes('reply')) return 'comments';
if (u.includes('history')) return 'history';
if (u.includes('profile') || u.includes('userinfo') || u.includes('/me') || u.includes('myinfo')) return 'me';
if (u.includes('video') || u.includes('article') || u.includes('detail') || u.includes('view')) return 'detail';
if (u.includes('profile') || u.includes('userinfo') || u.includes('/me')) return 'me';
if (u.includes('favorite') || u.includes('collect') || u.includes('bookmark')) return 'favorite';
if (u.includes('notification') || u.includes('notice')) return 'notifications';
// Fallback: try to extract from path
try {
const pathname = new URL(endpoint.url).pathname;
const segments = pathname.split('/').filter(s => s && !s.match(/^\d+$/) && !s.match(/^[0-9a-f]{8,}$/i));
if (segments.length) return segments[segments.length - 1].replace(/[^a-z0-9]/gi, '_').toLowerCase();
const segs = new URL(url).pathname.split('/').filter(s => s && !s.match(/^\d+$/) && !s.match(/^[0-9a-f]{8,}$/i));
if (segs.length) return segs[segs.length - 1].replace(/[^a-z0-9]/gi, '_').toLowerCase();
} catch {}
return 'data';
}
/**
* Build recommended columns from response analysis.
*/
function buildRecommendedColumns(analysis: ResponseAnalysis | null): string[] {
if (!analysis) return ['title', 'url'];
const cols: string[] = [];
// Prioritize: title → url → author → score → time
const priority = ['title', 'url', 'author', 'score', 'time'];
for (const role of priority) {
if (analysis.detectedFields[role]) cols.push(role);
}
return cols.length ? cols : ['title', 'url'];
function inferStrategy(authIndicators: string[]): string {
if (authIndicators.includes('signature')) return 'intercept';
if (authIndicators.includes('bearer') || authIndicators.includes('csrf')) return 'header';
return 'cookie';
}
/**
* Build recommended args from endpoint query params.
*/
function buildRecommendedArgs(endpoint: AnalyzedEndpoint): InferredCapability['recommendedArgs'] {
const args: InferredCapability['recommendedArgs'] = [];
// ── Framework detection ────────────────────────────────────────────────────
if (endpoint.hasSearchParam) {
const paramName = endpoint.queryParams.find(p => SEARCH_PARAMS.has(p)) ?? 'keyword';
args.push({ name: 'keyword', type: 'str', required: true });
const FRAMEWORK_DETECT_JS = `
() => {
const r = {};
try {
const app = document.querySelector('#app');
r.vue3 = !!(app && app.__vue_app__);
r.vue2 = !!(app && app.__vue__);
r.react = !!window.__REACT_DEVTOOLS_GLOBAL_HOOK__ || !!document.querySelector('[data-reactroot]');
r.nextjs = !!window.__NEXT_DATA__;
r.nuxt = !!window.__NUXT__;
if (r.vue3 && app.__vue_app__) { const gp = app.__vue_app__.config?.globalProperties; r.pinia = !!(gp && gp.$pinia); r.vuex = !!(gp && gp.$store); }
} catch {}
return r;
}
`;
// Always add limit
args.push({ name: 'limit', type: 'int', required: false, default: 20 });
// ── Store discovery ────────────────────────────────────────────────────────
if (endpoint.hasPaginationParam) {
args.push({ name: 'page', type: 'int', required: false, default: 1 });
}
const STORE_DISCOVER_JS = `
() => {
const stores = [];
try {
const app = document.querySelector('#app');
if (!app?.__vue_app__) return stores;
const gp = app.__vue_app__.config?.globalProperties;
return args;
}
// Pinia stores
const pinia = gp?.$pinia;
if (pinia?._s) {
pinia._s.forEach((store, id) => {
const actions = [];
const stateKeys = [];
for (const k in store) {
try {
if (k.startsWith('$') || k.startsWith('_')) continue;
if (typeof store[k] === 'function') actions.push(k);
else stateKeys.push(k);
} catch {}
}
stores.push({ type: 'pinia', id, actions: actions.slice(0, 20), stateKeys: stateKeys.slice(0, 15) });
});
}
/**
* Score an endpoint's interest level for capability generation.
* Higher score = more likely to be a useful API endpoint.
*/
function scoreEndpoint(ep: AnalyzedEndpoint): number {
let score = 0;
// JSON content type is strongly preferred
if (ep.contentType.includes('json')) score += 10;
// Has response analysis with items
if (ep.responseAnalysis) {
score += 5;
score += Math.min(ep.responseAnalysis.itemCount, 10);
score += Object.keys(ep.responseAnalysis.detectedFields).length * 2;
// Vuex store modules
const vuex = gp?.$store;
if (vuex?._modules?.root?._children) {
const children = vuex._modules.root._children;
for (const [modName, mod] of Object.entries(children)) {
const actions = Object.keys(mod._rawModule?.actions ?? {}).slice(0, 20);
const stateKeys = Object.keys(mod.state ?? {}).slice(0, 15);
stores.push({ type: 'vuex', id: modName, actions, stateKeys });
}
}
} catch {}
return stores;
}
// API-like path patterns
if (ep.pattern.includes('/api/') || ep.pattern.includes('/x/')) score += 3;
// Has search/pagination params
if (ep.hasSearchParam) score += 3;
if (ep.hasPaginationParam) score += 2;
if (ep.hasLimitParam) score += 2;
// 200 OK
if (ep.status === 200) score += 2;
return score;
`;
export interface DiscoveredStore {
type: 'pinia' | 'vuex';
id: string;
actions: string[];
stateKeys: string[];
}
// ── Framework detection ────────────────────────────────────────────────────
const FRAMEWORK_DETECT_JS = `
(() => {
const result = {};
try {
const app = document.querySelector('#app');
result.vue3 = !!(app && app.__vue_app__);
result.vue2 = !!(app && app.__vue__);
result.react = !!window.__REACT_DEVTOOLS_GLOBAL_HOOK__ || !!document.querySelector('[data-reactroot]');
result.nextjs = !!window.__NEXT_DATA__;
result.nuxt = !!window.__NUXT__;
if (result.vue3 && app.__vue_app__) {
const gp = app.__vue_app__.config?.globalProperties;
result.pinia = !!(gp && gp.$pinia);
result.vuex = !!(gp && gp.$store);
}
} catch {}
return JSON.stringify(result);
})()
`;
// ── Main explore function ──────────────────────────────────────────────────
export async function exploreUrl(url: string, opts: any = {}): Promise<any> {
const site = opts.site ?? detectSiteName(url);
const outDir = opts.outDir ?? path.join('.opencli', 'explore', site);
fs.mkdirSync(outDir, { recursive: true });
export async function exploreUrl(
url: string,
opts: {
BrowserFactory: new () => any;
site?: string; goal?: string; authenticated?: boolean;
outDir?: string; waitSeconds?: number; query?: string;
clickLabels?: string[]; auto?: boolean;
},
): Promise<Record<string, any>> {
const waitSeconds = opts.waitSeconds ?? 3.0;
const exploreTimeout = Math.max(DEFAULT_BROWSER_EXPLORE_TIMEOUT, 45.0 + waitSeconds * 8.0);
const result: any = await browserSession(opts.BrowserFactory, async (page: any) => {
return browserSession(opts.BrowserFactory, async (page) => {
return runWithTimeout((async () => {
// Step 1: Navigate
await page.goto(url);
await page.wait(opts.waitSeconds ?? 3);
await page.wait(waitSeconds);
// Step 2: Auto-scroll to trigger lazy loading
for (let i = 0; i < 3; i++) {
await page.scroll('down');
await page.wait(1);
}
// Step 2: Auto-scroll to trigger lazy loading (use keyboard since page.scroll may not exist)
for (let i = 0; i < 3; i++) { try { await page.pressKey('End'); } catch {} await page.wait(1); }
// Step 3: Capture network traffic
// Step 3: Read page metadata
const metadata = await readPageMetadata(page);
// Step 4: Capture network traffic
const rawNetwork = await page.networkRequests(false);
const networkEntries = parseNetworkOutput(rawNetwork);
const networkEntries = parseNetworkRequests(rawNetwork);
// Step 4: For JSON endpoints, try to fetch response body in-browser
const jsonEndpoints = networkEntries.filter(e =>
e.contentType.includes('json') && e.method === 'GET' && e.status === 200
);
// Step 5: For JSON endpoints, re-fetch response body in-browser
const jsonEndpoints = networkEntries.filter(e => e.contentType.includes('json') && e.method === 'GET' && e.status === 200);
for (const ep of jsonEndpoints.slice(0, 10)) {
// Only fetch body for promising-looking API endpoints
if (ep.url.includes('/api/') || ep.url.includes('/x/') || ep.url.includes('/web/') ||
ep.contentType.includes('json')) {
try {
const bodyResult = await page.evaluate(`
async () => {
try {
const resp = await fetch(${JSON.stringify(ep.url)}, { credentials: 'include' });
if (!resp.ok) return null;
const data = await resp.json();
return JSON.stringify(data).slice(0, 10000);
} catch { return null; }
}
`);
if (bodyResult && typeof bodyResult === 'string') {
try { ep.responseBody = JSON.parse(bodyResult); } catch {}
} else if (bodyResult && typeof bodyResult === 'object') {
ep.responseBody = bodyResult;
}
} catch {}
}
try {
const body = await page.evaluate(`async () => { try { const r = await fetch(${JSON.stringify(ep.url)}, {credentials:'include'}); if (!r.ok) return null; const d = await r.json(); return JSON.stringify(d).slice(0,10000); } catch { return null; } }`);
if (body && typeof body === 'string') { try { ep.responseBody = JSON.parse(body); } catch {} }
else if (body && typeof body === 'object') ep.responseBody = body;
} catch {}
}
// Step 5: Detect frontend framework
// Step 6: Detect framework
let framework: Record<string, boolean> = {};
try {
const fwResult = await page.evaluate(FRAMEWORK_DETECT_JS);
if (typeof fwResult === 'string') framework = JSON.parse(fwResult);
else if (typeof fwResult === 'object') framework = fwResult;
} catch {}
try { const fw = await page.evaluate(FRAMEWORK_DETECT_JS); if (fw && typeof fw === 'object') framework = fw; } catch {}
// Step 6: Get page metadata
let title = '', finalUrl = '';
try {
const meta = await page.evaluate(`
() => JSON.stringify({ url: window.location.href, title: document.title || '' })
`);
if (typeof meta === 'string') {
const parsed = JSON.parse(meta);
title = parsed.title; finalUrl = parsed.url;
} else if (typeof meta === 'object') {
title = meta.title; finalUrl = meta.url;
}
} catch {}
// Step 6.5: Discover stores (Pinia / Vuex)
let stores: DiscoveredStore[] = [];
if (framework.pinia || framework.vuex) {
try {
const raw = await page.evaluate(STORE_DISCOVER_JS);
if (Array.isArray(raw)) stores = raw;
} catch {}
}
// Step 7: Analyze endpoints
let siteHost = '';
try { siteHost = new URL(url).hostname; } catch {}
const analyzedEndpoints = analyzeEndpoints(networkEntries, siteHost);
const seen = new Map<string, AnalyzedEndpoint>();
for (const entry of networkEntries) {
if (!entry.url) continue;
const ct = entry.contentType.toLowerCase();
if (ct.includes('image/') || ct.includes('font/') || ct.includes('css') || ct.includes('javascript') || ct.includes('wasm')) continue;
if (entry.status && entry.status >= 400) continue;
// Step 8: Score and rank endpoints
const scoredEndpoints = analyzedEndpoints
.map(ep => ({ ...ep, score: scoreEndpoint(ep) }))
.filter(ep => ep.score >= 5)
.sort((a, b) => b.score - a.score);
const pattern = urlToPattern(entry.url);
const key = `${entry.method}:${pattern}`;
if (seen.has(key)) continue;
// Step 9: Infer capabilities from top endpoints
const qp: string[] = [];
try { new URL(entry.url).searchParams.forEach((_v, k) => { if (!VOLATILE_PARAMS.has(k)) qp.push(k); }); } catch {}
const ep: AnalyzedEndpoint = {
pattern, method: entry.method, url: entry.url, status: entry.status, contentType: ct,
queryParams: qp, hasSearchParam: qp.some(p => SEARCH_PARAMS.has(p)),
hasPaginationParam: qp.some(p => PAGINATION_PARAMS.has(p)),
hasLimitParam: qp.some(p => LIMIT_PARAMS.has(p)),
authIndicators: detectAuthIndicators(entry.requestHeaders),
responseAnalysis: entry.responseBody ? analyzeResponseBody(entry.responseBody) : null,
score: 0,
};
ep.score = scoreEndpoint(ep);
seen.set(key, ep);
}
const analyzedEndpoints = [...seen.values()].filter(ep => ep.score >= 5).sort((a, b) => b.score - a.score);
// Step 8: Infer capabilities
const capabilities: InferredCapability[] = [];
const usedNames = new Set<string>();
for (const ep of scoredEndpoints.slice(0, 8)) {
let capName = inferCapabilityName(ep, opts.goal);
// Deduplicate names
for (const ep of analyzedEndpoints.slice(0, 8)) {
let capName = inferCapabilityName(ep.url, opts.goal);
if (usedNames.has(capName)) {

@@ -568,78 +381,87 @@ const suffix = ep.pattern.split('/').filter(s => s && !s.startsWith('{') && !s.includes('.')).pop();

const cols: string[] = [];
if (ep.responseAnalysis) {
for (const role of ['title', 'url', 'author', 'score', 'time']) {
if (ep.responseAnalysis.detectedFields[role]) cols.push(role);
}
}
const args: InferredCapability['recommendedArgs'] = [];
if (ep.hasSearchParam) args.push({ name: 'keyword', type: 'str', required: true });
args.push({ name: 'limit', type: 'int', required: false, default: 20 });
if (ep.hasPaginationParam) args.push({ name: 'page', type: 'int', required: false, default: 1 });
// Link store actions to capabilities when store-action strategy is recommended
const epStrategy = inferStrategy(ep.authIndicators);
let storeHint: { store: string; action: string } | undefined;
if ((epStrategy === 'intercept' || ep.authIndicators.includes('signature')) && stores.length > 0) {
// Try to find a store/action that matches this endpoint's purpose
for (const s of stores) {
const matchingAction = s.actions.find(a =>
capName.split('_').some(part => a.toLowerCase().includes(part)) ||
a.toLowerCase().includes('fetch') || a.toLowerCase().includes('get')
);
if (matchingAction) {
storeHint = { store: s.id, action: matchingAction };
break;
}
}
}
capabilities.push({
name: capName,
description: `${site} ${capName}`,
strategy: inferStrategy(ep),
confidence: Math.min(ep.score / 20, 1.0),
endpoint: ep.pattern,
name: capName, description: `${opts.site ?? detectSiteName(url)} ${capName}`,
strategy: storeHint ? 'store-action' : epStrategy,
confidence: Math.min(ep.score / 20, 1.0), endpoint: ep.pattern,
itemPath: ep.responseAnalysis?.itemPath ?? null,
recommendedColumns: buildRecommendedColumns(ep.responseAnalysis),
recommendedArgs: buildRecommendedArgs(ep),
recommendedColumns: cols.length ? cols : ['title', 'url'],
recommendedArgs: args,
...(storeHint ? { storeHint } : {}),
});
}
// Step 10: Determine auth strategy
const allAuthIndicators = new Set(analyzedEndpoints.flatMap(ep => ep.authIndicators));
let topStrategy = 'cookie';
if (allAuthIndicators.has('signature')) topStrategy = 'intercept';
else if (allAuthIndicators.has('transaction') || allAuthIndicators.has('bearer')) topStrategy = 'header';
else if (allAuthIndicators.size === 0 && scoredEndpoints.some(ep => ep.contentType.includes('json'))) topStrategy = 'public';
// Step 9: Determine overall auth strategy
const allAuth = new Set(analyzedEndpoints.flatMap(ep => ep.authIndicators));
const topStrategy = allAuth.has('signature') ? 'intercept' : allAuth.has('bearer') || allAuth.has('csrf') ? 'header' : allAuth.size === 0 ? 'public' : 'cookie';
return {
site,
target_url: url,
final_url: finalUrl,
title,
framework,
top_strategy: topStrategy,
endpoint_count: analyzedEndpoints.length,
api_endpoint_count: scoredEndpoints.length,
capabilities,
endpoints: scoredEndpoints.map(ep => ({
pattern: ep.pattern,
method: ep.method,
url: ep.url,
status: ep.status,
contentType: ep.contentType,
score: ep.score,
queryParams: ep.queryParams,
itemPath: ep.responseAnalysis?.itemPath ?? null,
itemCount: ep.responseAnalysis?.itemCount ?? 0,
detectedFields: ep.responseAnalysis?.detectedFields ?? {},
authIndicators: ep.authIndicators,
})),
auth_indicators: [...allAuthIndicators],
const siteName = opts.site ?? detectSiteName(metadata.url || url);
const targetDir = opts.outDir ?? path.join('.opencli', 'explore', siteName);
fs.mkdirSync(targetDir, { recursive: true });
const result = {
site: siteName, target_url: url, final_url: metadata.url, title: metadata.title,
framework, stores, top_strategy: topStrategy,
endpoint_count: analyzedEndpoints.length + [...seen.values()].filter(ep => ep.score < 5).length,
api_endpoint_count: analyzedEndpoints.length,
capabilities, auth_indicators: [...allAuth],
};
})(), { timeout: DEFAULT_BROWSER_EXPLORE_TIMEOUT, label: 'explore' });
});
// Write artifacts
const manifest = {
site: result.site,
target_url: result.target_url,
final_url: result.final_url,
title: result.title,
framework: result.framework,
top_strategy: result.top_strategy,
explored_at: new Date().toISOString(),
};
fs.writeFileSync(path.join(outDir, 'manifest.json'), JSON.stringify(manifest, null, 2));
fs.writeFileSync(path.join(outDir, 'endpoints.json'), JSON.stringify(result.endpoints ?? [], null, 2));
fs.writeFileSync(path.join(outDir, 'capabilities.json'), JSON.stringify(result.capabilities ?? [], null, 2));
fs.writeFileSync(path.join(outDir, 'auth.json'), JSON.stringify({
top_strategy: result.top_strategy,
indicators: result.auth_indicators ?? [],
framework: result.framework ?? {},
}, null, 2));
// Write artifacts
fs.writeFileSync(path.join(targetDir, 'manifest.json'), JSON.stringify({
site: siteName, target_url: url, final_url: metadata.url, title: metadata.title,
framework, stores: stores.map(s => ({ type: s.type, id: s.id, actions: s.actions })),
top_strategy: topStrategy, explored_at: new Date().toISOString(),
}, null, 2));
fs.writeFileSync(path.join(targetDir, 'endpoints.json'), JSON.stringify(analyzedEndpoints.map(ep => ({
pattern: ep.pattern, method: ep.method, url: ep.url, status: ep.status,
contentType: ep.contentType, score: ep.score, queryParams: ep.queryParams,
itemPath: ep.responseAnalysis?.itemPath ?? null, itemCount: ep.responseAnalysis?.itemCount ?? 0,
detectedFields: ep.responseAnalysis?.detectedFields ?? {}, authIndicators: ep.authIndicators,
})), null, 2));
fs.writeFileSync(path.join(targetDir, 'capabilities.json'), JSON.stringify(capabilities, null, 2));
fs.writeFileSync(path.join(targetDir, 'auth.json'), JSON.stringify({
top_strategy: topStrategy, indicators: [...allAuth], framework,
}, null, 2));
if (stores.length > 0) {
fs.writeFileSync(path.join(targetDir, 'stores.json'), JSON.stringify(stores, null, 2));
}
return { ...result, out_dir: outDir };
return { ...result, out_dir: targetDir };
})(), { timeout: exploreTimeout, label: `Explore ${url}` });
});
}
export function renderExploreSummary(result: any): string {
export function renderExploreSummary(result: Record<string, any>): string {
const lines = [
'opencli explore: OK',
`Site: ${result.site}`,
`URL: ${result.target_url}`,
`Title: ${result.title || '(none)'}`,
`Strategy: ${result.top_strategy}`,
'opencli probe: OK', `Site: ${result.site}`, `URL: ${result.target_url}`,
`Title: ${result.title || '(none)'}`, `Strategy: ${result.top_strategy}`,
`Endpoints: ${result.endpoint_count} total, ${result.api_endpoint_count} API`,

@@ -649,3 +471,4 @@ `Capabilities: ${result.capabilities?.length ?? 0}`,

for (const cap of (result.capabilities ?? []).slice(0, 5)) {
lines.push(` • ${cap.name} (${cap.strategy}, confidence: ${(cap.confidence * 100).toFixed(0)}%)`);
const storeInfo = cap.storeHint ? ` → ${cap.storeHint.store}.${cap.storeHint.action}()` : '';
lines.push(` • ${cap.name} (${cap.strategy}, ${(cap.confidence * 100).toFixed(0)}%)${storeInfo}`);
}

@@ -655,4 +478,19 @@ const fw = result.framework ?? {};

if (fwNames.length) lines.push(`Framework: ${fwNames.join(', ')}`);
const stores: DiscoveredStore[] = result.stores ?? [];
if (stores.length) {
lines.push(`Stores: ${stores.length}`);
for (const s of stores.slice(0, 5)) {
lines.push(` • ${s.type}/${s.id}: ${s.actions.slice(0, 5).join(', ')}${s.actions.length > 5 ? '...' : ''}`);
}
}
lines.push(`Output: ${result.out_dir}`);
return lines.join('\n');
}
async function readPageMetadata(page: any): Promise<{ url: string; title: string }> {
try {
const result = await page.evaluate(`() => ({ url: window.location.href, title: document.title || '' })`);
if (result && typeof result === 'object') return { url: String(result.url ?? ''), title: String(result.title ?? '') };
} catch {}
return { url: '', title: '' };
}

@@ -55,2 +55,5 @@ #!/usr/bin/env node

program.command('probe').description('Probe a website: discover APIs, stores, and recommend strategies').argument('<url>').option('--site <name>').option('--goal <text>').option('--wait <s>', '', '3')
.action(async (url, opts) => { const { exploreUrl, renderExploreSummary } = await import('./explore.js'); console.log(renderExploreSummary(await exploreUrl(url, { BrowserFactory: PlaywrightMCP, site: opts.site, goal: opts.goal, waitSeconds: parseFloat(opts.wait) }))); });
program.command('synthesize').description('Synthesize CLIs from explore').argument('<target>').option('--top <n>', '', '3')

@@ -62,2 +65,13 @@ .action(async (target, opts) => { const { synthesizeFromExplore, renderSynthesizeSummary } = await import('./synthesize.js'); console.log(renderSynthesizeSummary(synthesizeFromExplore(target, { top: parseInt(opts.top) }))); });

program.command('cascade').description('Strategy cascade: find simplest working strategy').argument('<url>').option('--site <name>')
.action(async (url, opts) => {
const { cascadeProbe, renderCascadeResult } = await import('./cascade.js');
const result = await browserSession(PlaywrightMCP, async (page) => {
// Navigate to the site first for cookie context
try { const siteUrl = new URL(url); await page.goto(`${siteUrl.protocol}//${siteUrl.host}`); await page.wait(2); } catch {}
return cascadeProbe(page, url);
});
console.log(renderCascadeResult(result));
});
// ── Dynamic site commands ──────────────────────────────────────────────────

@@ -64,0 +78,0 @@

@@ -29,2 +29,7 @@ /**

data = await executeStep(page, op, params, data, args);
// Detect error objects returned by steps (e.g. tap store not found)
if (data && typeof data === 'object' && !Array.isArray(data) && data.error) {
process.stderr.write(` ${chalk.yellow('⚠')} ${chalk.yellow(op)}: ${data.error}\n`);
if (data.hint) process.stderr.write(` ${chalk.dim('💡')} ${chalk.dim(data.hint)}\n`);
}
if (debug) debugStepResult(op, data);

@@ -147,3 +152,11 @@ }

const js = String(render(params, { args, data }));
return page.evaluate(normalizeEvaluateSource(js));
let result = await page.evaluate(normalizeEvaluateSource(js));
// MCP may return JSON as a string — auto-parse it
if (typeof result === 'string') {
const trimmed = result.trim();
if ((trimmed.startsWith('[') && trimmed.endsWith(']')) || (trimmed.startsWith('{') && trimmed.endsWith('}'))) {
try { result = JSON.parse(trimmed); } catch {}
}
}
return result;
}

@@ -213,3 +226,227 @@ case 'snapshot': {

}
case 'intercept': return data;
case 'intercept': {
// Declarative XHR interception step
// Usage:
// intercept:
// trigger: "navigate:https://..." | "evaluate:store.note.fetch()" | "click:ref"
// capture: "api/pattern" # URL substring to match
// timeout: 5 # seconds to wait for matching request
// select: "data.items" # optional: extract sub-path from response
const cfg = typeof params === 'object' ? params : {};
const trigger = cfg.trigger ?? '';
const capturePattern = cfg.capture ?? '';
const timeout = cfg.timeout ?? 8;
const selectPath = cfg.select ?? null;
if (!capturePattern) return data;
// Step 1: Execute the trigger action
if (trigger.startsWith('navigate:')) {
const url = render(trigger.slice('navigate:'.length), { args, data });
await page.goto(String(url));
} else if (trigger.startsWith('evaluate:')) {
const js = trigger.slice('evaluate:'.length);
await page.evaluate(normalizeEvaluateSource(render(js, { args, data }) as string));
} else if (trigger.startsWith('click:')) {
const ref = render(trigger.slice('click:'.length), { args, data });
await page.click(String(ref).replace(/^@/, ''));
} else if (trigger === 'scroll') {
await page.scroll('down');
}
// Step 2: Wait a bit for network requests to fire
await page.wait(Math.min(timeout, 3));
// Step 3: Get network requests and find matching ones
const rawNetwork = await page.networkRequests(false);
const matchingResponses: any[] = [];
if (typeof rawNetwork === 'string') {
// Parse the network output to find matching URLs
const lines = rawNetwork.split('\n');
for (const line of lines) {
const match = line.match(/\[?(GET|POST)\]?\s+(\S+)\s*(?:=>|→)\s*\[?(\d+)\]?/i);
if (match) {
const [, method, url, status] = match;
if (url.includes(capturePattern) && status === '200') {
// Re-fetch the matching URL to get the response body
try {
const body = await page.evaluate(`
async () => {
try {
const resp = await fetch(${JSON.stringify(url)}, { credentials: 'include' });
if (!resp.ok) return null;
return await resp.json();
} catch { return null; }
}
`);
if (body) matchingResponses.push(body);
} catch {}
}
}
}
}
// Step 4: Select from response if specified
let result = matchingResponses.length === 1 ? matchingResponses[0] :
matchingResponses.length > 1 ? matchingResponses : data;
if (selectPath && result) {
let current = result;
for (const part of String(selectPath).split('.')) {
if (current && typeof current === 'object' && !Array.isArray(current)) {
current = current[part];
} else break;
}
result = current ?? result;
}
return result;
}
case 'tap': {
// ── Declarative Store Action Bridge ──────────────────────────────────
// Usage:
// tap:
// store: feed # Pinia/Vuex store name
// action: fetchFeeds # Store action to call
// args: [] # Optional args to pass to action
// capture: homefeed # URL pattern to capture response
// timeout: 5 # Seconds to wait for network (default: 5)
// select: data.items # Optional: extract sub-path from response
// framework: pinia # Optional: pinia | vuex (auto-detected if omitted)
//
// Generates a self-contained IIFE that:
// 1. Injects fetch + XHR dual interception proxy
// 2. Finds the Pinia/Vuex store and calls the action
// 3. Captures the response matching the URL pattern
// 4. Auto-cleans up interception in finally block
// 5. Returns the captured data (optionally sub-selected)
const cfg = typeof params === 'object' ? params : {};
const storeName = String(render(cfg.store ?? '', { args, data }));
const actionName = String(render(cfg.action ?? '', { args, data }));
const capturePattern = String(render(cfg.capture ?? '', { args, data }));
const timeout = cfg.timeout ?? 5;
const selectPath = cfg.select ? String(render(cfg.select, { args, data })) : null;
const framework = cfg.framework ?? null; // auto-detect if null
const actionArgs = cfg.args ?? [];
if (!storeName || !actionName) throw new Error('tap: store and action are required');
// Build select chain for the captured response
const selectChain = selectPath
? selectPath.split('.').map((p: string) => `?.[${JSON.stringify(p)}]`).join('')
: '';
// Serialize action arguments
const actionArgsRendered = actionArgs.map((a: any) => {
const rendered = render(a, { args, data });
return JSON.stringify(rendered);
});
const actionCall = actionArgsRendered.length
? `store[${JSON.stringify(actionName)}](${actionArgsRendered.join(', ')})`
: `store[${JSON.stringify(actionName)}]()`;
const js = `
async () => {
// ── 1. Setup capture proxy (fetch + XHR dual interception) ──
let captured = null;
const capturePattern = ${JSON.stringify(capturePattern)};
// Intercept fetch API
const origFetch = window.fetch;
window.fetch = async function(...fetchArgs) {
const resp = await origFetch.apply(this, fetchArgs);
try {
const url = typeof fetchArgs[0] === 'string' ? fetchArgs[0]
: fetchArgs[0] instanceof Request ? fetchArgs[0].url : String(fetchArgs[0]);
if (capturePattern && url.includes(capturePattern) && !captured) {
try { captured = await resp.clone().json(); } catch {}
}
} catch {}
return resp;
};
// Intercept XMLHttpRequest
const origXhrOpen = XMLHttpRequest.prototype.open;
const origXhrSend = XMLHttpRequest.prototype.send;
XMLHttpRequest.prototype.open = function(method, url) {
this.__tapUrl = String(url);
return origXhrOpen.apply(this, arguments);
};
XMLHttpRequest.prototype.send = function(body) {
if (capturePattern && this.__tapUrl?.includes(capturePattern)) {
const xhr = this;
const origHandler = xhr.onreadystatechange;
xhr.onreadystatechange = function() {
if (xhr.readyState === 4 && !captured) {
try { captured = JSON.parse(xhr.responseText); } catch {}
}
if (origHandler) origHandler.apply(this, arguments);
};
// Also handle onload
const origOnload = xhr.onload;
xhr.onload = function() {
if (!captured) { try { captured = JSON.parse(xhr.responseText); } catch {} }
if (origOnload) origOnload.apply(this, arguments);
};
}
return origXhrSend.apply(this, arguments);
};
try {
// ── 2. Find store ──
let store = null;
const storeName = ${JSON.stringify(storeName)};
const fw = ${JSON.stringify(framework)};
// Auto-detect framework if not specified
const app = document.querySelector('#app');
if (!fw || fw === 'pinia') {
// Try Pinia (Vue 3)
try {
const pinia = app?.__vue_app__?.config?.globalProperties?.$pinia;
if (pinia?._s) store = pinia._s.get(storeName);
} catch {}
}
if (!store && (!fw || fw === 'vuex')) {
// Try Vuex (Vue 2/3)
try {
const vuexStore = app?.__vue_app__?.config?.globalProperties?.$store
?? app?.__vue__?.$store;
if (vuexStore) {
// Vuex doesn't have named stores like Pinia, dispatch action
store = { [${JSON.stringify(actionName)}]: (...a) => vuexStore.dispatch(storeName + '/' + ${JSON.stringify(actionName)}, ...a) };
}
} catch {}
}
if (!store) return { error: 'Store not found: ' + storeName, hint: 'Page may not be fully loaded or store name may be incorrect' };
if (typeof store[${JSON.stringify(actionName)}] !== 'function') {
return { error: 'Action not found: ' + ${JSON.stringify(actionName)} + ' on store ' + storeName,
hint: 'Available: ' + Object.keys(store).filter(k => typeof store[k] === 'function' && !k.startsWith('$') && !k.startsWith('_')).join(', ') };
}
// ── 3. Call store action ──
await ${actionCall};
// ── 4. Wait for network response ──
const deadline = Date.now() + ${timeout} * 1000;
while (!captured && Date.now() < deadline) {
await new Promise(r => setTimeout(r, 200));
}
} finally {
// ── 5. Always restore originals ──
window.fetch = origFetch;
XMLHttpRequest.prototype.open = origXhrOpen;
XMLHttpRequest.prototype.send = origXhrSend;
}
if (!captured) return { error: 'No matching response captured for pattern: ' + capturePattern };
return captured${selectChain} ?? captured;
}
`;
return page.evaluate(js);
}
default: return data;

@@ -216,0 +453,0 @@ }

/**
* Synthesize: turn explore capabilities into ready-to-use CLI definitions.
*
* Takes the structured capabilities from Deep Explore and generates
* YAML pipeline files that can be directly registered as CLI commands.
*
* This is the bridge between discovery (explore) and usability (CLI).
* Synthesize candidate CLIs from explore artifacts.
* Generates evaluate-based YAML pipelines (matching hand-written adapter patterns).
*/

@@ -14,10 +10,14 @@

export function synthesizeFromExplore(target: string, opts: any = {}): any {
const exploreDir = fs.existsSync(target) ? target : path.join('.opencli', 'explore', target);
if (!fs.existsSync(exploreDir)) throw new Error(`Explore dir not found: ${target}`);
/** Volatile params to strip from generated URLs */
const VOLATILE_PARAMS = new Set(['w_rid', 'wts', 'callback', '_', 'timestamp', 't', 'nonce', 'sign']);
const SEARCH_PARAM_NAMES = new Set(['q', 'query', 'keyword', 'search', 'wd', 'kw', 'w', 'search_query']);
const LIMIT_PARAM_NAMES = new Set(['ps', 'page_size', 'limit', 'count', 'per_page', 'size', 'num']);
const PAGE_PARAM_NAMES = new Set(['pn', 'page', 'page_num', 'offset', 'cursor']);
const manifest = JSON.parse(fs.readFileSync(path.join(exploreDir, 'manifest.json'), 'utf-8'));
const capabilities = JSON.parse(fs.readFileSync(path.join(exploreDir, 'capabilities.json'), 'utf-8'));
const endpoints = JSON.parse(fs.readFileSync(path.join(exploreDir, 'endpoints.json'), 'utf-8'));
const auth = JSON.parse(fs.readFileSync(path.join(exploreDir, 'auth.json'), 'utf-8'));
export function synthesizeFromExplore(
target: string,
opts: { outDir?: string; top?: number } = {},
): Record<string, any> {
const exploreDir = resolveExploreDir(target);
const bundle = loadExploreBundle(exploreDir);

@@ -27,62 +27,58 @@ const targetDir = opts.outDir ?? path.join(exploreDir, 'candidates');

const site = manifest.site;
const topN = opts.top ?? 5;
const site = bundle.manifest.site;
const capabilities = (bundle.capabilities ?? [])
.sort((a: any, b: any) => (b.confidence ?? 0) - (a.confidence ?? 0))
.slice(0, opts.top ?? 3);
const candidates: any[] = [];
// Sort capabilities by confidence
const sortedCaps = [...capabilities]
.sort((a: any, b: any) => (b.confidence ?? 0) - (a.confidence ?? 0))
.slice(0, topN);
for (const cap of sortedCaps) {
// Find the matching endpoint for more detail
const endpoint = endpoints.find((ep: any) => ep.pattern === cap.endpoint) ??
endpoints[0];
const candidate = buildCandidateYaml(site, manifest, cap, endpoint);
const fileName = `${cap.name}.yaml`;
const filePath = path.join(targetDir, fileName);
for (const cap of capabilities) {
const endpoint = chooseEndpoint(cap, bundle.endpoints);
if (!endpoint) continue;
const candidate = buildCandidateYaml(site, bundle.manifest, cap, endpoint);
const filePath = path.join(targetDir, `${candidate.name}.yaml`);
fs.writeFileSync(filePath, yaml.dump(candidate.yaml, { sortKeys: false, lineWidth: 120 }));
candidates.push({
name: cap.name,
path: filePath,
strategy: cap.strategy,
endpoint: cap.endpoint,
confidence: cap.confidence,
columns: candidate.yaml.columns,
});
candidates.push({ name: candidate.name, path: filePath, strategy: cap.strategy, confidence: cap.confidence });
}
const index = {
site,
target_url: manifest.target_url,
generated_from: exploreDir,
candidate_count: candidates.length,
candidates,
};
const index = { site, target_url: bundle.manifest.target_url, generated_from: exploreDir, candidate_count: candidates.length, candidates };
fs.writeFileSync(path.join(targetDir, 'candidates.json'), JSON.stringify(index, null, 2));
return { site, explore_dir: exploreDir, out_dir: targetDir, candidate_count: candidates.length, candidates };
}
export function renderSynthesizeSummary(result: Record<string, any>): string {
const lines = ['opencli synthesize: OK', `Site: ${result.site}`, `Source: ${result.explore_dir}`, `Candidates: ${result.candidate_count}`];
for (const c of result.candidates ?? []) lines.push(` • ${c.name} (${c.strategy}, ${((c.confidence ?? 0) * 100).toFixed(0)}% confidence) → ${c.path}`);
return lines.join('\n');
}
export function resolveExploreDir(target: string): string {
if (fs.existsSync(target)) return target;
const candidate = path.join('.opencli', 'explore', target);
if (fs.existsSync(candidate)) return candidate;
throw new Error(`Explore directory not found: ${target}`);
}
export function loadExploreBundle(exploreDir: string): Record<string, any> {
return {
site,
explore_dir: exploreDir,
out_dir: targetDir,
candidate_count: candidates.length,
candidates,
manifest: JSON.parse(fs.readFileSync(path.join(exploreDir, 'manifest.json'), 'utf-8')),
endpoints: JSON.parse(fs.readFileSync(path.join(exploreDir, 'endpoints.json'), 'utf-8')),
capabilities: JSON.parse(fs.readFileSync(path.join(exploreDir, 'capabilities.json'), 'utf-8')),
auth: JSON.parse(fs.readFileSync(path.join(exploreDir, 'auth.json'), 'utf-8')),
};
}
/** Volatile params to strip from generated URLs */
const VOLATILE_PARAMS = new Set(['w_rid', 'wts', 'callback', '_', 'timestamp', 't', 'nonce', 'sign']);
const SEARCH_PARAM_NAMES = new Set(['q', 'query', 'keyword', 'search', 'wd', 'kw', 'w', 'search_query']);
const LIMIT_PARAM_NAMES = new Set(['ps', 'page_size', 'limit', 'count', 'per_page', 'size', 'num']);
const PAGE_PARAM_NAMES = new Set(['pn', 'page', 'page_num', 'offset', 'cursor']);
function chooseEndpoint(cap: any, endpoints: any[]): any | null {
if (!endpoints.length) return null;
// Match by endpoint pattern from capability
if (cap.endpoint) {
const match = endpoints.find((e: any) => e.pattern === cap.endpoint || e.url?.includes(cap.endpoint));
if (match) return match;
}
return endpoints.sort((a: any, b: any) => (b.score ?? 0) - (a.score ?? 0))[0];
}
/**
* Build a clean templated URL from a raw API URL.
* - Strips volatile params (w_rid, wts, etc.)
* - Templates search, limit, and pagination params
* - Builds URL string manually to avoid URL encoding of ${{ }} expressions
*/
function buildTemplatedUrl(rawUrl: string, cap: any, endpoint: any): string {
// ── URL templating ─────────────────────────────────────────────────────────
function buildTemplatedUrl(rawUrl: string, cap: any, _endpoint: any): string {
try {

@@ -92,77 +88,99 @@ const u = new URL(rawUrl);

const params: Array<[string, string]> = [];
const hasKeyword = cap.recommendedArgs?.some((a: any) => a.name === 'keyword');
u.searchParams.forEach((v, k) => {
// Skip volatile params
if (VOLATILE_PARAMS.has(k)) return;
// Template known param types
if (hasKeyword && SEARCH_PARAM_NAMES.has(k)) {
params.push([k, '${{ args.keyword }}']);
} else if (LIMIT_PARAM_NAMES.has(k)) {
params.push([k, '${{ args.limit | default(20) }}']);
} else if (PAGE_PARAM_NAMES.has(k)) {
params.push([k, '${{ args.page | default(1) }}']);
} else {
params.push([k, v]);
}
if (hasKeyword && SEARCH_PARAM_NAMES.has(k)) params.push([k, '${{ args.keyword }}']);
else if (LIMIT_PARAM_NAMES.has(k)) params.push([k, '${{ args.limit | default(20) }}']);
else if (PAGE_PARAM_NAMES.has(k)) params.push([k, '${{ args.page | default(1) }}']);
else params.push([k, v]);
});
if (params.length === 0) return base;
return base + '?' + params.map(([k, v]) => `${k}=${v}`).join('&');
} catch {
return rawUrl;
}
return params.length ? base + '?' + params.map(([k, v]) => `${k}=${v}`).join('&') : base;
} catch { return rawUrl; }
}
/**
* Build a YAML pipeline definition from a capability + endpoint.
* Build inline evaluate script for browser-based fetch+parse.
* Follows patterns from bilibili/hot.yaml and twitter/trending.yaml.
*/
function buildEvaluateScript(url: string, itemPath: string, endpoint: any): string {
const pathChain = itemPath.split('.').map((p: string) => `?.${p}`).join('');
const detectedFields = endpoint?.detectedFields ?? {};
const hasFields = Object.keys(detectedFields).length > 0;
let mapCode = '';
if (hasFields) {
const mappings = Object.entries(detectedFields)
.map(([role, field]) => ` ${role}: item${String(field).split('.').map(p => `?.${p}`).join('')}`)
.join(',\n');
mapCode = `.map((item) => ({\n${mappings}\n }))`;
}
return [
'(async () => {',
` const res = await fetch('${url}', {`,
` credentials: 'include'`,
' });',
' const data = await res.json();',
` return (data${pathChain} || [])${mapCode};`,
'})()\n',
].join('\n');
}
// ── YAML pipeline generation ───────────────────────────────────────────────
function buildCandidateYaml(site: string, manifest: any, cap: any, endpoint: any): { name: string; yaml: any } {
const needsBrowser = cap.strategy !== 'public';
const pipeline: any[] = [];
const templatedUrl = buildTemplatedUrl(endpoint?.url ?? manifest.target_url, cap, endpoint);
// Step 1: Navigate (if browser-based)
if (needsBrowser) {
let domain = '';
try { domain = new URL(manifest.target_url).hostname; } catch {}
if (cap.strategy === 'store-action' && cap.storeHint) {
// Store Action: navigate + wait + tap (declarative, clean)
pipeline.push({ navigate: manifest.target_url });
pipeline.push({ wait: 3 });
const tapStep: Record<string, any> = {
store: cap.storeHint.store,
action: cap.storeHint.action,
timeout: 8,
};
// Infer capture pattern from endpoint URL
if (endpoint?.url) {
try {
const epUrl = new URL(endpoint.url);
const pathParts = epUrl.pathname.split('/').filter((p: string) => p);
// Use last meaningful path segment as capture pattern
const capturePart = pathParts.filter((p: string) => !p.match(/^v\d+$/)).pop();
if (capturePart) tapStep.capture = capturePart;
} catch {}
}
if (cap.itemPath) tapStep.select = cap.itemPath;
pipeline.push({ tap: tapStep });
} else if (needsBrowser) {
// Browser-based: navigate + evaluate (like bilibili/hot.yaml, twitter/trending.yaml)
pipeline.push({ navigate: manifest.target_url });
const itemPath = cap.itemPath ?? 'data.data.list';
pipeline.push({ evaluate: buildEvaluateScript(templatedUrl, itemPath, endpoint) });
} else {
// Public API: direct fetch (like hackernews/top.yaml)
pipeline.push({ fetch: { url: templatedUrl } });
if (cap.itemPath) pipeline.push({ select: cap.itemPath });
}
// Step 2: Fetch the API — build a clean URL with templates
const rawUrl = endpoint?.url ?? manifest.target_url;
const fetchStep: any = { url: buildTemplatedUrl(rawUrl, cap, endpoint) };
pipeline.push({ fetch: fetchStep });
// Step 3: Select the item path
if (cap.itemPath) {
pipeline.push({ select: cap.itemPath });
}
// Step 4: Map fields to columns
// Map fields
const mapStep: Record<string, string> = {};
const columns = cap.recommendedColumns ?? ['title', 'url'];
// Add a rank column if not doing search
if (!cap.recommendedArgs?.some((a: any) => a.name === 'keyword')) {
mapStep['rank'] = '${{ index + 1 }}';
}
// Build field mappings from the endpoint's detected fields
if (!cap.recommendedArgs?.some((a: any) => a.name === 'keyword')) mapStep['rank'] = '${{ index + 1 }}';
const detectedFields = endpoint?.detectedFields ?? {};
for (const col of columns) {
const fieldPath = detectedFields[col];
if (fieldPath) {
mapStep[col] = `\${{ item.${fieldPath} }}`;
} else {
mapStep[col] = `\${{ item.${col} }}`;
}
mapStep[col] = fieldPath ? `\${{ item.${fieldPath} }}` : `\${{ item.${col} }}`;
}
pipeline.push({ map: mapStep });
// Step 5: Limit
pipeline.push({ limit: '${{ args.limit | default(20) }}' });
// Build args definition
// Args
const argsDef: Record<string, any> = {};

@@ -178,22 +196,10 @@ for (const arg of cap.recommendedArgs ?? []) {

}
if (!argsDef['limit']) argsDef['limit'] = { type: 'int', default: 20, description: 'Number of items to return' };
// Ensure limit arg always exists
if (!argsDef['limit']) {
argsDef['limit'] = { type: 'int', default: 20, description: 'Number of items to return' };
}
const allColumns = Object.keys(mapStep);
return {
name: cap.name,
yaml: {
site,
name: cap.name,
description: `${site} ${cap.name} (auto-generated)`,
domain: manifest.final_url ? new URL(manifest.final_url).hostname : undefined,
strategy: cap.strategy,
browser: needsBrowser,
args: argsDef,
pipeline,
columns: allColumns,
site, name: cap.name, description: `${cap.description || site + ' ' + cap.name} (auto-generated)`,
domain, strategy: cap.strategy, browser: needsBrowser,
args: argsDef, pipeline, columns: Object.keys(mapStep),
},

@@ -203,13 +209,12 @@ };

export function renderSynthesizeSummary(r: any): string {
const lines = [
'opencli synthesize: OK',
`Site: ${r.site}`,
`Source: ${r.explore_dir}`,
`Candidates: ${r.candidate_count}`,
];
for (const c of r.candidates ?? []) {
lines.push(` • ${c.name} (${c.strategy}, ${(c.confidence * 100).toFixed(0)}% confidence) → ${c.path}`);
}
return lines.join('\n');
/** Backward-compatible export for scaffold.ts */
export function buildCandidate(site: string, targetUrl: string, cap: any, endpoint: any): any {
// Map old-style field names to new ones
const normalizedCap = {
...cap,
recommendedArgs: cap.recommendedArgs ?? cap.recommended_args,
recommendedColumns: cap.recommendedColumns ?? cap.recommended_columns,
};
const manifest = { target_url: targetUrl, final_url: targetUrl };
return buildCandidateYaml(site, manifest, normalizedCap, endpoint);
}
import { cli, Strategy } from '../../registry.js';
import { fetchJson } from '../../bilibili.js';
cli({
site: 'zhihu',
name: 'search',
description: '搜索知乎问题和回答',
domain: 'www.zhihu.com',
strategy: Strategy.COOKIE,
args: [
{ name: 'keyword', required: true, help: 'Search keyword' },
{ name: 'type', default: 'general', help: 'general, article, video' },
{ name: 'limit', type: 'int', default: 20, help: 'Number of results' },
],
columns: ['rank', 'title', 'author', 'type', 'url'],
func: async (page, kwargs) => {
const { keyword, type = 'general', limit = 20 } = kwargs;
// Navigate to zhihu to ensure cookie context
await page.goto('https://www.zhihu.com');
const qs = new URLSearchParams({ q: keyword, type, limit: String(limit) });
const payload = await fetchJson(page, `https://www.zhihu.com/api/v4/search_v3?${qs}`);
const data = payload?.data ?? [];
const rows = [];
for (let i = 0; i < Math.min(data.length, Number(limit)); i++) {
const item = data[i];
const obj = item.object ?? item;
const itemType = item.type ?? obj.type ?? 'unknown';
let title = '';
let author = '';
let url = '';
if (itemType === 'search_result') {
const highlight = obj.highlight ?? {};
title = (highlight.title ?? obj.title ?? '').replace(/<[^>]+>/g, '');
author = obj.author?.name ?? '';
url = obj.url ?? '';
}
else if (obj.question) {
title = (obj.question.title ?? obj.title ?? '').replace(/<[^>]+>/g, '');
author = obj.author?.name ?? '';
url = obj.question.url ? `https://www.zhihu.com/question/${obj.question.id}` : '';
}
else {
title = (obj.title ?? obj.name ?? '').replace(/<[^>]+>/g, '');
author = obj.author?.name ?? '';
url = obj.url ?? '';
}
if (!title)
continue;
rows.push({
rank: rows.length + 1,
title: title.slice(0, 60),
author,
type: itemType.replace('search_result', 'result'),
url,
});
}
return rows;
},
});
import { cli, Strategy } from '../../registry.js';
import { fetchJson } from '../../bilibili.js';
cli({
site: 'zhihu',
name: 'search',
description: '搜索知乎问题和回答',
domain: 'www.zhihu.com',
strategy: Strategy.COOKIE,
args: [
{ name: 'keyword', required: true, help: 'Search keyword' },
{ name: 'type', default: 'general', help: 'general, article, video' },
{ name: 'limit', type: 'int', default: 20, help: 'Number of results' },
],
columns: ['rank', 'title', 'author', 'type', 'url'],
func: async (page, kwargs) => {
const { keyword, type = 'general', limit = 20 } = kwargs;
// Navigate to zhihu to ensure cookie context
await page.goto('https://www.zhihu.com');
const qs = new URLSearchParams({ q: keyword, type, limit: String(limit) });
const payload = await fetchJson(page, `https://www.zhihu.com/api/v4/search_v3?${qs}`);
const data: any[] = payload?.data ?? [];
const rows: any[] = [];
for (let i = 0; i < Math.min(data.length, Number(limit)); i++) {
const item = data[i];
const obj = item.object ?? item;
const itemType = item.type ?? obj.type ?? 'unknown';
let title = '';
let author = '';
let url = '';
if (itemType === 'search_result') {
const highlight = obj.highlight ?? {};
title = (highlight.title ?? obj.title ?? '').replace(/<[^>]+>/g, '');
author = obj.author?.name ?? '';
url = obj.url ?? '';
} else if (obj.question) {
title = (obj.question.title ?? obj.title ?? '').replace(/<[^>]+>/g, '');
author = obj.author?.name ?? '';
url = obj.question.url ? `https://www.zhihu.com/question/${obj.question.id}` : '';
} else {
title = (obj.title ?? obj.name ?? '').replace(/<[^>]+>/g, '');
author = obj.author?.name ?? '';
url = obj.url ?? '';
}
if (!title) continue;
rows.push({
rank: rows.length + 1,
title: title.slice(0, 60),
author,
type: itemType.replace('search_result', 'result'),
url,
});
}
return rows;
},
});