@jackwener/opencli
Advanced tools
+594
| # CLI-CREATOR — 适配器开发完全指南 | ||
| > 本文档教你(或 AI Agent)如何为 OpenCLI 添加一个新网站的命令。 | ||
| > 从零到发布,覆盖 API 发现、方案选择、适配器编写、测试验证全流程。 | ||
| ## 核心流程 | ||
| ``` | ||
| ┌─────────────┐ ┌─────────────┐ ┌──────────────┐ ┌────────┐ | ||
| │ 1. 发现 API │ ──▶ │ 2. 选择策略 │ ──▶ │ 3. 写适配器 │ ──▶ │ 4. 测试 │ | ||
| └─────────────┘ └─────────────┘ └──────────────┘ └────────┘ | ||
| explore cascade YAML / TS run + verify | ||
| ``` | ||
| --- | ||
| ## Step 1: 发现 API | ||
| ### 1a. 自动化发现(推荐) | ||
| OpenCLI 内置 Deep Explore,自动分析网站网络请求: | ||
| ```bash | ||
| opencli explore https://www.example.com --site mysite | ||
| ``` | ||
| 输出到 `.opencli/explore/mysite/`: | ||
| | 文件 | 内容 | | ||
| |------|------| | ||
| | `manifest.json` | 站点元数据、框架检测(Vue2/3、React、Next.js、Pinia、Vuex) | | ||
| | `endpoints.json` | 已发现的 API 端点,按评分排序,含 URL pattern、方法、响应类型 | | ||
| | `capabilities.json` | 推理出的功能(`hot`、`search`、`feed`…),含置信度和推荐参数 | | ||
| | `auth.json` | 认证方式检测(Cookie/Header/无认证),策略候选列表 | | ||
| ### 1b. 手动抓包验证 | ||
| Explore 的自动分析可能不完美,用 verbose 模式手动确认: | ||
| ```bash | ||
| # 在浏览器中打开目标页面,观察网络请求 | ||
| opencli explore https://www.example.com --site mysite -v | ||
| # 或直接用 evaluate 测试 API | ||
| opencli bilibili hot -v # 查看已有命令的 pipeline 每步数据流 | ||
| ``` | ||
| 关注抓包结果中的关键信息: | ||
| - **URL pattern**: `/api/v2/hot?limit=20` → 这就是你要调用的端点 | ||
| - **Method**: `GET` / `POST` | ||
| - **Request Headers**: Cookie? Bearer? 自定义签名头(X-s、X-t)? | ||
| - **Response Body**: JSON 结构,特别是数据在哪个路径(`data.items`、`data.list`) | ||
| ### 1c. 框架检测 | ||
| Explore 自动检测前端框架。如果需要手动确认: | ||
| ```bash | ||
| # 在已打开目标网站的情况下 | ||
| opencli evaluate "(()=>{ | ||
| const vue3 = !!document.querySelector('#app')?.__vue_app__; | ||
| const vue2 = !!document.querySelector('#app')?.__vue__; | ||
| const react = !!window.__REACT_DEVTOOLS_GLOBAL_HOOK__; | ||
| const pinia = vue3 && !!document.querySelector('#app').__vue_app__.config.globalProperties.\$pinia; | ||
| return JSON.stringify({vue3, vue2, react, pinia}); | ||
| })()" | ||
| ``` | ||
| Vue + Pinia 的站点(如小红书)可以直接通过 Store Action 绕过签名。 | ||
| --- | ||
| ## Step 2: 选择认证策略 | ||
| OpenCLI 提供 5 级认证策略。使用 `cascade` 命令自动探测: | ||
| ```bash | ||
| opencli cascade https://api.example.com/hot | ||
| ``` | ||
| ### 策略决策树 | ||
| ``` | ||
| 直接 fetch(url) 能拿到数据? | ||
| → ✅ Tier 1: public(公开 API,不需要浏览器) | ||
| → ❌ fetch(url, {credentials:'include'}) 带 Cookie 能拿到? | ||
| → ✅ Tier 2: cookie(最常见,evaluate 步骤内 fetch) | ||
| → ❌ → 加上 Bearer / CSRF header 后能拿到? | ||
| → ✅ Tier 3: header(如 Twitter ct0 + Bearer) | ||
| → ❌ → 网站有 Pinia/Vuex Store? | ||
| → ✅ Tier 4: intercept(Store Action + XHR 拦截) | ||
| → ❌ Tier 5: ui(UI 自动化,最后手段) | ||
| ``` | ||
| ### 各策略对比 | ||
| | Tier | 策略 | 速度 | 复杂度 | 适用场景 | 实例 | | ||
| |------|------|------|--------|---------|------| | ||
| | 1 | `public` | ⚡ ~1s | 最简 | 公开 API,无需登录 | Hacker News, V2EX | | ||
| | 2 | `cookie` | 🔄 ~7s | 简单 | Cookie 认证即可 | Bilibili, Zhihu, Reddit | | ||
| | 3 | `header` | 🔄 ~7s | 中等 | 需要 CSRF token 或 Bearer | Twitter GraphQL | | ||
| | 4 | `intercept` | 🔄 ~10s | 较高 | 请求有复杂签名 | 小红书 (Pinia + XHR) | | ||
| | 5 | `ui` | 🐌 ~15s+ | 最高 | 无 API,纯 DOM 解析 | 遗留网站 | | ||
| --- | ||
| ## Step 3: 编写适配器 | ||
| ### YAML vs TS?先看决策树 | ||
| ``` | ||
| 你的 pipeline 里有 evaluate 步骤(内嵌 JS 代码)? | ||
| → ✅ 用 TypeScript (src/clis/<site>/<name>.ts),需在 index.ts 注册 | ||
| → ❌ 纯声明式(navigate + tap + map + limit)? | ||
| → ✅ 用 YAML (src/clis/<site>/<name>.yaml),放入即自动注册 | ||
| ``` | ||
| | 场景 | 选择 | 示例 | | ||
| |------|------|------| | ||
| | 纯 fetch/select/map/limit | YAML | `v2ex/hot.yaml`, `hackernews/top.yaml` | | ||
| | navigate + evaluate(fetch) + map | YAML(评估复杂度) | `zhihu/hot.yaml` | | ||
| | navigate + tap + map | YAML ✅ | `xiaohongshu/feed.yaml`, `xiaohongshu/notifications.yaml` | | ||
| | 有复杂 JS 逻辑(Pinia state 读取、条件分支) | TS | `xiaohongshu/me.ts`, `bilibili/me.ts` | | ||
| | XHR 拦截 + 签名 | TS | `xiaohongshu/search.ts` | | ||
| | GraphQL / 分页 / Wbi 签名 | TS | `bilibili/search.ts`, `twitter/search.ts` | | ||
| > **经验法则**:如果你发现 YAML 里嵌了超过 10 行 JS,改用 TS 更可维护。 | ||
| ### 方式 A: YAML Pipeline(声明式,推荐) | ||
| 文件路径: `src/clis/<site>/<name>.yaml`,放入即自动注册。 | ||
| #### Tier 1 — 公开 API 模板 | ||
| ```yaml | ||
| # src/clis/v2ex/hot.yaml | ||
| site: v2ex | ||
| name: hot | ||
| description: V2EX 热门话题 | ||
| domain: www.v2ex.com | ||
| strategy: public | ||
| browser: false | ||
| args: | ||
| limit: | ||
| type: int | ||
| default: 20 | ||
| pipeline: | ||
| - fetch: | ||
| url: https://www.v2ex.com/api/topics/hot.json | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| title: ${{ item.title }} | ||
| replies: ${{ item.replies }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, title, replies] | ||
| ``` | ||
| #### Tier 2 — Cookie 认证模板(最常用) | ||
| ```yaml | ||
| # src/clis/zhihu/hot.yaml | ||
| site: zhihu | ||
| name: hot | ||
| description: 知乎热榜 | ||
| domain: www.zhihu.com | ||
| pipeline: | ||
| - navigate: https://www.zhihu.com # 先加载页面建立 session | ||
| - evaluate: | # 在浏览器内发请求,自动带 Cookie | ||
| (async () => { | ||
| const res = await fetch('/api/v3/feed/topstory/hot-lists/total?limit=50', { | ||
| credentials: 'include' | ||
| }); | ||
| const d = await res.json(); | ||
| return (d?.data || []).map(item => { | ||
| const t = item.target || {}; | ||
| return { | ||
| title: t.title, | ||
| heat: item.detail_text || '', | ||
| answers: t.answer_count, | ||
| }; | ||
| }); | ||
| })() | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| title: ${{ item.title }} | ||
| heat: ${{ item.heat }} | ||
| answers: ${{ item.answers }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, title, heat, answers] | ||
| ``` | ||
| > **关键**: `evaluate` 步骤内的 `fetch` 运行在浏览器页面内,自动携带 `credentials: 'include'`,无需手动处理 Cookie。 | ||
| #### 进阶 — 带搜索参数 | ||
| ```yaml | ||
| # src/clis/zhihu/search.yaml | ||
| site: zhihu | ||
| name: search | ||
| description: 知乎搜索 | ||
| args: | ||
| keyword: | ||
| type: str | ||
| required: true | ||
| description: Search keyword | ||
| limit: | ||
| type: int | ||
| default: 10 | ||
| pipeline: | ||
| - navigate: https://www.zhihu.com | ||
| - evaluate: | | ||
| (async () => { | ||
| const q = encodeURIComponent('${{ args.keyword }}'); | ||
| const res = await fetch('/api/v4/search_v3?q=' + q + '&t=general&limit=${{ args.limit }}', { | ||
| credentials: 'include' | ||
| }); | ||
| const d = await res.json(); | ||
| return (d?.data || []) | ||
| .filter(item => item.type === 'search_result') | ||
| .map(item => ({ | ||
| title: (item.object?.title || '').replace(/<[^>]+>/g, ''), | ||
| type: item.object?.type || '', | ||
| author: item.object?.author?.name || '', | ||
| votes: item.object?.voteup_count || 0, | ||
| })); | ||
| })() | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| title: ${{ item.title }} | ||
| type: ${{ item.type }} | ||
| author: ${{ item.author }} | ||
| votes: ${{ item.votes }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, title, type, author, votes] | ||
| ``` | ||
| #### Tier 4 — Store Action Bridge(`tap` 步骤,intercept 策略推荐) | ||
| 适用于 Vue + Pinia/Vuex 的网站(如小红书),无须手动写 XHR 拦截代码: | ||
| ```yaml | ||
| # src/clis/xiaohongshu/notifications.yaml | ||
| site: xiaohongshu | ||
| name: notifications | ||
| description: "小红书通知" | ||
| domain: www.xiaohongshu.com | ||
| strategy: intercept | ||
| browser: true | ||
| args: | ||
| type: | ||
| type: str | ||
| default: mentions | ||
| description: "Notification type: mentions, likes, or connections" | ||
| limit: | ||
| type: int | ||
| default: 20 | ||
| columns: [rank, user, action, content, note, time] | ||
| pipeline: | ||
| - navigate: https://www.xiaohongshu.com/notification | ||
| - wait: 3 | ||
| - tap: | ||
| store: notification # Pinia store name | ||
| action: getNotification # Store action to call | ||
| args: # Action arguments | ||
| - ${{ args.type | default('mentions') }} | ||
| capture: /you/ # URL pattern to capture response | ||
| select: data.message_list # Extract sub-path from response | ||
| timeout: 8 | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| user: ${{ item.user_info.nickname }} | ||
| action: ${{ item.title }} | ||
| content: ${{ item.comment_info.content }} | ||
| - limit: ${{ args.limit | default(20) }} | ||
| ``` | ||
| > **`tap` 步骤自动完成**:注入 fetch+XHR 双拦截 → 查找 Pinia/Vuex store → 调用 action → 捕获匹配 URL 的响应 → 清理拦截。 | ||
| > 如果 store 或 action 找不到,会返回 `hint` 列出所有可用的 store actions,方便调试。 | ||
| | tap 参数 | 必填 | 说明 | | ||
| |---------|------|------| | ||
| | `store` | ✅ | Pinia store 名称(如 `feed`, `search`, `notification`) | | ||
| | `action` | ✅ | Store action 方法名 | | ||
| | `capture` | ✅ | URL 子串匹配(匹配网络请求 URL) | | ||
| | `args` | ❌ | 传给 action 的参数数组 | | ||
| | `select` | ❌ | 从 captured JSON 中提取的路径(如 `data.items`) | | ||
| | `timeout` | ❌ | 等待网络响应的超时秒数(默认 5s) | | ||
| | `framework` | ❌ | `pinia` 或 `vuex`(默认自动检测) | | ||
| ### 方式 B: TypeScript 适配器(编程式) | ||
| 适用于需要嵌入 JS 代码读取 Pinia state、XHR 拦截、GraphQL、分页、复杂数据转换等场景。 | ||
| 文件路径: `src/clis/<site>/<name>.ts`,还需要在 `src/clis/index.ts` 中 import 注册。 | ||
| #### Tier 3 — Header 认证(Twitter) | ||
| ```typescript | ||
| // src/clis/twitter/search.ts | ||
| import { cli, Strategy } from '../../registry.js'; | ||
| cli({ | ||
| site: 'twitter', | ||
| name: 'search', | ||
| description: 'Search tweets', | ||
| strategy: Strategy.HEADER, | ||
| args: [{ name: 'keyword', required: true }], | ||
| columns: ['rank', 'author', 'text', 'likes'], | ||
| func: async (page, kwargs) => { | ||
| await page.goto('https://x.com'); | ||
| const data = await page.evaluate(` | ||
| (async () => { | ||
| // 从 Cookie 提取 CSRF token | ||
| const ct0 = document.cookie.split(';') | ||
| .map(c => c.trim()) | ||
| .find(c => c.startsWith('ct0='))?.split('=')[1]; | ||
| if (!ct0) return { error: 'Not logged in' }; | ||
| const bearer = 'AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D...'; | ||
| const headers = { | ||
| 'Authorization': 'Bearer ' + decodeURIComponent(bearer), | ||
| 'X-Csrf-Token': ct0, | ||
| 'X-Twitter-Auth-Type': 'OAuth2Session', | ||
| }; | ||
| const variables = JSON.stringify({ rawQuery: '${kwargs.keyword}', count: 20 }); | ||
| const url = '/i/api/graphql/xxx/SearchTimeline?variables=' + encodeURIComponent(variables); | ||
| const res = await fetch(url, { headers, credentials: 'include' }); | ||
| return await res.json(); | ||
| })() | ||
| `); | ||
| // ... 解析 data | ||
| }, | ||
| }); | ||
| ``` | ||
| #### Tier 4 — Store Action + XHR 拦截(小红书) | ||
| ```typescript | ||
| // src/clis/xiaohongshu/search.ts | ||
| import { cli, Strategy } from '../../registry.js'; | ||
| cli({ | ||
| site: 'xiaohongshu', | ||
| name: 'search', | ||
| description: '搜索小红书笔记', | ||
| strategy: Strategy.COOKIE, // 实际是 intercept 模式 | ||
| args: [{ name: 'keyword', required: true }], | ||
| columns: ['rank', 'title', 'author', 'likes', 'type'], | ||
| func: async (page, kwargs) => { | ||
| await page.goto('https://www.xiaohongshu.com'); | ||
| await page.wait(2); | ||
| const data = await page.evaluate(` | ||
| (async () => { | ||
| const app = document.querySelector('#app')?.__vue_app__; | ||
| const pinia = app?.config?.globalProperties?.$pinia; | ||
| if (!pinia?._s) return { error: 'Page not ready' }; | ||
| const searchStore = pinia._s.get('search'); | ||
| if (!searchStore) return { error: 'Search store not found' }; | ||
| // XHR 拦截:捕获 store action 发出的请求 | ||
| let captured = null; | ||
| const origOpen = XMLHttpRequest.prototype.open; | ||
| const origSend = XMLHttpRequest.prototype.send; | ||
| XMLHttpRequest.prototype.open = function(m, u) { | ||
| this.__url = u; | ||
| return origOpen.apply(this, arguments); | ||
| }; | ||
| XMLHttpRequest.prototype.send = function(b) { | ||
| if (this.__url?.includes('search/notes')) { | ||
| const x = this; | ||
| const orig = x.onreadystatechange; | ||
| x.onreadystatechange = function() { | ||
| if (x.readyState === 4 && !captured) { | ||
| try { captured = JSON.parse(x.responseText); } catch {} | ||
| } | ||
| if (orig) orig.apply(this, arguments); | ||
| }; | ||
| } | ||
| return origSend.apply(this, arguments); | ||
| }; | ||
| try { | ||
| // 触发 Store Action,让网站自己签名发请求 | ||
| searchStore.mutateSearchValue('${kwargs.keyword}'); | ||
| await searchStore.loadMore(); | ||
| await new Promise(r => setTimeout(r, 800)); | ||
| } finally { | ||
| // 恢复原始 XHR | ||
| XMLHttpRequest.prototype.open = origOpen; | ||
| XMLHttpRequest.prototype.send = origSend; | ||
| } | ||
| if (!captured?.success) return { error: captured?.msg || 'Search failed' }; | ||
| return (captured.data?.items || []).map(i => ({ | ||
| title: i.note_card?.display_title || '', | ||
| author: i.note_card?.user?.nickname || '', | ||
| likes: i.note_card?.interact_info?.liked_count || '0', | ||
| type: i.note_card?.type || '', | ||
| })); | ||
| })() | ||
| `); | ||
| if (!Array.isArray(data)) return []; | ||
| return data.slice(0, kwargs.limit || 20).map((item, i) => ({ | ||
| rank: i + 1, ...item, | ||
| })); | ||
| }, | ||
| }); | ||
| ``` | ||
| > **XHR 拦截核心思路**:不自己构造签名,而是劫持网站自己的 `XMLHttpRequest`,让网站的 Store Action 发出正确签名的请求,我们只是"窃听"响应。用完后必须恢复原始方法。 | ||
| --- | ||
| ## Step 4: 测试 | ||
| > **⚠️ 构建通过 ≠ 功能正常**。`npm run build` 只验证 TypeScript / YAML 语法,不验证运行时行为。 | ||
| > 每个新命令 **必须实际运行** 并确认输出正确后才算完成。 | ||
| ### 必做清单 | ||
| ```bash | ||
| # 1. 构建(确认语法无误) | ||
| npm run build | ||
| # 2. 确认命令已注册 | ||
| opencli list | grep mysite | ||
| # 3. 实际运行命令(最关键!) | ||
| opencli mysite hot --limit 3 -v # verbose 查看每步数据流 | ||
| opencli mysite hot --limit 3 -f json # JSON 输出确认字段完整 | ||
| ``` | ||
| ### tap 步骤调试(intercept 策略专用) | ||
| > **⚠️ 不要猜 store name / action name**。先用 evaluate 探索,再写 YAML。 | ||
| #### Step 1: 列出所有 Pinia store | ||
| 在浏览器中打开目标网站后: | ||
| ```bash | ||
| opencli evaluate "(() => { | ||
| const app = document.querySelector('#app')?.__vue_app__; | ||
| const pinia = app?.config?.globalProperties?.\$pinia; | ||
| return [...pinia._s.keys()]; | ||
| })()" | ||
| # 输出: ["user", "feed", "search", "notification", ...] | ||
| ``` | ||
| #### Step 2: 查看 store 的 action 名称 | ||
| 故意写一个错误 action 名,tap 会返回所有可用 actions: | ||
| ``` | ||
| ⚠ tap: Action not found: wrongName on store notification | ||
| 💡 Available: getNotification, replyComment, getNotificationCount, reset | ||
| ``` | ||
| #### Step 3: 用 network requests 确认 capture 模式 | ||
| ```bash | ||
| # 在浏览器打开目标页面,查看网络请求 | ||
| # 找到目标 API 的 URL 特征(如 "/you/mentions"、"homefeed") | ||
| ``` | ||
| #### 完整流程 | ||
| ``` | ||
| ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────┐ | ||
| │ 1. navigate │ ──▶ │ 2. 探索 store │ ──▶ │ 3. 写 YAML │ ──▶ │ 4. 测试 │ | ||
| │ 到目标页面 │ │ name/action │ │ tap 步骤 │ │ 运行验证 │ | ||
| └──────────────┘ └──────────────┘ └──────────────┘ └────────┘ | ||
| ``` | ||
| ### Verbose 模式 | ||
| ```bash | ||
| # 查看 pipeline 每步的输入输出 | ||
| opencli bilibili hot --limit 1 -v | ||
| ``` | ||
| 输出示例: | ||
| ``` | ||
| [1/4] navigate → https://www.bilibili.com | ||
| → (no data) | ||
| [2/4] evaluate → (async () => { const res = await fetch(… | ||
| → [{title: "…", author: "…", play: 230835}] | ||
| [3/4] map (rank, title, author, play, danmaku) | ||
| → [{rank: 1, title: "…", author: "…"}] | ||
| [4/4] limit → 1 | ||
| → [{rank: 1, title: "…"}] | ||
| ``` | ||
| ### 输出格式验证 | ||
| ```bash | ||
| # 确认表格渲染正确 | ||
| opencli mysite hot -f table | ||
| # 确认 JSON 可被 jq 解析 | ||
| opencli mysite hot -f json | jq '.[0]' | ||
| # 确认 CSV 可被导入 | ||
| opencli mysite hot -f csv > data.csv | ||
| ``` | ||
| --- | ||
| ## Step 5: 注册 & 发布 | ||
| ### YAML 适配器 | ||
| 放入 `src/clis/<site>/<name>.yaml` 即自动注册,无需额外操作。 | ||
| ### TS 适配器 | ||
| 在 `src/clis/index.ts` 添加 import: | ||
| ```typescript | ||
| import './mysite/search.js'; | ||
| ``` | ||
| ### 验证注册 | ||
| ```bash | ||
| opencli list # 确认新命令出现 | ||
| opencli validate mysite # 校验定义完整性 | ||
| ``` | ||
| ### 提交 | ||
| ```bash | ||
| git add src/clis/mysite/ | ||
| git commit -m "feat(mysite): add hot and search adapters" | ||
| git push | ||
| ``` | ||
| --- | ||
| ## 常见陷阱 | ||
| | 陷阱 | 表现 | 解决方案 | | ||
| |------|------|---------| | ||
| | 缺少 `navigate` | evaluate 报 `Target page context` 错误 | 在 evaluate 前加 `navigate:` 步骤 | | ||
| | 嵌套字段访问 | `${{ item.node?.title }}` 不工作 | 在 evaluate 中 flatten 数据,不在模板中用 optional chaining | | ||
| | 缺少 `strategy: public` | 公开 API 也启动浏览器,7s → 1s | 公开 API 加上 `strategy: public` + `browser: false` | | ||
| | evaluate 返回字符串 | map 步骤收到 `""` 而非数组 | pipeline 有 auto-parse,但建议在 evaluate 内 `.map()` 整形 | | ||
| | 搜索参数被 URL 编码 | `${{ args.keyword }}` 被浏览器二次编码 | 在 evaluate 内用 `encodeURIComponent()` 手动编码 | | ||
| | Cookie 过期 | 返回 401 / 空数据 | 在浏览器里重新登录目标站点 | | ||
| | Extension tab 残留 | Chrome 多出 `chrome-extension://` tab | 已自动清理;若残留,手动关闭即可 | | ||
| | TS evaluate 格式 | `() => {}` 报 `result is not a function` | TS 中 `page.evaluate()` 必须用 IIFE:`(async () => { ... })()` | | ||
| | 页面异步加载 | evaluate 拿到空数据(store state 还没更新) | 在 evaluate 内用 polling 等待数据出现,或增加 `wait` 时间 | | ||
| | YAML 内嵌大段 JS | 调试困难,字符串转义问题 | 超过 10 行 JS 的命令改用 TS adapter | | ||
| --- | ||
| ## 用 AI Agent 自动生成适配器 | ||
| 最快的方式是让 AI Agent 完成全流程: | ||
| ```bash | ||
| # 一键:探索 → 分析 → 合成 → 注册 | ||
| opencli generate https://www.example.com --goal "hot" | ||
| # 或分步执行: | ||
| opencli explore https://www.example.com --site mysite # 发现 API | ||
| opencli synthesize mysite # 生成候选 YAML | ||
| opencli verify mysite/hot --smoke # 冒烟测试 | ||
| ``` | ||
| 生成的候选 YAML 保存在 `.opencli/explore/mysite/candidates/`,可直接复制到 `src/clis/mysite/` 并微调。 |
| /** | ||
| * Strategy Cascade: automatic strategy downgrade chain. | ||
| * | ||
| * Probes an API endpoint starting from the simplest strategy (PUBLIC) | ||
| * and automatically downgrades through the strategy tiers until one works: | ||
| * | ||
| * PUBLIC → COOKIE → HEADER → INTERCEPT → UI | ||
| * | ||
| * This eliminates the need for manual strategy selection — the system | ||
| * automatically finds the minimum-privilege strategy that works. | ||
| */ | ||
| import { Strategy } from './registry.js'; | ||
| interface ProbeResult { | ||
| strategy: Strategy; | ||
| success: boolean; | ||
| statusCode?: number; | ||
| hasData?: boolean; | ||
| error?: string; | ||
| responsePreview?: string; | ||
| } | ||
| interface CascadeResult { | ||
| bestStrategy: Strategy; | ||
| probes: ProbeResult[]; | ||
| confidence: number; | ||
| } | ||
| /** | ||
| * Probe an endpoint with a specific strategy. | ||
| * Returns whether the probe succeeded and basic response info. | ||
| */ | ||
| export declare function probeEndpoint(page: any, url: string, strategy: Strategy, opts?: { | ||
| timeout?: number; | ||
| }): Promise<ProbeResult>; | ||
| /** | ||
| * Run the cascade: try each strategy in order until one works. | ||
| * Returns the simplest working strategy. | ||
| */ | ||
| export declare function cascadeProbe(page: any, url: string, opts?: { | ||
| maxStrategy?: Strategy; | ||
| timeout?: number; | ||
| }): Promise<CascadeResult>; | ||
| /** | ||
| * Render cascade results for display. | ||
| */ | ||
| export declare function renderCascadeResult(result: CascadeResult): string; | ||
| export {}; |
+180
| /** | ||
| * Strategy Cascade: automatic strategy downgrade chain. | ||
| * | ||
| * Probes an API endpoint starting from the simplest strategy (PUBLIC) | ||
| * and automatically downgrades through the strategy tiers until one works: | ||
| * | ||
| * PUBLIC → COOKIE → HEADER → INTERCEPT → UI | ||
| * | ||
| * This eliminates the need for manual strategy selection — the system | ||
| * automatically finds the minimum-privilege strategy that works. | ||
| */ | ||
| import { Strategy } from './registry.js'; | ||
| /** Strategy cascade order (simplest → most complex) */ | ||
| const CASCADE_ORDER = [ | ||
| Strategy.PUBLIC, | ||
| Strategy.COOKIE, | ||
| Strategy.HEADER, | ||
| Strategy.INTERCEPT, | ||
| Strategy.UI, | ||
| ]; | ||
| /** | ||
| * Probe an endpoint with a specific strategy. | ||
| * Returns whether the probe succeeded and basic response info. | ||
| */ | ||
| export async function probeEndpoint(page, url, strategy, opts = {}) { | ||
| const result = { strategy, success: false }; | ||
| try { | ||
| switch (strategy) { | ||
| case Strategy.PUBLIC: { | ||
| // Try direct fetch without browser (no credentials) | ||
| const js = ` | ||
| async () => { | ||
| try { | ||
| const resp = await fetch(${JSON.stringify(url)}); | ||
| const status = resp.status; | ||
| if (!resp.ok) return { status, ok: false }; | ||
| const text = await resp.text(); | ||
| let hasData = false; | ||
| try { | ||
| const json = JSON.parse(text); | ||
| hasData = !!json && (Array.isArray(json) ? json.length > 0 : | ||
| typeof json === 'object' && Object.keys(json).length > 0); | ||
| } catch {} | ||
| return { status, ok: true, hasData, preview: text.slice(0, 200) }; | ||
| } catch (e) { return { ok: false, error: e.message }; } | ||
| } | ||
| `; | ||
| const resp = await page.evaluate(js); | ||
| result.statusCode = resp?.status; | ||
| result.success = resp?.ok && resp?.hasData; | ||
| result.hasData = resp?.hasData; | ||
| result.responsePreview = resp?.preview; | ||
| break; | ||
| } | ||
| case Strategy.COOKIE: { | ||
| // Fetch with credentials: 'include' (uses browser cookies) | ||
| const js = ` | ||
| async () => { | ||
| try { | ||
| const resp = await fetch(${JSON.stringify(url)}, { credentials: 'include' }); | ||
| const status = resp.status; | ||
| if (!resp.ok) return { status, ok: false }; | ||
| const text = await resp.text(); | ||
| let hasData = false; | ||
| try { | ||
| const json = JSON.parse(text); | ||
| hasData = !!json && (Array.isArray(json) ? json.length > 0 : | ||
| typeof json === 'object' && Object.keys(json).length > 0); | ||
| // Check for API-level error codes (common in Chinese sites) | ||
| if (json.code !== undefined && json.code !== 0) hasData = false; | ||
| } catch {} | ||
| return { status, ok: true, hasData, preview: text.slice(0, 200) }; | ||
| } catch (e) { return { ok: false, error: e.message }; } | ||
| } | ||
| `; | ||
| const resp = await page.evaluate(js); | ||
| result.statusCode = resp?.status; | ||
| result.success = resp?.ok && resp?.hasData; | ||
| result.hasData = resp?.hasData; | ||
| result.responsePreview = resp?.preview; | ||
| break; | ||
| } | ||
| case Strategy.HEADER: { | ||
| // Fetch with credentials + try to extract common auth headers | ||
| const js = ` | ||
| async () => { | ||
| try { | ||
| // Try to extract CSRF tokens from cookies | ||
| const cookies = document.cookie.split(';').map(c => c.trim()); | ||
| const csrf = cookies.find(c => c.startsWith('ct0=') || c.startsWith('csrf_token=') || c.startsWith('_csrf='))?.split('=').slice(1).join('='); | ||
| const headers = {}; | ||
| if (csrf) { | ||
| headers['X-Csrf-Token'] = csrf; | ||
| headers['X-XSRF-Token'] = csrf; | ||
| } | ||
| const resp = await fetch(${JSON.stringify(url)}, { | ||
| credentials: 'include', | ||
| headers | ||
| }); | ||
| const status = resp.status; | ||
| if (!resp.ok) return { status, ok: false }; | ||
| const text = await resp.text(); | ||
| let hasData = false; | ||
| try { | ||
| const json = JSON.parse(text); | ||
| hasData = !!json && (Array.isArray(json) ? json.length > 0 : | ||
| typeof json === 'object' && Object.keys(json).length > 0); | ||
| if (json.code !== undefined && json.code !== 0) hasData = false; | ||
| } catch {} | ||
| return { status, ok: true, hasData, preview: text.slice(0, 200) }; | ||
| } catch (e) { return { ok: false, error: e.message }; } | ||
| } | ||
| `; | ||
| const resp = await page.evaluate(js); | ||
| result.statusCode = resp?.status; | ||
| result.success = resp?.ok && resp?.hasData; | ||
| result.hasData = resp?.hasData; | ||
| result.responsePreview = resp?.preview; | ||
| break; | ||
| } | ||
| case Strategy.INTERCEPT: | ||
| case Strategy.UI: | ||
| // These require specific implementation per-site | ||
| // Mark as needing manual implementation | ||
| result.success = false; | ||
| result.error = `Strategy ${strategy} requires site-specific implementation`; | ||
| break; | ||
| } | ||
| } | ||
| catch (err) { | ||
| result.success = false; | ||
| result.error = err.message ?? String(err); | ||
| } | ||
| return result; | ||
| } | ||
| /** | ||
| * Run the cascade: try each strategy in order until one works. | ||
| * Returns the simplest working strategy. | ||
| */ | ||
| export async function cascadeProbe(page, url, opts = {}) { | ||
| const maxIdx = opts.maxStrategy | ||
| ? CASCADE_ORDER.indexOf(opts.maxStrategy) | ||
| : CASCADE_ORDER.indexOf(Strategy.HEADER); // Don't auto-try INTERCEPT/UI | ||
| const probes = []; | ||
| for (let i = 0; i <= Math.min(maxIdx, CASCADE_ORDER.length - 1); i++) { | ||
| const strategy = CASCADE_ORDER[i]; | ||
| const probe = await probeEndpoint(page, url, strategy, opts); | ||
| probes.push(probe); | ||
| if (probe.success) { | ||
| return { | ||
| bestStrategy: strategy, | ||
| probes, | ||
| confidence: 1.0 - (i * 0.1), // Higher confidence for simpler strategies | ||
| }; | ||
| } | ||
| } | ||
| // None worked — default to COOKIE (most common for logged-in sites) | ||
| return { | ||
| bestStrategy: Strategy.COOKIE, | ||
| probes, | ||
| confidence: 0.3, | ||
| }; | ||
| } | ||
| /** | ||
| * Render cascade results for display. | ||
| */ | ||
| export function renderCascadeResult(result) { | ||
| const lines = [ | ||
| `Strategy Cascade: ${result.bestStrategy} (${(result.confidence * 100).toFixed(0)}% confidence)`, | ||
| ]; | ||
| for (const probe of result.probes) { | ||
| const icon = probe.success ? '✅' : '❌'; | ||
| const status = probe.statusCode ? ` [${probe.statusCode}]` : ''; | ||
| const err = probe.error ? ` — ${probe.error}` : ''; | ||
| lines.push(` ${icon} ${probe.strategy}${status}${err}`); | ||
| } | ||
| return lines.join('\n'); | ||
| } |
| site: bilibili | ||
| name: hot | ||
| description: B站热门视频 | ||
| domain: www.bilibili.com | ||
| args: | ||
| limit: | ||
| type: int | ||
| default: 20 | ||
| description: Number of videos | ||
| pipeline: | ||
| - navigate: https://www.bilibili.com | ||
| - evaluate: | | ||
| (async () => { | ||
| const res = await fetch('https://api.bilibili.com/x/web-interface/popular?ps=${{ args.limit }}&pn=1', { | ||
| credentials: 'include' | ||
| }); | ||
| const data = await res.json(); | ||
| return (data?.data?.list || []).map((item) => ({ | ||
| title: item.title, | ||
| author: item.owner?.name, | ||
| play: item.stat?.view, | ||
| danmaku: item.stat?.danmaku, | ||
| })); | ||
| })() | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| title: ${{ item.title }} | ||
| author: ${{ item.author }} | ||
| play: ${{ item.play }} | ||
| danmaku: ${{ item.danmaku }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, title, author, play, danmaku] |
| site: github | ||
| name: trending | ||
| description: GitHub trending repositories | ||
| domain: github.com | ||
| args: | ||
| language: | ||
| type: str | ||
| default: "" | ||
| description: "Programming language filter (e.g. python, rust)" | ||
| since: | ||
| type: str | ||
| default: daily | ||
| description: "Time range: daily, weekly, monthly" | ||
| limit: | ||
| type: int | ||
| default: 20 | ||
| description: Number of repos | ||
| pipeline: | ||
| - evaluate: | | ||
| (async () => { | ||
| const lang = '${{ args.language }}' ? '/${{ args.language }}' : ''; | ||
| const res = await fetch(`https://github.com/trending${lang}?since=${{ args.since }}`, { | ||
| headers: { 'Accept': 'text/html' } | ||
| }); | ||
| const html = await res.text(); | ||
| const parser = new DOMParser(); | ||
| const doc = parser.parseFromString(html, 'text/html'); | ||
| const rows = doc.querySelectorAll('article.Box-row'); | ||
| return Array.from(rows).map(row => { | ||
| const nameEl = row.querySelector('h2 a'); | ||
| const descEl = row.querySelector('p'); | ||
| const langEl = row.querySelector('[itemprop="programmingLanguage"]'); | ||
| const starsEl = row.querySelectorAll('a.Link--muted'); | ||
| const todayEl = row.querySelector('span.d-inline-block.float-sm-right'); | ||
| return { | ||
| repo: nameEl?.textContent?.trim()?.replace(/\s+/g, '') || '', | ||
| description: descEl?.textContent?.trim() || '', | ||
| language: langEl?.textContent?.trim() || '', | ||
| stars: starsEl[0]?.textContent?.trim() || '', | ||
| forks: starsEl[1]?.textContent?.trim() || '', | ||
| today: todayEl?.textContent?.trim() || '' | ||
| }; | ||
| }); | ||
| })() | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| repo: ${{ item.repo }} | ||
| description: ${{ item.description }} | ||
| language: ${{ item.language }} | ||
| stars: ${{ item.stars }} | ||
| today: ${{ item.today }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, repo, language, stars, today] |
| site: hackernews | ||
| name: top | ||
| description: Hacker News top stories | ||
| domain: news.ycombinator.com | ||
| strategy: public | ||
| browser: false | ||
| args: | ||
| limit: | ||
| type: int | ||
| default: 20 | ||
| description: Number of stories | ||
| pipeline: | ||
| - fetch: | ||
| url: https://hacker-news.firebaseio.com/v0/topstories.json | ||
| - limit: 30 | ||
| - map: | ||
| id: ${{ item }} | ||
| - fetch: | ||
| url: https://hacker-news.firebaseio.com/v0/item/${{ item.id }}.json | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| title: ${{ item.title }} | ||
| score: ${{ item.score }} | ||
| author: ${{ item.by }} | ||
| comments: ${{ item.descendants }} | ||
| url: ${{ item.url }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, title, score, author, comments] |
| site: reddit | ||
| name: hot | ||
| description: Reddit 热门帖子 | ||
| domain: www.reddit.com | ||
| args: | ||
| subreddit: | ||
| type: str | ||
| default: "" | ||
| description: "Subreddit name (e.g. programming). Empty for frontpage" | ||
| limit: | ||
| type: int | ||
| default: 20 | ||
| description: Number of posts | ||
| pipeline: | ||
| - navigate: https://www.reddit.com | ||
| - evaluate: | | ||
| (async () => { | ||
| const sub = '${{ args.subreddit }}'; | ||
| const path = sub ? '/r/' + sub + '/hot.json' : '/hot.json'; | ||
| const res = await fetch(path + '?limit=${{ args.limit }}&raw_json=1', { | ||
| credentials: 'include' | ||
| }); | ||
| const d = await res.json(); | ||
| return (d?.data?.children || []).map(c => ({ | ||
| title: c.data.title, | ||
| subreddit: c.data.subreddit_name_prefixed, | ||
| score: c.data.score, | ||
| comments: c.data.num_comments, | ||
| author: c.data.author, | ||
| url: 'https://www.reddit.com' + c.data.permalink, | ||
| })); | ||
| })() | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| title: ${{ item.title }} | ||
| subreddit: ${{ item.subreddit }} | ||
| score: ${{ item.score }} | ||
| comments: ${{ item.comments }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, title, subreddit, score, comments] |
| site: twitter | ||
| name: trending | ||
| description: Twitter/X trending topics | ||
| domain: x.com | ||
| args: | ||
| limit: | ||
| type: int | ||
| default: 20 | ||
| description: Number of trends to show | ||
| pipeline: | ||
| - navigate: https://x.com/explore/tabs/trending | ||
| - evaluate: | | ||
| (async () => { | ||
| const cookies = document.cookie.split(';').reduce((acc, c) => { | ||
| const [k, v] = c.trim().split('='); | ||
| acc[k] = v; | ||
| return acc; | ||
| }, {}); | ||
| const csrfToken = cookies['ct0'] || ''; | ||
| const bearerToken = 'AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA'; | ||
| const res = await fetch('/i/api/2/guide.json?include_page_configuration=true', { | ||
| credentials: 'include', | ||
| headers: { 'x-twitter-active-user': 'yes', 'x-csrf-token': csrfToken, 'authorization': 'Bearer ' + bearerToken } | ||
| }); | ||
| const data = await res.json(); | ||
| const trends = data?.timeline?.instructions?.[1]?.addEntries?.entries || []; | ||
| return trends.filter(e => e.content?.timelineModule).flatMap(e => e.content.timelineModule.items || []).map(t => t?.item?.content?.trend).filter(Boolean); | ||
| })() | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| topic: ${{ item.name }} | ||
| tweets: ${{ item.tweetCount || 'N/A' }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, topic, tweets] |
| site: v2ex | ||
| name: hot | ||
| description: V2EX 热门话题 | ||
| domain: www.v2ex.com | ||
| strategy: public | ||
| browser: false | ||
| args: | ||
| limit: | ||
| type: int | ||
| default: 20 | ||
| description: Number of topics | ||
| pipeline: | ||
| - fetch: | ||
| url: https://www.v2ex.com/api/topics/hot.json | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| title: ${{ item.title }} | ||
| replies: ${{ item.replies }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, title, replies] |
| site: v2ex | ||
| name: latest | ||
| description: V2EX 最新话题 | ||
| domain: www.v2ex.com | ||
| strategy: public | ||
| browser: false | ||
| args: | ||
| limit: | ||
| type: int | ||
| default: 20 | ||
| description: Number of topics | ||
| pipeline: | ||
| - fetch: | ||
| url: https://www.v2ex.com/api/topics/latest.json | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| title: ${{ item.title }} | ||
| replies: ${{ item.replies }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, title, replies] |
| site: v2ex | ||
| name: topic | ||
| description: V2EX 主题详情和回复 | ||
| domain: www.v2ex.com | ||
| strategy: public | ||
| browser: false | ||
| args: | ||
| id: | ||
| type: str | ||
| required: true | ||
| description: Topic ID | ||
| pipeline: | ||
| - fetch: | ||
| url: https://www.v2ex.com/api/topics/show.json | ||
| params: | ||
| id: ${{ args.id }} | ||
| - map: | ||
| title: ${{ item.title }} | ||
| replies: ${{ item.replies }} | ||
| url: ${{ item.url }} | ||
| - limit: 1 | ||
| columns: [title, replies, url] |
| site: xiaohongshu | ||
| name: feed | ||
| description: "小红书首页推荐 Feed (via Pinia Store Action)" | ||
| domain: www.xiaohongshu.com | ||
| strategy: intercept | ||
| browser: true | ||
| args: | ||
| limit: | ||
| type: int | ||
| default: 20 | ||
| description: Number of items to return | ||
| columns: [title, author, likes, type, url] | ||
| pipeline: | ||
| - navigate: https://www.xiaohongshu.com/explore | ||
| - wait: 3 | ||
| - tap: | ||
| store: feed | ||
| action: fetchFeeds | ||
| capture: homefeed | ||
| select: data.items | ||
| timeout: 8 | ||
| - map: | ||
| id: ${{ item.id }} | ||
| title: ${{ item.note_card.display_title }} | ||
| type: ${{ item.note_card.type }} | ||
| author: ${{ item.note_card.user.nickname }} | ||
| likes: ${{ item.note_card.interact_info.liked_count }} | ||
| url: https://www.xiaohongshu.com/explore/${{ item.id }} | ||
| - limit: ${{ args.limit | default(20) }} |
| site: xiaohongshu | ||
| name: notifications | ||
| description: "小红书通知 (mentions/likes/connections)" | ||
| domain: www.xiaohongshu.com | ||
| strategy: intercept | ||
| browser: true | ||
| args: | ||
| type: | ||
| type: str | ||
| default: mentions | ||
| description: "Notification type: mentions, likes, or connections" | ||
| limit: | ||
| type: int | ||
| default: 20 | ||
| description: Number of notifications to return | ||
| columns: [rank, user, action, content, note, time] | ||
| pipeline: | ||
| - navigate: https://www.xiaohongshu.com/notification | ||
| - wait: 3 | ||
| - tap: | ||
| store: notification | ||
| action: getNotification | ||
| args: | ||
| - ${{ args.type | default('mentions') }} | ||
| capture: /you/ | ||
| select: data.message_list | ||
| timeout: 8 | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| user: ${{ item.user_info.nickname }} | ||
| action: ${{ item.title }} | ||
| content: ${{ item.comment_info.content }} | ||
| note: ${{ item.item_info.content }} | ||
| time: ${{ item.time }} | ||
| - limit: ${{ args.limit | default(20) }} |
| /** | ||
| * Xiaohongshu search — trigger search via Pinia store + XHR interception. | ||
| * Inspired by bb-sites/xiaohongshu/search.js but adapted for opencli pipeline. | ||
| */ | ||
| export {}; |
| /** | ||
| * Xiaohongshu search — trigger search via Pinia store + XHR interception. | ||
| * Inspired by bb-sites/xiaohongshu/search.js but adapted for opencli pipeline. | ||
| */ | ||
| import { cli, Strategy } from '../../registry.js'; | ||
| cli({ | ||
| site: 'xiaohongshu', | ||
| name: 'search', | ||
| description: '搜索小红书笔记', | ||
| domain: 'www.xiaohongshu.com', | ||
| strategy: Strategy.COOKIE, | ||
| args: [ | ||
| { name: 'keyword', required: true, help: 'Search keyword' }, | ||
| { name: 'limit', type: 'int', default: 20, help: 'Number of results' }, | ||
| ], | ||
| columns: ['rank', 'title', 'author', 'likes', 'type'], | ||
| func: async (page, kwargs) => { | ||
| await page.goto('https://www.xiaohongshu.com'); | ||
| await page.wait(2); | ||
| const data = await page.evaluate(` | ||
| (async () => { | ||
| const app = document.querySelector('#app')?.__vue_app__; | ||
| const pinia = app?.config?.globalProperties?.$pinia; | ||
| if (!pinia?._s) return {error: 'Page not ready'}; | ||
| const searchStore = pinia._s.get('search'); | ||
| if (!searchStore) return {error: 'Search store not found'}; | ||
| let captured = null; | ||
| const origOpen = XMLHttpRequest.prototype.open; | ||
| const origSend = XMLHttpRequest.prototype.send; | ||
| XMLHttpRequest.prototype.open = function(m, u) { this.__url = u; return origOpen.apply(this, arguments); }; | ||
| XMLHttpRequest.prototype.send = function(b) { | ||
| if (this.__url?.includes('search/notes')) { | ||
| const x = this; | ||
| const orig = x.onreadystatechange; | ||
| x.onreadystatechange = function() { if (x.readyState === 4 && !captured) { try { captured = JSON.parse(x.responseText); } catch {} } if (orig) orig.apply(this, arguments); }; | ||
| } | ||
| return origSend.apply(this, arguments); | ||
| }; | ||
| try { | ||
| searchStore.mutateSearchValue('${kwargs.keyword}'); | ||
| await searchStore.loadMore(); | ||
| await new Promise(r => setTimeout(r, 800)); | ||
| } finally { | ||
| XMLHttpRequest.prototype.open = origOpen; | ||
| XMLHttpRequest.prototype.send = origSend; | ||
| } | ||
| if (!captured?.success) return {error: captured?.msg || 'Search failed'}; | ||
| return (captured.data?.items || []).map(i => ({ | ||
| title: i.note_card?.display_title || '', | ||
| type: i.note_card?.type || '', | ||
| url: 'https://www.xiaohongshu.com/explore/' + i.id, | ||
| author: i.note_card?.user?.nickname || '', | ||
| likes: i.note_card?.interact_info?.liked_count || '0', | ||
| })); | ||
| })() | ||
| `); | ||
| if (!Array.isArray(data)) | ||
| return []; | ||
| return data.slice(0, kwargs.limit).map((item, i) => ({ | ||
| rank: i + 1, | ||
| ...item, | ||
| })); | ||
| }, | ||
| }); |
| site: zhihu | ||
| name: hot | ||
| description: 知乎热榜 | ||
| domain: www.zhihu.com | ||
| args: | ||
| limit: | ||
| type: int | ||
| default: 20 | ||
| description: Number of items to return | ||
| pipeline: | ||
| - navigate: https://www.zhihu.com | ||
| - evaluate: | | ||
| (async () => { | ||
| const res = await fetch('https://www.zhihu.com/api/v3/feed/topstory/hot-lists/total?limit=50', { | ||
| credentials: 'include' | ||
| }); | ||
| const d = await res.json(); | ||
| return (d?.data || []).map((item) => { | ||
| const t = item.target || {}; | ||
| return { | ||
| title: t.title, | ||
| url: 'https://www.zhihu.com/question/' + t.id, | ||
| answer_count: t.answer_count, | ||
| follower_count: t.follower_count, | ||
| heat: item.detail_text || '', | ||
| }; | ||
| }); | ||
| })() | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| title: ${{ item.title }} | ||
| heat: ${{ item.heat }} | ||
| answers: ${{ item.answer_count }} | ||
| url: ${{ item.url }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, title, heat, answers] |
| export {}; |
| import { cli, Strategy } from '../../registry.js'; | ||
| cli({ | ||
| site: 'zhihu', | ||
| name: 'question', | ||
| description: '知乎问题详情和回答', | ||
| domain: 'www.zhihu.com', | ||
| strategy: Strategy.COOKIE, | ||
| args: [ | ||
| { name: 'id', required: true, help: 'Question ID (numeric)' }, | ||
| { name: 'limit', type: 'int', default: 5, help: 'Number of answers' }, | ||
| ], | ||
| columns: ['rank', 'author', 'votes', 'content'], | ||
| func: async (page, kwargs) => { | ||
| const { id, limit = 5 } = kwargs; | ||
| const stripHtml = (html) => (html || '').replace(/<[^>]+>/g, '').replace(/ /g, ' ').replace(/</g, '<').replace(/>/g, '>').replace(/&/g, '&').trim(); | ||
| // Fetch question detail and answers in parallel via evaluate | ||
| const result = await page.evaluate(` | ||
| async () => { | ||
| const [qResp, aResp] = await Promise.all([ | ||
| fetch('https://www.zhihu.com/api/v4/questions/${id}?include=data[*].detail,excerpt,answer_count,follower_count,visit_count', {credentials: 'include'}), | ||
| fetch('https://www.zhihu.com/api/v4/questions/${id}/answers?limit=${limit}&offset=0&sort_by=default&include=data[*].content,voteup_count,comment_count,author', {credentials: 'include'}) | ||
| ]); | ||
| if (!qResp.ok || !aResp.ok) return { error: true }; | ||
| const q = await qResp.json(); | ||
| const a = await aResp.json(); | ||
| return { question: q, answers: a.data || [] }; | ||
| } | ||
| `); | ||
| if (!result || result.error) | ||
| throw new Error('Failed to fetch question. Are you logged in?'); | ||
| const answers = (result.answers ?? []).slice(0, Number(limit)).map((a, i) => ({ | ||
| rank: i + 1, | ||
| author: a.author?.name ?? 'anonymous', | ||
| votes: a.voteup_count ?? 0, | ||
| content: stripHtml(a.content ?? '').slice(0, 200), | ||
| })); | ||
| return answers; | ||
| }, | ||
| }); |
| site: zhihu | ||
| name: search | ||
| description: 知乎搜索 | ||
| domain: www.zhihu.com | ||
| args: | ||
| keyword: | ||
| type: str | ||
| required: true | ||
| description: Search keyword | ||
| limit: | ||
| type: int | ||
| default: 10 | ||
| description: Number of results | ||
| pipeline: | ||
| - navigate: https://www.zhihu.com | ||
| - evaluate: | | ||
| (async () => { | ||
| const strip = (html) => (html || '').replace(/<[^>]+>/g, '').replace(/ /g, ' ').replace(/</g, '<').replace(/>/g, '>').replace(/&/g, '&').replace(/<em>/g, '').replace(/<\/em>/g, '').trim(); | ||
| const res = await fetch('https://www.zhihu.com/api/v4/search_v3?q=' + encodeURIComponent('${{ args.keyword }}') + '&t=general&offset=0&limit=${{ args.limit }}', { | ||
| credentials: 'include' | ||
| }); | ||
| const d = await res.json(); | ||
| return (d?.data || []) | ||
| .filter(item => item.type === 'search_result') | ||
| .map(item => { | ||
| const obj = item.object || {}; | ||
| const q = obj.question || {}; | ||
| return { | ||
| type: obj.type, | ||
| title: strip(obj.title || q.name || ''), | ||
| excerpt: strip(obj.excerpt || '').substring(0, 100), | ||
| author: obj.author?.name || '', | ||
| votes: obj.voteup_count || 0, | ||
| url: obj.type === 'answer' | ||
| ? 'https://www.zhihu.com/question/' + q.id + '/answer/' + obj.id | ||
| : obj.type === 'article' | ||
| ? 'https://zhuanlan.zhihu.com/p/' + obj.id | ||
| : 'https://www.zhihu.com/question/' + obj.id, | ||
| }; | ||
| }); | ||
| })() | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| title: ${{ item.title }} | ||
| type: ${{ item.type }} | ||
| author: ${{ item.author }} | ||
| votes: ${{ item.votes }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, title, type, author, votes] |
+143
| # OpenCLI | ||
| > **把任何网站变成你的命令行工具。** | ||
| > 零风控 · 复用 Chrome 登录 · AI 自动发现接口 | ||
| [English](./README.md) | ||
| [](https://www.npmjs.com/package/@jackwener/opencli) | ||
| OpenCLI 通过 Chrome 浏览器 + [Playwright MCP Bridge](https://github.com/nichochar/playwright-mcp) 扩展,将任何网站变成命令行工具。不存密码、不泄 token,直接复用浏览器已登录状态。 | ||
| ## ✨ 亮点 | ||
| - 🌐 **25+ 命令,8 个站点** — B站、知乎、GitHub、Twitter/X、Reddit、V2EX、小红书、Hacker News | ||
| - 🔐 **零风控** — 复用 Chrome 登录态,无需存储任何凭证 | ||
| - 🤖 **AI 原生** — `explore` 自动发现 API,`synthesize` 生成适配器,`cascade` 探测认证策略 | ||
| - 📝 **声明式 YAML** — 大部分适配器只需 ~30 行 YAML | ||
| - 🔌 **TypeScript 扩展** — 复杂场景(XHR 拦截、GraphQL)可用 TS 编写 | ||
| ## 🚀 快速开始 | ||
| ### npm 全局安装(推荐) | ||
| ```bash | ||
| npm install -g @jackwener/opencli | ||
| ``` | ||
| 直接使用: | ||
| ```bash | ||
| opencli list # 查看所有命令 | ||
| opencli hackernews top --limit 5 # 公共 API,无需浏览器 | ||
| opencli bilibili hot --limit 5 # 浏览器命令 | ||
| opencli zhihu hot -f json # JSON 输出 | ||
| ``` | ||
| ### 从源码安装 | ||
| ```bash | ||
| git clone git@github.com:jackwener/opencli.git | ||
| cd opencli && npm install | ||
| npx tsx src/main.ts list | ||
| ``` | ||
| ### 更新 | ||
| ```bash | ||
| # npm 全局更新 | ||
| npm update -g @jackwener/opencli | ||
| # 或直接安装最新版 | ||
| npm install -g @jackwener/opencli@latest | ||
| ``` | ||
| ## 📋 前置要求 | ||
| 浏览器命令需要: | ||
| 1. **Chrome** 浏览器正在运行,且**已登录目标网站**(如 bilibili.com、zhihu.com、xiaohongshu.com) | ||
| 2. 安装 **[Playwright MCP Bridge](https://chromewebstore.google.com/detail/playwright-mcp-bridge/mmlmfjhmonkocbjadbfplnigmagldckm)** 扩展 | ||
| 3. 在 MCP 配置中设置 `PLAYWRIGHT_MCP_EXTENSION_TOKEN`(从扩展设置页获取): | ||
| ```json | ||
| { | ||
| "mcpServers": { | ||
| "playwright": { | ||
| "command": "npx", | ||
| "args": ["@playwright/mcp@latest", "--extension"], | ||
| "env": { | ||
| "PLAYWRIGHT_MCP_EXTENSION_TOKEN": "<your-token>" | ||
| } | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
| 公共 API 命令(`hackernews`、`github search`、`v2ex`)无需浏览器。 | ||
| > **⚠️ 重要**:浏览器命令复用你的 Chrome 登录状态。运行命令前,你必须已在 Chrome 中登录目标网站。如果获取到空数据或报错,请先检查登录状态。 | ||
| ## 📦 内置命令 | ||
| | 站点 | 命令 | 模式 | | ||
| |------|------|------| | ||
| | **bilibili** | `hot` `search` `me` `favorite` `history` `feed` `user-videos` | 🔐 浏览器 | | ||
| | **zhihu** | `hot` `search` `question` | 🔐 浏览器 | | ||
| | **xiaohongshu** | `search` `notifications` `feed` | 🔐 浏览器 | | ||
| | **twitter** | `trending` | 🔐 浏览器 | | ||
| | **reddit** | `hot` | 🔐 浏览器 | | ||
| | **github** | `trending` `search` | 🔐 / 🌐 | | ||
| | **v2ex** | `hot` `latest` `topic` | 🌐 公共 API | | ||
| | **hackernews** | `top` | 🌐 公共 API | | ||
| ## 🎨 输出格式 | ||
| ```bash | ||
| opencli bilibili hot -f table # 默认:表格 | ||
| opencli bilibili hot -f json # JSON(可 pipe 给 jq 或 AI agent) | ||
| opencli bilibili hot -f md # Markdown | ||
| opencli bilibili hot -f csv # CSV | ||
| opencli bilibili hot -v # 详细模式:展示 pipeline 每步数据 | ||
| ``` | ||
| ## 🧠 AI Agent 工作流 | ||
| ```bash | ||
| # 1. Deep Explore — 网络拦截 → 响应分析 → 能力推理 → 框架检测 | ||
| opencli explore https://example.com --site mysite | ||
| # 2. Synthesize — 从探索成果物生成 evaluate-based YAML 适配器 | ||
| opencli synthesize mysite | ||
| # 3. Generate — 一键完成:探索 → 合成 → 注册 | ||
| opencli generate https://example.com --goal "hot" | ||
| # 4. Strategy Cascade — 自动降级探测:PUBLIC → COOKIE → HEADER | ||
| opencli cascade https://api.example.com/data | ||
| ``` | ||
| 探索结果输出到 `.opencli/explore/<site>/`: | ||
| - `manifest.json` — 站点元数据、框架检测结果 | ||
| - `endpoints.json` — 评分排序的 API 端点,含响应 schema | ||
| - `capabilities.json` — 推理出的能力及置信度 | ||
| - `auth.json` — 认证策略建议 | ||
| ## 🔧 创建新命令 | ||
| 查看 **[SKILL.md](./SKILL.md)** 了解完整的适配器开发指南(YAML pipeline + TypeScript)。 | ||
| ## 版本发布 | ||
| ```bash | ||
| # 升级版本号 | ||
| npm version patch # 0.1.0 → 0.1.1 | ||
| npm version minor # 0.1.0 → 0.2.0 | ||
| npm version major # 0.1.0 → 1.0.0 | ||
| # 推送 tag,GitHub Actions 自动发 release 并发布到 npm | ||
| git push --follow-tags | ||
| ``` | ||
| ## 📄 License | ||
| MIT |
+217
| /** | ||
| * Strategy Cascade: automatic strategy downgrade chain. | ||
| * | ||
| * Probes an API endpoint starting from the simplest strategy (PUBLIC) | ||
| * and automatically downgrades through the strategy tiers until one works: | ||
| * | ||
| * PUBLIC → COOKIE → HEADER → INTERCEPT → UI | ||
| * | ||
| * This eliminates the need for manual strategy selection — the system | ||
| * automatically finds the minimum-privilege strategy that works. | ||
| */ | ||
| import { Strategy } from './registry.js'; | ||
| /** Strategy cascade order (simplest → most complex) */ | ||
| const CASCADE_ORDER: Strategy[] = [ | ||
| Strategy.PUBLIC, | ||
| Strategy.COOKIE, | ||
| Strategy.HEADER, | ||
| Strategy.INTERCEPT, | ||
| Strategy.UI, | ||
| ]; | ||
| interface ProbeResult { | ||
| strategy: Strategy; | ||
| success: boolean; | ||
| statusCode?: number; | ||
| hasData?: boolean; | ||
| error?: string; | ||
| responsePreview?: string; | ||
| } | ||
| interface CascadeResult { | ||
| bestStrategy: Strategy; | ||
| probes: ProbeResult[]; | ||
| confidence: number; | ||
| } | ||
| /** | ||
| * Probe an endpoint with a specific strategy. | ||
| * Returns whether the probe succeeded and basic response info. | ||
| */ | ||
| export async function probeEndpoint( | ||
| page: any, | ||
| url: string, | ||
| strategy: Strategy, | ||
| opts: { timeout?: number } = {}, | ||
| ): Promise<ProbeResult> { | ||
| const result: ProbeResult = { strategy, success: false }; | ||
| try { | ||
| switch (strategy) { | ||
| case Strategy.PUBLIC: { | ||
| // Try direct fetch without browser (no credentials) | ||
| const js = ` | ||
| async () => { | ||
| try { | ||
| const resp = await fetch(${JSON.stringify(url)}); | ||
| const status = resp.status; | ||
| if (!resp.ok) return { status, ok: false }; | ||
| const text = await resp.text(); | ||
| let hasData = false; | ||
| try { | ||
| const json = JSON.parse(text); | ||
| hasData = !!json && (Array.isArray(json) ? json.length > 0 : | ||
| typeof json === 'object' && Object.keys(json).length > 0); | ||
| } catch {} | ||
| return { status, ok: true, hasData, preview: text.slice(0, 200) }; | ||
| } catch (e) { return { ok: false, error: e.message }; } | ||
| } | ||
| `; | ||
| const resp = await page.evaluate(js); | ||
| result.statusCode = resp?.status; | ||
| result.success = resp?.ok && resp?.hasData; | ||
| result.hasData = resp?.hasData; | ||
| result.responsePreview = resp?.preview; | ||
| break; | ||
| } | ||
| case Strategy.COOKIE: { | ||
| // Fetch with credentials: 'include' (uses browser cookies) | ||
| const js = ` | ||
| async () => { | ||
| try { | ||
| const resp = await fetch(${JSON.stringify(url)}, { credentials: 'include' }); | ||
| const status = resp.status; | ||
| if (!resp.ok) return { status, ok: false }; | ||
| const text = await resp.text(); | ||
| let hasData = false; | ||
| try { | ||
| const json = JSON.parse(text); | ||
| hasData = !!json && (Array.isArray(json) ? json.length > 0 : | ||
| typeof json === 'object' && Object.keys(json).length > 0); | ||
| // Check for API-level error codes (common in Chinese sites) | ||
| if (json.code !== undefined && json.code !== 0) hasData = false; | ||
| } catch {} | ||
| return { status, ok: true, hasData, preview: text.slice(0, 200) }; | ||
| } catch (e) { return { ok: false, error: e.message }; } | ||
| } | ||
| `; | ||
| const resp = await page.evaluate(js); | ||
| result.statusCode = resp?.status; | ||
| result.success = resp?.ok && resp?.hasData; | ||
| result.hasData = resp?.hasData; | ||
| result.responsePreview = resp?.preview; | ||
| break; | ||
| } | ||
| case Strategy.HEADER: { | ||
| // Fetch with credentials + try to extract common auth headers | ||
| const js = ` | ||
| async () => { | ||
| try { | ||
| // Try to extract CSRF tokens from cookies | ||
| const cookies = document.cookie.split(';').map(c => c.trim()); | ||
| const csrf = cookies.find(c => c.startsWith('ct0=') || c.startsWith('csrf_token=') || c.startsWith('_csrf='))?.split('=').slice(1).join('='); | ||
| const headers = {}; | ||
| if (csrf) { | ||
| headers['X-Csrf-Token'] = csrf; | ||
| headers['X-XSRF-Token'] = csrf; | ||
| } | ||
| const resp = await fetch(${JSON.stringify(url)}, { | ||
| credentials: 'include', | ||
| headers | ||
| }); | ||
| const status = resp.status; | ||
| if (!resp.ok) return { status, ok: false }; | ||
| const text = await resp.text(); | ||
| let hasData = false; | ||
| try { | ||
| const json = JSON.parse(text); | ||
| hasData = !!json && (Array.isArray(json) ? json.length > 0 : | ||
| typeof json === 'object' && Object.keys(json).length > 0); | ||
| if (json.code !== undefined && json.code !== 0) hasData = false; | ||
| } catch {} | ||
| return { status, ok: true, hasData, preview: text.slice(0, 200) }; | ||
| } catch (e) { return { ok: false, error: e.message }; } | ||
| } | ||
| `; | ||
| const resp = await page.evaluate(js); | ||
| result.statusCode = resp?.status; | ||
| result.success = resp?.ok && resp?.hasData; | ||
| result.hasData = resp?.hasData; | ||
| result.responsePreview = resp?.preview; | ||
| break; | ||
| } | ||
| case Strategy.INTERCEPT: | ||
| case Strategy.UI: | ||
| // These require specific implementation per-site | ||
| // Mark as needing manual implementation | ||
| result.success = false; | ||
| result.error = `Strategy ${strategy} requires site-specific implementation`; | ||
| break; | ||
| } | ||
| } catch (err: any) { | ||
| result.success = false; | ||
| result.error = err.message ?? String(err); | ||
| } | ||
| return result; | ||
| } | ||
| /** | ||
| * Run the cascade: try each strategy in order until one works. | ||
| * Returns the simplest working strategy. | ||
| */ | ||
| export async function cascadeProbe( | ||
| page: any, | ||
| url: string, | ||
| opts: { maxStrategy?: Strategy; timeout?: number } = {}, | ||
| ): Promise<CascadeResult> { | ||
| const maxIdx = opts.maxStrategy | ||
| ? CASCADE_ORDER.indexOf(opts.maxStrategy) | ||
| : CASCADE_ORDER.indexOf(Strategy.HEADER); // Don't auto-try INTERCEPT/UI | ||
| const probes: ProbeResult[] = []; | ||
| for (let i = 0; i <= Math.min(maxIdx, CASCADE_ORDER.length - 1); i++) { | ||
| const strategy = CASCADE_ORDER[i]; | ||
| const probe = await probeEndpoint(page, url, strategy, opts); | ||
| probes.push(probe); | ||
| if (probe.success) { | ||
| return { | ||
| bestStrategy: strategy, | ||
| probes, | ||
| confidence: 1.0 - (i * 0.1), // Higher confidence for simpler strategies | ||
| }; | ||
| } | ||
| } | ||
| // None worked — default to COOKIE (most common for logged-in sites) | ||
| return { | ||
| bestStrategy: Strategy.COOKIE, | ||
| probes, | ||
| confidence: 0.3, | ||
| }; | ||
| } | ||
| /** | ||
| * Render cascade results for display. | ||
| */ | ||
| export function renderCascadeResult(result: CascadeResult): string { | ||
| const lines = [ | ||
| `Strategy Cascade: ${result.bestStrategy} (${(result.confidence * 100).toFixed(0)}% confidence)`, | ||
| ]; | ||
| for (const probe of result.probes) { | ||
| const icon = probe.success ? '✅' : '❌'; | ||
| const status = probe.statusCode ? ` [${probe.statusCode}]` : ''; | ||
| const err = probe.error ? ` — ${probe.error}` : ''; | ||
| lines.push(` ${icon} ${probe.strategy}${status}${err}`); | ||
| } | ||
| return lines.join('\n'); | ||
| } |
| site: reddit | ||
| name: hot | ||
| description: Reddit 热门帖子 | ||
| domain: www.reddit.com | ||
| args: | ||
| subreddit: | ||
| type: str | ||
| default: "" | ||
| description: "Subreddit name (e.g. programming). Empty for frontpage" | ||
| limit: | ||
| type: int | ||
| default: 20 | ||
| description: Number of posts | ||
| pipeline: | ||
| - navigate: https://www.reddit.com | ||
| - evaluate: | | ||
| (async () => { | ||
| const sub = '${{ args.subreddit }}'; | ||
| const path = sub ? '/r/' + sub + '/hot.json' : '/hot.json'; | ||
| const res = await fetch(path + '?limit=${{ args.limit }}&raw_json=1', { | ||
| credentials: 'include' | ||
| }); | ||
| const d = await res.json(); | ||
| return (d?.data?.children || []).map(c => ({ | ||
| title: c.data.title, | ||
| subreddit: c.data.subreddit_name_prefixed, | ||
| score: c.data.score, | ||
| comments: c.data.num_comments, | ||
| author: c.data.author, | ||
| url: 'https://www.reddit.com' + c.data.permalink, | ||
| })); | ||
| })() | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| title: ${{ item.title }} | ||
| subreddit: ${{ item.subreddit }} | ||
| score: ${{ item.score }} | ||
| comments: ${{ item.comments }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, title, subreddit, score, comments] |
| site: v2ex | ||
| name: topic | ||
| description: V2EX 主题详情和回复 | ||
| domain: www.v2ex.com | ||
| strategy: public | ||
| browser: false | ||
| args: | ||
| id: | ||
| type: str | ||
| required: true | ||
| description: Topic ID | ||
| pipeline: | ||
| - fetch: | ||
| url: https://www.v2ex.com/api/topics/show.json | ||
| params: | ||
| id: ${{ args.id }} | ||
| - map: | ||
| title: ${{ item.title }} | ||
| replies: ${{ item.replies }} | ||
| url: ${{ item.url }} | ||
| - limit: 1 | ||
| columns: [title, replies, url] |
| site: xiaohongshu | ||
| name: feed | ||
| description: "小红书首页推荐 Feed (via Pinia Store Action)" | ||
| domain: www.xiaohongshu.com | ||
| strategy: intercept | ||
| browser: true | ||
| args: | ||
| limit: | ||
| type: int | ||
| default: 20 | ||
| description: Number of items to return | ||
| columns: [title, author, likes, type, url] | ||
| pipeline: | ||
| - navigate: https://www.xiaohongshu.com/explore | ||
| - wait: 3 | ||
| - tap: | ||
| store: feed | ||
| action: fetchFeeds | ||
| capture: homefeed | ||
| select: data.items | ||
| timeout: 8 | ||
| - map: | ||
| id: ${{ item.id }} | ||
| title: ${{ item.note_card.display_title }} | ||
| type: ${{ item.note_card.type }} | ||
| author: ${{ item.note_card.user.nickname }} | ||
| likes: ${{ item.note_card.interact_info.liked_count }} | ||
| url: https://www.xiaohongshu.com/explore/${{ item.id }} | ||
| - limit: ${{ args.limit | default(20) }} |
| site: xiaohongshu | ||
| name: notifications | ||
| description: "小红书通知 (mentions/likes/connections)" | ||
| domain: www.xiaohongshu.com | ||
| strategy: intercept | ||
| browser: true | ||
| args: | ||
| type: | ||
| type: str | ||
| default: mentions | ||
| description: "Notification type: mentions, likes, or connections" | ||
| limit: | ||
| type: int | ||
| default: 20 | ||
| description: Number of notifications to return | ||
| columns: [rank, user, action, content, note, time] | ||
| pipeline: | ||
| - navigate: https://www.xiaohongshu.com/notification | ||
| - wait: 3 | ||
| - tap: | ||
| store: notification | ||
| action: getNotification | ||
| args: | ||
| - ${{ args.type | default('mentions') }} | ||
| capture: /you/ | ||
| select: data.message_list | ||
| timeout: 8 | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| user: ${{ item.user_info.nickname }} | ||
| action: ${{ item.title }} | ||
| content: ${{ item.comment_info.content }} | ||
| note: ${{ item.item_info.content }} | ||
| time: ${{ item.time }} | ||
| - limit: ${{ args.limit | default(20) }} |
| /** | ||
| * Xiaohongshu search — trigger search via Pinia store + XHR interception. | ||
| * Inspired by bb-sites/xiaohongshu/search.js but adapted for opencli pipeline. | ||
| */ | ||
| import { cli, Strategy } from '../../registry.js'; | ||
| cli({ | ||
| site: 'xiaohongshu', | ||
| name: 'search', | ||
| description: '搜索小红书笔记', | ||
| domain: 'www.xiaohongshu.com', | ||
| strategy: Strategy.COOKIE, | ||
| args: [ | ||
| { name: 'keyword', required: true, help: 'Search keyword' }, | ||
| { name: 'limit', type: 'int', default: 20, help: 'Number of results' }, | ||
| ], | ||
| columns: ['rank', 'title', 'author', 'likes', 'type'], | ||
| func: async (page, kwargs) => { | ||
| await page.goto('https://www.xiaohongshu.com'); | ||
| await page.wait(2); | ||
| const data = await page.evaluate(` | ||
| (async () => { | ||
| const app = document.querySelector('#app')?.__vue_app__; | ||
| const pinia = app?.config?.globalProperties?.$pinia; | ||
| if (!pinia?._s) return {error: 'Page not ready'}; | ||
| const searchStore = pinia._s.get('search'); | ||
| if (!searchStore) return {error: 'Search store not found'}; | ||
| let captured = null; | ||
| const origOpen = XMLHttpRequest.prototype.open; | ||
| const origSend = XMLHttpRequest.prototype.send; | ||
| XMLHttpRequest.prototype.open = function(m, u) { this.__url = u; return origOpen.apply(this, arguments); }; | ||
| XMLHttpRequest.prototype.send = function(b) { | ||
| if (this.__url?.includes('search/notes')) { | ||
| const x = this; | ||
| const orig = x.onreadystatechange; | ||
| x.onreadystatechange = function() { if (x.readyState === 4 && !captured) { try { captured = JSON.parse(x.responseText); } catch {} } if (orig) orig.apply(this, arguments); }; | ||
| } | ||
| return origSend.apply(this, arguments); | ||
| }; | ||
| try { | ||
| searchStore.mutateSearchValue('${kwargs.keyword}'); | ||
| await searchStore.loadMore(); | ||
| await new Promise(r => setTimeout(r, 800)); | ||
| } finally { | ||
| XMLHttpRequest.prototype.open = origOpen; | ||
| XMLHttpRequest.prototype.send = origSend; | ||
| } | ||
| if (!captured?.success) return {error: captured?.msg || 'Search failed'}; | ||
| return (captured.data?.items || []).map(i => ({ | ||
| title: i.note_card?.display_title || '', | ||
| type: i.note_card?.type || '', | ||
| url: 'https://www.xiaohongshu.com/explore/' + i.id, | ||
| author: i.note_card?.user?.nickname || '', | ||
| likes: i.note_card?.interact_info?.liked_count || '0', | ||
| })); | ||
| })() | ||
| `); | ||
| if (!Array.isArray(data)) return []; | ||
| return data.slice(0, kwargs.limit).map((item: any, i: number) => ({ | ||
| rank: i + 1, | ||
| ...item, | ||
| })); | ||
| }, | ||
| }); |
| import { cli, Strategy } from '../../registry.js'; | ||
| cli({ | ||
| site: 'zhihu', | ||
| name: 'question', | ||
| description: '知乎问题详情和回答', | ||
| domain: 'www.zhihu.com', | ||
| strategy: Strategy.COOKIE, | ||
| args: [ | ||
| { name: 'id', required: true, help: 'Question ID (numeric)' }, | ||
| { name: 'limit', type: 'int', default: 5, help: 'Number of answers' }, | ||
| ], | ||
| columns: ['rank', 'author', 'votes', 'content'], | ||
| func: async (page, kwargs) => { | ||
| const { id, limit = 5 } = kwargs; | ||
| const stripHtml = (html: string) => | ||
| (html || '').replace(/<[^>]+>/g, '').replace(/ /g, ' ').replace(/</g, '<').replace(/>/g, '>').replace(/&/g, '&').trim(); | ||
| // Fetch question detail and answers in parallel via evaluate | ||
| const result = await page.evaluate(` | ||
| async () => { | ||
| const [qResp, aResp] = await Promise.all([ | ||
| fetch('https://www.zhihu.com/api/v4/questions/${id}?include=data[*].detail,excerpt,answer_count,follower_count,visit_count', {credentials: 'include'}), | ||
| fetch('https://www.zhihu.com/api/v4/questions/${id}/answers?limit=${limit}&offset=0&sort_by=default&include=data[*].content,voteup_count,comment_count,author', {credentials: 'include'}) | ||
| ]); | ||
| if (!qResp.ok || !aResp.ok) return { error: true }; | ||
| const q = await qResp.json(); | ||
| const a = await aResp.json(); | ||
| return { question: q, answers: a.data || [] }; | ||
| } | ||
| `); | ||
| if (!result || result.error) throw new Error('Failed to fetch question. Are you logged in?'); | ||
| const answers = (result.answers ?? []).slice(0, Number(limit)).map((a: any, i: number) => ({ | ||
| rank: i + 1, | ||
| author: a.author?.name ?? 'anonymous', | ||
| votes: a.voteup_count ?? 0, | ||
| content: stripHtml(a.content ?? '').slice(0, 200), | ||
| })); | ||
| return answers; | ||
| }, | ||
| }); |
| site: zhihu | ||
| name: search | ||
| description: 知乎搜索 | ||
| domain: www.zhihu.com | ||
| args: | ||
| keyword: | ||
| type: str | ||
| required: true | ||
| description: Search keyword | ||
| limit: | ||
| type: int | ||
| default: 10 | ||
| description: Number of results | ||
| pipeline: | ||
| - navigate: https://www.zhihu.com | ||
| - evaluate: | | ||
| (async () => { | ||
| const strip = (html) => (html || '').replace(/<[^>]+>/g, '').replace(/ /g, ' ').replace(/</g, '<').replace(/>/g, '>').replace(/&/g, '&').replace(/<em>/g, '').replace(/<\/em>/g, '').trim(); | ||
| const res = await fetch('https://www.zhihu.com/api/v4/search_v3?q=' + encodeURIComponent('${{ args.keyword }}') + '&t=general&offset=0&limit=${{ args.limit }}', { | ||
| credentials: 'include' | ||
| }); | ||
| const d = await res.json(); | ||
| return (d?.data || []) | ||
| .filter(item => item.type === 'search_result') | ||
| .map(item => { | ||
| const obj = item.object || {}; | ||
| const q = obj.question || {}; | ||
| return { | ||
| type: obj.type, | ||
| title: strip(obj.title || q.name || ''), | ||
| excerpt: strip(obj.excerpt || '').substring(0, 100), | ||
| author: obj.author?.name || '', | ||
| votes: obj.voteup_count || 0, | ||
| url: obj.type === 'answer' | ||
| ? 'https://www.zhihu.com/question/' + q.id + '/answer/' + obj.id | ||
| : obj.type === 'article' | ||
| ? 'https://zhuanlan.zhihu.com/p/' + obj.id | ||
| : 'https://www.zhihu.com/question/' + obj.id, | ||
| }; | ||
| }); | ||
| })() | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| title: ${{ item.title }} | ||
| type: ${{ item.type }} | ||
| author: ${{ item.author }} | ||
| votes: ${{ item.votes }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, title, type, author, votes] |
@@ -42,2 +42,3 @@ /** | ||
| private _initialTabCount; | ||
| private _page; | ||
| connect(opts?: { | ||
@@ -44,0 +45,0 @@ timeout?: number; |
+35
-1
@@ -40,3 +40,14 @@ /** | ||
| if (textParts.length === 1) { | ||
| const text = textParts[0].text; | ||
| let text = textParts[0].text; | ||
| // MCP browser_evaluate returns: "[JSON]\n### Ran Playwright code\n```js\n...\n```" | ||
| // Strip the "### Ran Playwright code" suffix to get clean JSON | ||
| const codeMarker = text.indexOf('### Ran Playwright code'); | ||
| if (codeMarker !== -1) { | ||
| text = text.slice(0, codeMarker).trim(); | ||
| } | ||
| // Also handle "### Result\n[JSON]" format (some MCP versions) | ||
| const resultMarker = text.indexOf('### Result\n'); | ||
| if (resultMarker !== -1) { | ||
| text = text.slice(resultMarker + '### Result\n'.length).trim(); | ||
| } | ||
| try { | ||
@@ -110,2 +121,3 @@ return JSON.parse(text); | ||
| _initialTabCount = 0; | ||
| _page = null; | ||
| async connect(opts = {}) { | ||
@@ -129,2 +141,3 @@ await this._acquireLock(); | ||
| this._proc.stdin.write(msg); }, () => new Promise((res) => { this._waiters.push(res); })); | ||
| this._page = page; | ||
| this._proc.stdout?.on('data', (chunk) => { | ||
@@ -180,2 +193,22 @@ this._buffer += chunk.toString(); | ||
| try { | ||
| // Close tabs opened during this session (site tabs + extension tabs) | ||
| if (this._page && this._proc && !this._proc.killed) { | ||
| try { | ||
| const tabs = await this._page.tabs(); | ||
| const tabStr = typeof tabs === 'string' ? tabs : JSON.stringify(tabs); | ||
| const allTabs = tabStr.match(/Tab (\d+)/g) || []; | ||
| const currentTabCount = allTabs.length; | ||
| // Close tabs in reverse order to avoid index shifting issues | ||
| // Keep the original tabs that existed before the command started | ||
| if (currentTabCount > this._initialTabCount && this._initialTabCount > 0) { | ||
| for (let i = currentTabCount - 1; i >= this._initialTabCount; i--) { | ||
| try { | ||
| await this._page.closeTab(i); | ||
| } | ||
| catch { } | ||
| } | ||
| } | ||
| } | ||
| catch { } | ||
| } | ||
| if (this._proc && !this._proc.killed) { | ||
@@ -187,2 +220,3 @@ this._proc.kill('SIGTERM'); | ||
| finally { | ||
| this._page = null; | ||
| this._releaseLock(); | ||
@@ -189,0 +223,0 @@ } |
@@ -13,2 +13,3 @@ /** | ||
| import './github/search.js'; | ||
| import './zhihu/search.js'; | ||
| import './zhihu/question.js'; | ||
| import './xiaohongshu/search.js'; |
@@ -16,2 +16,4 @@ /** | ||
| // zhihu | ||
| import './zhihu/search.js'; | ||
| import './zhihu/question.js'; | ||
| // xiaohongshu | ||
| import './xiaohongshu/search.js'; |
+23
-13
| /** | ||
| * Deep Explore: intelligent API discovery with response analysis. | ||
| * | ||
| * Unlike simple page snapshots, Deep Explore intercepts network traffic, | ||
| * analyzes response schemas, and automatically infers capabilities that | ||
| * can be turned into CLI commands. | ||
| * | ||
| * Flow: | ||
| * 1. Navigate to target URL | ||
| * 2. Auto-scroll to trigger lazy loading | ||
| * 3. Capture network requests (with body analysis) | ||
| * 4. For each JSON response: detect list fields, infer columns, analyze auth | ||
| * 5. Detect frontend framework (Vue/React/Pinia/Next.js) | ||
| * 6. Generate structured capabilities.json | ||
| * Navigates to the target URL, auto-scrolls to trigger lazy loading, | ||
| * captures network traffic, analyzes JSON responses, and automatically | ||
| * infers CLI capabilities from discovered API endpoints. | ||
| */ | ||
| export declare function exploreUrl(url: string, opts?: any): Promise<any>; | ||
| export declare function renderExploreSummary(result: any): string; | ||
| export declare function detectSiteName(url: string): string; | ||
| export declare function slugify(value: string): string; | ||
| export interface DiscoveredStore { | ||
| type: 'pinia' | 'vuex'; | ||
| id: string; | ||
| actions: string[]; | ||
| stateKeys: string[]; | ||
| } | ||
| export declare function exploreUrl(url: string, opts: { | ||
| BrowserFactory: new () => any; | ||
| site?: string; | ||
| goal?: string; | ||
| authenticated?: boolean; | ||
| outDir?: string; | ||
| waitSeconds?: number; | ||
| query?: string; | ||
| clickLabels?: string[]; | ||
| auto?: boolean; | ||
| }): Promise<Record<string, any>>; | ||
| export declare function renderExploreSummary(result: Record<string, any>): string; |
+293
-422
| /** | ||
| * Deep Explore: intelligent API discovery with response analysis. | ||
| * | ||
| * Unlike simple page snapshots, Deep Explore intercepts network traffic, | ||
| * analyzes response schemas, and automatically infers capabilities that | ||
| * can be turned into CLI commands. | ||
| * | ||
| * Flow: | ||
| * 1. Navigate to target URL | ||
| * 2. Auto-scroll to trigger lazy loading | ||
| * 3. Capture network requests (with body analysis) | ||
| * 4. For each JSON response: detect list fields, infer columns, analyze auth | ||
| * 5. Detect frontend framework (Vue/React/Pinia/Next.js) | ||
| * 6. Generate structured capabilities.json | ||
| * Navigates to the target URL, auto-scrolls to trigger lazy loading, | ||
| * captures network traffic, analyzes JSON responses, and automatically | ||
| * infers CLI capabilities from discovered API endpoints. | ||
| */ | ||
| import * as fs from 'node:fs'; | ||
| import * as path from 'node:path'; | ||
| import { browserSession, DEFAULT_BROWSER_EXPLORE_TIMEOUT, runWithTimeout } from './runtime.js'; | ||
| import { DEFAULT_BROWSER_EXPLORE_TIMEOUT, browserSession, runWithTimeout } from './runtime.js'; | ||
| // ── Site name detection ──────────────────────────────────────────────────── | ||
| const KNOWN_ALIASES = { | ||
| const KNOWN_SITE_ALIASES = { | ||
| 'x.com': 'twitter', 'twitter.com': 'twitter', | ||
| 'news.ycombinator.com': 'hackernews', | ||
| 'www.zhihu.com': 'zhihu', 'www.bilibili.com': 'bilibili', | ||
| 'search.bilibili.com': 'bilibili', | ||
| 'www.v2ex.com': 'v2ex', 'www.reddit.com': 'reddit', | ||
| 'www.xiaohongshu.com': 'xiaohongshu', 'www.douban.com': 'douban', | ||
| 'www.weibo.com': 'weibo', 'search.bilibili.com': 'bilibili', | ||
| 'www.weibo.com': 'weibo', 'www.bbc.com': 'bbc', | ||
| }; | ||
| function detectSiteName(url) { | ||
| export function detectSiteName(url) { | ||
| try { | ||
| const host = new URL(url).hostname.toLowerCase(); | ||
| if (host in KNOWN_ALIASES) | ||
| return KNOWN_ALIASES[host]; | ||
| if (host in KNOWN_SITE_ALIASES) | ||
| return KNOWN_SITE_ALIASES[host]; | ||
| const parts = host.split('.').filter(p => p && p !== 'www'); | ||
| if (parts.length >= 2) { | ||
| if (['uk', 'jp', 'cn', 'com'].includes(parts[parts.length - 1]) && parts.length >= 3) { | ||
| return parts[parts.length - 3].replace(/[^a-z0-9]/g, ''); | ||
| return slugify(parts[parts.length - 3]); | ||
| } | ||
| return parts[parts.length - 2].replace(/[^a-z0-9]/g, ''); | ||
| return slugify(parts[parts.length - 2]); | ||
| } | ||
| return parts[0]?.replace(/[^a-z0-9]/g, '') ?? 'site'; | ||
| return parts[0] ? slugify(parts[0]) : 'site'; | ||
| } | ||
@@ -46,12 +39,11 @@ catch { | ||
| } | ||
| export function slugify(value) { | ||
| return value.trim().toLowerCase().replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-|-$/g, '') || 'site'; | ||
| } | ||
| // ── Field & capability inference ─────────────────────────────────────────── | ||
| /** | ||
| * Common field names grouped by semantic role. | ||
| * Used to auto-detect which response fields map to which columns. | ||
| */ | ||
| const FIELD_ROLES = { | ||
| title: ['title', 'name', 'text', 'content', 'desc', 'description', 'headline', 'subject'], | ||
| url: ['url', 'uri', 'link', 'href', 'permalink', 'jump_url', 'web_url', 'short_link', 'share_url'], | ||
| url: ['url', 'uri', 'link', 'href', 'permalink', 'jump_url', 'web_url', 'share_url'], | ||
| author: ['author', 'username', 'user_name', 'nickname', 'nick', 'owner', 'creator', 'up_name', 'uname'], | ||
| score: ['score', 'hot', 'heat', 'likes', 'like_count', 'view_count', 'views', 'stat', 'play', 'favorite_count', 'reply_count'], | ||
| score: ['score', 'hot', 'heat', 'likes', 'like_count', 'view_count', 'views', 'play', 'favorite_count', 'reply_count'], | ||
| time: ['time', 'created_at', 'publish_time', 'pub_time', 'date', 'ctime', 'mtime', 'pubdate', 'created'], | ||
@@ -62,48 +54,23 @@ id: ['id', 'aid', 'bvid', 'mid', 'uid', 'oid', 'note_id', 'item_id'], | ||
| }; | ||
| /** Param names that indicate searchable APIs */ | ||
| const SEARCH_PARAMS = new Set(['q', 'query', 'keyword', 'search', 'wd', 'kw', 'search_query', 'w']); | ||
| /** Param names that indicate pagination */ | ||
| const PAGINATION_PARAMS = new Set(['page', 'pn', 'offset', 'cursor', 'next', 'page_num']); | ||
| /** Param names that indicate limit control */ | ||
| const LIMIT_PARAMS = new Set(['limit', 'count', 'size', 'per_page', 'page_size', 'ps', 'num']); | ||
| /** Content types to ignore */ | ||
| const IGNORED_CONTENT_TYPES = new Set(['image/', 'font/', 'text/css', 'text/javascript', 'application/javascript', 'application/wasm']); | ||
| /** Volatile query params to strip from patterns */ | ||
| const VOLATILE_PARAMS = new Set(['w_rid', 'wts', '_', 'callback', 'timestamp', 't', 'nonce', 'sign']); | ||
| /** | ||
| * Parse raw network output from Playwright MCP into structured entries. | ||
| * Handles both text format ([GET] url => [200]) and structured JSON. | ||
| * Parse raw network output from Playwright MCP. | ||
| * Handles text format: [GET] url => [200] | ||
| */ | ||
| function parseNetworkOutput(raw) { | ||
| function parseNetworkRequests(raw) { | ||
| if (typeof raw === 'string') { | ||
| // Playwright MCP returns network as text lines like: | ||
| // "[GET] https://api.example.com/xxx => [200] " | ||
| // May also have markdown headers like "### Result" | ||
| const entries = []; | ||
| const lines = raw.split('\n').filter((l) => l.trim()); | ||
| for (const line of lines) { | ||
| // Format: [METHOD] URL => [STATUS] optional_extra | ||
| const bracketMatch = line.match(/^\[?(GET|POST|PUT|DELETE|PATCH|OPTIONS)\]?\s+(\S+)\s*(?:=>|→)\s*\[?(\d+)\]?/i); | ||
| if (bracketMatch) { | ||
| const [, method, url, status] = bracketMatch; | ||
| for (const line of raw.split('\n')) { | ||
| // Format: [GET] URL => [200] | ||
| const m = line.match(/\[?(GET|POST|PUT|DELETE|PATCH|OPTIONS)\]?\s+(\S+)\s*(?:=>|→)\s*\[?(\d+)\]?/i); | ||
| if (m) { | ||
| const [, method, url, status] = m; | ||
| entries.push({ | ||
| method: method.toUpperCase(), | ||
| url, | ||
| status: status ? parseInt(status) : null, | ||
| contentType: url.endsWith('.json') ? 'application/json' : | ||
| (url.includes('/api/') || url.includes('/x/')) ? 'application/json' : '', | ||
| method: method.toUpperCase(), url, status: status ? parseInt(status) : null, | ||
| contentType: (url.includes('/api/') || url.includes('/x/') || url.endsWith('.json')) ? 'application/json' : '', | ||
| }); | ||
| continue; | ||
| } | ||
| // Legacy format: GET url → 200 (application/json) | ||
| const legacyMatch = line.match(/^(GET|POST|PUT|DELETE|PATCH|OPTIONS)\s+(\S+)\s*→?\s*(\d+)?\s*(?:\(([^)]*)\))?/i); | ||
| if (legacyMatch) { | ||
| const [, method, url, status, ct] = legacyMatch; | ||
| entries.push({ | ||
| method: method.toUpperCase(), | ||
| url, | ||
| status: status ? parseInt(status) : null, | ||
| contentType: ct ?? '', | ||
| }); | ||
| } | ||
| } | ||
@@ -113,9 +80,8 @@ return entries; | ||
| if (Array.isArray(raw)) { | ||
| return raw.map((e) => ({ | ||
| return raw.filter(e => e && typeof e === 'object').map(e => ({ | ||
| method: (e.method ?? 'GET').toUpperCase(), | ||
| url: e.url ?? e.request?.url ?? '', | ||
| url: String(e.url ?? e.request?.url ?? e.requestUrl ?? ''), | ||
| status: e.status ?? e.statusCode ?? null, | ||
| contentType: e.contentType ?? e.mimeType ?? '', | ||
| responseBody: e.responseBody ?? e.body, | ||
| requestHeaders: e.requestHeaders ?? e.headers, | ||
| contentType: e.contentType ?? e.response?.contentType ?? '', | ||
| responseBody: e.responseBody, requestHeaders: e.requestHeaders, | ||
| })); | ||
@@ -125,19 +91,10 @@ } | ||
| } | ||
| /** | ||
| * Normalize a URL into a pattern by replacing IDs with placeholders. | ||
| */ | ||
| function urlToPattern(url) { | ||
| try { | ||
| const parsed = new URL(url); | ||
| const pathNorm = parsed.pathname | ||
| .replace(/\/\d+/g, '/{id}') | ||
| .replace(/\/[0-9a-fA-F]{8,}/g, '/{hex}') | ||
| .replace(/\/BV[a-zA-Z0-9]{10}/g, '/{bvid}'); | ||
| const p = new URL(url); | ||
| const pathNorm = p.pathname.replace(/\/\d+/g, '/{id}').replace(/\/[0-9a-fA-F]{8,}/g, '/{hex}').replace(/\/BV[a-zA-Z0-9]{10}/g, '/{bvid}'); | ||
| const params = []; | ||
| parsed.searchParams.forEach((_v, k) => { | ||
| if (!VOLATILE_PARAMS.has(k)) | ||
| params.push(k); | ||
| }); | ||
| const qs = params.length ? '?' + params.sort().map(k => `${k}={}`).join('&') : ''; | ||
| return `${parsed.host}${pathNorm}${qs}`; | ||
| p.searchParams.forEach((_v, k) => { if (!VOLATILE_PARAMS.has(k)) | ||
| params.push(k); }); | ||
| return `${p.host}${pathNorm}${params.length ? '?' + params.sort().map(k => `${k}={}`).join('&') : ''}`; | ||
| } | ||
@@ -148,18 +105,2 @@ catch { | ||
| } | ||
| /** | ||
| * Extract query params from a URL. | ||
| */ | ||
| function extractQueryParams(url) { | ||
| try { | ||
| const params = {}; | ||
| new URL(url).searchParams.forEach((v, k) => { params[k] = v; }); | ||
| return params; | ||
| } | ||
| catch { | ||
| return {}; | ||
| } | ||
| } | ||
| /** | ||
| * Detect auth indicators from request headers. | ||
| */ | ||
| function detectAuthIndicators(headers) { | ||
@@ -176,28 +117,17 @@ if (!headers) | ||
| indicators.push('signature'); | ||
| if (keys.some(k => k === 'x-client-transaction-id')) | ||
| indicators.push('transaction'); | ||
| return indicators; | ||
| } | ||
| /** | ||
| * Analyze a JSON response to find list data and field mappings. | ||
| */ | ||
| function analyzeResponseBody(body) { | ||
| if (!body || typeof body !== 'object') | ||
| return null; | ||
| // Try to find the main list in the response | ||
| const candidates = []; | ||
| function findArrays(obj, currentPath, depth) { | ||
| function findArrays(obj, path, depth) { | ||
| if (depth > 4) | ||
| return; | ||
| if (Array.isArray(obj) && obj.length >= 2) { | ||
| // Check if items are objects (not primitive arrays) | ||
| if (obj.some(item => item && typeof item === 'object' && !Array.isArray(item))) { | ||
| candidates.push({ path: currentPath, items: obj }); | ||
| } | ||
| if (Array.isArray(obj) && obj.length >= 2 && obj.some(item => item && typeof item === 'object' && !Array.isArray(item))) { | ||
| candidates.push({ path, items: obj }); | ||
| } | ||
| if (obj && typeof obj === 'object' && !Array.isArray(obj)) { | ||
| for (const [key, val] of Object.entries(obj)) { | ||
| const nextPath = currentPath ? `${currentPath}.${key}` : key; | ||
| findArrays(val, nextPath, depth + 1); | ||
| } | ||
| for (const [key, val] of Object.entries(obj)) | ||
| findArrays(val, path ? `${path}.${key}` : key, depth + 1); | ||
| } | ||
@@ -208,17 +138,11 @@ } | ||
| return null; | ||
| // Pick the largest array as the main list | ||
| candidates.sort((a, b) => b.items.length - a.items.length); | ||
| const best = candidates[0]; | ||
| // Analyze field names in the first item | ||
| const sampleItem = best.items[0]; | ||
| const sampleFieldNames = sampleItem && typeof sampleItem === 'object' | ||
| ? flattenFieldNames(sampleItem, '', 2) | ||
| : []; | ||
| // Match fields to semantic roles | ||
| const sample = best.items[0]; | ||
| const sampleFields = sample && typeof sample === 'object' ? flattenFields(sample, '', 2) : []; | ||
| const detectedFields = {}; | ||
| for (const [role, aliases] of Object.entries(FIELD_ROLES)) { | ||
| for (const fieldName of sampleFieldNames) { | ||
| const basename = fieldName.split('.').pop()?.toLowerCase() ?? ''; | ||
| if (aliases.includes(basename)) { | ||
| detectedFields[role] = fieldName; | ||
| for (const f of sampleFields) { | ||
| if (aliases.includes(f.split('.').pop()?.toLowerCase() ?? '')) { | ||
| detectedFields[role] = f; | ||
| break; | ||
@@ -228,13 +152,5 @@ } | ||
| } | ||
| return { | ||
| itemPath: best.path || null, | ||
| itemCount: best.items.length, | ||
| detectedFields, | ||
| sampleFieldNames, | ||
| }; | ||
| return { itemPath: best.path || null, itemCount: best.items.length, detectedFields, sampleFields }; | ||
| } | ||
| /** | ||
| * Flatten nested object field names for analysis. | ||
| */ | ||
| function flattenFieldNames(obj, prefix, maxDepth) { | ||
| function flattenFields(obj, prefix, maxDepth) { | ||
| if (maxDepth <= 0 || !obj || typeof obj !== 'object') | ||
@@ -244,80 +160,38 @@ return []; | ||
| for (const key of Object.keys(obj)) { | ||
| const fullKey = prefix ? `${prefix}.${key}` : key; | ||
| names.push(fullKey); | ||
| if (obj[key] && typeof obj[key] === 'object' && !Array.isArray(obj[key])) { | ||
| names.push(...flattenFieldNames(obj[key], fullKey, maxDepth - 1)); | ||
| } | ||
| const full = prefix ? `${prefix}.${key}` : key; | ||
| names.push(full); | ||
| if (obj[key] && typeof obj[key] === 'object' && !Array.isArray(obj[key])) | ||
| names.push(...flattenFields(obj[key], full, maxDepth - 1)); | ||
| } | ||
| return names; | ||
| } | ||
| /** | ||
| * Analyze a list of network entries into structured endpoints. | ||
| */ | ||
| function analyzeEndpoints(entries, siteHost) { | ||
| const seen = new Map(); | ||
| for (const entry of entries) { | ||
| if (!entry.url) | ||
| continue; | ||
| // Skip static resources | ||
| const ct = entry.contentType.toLowerCase(); | ||
| if (IGNORED_CONTENT_TYPES.has(ct.split(';')[0]?.trim() ?? '') || | ||
| ct.includes('image/') || ct.includes('font/') || ct.includes('css') || | ||
| ct.includes('javascript') || ct.includes('wasm')) | ||
| continue; | ||
| // Skip non-JSON and failed responses | ||
| if (entry.status && entry.status >= 400) | ||
| continue; | ||
| const pattern = urlToPattern(entry.url); | ||
| const queryParams = extractQueryParams(entry.url); | ||
| const paramNames = Object.keys(queryParams).filter(k => !VOLATILE_PARAMS.has(k)); | ||
| const key = `${entry.method}:${pattern}`; | ||
| if (seen.has(key)) | ||
| continue; | ||
| const endpoint = { | ||
| pattern, | ||
| method: entry.method, | ||
| url: entry.url, | ||
| status: entry.status, | ||
| contentType: ct, | ||
| queryParams: paramNames, | ||
| hasSearchParam: paramNames.some(p => SEARCH_PARAMS.has(p)), | ||
| hasPaginationParam: paramNames.some(p => PAGINATION_PARAMS.has(p)), | ||
| hasLimitParam: paramNames.some(p => LIMIT_PARAMS.has(p)), | ||
| authIndicators: detectAuthIndicators(entry.requestHeaders), | ||
| responseAnalysis: entry.responseBody ? analyzeResponseBody(entry.responseBody) : null, | ||
| }; | ||
| seen.set(key, endpoint); | ||
| function scoreEndpoint(ep) { | ||
| let s = 0; | ||
| if (ep.contentType.includes('json')) | ||
| s += 10; | ||
| if (ep.responseAnalysis) { | ||
| s += 5; | ||
| s += Math.min(ep.responseAnalysis.itemCount, 10); | ||
| s += Object.keys(ep.responseAnalysis.detectedFields).length * 2; | ||
| } | ||
| return [...seen.values()]; | ||
| if (ep.pattern.includes('/api/') || ep.pattern.includes('/x/')) | ||
| s += 3; | ||
| if (ep.hasSearchParam) | ||
| s += 3; | ||
| if (ep.hasPaginationParam) | ||
| s += 2; | ||
| if (ep.hasLimitParam) | ||
| s += 2; | ||
| if (ep.status === 200) | ||
| s += 2; | ||
| return s; | ||
| } | ||
| /** | ||
| * Infer what strategy to use based on endpoint analysis. | ||
| */ | ||
| function inferStrategy(endpoint) { | ||
| if (endpoint.authIndicators.includes('signature')) | ||
| return 'intercept'; | ||
| if (endpoint.authIndicators.includes('transaction')) | ||
| return 'header'; | ||
| if (endpoint.authIndicators.includes('bearer') || endpoint.authIndicators.includes('csrf')) | ||
| return 'header'; | ||
| // Check if the URL is a public API (no auth indicators) | ||
| if (endpoint.authIndicators.length === 0) { | ||
| // If it's the same domain, likely cookie auth | ||
| return 'cookie'; | ||
| } | ||
| return 'cookie'; | ||
| } | ||
| /** | ||
| * Infer the capability name from an endpoint pattern. | ||
| */ | ||
| function inferCapabilityName(endpoint, goal) { | ||
| function inferCapabilityName(url, goal) { | ||
| if (goal) | ||
| return goal; | ||
| const u = endpoint.url.toLowerCase(); | ||
| const p = endpoint.pattern.toLowerCase(); | ||
| // Match common patterns | ||
| if (endpoint.hasSearchParam) | ||
| return 'search'; | ||
| const u = url.toLowerCase(); | ||
| if (u.includes('hot') || u.includes('popular') || u.includes('ranking') || u.includes('trending')) | ||
| return 'hot'; | ||
| if (u.includes('search')) | ||
| return 'search'; | ||
| if (u.includes('feed') || u.includes('timeline') || u.includes('dynamic')) | ||
@@ -329,16 +203,10 @@ return 'feed'; | ||
| return 'history'; | ||
| if (u.includes('profile') || u.includes('userinfo') || u.includes('/me') || u.includes('myinfo')) | ||
| if (u.includes('profile') || u.includes('userinfo') || u.includes('/me')) | ||
| return 'me'; | ||
| if (u.includes('video') || u.includes('article') || u.includes('detail') || u.includes('view')) | ||
| return 'detail'; | ||
| if (u.includes('favorite') || u.includes('collect') || u.includes('bookmark')) | ||
| return 'favorite'; | ||
| if (u.includes('notification') || u.includes('notice')) | ||
| return 'notifications'; | ||
| // Fallback: try to extract from path | ||
| try { | ||
| const pathname = new URL(endpoint.url).pathname; | ||
| const segments = pathname.split('/').filter(s => s && !s.match(/^\d+$/) && !s.match(/^[0-9a-f]{8,}$/i)); | ||
| if (segments.length) | ||
| return segments[segments.length - 1].replace(/[^a-z0-9]/gi, '_').toLowerCase(); | ||
| const segs = new URL(url).pathname.split('/').filter(s => s && !s.match(/^\d+$/) && !s.match(/^[0-9a-f]{8,}$/i)); | ||
| if (segs.length) | ||
| return segs[segs.length - 1].replace(/[^a-z0-9]/gi, '_').toLowerCase(); | ||
| } | ||
@@ -348,176 +216,159 @@ catch { } | ||
| } | ||
| /** | ||
| * Build recommended columns from response analysis. | ||
| */ | ||
| function buildRecommendedColumns(analysis) { | ||
| if (!analysis) | ||
| return ['title', 'url']; | ||
| const cols = []; | ||
| // Prioritize: title → url → author → score → time | ||
| const priority = ['title', 'url', 'author', 'score', 'time']; | ||
| for (const role of priority) { | ||
| if (analysis.detectedFields[role]) | ||
| cols.push(role); | ||
| } | ||
| return cols.length ? cols : ['title', 'url']; | ||
| function inferStrategy(authIndicators) { | ||
| if (authIndicators.includes('signature')) | ||
| return 'intercept'; | ||
| if (authIndicators.includes('bearer') || authIndicators.includes('csrf')) | ||
| return 'header'; | ||
| return 'cookie'; | ||
| } | ||
| /** | ||
| * Build recommended args from endpoint query params. | ||
| */ | ||
| function buildRecommendedArgs(endpoint) { | ||
| const args = []; | ||
| if (endpoint.hasSearchParam) { | ||
| const paramName = endpoint.queryParams.find(p => SEARCH_PARAMS.has(p)) ?? 'keyword'; | ||
| args.push({ name: 'keyword', type: 'str', required: true }); | ||
| } | ||
| // Always add limit | ||
| args.push({ name: 'limit', type: 'int', required: false, default: 20 }); | ||
| if (endpoint.hasPaginationParam) { | ||
| args.push({ name: 'page', type: 'int', required: false, default: 1 }); | ||
| } | ||
| return args; | ||
| } | ||
| /** | ||
| * Score an endpoint's interest level for capability generation. | ||
| * Higher score = more likely to be a useful API endpoint. | ||
| */ | ||
| function scoreEndpoint(ep) { | ||
| let score = 0; | ||
| // JSON content type is strongly preferred | ||
| if (ep.contentType.includes('json')) | ||
| score += 10; | ||
| // Has response analysis with items | ||
| if (ep.responseAnalysis) { | ||
| score += 5; | ||
| score += Math.min(ep.responseAnalysis.itemCount, 10); | ||
| score += Object.keys(ep.responseAnalysis.detectedFields).length * 2; | ||
| } | ||
| // API-like path patterns | ||
| if (ep.pattern.includes('/api/') || ep.pattern.includes('/x/')) | ||
| score += 3; | ||
| // Has search/pagination params | ||
| if (ep.hasSearchParam) | ||
| score += 3; | ||
| if (ep.hasPaginationParam) | ||
| score += 2; | ||
| if (ep.hasLimitParam) | ||
| score += 2; | ||
| // 200 OK | ||
| if (ep.status === 200) | ||
| score += 2; | ||
| return score; | ||
| } | ||
| // ── Framework detection ──────────────────────────────────────────────────── | ||
| const FRAMEWORK_DETECT_JS = ` | ||
| (() => { | ||
| const result = {}; | ||
| try { | ||
| const app = document.querySelector('#app'); | ||
| result.vue3 = !!(app && app.__vue_app__); | ||
| result.vue2 = !!(app && app.__vue__); | ||
| result.react = !!window.__REACT_DEVTOOLS_GLOBAL_HOOK__ || !!document.querySelector('[data-reactroot]'); | ||
| result.nextjs = !!window.__NEXT_DATA__; | ||
| result.nuxt = !!window.__NUXT__; | ||
| if (result.vue3 && app.__vue_app__) { | ||
| () => { | ||
| const r = {}; | ||
| try { | ||
| const app = document.querySelector('#app'); | ||
| r.vue3 = !!(app && app.__vue_app__); | ||
| r.vue2 = !!(app && app.__vue__); | ||
| r.react = !!window.__REACT_DEVTOOLS_GLOBAL_HOOK__ || !!document.querySelector('[data-reactroot]'); | ||
| r.nextjs = !!window.__NEXT_DATA__; | ||
| r.nuxt = !!window.__NUXT__; | ||
| if (r.vue3 && app.__vue_app__) { const gp = app.__vue_app__.config?.globalProperties; r.pinia = !!(gp && gp.$pinia); r.vuex = !!(gp && gp.$store); } | ||
| } catch {} | ||
| return r; | ||
| } | ||
| `; | ||
| // ── Store discovery ──────────────────────────────────────────────────────── | ||
| const STORE_DISCOVER_JS = ` | ||
| () => { | ||
| const stores = []; | ||
| try { | ||
| const app = document.querySelector('#app'); | ||
| if (!app?.__vue_app__) return stores; | ||
| const gp = app.__vue_app__.config?.globalProperties; | ||
| result.pinia = !!(gp && gp.$pinia); | ||
| result.vuex = !!(gp && gp.$store); | ||
| } | ||
| } catch {} | ||
| return JSON.stringify(result); | ||
| })() | ||
| // Pinia stores | ||
| const pinia = gp?.$pinia; | ||
| if (pinia?._s) { | ||
| pinia._s.forEach((store, id) => { | ||
| const actions = []; | ||
| const stateKeys = []; | ||
| for (const k in store) { | ||
| try { | ||
| if (k.startsWith('$') || k.startsWith('_')) continue; | ||
| if (typeof store[k] === 'function') actions.push(k); | ||
| else stateKeys.push(k); | ||
| } catch {} | ||
| } | ||
| stores.push({ type: 'pinia', id, actions: actions.slice(0, 20), stateKeys: stateKeys.slice(0, 15) }); | ||
| }); | ||
| } | ||
| // Vuex store modules | ||
| const vuex = gp?.$store; | ||
| if (vuex?._modules?.root?._children) { | ||
| const children = vuex._modules.root._children; | ||
| for (const [modName, mod] of Object.entries(children)) { | ||
| const actions = Object.keys(mod._rawModule?.actions ?? {}).slice(0, 20); | ||
| const stateKeys = Object.keys(mod.state ?? {}).slice(0, 15); | ||
| stores.push({ type: 'vuex', id: modName, actions, stateKeys }); | ||
| } | ||
| } | ||
| } catch {} | ||
| return stores; | ||
| } | ||
| `; | ||
| // ── Main explore function ────────────────────────────────────────────────── | ||
| export async function exploreUrl(url, opts = {}) { | ||
| const site = opts.site ?? detectSiteName(url); | ||
| const outDir = opts.outDir ?? path.join('.opencli', 'explore', site); | ||
| fs.mkdirSync(outDir, { recursive: true }); | ||
| const result = await browserSession(opts.BrowserFactory, async (page) => { | ||
| export async function exploreUrl(url, opts) { | ||
| const waitSeconds = opts.waitSeconds ?? 3.0; | ||
| const exploreTimeout = Math.max(DEFAULT_BROWSER_EXPLORE_TIMEOUT, 45.0 + waitSeconds * 8.0); | ||
| return browserSession(opts.BrowserFactory, async (page) => { | ||
| return runWithTimeout((async () => { | ||
| // Step 1: Navigate | ||
| await page.goto(url); | ||
| await page.wait(opts.waitSeconds ?? 3); | ||
| // Step 2: Auto-scroll to trigger lazy loading | ||
| await page.wait(waitSeconds); | ||
| // Step 2: Auto-scroll to trigger lazy loading (use keyboard since page.scroll may not exist) | ||
| for (let i = 0; i < 3; i++) { | ||
| await page.scroll('down'); | ||
| try { | ||
| await page.pressKey('End'); | ||
| } | ||
| catch { } | ||
| await page.wait(1); | ||
| } | ||
| // Step 3: Capture network traffic | ||
| // Step 3: Read page metadata | ||
| const metadata = await readPageMetadata(page); | ||
| // Step 4: Capture network traffic | ||
| const rawNetwork = await page.networkRequests(false); | ||
| const networkEntries = parseNetworkOutput(rawNetwork); | ||
| // Step 4: For JSON endpoints, try to fetch response body in-browser | ||
| const networkEntries = parseNetworkRequests(rawNetwork); | ||
| // Step 5: For JSON endpoints, re-fetch response body in-browser | ||
| const jsonEndpoints = networkEntries.filter(e => e.contentType.includes('json') && e.method === 'GET' && e.status === 200); | ||
| for (const ep of jsonEndpoints.slice(0, 10)) { | ||
| // Only fetch body for promising-looking API endpoints | ||
| if (ep.url.includes('/api/') || ep.url.includes('/x/') || ep.url.includes('/web/') || | ||
| ep.contentType.includes('json')) { | ||
| try { | ||
| const bodyResult = await page.evaluate(` | ||
| async () => { | ||
| try { | ||
| const resp = await fetch(${JSON.stringify(ep.url)}, { credentials: 'include' }); | ||
| if (!resp.ok) return null; | ||
| const data = await resp.json(); | ||
| return JSON.stringify(data).slice(0, 10000); | ||
| } catch { return null; } | ||
| } | ||
| `); | ||
| if (bodyResult && typeof bodyResult === 'string') { | ||
| try { | ||
| ep.responseBody = JSON.parse(bodyResult); | ||
| } | ||
| catch { } | ||
| const body = await page.evaluate(`async () => { try { const r = await fetch(${JSON.stringify(ep.url)}, {credentials:'include'}); if (!r.ok) return null; const d = await r.json(); return JSON.stringify(d).slice(0,10000); } catch { return null; } }`); | ||
| if (body && typeof body === 'string') { | ||
| try { | ||
| ep.responseBody = JSON.parse(body); | ||
| } | ||
| else if (bodyResult && typeof bodyResult === 'object') { | ||
| ep.responseBody = bodyResult; | ||
| } | ||
| catch { } | ||
| } | ||
| catch { } | ||
| else if (body && typeof body === 'object') | ||
| ep.responseBody = body; | ||
| } | ||
| catch { } | ||
| } | ||
| // Step 5: Detect frontend framework | ||
| // Step 6: Detect framework | ||
| let framework = {}; | ||
| try { | ||
| const fwResult = await page.evaluate(FRAMEWORK_DETECT_JS); | ||
| if (typeof fwResult === 'string') | ||
| framework = JSON.parse(fwResult); | ||
| else if (typeof fwResult === 'object') | ||
| framework = fwResult; | ||
| const fw = await page.evaluate(FRAMEWORK_DETECT_JS); | ||
| if (fw && typeof fw === 'object') | ||
| framework = fw; | ||
| } | ||
| catch { } | ||
| // Step 6: Get page metadata | ||
| let title = '', finalUrl = ''; | ||
| try { | ||
| const meta = await page.evaluate(` | ||
| () => JSON.stringify({ url: window.location.href, title: document.title || '' }) | ||
| `); | ||
| if (typeof meta === 'string') { | ||
| const parsed = JSON.parse(meta); | ||
| title = parsed.title; | ||
| finalUrl = parsed.url; | ||
| // Step 6.5: Discover stores (Pinia / Vuex) | ||
| let stores = []; | ||
| if (framework.pinia || framework.vuex) { | ||
| try { | ||
| const raw = await page.evaluate(STORE_DISCOVER_JS); | ||
| if (Array.isArray(raw)) | ||
| stores = raw; | ||
| } | ||
| else if (typeof meta === 'object') { | ||
| title = meta.title; | ||
| finalUrl = meta.url; | ||
| } | ||
| catch { } | ||
| } | ||
| catch { } | ||
| // Step 7: Analyze endpoints | ||
| let siteHost = ''; | ||
| try { | ||
| siteHost = new URL(url).hostname; | ||
| const seen = new Map(); | ||
| for (const entry of networkEntries) { | ||
| if (!entry.url) | ||
| continue; | ||
| const ct = entry.contentType.toLowerCase(); | ||
| if (ct.includes('image/') || ct.includes('font/') || ct.includes('css') || ct.includes('javascript') || ct.includes('wasm')) | ||
| continue; | ||
| if (entry.status && entry.status >= 400) | ||
| continue; | ||
| const pattern = urlToPattern(entry.url); | ||
| const key = `${entry.method}:${pattern}`; | ||
| if (seen.has(key)) | ||
| continue; | ||
| const qp = []; | ||
| try { | ||
| new URL(entry.url).searchParams.forEach((_v, k) => { if (!VOLATILE_PARAMS.has(k)) | ||
| qp.push(k); }); | ||
| } | ||
| catch { } | ||
| const ep = { | ||
| pattern, method: entry.method, url: entry.url, status: entry.status, contentType: ct, | ||
| queryParams: qp, hasSearchParam: qp.some(p => SEARCH_PARAMS.has(p)), | ||
| hasPaginationParam: qp.some(p => PAGINATION_PARAMS.has(p)), | ||
| hasLimitParam: qp.some(p => LIMIT_PARAMS.has(p)), | ||
| authIndicators: detectAuthIndicators(entry.requestHeaders), | ||
| responseAnalysis: entry.responseBody ? analyzeResponseBody(entry.responseBody) : null, | ||
| score: 0, | ||
| }; | ||
| ep.score = scoreEndpoint(ep); | ||
| seen.set(key, ep); | ||
| } | ||
| catch { } | ||
| const analyzedEndpoints = analyzeEndpoints(networkEntries, siteHost); | ||
| // Step 8: Score and rank endpoints | ||
| const scoredEndpoints = analyzedEndpoints | ||
| .map(ep => ({ ...ep, score: scoreEndpoint(ep) })) | ||
| .filter(ep => ep.score >= 5) | ||
| .sort((a, b) => b.score - a.score); | ||
| // Step 9: Infer capabilities from top endpoints | ||
| const analyzedEndpoints = [...seen.values()].filter(ep => ep.score >= 5).sort((a, b) => b.score - a.score); | ||
| // Step 8: Infer capabilities | ||
| const capabilities = []; | ||
| const usedNames = new Set(); | ||
| for (const ep of scoredEndpoints.slice(0, 8)) { | ||
| let capName = inferCapabilityName(ep, opts.goal); | ||
| // Deduplicate names | ||
| for (const ep of analyzedEndpoints.slice(0, 8)) { | ||
| let capName = inferCapabilityName(ep.url, opts.goal); | ||
| if (usedNames.has(capName)) { | ||
@@ -528,76 +379,79 @@ const suffix = ep.pattern.split('/').filter(s => s && !s.startsWith('{') && !s.includes('.')).pop(); | ||
| usedNames.add(capName); | ||
| const cols = []; | ||
| if (ep.responseAnalysis) { | ||
| for (const role of ['title', 'url', 'author', 'score', 'time']) { | ||
| if (ep.responseAnalysis.detectedFields[role]) | ||
| cols.push(role); | ||
| } | ||
| } | ||
| const args = []; | ||
| if (ep.hasSearchParam) | ||
| args.push({ name: 'keyword', type: 'str', required: true }); | ||
| args.push({ name: 'limit', type: 'int', required: false, default: 20 }); | ||
| if (ep.hasPaginationParam) | ||
| args.push({ name: 'page', type: 'int', required: false, default: 1 }); | ||
| // Link store actions to capabilities when store-action strategy is recommended | ||
| const epStrategy = inferStrategy(ep.authIndicators); | ||
| let storeHint; | ||
| if ((epStrategy === 'intercept' || ep.authIndicators.includes('signature')) && stores.length > 0) { | ||
| // Try to find a store/action that matches this endpoint's purpose | ||
| for (const s of stores) { | ||
| const matchingAction = s.actions.find(a => capName.split('_').some(part => a.toLowerCase().includes(part)) || | ||
| a.toLowerCase().includes('fetch') || a.toLowerCase().includes('get')); | ||
| if (matchingAction) { | ||
| storeHint = { store: s.id, action: matchingAction }; | ||
| break; | ||
| } | ||
| } | ||
| } | ||
| capabilities.push({ | ||
| name: capName, | ||
| description: `${site} ${capName}`, | ||
| strategy: inferStrategy(ep), | ||
| confidence: Math.min(ep.score / 20, 1.0), | ||
| endpoint: ep.pattern, | ||
| name: capName, description: `${opts.site ?? detectSiteName(url)} ${capName}`, | ||
| strategy: storeHint ? 'store-action' : epStrategy, | ||
| confidence: Math.min(ep.score / 20, 1.0), endpoint: ep.pattern, | ||
| itemPath: ep.responseAnalysis?.itemPath ?? null, | ||
| recommendedColumns: buildRecommendedColumns(ep.responseAnalysis), | ||
| recommendedArgs: buildRecommendedArgs(ep), | ||
| recommendedColumns: cols.length ? cols : ['title', 'url'], | ||
| recommendedArgs: args, | ||
| ...(storeHint ? { storeHint } : {}), | ||
| }); | ||
| } | ||
| // Step 10: Determine auth strategy | ||
| const allAuthIndicators = new Set(analyzedEndpoints.flatMap(ep => ep.authIndicators)); | ||
| let topStrategy = 'cookie'; | ||
| if (allAuthIndicators.has('signature')) | ||
| topStrategy = 'intercept'; | ||
| else if (allAuthIndicators.has('transaction') || allAuthIndicators.has('bearer')) | ||
| topStrategy = 'header'; | ||
| else if (allAuthIndicators.size === 0 && scoredEndpoints.some(ep => ep.contentType.includes('json'))) | ||
| topStrategy = 'public'; | ||
| return { | ||
| site, | ||
| target_url: url, | ||
| final_url: finalUrl, | ||
| title, | ||
| framework, | ||
| top_strategy: topStrategy, | ||
| endpoint_count: analyzedEndpoints.length, | ||
| api_endpoint_count: scoredEndpoints.length, | ||
| capabilities, | ||
| endpoints: scoredEndpoints.map(ep => ({ | ||
| pattern: ep.pattern, | ||
| method: ep.method, | ||
| url: ep.url, | ||
| status: ep.status, | ||
| contentType: ep.contentType, | ||
| score: ep.score, | ||
| queryParams: ep.queryParams, | ||
| itemPath: ep.responseAnalysis?.itemPath ?? null, | ||
| itemCount: ep.responseAnalysis?.itemCount ?? 0, | ||
| detectedFields: ep.responseAnalysis?.detectedFields ?? {}, | ||
| authIndicators: ep.authIndicators, | ||
| })), | ||
| auth_indicators: [...allAuthIndicators], | ||
| // Step 9: Determine overall auth strategy | ||
| const allAuth = new Set(analyzedEndpoints.flatMap(ep => ep.authIndicators)); | ||
| const topStrategy = allAuth.has('signature') ? 'intercept' : allAuth.has('bearer') || allAuth.has('csrf') ? 'header' : allAuth.size === 0 ? 'public' : 'cookie'; | ||
| const siteName = opts.site ?? detectSiteName(metadata.url || url); | ||
| const targetDir = opts.outDir ?? path.join('.opencli', 'explore', siteName); | ||
| fs.mkdirSync(targetDir, { recursive: true }); | ||
| const result = { | ||
| site: siteName, target_url: url, final_url: metadata.url, title: metadata.title, | ||
| framework, stores, top_strategy: topStrategy, | ||
| endpoint_count: analyzedEndpoints.length + [...seen.values()].filter(ep => ep.score < 5).length, | ||
| api_endpoint_count: analyzedEndpoints.length, | ||
| capabilities, auth_indicators: [...allAuth], | ||
| }; | ||
| })(), { timeout: DEFAULT_BROWSER_EXPLORE_TIMEOUT, label: 'explore' }); | ||
| // Write artifacts | ||
| fs.writeFileSync(path.join(targetDir, 'manifest.json'), JSON.stringify({ | ||
| site: siteName, target_url: url, final_url: metadata.url, title: metadata.title, | ||
| framework, stores: stores.map(s => ({ type: s.type, id: s.id, actions: s.actions })), | ||
| top_strategy: topStrategy, explored_at: new Date().toISOString(), | ||
| }, null, 2)); | ||
| fs.writeFileSync(path.join(targetDir, 'endpoints.json'), JSON.stringify(analyzedEndpoints.map(ep => ({ | ||
| pattern: ep.pattern, method: ep.method, url: ep.url, status: ep.status, | ||
| contentType: ep.contentType, score: ep.score, queryParams: ep.queryParams, | ||
| itemPath: ep.responseAnalysis?.itemPath ?? null, itemCount: ep.responseAnalysis?.itemCount ?? 0, | ||
| detectedFields: ep.responseAnalysis?.detectedFields ?? {}, authIndicators: ep.authIndicators, | ||
| })), null, 2)); | ||
| fs.writeFileSync(path.join(targetDir, 'capabilities.json'), JSON.stringify(capabilities, null, 2)); | ||
| fs.writeFileSync(path.join(targetDir, 'auth.json'), JSON.stringify({ | ||
| top_strategy: topStrategy, indicators: [...allAuth], framework, | ||
| }, null, 2)); | ||
| if (stores.length > 0) { | ||
| fs.writeFileSync(path.join(targetDir, 'stores.json'), JSON.stringify(stores, null, 2)); | ||
| } | ||
| return { ...result, out_dir: targetDir }; | ||
| })(), { timeout: exploreTimeout, label: `Explore ${url}` }); | ||
| }); | ||
| // Write artifacts | ||
| const manifest = { | ||
| site: result.site, | ||
| target_url: result.target_url, | ||
| final_url: result.final_url, | ||
| title: result.title, | ||
| framework: result.framework, | ||
| top_strategy: result.top_strategy, | ||
| explored_at: new Date().toISOString(), | ||
| }; | ||
| fs.writeFileSync(path.join(outDir, 'manifest.json'), JSON.stringify(manifest, null, 2)); | ||
| fs.writeFileSync(path.join(outDir, 'endpoints.json'), JSON.stringify(result.endpoints ?? [], null, 2)); | ||
| fs.writeFileSync(path.join(outDir, 'capabilities.json'), JSON.stringify(result.capabilities ?? [], null, 2)); | ||
| fs.writeFileSync(path.join(outDir, 'auth.json'), JSON.stringify({ | ||
| top_strategy: result.top_strategy, | ||
| indicators: result.auth_indicators ?? [], | ||
| framework: result.framework ?? {}, | ||
| }, null, 2)); | ||
| return { ...result, out_dir: outDir }; | ||
| } | ||
| export function renderExploreSummary(result) { | ||
| const lines = [ | ||
| 'opencli explore: OK', | ||
| `Site: ${result.site}`, | ||
| `URL: ${result.target_url}`, | ||
| `Title: ${result.title || '(none)'}`, | ||
| `Strategy: ${result.top_strategy}`, | ||
| 'opencli probe: OK', `Site: ${result.site}`, `URL: ${result.target_url}`, | ||
| `Title: ${result.title || '(none)'}`, `Strategy: ${result.top_strategy}`, | ||
| `Endpoints: ${result.endpoint_count} total, ${result.api_endpoint_count} API`, | ||
@@ -607,3 +461,4 @@ `Capabilities: ${result.capabilities?.length ?? 0}`, | ||
| for (const cap of (result.capabilities ?? []).slice(0, 5)) { | ||
| lines.push(` • ${cap.name} (${cap.strategy}, confidence: ${(cap.confidence * 100).toFixed(0)}%)`); | ||
| const storeInfo = cap.storeHint ? ` → ${cap.storeHint.store}.${cap.storeHint.action}()` : ''; | ||
| lines.push(` • ${cap.name} (${cap.strategy}, ${(cap.confidence * 100).toFixed(0)}%)${storeInfo}`); | ||
| } | ||
@@ -614,4 +469,20 @@ const fw = result.framework ?? {}; | ||
| lines.push(`Framework: ${fwNames.join(', ')}`); | ||
| const stores = result.stores ?? []; | ||
| if (stores.length) { | ||
| lines.push(`Stores: ${stores.length}`); | ||
| for (const s of stores.slice(0, 5)) { | ||
| lines.push(` • ${s.type}/${s.id}: ${s.actions.slice(0, 5).join(', ')}${s.actions.length > 5 ? '...' : ''}`); | ||
| } | ||
| } | ||
| lines.push(`Output: ${result.out_dir}`); | ||
| return lines.join('\n'); | ||
| } | ||
| async function readPageMetadata(page) { | ||
| try { | ||
| const result = await page.evaluate(`() => ({ url: window.location.href, title: document.title || '' })`); | ||
| if (result && typeof result === 'object') | ||
| return { url: String(result.url ?? ''), title: String(result.title ?? '') }; | ||
| } | ||
| catch { } | ||
| return { url: '', title: '' }; | ||
| } |
+17
-0
@@ -58,2 +58,4 @@ #!/usr/bin/env node | ||
| .action(async (url, opts) => { const { exploreUrl, renderExploreSummary } = await import('./explore.js'); console.log(renderExploreSummary(await exploreUrl(url, { BrowserFactory: PlaywrightMCP, site: opts.site, goal: opts.goal, waitSeconds: parseFloat(opts.wait) }))); }); | ||
| program.command('probe').description('Probe a website: discover APIs, stores, and recommend strategies').argument('<url>').option('--site <name>').option('--goal <text>').option('--wait <s>', '', '3') | ||
| .action(async (url, opts) => { const { exploreUrl, renderExploreSummary } = await import('./explore.js'); console.log(renderExploreSummary(await exploreUrl(url, { BrowserFactory: PlaywrightMCP, site: opts.site, goal: opts.goal, waitSeconds: parseFloat(opts.wait) }))); }); | ||
| program.command('synthesize').description('Synthesize CLIs from explore').argument('<target>').option('--top <n>', '', '3') | ||
@@ -63,2 +65,17 @@ .action(async (target, opts) => { const { synthesizeFromExplore, renderSynthesizeSummary } = await import('./synthesize.js'); console.log(renderSynthesizeSummary(synthesizeFromExplore(target, { top: parseInt(opts.top) }))); }); | ||
| .action(async (url, opts) => { const { generateCliFromUrl, renderGenerateSummary } = await import('./generate.js'); const r = await generateCliFromUrl({ url, BrowserFactory: PlaywrightMCP, builtinClis: BUILTIN_CLIS, userClis: USER_CLIS, goal: opts.goal, site: opts.site }); console.log(renderGenerateSummary(r)); process.exitCode = r.ok ? 0 : 1; }); | ||
| program.command('cascade').description('Strategy cascade: find simplest working strategy').argument('<url>').option('--site <name>') | ||
| .action(async (url, opts) => { | ||
| const { cascadeProbe, renderCascadeResult } = await import('./cascade.js'); | ||
| const result = await browserSession(PlaywrightMCP, async (page) => { | ||
| // Navigate to the site first for cookie context | ||
| try { | ||
| const siteUrl = new URL(url); | ||
| await page.goto(`${siteUrl.protocol}//${siteUrl.host}`); | ||
| await page.wait(2); | ||
| } | ||
| catch { } | ||
| return cascadeProbe(page, url); | ||
| }); | ||
| console.log(renderCascadeResult(result)); | ||
| }); | ||
| // ── Dynamic site commands ────────────────────────────────────────────────── | ||
@@ -65,0 +82,0 @@ const registry = getRegistry(); |
+238
-2
@@ -19,2 +19,8 @@ /** | ||
| data = await executeStep(page, op, params, data, args); | ||
| // Detect error objects returned by steps (e.g. tap store not found) | ||
| if (data && typeof data === 'object' && !Array.isArray(data) && data.error) { | ||
| process.stderr.write(` ${chalk.yellow('⚠')} ${chalk.yellow(op)}: ${data.error}\n`); | ||
| if (data.hint) | ||
| process.stderr.write(` ${chalk.dim('💡')} ${chalk.dim(data.hint)}\n`); | ||
| } | ||
| if (debug) | ||
@@ -140,3 +146,14 @@ debugStepResult(op, data); | ||
| const js = String(render(params, { args, data })); | ||
| return page.evaluate(normalizeEvaluateSource(js)); | ||
| let result = await page.evaluate(normalizeEvaluateSource(js)); | ||
| // MCP may return JSON as a string — auto-parse it | ||
| if (typeof result === 'string') { | ||
| const trimmed = result.trim(); | ||
| if ((trimmed.startsWith('[') && trimmed.endsWith(']')) || (trimmed.startsWith('{') && trimmed.endsWith('}'))) { | ||
| try { | ||
| result = JSON.parse(trimmed); | ||
| } | ||
| catch { } | ||
| } | ||
| } | ||
| return result; | ||
| } | ||
@@ -219,3 +236,222 @@ case 'snapshot': { | ||
| } | ||
| case 'intercept': return data; | ||
| case 'intercept': { | ||
| // Declarative XHR interception step | ||
| // Usage: | ||
| // intercept: | ||
| // trigger: "navigate:https://..." | "evaluate:store.note.fetch()" | "click:ref" | ||
| // capture: "api/pattern" # URL substring to match | ||
| // timeout: 5 # seconds to wait for matching request | ||
| // select: "data.items" # optional: extract sub-path from response | ||
| const cfg = typeof params === 'object' ? params : {}; | ||
| const trigger = cfg.trigger ?? ''; | ||
| const capturePattern = cfg.capture ?? ''; | ||
| const timeout = cfg.timeout ?? 8; | ||
| const selectPath = cfg.select ?? null; | ||
| if (!capturePattern) | ||
| return data; | ||
| // Step 1: Execute the trigger action | ||
| if (trigger.startsWith('navigate:')) { | ||
| const url = render(trigger.slice('navigate:'.length), { args, data }); | ||
| await page.goto(String(url)); | ||
| } | ||
| else if (trigger.startsWith('evaluate:')) { | ||
| const js = trigger.slice('evaluate:'.length); | ||
| await page.evaluate(normalizeEvaluateSource(render(js, { args, data }))); | ||
| } | ||
| else if (trigger.startsWith('click:')) { | ||
| const ref = render(trigger.slice('click:'.length), { args, data }); | ||
| await page.click(String(ref).replace(/^@/, '')); | ||
| } | ||
| else if (trigger === 'scroll') { | ||
| await page.scroll('down'); | ||
| } | ||
| // Step 2: Wait a bit for network requests to fire | ||
| await page.wait(Math.min(timeout, 3)); | ||
| // Step 3: Get network requests and find matching ones | ||
| const rawNetwork = await page.networkRequests(false); | ||
| const matchingResponses = []; | ||
| if (typeof rawNetwork === 'string') { | ||
| // Parse the network output to find matching URLs | ||
| const lines = rawNetwork.split('\n'); | ||
| for (const line of lines) { | ||
| const match = line.match(/\[?(GET|POST)\]?\s+(\S+)\s*(?:=>|→)\s*\[?(\d+)\]?/i); | ||
| if (match) { | ||
| const [, method, url, status] = match; | ||
| if (url.includes(capturePattern) && status === '200') { | ||
| // Re-fetch the matching URL to get the response body | ||
| try { | ||
| const body = await page.evaluate(` | ||
| async () => { | ||
| try { | ||
| const resp = await fetch(${JSON.stringify(url)}, { credentials: 'include' }); | ||
| if (!resp.ok) return null; | ||
| return await resp.json(); | ||
| } catch { return null; } | ||
| } | ||
| `); | ||
| if (body) | ||
| matchingResponses.push(body); | ||
| } | ||
| catch { } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| // Step 4: Select from response if specified | ||
| let result = matchingResponses.length === 1 ? matchingResponses[0] : | ||
| matchingResponses.length > 1 ? matchingResponses : data; | ||
| if (selectPath && result) { | ||
| let current = result; | ||
| for (const part of String(selectPath).split('.')) { | ||
| if (current && typeof current === 'object' && !Array.isArray(current)) { | ||
| current = current[part]; | ||
| } | ||
| else | ||
| break; | ||
| } | ||
| result = current ?? result; | ||
| } | ||
| return result; | ||
| } | ||
| case 'tap': { | ||
| // ── Declarative Store Action Bridge ────────────────────────────────── | ||
| // Usage: | ||
| // tap: | ||
| // store: feed # Pinia/Vuex store name | ||
| // action: fetchFeeds # Store action to call | ||
| // args: [] # Optional args to pass to action | ||
| // capture: homefeed # URL pattern to capture response | ||
| // timeout: 5 # Seconds to wait for network (default: 5) | ||
| // select: data.items # Optional: extract sub-path from response | ||
| // framework: pinia # Optional: pinia | vuex (auto-detected if omitted) | ||
| // | ||
| // Generates a self-contained IIFE that: | ||
| // 1. Injects fetch + XHR dual interception proxy | ||
| // 2. Finds the Pinia/Vuex store and calls the action | ||
| // 3. Captures the response matching the URL pattern | ||
| // 4. Auto-cleans up interception in finally block | ||
| // 5. Returns the captured data (optionally sub-selected) | ||
| const cfg = typeof params === 'object' ? params : {}; | ||
| const storeName = String(render(cfg.store ?? '', { args, data })); | ||
| const actionName = String(render(cfg.action ?? '', { args, data })); | ||
| const capturePattern = String(render(cfg.capture ?? '', { args, data })); | ||
| const timeout = cfg.timeout ?? 5; | ||
| const selectPath = cfg.select ? String(render(cfg.select, { args, data })) : null; | ||
| const framework = cfg.framework ?? null; // auto-detect if null | ||
| const actionArgs = cfg.args ?? []; | ||
| if (!storeName || !actionName) | ||
| throw new Error('tap: store and action are required'); | ||
| // Build select chain for the captured response | ||
| const selectChain = selectPath | ||
| ? selectPath.split('.').map((p) => `?.[${JSON.stringify(p)}]`).join('') | ||
| : ''; | ||
| // Serialize action arguments | ||
| const actionArgsRendered = actionArgs.map((a) => { | ||
| const rendered = render(a, { args, data }); | ||
| return JSON.stringify(rendered); | ||
| }); | ||
| const actionCall = actionArgsRendered.length | ||
| ? `store[${JSON.stringify(actionName)}](${actionArgsRendered.join(', ')})` | ||
| : `store[${JSON.stringify(actionName)}]()`; | ||
| const js = ` | ||
| async () => { | ||
| // ── 1. Setup capture proxy (fetch + XHR dual interception) ── | ||
| let captured = null; | ||
| const capturePattern = ${JSON.stringify(capturePattern)}; | ||
| // Intercept fetch API | ||
| const origFetch = window.fetch; | ||
| window.fetch = async function(...fetchArgs) { | ||
| const resp = await origFetch.apply(this, fetchArgs); | ||
| try { | ||
| const url = typeof fetchArgs[0] === 'string' ? fetchArgs[0] | ||
| : fetchArgs[0] instanceof Request ? fetchArgs[0].url : String(fetchArgs[0]); | ||
| if (capturePattern && url.includes(capturePattern) && !captured) { | ||
| try { captured = await resp.clone().json(); } catch {} | ||
| } | ||
| } catch {} | ||
| return resp; | ||
| }; | ||
| // Intercept XMLHttpRequest | ||
| const origXhrOpen = XMLHttpRequest.prototype.open; | ||
| const origXhrSend = XMLHttpRequest.prototype.send; | ||
| XMLHttpRequest.prototype.open = function(method, url) { | ||
| this.__tapUrl = String(url); | ||
| return origXhrOpen.apply(this, arguments); | ||
| }; | ||
| XMLHttpRequest.prototype.send = function(body) { | ||
| if (capturePattern && this.__tapUrl?.includes(capturePattern)) { | ||
| const xhr = this; | ||
| const origHandler = xhr.onreadystatechange; | ||
| xhr.onreadystatechange = function() { | ||
| if (xhr.readyState === 4 && !captured) { | ||
| try { captured = JSON.parse(xhr.responseText); } catch {} | ||
| } | ||
| if (origHandler) origHandler.apply(this, arguments); | ||
| }; | ||
| // Also handle onload | ||
| const origOnload = xhr.onload; | ||
| xhr.onload = function() { | ||
| if (!captured) { try { captured = JSON.parse(xhr.responseText); } catch {} } | ||
| if (origOnload) origOnload.apply(this, arguments); | ||
| }; | ||
| } | ||
| return origXhrSend.apply(this, arguments); | ||
| }; | ||
| try { | ||
| // ── 2. Find store ── | ||
| let store = null; | ||
| const storeName = ${JSON.stringify(storeName)}; | ||
| const fw = ${JSON.stringify(framework)}; | ||
| // Auto-detect framework if not specified | ||
| const app = document.querySelector('#app'); | ||
| if (!fw || fw === 'pinia') { | ||
| // Try Pinia (Vue 3) | ||
| try { | ||
| const pinia = app?.__vue_app__?.config?.globalProperties?.$pinia; | ||
| if (pinia?._s) store = pinia._s.get(storeName); | ||
| } catch {} | ||
| } | ||
| if (!store && (!fw || fw === 'vuex')) { | ||
| // Try Vuex (Vue 2/3) | ||
| try { | ||
| const vuexStore = app?.__vue_app__?.config?.globalProperties?.$store | ||
| ?? app?.__vue__?.$store; | ||
| if (vuexStore) { | ||
| // Vuex doesn't have named stores like Pinia, dispatch action | ||
| store = { [${JSON.stringify(actionName)}]: (...a) => vuexStore.dispatch(storeName + '/' + ${JSON.stringify(actionName)}, ...a) }; | ||
| } | ||
| } catch {} | ||
| } | ||
| if (!store) return { error: 'Store not found: ' + storeName, hint: 'Page may not be fully loaded or store name may be incorrect' }; | ||
| if (typeof store[${JSON.stringify(actionName)}] !== 'function') { | ||
| return { error: 'Action not found: ' + ${JSON.stringify(actionName)} + ' on store ' + storeName, | ||
| hint: 'Available: ' + Object.keys(store).filter(k => typeof store[k] === 'function' && !k.startsWith('$') && !k.startsWith('_')).join(', ') }; | ||
| } | ||
| // ── 3. Call store action ── | ||
| await ${actionCall}; | ||
| // ── 4. Wait for network response ── | ||
| const deadline = Date.now() + ${timeout} * 1000; | ||
| while (!captured && Date.now() < deadline) { | ||
| await new Promise(r => setTimeout(r, 200)); | ||
| } | ||
| } finally { | ||
| // ── 5. Always restore originals ── | ||
| window.fetch = origFetch; | ||
| XMLHttpRequest.prototype.open = origXhrOpen; | ||
| XMLHttpRequest.prototype.send = origXhrSend; | ||
| } | ||
| if (!captured) return { error: 'No matching response captured for pattern: ' + capturePattern }; | ||
| return captured${selectChain} ?? captured; | ||
| } | ||
| `; | ||
| return page.evaluate(js); | ||
| } | ||
| default: return data; | ||
@@ -222,0 +458,0 @@ } |
+11
-8
| /** | ||
| * Synthesize: turn explore capabilities into ready-to-use CLI definitions. | ||
| * | ||
| * Takes the structured capabilities from Deep Explore and generates | ||
| * YAML pipeline files that can be directly registered as CLI commands. | ||
| * | ||
| * This is the bridge between discovery (explore) and usability (CLI). | ||
| * Synthesize candidate CLIs from explore artifacts. | ||
| * Generates evaluate-based YAML pipelines (matching hand-written adapter patterns). | ||
| */ | ||
| export declare function synthesizeFromExplore(target: string, opts?: any): any; | ||
| export declare function renderSynthesizeSummary(r: any): string; | ||
| export declare function synthesizeFromExplore(target: string, opts?: { | ||
| outDir?: string; | ||
| top?: number; | ||
| }): Record<string, any>; | ||
| export declare function renderSynthesizeSummary(result: Record<string, any>): string; | ||
| export declare function resolveExploreDir(target: string): string; | ||
| export declare function loadExploreBundle(exploreDir: string): Record<string, any>; | ||
| /** Backward-compatible export for scaffold.ts */ | ||
| export declare function buildCandidate(site: string, targetUrl: string, cap: any, endpoint: any): any; |
+142
-118
| /** | ||
| * Synthesize: turn explore capabilities into ready-to-use CLI definitions. | ||
| * | ||
| * Takes the structured capabilities from Deep Explore and generates | ||
| * YAML pipeline files that can be directly registered as CLI commands. | ||
| * | ||
| * This is the bridge between discovery (explore) and usability (CLI). | ||
| * Synthesize candidate CLIs from explore artifacts. | ||
| * Generates evaluate-based YAML pipelines (matching hand-written adapter patterns). | ||
| */ | ||
@@ -12,64 +8,65 @@ import * as fs from 'node:fs'; | ||
| import yaml from 'js-yaml'; | ||
| /** Volatile params to strip from generated URLs */ | ||
| const VOLATILE_PARAMS = new Set(['w_rid', 'wts', 'callback', '_', 'timestamp', 't', 'nonce', 'sign']); | ||
| const SEARCH_PARAM_NAMES = new Set(['q', 'query', 'keyword', 'search', 'wd', 'kw', 'w', 'search_query']); | ||
| const LIMIT_PARAM_NAMES = new Set(['ps', 'page_size', 'limit', 'count', 'per_page', 'size', 'num']); | ||
| const PAGE_PARAM_NAMES = new Set(['pn', 'page', 'page_num', 'offset', 'cursor']); | ||
| export function synthesizeFromExplore(target, opts = {}) { | ||
| const exploreDir = fs.existsSync(target) ? target : path.join('.opencli', 'explore', target); | ||
| if (!fs.existsSync(exploreDir)) | ||
| throw new Error(`Explore dir not found: ${target}`); | ||
| const manifest = JSON.parse(fs.readFileSync(path.join(exploreDir, 'manifest.json'), 'utf-8')); | ||
| const capabilities = JSON.parse(fs.readFileSync(path.join(exploreDir, 'capabilities.json'), 'utf-8')); | ||
| const endpoints = JSON.parse(fs.readFileSync(path.join(exploreDir, 'endpoints.json'), 'utf-8')); | ||
| const auth = JSON.parse(fs.readFileSync(path.join(exploreDir, 'auth.json'), 'utf-8')); | ||
| const exploreDir = resolveExploreDir(target); | ||
| const bundle = loadExploreBundle(exploreDir); | ||
| const targetDir = opts.outDir ?? path.join(exploreDir, 'candidates'); | ||
| fs.mkdirSync(targetDir, { recursive: true }); | ||
| const site = manifest.site; | ||
| const topN = opts.top ?? 5; | ||
| const site = bundle.manifest.site; | ||
| const capabilities = (bundle.capabilities ?? []) | ||
| .sort((a, b) => (b.confidence ?? 0) - (a.confidence ?? 0)) | ||
| .slice(0, opts.top ?? 3); | ||
| const candidates = []; | ||
| // Sort capabilities by confidence | ||
| const sortedCaps = [...capabilities] | ||
| .sort((a, b) => (b.confidence ?? 0) - (a.confidence ?? 0)) | ||
| .slice(0, topN); | ||
| for (const cap of sortedCaps) { | ||
| // Find the matching endpoint for more detail | ||
| const endpoint = endpoints.find((ep) => ep.pattern === cap.endpoint) ?? | ||
| endpoints[0]; | ||
| const candidate = buildCandidateYaml(site, manifest, cap, endpoint); | ||
| const fileName = `${cap.name}.yaml`; | ||
| const filePath = path.join(targetDir, fileName); | ||
| for (const cap of capabilities) { | ||
| const endpoint = chooseEndpoint(cap, bundle.endpoints); | ||
| if (!endpoint) | ||
| continue; | ||
| const candidate = buildCandidateYaml(site, bundle.manifest, cap, endpoint); | ||
| const filePath = path.join(targetDir, `${candidate.name}.yaml`); | ||
| fs.writeFileSync(filePath, yaml.dump(candidate.yaml, { sortKeys: false, lineWidth: 120 })); | ||
| candidates.push({ | ||
| name: cap.name, | ||
| path: filePath, | ||
| strategy: cap.strategy, | ||
| endpoint: cap.endpoint, | ||
| confidence: cap.confidence, | ||
| columns: candidate.yaml.columns, | ||
| }); | ||
| candidates.push({ name: candidate.name, path: filePath, strategy: cap.strategy, confidence: cap.confidence }); | ||
| } | ||
| const index = { | ||
| site, | ||
| target_url: manifest.target_url, | ||
| generated_from: exploreDir, | ||
| candidate_count: candidates.length, | ||
| candidates, | ||
| }; | ||
| const index = { site, target_url: bundle.manifest.target_url, generated_from: exploreDir, candidate_count: candidates.length, candidates }; | ||
| fs.writeFileSync(path.join(targetDir, 'candidates.json'), JSON.stringify(index, null, 2)); | ||
| return { site, explore_dir: exploreDir, out_dir: targetDir, candidate_count: candidates.length, candidates }; | ||
| } | ||
| export function renderSynthesizeSummary(result) { | ||
| const lines = ['opencli synthesize: OK', `Site: ${result.site}`, `Source: ${result.explore_dir}`, `Candidates: ${result.candidate_count}`]; | ||
| for (const c of result.candidates ?? []) | ||
| lines.push(` • ${c.name} (${c.strategy}, ${((c.confidence ?? 0) * 100).toFixed(0)}% confidence) → ${c.path}`); | ||
| return lines.join('\n'); | ||
| } | ||
| export function resolveExploreDir(target) { | ||
| if (fs.existsSync(target)) | ||
| return target; | ||
| const candidate = path.join('.opencli', 'explore', target); | ||
| if (fs.existsSync(candidate)) | ||
| return candidate; | ||
| throw new Error(`Explore directory not found: ${target}`); | ||
| } | ||
| export function loadExploreBundle(exploreDir) { | ||
| return { | ||
| site, | ||
| explore_dir: exploreDir, | ||
| out_dir: targetDir, | ||
| candidate_count: candidates.length, | ||
| candidates, | ||
| manifest: JSON.parse(fs.readFileSync(path.join(exploreDir, 'manifest.json'), 'utf-8')), | ||
| endpoints: JSON.parse(fs.readFileSync(path.join(exploreDir, 'endpoints.json'), 'utf-8')), | ||
| capabilities: JSON.parse(fs.readFileSync(path.join(exploreDir, 'capabilities.json'), 'utf-8')), | ||
| auth: JSON.parse(fs.readFileSync(path.join(exploreDir, 'auth.json'), 'utf-8')), | ||
| }; | ||
| } | ||
| /** Volatile params to strip from generated URLs */ | ||
| const VOLATILE_PARAMS = new Set(['w_rid', 'wts', 'callback', '_', 'timestamp', 't', 'nonce', 'sign']); | ||
| const SEARCH_PARAM_NAMES = new Set(['q', 'query', 'keyword', 'search', 'wd', 'kw', 'w', 'search_query']); | ||
| const LIMIT_PARAM_NAMES = new Set(['ps', 'page_size', 'limit', 'count', 'per_page', 'size', 'num']); | ||
| const PAGE_PARAM_NAMES = new Set(['pn', 'page', 'page_num', 'offset', 'cursor']); | ||
| /** | ||
| * Build a clean templated URL from a raw API URL. | ||
| * - Strips volatile params (w_rid, wts, etc.) | ||
| * - Templates search, limit, and pagination params | ||
| * - Builds URL string manually to avoid URL encoding of ${{ }} expressions | ||
| */ | ||
| function buildTemplatedUrl(rawUrl, cap, endpoint) { | ||
| function chooseEndpoint(cap, endpoints) { | ||
| if (!endpoints.length) | ||
| return null; | ||
| // Match by endpoint pattern from capability | ||
| if (cap.endpoint) { | ||
| const match = endpoints.find((e) => e.pattern === cap.endpoint || e.url?.includes(cap.endpoint)); | ||
| if (match) | ||
| return match; | ||
| } | ||
| return endpoints.sort((a, b) => (b.score ?? 0) - (a.score ?? 0))[0]; | ||
| } | ||
| // ── URL templating ───────────────────────────────────────────────────────── | ||
| function buildTemplatedUrl(rawUrl, cap, _endpoint) { | ||
| try { | ||
@@ -81,22 +78,14 @@ const u = new URL(rawUrl); | ||
| u.searchParams.forEach((v, k) => { | ||
| // Skip volatile params | ||
| if (VOLATILE_PARAMS.has(k)) | ||
| return; | ||
| // Template known param types | ||
| if (hasKeyword && SEARCH_PARAM_NAMES.has(k)) { | ||
| if (hasKeyword && SEARCH_PARAM_NAMES.has(k)) | ||
| params.push([k, '${{ args.keyword }}']); | ||
| } | ||
| else if (LIMIT_PARAM_NAMES.has(k)) { | ||
| else if (LIMIT_PARAM_NAMES.has(k)) | ||
| params.push([k, '${{ args.limit | default(20) }}']); | ||
| } | ||
| else if (PAGE_PARAM_NAMES.has(k)) { | ||
| else if (PAGE_PARAM_NAMES.has(k)) | ||
| params.push([k, '${{ args.page | default(1) }}']); | ||
| } | ||
| else { | ||
| else | ||
| params.push([k, v]); | ||
| } | ||
| }); | ||
| if (params.length === 0) | ||
| return base; | ||
| return base + '?' + params.map(([k, v]) => `${k}=${v}`).join('&'); | ||
| return params.length ? base + '?' + params.map(([k, v]) => `${k}=${v}`).join('&') : base; | ||
| } | ||
@@ -108,41 +97,86 @@ catch { | ||
| /** | ||
| * Build a YAML pipeline definition from a capability + endpoint. | ||
| * Build inline evaluate script for browser-based fetch+parse. | ||
| * Follows patterns from bilibili/hot.yaml and twitter/trending.yaml. | ||
| */ | ||
| function buildEvaluateScript(url, itemPath, endpoint) { | ||
| const pathChain = itemPath.split('.').map((p) => `?.${p}`).join(''); | ||
| const detectedFields = endpoint?.detectedFields ?? {}; | ||
| const hasFields = Object.keys(detectedFields).length > 0; | ||
| let mapCode = ''; | ||
| if (hasFields) { | ||
| const mappings = Object.entries(detectedFields) | ||
| .map(([role, field]) => ` ${role}: item${String(field).split('.').map(p => `?.${p}`).join('')}`) | ||
| .join(',\n'); | ||
| mapCode = `.map((item) => ({\n${mappings}\n }))`; | ||
| } | ||
| return [ | ||
| '(async () => {', | ||
| ` const res = await fetch('${url}', {`, | ||
| ` credentials: 'include'`, | ||
| ' });', | ||
| ' const data = await res.json();', | ||
| ` return (data${pathChain} || [])${mapCode};`, | ||
| '})()\n', | ||
| ].join('\n'); | ||
| } | ||
| // ── YAML pipeline generation ─────────────────────────────────────────────── | ||
| function buildCandidateYaml(site, manifest, cap, endpoint) { | ||
| const needsBrowser = cap.strategy !== 'public'; | ||
| const pipeline = []; | ||
| // Step 1: Navigate (if browser-based) | ||
| if (needsBrowser) { | ||
| const templatedUrl = buildTemplatedUrl(endpoint?.url ?? manifest.target_url, cap, endpoint); | ||
| let domain = ''; | ||
| try { | ||
| domain = new URL(manifest.target_url).hostname; | ||
| } | ||
| catch { } | ||
| if (cap.strategy === 'store-action' && cap.storeHint) { | ||
| // Store Action: navigate + wait + tap (declarative, clean) | ||
| pipeline.push({ navigate: manifest.target_url }); | ||
| pipeline.push({ wait: 3 }); | ||
| const tapStep = { | ||
| store: cap.storeHint.store, | ||
| action: cap.storeHint.action, | ||
| timeout: 8, | ||
| }; | ||
| // Infer capture pattern from endpoint URL | ||
| if (endpoint?.url) { | ||
| try { | ||
| const epUrl = new URL(endpoint.url); | ||
| const pathParts = epUrl.pathname.split('/').filter((p) => p); | ||
| // Use last meaningful path segment as capture pattern | ||
| const capturePart = pathParts.filter((p) => !p.match(/^v\d+$/)).pop(); | ||
| if (capturePart) | ||
| tapStep.capture = capturePart; | ||
| } | ||
| catch { } | ||
| } | ||
| if (cap.itemPath) | ||
| tapStep.select = cap.itemPath; | ||
| pipeline.push({ tap: tapStep }); | ||
| } | ||
| // Step 2: Fetch the API — build a clean URL with templates | ||
| const rawUrl = endpoint?.url ?? manifest.target_url; | ||
| const fetchStep = { url: buildTemplatedUrl(rawUrl, cap, endpoint) }; | ||
| pipeline.push({ fetch: fetchStep }); | ||
| // Step 3: Select the item path | ||
| if (cap.itemPath) { | ||
| pipeline.push({ select: cap.itemPath }); | ||
| else if (needsBrowser) { | ||
| // Browser-based: navigate + evaluate (like bilibili/hot.yaml, twitter/trending.yaml) | ||
| pipeline.push({ navigate: manifest.target_url }); | ||
| const itemPath = cap.itemPath ?? 'data.data.list'; | ||
| pipeline.push({ evaluate: buildEvaluateScript(templatedUrl, itemPath, endpoint) }); | ||
| } | ||
| // Step 4: Map fields to columns | ||
| else { | ||
| // Public API: direct fetch (like hackernews/top.yaml) | ||
| pipeline.push({ fetch: { url: templatedUrl } }); | ||
| if (cap.itemPath) | ||
| pipeline.push({ select: cap.itemPath }); | ||
| } | ||
| // Map fields | ||
| const mapStep = {}; | ||
| const columns = cap.recommendedColumns ?? ['title', 'url']; | ||
| // Add a rank column if not doing search | ||
| if (!cap.recommendedArgs?.some((a) => a.name === 'keyword')) { | ||
| if (!cap.recommendedArgs?.some((a) => a.name === 'keyword')) | ||
| mapStep['rank'] = '${{ index + 1 }}'; | ||
| } | ||
| // Build field mappings from the endpoint's detected fields | ||
| const detectedFields = endpoint?.detectedFields ?? {}; | ||
| for (const col of columns) { | ||
| const fieldPath = detectedFields[col]; | ||
| if (fieldPath) { | ||
| mapStep[col] = `\${{ item.${fieldPath} }}`; | ||
| } | ||
| else { | ||
| mapStep[col] = `\${{ item.${col} }}`; | ||
| } | ||
| mapStep[col] = fieldPath ? `\${{ item.${fieldPath} }}` : `\${{ item.${col} }}`; | ||
| } | ||
| pipeline.push({ map: mapStep }); | ||
| // Step 5: Limit | ||
| pipeline.push({ limit: '${{ args.limit | default(20) }}' }); | ||
| // Build args definition | ||
| // Args | ||
| const argsDef = {}; | ||
@@ -163,33 +197,23 @@ for (const arg of cap.recommendedArgs ?? []) { | ||
| } | ||
| // Ensure limit arg always exists | ||
| if (!argsDef['limit']) { | ||
| if (!argsDef['limit']) | ||
| argsDef['limit'] = { type: 'int', default: 20, description: 'Number of items to return' }; | ||
| } | ||
| const allColumns = Object.keys(mapStep); | ||
| return { | ||
| name: cap.name, | ||
| yaml: { | ||
| site, | ||
| name: cap.name, | ||
| description: `${site} ${cap.name} (auto-generated)`, | ||
| domain: manifest.final_url ? new URL(manifest.final_url).hostname : undefined, | ||
| strategy: cap.strategy, | ||
| browser: needsBrowser, | ||
| args: argsDef, | ||
| pipeline, | ||
| columns: allColumns, | ||
| site, name: cap.name, description: `${cap.description || site + ' ' + cap.name} (auto-generated)`, | ||
| domain, strategy: cap.strategy, browser: needsBrowser, | ||
| args: argsDef, pipeline, columns: Object.keys(mapStep), | ||
| }, | ||
| }; | ||
| } | ||
| export function renderSynthesizeSummary(r) { | ||
| const lines = [ | ||
| 'opencli synthesize: OK', | ||
| `Site: ${r.site}`, | ||
| `Source: ${r.explore_dir}`, | ||
| `Candidates: ${r.candidate_count}`, | ||
| ]; | ||
| for (const c of r.candidates ?? []) { | ||
| lines.push(` • ${c.name} (${c.strategy}, ${(c.confidence * 100).toFixed(0)}% confidence) → ${c.path}`); | ||
| } | ||
| return lines.join('\n'); | ||
| /** Backward-compatible export for scaffold.ts */ | ||
| export function buildCandidate(site, targetUrl, cap, endpoint) { | ||
| // Map old-style field names to new ones | ||
| const normalizedCap = { | ||
| ...cap, | ||
| recommendedArgs: cap.recommendedArgs ?? cap.recommended_args, | ||
| recommendedColumns: cap.recommendedColumns ?? cap.recommended_columns, | ||
| }; | ||
| const manifest = { target_url: targetUrl, final_url: targetUrl }; | ||
| return buildCandidateYaml(site, manifest, normalizedCap, endpoint); | ||
| } |
+4
-2
| { | ||
| "name": "@jackwener/opencli", | ||
| "version": "0.1.0", | ||
| "version": "0.1.1", | ||
| "publishConfig": { | ||
@@ -15,3 +15,5 @@ "access": "public" | ||
| "dev": "tsx src/main.ts", | ||
| "build": "tsc", | ||
| "build": "tsc && npm run clean-yaml && npm run copy-yaml", | ||
| "clean-yaml": "find dist/clis -name '*.yaml' -o -name '*.yml' 2>/dev/null | xargs rm -f", | ||
| "copy-yaml": "find src/clis -name '*.yaml' -o -name '*.yml' | while read f; do d=\"dist/${f#src/}\"; mkdir -p \"$(dirname \"$d\")\"; cp \"$f\" \"$d\"; done", | ||
| "start": "node dist/main.js", | ||
@@ -18,0 +20,0 @@ "typecheck": "tsc --noEmit", |
+116
-38
| # OpenCLI | ||
| > **Make any website your CLI.** 操控 Chrome 无风控风险,复用登录,CLI 化全部网站。 | ||
| > **Make any website your CLI.** | ||
| > Zero risk · Reuse Chrome login · AI-powered discovery | ||
| OpenCLI 是一个 AI Native 的 CLI 工具,通过 Chrome 浏览器 + Playwright MCP Bridge 扩展,将任何网站变成命令行工具。 | ||
| [中文文档](./README.zh-CN.md) | ||
| ## ✨ 特性 | ||
| [](https://www.npmjs.com/package/@jackwener/opencli) | ||
| - 🌐 **CLI 化全部网站** — 支持 Bilibili、知乎、GitHub、Twitter、V2EX、Hacker News 等 | ||
| - 🔐 **零风控风险** — 复用 Chrome 已登录状态,无需存储密码 | ||
| - 🤖 **AI Native** — AI agent 可直接探索网站并自动生成新命令 | ||
| - 📝 **声明式 YAML** — 用 YAML 定义 pipeline,无需写代码 | ||
| - 🔌 **TypeScript 扩展** — 复杂场景用 TS 编写适配器 | ||
| OpenCLI turns any website into a command-line tool by bridging your Chrome browser through [Playwright MCP](https://github.com/nichochar/playwright-mcp). No passwords stored, no tokens leaked — it just rides your existing browser session. | ||
| ## 🚀 快速开始 | ||
| ## ✨ Highlights | ||
| - 🌐 **25+ commands, 8 sites** — Bilibili, Zhihu, GitHub, Twitter/X, Reddit, V2EX, Xiaohongshu, Hacker News | ||
| - 🔐 **Account-safe** — Reuses Chrome's logged-in state; your credentials never leave the browser | ||
| - 🤖 **AI Agent ready** — `explore` discovers APIs, `synthesize` generates adapters, `cascade` finds auth strategies | ||
| - 📝 **Declarative YAML** — Most adapters are ~30 lines of YAML pipeline | ||
| - 🔌 **TypeScript escape hatch** — Complex adapters (XHR interception, GraphQL) in TS | ||
| ## 🚀 Quick Start | ||
| ### Install via npm (recommended) | ||
| ```bash | ||
| # 安装依赖 | ||
| cd ~/code/ai-native-cli && npm install | ||
| npm install -g @jackwener/opencli | ||
| ``` | ||
| # 列出所有命令 | ||
| Then use directly: | ||
| ```bash | ||
| opencli list # See all commands | ||
| opencli hackernews top --limit 5 # Public API, no browser | ||
| opencli bilibili hot --limit 5 # Browser command | ||
| opencli zhihu hot -f json # JSON output | ||
| ``` | ||
| ### Install from source | ||
| ```bash | ||
| git clone git@github.com:jackwener/opencli.git | ||
| cd opencli && npm install | ||
| npx tsx src/main.ts list | ||
| ``` | ||
| # 公共 API(无需浏览器) | ||
| npx tsx src/main.ts hackernews top --limit 10 | ||
| npx tsx src/main.ts github search --keyword "typescript" | ||
| ### Update | ||
| # 浏览器命令(需要 Chrome + Playwright MCP Bridge 扩展) | ||
| npx tsx src/main.ts bilibili hot --limit 10 | ||
| npx tsx src/main.ts zhihu hot --limit 10 | ||
| npx tsx src/main.ts twitter trending --limit 10 | ||
| ```bash | ||
| # npm global | ||
| npm update -g @jackwener/opencli | ||
| # Or reinstall to latest | ||
| npm install -g @jackwener/opencli@latest | ||
| ``` | ||
| ## 📋 前置要求 | ||
| ## 📋 Prerequisites | ||
| 浏览器命令需要: | ||
| 1. Chrome 浏览器正在运行 | ||
| 2. 安装 [Playwright MCP Bridge](https://chromewebstore.google.com/detail/playwright-mcp-bridge/mmlmfjhmonkocbjadbfplnigmagldckm) 扩展 | ||
| 3. 首次使用时点击扩展图标批准连接 | ||
| Browser commands need: | ||
| 1. **Chrome** running **and logged into the target site** (e.g. bilibili.com, zhihu.com, xiaohongshu.com) | ||
| 2. **[Playwright MCP Bridge](https://chromewebstore.google.com/detail/playwright-mcp-bridge/mmlmfjhmonkocbjadbfplnigmagldckm)** extension installed | ||
| 3. Configure `PLAYWRIGHT_MCP_EXTENSION_TOKEN` (from the extension settings page) in your MCP config: | ||
| ## 📦 内置命令 | ||
| ```json | ||
| { | ||
| "mcpServers": { | ||
| "playwright": { | ||
| "command": "npx", | ||
| "args": ["@playwright/mcp@latest", "--extension"], | ||
| "env": { | ||
| "PLAYWRIGHT_MCP_EXTENSION_TOKEN": "<your-token>" | ||
| } | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
| | 站点 | 命令 | 说明 | 模式 | | ||
| |------|------|------|------| | ||
| | bilibili | hot, search, me, favorite, history, feed, user-videos | 热门 / 搜索 / 个人 / 收藏 / 历史 / 动态 / 投稿 | 🔐 浏览器 | | ||
| | zhihu | hot, search | 热榜 / 搜索 | 🔐 浏览器 | | ||
| | github | trending, search | Trending / 搜索 | 🔐 / 🌐 公共 | | ||
| | twitter | trending | 热门话题 | 🔐 浏览器 | | ||
| | v2ex | hot, latest | 热门 / 最新 | 🔐 浏览器 | | ||
| | hackernews | top | 热门故事 | 🌐 公共 API | | ||
| Public API commands (`hackernews`, `github search`, `v2ex`) need no browser at all. | ||
| ## 🎨 输出格式 | ||
| > **⚠️ Important**: Browser commands reuse your Chrome login session. You must be logged into the target website in Chrome before running commands. If you get empty data or errors, check your login status first. | ||
| ## 📦 Built-in Commands | ||
| | Site | Commands | Mode | | ||
| |------|----------|------| | ||
| | **bilibili** | `hot` `search` `me` `favorite` `history` `feed` `user-videos` | 🔐 Browser | | ||
| | **zhihu** | `hot` `search` `question` | 🔐 Browser | | ||
| | **xiaohongshu** | `search` `notifications` `feed` | 🔐 Browser | | ||
| | **twitter** | `trending` | 🔐 Browser | | ||
| | **reddit** | `hot` | 🔐 Browser | | ||
| | **github** | `trending` `search` | 🔐 / 🌐 | | ||
| | **v2ex** | `hot` `latest` `topic` | 🌐 Public | | ||
| | **hackernews** | `top` | 🌐 Public | | ||
| ## 🎨 Output Formats | ||
| ```bash | ||
| opencli bilibili hot -f table # 默认表格 | ||
| opencli bilibili hot -f json # JSON(适合管道和 AI agent) | ||
| opencli bilibili hot -f table # Default: rich table | ||
| opencli bilibili hot -f json # JSON (pipe to jq, feed to AI) | ||
| opencli bilibili hot -f md # Markdown | ||
| opencli bilibili hot -f csv # CSV | ||
| opencli bilibili hot -v # Verbose: show pipeline steps | ||
| ``` | ||
| ## 🔧 创建新命令 | ||
| ## 🧠 AI Agent Workflow | ||
| 参考 [SKILL.md](./SKILL.md) 了解 YAML 和 TypeScript 两种方式创建新的 CLI 适配器。 | ||
| ```bash | ||
| # 1. Deep Explore — discover APIs, infer capabilities, detect framework | ||
| opencli explore https://example.com --site mysite | ||
| # 2. Synthesize — generate YAML adapters from explore artifacts | ||
| opencli synthesize mysite | ||
| # 3. Generate — one-shot: explore → synthesize → register | ||
| opencli generate https://example.com --goal "hot" | ||
| # 4. Strategy Cascade — auto-probe: PUBLIC → COOKIE → HEADER | ||
| opencli cascade https://api.example.com/data | ||
| ``` | ||
| Explore outputs to `.opencli/explore/<site>/`: | ||
| - `manifest.json` — site metadata, framework detection | ||
| - `endpoints.json` — scored API endpoints with response schemas | ||
| - `capabilities.json` — inferred capabilities with confidence scores | ||
| - `auth.json` — authentication strategy recommendations | ||
| ## 🔧 Create New Commands | ||
| See **[SKILL.md](./SKILL.md)** for the full adapter guide (YAML pipeline + TypeScript). | ||
| ## Releasing New Versions | ||
| ```bash | ||
| # Bump version | ||
| npm version patch # 0.1.0 → 0.1.1 | ||
| npm version minor # 0.1.0 → 0.2.0 | ||
| npm version major # 0.1.0 → 1.0.0 | ||
| # Push tag to trigger GitHub Actions auto-release | ||
| git push --follow-tags | ||
| ``` | ||
| The CI will automatically build, create a GitHub release, and publish to npm. | ||
| ## 📄 License | ||
| MIT |
+148
-96
| --- | ||
| name: opencli | ||
| description: "OpenCLI — Make any website your CLI. Zero setup, AI-powered. Turn any website into CLI commands via Chrome browser." | ||
| description: "OpenCLI — Make any website your CLI. Zero risk, AI-powered, reuse Chrome login." | ||
| version: 0.1.0 | ||
| author: jackwener | ||
| tags: [cli, browser, web, mcp, playwright, bilibili, zhihu, twitter, github, v2ex, hackernews, 哔哩哔哩, 知乎, AI, agent] | ||
| tags: [cli, browser, web, mcp, playwright, bilibili, zhihu, twitter, github, v2ex, hackernews, reddit, xiaohongshu, AI, agent] | ||
| --- | ||
@@ -11,36 +11,36 @@ | ||
| > Make any website your CLI. 操控 Chrome 无风控风险,复用登录,CLI 化全部网站。 | ||
| > Make any website your CLI. Reuse Chrome login, zero risk, AI-powered discovery. | ||
| ## 安装 | ||
| ## Install & Run | ||
| ```bash | ||
| cd ~/code/ai-native-cli | ||
| npm install | ||
| ``` | ||
| # npm global install (recommended) | ||
| npm install -g @jackwener/opencli | ||
| opencli <command> | ||
| ## 使用方式 | ||
| ```bash | ||
| # 通过 npx 运行(推荐) | ||
| # Or from source | ||
| cd ~/code/opencli && npm install | ||
| npx tsx src/main.ts <command> | ||
| # 或者构建后运行 | ||
| npm run build && node dist/main.js <command> | ||
| # Update to latest | ||
| npm update -g @jackwener/opencli | ||
| ``` | ||
| ## 前置要求 | ||
| ## Prerequisites | ||
| 浏览器命令需要: | ||
| 1. Chrome 浏览器正在运行 | ||
| 2. 安装 [Playwright MCP Bridge](https://chromewebstore.google.com/detail/playwright-mcp-bridge/mmlmfjhmonkocbjadbfplnigmagldckm) 扩展 | ||
| 3. 点击扩展图标批准连接 | ||
| Browser commands require: | ||
| 1. Chrome browser running **(logged into target sites)** | ||
| 2. [Playwright MCP Bridge](https://chromewebstore.google.com/detail/playwright-mcp-bridge/mmlmfjhmonkocbjadbfplnigmagldckm) extension | ||
| 3. Configure `PLAYWRIGHT_MCP_EXTENSION_TOKEN` (from extension settings) in your MCP config | ||
| 公共 API 命令(hackernews、github search)无需浏览器。 | ||
| > **Note**: You must be logged into the target website in Chrome before running commands. Tabs opened during command execution are auto-closed afterwards. | ||
| ## 内置命令 | ||
| Public API commands (`hackernews`, `github search`, `v2ex`) need no browser. | ||
| ### 数据查询 | ||
| ## Commands Reference | ||
| ### Data Commands | ||
| ```bash | ||
| # Bilibili | ||
| # Bilibili (browser) | ||
| opencli bilibili hot --limit 10 # B站热门视频 | ||
@@ -54,46 +54,67 @@ opencli bilibili search --keyword "rust" # 搜索视频 | ||
| # 知乎 | ||
| # 知乎 (browser) | ||
| opencli zhihu hot --limit 10 # 知乎热榜 | ||
| opencli zhihu search --keyword "AI" # 搜索 | ||
| opencli zhihu question --id 34816524 # 问题详情和回答 | ||
| # GitHub | ||
| # 小红书 (browser) | ||
| opencli xiaohongshu search --keyword "美食" # 搜索笔记 | ||
| opencli xiaohongshu notifications # 通知(mentions/likes/connections) | ||
| opencli xiaohongshu feed --limit 10 # 推荐 Feed | ||
| # GitHub (trending=browser, search=public) | ||
| opencli github trending --limit 10 # GitHub Trending | ||
| opencli github search --keyword "cli" # 搜索仓库(无需浏览器) | ||
| opencli github search --keyword "cli" # 搜索仓库 | ||
| # Twitter/X | ||
| # Twitter/X (browser) | ||
| opencli twitter trending --limit 10 # 热门话题 | ||
| # V2EX | ||
| # Reddit (browser) | ||
| opencli reddit hot --limit 10 # 热门帖子 | ||
| opencli reddit hot --subreddit programming # 指定子版块 | ||
| # V2EX (public) | ||
| opencli v2ex hot --limit 10 # 热门话题 | ||
| opencli v2ex latest --limit 10 # 最新话题 | ||
| opencli v2ex topic --id 1024 # 主题详情 | ||
| # Hacker News | ||
| opencli hackernews top --limit 10 # 热门故事(无需浏览器) | ||
| # Hacker News (public) | ||
| opencli hackernews top --limit 10 # Top stories | ||
| ``` | ||
| ### 管理命令 | ||
| ### Management Commands | ||
| ```bash | ||
| opencli list # 列出所有可用命令 | ||
| opencli list --json # JSON 格式输出 | ||
| opencli validate # 验证所有 CLI 定义 | ||
| opencli validate bilibili # 验证指定站点 | ||
| opencli list # List all commands | ||
| opencli list --json # JSON output | ||
| opencli validate # Validate all CLI definitions | ||
| opencli validate bilibili # Validate specific site | ||
| ``` | ||
| ### AI 工作流(为 AI Agent 设计) | ||
| ### AI Agent Workflow | ||
| ```bash | ||
| opencli explore <url> # 探索网站,生成 API 发现成果物 | ||
| opencli synthesize <site> # 从探索成果物合成候选 CLI | ||
| opencli generate <url> --goal "hot" # 一键:探索 → 合成 → 注册 | ||
| opencli verify <site/name> --smoke # 验证 + Smoke 测试 | ||
| # Deep Explore: network intercept → response analysis → capability inference | ||
| opencli explore <url> --site <name> | ||
| # Synthesize: generate evaluate-based YAML pipelines from explore artifacts | ||
| opencli synthesize <site> | ||
| # Generate: one-shot explore → synthesize → register | ||
| opencli generate <url> --goal "hot" | ||
| # Strategy Cascade: auto-probe PUBLIC → COOKIE → HEADER | ||
| opencli cascade <api-url> | ||
| # Verify: smoke-test a generated adapter | ||
| opencli verify <site/name> --smoke | ||
| ``` | ||
| ## 输出格式 | ||
| ## Output Formats | ||
| 所有命令支持 `--format` / `-f` 选项: | ||
| All commands support `--format` / `-f`: | ||
| ```bash | ||
| opencli bilibili hot -f table # 默认表格 | ||
| opencli bilibili hot -f json # JSON | ||
| opencli bilibili hot -f table # Default: rich table | ||
| opencli bilibili hot -f json # JSON (pipe to jq, feed to AI agent) | ||
| opencli bilibili hot -f md # Markdown | ||
@@ -103,13 +124,13 @@ opencli bilibili hot -f csv # CSV | ||
| ## 调试 | ||
| ## Verbose Mode | ||
| ```bash | ||
| opencli bilibili hot -v # 显示 pipeline 每步详情 | ||
| opencli bilibili hot -v # Show each pipeline step and data flow | ||
| ``` | ||
| ## 创建新的 CLI 适配器 | ||
| ## Creating Adapters | ||
| ### YAML 方式(声明式,推荐) | ||
| ### YAML Pipeline (declarative, recommended) | ||
| 在 `src/clis/<site>/<name>.yaml` 创建文件: | ||
| Create `src/clis/<site>/<name>.yaml`: | ||
@@ -119,3 +140,3 @@ ```yaml | ||
| name: hot | ||
| description: Hot topics on mysite | ||
| description: Hot topics | ||
| domain: www.mysite.com | ||
@@ -137,7 +158,9 @@ strategy: cookie # public | cookie | header | intercept | ui | ||
| const res = await fetch('/api/hot', { credentials: 'include' }); | ||
| return await res.json(); | ||
| const d = await res.json(); | ||
| return d.data.items.map(item => ({ | ||
| title: item.title, | ||
| score: item.score, | ||
| })); | ||
| })() | ||
| - select: data.items | ||
| - map: | ||
@@ -153,6 +176,21 @@ rank: ${{ index + 1 }} | ||
| ### TypeScript 方式(编程式,更灵活) | ||
| For public APIs (no browser): | ||
| 在 `src/clis/<site>/<name>.ts` 创建并在 `clis/index.ts` 中 import: | ||
| ```yaml | ||
| strategy: public | ||
| browser: false | ||
| pipeline: | ||
| - fetch: | ||
| url: https://api.example.com/hot.json | ||
| - select: data.items | ||
| - map: | ||
| title: ${{ item.title }} | ||
| - limit: ${{ args.limit }} | ||
| ``` | ||
| ### TypeScript Adapter (programmatic) | ||
| Create `src/clis/<site>/<name>.ts` and import in `clis/index.ts`: | ||
| ```typescript | ||
@@ -168,12 +206,13 @@ import { cli, Strategy } from '../../registry.js'; | ||
| func: async (page, kwargs) => { | ||
| await page.goto('https://www.mysite.com'); | ||
| const data = await page.evaluate(` | ||
| async () => { | ||
| const res = await fetch('/api/search?q=${kwargs.keyword}', { credentials: 'include' }); | ||
| (async () => { | ||
| const res = await fetch('/api/search?q=${kwargs.keyword}', { | ||
| credentials: 'include' | ||
| }); | ||
| return await res.json(); | ||
| } | ||
| })() | ||
| `); | ||
| return data.items.map((item, i) => ({ | ||
| rank: i + 1, | ||
| title: item.title, | ||
| url: item.url, | ||
| rank: i + 1, title: item.title, url: item.url, | ||
| })); | ||
@@ -184,34 +223,36 @@ }, | ||
| ## Pipeline 步骤参考 | ||
| **When to use TS**: XHR interception (小红书), cookie extraction (Twitter ct0), Wbi signing (Bilibili), auto-pagination, complex data transforms. | ||
| | 步骤 | 说明 | 示例 | | ||
| |------|------|------| | ||
| | `navigate` | 导航到 URL | `navigate: https://example.com` | | ||
| | `fetch` | HTTP 请求(使用浏览器 cookie) | `fetch: { url: "...", params: { q: "${{ args.keyword }}" } }` | | ||
| | `evaluate` | 执行 JavaScript | `evaluate: \| (async () => { ... })()` | | ||
| | `select` | 选取 JSON 路径 | `select: data.items` | | ||
| | `map` | 映射字段 | `map: { title: "${{ item.title }}" }` | | ||
| | `filter` | 过滤 | `filter: item.score > 100` | | ||
| | `sort` | 排序 | `sort: { by: score, order: desc }` | | ||
| | `limit` | 限制数量 | `limit: ${{ args.limit }}` | | ||
| | `snapshot` | 获取页面快照 | `snapshot: { interactive: true }` | | ||
| | `click` | 点击元素 | `click: ${{ ref }}` | | ||
| | `type` | 输入文本 | `type: { ref: "@1", text: "hello" }` | | ||
| | `wait` | 等待 | `wait: 2` 或 `wait: { text: "loaded" }` | | ||
| | `press` | 按键 | `press: Enter` | | ||
| ## Pipeline Steps | ||
| ## 模板语法 | ||
| | Step | Description | Example | | ||
| |------|-------------|---------| | ||
| | `navigate` | Go to URL | `navigate: https://example.com` | | ||
| | `fetch` | HTTP request (browser cookies) | `fetch: { url: "...", params: { q: "..." } }` | | ||
| | `evaluate` | Run JavaScript in page | `evaluate: \| (async () => { ... })()` | | ||
| | `select` | Extract JSON path | `select: data.items` | | ||
| | `map` | Map fields | `map: { title: "${{ item.title }}" }` | | ||
| | `filter` | Filter items | `filter: item.score > 100` | | ||
| | `sort` | Sort items | `sort: { by: score, order: desc }` | | ||
| | `limit` | Cap result count | `limit: ${{ args.limit }}` | | ||
| | `intercept` | Declarative XHR capture | `intercept: { trigger: "navigate:...", capture: "api/hot" }` | | ||
| | `tap` | Store action + XHR capture | `tap: { store: "feed", action: "fetchFeeds", capture: "homefeed" }` | | ||
| | `snapshot` | Page accessibility tree | `snapshot: { interactive: true }` | | ||
| | `click` | Click element | `click: ${{ ref }}` | | ||
| | `type` | Type text | `type: { ref: "@1", text: "hello" }` | | ||
| | `wait` | Wait for time/text | `wait: 2` or `wait: { text: "loaded" }` | | ||
| | `press` | Press key | `press: Enter` | | ||
| 使用 `${{ expression }}` 进行模板替换: | ||
| ## Template Syntax | ||
| ```yaml | ||
| # 引用参数 | ||
| # Arguments with defaults | ||
| ${{ args.keyword }} | ||
| ${{ args.limit | default(20) }} | ||
| # 引用当前 item(在 map/filter 中) | ||
| # Current item (in map/filter) | ||
| ${{ item.title }} | ||
| ${{ item.data.nested.field }} | ||
| # 索引(从 0 开始) | ||
| # Index (0-based) | ||
| ${{ index }} | ||
@@ -221,19 +262,30 @@ ${{ index + 1 }} | ||
| ## 环境变量 | ||
| ## 5-Tier Authentication Strategy | ||
| | 变量 | 默认值 | 说明 | | ||
| |------|--------|------| | ||
| | `OPENCLI_BROWSER_CONNECT_TIMEOUT` | 30 | 浏览器连接超时(秒) | | ||
| | `OPENCLI_BROWSER_COMMAND_TIMEOUT` | 45 | 命令执行超时(秒) | | ||
| | `OPENCLI_BROWSER_EXPLORE_TIMEOUT` | 120 | Explore 超时(秒) | | ||
| | `OPENCLI_EXTENSION_LOCK_TIMEOUT` | 120 | 扩展锁超时(秒) | | ||
| | `PLAYWRIGHT_MCP_EXTENSION_TOKEN` | — | 自动批准扩展连接 | | ||
| | Tier | Name | Method | Example | | ||
| |------|------|--------|---------| | ||
| | 1 | `public` | No auth, Node.js fetch | Hacker News, V2EX | | ||
| | 2 | `cookie` | Browser fetch with `credentials: include` | Bilibili, Zhihu | | ||
| | 3 | `header` | Custom headers (ct0, Bearer) | Twitter GraphQL | | ||
| | 4 | `intercept` | XHR interception + store mutation | 小红书 Pinia | | ||
| | 5 | `ui` | Full UI automation (click/type/scroll) | Last resort | | ||
| ## 错误排查 | ||
| ## Environment Variables | ||
| | 错误 | 解决方案 | | ||
| |------|----------| | ||
| | `npx not found` | 安装 Node.js: `brew install node` | | ||
| | `Timed out connecting to browser` | 1) 确认 Chrome 已打开 2) 安装 Playwright MCP Bridge 扩展 3) 点击扩展图标批准 | | ||
| | `Extension lock timed out` | 等待其他 opencli 命令完成,浏览器命令需串行运行 | | ||
| | `Request timed out` | 增大 `OPENCLI_BROWSER_COMMAND_TIMEOUT` 或检查网络 | | ||
| | Variable | Default | Description | | ||
| |----------|---------|-------------| | ||
| | `OPENCLI_BROWSER_CONNECT_TIMEOUT` | 30 | Browser connection timeout (sec) | | ||
| | `OPENCLI_BROWSER_COMMAND_TIMEOUT` | 45 | Command execution timeout (sec) | | ||
| | `OPENCLI_BROWSER_EXPLORE_TIMEOUT` | 120 | Explore timeout (sec) | | ||
| | `OPENCLI_EXTENSION_LOCK_TIMEOUT` | 120 | Extension lock timeout (sec) | | ||
| | `PLAYWRIGHT_MCP_EXTENSION_TOKEN` | — | Auto-approve extension connection | | ||
| ## Troubleshooting | ||
| | Issue | Solution | | ||
| |-------|----------| | ||
| | `npx not found` | Install Node.js: `brew install node` | | ||
| | `Timed out connecting to browser` | 1) Chrome must be open 2) Install MCP Bridge extension 3) Click to approve | | ||
| | `Extension lock timed out` | Another opencli command is running; browser commands run serially | | ||
| | `Target page context` error | Add `navigate:` step before `evaluate:` in YAML | | ||
| | Empty table data | Check if evaluate returns JSON string (MCP parsing) or data path is wrong | |
+33
-1
@@ -39,3 +39,14 @@ /** | ||
| if (textParts.length === 1) { | ||
| const text = textParts[0].text; | ||
| let text = textParts[0].text; | ||
| // MCP browser_evaluate returns: "[JSON]\n### Ran Playwright code\n```js\n...\n```" | ||
| // Strip the "### Ran Playwright code" suffix to get clean JSON | ||
| const codeMarker = text.indexOf('### Ran Playwright code'); | ||
| if (codeMarker !== -1) { | ||
| text = text.slice(0, codeMarker).trim(); | ||
| } | ||
| // Also handle "### Result\n[JSON]" format (some MCP versions) | ||
| const resultMarker = text.indexOf('### Result\n'); | ||
| if (resultMarker !== -1) { | ||
| text = text.slice(resultMarker + '### Result\n'.length).trim(); | ||
| } | ||
| try { return JSON.parse(text); } catch { return text; } | ||
@@ -119,2 +130,4 @@ } | ||
| private _page: Page | null = null; | ||
| async connect(opts: { timeout?: number } = {}): Promise<Page> { | ||
@@ -142,2 +155,3 @@ await this._acquireLock(); | ||
| ); | ||
| this._page = page; | ||
@@ -191,2 +205,19 @@ this._proc.stdout?.on('data', (chunk: Buffer) => { | ||
| try { | ||
| // Close tabs opened during this session (site tabs + extension tabs) | ||
| if (this._page && this._proc && !this._proc.killed) { | ||
| try { | ||
| const tabs = await this._page.tabs(); | ||
| const tabStr = typeof tabs === 'string' ? tabs : JSON.stringify(tabs); | ||
| const allTabs = tabStr.match(/Tab (\d+)/g) || []; | ||
| const currentTabCount = allTabs.length; | ||
| // Close tabs in reverse order to avoid index shifting issues | ||
| // Keep the original tabs that existed before the command started | ||
| if (currentTabCount > this._initialTabCount && this._initialTabCount > 0) { | ||
| for (let i = currentTabCount - 1; i >= this._initialTabCount; i--) { | ||
| try { await this._page.closeTab(i); } catch {} | ||
| } | ||
| } | ||
| } catch {} | ||
| } | ||
| if (this._proc && !this._proc.killed) { | ||
@@ -197,2 +228,3 @@ this._proc.kill('SIGTERM'); | ||
| } finally { | ||
| this._page = null; | ||
| this._releaseLock(); | ||
@@ -199,0 +231,0 @@ } |
@@ -19,2 +19,5 @@ /** | ||
| // zhihu | ||
| import './zhihu/search.js'; | ||
| import './zhihu/question.js'; | ||
| // xiaohongshu | ||
| import './xiaohongshu/search.js'; |
@@ -5,2 +5,4 @@ site: v2ex | ||
| domain: www.v2ex.com | ||
| strategy: public | ||
| browser: false | ||
@@ -14,7 +16,4 @@ args: | ||
| pipeline: | ||
| - evaluate: | | ||
| (async () => { | ||
| const res = await fetch('https://www.v2ex.com/api/topics/hot.json'); | ||
| return await res.json(); | ||
| })() | ||
| - fetch: | ||
| url: https://www.v2ex.com/api/topics/hot.json | ||
@@ -24,9 +23,6 @@ - map: | ||
| title: ${{ item.title }} | ||
| node: ${{ item.node?.title }} | ||
| author: ${{ item.member?.username }} | ||
| replies: ${{ item.replies }} | ||
| url: ${{ item.url }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, title, node, author, replies] | ||
| columns: [rank, title, replies] |
@@ -5,2 +5,4 @@ site: v2ex | ||
| domain: www.v2ex.com | ||
| strategy: public | ||
| browser: false | ||
@@ -14,7 +16,4 @@ args: | ||
| pipeline: | ||
| - evaluate: | | ||
| (async () => { | ||
| const res = await fetch('https://www.v2ex.com/api/topics/latest.json'); | ||
| return await res.json(); | ||
| })() | ||
| - fetch: | ||
| url: https://www.v2ex.com/api/topics/latest.json | ||
@@ -24,4 +23,2 @@ - map: | ||
| title: ${{ item.title }} | ||
| node: ${{ item.node?.title }} | ||
| author: ${{ item.member?.username }} | ||
| replies: ${{ item.replies }} | ||
@@ -31,2 +28,2 @@ | ||
| columns: [rank, title, node, author, replies] | ||
| columns: [rank, title, replies] |
@@ -15,15 +15,29 @@ site: zhihu | ||
| - fetch: | ||
| url: https://www.zhihu.com/api/v4/search/top_search | ||
| params: | ||
| limit: ${{ args.limit }} | ||
| - evaluate: | | ||
| (async () => { | ||
| const res = await fetch('https://www.zhihu.com/api/v3/feed/topstory/hot-lists/total?limit=50', { | ||
| credentials: 'include' | ||
| }); | ||
| const d = await res.json(); | ||
| return (d?.data || []).map((item) => { | ||
| const t = item.target || {}; | ||
| return { | ||
| title: t.title, | ||
| url: 'https://www.zhihu.com/question/' + t.id, | ||
| answer_count: t.answer_count, | ||
| follower_count: t.follower_count, | ||
| heat: item.detail_text || '', | ||
| }; | ||
| }); | ||
| })() | ||
| - select: top_search.words | ||
| - map: | ||
| rank: ${{ index + 1 }} | ||
| title: ${{ item.query }} | ||
| title: ${{ item.title }} | ||
| heat: ${{ item.heat }} | ||
| answers: ${{ item.answer_count }} | ||
| url: ${{ item.url }} | ||
| - limit: ${{ args.limit }} | ||
| columns: [rank, title] | ||
| columns: [rank, title, heat, answers] |
+299
-461
| /** | ||
| * Deep Explore: intelligent API discovery with response analysis. | ||
| * | ||
| * Unlike simple page snapshots, Deep Explore intercepts network traffic, | ||
| * analyzes response schemas, and automatically infers capabilities that | ||
| * can be turned into CLI commands. | ||
| * | ||
| * Flow: | ||
| * 1. Navigate to target URL | ||
| * 2. Auto-scroll to trigger lazy loading | ||
| * 3. Capture network requests (with body analysis) | ||
| * 4. For each JSON response: detect list fields, infer columns, analyze auth | ||
| * 5. Detect frontend framework (Vue/React/Pinia/Next.js) | ||
| * 6. Generate structured capabilities.json | ||
| * Navigates to the target URL, auto-scrolls to trigger lazy loading, | ||
| * captures network traffic, analyzes JSON responses, and automatically | ||
| * infers CLI capabilities from discovered API endpoints. | ||
| */ | ||
@@ -19,56 +11,51 @@ | ||
| import * as path from 'node:path'; | ||
| import { browserSession, DEFAULT_BROWSER_EXPLORE_TIMEOUT, runWithTimeout } from './runtime.js'; | ||
| import { DEFAULT_BROWSER_EXPLORE_TIMEOUT, browserSession, runWithTimeout } from './runtime.js'; | ||
| // ── Site name detection ──────────────────────────────────────────────────── | ||
| const KNOWN_ALIASES: Record<string, string> = { | ||
| const KNOWN_SITE_ALIASES: Record<string, string> = { | ||
| 'x.com': 'twitter', 'twitter.com': 'twitter', | ||
| 'news.ycombinator.com': 'hackernews', | ||
| 'www.zhihu.com': 'zhihu', 'www.bilibili.com': 'bilibili', | ||
| 'search.bilibili.com': 'bilibili', | ||
| 'www.v2ex.com': 'v2ex', 'www.reddit.com': 'reddit', | ||
| 'www.xiaohongshu.com': 'xiaohongshu', 'www.douban.com': 'douban', | ||
| 'www.weibo.com': 'weibo', 'search.bilibili.com': 'bilibili', | ||
| 'www.weibo.com': 'weibo', 'www.bbc.com': 'bbc', | ||
| }; | ||
| function detectSiteName(url: string): string { | ||
| export function detectSiteName(url: string): string { | ||
| try { | ||
| const host = new URL(url).hostname.toLowerCase(); | ||
| if (host in KNOWN_ALIASES) return KNOWN_ALIASES[host]; | ||
| if (host in KNOWN_SITE_ALIASES) return KNOWN_SITE_ALIASES[host]; | ||
| const parts = host.split('.').filter(p => p && p !== 'www'); | ||
| if (parts.length >= 2) { | ||
| if (['uk', 'jp', 'cn', 'com'].includes(parts[parts.length - 1]) && parts.length >= 3) { | ||
| return parts[parts.length - 3].replace(/[^a-z0-9]/g, ''); | ||
| return slugify(parts[parts.length - 3]); | ||
| } | ||
| return parts[parts.length - 2].replace(/[^a-z0-9]/g, ''); | ||
| return slugify(parts[parts.length - 2]); | ||
| } | ||
| return parts[0]?.replace(/[^a-z0-9]/g, '') ?? 'site'; | ||
| return parts[0] ? slugify(parts[0]) : 'site'; | ||
| } catch { return 'site'; } | ||
| } | ||
| export function slugify(value: string): string { | ||
| return value.trim().toLowerCase().replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-|-$/g, '') || 'site'; | ||
| } | ||
| // ── Field & capability inference ─────────────────────────────────────────── | ||
| /** | ||
| * Common field names grouped by semantic role. | ||
| * Used to auto-detect which response fields map to which columns. | ||
| */ | ||
| const FIELD_ROLES: Record<string, string[]> = { | ||
| title: ['title', 'name', 'text', 'content', 'desc', 'description', 'headline', 'subject'], | ||
| url: ['url', 'uri', 'link', 'href', 'permalink', 'jump_url', 'web_url', 'short_link', 'share_url'], | ||
| author: ['author', 'username', 'user_name', 'nickname', 'nick', 'owner', 'creator', 'up_name', 'uname'], | ||
| score: ['score', 'hot', 'heat', 'likes', 'like_count', 'view_count', 'views', 'stat', 'play', 'favorite_count', 'reply_count'], | ||
| time: ['time', 'created_at', 'publish_time', 'pub_time', 'date', 'ctime', 'mtime', 'pubdate', 'created'], | ||
| id: ['id', 'aid', 'bvid', 'mid', 'uid', 'oid', 'note_id', 'item_id'], | ||
| cover: ['cover', 'pic', 'image', 'thumbnail', 'poster', 'avatar'], | ||
| category: ['category', 'tag', 'type', 'tname', 'channel', 'section'], | ||
| title: ['title', 'name', 'text', 'content', 'desc', 'description', 'headline', 'subject'], | ||
| url: ['url', 'uri', 'link', 'href', 'permalink', 'jump_url', 'web_url', 'share_url'], | ||
| author: ['author', 'username', 'user_name', 'nickname', 'nick', 'owner', 'creator', 'up_name', 'uname'], | ||
| score: ['score', 'hot', 'heat', 'likes', 'like_count', 'view_count', 'views', 'play', 'favorite_count', 'reply_count'], | ||
| time: ['time', 'created_at', 'publish_time', 'pub_time', 'date', 'ctime', 'mtime', 'pubdate', 'created'], | ||
| id: ['id', 'aid', 'bvid', 'mid', 'uid', 'oid', 'note_id', 'item_id'], | ||
| cover: ['cover', 'pic', 'image', 'thumbnail', 'poster', 'avatar'], | ||
| category: ['category', 'tag', 'type', 'tname', 'channel', 'section'], | ||
| }; | ||
| /** Param names that indicate searchable APIs */ | ||
| const SEARCH_PARAMS = new Set(['q', 'query', 'keyword', 'search', 'wd', 'kw', 'search_query', 'w']); | ||
| /** Param names that indicate pagination */ | ||
| const PAGINATION_PARAMS = new Set(['page', 'pn', 'offset', 'cursor', 'next', 'page_num']); | ||
| /** Param names that indicate limit control */ | ||
| const LIMIT_PARAMS = new Set(['limit', 'count', 'size', 'per_page', 'page_size', 'ps', 'num']); | ||
| /** Content types to ignore */ | ||
| const IGNORED_CONTENT_TYPES = new Set(['image/', 'font/', 'text/css', 'text/javascript', 'application/javascript', 'application/wasm']); | ||
| /** Volatile query params to strip from patterns */ | ||
| const VOLATILE_PARAMS = new Set(['w_rid', 'wts', '_', 'callback', 'timestamp', 't', 'nonce', 'sign']); | ||
@@ -79,39 +66,17 @@ | ||
| interface NetworkEntry { | ||
| method: string; | ||
| url: string; | ||
| status: number | null; | ||
| contentType: string; | ||
| responseBody?: any; | ||
| requestHeaders?: Record<string, string>; | ||
| queryParams?: Record<string, string>; | ||
| method: string; url: string; status: number | null; | ||
| contentType: string; responseBody?: any; requestHeaders?: Record<string, string>; | ||
| } | ||
| interface AnalyzedEndpoint { | ||
| pattern: string; | ||
| method: string; | ||
| url: string; | ||
| status: number | null; | ||
| contentType: string; | ||
| queryParams: string[]; | ||
| hasSearchParam: boolean; | ||
| hasPaginationParam: boolean; | ||
| hasLimitParam: boolean; | ||
| pattern: string; method: string; url: string; status: number | null; | ||
| contentType: string; queryParams: string[]; score: number; | ||
| hasSearchParam: boolean; hasPaginationParam: boolean; hasLimitParam: boolean; | ||
| authIndicators: string[]; | ||
| responseAnalysis: ResponseAnalysis | null; | ||
| responseAnalysis: { itemPath: string | null; itemCount: number; detectedFields: Record<string, string>; sampleFields: string[] } | null; | ||
| } | ||
| interface ResponseAnalysis { | ||
| itemPath: string | null; | ||
| itemCount: number; | ||
| detectedFields: Record<string, string>; // role → actual field name | ||
| sampleFieldNames: string[]; | ||
| } | ||
| interface InferredCapability { | ||
| name: string; | ||
| description: string; | ||
| strategy: string; | ||
| confidence: number; | ||
| endpoint: string; | ||
| itemPath: string | null; | ||
| name: string; description: string; strategy: string; confidence: number; | ||
| endpoint: string; itemPath: string | null; | ||
| recommendedColumns: string[]; | ||
@@ -122,37 +87,18 @@ recommendedArgs: Array<{ name: string; type: string; required: boolean; default?: any }>; | ||
| /** | ||
| * Parse raw network output from Playwright MCP into structured entries. | ||
| * Handles both text format ([GET] url => [200]) and structured JSON. | ||
| * Parse raw network output from Playwright MCP. | ||
| * Handles text format: [GET] url => [200] | ||
| */ | ||
| function parseNetworkOutput(raw: any): NetworkEntry[] { | ||
| function parseNetworkRequests(raw: any): NetworkEntry[] { | ||
| if (typeof raw === 'string') { | ||
| // Playwright MCP returns network as text lines like: | ||
| // "[GET] https://api.example.com/xxx => [200] " | ||
| // May also have markdown headers like "### Result" | ||
| const entries: NetworkEntry[] = []; | ||
| const lines = raw.split('\n').filter((l: string) => l.trim()); | ||
| for (const line of lines) { | ||
| // Format: [METHOD] URL => [STATUS] optional_extra | ||
| const bracketMatch = line.match(/^\[?(GET|POST|PUT|DELETE|PATCH|OPTIONS)\]?\s+(\S+)\s*(?:=>|→)\s*\[?(\d+)\]?/i); | ||
| if (bracketMatch) { | ||
| const [, method, url, status] = bracketMatch; | ||
| for (const line of raw.split('\n')) { | ||
| // Format: [GET] URL => [200] | ||
| const m = line.match(/\[?(GET|POST|PUT|DELETE|PATCH|OPTIONS)\]?\s+(\S+)\s*(?:=>|→)\s*\[?(\d+)\]?/i); | ||
| if (m) { | ||
| const [, method, url, status] = m; | ||
| entries.push({ | ||
| method: method.toUpperCase(), | ||
| url, | ||
| status: status ? parseInt(status) : null, | ||
| contentType: url.endsWith('.json') ? 'application/json' : | ||
| (url.includes('/api/') || url.includes('/x/')) ? 'application/json' : '', | ||
| method: method.toUpperCase(), url, status: status ? parseInt(status) : null, | ||
| contentType: (url.includes('/api/') || url.includes('/x/') || url.endsWith('.json')) ? 'application/json' : '', | ||
| }); | ||
| continue; | ||
| } | ||
| // Legacy format: GET url → 200 (application/json) | ||
| const legacyMatch = line.match(/^(GET|POST|PUT|DELETE|PATCH|OPTIONS)\s+(\S+)\s*→?\s*(\d+)?\s*(?:\(([^)]*)\))?/i); | ||
| if (legacyMatch) { | ||
| const [, method, url, status, ct] = legacyMatch; | ||
| entries.push({ | ||
| method: method.toUpperCase(), | ||
| url, | ||
| status: status ? parseInt(status) : null, | ||
| contentType: ct ?? '', | ||
| }); | ||
| } | ||
| } | ||
@@ -162,9 +108,8 @@ return entries; | ||
| if (Array.isArray(raw)) { | ||
| return raw.map((e: any) => ({ | ||
| return raw.filter(e => e && typeof e === 'object').map(e => ({ | ||
| method: (e.method ?? 'GET').toUpperCase(), | ||
| url: e.url ?? e.request?.url ?? '', | ||
| url: String(e.url ?? e.request?.url ?? e.requestUrl ?? ''), | ||
| status: e.status ?? e.statusCode ?? null, | ||
| contentType: e.contentType ?? e.mimeType ?? '', | ||
| responseBody: e.responseBody ?? e.body, | ||
| requestHeaders: e.requestHeaders ?? e.headers, | ||
| contentType: e.contentType ?? e.response?.contentType ?? '', | ||
| responseBody: e.responseBody, requestHeaders: e.requestHeaders, | ||
| })); | ||
@@ -175,37 +120,12 @@ } | ||
| /** | ||
| * Normalize a URL into a pattern by replacing IDs with placeholders. | ||
| */ | ||
| function urlToPattern(url: string): string { | ||
| try { | ||
| const parsed = new URL(url); | ||
| const pathNorm = parsed.pathname | ||
| .replace(/\/\d+/g, '/{id}') | ||
| .replace(/\/[0-9a-fA-F]{8,}/g, '/{hex}') | ||
| .replace(/\/BV[a-zA-Z0-9]{10}/g, '/{bvid}'); | ||
| const p = new URL(url); | ||
| const pathNorm = p.pathname.replace(/\/\d+/g, '/{id}').replace(/\/[0-9a-fA-F]{8,}/g, '/{hex}').replace(/\/BV[a-zA-Z0-9]{10}/g, '/{bvid}'); | ||
| const params: string[] = []; | ||
| parsed.searchParams.forEach((_v, k) => { | ||
| if (!VOLATILE_PARAMS.has(k)) params.push(k); | ||
| }); | ||
| const qs = params.length ? '?' + params.sort().map(k => `${k}={}`).join('&') : ''; | ||
| return `${parsed.host}${pathNorm}${qs}`; | ||
| p.searchParams.forEach((_v, k) => { if (!VOLATILE_PARAMS.has(k)) params.push(k); }); | ||
| return `${p.host}${pathNorm}${params.length ? '?' + params.sort().map(k => `${k}={}`).join('&') : ''}`; | ||
| } catch { return url; } | ||
| } | ||
| /** | ||
| * Extract query params from a URL. | ||
| */ | ||
| function extractQueryParams(url: string): Record<string, string> { | ||
| try { | ||
| const params: Record<string, string> = {}; | ||
| new URL(url).searchParams.forEach((v, k) => { params[k] = v; }); | ||
| return params; | ||
| } catch { return {}; } | ||
| } | ||
| /** | ||
| * Detect auth indicators from request headers. | ||
| */ | ||
| function detectAuthIndicators(headers?: Record<string, string>): string[] { | ||
@@ -218,76 +138,43 @@ if (!headers) return []; | ||
| if (keys.some(k => k.startsWith('x-s') || k === 'x-t' || k === 'x-s-common')) indicators.push('signature'); | ||
| if (keys.some(k => k === 'x-client-transaction-id')) indicators.push('transaction'); | ||
| return indicators; | ||
| } | ||
| /** | ||
| * Analyze a JSON response to find list data and field mappings. | ||
| */ | ||
| function analyzeResponseBody(body: any): ResponseAnalysis | null { | ||
| function analyzeResponseBody(body: any): AnalyzedEndpoint['responseAnalysis'] { | ||
| if (!body || typeof body !== 'object') return null; | ||
| // Try to find the main list in the response | ||
| const candidates: Array<{ path: string; items: any[] }> = []; | ||
| function findArrays(obj: any, currentPath: string, depth: number) { | ||
| function findArrays(obj: any, path: string, depth: number) { | ||
| if (depth > 4) return; | ||
| if (Array.isArray(obj) && obj.length >= 2) { | ||
| // Check if items are objects (not primitive arrays) | ||
| if (obj.some(item => item && typeof item === 'object' && !Array.isArray(item))) { | ||
| candidates.push({ path: currentPath, items: obj }); | ||
| } | ||
| if (Array.isArray(obj) && obj.length >= 2 && obj.some(item => item && typeof item === 'object' && !Array.isArray(item))) { | ||
| candidates.push({ path, items: obj }); | ||
| } | ||
| if (obj && typeof obj === 'object' && !Array.isArray(obj)) { | ||
| for (const [key, val] of Object.entries(obj)) { | ||
| const nextPath = currentPath ? `${currentPath}.${key}` : key; | ||
| findArrays(val, nextPath, depth + 1); | ||
| } | ||
| for (const [key, val] of Object.entries(obj)) findArrays(val, path ? `${path}.${key}` : key, depth + 1); | ||
| } | ||
| } | ||
| findArrays(body, '', 0); | ||
| if (!candidates.length) return null; | ||
| // Pick the largest array as the main list | ||
| candidates.sort((a, b) => b.items.length - a.items.length); | ||
| const best = candidates[0]; | ||
| const sample = best.items[0]; | ||
| const sampleFields = sample && typeof sample === 'object' ? flattenFields(sample, '', 2) : []; | ||
| // Analyze field names in the first item | ||
| const sampleItem = best.items[0]; | ||
| const sampleFieldNames = sampleItem && typeof sampleItem === 'object' | ||
| ? flattenFieldNames(sampleItem, '', 2) | ||
| : []; | ||
| // Match fields to semantic roles | ||
| const detectedFields: Record<string, string> = {}; | ||
| for (const [role, aliases] of Object.entries(FIELD_ROLES)) { | ||
| for (const fieldName of sampleFieldNames) { | ||
| const basename = fieldName.split('.').pop()?.toLowerCase() ?? ''; | ||
| if (aliases.includes(basename)) { | ||
| detectedFields[role] = fieldName; | ||
| break; | ||
| } | ||
| for (const f of sampleFields) { | ||
| if (aliases.includes(f.split('.').pop()?.toLowerCase() ?? '')) { detectedFields[role] = f; break; } | ||
| } | ||
| } | ||
| return { | ||
| itemPath: best.path || null, | ||
| itemCount: best.items.length, | ||
| detectedFields, | ||
| sampleFieldNames, | ||
| }; | ||
| return { itemPath: best.path || null, itemCount: best.items.length, detectedFields, sampleFields }; | ||
| } | ||
| /** | ||
| * Flatten nested object field names for analysis. | ||
| */ | ||
| function flattenFieldNames(obj: any, prefix: string, maxDepth: number): string[] { | ||
| function flattenFields(obj: any, prefix: string, maxDepth: number): string[] { | ||
| if (maxDepth <= 0 || !obj || typeof obj !== 'object') return []; | ||
| const names: string[] = []; | ||
| for (const key of Object.keys(obj)) { | ||
| const fullKey = prefix ? `${prefix}.${key}` : key; | ||
| names.push(fullKey); | ||
| if (obj[key] && typeof obj[key] === 'object' && !Array.isArray(obj[key])) { | ||
| names.push(...flattenFieldNames(obj[key], fullKey, maxDepth - 1)); | ||
| } | ||
| const full = prefix ? `${prefix}.${key}` : key; | ||
| names.push(full); | ||
| if (obj[key] && typeof obj[key] === 'object' && !Array.isArray(obj[key])) names.push(...flattenFields(obj[key], full, maxDepth - 1)); | ||
| } | ||
@@ -297,266 +184,192 @@ return names; | ||
| /** | ||
| * Analyze a list of network entries into structured endpoints. | ||
| */ | ||
| function analyzeEndpoints(entries: NetworkEntry[], siteHost: string): AnalyzedEndpoint[] { | ||
| const seen = new Map<string, AnalyzedEndpoint>(); | ||
| for (const entry of entries) { | ||
| if (!entry.url) continue; | ||
| // Skip static resources | ||
| const ct = entry.contentType.toLowerCase(); | ||
| if (IGNORED_CONTENT_TYPES.has(ct.split(';')[0]?.trim() ?? '') || | ||
| ct.includes('image/') || ct.includes('font/') || ct.includes('css') || | ||
| ct.includes('javascript') || ct.includes('wasm')) continue; | ||
| // Skip non-JSON and failed responses | ||
| if (entry.status && entry.status >= 400) continue; | ||
| const pattern = urlToPattern(entry.url); | ||
| const queryParams = extractQueryParams(entry.url); | ||
| const paramNames = Object.keys(queryParams).filter(k => !VOLATILE_PARAMS.has(k)); | ||
| const key = `${entry.method}:${pattern}`; | ||
| if (seen.has(key)) continue; | ||
| const endpoint: AnalyzedEndpoint = { | ||
| pattern, | ||
| method: entry.method, | ||
| url: entry.url, | ||
| status: entry.status, | ||
| contentType: ct, | ||
| queryParams: paramNames, | ||
| hasSearchParam: paramNames.some(p => SEARCH_PARAMS.has(p)), | ||
| hasPaginationParam: paramNames.some(p => PAGINATION_PARAMS.has(p)), | ||
| hasLimitParam: paramNames.some(p => LIMIT_PARAMS.has(p)), | ||
| authIndicators: detectAuthIndicators(entry.requestHeaders), | ||
| responseAnalysis: entry.responseBody ? analyzeResponseBody(entry.responseBody) : null, | ||
| }; | ||
| seen.set(key, endpoint); | ||
| } | ||
| return [...seen.values()]; | ||
| function scoreEndpoint(ep: { contentType: string; responseAnalysis: any; pattern: string; status: number | null; hasSearchParam: boolean; hasPaginationParam: boolean; hasLimitParam: boolean }): number { | ||
| let s = 0; | ||
| if (ep.contentType.includes('json')) s += 10; | ||
| if (ep.responseAnalysis) { s += 5; s += Math.min(ep.responseAnalysis.itemCount, 10); s += Object.keys(ep.responseAnalysis.detectedFields).length * 2; } | ||
| if (ep.pattern.includes('/api/') || ep.pattern.includes('/x/')) s += 3; | ||
| if (ep.hasSearchParam) s += 3; | ||
| if (ep.hasPaginationParam) s += 2; | ||
| if (ep.hasLimitParam) s += 2; | ||
| if (ep.status === 200) s += 2; | ||
| return s; | ||
| } | ||
| /** | ||
| * Infer what strategy to use based on endpoint analysis. | ||
| */ | ||
| function inferStrategy(endpoint: AnalyzedEndpoint): string { | ||
| if (endpoint.authIndicators.includes('signature')) return 'intercept'; | ||
| if (endpoint.authIndicators.includes('transaction')) return 'header'; | ||
| if (endpoint.authIndicators.includes('bearer') || endpoint.authIndicators.includes('csrf')) return 'header'; | ||
| // Check if the URL is a public API (no auth indicators) | ||
| if (endpoint.authIndicators.length === 0) { | ||
| // If it's the same domain, likely cookie auth | ||
| return 'cookie'; | ||
| } | ||
| return 'cookie'; | ||
| } | ||
| /** | ||
| * Infer the capability name from an endpoint pattern. | ||
| */ | ||
| function inferCapabilityName(endpoint: AnalyzedEndpoint, goal?: string): string { | ||
| function inferCapabilityName(url: string, goal?: string): string { | ||
| if (goal) return goal; | ||
| const u = endpoint.url.toLowerCase(); | ||
| const p = endpoint.pattern.toLowerCase(); | ||
| // Match common patterns | ||
| if (endpoint.hasSearchParam) return 'search'; | ||
| const u = url.toLowerCase(); | ||
| if (u.includes('hot') || u.includes('popular') || u.includes('ranking') || u.includes('trending')) return 'hot'; | ||
| if (u.includes('search')) return 'search'; | ||
| if (u.includes('feed') || u.includes('timeline') || u.includes('dynamic')) return 'feed'; | ||
| if (u.includes('comment') || u.includes('reply')) return 'comments'; | ||
| if (u.includes('history')) return 'history'; | ||
| if (u.includes('profile') || u.includes('userinfo') || u.includes('/me') || u.includes('myinfo')) return 'me'; | ||
| if (u.includes('video') || u.includes('article') || u.includes('detail') || u.includes('view')) return 'detail'; | ||
| if (u.includes('profile') || u.includes('userinfo') || u.includes('/me')) return 'me'; | ||
| if (u.includes('favorite') || u.includes('collect') || u.includes('bookmark')) return 'favorite'; | ||
| if (u.includes('notification') || u.includes('notice')) return 'notifications'; | ||
| // Fallback: try to extract from path | ||
| try { | ||
| const pathname = new URL(endpoint.url).pathname; | ||
| const segments = pathname.split('/').filter(s => s && !s.match(/^\d+$/) && !s.match(/^[0-9a-f]{8,}$/i)); | ||
| if (segments.length) return segments[segments.length - 1].replace(/[^a-z0-9]/gi, '_').toLowerCase(); | ||
| const segs = new URL(url).pathname.split('/').filter(s => s && !s.match(/^\d+$/) && !s.match(/^[0-9a-f]{8,}$/i)); | ||
| if (segs.length) return segs[segs.length - 1].replace(/[^a-z0-9]/gi, '_').toLowerCase(); | ||
| } catch {} | ||
| return 'data'; | ||
| } | ||
| /** | ||
| * Build recommended columns from response analysis. | ||
| */ | ||
| function buildRecommendedColumns(analysis: ResponseAnalysis | null): string[] { | ||
| if (!analysis) return ['title', 'url']; | ||
| const cols: string[] = []; | ||
| // Prioritize: title → url → author → score → time | ||
| const priority = ['title', 'url', 'author', 'score', 'time']; | ||
| for (const role of priority) { | ||
| if (analysis.detectedFields[role]) cols.push(role); | ||
| } | ||
| return cols.length ? cols : ['title', 'url']; | ||
| function inferStrategy(authIndicators: string[]): string { | ||
| if (authIndicators.includes('signature')) return 'intercept'; | ||
| if (authIndicators.includes('bearer') || authIndicators.includes('csrf')) return 'header'; | ||
| return 'cookie'; | ||
| } | ||
| /** | ||
| * Build recommended args from endpoint query params. | ||
| */ | ||
| function buildRecommendedArgs(endpoint: AnalyzedEndpoint): InferredCapability['recommendedArgs'] { | ||
| const args: InferredCapability['recommendedArgs'] = []; | ||
| // ── Framework detection ──────────────────────────────────────────────────── | ||
| if (endpoint.hasSearchParam) { | ||
| const paramName = endpoint.queryParams.find(p => SEARCH_PARAMS.has(p)) ?? 'keyword'; | ||
| args.push({ name: 'keyword', type: 'str', required: true }); | ||
| const FRAMEWORK_DETECT_JS = ` | ||
| () => { | ||
| const r = {}; | ||
| try { | ||
| const app = document.querySelector('#app'); | ||
| r.vue3 = !!(app && app.__vue_app__); | ||
| r.vue2 = !!(app && app.__vue__); | ||
| r.react = !!window.__REACT_DEVTOOLS_GLOBAL_HOOK__ || !!document.querySelector('[data-reactroot]'); | ||
| r.nextjs = !!window.__NEXT_DATA__; | ||
| r.nuxt = !!window.__NUXT__; | ||
| if (r.vue3 && app.__vue_app__) { const gp = app.__vue_app__.config?.globalProperties; r.pinia = !!(gp && gp.$pinia); r.vuex = !!(gp && gp.$store); } | ||
| } catch {} | ||
| return r; | ||
| } | ||
| `; | ||
| // Always add limit | ||
| args.push({ name: 'limit', type: 'int', required: false, default: 20 }); | ||
| // ── Store discovery ──────────────────────────────────────────────────────── | ||
| if (endpoint.hasPaginationParam) { | ||
| args.push({ name: 'page', type: 'int', required: false, default: 1 }); | ||
| } | ||
| const STORE_DISCOVER_JS = ` | ||
| () => { | ||
| const stores = []; | ||
| try { | ||
| const app = document.querySelector('#app'); | ||
| if (!app?.__vue_app__) return stores; | ||
| const gp = app.__vue_app__.config?.globalProperties; | ||
| return args; | ||
| } | ||
| // Pinia stores | ||
| const pinia = gp?.$pinia; | ||
| if (pinia?._s) { | ||
| pinia._s.forEach((store, id) => { | ||
| const actions = []; | ||
| const stateKeys = []; | ||
| for (const k in store) { | ||
| try { | ||
| if (k.startsWith('$') || k.startsWith('_')) continue; | ||
| if (typeof store[k] === 'function') actions.push(k); | ||
| else stateKeys.push(k); | ||
| } catch {} | ||
| } | ||
| stores.push({ type: 'pinia', id, actions: actions.slice(0, 20), stateKeys: stateKeys.slice(0, 15) }); | ||
| }); | ||
| } | ||
| /** | ||
| * Score an endpoint's interest level for capability generation. | ||
| * Higher score = more likely to be a useful API endpoint. | ||
| */ | ||
| function scoreEndpoint(ep: AnalyzedEndpoint): number { | ||
| let score = 0; | ||
| // JSON content type is strongly preferred | ||
| if (ep.contentType.includes('json')) score += 10; | ||
| // Has response analysis with items | ||
| if (ep.responseAnalysis) { | ||
| score += 5; | ||
| score += Math.min(ep.responseAnalysis.itemCount, 10); | ||
| score += Object.keys(ep.responseAnalysis.detectedFields).length * 2; | ||
| // Vuex store modules | ||
| const vuex = gp?.$store; | ||
| if (vuex?._modules?.root?._children) { | ||
| const children = vuex._modules.root._children; | ||
| for (const [modName, mod] of Object.entries(children)) { | ||
| const actions = Object.keys(mod._rawModule?.actions ?? {}).slice(0, 20); | ||
| const stateKeys = Object.keys(mod.state ?? {}).slice(0, 15); | ||
| stores.push({ type: 'vuex', id: modName, actions, stateKeys }); | ||
| } | ||
| } | ||
| } catch {} | ||
| return stores; | ||
| } | ||
| // API-like path patterns | ||
| if (ep.pattern.includes('/api/') || ep.pattern.includes('/x/')) score += 3; | ||
| // Has search/pagination params | ||
| if (ep.hasSearchParam) score += 3; | ||
| if (ep.hasPaginationParam) score += 2; | ||
| if (ep.hasLimitParam) score += 2; | ||
| // 200 OK | ||
| if (ep.status === 200) score += 2; | ||
| return score; | ||
| `; | ||
| export interface DiscoveredStore { | ||
| type: 'pinia' | 'vuex'; | ||
| id: string; | ||
| actions: string[]; | ||
| stateKeys: string[]; | ||
| } | ||
| // ── Framework detection ──────────────────────────────────────────────────── | ||
| const FRAMEWORK_DETECT_JS = ` | ||
| (() => { | ||
| const result = {}; | ||
| try { | ||
| const app = document.querySelector('#app'); | ||
| result.vue3 = !!(app && app.__vue_app__); | ||
| result.vue2 = !!(app && app.__vue__); | ||
| result.react = !!window.__REACT_DEVTOOLS_GLOBAL_HOOK__ || !!document.querySelector('[data-reactroot]'); | ||
| result.nextjs = !!window.__NEXT_DATA__; | ||
| result.nuxt = !!window.__NUXT__; | ||
| if (result.vue3 && app.__vue_app__) { | ||
| const gp = app.__vue_app__.config?.globalProperties; | ||
| result.pinia = !!(gp && gp.$pinia); | ||
| result.vuex = !!(gp && gp.$store); | ||
| } | ||
| } catch {} | ||
| return JSON.stringify(result); | ||
| })() | ||
| `; | ||
| // ── Main explore function ────────────────────────────────────────────────── | ||
| export async function exploreUrl(url: string, opts: any = {}): Promise<any> { | ||
| const site = opts.site ?? detectSiteName(url); | ||
| const outDir = opts.outDir ?? path.join('.opencli', 'explore', site); | ||
| fs.mkdirSync(outDir, { recursive: true }); | ||
| export async function exploreUrl( | ||
| url: string, | ||
| opts: { | ||
| BrowserFactory: new () => any; | ||
| site?: string; goal?: string; authenticated?: boolean; | ||
| outDir?: string; waitSeconds?: number; query?: string; | ||
| clickLabels?: string[]; auto?: boolean; | ||
| }, | ||
| ): Promise<Record<string, any>> { | ||
| const waitSeconds = opts.waitSeconds ?? 3.0; | ||
| const exploreTimeout = Math.max(DEFAULT_BROWSER_EXPLORE_TIMEOUT, 45.0 + waitSeconds * 8.0); | ||
| const result: any = await browserSession(opts.BrowserFactory, async (page: any) => { | ||
| return browserSession(opts.BrowserFactory, async (page) => { | ||
| return runWithTimeout((async () => { | ||
| // Step 1: Navigate | ||
| await page.goto(url); | ||
| await page.wait(opts.waitSeconds ?? 3); | ||
| await page.wait(waitSeconds); | ||
| // Step 2: Auto-scroll to trigger lazy loading | ||
| for (let i = 0; i < 3; i++) { | ||
| await page.scroll('down'); | ||
| await page.wait(1); | ||
| } | ||
| // Step 2: Auto-scroll to trigger lazy loading (use keyboard since page.scroll may not exist) | ||
| for (let i = 0; i < 3; i++) { try { await page.pressKey('End'); } catch {} await page.wait(1); } | ||
| // Step 3: Capture network traffic | ||
| // Step 3: Read page metadata | ||
| const metadata = await readPageMetadata(page); | ||
| // Step 4: Capture network traffic | ||
| const rawNetwork = await page.networkRequests(false); | ||
| const networkEntries = parseNetworkOutput(rawNetwork); | ||
| const networkEntries = parseNetworkRequests(rawNetwork); | ||
| // Step 4: For JSON endpoints, try to fetch response body in-browser | ||
| const jsonEndpoints = networkEntries.filter(e => | ||
| e.contentType.includes('json') && e.method === 'GET' && e.status === 200 | ||
| ); | ||
| // Step 5: For JSON endpoints, re-fetch response body in-browser | ||
| const jsonEndpoints = networkEntries.filter(e => e.contentType.includes('json') && e.method === 'GET' && e.status === 200); | ||
| for (const ep of jsonEndpoints.slice(0, 10)) { | ||
| // Only fetch body for promising-looking API endpoints | ||
| if (ep.url.includes('/api/') || ep.url.includes('/x/') || ep.url.includes('/web/') || | ||
| ep.contentType.includes('json')) { | ||
| try { | ||
| const bodyResult = await page.evaluate(` | ||
| async () => { | ||
| try { | ||
| const resp = await fetch(${JSON.stringify(ep.url)}, { credentials: 'include' }); | ||
| if (!resp.ok) return null; | ||
| const data = await resp.json(); | ||
| return JSON.stringify(data).slice(0, 10000); | ||
| } catch { return null; } | ||
| } | ||
| `); | ||
| if (bodyResult && typeof bodyResult === 'string') { | ||
| try { ep.responseBody = JSON.parse(bodyResult); } catch {} | ||
| } else if (bodyResult && typeof bodyResult === 'object') { | ||
| ep.responseBody = bodyResult; | ||
| } | ||
| } catch {} | ||
| } | ||
| try { | ||
| const body = await page.evaluate(`async () => { try { const r = await fetch(${JSON.stringify(ep.url)}, {credentials:'include'}); if (!r.ok) return null; const d = await r.json(); return JSON.stringify(d).slice(0,10000); } catch { return null; } }`); | ||
| if (body && typeof body === 'string') { try { ep.responseBody = JSON.parse(body); } catch {} } | ||
| else if (body && typeof body === 'object') ep.responseBody = body; | ||
| } catch {} | ||
| } | ||
| // Step 5: Detect frontend framework | ||
| // Step 6: Detect framework | ||
| let framework: Record<string, boolean> = {}; | ||
| try { | ||
| const fwResult = await page.evaluate(FRAMEWORK_DETECT_JS); | ||
| if (typeof fwResult === 'string') framework = JSON.parse(fwResult); | ||
| else if (typeof fwResult === 'object') framework = fwResult; | ||
| } catch {} | ||
| try { const fw = await page.evaluate(FRAMEWORK_DETECT_JS); if (fw && typeof fw === 'object') framework = fw; } catch {} | ||
| // Step 6: Get page metadata | ||
| let title = '', finalUrl = ''; | ||
| try { | ||
| const meta = await page.evaluate(` | ||
| () => JSON.stringify({ url: window.location.href, title: document.title || '' }) | ||
| `); | ||
| if (typeof meta === 'string') { | ||
| const parsed = JSON.parse(meta); | ||
| title = parsed.title; finalUrl = parsed.url; | ||
| } else if (typeof meta === 'object') { | ||
| title = meta.title; finalUrl = meta.url; | ||
| } | ||
| } catch {} | ||
| // Step 6.5: Discover stores (Pinia / Vuex) | ||
| let stores: DiscoveredStore[] = []; | ||
| if (framework.pinia || framework.vuex) { | ||
| try { | ||
| const raw = await page.evaluate(STORE_DISCOVER_JS); | ||
| if (Array.isArray(raw)) stores = raw; | ||
| } catch {} | ||
| } | ||
| // Step 7: Analyze endpoints | ||
| let siteHost = ''; | ||
| try { siteHost = new URL(url).hostname; } catch {} | ||
| const analyzedEndpoints = analyzeEndpoints(networkEntries, siteHost); | ||
| const seen = new Map<string, AnalyzedEndpoint>(); | ||
| for (const entry of networkEntries) { | ||
| if (!entry.url) continue; | ||
| const ct = entry.contentType.toLowerCase(); | ||
| if (ct.includes('image/') || ct.includes('font/') || ct.includes('css') || ct.includes('javascript') || ct.includes('wasm')) continue; | ||
| if (entry.status && entry.status >= 400) continue; | ||
| // Step 8: Score and rank endpoints | ||
| const scoredEndpoints = analyzedEndpoints | ||
| .map(ep => ({ ...ep, score: scoreEndpoint(ep) })) | ||
| .filter(ep => ep.score >= 5) | ||
| .sort((a, b) => b.score - a.score); | ||
| const pattern = urlToPattern(entry.url); | ||
| const key = `${entry.method}:${pattern}`; | ||
| if (seen.has(key)) continue; | ||
| // Step 9: Infer capabilities from top endpoints | ||
| const qp: string[] = []; | ||
| try { new URL(entry.url).searchParams.forEach((_v, k) => { if (!VOLATILE_PARAMS.has(k)) qp.push(k); }); } catch {} | ||
| const ep: AnalyzedEndpoint = { | ||
| pattern, method: entry.method, url: entry.url, status: entry.status, contentType: ct, | ||
| queryParams: qp, hasSearchParam: qp.some(p => SEARCH_PARAMS.has(p)), | ||
| hasPaginationParam: qp.some(p => PAGINATION_PARAMS.has(p)), | ||
| hasLimitParam: qp.some(p => LIMIT_PARAMS.has(p)), | ||
| authIndicators: detectAuthIndicators(entry.requestHeaders), | ||
| responseAnalysis: entry.responseBody ? analyzeResponseBody(entry.responseBody) : null, | ||
| score: 0, | ||
| }; | ||
| ep.score = scoreEndpoint(ep); | ||
| seen.set(key, ep); | ||
| } | ||
| const analyzedEndpoints = [...seen.values()].filter(ep => ep.score >= 5).sort((a, b) => b.score - a.score); | ||
| // Step 8: Infer capabilities | ||
| const capabilities: InferredCapability[] = []; | ||
| const usedNames = new Set<string>(); | ||
| for (const ep of scoredEndpoints.slice(0, 8)) { | ||
| let capName = inferCapabilityName(ep, opts.goal); | ||
| // Deduplicate names | ||
| for (const ep of analyzedEndpoints.slice(0, 8)) { | ||
| let capName = inferCapabilityName(ep.url, opts.goal); | ||
| if (usedNames.has(capName)) { | ||
@@ -568,78 +381,87 @@ const suffix = ep.pattern.split('/').filter(s => s && !s.startsWith('{') && !s.includes('.')).pop(); | ||
| const cols: string[] = []; | ||
| if (ep.responseAnalysis) { | ||
| for (const role of ['title', 'url', 'author', 'score', 'time']) { | ||
| if (ep.responseAnalysis.detectedFields[role]) cols.push(role); | ||
| } | ||
| } | ||
| const args: InferredCapability['recommendedArgs'] = []; | ||
| if (ep.hasSearchParam) args.push({ name: 'keyword', type: 'str', required: true }); | ||
| args.push({ name: 'limit', type: 'int', required: false, default: 20 }); | ||
| if (ep.hasPaginationParam) args.push({ name: 'page', type: 'int', required: false, default: 1 }); | ||
| // Link store actions to capabilities when store-action strategy is recommended | ||
| const epStrategy = inferStrategy(ep.authIndicators); | ||
| let storeHint: { store: string; action: string } | undefined; | ||
| if ((epStrategy === 'intercept' || ep.authIndicators.includes('signature')) && stores.length > 0) { | ||
| // Try to find a store/action that matches this endpoint's purpose | ||
| for (const s of stores) { | ||
| const matchingAction = s.actions.find(a => | ||
| capName.split('_').some(part => a.toLowerCase().includes(part)) || | ||
| a.toLowerCase().includes('fetch') || a.toLowerCase().includes('get') | ||
| ); | ||
| if (matchingAction) { | ||
| storeHint = { store: s.id, action: matchingAction }; | ||
| break; | ||
| } | ||
| } | ||
| } | ||
| capabilities.push({ | ||
| name: capName, | ||
| description: `${site} ${capName}`, | ||
| strategy: inferStrategy(ep), | ||
| confidence: Math.min(ep.score / 20, 1.0), | ||
| endpoint: ep.pattern, | ||
| name: capName, description: `${opts.site ?? detectSiteName(url)} ${capName}`, | ||
| strategy: storeHint ? 'store-action' : epStrategy, | ||
| confidence: Math.min(ep.score / 20, 1.0), endpoint: ep.pattern, | ||
| itemPath: ep.responseAnalysis?.itemPath ?? null, | ||
| recommendedColumns: buildRecommendedColumns(ep.responseAnalysis), | ||
| recommendedArgs: buildRecommendedArgs(ep), | ||
| recommendedColumns: cols.length ? cols : ['title', 'url'], | ||
| recommendedArgs: args, | ||
| ...(storeHint ? { storeHint } : {}), | ||
| }); | ||
| } | ||
| // Step 10: Determine auth strategy | ||
| const allAuthIndicators = new Set(analyzedEndpoints.flatMap(ep => ep.authIndicators)); | ||
| let topStrategy = 'cookie'; | ||
| if (allAuthIndicators.has('signature')) topStrategy = 'intercept'; | ||
| else if (allAuthIndicators.has('transaction') || allAuthIndicators.has('bearer')) topStrategy = 'header'; | ||
| else if (allAuthIndicators.size === 0 && scoredEndpoints.some(ep => ep.contentType.includes('json'))) topStrategy = 'public'; | ||
| // Step 9: Determine overall auth strategy | ||
| const allAuth = new Set(analyzedEndpoints.flatMap(ep => ep.authIndicators)); | ||
| const topStrategy = allAuth.has('signature') ? 'intercept' : allAuth.has('bearer') || allAuth.has('csrf') ? 'header' : allAuth.size === 0 ? 'public' : 'cookie'; | ||
| return { | ||
| site, | ||
| target_url: url, | ||
| final_url: finalUrl, | ||
| title, | ||
| framework, | ||
| top_strategy: topStrategy, | ||
| endpoint_count: analyzedEndpoints.length, | ||
| api_endpoint_count: scoredEndpoints.length, | ||
| capabilities, | ||
| endpoints: scoredEndpoints.map(ep => ({ | ||
| pattern: ep.pattern, | ||
| method: ep.method, | ||
| url: ep.url, | ||
| status: ep.status, | ||
| contentType: ep.contentType, | ||
| score: ep.score, | ||
| queryParams: ep.queryParams, | ||
| itemPath: ep.responseAnalysis?.itemPath ?? null, | ||
| itemCount: ep.responseAnalysis?.itemCount ?? 0, | ||
| detectedFields: ep.responseAnalysis?.detectedFields ?? {}, | ||
| authIndicators: ep.authIndicators, | ||
| })), | ||
| auth_indicators: [...allAuthIndicators], | ||
| const siteName = opts.site ?? detectSiteName(metadata.url || url); | ||
| const targetDir = opts.outDir ?? path.join('.opencli', 'explore', siteName); | ||
| fs.mkdirSync(targetDir, { recursive: true }); | ||
| const result = { | ||
| site: siteName, target_url: url, final_url: metadata.url, title: metadata.title, | ||
| framework, stores, top_strategy: topStrategy, | ||
| endpoint_count: analyzedEndpoints.length + [...seen.values()].filter(ep => ep.score < 5).length, | ||
| api_endpoint_count: analyzedEndpoints.length, | ||
| capabilities, auth_indicators: [...allAuth], | ||
| }; | ||
| })(), { timeout: DEFAULT_BROWSER_EXPLORE_TIMEOUT, label: 'explore' }); | ||
| }); | ||
| // Write artifacts | ||
| const manifest = { | ||
| site: result.site, | ||
| target_url: result.target_url, | ||
| final_url: result.final_url, | ||
| title: result.title, | ||
| framework: result.framework, | ||
| top_strategy: result.top_strategy, | ||
| explored_at: new Date().toISOString(), | ||
| }; | ||
| fs.writeFileSync(path.join(outDir, 'manifest.json'), JSON.stringify(manifest, null, 2)); | ||
| fs.writeFileSync(path.join(outDir, 'endpoints.json'), JSON.stringify(result.endpoints ?? [], null, 2)); | ||
| fs.writeFileSync(path.join(outDir, 'capabilities.json'), JSON.stringify(result.capabilities ?? [], null, 2)); | ||
| fs.writeFileSync(path.join(outDir, 'auth.json'), JSON.stringify({ | ||
| top_strategy: result.top_strategy, | ||
| indicators: result.auth_indicators ?? [], | ||
| framework: result.framework ?? {}, | ||
| }, null, 2)); | ||
| // Write artifacts | ||
| fs.writeFileSync(path.join(targetDir, 'manifest.json'), JSON.stringify({ | ||
| site: siteName, target_url: url, final_url: metadata.url, title: metadata.title, | ||
| framework, stores: stores.map(s => ({ type: s.type, id: s.id, actions: s.actions })), | ||
| top_strategy: topStrategy, explored_at: new Date().toISOString(), | ||
| }, null, 2)); | ||
| fs.writeFileSync(path.join(targetDir, 'endpoints.json'), JSON.stringify(analyzedEndpoints.map(ep => ({ | ||
| pattern: ep.pattern, method: ep.method, url: ep.url, status: ep.status, | ||
| contentType: ep.contentType, score: ep.score, queryParams: ep.queryParams, | ||
| itemPath: ep.responseAnalysis?.itemPath ?? null, itemCount: ep.responseAnalysis?.itemCount ?? 0, | ||
| detectedFields: ep.responseAnalysis?.detectedFields ?? {}, authIndicators: ep.authIndicators, | ||
| })), null, 2)); | ||
| fs.writeFileSync(path.join(targetDir, 'capabilities.json'), JSON.stringify(capabilities, null, 2)); | ||
| fs.writeFileSync(path.join(targetDir, 'auth.json'), JSON.stringify({ | ||
| top_strategy: topStrategy, indicators: [...allAuth], framework, | ||
| }, null, 2)); | ||
| if (stores.length > 0) { | ||
| fs.writeFileSync(path.join(targetDir, 'stores.json'), JSON.stringify(stores, null, 2)); | ||
| } | ||
| return { ...result, out_dir: outDir }; | ||
| return { ...result, out_dir: targetDir }; | ||
| })(), { timeout: exploreTimeout, label: `Explore ${url}` }); | ||
| }); | ||
| } | ||
| export function renderExploreSummary(result: any): string { | ||
| export function renderExploreSummary(result: Record<string, any>): string { | ||
| const lines = [ | ||
| 'opencli explore: OK', | ||
| `Site: ${result.site}`, | ||
| `URL: ${result.target_url}`, | ||
| `Title: ${result.title || '(none)'}`, | ||
| `Strategy: ${result.top_strategy}`, | ||
| 'opencli probe: OK', `Site: ${result.site}`, `URL: ${result.target_url}`, | ||
| `Title: ${result.title || '(none)'}`, `Strategy: ${result.top_strategy}`, | ||
| `Endpoints: ${result.endpoint_count} total, ${result.api_endpoint_count} API`, | ||
@@ -649,3 +471,4 @@ `Capabilities: ${result.capabilities?.length ?? 0}`, | ||
| for (const cap of (result.capabilities ?? []).slice(0, 5)) { | ||
| lines.push(` • ${cap.name} (${cap.strategy}, confidence: ${(cap.confidence * 100).toFixed(0)}%)`); | ||
| const storeInfo = cap.storeHint ? ` → ${cap.storeHint.store}.${cap.storeHint.action}()` : ''; | ||
| lines.push(` • ${cap.name} (${cap.strategy}, ${(cap.confidence * 100).toFixed(0)}%)${storeInfo}`); | ||
| } | ||
@@ -655,4 +478,19 @@ const fw = result.framework ?? {}; | ||
| if (fwNames.length) lines.push(`Framework: ${fwNames.join(', ')}`); | ||
| const stores: DiscoveredStore[] = result.stores ?? []; | ||
| if (stores.length) { | ||
| lines.push(`Stores: ${stores.length}`); | ||
| for (const s of stores.slice(0, 5)) { | ||
| lines.push(` • ${s.type}/${s.id}: ${s.actions.slice(0, 5).join(', ')}${s.actions.length > 5 ? '...' : ''}`); | ||
| } | ||
| } | ||
| lines.push(`Output: ${result.out_dir}`); | ||
| return lines.join('\n'); | ||
| } | ||
| async function readPageMetadata(page: any): Promise<{ url: string; title: string }> { | ||
| try { | ||
| const result = await page.evaluate(`() => ({ url: window.location.href, title: document.title || '' })`); | ||
| if (result && typeof result === 'object') return { url: String(result.url ?? ''), title: String(result.title ?? '') }; | ||
| } catch {} | ||
| return { url: '', title: '' }; | ||
| } |
+14
-0
@@ -55,2 +55,5 @@ #!/usr/bin/env node | ||
| program.command('probe').description('Probe a website: discover APIs, stores, and recommend strategies').argument('<url>').option('--site <name>').option('--goal <text>').option('--wait <s>', '', '3') | ||
| .action(async (url, opts) => { const { exploreUrl, renderExploreSummary } = await import('./explore.js'); console.log(renderExploreSummary(await exploreUrl(url, { BrowserFactory: PlaywrightMCP, site: opts.site, goal: opts.goal, waitSeconds: parseFloat(opts.wait) }))); }); | ||
| program.command('synthesize').description('Synthesize CLIs from explore').argument('<target>').option('--top <n>', '', '3') | ||
@@ -62,2 +65,13 @@ .action(async (target, opts) => { const { synthesizeFromExplore, renderSynthesizeSummary } = await import('./synthesize.js'); console.log(renderSynthesizeSummary(synthesizeFromExplore(target, { top: parseInt(opts.top) }))); }); | ||
| program.command('cascade').description('Strategy cascade: find simplest working strategy').argument('<url>').option('--site <name>') | ||
| .action(async (url, opts) => { | ||
| const { cascadeProbe, renderCascadeResult } = await import('./cascade.js'); | ||
| const result = await browserSession(PlaywrightMCP, async (page) => { | ||
| // Navigate to the site first for cookie context | ||
| try { const siteUrl = new URL(url); await page.goto(`${siteUrl.protocol}//${siteUrl.host}`); await page.wait(2); } catch {} | ||
| return cascadeProbe(page, url); | ||
| }); | ||
| console.log(renderCascadeResult(result)); | ||
| }); | ||
| // ── Dynamic site commands ────────────────────────────────────────────────── | ||
@@ -64,0 +78,0 @@ |
+239
-2
@@ -29,2 +29,7 @@ /** | ||
| data = await executeStep(page, op, params, data, args); | ||
| // Detect error objects returned by steps (e.g. tap store not found) | ||
| if (data && typeof data === 'object' && !Array.isArray(data) && data.error) { | ||
| process.stderr.write(` ${chalk.yellow('⚠')} ${chalk.yellow(op)}: ${data.error}\n`); | ||
| if (data.hint) process.stderr.write(` ${chalk.dim('💡')} ${chalk.dim(data.hint)}\n`); | ||
| } | ||
| if (debug) debugStepResult(op, data); | ||
@@ -147,3 +152,11 @@ } | ||
| const js = String(render(params, { args, data })); | ||
| return page.evaluate(normalizeEvaluateSource(js)); | ||
| let result = await page.evaluate(normalizeEvaluateSource(js)); | ||
| // MCP may return JSON as a string — auto-parse it | ||
| if (typeof result === 'string') { | ||
| const trimmed = result.trim(); | ||
| if ((trimmed.startsWith('[') && trimmed.endsWith(']')) || (trimmed.startsWith('{') && trimmed.endsWith('}'))) { | ||
| try { result = JSON.parse(trimmed); } catch {} | ||
| } | ||
| } | ||
| return result; | ||
| } | ||
@@ -213,3 +226,227 @@ case 'snapshot': { | ||
| } | ||
| case 'intercept': return data; | ||
| case 'intercept': { | ||
| // Declarative XHR interception step | ||
| // Usage: | ||
| // intercept: | ||
| // trigger: "navigate:https://..." | "evaluate:store.note.fetch()" | "click:ref" | ||
| // capture: "api/pattern" # URL substring to match | ||
| // timeout: 5 # seconds to wait for matching request | ||
| // select: "data.items" # optional: extract sub-path from response | ||
| const cfg = typeof params === 'object' ? params : {}; | ||
| const trigger = cfg.trigger ?? ''; | ||
| const capturePattern = cfg.capture ?? ''; | ||
| const timeout = cfg.timeout ?? 8; | ||
| const selectPath = cfg.select ?? null; | ||
| if (!capturePattern) return data; | ||
| // Step 1: Execute the trigger action | ||
| if (trigger.startsWith('navigate:')) { | ||
| const url = render(trigger.slice('navigate:'.length), { args, data }); | ||
| await page.goto(String(url)); | ||
| } else if (trigger.startsWith('evaluate:')) { | ||
| const js = trigger.slice('evaluate:'.length); | ||
| await page.evaluate(normalizeEvaluateSource(render(js, { args, data }) as string)); | ||
| } else if (trigger.startsWith('click:')) { | ||
| const ref = render(trigger.slice('click:'.length), { args, data }); | ||
| await page.click(String(ref).replace(/^@/, '')); | ||
| } else if (trigger === 'scroll') { | ||
| await page.scroll('down'); | ||
| } | ||
| // Step 2: Wait a bit for network requests to fire | ||
| await page.wait(Math.min(timeout, 3)); | ||
| // Step 3: Get network requests and find matching ones | ||
| const rawNetwork = await page.networkRequests(false); | ||
| const matchingResponses: any[] = []; | ||
| if (typeof rawNetwork === 'string') { | ||
| // Parse the network output to find matching URLs | ||
| const lines = rawNetwork.split('\n'); | ||
| for (const line of lines) { | ||
| const match = line.match(/\[?(GET|POST)\]?\s+(\S+)\s*(?:=>|→)\s*\[?(\d+)\]?/i); | ||
| if (match) { | ||
| const [, method, url, status] = match; | ||
| if (url.includes(capturePattern) && status === '200') { | ||
| // Re-fetch the matching URL to get the response body | ||
| try { | ||
| const body = await page.evaluate(` | ||
| async () => { | ||
| try { | ||
| const resp = await fetch(${JSON.stringify(url)}, { credentials: 'include' }); | ||
| if (!resp.ok) return null; | ||
| return await resp.json(); | ||
| } catch { return null; } | ||
| } | ||
| `); | ||
| if (body) matchingResponses.push(body); | ||
| } catch {} | ||
| } | ||
| } | ||
| } | ||
| } | ||
| // Step 4: Select from response if specified | ||
| let result = matchingResponses.length === 1 ? matchingResponses[0] : | ||
| matchingResponses.length > 1 ? matchingResponses : data; | ||
| if (selectPath && result) { | ||
| let current = result; | ||
| for (const part of String(selectPath).split('.')) { | ||
| if (current && typeof current === 'object' && !Array.isArray(current)) { | ||
| current = current[part]; | ||
| } else break; | ||
| } | ||
| result = current ?? result; | ||
| } | ||
| return result; | ||
| } | ||
| case 'tap': { | ||
| // ── Declarative Store Action Bridge ────────────────────────────────── | ||
| // Usage: | ||
| // tap: | ||
| // store: feed # Pinia/Vuex store name | ||
| // action: fetchFeeds # Store action to call | ||
| // args: [] # Optional args to pass to action | ||
| // capture: homefeed # URL pattern to capture response | ||
| // timeout: 5 # Seconds to wait for network (default: 5) | ||
| // select: data.items # Optional: extract sub-path from response | ||
| // framework: pinia # Optional: pinia | vuex (auto-detected if omitted) | ||
| // | ||
| // Generates a self-contained IIFE that: | ||
| // 1. Injects fetch + XHR dual interception proxy | ||
| // 2. Finds the Pinia/Vuex store and calls the action | ||
| // 3. Captures the response matching the URL pattern | ||
| // 4. Auto-cleans up interception in finally block | ||
| // 5. Returns the captured data (optionally sub-selected) | ||
| const cfg = typeof params === 'object' ? params : {}; | ||
| const storeName = String(render(cfg.store ?? '', { args, data })); | ||
| const actionName = String(render(cfg.action ?? '', { args, data })); | ||
| const capturePattern = String(render(cfg.capture ?? '', { args, data })); | ||
| const timeout = cfg.timeout ?? 5; | ||
| const selectPath = cfg.select ? String(render(cfg.select, { args, data })) : null; | ||
| const framework = cfg.framework ?? null; // auto-detect if null | ||
| const actionArgs = cfg.args ?? []; | ||
| if (!storeName || !actionName) throw new Error('tap: store and action are required'); | ||
| // Build select chain for the captured response | ||
| const selectChain = selectPath | ||
| ? selectPath.split('.').map((p: string) => `?.[${JSON.stringify(p)}]`).join('') | ||
| : ''; | ||
| // Serialize action arguments | ||
| const actionArgsRendered = actionArgs.map((a: any) => { | ||
| const rendered = render(a, { args, data }); | ||
| return JSON.stringify(rendered); | ||
| }); | ||
| const actionCall = actionArgsRendered.length | ||
| ? `store[${JSON.stringify(actionName)}](${actionArgsRendered.join(', ')})` | ||
| : `store[${JSON.stringify(actionName)}]()`; | ||
| const js = ` | ||
| async () => { | ||
| // ── 1. Setup capture proxy (fetch + XHR dual interception) ── | ||
| let captured = null; | ||
| const capturePattern = ${JSON.stringify(capturePattern)}; | ||
| // Intercept fetch API | ||
| const origFetch = window.fetch; | ||
| window.fetch = async function(...fetchArgs) { | ||
| const resp = await origFetch.apply(this, fetchArgs); | ||
| try { | ||
| const url = typeof fetchArgs[0] === 'string' ? fetchArgs[0] | ||
| : fetchArgs[0] instanceof Request ? fetchArgs[0].url : String(fetchArgs[0]); | ||
| if (capturePattern && url.includes(capturePattern) && !captured) { | ||
| try { captured = await resp.clone().json(); } catch {} | ||
| } | ||
| } catch {} | ||
| return resp; | ||
| }; | ||
| // Intercept XMLHttpRequest | ||
| const origXhrOpen = XMLHttpRequest.prototype.open; | ||
| const origXhrSend = XMLHttpRequest.prototype.send; | ||
| XMLHttpRequest.prototype.open = function(method, url) { | ||
| this.__tapUrl = String(url); | ||
| return origXhrOpen.apply(this, arguments); | ||
| }; | ||
| XMLHttpRequest.prototype.send = function(body) { | ||
| if (capturePattern && this.__tapUrl?.includes(capturePattern)) { | ||
| const xhr = this; | ||
| const origHandler = xhr.onreadystatechange; | ||
| xhr.onreadystatechange = function() { | ||
| if (xhr.readyState === 4 && !captured) { | ||
| try { captured = JSON.parse(xhr.responseText); } catch {} | ||
| } | ||
| if (origHandler) origHandler.apply(this, arguments); | ||
| }; | ||
| // Also handle onload | ||
| const origOnload = xhr.onload; | ||
| xhr.onload = function() { | ||
| if (!captured) { try { captured = JSON.parse(xhr.responseText); } catch {} } | ||
| if (origOnload) origOnload.apply(this, arguments); | ||
| }; | ||
| } | ||
| return origXhrSend.apply(this, arguments); | ||
| }; | ||
| try { | ||
| // ── 2. Find store ── | ||
| let store = null; | ||
| const storeName = ${JSON.stringify(storeName)}; | ||
| const fw = ${JSON.stringify(framework)}; | ||
| // Auto-detect framework if not specified | ||
| const app = document.querySelector('#app'); | ||
| if (!fw || fw === 'pinia') { | ||
| // Try Pinia (Vue 3) | ||
| try { | ||
| const pinia = app?.__vue_app__?.config?.globalProperties?.$pinia; | ||
| if (pinia?._s) store = pinia._s.get(storeName); | ||
| } catch {} | ||
| } | ||
| if (!store && (!fw || fw === 'vuex')) { | ||
| // Try Vuex (Vue 2/3) | ||
| try { | ||
| const vuexStore = app?.__vue_app__?.config?.globalProperties?.$store | ||
| ?? app?.__vue__?.$store; | ||
| if (vuexStore) { | ||
| // Vuex doesn't have named stores like Pinia, dispatch action | ||
| store = { [${JSON.stringify(actionName)}]: (...a) => vuexStore.dispatch(storeName + '/' + ${JSON.stringify(actionName)}, ...a) }; | ||
| } | ||
| } catch {} | ||
| } | ||
| if (!store) return { error: 'Store not found: ' + storeName, hint: 'Page may not be fully loaded or store name may be incorrect' }; | ||
| if (typeof store[${JSON.stringify(actionName)}] !== 'function') { | ||
| return { error: 'Action not found: ' + ${JSON.stringify(actionName)} + ' on store ' + storeName, | ||
| hint: 'Available: ' + Object.keys(store).filter(k => typeof store[k] === 'function' && !k.startsWith('$') && !k.startsWith('_')).join(', ') }; | ||
| } | ||
| // ── 3. Call store action ── | ||
| await ${actionCall}; | ||
| // ── 4. Wait for network response ── | ||
| const deadline = Date.now() + ${timeout} * 1000; | ||
| while (!captured && Date.now() < deadline) { | ||
| await new Promise(r => setTimeout(r, 200)); | ||
| } | ||
| } finally { | ||
| // ── 5. Always restore originals ── | ||
| window.fetch = origFetch; | ||
| XMLHttpRequest.prototype.open = origXhrOpen; | ||
| XMLHttpRequest.prototype.send = origXhrSend; | ||
| } | ||
| if (!captured) return { error: 'No matching response captured for pattern: ' + capturePattern }; | ||
| return captured${selectChain} ?? captured; | ||
| } | ||
| `; | ||
| return page.evaluate(js); | ||
| } | ||
| default: return data; | ||
@@ -216,0 +453,0 @@ } |
+142
-137
| /** | ||
| * Synthesize: turn explore capabilities into ready-to-use CLI definitions. | ||
| * | ||
| * Takes the structured capabilities from Deep Explore and generates | ||
| * YAML pipeline files that can be directly registered as CLI commands. | ||
| * | ||
| * This is the bridge between discovery (explore) and usability (CLI). | ||
| * Synthesize candidate CLIs from explore artifacts. | ||
| * Generates evaluate-based YAML pipelines (matching hand-written adapter patterns). | ||
| */ | ||
@@ -14,10 +10,14 @@ | ||
| export function synthesizeFromExplore(target: string, opts: any = {}): any { | ||
| const exploreDir = fs.existsSync(target) ? target : path.join('.opencli', 'explore', target); | ||
| if (!fs.existsSync(exploreDir)) throw new Error(`Explore dir not found: ${target}`); | ||
| /** Volatile params to strip from generated URLs */ | ||
| const VOLATILE_PARAMS = new Set(['w_rid', 'wts', 'callback', '_', 'timestamp', 't', 'nonce', 'sign']); | ||
| const SEARCH_PARAM_NAMES = new Set(['q', 'query', 'keyword', 'search', 'wd', 'kw', 'w', 'search_query']); | ||
| const LIMIT_PARAM_NAMES = new Set(['ps', 'page_size', 'limit', 'count', 'per_page', 'size', 'num']); | ||
| const PAGE_PARAM_NAMES = new Set(['pn', 'page', 'page_num', 'offset', 'cursor']); | ||
| const manifest = JSON.parse(fs.readFileSync(path.join(exploreDir, 'manifest.json'), 'utf-8')); | ||
| const capabilities = JSON.parse(fs.readFileSync(path.join(exploreDir, 'capabilities.json'), 'utf-8')); | ||
| const endpoints = JSON.parse(fs.readFileSync(path.join(exploreDir, 'endpoints.json'), 'utf-8')); | ||
| const auth = JSON.parse(fs.readFileSync(path.join(exploreDir, 'auth.json'), 'utf-8')); | ||
| export function synthesizeFromExplore( | ||
| target: string, | ||
| opts: { outDir?: string; top?: number } = {}, | ||
| ): Record<string, any> { | ||
| const exploreDir = resolveExploreDir(target); | ||
| const bundle = loadExploreBundle(exploreDir); | ||
@@ -27,62 +27,58 @@ const targetDir = opts.outDir ?? path.join(exploreDir, 'candidates'); | ||
| const site = manifest.site; | ||
| const topN = opts.top ?? 5; | ||
| const site = bundle.manifest.site; | ||
| const capabilities = (bundle.capabilities ?? []) | ||
| .sort((a: any, b: any) => (b.confidence ?? 0) - (a.confidence ?? 0)) | ||
| .slice(0, opts.top ?? 3); | ||
| const candidates: any[] = []; | ||
| // Sort capabilities by confidence | ||
| const sortedCaps = [...capabilities] | ||
| .sort((a: any, b: any) => (b.confidence ?? 0) - (a.confidence ?? 0)) | ||
| .slice(0, topN); | ||
| for (const cap of sortedCaps) { | ||
| // Find the matching endpoint for more detail | ||
| const endpoint = endpoints.find((ep: any) => ep.pattern === cap.endpoint) ?? | ||
| endpoints[0]; | ||
| const candidate = buildCandidateYaml(site, manifest, cap, endpoint); | ||
| const fileName = `${cap.name}.yaml`; | ||
| const filePath = path.join(targetDir, fileName); | ||
| for (const cap of capabilities) { | ||
| const endpoint = chooseEndpoint(cap, bundle.endpoints); | ||
| if (!endpoint) continue; | ||
| const candidate = buildCandidateYaml(site, bundle.manifest, cap, endpoint); | ||
| const filePath = path.join(targetDir, `${candidate.name}.yaml`); | ||
| fs.writeFileSync(filePath, yaml.dump(candidate.yaml, { sortKeys: false, lineWidth: 120 })); | ||
| candidates.push({ | ||
| name: cap.name, | ||
| path: filePath, | ||
| strategy: cap.strategy, | ||
| endpoint: cap.endpoint, | ||
| confidence: cap.confidence, | ||
| columns: candidate.yaml.columns, | ||
| }); | ||
| candidates.push({ name: candidate.name, path: filePath, strategy: cap.strategy, confidence: cap.confidence }); | ||
| } | ||
| const index = { | ||
| site, | ||
| target_url: manifest.target_url, | ||
| generated_from: exploreDir, | ||
| candidate_count: candidates.length, | ||
| candidates, | ||
| }; | ||
| const index = { site, target_url: bundle.manifest.target_url, generated_from: exploreDir, candidate_count: candidates.length, candidates }; | ||
| fs.writeFileSync(path.join(targetDir, 'candidates.json'), JSON.stringify(index, null, 2)); | ||
| return { site, explore_dir: exploreDir, out_dir: targetDir, candidate_count: candidates.length, candidates }; | ||
| } | ||
| export function renderSynthesizeSummary(result: Record<string, any>): string { | ||
| const lines = ['opencli synthesize: OK', `Site: ${result.site}`, `Source: ${result.explore_dir}`, `Candidates: ${result.candidate_count}`]; | ||
| for (const c of result.candidates ?? []) lines.push(` • ${c.name} (${c.strategy}, ${((c.confidence ?? 0) * 100).toFixed(0)}% confidence) → ${c.path}`); | ||
| return lines.join('\n'); | ||
| } | ||
| export function resolveExploreDir(target: string): string { | ||
| if (fs.existsSync(target)) return target; | ||
| const candidate = path.join('.opencli', 'explore', target); | ||
| if (fs.existsSync(candidate)) return candidate; | ||
| throw new Error(`Explore directory not found: ${target}`); | ||
| } | ||
| export function loadExploreBundle(exploreDir: string): Record<string, any> { | ||
| return { | ||
| site, | ||
| explore_dir: exploreDir, | ||
| out_dir: targetDir, | ||
| candidate_count: candidates.length, | ||
| candidates, | ||
| manifest: JSON.parse(fs.readFileSync(path.join(exploreDir, 'manifest.json'), 'utf-8')), | ||
| endpoints: JSON.parse(fs.readFileSync(path.join(exploreDir, 'endpoints.json'), 'utf-8')), | ||
| capabilities: JSON.parse(fs.readFileSync(path.join(exploreDir, 'capabilities.json'), 'utf-8')), | ||
| auth: JSON.parse(fs.readFileSync(path.join(exploreDir, 'auth.json'), 'utf-8')), | ||
| }; | ||
| } | ||
| /** Volatile params to strip from generated URLs */ | ||
| const VOLATILE_PARAMS = new Set(['w_rid', 'wts', 'callback', '_', 'timestamp', 't', 'nonce', 'sign']); | ||
| const SEARCH_PARAM_NAMES = new Set(['q', 'query', 'keyword', 'search', 'wd', 'kw', 'w', 'search_query']); | ||
| const LIMIT_PARAM_NAMES = new Set(['ps', 'page_size', 'limit', 'count', 'per_page', 'size', 'num']); | ||
| const PAGE_PARAM_NAMES = new Set(['pn', 'page', 'page_num', 'offset', 'cursor']); | ||
| function chooseEndpoint(cap: any, endpoints: any[]): any | null { | ||
| if (!endpoints.length) return null; | ||
| // Match by endpoint pattern from capability | ||
| if (cap.endpoint) { | ||
| const match = endpoints.find((e: any) => e.pattern === cap.endpoint || e.url?.includes(cap.endpoint)); | ||
| if (match) return match; | ||
| } | ||
| return endpoints.sort((a: any, b: any) => (b.score ?? 0) - (a.score ?? 0))[0]; | ||
| } | ||
| /** | ||
| * Build a clean templated URL from a raw API URL. | ||
| * - Strips volatile params (w_rid, wts, etc.) | ||
| * - Templates search, limit, and pagination params | ||
| * - Builds URL string manually to avoid URL encoding of ${{ }} expressions | ||
| */ | ||
| function buildTemplatedUrl(rawUrl: string, cap: any, endpoint: any): string { | ||
| // ── URL templating ───────────────────────────────────────────────────────── | ||
| function buildTemplatedUrl(rawUrl: string, cap: any, _endpoint: any): string { | ||
| try { | ||
@@ -92,77 +88,99 @@ const u = new URL(rawUrl); | ||
| const params: Array<[string, string]> = []; | ||
| const hasKeyword = cap.recommendedArgs?.some((a: any) => a.name === 'keyword'); | ||
| u.searchParams.forEach((v, k) => { | ||
| // Skip volatile params | ||
| if (VOLATILE_PARAMS.has(k)) return; | ||
| // Template known param types | ||
| if (hasKeyword && SEARCH_PARAM_NAMES.has(k)) { | ||
| params.push([k, '${{ args.keyword }}']); | ||
| } else if (LIMIT_PARAM_NAMES.has(k)) { | ||
| params.push([k, '${{ args.limit | default(20) }}']); | ||
| } else if (PAGE_PARAM_NAMES.has(k)) { | ||
| params.push([k, '${{ args.page | default(1) }}']); | ||
| } else { | ||
| params.push([k, v]); | ||
| } | ||
| if (hasKeyword && SEARCH_PARAM_NAMES.has(k)) params.push([k, '${{ args.keyword }}']); | ||
| else if (LIMIT_PARAM_NAMES.has(k)) params.push([k, '${{ args.limit | default(20) }}']); | ||
| else if (PAGE_PARAM_NAMES.has(k)) params.push([k, '${{ args.page | default(1) }}']); | ||
| else params.push([k, v]); | ||
| }); | ||
| if (params.length === 0) return base; | ||
| return base + '?' + params.map(([k, v]) => `${k}=${v}`).join('&'); | ||
| } catch { | ||
| return rawUrl; | ||
| } | ||
| return params.length ? base + '?' + params.map(([k, v]) => `${k}=${v}`).join('&') : base; | ||
| } catch { return rawUrl; } | ||
| } | ||
| /** | ||
| * Build a YAML pipeline definition from a capability + endpoint. | ||
| * Build inline evaluate script for browser-based fetch+parse. | ||
| * Follows patterns from bilibili/hot.yaml and twitter/trending.yaml. | ||
| */ | ||
| function buildEvaluateScript(url: string, itemPath: string, endpoint: any): string { | ||
| const pathChain = itemPath.split('.').map((p: string) => `?.${p}`).join(''); | ||
| const detectedFields = endpoint?.detectedFields ?? {}; | ||
| const hasFields = Object.keys(detectedFields).length > 0; | ||
| let mapCode = ''; | ||
| if (hasFields) { | ||
| const mappings = Object.entries(detectedFields) | ||
| .map(([role, field]) => ` ${role}: item${String(field).split('.').map(p => `?.${p}`).join('')}`) | ||
| .join(',\n'); | ||
| mapCode = `.map((item) => ({\n${mappings}\n }))`; | ||
| } | ||
| return [ | ||
| '(async () => {', | ||
| ` const res = await fetch('${url}', {`, | ||
| ` credentials: 'include'`, | ||
| ' });', | ||
| ' const data = await res.json();', | ||
| ` return (data${pathChain} || [])${mapCode};`, | ||
| '})()\n', | ||
| ].join('\n'); | ||
| } | ||
| // ── YAML pipeline generation ─────────────────────────────────────────────── | ||
| function buildCandidateYaml(site: string, manifest: any, cap: any, endpoint: any): { name: string; yaml: any } { | ||
| const needsBrowser = cap.strategy !== 'public'; | ||
| const pipeline: any[] = []; | ||
| const templatedUrl = buildTemplatedUrl(endpoint?.url ?? manifest.target_url, cap, endpoint); | ||
| // Step 1: Navigate (if browser-based) | ||
| if (needsBrowser) { | ||
| let domain = ''; | ||
| try { domain = new URL(manifest.target_url).hostname; } catch {} | ||
| if (cap.strategy === 'store-action' && cap.storeHint) { | ||
| // Store Action: navigate + wait + tap (declarative, clean) | ||
| pipeline.push({ navigate: manifest.target_url }); | ||
| pipeline.push({ wait: 3 }); | ||
| const tapStep: Record<string, any> = { | ||
| store: cap.storeHint.store, | ||
| action: cap.storeHint.action, | ||
| timeout: 8, | ||
| }; | ||
| // Infer capture pattern from endpoint URL | ||
| if (endpoint?.url) { | ||
| try { | ||
| const epUrl = new URL(endpoint.url); | ||
| const pathParts = epUrl.pathname.split('/').filter((p: string) => p); | ||
| // Use last meaningful path segment as capture pattern | ||
| const capturePart = pathParts.filter((p: string) => !p.match(/^v\d+$/)).pop(); | ||
| if (capturePart) tapStep.capture = capturePart; | ||
| } catch {} | ||
| } | ||
| if (cap.itemPath) tapStep.select = cap.itemPath; | ||
| pipeline.push({ tap: tapStep }); | ||
| } else if (needsBrowser) { | ||
| // Browser-based: navigate + evaluate (like bilibili/hot.yaml, twitter/trending.yaml) | ||
| pipeline.push({ navigate: manifest.target_url }); | ||
| const itemPath = cap.itemPath ?? 'data.data.list'; | ||
| pipeline.push({ evaluate: buildEvaluateScript(templatedUrl, itemPath, endpoint) }); | ||
| } else { | ||
| // Public API: direct fetch (like hackernews/top.yaml) | ||
| pipeline.push({ fetch: { url: templatedUrl } }); | ||
| if (cap.itemPath) pipeline.push({ select: cap.itemPath }); | ||
| } | ||
| // Step 2: Fetch the API — build a clean URL with templates | ||
| const rawUrl = endpoint?.url ?? manifest.target_url; | ||
| const fetchStep: any = { url: buildTemplatedUrl(rawUrl, cap, endpoint) }; | ||
| pipeline.push({ fetch: fetchStep }); | ||
| // Step 3: Select the item path | ||
| if (cap.itemPath) { | ||
| pipeline.push({ select: cap.itemPath }); | ||
| } | ||
| // Step 4: Map fields to columns | ||
| // Map fields | ||
| const mapStep: Record<string, string> = {}; | ||
| const columns = cap.recommendedColumns ?? ['title', 'url']; | ||
| // Add a rank column if not doing search | ||
| if (!cap.recommendedArgs?.some((a: any) => a.name === 'keyword')) { | ||
| mapStep['rank'] = '${{ index + 1 }}'; | ||
| } | ||
| // Build field mappings from the endpoint's detected fields | ||
| if (!cap.recommendedArgs?.some((a: any) => a.name === 'keyword')) mapStep['rank'] = '${{ index + 1 }}'; | ||
| const detectedFields = endpoint?.detectedFields ?? {}; | ||
| for (const col of columns) { | ||
| const fieldPath = detectedFields[col]; | ||
| if (fieldPath) { | ||
| mapStep[col] = `\${{ item.${fieldPath} }}`; | ||
| } else { | ||
| mapStep[col] = `\${{ item.${col} }}`; | ||
| } | ||
| mapStep[col] = fieldPath ? `\${{ item.${fieldPath} }}` : `\${{ item.${col} }}`; | ||
| } | ||
| pipeline.push({ map: mapStep }); | ||
| // Step 5: Limit | ||
| pipeline.push({ limit: '${{ args.limit | default(20) }}' }); | ||
| // Build args definition | ||
| // Args | ||
| const argsDef: Record<string, any> = {}; | ||
@@ -178,22 +196,10 @@ for (const arg of cap.recommendedArgs ?? []) { | ||
| } | ||
| if (!argsDef['limit']) argsDef['limit'] = { type: 'int', default: 20, description: 'Number of items to return' }; | ||
| // Ensure limit arg always exists | ||
| if (!argsDef['limit']) { | ||
| argsDef['limit'] = { type: 'int', default: 20, description: 'Number of items to return' }; | ||
| } | ||
| const allColumns = Object.keys(mapStep); | ||
| return { | ||
| name: cap.name, | ||
| yaml: { | ||
| site, | ||
| name: cap.name, | ||
| description: `${site} ${cap.name} (auto-generated)`, | ||
| domain: manifest.final_url ? new URL(manifest.final_url).hostname : undefined, | ||
| strategy: cap.strategy, | ||
| browser: needsBrowser, | ||
| args: argsDef, | ||
| pipeline, | ||
| columns: allColumns, | ||
| site, name: cap.name, description: `${cap.description || site + ' ' + cap.name} (auto-generated)`, | ||
| domain, strategy: cap.strategy, browser: needsBrowser, | ||
| args: argsDef, pipeline, columns: Object.keys(mapStep), | ||
| }, | ||
@@ -203,13 +209,12 @@ }; | ||
| export function renderSynthesizeSummary(r: any): string { | ||
| const lines = [ | ||
| 'opencli synthesize: OK', | ||
| `Site: ${r.site}`, | ||
| `Source: ${r.explore_dir}`, | ||
| `Candidates: ${r.candidate_count}`, | ||
| ]; | ||
| for (const c of r.candidates ?? []) { | ||
| lines.push(` • ${c.name} (${c.strategy}, ${(c.confidence * 100).toFixed(0)}% confidence) → ${c.path}`); | ||
| } | ||
| return lines.join('\n'); | ||
| /** Backward-compatible export for scaffold.ts */ | ||
| export function buildCandidate(site: string, targetUrl: string, cap: any, endpoint: any): any { | ||
| // Map old-style field names to new ones | ||
| const normalizedCap = { | ||
| ...cap, | ||
| recommendedArgs: cap.recommendedArgs ?? cap.recommended_args, | ||
| recommendedColumns: cap.recommendedColumns ?? cap.recommended_columns, | ||
| }; | ||
| const manifest = { target_url: targetUrl, final_url: targetUrl }; | ||
| return buildCandidateYaml(site, manifest, normalizedCap, endpoint); | ||
| } |
| export {}; |
| import { cli, Strategy } from '../../registry.js'; | ||
| import { fetchJson } from '../../bilibili.js'; | ||
| cli({ | ||
| site: 'zhihu', | ||
| name: 'search', | ||
| description: '搜索知乎问题和回答', | ||
| domain: 'www.zhihu.com', | ||
| strategy: Strategy.COOKIE, | ||
| args: [ | ||
| { name: 'keyword', required: true, help: 'Search keyword' }, | ||
| { name: 'type', default: 'general', help: 'general, article, video' }, | ||
| { name: 'limit', type: 'int', default: 20, help: 'Number of results' }, | ||
| ], | ||
| columns: ['rank', 'title', 'author', 'type', 'url'], | ||
| func: async (page, kwargs) => { | ||
| const { keyword, type = 'general', limit = 20 } = kwargs; | ||
| // Navigate to zhihu to ensure cookie context | ||
| await page.goto('https://www.zhihu.com'); | ||
| const qs = new URLSearchParams({ q: keyword, type, limit: String(limit) }); | ||
| const payload = await fetchJson(page, `https://www.zhihu.com/api/v4/search_v3?${qs}`); | ||
| const data = payload?.data ?? []; | ||
| const rows = []; | ||
| for (let i = 0; i < Math.min(data.length, Number(limit)); i++) { | ||
| const item = data[i]; | ||
| const obj = item.object ?? item; | ||
| const itemType = item.type ?? obj.type ?? 'unknown'; | ||
| let title = ''; | ||
| let author = ''; | ||
| let url = ''; | ||
| if (itemType === 'search_result') { | ||
| const highlight = obj.highlight ?? {}; | ||
| title = (highlight.title ?? obj.title ?? '').replace(/<[^>]+>/g, ''); | ||
| author = obj.author?.name ?? ''; | ||
| url = obj.url ?? ''; | ||
| } | ||
| else if (obj.question) { | ||
| title = (obj.question.title ?? obj.title ?? '').replace(/<[^>]+>/g, ''); | ||
| author = obj.author?.name ?? ''; | ||
| url = obj.question.url ? `https://www.zhihu.com/question/${obj.question.id}` : ''; | ||
| } | ||
| else { | ||
| title = (obj.title ?? obj.name ?? '').replace(/<[^>]+>/g, ''); | ||
| author = obj.author?.name ?? ''; | ||
| url = obj.url ?? ''; | ||
| } | ||
| if (!title) | ||
| continue; | ||
| rows.push({ | ||
| rank: rows.length + 1, | ||
| title: title.slice(0, 60), | ||
| author, | ||
| type: itemType.replace('search_result', 'result'), | ||
| url, | ||
| }); | ||
| } | ||
| return rows; | ||
| }, | ||
| }); |
| import { cli, Strategy } from '../../registry.js'; | ||
| import { fetchJson } from '../../bilibili.js'; | ||
| cli({ | ||
| site: 'zhihu', | ||
| name: 'search', | ||
| description: '搜索知乎问题和回答', | ||
| domain: 'www.zhihu.com', | ||
| strategy: Strategy.COOKIE, | ||
| args: [ | ||
| { name: 'keyword', required: true, help: 'Search keyword' }, | ||
| { name: 'type', default: 'general', help: 'general, article, video' }, | ||
| { name: 'limit', type: 'int', default: 20, help: 'Number of results' }, | ||
| ], | ||
| columns: ['rank', 'title', 'author', 'type', 'url'], | ||
| func: async (page, kwargs) => { | ||
| const { keyword, type = 'general', limit = 20 } = kwargs; | ||
| // Navigate to zhihu to ensure cookie context | ||
| await page.goto('https://www.zhihu.com'); | ||
| const qs = new URLSearchParams({ q: keyword, type, limit: String(limit) }); | ||
| const payload = await fetchJson(page, `https://www.zhihu.com/api/v4/search_v3?${qs}`); | ||
| const data: any[] = payload?.data ?? []; | ||
| const rows: any[] = []; | ||
| for (let i = 0; i < Math.min(data.length, Number(limit)); i++) { | ||
| const item = data[i]; | ||
| const obj = item.object ?? item; | ||
| const itemType = item.type ?? obj.type ?? 'unknown'; | ||
| let title = ''; | ||
| let author = ''; | ||
| let url = ''; | ||
| if (itemType === 'search_result') { | ||
| const highlight = obj.highlight ?? {}; | ||
| title = (highlight.title ?? obj.title ?? '').replace(/<[^>]+>/g, ''); | ||
| author = obj.author?.name ?? ''; | ||
| url = obj.url ?? ''; | ||
| } else if (obj.question) { | ||
| title = (obj.question.title ?? obj.title ?? '').replace(/<[^>]+>/g, ''); | ||
| author = obj.author?.name ?? ''; | ||
| url = obj.question.url ? `https://www.zhihu.com/question/${obj.question.id}` : ''; | ||
| } else { | ||
| title = (obj.title ?? obj.name ?? '').replace(/<[^>]+>/g, ''); | ||
| author = obj.author?.name ?? ''; | ||
| url = obj.url ?? ''; | ||
| } | ||
| if (!title) continue; | ||
| rows.push({ | ||
| rank: rows.length + 1, | ||
| title: title.slice(0, 60), | ||
| author, | ||
| type: itemType.replace('search_result', 'result'), | ||
| url, | ||
| }); | ||
| } | ||
| return rows; | ||
| }, | ||
| }); |
AI-detected potential code anomaly
Supply chain riskAI has identified unusual behaviors that may pose a security risk.
Found 1 instance in 1 package
Long strings
Supply chain riskContains long string literals, which may be a sign of obfuscated or packed code.
Found 1 instance in 1 package
URL strings
Supply chain riskPackage contains fragments of external URLs or IP addresses, which the package may be accessing at runtime.
Found 1 instance in 1 package
URL strings
Supply chain riskPackage contains fragments of external URLs or IP addresses, which the package may be accessing at runtime.
Found 1 instance in 1 package
317360
43.24%119
26.6%5738
16.86%146
114.71%28
33.33%