Spider

Spider is the fastest crawler. It converts any website into pure HTML, markdown, metadata, or text, and lets you crawl with custom actions driven by AI.

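Overview

Spider supports high-performance proxies to avoid detection, caching of AI actions, webhooks for crawl status, scheduled crawls, and more.

This guide shows how to crawl or scrape a website with Spider and load LLM-ready documents using the SpiderLoader in LangChain.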

Setup

Get your own Spider API key at spider.cloud.
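
The loader reads the key from the `SPIDER_API_KEY` environment variable when no key is passed explicitly, so it can help to fail fast if that variable is missing. The following is a minimal sketch, assuming a hypothetical helper named `requireSpiderApiKey` that is not part of LangChain or the Spider client:

```typescript
// Hypothetical helper, not part of LangChain or the Spider client.
// Reads the key from an env-like object and throws early if it is absent.
type Env = Record<string, string | undefined>;

function requireSpiderApiKey(env: Env): string {
  const key = env["SPIDER_API_KEY"]?.trim();
  if (!key) {
    throw new Error("SPIDER_API_KEY is not set; create a key at spider.cloud");
  }
  return key;
}

// Usage in Node.js: const apiKey = requireSpiderApiKey(process.env);
```

Taking the environment as an argument, rather than reading `process.env` directly, keeps the helper easy to test.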

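Usage

The example below shows how to use SpiderLoader.

Spider offers two scraping modes: scrape and crawl. scrape fetches only the content of the URL you provide, whereas crawl fetches the content of that URL and also follows its subpages to crawl deeper.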

Install the packages with npm, Yarn, or pnpm:

npm install @langchain/community @langchain/core @spider-cloud/spider-client

yarn add @langchain/community @langchain/core @spider-cloud/spider-client

pnpm add @langchain/community @langchain/core @spider-cloud/spider-client

import { SpiderLoader } from "@langchain/community/document_loaders/web/spider";

const loader = new SpiderLoader({
  url: "https://spider.cloud", // The URL to scrape
  apiKey: process.env.SPIDER_API_KEY, // Optional, defaults to `SPIDER_API_KEY` in your env.
  mode: "scrape", // The mode to run the crawler in. Can be "scrape" for single urls or "crawl" for deeper scraping following subpages
  // params: {
  //   // optional parameters based on Spider API docs
  //   // For API documentation, visit https://spider.cloud/docs/api
  // },
});

const docs = await loader.load();
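
Once `loader.load()` resolves, each document carries its text in `pageContent` and extra details in `metadata`. The following is a rough sketch of how the results might be inspected. `LoadedDoc` is a local stand-in for the LangChain document shape, `summarizeDocs` is a hypothetical helper, and the sample data (including the `url` metadata key) is made up for illustration:

```typescript
// Minimal local shape mirroring a LangChain document: text plus metadata.
interface LoadedDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: one summary line per document, with a short preview.
function summarizeDocs(docs: LoadedDoc[], previewLength = 80): string[] {
  return docs.map((doc, i) => {
    // Collapse whitespace so multi-line markdown fits on one line.
    const preview = doc.pageContent
      .replace(/\s+/g, " ")
      .trim()
      .slice(0, previewLength);
    return `#${i + 1} (${doc.pageContent.length} chars): ${preview}`;
  });
}

// Made-up sample data standing in for the result of `await loader.load()`.
const sampleDocs: LoadedDoc[] = [
  {
    pageContent: "# Spider\n\nThe web crawler.",
    metadata: { url: "https://spider.cloud" },
  },
];

for (const line of summarizeDocs(sampleDocs)) {
  console.log(line);
}
```

In real use, the array returned by `loader.load()` would be passed in place of `sampleDocs`.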

API reference

SpiderLoader from @langchain/community/document_loaders/web/spider

Additional parameters

See the Spider documentation for all available params.
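
To illustrate how `mode` and `params` fit together, here is a sketch that builds the options object for a crawl. The `SpiderLoaderOptions` type is a local mirror of the fields shown in the usage example, and `buildCrawlOptions` is a hypothetical helper. The parameter names `limit` and `return_format` are assumptions that should be confirmed against the Spider API documentation before use:

```typescript
type SpiderMode = "scrape" | "crawl";

// Local type mirroring the loader options shown in the usage example.
interface SpiderLoaderOptions {
  url: string;
  apiKey?: string;
  mode?: SpiderMode;
  params?: Record<string, unknown>;
}

// Hypothetical helper: builds options for a crawl capped at `limit` pages.
// ASSUMPTION: `limit` and `return_format` are recalled Spider API parameter
// names; verify them at https://spider.cloud/docs/api before relying on them.
function buildCrawlOptions(url: string, limit: number): SpiderLoaderOptions {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  return {
    url,
    mode: "crawl",
    params: { limit, return_format: "markdown" },
  };
}

const crawlOptions = buildCrawlOptions("https://spider.cloud", 5);
console.log(JSON.stringify(crawlOptions, null, 2));
// With the SpiderLoader import from the usage example, these options would
// presumably be passed as: new SpiderLoader(crawlOptions)
```

Omitting `apiKey` here relies on the loader's fallback to `SPIDER_API_KEY` in the environment.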
