Reference:
GitHub - mayooear/gpt4-pdf-chatbot-langchain: GPT4 & LangChain Chatbot for large PDF docs
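1. Summary:
Build a ChatGPT-style chatbot over multiple large PDF files using the new GPT-4 API.
The tech stack includes LangChain, Pinecone, TypeScript, OpenAI, and Next.js. LangChain is a framework that makes it easier to build scalable AI/LLM applications and chatbots. Pinecone is a vector store that holds the embeddings together with the PDF content in text form, so that similar documents can be retrieved later.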
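To make "retrieving similar documents" concrete, here is a minimal in-memory sketch of similarity search over embeddings. It does not call Pinecone. The `StoredChunk` type, the `topK` helper, and the 3-dimensional vectors are all made up for illustration, and cosine similarity is assumed as the metric (a Pinecone index can be configured with other metrics).

```typescript
// Toy in-memory stand-in for a vector store; Pinecone itself is not called here.
type StoredChunk = { text: string; embedding: number[] };

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored chunks most similar to the query embedding.
function topK(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return [...chunks]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding),
    )
    .slice(0, k);
}

// Hypothetical 3-dimensional embeddings, chosen only to keep the example readable.
const chunks: StoredChunk[] = [
  { text: 'refund policy', embedding: [1, 0, 0] },
  { text: 'shipping times', embedding: [0, 1, 0] },
  { text: 'returns and refunds', embedding: [0.9, 0.1, 0] },
];
const nearest = topK([1, 0, 0], chunks, 2).map((c) => c.text);
```

Here `nearest` holds the two chunks whose vectors point most nearly in the same direction as the query; in the real application the vector store performs this lookup and the chain passes the retrieved text to the model as context.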
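2. Prerequisites:
OpenAI API key (GPT-3.5 or GPT-4), from OpenAI
Pinecone API key, environment, and index, from Pinecone
Indexes of users on the Pinecone Starter (free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone before the 7 days are up to reset the counter. You can then keep using the index for free.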
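A minimal sketch of how a scheduled job could decide when a keep-alive request is due, assuming the 7-day inactivity rule above. The helper names, the 2-day safety margin, and the `ping` callback are hypothetical; `ping` stands for whatever request you choose to send to your index and is deliberately not tied to a specific Pinecone client call.

```typescript
const INACTIVITY_LIMIT_DAYS = 7; // Starter plan rule described above
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Days left before the index would be deleted, given the time of the last request.
function daysUntilDeletion(lastRequestAt: Date, now: Date): number {
  const idleDays = (now.getTime() - lastRequestAt.getTime()) / MS_PER_DAY;
  return INACTIVITY_LIMIT_DAYS - idleDays;
}

// Calls `ping` (a hypothetical callback that sends any request to the index)
// when no more than `safetyMarginDays` remain. Resolves to true if a ping was sent.
async function keepAliveIfDue(
  lastRequestAt: Date,
  now: Date,
  ping: () => Promise<void>,
  safetyMarginDays = 2,
): Promise<boolean> {
  if (daysUntilDeletion(lastRequestAt, now) > safetyMarginDays) {
    return false;
  }
  await ping();
  return true;
}
```

Run from a daily cron job, this only touches the index when the deadline is close, which keeps the number of requests small.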
3. Clone or download the project gpt4-pdf-chatbot-langchain
git clone https://github.com/mayooear/gpt4-pdf-chatbot-langchain.git
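4. Install dependencies
Use npm to install yarn. If npm is not installed, see this installation guide:
Introduction to npm/Node.js and quick installation - Linux CentOS (Entropy-Go's blog, CSDN)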
npm install yarn -g
Then install the dependencies with yarn.
From the project root, run:
yarn install
After a successful install, the node_modules directory is present:
gpt4-pdf-chatbot-langchain-main$ ls -a
. declarations .eslintrc.json node_modules .prettierrc styles utils yarn.lock
.. docs .gitignore package.json public tailwind.config.cjs venv
components .env .idea pages README.md tsconfig.json visual-guide
config .env.example next.config.js postcss.config.cjs scripts types yarn-error.log
5. Environment configuration
Copy .env.example to a new configuration file named .env:
OPENAI_API_KEY=sk-xxx

# Update these with your pinecone details from your dashboard.
# PINECONE_INDEX_NAME is in the indexes tab under "index name" in blue
# PINECONE_ENVIRONMENT is in indexes tab under "Environment". Example: "us-east1-gcp"
PINECONE_API_KEY=xxx
PINECONE_ENVIRONMENT=us-west1-gcp-free
PINECONE_INDEX_NAME=xxx
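A small sketch of a startup check for the four variables listed in the .env file above. `findMissingEnv` is a hypothetical helper, not part of the project; it takes the environment as a plain object so that it can be tried without a real .env file.

```typescript
// The variable names come from the .env file above.
const REQUIRED_ENV_KEYS = [
  'OPENAI_API_KEY',
  'PINECONE_API_KEY',
  'PINECONE_ENVIRONMENT',
  'PINECONE_INDEX_NAME',
] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are missing or blank.
function findMissingEnv(env: Env): string[] {
  return REQUIRED_ENV_KEYS.filter((key) => (env[key] ?? '').trim() === '');
}

// Example with placeholder values; in the application you would pass process.env.
const missing = findMissingEnv({
  OPENAI_API_KEY: 'sk-xxx',
  PINECONE_API_KEY: 'xxx',
  PINECONE_ENVIRONMENT: 'us-west1-gcp-free',
  PINECONE_INDEX_NAME: '',
});
```

In this example `missing` contains only `PINECONE_INDEX_NAME`, since that is the one variable left blank.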
Edit config/pinecone.ts
In the config folder, replace PINECONE_NAME_SPACE with the namespace in which you want to store your embeddings on Pinecone when you run npm run ingest. The same namespace is used later for querying and retrieval.
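A minimal sketch of setting the namespace and checking it before ingesting. The value `pdf-test` and the `isValidNamespace` helper are hypothetical; the lowercase check reflects the project's troubleshooting advice that the Pinecone namespace should be in lowercase.

```typescript
// Hypothetical namespace value, used here only for illustration.
const PINECONE_NAME_SPACE = 'pdf-test';

// True when the namespace is non-empty and contains no uppercase letters.
function isValidNamespace(namespace: string): boolean {
  return namespace.length > 0 && namespace === namespace.toLowerCase();
}

// Fail early, before ingesting, if the namespace would cause trouble later.
if (!isValidNamespace(PINECONE_NAME_SPACE)) {
  throw new Error(`Invalid Pinecone namespace: "${PINECONE_NAME_SPACE}"`);
}
```

Because ingestion and querying must use the same namespace, keeping it in a single exported constant avoids the two drifting apart.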
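Change the chatbot prompt and the OpenAI model
In utils/makechain.ts, change QA_PROMPT to suit your own use case.
If you have access to the GPT-4 API, change modelName in new OpenAI to gpt-4. Verify outside this repo that you have access to the GPT-4 API; otherwise the application will not work.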
import { OpenAI } from 'langchain/llms/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { ConversationalRetrievalQAChain } from 'langchain/chains';

const CONDENSE_PROMPT = `Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:`;

const QA_PROMPT = `You are a helpful AI assistant. Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say you don't know. DO NOT try to make up an answer.
If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.

{context}

Question: {question}
Helpful answer in markdown:`;

export const makeChain = (vectorstore: PineconeStore) => {
  const model = new OpenAI({
    temperature: 0, // increase temperature to get more creative answers
    modelName: 'gpt-3.5-turbo', // change this to gpt-4 if you have access
  });

  const chain = ConversationalRetrievalQAChain.fromLLM(
    model,
    vectorstore.asRetriever(),
    {
      qaTemplate: QA_PROMPT,
      questionGeneratorTemplate: CONDENSE_PROMPT,
      returnSourceDocuments: true, // the number of source documents returned is 4 by default
    },
  );
  return chain;
};
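To see what the placeholders in these prompts turn into, here is a simplified stand-in for the substitution step. LangChain fills the templates itself; `fillTemplate` is a hypothetical helper written only for this illustration and is not LangChain's API, and the chat history and question are invented sample values.

```typescript
// Replaces each {name} placeholder with the matching value;
// placeholders without a value are left untouched.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match,
  );
}

const CONDENSE_PROMPT = `Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:`;

// Invented sample conversation.
const filled = fillTemplate(CONDENSE_PROMPT, {
  chat_history: 'Human: What is the notice period?\nAssistant: 30 days.',
  question: 'Does it apply to contractors?',
});
```

The filled prompt asks the model to rewrite the follow-up as a standalone question, and that rewritten question is then used to retrieve context for QA_PROMPT. When you customize QA_PROMPT, keep the `{context}` and `{question}` placeholders so the chain still has somewhere to put the retrieved text and the user's question.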
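6. Add PDF documents as the knowledge base
Because data is exchanged with OpenAI and Pinecone, consider data privacy and security carefully before uploading any documents.
Place one or more PDF documents in the docs directory.
Run the ingest command: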
npm run ingest
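Check on Pinecone that the upload succeeded.
7. Run the knowledge-base chatbot
Once you have verified that the embeddings and content were added to Pinecone, run npm run dev to start the local development environment, then type a question into the chat interface to start a conversation.
Run the command: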
npm run dev
8. Troubleshooting
https://github.com/mayooear/gpt4-pdf-chatbot-langchain#troubleshooting
In general, keep an eye on the issues and discussions sections of this repo for solutions.
General errors
- Make sure you're running the latest Node version. Run node -v to check.
- Try a different PDF or convert your PDF to text first. It's possible your PDF is corrupted, scanned, or requires OCR to convert to text.
- Console.log the env variables and make sure they are exposed.
- Make sure you're using the same versions of LangChain and Pinecone as this repo.
- Check that you've created an .env file that contains your valid (and working) API keys, environment, and index name.
- If you change modelName in OpenAI, make sure you have access to the API for the appropriate model.
- Make sure you have enough OpenAI credits and a valid card on your billing account.
- Check that you don't have multiple OpenAI API keys in your global environment. If you do, the values in the project's local env file will be overridden by the system env variable.
- Try to hard-code your API keys into the process.env variables if there are still issues.
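When logging env variables as suggested in the list above, it helps not to print full API keys to the console. The following is a minimal sketch; `maskSecret` and `describeEnv` are hypothetical helpers, not part of the project.

```typescript
// Shows only the first three characters of a secret; very short values are fully masked.
function maskSecret(value: string | undefined): string {
  if (value === undefined || value === '') return '(not set)';
  if (value.length <= 3) return '*'.repeat(value.length);
  return value.slice(0, 3) + '*'.repeat(value.length - 3);
}

// Builds one "NAME=masked-value" line per requested variable.
function describeEnv(
  env: Record<string, string | undefined>,
  keys: string[],
): string[] {
  return keys.map((key) => `${key}=${maskSecret(env[key])}`);
}

// Example with placeholder values; in the application you would pass process.env.
const report = describeEnv(
  { OPENAI_API_KEY: 'sk-abcdef', PINECONE_API_KEY: undefined },
  ['OPENAI_API_KEY', 'PINECONE_API_KEY'],
);
console.log(report.join('\n'));
```

A line ending in `(not set)` points to a variable that is not exposed to the process, which is the situation the checklist asks you to rule out.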
Pinecone errors
- Make sure your pinecone dashboard environment and index match the ones in the pinecone.ts and .env files.
- Check that you've set the vector dimensions to 1536.
- Make sure your pinecone namespace is in lowercase.
- Pinecone indexes of users on the Starter (free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter before 7 days.
- Retry from scratch with a new Pinecone project, index, and cloned repo.