AI Large Model Learning (4): LangChain (3)

Building an Agent with LangChain

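A language model cannot perform actions by itself; it can only output text. An agent uses a large language model (LLM) as a reasoning engine to decide which actions to take and what the inputs to those actions should be. The results of those actions are then fed back to the agent, which decides whether more actions are needed or whether it can finish.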
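
The loop described above can be sketched without any LLM library. This is only a minimal illustration under stated assumptions: `fake_model`, `search`, `TOOLS`, and `run_agent` are hypothetical names, and the "model" is a hard-coded stand-in rather than a real LLM.

```python
import json


def fake_model(messages):
    """Hypothetical stand-in for an LLM: requests a tool until a tool result exists."""
    if any(m["role"] == "tool" for m in messages):
        observation = messages[-1]["content"]
        return {"role": "ai", "content": f"Answer based on: {observation}", "tool_calls": []}
    return {
        "role": "ai",
        "content": "",
        "tool_calls": [{"name": "search", "args": {"query": messages[-1]["content"]}}],
    }


def search(query):
    """Hypothetical tool that returns canned data instead of a real web search."""
    return json.dumps({"query": query, "result": "sunny, 2C"})


TOOLS = {"search": search}


def run_agent(question, max_steps=5):
    messages = [{"role": "human", "content": question}]
    for _ in range(max_steps):
        reply = fake_model(messages)          # the model decides what to do next
        messages.append(reply)
        if not reply["tool_calls"]:           # no action requested: the agent is done
            return reply["content"]
        for call in reply["tool_calls"]:      # run each requested action
            result = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": result})  # feed the result back
    raise RuntimeError("agent did not finish within max_steps")


print(run_agent("weather in Shanghai"))
```

The `max_steps` cap is a design choice of this sketch: it keeps a misbehaving model from looping forever.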

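For example, suppose we want to know the current weather in Beijing. The weather changes in real time, but an LLM is trained on past data and cannot know the current conditions, so an agent has to be called to fetch the result.

Install the package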

pip install langgraph

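Querying the weather directly with the LLM

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
import os

key = ''
os.environ["OPENAI_API_KEY"] = key
model = ChatOpenAI(model='gpt-4')

# The question asks: "What is the weather like in Shanghai today?"
result = model.invoke([HumanMessage(content='上海今天天气怎么样?')])
print(result.content)
# Output (translated): Sorry, as an AI I cannot provide real-time information, including weather information. You can get it by searching or by using a weather app.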

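As we can see, the LLM cannot obtain the real-time weather. This is where an agent is needed.

Using Tavily together with the LLM

Configure TAVILY_API_KEY

Official site: Tavily AI

Basic Tavily usage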

from langchain_openai import ChatOpenAI
from langchain_community.tools.tavily_search import TavilySearchResults
import os

key = ''
os.environ["OPENAI_API_KEY"] = key
os.environ["TAVILY_API_KEY"] = ""
model = ChatOpenAI(model='gpt-4')

# LangChain ships a built-in tool that makes it easy to use the Tavily search engine
search = TavilySearchResults(max_results=2)  # max_results=2: return at most two results
# The query asks: "What is the weather like in Shanghai today?"
print(search.invoke('上海今天天气怎么样?'))
"""
[{'url': 'https://tianqi.moji.com/today/china/shanghai/shanghai', 'content': '首页 天气 下载 资讯 关于墨迹 天气 中国 上海市， 上海市， 中国 热门时景 更多 附近地区 更多 附近景点 更多 1  晴 晴 2°日出 06:41 晴 -2°日落 17:35 晴 02/07 *   周六 晴  晴 02/08 *   周日 晴  晴 02/09 *   周一 多云  阴 02/10 *   周二 阴  小雨 02/11 阴 02/12 *   周四 阴  阴 02/13 *   周五 小雨  小雨 02/14 *   周六 小雨  多云 02/15 *   周日 晴  多云 02/16 *   周一 阴  阴 02/17 *   周二 阴  多云 02/18 *   周三 晴  阴 02/19 晴 02/20 *   周五 晴  阴 02/21 阴 02/22 较适宜 狗狗 不适宜 猫咪 不适宜 运动 较适宜 广场舞 较适宜 钓鱼 不适宜 啤酒 较适宜 夜生活 不宜 放风筝 较适宜 旅游 不宜 晨练 较适宜 洗车 较适宜 逛街 pm2.5 pm2.5省份列表 pm2.5城市列表'}, {'url': 'https://weather.cma.cn/web/weather/58367.html', 'content': '主站首页 领导主站 部门概况 新闻资讯 信息公开 服务办事 天气预报 首页 天气实况 气象公报 气象预警 城市预报 天气资讯 气象专题 气象科普 首页 国内 上海 徐家汇 国内  上海  徐家汇  更新 7天天气预报（2025/02/08 08:00发布） 星期六 无持续风向 微风 2℃ -2℃ 无持续风向 微风 星期日 无持续风向 微风 4℃ 0℃ 无持续风向 微风 星期一 多云 无持续风向 微风 9℃ 4℃ 无持续风向 微风 星期二 无持续风向 微风 12℃ 7℃ 小雨 无持续风向 微风 星期三 小雨 东北风 3~4级 10℃ 4℃ 北风 3~4级 星期四 无持续风向 微风 10℃ 2℃ 无持续风向 微风 星期五 小雨 无持续风向 微风 8℃ 4℃ 小雨 东风 3~4级 时间 11:00   14:00   17:00   20:00   23:00   02:00   05:00   08:00 天气                            气温 0.7℃    2℃  1.4℃    -0.2℃   -1.1℃   -1.8℃   -1.9℃   -1.1℃ 降水 无降水 无降水 无降水 无降水 无降水 无降水 无降水 无降水 风速 3.2m/s  3.3m/s  2.7m/s  2.6m/s  3.1m/s  3.3m/s  3m/s    3.2m/s 南方地区有较大范围雨雪天气 琼州海峡等海域有大雾 中东部地区有较大范围雨雪过程 琼州海峡等海域有大雾 中东部地区有较大范围雨雪过程'}]
"""

The result above contains two entries, one fetched from 'https://tianqi.moji.com/today/china/shanghai/shanghai' and the other from 'https://weather.cma.cn/web/weather/58367.html', because max_results was set to 2.

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langchain_community.tools.tavily_search import TavilySearchResults
import os

key = ''
os.environ["OPENAI_API_KEY"] = key
os.environ["TAVILY_API_KEY"] = ""
model = ChatOpenAI(model='gpt-4')

# LangChain ships a built-in tool that makes it easy to use the Tavily search engine
search = TavilySearchResults(max_results=2)  # max_results=2: return at most two results

# Bind the tool to the model
model_and_tool = model.bind_tools([search])
# The question asks: "What is the weather like in Shanghai today?"
response = model_and_tool.invoke([HumanMessage(content="上海今天天气怎么样?")])
print(f'model_result:{response.content}')
# This is empty, which shows the LLM did not answer by itself and requested a tool call instead
print(f'model_result:{response.tool_calls}')
# This is only a search instruction; an agent still has to execute the tool call
# model_result:[{'name': 'tavily_search_results_json', 'args': {'query': '上海今天天气'}, 'id': 'call_S4CKoL0UHaMBtHEa5COUgvhc', 'type': 'tool_call'}]

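AI + Agent

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langchain_community.tools.tavily_search import TavilySearchResults
import os
from langgraph.prebuilt import chat_agent_executor

key = ''
os.environ["OPENAI_API_KEY"] = key
os.environ["TAVILY_API_KEY"] = ""  # fill in your own Tavily API key
model = ChatOpenAI(model='gpt-4')

# LangChain ships a built-in tool that makes it easy to use the Tavily search engine
search = TavilySearchResults(max_results=1)  # max_results caps the number of returned results
tool = [search]

# Create the agent
# Note: newer langgraph releases may deprecate this helper in favor of
# langgraph.prebuilt.create_react_agent; check the docs for your version
agent_executor = chat_agent_executor.create_tool_calling_executor(model, tool)

# Pitfall: the key must be 'messages'; writing 'message' raises an error
# The question asks: "What is the weather like in Shanghai today?"
response = agent_executor.invoke(
    {'messages': [HumanMessage(content="上海今天天气怎么样?")]}
)
# print(response['messages'])
# The return value is a list of message objects in a special format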
# [HumanMessage(content='上海今天天气怎么样?', additional_kwargs={}, response_metadata={}, id='9670a4fb-06b4-42a1-bf46-a9455cdd72ef'),
# AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_MPM8ATtgU1R3d9X9mF9IZxZ1', 'function': {'arguments': '{\n  "query": "上海今天天气"\n}', 'name': 'tavily_search_results_json'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 26, 'prompt_tokens': 93, 'total_tokens': 119, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4-0613', 'system_fingerprint': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-c6635a96-a3f4-46e8-8edd-36183d7da62e-0', tool_calls=[{'name': 'tavily_search_results_json', 'args': {'query': '上海今天天气'}, 'id': 'call_MPM8ATtgU1R3d9X9mF9IZxZ1', 'type': 'tool_call'}], usage_metadata={'input_tokens': 93, 'output_tokens': 26, 'total_tokens': 119, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}),
# ToolMessage(content='[{"url": "https://tianqi.moji.com/today/china/shanghai/shanghai", "content": "首页 天气 下载 资讯 关于墨迹 天气 中国 上海市， 上海市， 中国 热门时景 更多 附近地区 更多 附近景点 更多 1  晴 晴 2°日出 06:41 晴 -2°日落 17:35 晴 02/07 *   周六 晴  晴 02/08 *   周日 晴  晴 02/09 *   周一 多云  阴 02/10 *   周二 阴  小雨 02/11 阴 02/12 *   周四 阴  阴 02/13 *   周五 小雨  小雨 02/14 *   周六 小雨  多云 02/15 *   周日 晴  多云 02/16 *   周一 阴  阴 02/17 *   周二 阴  多云 02/18 *   周三 晴  阴 02/19 晴 02/20 *   周五 晴  阴 02/21 阴 02/22 较适宜 狗狗 不适宜 猫咪 不适宜 运动 较适宜 广场舞 较适宜 钓鱼 不适宜 啤酒 较适宜 夜生活 不宜 放风筝 较适宜 旅游 不宜 晨练 较适宜 洗车 较适宜 逛街 pm2.5 pm2.5省份列表 pm2.5城市列表"}, {"url": "https://weather.cma.cn/web/weather/58367.html", "content": "主站首页 领导主站 部门概况 新闻资讯 信息公开 服务办事 天气预报 首页 天气实况 气象公报 气象预警 城市预报 天气资讯 气象专题 气象科普 首页 国内 上海 徐家汇 国内  上海  徐家汇  更新 7天天气预报（2025/02/08 08:00发布） 星期六 无持续风向 微风 2℃ -2℃ 无持续风向 微风 星期日 无持续风向 微风 4℃ 0℃ 无持续风向 微风 星期一 多云 无持续风向 微风 9℃ 4℃ 无持续风向 微风 星期二 无持续风向 微风 12℃ 7℃ 小雨 无持续风向 微风 星期三 小雨 东北风 3~4级 10℃ 4℃ 北风 3~4级 星期四 无持续风向 微风 10℃ 2℃ 无持续风向 微风 星期五 小雨 无持续风向 微风 8℃ 4℃ 小雨 东风 3~4级 时间 11:00   14:00   17:00   20:00   23:00   02:00   05:00   08:00 天气                            气温 0.7℃    2℃  1.4℃    -0.2℃   -1.1℃   -1.8℃   -1.9℃   -1.1℃ 降水 无降水 无降水 无降水 无降水 无降水 无降水 无降水 无降水 风速 3.2m/s  3.3m/s  2.7m/s  2.6m/s  3.1m/s  3.3m/s  3m/s    3.2m/s 南方地区有较大范围雨雪天气 琼州海峡等海域有大雾 中东部地区有较大范围雨雪过程 琼州海峡等海域有大雾 中东部地区有较大范围雨雪过程"}]', name='tavily_search_results_json', id='99e42bba-8415-468a-8565-3414548235af', tool_call_id='call_MPM8ATtgU1R3d9X9mF9IZxZ1', artifact={'query': '上海今天天气', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://tianqi.moji.com/today/china/shanghai/shanghai', 'title': '今天上海市天气_今日天气预报- 墨迹天气', 'content': '首页 天气 下载 资讯 关于墨迹 天气 中国 上海市， 上海市， 中国 热门时景 更多 附近地区 更多 附近景点 更多 1  晴 晴 2°日出 06:41 晴 -2°日落 17:35 晴 02/07 *   周六 晴  晴 02/08 *   周日 晴  晴 02/09 *   周一 多云  阴 02/10 *   周二 阴  小雨 02/11 阴 02/12 *   周四 阴  阴 02/13 *   周五 小雨  小雨 02/14 *   周六 小雨  多云 02/15 *   周日 晴  多云 02/16 *   周一 阴  阴 02/17 *   周二 阴  多云 02/18 *   周三 晴  阴 02/19 晴 02/20 *   周五 晴  阴 02/21 阴 02/22 较适宜 狗狗 不适宜 猫咪 
不适宜 运动 较适宜 广场舞 较适宜 钓鱼 不适宜 啤酒 较适宜 夜生活 不宜 放风筝 较适宜 旅游 不宜 晨练 较适宜 洗车 较适宜 逛街 pm2.5 pm2.5省份列表 pm2.5城市列表', 'score': 0.8315952, 'raw_content': None}, {'url': 'https://weather.cma.cn/web/weather/58367.html', 'title': '中国气象局-天气预报-城市预报- 上海', 'content': '主站首页 领导主站 部门概况 新闻资讯 信息公开 服务办事 天气预报 首页 天气实况 气象公报 气象预警 城市预报 天气资讯 气象专题 气象科普 首页 国内 上海 徐家汇 国内  上海  徐家汇  更新 7天天气预报（2025/02/08 08:00发布） 星期六 无持续风向 微风 2℃ -2℃ 无持续风向 微风 星期日 无持续风向 微风 4℃ 0℃ 无持续风向 微风 星期一 多云 无持续风向 微风 9℃ 4℃ 无持续风向 微风 星期二 无持续风向 微风 12℃ 7℃ 小雨 无持续风向 微风 星期三 小雨 东北风 3~4级 10℃ 4℃ 北风 3~4级 星期四 无持续风向 微风 10℃ 2℃ 无持续风向 微风 星期五 小雨 无持续风向 微风 8℃ 4℃ 小雨 东风 3~4级 时间 11:00   14:00   17:00   20:00   23:00   02:00   05:00   08:00 天气                            气温 0.7℃    2℃  1.4℃    -0.2℃   -1.1℃   -1.8℃   -1.9℃   -1.1℃ 降水 无降水 无降水 无降水 无降水 无降水 无降水 无降水 无降水 风速 3.2m/s  3.3m/s  2.7m/s  2.6m/s  3.1m/s  3.3m/s  3m/s    3.2m/s 南方地区有较大范围雨雪天气 琼州海峡等海域有大雾 中东部地区有较大范围雨雪过程 琼州海峡等海域有大雾 中东部地区有较大范围雨雪过程', 'score': 0.7539928, 'raw_content': None}], 'response_time': 1.09}), AIMessage(content='根据搜索结果，上海的天气状况如下：今天是晴天，温度介于-2℃~2℃之间。下个星期的天气预报，周一多云，周二至周五有小雨。同时，近期空气质量良好，适宜出行。[天气信息来源](https://tianqi.moji.com/today/china/shanghai/shanghai)', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 116, 'prompt_tokens': 1358, 'total_tokens': 1474, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4-0613', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-16c1ee88-b95b-4030-9a76-45d22dcc8a48-0', usage_metadata={'input_tokens': 1358, 'output_tokens': 116, 'total_tokens': 1474, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]
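"""
HumanMessage: the user's input
AIMessage: a message returned by the LLM
ToolMessage: the result returned by a tool call
"""

# A simple check: use the AI message if it has content, otherwise fall back to the tool message
# Note: the agent's final answer is the last message in the list, response['messages'][-1]
def get_message(result):
    if result[1].content:
        return result[1].content
    return result[2].content

print(get_message(response['messages']))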
"""
[{"url": "https://weather.cma.cn/web/weather/58367.html", "content": "主站首页 领导主站 部门概况 新闻资讯 信息公开 服务办事 天气预报 首页 天气实况 气象公报 气象预警 城市预报 天气资讯 气象专题 气象科普 首页 国内 上海 徐家汇 国内  上海  徐家汇  更新 7天天气预报（2025/02/08 08:00发布） 星期六 无持续风向 微风 2℃ -2℃ 无持续风向 微风 星期日 无持续风向 微风 4℃ 0℃ 无持续风向 微风 星期一 多云 无持续风向 微风 9℃ 4℃ 无持续风向 微风 星期二 无持续风向 微风 12℃ 7℃ 小雨 无持续风向 微风 星期三 小雨 东北风 3~4级 10℃ 4℃ 北风 3~4级 星期四 无持续风向 微风 10℃ 2℃ 无持续风向 微风 星期五 小雨 无持续风向 微风 8℃ 4℃ 小雨 东风 3~4级 时间 11:00   14:00   17:00   20:00   23:00   02:00   05:00   08:00 天气                            气温 0.7℃    2℃  1.4℃    -0.2℃   -1.1℃   -1.8℃   -1.9℃   -1.1℃ 降水 无降水 无降水 无降水 无降水 无降水 无降水 无降水 无降水 风速 3.2m/s  3.3m/s  2.7m/s  2.6m/s  3.1m/s  3.3m/s  3m/s    3.2m/s 南方地区有较大范围雨雪天气 琼州海峡等海域有大雾 中东部地区有较大范围雨雪过程 琼州海峡等海域有大雾 中东部地区有较大范围雨雪过程"}]
"""
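
Because the message list ends with an AIMessage that carries the model's final answer, indexing fixed positions returns the raw tool output instead. A more robust approach, sketched here under assumptions, is to walk the list backwards and pick the last AI message with non-empty content. `final_answer` is a hypothetical helper, and the demo uses `SimpleNamespace` stand-ins that only mimic the `.type` and `.content` attributes of LangChain messages.

```python
from types import SimpleNamespace


def final_answer(messages):
    """Return the content of the last non-empty AI message, or None if there is none."""
    for msg in reversed(messages):
        if getattr(msg, "type", None) == "ai" and msg.content:
            return msg.content
    return None


# Stand-in objects (assumption): they imitate the shape of the agent's message list
demo = [
    SimpleNamespace(type="human", content="What is the weather in Shanghai today?"),
    SimpleNamespace(type="ai", content=""),  # tool-call request, empty content
    SimpleNamespace(type="tool", content='[{"url": "https://example.com", "content": "sunny"}]'),
    SimpleNamespace(type="ai", content="It is sunny in Shanghai today."),
]

print(final_answer(demo))
```

With the real agent, the same function would be called as `final_answer(response['messages'])`.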

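Building a Conversational RAG Application with LangChain

Use case: a sophisticated question-answering (Q&A) chatbot. The application can answer questions about specific source information, using a technique called Retrieval-Augmented Generation (RAG).

RAG is a way to extend an LLM's knowledge by bringing in additional data.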

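Install the packages

pip install langchain langchain-openai langchain-community langchain-chroma langchain-text-splitters beautifulsoup4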

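Implementation approach:

1. Load: first we need to load the data, which is done with DocumentLoaders.

2. Split: text splitters break large documents into smaller chunks. This helps both with indexing the data and with passing it to the model, because large chunks are harder to search and do not fit into the model's limited context window.

3. Store: we need somewhere to store and index the chunks so they can be searched later. This is usually done with a VectorStore and an Embeddings model.

4. Retrieve: given a user input, a retriever fetches the relevant chunks from storage.

5. Generate: a ChatModel/LLM generates the answer from a prompt that contains the question and the retrieved data.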
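
The five steps can be illustrated with a toy pipeline that uses only the standard library. This is a sketch under heavy assumptions: bag-of-words counts stand in for a real embedding model, a Python list stands in for the vector store, the final LLM call is left out (only the prompt is built), and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=80, overlap=20):
    """Split: fixed-size character chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Store: keep each chunk next to its embedding."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=2):
    """Retrieve: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Generate (prompt only): a real application would send this to the LLM."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Load: a hard-coded string stands in for a document loader
document = (
    "Task decomposition breaks a hard task into smaller steps. "
    "Chain of thought prompts the model to think step by step. "
    "Vector stores index embeddings for similarity search."
)
index = build_index(split_text(document))
question = "What is task decomposition?"
print(build_prompt(question, retrieve(index, question)))
```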

A simple implementation without chat history

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
import os
import bs4

key = ''
os.environ["OPENAI_API_KEY"] = key
model = ChatOpenAI(model='gpt-4')

# 1. Load the data: the content of a blog post
# web_paths: the URLs to scrape; several can be given at once
loader = WebBaseLoader(
    web_paths=['http://lilianweng.github.io//posts/2023-06-23-agent/'],
    bs_kwargs=dict(
        # class_ selects elements of the HTML page by CSS class
        parse_only=bs4.SoupStrainer(class_=('post-header', 'post-title', 'post-content'))
    )
)
desc = loader.load()  # parse the blog content into documents

# 2. Split the large text
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
# split_documents splits Document objects; use split_text for a plain string
res = splitter.split_documents(desc)

# 3. Store
vectorstore = Chroma.from_documents(documents=res, embedding=OpenAIEmbeddings())

# 4. Retriever
retriever = vectorstore.as_retriever()

# 5. Put it together
# 6. Create the prompt template for the question
system_prompt = """
You are an assistant for question-answering tasks. Use the following pieces of
retrieved context to answer the question. If you don't know how to answer, say that you don't know.
Use three sentences maximum and keep the answer concise.\n{context}
"""
prompt = ChatPromptTemplate.from_messages([
    ('system', system_prompt),
    # MessagesPlaceholder("chat_history"),  # chat history of earlier questions
    ('human', '{input}')
])

# 7. Build the chains
# Question-answering chain
chain_one = create_stuff_documents_chain(model, prompt)
# Retriever + question answering
chain_two = create_retrieval_chain(retriever, chain_one)

response = chain_two.invoke({'input': 'What is Task Decomposition'})
# print(response)
"""
{'input': 'What is Task Decomposition', 
'context': [Document(id='c85ca1e7-7370-41dd-bab1-53bfe4cc7d42', metadata={'source': 'http://lilianweng.github.io//posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.'), Document(id='7d97c299-2160-45da-8936-9466bb055e10', metadata={'source': 'http://lilianweng.github.io//posts/2023-06-23-agent/'}, page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.'), Document(id='a536873c-7abd-41e7-9b5f-53dc1ebe4198', metadata={'source': 'http://lilianweng.github.io//posts/2023-06-23-agent/'}, page_content='The AI assistant can parse user input to several tasks: [{"task": task, "id", task_id, "dep": dependency_task_ids, "args": {"text": text, "image": URL, "audio": URL, "video": URL}}]. 
The "dep" field denotes the id of the previous task which generates a new resource that the current task relies on. A special tag "-task_id" refers to the generated text image, audio and video in the dependency task with id as task_id. The task MUST be selected from the following options: {{ Available Task List }}. There is a logical relationship between tasks, please note their order. If the user input can\'t be parsed, you need to reply empty JSON. Here are several cases for your reference: {{ Demonstrations }}. The chat history is recorded as {{ Chat History }}. From this chat history, you can find the path of the user-mentioned resources for your task planning.'), Document(id='424ff333-58f5-4d6f-9c20-97f8ced11458', metadata={'source': 'http://lilianweng.github.io//posts/2023-06-23-agent/'}, page_content='Fig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\nThe system comprises of 4 stages:\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\nInstruction:')], 
'answer': 'Task Decomposition is a process where hard tasks are broken down into smaller and simpler steps. Techniques like Chain of Thought (CoT) and Tree of Thoughts play an important role in this respect. CoT prompts the model to think step by step, thereby transforming large tasks into multiple manageable ones. On the other hand, Tree of Thoughts expands on CoT and explores multiple reasoning paths at each step, creating a tree-like structure through which search processes like breadth-first search or depth-first search can be conducted. Decomposition can be achieved using language learning models with simple prompting, utilizing task-specific instructions, or with human inputs. The AI assistant is capable of parsing user input into several tasks, each with its own id, dependencies, and arguments.'}
"""
# The result we want is the 'answer' field
print(response['answer'])

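Normally, the chain we build uses the record of input questions and answers directly to carry the context. In this case, however, the query sent to the retriever also needs the conversation context in order to be understood.

Solution:

Add a sub-chain that takes the latest user question and the chat history, and rephrases the question whenever it refers to anything in the history. This can be thought of simply as building a new, history-aware retriever.

The purpose of this sub-chain is to bring the conversation context into the retrieval step.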
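
The idea can be sketched in plain Python before looking at the LangChain version. Everything here is an assumption made for illustration: `fake_rewrite` is a hard-coded stand-in for the LLM call that reformulates the question, and `contextualize`, `get_history`, and `ask` are hypothetical names. The point is only the control flow: without history the question goes to the retriever unchanged, and with history it is rewritten into a standalone question first.

```python
def fake_rewrite(history, question):
    """Hypothetical stand-in for the LLM that rewrites a follow-up question."""
    last_user_turn = next((text for role, text in reversed(history) if role == "human"), "")
    return f"{question} (in the context of: {last_user_turn})"


def contextualize(history, question, rewrite=fake_rewrite):
    # With no history there is nothing to resolve, so the question is used as-is
    if not history:
        return question
    return rewrite(history, question)


store = {}  # session_id -> list of (role, text) turns


def get_history(session_id):
    return store.setdefault(session_id, [])


def ask(session_id, question, retrieve):
    history = get_history(session_id)
    query = contextualize(history, question)   # history-aware query for the retriever
    docs = retrieve(query)
    answer = f"(answer from {len(docs)} docs)"  # placeholder for the generation step
    history.append(("human", question))
    history.append(("ai", answer))
    return query, answer


def echo_retriever(query):
    """Hypothetical retriever that just returns the query it received."""
    return [query]


print(ask("s1", "What is Task Decomposition?", echo_retriever)[0])
print(ask("s1", "What are common ways of doing it?", echo_retriever)[0])
```

The second call shows why the sub-chain matters: "it" on its own gives the retriever nothing to match, while the rewritten query names the topic explicitly.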

Complete conversational RAG

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.history_aware_retriever import create_history_aware_retriever
from langchain.chains.retrieval import create_retrieval_chain
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.chat_message_histories import ChatMessageHistory
import os
import bs4

key = ''
os.environ["OPENAI_API_KEY"] = key
model = ChatOpenAI(model='gpt-4')

# 1. Load the data: the content of a blog post
# web_paths: the URLs to scrape; several can be given at once
loader = WebBaseLoader(
    web_paths=['http://lilianweng.github.io//posts/2023-06-23-agent/'],
    bs_kwargs=dict(
        # class_ selects elements of the HTML page by CSS class
        parse_only=bs4.SoupStrainer(class_=('post-header', 'post-title', 'post-content'))
    )
)
desc = loader.load()  # parse the blog content into documents

# 2. Split the large text
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
# split_documents splits Document objects; use split_text for a plain string
res = splitter.split_documents(desc)

# 3. Store
vectorstore = Chroma.from_documents(documents=res, embedding=OpenAIEmbeddings())

# 4. Retriever
retriever = vectorstore.as_retriever()

# 5. Put it together
# 6. Create the prompt template for the question
system_prompt = """
You are an assistant for question-answering tasks. Use the following pieces of
retrieved context to answer the question. If you don't know how to answer, say that you don't know.
Use three sentences maximum and keep the answer concise.\n{context}
"""
prompt = ChatPromptTemplate.from_messages([
    ('system', system_prompt),
    MessagesPlaceholder("chat_history"),  # chat history of earlier questions
    ('human', '{input}')
])

# 7. Build the chains
# Question-answering chain
chain_one = create_stuff_documents_chain(model, prompt)

# Create the sub-chain
# 1. Prompt template of the sub-chain
contextualize_q_system_prompt = """Given a chat history and the latest user question
which might reference context in the chat history,
formulate a standalone question which can be understood
without the chat history. Do not answer the question,
just reformulate it if needed and otherwise return it as is.
"""
retriever_history_temp = ChatPromptTemplate.from_messages([
    ('system', contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ('human', '{input}')
])
# 2. Build the sub-chain
history_chain = create_history_aware_retriever(model, retriever, retriever_history_temp)

# Keep the question-and-answer history per session
store = {}

def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

# Create the parent chain that combines the two chains above
chain = create_retrieval_chain(history_chain, chain_one)

result_chain = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key='input',
    history_messages_key='chat_history',
    output_messages_key='answer'
)

# Round 1
res_1 = result_chain.invoke(
    {'input': 'What is Task Decomposition?'},
    config={'configurable': {'session_id': '123456'}}
)
print(res_1['answer'])

# Round 2
res_2 = result_chain.invoke(
    {'input': 'What are common ways of doing it?'},
    config={'configurable': {'session_id': '123456'}}
)
print(res_2['answer'])

# Round 3: ask something unrelated to the document
res_3 = result_chain.invoke(
    {'input': 'li si?'},
    config={'configurable': {'session_id': '123456'}}
)
print(res_3['answer'])
# The AI cannot answer questions unrelated to the document
# I'm sorry, but the question or phrase "li si" is unclear. Could you provide more context or clarify what you're asking?

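Earlier template definitions used the user role, while the ones here use human. Since the two differ, I was curious whether there is any real difference and looked into it. The LLM's explanation was as follows:

When building a chat prompt template (ChatPromptTemplate) with a library such as LangChain, user and human are both placeholders that mark a message as coming from the user, although they may have different meanings or uses in specific implementations or library versions. In the two code snippets you provided, however, they represent the same concept: the part that holds the user's input.

In the first snippet, ('user', '{text}') means the message is sent by the user, with {text} as its actual content.
In the second snippet, ('human', '{input}') also refers to a message sent by the user, with {input} standing for the user's specific input.

In general, the two terms (user and human) can be used interchangeably, especially across different libraries or contexts. A specific library or framework may still prefer one of them: some consistently use user for input from the user, while others favor human. If the library you are using distinguishes between the two, consult its documentation for the exact difference.

So in your examples, unless the library documentation states a difference, we can assume the two mean the same thing here: both mark the part supplied by the user. In practice, choose according to the guidelines of the tool or library you use, and when in doubt, check its latest documentation.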