[Andrew Ng, deeplearning.ai] LangChain for LLM Application Development (Part 1)

The following notes are compiled from the deeplearning.ai course of the same name.

Course link

DLAI - Learning Platform Beta (deeplearning.ai)

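I. What is LangChain

1. Introduction to LangChain

LangChain is a framework for developing applications powered by large language models. Its developers believe that the most powerful and differentiated applications will not only call a language model but will also be:
Data-aware: they connect the language model to other data sources.
Agentic: they allow the language model to interact with its environment.

LangChain is available for Python and JavaScript and focuses on composition and modularity.
Official documentation: https://python.langchain.com/en/latest/
Chinese documentation: https://www.langchain.com.cn/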

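2. LangChain's modular capabilities

Integrations with a large number of LLMs and chat models, plus prompt templates, output parsers, and example selectors.

Retrieval from and calls to other data sources, including but not limited to text and arrays, with several data retrieval tools.

Chain templates that take an input and automatically produce a standardized, processed output.

Access to a range of preset or custom algorithms and tools.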

II. Models, Prompts and Output Parsers

1. Prompt templates

Typically, we call GPT directly like this:

import openai

# Helper that sends a single user prompt to the chat model and returns the reply text
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message["content"]

# Write the prompt (`style` and `customer_email` must already be defined as strings)
prompt = f"""Translate the text \
that is delimited by triple backticks
into a style that is {style}.
text: ```{customer_email}```
"""

# Call the helper to generate the result
response = get_completion(prompt)

Now let's see how the same call is made through LangChain's model wrapper:

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

# Load the LangChain chat model and set the temperature (randomness) to 0
chat = ChatOpenAI(temperature=0.0)

# Define the template text
template_string = """Translate the text \
that is delimited by triple backticks \
into a style that is {style}. \
text: ```{text}```
"""

# Build a prompt template from the template text
prompt_template = ChatPromptTemplate.from_template(template_string)

# Define the values for the template's variables
customer_style = """American English \
in a calm and respectful tone
"""
customer_email = """
Arrr, I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie! And to make matters worse, \
the warranty don't cover the cost of \
cleaning up me kitchen. I need yer help \
right now, matey!
"""

# Fill in the template variables to produce the final prompt messages
customer_messages = prompt_template.format_messages(style=customer_style, text=customer_email)

# Send the messages to the model and get the response
customer_response = chat(customer_messages)

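Because the prompt template contains variables, new prompts can be generated for different scenarios simply by changing the variable values, which makes the template reusable.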
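To make the reuse concrete without calling any API, here is a minimal pure-Python sketch that uses `str.format` as a stand-in for the template mechanism. The helper name `build_prompt`, the angle-bracket delimiter, and the second style are illustrative assumptions, not part of the course code or of LangChain.

```python
# Illustrative stand-in for a prompt template: plain str.format, no LangChain required.
# Angle brackets replace the triple backticks purely to keep this example easy to read.
TEMPLATE = (
    "Translate the text that is delimited by triple angle brackets "
    "into a style that is {style}. text: <<<{text}>>>"
)

def build_prompt(style: str, text: str) -> str:
    """Fill the template's variables and return the final prompt (hypothetical helper)."""
    return TEMPLATE.format(style=style, text=text)

email = "Arrr, I be fuming that me blender lid flew off!"

# One template, two different prompts: only the variable values change.
calm_prompt = build_prompt("American English in a calm and respectful tone", email)
pirate_prompt = build_prompt("a polite tone that speaks in English Pirate", email)

print(calm_prompt)
print(pirate_prompt)
```

The template text is written once, and each new scenario only supplies different values for `style` and `text`.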

2. Output parsers

An output parser converts the text generated by the large language model into a structured output, such as a dictionary or a list.

from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

# Define one schema per field to extract
gift_schema = ResponseSchema(
    name="gift",
    description="Was the item purchased as a gift for someone else? "
                "Answer True if yes, False if not or unknown.",
)
delivery_days_schema = ResponseSchema(
    name="delivery_days",
    description="How many days did it take for the product to arrive? "
                "If this information is not found, output -1.",
)
price_value_schema = ResponseSchema(
    name="price_value",
    description="Extract any sentences about the value or price, "
                "and output them as a comma separated Python list.",
)
response_schemas = [gift_schema, delivery_days_schema, price_value_schema]

# Build the parser and generate the formatting instructions from the schemas
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
format_instructions = output_parser.get_format_instructions()

# Create a prompt template that embeds the formatting instructions
review_template_2 = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price, \
and output them as a comma separated Python list.

text: {text}

{format_instructions}
"""
prompt = ChatPromptTemplate.from_template(template=review_template_2)

# Fill in the template (`customer_review` is the review text to analyze; it is not shown in these notes)
messages = prompt.format_messages(text=customer_review, format_instructions=format_instructions)

# Generate the result
response = chat(messages)

# Parse the response text into a dictionary
output_dict = output_parser.parse(response.content)
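The final parsing step can be pictured with a small standard-library sketch: find the JSON object inside the fenced snippet of the reply and load it into a dictionary. The helper `parse_json_reply` and the sample reply are made up for illustration; this is not how `StructuredOutputParser` is implemented, only the general idea.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built programmatically to keep this example readable

def parse_json_reply(reply: str) -> dict:
    """Extract the JSON object inside a fenced json snippet and return it as a dict.

    Hypothetical helper for illustration; not LangChain's implementation.
    """
    pattern = re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no fenced json snippet found in the reply")
    return json.loads(match.group(1))

# Made-up model reply, shaped like the format the instructions ask for.
sample_reply = (
    FENCE + "json\n"
    '{"gift": "True", "delivery_days": "2", "price_value": "It is slightly more expensive."}\n'
    + FENCE
)

output = parse_json_reply(sample_reply)
print(type(output).__name__)
print(output["delivery_days"])
```

Once the reply is a dictionary, downstream code can read fields by key instead of searching through free text.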

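III. Memory

When a large language model is called through an API, it does not automatically remember earlier questions and answers or any other context. LangChain's memory components provide several ways to store that history and supply it to the model.

Outline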

ConversationBufferMemory
ConversationBufferWindowMemory
ConversationTokenBufferMemory
ConversationSummaryMemory

ConversationBufferMemory (conversation memory)

# Load the required packages
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# Create the model, a memory buffer for the context, and a conversation chain that uses both
llm = ChatOpenAI(temperature=0.0)
memory = ConversationBufferMemory()
conversation = ConversationChain(llm=llm, memory=memory, verbose=True)

# Each call adds to the conversation; the question and its answer are saved to memory automatically
conversation.predict(input="Hi, my name is Andrew")
conversation.predict(input="What is 1+1?")
conversation.predict(input="What is my name?")

# Show the conversation stored in memory
print(memory.buffer)
memory.load_memory_variables({})

# Write a question/answer pair directly into memory
memory.save_context({"input": "Hi"}, {"output": "What's up"})
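The idea behind a conversation buffer can be sketched in a few lines of plain Python: every exchange is appended to a list, and the whole list is rendered into the next prompt. `SimpleBufferMemory` is a hypothetical toy class written for this note, not LangChain's implementation.

```python
class SimpleBufferMemory:
    """Toy conversation buffer (hypothetical class, not LangChain's implementation)."""

    def __init__(self) -> None:
        self.turns = []  # list of (human, ai) pairs, oldest first

    def save_context(self, human: str, ai: str) -> None:
        """Store one question/answer pair."""
        self.turns.append((human, ai))

    def history(self) -> str:
        """Render every stored turn as text that can be prepended to the next prompt."""
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

memory = SimpleBufferMemory()
memory.save_context("Hi, my name is Andrew", "Hello Andrew!")
memory.save_context("What is 1+1?", "1+1 is 2.")

# The next prompt carries the full history, which is how the model can "remember" the name.
next_prompt = memory.history() + "\nHuman: What is my name?\nAI:"
print(next_prompt)
```

Because the full history travels with every request, the prompt grows with each turn, which is the motivation for the bounded variants that follow.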

ConversationBufferWindowMemory (windowed conversation memory)

# Load the component
from langchain.memory import ConversationBufferWindowMemory

# Create a memory that holds only one exchange (k=1)
memory = ConversationBufferWindowMemory(k=1)
memory.save_context({"input": "Hi"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
# At this point the memory holds only the second exchange: with k=1, only the single most recent question/answer pair is kept.
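A sliding window of this kind maps naturally onto `collections.deque` with a `maxlen`. The sketch below is a simplified illustration; the class name `WindowMemory` is an assumption made for this note and does not reflect how LangChain implements the feature.

```python
from collections import deque

class WindowMemory:
    """Toy windowed memory keeping only the last k exchanges (hypothetical class)."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen=k silently discards the oldest pair once it is full.
        self.turns = deque(maxlen=k)

    def save_context(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def history(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

memory = WindowMemory(k=1)
memory.save_context("Hi", "What's up")
memory.save_context("Not much, just hanging", "Cool")
print(memory.history())
```

With `k=1` the first exchange is gone after the second one is saved; a larger `k` trades more context for a longer prompt.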

ConversationTokenBufferMemory (token-limited memory)

# Load the components
from langchain.memory import ConversationTokenBufferMemory
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(temperature=0.0)

# Create a memory capped at 30 tokens (tokens are counted with the model's tokenizer, so the llm must be passed in)
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=30)
memory.save_context({"input": "AI is what?!"}, {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"}, {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, {"output": "Charming!"})

# The memory keeps only the most recent messages that fit within 30 tokens. It does not guarantee
# complete question/answer pairs, so an answer may remain without its question.
memory.load_memory_variables({})
# Result: {'history': 'AI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}
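The pruning behaviour seen in that result (an answer surviving without its question) can be reproduced with a rough sketch. Two assumptions are made here for illustration only: tokens are approximated by whitespace-separated words rather than a real tokenizer, and the limit of 8 is chosen to suit that crude count. The function names are hypothetical.

```python
def count_tokens(message: str) -> int:
    """Crude token count: whitespace-separated words (assumption for illustration)."""
    return len(message.split())

def prune_to_limit(messages: list, max_tokens: int) -> list:
    """Drop the oldest messages one at a time until the total fits within max_tokens."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # messages are removed individually, so a pair can be split
    return kept

messages = [
    "Human: AI is what?!",
    "AI: Amazing!",
    "Human: Backpropagation is what?",
    "AI: Beautiful!",
    "Human: Chatbots are what?",
    "AI: Charming!",
]

kept = prune_to_limit(messages, max_tokens=8)
print("\n".join(kept))
```

Because messages are dropped one by one from the oldest end, the cut can fall in the middle of a question/answer pair.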

ConversationSummaryMemory (summary memory)

# Load the package (this example uses the ConversationSummaryBufferMemory class)
from langchain.memory import ConversationSummaryBufferMemory

# A long piece of content
schedule = "There is a meeting at 8am with your product team. \
You will need your powerpoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because Langchain is such a powerful tool. \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

# Create a summary memory capped at 100 tokens (an LLM writes the summary, so the model must be passed in)
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)

# Add conversation turns
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"}, {"output": f"{schedule}"})

# The stored history is a summary that condenses the conversation to fit within 100 tokens:
# {'history': "System: The human and AI engage in small talk before discussing the day's schedule. The AI informs the human of a morning meeting with the product team, time to work on the LangChain project, and a lunch meeting with a customer interested in the latest AI developments."}
memory.load_memory_variables({})

conversation = ConversationChain(llm=llm, memory=memory, verbose=True)
conversation.predict(input="What would be a good demo to show?")
# Note: when the summary memory is used inside a conversation, the most recent AI answer is kept verbatim (not summarized)
# while the rest of the conversation is summarized. This is probably done to produce better answers: the latest AI answer carries a lot of value and should not be condensed.
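One way to picture this "summary plus recent messages" behaviour is the toy model below. It is a sketch under stated assumptions, not LangChain's algorithm: tokens are approximated by whitespace-separated words, and `fake_summarize` is a hypothetical stand-in that merely counts condensed messages where a real system would ask the LLM to write a summary.

```python
def count_tokens(text: str) -> int:
    """Crude token count: whitespace-separated words (assumption for illustration)."""
    return len(text.split())

def fake_summarize(summary: str, dropped: list) -> str:
    """Stand-in for the LLM summarizer: records only how many messages were condensed."""
    previously = int(summary.split()[0]) if summary else 0
    return f"{previously + len(dropped)} earlier messages condensed"

class SummaryBufferMemory:
    """Toy summary-plus-buffer memory (hypothetical class, not LangChain's implementation)."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.summary = ""
        self.buffer = []

    def save_context(self, human: str, ai: str) -> None:
        self.buffer += [f"Human: {human}", f"AI: {ai}"]
        dropped = []
        # Move the oldest messages out of the verbatim buffer until it fits the limit.
        while self.buffer and sum(count_tokens(m) for m in self.buffer) > self.max_tokens:
            dropped.append(self.buffer.pop(0))
        if dropped:
            self.summary = fake_summarize(self.summary, dropped)

    def history(self) -> str:
        parts = ([f"System: {self.summary}"] if self.summary else []) + self.buffer
        return "\n".join(parts)

memory = SummaryBufferMemory(max_tokens=6)
memory.save_context("Hello", "What's up")
memory.save_context("Not much, just hanging", "Cool")
print(memory.history())
```

In this toy run the three oldest messages are folded into the summary line while the latest AI answer stays verbatim, which mirrors the behaviour described in the note above.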

IV. Chains

Outline

LLMChain
Sequential Chains
- SimpleSequentialChain
- SequentialChain
Router Chain

LLMChain (basic chain)

# Load the packages
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain
llm = ChatOpenAI(temperature=0.9)

# Create a prompt with the variable `product`
prompt = ChatPromptTemplate.from_template(
    "What is the best name to describe a company that makes {product}?"
)

# Create a basic chain from the model and the prompt
chain = LLMChain(llm=llm, prompt=prompt)

# Assign the prompt variable and get the answer
product = "Queen Size Sheet Set"
chain.run(product)
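Conceptually, a basic chain is just "fill in the template, then call the model". The sketch below shows that two-step flow with a fake model so it runs offline; `SimpleChain` and `fake_llm` are hypothetical names introduced for illustration and are not LangChain classes.

```python
from typing import Callable

def fake_llm(prompt: str) -> str:
    """Stand-in for a model call: echoes the prompt so the flow is visible (assumption)."""
    return f"[model reply to: {prompt}]"

class SimpleChain:
    """Toy chain: prompt template plus model callable (hypothetical class)."""

    def __init__(self, llm: Callable[[str], str], template: str) -> None:
        self.llm = llm
        self.template = template

    def run(self, **variables: str) -> str:
        prompt = self.template.format(**variables)  # step 1: fill in the template
        return self.llm(prompt)                     # step 2: send the prompt to the model

chain = SimpleChain(fake_llm, "What is the best name to describe a company that makes {product}?")
result = chain.run(product="Queen Size Sheet Set")
print(result)
```

Packaging the two steps in one object is what lets chains be combined with each other later.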

SimpleSequentialChain

A SimpleSequentialChain passes the output of one chain to the next chain as its input. It has exactly one input variable and one output variable.

# Load the package
from langchain.chains import SimpleSequentialChain
llm = ChatOpenAI(temperature=0.9)

# Prompt template 1, with the variable `product`
first_prompt = ChatPromptTemplate.from_template(
    "What is the best name to describe a company that makes {product}?"
)
# Chain 1
chain_one = LLMChain(llm=llm, prompt=first_prompt)

# Prompt template 2, with the variable `company_name`
second_prompt = ChatPromptTemplate.from_template(
    "Write a 20 words description for the following company:{company_name}"
)
# Chain 2
chain_two = LLMChain(llm=llm, prompt=second_prompt)

# Combine chain 1 and chain 2 and get the result (`product` is the string defined in the LLMChain example)
overall_simple_chain = SimpleSequentialChain(chains=[chain_one, chain_two], verbose=True)
overall_simple_chain.run(product)
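The single-input, single-output piping can be sketched as ordinary function composition. The two step functions below are made-up stand-ins for model-backed chains, and `run_simple_sequence` is a hypothetical helper for illustration.

```python
def name_company(product: str) -> str:
    """Stand-in for chain one (a real chain would ask the model for a name)."""
    return f"{product} Co."

def describe_company(company_name: str) -> str:
    """Stand-in for chain two (a real chain would ask the model for a description)."""
    return f"{company_name} makes bedding you will love."

def run_simple_sequence(steps, first_input: str) -> str:
    """Feed each step's single output into the next step as its single input."""
    value = first_input
    for step in steps:
        value = step(value)
    return value

result = run_simple_sequence([name_company, describe_company], "Queen Size Sheet Set")
print(result)
```

Since each step sees only the previous step's output, this shape cannot express a step that needs the original input again, which is where named variables become useful.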

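SequentialChain

A SequentialChain contains several chains, and the outputs of some chains can serve as inputs to others. It supports multiple input and output variables.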

# Load the package
from langchain.chains import SequentialChain
llm = ChatOpenAI(temperature=0.9)

# Chain 1: input Review, output English_Review
first_prompt = ChatPromptTemplate.from_template(
    "Translate the following review to english:"
    "\n\n{Review}"
)
chain_one = LLMChain(llm=llm, prompt=first_prompt, output_key="English_Review")

# Chain 2: input English_Review, output summary
second_prompt = ChatPromptTemplate.from_template(
    "Can you summarize the following review in 1 sentence:"
    "\n\n{English_Review}"
)
chain_two = LLMChain(llm=llm, prompt=second_prompt, output_key="summary")

# Chain 3: input Review, output language
third_prompt = ChatPromptTemplate.from_template(
    "What language is the following review:\n\n{Review}"
)
chain_three = LLMChain(llm=llm, prompt=third_prompt, output_key="language")

# Chain 4: inputs summary and language, output followup_message
fourth_prompt = ChatPromptTemplate.from_template(
    "Write a follow up response to the following "
    "summary in the specified language:"
    "\n\nSummary: {summary}\n\nLanguage: {language}"
)
chain_four = LLMChain(llm=llm, prompt=fourth_prompt, output_key="followup_message")

# Build the full chain: input Review, outputs English_Review, summary and followup_message
overall_chain = SequentialChain(
    chains=[chain_one, chain_two, chain_three, chain_four],
    input_variables=["Review"],
    output_variables=["English_Review", "summary", "followup_message"],
    verbose=True,
)

# Run the chain (`review` is the non-English review text to process; it is not shown in these notes)
overall_chain(review)
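The named-variable wiring can be pictured as steps reading from and writing to one shared dictionary. In the sketch below, the step tuple layout, the helper `run_named_sequence`, the lambda stand-ins, and the sample review are all assumptions made for illustration; real chains would call the model at each step.

```python
def run_named_sequence(steps, inputs: dict, output_keys: list) -> dict:
    """Run steps in order over a shared dict of named variables.

    Each step is (input_keys, output_key, function); hypothetical structure for illustration.
    """
    variables = dict(inputs)
    for input_keys, output_key, func in steps:
        variables[output_key] = func(*(variables[k] for k in input_keys))
    # Return only the variables that the caller asked for.
    return {key: variables[key] for key in output_keys}

# Stand-in functions; real chains would call the model instead.
steps = [
    (["Review"], "English_Review", lambda review: f"EN({review})"),
    (["English_Review"], "summary", lambda english: f"SUM({english})"),
    (["Review"], "language", lambda review: "French"),
    (["summary", "language"], "followup_message", lambda s, lang: f"Reply in {lang} to {s}"),
]

result = run_named_sequence(
    steps,
    inputs={"Review": "Je trouve le gout mediocre."},  # made-up sample review
    output_keys=["English_Review", "summary", "followup_message"],
)
print(result["followup_message"])
```

Note how the third step reads the original `Review` again and the fourth step combines two earlier outputs, neither of which is possible when each step receives only its predecessor's output.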

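Router Chain

A router chain works like an if/else branch: based on the input, it picks the matching route (path) and hands the input to the chain on that route. The router chain as a whole typically takes one input and produces one output.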
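The dispatch pattern itself can be sketched without any model. In the toy version below, a keyword check stands in for the routing decision, which in LangChain is made by an LLM; every function name and keyword rule here is a hypothetical choice for illustration.

```python
def physics_chain(question: str) -> str:
    return f"[physics] {question}"

def math_chain(question: str) -> str:
    return f"[math] {question}"

def default_chain(question: str) -> str:
    return f"[default] {question}"

DESTINATIONS = {"physics": physics_chain, "math": math_chain}

def choose_destination(question: str) -> str:
    """Keyword-based stand-in for the router; in LangChain an LLM makes this choice."""
    lowered = question.lower()
    if "radiation" in lowered or "gravity" in lowered:
        return "physics"
    if any(ch.isdigit() for ch in lowered):
        return "math"
    return "DEFAULT"

def route(question: str) -> str:
    """Dispatch to the chosen destination, falling back to the default chain."""
    handler = DESTINATIONS.get(choose_destination(question), default_chain)
    return handler(question)

print(route("What is black body radiation?"))
print(route("What is 2 + 2?"))
print(route("Why does every cell contain DNA?"))
```

The fallback matters: any input that matches no destination still gets an answer from the default chain instead of raising an error.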

# Create four prompt templates, one per subject
physics_template = """You are a very smart physics professor. \
You are great at answering questions about physics in a concise \
and easy to understand manner. \
When you don't know the answer to a question you admit \
that you don't know.

Here is a question:
{input}"""

math_template = """You are a very good mathematician. \
You are great at answering math questions. \
You are so good because you are able to break down \
hard problems into their component parts,
answer the component parts, and then put them together \
to answer the broader question.

Here is a question:
{input}"""

history_template = """You are a very good historian. \
You have an excellent knowledge of and understanding of people, \
events and contexts from a range of historical periods. \
You have the ability to think, reflect, debate, discuss and \
evaluate the past. You have a respect for historical evidence \
and the ability to make use of it to support your explanations \
and judgements.

Here is a question:
{input}"""

computerscience_template = """You are a successful computer scientist. \
You have a passion for creativity, collaboration, \
forward-thinking, confidence, strong problem-solving capabilities, \
understanding of theories and algorithms, and excellent communication \
skills. You are great at answering coding questions. \
You are so good because you know how to solve a problem by \
describing the solution in imperative steps \
that a machine can easily interpret and you know how to \
choose a solution that has a good balance between \
time complexity and space complexity.

Here is a question:
{input}"""

# Key information about each prompt: name, description, and template
prompt_infos = [
    {"name": "physics",
     "description": "Good for answering questions about physics",
     "prompt_template": physics_template},
    {"name": "math",
     "description": "Good for answering math questions",
     "prompt_template": math_template},
    {"name": "History",
     "description": "Good for answering history questions",
     "prompt_template": history_template},
    {"name": "computer science",
     "description": "Good for answering computer science questions",
     "prompt_template": computerscience_template},
]

# Load the packages
from langchain.chains.router import MultiPromptChain
from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser
from langchain.prompts import PromptTemplate
llm = ChatOpenAI(temperature=0)

# Build one chain per prompt and store the four chains in destination_chains
destination_chains = {}
for p_info in prompt_infos:
    name = p_info["name"]
    prompt_template = p_info["prompt_template"]
    prompt = ChatPromptTemplate.from_template(template=prompt_template)
    chain = LLMChain(llm=llm, prompt=prompt)
    destination_chains[name] = chain

# Describe the available destinations as text for the router prompt
destinations = [f"{p['name']}: {p['description']}" for p in prompt_infos]
destinations_str = "\n".join(destinations)

# Create the default prompt and chain, used when no destination fits
default_prompt = ChatPromptTemplate.from_template("{input}")
default_chain = LLMChain(llm=llm, prompt=default_prompt)

# Create the router prompt template, with the two variables `destinations` and `input`
MULTI_PROMPT_ROUTER_TEMPLATE = """Given a raw text input to a \
language model select the model prompt best suited for the input. \
You will be given the names of the available prompts and a \
description of what the prompt is best suited for. \
You may also revise the original input if you think that revising \
it will ultimately lead to a better response from the language model.

<< FORMATTING >>
Return a markdown code snippet with a JSON object formatted to look like:
```json
{{{{
    "destination": string \\ name of the prompt to use or "DEFAULT"
    "next_inputs": string \\ a potentially modified version of the original input
}}}}
```

REMEMBER: "destination" MUST be one of the candidate prompt \
names specified below OR it can be "DEFAULT" if the input is not \
well suited for any of the candidate prompts.
REMEMBER: "next_inputs" can just be the original input \
if you don't think any modifications are needed.

<< CANDIDATE PROMPTS >>
{destinations}

<< INPUT >>
{{input}}

<< OUTPUT (remember to include the ```json)>>"""

# Fill in the `destinations` variable of the router template
router_template = MULTI_PROMPT_ROUTER_TEMPLATE.format(destinations=destinations_str)

# Wrap the result in a PromptTemplate whose remaining variable is `input`
router_prompt = PromptTemplate(
    template=router_template,
    input_variables=["input"],
    output_parser=RouterOutputParser(),
)

# Build the router chain and the overall multi-prompt chain
router_chain = LLMRouterChain.from_llm(llm, router_prompt)
chain = MultiPromptChain(
    router_chain=router_chain,
    destination_chains=destination_chains,
    default_chain=default_chain,
    verbose=True,
)

# Run the chain (replace "xxx" with the question to ask)
chain.run("xxx")