Automated Testing for LLMOps 01:使用CircleCI进行持续集成CI

Automated Testing for LLMOps

这是学习https://www.deeplearning.ai/short-courses/automated-testing-llmops/ 这门课的笔记

Learn how LLM-based testing differs from traditional software testing and implement rules-based testing to assess your LLM application.

Build model-graded evaluations to test your LLM application using an evaluation LLM.

Automate your evals (rules-based and model-graded) using continuous integration tools from CircleCI.

文章目录

  • Automated Testing for LLMOps
  • Lesson 1: Introduction to Continuous Integration (CI)
  • Lesson 2: Overview of Automated Evals
    • Load API tokens for our 3rd party APIs.
    • Set up our github branch
    • The sample application: AI-powered quiz generator
      • Evaluations
    • Running evaluations in a CircleCI pipeline
      • The CircleCI config file
    • Run the per-commit evals

Lesson 1: Introduction to Continuous Integration (CI)

ChatGPT对持续集成的介绍:

持续集成(Continuous Integration,简称CI)是一种软件开发实践,旨在通过频繁地将代码集成到共享存储库中,并通过自动化构建和测试过程来尽早地发现集成错误。CI的主要目标是减少集成问题的出现,以便团队能够更快地交付高质量的软件。

在持续集成的实践中,开发人员通常会将其代码提交到共享存储库(如版本控制系统),然后CI服务器会自动检测到这些变更,并触发一系列的构建和测试任务。这些任务可以包括编译代码、运行单元测试、执行静态代码分析等。如果构建或测试失败,CI系统会通知团队成员,以便他们可以及时修复问题。如果一切顺利,则可以持续集成(将新的代码更改合并到主分支)。

持续集成的优势包括:

  1. 提早发现问题:由于每次提交都会触发自动化的构建和测试,因此集成问题可以更早地被发现和解决,避免了将问题延迟到开发周期后期的情况。

  2. 提高软件质量:通过频繁地运行测试和自动化代码检查,可以确保代码质量更高,减少了潜在的缺陷。

  3. 加速交付速度:持续集成使得团队能够更快地交付新功能和修复bug,因为团队可以更加自信地修改代码,知道集成问题会被及时发现。

  4. 提高团队协作:CI促进了团队成员之间的交流和协作,因为他们需要共同努力确保代码的可集成性和质量。

综上所述,持续集成是现代软件开发中不可或缺的一环,它通过自动化和频繁的集成实践,帮助团队更快地构建高质量的软件。

持续集成的含义

在这里插入图片描述

持续集成的步骤

在这里插入图片描述

持续集成的好处

在这里插入图片描述

Lesson 2: Overview of Automated Evals

传统软件的测试和大语言模型LLM的测试区别

在这里插入图片描述

Model Evaluations(Evals)

在这里插入图片描述

Automating Evals

在这里插入图片描述

import warnings
warnings.filterwarnings('ignore')

看一下具体实例

在这里插入图片描述

Load API tokens for our 3rd party APIs.

from utils import get_circle_api_key
cci_api_key = get_circle_api_key()from utils import get_gh_api_key
gh_api_key = get_gh_api_key()from utils import get_openai_api_key
openai_api_key = get_openai_api_key()

utils.py文件如下

import github
import os
import requests
import random
from dotenv import load_dotenv, find_dotenv
from yaml import safe_dump, safe_load
import timeadjectives = ["adoring","affirmative","appreciated","available","best-selling","blithe","brightest","charismatic","convincing","dignified","ecstatic","effective","engaging","enterprising","ethical","fast-growing","glad","hardy","idolized","improving","jubilant","knowledgeable","long-lasting","lucky","marvelous","merciful","mesmerizing","problem-free","resplendent","restored","roomier","serene","sharper","skilled","smiling","smoother","snappy","soulful","staunch","striking","strongest","subsidized","supported","supporting","sweeping","terrific","unaffected","unbiased","unforgettable","unrivaled",
]nouns = ["agustinia","apogee","bangle","cake","cheese","clavicle","client","clove","curler","draw","duke","earl","eustoma","fireplace","gem","glove","goal","ground","jasmine","jodhpur","laugh","message","mile","mockingbird","motor","phalange","pillow","pizza","pond","potential","ptarmigan","puck","puzzle","quartz","radar","raver","saguaro","salary","sale","scarer","skunk","spatula","spectacles","statistic","sturgeon","tea","teacher","wallet","waterfall","wrinkle",
]def inspect_config():with open("circle_config.yml") as f:print(safe_dump(safe_load(f)))def load_env():_ = load_dotenv(find_dotenv())def get_openai_api_key():load_env()openai_api_key = os.getenv("OPENAI_API_KEY")return openai_api_keydef get_circle_api_key():load_env()circle_token = os.getenv("CIRCLE_TOKEN")return circle_tokendef get_gh_api_key():load_env()github_token = os.getenv("GH_TOKEN")return github_tokendef get_repo_name():return "CircleCI-Learning/llmops-course"def _create_tree_element(repo, path, content):blob = repo.create_git_blob(content, "utf-8")element = github.InputGitTreeElement(path=path, mode="100644", type="blob", sha=blob.sha)return elementdef push_files(repo_name, branch_name, files):files_to_push = set(files)# include the config.yml fileg = github.Github(os.environ["GH_TOKEN"])repo = g.get_repo(repo_name)elements = []config_element = _create_tree_element(repo, ".circleci/config.yml", open("circle_config.yml").read())elements.append(config_element)requirements_element = _create_tree_element(repo, "requirements.txt", open("dev_requirements.txt").read())elements.append(requirements_element)for file in files_to_push:print(f"uploading {file}")with open(file, encoding="utf-8") as f:content = f.read()element = _create_tree_element(repo, file, content)elements.append(element)head_sha = repo.get_branch("main").commit.shaprint(f"pushing files to: {branch_name}")try:repo.create_git_ref(ref=f"refs/heads/{branch_name}", sha=head_sha)time.sleep(2)except Exception as _:print(f"{branch_name} already exists in the repository pushing updated changes")branch_sha = repo.get_branch(branch_name).commit.shabase_tree = repo.get_git_tree(sha=branch_sha)tree = repo.create_git_tree(elements, base_tree)parent = repo.get_git_commit(sha=branch_sha)commit = repo.create_git_commit("Trigger CI evaluation pipeline", tree, [parent])branch_refs = repo.get_git_ref(f"heads/{branch_name}")branch_refs.edit(sha=commit.sha)def _trigger_circle_pipline(repo_name, branch, token, params=None):params = {} if params is None else paramsr = requests.post(f"{os.getenv('DLAI_CIRCLE_CI_API_BASE', 'https://circleci.com')}/api/v2/project/gh/{repo_name}/pipeline",headers={"Circle-Token": f"{token}", "accept": "application/json"},json={"branch": branch, "parameters": params},)pipeline_data = r.json()pipeline_number = pipeline_data["number"]print(f"Please visit https://app.circleci.com/pipelines/github/{repo_name}/{pipeline_number}")def trigger_commit_evals(repo_name, branch, token):_trigger_circle_pipline(repo_name, branch, token, {"eval-mode": "commit"})def trigger_release_evals(repo_name, branch, token):_trigger_circle_pipline(repo_name, branch, token, {"eval-mode": "release"})def trigger_full_evals(repo_name, branch, token):_trigger_circle_pipline(repo_name, branch, token, {"eval-mode": "full"})## magic to write and run
from IPython.core.magic import register_cell_magic@register_cell_magic
def write_and_run(line, cell):argz = line.split()file = argz[-1]mode = "w"if len(argz) == 2 and argz[0] == "-a":mode = "a"with open(file, mode) as f:f.write(cell)get_ipython().run_cell(cell)def get_branch() -> str:"""Generate a random branch name."""prefix = "dl-cci"adjective = random.choice(adjectives)noun = random.choice(nouns)number = random.randint(1, 100)return f"dl-cci-{adjective}-{noun}-{number}"

Set up our github branch

from utils import get_repo_name
course_repo = get_repo_name()
course_repo

Output

'CircleCI-Learning/llmops-course'
from utils import get_branch
course_branch = get_branch()
course_branch

Output:这是我的分支(不同的人参加这门课会有不同的分支)

'dl-cci-brightest-pond-67'

可以前往github的仓库查看一下:

https://github.com/CircleCI-Learning/llmops-course/branches

The sample application: AI-powered quiz generator

We are going to build a AI powered quiz generator.

在这里插入图片描述

Create the dataset for the quiz.

human_template  = "{question}"quiz_bank = """1. Subject: Leonardo DaVinciCategories: Art, ScienceFacts:- Painted the Mona Lisa- Studied zoology, anatomy, geology, optics- Designed a flying machine2. Subject: ParisCategories: Art, GeographyFacts:- Location of the Louvre, the museum where the Mona Lisa is displayed- Capital of France- Most populous city in France- Where Radium and Polonium were discovered by scientists Marie and Pierre Curie3. Subject: TelescopesCategory: ScienceFacts:- Device to observe different objects- The first refracting telescopes were invented in the Netherlands in the 17th Century- The James Webb space telescope is the largest telescope in space. It uses a gold-berillyum mirror4. Subject: Starry NightCategory: ArtFacts:- Painted by Vincent van Gogh in 1889- Captures the east-facing view of van Gogh's room in Saint-Rémy-de-Provence5. Subject: PhysicsCategory: ScienceFacts:- The sun doesn't change color during sunset.- Water slows the speed of light- The Eiffel Tower in Paris is taller in the summer than the winter due to expansion of the metal."""

Build the prompt template.

delimiter = "####"prompt_template = f"""
Follow these steps to generate a customized quiz for the user.
The question will be delimited with four hashtags i.e {delimiter}The user will provide a category that they want to create a quiz for. Any questions included in the quiz
should only refer to the category.Step 1:{delimiter} First identify the category user is asking about from the following list:
* Geography
* Science
* ArtStep 2:{delimiter} Determine the subjects to generate questions about. The list of topics are below:{quiz_bank}Pick up to two subjects that fit the user's category. Step 3:{delimiter} Generate a quiz for the user. Based on the selected subjects generate 3 questions for the user using the facts about the subject.Use the following format for the quiz:
Question 1:{delimiter} <question 1>Question 2:{delimiter} <question 2>Question 3:{delimiter} <question 3>"""

Use langchain to build the prompt template.

from langchain.prompts import ChatPromptTemplate
chat_prompt = ChatPromptTemplate.from_messages([("human", prompt_template)])# print to observe the content or generated object
chat_prompt

Output

ChatPromptTemplate(input_variables=[], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template="\nFollow these steps to generate a customized quiz for the user.\nThe question will be delimited with four hashtags i.e ####\n\nThe user will provide a category that they want to create a quiz for. Any questions included in the quiz\nshould only refer to the category.\n\nStep 1:#### First identify the category user is asking about from the following list:\n* Geography\n* Science\n* Art\n\nStep 2:#### Determine the subjects to generate questions about. The list of topics are below:\n\n1. Subject: Leonardo DaVinci\n   Categories: Art, Science\n   Facts:\n    - Painted the Mona Lisa\n    - Studied zoology, anatomy, geology, optics\n    - Designed a flying machine\n  \n2. Subject: Paris\n   Categories: Art, Geography\n   Facts:\n    - Location of the Louvre, the museum where the Mona Lisa is displayed\n    - Capital of France\n    - Most populous city in France\n    - Where Radium and Polonium were discovered by scientists Marie and Pierre Curie\n\n3. Subject: Telescopes\n   Category: Science\n   Facts:\n    - Device to observe different objects\n    - The first refracting telescopes were invented in the Netherlands in the 17th Century\n    - The James Webb space telescope is the largest telescope in space. It uses a gold-berillyum mirror\n\n4. Subject: Starry Night\n   Category: Art\n   Facts:\n    - Painted by Vincent van Gogh in 1889\n    - Captures the east-facing view of van Gogh's room in Saint-Rémy-de-Provence\n\n5. Subject: Physics\n   Category: Science\n   Facts:\n    - The sun doesn't change color during sunset.\n    - Water slows the speed of light\n    - The Eiffel Tower in Paris is taller in the summer than the winter due to expansion of the metal.\n\nPick up to two subjects that fit the user's category. \n\nStep 3:#### Generate a quiz for the user. Based on the selected subjects generate 3 questions for the user using the facts about the subject.\n\nUse the following format for the quiz:\nQuestion 1:#### <question 1>\n\nQuestion 2:#### <question 2>\n\nQuestion 3:#### <question 3>\n\n"))])

Choose the LLM.

from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
llm

Output

ChatOpenAI(client=<class 'openai.api_resources.chat_completion.ChatCompletion'>, temperature=0.0, openai_api_key='eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJhcHAiLCJzdWIiOiIxNzQ2MDYyIiwiYXVkIjoiV0VCIiwiaWF0IjoxNzA4NjAyMDk3LCJleHAiOjE3MTExOTQwOTd9.dnCBPsdZ7nf9TjS3lSwddk6JINpKRuKPB7cjfq0mWts', openai_api_base='http://jupyter-api-proxy.internal.dlai/rev-proxy', openai_organization='', openai_proxy='')

Set up an output parser in LangChain that converts the llm response into a string.

chain = chat_prompt | llm | output_parser
chain

Output

ChatPromptTemplate(input_variables=[], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template="\nFollow these steps to generate a customized quiz for the user.\nThe question will be delimited with four hashtags i.e ####\n\nThe user will provide a category that they want to create a quiz for. Any questions included in the quiz\nshould only refer to the category.\n\nStep 1:#### First identify the category user is asking about from the following list:\n* Geography\n* Science\n* Art\n\nStep 2:#### Determine the subjects to generate questions about. The list of topics are below:\n\n1. Subject: Leonardo DaVinci\n   Categories: Art, Science\n   Facts:\n    - Painted the Mona Lisa\n    - Studied zoology, anatomy, geology, optics\n    - Designed a flying machine\n  \n2. Subject: Paris\n   Categories: Art, Geography\n   Facts:\n    - Location of the Louvre, the museum where the Mona Lisa is displayed\n    - Capital of France\n    - Most populous city in France\n    - Where Radium and Polonium were discovered by scientists Marie and Pierre Curie\n\n3. Subject: Telescopes\n   Category: Science\n   Facts:\n    - Device to observe different objects\n    - The first refracting telescopes were invented in the Netherlands in the 17th Century\n    - The James Webb space telescope is the largest telescope in space. It uses a gold-berillyum mirror\n\n4. Subject: Starry Night\n   Category: Art\n   Facts:\n    - Painted by Vincent van Gogh in 1889\n    - Captures the east-facing view of van Gogh's room in Saint-Rémy-de-Provence\n\n5. Subject: Physics\n   Category: Science\n   Facts:\n    - The sun doesn't change color during sunset.\n    - Water slows the speed of light\n    - The Eiffel Tower in Paris is taller in the summer than the winter due to expansion of the metal.\n\nPick up to two subjects that fit the user's category. \n\nStep 3:#### Generate a quiz for the user. Based on the selected subjects generate 3 questions for the user using the facts about the subject.\n\nUse the following format for the quiz:\nQuestion 1:#### <question 1>\n\nQuestion 2:#### <question 2>\n\nQuestion 3:#### <question 3>\n\n"))])
| ChatOpenAI(client=<class 'openai.api_resources.chat_completion.ChatCompletion'>, temperature=0.0, openai_api_key='eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJhcHAiLCJzdWIiOiIxNzQ2MDYyIiwiYXVkIjoiV0VCIiwiaWF0IjoxNzA4NjAyMDk3LCJleHAiOjE3MTExOTQwOTd9.dnCBPsdZ7nf9TjS3lSwddk6JINpKRuKPB7cjfq0mWts', openai_api_base='http://jupyter-api-proxy.internal.dlai/rev-proxy', openai_organization='', openai_proxy='')
| StrOutputParser()

Build the function ‘assistance_chain’ to put together all steps above.

# taking all components and making reusable as one piece
def assistant_chain(system_message,human_template="{question}",llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),output_parser=StrOutputParser()):chat_prompt = ChatPromptTemplate.from_messages([("system", system_message),("human", human_template),])return chat_prompt | llm | output_parser

Evaluations

Create the function ‘eval_expected_words’ for the first example.

def eval_expected_words(system_message,question,expected_words,human_template="{question}",llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),output_parser=StrOutputParser()):assistant = assistant_chain(system_message,human_template,llm,output_parser)answer = assistant.invoke({"question": question})print(answer)assert any(word in answer.lower() \for word in expected_words), \f"Expected the assistant questions to include \'{expected_words}', but it did not"

Test: Generate a quiz about science.

question  = "Generate a quiz about science."
expected_words = ["davinci", "telescope", "physics", "curie"]

Create the eval.

eval_expected_words(prompt_template,question,expected_words
)

Output

Step 1:#### First identify the category user is asking about from the following list:
* Geography
* Science
* ArtStep 2:#### Determine the subjects to generate questions about. The list of topics are below:1. Subject: TelescopesCategory: ScienceFacts:- Device to observe different objects- The first refracting telescopes were invented in the Netherlands in the 17th Century- The James Webb space telescope is the largest telescope in space. It uses a gold-berillyum mirror2. Subject: PhysicsCategory: ScienceFacts:- The sun doesn't change color during sunset.- Water slows the speed of light- The Eiffel Tower in Paris is taller in the summer than the winter due to expansion of the metal.Based on the selected subjects, I will generate 3 questions for your science quiz.Question 1:#### What is the purpose of a telescope?
Question 2:#### In which country were the first refracting telescopes invented in the 17th Century?
Question 3:#### Why is the Eiffel Tower in Paris taller in the summer than the winter?

Create the function ‘evaluate_refusal’ to define a failing test case where the app should decline to answer.

def evaluate_refusal(system_message,question,decline_response,human_template="{question}", llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),output_parser=StrOutputParser()):assistant = assistant_chain(human_template, system_message,llm,output_parser)answer = assistant.invoke({"question": question})print(answer)assert decline_response.lower() in answer.lower(), \f"Expected the bot to decline with \'{decline_response}' got {answer}"

Define a new question (which should be a bad request)

question  = "Generate a quiz about Rome."
decline_response = "I'm sorry"

Create the refusal eval.

Note: The following function call will throw an exception.

evaluate_refusal(prompt_template,question,decline_response
)

Output

#### Step 1:
I would like to create a quiz about Rome.#### Step 2:
I will choose the subjects "Paris" and "Starry Night" as they both fall under the category of Art and Geography.#### Step 3:
Question 1:####
In which city is the Louvre located, the museum where the Mona Lisa is displayed?
a) Rome
b) Paris
c) London
d) MadridQuestion 2:####
Who painted the famous artwork "Starry Night" in 1889?
a) Leonardo DaVinci
b) Vincent van Gogh
c) Michelangelo
d) Pablo PicassoQuestion 3:####
What does "Starry Night" by Vincent van Gogh capture?
a) A view of the Eiffel Tower
b) A view of van Gogh's room in Saint-Rémy-de-Provence
c) A scene from the Louvre museum
d) A landscape of Rome
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[21], line 1
----> 1 evaluate_refusal(2     prompt_template,3     question,4     decline_response5 )Cell In[19], line 17, in evaluate_refusal(system_message, question, decline_response, human_template, llm, output_parser)14 answer = assistant.invoke({"question": question})15 print(answer)
---> 17 assert decline_response.lower() in answer.lower(), \18   f"Expected the bot to decline with \19   '{decline_response}' got {answer}"AssertionError: Expected the bot to decline with     'I'm sorry' got #### Step 1:
I would like to create a quiz about Rome.#### Step 2:
I will choose the subjects "Paris" and "Starry Night" as they both fall under the category of Art and Geography.#### Step 3:
Question 1:####
In which city is the Louvre located, the museum where the Mona Lisa is displayed?
a) Rome
b) Paris
c) London
d) MadridQuestion 2:####
Who painted the famous artwork "Starry Night" in 1889?
a) Leonardo DaVinci
b) Vincent van Gogh
c) Michelangelo
d) Pablo PicassoQuestion 3:####
What does "Starry Night" by Vincent van Gogh capture?
a) A view of the Eiffel Tower
b) A view of van Gogh's room in Saint-Rémy-de-Provence
c) A scene from the Louvre museum
d) A landscape of Rome

Running evaluations in a CircleCI pipeline

在这里插入图片描述

Put all these steps together into files to reuse later.

Note: fixing the system_message by adding additional rules:

  • Only use explicit matches for the category, if the category is not an exact match to categories in the quiz bank, answer that you do not have information.
  • If the user asks a question about a subject you do not have information about in the quiz bank, answer “I’m sorry I do not have information about that”.
%%writefile app.py
from langchain.prompts                import ChatPromptTemplate
from langchain.chat_models            import ChatOpenAI
from langchain.schema.output_parser   import StrOutputParserdelimiter = "####"quiz_bank = """1. Subject: Leonardo DaVinciCategories: Art, ScienceFacts:- Painted the Mona Lisa- Studied zoology, anatomy, geology, optics- Designed a flying machine2. Subject: ParisCategories: Art, GeographyFacts:- Location of the Louvre, the museum where the Mona Lisa is displayed- Capital of France- Most populous city in France- Where Radium and Polonium were discovered by scientists Marie and Pierre Curie3. Subject: TelescopesCategory: ScienceFacts:- Device to observe different objects- The first refracting telescopes were invented in the Netherlands in the 17th Century- The James Webb space telescope is the largest telescope in space. It uses a gold-berillyum mirror4. Subject: Starry NightCategory: ArtFacts:- Painted by Vincent van Gogh in 1889- Captures the east-facing view of van Gogh's room in Saint-Rémy-de-Provence5. Subject: PhysicsCategory: ScienceFacts:- The sun doesn't change color during sunset.- Water slows the speed of light- The Eiffel Tower in Paris is taller in the summer than the winter due to expansion of the metal.
"""system_message = f"""
Follow these steps to generate a customized quiz for the user.
The question will be delimited with four hashtags i.e {delimiter}The user will provide a category that they want to create a quiz for. Any questions included in the quiz
should only refer to the category.Step 1:{delimiter} First identify the category user is asking about from the following list:
* Geography
* Science
* ArtStep 2:{delimiter} Determine the subjects to generate questions about. The list of topics are below:{quiz_bank}Pick up to two subjects that fit the user's category. Step 3:{delimiter} Generate a quiz for the user. Based on the selected subjects generate 3 questions for the user using the facts about the subject.Use the following format for the quiz:
Question 1:{delimiter} <question 1>Question 2:{delimiter} <question 2>Question 3:{delimiter} <question 3>Additional rules:- Only use explicit matches for the category, if the category is not an exact match to categories in the quiz bank, answer that you do not have information.
- If the user asks a question about a subject you do not have information about in the quiz bank, answer "I'm sorry I do not have information about that".
""""""Helper functions for writing the test cases
"""def assistant_chain(system_message=system_message,human_template="{question}",llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),output_parser=StrOutputParser()):chat_prompt = ChatPromptTemplate.from_messages([("system", system_message),("human", human_template),])return chat_prompt | llm | output_parser

Output

Overwriting app.py

Create new file to include the evals.

%%writefile test_assistant.py
from app import assistant_chain
from app import system_message
from langchain.prompts                import ChatPromptTemplate
from langchain.chat_models            import ChatOpenAI
from langchain.schema.output_parser   import StrOutputParserimport osfrom dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())def eval_expected_words(system_message,question,expected_words,human_template="{question}",llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),output_parser=StrOutputParser()):assistant = assistant_chain(system_message)answer = assistant.invoke({"question": question})print(answer)assert any(word in answer.lower() \for word in expected_words), \f"Expected the assistant questions to include \'{expected_words}', but it did not"def evaluate_refusal(system_message,question,decline_response,human_template="{question}", llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),output_parser=StrOutputParser()):assistant = assistant_chain(human_template, system_message,llm,output_parser)answer = assistant.invoke({"question": question})print(answer)assert decline_response.lower() in answer.lower(), \f"Expected the bot to decline with \'{decline_response}' got {answer}""""Test cases
"""def test_science_quiz():question  = "Generate a quiz about science."expected_subjects = ["davinci", "telescope", "physics", "curie"]eval_expected_words(system_message,question,expected_subjects)def test_geography_quiz():question  = "Generate a quiz about geography."expected_subjects = ["paris", "france", "louvre"]eval_expected_words(system_message,question,expected_subjects)def test_refusal_rome():question  = "Help me create a quiz about Rome"decline_response = "I'm sorry"evaluate_refusal(system_message,question,decline_response)

Output

Overwriting test_assistant.py

The CircleCI config file

Now let’s set up our tests to run automatically in CircleCI.

For this course, we’ve created a working CircleCI config file. Let’s take a look at the configuration.

!cat circle_config.yml

Output

version: 2.1
orbs:# The python orb contains a set of prepackaged circleci configuration you can use repeatedly in your configurations files# Orb commands and jobs help you with common scripting around a language/tool# so you dont have to copy and paste it everywhere.# See the orb documentation here: https://circleci.com/developer/orbs/orb/circleci/pythonpython: circleci/python@2.1.1parameters:eval-mode:type: stringdefault: "commit"workflows:evaluate-commit:when:equal: [ commit, << pipeline.parameters.eval-mode >> ]jobs:- run-commit-evals:context:- dl-ai-coursesevaluate-release:when:equal: [ release, << pipeline.parameters.eval-mode >> ]jobs:- run-pre-release-evals:context:- dl-ai-coursesevaluate-all:when:equal: [ full, << pipeline.parameters.eval-mode >> ]jobs:- run-manual-evals:context:- dl-ai-coursesreport-evals:when:equal: [ report, << pipeline.parameters.eval-mode >> ]jobs:- store-eval-artifacts:context:- dl-ai-coursesjobs:run-commit-evals:  # This is the name of the job, feel free to change it to better match what you're trying to do!# These next lines defines a docker executors: https://circleci.com/docs/2.0/executor-types/# You can specify an image from dockerhub or use one of the convenience images from CircleCI's Developer Hub# A list of available CircleCI docker convenience images are available here: https://circleci.com/developer/images/image/cimg/python# The executor is the environment in which the steps below will be executed - below will use a python 3.9 container# Change the version below to your required version of pythondocker:- image: cimg/python:3.10.5# Checkout the code as the first step. This is a dedicated CircleCI step.# The python orb's install-packages step will install the dependencies from a Pipfile via Pipenv by default.# Here we're making sure we use just use the system-wide pip. By default it uses the project root's requirements.txt.# Then run your tests!# CircleCI will report the results back to your VCS provider.steps:- checkout- python/install-packages:pkg-manager: pip# app-dir: ~/project/package-directory/  # If your requirements.txt isn't in the root directory.# pip-dependency-file: test-requirements.txt  # if you have a different name for your requirements file, maybe one that combines your runtime and test requirements.- run:name: Run assistant evals.command: python -m pytest --junitxml results.xml test_assistant.py- store_test_results:path: results.xmlrun-pre-release-evals:docker:- image: cimg/python:3.10.5steps:- checkout- python/install-packages:pkg-manager: pip# app-dir: ~/project/package-directory/  # If your requirements.txt isn't in the root directory.# pip-dependency-file: test-requirements.txt  # if you have a different name for your requirements file, maybe one that combines your runtime and test requirements.- run:name: Run release evals.command: python -m pytest --junitxml results.xml test_release_evals.py- store_test_results:path: results.xmlrun-manual-evals: docker:- image: cimg/python:3.10.5steps:- checkout- python/install-packages:pkg-manager: pip# app-dir: ~/project/package-directory/  # If your requirements.txt isn't in the root directory.# pip-dependency-file: test-requirements.txt  # if you have a different name for your requirements file, maybe one that combines your runtime and test requirements.- run:name: Run end to end evals.command: python -m pytest --junitxml results.xml test_assistant.py test_release_evals.py- store_test_results:path: results.xmlstore-eval-artifacts:docker:- image: cimg/python:3.10.5steps:- checkout- python/install-packages:pkg-manager: pip# app-dir: ~/project/package-directory/  # If your requirements.txt isn't in the root directory.# pip-dependency-file: test-requirements.txt  # if you have a different name for your requirements file, maybe one that combines your runtime and test requirements.- run:name: Save eval to html filecommand: python save_eval_artifacts.py- store_artifacts:path: /tmp/eval_results.htmldestination: eval_results.html

Run the per-commit evals

Push files into the github repo.

from utils import push_files
push_files(course_repo, course_branch, ["app.py", "test_assistant.py"])

Output

uploading app.py
uploading test_assistant.py
pushing files to: dl-cci-brightest-pond-67

Trigger the pipeline in CircleCI pipeline.

from utils import trigger_commit_evals
trigger_commit_evals(course_repo, course_branch, cci_api_key)

Output:点击链接前往circleci界面查看集成后运行的结果:是否通过测试

Please visit https://app.circleci.com/pipelines/github/CircleCI-Learning/llmops-course/3011

但是我在circleci里面的集成测试是失败的。

在这里插入图片描述

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.rhkb.cn/news/265940.html

如若内容造成侵权/违法违规/事实不符,请联系长河编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

2024年腾讯云4核8G12M配置的轻量服务器同时支持多大访问量?

腾讯云4核8G服务器支持多少人在线访问&#xff1f;支持25人同时访问。实际上程序效率不同支持人数在线人数不同&#xff0c;公网带宽也是影响4核8G服务器并发数的一大因素&#xff0c;假设公网带宽太小&#xff0c;流量直接卡在入口&#xff0c;4核8G配置的CPU内存也会造成计算…

深入理解网络通信基本原理和tcp/ip协议

深入理解网络通信基本原理和tcp/ip协议 一、计算机网络体系1&#xff0c;计算机网络体系结构2&#xff0c;网络中数据传输2.1&#xff0c;浏览器中输入一个url的执行流程2.2&#xff0c;数据在网络中是的传输流程 3&#xff0c;三次握手和四次挥手3.1&#xff0c;三次握手3.1.1…

推荐一款桌面端redis连接工具, redis desktop manager替代品——another redis desktop manager

下载地址 Another Redis Desktop Manager | 更快、更好、更稳定的Redis桌面(GUI)管理客户端&#xff0c;兼容Windows、Mac、Linux&#xff0c;性能出众&#xff0c;轻松加载海量键值 封面对比 对比redis desktop manager &#xff0c;ui上有巨大的改进 但是redis desktop ma…

【基于Ubuntu20.04的Autoware.universe安装过程】方案一:虚拟机 | 详细记录 | Vmware | 全过程图文 by.Akaxi

目录 一、Autoware.universe背景 二、虚拟机配置 三、Ubuntu20.04安装 四、GPU显卡安装 五、ROS2-Galactic安装 六、ROS2-dev-tools安装 七、rmw-implementation安装 八、pacmod安装 九、autoware-core安装 十、autoware universe dependencies安装 十一、安装pre-c…

如何使用固定公网地址远程访问本地RStudio Server【内网穿透】

文章目录 前言1. 安装RStudio Server2. 本地访问3. Linux 安装cpolar4. 配置RStudio server公网访问地址5. 公网远程访问RStudio6. 固定RStudio公网地址 前言 RStudio Server 使你能够在 Linux 服务器上运行你所熟悉和喜爱的 RStudio IDE&#xff0c;并通过 Web 浏览器进行访问…

CentOS7如何使用Docker部署Wiki.Js知识库并实现公网远程访问?

文章目录 1. 安装Docker2. 获取Wiki.js镜像3. 本地服务器打开Wiki.js并添加知识库内容4. 实现公网访问Wiki.js5. 固定Wiki.js公网地址 不管是在企业中还是在自己的个人知识整理上&#xff0c;我们都需要通过某种方式来有条理的组织相应的知识架构&#xff0c;那么一个好的知识整…

Leetcoder Day29| 贪心算法part03

1005.K次取反后最大化的数组和 给定一个整数数组 A&#xff0c;我们只能用以下方法修改该数组&#xff1a;我们选择某个索引 i 并将 A[i] 替换为 -A[i]&#xff0c;然后总共重复这个过程 K 次。&#xff08;我们可以多次选择同一个索引 i。&#xff09; 以这种方式修改数组后&a…

2024年国际生物发酵展全面揭秘-赛尼格磁电

参展企业介绍 北京赛尼格磁电科技有限公司是一家中加合资的专业的永磁系统生产商。 2001年成立于北京。公司专业从事磁性材料的应用及除铁器的研发、设计及制造。精良的加工设备&#xff0c;精细的产品制作&#xff0c;为我们满足东南亚、欧洲、北美、南美及国内各大企业的产…

(转载)SpringCloud 微服务(三)-Seata解决分布式事务问题

ps:这个原文写的很好&#xff0c;怕后续这个地址失效&#xff0c;备份一个留着自己学习 转自&#xff1a;SpringCloud 微服务&#xff08;三&#xff09;-Seata解决分布式事务问题_seata 黑马 代码-CSDN博客 看完了黑马程序员的免费课程&#xff0c;感觉受益匪浅&#xff0c;…

TypeScript+React Web应用开发实战

&#x1f482; 个人网站:【 海拥】【神级代码资源网站】【办公神器】&#x1f91f; 基于Web端打造的&#xff1a;&#x1f449;轻量化工具创作平台&#x1f485; 想寻找共同学习交流的小伙伴&#xff0c;请点击【全栈技术交流群】 在现代Web开发中&#xff0c;React和TypeScrip…

【Qt学习】QLineEdit 控件 属性与实例(登录界面,验证密码,正则表达式)

文章目录 1. 介绍2. 实例使用2.1 登录界面2.2 对比两次密码是否相同2.3 通过按钮显示当前输入的密码&#xff08;并对2.2进行优化&#xff09;2.4 结语 3. 正则表达式3.1 QRegExp3.2 验证输入内容 4. 资源代码 1. 介绍 关于 QLineEdit 的详细介绍&#xff0c;可以去查阅官方文…

C++:模版初阶 | STL简介

创作不易&#xff0c;感谢支持&#xff01;&#xff01; 一、泛型编程思想 如何实现一个通用的交换函数呢&#xff1f; 注&#xff1a;其实swap函数在C的标准库提供了&#xff0c;不需要自己写&#xff0c;这边只是举个例子 void Swap(int& left, int& right) { in…

HTML+CSS+JS:花瓣登录组件

效果演示 实现了一个具有动态花朵背景和简洁登录框的登录页面效果。 Code <section><img src"./img/background.jpeg" class"background"><div class"login"><h2>Sign In</h2><div class"inputBox"…

PDF Expert for Mac v3.9.2中文激活版下载

PDF Expert for Mac是一款易于使用的 PDF 编辑器和注释器&#xff0c;专为 Mac 设备设计。它允许用户轻松查看、编辑、签名、注释和共享 PDF。该软件使用户能够向他们的 PDF 添加文本、图像、链接和形状&#xff0c;突出显示和标记文本&#xff0c;填写表格以及签署数字文档。它…

type may not be empty [type-empty]

Git提交失败原因分析 原因是使用了规范commit信息的工具&#xff0c;你的提交信息不符合规范&#xff0c;所以被拒绝了 commit规范工具 commitlinthusky 我这个项目使用husky&#xff0c;提交规范比较严格。 解决方式一&#xff1a; 修改提交信息&#xff0c; 使其符合规范…

【探索AI】十一 深度学习之第1周:深度学习概述与基础

深度学习概述与基础 深度学习的发展历史与现状神经网络的基本原理前向传播与反向传播算法常见的激活函数与优化算法深度学习框架&#xff08;如TensorFlow或PyTorch&#xff09;进行基础操作 深度学习的发展历史与现状 深度学习的发展历史可以追溯到上世纪40年代&#xff0c;当…

SpringMVC 学习(四)之获取请求参数

目录 1 通过 HttpServletRequest 获取请求参数 2 通过控制器方法的形参获取请求参数 3 通过 POJO 获取请求参数&#xff08;重点&#xff09; 1 通过 HttpServletRequest 获取请求参数 public String handler1(HttpServletRequest request) <form action"${pageCont…

Spring Boot中实现列表数据导出为Excel文件

点击下载《Spring Boot中实现列表数据导出为Excel文件》 1. 前言 本文将详细介绍在Spring Boot框架中如何将列表数据导出为Excel文件。我们将通过Apache POI库来实现这一功能&#xff0c;并解释其背后的原理、提供完整的流程和步骤&#xff0c;以及带有详细注释的代码示例。最…

个人博客系列-Django部署-nginx(3)

使用Nginx uwsgi进行部署django项目 一. 检查项目是否可以运行 启动项目 python manage.py runserver 0.0.0.0:8099输入ip:8099 查看启动页面 出现上述页面表示运行成功 二. 安装uwsgi并配置 2.1 下载uwsgi pip install uwsgi新建文件test.py写入内容&#xff0c;测试一…

Win UI3开发笔记(四)设置主题续

上文讲到过关于界面和标题栏以及普通文本的主题设置&#xff0c;这篇说一下关于对话框的主题设置。 我最终没找到办法&#xff0c;寻求办法的朋友可以不用接着看了&#xff0c;以下只是过程。 一个对话框包括标题部分、内容部分和按钮部分&#xff0c;其中&#xff0c;在Cont…