Become Iron Man! With Just One RTX 3090: JARVIS (J.A.R.V.I.S.), Microsoft's Open-Source AI Assistant System

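Dreams are becoming reality. Microsoft has lived up to its name and open-sourced JARVIS (J.A.R.V.I.S.), an AI assistant system. In the Iron Man films, JARVIS stands for "Just A Rather Very Intelligent System". It helps Tony Stark with all kinds of tasks and challenges: controlling and managing his armor, providing real-time intelligence and data analysis, supporting his decisions, and so on.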

Now we can have a JARVIS assistant of our own, and the cost is just one RTX 3090 graphics card.

Setting up the JARVIS environment

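Generally speaking, the mainstream entry-level cards for deep learning are the RTX 2070 or 3070, while the 3090 is more or less the ceiling for consumer-grade deep learning cards.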

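Above that come the industrial-grade A-series and V-series cards. VRAM is the hard requirement, because the large models have to be loaded locally. You can modify the code to trim down which models get loaded, but some functionality will inevitably be lost. If you don't have a 3090, two RTX 3060 12 GB cards in parallel will meet the VRAM requirement, but their compute and overall performance won't match a 3090.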
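To make the VRAM point concrete, here is a back-of-the-envelope estimate of the memory needed for model weights alone. It is a rough sketch, not a measurement: it assumes the "1.7b" in damo-vilab/text-to-video-ms-1.7b (one of the models downloaded later) means roughly 1.7 billion parameters, and it ignores activations, caches, and every other model loaded at the same time. `weights_gib` is a hypothetical helper written for this illustration.

```python
def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Rough memory for the weights alone, in GiB (ignores activations and caches)."""
    return num_params * bytes_per_param / 1024 ** 3


# Assumption: "1.7b" in the model name means about 1.7e9 parameters.
fp32 = weights_gib(1.7e9, 4)  # 4 bytes per parameter in float32
fp16 = weights_gib(1.7e9, 2)  # 2 bytes per parameter in float16

print(f"fp32: {fp32:.1f} GiB, fp16: {fp16:.1f} GiB")  # fp32: 6.3 GiB, fp16: 3.2 GiB
```

One model of this size is manageable, but the server keeps many models resident at once, which is why the total quickly approaches the 24 GB of a 3090.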

Once you have confirmed that your local hardware can support JARVIS, clone the project as usual:

git clone https://github.com/microsoft/JARVIS.git

Then enter the project directory:

cd JARVIS

Edit the project's configuration file, server/config.yaml:

openai:
  key: your_personal_key # gradio, your_personal_key
huggingface:
  cookie: # required for huggingface inference
local: # ignore: just for development
  endpoint: http://localhost:8003
dev: false
debug: false
log_file: logs/debug.log
model: text-davinci-003 # text-davinci-003
use_completion: true
inference_mode: hybrid # local, huggingface or hybrid
local_deployment: minimal # no, minimal, standard or full
num_candidate_models: 5
max_description_length: 100
proxy:
httpserver:
  host: localhost
  port: 8004
modelserver:
  host: localhost
  port: 8005
logit_bias:
  parse_task: 0.1
  choose_model: 5

Only three settings really need to change here: the OpenAI key, the cookie token from the Hugging Face website, and the OpenAI model, which defaults to text-davinci-003.
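The allowed values for inference_mode and local_deployment appear only as comments in the config, so a typo there is easy to miss. Below is a minimal sketch of a sanity check, assuming the YAML has already been parsed into a plain dict (for example with a YAML library). `check_config` is a hypothetical helper written for this article, not part of JARVIS.

```python
VALID_INFERENCE_MODES = {"local", "huggingface", "hybrid"}
VALID_LOCAL_DEPLOYMENTS = {"no", "minimal", "standard", "full"}


def check_config(cfg: dict) -> list:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []

    # The OpenAI key must be replaced with a real one.
    key = (cfg.get("openai") or {}).get("key")
    if not key or key == "your_personal_key":
        problems.append("openai.key is missing or still the placeholder")

    mode = cfg.get("inference_mode")
    if mode not in VALID_INFERENCE_MODES:
        problems.append(f"inference_mode {mode!r} is not one of {sorted(VALID_INFERENCE_MODES)}")

    deployment = cfg.get("local_deployment")
    if deployment not in VALID_LOCAL_DEPLOYMENTS:
        problems.append(f"local_deployment {deployment!r} is not one of {sorted(VALID_LOCAL_DEPLOYMENTS)}")

    # The config comment says the cookie is required for Hugging Face inference.
    cookie = (cfg.get("huggingface") or {}).get("cookie")
    if mode in {"huggingface", "hybrid"} and not cookie:
        problems.append("huggingface.cookie is empty but inference_mode uses Hugging Face inference")

    return problems


# The untouched default config from above, as a dict.
default_cfg = {
    "openai": {"key": "your_personal_key"},
    "huggingface": {"cookie": None},
    "inference_mode": "hybrid",
    "local_deployment": "minimal",
}
for problem in check_config(default_cfg):
    print("config problem:", problem)
```

Running it against the untouched defaults reports the placeholder key and the empty cookie, which are exactly the values the paragraph above says to fill in.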

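Once that is done, the project officially recommends a conda virtual environment with Python 3.8. Personally, I see no need for a virtual environment here; Python 3.10 works directly. Next, install the dependencies (run this from the server directory, where requirements.txt lives):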

pip3 install -r requirements.txt

The project's dependencies are as follows:

git+https://github.com/huggingface/diffusers.git@8c530fc2f6a76a2aefb6b285dce6df1675092ac6#egg=diffusers  
git+https://github.com/huggingface/transformers@c612628045822f909020f7eb6784c79700813eda#egg=transformers  
git+https://github.com/patrickvonplaten/controlnet_aux@78efc716868a7f5669c288233d65b471f542ce40#egg=controlnet_aux  
tiktoken==0.3.3  
pydub==0.25.1  
espnet==202301  
espnet_model_zoo==0.1.7  
flask==2.2.3  
flask_cors==3.0.10  
waitress==2.1.2  
datasets==2.11.0  
asteroid==0.6.0  
speechbrain==0.5.14  
timm==0.6.13  
typeguard==2.13.3  
accelerate==0.18.0  
pytesseract==0.3.10  
gradio==3.24.1

The web API is built on a recent Flask 2.2 release, yet oddly Microsoft does not use the async features of the newer Flask versions.

After installation, enter the models directory:

cd models

Download the models and datasets:

sh download.sh

Brace yourself here: the models alone take up an enormous amount of disk space, to say nothing of the datasets. All files come from Hugging Face:

models="  
nlpconnect/vit-gpt2-image-captioning  
lllyasviel/ControlNet  
runwayml/stable-diffusion-v1-5  
CompVis/stable-diffusion-v1-4  
stabilityai/stable-diffusion-2-1  
Salesforce/blip-image-captioning-large  
damo-vilab/text-to-video-ms-1.7b  
microsoft/speecht5_asr  
facebook/maskformer-swin-large-ade  
microsoft/biogpt  
facebook/esm2_t12_35M_UR50D  
microsoft/trocr-base-printed  
microsoft/trocr-base-handwritten  
JorisCos/DCCRNet_Libri1Mix_enhsingle_16k  
espnet/kan-bayashi_ljspeech_vits  
facebook/detr-resnet-101  
microsoft/speecht5_tts  
microsoft/speecht5_hifigan  
microsoft/speecht5_vc  
facebook/timesformer-base-finetuned-k400  
runwayml/stable-diffusion-v1-5  
superb/wav2vec2-base-superb-ks  
openai/whisper-base  
Intel/dpt-large  
microsoft/beit-base-patch16-224-pt22k-ft22k  
facebook/detr-resnet-50-panoptic  
facebook/detr-resnet-50  
openai/clip-vit-large-patch14  
google/owlvit-base-patch32  
microsoft/DialoGPT-medium  
bert-base-uncased  
Jean-Baptiste/camembert-ner  
deepset/roberta-base-squad2  
facebook/bart-large-cnn  
google/tapas-base-finetuned-wtq  
distilbert-base-uncased-finetuned-sst-2-english  
gpt2  
mrm8488/t5-base-finetuned-question-generation-ap  
Jean-Baptiste/camembert-ner  
t5-base  
impira/layoutlm-document-qa  
ydshieh/vit-gpt2-coco-en  
dandelin/vilt-b32-finetuned-vqa  
lambdalabs/sd-image-variations-diffusers  
facebook/timesformer-base-finetuned-k400  
facebook/maskformer-swin-base-coco  
Intel/dpt-hybrid-midas  
lllyasviel/sd-controlnet-canny  
lllyasviel/sd-controlnet-depth  
lllyasviel/sd-controlnet-hed  
lllyasviel/sd-controlnet-mlsd  
lllyasviel/sd-controlnet-openpose  
lllyasviel/sd-controlnet-scribble  
lllyasviel/sd-controlnet-seg
"

# CURRENT_DIR=$(cd `dirname $0`; pwd)
CURRENT_DIR=$(pwd)
for model in $models;
do
    echo "----- Downloading from https://huggingface.co/"$model" -----"
    if [ -d "$model" ]; then
        # cd $model && git reset --hard && git pull && git lfs pull
        cd $model && git pull && git lfs pull
        cd $CURRENT_DIR
    else
        # git clone already includes the LFS files
        git clone https://huggingface.co/$model $model
    fi
done

datasets="Matthijs/cmu-arctic-xvectors"

for dataset in $datasets;
do
    echo "----- Downloading from https://huggingface.co/datasets/"$dataset" -----"
    if [ -d "$dataset" ]; then
        cd $dataset && git pull && git lfs pull
        cd $CURRENT_DIR
    else
        git clone https://huggingface.co/datasets/$dataset $dataset
    fi
done

You could also split this into two shell scripts and download with multiple processes, which is much faster.
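As an alternative to splitting the script by hand, the same idea can be sketched in Python with a small worker pool. This is an illustration under assumptions, not the project's own tooling: `clone_command` and `download_all` are hypothetical helpers, the sketch only handles the fresh-clone branch of the script (not the `git pull` branch for existing directories), and it assumes git and git-lfs are installed and the code runs inside the models directory.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def clone_command(repo: str, is_dataset: bool = False) -> list:
    """Build the same git clone command the shell script runs for one repo."""
    prefix = "https://huggingface.co/datasets/" if is_dataset else "https://huggingface.co/"
    return ["git", "clone", prefix + repo, repo]


def download_all(models, datasets=(), workers=4, runner=subprocess.run):
    """Run every clone command through a pool of `workers` threads.

    Threads are enough here because each job just waits on an external git process.
    `runner` is injectable so the function can be exercised without touching the network.
    """
    jobs = [clone_command(m) for m in models]
    jobs += [clone_command(d, is_dataset=True) for d in datasets]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, jobs))


# Example call (commented out so nothing is downloaded by accident):
# download_all(["openai/whisper-base", "gpt2"], ["Matthijs/cmu-arctic-xvectors"], workers=2)
```

Raising `workers` only helps until bandwidth or disk becomes the bottleneck, so a small number such as 2 to 4 is a sensible starting point.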

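But honestly, don't bother downloading. The files are truly enormous, and this is not something ordinary users can run. You can still run the project without downloading the local models and datasets, as described below.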

When the long download finally finishes, JARVIS is configured.

Running JARVIS

If you did download all the models and datasets (respect), start the service from a terminal:

python models_server.py --config config.yaml

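This starts a Flask service process for the local models. Note that in the config above, port 8005 belongs to modelserver and port 8004 to httpserver; the /hugginggpt endpoint on port 8004 is served by awesome_chat.py in server mode (python awesome_chat.py --config config.yaml --mode server). Once both are up, send an HTTP request to run JARVIS: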

curl --location 'http://localhost:8004/hugginggpt' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {
            "role": "user",
            "content": "please generate a video based on \"Spiderman is surfing\""
        }
    ]
}'

This asks JARVIS to generate a video of "Spiderman is surfing".
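The same request can be sent from Python using only the standard library. This is a minimal sketch: `build_request` and `ask_jarvis` are hypothetical helper names, the URL and payload shape are copied from the curl command, and treating the response body as JSON is an assumption, since the response format is not shown here.

```python
import json
import urllib.request


def build_request(prompt: str, base_url: str = "http://localhost:8004") -> urllib.request.Request:
    """Build the POST request to /hugginggpt with the same payload as the curl command."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        base_url + "/hugginggpt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_jarvis(prompt: str, timeout: float = 600.0):
    """Send the prompt and decode the reply (assumes the server answers with JSON)."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (requires the server to be running):
# print(ask_jarvis('please generate a video based on "Spiderman is surfing"'))
```

The generous timeout is deliberate: a single request can chain several model inferences, so it may take far longer than a typical API call.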

Of course, my hardware was never going to run all of this, so the set of loaded models can be trimmed down. Look around line 81 of models_server.py:

other_pipes = {
    "nlpconnect/vit-gpt2-image-captioning":{
        "model": VisionEncoderDecoderModel.from_pretrained(f"{local_fold}/nlpconnect/vit-gpt2-image-captioning"),
        "feature_extractor": ViTImageProcessor.from_pretrained(f"{local_fold}/nlpconnect/vit-gpt2-image-captioning"),
        "tokenizer": AutoTokenizer.from_pretrained(f"{local_fold}/nlpconnect/vit-gpt2-image-captioning"),
        "device": "cuda:0"
    },
    "Salesforce/blip-image-captioning-large": {
        "model": BlipForConditionalGeneration.from_pretrained(f"{local_fold}/Salesforce/blip-image-captioning-large"),
        "processor": BlipProcessor.from_pretrained(f"{local_fold}/Salesforce/blip-image-captioning-large"),
        "device": "cuda:0"
    },
    "damo-vilab/text-to-video-ms-1.7b": {
        "model": DiffusionPipeline.from_pretrained(f"{local_fold}/damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16"),
        "device": "cuda:0"
    },
    "facebook/maskformer-swin-large-ade": {
        "model": MaskFormerForInstanceSegmentation.from_pretrained(f"{local_fold}/facebook/maskformer-swin-large-ade"),
        "feature_extractor" : AutoFeatureExtractor.from_pretrained("facebook/maskformer-swin-large-ade"),
        "device": "cuda:0"
    },
    "microsoft/trocr-base-printed": {
        "processor": TrOCRProcessor.from_pretrained(f"{local_fold}/microsoft/trocr-base-printed"),
        "model": VisionEncoderDecoderModel.from_pretrained(f"{local_fold}/microsoft/trocr-base-printed"),
        "device": "cuda:0"
    },
    "microsoft/trocr-base-handwritten": {
        "processor": TrOCRProcessor.from_pretrained(f"{local_fold}/microsoft/trocr-base-handwritten"),
        "model": VisionEncoderDecoderModel.from_pretrained(f"{local_fold}/microsoft/trocr-base-handwritten"),
        "device": "cuda:0"
    },
    "JorisCos/DCCRNet_Libri1Mix_enhsingle_16k": {
        "model": BaseModel.from_pretrained("JorisCos/DCCRNet_Libri1Mix_enhsingle_16k"),
        "device": "cuda:0"
    },
    "espnet/kan-bayashi_ljspeech_vits": {
        "model": Text2Speech.from_pretrained(f"espnet/kan-bayashi_ljspeech_vits"),
        "device": "cuda:0"
    },
    "lambdalabs/sd-image-variations-diffusers": {
        "model": DiffusionPipeline.from_pretrained(f"{local_fold}/lambdalabs/sd-image-variations-diffusers"), #torch_dtype=torch.float16
        "device": "cuda:0"
    },
    "CompVis/stable-diffusion-v1-4": {
        "model": DiffusionPipeline.from_pretrained(f"{local_fold}/CompVis/stable-diffusion-v1-4"),
        "device": "cuda:0"
    },
    "stabilityai/stable-diffusion-2-1": {
        "model": DiffusionPipeline.from_pretrained(f"{local_fold}/stabilityai/stable-diffusion-2-1"),
        "device": "cuda:0"
    },
    "runwayml/stable-diffusion-v1-5": {
        "model": DiffusionPipeline.from_pretrained(f"{local_fold}/runwayml/stable-diffusion-v1-5"),
        "device": "cuda:0"
    },
    "microsoft/speecht5_tts":{
        "processor": SpeechT5Processor.from_pretrained(f"{local_fold}/microsoft/speecht5_tts"),
        "model": SpeechT5ForTextToSpeech.from_pretrained(f"{local_fold}/microsoft/speecht5_tts"),
        "vocoder": SpeechT5HifiGan.from_pretrained(f"{local_fold}/microsoft/speecht5_hifigan"),
        "embeddings_dataset": load_dataset(f"{local_fold}/Matthijs/cmu-arctic-xvectors", split="validation"),
        "device": "cuda:0"
    },
    "speechbrain/mtl-mimic-voicebank": {
        "model": WaveformEnhancement.from_hparams(source="speechbrain/mtl-mimic-voicebank", savedir="models/mtl-mimic-voicebank"),
        "device": "cuda:0"
    },
    "microsoft/speecht5_vc":{
        "processor": SpeechT5Processor.from_pretrained(f"{local_fold}/microsoft/speecht5_vc"),
        "model": SpeechT5ForSpeechToSpeech.from_pretrained(f"{local_fold}/microsoft/speecht5_vc"),
        "vocoder": SpeechT5HifiGan.from_pretrained(f"{local_fold}/microsoft/speecht5_hifigan"),
        "embeddings_dataset": load_dataset(f"{local_fold}/Matthijs/cmu-arctic-xvectors", split="validation"),
        "device": "cuda:0"
    },
    "julien-c/wine-quality": {
        "model": joblib.load(cached_download(hf_hub_url("julien-c/wine-quality", "sklearn_model.joblib")))
    },
    "facebook/timesformer-base-finetuned-k400": {
        "processor": AutoImageProcessor.from_pretrained(f"{local_fold}/facebook/timesformer-base-finetuned-k400"),
        "model": TimesformerForVideoClassification.from_pretrained(f"{local_fold}/facebook/timesformer-base-finetuned-k400"),
        "device": "cuda:0"
    },
    "facebook/maskformer-swin-base-coco": {
        "feature_extractor": MaskFormerFeatureExtractor.from_pretrained(f"{local_fold}/facebook/maskformer-swin-base-coco"),
        "model": MaskFormerForInstanceSegmentation.from_pretrained(f"{local_fold}/facebook/maskformer-swin-base-coco"),
        "device": "cuda:0"
    },
    "Intel/dpt-hybrid-midas": {
        "model": DPTForDepthEstimation.from_pretrained(f"{local_fold}/Intel/dpt-hybrid-midas", low_cpu_mem_usage=True),
        "feature_extractor": DPTFeatureExtractor.from_pretrained(f"{local_fold}/Intel/dpt-hybrid-midas"),
        "device": "cuda:0"
    }
}

Simply comment out the models you don't need.
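Commenting entries in and out by hand gets tedious. One possible alternative, sketched here under assumptions, is to wrap each entry in a zero-argument loader and only call the loaders on an allowlist, so a disabled model is never read from disk or placed on the GPU. `build_pipes`, `loaders`, and `enabled` are hypothetical names, and the lambdas below are placeholders standing in for the real `from_pretrained(...)` calls; this is not how models_server.py is actually written.

```python
def build_pipes(loaders: dict, enabled: set) -> dict:
    """Call only the loaders whose model id is enabled; the rest are never executed."""
    return {name: load() for name, load in loaders.items() if name in enabled}


# Placeholder loaders: in the real file each lambda would wrap the
# corresponding from_pretrained(...) calls from other_pipes.
loaders = {
    "nlpconnect/vit-gpt2-image-captioning": lambda: {"model": "captioner", "device": "cuda:0"},
    "damo-vilab/text-to-video-ms-1.7b": lambda: {"model": "text-to-video", "device": "cuda:0"},
    "runwayml/stable-diffusion-v1-5": lambda: {"model": "stable-diffusion", "device": "cuda:0"},
}

# Keep only what fits on the card.
enabled = {"nlpconnect/vit-gpt2-image-captioning"}

pipes = build_pipes(loaders, enabled)
print(sorted(pipes))  # ['nlpconnect/vit-gpt2-image-captioning']
```

The trade-off is the same as with commenting out: any task that needs a disabled model can no longer be served locally.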

Alternatively, you can skip the local model service entirely by setting inference_mode: huggingface in config.yaml.

Then run the ChatGPT version of JARVIS directly in the terminal:

cd server  
python3 awesome_chat.py --config config.yaml --mode cli

Sample session:

Welcome to Jarvis! A collaborative system that consists of an LLM as the controller and numerous expert models as collaborative executors. Jarvis can plan tasks, schedule Hugging Face models, generate friendly responses based on your requests, and help you with many things. Please enter your request (`exit` to exit).  
[ User ]: Given a collection of image A: /examples/cat.jpg, B: /examples/z3.jpg, C: /examples/z1.jpg, please tell me how many zebras in these picture?  
[ Jarvis ]: Based on your request, I have used several models to process the images you provided. For the image A: /examples/cat.jpg, I used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate the text "a cat sitting on a window sill looking out". Then I used the object-detection model facebook/detr-resnet-50 to generate the image with predicted box '/images/a3dc.jpg', which contains two objects with labels 'potted plant' and 'cat'. For the image B: /examples/z3.jpg, I used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate the text "two zebras standing in a field of dry grass". Then I used the object-detection model facebook/detr-resnet-50 to generate the image with predicted box '/images/f5df.jpg', which contains three objects with labels 'zebra'. For the image C: /examples/z1.jpg, I used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate the text "a zebra standing in a field of tall grass". Then I used the object-detection model facebook/detr-resnet-50 to generate the image with predicted box '/images/69c7.jpg', which contains one object with label 'zebra'. Finally, I used the canny-control model canny-control to generate the image '/images/7e18.png' based on the image /examples/savanna.jpg. Therefore, there are four zebras in these pictures. Is there anything else I can help you with?  
[ User ]: Please answer all the named entities in the sentence: Iron Man is a superhero appearing in American comic books published by Marvel Comics. The character was co-created by writer and editor Stan Lee, developed by scripter Larry Lieber, and designed by artists Don Heck and Jack Kirby.  
[ Jarvis ]: Sure, I understand your request. Based on the inference results of the models, I will answer your request step by step. Firstly, I used the token-classification model dslim/bert-base-NER to recognize the named entities in the sentence. The inference result is that the named entities are Iron Man, Stan Lee, Larry Lieber, Don Heck and Jack Kirby. Then, I used the question-answering model deepset/roberta-base-squad2 to answer your request. The inference result is that there is no output since the context cannot be empty. Therefore, I cannot make it. I hope this answer your request. Is there anything else I can help you with?  
[ User ]: Please generate a canny image based on /examples/savanna.jpg  
[ Jarvis ]: Sure. I have generated a canny image based on /examples/savanna.jpg. To do this, I first used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate the text description of the image, which is "a herd of giraffes and zebras grazing in a field". Then I used the canny-control model to generate the canny image of the original image, which is stored in the path "/images/1ea2.png". Finally, I used the canny-text-to-image model lllyasviel/sd-controlnet-canny to generate the canny image based on the text description and the canny image, which is stored in the path "/images/ba66.png". Do you need anything else?

You can think of it as ChatGPT-driven text-to-image generation wrapped in a JARVIS shell. Full marks for showmanship.

Conclusion

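Overall, like Visual ChatGPT, which Microsoft open-sourced earlier, the symbolic significance of this kind of project far outweighs its practical value. JARVIS represents a vision shared by most engineers. The progress in this kind of AI technology deserves recognition, but with the hardware barrier this high, expectations should stay modest in the short term.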
