Building llama.cpp from source for Android

I have prebuilt binaries available; you can download and use them directly:

https://github.com/turingevo/llama.cpp-build/releases/tag/b4331

Prepare the Android NDK

Already downloaded at:

/media/wmx/ws1/software/qtAndroid/Sdk/ndk/23.1.7779620

Version: llama.cpp-b4331

Download the source code and switch into the llama.cpp/ directory.
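For example, a minimal sketch (assuming you fetch the upstream ggerganov/llama.cpp repository, where b4331 is a release tag):

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
git checkout b4331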

The build script, llama.cpp/build-android.sh:


#!/bin/bash
ANDROID_NDK_PATH=/media/wmx/ws1/software/qtAndroid/Sdk/ndk/23.1.7779620
build_dir=build-android
src_dir=.
install_dir=bin/android

cmake \
    -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-28 \
    -DCMAKE_C_FLAGS="-march=armv8.7a" \
    -DCMAKE_CXX_FLAGS="-march=armv8.7a" \
    -DGGML_OPENMP=OFF \
    -DGGML_LLAMAFILE=OFF \
    -B ${build_dir} \
    -S ${src_dir}

cmake --build ${build_dir} --config Release -j48
cmake --install ${build_dir} --prefix ${install_dir} --config Release
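To run it, a minimal sketch (it assumes the ANDROID_NDK_PATH above matches your own NDK install, and that the file utility is available on the host for the sanity check):

chmod +x build-android.sh
./build-android.sh
# verify the binaries were cross-compiled for aarch64
file bin/android/bin/llama-simple   # should report: ELF 64-bit LSB ... ARM aarch64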

Push to the Android device and test

Below are the test results on a Huawei Mate 40 Pro.

The built artifacts are in llama.cpp/bin/android; push them to the device:


adb shell "mkdir /data/local/tmp/llama.cpp"
adb push bin/android /data/local/tmp/llama.cpp/
adb push qwen2.5-0.5b-instruct-q4_k_m.gguf /data/local/tmp/llama.cpp/adb shell
cd /data/local/tmp/llama.cpp/androidtouch test.sh
chmod a+x test.sh
cat " LD_LIBRARY_PATH=lib ./bin/llama-simple -m qwen2.5-0.5b-instruct-q4_k_m.gguf -p \"你是谁?\"  "  > test.sh./test.shHWNOH:/data/local/tmp/llama.cpp/android $ ./test.sh                                                                                           
llama_model_loader: loaded meta data with 26 key-value pairs and 291 tensors from /sdcard/a-wmx/models/qwen2.5-0.5b-instruct-q4_k_m.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = qwen2.5-0.5b-instruct
llama_model_loader: - kv   3:                            general.version str              = v0.1
llama_model_loader: - kv   4:                           general.finetune str              = qwen2.5-0.5b-instruct
llama_model_loader: - kv   5:                         general.size_label str              = 630M
llama_model_loader: - kv   6:                          qwen2.block_count u32              = 24
llama_model_loader: - kv   7:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv   8:                     qwen2.embedding_length u32              = 896
llama_model_loader: - kv   9:                  qwen2.feed_forward_length u32              = 4864
llama_model_loader: - kv  10:                 qwen2.attention.head_count u32              = 14
llama_model_loader: - kv  11:              qwen2.attention.head_count_kv u32              = 2
llama_model_loader: - kv  12:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  13:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  14:                          general.file_type u32              = 15
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  25:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  121 tensors
llama_model_loader: - type q5_0:  133 tensors
llama_model_loader: - type q8_0:   13 tensors
llama_model_loader: - type q4_K:   12 tensors
llama_model_loader: - type q6_K:   12 tensors
llm_load_vocab: control token: 151659 '<|fim_prefix|>' is not marked as EOG
llm_load_vocab: control token: 151656 '<|video_pad|>' is not marked as EOG
llm_load_vocab: control token: 151655 '<|image_pad|>' is not marked as EOG
llm_load_vocab: control token: 151653 '<|vision_end|>' is not marked as EOG
llm_load_vocab: control token: 151652 '<|vision_start|>' is not marked as EOG
llm_load_vocab: control token: 151651 '<|quad_end|>' is not marked as EOG
llm_load_vocab: control token: 151649 '<|box_end|>' is not marked as EOG
llm_load_vocab: control token: 151648 '<|box_start|>' is not marked as EOG
llm_load_vocab: control token: 151646 '<|object_ref_start|>' is not marked as EOG
llm_load_vocab: control token: 151644 '<|im_start|>' is not marked as EOG
llm_load_vocab: control token: 151661 '<|fim_suffix|>' is not marked as EOG
llm_load_vocab: control token: 151647 '<|object_ref_end|>' is not marked as EOG
llm_load_vocab: control token: 151660 '<|fim_middle|>' is not marked as EOG
llm_load_vocab: control token: 151654 '<|vision_pad|>' is not marked as EOG
llm_load_vocab: control token: 151650 '<|quad_start|>' is not marked as EOG
llm_load_vocab: special tokens cache size = 22
llm_load_vocab: token to piece cache size = 0.9310 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = qwen2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 151936
llm_load_print_meta: n_merges         = 151387
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 896
llm_load_print_meta: n_layer          = 24
llm_load_print_meta: n_head           = 14
llm_load_print_meta: n_head_kv        = 2
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 7
llm_load_print_meta: n_embd_k_gqa     = 128
llm_load_print_meta: n_embd_v_gqa     = 128
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 4864
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 630.17 M
llm_load_print_meta: model size       = 462.96 MiB (6.16 BPW) 
llm_load_print_meta: general.name     = qwen2.5-0.5b-instruct
llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
llm_load_print_meta: LF token         = 148848 'ÄĬ'
llm_load_print_meta: FIM PRE token    = 151659 '<|fim_prefix|>'
llm_load_print_meta: FIM SUF token    = 151661 '<|fim_suffix|>'
llm_load_print_meta: FIM MID token    = 151660 '<|fim_middle|>'
llm_load_print_meta: FIM PAD token    = 151662 '<|fim_pad|>'
llm_load_print_meta: FIM REP token    = 151663 '<|repo_name|>'
llm_load_print_meta: FIM SEP token    = 151664 '<|file_sep|>'
llm_load_print_meta: EOG token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOG token        = 151645 '<|im_end|>'
llm_load_print_meta: EOG token        = 151662 '<|fim_pad|>'
llm_load_print_meta: EOG token        = 151663 '<|repo_name|>'
llm_load_print_meta: EOG token        = 151664 '<|file_sep|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: tensor 'token_embd.weight' (q5_0) (and 290 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
llm_load_tensors:   CPU_Mapped model buffer size =   462.96 MiB
.....................................................
llama_new_context_with_model: n_batch is less than GGML_KQ_MASK_PAD - increasing to 32
llama_new_context_with_model: n_seq_max     = 1
llama_new_context_with_model: n_ctx         = 64
llama_new_context_with_model: n_ctx_per_seq = 64
llama_new_context_with_model: n_batch       = 32
llama_new_context_with_model: n_ubatch      = 32
llama_new_context_with_model: flash_attn    = 0
llama_new_context_with_model: freq_base     = 1000000.0
llama_new_context_with_model: freq_scale    = 1
llama_new_context_with_model: n_ctx_per_seq (64) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
llama_kv_cache_init:        CPU KV buffer size =     0.75 MiB
llama_new_context_with_model: KV self size  =    0.75 MiB, K (f16):    0.38 MiB, V (f16):    0.38 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.58 MiB
llama_new_context_with_model:        CPU compute buffer size =    18.66 MiB
llama_new_context_with_model: graph nodes  = 846
llama_new_context_with_model: graph splits = 1
-p 你是谁?我是阿里云开发的超大规模语言模型,我叫通义千问。通义是“通义天下”,千问是“千问天下
main: decoded 32 tokens in 2.21 s, speed: 14.49 t/s
llama_perf_sampler_print:    sampling time =       5.69 ms /    32 runs   (    0.18 ms per token,  5622.91 tokens per second)
llama_perf_context_print:        load time =    1907.15 ms
llama_perf_context_print: prompt eval time =     165.11 ms /     5 tokens (   33.02 ms per token,    30.28 tokens per second)
llama_perf_context_print:        eval time =    2000.08 ms /    31 runs   (   64.52 ms per token,    15.50 tokens per second)
llama_perf_context_print:       total time =    3950.19 ms /    36 tokens
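As a sanity check on the numbers: 31 generated tokens in 2000.08 ms works out to 2000.08 / 31 ≈ 64.5 ms per token, i.e. about 15.5 tokens per second of CPU-only generation for this Q4_K_M 0.5B model on the Mate 40 Pro, with prompt processing at roughly 30 tokens per second.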
