clickhouse MPPDB数据库--新特性使用示例

clickhouse 新特性:

从clickhouse 22.3至最新的版本24.3.2.23,clickhouse在快速发展中,每个版本都增加了一些新的特性,在数据写入、查询方面都有性能加速。
本文根据clickhouse blog中的clickhouse release blog中,学习并梳理了一些在实际工作中可能用到的新特性。

以下是如何基于docker,如果试用这些新性

docker run -d --name=ch -p 8123:8123 -p 9000:9000 -p 9009:9009 --ulimit nofile=262144:262144 -v D:/ch/latest/external:/external:rw -v  chlatest:/var/lib/clickhouse:rw -v D:/ch/latest/logs:/var/log/clickhouse-server:rw -v D:/ch/latest/etc/clickhouse-server:/etc/clickhouse-server:rw clickhouse/clickhouse-server:24.3.2.23docker exec -it bashclickhouse-client --format_csv_delimiter=','

transform函数

进行字典替换

transform(x, array_from, array_to, default)
transform(T, Array(T), Array(U), U) -> U
transform(x, array_from, array_to)

UK-house-price-dataset.csv

CREATE TABLE uk_price_paid
(price UInt32,date Date,postcode1 LowCardinality(String),postcode2 LowCardinality(String),type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0),is_new UInt8,duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0),addr1 String,addr2 String,street LowCardinality(String),locality LowCardinality(String),town LowCardinality(String),district LowCardinality(String),county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2);INSERT INTO uk_price_paid
WITHsplitByChar(' ', postcode) AS p
SELECTtoUInt32(price_string) AS price,parseDateTimeBestEffortUS(time) AS date,p[1] AS postcode1,p[2] AS postcode2,transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,b = 'Y' AS is_new,transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration, addr1, addr2, street, locality, town, district, county
FROM file('UK-house-price-dataset.csv','CSV','uuid_string String, price_string String, time String, postcode String, a String, b String, c String, addr1 String, addr2 String, street String, locality String, town String, district String, county String, d String, e String'
);SELECT transform(number, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine'], NULL) AS numbers
FROM system.numbers
LIMIT 10

读取文件

可以自动识别文件的类型,推荐字段类型

SELECT * FROM (
WITHsplitByChar(' ', postcode) AS p
SELECTtoUInt32(price_string) AS price,parseDateTimeBestEffortUS(time) AS date,p[1] AS postcode1,p[2] AS postcode2,transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,b = 'Y' AS is_new,transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration, addr1, addr2, street, locality, town, district, county
FROM file('UK-house-price-dataset.csv','CSV','uuid_string String, price_string String, time String, postcode String, a String, b String, c String, addr1 String, addr2 String, street String, locality String, town String, district String, county String, d String, e String'
) SETTINGS format_csv_delimiter=','
) LIMIT 2;

自定义函数

根据需要,编写自定义函数

CREATE OR REPLACE TABLE line_changes
(version UInt32,line_change_type Enum('Add' = 1, 'Delete' = 2, 'Modify' = 3),line_number UInt32,line_content String,time datetime default now()
)
ENGINE = MergeTree
ORDER BY time;INSERT INTO default.line_changes (version,line_change_type,line_number,line_content) VALUES
(1, 'Add'   , 1, 'ClickHouse provides SQL'),
(2, 'Add'   , 2, 'with improvements'),
(3, 'Add'   , 3, 'that makes it more friendly for analytical tasks.'),
(4, 'Add'   , 2, 'with many extensions'),
(5, 'Modify', 3, 'and powerful improvements'),
(6, 'Delete', 1, ''),
(7, 'Add'   , 1, 'ClickHouse provides a superset of SQL');-- add a string (str) into an array (arr) at a specific position (pos)
CREATE OR REPLACE FUNCTION add AS (arr, pos, str) -> arrayConcat(arraySlice(arr, 1, pos-1), [str], arraySlice(arr, pos));-- delete the element at a specific position (pos) from an array (arr)
CREATE OR REPLACE FUNCTION delete AS (arr, pos) -> arrayConcat(arraySlice(arr, 1, pos-1), arraySlice(arr, pos+1));-- replace the element at a specific position (pos) in an array (arr)
CREATE OR REPLACE FUNCTION modify AS (arr, pos, str) -> arrayConcat(arraySlice(arr, 1, pos-1), [str], arraySlice(arr, pos+1));

arrayFold

SELECT arrayFold((acc, v) -> (acc + v), [10, 20, 30],  0::UInt64) AS sum;CREATE OR REPLACE VIEW text_version AS
WITH T1 AS (SELECT arrayZip(groupArray(line_change_type),groupArray(line_number),groupArray(line_content)) as line_opsFROM (SELECT * FROM line_changes WHERE version <= {version:UInt32} ORDER BY version ASC)
)
SELECT arrayJoin(arrayFold((acc, v) -> if(v.'change_type' = 'Add',       add(acc, v.'line_nr', v.'content'),if(v.'change_type' = 'Delete', delete(acc, v.'line_nr'),if(v.'change_type' = 'Modify', modify(acc, v.'line_nr', v.'content'), []))),line_ops::Array(Tuple(change_type String, line_nr UInt32, content String)),[]::Array(String))) as lines
FROM T1;SELECT * FROM text_version(version = 3);

Parallel window functions

窗口函数采用并行计算,性能大幅提升

SELECTcountry,day,max(tempAvg) AS temperature,avg(temperature) OVER (PARTITION BY country ORDER BY day ASC ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS moving_avg_temp
FROM noaa
WHERE country != ''
GROUP BYcountry,date AS day
ORDER BYcountry ASC,day ASC

FINAL

基于FINAL及enable_vertical_final,在如下引擎
ReplacingMergeTree、 AggregatingMergeTree引擎中,可以快速查询到最新的数据

SELECTpostcode1,formatReadableQuantity(avg(price))
FROM uk_property_offers FINAL
GROUP BY postcode1
ORDER BY avg(price) DESC
LIMIT 3;SELECTpostcode1,formatReadableQuantity(avg(price))
FROM uk_property_offers
GROUP BY postcode1
ORDER BY avg(price) DESC
LIMIT 3
SETTINGS enable_vertical_final = 1;

Variant Type

SET allow_experimental_variant_type=1, use_variant_as_common_type = 1;SELECTmap('Hello', 1, 'World', 'Mark') AS x,toTypeName(x) AS type
FORMAT Vertical;SELECTarrayJoin([1, true, 3.4, 'Mark']) AS value,toTypeName(value)
Row 1:
──────
x:    {'Hello':1,'World':'Mark'}
type: Map(String, Variant(String, UInt8))┌─value─┬─toTypeName(value)─────────────────────┐
1. │ true  │ Variant(Bool, Float64, String, UInt8) │
2. │ true  │ Variant(Bool, Float64, String, UInt8) │
3. │ 3.4   │ Variant(Bool, Float64, String, UInt8) │
4. │ Mark  │ Variant(Bool, Float64, String, UInt8) │└───────┴───────────────────────────────────────┘

字符相似性函数

  • byteHammingDistance: the Hamming distance between two strings or vectors of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or equivalently, the minimum number of errors that could have transformed one string into the other. In a more general context, the Hamming distance is one of several string metrics for measuring the edit distance between two sequences. It is named after the American mathematician Richard Hamming.

    • karolin” and “kathrin” is 3.
    • karolin” and “kerstin” is 3.
    • kathrin” and “kerstin” is 4.
    • 0000 and 1111 is 4.
    • 2173896 and 2233796 is 3.
  • editDistance:a way of quantifying how dissimilar two strings (e.g., words) are to one another, that is measured by counting the minimum number of operations required to transform one string into the other.

  • damerauLevenshteinDistance: a string metric for measuring the edit distance between two sequences. Informally, the Damerau–Levenshtein distance between two words is the minimum number of operations (consisting of insertions, deletions or substitutions of a single character, or transposition of two adjacent characters) required to change one word into the other.

  • jaroWinklerSimilarity: a string metric measuring an edit distance between two sequences. It is a variant of the Jaro distance metric

  • levenshteinDistance: a string metric for measuring the edit distance between two sequences. Informally, the Damerau–Levenshtein distance between two words is the minimum number of operations (consisting of insertions, deletions or substitutions of a single character, or transposition of two adjacent characters) required to change one word into the other.

https://clickhouse.com/docs/en/sql-reference/functions/string-functions#dameraulevenshteindistance

CREATE TABLE domains
(`domain` String,`rank` Float64
)
ENGINE = MergeTree
ORDER BY domain;INSERT INTO domains SELECTc2 AS domain,1 / c1 AS rank
FROM url('domains.csv', 'CSV');SELECTdomain,levenshteinDistance(domain, 'facebook.com') AS d1,damerauLevenshteinDistance(domain, 'facebook.com') AS d2,jaroSimilarity(domain, 'facebook.com') AS d3,jaroWinklerSimilarity(domain, 'facebook.com') AS d4
FROM domains
ORDER BY d1 ASC
LIMIT 10 
Query id: 6f499f27-8274-4787-819a-b510322bdce3┌─domain────────┬─d1─┬─d2─┬─────────────────d3─┬─────────────────d4─┐1. │ facebook.com  │  0 │  0 │                  1 │                  1 │2. │ facebonk.com  │  1 │  1 │ 0.8838383838383838 │ 0.9303030303030303 │3. │ fabebook.com  │  1 │  1 │  0.914141414141414 │ 0.9313131313131312 │4. │ facabook.com  │  1 │  1 │ 0.9444444444444443 │  0.961111111111111 │5. │ facobook.com  │  1 │  1 │ 0.8535353535353535 │ 0.8974747474747474 │6. │ facebook1.com │  1 │  1 │ 0.9743589743589745 │ 0.9846153846153847 │7. │ faceook.com   │  1 │  1 │ 0.9722222222222221 │ 0.9833333333333333 │8. │ faacebook.com │  1 │  1 │ 0.9743589743589745 │ 0.9794871794871796 │9. │ faceboock.com │  1 │  1 │ 0.9326923076923077 │ 0.9596153846153846 │
10. │ facebool.com  │  1 │  1 │ 0.9444444444444443 │ 0.9666666666666666 │└───────────────┴────┴────┴────────────────────┴────────────────────┘

Vectorized distance functions

可以作为向量数据库使用,支持L2,cosineDistance,IP三种向量相似度的度量方法

https://clickhouse.com/blog/clickhouse-release-24-02

WITH 'dog' AS search_term,
(SELECT vectorFROM gloveWHERE word = search_termLIMIT 1
) AS target_vector
SELECT word, cosineDistance(vector, target_vector) AS score
FROM glove
WHERE lower(word) != lower(search_term)
ORDER BY score ASC
LIMIT 5;WITH'dog' AS search_term,(SELECT vectorFROM gloveWHERE word = search_termLIMIT 1) AS target_vector
SELECTword,1 - dotProduct(vector, target_vector) AS score
FROM glove
WHERE lower(word) != lower(search_term)
ORDER BY score ASC
LIMIT 5;

Adaptive asynchronous inserts

Asynchronous inserts shift data batching from the client side to the server side: data from insert queries is inserted into a buffer first and then written to the database storage later or asynchronously respectively.
在这里插入图片描述

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.rhkb.cn/news/299317.html

如若内容造成侵权/违法违规/事实不符,请联系长河编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

C++数据结构与算法——二叉树的修改与构造

C第二阶段——数据结构和算法&#xff0c;之前学过一点点数据结构&#xff0c;当时是基于Python来学习的&#xff0c;现在基于C查漏补缺&#xff0c;尤其是树的部分。这一部分计划一个月&#xff0c;主要利用代码随想录来学习&#xff0c;刷题使用力扣网站&#xff0c;不定时更…

Redis Desktop Manager可视化工具

可视化工具 Redis https://www.alipan.com/s/uHSbg14XmsL 提取码: 38cl 点击链接保存&#xff0c;或者复制本段内容&#xff0c;打开「阿里云盘」APP &#xff0c;无需下载极速在线查看&#xff0c;视频原画倍速播放。 官网下载&#xff08;不推荐&#xff09;&#xff1a;http…

SALESFORCE MODEL 简单记录

SALESFORCE MODEL 简单记录&#xff0c;在以下地址可以看到一些salesforce的模型资料 https://developer.salesforce.com/docs/atlas.en-us.object_reference.meta/object_reference/sforce_api_objects_list.htmhttps://architect.salesforce.com/diagrams#framework

2024054期传足14场胜负前瞻

2024054期售止时间为4月7日&#xff08;周日&#xff09;20点30分&#xff0c;敬请留意&#xff1a; 本期深盘多&#xff0c;1.5以下赔率3场&#xff0c;1.5-2.0赔率6场&#xff0c;其他场次是平半盘、平盘。本期14场难度中等。以下为基础盘前瞻&#xff0c;大家可根据自身判断…

Spring Cloud介绍

一、SpringCloud总体概述 Cloud Foundry Service Broker&#xff1a;通用service集成进入Cloud Foundry Cluster&#xff1a;服务集群 Consul&#xff1a;注册中心 Security&#xff1a;安全认证 Stream&#xff1a;消息队列 Stream App Starters&#xff1a;Spring Cloud Stre…

记第一次eudsrc拿到RCE(下)

目录 前言 个人介绍 挖洞公式 漏洞介绍 信息泄露漏洞 任意文件读取漏洞 远程命令执行&#xff08;RCE&#xff09;漏洞 漏洞详情 漏洞点1 漏洞点2 漏洞点3 修复建议 总结 前言 免责声明 以下漏洞均已经上报漏洞平台。请勿利用文章内的相关技术从事非法测试。若因…

基于python爬虫与数据分析系统设计

**单片机设计介绍&#xff0c;基于python爬虫与数据分析系统设计 文章目录 一 概要二、功能设计设计思路 三、 软件设计原理图 五、 程序六、 文章目录 一 概要 基于Python爬虫与数据分析系统的设计是一个结合了网络数据抓取、清洗、存储和数据分析的综合项目。这样的系统通常…

【javaWeb Maven高级】Maven高级学习

Maven高级学习 分模块设计继承与聚合继承版本锁定聚合 私服资源的上传与下载本地私服配置 分模块设计 为什么需要进行分模块设计&#xff1f; 将项目按照功能拆分成若干个子模块&#xff0c;方便项目的管理维护&#xff0c;扩展&#xff0c;也方便模块间的相互调用&#xff0c…

vue 打包 插槽 inject reactive draggable 动画 foreach pinia状态管理

在Vue项目中&#xff0c;当涉及到打包、插槽&#xff08;Slots&#xff09;、inject/reactive、draggable、transition、foreach以及pinia时&#xff0c;这些都是Vue框架的不同特性和库&#xff0c;它们各自在Vue应用中有不同的用途。下面我将逐一解释这些概念&#xff0c;并说…

vue2项目安装(使用vue-cli脚手架)

使用npm安装 安装镜像&#xff08;使npm创建项目更快&#xff09;&#xff1a;镜像可更换 npm config set registry https://registry.npmmirror.com1.全局安装vue-cli&#xff08;一次&#xff09; npm install -g vue/cli 2. 查看vue-cli 版本 vue --version 3. 创建项目…

HTTP详解及代码实现

HTTP详解及代码实现 HTTP超文本传输协议 URL简述状态码常见的状态码 请求方法请求报文响应报文HTTP常见的HeaderHTTP服务器代码 HTTP HTTP的也称为超文本传输协议。解释HTTP我们可以将其分为三个部分来解释&#xff1a;超文本&#xff0c;传输&#xff0c;协议。 超文本 加粗样…

ObjectiveC-08-OOP面向对象程序设计-类的分离与组合

本节用一简短的文章来说下是ObjectiveC中的类。类其实是OOP中的一个概念&#xff0c;概念上简单来讲类是它是一组关系密切属性的集合&#xff0c;所谓的关系就是对现实事物的抽象。 上面提到的关系包括很多种&#xff0c;比如has a&#xff0c; is a&#xff0c;has some等&…

jenkins+docker实现可持续自动化部署springboot项目

目录 一、前言 二、微服务带来的挑战 2.1 微服务有哪些问题 2.2 微服务给运维带来的挑战 三、可持续集成与交付概述 3.1 可持续集成与交付概念 3.1.1 持续集成 3.1.2 持续交付 3.1.3 可持续集成与交付核心理念 3.2 可持续集成优点 3.3 微服务为什么需要可持续集成 四…

【JVM】如何定位、解决内存泄漏和溢出

目录 1.概述 2.堆溢出、内存泄定位及解决办法 2.1.示例代码 2.2.抓堆快照 2.3.分析堆快照 1.概述 常见的几种JVM内存溢出的场景如下&#xff1a; Java堆溢出&#xff1a; 错误信息: java.lang.OutOfMemoryError: Java heap space 原因&#xff1a;Java对象实例在运行时持…

AI结合机器人的入门级仿真环境有哪些?

由于使用真实的机器人开发和测试应用程序既昂贵又费时&#xff0c;因此仿真已成为机器人应用程序开发中越来越重要的部分。在部署到机器人之前在仿真中验证应用程序可以通过尽早发现潜在问题来缩短迭代时间。通过模拟&#xff0c;还可以更轻松地测试在现实世界中可能过于危险的…

【每日刷题】Day3

【每日刷题】Day3 &#x1f955;个人主页&#xff1a;开敲&#x1f349; &#x1f525;所属专栏&#xff1a;每日刷题&#x1f34d; 目录 1. 69. x 的平方根 - 力扣&#xff08;LeetCode&#xff09; 2. 70. 爬楼梯 - 力扣&#xff08;LeetCode&#xff09; 3. 118. 杨辉三…

ZYNQ学习Linux 基础外设的使用

基本都是摘抄正点原子的文章&#xff1a;《领航者 ZYNQ 之嵌入式Linux 开发指南 V3.2.pdf》&#xff0c;因初次学习&#xff0c;仅作学习摘录之用&#xff0c;有不懂之处后续会继续更新~ 工程的创建参考&#xff1a;《ZYNQ学习之Petalinux 设计流程实战》 一、GPIO 之 LED 的使…

docker安装jenkins 2024版

docker 指令安装安装 docker run -d --restartalways \ --name jenkins -uroot -p 10340:8080 \ -p 10341:50000 \ -v /home/docker/jenkins:/var/jenkins_home \ -v /var/run/docker.sock:/var/run/docker.sock \ -v /usr/bin/docker:/usr/bin/docker jenkins/jenkins:lts访问…

耐腐蚀耐高温实验室塑料烧杯进口高纯PFA材质反应器特氟龙烧杯

PFA烧杯在实验过程中可作为储酸容器或涉及强酸强碱类实验的反应容器&#xff0c;用于盛放样品、试剂&#xff0c;可搭配电热板加热、蒸煮、赶酸用。 外壁均有凸起刻度&#xff0c;直筒设计&#xff0c;带翻边&#xff0c;便于夹持和移动&#xff0c;边沿有嘴&#xff0c;便于倾…

深挖苹果Find My技术,伦茨科技ST17H6x芯片赋予产品功能

苹果发布AirTag发布以来&#xff0c;大家都更加注重物品的防丢&#xff0c;苹果的 Find My 就可以查找 iPhone、Mac、AirPods、Apple Watch&#xff0c;如今的Find My已经不单单可以查找苹果的设备&#xff0c;随着第三方设备的加入&#xff0c;将丰富Find My Network的版图。产…