python 字符串驻留机制

偶然发现一个python字符串的现象:

>>> a = '123_abc'
>>> b = '123_abc'
>>> a is b
True
>>> c = 'abc#123'
>>> d = 'abc#123'
>>> c is d
False

这是为什么呢,原来它们的id不一样。

>>> id(a) == id(b)
True
>>> id(c) == id(d)
False

那为什么它们的地址有的相同,有的不同呢?查询后得知这是一种 Python 的字符串驻留机制。

字符串驻留机制

也称为字符串常量优化(string interning),是一种在 Python 解释器中自动进行的优化过程。它主要目的是减少内存的使用,提高程序的运行效率。

工作原理

  1. 小字符串:Python 只会对短小的字符串进行驻留。但是,这个长度并不是固定的,它可能会因 Python 的不同版本或实现而有所不同。
  2. 字符串池(String Pool):Python 解释器维护一个字符串池,用于存储所有已经出现过的字符串常量。
  3. 驻留(Interning):当解释器遇到一个新的字符串字面量时,它会首先检查这个字符串是否已经存在于字符串池中。如果存在,则直接使用池中的引用;如果不存在,就将这个字符串添加到池中,并返回这个字符串的引用。
  4. 内存节省:由于相同的字符串字面量在程序中可能被多次使用,通过字符串驻留机制,可以确保这些重复的字符串只存储一次,从而节省内存。
  5. 性能提升:字符串比较操作可以通过比较它们的引用地址来完成,这比逐字符比较要快得多。因此,字符串驻留可以提高字符串比较的性能。
  6. 自动和透明:字符串驻留是自动进行的,程序员不需要显式地进行任何操作。Python 解释器会在后台处理这一过程。
  7. 不可变类型:字符串驻留机制只适用于不可变类型,因为可变类型的对象内容可能会改变,这会使得引用地址比较失去意义。如果字符串可以修改,那么驻留机制可能会导致意外的副作用。
  8. 限制:字符串驻留机制虽然有诸多好处,但也存在一些限制。例如,如果程序中使用了大量的动态生成的字符串,那么字符串驻留可能不会带来太大的好处,因为这些字符串可能不会被重复使用。
  9. 字符串字面量:只有当字符串是字面量时,Python 才会尝试进行驻留。通过其他方式(如 str() 函数、字符串拼接等)创建的字符串通常不会被驻留。
  10. 编译时驻留:字符串驻留是在 Python 源代码编译成字节码时进行的,而不是在运行时。这意味着在运行时动态生成的字符串通常不会被驻留。

显式驻留

Python 提供了一个sys库函数 intern(),允许程序员显式地将一个字符串驻留。使用这个函数可以手动控制字符串的驻留过程:

>>> from sys import intern
>>> s = intern('abc#123')
>>> t = intern('abc#123')
>>> s is t
True
>>> s = 'abc#123'
>>> t = 'abc#123'
>>> s is t
False
>>> a = intern('abc_123')
>>> b = 'abc_123'
>>> a is b
True

字符串长短的分界

小字符串才进行驻留,具体多少长度是界线也没有细查,大概用二分法也能枚举出来。

>>> x = 'abcdefghijklmnopqrstuvwxyz'*100
>>> y = 'abcdefghijklmnopqrstuvwxyz'*100
>>> x is y
True
>>> x = 'abcdefghijklmnopqrstuvwxyz'*200
>>> y = 'abcdefghijklmnopqrstuvwxyz'*200
>>> x is y
False
>>> x = 'abcdefghijklmnopqrstuvwxyz'*150
>>> y = 'abcdefghijklmnopqrstuvwxyz'*150
>>> x is y
True
>>> x = 'abcdefghijklmnopqrstuvwxyz'*175
>>> y = 'abcdefghijklmnopqrstuvwxyz'*175
>>> x is y
False
......

附:字符串方法

(版本python 3.12)

capitalize(self, /)
    Return a capitalized version of the string.

    More specifically, make the first character have upper case and the rest lower
    case.

casefold(self, /)
    Return a version of the string suitable for caseless comparisons.

center(self, width, fillchar=' ', /)
    Return a centered string of length width.

    Padding is done using the specified fill character (default is a space).

count(...)
    S.count(sub[, start[, end]]) -> int

    Return the number of non-overlapping occurrences of substring sub in
    string S[start:end].  Optional arguments start and end are
    interpreted as in slice notation.

encode(self, /, encoding='utf-8', errors='strict')
    Encode the string using the codec registered for encoding.

    encoding
      The encoding in which to encode the string.
    errors
      The error handling scheme to use for encoding errors.
      The default is 'strict' meaning that encoding errors raise a
      UnicodeEncodeError.  Other possible values are 'ignore', 'replace' and
      'xmlcharrefreplace' as well as any other name registered with
      codecs.register_error that can handle UnicodeEncodeErrors.

endswith(...)
    S.endswith(suffix[, start[, end]]) -> bool

    Return True if S ends with the specified suffix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    suffix can also be a tuple of strings to try.

expandtabs(self, /, tabsize=8)
    Return a copy where all tab characters are expanded using spaces.

    If tabsize is not given, a tab size of 8 characters is assumed.

find(...)
    S.find(sub[, start[, end]]) -> int

    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Return -1 on failure.

format(...)
    S.format(*args, **kwargs) -> str

    Return a formatted version of S, using substitutions from args and kwargs.
    The substitutions are identified by braces ('{' and '}').

format_map(...)
    S.format_map(mapping) -> str

    Return a formatted version of S, using substitutions from mapping.
    The substitutions are identified by braces ('{' and '}').

index(...)
    S.index(sub[, start[, end]]) -> int

    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Raises ValueError when the substring is not found.

isalnum(self, /)
    Return True if the string is an alpha-numeric string, False otherwise.

    A string is alpha-numeric if all characters in the string are alpha-numeric and
    there is at least one character in the string.

isalpha(self, /)
    Return True if the string is an alphabetic string, False otherwise.

    A string is alphabetic if all characters in the string are alphabetic and there
    is at least one character in the string.

isascii(self, /)
    Return True if all characters in the string are ASCII, False otherwise.

    ASCII characters have code points in the range U+0000-U+007F.
    Empty string is ASCII too.

isdecimal(self, /)
    Return True if the string is a decimal string, False otherwise.

    A string is a decimal string if all characters in the string are decimal and
    there is at least one character in the string.

isdigit(self, /)
    Return True if the string is a digit string, False otherwise.

    A string is a digit string if all characters in the string are digits and there
    is at least one character in the string.

isidentifier(self, /)
    Return True if the string is a valid Python identifier, False otherwise.

    Call keyword.iskeyword(s) to test whether string s is a reserved identifier,
    such as "def" or "class".

islower(self, /)
    Return True if the string is a lowercase string, False otherwise.

    A string is lowercase if all cased characters in the string are lowercase and
    there is at least one cased character in the string.

isnumeric(self, /)
    Return True if the string is a numeric string, False otherwise.

    A string is numeric if all characters in the string are numeric and there is at
    least one character in the string.

isprintable(self, /)
    Return True if the string is printable, False otherwise.

    A string is printable if all of its characters are considered printable in
    repr() or if it is empty.

isspace(self, /)
    Return True if the string is a whitespace string, False otherwise.

    A string is whitespace if all characters in the string are whitespace and there
    is at least one character in the string.

istitle(self, /)
    Return True if the string is a title-cased string, False otherwise.

    In a title-cased string, upper- and title-case characters may only
    follow uncased characters and lowercase characters only cased ones.

isupper(self, /)
    Return True if the string is an uppercase string, False otherwise.

    A string is uppercase if all cased characters in the string are uppercase and
    there is at least one cased character in the string.

join(self, iterable, /)
    Concatenate any number of strings.

    The string whose method is called is inserted in between each given string.
    The result is returned as a new string.

    Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'

ljust(self, width, fillchar=' ', /)
    Return a left-justified string of length width.

    Padding is done using the specified fill character (default is a space).

lower(self, /)
    Return a copy of the string converted to lowercase.

lstrip(self, chars=None, /)
    Return a copy of the string with leading whitespace removed.

    If chars is given and not None, remove characters in chars instead.

partition(self, sep, /)
    Partition the string into three parts using the given separator.

    This will search for the separator in the string.  If the separator is found,
    returns a 3-tuple containing the part before the separator, the separator
    itself, and the part after it.

    If the separator is not found, returns a 3-tuple containing the original string
    and two empty strings.

removeprefix(self, prefix, /)
    Return a str with the given prefix string removed if present.

    If the string starts with the prefix string, return string[len(prefix):].
    Otherwise, return a copy of the original string.

removesuffix(self, suffix, /)
    Return a str with the given suffix string removed if present.

    If the string ends with the suffix string and that suffix is not empty,
    return string[:-len(suffix)]. Otherwise, return a copy of the original
    string.

replace(self, old, new, count=-1, /)
    Return a copy with all occurrences of substring old replaced by new.

      count
        Maximum number of occurrences to replace.
        -1 (the default value) means replace all occurrences.

    If the optional argument count is given, only the first count occurrences are
    replaced.

rfind(...)
    S.rfind(sub[, start[, end]]) -> int

    Return the highest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Return -1 on failure.

rindex(...)
    S.rindex(sub[, start[, end]]) -> int

    Return the highest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Raises ValueError when the substring is not found.

rjust(self, width, fillchar=' ', /)
    Return a right-justified string of length width.

    Padding is done using the specified fill character (default is a space).

rpartition(self, sep, /)
    Partition the string into three parts using the given separator.

    This will search for the separator in the string, starting at the end. If
    the separator is found, returns a 3-tuple containing the part before the
    separator, the separator itself, and the part after it.

    If the separator is not found, returns a 3-tuple containing two empty strings
    and the original string.

rsplit(self, /, sep=None, maxsplit=-1)
    Return a list of the substrings in the string, using sep as the separator string.

      sep
        The separator used to split the string.

        When set to None (the default value), will split on any whitespace
        character (including \n \r \t \f and spaces) and will discard
        empty strings from the result.
      maxsplit
        Maximum number of splits (starting from the left).
        -1 (the default value) means no limit.

    Splitting starts at the end of the string and works to the front.

rstrip(self, chars=None, /)
    Return a copy of the string with trailing whitespace removed.

    If chars is given and not None, remove characters in chars instead.

split(self, /, sep=None, maxsplit=-1)
    Return a list of the substrings in the string, using sep as the separator string.

      sep
        The separator used to split the string.

        When set to None (the default value), will split on any whitespace
        character (including \n \r \t \f and spaces) and will discard
        empty strings from the result.
      maxsplit
        Maximum number of splits (starting from the left).
        -1 (the default value) means no limit.

    Note, str.split() is mainly useful for data that has been intentionally
    delimited.  With natural text that includes punctuation, consider using
    the regular expression module.

splitlines(self, /, keepends=False)
    Return a list of the lines in the string, breaking at line boundaries.

    Line breaks are not included in the resulting list unless keepends is given and
    true.

startswith(...)
    S.startswith(prefix[, start[, end]]) -> bool

    Return True if S starts with the specified prefix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    prefix can also be a tuple of strings to try.

strip(self, chars=None, /)
    Return a copy of the string with leading and trailing whitespace removed.

    If chars is given and not None, remove characters in chars instead.

swapcase(self, /)
    Convert uppercase characters to lowercase and lowercase characters to uppercase.

title(self, /)
    Return a version of the string where each word is titlecased.

    More specifically, words start with uppercased characters and all remaining
    cased characters have lower case.

translate(self, table, /)
    Replace each character in the string using the given translation table.

      table
        Translation table, which must be a mapping of Unicode ordinals to
        Unicode ordinals, strings, or None.

    The table must implement lookup/indexing via __getitem__, for instance a
    dictionary or list.  If this operation raises LookupError, the character is
    left untouched.  Characters mapped to None are deleted.

upper(self, /)
    Return a copy of the string converted to uppercase.

zfill(self, width, /)
    Pad a numeric string with zeros on the left, to fill a field of the given width.

    The string is never truncated.


目录

字符串驻留机制

工作原理

显式驻留

字符串长短的分界

附:字符串方法


本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.rhkb.cn/news/357621.html

如若内容造成侵权/违法违规/事实不符,请联系长河编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

使用AGG里面的clip_box函数裁剪画布, 绘制裁剪后的图形

// 矩形裁剪图片, 透明 void agg_testImageClipbox_rgba32(unsigned char* buffer, unsigned int width, unsigned int height) {// 创建渲染缓冲区 agg::rendering_buffer rbuf;// BMP是上下倒置的,为了和GDI习惯相同,最后一个参数是负值。rbuf.attach…

CausalMMM:基于因果结构学习的营销组合建模

1. 摘要 在线广告中,营销组合建模(Marketing Mix Modeling,MMM) 被用于预测广告商家的总商品交易量(GMV),并帮助决策者调整各种广告渠道的预算分配。传统的基于回归技术的MMM方法在复杂营销场景…

34 - 指定日期的产品价格(高频 SQL 50 题基础版)

34 - 指定日期的产品价格 -- row_number(行号) 生成连续的序号,不考虑分数相同 -- 在2019-08-16之前改的价格,使用最近一期的日期,没有在2019-08-16之前改的价格,默认价格为10 select t.product_id, t.new_price as price from (s…

华为某员工爆料:三年前985本科起薪30万,现在硕士起薪还是30w,感慨互联网行情变化

“曾经的30万年薪,是985本科学历的‘标配’,如今硕士也只值这个价?” 一位华为员工的爆料,揭开了互联网行业薪资变化的冰山一角,也引发了不少人的焦虑:互联网人才“通货膨胀”的时代,真的结束了…

板凳-------unix 网络编程 卷1-1简介

unix网络编程进程通信 unpipc.h https://blog.csdn.net/u010527630/article/details/33814377?spm1001.2014.3001.5502 订阅专栏 1>解压源码unpv22e.tar.gz。 $tar zxvf unpv22e.tar.gz //这样源码就被解压到当前的目录下了 2>运行configure脚本,以生成正确…

【区块链】区块链架构设计:从原理到实践

🌈个人主页: 鑫宝Code 🔥热门专栏: 闲话杂谈| 炫酷HTML | JavaScript基础 ​💫个人格言: "如无必要,勿增实体" 文章目录 区块链架构设计:从原理到实践引言一、区块链基础概念1.1 区块链定义…

k8s上使用ConfigMap 和 Secret

使用ConfigMap 和 Secret 实验目标: 学习如何使用 ConfigMap 和 Secret 来管理应用的配置。 实验步骤: 创建一个 ConfigMap 存储应用配置。创建一个 Secret 存储敏感信息(如数据库密码)。在 Pod 中挂载 ConfigMap 和 Secret&am…

荣耀社招 测试工程师 技术一面

面经哥只做互联网社招面试经历分享,关注我,每日推送精选面经,面试前,先找面经哥 1、自我介绍 2、具体介绍做过的项目,支撑的事什么业务 3、防火墙测试时、平时有写脚本或者使用第三方工具吗 4、对互联网的安全测试规…

浏览器插件利器-allWebPluginV2.0.0.14-bata版发布

allWebPlugin简介 allWebPlugin中间件是一款为用户提供安全、可靠、便捷的浏览器插件服务的中间件产品,致力于将浏览器插件重新应用到所有浏览器。它将现有ActiveX插件直接嵌入浏览器,实现插件加载、界面显示、接口调用、事件回调等。支持谷歌、火狐等浏…

DataStructure.时间和空间复杂度

时间和空间复杂度 【本节目标】1. 如何衡量一个算法的好坏2. 算法效率3. 时间复杂度3.1 时间复杂度的概念3.2 大O的渐进表示法3.3 推导大O阶方法3.4 常见时间复杂度计算举例3.4.1 示例13.4.2 示例23.4.3 示例33.4.4 示例43.4.5 示例53.4.6 示例63.4.7 示例7 4.空间复杂度4.1 示…

从零开始搭建一个酷炫的个人博客

效果图 一、搭建网站 git和hexo准备 注册GitHub本地安装Git绑定GitHub并提交文件安装npm和hexo,并绑定github上的仓库注意:上述教程都是Windows系统,Mac系统会更简单! 域名准备 购买域名,买的是腾讯云域名&#xf…

神经网络实战1-Sequential

链接:https://pytorch.org/docs/1.8.1/generated/torch.nn.Sequential.html#torch.nn.Sequential 完成这样一个网络模型 第一步新建一个卷积层 self.conv1Conv2d(3,32,5)#第一步将33232输出为32通道,卷积核5*5 注意一下:输出通道数等于卷积…

JavaSE基础总结复习之面向对象の知识总结

目录 Java语言的基础特点 面向对象 类和对象 类 类的构造 一,发现类 二,发现类的共有属性(成员变量) 三,定义类的成员方法(行为,动词) 四,使用类创建对象 对象…

用TensorRT-LLM进行LLama的推理和部署

Deploy an AI Coding Assistant with NVIDIA TensorRT-LLM and NVIDIA Triton | NVIDIA Technical BlogQuick Start Guide — tensorrt_llm documentation (nvidia.github.io) 使用TensorRT-LLM的源码,来下载docker并在docker里编译TensorRT-LLM; 模型…

Eureka 服务注册与发现

目录 前言 注册中心 CAP 理论 常⻅的注册中心 CAP理论对比 Eureka 搭建 Eureka Server 引⼊ eureka-server 依赖 完善启动类 编写配置⽂件 启动服务 服务注册 引⼊ eureka-client 依赖 完善配置⽂件 启动服务 服务发现 引⼊依赖 完善配置⽂件 远程调⽤ 启动…

江协科技51单片机学习- p16 矩阵键盘

🚀write in front🚀 🔎大家好,我是黄桃罐头,希望你看完之后,能对你有所帮助,不足请指正!共同学习交流 🎁欢迎各位→点赞👍 收藏⭐️ 留言📝​…

web安全渗透测试十大常规项(一):web渗透测试之JAVA反序列化

渗透测试之PHP反序列化 1. Java反序列化1.1 Java安全-反序列化-原生序列化类函数1.1.1 原生序列化类函数:1.2 Java安全-SpringBoot框架-泄漏&CVE1. Java反序列化 1、序列化与反序列化 序列化:将内存中的对象压缩成字节流 反序列化:将字节流转化成内存中的对象2、为什么…

数据仓库和数据库有什么区别?

一、什么是数据仓库二、什么是数据库三、数据仓库和数据库有什么区别 一、什么是数据仓库 数据仓库(Data Warehouse)是一种专门用于存储和管理大量结构化数据的信息系统。它通过整合来自不同来源的数据,为企业提供统一、一致的数据视图&…

SuiNS发布子名及新命名标准,推动Web3身份结构的进步

SuiNS子名是Sui Name Service的强大扩展,最近与新命名标准一起发布。子名允许用户在一个主要的SuiNS名下创建额外的自定义身份,而无需额外费用。用户 gia 可以创建如 gaminggia 或 lendinggia 这样的子名,从而增强个人组织和支持群组与组织的…

通过Socket通信实现局域网下Amov无人机连接与数据传输

1.局域网下的通信 1.1 局域网 厂家提供的方式是通过Homer图数传工具(硬件)构建的amov局域网实现通信连接. 好处是通信距离足够长,支持150m;坏处是"局部",无法访问互联网. [IMAGE:…