Python Libraries for NLP (3/4): A Comprehensive Overview

1. Introduction

Python has rich library support for natural language processing. From text processing, tokenizing a text and determining its lemmas, through syntactic analysis, parsing a text and assigning syntactic roles, up to semantic processing such as recognizing named entities, sentiment analysis, and document classification, every task is covered by at least one library. So, where do you start?

The goal of this article is to give an overview of relevant Python libraries for each core NLP task. The libraries are explained with a brief description and a concrete code snippet for an NLP task. Continuing my introductory NLP blog posts, this article covers only libraries for the core NLP tasks of text processing, syntactic and semantic analysis, and document semantics. In addition, in the NLP utilities category, libraries for corpus management and datasets are included.

The following libraries are covered:

  • NLTK
  • TextBlob
  • Spacy
  • SciKit Learn
  • Gensim 

2. Core NLP Tasks

2.1 Text Processing

Tasks: tokenization, lemmatization, stemming, part-of-speech tagging

The NLTK library provides a complete toolkit for text processing, including tokenization, stemming, and lemmatization.

from nltk.tokenize import sent_tokenize, word_tokenize

paragraph = '''Artificial intelligence was founded as an academic discipline in 1956, and in the years since it has experienced several waves of optimism, followed by disappointment and the loss of funding (known as an "AI winter"), followed by new approaches, success, and renewed funding. AI research has tried and discarded many different approaches, including simulating the brain, modeling human problem solving, formal logic, large databases of knowledge, and imitating animal behavior. In the first decades of the 21st century, highly mathematical and statistical machine learning has dominated the field, and this technique has proved highly successful, helping to solve many challenging problems throughout industry and academia.'''

sentences = []
for sent in sent_tokenize(paragraph):
    sentences.append(word_tokenize(sent))

sentences[0]
# ['Artificial', 'intelligence', 'was', 'founded', 'as', 'an', 'academic', 'discipline'
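
Stemming and lemmatization, mentioned above, work in a similar way. The following is a small, assumed sketch with NLTK's PorterStemmer and WordNetLemmatizer, not part of the original snippet; the wordnet data needs to be downloaded once for the lemmatizer, and the printed results are illustrative:

from nltk import download
from nltk.stem import PorterStemmer, WordNetLemmatizer

download('wordnet')  # lexicon used by the lemmatizer, only needed once

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem('founded'))
# e.g. found
print(lemmatizer.lemmatize('waves'))
# e.g. wave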

TextBlob supports the same text processing tasks. It differs from NLTK in its richer semantic results and easy-to-use data structures: parsing a sentence already produces rich semantic information.

from textblob import TextBlob

text = '''
Artificial intelligence was founded as an academic discipline in 1956, and in the years since it has experienced several waves of optimism, followed by disappointment and the loss of funding (known as an "AI winter"), followed by new approaches, success, and renewed funding. AI research has tried and discarded many different approaches, including simulating the brain, modeling human problem solving, formal logic, large databases of knowledge, and imitating animal behavior. In the first decades of the 21st century, highly mathematical and statistical machine learning has dominated the field, and this technique has proved highly successful, helping to solve many challenging problems throughout industry and academia.
'''

blob = TextBlob(text)
blob.ngrams()
# [WordList(['Artificial', 'intelligence', 'was']),
#  WordList(['intelligence', 'was', 'founded']),
#  WordList(['was', 'founded', 'as']),
#  ...

blob.tokens
# WordList(['Artificial', 'intelligence', 'was', 'founded', 'as', 'an', 'academic', 'discipline', 'in', '1956', ',', 'and', 'in',

With the modern NLP library spaCy, text processing is just the first step in a rich pipeline of mostly semantic tasks. Unlike the other libraries, it requires loading a model for the target language first. The recent models are not heuristics but artificial neural networks, especially transformers, which provide richer abstractions and can be combined better with other models.

import spacy
nlp = spacy.load('en_core_web_lg')

text = '''
Artificial intelligence was founded as an academic discipline in 1956, and in the years since it has experienced several waves of optimism, followed by disappointment and the loss of funding (known as an "AI winter"), followed by new approaches, success, and renewed funding. AI research has tried and discarded many different approaches, including simulating the brain, modeling human problem solving, formal logic, large databases of knowledge, and imitating animal behavior. In the first decades of the 21st century, highly mathematical and statistical machine learning has dominated the field, and this technique has proved highly successful, helping to solve many challenging problems throughout industry and academia.
'''

doc = nlp(text)
tokens = [token for token in doc]

print(tokens)
# [Artificial, intelligence, was, founded, as, an, academic, discipline
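
Each spaCy token also carries the results of the later pipeline steps, for example its lemma. A minimal, assumed extension of the snippet above (the printed values are illustrative, not verbatim output):

lemmas = [(token.text, token.lemma_) for token in doc if not token.is_space]
print(lemmas[:4])
# e.g. [('Artificial', 'artificial'), ('intelligence', 'intelligence'), ('was', 'be'), ('founded', 'found')]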

2.2 Text Syntax

Tasks: parsing, part-of-speech tagging, noun phrase extraction

Starting with NLTK, all syntax tasks are supported. Their output is provided as native Python data structures, and it can always be displayed as simple text output.

from nltk.tokenize import word_tokenize
from nltk import pos_tag, RegexpParser

# Source: Wikipedia, Artificial Intelligence, https://en.wikipedia.org/wiki/Artificial_intelligence
text = '''
Artificial intelligence was founded as an academic discipline in 1956, and in the years since it has experienced several waves of optimism, followed by disappointment and the loss of funding (known as an "AI winter"), followed by new approaches, success, and renewed funding. AI research has tried and discarded many different approaches, including simulating the brain, modeling human problem solving, formal logic, large databases of knowledge, and imitating animal behavior. In the first decades of the 21st century, highly mathematical and statistical machine learning has dominated the field, and this technique has proved highly successful, helping to solve many challenging problems throughout industry and academia.
'''

pos_tag(word_tokenize(text))
# [('Artificial', 'JJ'),
#  ('intelligence', 'NN'),
#  ('was', 'VBD'),
#  ('founded', 'VBN'),
#  ('as', 'IN'),
#  ('an', 'DT'),
#  ('academic', 'JJ'),
#  ('discipline', 'NN'),
#  ...

# noun chunk parser
# source: https://www.nltk.org/book_1ed/ch07.html
grammar = "NP: {<DT>?<JJ>*<NN>}"
parser = RegexpParser(grammar)

parser.parse(pos_tag(word_tokenize(text)))
#(S
#  (NP Artificial/JJ intelligence/NN)
#  was/VBD
#  founded/VBN
#  as/IN
#  (NP an/DT academic/JJ discipline/NN)
#  in/IN
#  1956/CD

TextBlob provides POS tags immediately when processing a text. With another method, a parse tree is created that contains rich syntactic information.

from textblob import TextBlob

text = '''
Artificial intelligence was founded as an academic discipline in 1956, and in the years since it has experienced several waves of optimism, followed by disappointment and the loss of funding (known as an "AI winter"), followed by new approaches, success, and renewed funding. AI research has tried and discarded many different approaches, including simulating the brain, modeling human problem solving, formal logic, large databases of knowledge, and imitating animal behavior. In the first decades of the 21st century, highly mathematical and statistical machine learning has dominated the field, and this technique has proved highly successful, helping to solve many challenging problems throughout industry and academia.
'''

blob = TextBlob(text)
blob.tags
#[('Artificial', 'JJ'),
# ('intelligence', 'NN'),
# ('was', 'VBD'),
# ('founded', 'VBN'),
# ...

blob.parse()
# Artificial/JJ/B-NP/O
# intelligence/NN/I-NP/O
# was/VBD/B-VP/O
# founded/VBN/I-VP/O
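
TextBlob also covers the noun phrase extraction task directly via its noun_phrases property. A minimal, assumed sketch reusing the blob object from above (the exact phrases depend on the configured extractor):

blob.noun_phrases
# e.g. WordList(['artificial intelligence', 'academic discipline', ...])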

The spaCy library uses transformer neural networks to support its syntax tasks.

import spacy
nlp = spacy.load('en_core_web_lg')

for token in nlp(text):
    print(f'{token.text:<20}{token.pos_:>5}{token.tag_:>5}')

#Artificial            ADJ   JJ
#intelligence         NOUN   NN
#was                   AUX  VBD
#founded              VERB  VBN
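
spaCy provides noun phrase extraction as well, through the doc.noun_chunks iterator. A small, assumed sketch (output abbreviated and illustrative):

doc = nlp(text)
for chunk in doc.noun_chunks:
    print(chunk.text)
# e.g. Artificial intelligence
#      an academic discipline
#      ...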

2.3 Text Semantics

Tasks: named entity recognition, word sense disambiguation, semantic role labeling

Semantic analysis is the area where NLP approaches start to differ. When using NLTK, the generated syntactic information is looked up in dictionaries to identify, for example, named entities. Therefore, when working with newer texts, entities may not be recognized.

from nltk import download as nltk_download
from nltk.tokenize import word_tokenize
from nltk import pos_tag, ne_chunk

nltk_download('maxent_ne_chunker')
nltk_download('words')

# Source: Wikipedia, Spacecraft, https://en.wikipedia.org/wiki/Spacecraft
text = '''
As of 2016, only three nations have flown crewed spacecraft: USSR/Russia, USA, and China. The first crewed spacecraft was Vostok 1, which carried Soviet cosmonaut Yuri Gagarin into space in 1961, and completed a full Earth orbit. There were five other crewed missions which used a Vostok spacecraft. The second crewed spacecraft was named Freedom 7, and it performed a sub-orbital spaceflight in 1961 carrying American astronaut Alan Shepard to an altitude of just over 187 kilometers (116 mi). There were five other crewed missions using Mercury spacecraft.
'''

print(ne_chunk(pos_tag(word_tokenize(text))))
# (S
#   As/IN
#   of/IN
#   [...]
#   (ORGANIZATION USA/NNP)
#   [...]
#   which/WDT
#   carried/VBD
#   (GPE Soviet/JJ)
#   cosmonaut/NN
#   (PERSON Yuri/NNP Gagarin/NNP)

The transformer models used by the spaCy library contain an implicit "timestamp": their training time. This determines which texts were used to train the model and, therefore, which entities the model is able to recognize.

import spacy
nlp = spacy.load('en_core_web_lg')

text = '''
As of 2016, only three nations have flown crewed spacecraft: USSR/Russia, USA, and China. The first crewed spacecraft was Vostok 1, which carried Soviet cosmonaut Yuri Gagarin into space in 1961, and completed a full Earth orbit. There were five other crewed missions which used a Vostok spacecraft. The second crewed spacecraft was named Freedom 7, and it performed a sub-orbital spaceflight in 1961 carrying American astronaut Alan Shepard to an altitude of just over 187 kilometers (116 mi). There were five other crewed missions using Mercury spacecraft.
'''

doc = nlp(text)
for token in doc.ents:
    print(f'{token.text:<25}{token.label_:<15}')

# 2016                   DATE
# only three             CARDINAL
# USSR                   GPE
# Russia                 GPE
# USA                    GPE
# China                  GPE
# first                  ORDINAL
# Vostok 1               PRODUCT
# Soviet                 NORP
# Yuri Gagarin           PERSON

2.4 Document Semantics

Tasks: text classification, topic modeling, sentiment analysis, toxicity detection

Sentiment analysis is another task where NLP approaches differ: looking up word meanings in a lexicon versus learned word similarities encoded in word or document vectors.

TextBlob has built-in sentiment analysis that returns the polarity (the overall positive or negative connotation) and the subjectivity (the degree of personal opinion) of a text.

from textblob import TextBlob

text = '''
Artificial intelligence was founded as an academic discipline in 1956, and in the years since it has experienced several waves of optimism, followed by disappointment and the loss of funding (known as an "AI winter"), followed by new approaches, success, and renewed funding. AI research has tried and discarded many different approaches, including simulating the brain, modeling human problem solving, formal logic, large databases of knowledge, and imitating animal behavior. In the first decades of the 21st century, highly mathematical and statistical machine learning has dominated the field, and this technique has proved highly successful, helping to solve many challenging problems throughout industry and academia.
'''

blob = TextBlob(text)
blob.sentiment
#Sentiment(polarity=0.16180290297937355, subjectivity=0.42155589508530683)
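
The same analysis can also be applied per sentence. A minimal, assumed extension of the snippet above (the printed values are illustrative):

for sentence in blob.sentences:
    print(sentence.sentiment.polarity)
# one polarity value between -1.0 and 1.0 per sentence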

spaCy does not include text classification out of the box, but it can be added as a separate pipeline step. The code below is lengthy and contains several spaCy-internal objects and data structures; a later article will explain it in more detail.

import spacy
from spacy.tokens import DocBin

## train single label categorization from multi-label dataset
def convert_single_label(dataset, filename):
    db = DocBin()
    nlp = spacy.load('en_core_web_lg')
    for index, fileid in enumerate(dataset):
        cat_dict = {cat: 0 for cat in dataset.categories()}
        cat_dict[dataset.categories(fileid).pop()] = 1
        doc = nlp(get_text(fileid))  # get_text is a helper defined elsewhere in the project
        doc.cats = cat_dict
        db.add(doc)
    db.to_disk(filename)

## load trained model and apply to text
nlp = spacy.load('textcat_multilabel_model/model-best')
text = dataset.raw(42)
doc = nlp(text)
estimated_cats = sorted(doc.cats.items(), key=lambda i: float(i[1]), reverse=True)

print(dataset.categories(42))
# ['orange']

print(estimated_cats)
# [('nzdlr', 0.998894989490509), ('money-supply', 0.9969857335090637), ... ('orange', 0.7344251871109009),

SciKit Learn is a general-purpose machine learning library that provides many clustering and classification algorithms. It operates on numeric input only, so the text needs to be vectorized, for example with Gensim's pre-trained word vectors or with the built-in feature vectorizers. To give just one example, here is a snippet that converts raw text into word vectors and then applies the KMeans clustering algorithm to them.

from sklearn.feature_extraction import DictVectorizer
from sklearn.cluster import KMeans

vectorizer = DictVectorizer(sparse=False)
x_train = vectorizer.fit_transform(dataset['train'])

kmeans = KMeans(n_clusters=8, random_state=0, n_init="auto").fit(x_train)

print(kmeans.labels_.shape)
# (8551, )

print(kmeans.labels_)
# [4 4 4 ... 6 6 6]
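
The DictVectorizer above expects feature dictionaries rather than raw strings; for raw text, scikit-learn's built-in TfidfVectorizer is a common choice. A minimal, self-contained sketch under that assumption (the dataset and parameter values are chosen purely for illustration):

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# load raw documents and turn them into TF-IDF vectors
docs = fetch_20newsgroups(subset='train').data
x_train = TfidfVectorizer(max_features=5000, stop_words='english').fit_transform(docs)

# cluster the documents into 8 groups and inspect the assigned labels
kmeans = KMeans(n_clusters=8, random_state=0, n_init="auto").fit(x_train)
print(kmeans.labels_[:10])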

Finally, Gensim is a library specialized in topic classification on large-scale corpora. The following snippet loads a built-in dataset, vectorizes the tokens of each document, and runs the LDA clustering algorithm. When running on a CPU only, this can take up to 15 minutes.

# source: https://radimrehurek.com/gensim/auto_examples/tutorials/run_lda.html,
#         https://radimrehurek.com/gensim/auto_examples/howtos/run_downloader_api.html
import logging
import gensim.downloader as api
from gensim.corpora import Dictionary
from gensim.models import LdaModel

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

docs = api.load('text8')
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

_ = dictionary[0]
id2word = dictionary.id2token

# Define and train the model
model = LdaModel(
    corpus=corpus,
    id2word=id2word,
    chunksize=2000,
    alpha='auto',
    eta='auto',
    iterations=400,
    num_topics=10,
    passes=20,
    eval_every=None
)

print(model.num_topics)
# 10

print(model.top_topics(corpus)[6])
#  ([(4.201401e-06, 'done'),
#    (4.1998064e-06, 'zero'),
#    (4.1478743e-06, 'eight'),
#    (4.1257395e-06, 'one'),
#    (4.1166854e-06, 'two'),
#    (4.085097e-06, 'six'),
#    (4.080696e-06, 'language'),
#    (4.050306e-06, 'system'),
#    (4.041121e-06, 'network'),
#    (4.0385708e-06, 'internet'),
#    (4.0379923e-06, 'protocol'),
#    (4.035399e-06, 'open'),
#    (4.033435e-06, 'three'),
#    (4.0334166e-06, 'interface'),
#    (4.030141e-06, 'four'),
#    (4.0283044e-06, 'seven'),
#    (4.0163245e-06, 'no'),
#    (4.0149207e-06, 'i'),
#    (4.0072555e-06, 'object'),
#    (4.007036e-06, 'programming')],
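
Once trained, the model can also estimate the topic distribution of an individual document. A small, assumed usage sketch (the printed topic ids and probabilities are illustrative):

bow = corpus[0]
print(model.get_document_topics(bow))
# e.g. [(3, 0.42), (7, 0.31), ...]  pairs of (topic id, probability)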

3. Utilities

3.1 Corpus Management

NLTK provides corpus readers for plain text, Markdown, and even Twitter feeds in JSON format. A reader is created by passing a file path, and it then provides basic statistics as well as iterators for processing all found files.

from nltk.corpus.reader.plaintext import PlaintextCorpusReader

corpus = PlaintextCorpusReader('wikipedia_articles', r'.*\.txt')

print(corpus.fileids())
# ['AI_alignment.txt', 'AI_safety.txt', 'Artificial_intelligence.txt', 'Machine_learning.txt', ...]

print(len(corpus.sents()))
# 47289

print(len(corpus.words()))
# 1146248

Gensim processes text files to form a word-vector representation of each document, which can then be used for its main use case, topic classification. The documents need to be processed by an iterator that wraps a directory traversal, and the corpus is then built as a collection of word vectors. However, this corpus representation is hard to externalize and reuse with other libraries. The following snippet is an excerpt of the code above: it loads the dataset included in Gensim and then creates a word-vector-based representation.

import gensim.downloader as api
from gensim.corpora import Dictionary

docs = api.load('text8')
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

print('Number of unique tokens: %d' % len(dictionary))
# Number of unique tokens: 253854

print('Number of documents: %d' % len(corpus))
# Number of documents: 1701
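
If the dictionary and the bag-of-words corpus need to be reused in a later session, Gensim can persist both to disk. A minimal sketch (the file names are assumptions chosen for illustration):

from gensim.corpora import MmCorpus

# persist the dictionary and the bag-of-words corpus
dictionary.save('text8.dict')
MmCorpus.serialize('text8_bow.mm', corpus)

# reload them later
dictionary = Dictionary.load('text8.dict')
corpus = MmCorpus('text8_bow.mm')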

3.2 Datasets

NLTK provides several ready-to-use datasets, for example Reuters news excerpts, European Parliament proceedings, and open books from the Gutenberg collection. See the complete dataset and model list.

from nltk.corpus import reuters

print(len(reuters.fileids()))
# 10788

print(reuters.categories()[:43])
# ['acq', 'alum', 'barley', 'bop', 'carcass', 'castor-oil', 'cocoa', 'coconut', 'coconut-oil', 'coffee', 'copper', 'copra-cake', 'corn', 'cotton', 'cotton-oil', 'cpi', 'cpu', 'crude', 'dfl', 'dlr', 'dmk', 'earn', 'fuel', 'gas', 'gnp', 'gold', 'grain', 'groundnut', 'groundnut-oil', 'heat', 'hog', 'housing', 'income', 'instal-debt', 'interest', 'ipi', 'iron-steel', 'jet', 'jobs', 'l-cattle', 'lead', 'lei', 'lin-oil']

SciKit Learn includes datasets from newsgroups, real estate, and even IT intrusion detection; see the complete list. Here is a quick example with the newsgroups dataset.

from sklearn.datasets import fetch_20newsgroups

dataset = fetch_20newsgroups()
dataset.data[1]
# "From: guykuo@carson.u.washington.edu (Guy Kuo)\nSubject: SI Clock Poll - Final Call\nSummary: Final call for SI clock reports\nKeywords: SI,acceleration,clock,upgrade\nArticle-I.D.: shelley.1qvfo9INNc3s\nOrganization: University of Washington\nLines: 11\nNNTP-Posting-Host: carson.u.washington.edu\n\nA fair number of brave souls who upgraded their SI clock oscillator have\nshared their experiences for this poll.

4. Conclusion

For NLP projects in Python, there is an abundance of libraries to choose from. To help you get started, this article provided an NLP-task-driven overview with compact library explanations and code snippets. Starting with text processing, you saw how to create tokens and lemmas from a text. Continuing with syntactic analysis, you learned how to generate part-of-speech tags and the grammatical structure of sentences. Arriving at semantics, recognizing named entities in a text, as well as text sentiment, can also be solved in just a few lines of code. For the additional tasks of corpus management and accessing pre-structured datasets, you also saw library examples. Altogether, this article should give you a good start into your next NLP project when tackling core NLP tasks.
