R Language Notes (5): Apply Functions

Contents

  • 1. Apply Family
  • 2. apply(): rows or columns of a matrix or data frame
  • 3. Applying a custom function
  • 4. Applying a custom function "on-the-fly"
  • 5. Applying a function that takes extra arguments
  • 6. What's the return argument?
  • 7. Optimized functions for special tasks
  • 8. lapply(): elements of a list or vector
  • 9. sapply(): elements of a list or vector
  • 10. tapply(): levels of a factor vector
  • 11. split(): split by levels of a factor


1. Apply Family

R offers a family of apply functions, which allow you to apply a function across different chunks of data. They offer an alternative to explicit iteration with a for() loop and can be simpler and faster, though not always. Summary of functions:

  • apply(): apply a function to rows or columns of a matrix or data frame
  • lapply(): apply a function to elements of a list or vector
  • sapply(): same as the above, but simplify the output (if possible)
  • tapply(): apply a function to levels of a factor vector
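To make the comparison with explicit iteration concrete, here is a minimal sketch that computes column sums twice, once with a for() loop and once with apply(). The matrix m and the variable names are made up for illustration and do not come from the original notes.

```r
# A small matrix to work with (values chosen only for illustration)
m = matrix(1:6, nrow=2, ncol=3)

# Explicit iteration: compute each column sum with a for() loop
col.sums.loop = numeric(ncol(m))
for (j in 1:ncol(m)) {
  col.sums.loop[j] = sum(m[, j])
}

# The same computation with apply(): one line, no index bookkeeping
col.sums.apply = apply(m, MARGIN=2, FUN=sum)

col.sums.loop   # 3 7 11
col.sums.apply  # 3 7 11
```

Both versions give the same numbers; the apply() version simply hides the loop index and the preallocation.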

2. apply(): rows or columns of a matrix or data frame

The apply() function takes inputs of the following form:

  • apply(x, MARGIN=1, FUN=my.fun), to apply my.fun() across rows of a matrix or data frame x
  • apply(x, MARGIN=2, FUN=my.fun), to apply my.fun() across columns of a matrix or data frame x
apply(state.x77, MARGIN=2, FUN=sum) # Sum of the entries in each column
## Population Income Illiteracy Life Exp Murder HS Grad
## 212321.00 221790.00 58.50 3543.93 368.90 2655.40
## Frost Area
## 5223.00 3536794.00
colSums(state.x77) # Same result, using the dedicated function
## Population Income Illiteracy Life Exp Murder HS Grad
## 212321.00 221790.00 58.50 3543.93 368.90 2655.40
## Frost Area
## 5223.00 3536794.00
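The difference between MARGIN=1 and MARGIN=2 is easiest to check on a matrix small enough to verify by hand. The sketch below uses a made-up 2 x 3 matrix with row and column names (all names here are illustrative assumptions), and shows that apply() carries those names over to its result.

```r
# matrix() fills column by column, so column "a" is (3, 1), "b" is (4, 1), "c" is (5, 9)
m = matrix(c(3, 1, 4, 1, 5, 9), nrow=2,
           dimnames=list(c("r1", "r2"), c("a", "b", "c")))

# MARGIN=1: the function sees one row at a time
row.max = apply(m, MARGIN=1, FUN=max)
row.max  # r1 = 5, r2 = 9

# MARGIN=2: the function sees one column at a time
col.max = apply(m, MARGIN=2, FUN=max)
col.max  # a = 3, b = 4, c = 9
```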
  • When the output of the function passed to FUN is a single value, apply() outputs a vector, with one entry per column/row
apply(state.x77, MARGIN=2, FUN=which.max) # Index of the max in each column
## Population Income Illiteracy Life Exp Murder HS Grad
## 5 2 18 11 1 44
## Frost Area
## 28 2
  • When the output of the function passed to FUN is a vector, apply() outputs a matrix, with one column per column/row of the input
apply(state.x77, MARGIN=2, FUN=summary) # Matrix: one row per summary statistic, one column per variable


3. Applying a custom function

For a custom function, we can just define it beforehand and then use apply() as usual.

# Our custom function: second largest value
second.max = function(v) {
  sorted.v = sort(v, decreasing=TRUE)
  return(sorted.v[2])
}
apply(state.x77, MARGIN=2, FUN=second.max)
## Population Income Illiteracy Life Exp Murder HS Grad
## 18076.00 5348.00 2.40 72.96 13.90 66.70
## Frost Area
## 186.00 262134.00
apply(state.x77, MARGIN=2, FUN=max)
## Population Income Illiteracy Life Exp Murder HS Grad
## 21198.0 6315.0 2.8 73.6 15.1 67.3
## Frost Area
## 188.0 566432.0

4. Applying a custom function "on-the-fly"

Instead of defining a custom function beforehand, we can define it "on-the-fly", as an anonymous function passed directly to FUN.

# Compute the second largest value, defining the function on-the-fly
apply(state.x77, MARGIN=2, FUN=function(v) {
  sorted.v = sort(v, decreasing=TRUE)
  return(sorted.v[2])
})
## Population Income Illiteracy Life Exp Murder HS Grad
## 18076.00 5348.00 2.40 72.96 13.90 66.70
## Frost Area
## 186.00 262134.00
  • When the custom function is simple, this can be more convenient
# Same computation, with the on-the-fly function written on one line
apply(state.x77, MARGIN=2, FUN=function(v) { sort(v, decreasing=TRUE)[2] })
## Population Income Illiteracy Life Exp Murder HS Grad
## 18076.00 5348.00 2.40 72.96 13.90 66.70
## Frost Area
## 186.00 262134.00

5. Applying a function that takes extra arguments

We can tell apply() to pass extra arguments to the function in question. E.g., apply(x, MARGIN=1, FUN=my.fun, extra.arg.1, extra.arg.2) passes the two extra arguments extra.arg.1 and extra.arg.2 on to my.fun() in every call.

# Our custom function: k-th largest value, with user-specified k
kth.max = function(v, k) {
  sorted.v = sort(v, decreasing=TRUE)
  return(sorted.v[k])
}
apply(state.x77, MARGIN=2, FUN=kth.max, k=10)
## Population Income Illiteracy Life Exp Murder HS Grad
## 5814.00 4903.00 1.80 72.13 11.10 59.90
## Frost Area
## 155.00 96184.00
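The same mechanism works with built-in functions. As a small sketch (the matrix below is made up for illustration), passing na.rm=TRUE through apply() forwards it to mean() for every column:

```r
# A matrix with one missing value; columns are (1, NA), (3, 4), (5, 6)
m = matrix(c(1, NA, 3, 4, 5, 6), nrow=2)

# Without the extra argument, the first column mean is NA
means.default = apply(m, MARGIN=2, FUN=mean)
means.default  # NA 3.5 5.5

# na.rm=TRUE is not an argument of apply(); it is passed along to mean()
means.na.rm = apply(m, MARGIN=2, FUN=mean, na.rm=TRUE)
means.na.rm    # 1.0 3.5 5.5
```

Naming the extra argument (na.rm=TRUE rather than a bare TRUE) is the safer habit, since it avoids accidental positional matching inside the applied function.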

6. What's the return argument?

What kind of data type will apply() give us? It depends on the function we pass. In summary, say with FUN=my.fun:

  • If my.fun() returns a single value, then apply() will return a vector
  • If my.fun() returns k values, then apply() will return a matrix with k rows (note: this is true regardless of whether MARGIN=1 or MARGIN=2)
  • If my.fun() returns different length outputs for different inputs, then apply() will return a list
  • If my.fun() returns a list, then apply() will return a list
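The cases above can be checked on a toy matrix. This is only a sketch: the matrix and the anonymous functions are invented for illustration, and the variable names r1 to r5 are arbitrary.

```r
m = matrix(1:6, nrow=2, ncol=3)  # columns are (1,2), (3,4), (5,6)

# Single value per column -> plain vector of length 3
r1 = apply(m, MARGIN=2, FUN=sum)

# k = 2 values per column -> matrix with 2 rows and 3 columns
r2 = apply(m, MARGIN=2, FUN=range)

# k = 3 values per row -> matrix with 3 rows and 2 columns
# (so with MARGIN=1 the rows of m come back as columns)
r3 = apply(m, MARGIN=1, FUN=function(v) v * 10)

# Outputs of different lengths (0, 2, 2) -> list
r4 = apply(m, MARGIN=2, FUN=function(v) v[v > 2])

# Function that returns a list -> list
r5 = apply(m, MARGIN=2, FUN=function(v) list(total=sum(v)))

dim(r2)      # 2 3
dim(r3)      # 3 2
is.list(r4)  # TRUE
is.list(r5)  # TRUE
```

The r3 case is the one that most often surprises: applying over rows still stacks the results as columns, so a t() may be needed to get back the original orientation.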

7. Optimized functions for special tasks

Don't overuse the apply paradigm! There are lots of special functions that are optimized and will be both simpler and faster than using apply(). E.g.,

  • rowSums(), colSums(): for computing row, column sums of a matrix
  • rowMeans(), colMeans(): for computing row, column means of a matrix
  • max.col(): for finding the position of the maximum in each row of a matrix

Combining these functions with logical indexing and vectorized operations will enable you to do quite a lot. E.g., how to count the number of positives in each row of a matrix?

x = matrix(rnorm(9), 3, 3)
# Don't do this (much slower for big matrices)
apply(x, MARGIN=1, function(v) { return(sum(v > 0)) })
## [1] 2 2 1
# Do this instead (much faster, simpler)
rowSums(x > 0)
## [1] 2 2 1
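The same substitution applies to max.col(). A minimal sketch on a made-up matrix follows; note that max.col() breaks ties at random by default, so ties.method="first" is used here to match the behavior of which.max().

```r
# Rows are (1, 9, 3), (7, 2, 8), (4, 6, 5)
x.small = matrix(c(1, 9, 3,
                   7, 2, 8,
                   4, 6, 5), nrow=3, byrow=TRUE)

# apply() version: position of the maximum in each row
pos.apply = apply(x.small, MARGIN=1, FUN=which.max)

# Dedicated function; ties.method="first" mirrors which.max()
pos.maxcol = max.col(x.small, ties.method="first")

pos.apply   # 2 3 2
pos.maxcol  # 2 3 2
```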

8. lapply(): elements of a list or vector

The lapply() function takes inputs as in: lapply(x, FUN=my.fun), to apply my.fun() across elements of a list or vector x. The output is always a list.
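The notes never show how my.list is created. The definition below is an assumption, reconstructed so that it matches the printed contents of the list; the original author may have built it differently (for example with seq() or sample()).

```r
# Assumed definition, chosen to reproduce the printed values of my.list
my.list = list(nums=c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
               chars=letters[1:12],
               bools=c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE))

# lapply() visits each element and always returns a list
lens = lapply(my.list, FUN=length)
lens$nums   # 6
lens$chars  # 12
lens$bools  # 6
```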

my.list
## $nums
## [1] 0.1 0.2 0.3 0.4 0.5 0.6
##
## $chars
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l"
##
## $bools
## [1] TRUE FALSE FALSE TRUE FALSE TRUE
lapply(my.list, FUN=mean) # Get a warning: mean() can't be applied to chars
## Warning in mean.default(X[[i]], ...): argument is not numeric or
## logical: returning NA
## $nums
## [1] 0.35
##
## $chars
## [1] NA
##
## $bools
## [1] 0.5
lapply(my.list, FUN=summary)
## $nums
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.100 0.225 0.350 0.350 0.475 0.600
##
## $chars
## Length Class Mode
## 12 character character
##
## $bools
## Mode FALSE TRUE
## logical 3 3

9. sapply(): elements of a list or vector

The sapply() function works just like lapply(), but tries to simplify the return value whenever possible. The most common case is the conversion from a list to a vector.

sapply(my.list, FUN=mean) # Simplifies the result, now a vector
## Warning in mean.default(X[[i]], ...): argument is not numeric or
## logical: returning NA
## nums chars bools
## 0.35 NA 0.50
sapply(my.list, FUN=summary) # Can't simplify, so still a list
## $nums
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.100 0.225 0.350 0.350 0.475 0.600
##
## $chars
## Length Class Mode
## 12 character character
##
## $bools
## Mode FALSE TRUE
## logical 3 3
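Simplification is not limited to vectors: when every call returns the same number of values, sapply() produces a matrix. The sketch below uses a made-up list (nums.list is not from the original notes) and also shows vapply(), a base R relative of sapply() that requires the expected shape of each result to be declared up front.

```r
nums.list = list(a=c(1, 2, 3), b=c(10, 20), c=c(5, 5, 5, 5))

# One value per element -> named vector
means = sapply(nums.list, FUN=mean)
means  # a = 2, b = 15, c = 5

# Two values per element -> 2 x 3 matrix, one column per element
ranges = sapply(nums.list, FUN=range)
dim(ranges)  # 2 3

# vapply() gives the same result but errors if a call does not
# return exactly one numeric value
means.strict = vapply(nums.list, FUN=mean, FUN.VALUE=numeric(1))
```

Because the shape of sapply() output depends on the data, vapply() is often the safer choice inside functions, where a silent switch from vector to list could cause trouble later.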

10. tapply(): levels of a factor vector

The function tapply() takes inputs as in: tapply(x, INDEX=my.index, FUN=my.fun), to apply my.fun() to subsets of entries in x that share a common level in my.index.

# Compute the mean and sd of the Frost variable, within each region
tapply(state.x77[,"Frost"], INDEX=state.region, FUN=mean)
## Northeast South North Central West
## 132.7778 64.6250 138.8333 102.1538
tapply(state.x77[,"Frost"], INDEX=state.region, FUN=sd)
## Northeast South North Central West
## 30.89408 31.30682 23.89307 68.87652
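The grouping is easier to follow on data small enough to compute by hand. In this sketch the vectors scores and group are invented for illustration:

```r
scores = c(90, 80, 70, 60, 20)
group = factor(c("a", "b", "a", "b", "a"))

# Entries of scores are grouped by the level of group at the same position:
# level "a" gets (90, 70, 20), level "b" gets (80, 60)
group.means = tapply(scores, INDEX=group, FUN=mean)
group.means  # a = 60, b = 70
```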

11. split(): split by levels of a factor

The function split() splits up the rows of a data frame by levels of a factor, as in: split(x, f=my.index) to split a data frame x according to levels of my.index.

# Split up the state.x77 matrix according to region
state.by.reg = split(data.frame(state.x77), f=state.region)
class(state.by.reg) # The result is a list
## [1] "list"
names(state.by.reg) # This has 4 elements for the 4 regions
## [1] "Northeast" "South" "North Central" "West"
class(state.by.reg[[1]]) # Each element is a data frame
## [1] "data.frame"
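Since split() returns a list, it combines naturally with lapply() and sapply(). As a sketch, the per-region Frost means can be recomputed by splitting first and then applying a function to each piece; the names frost.by.split and frost.by.tapply are illustrative.

```r
# Split the data by region, then average the Frost column within each piece
state.by.reg = split(data.frame(state.x77), f=state.region)
frost.by.split = sapply(state.by.reg, FUN=function(df) mean(df$Frost))

# The same quantity computed directly with tapply()
frost.by.tapply = tapply(state.x77[, "Frost"], INDEX=state.region, FUN=mean)

# The two approaches agree
all.equal(as.vector(frost.by.split), as.vector(frost.by.tapply))
```

The split-then-apply route is more verbose for a single column, but each piece is a full data frame, so the applied function can use several columns at once.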
