Basic Data Structures in Machine Learning

Data Dimensions and Data Structures

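When working with data in machine learning or deep learning, we usually encounter four main data structures: scalars, vectors, matrices, and tensors. Understanding them matters because the inputs, parameters, and outputs of machine learning algorithms and neural networks are all stored in these forms. Each is explained below: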

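Scalar: In machine learning, a scalar is a single quantity, that is, a single real number. For example, the learning rate of a learning algorithm is a scalar.
Vector: A vector is an ordered sequence of numbers. In machine learning, we often store the features of one sample in a vector. For example, in a dataset of 1000 samples with 10 features each, every sample is a 10-element vector, and the whole dataset can be stored as a 1000x10 matrix whose rows are those vectors.
Matrix: A matrix is a two-dimensional array of numbers of the same type, indexed by row and column. In machine learning, we typically represent a dataset as a matrix in which each row is a sample (instance) and each column is a feature.
Tensor: A tensor generalizes these structures to any number of dimensions: a scalar is a 0-dimensional tensor, a vector is 1-dimensional, and a matrix is 2-dimensional. Data with more than two dimensions is normally represented as a tensor. For example, when a convolutional neural network (CNN) processes images, a color image usually has three channels (red, green, blue), each of which is a two-dimensional array (matrix), so a single image can be represented as a 3-dimensional tensor.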
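
The four structures above can be sketched with plain nested Python lists. This is only an illustration under simple assumptions: the helper shape() and all example values are hypothetical, and real projects would normally use an array library such as NumPy or PyTorch instead of nested lists.

```python
# Hypothetical helper: infer the shape of a regular (non-ragged, non-empty)
# nested list by following the first element at each nesting level.
def shape(x):
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

# Scalar: a single number, 0 dimensions.
learning_rate = 0.01

# Vector: one sample with 4 features, 1 dimension.
sample = [5.1, 3.5, 1.4, 0.2]

# Matrix: 3 samples x 4 features, 2 dimensions (rows are samples).
dataset = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.2, 3.4, 5.4, 2.3],
]

# Tensor: a tiny "image" with 3 channels of 2x2 pixels, 3 dimensions.
image = [[[0, 0], [0, 0]] for _ in range(3)]

for name, value in [("learning_rate", learning_rate), ("sample", sample),
                    ("dataset", dataset), ("image", image)]:
    s = shape(value)
    print(f"{name}: shape={s}, dimensions={len(s)}")
```

The number of dimensions is simply the length of the shape, which is why a scalar reports an empty shape and the image reports three entries.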

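In deep learning we often work with four-dimensional tensors, for example when a batch of training images is stacked into a single tensor.
Its four dimensions are: number of samples, number of channels, image height, and image width. This ordering is commonly abbreviated NCHW.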
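
As a rough sketch of that layout, the snippet below stacks several three-channel images into one batch and reads back its four dimensions. The helpers shape() and make_image() and the sizes (8 images of 4x5 pixels) are assumptions made for illustration; an array library would normally do this with a single stacking call.

```python
# Hypothetical helper: shape of a regular (non-ragged, non-empty) nested list.
def shape(x):
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

# Hypothetical helper: build one image as channels x height x width,
# with every pixel set to the same fill value.
def make_image(channels, height, width, fill=0.0):
    return [[[fill for _ in range(width)]
             for _ in range(height)]
            for _ in range(channels)]

# Stack 8 RGB images of 4x5 pixels into one batch.
batch = [make_image(3, 4, 5) for _ in range(8)]

# The four dimensions: samples, channels, height, width.
n, c, h, w = shape(batch)
print(f"samples={n}, channels={c}, height={h}, width={w}")

# Indexing follows the same order: sample 0, channel 2, row 1, column 3.
pixel = batch[0][2][1][3]
```

Keeping the sample index first means batch[i] is always one complete image, which is the 3-dimensional tensor described earlier.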
