Counting Word Frequencies in a Text File: Swift and Bash Implementations Explained


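网罗开发 (same name on Xiaohongshu, Kuaishou, and WeChat Channels)

Hello everyone, I'm Zhan Fei. I currently manage AI project R&D at a publicly listed company, and I enjoy sharing both hard and soft skills across programming as well as cutting-edge technology, including iOS, front-end, HarmonyOS, Java, and Python. I have deep experience in mobile development, HarmonyOS development, IoT, embedded systems, cloud native, and open source.

Book author: 《ESP32-C3 物联网工程开发实战》 (ESP32-C3 IoT Engineering Development in Practice)
Book author: 《SwiftUI 入门，进阶与实战》 (SwiftUI: Getting Started, Advanced Topics, and Practice)
Super individual: lead organizer of the COC Shanghai community
Guest lecturer: university lecturer; speaker at Google and Amazon events
Tech blogger: one of the first contracted authors of 极星会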

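Contents

- Abstract
- Description
- Solution
  - Bash Implementation
  - Swift Implementation
- Code Analysis
  - Bash Solution
  - Swift Solution
- Example Test and Results
- Time Complexity
- Space Complexity
- Summary
- Future Outlook
- References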

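Abstract

This article shows how to count how often each word appears in a text file, using a classic Bash one-liner and a Swift implementation. Besides the complete code, it walks through the logic step by step, analyzes time and space complexity, and includes sample runs with their output.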

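Description

Write a bash script that counts how often each word appears in a text file, words.txt.

For simplicity, you may assume:

words.txt contains only lowercase letters and the space character ' ' (plus line breaks, as the example below shows).
Each word consists of lowercase letters only.
Words are separated by one or more space characters.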
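Because words may be separated by several spaces, the choice of splitting API matters in Swift. A minimal sketch (the sample string is made up for illustration) comparing two standard ways to split a string on spaces:

```swift
import Foundation

// Made-up sample line with runs of spaces between words
let line = "the  day   is sunny"

// split(separator:) omits empty subsequences by default,
// so a run of spaces never produces an empty "word"
let viaSplit = line.split(separator: " ").map { String($0) }

// components(separatedBy:) keeps an empty string between every pair of adjacent spaces
let viaComponents = line.components(separatedBy: " ")

print(viaSplit)       // ["the", "day", "is", "sunny"]
print(viaComponents)  // ["the", "", "day", "", "", "is", "sunny"]
```

The Swift solution below relies on split for the same reason: its omittingEmptySubsequences parameter defaults to true.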

Example:

Suppose words.txt contains:

the day is sunny the the
the sunny is is

Your script should output the following (sorted by frequency, highest first):

the 4
is 3
sunny 2
day 1

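Notes:

Don't worry about the order of words that share a frequency; every word's frequency is guaranteed to be unique.
Can you do it with a single line of Unix pipes?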
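The problem rules out ties, so plain descending order by count is enough. If that guarantee did not hold, a secondary sort key would keep the output deterministic. A minimal Swift sketch under that assumption, using made-up counts that deliberately include a tie and breaking it alphabetically (the tie-break rule is a choice for illustration, not part of the problem):

```swift
// Made-up counts with a tie between "day" and "sunny"
let counts: [String: Int] = ["the": 4, "is": 3, "day": 1, "sunny": 1]

// Primary key: count, descending. Secondary key: word, ascending.
let ordered = counts.sorted { lhs, rhs in
    lhs.value != rhs.value ? lhs.value > rhs.value : lhs.key < rhs.key
}

for (word, count) in ordered {
    print("\(word) \(count)")
}
// the 4
// is 3
// day 1
// sunny 1
```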

Solution

Bash Implementation

A single line of Unix pipes is enough to do the counting:

cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -rn | awk '{print $2, $1}'
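To make the data flow of the one-liner concrete, here is a rough Swift model of what each stage does to the sample input. It illustrates the pipeline's logic only; it is not how the Unix tools are implemented, and the variable names are made up:

```swift
// Same content as the words.txt example
let text = "the day is sunny the the\nthe sunny is is\n"

// tr -s ' ' '\n': one word per line (splitting drops empty pieces, much like -s squeezing)
let lines = text.split(whereSeparator: { $0 == " " || $0 == "\n" }).map { String($0) }

// sort: identical words end up next to each other
let sortedWords = lines.sorted()

// uniq -c: collapse each run of adjacent identical lines into (count, word)
var runs: [(count: Int, word: String)] = []
for word in sortedWords {
    if let last = runs.last, last.word == word {
        runs[runs.count - 1].count += 1
    } else {
        runs.append((count: 1, word: word))
    }
}

// sort -rn: numeric order by count, highest first
let byCount = runs.sorted { $0.count > $1.count }

// awk '{print $2, $1}': swap the fields to "word count"
let output = byCount.map { "\($0.word) \($0.count)" }
output.forEach { print($0) }
// the 4
// is 3
// sunny 2
// day 1
```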

Swift Implementation

For a more readable and extensible solution, here is a Swift version:

import Foundation

func countWordFrequencies(filePath: String) {
    do {
        // Read the whole file as UTF-8 text
        let content = try String(contentsOfFile: filePath, encoding: .utf8)
        // Split on any whitespace (spaces and newlines)
        let words = content.split { $0.isWhitespace }.map { String($0) }

        // Count occurrences; a missing key starts at 0
        var wordCount: [String: Int] = [:]
        for word in words {
            wordCount[word, default: 0] += 1
        }

        // Highest frequency first
        let sortedWordCount = wordCount.sorted { $0.value > $1.value }
        for (word, count) in sortedWordCount {
            print("\(word) \(count)")
        }
    } catch {
        print("Error reading file: \(error.localizedDescription)")
    }
}

// Example call
let filePath = "path/to/words.txt"
countWordFrequencies(filePath: filePath)
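Because countWordFrequencies reads from a path and prints, it is awkward to unit-test. One possible refactor, sketched below, keeps the same split/count/sort logic but takes the text directly and returns the rows. The name wordFrequencies(in:) is hypothetical, not part of the original solution:

```swift
// Hypothetical helper: same counting logic as countWordFrequencies,
// but it takes the text directly and returns rows instead of printing
func wordFrequencies(in text: String) -> [(word: String, count: Int)] {
    var wordCount: [String: Int] = [:]
    // Any whitespace (spaces or newlines) separates words
    for word in text.split(whereSeparator: { $0.isWhitespace }) {
        wordCount[String(word), default: 0] += 1
    }
    // Highest frequency first
    return wordCount
        .sorted { $0.value > $1.value }
        .map { (word: $0.key, count: $0.value) }
}

let rows = wordFrequencies(in: "the day is sunny the the\nthe sunny is is")
for row in rows {
    print("\(row.word) \(row.count)")
}
```

Separating pure logic from I/O like this lets the file-reading wrapper shrink to a read, a call, and a print loop.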

Code Analysis

Bash Solution

cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -rn | awk '{print $2, $1}'

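cat words.txt: outputs the file's contents.
tr -s ' ' '\n': translates every space into a newline, and -s squeezes each run of repeated newlines into one, so every line holds exactly one word even where words were separated by several spaces.
sort: sorts the words so that identical words end up on adjacent lines.
uniq -c: collapses each run of adjacent identical lines into one line prefixed with its count, in the form "count word". Since uniq only compares neighboring lines, the preceding sort is required.
sort -rn: sorts numerically (-n) in reverse (-r), i.e. by count, highest first.
awk '{print $2, $1}': swaps the two fields so each line reads "word count".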
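To see why -s matters, here is a rough Swift model of tr -s ' ' '\n' next to plain translation. It is an illustration of the behavior described above, not tr's actual implementation, and both function names are hypothetical:

```swift
// Rough model of `tr -s ' ' '\n'`: translate, then squeeze repeated newlines
func trSqueezeSpacesToNewlines(_ input: String) -> String {
    var result = ""
    for ch in input {
        // Translate: every space becomes a newline
        let translated: Character = (ch == " ") ? "\n" : ch
        // Squeeze: drop a newline that would directly follow another newline
        if translated == "\n", result.last == "\n" {
            continue
        }
        result.append(translated)
    }
    return result
}

// Translation only, without squeezing (like `tr ' ' '\n'`)
func trSpacesToNewlines(_ input: String) -> String {
    String(input.map { (ch: Character) -> Character in ch == " " ? "\n" : ch })
}

let sample = "the day  is\n sunny"
let squeezed = trSqueezeSpacesToNewlines(sample)
let plain = trSpacesToNewlines(sample)

// With -s: exactly one word per line
print(squeezed.split(separator: "\n", omittingEmptySubsequences: false))
// ["the", "day", "is", "sunny"]

// Without -s: empty lines wherever spaces were doubled or followed a line break
print(plain.split(separator: "\n", omittingEmptySubsequences: false))
// ["the", "day", "", "is", "", "sunny"]
```

One behavior the model shares with the real command: a space at the very start of the file still becomes a single leading newline, so the output begins with an empty line that uniq -c would then report as an extra entry with a blank word.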

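Swift Solution

Read the file: String(contentsOfFile:) loads the whole file into a single string.
Split into words: split with the predicate $0.isWhitespace breaks the string on any whitespace, spaces and newlines alike, and drops empty pieces; the resulting substrings are converted to an array of String.
Count frequencies: a dictionary stores each word's count; wordCount[word, default: 0] += 1 treats a missing key as 0, so initialization and increment happen in one step.
Sort: sorted orders the (word, count) pairs by count, highest first. Because a Dictionary has no defined iteration order, this sort is what makes the output order deterministic (given the problem's unique frequencies).
Print the results: iterate over the sorted array and print each word with its count.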
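The explicit loop with the default: subscript is only one way to build the counts. Two standard-library alternatives, shown here as a sketch on the sample words, produce the same dictionary: reduce(into:) and the Dictionary(_:uniquingKeysWith:) initializer. Either could replace the loop; which one reads better is a matter of taste:

```swift
// The ten words of the sample input
let words = ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"]

// reduce(into:) threads one mutable dictionary through the sequence;
// the default: subscript supplies 0 the first time a word is seen
let viaReduce = words.reduce(into: [String: Int]()) { counts, word in
    counts[word, default: 0] += 1
}

// Dictionary(_:uniquingKeysWith:) builds counts from (word, 1) pairs,
// adding the values whenever a key repeats
let viaInit = Dictionary(words.map { ($0, 1) }, uniquingKeysWith: +)

print(viaReduce == viaInit)  // true
```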

Example Test and Results

Input file words.txt:

the day is sunny the the
the sunny is is

Bash output:

the 4
is 3
sunny 2
day 1

Swift output:

the 4
is 3
sunny 2
day 1
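To re-check the Swift result automatically rather than by eye, one option is a small script that writes the sample to a temporary file, runs the same read/split/count/sort steps, and compares against the expected lines. A sketch; frequencyLines(ofFileAt:) is a hypothetical helper and the temporary file name is arbitrary:

```swift
import Foundation

// Hypothetical helper: the Swift solution's steps, returning "word count" lines
func frequencyLines(ofFileAt url: URL) throws -> [String] {
    let content = try String(contentsOf: url, encoding: .utf8)
    var wordCount: [String: Int] = [:]
    for word in content.split(whereSeparator: { $0.isWhitespace }) {
        wordCount[String(word), default: 0] += 1
    }
    return wordCount
        .sorted { $0.value > $1.value }
        .map { "\($0.key) \($0.value)" }
}

// Expected output from the example above
let expected = ["the 4", "is 3", "sunny 2", "day 1"]
let url = FileManager.default.temporaryDirectory.appendingPathComponent("words.txt")

do {
    // Write the sample words.txt content to the temporary file
    try "the day is sunny the the\nthe sunny is is\n".write(to: url, atomically: true, encoding: .utf8)
    let actual = try frequencyLines(ofFileAt: url)
    print(actual == expected ? "PASS" : "FAIL: \(actual)")
} catch {
    print("I/O error: \(error)")
}
```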

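Time Complexity

Bash implementation:
- sort: O(n log n), where n is the total number of words.
- uniq -c: O(n).
- sort -rn: O(k log k), since it only sorts the k lines emitted by uniq -c, where k is the number of unique words.
- Overall: O(n log n).
Swift implementation:
- Reading and splitting: O(n).
- Counting: O(n) on average, since each dictionary update takes amortized constant time.
- Sorting: O(k log k), where k is the number of unique words.
- Overall: O(n + k log k).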
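Because every unique word occurs at least once, k ≤ n (for the sample input, n = 10 and k = 4). Using that bound, and charging the final sort -rn only for the k lines that uniq -c emits, both totals fit under O(n log n):

```latex
\begin{aligned}
T_{\text{Bash}}  &= \underbrace{O(n \log n)}_{\texttt{sort}} + \underbrace{O(n)}_{\texttt{uniq -c}} + \underbrace{O(k \log k)}_{\texttt{sort -rn}} = O(n \log n), \\
T_{\text{Swift}} &= O(n) + O(k \log k) = O(n + k \log k) \subseteq O(n \log n), \qquad k \le n.
\end{aligned}
```

The Swift bound is tighter when many words repeat (k much smaller than n), because only the unique words are sorted.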