1. Preparation Before Installation
1.1 Update the system and install the necessary tools
Run the following commands in a terminal:
sudo apt-get update
sudo apt-get install -y ssh rsync curl
1.2 Install Java
If Java is not already installed on the system, you can install OpenJDK with:
sudo apt-get install -y openjdk-8-jdk
Verify that Java installed successfully:
java -version
1.3 Configure passwordless SSH login
Hadoop communicates between nodes over SSH. First, generate an SSH key pair:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
Then append the public key to the list of authorized keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Test passwordless login:
ssh localhost
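If the key setup worked, the following one-liner prints a greeting without prompting for a password. This is only a quick sanity check; `BatchMode=yes` makes ssh fail immediately instead of falling back to a password prompt:

```shell
# Fails fast instead of prompting if key-based login is not set up
ssh -o BatchMode=yes -o ConnectTimeout=5 localhost 'echo "passwordless SSH OK"'
```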
2. Download and Install Hadoop
2.1 Download Hadoop
Visit the official Apache Hadoop website and download the latest stable release. You can download it with curl:
curl -O https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
2.2 Extract Hadoop
Once the download completes, extract the archive:
tar -xzvf hadoop-3.3.6.tar.gz
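The environment variables in the next step assume the tree lives at /home/hdfs/hadoop-3.3.6, i.e. under a dedicated hdfs user's home. If you extracted the archive elsewhere, either adjust HADOOP_HOME accordingly or move the directory, roughly like this (the hdfs user and group are assumptions inferred from the paths used later in this guide):

```shell
# Assumes a dedicated "hdfs" user exists; adjust the target path to your layout
sudo mkdir -p /home/hdfs
sudo mv hadoop-3.3.6 /home/hdfs/
sudo chown -R hdfs:hdfs /home/hdfs/hadoop-3.3.6
```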
2.3 Configure environment variables
Edit ~/.bashrc in the current user's home directory and add the Hadoop environment variables:
export HADOOP_HOME=/home/hdfs/hadoop-3.3.6
export HADOOP_HDFS_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
Save the file and reload the environment variables:
source ~/.bashrc
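A quick way to confirm the variables took effect (both commands should succeed in a fresh shell as well, since the settings live in ~/.bashrc):

```shell
# Should print the install path and the Hadoop version banner
echo "$HADOOP_HOME"
hadoop version
```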
3. Configure Hadoop
3.1 Configure core-site.xml
Edit the $HADOOP_HOME/etc/hadoop/core-site.xml file and set the default file system to HDFS:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
3.2 Configure hdfs-site.xml
Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml to set the replication factor, the NameNode and DataNode storage directories, and the RPC address used for remote access:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Number of block replicas -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- NameNode storage directory -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hdfs/hadoop-3.3.6/namenode</value>
  </property>
  <!-- NameNode RPC address -->
  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>0.0.0.0:9000</value>
  </property>
  <!-- DataNode storage directory -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hdfs/hadoop-3.3.6/datanode</value>
  </property>
</configuration>
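The two `file://` storage directories above are not guaranteed to exist yet. Formatting the NameNode will create its directory, but creating both up front, as the user that will run HDFS, avoids permission surprises on first start. A minimal sketch, assuming the same layout as the configuration:

```shell
# Matches dfs.namenode.name.dir / dfs.datanode.data.dir above; adjust BASE if your layout differs
BASE=/home/hdfs/hadoop-3.3.6
mkdir -p "$BASE/namenode" "$BASE/datanode"
ls -ld "$BASE/namenode" "$BASE/datanode"
```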
3.3 Determine the Java installation directory
sudo update-alternatives --config java
3.4 Configure JAVA_HOME
Edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set JAVA_HOME:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
4. Start HDFS
4.1 Format the NameNode
Before starting HDFS for the first time, format the NameNode:
hdfs namenode -format
4.2 Start HDFS
Start the NameNode and DataNode services:
start-dfs.sh
You can check whether the processes started successfully with:
jps
Under normal conditions you should see the NameNode and DataNode processes running.
To stop a process:
kill -9 <pid>  # find the PID with jps first, then kill it (stop-dfs.sh is the cleaner way to shut the daemons down)
4.3 Verify HDFS
You can open the NameNode web UI in a browser at (replace the IP with your own server's address):
http://192.168.186.77:9870
4.4 Check DataNode status
hdfs dfsadmin -report
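Beyond the report, a quick round trip through the `hdfs dfs` command confirms that reads and writes actually reach the DataNode. The paths below are arbitrary examples:

```shell
# Write a small file into HDFS, read it back, then clean up
hdfs dfs -mkdir -p /tmp/smoke-test
echo "hello hdfs" > /tmp/local-probe.txt
hdfs dfs -put /tmp/local-probe.txt /tmp/smoke-test/
hdfs dfs -ls /tmp/smoke-test
hdfs dfs -cat /tmp/smoke-test/local-probe.txt
hdfs dfs -rm -r /tmp/smoke-test
```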
5. Project Structure
5.1 pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.3.2</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>org.example</groupId>
    <artifactId>hdfs_hadoop</artifactId>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.3.6</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
5.2 HdfsHadoopApplication.java
package org.example;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class HdfsHadoopApplication {
    public static void main(String[] args) {
        SpringApplication.run(HdfsHadoopApplication.class, args);
    }
}
5.3 HDFSService.java
package org.example.service;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.example.model.SimpleFileStatusDTO;
import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.*;

@Service
public class HDFSService {

    private static final String HDFS_URI = "hdfs://192.168.186.77:9000";
    private static final String BASE_DIR = "/home"; // base directory on HDFS

    private final FileSystem fileSystem;

    // Constructor: initialize the FileSystem
    public HDFSService() throws IOException, InterruptedException {
        Configuration configuration = new Configuration();
        configuration.set("fs.defaultFS", HDFS_URI);
        // Set the HDFS user via a system property
        System.setProperty("HADOOP_USER_NAME", "liber");
        this.fileSystem = FileSystem.get(URI.create(HDFS_URI), configuration);
    }

    // 1. Upload a file to HDFS
    public void uploadFile(MultipartFile file, String subDirectory) throws IOException {
        // Prefix the name with a UUID to avoid collisions
        String originalFilename = file.getOriginalFilename();
        String newFilename = UUID.randomUUID() + "_" + originalFilename;
        // Target directory path
        String targetDirectory = BASE_DIR + (subDirectory.startsWith("/") ? subDirectory : "/" + subDirectory);
        Path directoryPath = new Path(targetDirectory);
        // Create the directory if it does not exist
        if (!fileSystem.exists(directoryPath)) {
            fileSystem.mkdirs(directoryPath);
        }
        // Target file path
        Path destinationPath = new Path(targetDirectory + "/" + newFilename);
        // Upload the file
        try (FSDataOutputStream outputStream = fileSystem.create(destinationPath)) {
            outputStream.write(file.getBytes());
        }
    }

    // 2. Delete a file or directory
    public void deleteFile(String hdfsPath) throws IOException {
        fileSystem.delete(new Path(BASE_DIR + "/" + hdfsPath), true);
    }

    // 3. List the contents of a directory
    public Map<String, Object> listFiles(String subDirectory) throws IOException {
        String directoryPath = BASE_DIR + (subDirectory.startsWith("/") ? subDirectory : "/" + subDirectory);
        FileStatus[] fileStatuses = fileSystem.listStatus(new Path(directoryPath));
        List<SimpleFileStatusDTO> fileStatusDTOList = new ArrayList<>();
        for (FileStatus fileStatus : fileStatuses) {
            fileStatusDTOList.add(new SimpleFileStatusDTO(fileStatus));
        }
        Map<String, Object> map = new HashMap<>();
        map.put("basePath", subDirectory);
        map.put("files", fileStatusDTOList);
        return map;
    }

    // 4. Create a directory
    public void createDirectory(String subDirectory) throws IOException {
        String targetDirectory = BASE_DIR + (subDirectory.startsWith("/") ? subDirectory : "/" + subDirectory);
        Path path = new Path(targetDirectory);
        if (!fileSystem.exists(path)) {
            fileSystem.mkdirs(path);
        } else {
            throw new IOException("Directory already exists: " + targetDirectory);
        }
    }

    // 5. Download a file (open it as a stream)
    public InputStream readFileAsStream(String hdfsFilePath) throws IOException {
        Path path = new Path(BASE_DIR + hdfsFilePath);
        return fileSystem.open(path);
    }

    // 6. Rename a file or directory
    public void rename(String sourceSubDirectory, String destSubDirectory) throws IOException {
        String sourcePath = BASE_DIR + (sourceSubDirectory.startsWith("/") ? sourceSubDirectory : "/" + sourceSubDirectory);
        String destPath = BASE_DIR + (destSubDirectory.startsWith("/") ? destSubDirectory : "/" + destSubDirectory);
        Path src = new Path(sourcePath);
        Path dst = new Path(destPath);
        if (!fileSystem.rename(src, dst)) {
            throw new IOException("Failed to rename: " + sourcePath + " to " + destPath);
        }
    }
}
5.4 SimpleFileStatusDTO.java
package org.example.model;

import lombok.Data;
import lombok.NoArgsConstructor;
import org.apache.hadoop.fs.FileStatus;

@Data
@NoArgsConstructor
public class SimpleFileStatusDTO {

    private String pathSuffix;
    private long length;
    private boolean isDirectory;

    public SimpleFileStatusDTO(FileStatus fileStatus) {
        String path = fileStatus.getPath().toString();
        this.pathSuffix = path.substring(path.lastIndexOf("/") + 1);
        this.length = fileStatus.getLen();
        this.isDirectory = fileStatus.isDirectory();
    }
}
5.5 HDFSController.java
package org.example.controller;

import org.example.service.HDFSService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.InputStreamResource;
import org.springframework.http.HttpHeaders;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

import java.io.IOException;
import java.util.Map;

@RestController
@RequestMapping("/hdfs")
public class HDFSController {

    private final HDFSService hdfsService;

    @Autowired
    public HDFSController(HDFSService hdfsService) {
        this.hdfsService = hdfsService;
    }

    // 1. Upload a file
    @PostMapping("/upload")
    public ResponseEntity<String> uploadFile(@RequestParam("file") MultipartFile file,
                                             @RequestParam("hdfsDirectory") String hdfsDirectory) {
        try {
            hdfsService.uploadFile(file, hdfsDirectory);
            return ResponseEntity.ok("File uploaded successfully");
        } catch (IOException e) {
            return ResponseEntity.status(500).body("Failed to upload file: " + e.getMessage());
        }
    }

    // 2. Download a file
    @GetMapping("/download")
    public ResponseEntity<InputStreamResource> downloadFile(@RequestParam("hdfsFilePath") String hdfsFilePath) {
        try {
            String filename = hdfsFilePath.substring(hdfsFilePath.lastIndexOf("/") + 1);
            InputStreamResource resource = new InputStreamResource(hdfsService.readFileAsStream(hdfsFilePath));
            return ResponseEntity.ok()
                    .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"" + filename + "\"")
                    .body(resource);
        } catch (IOException e) {
            return ResponseEntity.status(500).body(null);
        }
    }

    // 3. Delete a file or directory
    @DeleteMapping("/delete")
    public ResponseEntity<String> deleteFile(@RequestParam("hdfsPath") String hdfsPath) {
        try {
            hdfsService.deleteFile(hdfsPath);
            return ResponseEntity.ok("File deleted successfully");
        } catch (IOException e) {
            return ResponseEntity.status(500).body("Failed to delete file: " + e.getMessage());
        }
    }

    // 4. List the contents of a directory
    @GetMapping("/list")
    public ResponseEntity<Map<String, Object>> listFiles(@RequestParam("directoryPath") String directoryPath) {
        try {
            Map<String, Object> files = hdfsService.listFiles(directoryPath);
            return ResponseEntity.ok(files);
        } catch (IOException e) {
            return ResponseEntity.status(500).body(null);
        }
    }

    // 5. Create a directory
    @PostMapping("/mkdir")
    public ResponseEntity<String> createDirectory(@RequestParam("directoryPath") String directoryPath) {
        try {
            hdfsService.createDirectory(directoryPath);
            return ResponseEntity.ok("Directory created successfully");
        } catch (IOException e) {
            return ResponseEntity.status(500).body("Failed to create directory: " + e.getMessage());
        }
    }

    // 6. Rename a file or directory
    @PostMapping("/rename")
    public ResponseEntity<String> rename(@RequestParam("sourcePath") String sourcePath,
                                         @RequestParam("destPath") String destPath) {
        try {
            hdfsService.rename(sourcePath, destPath);
            return ResponseEntity.ok("File renamed successfully");
        } catch (IOException e) {
            return ResponseEntity.status(500).body("Failed to rename file: " + e.getMessage());
        }
    }
}
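Once the application is running (port 8080 by the Spring Boot default), the controller can be exercised from the command line. These curl calls mirror the six endpoints; the host, port, paths, and file name are all illustrative examples:

```shell
BASE=http://localhost:8080/hdfs

# Create a directory, then upload a local file into it
curl -X POST "$BASE/mkdir?directoryPath=/demo"
curl -F "file=@./report.pdf" -F "hdfsDirectory=/demo" "$BASE/upload"

# List the directory (uploaded files carry a UUID_ prefix added by the service)
curl "$BASE/list?directoryPath=/demo"

# Rename the directory, then clean up
curl -X POST "$BASE/rename?sourcePath=/demo&destPath=/demo2"
curl -X DELETE "$BASE/delete?hdfsPath=/demo2"
```

To download a file, take its stored name from the listing (it includes the UUID prefix) and request `$BASE/download?hdfsFilePath=/demo2/<stored-name>` with `curl -OJ`.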
5.6 application.yml
spring:
  application:
    name: hdfs_hadoop
  servlet:
    multipart:
      max-file-size: 1024MB
      max-request-size: 1024MB
5.7 index.html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>HDFS File Manager</title>
    <!-- Vue.js CDN -->
    <script src="https://cdn.jsdelivr.net/npm/vue@2"></script>
    <!-- Axios CDN for HTTP requests -->
    <script src="https://cdn.jsdelivr.net/npm/axios/dist/axios.min.js"></script>
    <!-- Bootstrap CDN for styling -->
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet">
    <style>
        .current-path { font-weight: bold; font-size: 1.2em; margin-bottom: 15px; }
        .go-up-item { background-color: #f8f9fa; cursor: pointer; }
        .go-up-item:hover { background-color: #e2e6ea; }
        .file-item { cursor: pointer; }
        .file-item:hover { background-color: #f1f3f5; }
        .btn-icon { background-color: transparent; border: none; color: #007bff; cursor: pointer; padding: 0.2rem; }
        .btn-icon:hover { color: #0056b3; }
        .form-inline { display: flex; align-items: center; gap: 10px; margin-bottom: 15px; }
        .form-inline input { flex: 1; }
    </style>
</head>
<body>
<div id="app" class="container mt-5">
    <h1 class="mb-4">HDFS File Manager</h1>
    <!-- Directory listing, directory creation, and file upload -->
    <div class="mb-3">
        <h4>Manage Directory</h4>
        <div class="current-path"><span>📁 {{ currentPath }}</span></div>
        <!-- Inline form for creating a directory -->
        <div class="form-inline">
            <input type="text" v-model="newDirectoryPath" placeholder="New directory name" class="form-control">
            <button @click="createDirectory" class="btn btn-info">Create Directory</button>
            <button @click="showUploadDialog" class="btn btn-primary ms-2">Upload File</button>
        </div>
        <ul class="list-group">
            <li v-if="currentPath !== '/'" @click="goUpOneLevel" class="list-group-item go-up-item">
                <strong>🔙 Go up one level</strong>
            </li>
            <li v-for="file in files" :key="file.pathSuffix"
                class="list-group-item d-flex justify-content-between align-items-center file-item">
                <div @click="file.directory ? onDirectoryClick(file) : null">
                    <span v-if="file.directory">📁</span><span v-else>📄</span>
                    {{ file.pathSuffix }}
                    <!-- Show the size when the entry is a file -->
                    <span v-if="!file.directory" class="text-muted">({{ formatFileSize(file.length) }})</span>
                </div>
                <div>
                    <button @click="showRenameDialog(file)" class="btn-icon"><span>✏️</span></button>
                    <button v-if="file.directory" @click="deleteFile(currentPath + '/' + file.pathSuffix)" class="btn-icon"><span>🗑️</span></button>
                    <button v-if="!file.directory" @click="downloadFile(currentPath + '/' + file.pathSuffix)" class="btn-icon"><span>⬇️</span></button>
                    <button v-if="!file.directory" @click="deleteFile(currentPath + '/' + file.pathSuffix)" class="btn-icon"><span>🗑️</span></button>
                </div>
            </li>
        </ul>
    </div>
    <!-- Upload modal -->
    <div class="modal" tabindex="-1" role="dialog" id="uploadModal">
        <div class="modal-dialog" role="document">
            <div class="modal-content">
                <div class="modal-header">
                    <h5 class="modal-title">Upload File</h5>
                    <button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button>
                </div>
                <div class="modal-body">
                    <input type="file" @change="onFileChange" class="form-control">
                </div>
                <div class="modal-footer">
                    <button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Close</button>
                    <button type="button" class="btn btn-primary" @click="handleUpload">Upload</button>
                </div>
            </div>
        </div>
    </div>
    <!-- Rename modal -->
    <div class="modal" tabindex="-1" role="dialog" id="renameModal">
        <div class="modal-dialog" role="document">
            <div class="modal-content">
                <div class="modal-header">
                    <h5 class="modal-title">Rename</h5>
                    <button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button>
                </div>
                <div class="modal-body">
                    <input type="text" v-model="renameNewName" class="form-control">
                </div>
                <div class="modal-footer">
                    <button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Close</button>
                    <button type="button" class="btn btn-primary" @click="handleRename">Rename</button>
                </div>
            </div>
        </div>
    </div>
</div>
<script>
new Vue({
    el: '#app',
    data: {
        uploadFile: null,
        currentPath: '/',   // current directory path
        newDirectoryPath: '',
        files: [],
        renameFile: null,   // file or directory being renamed
        renameNewName: '',  // new name
    },
    methods: {
        // Handle file selection
        onFileChange(event) {
            this.uploadFile = event.target.files[0];
        },
        // Show the upload modal
        showUploadDialog() {
            const modal = new bootstrap.Modal(document.getElementById('uploadModal'));
            modal.show();
        },
        // Show the rename modal
        showRenameDialog(file) {
            this.renameFile = file;
            this.renameNewName = file.pathSuffix;
            const modal = new bootstrap.Modal(document.getElementById('renameModal'));
            modal.show();
        },
        // Upload a file
        async handleUpload() {
            try {
                const formData = new FormData();
                formData.append('file', this.uploadFile);
                formData.append('hdfsDirectory', this.currentPath);
                await axios.post('/hdfs/upload', formData, {
                    headers: { 'Content-Type': 'multipart/form-data' }
                });
                this.listFiles(); // refresh the file list after uploading
                const modal = bootstrap.Modal.getInstance(document.getElementById('uploadModal'));
                modal.hide(); // hide the modal after uploading
            } catch (error) {
                console.error('Error uploading file:', error);
            }
        },
        // Rename a file or directory
        async handleRename() {
            try {
                const sourcePath = this.currentPath + '/' + this.renameFile.pathSuffix;
                const destPath = this.currentPath + '/' + this.renameNewName;
                await axios.post('/hdfs/rename', null, { params: { sourcePath, destPath } });
                this.listFiles(); // refresh the file list after renaming
                const modal = bootstrap.Modal.getInstance(document.getElementById('renameModal'));
                modal.hide(); // hide the modal after renaming
            } catch (error) {
                console.error('Error renaming file or directory:', error);
            }
        },
        // List the files in the current directory
        async listFiles() {
            try {
                const response = await axios.get('/hdfs/list', {
                    params: { directoryPath: this.currentPath }
                });
                this.files = response.data.files;          // extract the files array
                this.currentPath = response.data.basePath; // update the current path
            } catch (error) {
                console.error('Error listing files:', error);
            }
        },
        // Download a file
        async downloadFile(filePath) {
            try {
                const response = await axios.get('/hdfs/download', {
                    params: { hdfsFilePath: filePath },
                    responseType: 'blob'
                });
                const url = window.URL.createObjectURL(new Blob([response.data]));
                const link = document.createElement('a');
                link.href = url;
                link.setAttribute('download', filePath.split('/').pop());
                document.body.appendChild(link);
                link.click();
            } catch (error) {
                console.error('Error downloading file:', error);
            }
        },
        // Delete a file or directory
        async deleteFile(filePath) {
            try {
                await axios.delete('/hdfs/delete', { params: { hdfsPath: filePath } });
                this.listFiles(); // refresh the file list
            } catch (error) {
                console.error('Error deleting file or directory:', error);
            }
        },
        // Create a new directory
        async createDirectory() {
            try {
                await axios.post('/hdfs/mkdir', null, {
                    params: { directoryPath: this.currentPath + '/' + this.newDirectoryPath }
                });
                this.newDirectoryPath = ''; // clear the input
                this.listFiles();           // refresh after creating the directory
            } catch (error) {
                console.error('Error creating directory:', error);
            }
        },
        // Go up one directory level
        goUpOneLevel() {
            const pathParts = this.currentPath.split('/').filter(part => part);
            if (pathParts.length > 1) {
                pathParts.pop();
                this.currentPath = '/' + pathParts.join('/');
            } else {
                this.currentPath = '/';
            }
            this.listFiles(); // refresh the file list
        },
        // Enter a directory
        onDirectoryClick(file) {
            if (!this.currentPath.endsWith('/')) {
                this.currentPath += '/';
            }
            if (!this.currentPath.endsWith(file.pathSuffix)) {
                this.currentPath += file.pathSuffix;
            }
            this.listFiles(); // refresh to show the clicked directory's contents
        },
        // Format a file size for display
        formatFileSize(size) {
            if (size < 1024) return size + ' B';
            else if (size < 1048576) return (size / 1024).toFixed(2) + ' KB';
            else if (size < 1073741824) return (size / 1048576).toFixed(2) + ' MB';
            else return (size / 1073741824).toFixed(2) + ' GB';
        }
    },
    mounted() {
        this.listFiles(); // load the initial directory when the page loads
    }
});
</script>
<!-- Bootstrap JS for the modals -->
<script src="https://cdn.jsdelivr.net/npm/@popperjs/core@2.10.2/dist/umd/popper.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/js/bootstrap.min.js"></script>
</body>
</html>
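The front end's formatFileSize helper is easy to sanity-check outside the browser. A shell re-implementation using the same thresholds (1024, 1048576, 1073741824) behaves like this:

```shell
# Same thresholds as the Vue helper: B, KB, MB, GB with two decimals
format_size() {
  local s=$1
  if   [ "$s" -lt 1024 ];       then printf '%d B\n' "$s"
  elif [ "$s" -lt 1048576 ];    then awk -v s="$s" 'BEGIN { printf "%.2f KB\n", s/1024 }'
  elif [ "$s" -lt 1073741824 ]; then awk -v s="$s" 'BEGIN { printf "%.2f MB\n", s/1048576 }'
  else                               awk -v s="$s" 'BEGIN { printf "%.2f GB\n", s/1073741824 }'
  fi
}

format_size 512        # → 512 B
format_size 2048       # → 2.00 KB
format_size 5242880    # → 5.00 MB
```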
6. Testing and Verification
6.1 Create a directory
6.2 Creation result
6.3 Upload a file
6.4 Upload result
6.5 Rename test
6.6 Rename result
6.7 Other operations
Deletion, download, and the remaining operations work the same way and are not repeated here.
6.8 View HDFS's built-in file browser
7. Errors Encountered
Error 1
java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
Download the native Windows binaries (winutils) matching your Hadoop version from: https://github.com/cdarlint/winutils
Then add the following to your system environment variables. Note that HADOOP_HOME must point to the Hadoop root directory, not its bin subdirectory:
HADOOP_HOME=D:\hadoop-3.3.6
Append %HADOOP_HOME%\bin to PATH.
It bears saying three times: remember to restart, restart, restart the IntelliJ IDEA IDE.
Error 2
Engine2 : Call: addBlock took 169ms
2024-08-11T19:14:22.716+08:00 DEBUG 13116 — [hdfs_hadoop] [ Thread-5] org.apache.hadoop.hdfs.DataStreamer : pipeline = [DatanodeInfoWithStorage[127.0.0.1:9866,DS-d52f1df8-88e2-4807-bc48-842e7b9f07a2,DISK]], blk_1073741826_1002
2024-08-11T19:14:22.716+08:00 DEBUG 13116 — [hdfs_hadoop] [ Thread-5] org.apache.hadoop.hdfs.DataStreamer : Connecting to datanode 127.0.0.1:9866
2024-08-11T19:14:22.718+08:00 WARN 13116 — [hdfs_hadoop] [ Thread-5] org.apache.hadoop.hdfs.DataStreamer : Exception in createBlockOutputStream blk_1073741826_1002
java.net.ConnectException: Connection refused: getsockopt
at java.base/sun.nio.ch.Net.pollConnect(Native Method) ~[na:na]
at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:682) ~[na:na]
org.apache.hadoop.ipc.RemoteException: File /home/7dff6c94-88d2-4b62-83b9-92f93253b473_01.jpg could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2350)
Explanation: at first I thought a single node simply wasn't enough, and trying a cluster didn't help either. In the end the fix was the following:
Edit the /etc/hosts file:
sudo nano /etc/hosts
- I mapped liber-vmware-virtual-platform to 192.168.186.77, after which the connection succeeded; by default the hostname resolved to 127.0.0.1.
It bears saying three times: remember to restart, restart, restart Ubuntu.
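Concretely, the fix makes the machine's hostname resolve to its LAN address instead of loopback, so the NameNode advertises a DataNode address that remote clients can actually reach. With the hostname and IP from this setup (substitute your own values), the /etc/hosts entry looks like:

```shell
# /etc/hosts — map the hostname to the LAN IP, not 127.0.0.1
192.168.186.77   liber-vmware-virtual-platform
```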
8. Summary
We installed Hadoop and configured HDFS on Ubuntu 24.04 LTS, then built a simple file management system on top of it with Spring Boot and Vue.