Calling the HDFS API

Original article · first published 2025-06-18 15:55:12 · last modified 2025-06-18 15:55:54

License: CC 4.0 BY-SA

Tags: #hdfs #hadoop #bigdata

The HDFS (Hadoop Distributed File System) API lets developers work with HDFS from a programming language such as Java or Python: creating, reading, writing, and deleting files and directories.

I. Start the cluster

start-dfs.sh

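II. Create the project (key step)

Two download options

1. Download SDK (JDK) 1.8; other versions are likely to cause many errors

2. Add the following dependencies to pom.xml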

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-slf4j-impl</artifactId>
        <version>2.12.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.1.3</version>
    </dependency>
</dependencies>

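3. Open the Maven panel and reload the project so that the dependencies above are downloaded

4. In the project's src/main/resources directory, create a file named "log4j2.xml" with the following content

Copy it as-is, comments included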

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="error" strict="true" name="XMLConfig">
    <Appenders>
        <!-- Appender of type Console; the name attribute is required -->
        <Appender type="Console" name="STDOUT">
            <!-- Uses a PatternLayout;
            output looks like [INFO] [2018-01-22 17:34:01][org.test.Console]I'm here -->
            <Layout type="PatternLayout"
                    pattern="[%p] [%d{yyyy-MM-dd HH:mm:ss}][%c{10}]%m%n" />
        </Appender>
 
    </Appenders>
 
    <Loggers>
        <!-- additivity is false -->
        <Logger name="test" level="info" additivity="false">
            <AppenderRef ref="STDOUT" />
        </Logger>
 
        <!-- root loggerConfig settings -->
        <Root level="info">
            <AppenderRef ref="STDOUT" />
        </Root>
    </Loggers>
 
</Configuration>

5. Create the HdfsClient class

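public class hdfsClient {
    /**
     * Operate on the cluster from another machine:
     * 1. Get a Hadoop client
     * 2. Run an operation (create a directory)
     * 3. Release resources
     */

    @Test
    public void testmkdir1() throws URISyntaxException, IOException, InterruptedException {
        // Address of the cluster's NameNode
        URI uri = new URI("hdfs://172.18.0.2:9000");
        // Create a configuration object
        Configuration configuration = new Configuration();
        // User to log in as
        String user = "root";
        // Get the client object
        FileSystem fs = FileSystem.get(uri, configuration, user);

        // Create a directory. The path is absolute: a relative path such as "youxiuderen2"
        // would be resolved against the user's HDFS home directory (by default /user/root here).
        fs.mkdirs(new Path("/youxiuderen2"));

        // Release resources
        fs.close();
    }
}

A complete version of the class, with the package declaration and imports (this one creates /xiyou/huaguoshan):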

package org.lotus;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;
 
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
 
public class HdfsClient {
 
    /**
     * The usual routine for client code:
     * 1. Get a client
     * 2. Run the operations
     * 3. Release resources
     */
    @Test
    public void testmkdir() throws URISyntaxException, IOException, InterruptedException {
        
        URI uri = new URI("hdfs://172.18.0.2:9000");
        Configuration configuration = new Configuration();
        
        FileSystem fs = FileSystem.get(uri, configuration, "root");
 
        fs.mkdirs(new Path("/xiyou/huaguoshan"));
 
        fs.close();
    }
}

Click Run and check the result.
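If the test cannot connect, a quick first check is the NameNode address string itself. The following is a minimal sketch that uses only java.net.URI and no Hadoop classes; the class name NameNodeAddressCheck and its method are hypothetical helpers, and the address is the one used in this article.

```java
import java.net.URI;

// Hypothetical helper, not part of Hadoop: sanity-checks a NameNode address string.
final class NameNodeAddressCheck {

    // Returns "host:port" if the address uses the hdfs scheme and names both a host
    // and an explicit port; throws IllegalArgumentException otherwise.
    static String hostAndPort(String address) {
        URI uri = URI.create(address);
        if (!"hdfs".equals(uri.getScheme())) {
            throw new IllegalArgumentException("expected an hdfs:// URI but got: " + address);
        }
        if (uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException("host or port missing in: " + address);
        }
        return uri.getHost() + ":" + uri.getPort();
    }

    public static void main(String[] args) {
        // The address used throughout this article.
        System.out.println(hostAndPort("hdfs://172.18.0.2:9000")); // prints 172.18.0.2:9000
    }
}
```

The port has to be the NameNode RPC port configured as fs.defaultFS in core-site.xml (9000 on this article's cluster), not the web UI port, which defaults to 9870 in Hadoop 3.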

Problem:
log4j:WARN No appenders could be found for logger (org.apache.htrace.core.Tracer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

These warnings come from Log4j 1.x, which hadoop-client pulls in transitively. Excluding it and its SLF4J binding in pom.xml makes them go away:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.lotus</groupId>
  <artifactId>HDFSClient</artifactId>
  <packaging>war</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>HDFSClient Maven Webapp</name>
  <url>http://maven.apache.org</url>
 
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-slf4j-impl</artifactId>
      <version>2.12.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>3.1.3</version>
      <exclusions>
        <!-- Exclude Log4j 1.x -->
        <exclusion>
          <groupId>log4j</groupId>
          <artifactId>log4j</artifactId>
        </exclusion>
        <!-- Exclude the SLF4J → Log4j 1.x binding -->
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
 
  <build>
    <finalName>HDFSClient</finalName>
  </build>
</project>

III. Upload

1. Create a file

Create a new file (here jianglanshu.txt)

2. Upload the file into a directory (youxiuderen1 in this example)

The code is as follows

    @Test
    public void testPut() throws URISyntaxException, IOException, InterruptedException {
        // Address of the cluster's NameNode
        URI uri = new URI("hdfs://172.18.0.2:9000");
        // Create a configuration object
        Configuration configuration = new Configuration();
        // User to log in as
        String user = "root";
        // Get the client object
        FileSystem fs = FileSystem.get(uri, configuration, user);

        // Parameters: 1) delSrc: whether to delete the local source file;
        // 2) overwrite: whether an existing destination may be overwritten;
        // 3) src: path of the local source file; 4) dst: destination path in HDFS
        fs.copyFromLocalFile(false, false, new Path("/root/IdeaProjects/hdfsClient/jianglanshu.txt"), new Path("/youxiuderen1"));

        // Release resources
        fs.close();
    }

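Look up the file's path on the local machine; it is the source path passed to copyFromLocalFile.

3. Check the result

Check in a terminal (for example with hdfs dfs -ls /youxiuderen1)

Check on the master node

overwrite: whether an existing file at the destination may be replaced. With false, the upload fails with an exception if the destination file already exists.

4. Download a file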

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class hdfsClient {
    /**
     * Shared client: created before each test in init() and closed after each test in close().
     */
    private FileSystem fs;

    @Before
    public void init() throws URISyntaxException,IOException,InterruptedException {
        URI uri=new URI("hdfs://172.18.0.2:9000");
        Configuration configuration=new Configuration();
        String user="root";
        fs=FileSystem.get(uri,configuration,user);
    }

    @After
    public void close() throws IOException {
        fs.close();
    }

    @Test
    public void testmkdir() throws URISyntaxException, IOException, InterruptedException {
        fs.mkdirs(new Path("/youxiuderen1"));
    }

    @Test
    public void testmkdir1() throws URISyntaxException, IOException, InterruptedException {
        fs.mkdirs(new Path("/youxiuderen2"));
    }

    @Test
    public void testPut() throws URISyntaxException, IOException, InterruptedException {
        fs.copyFromLocalFile(false,false,new Path("/root/IdeaProjects/hdfsClient/jianglanshu.txt"), new Path("/youxiuderen1"));
    }

    @Test
    public void testput1() throws  IOException{
        fs.copyFromLocalFile(true,true,new Path("/root/IdeaProjects/hdfsClient/jianglanshu.txt"),new Path("/youxiuderen1"));
    }
    
    @Test
    public void testGet() throws IOException {
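        // Parameters: delSrc: whether to delete the source file in HDFS;
        // src: HDFS path of the file to download; dst: local destination path;
        // useRawLocalFileSystem: true writes through the raw local file system (no .crc file),
        // false uses the checksummed local file system, which also writes a .crc checksum file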
        fs.copyToLocalFile(false,new Path("/youxiuderen1/jianglanshu.txt"),new Path("/root/IdeaProjects/hdfsClient"),false);
    }
}

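5. Copy a file: call `fs.copyToLocalFile` to copy a file from HDFS to the local file system. In this variant the destination is a full file name and the last argument is true, so the raw local file system is used and no .crc checksum file is written next to the download.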

@Test
    public void testGet1() throws IOException {
        fs.copyToLocalFile(false, new Path("/youxiuderen1/jianglanshu.txt"), new Path("/root/IdeaProjects/hdfsClient/jianglanshu1.txt"), true);
    }

IV. Delete

Example 1

    @Test
    public void testDelete() throws IOException {
        // The second argument (recursive) is false, which is enough for a single file
        fs.delete(new Path("/jdk-8u171-linux-x64.tar.gz"), false);
    }

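Result: the JDK archive is deleted.

Here the value is changed to the default, 3.

Example 2. Delete an empty directory

Result: the jj directory is deleted.

Example 3. Delete a non-empty directory

The call fails with the error "Directory is not empty", because the second argument (recursive) is false.

Solution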

    @Test
    // Delete a non-empty directory: with the second argument set to true, the directory
    // and everything under it are deleted
    public void testDelete2() throws IOException {
        fs.delete(new Path("/yinyue/seventeen"), true);
    }

V. Rename a file

    @Test
    // Rename a file: the first argument is the current path and name,
    // the second is the new path and name
    public void testRename() throws IOException {
        fs.rename(new Path("/yinyue/twice/momo.txt"), new Path("/yinyue/twice/sana.txt"));
    }

List files

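    @Test
    // listFiles returns details of every file under the directory (recursively when the second
    // argument is true): length, block size, replication, modification time, owner, permissions.
    // Directories themselves are not returned.
    public void testFileFiles() throws IOException {
        // RemoteIterator is the return type; its type parameter LocatedFileStatus means that
        // every element it yields is a LocatedFileStatus.
        // Think of listFiles as a wallet: fs.listFiles collects the information of all files
        // (the money) and puts it into the wallet.
        RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/youxiuderen1"), true);
        // The wallet has many compartments; the while loop goes through them from first to last
        while (listFiles.hasNext()) {
            // Open the next compartment and take the money out
            LocatedFileStatus status = listFiles.next();
            // File name (like the denomination of a note, e.g. 100 yuan)
            System.out.println(status.getPath().getName());
            // File length
            System.out.println(status.getLen());
            // File permissions
            System.out.println(status.getPermission());
            // Group the file belongs to
            System.out.println(status.getGroup());

            System.out.println("--<*_*>--");
            // Block information of the file
            BlockLocation[] blockLocations = status.getBlockLocations();
            // A file can consist of several blocks, so iterate over them
            for (BlockLocation blockLocation : blockLocations) {
                // Hosts that store this block (there can be several, one per replica)
                String[] hosts = blockLocation.getHosts();
                // Iterate over all hosts
                for (String host : hosts) {
                    System.out.println(host);
                }
            }
            System.out.println("--@_@--");
        }
    }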

Below is the explanation given by an AI assistant

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.junit.Test;

import java.io.IOException;

public class HDFSTest {

    @Test
    public void testFileFiles() throws IOException {
        // Create the Hadoop configuration object
        Configuration conf = new Configuration();
        // Set the HDFS address
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        // Get the HDFS file system object
        FileSystem fs = FileSystem.get(conf);

        RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/youxiuderen1"), true);
        while (listFiles.hasNext()) {
            LocatedFileStatus status = listFiles.next();

            System.out.println(status.getPath().getName());
            System.out.println(status.getLen());
            System.out.println(status.getPermission());
            System.out.println(status.getGroup());

            System.out.println("--<*_*>--");

            BlockLocation[] blockLocations = status.getBlockLocations();

            for (BlockLocation blockLocation : blockLocations) {
                String[] hosts = blockLocation.getHosts();
                for (String host : hosts) {
                    System.out.println(host);
                }
            }
            System.out.println(("--@_@--"));
        }

        // Close the file system connection
        fs.close();
    }
}

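Detailed explanation

@Test annotation: a JUnit annotation that marks a method as a test. JUnit finds and runs the methods that carry it.
RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/youxiuderen1"), true);:
fs is an HDFS file system object, usually obtained through FileSystem.get.
The listFiles method returns an iterator over the files under the given directory.
new Path("/youxiuderen1") sets the directory to traverse to /youxiuderen1.
true makes the traversal recursive, covering the directory and its subdirectories.

while (listFiles.hasNext()) loop: iterates over every file in the iterator.
LocatedFileStatus status = listFiles.next();: fetches the status of the next file.
Printing the basic file information:
status.getPath().getName(): the file name.
status.getLen(): the file length in bytes.
status.getPermission(): the file's permissions.
status.getGroup(): the group the file belongs to.

BlockLocation[] blockLocations = status.getBlockLocations();: the locations of the file's data blocks.
Nested for loops: iterate over each block location and print the host names that store the block.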
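To relate the length printed by status.getLen() to the number of entries in status.getBlockLocations(), the block arithmetic can be sketched as follows. This is only an illustration: the class BlockMath is hypothetical and not part of Hadoop, and the 128 MB block size is an assumption (the default dfs.blocksize in Hadoop 3.x); the value actually used for a file can be read with status.getBlockSize().

```java
// Hypothetical helper, not part of Hadoop: plain block arithmetic.
final class BlockMath {

    // Assumed block size: 128 MB, the default dfs.blocksize in Hadoop 3.x.
    static final long DEFAULT_BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks a file of fileLength bytes occupies: ceil(fileLength / blockSize).
    static long blockCount(long fileLength, long blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
        }
        return (fileLength + blockSize - 1) / blockSize;
    }

    // Size in bytes of the last block; 0 for an empty file.
    static long lastBlockSize(long fileLength, long blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
        }
        if (fileLength == 0) {
            return 0;
        }
        long remainder = fileLength % blockSize;
        return remainder == 0 ? blockSize : remainder;
    }

    public static void main(String[] args) {
        long length = 300L * 1024 * 1024; // a hypothetical 300 MB file
        System.out.println(blockCount(length, DEFAULT_BLOCK_SIZE));    // prints 3
        System.out.println(lastBlockSize(length, DEFAULT_BLOCK_SIZE)); // prints 46137344 (44 MB)
    }
}
```

Under these assumptions a 300 MB file is split into two full 128 MB blocks and one 44 MB block, so the outer for loop in the listing would normally run three times for it, printing the hosts of each block's replicas.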

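Notes

The fs variable must be initialized in the test class, usually by obtaining the HDFS file system object through FileSystem.get.
The code depends on the Hadoop libraries, so make sure the project includes them.
The method declares IOException, so whatever calls it must handle or propagate the exception.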
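The permission printed by status.getPermission() appears in symbolic form, for example rw-r--r--. When comparing it with the octal modes used by chmod-style commands, a small conversion helps. The sketch below is a hypothetical helper written for this article, not a Hadoop API; it assumes the 9-character symbolic form and counts a trailing t (sticky bit together with execute) as execute, without reporting the sticky bit itself.

```java
// Hypothetical helper, not part of Hadoop: converts a symbolic permission string to octal digits.
final class PermissionOctal {

    // Converts a 9-character string such as "rw-r--r--" to three octal digits such as "644".
    static String toOctal(String symbolic) {
        if (symbolic == null || symbolic.length() != 9) {
            throw new IllegalArgumentException("expected 9 characters such as rw-r--r--");
        }
        StringBuilder octal = new StringBuilder();
        // Three groups of three characters: user, group, other.
        for (int group = 0; group < 3; group++) {
            int value = 0;
            if (symbolic.charAt(group * 3) == 'r') {
                value += 4; // read
            }
            if (symbolic.charAt(group * 3 + 1) == 'w') {
                value += 2; // write
            }
            char execute = symbolic.charAt(group * 3 + 2);
            if (execute == 'x' || execute == 't') {
                value += 1; // execute ('t' is execute plus the sticky bit)
            }
            octal.append(value);
        }
        return octal.toString();
    }

    public static void main(String[] args) {
        System.out.println(toOctal("rw-r--r--")); // prints 644
        System.out.println(toOctal("rwxr-xr-x")); // prints 755
    }
}
```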

VI. Tell files from directories

 @Test
    public void testListStatus() throws IOException {
        // Get the status of every file and directory directly under /lotusinput in HDFS
        FileStatus[] listStatus=fs.listStatus(new Path("/lotusinput"));
        // Iterate over listStatus
        for (FileStatus fileStatus:listStatus){
            // isFile() checks whether the entry is a file
            if (fileStatus.isFile()){
                System.out.println("file:"+fileStatus.getPath().getName());
            }else {
                System.out.println("dirc:"+fileStatus.getPath().getName());
            }
        }
    }