2021-06-10

File Handling in Java

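1.1. IO Streams

Byte streams: the abstract classes InputStream and OutputStream
Character streams: the abstract classes Reader and Writer, plus the buffered implementations BufferedReader and BufferedWriter
Bridge (conversion) streams: InputStreamReader and OutputStreamWriter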

1.1.1 Byte Streams

1. `FileInputStream` and `FileOutputStream`

These read or write raw bytes, either one byte at a time or through a byte array.

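File file1 = new File("read.txt");
try (FileInputStream input = new FileInputStream(file1)) {  // closed automatically, even on error
    byte[] data = new byte[1024];                           // maximum number of bytes per read
    int len = input.read(data);                             // fills the array; returns -1 at end of file
    if (len > 0) {
        String result = new String(data, 0, len);           // decode the bytes with the platform default charset
        System.out.println(result);
    }
}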

File file2 = new File("write.txt");
try (FileOutputStream output = new FileOutputStream(file2)) {  // truncates an existing file
    String msg = "Hello World!";
    output.write(msg.getBytes());                               // encode with the platform default charset
}
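
A single read call returns at most one buffer's worth of bytes, so a file larger than the buffer needs a loop. The following is a minimal sketch of that loop; the class name ByteStreamCopy and the helper copy are illustrative names, not part of the original example.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ByteStreamCopy {
    // Hypothetical helper: copies src to dst through a fixed-size buffer
    // and returns the number of bytes copied.
    static long copy(String src, String dst) throws IOException {
        long total = 0;
        try (FileInputStream in = new FileInputStream(src);
             FileOutputStream out = new FileOutputStream(dst)) {
            byte[] buffer = new byte[1024];
            int len;
            while ((len = in.read(buffer)) != -1) { // -1 signals end of file
                out.write(buffer, 0, len);          // write only the bytes actually read
                total += len;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("copy-src", ".txt");
        Path dst = Files.createTempFile("copy-dst", ".txt");
        Files.write(src, "Hello World!".getBytes(StandardCharsets.UTF_8));
        System.out.println(copy(src.toString(), dst.toString()) + " bytes copied");
        Files.delete(src);
        Files.delete(dst);
    }
}
```

Writing `buffer, 0, len` rather than the whole array matters on the last iteration, when the buffer is usually only partly filled.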

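1.1.2 Character Streams

Writer has methods that write a String directly to the destination, whereas Reader has no method that returns a String; it can only read single characters or fill a char array.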

File file1 = new File("read.txt");
try (FileReader in = new FileReader(file1)) {
    char[] data = new char[1024];                   // maximum number of chars per read
    int len = in.read(data);                        // fills the char array; returns -1 at end of file
    if (len > 0) {
        String result = new String(data, 0, len);   // convert the chars to a String
        System.out.println(result);
    }
}

File file2 = new File("write.txt");
try (FileWriter out = new FileWriter(file2, true)) {  // true = append to the file
    out.write("Hello World!");
    out.flush();                                      // flush the buffer (close() also flushes)
}

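1.1.3. Bridge Streams

These are also character streams. They form the bridge from byte streams to character streams and let you specify the charset.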

File file1 = new File("read.txt");
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(file1), Charset.forName("UTF-8")))) {
    String line = null;
    while ((line = br.readLine()) != null) {    // readLine() returns null at end of file
        System.out.println(line);
    }
}                                               // closing br also closes the wrapped streams

File file2 = new File("write.txt");
try (BufferedWriter bw = new BufferedWriter(
        new OutputStreamWriter(new FileOutputStream(file2), Charset.forName("UTF-8")))) {
    bw.write("Hello World!");
    bw.flush();                                 // optional here: close() flushes as well
}

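1.2. NIO

1.2.1. Path

File systems organize files in a tree (hierarchical) structure. Each node is either a directory or a file, and NIO.2 represents the location of a node as a Path.

1.2.1.1. Basic Properties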

// Obtain a path
Path path = Paths.get("/data/logs/web.log");
// Parent path
System.out.printf("Parent:%s%n",path.getParent());
// Root, e.g. "/" or "C:\"; null for a relative path
System.out.printf("Root:%s%n",path.getRoot());
// Subpath: excludes the root; begin index inclusive, end index exclusive
System.out.printf("Subpath[0,2]:%s%n",path.subpath(0,2));

// File name, i.e. the last name element of the path
System.out.printf("FileName:%s%n", path.getFileName());
// Number of name elements in the path
System.out.printf("NameCount:%s%n", path.getNameCount());

// Iterate over the name elements: approach 1
Iterator<Path> names = path.iterator();
int i = 0;
while (names.hasNext()) {
    Path name = names.next();
    System.out.printf("Name %s:%s%n",i,name.toString());
    i++;
}

// Iterate over the name elements: approach 2
for(int j = 0; j < path.getNameCount(); j++) {
    System.out.printf("Name %s:%s%n",j,path.getName(j));
}

// Output
Parent:/data/logs
Root:/
Subpath[0,2]:data/logs
FileName:web.log
NameCount:3
Name 0:data
Name 1:logs
Name 2:web.log
Name 0:data
Name 1:logs
Name 2:web.log

1.2.1.2. Path Conversion

Remove redundant elements to get the canonical form: normalize()

Path path = Paths.get("/data/logs/../web.log");
System.out.printf("%s%n",path.normalize()); // Output: /data/web.log

Convert to a URI so the file can be referenced as an external resource: toUri()

Path path = Paths.get("/data/logs/web.log");
System.out.printf("%s%n",path.toUri()); // Output: file:///data/logs/web.log

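Convert to an absolute path: toAbsolutePath()
If the path is relative, it is resolved against the file system's default directory. For a Java program this is normally the current working directory (the user.dir system property), not the classpath. This method does not check whether the file exists or is accessible.
Convert to the real path: toRealPath()
This method checks that the file exists and can be accessed, so the IOException it throws must be handled. It is commonly used to validate and canonicalize a user-supplied path:

If the path is relative, it is converted to an absolute path, as with toAbsolutePath()
If the path contains redundant elements, they are removed, as with normalize()
If the file is a symbolic link (soft link), the link is resolved to its actual target path (unless NOFOLLOW_LINKS is specified)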
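
The difference between the two conversions can be sketched as follows. The class RealPathDemo and the helper resolveUserPath are hypothetical names for illustration, and the sketch assumes that a missing file surfaces as NoSuchFileException, which is the usual behavior of the default file system provider.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RealPathDemo {
    // Hypothetical helper: returns the real path for user input,
    // or null when no file exists at that location.
    static Path resolveUserPath(String input) throws IOException {
        try {
            return Paths.get(input).toRealPath();
        } catch (NoSuchFileException e) {
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path missing = Paths.get("no-such-dir", "..", "missing.txt");
        // toAbsolutePath() performs no check and keeps the redundant ".."
        System.out.println(missing.toAbsolutePath());
        // toRealPath() touches the file system, so the missing file is detected
        System.out.println(resolveUserPath(missing.toString()));

        Path tmp = Files.createTempFile("real", ".txt");
        System.out.println(resolveUserPath(tmp.toString()));
        Files.delete(tmp);
    }
}
```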

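Combine paths: resolve() and resolveSibling()
resolve() appends the argument to the current path; if the argument is an absolute path, the argument itself is returned. resolveSibling() resolves the argument against the parent of the current path.

Compute a relative path: relativize()
This is the inverse of resolve(): it returns the path that leads from the current path to the argument. For "/data" and "/data/logs/p1" the result is "logs/p1"; in the opposite direction it is "../..".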

Path basePath = Paths.get("rss"); // build a path from a string
Path resolvePath = basePath.resolve("resolvePath"); // combine basePath and "resolvePath" into a new path
Path resolveSibling = basePath.resolveSibling("resolveSibling"); // the sibling "resolveSibling" of basePath
Path relativizePath = basePath.relativize(resolvePath); // the relative path that leads from basePath to resolvePath

System.out.println("basePath = " + basePath.toAbsolutePath());
System.out.println("resolvePath = " + resolvePath.toAbsolutePath());
System.out.println("resolveSibling = " + resolveSibling.toAbsolutePath());
System.out.println("relativizePath = " + relativizePath);

// Output
basePath = F:\workspace\IDEA\Java_Core2\rss
resolvePath = F:\workspace\IDEA\Java_Core2\rss\resolvePath
resolveSibling = F:\workspace\IDEA\Java_Core2\resolveSibling
relativizePath = resolvePath

Convert to a File object: toFile()

Create a Scanner from a Path

Scanner in = new Scanner(Paths.get("C:\\Users\\test.txt"));

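1.2.2. Files

The Files class provides a large set of static methods for creating, copying, moving, and deleting files and directories, and for reading and writing file data.

1.2.2.1. Checking Files and Directories

exists(Path): true means the file exists and is accessible; false means the file does not exist or its existence cannot be determined (for example, because of missing permissions)
notExists(Path): true means the file does not exist; false means the file exists or its existence cannot be determined

| Situation | Check |
| --- | --- |
| File exists and is accessible | `exists(Path)` |
| File may exist but cannot be checked (e.g. no permission) | `!exists(Path) && !notExists(Path)` |
| File does not exist | `notExists(Path)` |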
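
The three cases in the table can be folded into one helper. This is a sketch only; the class FileState, its State enum, and the method check are made-up names for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileState {
    // Hypothetical three-valued result mirroring the table above
    enum State { ACCESSIBLE, MISSING, UNKNOWN }

    static State check(Path p) {
        if (Files.exists(p)) return State.ACCESSIBLE;
        if (Files.notExists(p)) return State.MISSING;
        return State.UNKNOWN; // neither check succeeded, e.g. no permission
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("state", ".txt");
        System.out.println(check(tmp)); // the file was just created
        Files.delete(tmp);
        System.out.println(check(tmp)); // the file has been removed
    }
}
```

Keep in mind that the answer can be stale by the time it is used, because another process may create or delete the file in between.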

To test whether a file (or directory) is readable, writable, or executable, combine the following checks:

Path path = Paths.get("data/logs/web.log");
boolean isRegularExecutableFile = Files.isRegularFile(path) &&
        Files.isReadable(path) && Files.isExecutable(path);

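1.2.2.2. Deleting

Files.delete(Path): throws NoSuchFileException if the path does not exist
Files.deleteIfExists(Path): deletes the file only if it exists; if it does not, the method returns false instead of throwing
Both methods can delete an empty directory. If the path is a symbolic link, only the link itself is deleted, not its target. If the path is a directory, it must be empty; otherwise the deletion fails with DirectoryNotEmptyException (a subclass of IOException).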
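
The failure modes described above can be told apart by exception type. A minimal sketch, with DeleteDemo and tryDelete as hypothetical names; note that the Javadoc lists DirectoryNotEmptyException as an optional specific exception, so a given file system may report a plain IOException instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class DeleteDemo {
    // Hypothetical helper: attempts the deletion and describes the outcome.
    static String tryDelete(Path p) {
        try {
            Files.delete(p);
            return "deleted";
        } catch (NoSuchFileException e) {
            return "missing";
        } catch (DirectoryNotEmptyException e) {
            return "not empty";
        } catch (IOException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("delete-demo");
        Path file = Files.createFile(dir.resolve("a.txt"));
        System.out.println(tryDelete(dir));  // directory still contains a.txt
        System.out.println(tryDelete(file)); // regular file
        System.out.println(tryDelete(file)); // already gone
        System.out.println(tryDelete(dir));  // directory is empty now
    }
}
```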

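1.2.2.3. Copying Files and Directories

Files.copy(fromPath, toPath): fromPath and toPath are both Path objects; the copy fails with FileAlreadyExistsException if the target already exists
Files.copy(fromPath, toPath, CopyOption...): same as above, with the behavior adjusted by the given options
Files.copy(inputStream, toPath): copies from an input stream to the target path
Files.copy(fromPath, outputStream): copies from the source path to an output stream

Pay attention to the CopyOption values. When the source is a symbolic link, the link's target is copied by default; to copy only the link itself, specify NOFOLLOW_LINKS. The available options are:

REPLACE_EXISTING: if the target file already exists, it is replaced. If the target is a symbolic link, the link itself is replaced, not the file it points to. If the target is a non-empty directory, DirectoryNotEmptyException is thrown. In practice this option is needed most of the time. When a directory is copied, the target directory is created automatically, but the entries of the source directory are not copied, so the result is an empty directory. Source and target should both be directories or both be files.
COPY_ATTRIBUTES: also copies the source file's attributes (metadata) to the new file. Which attributes are supported depends on the file system and platform, but lastModifiedTime is normally copied.
NOFOLLOW_LINKS: defined in LinkOption, which also implements CopyOption. Symbolic links are not followed, so only the link itself is copied and not the content of its target.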
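
The effect of REPLACE_EXISTING can be sketched as follows; CopyDemo and copyReplacing are hypothetical names, and temporary files stand in for real data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CopyDemo {
    // Hypothetical helper: overwrites the target and carries over file attributes.
    static void copyReplacing(Path from, Path to) throws IOException {
        Files.copy(from, to,
                StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.COPY_ATTRIBUTES);
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("copy-src", ".txt");
        Path dst = Files.createTempFile("copy-dst", ".txt"); // target already exists
        Files.write(src, "new content".getBytes(StandardCharsets.UTF_8));

        try {
            Files.copy(src, dst); // no options: an existing target is not overwritten
        } catch (FileAlreadyExistsException e) {
            System.out.println("plain copy refused to overwrite " + e.getFile());
        }

        copyReplacing(src, dst);
        System.out.println(new String(Files.readAllBytes(dst), StandardCharsets.UTF_8));
        Files.delete(src);
        Files.delete(dst);
    }
}
```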

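1.2.2.4. Moving

Files.move(fromPath, toPath)
Files.move(fromPath, toPath, CopyOption...)

A directory that contains files can also be moved, together with its subdirectories (this may depend on the platform), but an existing target directory must be empty (DirectoryNotEmptyException otherwise). The target directory does not need to be created beforehand, and the source directory no longer exists once the move completes. Two options are supported:

REPLACE_EXISTING: if the target file already exists, it is replaced; if the target is a symbolic link, the link itself is replaced and the file it points to is unaffected.
ATOMIC_MOVE: performs the move as an atomic file system operation. This requires support from the platform's file system, and all other options are ignored when it is specified. If the file cannot be moved (or replaced) atomically, AtomicMoveNotSupportedException is thrown.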
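
Because ATOMIC_MOVE is not available everywhere, callers often fall back to a regular move. The following is one possible sketch of that pattern; MoveDemo and moveAtomicallyIfPossible are illustrative names, and the fallback policy is an assumption rather than something the text above prescribes.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MoveDemo {
    // Hypothetical helper: tries an atomic move first and falls back to a
    // replacing, non-atomic move when the file system cannot do it atomically.
    static Path moveAtomicallyIfPossible(Path from, Path to) throws IOException {
        try {
            return Files.move(from, to, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            return Files.move(from, to, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path from = Files.createTempFile("move-src", ".txt");
        Path to = from.resolveSibling(from.getFileName() + ".moved");
        moveAtomicallyIfPossible(from, to);
        System.out.println("source still exists: " + Files.exists(from));
        System.out.println("target exists: " + Files.exists(to));
        Files.delete(to);
    }
}
```

Whether an atomic move replaces an existing target or fails is implementation specific, so the sketch moves to a name that does not exist yet.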

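1.2.2.5. Opening Files

The Files class provides several static methods for reading and writing files directly. The following options (StandardOpenOption) control how a file is opened for writing:

WRITE: opens the file for write access.
APPEND: appends data to the end of the file; used together with WRITE or CREATE.
TRUNCATE_EXISTING: truncates the file to zero length; used together with WRITE. For example, an existing file is emptied and then rewritten.
CREATE_NEW: creates a new file and throws an exception if the file already exists.
CREATE: opens the file if it already exists, otherwise creates it.
DELETE_ON_CLOSE: deletes the file when it is closed (by the close method or when the JVM exits). This option is meant for temporary files, which should not be accessed concurrently by other processes.
SPARSE: creates a sparse file; used together with CREATE_NEW. It applies to file systems that support sparse files, such as NTFS. Such large files may contain gaps (holes), which in some cases improves performance, and the gaps consume no disk space.
SYNC: every update to the file's content (data) or metadata is written synchronously to the underlying storage.
DSYNC: every update to the file's content is written synchronously to the underlying storage.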

Path path1 = Paths.get("/data1/web.log");
Path path2 = Paths.get("/data2/web.log");

// Read the file
byte[] arr = Files.readAllBytes(path1);
List<String> lines = Files.readAllLines(path1,Charset.forName("utf-8"));
BufferedReader reader = Files.newBufferedReader(path1);

// Write the file (with APPEND alone, the file must already exist)
Files.write(path2,lines,Charset.forName("utf-8"),StandardOpenOption.APPEND);
...
BufferedWriter writer = Files.newBufferedWriter(path2,Charset.forName("utf-8"),StandardOpenOption.APPEND);
...
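
Options are typically combined. As a sketch, CREATE together with APPEND yields a log-style writer that works whether or not the file exists yet; AppendLog and appendLine are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;

public class AppendLog {
    // Hypothetical helper: creates the file on first use, then appends one line per call.
    static void appendLine(Path log, String line) throws IOException {
        Files.write(log, Collections.singletonList(line), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("append-demo");
        Path log = dir.resolve("web.log"); // does not exist yet
        appendLine(log, "first");
        appendLine(log, "second");
        System.out.println(Files.readAllLines(log, StandardCharsets.UTF_8));
        Files.delete(log);
        Files.delete(dir);
    }
}
```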

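1.2.2.6. Creating Files

Create a directory
Throws FileAlreadyExistsException if the directory already exists. The existence check and the creation are performed as a single atomic operation.
Path path = Paths.get("dir");
Files.createDirectory(path); // create a directory at path

Create a file
Throws FileAlreadyExistsException if the file already exists. The existence check and the creation are performed as a single atomic operation.

Path path = Paths.get("file");
Files.createFile(path); // create an empty file at path; fails if anything, including a directory, already exists there

Pass a POSIX permission FileAttribute only on file systems that support POSIX permissions (such as Linux). On other file systems omit it; otherwise an UnsupportedOperationException is thrown.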

Path path = Paths.get("/data1/web.log");
List<String> lines = new ArrayList<String>();
lines.add("hello world!");

if (Files.notExists(path) || Files.deleteIfExists(path)) {
    Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rw-r--r--");
    FileAttribute<Set<PosixFilePermission>> attrs = PosixFilePermissions.asFileAttribute(perms);
    Files.createFile(path, attrs);
    Files.write(path, lines, Charset.forName("utf-8"), StandardOpenOption.TRUNCATE_EXISTING);
} else {
    throw new RuntimeException("Get wrong sql file path!");
}

Create a temporary file or directory in a given location or in the system default location
Default temporary directory on Windows 10 x64: C:\Users\<username>\AppData\Local\Temp

Path dir = ...;
Path tempFile1 = Files.createTempFile(dir, prefix, suffix); // in dir: a file whose name starts with prefix and ends with suffix
Path tempFile2 = Files.createTempFile(prefix, suffix); // same, in the system default temporary directory
Path tempDir1 = Files.createTempDirectory(dir, prefix); // in dir: a directory whose name starts with prefix (directories take no suffix)
Path tempDir2 = Files.createTempDirectory(prefix); // same, in the system default temporary directory

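1.2.2.7. Reading and Writing Files

Reading and writing small to medium-sized files

/* Read file content */
// Read the whole file at once as bytes
byte[] bytes = Files.readAllBytes(path); // file Path -> byte[]

// Read the whole file at once as lines (decoded as UTF-8)
List<String> lines = Files.readAllLines(path);

/* Write file content */
// Convert the bytes to a string
String content = new String(bytes, charset); // charset is the character encoding, e.g. StandardCharsets.UTF_8

// Write a string to the file
Files.write(path, content.getBytes(charset)); 
// Append a string to the file
Files.write(path, content.getBytes(charset),StandardOpenOption.APPEND);
// Write a collection of lines to the file
Files.write(path, lines);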

Large files
To process large files and binary files, use input/output streams or readers/writers.

InputStream in = Files.newInputStream(path);
OutputStream out = Files.newOutputStream(path);
Reader reader = Files.newBufferedReader(path, charset);
Writer writer = Files.newBufferedWriter(path, charset);
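
The point of the reader-based approach is that only one line is held in memory at a time. A minimal sketch, assuming a UTF-8 text file; LineCounter and countMatching are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class LineCounter {
    // Hypothetical helper: streams through the file line by line and counts
    // the lines containing needle, without loading the whole file.
    static long countMatching(Path file, String needle) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(needle)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lines", ".log");
        Files.write(file, Arrays.asList("an error", "ok", "error again"), StandardCharsets.UTF_8);
        System.out.println(countMatching(file, "error"));
        Files.delete(file);
    }
}
```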

1.2.2.8. Getting File Information

boolean exists(path): does the file exist?
boolean isHidden(path): is the file hidden?
boolean isReadable(path): is the file readable?
boolean isWritable(path): is the file writable?
boolean isExecutable(path): is the file executable?
boolean isRegularFile(path): is it a regular file? Equivalent to !isSymbolicLink() && !isDirectory() && !isOther()
boolean isDirectory(path): is it a directory?
boolean isSymbolicLink(path): is it a symbolic link?
long fileSize = Files.size(path): size of the file in bytes
Files.readAttributes(path, BasicFileAttributes.class): reads the basic file attributes in one call
Files.readAttributes(path, PosixFileAttributes.class): returns a PosixFileAttributes instance only if the file system is POSIX compliant
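
Reading the attributes in bulk avoids one file system call per property. A sketch using only BasicFileAttributes, which every file system supports; AttrDemo and describe are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class AttrDemo {
    // Hypothetical helper: summarizes a path from one bulk attribute read.
    static String describe(Path p) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
        String kind = attrs.isDirectory() ? "directory"
                : attrs.isRegularFile() ? "file"
                : "other";
        return kind + ", " + attrs.size() + " bytes";
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("attr", ".txt");
        Files.write(file, new byte[5]);
        System.out.println(describe(file));
        System.out.println(describe(file.getParent()));
        Files.delete(file);
    }
}
```

Symbolic links are followed by default here, so a link is reported according to its target; pass LinkOption.NOFOLLOW_LINKS to readAttributes to inspect the link itself.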

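1.2.2.9. Traversing Directories

Files.list(Path) returns a Stream of the entries in a directory. The stream is populated lazily, which is efficient for directories with many entries. It does not descend into subdirectories; use Files.walk(Path) for that.

try(Stream<Path> entries = Files.list(dirPath)) { // reading a directory holds a system resource that must be closed, hence try-with-resources; does not enter subdirectories
      entries.forEach(System.out::println); // print each entry, i.e. each path
}

try(Stream<Path> entries = Files.walk(dirPath)) { // descends into subdirectories
      entries.forEach(System.out::println);
}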

/* Example: copy everything under the directory rss (files and subdirectories, recursively) into the directory rss2 */
Path source = Paths.get("rss"); // adjust the source path to your own setup
Path target = Paths.get("rss2");

try(Stream<Path> entries = Files.walk(source)) {
    entries.forEach( p-> {
            try{
                // Take p's path relative to source and resolve it against target, so each entry keeps its relative location under target
              Path q = target.resolve(source.relativize(p)); 
              if(!Files.exists(q)) {
                    if(Files.isDirectory(p)) Files.createDirectory(q); // the source entry is a directory: create the matching directory under target
                    else Files.copy(p, q);  // the source entry is a file: copy it from source to target
              }
            } catch(IOException e) {
                e.printStackTrace();
            }
    });
}

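1.2.2.10. Directory Streams

Files.walk(Path) has one drawback: it is not convenient for deleting a directory tree, because it visits a parent directory before its children, while a parent can only be deleted after its children. Deleting in visit order throws an exception.
Files.newDirectoryStream(Path) returns a DirectoryStream, which gives finer-grained control over the traversal. Despite the name, it is not a java.util.stream.Stream but an interface dedicated to directory traversal; it is a subinterface of Iterable.
It can also be combined with a glob pattern to filter entries

// Select the entries of dir whose names end in ".java"
try(DirectoryStream<Path> entries = Files.newDirectoryStream(dir, "*.java")){
    for (Path entry: entries) {
        ...    
    }
}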

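1.2.2.11. Visiting All Descendants of a Directory

To visit all descendants of a directory, call walkFileTree() and pass it a FileVisitor. This method does more than a plain traversal.
The FileVisitor is notified when a file is encountered, before a directory is processed, after a directory is processed, and when an error occurs while accessing a file. Each notification decides how the walk proceeds: continue, skip the directory's subtree, skip the remaining siblings, or terminate.

// Notifications delivered by walkFileTree:
FileVisitResult visitFile()  // a file is encountered
FileVisitResult preVisitDirectory() // before a directory is processed
FileVisitResult postVisitDirectory() // after a directory is processed
FileVisitResult visitFileFailed() // a file could not be visited, or a directory could not be opened

// Each notification returns one of the following to steer the walk
FileVisitResult.CONTINUE // continue with the next file
FileVisitResult.SKIP_SUBTREE // continue, but do not visit the entries of this directory
FileVisitResult.SKIP_SIBLINGS // continue, but do not visit the siblings of this file (entries in the same directory)
FileVisitResult.TERMINATE // terminate the walk

The convenience class SimpleFileVisitor, together with Files.walkFileTree(), gives fine-grained access to a directory tree and a chance to react to each notification.
SimpleFileVisitor implements the FileVisitor interface. By default every method simply continues without doing anything, except visitFileFailed(), which rethrows the exception and thereby terminates the walk (postVisitDirectory() likewise rethrows a non-null exception).
Note: visitFileFailed() and postVisitDirectory() usually need to be overridden. Otherwise the walk fails as soon as it meets a directory that cannot be opened or a file that cannot be accessed.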

/********** Example 1: print all subdirectories of a given directory **********/
Files.walkFileTree(Paths.get("F:\\test"), new SimpleFileVisitor<Path>() {
    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
        System.out.println(dir);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
        System.out.println("postVisitDirectory " + dir);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException {
        return FileVisitResult.SKIP_SUBTREE;
    }
});

// Output
F:\test
F:\test\dir1
F:\test\dir1\subdir1
postVisitDirectory F:\test\dir1\subdir1
postVisitDirectory F:\test\dir1
F:\test\dir2
postVisitDirectory F:\test\dir2
F:\test\dir3
postVisitDirectory F:\test\dir3
postVisitDirectory F:\test

/********** Example 2: delete a directory tree (including its files) **********/
Files.walkFileTree(Paths.get("F:\\test"), new SimpleFileVisitor<Path>() {
    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
        System.out.println(dir);

        // Delete the files directly under dir (subdirectories are handled when they are visited).
        // try-with-resources closes the directory handle opened by Files.list.
        try (Stream<Path> entries = Files.list(dir)) {
            entries.forEach(p->{
                try {
                    if (!Files.isDirectory(p))
                        Files.delete(p);

                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
        }

        return FileVisitResult.CONTINUE;
    }

    // Delete the directory itself once its entries are gone
    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
        System.out.println("postVisitDirectory " + dir);
        if (null != exc) throw exc;
        Files.delete(dir);
        return FileVisitResult.CONTINUE;
    }
} );

1.2.2.12. File Channels

FileChannel provides channel-based access to a file. Its position(long) method moves to an arbitrary offset in the file before reading or writing, and it can map a file into memory outside the Java heap, which speeds up access to large files.

1. Obtaining a channel

Call getChannel() on a FileInputStream, FileOutputStream, or RandomAccessFile object, or open a channel with the static method FileChannel.open(Path, OpenOption …).

// java.io byte streams
FileOutputStream ous = new FileOutputStream(new File("a.txt"));
FileChannel out = ous.getChannel(); // a write-only channel
FileInputStream ins = new FileInputStream(new File("a.txt"));
FileChannel in = ins.getChannel();  // a read-only channel

// java.io random access file
RandomAccessFile file = new RandomAccessFile("a.txt", "rw");
FileChannel rafChannel = file.getChannel(); // a readable and writable channel

// NIO
FileChannel channel = FileChannel.open(Paths.get("a.txt"), StandardOpenOption.READ); // open a read-only channel to the file a.txt

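2. Reading data

read(ByteBuffer buf) returns the number of bytes read, or -1 once the end of the file has been reached. Reading advances the channel's position.

FileChannel channel = FileChannel.open(Paths.get("a.txt"), StandardOpenOption.READ);
ByteBuffer buf = ByteBuffer.allocate(5);
while(channel.read(buf)!=-1){
    buf.flip();
    // decode only the bytes just read; a multi-byte character split across two reads would be garbled
    System.out.print(new String(buf.array(), 0, buf.limit()));
    buf.clear();
}
channel.close();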

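3. Writing data

FileChannel channel = FileChannel.open(Paths.get("a.txt"), StandardOpenOption.WRITE); // the file must already exist
ByteBuffer buf = ByteBuffer.allocate(5);
byte[] data = "Hello, Java NIO.".getBytes();
for (int i = 0; i < data.length; ) {
    int n = Math.min(data.length - i, buf.remaining());
    buf.put(data, i, n);            // fill the buffer with the next chunk
    i += n;
    buf.flip();                     // switch the buffer from filling to draining
    while (buf.hasRemaining()) {
        channel.write(buf);         // a single write may not drain the whole buffer
    }
    buf.clear();                    // the buffer is empty again
}
channel.force(false);               // flush content (not metadata) to storage
channel.close();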
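
The introduction to this section mentions positioning and memory mapping, which the snippets above do not exercise. The following sketch shows both; ChannelSeekDemo, readAt, and readMapped are hypothetical names, and the requested range is assumed to lie within the file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelSeekDemo {
    // Hypothetical helper: reads up to length bytes starting at offset
    // by moving the channel position first.
    static String readAt(Path file, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ch.position(offset);
            ByteBuffer buf = ByteBuffer.allocate(length);
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or the file ends
            }
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }

    // Hypothetical helper: reads the same region through a read-only memory mapping.
    // Assumes offset + length does not exceed the file size.
    static String readMapped(Path file, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
            return StandardCharsets.UTF_8.decode(map).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("channel", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, "Hello, Java NIO.".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAt(file, 7, 4));
        System.out.println(readMapped(file, 7, 4));
    }
}
```

For a handful of bytes the mapping brings no benefit; it pays off when large regions of a big file are accessed repeatedly or at random offsets.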
