BufferedInputStream: Why It Can Make IO 1000x Faster

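Three interesting questions:

Why does read() return an int instead of a byte? Why can one layer of buffering speed up IO by a factor of 1000? Why can BufferedInputStream support mark()/reset()?

The three questions look unrelated, but their answers all point to the same underlying mechanism.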

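System calls: the hidden performance killer

Every read() on an unbuffered FileInputStream crosses from user mode into kernel mode. That is not an ordinary function call; the CPU has to switch privilege levels:

User mode → kernel mode switch: hundreds to thousands of CPU cycles
One memory read or write: tens of CPU cycles

In other words, one system call costs roughly as much as tens to hundreds of memory operations. These are rough orders of magnitude; the exact figures depend on the CPU and the operating system.

Without buffering: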

java

FileInputStream fis = new FileInputStream("big.dat");
while (fis.read() != -1) { }  // each 1-byte read = 1 system call
// reading a 1 MB file = 1,048,576 system calls

With buffering:

java

BufferedInputStream bis = new BufferedInputStream(
    new FileInputStream("big.dat"));
byte[] buf = new byte[8192];
while (bis.read(buf) != -1) { }  // each 8 KB read = 1 system call
// reading a 1 MB file = about 128 system calls

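1,048,576 vs 128: a difference of 8,192x. Strictly speaking, a bulk read that is at least as large as the internal buffer is passed straight through to the underlying stream, so in this example the saving comes from reading 8 KB at a time. The internal buffer pays off when the caller reads in small pieces, for example one byte per read() call.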
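The call counts above can be checked without touching the disk. The following is a minimal sketch, not a benchmark: `CountingInputStream` is a hypothetical helper written for this example, and an in-memory `ByteArrayInputStream` stands in for the file, so each counted call represents what would be a system call on a real `FileInputStream`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadCountDemo {
    /** Hypothetical helper: counts every read call that reaches the wrapped stream. */
    static final class CountingInputStream extends FilterInputStream {
        long calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads 1 MiB one byte at a time and returns how many calls reached the source. */
    static long countSourceReads(boolean buffered) throws IOException {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[1 << 20]));
        try (InputStream in = buffered ? new BufferedInputStream(source, 8192) : source) {
            while (in.read() != -1) {
                // discard the byte; only the call count matters here
            }
        }
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered source reads: " + countSourceReads(false));
        System.out.println("buffered source reads:   " + countSourceReads(true));
    }
}
```

The caller makes the same million-plus read() calls in both runs; only the number of calls that reach the source changes, which is the effect the section describes.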

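Internals: how the buffer works

BufferedInputStream keeps a byte array internally as its buffer. The listing below is a simplified excerpt, not the actual JDK source: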

java

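public class BufferedInputStream extends FilterInputStream {
    // The buffer array
    protected volatile byte[] buf;

    // Index one past the last valid byte in the buffer
    protected int count;

    // Current read position (relative to the buffer)
    protected int pos;

    // Marked position; -1 means no valid mark
    protected int markpos = -1;

    // How many bytes may be read after mark() before the mark can be dropped
    protected int marklimit;

    // Constructors and the remaining methods are omitted.

    public synchronized int read() throws IOException {
        if (pos >= count) {
            // Buffer exhausted: refill it in bulk from the underlying stream
            fill();
            if (pos >= count) return -1;  // still empty: end of stream
        }
        return buf[pos++] & 0xff;  // hand out one byte as an int in 0..255
    }

    // Simplified: the real fill() also preserves marked data and may grow the buffer
    private void fill() throws IOException {
        // Ask the underlying stream for up to a full buffer in one call
        int len = in.read(buf, 0, buf.length);
        count = (len == -1) ? 0 : len;
        pos = 0;
    }
}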

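Core logic:

First read(): the buffer is empty, so fill() pulls up to 8 KB from the underlying stream in one call
The next 8,191 read() calls: served straight from the buffer, with no system call
Buffer empty again: fill() runs again and pulls another 8 KB

One system call fetches 8 KB of data. That is where the speed really comes from.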
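The refill-then-serve loop can be written as a small class that actually compiles. This is a teaching sketch under stated assumptions: `MiniBufferedInputStream` is a hypothetical name, it extends `InputStream` directly, and it leaves out mark/reset, the bulk-read fast path, and thread safety, all of which the real JDK class handles.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Teaching sketch only: no mark/reset, no bulk-read fast path, not thread-safe. */
class MiniBufferedInputStream extends InputStream {
    private final InputStream source;
    private final byte[] buf;
    private int count; // number of valid bytes currently in buf
    private int pos;   // index of the next byte to hand out

    MiniBufferedInputStream(InputStream source, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.source = source;
        this.buf = new byte[size];
    }

    @Override
    public int read() throws IOException {
        if (pos >= count) {
            // Buffer exhausted: one bulk request to the underlying stream
            int len = source.read(buf, 0, buf.length);
            if (len <= 0) {
                return -1; // end of stream
            }
            count = len;
            pos = 0;
        }
        return buf[pos++] & 0xff; // widen to 0..255 so a data byte never equals -1
    }

    @Override
    public void close() throws IOException {
        source.close();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, (byte) 0xFF, 4, 5};
        try (InputStream in = new MiniBufferedInputStream(new ByteArrayInputStream(data), 2)) {
            int b;
            while ((b = in.read()) != -1) {
                System.out.print(b + " "); // prints: 1 2 255 4 5
            }
        }
    }
}
```

With a 2-byte buffer the five bytes take three refills, which makes the fill cycle easy to step through in a debugger.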

mark() / reset(): a time machine inside the buffer

BufferedInputStream supports mark/reset, and the feature works entirely inside the buffer:

java

try (BufferedInputStream bis = new BufferedInputStream(
        new FileInputStream("data.bin"))) {
    bis.mark(100);  // mark the current position

    byte[] buf1 = new byte[10];
    bis.read(buf1);  // read up to 10 bytes

    bis.reset();    // go back to the marked position

    byte[] buf2 = new byte[10];
    bis.read(buf2); // read the same data again
}

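How it works: mark() records pos, and reset() sets pos back to it. The data is already in the buffer, so nothing has to be re-read from disk.

Note: mark(100) guarantees that reset() works as long as no more than 100 bytes are read after the mark. Read further than that and the mark may be invalidated, in which case reset() throws an IOException.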
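The read limit can be probed with a small in-memory experiment. In this sketch `resetSurvives` is a hypothetical helper, and the buffer sizes are chosen only to make the effect visible. The documented contract only guarantees reset() within the read limit; what happens beyond it, as observed here, is implementation behavior of OpenJDK and should not be relied on.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

class MarkResetDemo {
    /**
     * Marks with the given read limit, reads bytesToRead bytes one at a time,
     * then attempts reset(). Returns true if reset() succeeded.
     */
    static boolean resetSurvives(int bufferSize, int readLimit, int bytesToRead)
            throws IOException {
        byte[] data = new byte[64];
        try (BufferedInputStream in =
                     new BufferedInputStream(new ByteArrayInputStream(data), bufferSize)) {
            in.mark(readLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read();
            }
            try {
                in.reset();
                return true;
            } catch (IOException invalidMark) {
                return false; // the mark was dropped when the buffer was refilled
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Within the read limit: guaranteed to work
        System.out.println(resetSurvives(4, 2, 2));
        // Past the limit, and the 4-byte buffer had to be refilled: mark is gone
        System.out.println(resetSurvives(4, 2, 10));
        // Past the limit, but everything still sits in the large buffer
        System.out.println(resetSurvives(8192, 2, 10));
    }
}
```

The third case shows why the mark is cheap: as long as the marked bytes are still in the buffer, reset() is just an index assignment.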

Basic usage

java

// Default 8 KB buffer
BufferedInputStream bis = new BufferedInputStream(
    new FileInputStream("data.bin"));

// Explicit buffer size (separate variable name so both declarations compile together)
BufferedInputStream bis16k = new BufferedInputStream(
    new FileInputStream("data.bin"), 16384);  // 16 KB

// Typical usage: buffering + bulk reads
try (BufferedInputStream in = new BufferedInputStream(
        new FileInputStream("big.dat"))) {
    byte[] buffer = new byte[8192];
    int len;
    while ((len = in.read(buffer)) != -1) {
        process(buffer, len);  // process() stands for your own handling code
    }
}

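Choosing a buffer size

Scenario	Suggested size
Ordinary files	8192 bytes (8 KB)
Large file transfers	1 MB to 8 MB
Network IO	1460 bytes (typical TCP payload per packet with a 1500-byte MTU) or 8192 bytes

8 KB is a sensible default rather than a universal optimum: it is a multiple of the common 4 KB filesystem block size, the memory cost is negligible, and it already cuts the number of system calls dramatically.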
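Since the best size depends on the workload, it is worth measuring before departing from the default. The following is a rough probe, not a rigorous benchmark: `BufferSizeProbe` and `countBytes` are hypothetical names, the 4 MiB temp file is an arbitrary choice, and a single untimed-warmup run like this is easily skewed by the OS page cache and JIT compilation.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferSizeProbe {
    /** Reads the whole file byte by byte through a buffer of the given size. */
    static long countBytes(Path file, int bufferSize) throws IOException {
        long total = 0;
        try (InputStream in =
                     new BufferedInputStream(Files.newInputStream(file), bufferSize)) {
            while (in.read() != -1) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("buffer-probe", ".dat");
        try {
            Files.write(file, new byte[4 << 20]); // 4 MiB of zeros
            for (int size : new int[] {512, 8192, 65536, 1 << 20}) {
                long start = System.nanoTime();
                long bytes = countBytes(file, size);
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.printf("buffer=%7d  bytes=%d  time=%d us%n", size, bytes, micros);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

For numbers you intend to act on, a harness such as JMH and a file larger than the page cache would give more trustworthy results.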

Combining with DataInputStream

java

// Correct order: DataInputStream wraps BufferedInputStream
try (DataInputStream dis = new DataInputStream(
        new BufferedInputStream(
            new FileInputStream("data.bin")))) {
    int i = dis.readInt();
    double d = dis.readDouble();
}

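DataInputStream neither knows nor cares whether there is a buffer underneath; it only reads values by type. The buffering layer's optimization applies to it transparently.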
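A round trip makes the layering concrete. In this minimal sketch the class name `DataStreamRoundTrip` and the `encode`/`decode` helpers are hypothetical, and an in-memory byte array replaces data.bin so the example runs anywhere.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataStreamRoundTrip {
    /** Writes an int followed by a double in DataOutputStream's binary format. */
    static byte[] encode(int i, double d) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(i);
            out.writeDouble(d);
        }
        return bytes.toByteArray();
    }

    /** Reads the values back in the same order, through a buffering layer. */
    static String decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(data)))) {
            int i = in.readInt();
            double d = in.readDouble();
            return i + " " + d;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decode(encode(42, 3.5))); // prints: 42 3.5
    }
}
```

readInt() pulls four single bytes and readDouble() eight; with the buffer in between, those small reads are served from memory rather than from the source.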

Closing streams

java

// ✅ Correct: close only the BufferedInputStream
try (BufferedInputStream bis = new BufferedInputStream(
        new FileInputStream("data.bin"))) {
    // read
}
// The underlying FileInputStream is closed along with it

// ❌ Wrong: closing a different FileInputStream
BufferedInputStream bis = new BufferedInputStream(
    new FileInputStream("data.bin"));
bis.read();  // done reading
new FileInputStream("data.bin").close();  // opens and closes a brand-new stream
// The original FileInputStream and the BufferedInputStream are both still open (resource leak)

Always close only the outermost stream.
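That closing the outer stream also closes the inner one can be verified directly. This sketch assumes a hypothetical `TrackedStream` that records whether close() was called; it is not part of the JDK.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

class CloseChainDemo {
    /** Hypothetical helper: an in-memory stream that remembers being closed. */
    static final class TrackedStream extends ByteArrayInputStream {
        boolean closed;

        TrackedStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Closes only the outer BufferedInputStream and reports the inner stream's state. */
    static boolean innerClosedAfterOuterClose() throws IOException {
        TrackedStream inner = new TrackedStream(new byte[] {1, 2, 3});
        try (BufferedInputStream outer = new BufferedInputStream(inner)) {
            outer.read();
        }
        return inner.closed;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(innerClosedAfterOuterClose()); // prints: true
    }
}
```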

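Back to the opening questions:

Why does read() return an int? Because a byte covers -128..127, so -1 is itself a valid byte value (0xFF). Returning an int lets data bytes come back as 0..255, which leaves -1 free as the end-of-stream marker.
Why can buffering make IO around 1000x faster? Because it cuts the number of system calls by orders of magnitude.
Why does BufferedInputStream support mark/reset? Because the data sits in an in-memory buffer, so the read position can be moved back freely within it.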
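The first answer is easy to see with a stream that contains the single byte 0xFF. A small sketch, with `ReadReturnsIntDemo` and `firstTwoReads` as hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadReturnsIntDemo {
    /** Returns the results of the first two read() calls on the given data. */
    static int[] firstTwoReads(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return new int[] {in.read(), in.read()};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] results = firstTwoReads(new byte[] {(byte) 0xFF});
        System.out.println(results[0]);        // 255: a real data byte
        System.out.println(results[1]);        // -1: end of stream
        System.out.println((byte) results[0]); // -1: narrowed to byte, data and EOF collide
    }
}
```

This is also why a loop should compare the int result against -1 before casting it to byte.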

Understand the mechanism, and you can use the tool well.
