Add floating-point format in data_and_memory.md

2025-02-02 22:43:50 +08:00 · 2023-02-22 19:02:26 +08:00 · 2023-02-22 19:02:26 +08:00 · d87c9b5084
commit d87c9b5084
parent 3daaf30f23
6 changed files with 93 additions and 31 deletions
--- a/docs/chapter_array_and_linkedlist/array.md
+++ b/docs/chapter_array_and_linkedlist/array.md
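@@ -110,8 +110,8 @@ comments: true

 <p align="center"> Fig. Computing the memory address of an array element </p>

-```java title=""
-// element memory address = array memory address + element length * element index
+```shell
+# element memory address = array memory address + element length * element index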
 elementAddr = firtstElementAddr + elementLength * elementIndex
 ```

--- a/docs/chapter_data_structure/data_and_memory.assets/IEEE-754-float.png
+++ b/docs/chapter_data_structure/data_and_memory.assets/IEEE-754-float.png
--- a/docs/chapter_data_structure/data_and_memory.md
+++ b/docs/chapter_data_structure/data_and_memory.md
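@@ -6,21 +6,15 @@ comments: true

 ## Basic Data Types

-When we think of data in a computer, text, images, video, audio, 3D models and so on come to mind. Although these are organized differently, they share one thing in common: all of them are built from basic data types.
+When we think of data in a computer, text, images, video, audio, 3D models and so on come to mind. Although these are organized differently, all of them are built from basic data types.

-**"Basic data types" are types that the CPU can operate on directly, and they are used directly in algorithms.**
+**"Basic data types" are types that the CPU can operate on directly, and they are used directly in algorithms**.

 - "Integers" come in several lengths: byte, short, int, long. Choose according to the algorithm's needs, i.e. use the smallest type whose value range is sufficient, to reduce memory usage;
 - "Floating-point numbers" represent decimals and come in two lengths, float and double; again, choose according to the algorithm's actual needs;
 - "Characters" are stored by way of a character set: the value of a char is actually a number, the character's index in the character set, and the computer looks up the character set to map that index to a character. The space occupied depends on the programming language, typically 2 bytes or 1 byte;
 - "Booleans" represent logical "true" and "false". The space occupied depends on the programming language, typically 1 byte or 1 bit;

-!!! note "Bytes and bits"
-
-    1 byte = 8 bits; 1 bit is the most basic unit, a single binary digit
-
-<p align="center"> Table. Java's basic data types </p>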
-
 <div class="center-table" markdown>

 | Category | Symbol      | Size              | Value range                                    | Default value  |
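@@ -40,9 +34,74 @@ comments: true

    The bold entries in the table above are the ones most commonly used in "algorithm problems". There is no need to memorize this table; a rough understanding is enough, and you can look it up when needed.

-**The connection and difference between "basic data types" and "data structures"**
+### Integer Representation

-As we know, a data structure is **a way of organizing and storing data** in a computer; its subject is "structure", not "data". For example, if we want to represent "a row of numbers", we should naturally use the "array" data structure. The way an array is stored lets it express adjacency, ordering, and the other information we need about the numbers, but whether what it stores is integer int, decimal float, or character char **has nothing to do with the so-called structure of the data**.
+The value range of an integer depends on the amount of memory the variable uses, i.e. its number of bytes (or bits). In a computer, 1 byte = 8 bits, and 1 bit is one binary digit. Taking the int type as an example:
+
+1. The int type occupies 4 bytes = 32 bits, so it can represent $2^{32}$ distinct numbers;
+2. Treat the highest bit as the sign bit, with $0$ for positive and $1$ for negative, so that $2^{31}$ of the patterns are non-negative and the other $2^{31}$ are negative;
+3. When all bits are 0 the value is $0$ ; counting up from zero, the largest positive number is $2^{31} - 1$ ;
+4. The remaining $2^{31}$ patterns all represent negative numbers, so the smallest negative number is $-2^{31}$ ; the details involve sign-magnitude, ones' complement, and two's complement representations, which interested readers can look up;
+
+The value ranges of the other integer types byte, short, and long are computed in the same way as for int, so we will not repeat them here.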
+
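+### Floating-Point Representation *
+
+Attentive readers may wonder: int and float have the same length, 4 bytes, **so why is float's value range so much larger than int's**? The reason is that float uses a different representation.
+
+The IEEE 754 standard specifies that a 32-bit float consists of the following parts:
+
+- Sign bit $\mathrm{S}$ : 1 bit;
+- Exponent bits $\mathrm{E}$ : 8 bits;
+- Fraction bits $\mathrm{N}$ : 24 bits, of which 23 are stored explicitly;
+
+Let $b_i$ denote the $i$-th bit of the 32-bit binary number; the value of the float is then defined as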
+
+$$
+\text { val } = (-1)^{b_{31}} \times 2^{\left(b_{30} b_{29} \ldots b_{23}\right)_2-127} \times\left(1 . b_{22} b_{21} \ldots b_0\right)_2
+$$
+
+Expressed with decimal values, the formula becomes
+
+$$
+\text { val }=(-1)^{\mathrm{S}} \times 2^{\mathrm{E} -127} \times (1 + \mathrm{N})
+$$
+
+where $\mathrm{S} \in \{0, 1\}$ , $\mathrm{E} \in \{ 1, 2, \dots, 254 \}$ , and $(1 + \mathrm{N}) = 1+\sum_{i=1}^{23} b_{23-i} 2^{-i} \in [1, 2 - 2^{-23}]$ .
+
+![IEEE-754-float](data_and_memory.assets/IEEE-754-float.png)
+
+Taking the figure above as an example, $\mathrm{S} = 0$ , $\mathrm{E} = 124$ , and $\mathrm{N} = 2^{-2} + 2^{-3} = 0.375$ , which gives
+
+$$
+\text { val } = (-1)^0 \times 2^{124 - 127} \times (1 + 0.375) = 0.171875
+$$
+
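+We can now answer the opening question: **float's representation includes exponent bits, which makes its value range far larger than that of int**. From the calculation above, the largest positive number float can represent is $2^{127} \times (2 - 2^{-23}) \approx 3.4 \times 10^{38}$ , and flipping the sign bit gives the smallest negative number.
+
+**Although float extends the value range, the side effect is a loss of precision**. The int type uses all 32 bits to represent the number, and its values are evenly spaced; because of the exponent bits, however, the larger a float's value, the larger the gap between adjacent representable numbers tends to be.
+
+Furthermore, the exponent values $\mathrm{E} = 0$ and $\mathrm{E} = 255$ have special meanings: **they are used to represent zero, infinity, $\mathrm{NaN}$ , and so on**.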
+
+| Exponent E         | Fraction $\mathrm{N} = 0$ | Fraction $\mathrm{N} \ne 0$ | Formula                                                      |
+| ------------------ | ------------------------- | --------------------------- | ------------------------------------------------------------ |
+| $0$                | $\pm 0$                   | Subnormal number            | $(-1)^{\mathrm{S}} \times 2^{-126} \times (0.\mathrm{N})$    |
+| $1, 2, \dots, 254$ | Normal number             | Normal number               | $(-1)^{\mathrm{S}} \times 2^{(\mathrm{E} -127)} \times (1.\mathrm{N})$ |
+| $255$              | $\pm \infty$              | $\mathrm{NaN}$              |                                                              |
+
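+In particular, subnormal numbers significantly improve precision for values close to zero:
+
+- The smallest positive normal number is $2^{-126} \approx 1.18 \times 10^{-38}$ ;
+- The smallest positive subnormal number is $2^{-126} \times 2^{-23} \approx 1.4 \times 10^{-45}$ ;
+
+Double-precision double uses a representation similar to float's, so we will not go into it here.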
+
+
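+### The Relationship Between Basic Data Types and Data Structures
+
+As we know, **a data structure is a way of organizing and storing data in a computer**; its subject is "structure", not "data". If we want to represent "a row of numbers", we naturally think of the "array" data structure. The way an array is stored lets it express the adjacency and ordering of the numbers, but whether what it stores is integer int, decimal float, or character char **has nothing to do with the so-called structure of the data**.
+
+In other words, basic data types provide the "content type" of data, while data structures provide the way data is "organized".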

 === "Java"

@@ -105,7 +164,6 @@ comments: true
    float decimals[10];
    char characters[10];
    bool booleans[10];
-
    ```

 === "C#"
--- a/docs/chapter_introduction/algorithms_are_everywhere.md
+++ b/docs/chapter_introduction/algorithms_are_everywhere.md
@@ -20,12 +20,16 @@ comments: true

 === "<1>"
    ![look_up_dictionary_step_1](algorithms_are_everywhere.assets/look_up_dictionary_step_1.png)
+
 === "<2>"
    ![look_up_dictionary_step_2](algorithms_are_everywhere.assets/look_up_dictionary_step_2.png)
+
 === "<3>"
    ![look_up_dictionary_step_3](algorithms_are_everywhere.assets/look_up_dictionary_step_3.png)
+
 === "<4>"
    ![look_up_dictionary_step_4](algorithms_are_everywhere.assets/look_up_dictionary_step_4.png)
+
 === "<5>"
    ![look_up_dictionary_step_5](algorithms_are_everywhere.assets/look_up_dictionary_step_5.png)

--- a/docs/chapter_preface/contribution.md
+++ b/docs/chapter_preface/contribution.md
@@ -36,7 +36,7 @@ comments: true

 You can use Docker to deploy this project.

-```bash
+```shell
 git clone https://github.com/krahets/hello-algo.git
 cd hello-algo
 docker-compose up -d
@@ -46,6 +46,6 @@ docker-compose up -d

 Use the following command to remove the deployment.

-```bash
+```shell
 docker-compose down
 ```
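
As a quick sanity check on the IEEE 754 formulas this commit adds to data_and_memory.md, the sketch below evaluates a 32-bit pattern using the normal, subnormal, and special-value rules from the table. It is a minimal illustration in Python, not part of the commit; the helper names `float_bits` and `decode_float32` are hypothetical, and only the standard-library `struct` module is assumed.

```python
import struct


def float_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of x as an int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def decode_float32(bits: int) -> float:
    """Evaluate a 32-bit pattern with the S / E / N formulas (hypothetical helper)."""
    s = (bits >> 31) & 0x1            # sign bit S
    e = (bits >> 23) & 0xFF           # exponent field E
    n = (bits & 0x7FFFFF) / 2**23     # fraction N, a value in [0, 1)
    sign = -1.0 if s else 1.0
    if e == 0:                        # zero or subnormal: 2^-126 * 0.N
        return sign * 2.0**-126 * n
    if e == 255:                      # infinity (N = 0) or NaN (N != 0)
        return sign * float("inf") if n == 0 else float("nan")
    return sign * 2.0 ** (e - 127) * (1 + n)   # normal number: 2^(E-127) * 1.N


# The worked example from the text: S = 0, E = 124, N = 2^-2 + 2^-3 = 0.375
example_bits = (0 << 31) | (124 << 23) | (0b011 << 20)
print(decode_float32(example_bits))   # 0.171875
```

Packing `0.171875` with `float_bits` gives back the same bit pattern, and `decode_float32(0x7F7FFFFF)` and `decode_float32(0x00000001)` reproduce the largest normal value and the smallest positive subnormal value quoted in the text.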