An Analysis of Oracle Instance Recovery

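After an Oracle database server loses power unexpectedly and restarts, the database performs instance recovery. What does Oracle actually do during that process? The explanation here follows the official documentation. I am still a beginner, so corrections are welcome.

First, the definition of instance recovery: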

Instance recovery is the process of applying records in the online redo log to data files to reconstruct changes made after the most recent checkpoint. Instance recovery occurs automatically when an administrator attempts to open a database that was previously shut down inconsistently.

Oracle Database performs instance recovery automatically in the following situations:

The database opens for the first time after the failure of a single-instance database or all instances of an Oracle RAC database. This form of instance recovery is also called crash recovery. Oracle Database recovers the online redo threads of the terminated instances together.
Some but not all instances of an Oracle RAC database fail. Instance recovery is performed automatically by a surviving instance in the configuration.

The SMON background process performs instance recovery, applying online redo automatically. No user intervention is required.

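So an inconsistent shutdown triggers instance recovery (a consistent shutdown does not; see the official definition of shutdown immediate), and when a RAC node goes down, a surviving node performs instance recovery on its behalf. The process reconstructs the dirty blocks in memory from redo so that committed changes are preserved, and rolls back changes that were never committed. The SMON background process is responsible for all of this.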

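Instance recovery has two phases:

1. Rolling forward

Oracle works from what is recorded in the redo log:

1) For committed transactions, Oracle reconstructs the dirty blocks in memory from the log. The commit is recorded in redo as well, so replaying it marks the transaction as committed, and the blocks are then written to disk by the normal mechanism.

2) For uncommitted transactions, Oracle also reconstructs the dirty blocks from redo (the reason is that, in the checkpoint queue, the dirty blocks of an uncommitted transaction may sit ahead of those of a committed transaction). In this phase the blocks are only reconstructed; Oracle does nothing else to them.

Because some changes from large transactions had already been written to disk before the failure, and because rolling forward itself recreates dirty blocks of uncommitted transactions, Oracle must carry out the second step, rolling back.

2. Rolling back

Oracle uses undo to roll back every uncommitted change, whether it had already been written to disk before the failure or was reintroduced during rolling forward.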
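The two phases can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how Oracle is implemented: the `Redo` record, the `roll_forward` and `roll_back` functions, and the block names are all hypothetical, each redo entry carries both a before image (standing in for undo) and an after image, and a record with no block stands for a commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Redo:
    """Simplified redo entry (hypothetical layout, not Oracle's format)."""
    scn: int
    txn: str
    block: Optional[str] = None   # None marks a commit record
    before: Optional[int] = None  # before image, standing in for undo
    after: Optional[int] = None   # after image


def roll_forward(disk, redo_log):
    """Replay all redo in SCN order, committed or not, and rebuild undo."""
    cache = dict(disk)
    undo = {}          # txn -> list of (block, before image)
    committed = set()
    for rec in sorted(redo_log, key=lambda r: r.scn):
        if rec.block is None:
            committed.add(rec.txn)
            continue
        cache[rec.block] = rec.after
        undo.setdefault(rec.txn, []).append((rec.block, rec.before))
    return cache, undo, committed


def roll_back(cache, undo, committed):
    """Undo every change of uncommitted transactions, newest change first."""
    result = dict(cache)
    for txn, entries in undo.items():
        if txn in committed:
            continue
        for block, before in reversed(entries):
            result[block] = before
    return result


# State of the data files at the crash: T2's change to B reached disk,
# T1's change to A and T2's change to C did not.
disk = {"A": 10, "B": 21, "C": 30}
log = [
    Redo(101, "T1", "A", 10, 11),
    Redo(102, "T2", "B", 20, 21),
    Redo(103, "T1"),               # T1 commits; T2 never does
    Redo(104, "T2", "C", 30, 31),
]
cache, undo, committed = roll_forward(disk, log)
recovered = roll_back(cache, undo, committed)
# recovered == {"A": 11, "B": 20, "C": 30}
```

After rolling forward, the cache holds every change in the log, including T2's uncommitted ones. Rolling back then restores B and C to their before images while T1's committed change to A survives.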

Here is the figure from the official documentation:

Figure: Basic Instance Recovery Steps: Rolling Forward and Rolling Back

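Explanation of the figure:

Before instance recovery, the entries in the redo log correspond to four kinds of changed blocks (redo records only changes):

1) Committed changes already written to disk. Oracle does not need to do anything for these blocks.

2) Committed changes not yet written to disk. Oracle reconstructs the dirty blocks in memory during rolling forward, together with the commit, and the blocks are then written to disk by the normal mechanism.

3) Uncommitted changes not written to disk.

4) Uncommitted changes already written to disk.

Rollback works one transaction at a time, so blocks of types 3 and 4 are both dealt with in the rolling back phase: Oracle uses undo to roll back every uncommitted transaction, restoring the before images of the affected blocks, and that takes care of types 3 and 4.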
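The four cases above can be summarized as a small decision function. This is a sketch for illustration only: the function name `recovery_action` and the returned labels are hypothetical, and the labels describe which phase affects the change rather than any real Oracle interface.

```python
def recovery_action(committed: bool, on_disk: bool) -> str:
    """Map one changed block to the recovery phase that deals with it."""
    if committed and on_disk:
        return "none"                        # type 1: already safe on disk
    if committed:
        return "roll forward"                # type 2: rebuilt from redo
    if on_disk:
        return "roll back"                   # type 4: undone using undo
    return "roll forward, then roll back"    # type 3: rebuilt, then undone


for committed in (True, False):
    for on_disk in (True, False):
        print(committed, on_disk, recovery_action(committed, on_disk))
```

Type 3 appears in both phases because redo is replayed regardless of commit state, so the uncommitted change is first recreated in memory and only then rolled back.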

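It also follows from the above that Oracle assumes the transaction state recorded in undo always agrees with the redo log: there is no case where undo shows a transaction as committed while the redo log shows it as uncommitted.

Not every situation matches what Oracle assumes, though. When a database loses power repeatedly, instance recovery can fail, and the only option left is to use special techniques to modify the data file headers and the SCN.

Unless the situation is truly urgent, do not open the database with "folk remedies" such as BBED or forcibly advancing the SCN. For a successful DBA, backups and disaster recovery are always the most important work.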

Instance Recovery Phases
The first phase of instance recovery is called cache recovery or rolling forward, and involves reapplying all of the changes recorded in the online redo log to the data files. Because rollback data is recorded in the online redo log, rolling forward also regenerates the corresponding undo segments.

Rolling forward proceeds through as many online redo log files as necessary to bring the database forward in time. After rolling forward, the data blocks contain all committed changes recorded in the online redo log files. These files could also contain uncommitted changes that were either saved to the data files before the failure, or were recorded in the online redo log and introduced during cache recovery.

After the roll forward, any changes that were not committed must be undone. Oracle Database uses the checkpoint position, which guarantees that every committed change with an SCN lower than the checkpoint SCN is saved on disk. Oracle Database applies undo blocks to roll back uncommitted changes in data blocks that were written before the failure or introduced during cache recovery. This phase is called rolling back or transaction recovery.
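The role of the checkpoint position can be sketched in a few lines. This is a toy model under a simplifying assumption, namely that every change below the checkpoint SCN is already on disk and can be skipped; the function name `redo_to_replay` and the SCN values are hypothetical.

```python
def redo_to_replay(redo_scns, checkpoint_scn):
    """Return, in SCN order, the redo entries roll forward still has to apply.

    Entries below the checkpoint SCN are assumed to be on disk already,
    so this toy model skips them.
    """
    return sorted(scn for scn in redo_scns if scn >= checkpoint_scn)


pending = redo_to_replay([98, 105, 101, 99, 103], checkpoint_scn=101)
# pending == [101, 103, 105]
```

The further the checkpoint has advanced before the failure, the less redo is left to replay, which is why the checkpoint position bounds the work of rolling forward.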

Permanent link to this article: http://www.linuxidc.com/Linux/2017-06/144601.htm
