current online redo log缺失后的恢复

1,717次阅读

共计 9545 个字符，预计需要花费 24 分钟才能阅读完成。

有朋友跟我说了一个关于 ORA-600[4000]错误的恢复，他是这么做的：
1、插入了 2000 条记录，但不 commit；
2、马上 shutdown abort；
3、接着 rm 掉所有的 redo log；
他尝试过一些恢复手段后，碰到了 ORA-600[4000]错误。
此时他陷入了一种两难的境地：如果他不用_corrupted_rollback_segments，则上述 ORA-600[4000]无法
解决；如果他用了_corrupted_rollback_segments，则 Oracle 报错：
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number with name “” too small

我看了他发过来的 trace 文件和操作步骤，里面还是有一些不必要的操作。

这样吧，我在这里就把他做过的事情再做一遍，然后我会尝试恢复上述数据库，希望如下的过程能对朋友们有所帮助：

首先我插入 2000 条记录但不 commit：
SQL_testdb>conn scott/tiger@testdb;
Connected.
SQL_testdb>create table testtb (type number,ts timestamp);

Table created.

SQL_testdb>begin
2 for i in 1..2000 loop
3 insert into testtb values (i,sysdate);
4 end loop;
5 end;
6 /

PL/SQL procedure successfully completed.

然后起另外一个 session，执行 shutdown abort：
SQL_testdb>shutdown abort
ORACLE instance shut down.

最后我把所有的 redo log 都 rm 掉：
$ ls -l
total 2543072
-rw-r—– 1 oracle dba 1531904 Dec 23 18:59 control01.ctl
-rw-r—– 1 oracle dba 1531904 Dec 23 18:59 control02.ctl
-rw-r—– 1 oracle dba 1531904 Dec 23 18:59 control03.ctl
-rw-r–r– 1 oracle dba 20979712 Dec 23 18:27 cwmlite01.dbf
-rw-r–r– 1 oracle dba 20979712 Dec 23 18:27 drsys01.dbf
-rw-r–r– 1 oracle dba 144842752 Dec 23 18:27 example01.dbf
-rw-r–r– 1 oracle dba 26222592 Dec 23 18:27 indx01.dbf
-rw-r–r– 1 oracle dba 18 Dec 23 18:47 login.sql
-rw-r–r– 1 oracle dba 20979712 Dec 23 18:27 odm01.dbf
-rw-r—– 1 oracle dba 104858112 Dec 23 18:27 redo01.log
-rw-r—– 1 oracle dba 104858112 Dec 23 18:27 redo02.log
-rw-r—– 1 oracle dba 104858112 Dec 23 18:59 redo03.log
-rw-r–r– 1 oracle dba 419438592 Dec 23 18:58 system01.dbf
-rw-r–r– 1 oracle dba 42999808 Dec 23 18:17 temp01.dbf
-rw-r–r– 1 oracle dba 10493952 Dec 23 18:27 tools01.dbf
-rw-r–r– 1 oracle dba 209723392 Dec 23 18:59 undotbs01.dbf
-rw-r–r– 1 oracle dba 26222592 Dec 23 18:27 users01.dbf
-rw-r–r– 1 oracle dba 39985152 Dec 23 18:27 xdb01.dbf
$rm redo*.log
$ ls -l
total 1928648
-rw-r—– 1 oracle dba 1531904 Dec 23 18:59 control01.ctl
-rw-r—– 1 oracle dba 1531904 Dec 23 18:59 control02.ctl
-rw-r—– 1 oracle dba 1531904 Dec 23 18:59 control03.ctl
-rw-r–r– 1 oracle dba 20979712 Dec 23 18:27 cwmlite01.dbf
-rw-r–r– 1 oracle dba 20979712 Dec 23 18:27 drsys01.dbf
-rw-r–r– 1 oracle dba 144842752 Dec 23 18:27 example01.dbf
-rw-r–r– 1 oracle dba 26222592 Dec 23 18:27 indx01.dbf
-rw-r–r– 1 oracle dba 18 Dec 23 18:47 login.sql
-rw-r–r– 1 oracle dba 20979712 Dec 23 18:27 odm01.dbf
-rw-r–r– 1 oracle dba 419438592 Dec 23 18:58 system01.dbf
-rw-r–r– 1 oracle dba 42999808 Dec 23 18:17 temp01.dbf
-rw-r–r– 1 oracle dba 10493952 Dec 23 18:27 tools01.dbf
-rw-r–r– 1 oracle dba 209723392 Dec 23 18:59 undotbs01.dbf
-rw-r–r– 1 oracle dba 26222592 Dec 23 18:27 users01.dbf
-rw-r–r– 1 oracle dba 39985152 Dec 23 18:27 xdb01.dbf

好了，现在我们来开始恢复。
现在直接 open resetlog 肯定是打不开的：
SQL_testdb>startup mount
ORACLE instance started.

Total System Global Area 505382744 bytes
Fixed Size 743256 bytes
Variable Size 285212672 bytes
Database Buffers 218103808 bytes
Redo Buffers 1323008 bytes
Database mounted.

SQL_testdb>recover database until cancel;
ORA-00279: change 188425 generated at 12/23/2009 18:27:25 needed for thread 1
ORA-00289: suggestion : /cadrasu01/app/oracle/product/9.2.0/dbs/arch1_1.dbf
ORA-00280: change 188425 for thread 1 is in sequence #1

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: ‘/dras21/testdb/system01.dbf’

ORA-01112: media recovery not started

SQL_testdb>alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: ‘/dras21/testdb/system01.dbf’

我们来创建 pfile：
SQL_testdb>create pfile=’/dras21/testdb/inittestdb.ora’ from spfile;

File created.

然后我们在 pfile 中加入 *._allow_resetlogs_corruption=TRUE 后再次 open resetlog：
SQL_testdb>startupmountpfile=’/dras21/testdb/inittestdb.ora’;
ORACLE instance started.

Total System Global Area 505382744 bytes
Fixed Size 743256 bytes
Variable Size 285212672 bytes
Database Buffers 218103808 bytes
Redo Buffers 1323008 bytes
Database mounted.

SQL_testdb>alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced

此时的 alert log 里会记录:
Errors in file /cadrasu01/app/oracle/admin/testdb/udump/testdb_ora_7090390.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number with name “” too small
Wed Dec 23 19:20:51 2009
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 7090390
ORA-1092 signalled during: alter database open resetlogs…
这里我们重现了那位朋友提到的错误。

上述 trace 文件（即 /cadrasu01/app/oracle/admin/testdb/udump/testdb_ora_7090390.trc）不具备参
考价值，我们来做一个 10046，步骤为：
SQL_testdb>startupmountpfile=’/dras21/testdb/inittestdb.ora’;
ORACLE instance started.

Total System Global Area 505382744 bytes
Fixed Size 743256 bytes
Variable Size 285212672 bytes
Database Buffers 218103808 bytes
Redo Buffers 1323008 bytes
Database mounted.

SQL_testdb>oradebug setmypid
Statement processed.

SQL_testdb>oradebug unlimit
Statement processed.

SQL_testdb>oradebug event 10046 trace name context forever,level 12
Statement processed.

SQL_testdb>recover database until cancel;
ORA-00279: change 208432 generated at 12/23/2009 19:45:06 needed for thread 1
ORA-00289: suggestion : /cadrasu01/app/oracle/product/9.2.0/dbs/arch1_1.dbf
ORA-00280: change 208432 for thread 1 is in sequence #1

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: ‘/dras21/testdb/system01.dbf’

ORA-01112: media recovery not started

SQL_testdb>alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced

此时，我们再去分析上述 10046 产生的 trace 文件，直接看上述 trace 文件的最末尾，我们发现有这样的内
容：
EXEC #10:c=0,e=236,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,tim=1232000925678362
WAIT #10: nam=’db file sequential read’ ela= 176p1=1 p2=202p3=1
WAIT #10: nam=’db file sequential read’ ela= 183p1=1 p2=106p3=1
FETCH #10:c=0,e=454,p=2,cr=2,cu=0,mis=0,r=0,dep=2,og=4,tim=1232000925678842
FETCH #4:c=0,e=1713,p=3,cr=5,cu=0,mis=0,r=0,dep=1,og=4,tim=1232000925678907
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number with name “” too small
EXEC #1:c=200000,e=5922169,p=75,cr=705,cu=1,mis=0,r=0,dep=0,og=4,tim=1232000925679637
ERROR #1:err=1092 tim=1602843205
也就是说 oracle 在读 file 1，block 202 和 file 1，block 106 的时候可能出了问题。
我用 BBED 看了上述 block，没发现有什么异常，也就是说这条路已经走不下去了。

好了，我们来换一条路，直接 open：
SQL_testdb>alter database open;
alter database open
*
ERROR at line 1:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4193], [0], [57], [], [], [], [], []
呵呵，见到 4193 了，那这个库肯定可以打开了。

我们把 *._corrupted_rollback_segments=(_SYSSMU1$, _SYSSMU2$, _SYSSMU3$, _SYSSMU4$, _SYSSMU5
$, _SYSSMU6$, _SYSSMU7$, _SYSSMU8$, _SYSSMU9$, _SYSSMU10$)加到上述 pfile 里再次 open：
SQL_testdb>startupmountpfile=’/dras21/testdb/inittestdb.ora’;
ORACLE instance started.

Total System Global Area 505382744 bytes
Fixed Size 743256 bytes
Variable Size 285212672 bytes
Database Buffers 218103808 bytes
Redo Buffers 1323008 bytes
Database mounted.

SQL_testdb>alter database open;

Database altered.

好了，现在我们已经成功把上述数据库给打开了，接下来的收尾工作就是要切换 undo tablespace，重建
spfile，这个大家肯定都知道，我这里无需再赘述。

最后我们来看一下那 2000 条数据还在不在：
SQL_testdb>conn scott/tiger@testdb;
Connected.

SQL_testdb>select count(*) from testtb;
select count(*) from testtb
*
ERROR at line 1:
ORA-00942: table or view does not exist

正如 kamus 所说，online redo log 缺失通常意味着必然会有数据的丢失。
Categories:
oracle experiences
Tags:
recover of missing current online redo log
No TrackBacks
TrackBack URL: http://dbsnake.com/cgi-bin/mt/mt-tb.cgi/98
1 Comment
张晓明 |December 24, 2009 11:07 AM|Reply
我又做了一次，新建了个库（原来那个库没了）
建库后
session 1
SQL> create tablespace zxm datafile ‘/zxm/testzxm1/zxm.dbf’ size 100m
2 ;
Tablespace created.
SQL> create user zxm identified by zxm default tablespace zxm;
User created.
SQL> grant connect,resource to zxm;
Grant succeeded.
SQL> alter system archive log current;
System altered.
SQL> alter system switch logfile;
System altered.
SQL> conn zxm/zxm
Connected.
SQL> create table a (id number,name varchar2(200));
Table created.
SQL> declare
2 i number;
3 begin
4 i:=1;
5 while i 6 loop
7 insert into a values (i,’asfdasfasfd’||i);
8 i:=i+1;
9 end loop;
10 end;
11 /
PL/SQL procedure successfully completed.
session 2
SQL> shutdown abort;
ORACLE instance shut down.
rm *.log
然后开始测试
SQL> startup mount;
SQL> recover database until cancel;
ORA-00279: change 179545 generated at 12/24/2009 09:17:16 needed for thread 1
ORA-00289: suggestion : /zxm/testzxm1/1_10_706437772.dbf
ORA-00280: change 179545 for thread 1 is in sequence #10
Specify log: {=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: ‘/zxm/testzxm1/testzxm1/system01.dbf’
ORA-01112: media recovery not started
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: ‘/zxm/testzxm1/testzxm1/system01.dbf’
SQL> create pfile=’/zxm/inita.ora’ from spfile;
File created.
添加 _resetlogs…. 参数
SQL> startup pfile=’/zxm/inita.ora’ mount;
ORACLE instance started.
Total System Global Area 209715200 bytes
Fixed Size 2019608 bytes
Variable Size 79695592 bytes
Database Buffers 121634816 bytes
Redo Buffers 6365184 bytes
Database mounted.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01248: file 5 was created in the future of incomplete recovery
ORA-01110: data file 5: ‘/zxm/testzxm1/zxm.dbf’
SQL> alter database datafile 5 offline drop;
Database altered.
SQL> alter database open resetlogs;
Database altered.