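Problem description:
On 2010-06-17 an abnormal data operation occurred on one of our systems: more than 2 million rows of a single table were updated, which was a logical operation error. The maintenance staff and the vendor asked to examine the archived logs from 2010-06-17, hoping that log analysis would show whether this change had really been made. The request was raised on 2010-06-22, nearly a week later, so the database's archived logs had to be restored from the tape library on the NetBackup server to local disk before they could be analyzed.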
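The log restore failed: the logs could not be restored to local disk. The steps and symptoms were as follows.
RMAN shows that the log files for 2010-06-17 cover sequence numbers (SEQUENCE) 17018 to 17105: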
RMAN>LIST BACKUP OF ARCHIVELOG ALL;
  Thrd Seq     Low SCN        Low Time  Next SCN       Next Time
  1    17017   10123519578102 16-JUN-10 10123540590941 17-JUN-10
  1    17018   10123540590941 17-JUN-10 10123541270660 17-JUN-10
......
  1    17104   10125218669876 17-JUN-10 10125219611361 17-JUN-10
  1    17105   10125219611361 17-JUN-10 10125220525468 18-JUN-10
  1    17106   10125220525468 18-JUN-10 10125222513280 18-JUN-10
......
RMAN>run{
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
SET ARCHIVELOG DESTINATION TO '/archivelog';
RESTORE ARCHIVELOG SEQUENCE BETWEEN 17018 and 17105;
release channel ch00;}
Output:
channel ch00: starting archive log restore to user-specified destination
archive log destination=/archivelog
channel ch00: restoring archive log
archive log thread=1 sequence=17018
channel ch00: restoring archive log
......
channel ch00: restoring archive log
archive log thread=1 sequence=17035
channel ch00: reading from backup piece al_1029_1_721945418
ORA-19870: error reading backup piece al_1029_1_721945418
ORA-19507: failed to retrieve sequential file, handle="al_1029_1_721945418", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
ORA-19511: Error received from media manager layer, error text:
Failed to open backup file for restore.
channel ch00: starting archive log restore to user-specified destination
archive log destination=/archivelog
......
released channel: ch00
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 06/23/2010 08:45:36
RMAN-06026: some targets not found - aborting restore
RMAN-06025: no backup of log thread 1 seq 17105 lowscn 10125219611361 found to restore
RMAN-06025: no backup of log thread 1 seq 17104 lowscn 10125218669876 found to restore
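......
At this point I worried that the SBT_TAPE channel itself was broken, so I backed up that day's archived logs. The backup ran normally, which suggested the media channel was fine, yet an immediate restore of that fresh backup failed with the same error as above.
I then opened the NetBackup log on the client side, which read as follows: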
$ cd /usr/openv/netbackup/logs/user_ops/dbext/logs
$ ls -lt
-rw-r--r-- 1 oracle oinstall  153 Jun 23 09:44 12444.0.1277257483
-rw-r--r-- 1 oracle oinstall  153 Jun 23 09:28 8231.0.1277256490
-rw-r--r-- 1 oracle oinstall  153 Jun 23 09:24 8231.0.1277256245
-rw-r--r-- 1 oracle oinstall 1348 Jun 23 09:22 8231.0.1277256110
$ more 12444.0.1277257483
Restore started Wed Jun 23 09:44:43 2010
09:44:58 client pdgis-ora peername PDGIS-ORA is invalid for restore request
09:44:59 INF - Server status = 37
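When there are several progress files to go through, the two lines that matter (the rejection message and the final server status) can be pulled out mechanically. The following is only a minimal Python sketch: the function name `summarize_progress_log` and the two regular expressions are my own assumptions, based solely on the log lines shown above, and are not part of any NetBackup tool.

```python
import re

# Final status line written into the progress file (format taken from the log above).
STATUS_RE = re.compile(r"INF - Server status = (\d+)")
# Rejection message; the client name and the peername are captured separately.
REJECT_RE = re.compile(r"client (\S+) peername (\S+) is invalid for restore request")


def summarize_progress_log(text):
    """Return (status, client, peername); any part not found stays None."""
    status = client = peername = None
    for line in text.splitlines():
        m = STATUS_RE.search(line)
        if m:
            status = int(m.group(1))
        m = REJECT_RE.search(line)
        if m:
            client, peername = m.group(1), m.group(2)
    return status, client, peername


sample = """Restore started Wed Jun 23 09:44:43 2010
09:44:58 client pdgis-ora peername PDGIS-ORA is invalid for restore request
09:44:59 INF - Server status = 37
"""

print(summarize_progress_log(sample))  # (37, 'pdgis-ora', 'PDGIS-ORA')
```

Seeing the client name and the peername side by side makes it obvious that the server is comparing two names, which is the clue that matters later in this analysis.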
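Next I worried that some parameters were not set correctly when RMAN accessed the NetBackup device directly. I copied NetBackup's restore script file, modified it into a new restore script, and ran the edited script; the problem persisted.
Problem analysis
1. First I looked for a fix for the ORA-19511 error and found the following article via Google.
The article says:
After searching MetaLink and Google, the suggestion is to check the following.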
On a Windows master server, run the command:
VERITAS\NetBackup\bin\admincmd\bpgetconfig
DISALLOW_CLIENT_LIST_RESTORE = YES
DISALLOW_CLIENT_RESTORE = YES
It recommends changing these to:
DISALLOW_CLIENT_LIST_RESTORE = NO
DISALLOW_CLIENT_RESTORE = NO
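Alternatively, in the NetBackup administration software, edit the master server properties, select Client Attributes,
and tick Allow client browse and Allow client restore.
Then restart the NetBackup services, after which restore archivelog runs normally.
I logged in to the NetBackup server and found that Allow client browse and Allow client restore were already ticked for this client, so this did not solve the problem.
2. Next I searched for ORA-27029: skgfrtrv: sbtrestore returned error and found the following article,
then ran the sbttest command to test whether the channel works. The result: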
[oracle@pdgisdb bin]$ sbttest sbt_tape
The sbt function pointers are loaded from libobk.so library.
-- sbtinit succeeded
-- sbtinit (2nd time) succeeded
sbtinit: Media manager supports SBT API version 2.0
sbtinit: Media manager is version 5.0.0.0
sbtinit: vendor description string=Veritas NetBackup for Oracle - Release 6.5 (2007072323)
sbtinit: allocated sbt context area of 8 bytes
sbtinit: proxy copy is supported
-- sbtinit2 succeeded
-- regular_backup_restore starts ................................
-- sbtbackup succeeded
write 100 blocks
-- sbtwrite2 succeeded
-- sbtclose2 succeeded
sbtinfo2: SBTBFINFO_NAME=sbt_tape
sbtinfo2: SBTBFINFO_SHARE=multiple users
sbtinfo2: SBTBFINFO_ORDER=sequential access
sbtinfo2: SBTBFINFO_LABEL=G:
sbtinfo2: SBTBFINFO_CRETIME=Wed Jun 23 10:55:41 2010
sbtinfo2: SBTBFINFO_EXPTIME=Sat Jul 24 10:55:41 2010
sbtinfo2: SBTBFINFO_COMMENT=Backup ID : pdgis-ora_1277261741
sbtinfo2: SBTBFINFO_METHOD=stream
-- sbtinfo2 succeeded
MMAPI error from sbtrestore: 7501, Failed to open backup file for restore.
-- sbtrestore failed
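In this output sbtbackup, sbtwrite2 and sbtclose2 all succeed, while sbtrestore fails with MMAPI error 7501. This shows that the failed RMAN archived log restore is not an Oracle problem but a media management problem: the NetBackup server or the NetBackup client is misconfigured.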
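To see at a glance which SBT phase fails, the `-- <phase> succeeded/failed` lines of the sbttest output can be extracted. This is a rough Python sketch under my own assumptions: the helper names `sbttest_phases` and `first_failure` are hypothetical, and the pattern is derived only from the output format shown above.

```python
import re

# Matches lines such as "-- sbtinit succeeded", "-- sbtinit (2nd time) succeeded"
# and "-- sbtrestore failed"; other "--" lines (e.g. "... starts ....") are ignored.
PHASE_RE = re.compile(r"^-- (\S+)(?: \(.*?\))? (succeeded|failed)\s*$")


def sbttest_phases(text):
    """Return a list of (phase, result) tuples in the order they appear."""
    phases = []
    for line in text.splitlines():
        m = PHASE_RE.match(line.strip())
        if m:
            phases.append((m.group(1), m.group(2)))
    return phases


def first_failure(text):
    """Return the name of the first failed phase, or None if nothing failed."""
    for name, result in sbttest_phases(text):
        if result == "failed":
            return name
    return None


sample = """-- sbtinit succeeded
-- sbtinit (2nd time) succeeded
-- sbtinit2 succeeded
-- regular_backup_restore starts ................................
-- sbtbackup succeeded
-- sbtwrite2 succeeded
-- sbtclose2 succeeded
-- sbtinfo2 succeeded
MMAPI error from sbtrestore: 7501, Failed to open backup file for restore.
-- sbtrestore failed
"""

print(first_failure(sample))  # sbtrestore
```

A backup phase that succeeds followed by a restore phase that fails, as here, points away from the tape channel itself and towards how the restore request is validated.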
Searching Oracle MetaLink for "ORA-19511: Error received from media manager layer, error text: Failed to open backup file for restore." turned up the following article.
ORA-27029, ORA-19511 and Veritas NetBackup status code 135. [ID 335850.1]
The article explains the cause of the problem in detail:
Symptoms
RMAN-03002: failure of restore command at 09/27/2005 12:41:54
ORA-: failed to retrieve sequential file, handle="cntrl_1984_1_569535276",parms=""
ORA-: skgfrtrv: sbtrestore returned error
ORA-: Error received from media manager layer, error text: Failed to open for restore.
NetBackup status code 135.
Cause
Media Manager unable to find the backup file.
NetBackup status code 135: client peername is invalid for restore request
The master server authenticates the host requesting an Oracle RMAN restore by
performing a reverse IP lookup, gethostbyaddr(). However, the packet transporting
the restore request was transmitted from an interface on the client which resolves
to a hostname which does not match the client name which performed the backup.
Hence, the NBU master server rejects the restore request.
Solution
Check that the Hostname or Ipaddress is set properly.
Refer
Article from Veritas:
It also points to the related Veritas technical document,
which describes the problem in detail as follows:
When using NetBackup Database extension for Oracle, a restore fails with RMAN error ORA-27029 and NetBackup status code 135.
Exact Error Message
ORA-27029: skgfrtrv: sbtrestore returned error
status code 135: client peername is invalid for restore request
Details:
Detailed Problem Description:
The master server authenticates the host requesting an Oracle RMAN restore by performing a reverse IP lookup, gethostbyaddr(). However, the packet transporting the restore request was transmitted from an interface on the client which resolves to a hostname which does not match the client name which performed the backup. Hence, the NBU master server rejects the restore request.
The host in the following example is named devo which resolves to NIC 172.31.46.28 and has a second NIC, named devo-b which resolves to 10.1.100.10, over which the backups and restores are to occur. The CLIENT_NAME in /usr/openv/netbackup/bp.conf is set to 'devo-b'.
The /usr/openv/netbackup/logs/bprd on the master server logged the failed validation request, peername is the result of gethostbyaddr():
16:08:26 [12763] <4> get_ccname: configured name is: devo-b
16:08:26 [12763] <2> process_request: restore request 66, bufr = 329199 66 oracle oinstall devo-b devo devo devo-b /usr/openv/netbackup/logs/user_ops/dbext/logs/11876.0.1019592506 NONE NONE 0 1019592506 1019586470 1019586470 1019592506 4 0 0 0 0 12 0 4 0 1 10004 0 0 0 C C C C C 0 1 0 1 0 0 0 0 9
16:08:26 [12763] <2> process_request: As rcvd from client:
16:08:26 [12763] <2> process_request: browse_clnt: devo-b
16:08:26 [12763] <2> process_request: requesting_clnt: devo
16:08:26 [12763] <2> process_request: destination_clnt: devo
16:08:26 [12763] <2> process_request: clnt_bp_conf_name: devo-b
16:08:26 [12763] <2> process_request: peername: devo-b
16:08:26 [12763] <2> process_request: ccname: devo-b
16:08:26 [12763] <2> process_request: keyword =
16:08:26 [12763] <2> process_request: restore_format: 0
16:08:26 [12763] <2> process_request: true_image: 0
16:08:26 [12763] <2> process_request: mpx_restore_possible: 1
16:08:26 [12763] <4> get_type_of_client_port: db_getCLIENT() failed: no entity was found (227)
16:08:26 [12763] <2> validate_hostname:Unknown hostname devo, switching to peername devo-b.
...lines deleted...
16:08:27 [12763] <4> get_type_of_client_free_browse: db_getCLIENT_by_hostname() failed: no entity was found (227)
...lines deleted...
16:08:27 [12763] <16> process_request: client devo-b peername devo-b is invalid for restore request
The error can also be verified by inspecting the progress file on the client if the /usr/openv/netbackup/logs/bprd log is not available, because the master server recorded the error in the progress file ( /usr/openv/netbackup/logs/user_ops/dbext/logs/ ):
16:08:27 client devo-b peername devo-b is invalid for restore request
16:08:28 INF - Server status = 135
Likewise, the /usr/openv/netbackup/logs/dbclient log on the Oracle database host also reflects the progress file entry, rejecting the restore request:
System name: SunOS
Node name: devo
Release: 5.8
Version: Generic_108528-12
Machine: sun4u
User name: oracle
Group name: oinstall
Client Host: devo
...lines deleted...
15:29:02 [26493] <4> sendRequest: sending RESTORE request to bprd
15:29:02 [26493] <4> sendRequest: request:
15:29:02 [26493] <2> getsockconnected: host=nbu service=bprd address=10.1.100.29 protocol=tcp non-reserved port=13720
15:29:02 [26493] <2> bind_on_port_addr: bound to port 52000
15:29:02 [26493] <2> bprd_connect: no authentication required
15:29:03 [26493] <4> sendRequest: sending buf = 1019586470 1019586470 /cf_TRMAN2_t459959266_s112_p1
15:29:03 [26493] <4> sendRequest: Date range: ,
15:29:03 [26493] <4> serverResponse: entering serverResponse.
15:29:03 [26493] <4> serverResponse: initial client_read_timeout = <900>
15:29:08 [26493] <4> serverResponse: read comm client devo-b peername devo-b is invalid for restore request>
15:29:08 [26493] <4> serverResponse: read comm INF - Server status = 135>
15:29:08 [26493] <16> serverResponse: ERR - server exited with status 135: client is not validated to perform the requested operation
...lines deleted...
15:29:08 [26493] <4> closeApi: INF - EXIT STATUS 5: the restore failed to recover the requested files
Additional Environment Information:
Oracle 8.1.6, Solaris 2.6/Solaris 7/Solaris 8, HP-UX 11.00/HP-UX 11.11
Solution:
Add a REQUIRED_INTERFACE = entry to the /usr/openv/netbackup/bp.conf file on the client host.
Example:
CLIENT_NAME = devo-b
REQUIRED_INTERFACE = devo-b
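Resolution
The analysis and the documents above pinned down the cause: the client name configured in NetBackup differs from the hostname of the client machine, so the restore request cannot be validated against the client that made the backup, and the restore fails.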
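This kind of mismatch can be checked by comparing the CLIENT_NAME entry in the client's bp.conf with the machine's hostname. The sketch below is only an illustration in Python: the helper names are hypothetical, the sample bp.conf content is made up (the SERVER value in particular is a placeholder), the hostname `pdgisdb` is taken from the shell prompt of the sbttest session, and the exact, case-sensitive comparison is my assumption rather than NetBackup's documented matching rule.

```python
def parse_bp_conf(text):
    """Parse 'KEY = value' lines into a dict of lists (a key may appear more than once).

    Lines without '=' are skipped.
    """
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries.setdefault(key.strip(), []).append(value.strip())
    return entries


def client_name_matches(bp_conf_text, hostname):
    """True when the last CLIENT_NAME entry equals hostname exactly (assumed rule)."""
    names = parse_bp_conf(bp_conf_text).get("CLIENT_NAME", [])
    return bool(names) and names[-1] == hostname


# Hypothetical bp.conf content; "nbu-master" is a placeholder, not the real server name.
sample_bp_conf = """SERVER = nbu-master
CLIENT_NAME = pdgis-ora
"""

print(client_name_matches(sample_bp_conf, "pdgisdb"))    # False
print(client_name_matches(sample_bp_conf, "pdgis-ora"))  # True
```

On a real client the text would come from /usr/openv/netbackup/bp.conf and the hostname from `socket.gethostname()`.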
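I changed the client name on the NetBackup server to the hostname of the NetBackup client and reran sbttest sbt_tape; this time the restore test passed.
The archived log restore then succeeded as well: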
RMAN>run{
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
SET ARCHIVELOG DESTINATION TO '/archivelog';
RESTORE ARCHIVELOG SEQUENCE BETWEEN 17018 and 17105;
release channel ch00;}
allocated channel: ch00
channel ch00: sid=914 devtype=SBT_TAPE
channel ch00: Veritas NetBackup for Oracle - Release 6.5 (2007072323)
executing command: SET ARCHIVELOG DESTINATION
Starting restore at 23-JUN-10
channel ch00: starting archive log restore to user-specified destination
archive log destination=/archivelog
channel ch00: restoring archive log
archive log thread=1 sequence=17018
......
channel ch00: restoring archive log
archive log thread=1 sequence=17035
channel ch00: reading from backup piece al_1029_1_721945418
channel ch00: restored backup piece 1
piece handle=al_1029_1_721945418 tag=TAG20100617T202337
channel ch00: restore complete, elapsed time: 00:06:07
channel ch00: starting archive log restore to user-specified destination
......
archive log thread=1 sequence=17105
channel ch00: reading from backup piece al_1039_1_722031789
channel ch00: restored backup piece 1
piece handle=al_1039_1_722031789 tag=TAG20100618T202308
channel ch00: restore complete, elapsed time: 00:06:27
Finished restore at 23-JUN-10
released channel: ch00
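Analysis and summary
When configuring a client on the NetBackup server, the client name should preferably be identical to the hostname of the NetBackup client machine. This avoids ending up with backups that cannot be restored.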
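Since the master server validates a restore request through a reverse lookup (gethostbyaddr(), as the Veritas document explains), a similar check can be run ahead of time instead of at restore time. The following Python sketch approximates that validation under stated assumptions: `peername_matches` and `fake_resolver` are hypothetical names, the exact string comparison is my simplification, and the addresses and names come from the devo / devo-b example in the Veritas document rather than from our own environment.

```python
import socket


def reverse_name(ip, resolver=socket.gethostbyaddr):
    """Return the primary hostname a reverse lookup gives for ip, or None on failure."""
    try:
        return resolver(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None


def peername_matches(ip, client_name, resolver=socket.gethostbyaddr):
    """True when the reverse lookup of ip yields exactly client_name (assumed rule)."""
    return reverse_name(ip, resolver) == client_name


def fake_resolver(ip):
    """Stand-in for gethostbyaddr using the two NICs from the Veritas example."""
    table = {"172.31.46.28": "devo", "10.1.100.10": "devo-b"}
    if ip not in table:
        raise socket.herror(1, "Unknown host")
    return (table[ip], [], [ip])


# A request sent from the first NIC resolves to "devo", not the backup client "devo-b".
print(peername_matches("172.31.46.28", "devo-b", fake_resolver))  # False
print(peername_matches("10.1.100.10", "devo-b", fake_resolver))   # True
```

Run without the `resolver` argument, the same functions use the real `socket.gethostbyaddr` of the machine they execute on, so the result depends on that machine's name resolution.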