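This morning the application team reported that they could not connect to the database. Logging in to the servers, I found that connections to node 2 failed while node 1 was still reachable.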
[oracle@xsdbd32 ~]$ sqlplus testconn/oracle@10.10.10.159:1521/ngjkdb1
SQL*Plus: Release 11.2.0.4.0 Production on Fri Oct 12 11:07:19 2018
Copyright (c) 1982, 2013, Oracle. All rights reserved.
ERROR:
ORA-12537: TNS:connection closed
[oracle@xsdbd31 ~]$ sqlplus system/oracle@10.10.10.10:1521/ngjkdb1
SQL*Plus: Release 11.2.0.4.0 Production on Fri Oct 12 09:27:26 2018
Copyright (c) 1982, 2013, Oracle. All rights reserved.
ERROR:
ORA-01017: invalid username/password; logon denied
Checked the listener status on both nodes; everything looked normal.
[oracle@xsdbd31 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 12-OCT-2018 09:26:12
Copyright (c) 1991, 2013, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date 12-OCT-2018 09:25:49
Uptime 0 days 0 hr. 0 min. 22 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /grid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File /grid/app/grid/diag/tnslsnr/xsdbd31/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.10)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.158)(PORT=1521)))
Services Summary...
Service "lzldb1" has 1 instance(s).
Instance "lzldb11", status READY, has 1 handler(s) for this service...
Service "lzldb1XDB" has 1 instance(s).
Instance "lzldb11", status READY, has 1 handler(s) for this service...
The command completed successfully
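Test connectivity
Node 1 connects normally (ORA-01017 is returned by the database itself, so the listener handed the connection off and only the password was wrong)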
[oracle@xsdbd31 ~]$ sqlplus system/oracle@10.10.10.10:1521/lzldb1
SQL*Plus: Release 11.2.0.4.0 Production on Fri Oct 12 09:27:26 2018
Copyright (c) 1982, 2013, Oracle. All rights reserved.
ERROR:
ORA-01017: invalid username/password; logon denied
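The distinction used above (ORA-01017 means the request reached the database, ORA-12537 means it died on the way) can be turned into a quick check. The following is only a sketch: the function name classify_sqlplus_output and the labels it prints are made up for illustration, and treating every ORA-125xx code as a listener/network failure is a rough heuristic, not an Oracle rule.

```shell
# Classify the first ORA- error found in captured sqlplus output (read on stdin).
# Prints one of (labels are hypothetical):
#   reached-database  the database answered, e.g. ORA-01017 bad password
#   listener-path     the failure is in the TNS/listener layer (ORA-125xx)
#   other             some other ORA- error
#   no-error          no ORA- error in the output
classify_sqlplus_output() {
    awk '
        match($0, /ORA-[0-9]+/) {
            code = substr($0, RSTART + 4, RLENGTH - 4)
            if (code == "01017")     print "reached-database"
            else if (code ~ /^125/)  print "listener-path"
            else                     print "other"
            found = 1
            exit
        }
        END { if (!found) print "no-error" }
    '
}

# Possible usage, with a deliberately wrong password so nothing is logged in:
#   echo exit | sqlplus -L baduser/badpass@10.10.10.159:1521/lzldb1 2>&1 | classify_sqlplus_output
```

Run against each VIP in turn, a result of reached-database on one node and listener-path on the other reproduces the picture seen in this incident.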
Node 2 is abnormal (a test user is created first, then used to connect through the listener)
SQL> create user testconn identified by oracle;
User created.
SQL> grant connect to testconn;
Grant succeeded.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
[oracle@xsdbd32 ~]$ sqlplus testconn/oracle@10.10.10.159:1521/lzldb1
SQL*Plus: Release 11.2.0.4.0 Production on Fri Oct 12 09:58:57 2018
Copyright (c) 1982, 2013, Oracle. All rights reserved.
ERROR:
ORA-12537: TNS:connection closed
Enter user-name: testconn
Enter password:
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL>
The login succeeded, but it is not clear which node I am connected to
SQL> grant dba to testconn
[oracle@xsdbd32 ~]$ sqlplus testconn/oracle@10.10.10.159:1521/lzldb1
SQL*Plus: Release 11.2.0.4.0 Production on Fri Oct 12 11:07:19 2018
Copyright (c) 1982, 2013, Oracle. All rights reserved.
ERROR:
ORA-12537: TNS:connection closed
Enter user-name: testconn
Enter password:
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> show parameter name
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cell_offloadgroup_name string
db_file_name_convert string
db_name string lzldb1
db_unique_name string lzldb1
global_names boolean FALSE
instance_name string lzldb12
lock_name_space string
log_file_name_convert string
processor_group_name string
service_names string lzldb1
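This is indeed node 2 (instance_name is lzldb12)
What this tells us: the first attempt, which goes through the listener, fails with a TNS error. The retry at the "Enter user-name:" prompt carries no connect string, so it is a local connection that bypasses the listener, and it succeeds.
So the instance itself is up and accepts logins; the failure lies somewhere on the path through the listener.
The alert log shows nothing abnormal, and CPU, memory and swap are all idle
Node 2 listener log: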
12-OCT-2018 09:35:20 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=user))(SERVICE_NAME=lzldb1)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=user)8
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Linux Error: 32: Broken pipe
12-OCT-2018 09:35:20 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=user))(SERVICE_NAME=lzldb1)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=user)8
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Linux Error: 32: Broken pipe
12-OCT-2018 09:35:24 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=user))(SERVICE_NAME=lzldb1)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=user)8
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Linux Error: 32: Broken pipe
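When the listener log repeats the same error stack over and over, a tally per TNS code makes it easier to see what dominates. A minimal sketch, assuming the text-format listener log where each TNS- code starts its own line; the function name count_tns_errors is hypothetical.

```shell
# Count how often each TNS-nnnnn code starts a line in a listener log.
# Reads the files given as arguments, or stdin when there are none.
# Output: one "CODE COUNT" pair per line, sorted by code.
count_tns_errors() {
    awk '
        match($0, /^[ \t]*TNS-[0-9]+/) {
            code = substr($0, RSTART, RLENGTH)
            sub(/^[ \t]+/, "", code)      # tolerate indented error stacks
            count[code]++
        }
        END { for (code in count) print code, count[code] }
    ' "$@" | LC_ALL=C sort
}

# Possible usage (the log path depends on the installation):
#   count_tns_errors /path/to/listener.log
```

In the excerpt above, TNS-12518 (listener could not hand off client connection) heads every stack, which points at the step where the listener passes the connection to a server process.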
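Restarting the listener did not help. After restarting the cluster, the listener log still reported the same errors and node 2 still could not be reached through the listener
Check the TCP side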
[root@xsdbd31 ~]# netstat -anop|grep tnslsnr
tcp 0 0 10.10.10.160:1521 0.0.0.0:* LISTEN 34242/tnslsnr off (0.00/0/0)
tcp 0 0 10.10.10.158:1521 0.0.0.0:* LISTEN 33996/tnslsnr off (0.00/0/0)
tcp 0 0 10.10.10.10:1521 0.0.0.0:* LISTEN 33996/tnslsnr off (0.00/0/0)
tcp 0 0 10.10.10.160:1521 10.10.10.11:15896 ESTABLISHED 34242/tnslsnr keepalive (2746.94/0/0)
tcp 0 0 10.10.10.158:1521 10.10.10.10:25658 ESTABLISHED 33996/tnslsnr keepalive (1337.91/0/0)
tcp 0 0 10.10.10.158:1521 10.10.10.10:25676 ESTABLISHED 33996/tnslsnr keepalive (1370.68/0/0)
tcp 0 0 10.10.10.160:1521 10.10.10.10:19531 ESTABLISHED 34242/tnslsnr keepalive (1370.68/0/0)
tcp6 0 0 ::1:52981 ::1:6100 ESTABLISHED 34242/tnslsnr keepalive (1370.68/0/0)
tcp6 0 0 ::1:52967 ::1:6100 ESTABLISHED 33996/tnslsnr keepalive (1337.91/0/0)
unix 2 [ ACC ] STREAM LISTENING 3357737141 33996/tnslsnr /var/tmp/.oracle/s#33996.1
unix 2 [ ACC ] STREAM LISTENING 3357737142 33996/tnslsnr /var/tmp/.oracle/s#33996.2
unix 2 [ ACC ] STREAM LISTENING 3357737140 33996/tnslsnr /var/tmp/.oracle/sLISTENER
unix 2 [ ACC ] STREAM LISTENING 3357740672 34242/tnslsnr /var/tmp/.oracle/sLISTENER_SCAN1
unix 2 [ ACC ] STREAM LISTENING 3357740673 34242/tnslsnr /var/tmp/.oracle/s#34242.1
unix 2 [ ACC ] STREAM LISTENING 3357740674 34242/tnslsnr /var/tmp/.oracle/s#34242.2
unix 3 [ ] STREAM CONNECTED 3357741362 34242/tnslsnr /var/tmp/.oracle/sLISTENER_SCAN1
unix 3 [ ] STREAM CONNECTED 3357730674 33996/tnslsnr /var/tmp/.oracle/sLISTENER
Trace the tnslsnr process with strace while logging in to the database with sqlplus
strace -o/tmp/tnslsnr1.log -p 33996
Looked through the generated trace file; nothing stood out
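A raw strace log is hard to read by eye. One way to narrow it down is to list only the system calls that returned an error. The sketch below assumes the usual strace line format ("call(args) = -1 ERRNO (text)"); the function name summarize_strace_failures is made up. It may also help to add -f to the strace command so that child processes forked by the listener are followed, although whether that would have exposed the cause here is not something this post verified.

```shell
# Summarize failed system calls in a strace log.
# Reads the files given as arguments, or stdin when there are none.
# Output: one "SYSCALL ERRNO COUNT" triple per line, sorted.
summarize_strace_failures() {
    awk '
        match($0, /= -1 E[A-Z0-9]+/) {
            errno = substr($0, RSTART + 5, RLENGTH - 5)
            call = $0
            sub(/^[0-9]+[ \t]+/, "", call)        # drop the pid prefix written by -f -o
            if (call ~ /^<\.\.\. /) {             # "<... read resumed>" continuation lines
                sub(/^<\.\.\. /, "", call)
                sub(/ resumed.*/, "", call)
            } else {
                sub(/\(.*/, "", call)             # keep only the syscall name
            }
            count[call " " errno]++
        }
        END { for (k in count) print k, count[k] }
    ' "$@" | LC_ALL=C sort
}

# Possible usage:
#   summarize_strace_failures /tmp/tnslsnr1.log
```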
Checked the owner of the listener processes and the permissions on the listener executable; no problem there
[root@xsdbd31 ~]# ps -ef|grep tnslsnr
grid 33996 1 0 09:25 ? 00:00:01 /grid/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit
grid 34242 1 0 09:26 ? 00:00:00 /grid/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
[grid@xsdbd32 ~]$ ls -lrt $ORACLE_HOME/bin/tnslsnr
-rwxr-x--x. 1 grid oinstall 974016 Jan 26 2018 /grid/app/11.2.0/grid/bin/tnslsnr
[root@xsdbd32 ~]# su - oracle
Last login: Fri Oct 12 12:13:38 CST 2018 on pts/0
[oracle@xsdbd32 ~]$ ls -lrt $ORACLE_HOME/bin/tnslsnr
-rwxr-x--x. 1 oracle oinstall 974016 Jan 25 2018 /oracle/app/oracle/product/11.2.0/db_1/bin/tnslsnr
Check the permissions on the oracle executable
[oracle@xsdbd32 trace]$ cd $ORACLE_HOME/bin
[oracle@xsdbd32 bin]$ ls -l oracle
-rwxr-s--x. 1 oracle asmadmin 239889136 Oct 11 23:27 oracle
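An s is missing here: the owner bits read rwx instead of rws, so the setuid bit is gone (mode 2751). The correct permissions are 6751: -rwsr-s--x oracle asmadmin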
chmod 6751 oracle
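Since a single missing s is easy to overlook, the comparison can be left to a script. This is a sketch only: the function name check_oracle_mode is made up, and it simply inspects the mode column printed by ls -l for the two s characters that mode 6751 produces.

```shell
# Inspect the mode column of `ls -l` output (for example "-rwxr-s--x.") and
# report whether the setuid and setgid bits expected by mode 6751 are present.
# Prints "ok", or a space-separated list of what is missing.
check_oracle_mode() {
    mode=$1
    problems=""
    case "$mode" in
        ???s*) ;;                                  # 4th character: owner execute slot shows setuid
        *) problems="setuid-missing" ;;
    esac
    case "$mode" in
        ??????s*) ;;                               # 7th character: group execute slot shows setgid
        *) problems="${problems:+$problems }setgid-missing" ;;
    esac
    echo "${problems:-ok}"
}

# Possible usage:
#   check_oracle_mode "$(ls -l "$ORACLE_HOME/bin/oracle" | awk '{print $1}')"
```

For the listing shown above, -rwxr-s--x., this reports setuid-missing; after chmod 6751 it reports ok.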
After restarting the database instance and the listener on node 2, instance 2 finally accepted logins normally
[oracle@xsdbd32 ~]$ sqlplus testconn/oracle@10.10.10.159/lzldb1
SQL*Plus: Release 11.2.0.4.0 Production on Fri Oct 12 12:24:48 2018
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL>
However, the listener logs on both node 1 and node 2 were still reporting errors:
12-OCT-2018 12:17:34 * <unknown connect data> * 12537
TNS-12537: TNS:connection closed
TNS-12560: TNS:protocol adapter error
TNS-00507: Connection closed
Linux Error: 115: Operation now in progress
Fri Oct 12 12:17:39 2018
12-OCT-2018 12:17:39 * <unknown connect data> * 12537
TNS-12537: TNS:connection closed
TNS-12560: TNS:protocol adapter error
TNS-00507: Connection closed
Linux Error: 115: Operation now in progress
12-OCT-2018 12:17:41 * service_update * lzldb11 * 0
12-OCT-2018 12:17:44 * <unknown connect data> * 12537
TNS-12537: TNS:connection closed
TNS-12560: TNS:protocol adapter error
TNS-00507: Connection closed
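The application is back to normal now, and the database can be reached through both the SCAN IP and the VIPs.
"unknown connect data" means that a client reached the listener without sending valid connect data; in other words, something opened a connection to the listener port but supplied no legitimate connection information.
Turn on listener tracing to chase this down (for listener tracing, see my article https://blog.csdn.net/qq_40687433/article/details/83089218):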
LSNRCTL> set trc_level 16
LSNRCTL> show trc_file
Read the raw trace file directly, without formatting it, and look for the client IP in the trace
2018-10-15 17:56:13.800070 : nttvlser:valid node check on incoming node 10.xxx.xxx.xx4
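To see which clients are hitting the listener, the nttvlser lines can be tallied per address. A minimal sketch, assuming each such line ends with the client IP as in the trace line above; the function name count_incoming_nodes is hypothetical.

```shell
# Count listener trace lines of the form
#   "... nttvlser:valid node check on incoming node <ip>"
# per client address. Reads the files given as arguments, or stdin.
# Output: one "IP COUNT" pair per line, sorted by IP text.
count_incoming_nodes() {
    awk '
        /nttvlser:valid node check on incoming node/ { count[$NF]++ }
        END { for (ip in count) print ip, count[ip] }
    ' "$@" | LC_ALL=C sort
}

# Possible usage, with the file name reported by "show trc_file":
#   count_incoming_nodes /path/to/listener_trace.trc
```

An address that shows up every few seconds, matching the cadence of the "unknown connect data" entries in the listener log, is a likely candidate for a port probe or monitoring check.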
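Summary:
1. After patching, or after any major operation on the database, test connectivity by hand. A listener that looks healthy does not guarantee that clients can connect.
2. The permissions on $ORACLE_HOME/bin/oracle are easily changed by patching and then have to be restored by hand.
In fact, after applying the GI patch the database would not start. Changing the permissions on $ORACLE_HOME/bin/oracle did not help at first; it only came up after I restarted CRS and started it again.
$ORACLE_HOME/bin/oracle also affects the listener, as in this incident: every resource looked normal, yet connections still failed.
3. DBA work demands attention to detail. I had already been bitten by $ORACLE_HOME/bin/oracle many times and made a point of checking its permissions, but I still read them carelessly and did not notice the missing s.
Problems like this are hard to trace to their root cause. It took a lot of effort, and the fix was nothing more than a permission change.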
Source: CSDN
Author: Littleforest62
Link: https://blog.csdn.net/qq_40687433/article/details/83027830