AIX System Inspection for Midrange (Minicomputer) Servers

Submitted by 吃可爱长大的小学妹 on 2019-12-10 19:46:16

I. Hardware and Indicator-Light Inspection

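Inspect the hardware indicator lights and check whether any fault LED on the system hardware is lit. The servers are managed through the HMC (Hardware Management Console).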
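1. Power
(1) Check the main power LED: steady green means the system is running.
(2) Check the DC power-supply LED: steady green means the supply is delivering power normally.
2. Fans
Check the power-supply fans: confirm that they are spinning and cooling the unit.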

3. Further checks:

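(1) System error report
a. # errpt -d H -T PERM                      // permanent hardware errors
b. # errpt -d S -T PERM                      // permanent software errors
c. # errpt -aj ******* | more                // full details of one error (******* is the error IDENTIFIER)
d. # errpt -d H -T PERM > /tmp/hwerror.log   // save the report to a file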

(2) System hardware diagnostics
#diag
-> Diagnostic Routines
-> System Verification
-> All Resources
-> F7 or Esc+7

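(3) Show whether the running kernel is 32-bit or 64-bit:
#bootinfo -K

(4) Show whether the hardware is 32-bit or 64-bit:
#bootinfo -y

(5) Show real memory in KB:
#bootinfo -r
32505856

(6) List the disks (physical volumes) on the system: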
#lspv
hdisk0 00c7c505bc0669c5 rootvg active
hdisk1 00c7c50592cdd77a rootvg active
hdisk2 00cb9934c0a92e73 datavg active
hdisk3 00c7c505ce5e6688 datavg active
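The bootinfo and lspv checks above can be condensed into a one-line summary. The following sketch is illustrative and not part of the original procedure. `kb_to_mb_gb` and `disks_per_vg` are hypothetical helper names, and the parsing assumes the output layouts shown above. As a worked example of the memory figure, 32505856 KB / 1024 = 31744 MB, and 31744 MB / 1024 = 31 GB.

```shell
# Helpers for the bootinfo/lspv checks (POSIX sh + awk; hypothetical names).

# Convert the KB value printed by `bootinfo -r` to MB and GB.
kb_to_mb_gb() {
    awk -v kb="$1" 'BEGIN { printf "%d MB (%.1f GB)\n", kb / 1024, kb / 1024 / 1024 }'
}

# Count physical volumes per volume group from `lspv` output on stdin.
# Assumes the layout shown above: hdisk PVID VG state ("None" = not in any VG).
disks_per_vg() {
    awk 'NF >= 3 { count[$3]++ }
         END     { for (vg in count) print vg, count[vg] }' | sort
}

kb_to_mb_gb 32505856     # prints: 31744 MB (31.0 GB)
# On AIX: kb_to_mb_gb "$(bootinfo -r)"   and   lspv | disks_per_vg
```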

II. System Information

1. Machine name, model, and serial number:

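  • Command: # uname -Mu   // show the machine model and system ID
    IBM,8233-E8B IBM,02064F5BR
    The first part is the system model (also shown by # uname -M):
    IBM,8233-E8B
    The second part is the system ID (also shown by # uname -u):
    IBM,02064F5BR
    The machine ID of the hardware running the system (shown by # uname -m):
    00F74F5B4C00
    To show several uname fields at once (system name, node name, release, version, machine ID), use # uname -a: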
    AIX erpdb2 1 6 00F74F5B4C00

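  • Further:
    (1) List the system hardware resources:
    #lscfg
    (2) Show the processor type:
    #uname -p
    powerpc
    (3) Show the system name:
    #uname -s
    AIX
    (4) Show the node name:
    #uname -n
    erpdb2
    (5) Show several uname fields at once (system name, node name, release, version, machine ID):
    #uname -a
    AIX erpdb2 1 6 00F74F5B4C00
    (6) Show the system model:
    #uname -M
    IBM,8233-E8B
    (7) Show the operating system version:
    #uname -v
    (8) Show the machine ID of the hardware running the system:
    #uname -m
    00F74F654C00
    (9) Show the system ID:
    #uname -u
    IBM,02064F65R
    (10) Show the AIX version, release, and technology (maintenance) level:
    #oslevel -r
    6100-07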
    root@erpdb2:/root# lslpp -h bos.rte
      Fileset         Level      Action     Status      Date       Time

    Path: /usr/lib/objrepos
      bos.rte
                      6.1.7.0    COMMIT     COMPLETE    08/11/13   11:38:42

    Path: /etc/objrepos
      bos.rte
                      6.1.7.0    COMMIT     COMPLETE    08/11/13   11:38:42
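The `oslevel -r` value packs version, release, and technology level (TL) into one string. In 6100-07, the leading digits give version 6 and release 1, and 07 is the TL. This is consistent with the bos.rte level 6.1.7.0 shown above. The following minimal decoding sketch uses a hypothetical function name:

```shell
# Decode `oslevel -r` output (VRMF-TL, e.g. 6100-07) into "AIX <version>.<release> TL<nn>".
describe_oslevel() {
    vrmf=${1%%-*}          # "6100": version, release, modification, fix digits
    tl=${1#*-}             # "07":   technology level
    v=${vrmf%???}          # drop the last three digits -> version "6"
    rest=${vrmf#?}         # "100"
    r=${rest%??}           # drop the last two digits   -> release "1"
    printf 'AIX %s.%s TL%s\n' "$v" "$r" "$tl"
}

describe_oslevel 6100-07     # prints: AIX 6.1 TL07
# On AIX: describe_oslevel "$(oslevel -r)"
```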

2. Processors

(1) Run: lsdev -Cc processor
root@erpdb2:/root# lsdev -Cc processor
proc0 Available 00-00 Processor
proc4 Available 00-04 Processor
proc8 Available 00-08 Processor
proc12 Available 00-12 Processor
proc16 Available 00-16 Processor
proc20 Available 00-20 Processor
proc24 Available 00-24 Processor
proc28 Available 00-28 Processor
(2) Notes: check the number and state of the processors. A state of Available means the processor is working normally.

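3. Memory

(1) Run: lsattr -El mem0   (-E shows effective attribute values and -l names the device; the flag is the letter l, not the digit 1)
Note: in the capture below the flag was mistyped as -E1 (digit one), so lsattr rejected it and printed its usage message instead of the memory attributes.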
root@erpdb2:/root# lsattr -E1 mem0
lsattr: Not a recognized flag: 1

Usage:
lsattr {-D[-O]| -E[-O] | -F Format [-Z Character]} -l Name [-a Attribute]…[-H]
[-f File]
lsattr {-D[-O]| -F Format [-Z Character]}{[-c Class][-s Subclass][-t Type]}
[-a Attribute]… [-H][-f File]
lsattr -R {-l Name | [-c Class][-s Subclass][-t Type]} -a Attribute [-H]
[-f File]
lsattr {-l Name | [-c Class][-s Subclass][-t Type]} -o Operation […]
-F Format [-Z Character][-f File][-H]
lsattr -h
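(2) Notes: check the amount and state of memory. If the size attribute (total memory) equals goodsize (usable memory), the memory is working normally.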

(3) Further:
Check whether paging-space (swap) usage exceeds 70%:
root@erpdb2:/dev# lsps -a
Page Space Physical Volume Volume Group Size %Used Active Auto Type Chksum
hd6 hdisk0 rootvg 4096MB 1 yes yes lv 0
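The mem0 capture above failed, so the following is a hedged sketch of the size/goodsize comparison described in (2). It assumes that `lsattr -El mem0` prints one attribute per line, beginning with the attribute name and its value in MB. `check_mem0` is a hypothetical helper name.

```shell
# Compare the size and goodsize attributes from `lsattr -El mem0` output on stdin.
# Exit status: 0 = equal, 1 = different, 2 = attributes not found.
check_mem0() {
    awk '$1 == "size"     { size = $2 }
         $1 == "goodsize" { good = $2 }
         END {
             if (size == "" || good == "") { print "UNKNOWN: size/goodsize not found"; exit 2 }
             if (size == good) { print "OK size=" size " goodsize=" good; exit 0 }
             print "WARN size=" size " goodsize=" good; exit 1
         }'
}

# On AIX: lsattr -El mem0 | check_mem0
```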

4. Disks

(1) Run: lsdev -Cc disk
root@erpdb2:/root# lsdev -Cc disk
hdisk0 Available Virtual SCSI Disk Drive
hdisk1 Defined 02-00-02 Other FC SCSI Disk Drive
hdisk2 Defined 02-01-02 Other FC SCSI Disk Drive
hdisk3 Defined 02-00-02 Other FC SCSI Disk Drive
hdisk4 Defined 02-01-02 Other FC SCSI Disk Drive
hdisk5 Defined 02-01-02 Other FC SCSI Disk Drive
hdisk6 Defined 02-01-02 Other FC SCSI Disk Drive
hdisk7 Available 01-T1-01 Hitachi Disk Array (Fibre)
hdisk8 Available 01-T1-01 Hitachi Disk Array (Fibre)
hdisk9 Available 01-T1-01 Hitachi Disk Array (Fibre)
hdisk10 Available 01-T1-01 Hitachi Disk Array (Fibre)
hdisk11 Available 01-T1-01 Hitachi Disk Array (Fibre)
hdisk12 Available 01-T1-01 Hitachi Disk Array (Fibre)
hdisk13 Available 01-T1-01 Hitachi Disk Array (Fibre)
hdisk14 Available 01-T1-01 Hitachi Disk Array (Fibre)
hdisk15 Available 01-T1-01 Hitachi Disk Array (Fibre)
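(2) Notes: check the number and state of the disks. A state of Available means the disk is working normally. A state of Defined means it is configured but not currently usable.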
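Instead of scanning the list by eye, a short filter can print only the devices that are not Available. This sketch assumes the `name state ...` column order shown above, which also applies to the processor and adapter listings. `not_available` is a hypothetical helper name.

```shell
# Print devices that are not Available from `lsdev -Cc <class>` output on stdin.
# Column 1 is the device name, column 2 its state. Exit status 1 if any were found.
not_available() {
    awk '$2 != "Available" { print $1, $2; bad++ }
         END { if (bad) exit 1 }'
}

# On AIX, for example:
#   lsdev -Cc disk | not_available || echo "some disks are not Available"
```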

5. Adapters

(1) Run: lsdev -Cc adapter
root@erpdb2:/root# lsdev -Cc adapter
ent0 Available Virtual I/O Ethernet Adapter (l-lan)
ent1 Available Virtual I/O Ethernet Adapter (l-lan)
ent2 Available Logical Host Ethernet Port (lp-hea)
fcs0 Defined 02-00 4Gb FC PCI Express Adapter (df1000fe)
fcs1 Defined 02-01 4Gb FC PCI Express Adapter (df1000fe)
fcs2 Available 01-T1 Virtual Fibre Channel Client Adapter
fcs3 Available 02-T1 Virtual Fibre Channel Client Adapter
lhea0 Available Logical Host Ethernet Adapter (l-hea)
sa0 Available 01-00 4 Port Async EIA-232 PCIe Adapter
vsa0 Available LPAR Virtual Serial Adapter
vscsi0 Available Virtual SCSI Client Adapter
vscsi1 Available Virtual SCSI Client Adapter

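(2) Notes: check which adapters are configured and their state. A state of Available means the adapter is working normally.
A state of Defined means the device has been configured but is not currently in use by the running system.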

6. Paging Space

(1) Run: lsps -a
Page Space Physical Volume Volume Group Size %Used Active Auto Type Chksum
hd6 hdisk0 rootvg 4096MB 1 yes yes lv 0
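(2) Notes: check how much paging space is allocated and how much of it is used.
Size is the allocated paging space.
%Used is the current paging-space utilization. A value above 70% indicates that the system is short of memory.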
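The following sketch applies the 70% rule above to `lsps -a` output. It assumes that %Used is the fifth column, as in the capture above. `check_paging` is a hypothetical name, and the threshold is a parameter that defaults to 70.

```shell
# Flag paging spaces whose %Used exceeds a threshold (default 70), from `lsps -a` on stdin.
# Data rows look like: hd6 hdisk0 rootvg 4096MB 1 yes yes lv 0   (column 5 = %Used)
check_paging() {
    awk -v limit="${1:-70}" '
        NR == 1 { next }                      # skip the header line
        {
            status = ($5 + 0 > limit + 0) ? "WARN" : "OK"
            print $1, $5 "%", status
        }'
}

# On AIX: lsps -a | check_paging 70
```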

7. System Mirroring (Volume Group State)

(1) Run: lsvg -l rootvg
root@erpdb2:/root# lsvg -l rootvg
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd5 boot 1 1 1 closed/syncd N/A
hd6 paging 16 16 1 open/syncd N/A
hd8 jfs2log 1 1 1 open/syncd N/A
hd4 jfs2 42 42 1 open/syncd /
hd2 jfs2 50 50 1 open/syncd /usr
hd9var jfs2 42 42 1 open/syncd /var
hd3 jfs2 41 41 1 open/syncd /tmp
hd1 jfs2 41 41 1 open/syncd /home
hd10opt jfs2 42 42 1 open/syncd /opt
hd11admin jfs2 1 1 1 open/syncd /admin
lg_dumplv sysdump 12 12 1 open/syncd N/A
livedump jfs2 1 1 1 open/syncd /var/adm/ras/livedump
loglv00 jfslog 1 1 1 open/syncd N/A
lv00 jfs 1 1 1 open/syncd /var/adm/csd
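(2) Notes: check the mirror state of the root volume group.
For each logical volume, PPs should be an integer multiple of LPs. A multiple greater than 1 (for example, PPs = 2 × LPs) means the LV is mirrored. The PVs count should be consistent with this, with each copy on its own disk (for example, 2 copies on 2 PVs). Every LV STATE should end in syncd. A stale state means the mirror copies are out of sync.
In the output above, every LV has PPs equal to LPs on a single PV, so rootvg on this host is not mirrored.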
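The mirror rules above can be checked line by line. This sketch assumes the `lsvg -l` column order LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT. It reports the copy count (PPs/LPs) and any LV whose state does not end in syncd. `check_mirror` is a hypothetical helper name.

```shell
# Check mirroring and sync state for each LV in `lsvg -l <vg>` output on stdin.
check_mirror() {
    awk '
        $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ { next }   # skip the "rootvg:" and header lines
        {
            lps = $3 + 0; pps = $4 + 0
            if (lps == 0) next
            if (pps % lps != 0)     note = "PPs not a multiple of LPs"
            else if (pps / lps < 2) note = "not mirrored"
            else                    note = (pps / lps) " copies"
            if ($6 !~ /syncd$/)     note = note ", NOT synced (" $6 ")"
            print $1, note
        }'
}

# On AIX: lsvg -l rootvg | check_mirror
```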

8. File Systems

(1) Run: df -k, df -m, or df -g (sizes in KB, MB, or GB blocks)
root@erpdb2:/root# df -g
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd4 10.50 5.90 44% 11500 1% /
/dev/hd2 12.50 10.07 20% 53335 3% /usr
/dev/hd9var 10.50 8.49 20% 8830 1% /var
/dev/hd3 10.25 9.77 5% 1223 1% /tmp
/dev/hd1 10.25 10.16 1% 72 1% /home
/dev/hd11admin 0.25 0.25 1% 5 1% /admin
/proc - - - - - /proc
/dev/hd10opt 10.50 10.13 4% 10876 1% /opt
/dev/livedump 0.25 0.24 5% 46 1% /var/adm/ras/livedump
/dev/lv00 0.25 0.24 4% 18 1% /var/adm/csd
/dev/fslv00 49.81 29.24 42% 149399 3% /oracle
/dev/fslv01 199.25 54.21 73% 198 1% /archb
/dev/fslv03 98.00 60.91 38% 8047 1% /dsg
erpdb1:/archa 780.00 294.33 63% - - /archa
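(2) Notes: check the usage of the mounted file systems. Pay particular attention to dynamic file systems such as / (root), /tmp (temporary files), and /var (logs). Their usage is best kept below 70%. In the output above, /archb is at 73%.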
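The following hedged sketch applies the 70% guideline to both space usage and inode usage. It assumes the AIX df column order shown above (Filesystem, blocks, Free, %Used, Iused, %Iused, Mounted on) and skips pseudo file systems such as /proc, which print "-". `check_df` is a hypothetical name.

```shell
# Report file systems at or above a usage threshold (default 70) from AIX df output on stdin.
check_df() {
    awk -v limit="${1:-70}" '
        NR == 1 || $4 == "-" { next }        # skip the header and pseudo file systems like /proc
        {
            used = $4;  sub(/%/, "", used)
            iused = $6; sub(/%/, "", iused)
            if (used + 0 >= limit + 0)                  print $7, "space", $4
            if (iused != "-" && iused + 0 >= limit + 0) print $7, "inodes", $6
        }'
}

# On AIX: df -g | check_df 70
```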

9. Error Log

(1) Run: errpt
root@erpdb2:/root# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
AB59ABFF 0627042018 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0627042018 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0627042018 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0627042018 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0627042018 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0627042018 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0527195217 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0527195217 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0527195217 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0527195217 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0527195217 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0527195217 U U LIBLVM Remote node Concurrent Volume Group fail
3C81E43F 0320222916 P U topsvcs Late in sending heartbeat
AB59ABFF 0205172516 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0205172516 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0205172516 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0205172516 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0205172516 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0205172516 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0417173815 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0417173815 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0417173815 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0417173815 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0417173815 U U LIBLVM Remote node Concurrent Volume Group fail
AB59ABFF 0417173815 U U LIBLVM Remote node Concurrent Volume Group fail
3C81E43F 0604133314 P U topsvcs Late in sending heartbeat
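(2) Notes: check the system log for permanent hardware or software errors. Entries whose type (T column) is P (permanent) need attention. The TIMESTAMP column uses MMDDhhmmYY format. For example, 0627042018 means 2018-06-27 04:20.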
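Long errpt listings like the one above mostly repeat a few entries. The following sketch groups them by type, identifier, and resource, then counts them, so that P entries stand out. It assumes the column order IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION. `summarize_errpt` is a hypothetical name.

```shell
# Summarize `errpt` output on stdin: "<type> <identifier> <resource> <count>", sorted.
summarize_errpt() {
    awk 'NR == 1 { next }                       # skip the header line
         { key = $3 " " $1 " " $5; n[key]++ }
         END { for (k in n) print k, n[k] }' | sort
}

# On AIX: errpt | summarize_errpt
# Permanent errors only: errpt | summarize_errpt | grep '^P '
```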

10. Network Adapter Configuration

(1) Run: ifconfig -a
root@erpdb2:/root# ifconfig -a
en0: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 10.28.2.66 netmask 0xffffff00 broadcast 10.28.2.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
en1: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 10.28.249.7 netmask 0xffffff00 broadcast 10.28.249.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
en2: flags=1e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
inet 172.28.2.66 netmask 0xffffff00 broadcast 172.28.2.255
tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1

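(2) Notes: check each interface's configuration. The flags should include UP, and the IP address and netmask should be correct. AIX prints the netmask in hex, so 0xffffff00 is 255.255.255.0.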
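The following sketch condenses `ifconfig -a` into one line per IPv4 address, showing whether the interface is UP and its netmask in dotted-decimal form. It assumes the layout shown above: a header line "enX: flags=...<...>" followed by indented "inet ADDR netmask 0xHEX ..." lines. The function names are hypothetical.

```shell
# Convert a hex netmask as printed by AIX ifconfig (e.g. 0xffffff00) to dotted decimal.
hexmask_to_dotted() {
    m=$(( $1 ))
    printf '%d.%d.%d.%d\n' $(( (m >> 24) & 255 )) $(( (m >> 16) & 255 )) \
        $(( (m >> 8) & 255 )) $(( m & 255 ))
}

# Read `ifconfig -a` output on stdin and print: interface UP|DOWN IPv4-address netmask
ifconfig_summary() {
    iface=''; state=''
    while read -r first rest; do
        case $first in
            *:)                     # header line, e.g. "en0: flags=...<UP,BROADCAST,...>"
                iface=${first%:}
                case $rest in
                    *'<UP,'*|*'<UP>'*|*',UP,'*|*',UP>'*) state=UP ;;
                    *) state=DOWN ;;
                esac ;;
            inet)                   # "inet 10.28.2.66 netmask 0xffffff00 broadcast ..."
                set -- $rest
                printf '%s %s %s %s\n' "$iface" "$state" "$1" "$(hexmask_to_dotted "$3")" ;;
        esac
    done
}

# On AIX: ifconfig -a | ifconfig_summary
```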

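11. System Backup

(1) Run: lsmksysb -V -f /dev/rmt0
(-V verifies that a tape backup is readable and must be combined with -f. The captures below show the error when -f is omitted and the error when the tape drive /dev/rmt0 is not in the Available state.)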
root@erpdb2:/root# lsmksysb -V
0512-330 lsmksysb option V must be used in conjunction
with the f option.

Usage: lsmksysb [-b blocks] [-f device] [-l] [-a] [-c] [-s] [-r] [-d path]
[-B] [-D] [-L] [-V] [-n] [file_list]
-b blocks Number of 512-byte blocks to read in a single
input operation
-f device Name of device to restore the information from.
Default is /dev/rmt0
-l List backup information only.
-a Alter the tape block size if necessary to read the backup
(used only if tape device)
-c Colon separate the information listed about a backup
(used only with -l or -L options)
-s Indicates the backup is of a user volume group.
(if not specified, the default is a root volume group)
-r Restore files from the backup
Default is to list contents of backup
-d path Directory to have files restored into
(Used only with -r flag, default is current directory)
-B Display past volume group and system backups from log
-D Produces debug output
-L List fileset information (rootvg backup only).
-V Verify backup readability (tape only).
-n Do not restore ACLs, PCLs, or extended attributes.
file_list List of files to restore. (Used only with -r flag)

root@erpdb2:/dev# lsmksysb -f

0512-046 lsmksysb: Device /dev/rmt0 is not in the available state.

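(2) Notes: check that the backup tape is readable. If no errors are reported, the data on the backup tape is valid and restorable. In the capture above, /dev/rmt0 was not in the Available state, so the tape could not be verified on this host.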

(3) Further:
#lsvg -l rootvg   // check whether the data has the required backup and protection (mirroring)
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd4 jfs 1 1 1 open/syncd /
hd2 jfs 11 11 1 open/syncd /usr
hd10opt jfs 1 1 1 open/syncd /opt
oraclelv jfs2 80 160 1 open/syncd /oracle
loglv00 jfs2log 1 1 1 open/syncd N/A
testlv jfs 10 20 1 closed/syncd /tmp/test

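III. Network Connectivity Check

1. Check the configuration and state of each network interface

(1) Run: ifconfig -a
(2) Notes: as in the network adapter check in section II, each interface should be UP, with the correct IP address and netmask.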

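2. Check that the system gateway and the routing table are correct

(1) Run: netstat -rn
Note: the capture below was taken with netstat -rm. On AIX, the -m flag prints kernel memory (malloc) statistics instead of the routing table, so it does not show the gateway. Use netstat -rn to list routes with numeric addresses.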

Kernel malloc statistics:

******* CPU 0 *******
By size inuse calls failed delayed free hiwat freed
64 291 22618362 0 5 93 10484 0
128 1044 28736725 0 30 108 5242 0
256 29109 39768605 0 1817 4059 10484 0
512 29310 318252584 0 3726 754 13105 0
1024 518 14389110 0 128 42 5242 0
2048 1079 17002463 0 541 53 7863 0
4096 74 4023 0 25 13 2621 0
8192 11 24021 0 9 40 655 0
16384 256 729 0 50 20 327 0
32768 48 159 0 25 34 163 0
65536 117 366 0 76 8 163 0
131072 4 15 0 0 82 160 0

Streams mblk statistic failures:
0 high priority mblk failures
0 medium priority mblk failures
0 low priority mblk failures

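(2) Notes: confirm that the default route points to the expected gateway and that the routing table is correct.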
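The following hedged sketch pulls the default route out of `netstat -rn` output. It assumes the AIX column order Destination, Gateway, Flags, Refs, Use, If, Exp, Groups, with the default route listed under the destination "default". Verify this against your own output. `default_routes` is a hypothetical name.

```shell
# Print default route(s) as "gateway <ip> via <interface>" from `netstat -rn` on stdin.
# ASSUMPTION: column 1 = Destination, column 2 = Gateway, column 6 = If.
default_routes() {
    awk '$1 == "default" { print "gateway", $2, "via", $6; found = 1 }
         END { if (!found) { print "no default route"; exit 1 } }'
}

# On AIX: netstat -rn | default_routes
```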

3. Check whether the cluster processes are running

(1) Run: lssrc -g cluster
root@erpdb2:/# lssrc -g cluster
Subsystem Group PID Status
clstrmgrES cluster 8847392 active
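(2) Notes: every subsystem in the cluster group, including the cluster manager clstrmgrES, should be active.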
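The following sketch flags any subsystem in the `lssrc -g` output that is not active. It assumes the columns Subsystem, Group, PID, Status, and that an inoperative subsystem prints no PID, so the status is taken from the last field. `inactive_subsystems` is a hypothetical name.

```shell
# Print subsystems whose status is not "active" from `lssrc -g <group>` output on stdin.
# Exit status 1 if any are found.
inactive_subsystems() {
    awk 'NR == 1 { next }                          # skip the header line
         $NF != "active" { print $1, $NF; bad = 1 }
         END { exit bad }'
}

# On AIX: lssrc -g cluster | inactive_subsystems && echo "all cluster subsystems active"
```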

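4. Check the current cluster state, network interface state, and resource group state
(1) Run: clstat -a
root@erpdb2:/# clstat -a
ksh: clstat: not found.
(2) Notes: clstat is not in root's default PATH. On PowerHA/HACMP it is normally installed as /usr/es/sbin/cluster/clstat, so run it by its full path.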

IV. Storage Inspection

1. Check the state of external storage

(1) Run: lspath
root@erpdb2:/# lspath
Enabled hdisk0 vscsi0
Enabled hdisk7 fscsi2
Enabled hdisk8 fscsi2
Enabled hdisk9 fscsi2
Enabled hdisk10 fscsi2
Enabled hdisk11 fscsi2
Enabled hdisk7 fscsi2
Enabled hdisk8 fscsi2
Enabled hdisk9 fscsi2
Enabled hdisk10 fscsi2
Enabled hdisk11 fscsi2
Enabled hdisk7 fscsi3
Enabled hdisk8 fscsi3
Enabled hdisk9 fscsi3
Enabled hdisk10 fscsi3
Enabled hdisk11 fscsi3
Enabled hdisk7 fscsi3
Enabled hdisk8 fscsi3
Enabled hdisk9 fscsi3
Enabled hdisk10 fscsi3
Enabled hdisk11 fscsi3
Enabled hdisk12 fscsi2
Enabled hdisk12 fscsi2
Enabled hdisk12 fscsi3
Enabled hdisk12 fscsi3
Enabled hdisk13 fscsi2
Enabled hdisk14 fscsi2
Enabled hdisk15 fscsi2
Enabled hdisk13 fscsi3
Enabled hdisk14 fscsi3
Enabled hdisk15 fscsi3
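With many MPIO paths per disk, a per-disk count is easier to review than the raw list. This sketch assumes the `status device parent` columns shown above. It prints enabled/total paths for each disk and marks a disk CHECK when any of its paths is not Enabled. `path_summary` is a hypothetical name, and disks are sorted lexically.

```shell
# Summarize MPIO paths per disk from `lspath` output on stdin:
# "<disk> <enabled>/<total> OK|CHECK"
path_summary() {
    awk 'NF >= 3 { total[$2]++; if ($1 == "Enabled") ok[$2]++ }
         END {
             for (d in total) {
                 flag = (ok[d] + 0 == total[d]) ? "OK" : "CHECK"
                 print d, (ok[d] + 0) "/" total[d], flag
             }
         }' | sort
}

# On AIX: lspath | path_summary
```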

2. Check that the database listener is running normally

(1) Run: lsnrctl status
root@erpdb2:/# lsnrctl status
ksh: lsnrctl: not found.
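(2) Notes: lsnrctl is an Oracle utility that is only in the oracle user's PATH, so run it after su - oracle (see section V).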

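3. Check the HA (two-node hot-standby) environment

(1) Run: smitty hacmp
Note: in the capture below, the FastPath was mistyped as hamp. That is why SMIT reports that no screen entries exist for it.
root@erpdb2:/# smitty hamp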

+--------------------------------------------------------------------------+
| ERROR MESSAGE |
| |
| Press Enter or Cancel to return to the |
| application. |
| |
| 1800-007 There are currently no SMIT |
| screen entries available for this FastPath. |
| This FastPath may require installation of |
| additional software before it can be accessed. |
| |
| F1=Help F2=Refresh F3=Cancel |
| Esc+8=Image Esc+0=Exit Enter=Do |
+--------------------------------------------------------------------------+
(2) Notes:

4. Check database error messages

(1) Run: errpt | more
(2) Notes:

V. Checking Database State (Oracle)

root@erpdb1:/root# su - oracle                        # switch from root to the oracle user
[oracle@erpdb1:/home/oracle/]$ sqlplus / as sysdba    # start SQL*Plus connected as SYSDBA

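1. Check data file status

select name, status from v$datafile;

2. Check tablespace status

select tablespace_name, status from dba_tablespaces;

3. Check tablespace size (allocated MB per tablespace; item 12 adds used and free space)

select tablespace_name, sum(bytes)/1024/1024 as total_mb from dba_data_files group by tablespace_name;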

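4. Check online redo log file status

select group#, status, type, member from v$logfile;
In v$logfile a blank STATUS is normal. INVALID or STALE needs attention.

5. Check rollback segment status

select segment_name, status from dba_rollback_segs;

6. Check control file status (at least three control files should be listed)

select status, name from v$controlfile;
STATUS should be empty (NULL), which means the control file name could be validated.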

7. Check resource limits related to initialization parameters

select resource_name, max_utilization, initial_allocation, limit_value from v$resource_limit;

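8. Performance (topas) and archived-log check

Use topas to view system and database performance. To verify archived logs, start rman and run:
RMAN> connect target /
RMAN> crosscheck archivelog all;
crosscheck archivelog all checks that the archived logs recorded for the database still exist in the archive destination (the location set by log_archive_dest).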

9. Oracle RAC cluster: check the running state of the instances with crs_stat -t

[oracle@erpdb2:/home/oracle/]$crs_stat -t

CRS-0184: Cannot communicate with the CRS daemon.

10. Check the Oracle listener services:

Start lsnrctl, then enter services at the LSNRCTL> prompt:
[oracle@erpdb2:/home/oracle/]$ lsnrctl

LSNRCTL for IBM/AIX RISC System/6000: Version 10.2.0.5.0 - Production on 10-DEC-2019 15:35:56

Copyright © 1991, 2010, Oracle. All rights reserved.

Welcome to LSNRCTL, type "help" for information.

LSNRCTL>

11. Check free space per tablespace (MB):

select tablespace_name,sum(bytes)/1024/1024 from dba_free_space group by tablespace_name;
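Queries 3 and 11 return allocated MB and free MB per tablespace. As a hedged sketch, their results can be joined outside the database to get used MB and used %. A tablespace with no free extents has no rows in dba_free_space, so it is kept here and reported as 100% used. The input format is an assumption: two plain files with one "NAME MB" pair per line, for example spooled from SQL*Plus with headings turned off. `ts_usage` is a hypothetical helper.

```shell
# Join "tablespace allocated_MB" (query 3) with "tablespace free_MB" (query 11)
# and print "<tablespace> <used_MB> <used_%>", sorted by name.
ts_usage() {
    awk 'NR == FNR { total[$1] = $2; next }   # 1st file: tablespace, allocated MB
                   { free[$1]  = $2 }         # 2nd file: tablespace, free MB
         END {
             for (ts in total) {
                 if (total[ts] <= 0) continue
                 used = total[ts] - free[ts]  # missing from the free list => 0 MB free
                 printf "%s %.2f %.1f%%\n", ts, used, 100 * used / total[ts]
             }
         }' "$1" "$2" | sort
}

# Example: ts_usage /tmp/ts_total.txt /tmp/ts_free.txt
```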

12. Show each tablespace's allocated, used, and free space in GB

SELECT a.tablespace_name,
round(a.bytes/1024/1024/1024,2) total,
round(b.bytes/1024/1024/1024,2) used,
round(c.bytes/1024/1024/1024,2) free,
round((b.bytes * 100) / a.bytes,2) "% USED ",
round((c.bytes * 100) / a.bytes,2) "% FREE "
FROM sys.sm$ts_avail a, sys.sm$ts_used b, sys.sm$ts_free c
WHERE a.tablespace_name = b.tablespace_name
AND a.tablespace_name = c.tablespace_name;

Because these are inner joins, a tablespace with no rows in sys.sm$ts_free (no free extents, i.e. completely full) or in sys.sm$ts_used (no segments) does not appear in the result.
