ORACLE Database Administration: I/O Performance Calibration

Submitted by 帅比萌擦擦* on 2019-12-14 01:09:40

Evaluating storage I/O performance from the database: an introduction to the Oracle 11g I/O calibration feature

Preface

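The I/O subsystem is an important part of an Oracle database, because I/O runs through every aspect of database operation. The objects it touches include redo logs, tables, indexes, the data dictionary, sort operations, undo, and so on, and every time the database reads or writes data on disk it generates disk I/O. It is often said that in a typical production database around 80% of the performance cost is I/O-related. While network, CPU, and memory have advanced rapidly, disk read/write speeds have lagged behind, so many performance bottlenecks end up concentrated on limited disk I/O. When an I/O bottleneck causes a performance problem, the CPU may spend much of its time waiting on I/O; we call such a system I/O-bound.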

 

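When we handle performance problems in the ZLHIS business system, most of them turn out to be I/O performance problems, which show up in three ways: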

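1. HIS is a transaction-intensive system; at peak hours it produces large bursts of concurrent operations and therefore a heavy I/O load;

2. Poorly written SQL causes excessive disk access (for example, full table scans);

3. The storage hardware itself has I/O performance problems.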

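The figure above is an AWR report from a real customer environment with an I/O bottleneck: the Top 5 wait events are dominated by I/O waits. When dealing with problems like this, we usually assume that the storage's I/O performance meets business needs and set the storage itself aside, focusing instead on causes 1 and 2, that is, excessive I/O caused by application design or poorly written SQL. Sometimes, however, the root cause is precisely the storage. In the past we would then either gauge storage performance subjectively by copying files and running read/write operations, or ask the hardware vendor to help with the analysis. The first approach gives us no metrics with which to characterize the storage, and is especially unconvincing for storage that only bottlenecks under short bursts of concentrated I/O; the second leaves us in an awkward position if the vendor is uncooperative. What we need is a way to evaluate storage performance ourselves and produce quantitative I/O metrics that give a reliable basis for analyzing and solving the problem.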

I/O Concepts

Before evaluating storage performance, we need to understand a few I/O metrics; only then can we judge objectively whether a storage system performs well or poorly.

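IOPS (I/O Operations Per Second): the number of read and write (I/O) operations performed per second. It is mainly used to measure random-access performance and is the primary metric when I/Os are small and numerous. In a typical OLTP system, for example, higher IOPS means the storage can process more database transactions in the same amount of time.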

I/O response time (latency): the time from when the kernel issues a read or write command to the disk until it receives the response.

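Throughput: the total amount of data transferred per second in the I/O stream. Most disk performance tools report it; the simplest example is the MB/s figure Windows shows while copying files. Throughput is the useful measure for large I/Os, especially when the goal is to minimize the time needed to transfer a given amount of data. In a backup job, for example, we usually do not care how many I/Os the storage processed, only how long it takes to back up all the data.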

Together these three metrics largely characterize storage I/O performance: higher IOPS and throughput are better, and lower latency is better.
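The three metrics are related: throughput is roughly IOPS multiplied by the average request size. The following Python sketch is only an illustration (the function name is hypothetical, and 1 MB is taken as 1024 KB); it shows why the same request rate means modest MB/s for 8 KB database blocks but high MB/s for 1 MB requests, which is why IOPS suits small random I/O and throughput suits large transfers such as backups.

```python
def throughput_mb_per_s(iops: float, io_size_kb: float) -> float:
    """MB/s implied by a request rate and an average request size (1 MB = 1024 KB)."""
    return iops * io_size_kb / 1024


if __name__ == "__main__":
    # Same request rate, very different throughput depending on request size.
    print(throughput_mb_per_s(1000, 8))     # 8 KB blocks: 7.8125
    print(throughput_mb_per_s(1000, 1024))  # 1 MB reads:  1000.0
```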

I/O Calibration

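Overall storage performance results from a series of key component layers working together, including HBAs, storage switches, the storage array, and physical disks; together they determine the I/O capability of the whole system. Oracle's I/O calibration feature lets you evaluate the overall performance of this stack and determine whether an I/O performance problem is caused by the database or by the storage system. Unlike external I/O benchmarking tools, Oracle's I/O calibration has the database itself issue random reads against its own datafiles, so the results more closely reflect the performance the database actually gets from the storage. It reports the maximum IOPS and throughput the storage can sustain. To use the feature, the following conditions must be met: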

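  • Oracle Database 11g or later

  • The user must have the SYSDBA privilege

  • The TIMED_STATISTICS parameter must be TRUE

  • Asynchronous I/O must be enabled; if the datafiles are on a file system, this can be done by setting FILESYSTEMIO_OPTIONS to SETALL (or ASYNCH)

  • Confirm that the datafiles have asynchronous I/O enabled with the following SQL: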

COL NAME FORMAT A50
SELECT NAME, ASYNCH_IO FROM V$DATAFILE F, V$IOSTAT_FILE I
WHERE  F.FILE# = I.FILE_NO
AND    FILETYPE_NAME = 'Data File';
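These checks can also be scripted. The following is a minimal sketch, assuming you have already fetched the parameter values (for example from V$PARAMETER) and the ASYNCH_IO column from the query above (which reports ASYNC_ON or ASYNC_OFF) into Python dictionaries; the function name and input shapes are hypothetical. It returns the list of unmet prerequisites, so an empty list means the database is ready for calibration.

```python
from typing import Dict, List


def check_calibration_prereqs(
    params: Dict[str, str],
    datafile_asynch_io: Dict[str, str],
    connected_as_sysdba: bool,
) -> List[str]:
    """Return unmet CALIBRATE_IO prerequisites (empty list = ready).

    params: parameter name -> value, e.g. copied from V$PARAMETER (assumed shape).
    datafile_asynch_io: datafile name -> ASYNCH_IO value from the query above.
    """
    problems: List[str] = []

    # Documented prerequisite: only SYSDBA can run CALIBRATE_IO.
    if not connected_as_sysdba:
        problems.append("session does not have the SYSDBA privilege")

    # Documented prerequisite: TIMED_STATISTICS = TRUE.
    if params.get("timed_statistics", "").upper() != "TRUE":
        problems.append("timed_statistics must be TRUE")

    # Flag every datafile that does not report asynchronous I/O.
    off = sorted(name for name, flag in datafile_asynch_io.items()
                 if flag.strip().upper() != "ASYNC_ON")
    if not datafile_asynch_io:
        problems.append("no datafile ASYNCH_IO information supplied")
    elif off:
        hint = ""
        if params.get("filesystemio_options", "").upper() in ("NONE", "DIRECTIO"):
            hint = " (file-system datafiles: consider FILESYSTEMIO_OPTIONS=SETALL)"
        problems.append("asynchronous I/O is off for: " + ", ".join(off) + hint)

    return problems


if __name__ == "__main__":
    params = {"timed_statistics": "TRUE", "filesystemio_options": "NONE"}
    files = {"/u01/oradata/ORCL/users01.dbf": "ASYNC_OFF"}
    for problem in check_calibration_prereqs(params, files, connected_as_sysdba=True):
        print("-", problem)
```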

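I/O calibration is performed by calling Oracle's DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure, which issues a series of I/O-intensive, read-only workloads against the database files to determine the storage's maximum IOPS (I/O requests per second) and the maximum throughput it can sustain, MBPS (megabytes of I/O per second).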

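I/O calibration runs in two phases. In the first phase, DBMS_RESOURCE_MANAGER.CALIBRATE_IO issues random reads of database-block size across all datafiles; these sustained reads yield the storage's maximum IOPS (max_iops), and the procedure also reports the average latency observed during calibration (actual_latency). You can specify a target latency with the max_latency input parameter, the maximum tolerable latency in milliseconds for database-block-sized I/O requests. In the second phase, the procedure issues sustained 1 MB reads across all datafiles; this phase measures the other key metric, maximum throughput.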

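Supplying the num_physical_disks input parameter makes the calibration more accurate. It specifies the approximate number of physical disks in the database's storage system; if you do not know, you can enter 1, which treats the storage as a single disk.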

Here is an example of running DBMS_RESOURCE_MANAGER.CALIBRATE_IO; the code is very simple:

SET SERVEROUTPUT ON
DECLARE
  lat  INTEGER;
  iops INTEGER;
  mbps INTEGER;
BEGIN
  --DBMS_RESOURCE_MANAGER.CALIBRATE_IO (<DISKS>, <MAX_LATENCY>, iops, mbps, lat);
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO (2, 10, iops, mbps, lat);
  DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops);
  DBMS_OUTPUT.PUT_LINE ('latency  = ' || lat);
  DBMS_OUTPUT.PUT_LINE ('max_mbps = ' || mbps);
END;
/
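The same call can also be driven from a script. The sketch below assumes the python-oracledb driver and a connection already opened with SYSDBA privileges; `run_calibrate_io` is a hypothetical helper name, not part of Oracle's API. It passes the two IN arguments positionally, as in the PL/SQL block above, and reads the three OUT values back through bind variables.

```python
def run_calibrate_io(conn, num_physical_disks: int = 1, max_latency: int = 10) -> dict:
    """Call DBMS_RESOURCE_MANAGER.CALIBRATE_IO and return its OUT values.

    conn is assumed to be a python-oracledb connection opened as SYSDBA.
    The call blocks for the whole calibration run (several minutes).
    """
    cur = conn.cursor()
    try:
        # cursor.var(int) creates a NUMBER bind variable for each OUT parameter.
        max_iops = cur.var(int)
        max_mbps = cur.var(int)
        actual_latency = cur.var(int)
        cur.callproc(
            "DBMS_RESOURCE_MANAGER.CALIBRATE_IO",
            [num_physical_disks, max_latency, max_iops, max_mbps, actual_latency],
        )
        return {
            "max_iops": max_iops.getvalue(),
            "max_mbps": max_mbps.getvalue(),
            "actual_latency": actual_latency.getvalue(),
        }
    finally:
        cur.close()
```

A possible usage (not run here): `conn = oracledb.connect(user="sys", password=pw, dsn=dsn, mode=oracledb.AUTH_MODE_SYSDBA)` followed by `run_calibrate_io(conn, 2, 10)`.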

The calibration itself is simple, but note the following when running it:

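  • Only one calibration can run at a time. Never run calibrations in parallel; if you do, the calibration will not run correctly;

  • The procedure generates a very heavy I/O load, so run it while the instance is idle;

  • In a RAC environment, make sure the instances on all nodes are started;

  • The num_physical_disks input parameter is optional. It does not need to be exact: an approximate value is enough to make calibration faster and more accurate.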

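While calibration is running, you can check its status in the V$IO_CALIBRATION_STATUS view. After a successful calibration, you can query the results in the DBA_RSRC_IO_CALIBRATE view. To better illustrate the process, we demonstrate an I/O calibration on an ordinary desktop PC.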

 

V$IO_CALIBRATION_STATUS shows the run status; here you can see that the calibration is still in progress:

 

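In the performance monitor of the operating system's Task Manager, you can see heavy read I/O against every datafile; Oracle evaluates storage performance through these reads.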

 

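Finally, the DBA_RSRC_IO_CALIBRATE view shows the metrics from this calibration run. For the storage tested here, the maximum sustained rate of data-block read requests (max_iops) was 60 per second, the maximum throughput (max_mbps) was 43 MB/s, the maximum throughput for a single process (max_pmbps) was 39 MB/s, and the latency of data-block read requests (latency) was 16 ms.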

 

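Note that running the same I/O calibration twice gives somewhat different results; repeated runs will never match 100%. Storage performance depends on many factors, such as how busy the storage is at the time, temperature, and competing I/O requests, and each of these slightly affects the calibration. Overall, however, the results stay within a fairly narrow range.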

 

Assessing the Results

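Calibration gives us a set of metrics, but what storage performance is enough for the business? Strictly speaking, higher IOPS and throughput are always better, but they also cost more, so in practice the storage only needs to be adequate for the customer's actual workload. The workload's I/O demand can be taken from an AWR report generated for peak business hours, in its Other Instance Activity Stats section. Taking one customer's AWR report as an example, focus on the per-second values of these statistics: [physical read total IO requests], [physical read total bytes], [physical write total IO requests], and [physical write total bytes]. We use per-second values because I/O calibration also reports per-second rates.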

 

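From these we can compute the total physical read and write requests per second, 70.74 + 80.62 ≈ 151, and the total physical read and write volume per second, 1.32 + 1.09 ≈ 2.41 MB/s (about 19.28 Mbit/s). With this reference point, the storage's calibrated maximum IOPS should be no lower than about 151, and its max_mbps, which is reported in megabytes per second, no lower than about 2.41 MB/s. If the calibrated values are close to or below these figures, the storage is a serious bottleneck. Our test machine, for example, calibrated at only 60 IOPS, so it cannot meet this customer's I/O demand and its storage would need to be upgraded to support the workload.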
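The comparison above is plain arithmetic, sketched below in Python with the AWR figures from this example. `AwrLoad` and `assess_storage` are hypothetical names, the inputs are assumed to be per-second values already converted to MB, and the optional `headroom` factor is an assumption added to express "close to" numerically; with the default of 1.0 it only flags calibration results that fall below demand.

```python
from dataclasses import dataclass


@dataclass
class AwrLoad:
    """Per-second peak-hour values from AWR 'Other Instance Activity Stats'."""
    read_requests: float   # physical read total IO requests / s
    write_requests: float  # physical write total IO requests / s
    read_mb: float         # physical read total bytes / s, converted to MB
    write_mb: float        # physical write total bytes / s, converted to MB


def assess_storage(load: AwrLoad, max_iops: float, max_mbps: float,
                   headroom: float = 1.0) -> dict:
    """Compare peak demand with DBA_RSRC_IO_CALIBRATE results.

    Both sides of the throughput check are in megabytes per second.
    headroom is a hypothetical safety factor (1.2 = require 20% spare capacity).
    """
    need_iops = load.read_requests + load.write_requests
    need_mbps = load.read_mb + load.write_mb
    return {
        "need_iops": round(need_iops, 2),
        "need_mbps": round(need_mbps, 2),
        "iops_ok": max_iops >= need_iops * headroom,
        "mbps_ok": max_mbps >= need_mbps * headroom,
    }


if __name__ == "__main__":
    peak = AwrLoad(read_requests=70.74, write_requests=80.62, read_mb=1.32, write_mb=1.09)
    # Calibration results of the desktop test machine (max_iops 60, max_mbps 43).
    print(assess_storage(peak, max_iops=60, max_mbps=43))
    # -> {'need_iops': 151.36, 'need_mbps': 2.41, 'iops_ok': False, 'mbps_ok': True}
```

The result matches the conclusion in the text: throughput is ample, but the IOPS shortfall makes this storage unsuitable for the workload.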

 

The relevant My Oracle Support (MetaLink) note is as follows:

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.

PURPOSE

In 11.2.0.2, Automatic Degree of Parallelism can only be used if I/O statistics are gathered.

This note explains what the DBA has to do to make sure Automatic Degree of Parallelism works.

SCOPE

For DBAs 

AutoDOP is not a feature for using more parallelism. It restricts parallelism in order to maximize throughput, so by design not all queries will run in parallel under AutoDOP, and those that do may not run with full parallelism.


DETAILS

When PARALLEL_DEGREE_POLICY is set to AUTO, Oracle Database determines whether a statement should run in parallel based on the cost of the operations in the execution plan and the hardware characteristics.

AutoDOP is used when PARALLEL or PARALLEL(AUTO) statement level hint is used regardless of the value of the PARALLEL_DEGREE_POLICY (see documentation).

 

IO Calibration

The hardware characteristics include I/O calibration statistics, so these statistics must be gathered; otherwise Oracle Database does not use the automatic degree of parallelism feature.

If I/O calibration has not been run to gather the required statistics, the explain plan includes the following text in its notes section:  ": skipped because of IO calibrate statistics are missing"

explain plan for 
select /*+ parallel */ * from emp;

Plan hash value: 2873591275 
--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |    14 |   532 |     2   (0)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |    14 |   532 |     2   (0)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |    14 |   532 |     2   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS FULL| EMP      |    14 |   532 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

 Note 
 ----- 
 - dynamic sampling used for this statement (level=2) 
 - automatic DOP: skipped because of IO calibrate statistics are missing

The Oracle PL/SQL package DBMS_RESOURCE_MANAGER.CALIBRATE_IO is used to execute the calibration. The duration of the calibration is dictated by the NUM_DISKS variable as well as the number of nodes in the RAC cluster.

SET SERVEROUTPUT ON
DECLARE
   lat INTEGER;
   iops INTEGER;
   mbps INTEGER;
BEGIN
   --DBMS_RESOURCE_MANAGER.CALIBRATE_IO (<DISKS>, <MAX_LATENCY>, iops, mbps, lat);
   DBMS_RESOURCE_MANAGER.CALIBRATE_IO (28, 10, iops, mbps, lat);
   DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops);
   DBMS_OUTPUT.PUT_LINE ('latency = ' || lat);
   dbms_output.put_line('max_mbps = ' || mbps);
END;
/

 

Note that the first two variables (NUM_DISKS, MAX_LATENCY) are input variables, and the remaining three are output variables.

NUM_DISKS - To get the most accurate results, it's best to provide the actual number of physical disks used for this database; the storage administrator can provide this value. Keep in mind that when ASM is used to manage the database files, say in the DATA diskgroup, only the physical disks that make up the DATA diskgroup should be counted in NUM_DISKS; i.e., do not include the disks of the FRA diskgroup. In the example above, the DATA diskgroup is made up of 28 physical disks (presented as 4 LUNs or ASM disks).

MAX_LATENCY - Maximum tolerable latency in milliseconds for database-block-sized IO requests.

You can find more information about CALIBRATE_IO in Note 727062.1, Configuring and using Calibrate I/O.

To verify whether the calibration run was successful, query V$IO_CALIBRATION_STATUS after executing the DBMS_RESOURCE_MANAGER.CALIBRATE_IO call.

select * from V$IO_CALIBRATION_STATUS;

STATUS   CALIBRATION_TIME
-------- ----------------------------- 
READY    25-NOV-10 08.53.08.536

The execution plan now shows that automatic degree of parallelism can be used:

explain plan for 
select /*+ parallel */ * from emp;
Plan hash value: 2873591275 
--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |    14 |   532 |     2   (0)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |    14 |   532 |     2   (0)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |    14 |   532 |     2   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS FULL| EMP      |    14 |   532 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

Note 
----- 
- dynamic sampling used for this statement (level=2)
- automatic DOP: Computed Degree of Parallelism is 2 

There is a known issue with DBMS_RESOURCE_MANAGER.CALIBRATE_IO:

Note 10180307.8: DBRM DBMS_RESOUCE_MANAGER.CALIBRATE_IO REPORTS VERY HIGH MAX_PMBPS

If CALIBRATE_IO cannot be used, you can set the relevant value manually:

delete from resource_io_calibrate$;
insert into resource_io_calibrate$
values(current_timestamp, current_timestamp, 0, 0, 200, 0, 0); 
commit;

You have to restart the database after this change.

200 is a value that works well for machines with a fast I/O subsystem, for example Exadata machines. If you set max_pmbps lower, the calculated DOP will increase; if you set it higher, the calculated DOP will decrease. 200 appears to be a reasonable value for dealing with concurrency on a system.

 

Automatic DOP is not computed if
  - the database is not open, or
  - the database is in restricted access (DBA), read-only, or migrate mode, or
  - the database is suspended, or
  - the instance is not open, or
  - the SQL cursor is not supported to run in AutoDOP mode.

Tuning Parameters

When you use AutoDOP, you may want to adjust some tuning parameters. See Document 1549214.1, Setup, Monitor, And Tune Parallelism In The Database, for information about these parameters. parallel_servers_target should always be smaller than parallel_max_servers, anywhere from 50% to 75% of parallel_max_servers. If you start seeing a lot of DOP downgrades, increase the gap between the values of these two parameters.

parallel_servers_target
parallel_min_time_threshold
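The 50% to 75% rule of thumb above can be expressed as a small calculation. This is a sketch only; `servers_target_range` is a hypothetical helper, and it rounds down to whole server processes. If DOP downgrades are frequent, the text suggests staying toward the lower bound, which widens the gap between the two parameters.

```python
def servers_target_range(parallel_max_servers: int) -> tuple:
    """Suggested PARALLEL_SERVERS_TARGET bounds: 50% to 75% of PARALLEL_MAX_SERVERS."""
    if parallel_max_servers <= 0:
        raise ValueError("parallel_max_servers must be positive")
    low = parallel_max_servers * 50 // 100   # lower end: larger gap, fewer downgrades
    high = parallel_max_servers * 75 // 100  # upper end: smaller gap
    return low, high


if __name__ == "__main__":
    print(servers_target_range(160))  # (80, 120)
```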

 



 

Overview of I/O Calibration

Oracle Database 11g introduces an I/O Calibration mechanism, whereby you can run I/O calibration tests either through the Enterprise Manager Performance page or with a PL/SQL package. Oracle's I/O calibration is a variation on the Orion tool. In an Oracle database, the I/O workload is of two basic types: small random I/O and large sequential I/O. OLTP applications usually experience the small random I/O workload, where the speed with which small I/O requests are serviced is paramount; thus, disk spinning and seeking times are of critical importance. OLAP applications, on the other hand, generally issue large sequential I/O. For these applications the critical factor is the capacity of the I/O channel: the larger the I/O channels between the database server and the storage system, the larger the I/O throughput. Oracle uses the following two metrics, each measuring the efficacy of one type of I/O workload:

  • IOPS (I/O per second): The IOPS rate is the number of small random I/Os the system can perform in a second and depends on the spin speed of the disks. You can increase the IOPS rate by increasing the number of disks in the storage array or by using faster disk drives, which have a higher RPM and lower seek time.
  • MBPS (megabytes per second): This metric measures the data transfer rate between the server and the storage array and depends on the capacity of the I/O channel between the two systems. A larger I/O channel means a higher MBPS rate.

Two important terms need clarification in this discussion: throughput and latency. The throughput of a system determines how fast it can transfer data and is measured by the MBPS metric. The channel capacity determines the overall throughput of the system, and it thus puts the ceiling on the amount of data transfer. Latency refers to the lag between the time an I/O request is made and when the request is serviced by the storage system. High latency indicates a system that’s overloaded and you can reduce latency by striping data across multiple spindles, so different disks can service the same I/O request in parallel.

Oracle recommends that you use the new I/O Calibration tool to determine I/O metrics in a database. It takes about 10 minutes to run the tool, and you should pick a time when the database workload is light to avoid overstressing the storage system. You can run only a single calibration task at a time. If you perform the task in a RAC environment, the workload is generated simultaneously from all instances in the system. You can run the tool either with Enterprise Manager or through PL/SQL.

Calibrating I/O Using PL/SQL

You can also use the new CALIBRATE_IO procedure of the DBMS_RESOURCE_MANAGER package to run the I/O Calibration task. Here is an example for SQL*Plus:

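VARIABLE max_iops NUMBER
VARIABLE max_mbps NUMBER
VARIABLE actual_latency NUMBER
BEGIN
  dbms_resource_manager.calibrate_io(
    num_physical_disks => 1,
    max_latency        => 10,
    max_iops           => :max_iops,
    max_mbps           => :max_mbps,
    actual_latency     => :actual_latency);
END;
/
PRINT max_iops max_mbps actual_latency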
In the CALIBRATE_IO procedure, the following are the key parameters:

  • num_physical_disks: Approximate number of physical disks in the storage array.
  • max_latency: Maximum tolerable latency (in milliseconds) for database-block-sized I/O requests.
  • max_iops: Maximum number of random DB block-sized read requests that can be serviced per second.
  • max_mbps: Maximum throughput (in megabytes per second) for randomly distributed 1 MB reads.
  • actual_latency: Average latency of DB block-sized I/O requests at the max_iops rate (in milliseconds).

The procedure only works if asynchronous I/O is enabled. If asynchronous I/O is not enabled, the procedure returns the following error.
DECLARE
*
ERROR at line 1:
ORA-56708: Could not find any datafiles with asynchronous i/o capability
ORA-06512: at "SYS.DBMS_RMIN", line 453
ORA-06512: at "SYS.DBMS_RESOURCE_MANAGER", line 1153
ORA-06512: at line 6
You can use the FILESYSTEMIO_OPTIONS static initialization parameter to enable or disable asynchronous I/O or direct I/O on file system files. This parameter is platform-specific and has a default value that is best for a particular platform.
FILESYSTEMIO_OPTIONS can be set to one of the following values:

  • ASYNCH: enable asynchronous I/O on file system files, which has no timing requirement for transmission.
  • DIRECTIO: enable direct I/O on file system files, which bypasses the operating system buffer cache.
  • SETALL: enable both asynchronous and direct I/O on file system files.
  • NONE: disable both asynchronous and direct I/O on file system files.

SQL> SHOW PARAMETER FILESYSTEMIO_OPTIONS
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options                 string      none
ALTER SYSTEM SET FILESYSTEMIO_OPTIONS=SETALL SCOPE=SPFILE;
SHUTDOWN IMMEDIATE;
STARTUP;
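The four FILESYSTEMIO_OPTIONS values can be summarized as a lookup table. The Python sketch below is illustrative only (the names are hypothetical); it answers the question that matters for CALIBRATE_IO, namely whether a given setting enables asynchronous I/O on file-system datafiles. Only ASYNCH and SETALL do, which is why switching from NONE to SETALL, as in the session above, addresses the ORA-56708 error.

```python
# Effect of each FILESYSTEMIO_OPTIONS value on file-system datafiles,
# as listed above: (asynchronous I/O, direct I/O).
FILESYSTEMIO_EFFECT = {
    "ASYNCH": (True, False),
    "DIRECTIO": (False, True),
    "SETALL": (True, True),
    "NONE": (False, False),
}


def enables_async_io(value: str) -> bool:
    """True if the setting turns on asynchronous I/O for file-system datafiles."""
    try:
        asynch, _direct = FILESYSTEMIO_EFFECT[value.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown FILESYSTEMIO_OPTIONS value: {value!r}") from None
    return asynch


if __name__ == "__main__":
    for v in ("none", "DIRECTIO", "asynch", "SETALL"):
        print(v, enables_async_io(v))
```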

Usage Notes

  • Only users with the SYSDBA privilege can run this procedure. Qualified users must also turn on timed_statistics, and ensure asynch_io is enabled for datafiles. This can be achieved by setting filesystemio_options to either ASYNCH or SETALL. One can also query the asynch_io status by means of the following SQL statement:
    col name format a50
    SELECT name, asynch_io FROM v$datafile f,v$iostat_file i
    WHERE f.file#        = i.file_no
    AND   filetype_name  = 'Data File';
  • Only one calibration can be run at a time. If another calibration is initiated at the same time, it will fail.
  • For an Oracle Real Application Clusters (Oracle RAC) database, the workload is simultaneously generated from all instances.

For the TIMED_STATISTICS parameter, refer to the following description:

TIMED_STATISTICS specifies whether or not statistics related to time are collected. Values:

  • true: The statistics are collected and stored in trace files or displayed in the V$SESSTAT and V$SYSSTAT dynamic performance views.
  • false: The value of all time-related statistics is set to zero. This setting lets Oracle avoid the overhead of requesting the time from the operating system.

Starting with release 11.1.0.7.0, the value of the TIMED_STATISTICS parameter cannot be set to false if the value of STATISTICS_LEVEL is set to TYPICAL or ALL.
On some systems with very fast timer access, Oracle might enable timing even if this parameter is set to false. On these systems, setting the parameter to true can sometimes produce more accurate statistics for long-running operations.

The [G]V$IO_CALIBRATION_STATUS views show the current status of calibration runs. During a run, the status IN PROGRESS is displayed. Once a run completes, the status switches to READY and CALIBRATION_TIME shows the end time of the last calibration run. A status of NOT AVAILABLE means calibration results are not available.
SQL> SELECT * FROM v$io_calibration_status;
STATUS        CALIBRATION_TIME
------------- -------------------------------
IN PROGRESS
SQL> SELECT * FROM v$io_calibration_status;
STATUS        CALIBRATION_TIME
------------- ---------------------------------------------------------------------------
READY         28-JUL-2008 14:37:38.410
1 row selected.
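Because CALIBRATE_IO runs for several minutes, a script may poll this view from a second session until the status leaves IN PROGRESS. A minimal sketch, assuming a hypothetical `fetch_status` callable that runs `SELECT status FROM v$io_calibration_status` and returns the STATUS string; the 30-second interval and 30-minute timeout are arbitrary choices, not Oracle recommendations.

```python
import time
from typing import Callable


def wait_for_calibration(fetch_status: Callable[[], str],
                         poll_seconds: float = 30.0,
                         timeout_seconds: float = 1800.0) -> str:
    """Poll until the calibration status is no longer IN PROGRESS.

    Returns the first other status seen (normally READY, or NOT AVAILABLE);
    raises TimeoutError if the run is still in progress at the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status().strip().upper()
        if status != "IN PROGRESS":
            # READY means the results can be read from DBA_RSRC_IO_CALIBRATE.
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("I/O calibration still IN PROGRESS after timeout")
        time.sleep(poll_seconds)
```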

Once you execute the CALIBRATE_IO procedure, you can query the V$IO_CALIBRATION_STATUS and DBA_RSRC_IO_CALIBRATE views to check the results. Here's a sample query:

SQL> select max_iops, max_mbps, max_pmbps, latency
     from dba_rsrc_io_calibrate;

MAX_IOPS      MAX_MBPS      MAX_PMBPS     LATENCY
----------   ------------  ------------  ----------
133           12             6             64

Oracle Database 11g collects I/O statistics in three different dimensions to provide a consistent set of statistics for I/O calls:

  • RDBMS components, grouped into 12 functional groups; the V$IOSTAT_FUNCTION view provides the details.
  • Each consumer group that is part of the currently enabled resource plan; the V$IOSTAT_CONSUMER_GROUP view has the details.
  • Individual files; file-level I/O statistics are collected and stored in the V$IOSTAT_FILE view.