Question
I am using DFSORT to copy a tape data set to a temporary file, processing around 80,000,000 records. It's taking 3 hours just to copy the data sets. Is there any way to reduce the CPU time? Suggestions would be very helpful. Thank you.
//STEP40 EXEC SORTD
//SORTIN DD DSN=FILEONE(0),
// DISP=SHR
//SORTOUT DD DSN=&&TEMP,
// DISP=(NEW,PASS,DELETE),
// DCB=(RECFM=FB,LRECL=30050,BLKSIZE=0),
// UNIT=TAPE
//SYSOUT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
SORT FIELDS=(14,6,PD,A,8,6,PD,A,45,2,ZD,A)
OUTREC IFTHEN=(WHEN=(70,18,CH,EQ,C' encoding="IBM037"'),
OVERLAY=(70:C' encoding="UTF-8"'))
OPTION DYNALLOC=(SYSDA,255)
/*
Answer 1:
A few comments on improving I/O performance which should improve your overall elapsed time.
- On your SORTIN and SORTOUT DD statements, add BUFNO to your DCB:
//SORTIN DD DSN=FILEONE(0),
// DISP=SHR,DCB=BUFNO=192
//SORTOUT DD DSN=&&TEMP,
// DISP=(NEW,PASS,DELETE),
// DCB=(RECFM=FB,LRECL=30050,BLKSIZE=0,BUFNO=192),
// UNIT=TAPE
I chose 192 as it's relatively cheap in terms of memory these days; adjust for your environment. This essentially tells the system how many blocks to read with each I/O, which reduces the time spent on I/O operations. You can experiment with this number to get an optimal result. The default is 5.
From IBM's MVS JCL Manual on page 143:
BUFNO=buffers
Specifies the number of buffers to be assigned to the DCB. The maximum normally is 255, but can be less because of the size of the region. Note: Do not code the BUFNO subparameter with the DCB subparameters BUFIN, BUFOUT, or the DD parameter QNAME.
- You might also consider the block sizes. The BLKSIZE on the output seems odd; ensure it is optimized for the device you are writing to. For tape devices it should be as large as possible. For 3480 or 3490 devices this can be as large as 65535. Since your LRECL is 30050, you could specify a BLKSIZE of 60100, which gives two records per block and better I/O efficiency.
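For instance, a sketch of the SORTOUT DD with an explicit two-records-per-block BLKSIZE (60100 assumes the LRECL really is 30050; check your device's maximum first):
//SORTOUT DD DSN=&&TEMP,
// DISP=(NEW,PASS,DELETE),
//* 2 x 30050 = 60100 - two records per block
// DCB=(RECFM=FB,LRECL=30050,BLKSIZE=60100,BUFNO=192),
// UNIT=TAPE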
Here is more information on maximum BLKSIZE selection for tapes:
3490 Emulation (VTS): 262144 (256 KB)
3590: 262144 (256 KB) (note: on some older models the limit is 229376 (224 KB))
- One last quick hint: if you are actually using tape, specify multiple tape devices. This allows one tape to be written while the next one is being mounted. I've included the BUFNO example here as well:
//SORTOUT DD DSN=&&TEMP,
// DISP=(NEW,PASS,DELETE),
// DCB=(RECFM=FB,LRECL=30050,BLKSIZE=0,BUFNO=192),
// UNIT=(TAPE,2)
Of course these optimizations depend on your physical environment and DFSMS setup.
Answer 2:
I love diagnosing these kinds of problems...
80M records at 30K each is about 2.5TB, and since you're reading and writing this data, you're processing a minimum of 5TB (not including I/O to the work files). If I'm doing my math right, that averages about 500MB/second over three hours.
First thing to do is understand whether DFSORT is really actively running for 3 hours, or whether there are sources of wait time. For instance, if your tapes are multi-volume datasets, there may be wait time for tape mounts. Look for this in the joblog messages - it might be that 20 minutes of your 3 hours is simply waiting for the right tapes to be mounted.
You may also have a CPU access problem adding to the wait time. Depending on how your system is set up, your job might be getting only a small slice of CPU time and waiting the rest of the time. You can tell by looking at the CPU time consumed (it's also in the joblog messages) and comparing it to the elapsed time... for instance, if your job gets 1000 CPU seconds (TCB + SRB) over 3 hours, you're averaging 9% CPU usage over that time. It may be that submitting your job in a different job class makes a difference - ask your local systems programmer.
Of course, 9% CPU time might not be a problem - your job is likely heavily I/O bound, so a lot of the wait time is about waiting for I/O to complete, not waiting for more CPU time. What you really want to know is whether your wait time is waiting for CPU access, waiting for I/O or some other reason. Again, your local systems programmer should be able to help you answer this if he knows how to read the RMF reports.
Next thing to do is understand your I/O a little better with a goal of reducing the overall number of physical I/O operations that need to be performed and/or making every I/O run a little faster.
Think of it this way: every physical I/O is going to take a minimum of maybe 2-3 milliseconds. In your worst case, if every one of those 160M records you're reading/writing were to take 3ms, the elapsed time would be 160,000,000 X .003 = 480,000 seconds, or five and a half days!
As another responder mentions, blocksize and buffering are your friends. Since most of the time in an I/O operation comes down to firing off the I/O and waiting for the response, a "big I/O" doesn't take all that much longer than a "small I/O". Generally, you want to do as few and as large physical I/O operations as possible to push elapsed time down.
Depending on the type of tape device you're using, you should be able to get up to 256K blocksizes on your tape - at LRECL 30050 that's 8 records per I/O (8 x 30050 = 240400, which fits under 262144). Your BLKSIZE=0 might already be getting you this, depending on how your system is configured. Note though that this is device dependent, and watch out if your site happens to use one of the virtual tape products that map "real" tape drives to disk... there, blocksizes over a certain limit (32K) tend to run slower.
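To see how much leverage blocking gives, redo the earlier worst-case arithmetic with 8 records per physical I/O (same illustrative 3ms figure, so treat these as orders of magnitude, not predictions):
160,000,000 records / 8 records per block = 20,000,000 physical I/Os
20,000,000 X .003 = 60,000 seconds (about 17 hours)
Still slow, but an 8x reduction before buffering and I/O overlap carry you the rest of the way down.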
Buffering is unfortunately more complex than the previous answer suggests... it turns out BUFNO applies to relatively simple applications using IBM's QSAM access method - and that isn't what DFSORT uses. Indeed, DFSORT is quite smart about how it does its I/O, and it dynamically creates buffers based on available memory. Still, you might try running your job in a larger region (for instance, REGION=0 in your JCL), and you might find DFSORT options like MAINSIZE=MAX help - see the DFSORT tuning documentation for more information.
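A minimal sketch of what that could look like, assuming SORTD is your existing sort procedure (REGION coded on the EXEC that calls a procedure applies to its steps; check with your systems programmer before overriding region or MAINSIZE defaults):
//STEP40 EXEC SORTD,REGION=0M
//SYSIN DD *
* MAINSIZE=MAX lets DFSORT take as much main storage as your site allows
  OPTION MAINSIZE=MAX,DYNALLOC=(SYSDA,255)
  SORT FIELDS=(14,6,PD,A,8,6,PD,A,45,2,ZD,A)
/*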
As for your disk I/O (which includes those SORTWK datasets), there are lots of options here too. Your 30K LRECL limits what you can do with blocking to some degree, but there are all sorts of disk tuning exercises you can go through, from using VIO datasets to PAVs (parallel access volumes). The point is, a lot of this is also configuration-specific, so the right answer is going to depend on what your site has and how it's all configured.
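For illustration only, explicitly allocated work files might look like the sketch below. The unit name VIO and the space figures are placeholders (both site-dependent), you would remove OPTION DYNALLOC if you allocate SORTWKs yourself, and whether VIO is even sensible for a sort this large is exactly the kind of question your configuration determines:
//SORTWK01 DD UNIT=VIO,SPACE=(CYL,(500,100))
//SORTWK02 DD UNIT=VIO,SPACE=(CYL,(500,100))
//SORTWK03 DD UNIT=VIO,SPACE=(CYL,(500,100))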
But maybe the most important thing is that you don't want to go at it purely trial and error until you stumble across the right answer. If you want to learn, get familiar with RMF or whatever performance management tools your site has (or find a systems programmer that's willing to work with you) and dig in. Ask yourself, what's the bottleneck - why isn't this job running faster? Then find the bottleneck, fix it and move on to the next one. These are tremendous skills to have, and once you know the basics, it stops feeling like a black art, and more like a systematic process you can follow with anything.
Answer 3:
Since you write
... it takes 3 hours to complete...
I guess what you really want is to reduce elapsed time, not CPU time. Elapsed time depends on many factors such as machine configuration, machine speed, total system load, priority of your job, etc. Without more information about the environment, it is difficult to give advice.
However, I see you're writing the sort output to a temporary data set, so I conclude there is another step that reads that data back in. Why do you write this data to tape? Disk will surely be faster and reduce elapsed time.
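As a sketch, that change might look like the following; the unit name, volume count, and space figures are placeholders to work out with your storage administrator, since 80M records at 30050 bytes is roughly 2.5TB and will span many volumes or need an SMS-managed allocation:
//SORTOUT DD DSN=&&TEMP,
// DISP=(NEW,PASS,DELETE),
// DCB=(RECFM=FB,LRECL=30050,BLKSIZE=0),
//* placeholder multi-volume disk allocation - size for ~2.5TB
// UNIT=(SYSDA,30),
// SPACE=(CYL,(5000,5000),RLSE)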
Peter
Source: https://stackoverflow.com/questions/51570962/how-can-i-reduce-cpu-in-sort-operation