How do “Fixed-length records” and “Fixed-length fields” increase database performance?

Submitted by 眉间皱痕 on 2019-12-06 13:20:07

Question


Could anyone please explain the following two statements regarding Oracle external table performance with the ORACLE_LOADER access driver:

  1. Fixed-length records are processed faster than records terminated by a string.
  2. Fixed-length fields are processed faster than delimited fields.

An explanation with code might help me understand the concept in depth. Here are the two syntaxes:

Fixed field length

create table ext_table_fixed (
   field_1 char(4),
   field_2 char(30)
)
organization external (
   type       oracle_loader
   default directory ext_dir
   access parameters (
     records delimited by newline
     fields (
       field_1 position(1: 4) char( 4),
       field_2 position(5:30) char(30)
    )
  )
  location ('file')
)
reject limit unlimited;

Comma delimited

create table ext_table_csv (
  i   Number,
  n   Varchar2(20),
  m   Varchar2(20)
)
organization external (
  type              oracle_loader
  default directory ext_dir
  access parameters (
    records delimited  by newline
    fields  terminated by ','
    missing field values are null
  )
  location ('file.csv')
)
reject limit unlimited;

Answer 1:


Simplified, conceptual, non-database-specific explanation:

When the maximum possible record length is known in advance, the end of the record/the beginning of the next record can be found in constant time. This is because that location is computable using simple addition, very much analogous to array indexing. Imagine that I'm using ints as pointers to records, and that the record size is an integer constant defined somewhere. Then, to get from the current record location to the next:

int current_record = /* whatever */;
int next_record = current_record + FIXED_RECORD_SIZE;

That's it!

Alternatively, when using string-terminated (or otherwise delimited) records and fields, you could imagine that the next field/record is found by a linear-time scan, which has to look at every character until the delimiter is found. As before,

char DELIMITER = ','; // or whatever
int current_record = /* whatever */;
int next_record = current_record;
while(character_at_location(next_record) != DELIMITER) {
    next_record++;
}

This might be a simplified or naïve version of the real-world implementation, but the general idea still stands: you can't easily do the same operation in constant time, and even if it were constant time, it's unlikely to be as fast as performing a single add operation.
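To make the contrast concrete, here is a minimal, self-contained C sketch (not from the original answer; the record layout, field widths, and sample data are invented for illustration). Extracting fields from fixed-width records is pure pointer arithmetic, while extracting delimited fields requires scanning for every ',' and '\n':

#include <stdio.h>
#include <string.h>

/* Hypothetical layout: field_1 is 4 bytes, field_2 is 6 bytes, no separators. */
#define FIELD1_LEN 4
#define FIELD2_LEN 6
#define FIXED_RECORD_SIZE (FIELD1_LEN + FIELD2_LEN)

int main(void) {
    /* Two fixed-length records packed back to back. */
    const char *fixed = "0001Alice 0002Bob   ";
    /* The same data with delimited fields and newline-terminated records. */
    const char *delim = "1,Alice\n2,Bob\n";

    /* Fixed layout: the start of record i is a single multiply-and-add. */
    for (int i = 0; i < 2; i++) {
        const char *rec = fixed + i * FIXED_RECORD_SIZE;
        printf("fixed record %d: field_1=%.4s field_2=%.6s\n", i, rec, rec + FIELD1_LEN);
    }

    /* Delimited layout: every boundary is found by a linear scan (strchr). */
    const char *p = delim;
    for (int i = 0; i < 2; i++) {
        const char *comma = strchr(p, ',');          /* scan for the field delimiter   */
        const char *nl    = strchr(comma + 1, '\n'); /* scan for the record terminator */
        printf("delim record %d: field_1=%.*s field_2=%.*s\n",
               i, (int)(comma - p), p, (int)(nl - comma - 1), comma + 1);
        p = nl + 1;                                  /* next record starts after '\n'  */
    }
    return 0;
}

In the fixed-width loop the cost of locating a record or field does not depend on its contents; in the delimited loop every byte has to be examined before the boundary is known.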




Answer 2:


I checked this and in my case performance deteriorated! I have a 1 GB CSV file of integer values, each 10 characters long with padding, with fields separated by "," and records separated by "\n". I used the following scripts (I also tried setting a fixed record size and removing ltrim, but it didn't help).

SQL> CREATE TABLE ints_ext (id0 NUMBER(10),
  2                  id1 NUMBER(10),
  3                  id2 NUMBER(10),
  4                  id3 NUMBER(10),
  5                  id4 NUMBER(10),
  6                  id5 NUMBER(10),
  7                  id6 NUMBER(10),
  8                  id7 NUMBER(10),
  9                  id8 NUMBER(10),
 10                  id9 NUMBER(10))
 11  ORGANIZATION EXTERNAL (
 12  TYPE oracle_loader
 13  DEFAULT DIRECTORY tpch_dir
 14  ACCESS PARAMETERS (
 15         RECORDS DELIMITED BY NEWLINE
 16         BADFILE 'bad_%a_%p.bad'
 17         LOGFILE 'log_%a_%p.log'
 18         FIELDS TERMINATED BY ','
 19         MISSING FIELD VALUES ARE NULL)
 20  LOCATION ('data1_1.csv'))
 21  parallel 1
 22  REJECT LIMIT 0
 23  NOMONITORING;

SQL> select count(*) from ints_ext;

  COUNT(*)
----------
   9761289

Elapsed: 00:00:43.68
SQL> select /*+ parallel(1) tracing(STRIP,1) */ * from ints_ext;

no rows selected

Elapsed: 00:00:43.78

SQL> CREATE TABLE ints_ext (id0 NUMBER(10),
  2                  id1 NUMBER(10),
  3                  id2 NUMBER(10),
  4                  id3 NUMBER(10),
  5                  id4 NUMBER(10),
  6                  id5 NUMBER(10),
  7                  id6 NUMBER(10),
  8                  id7 NUMBER(10),
  9                  id8 NUMBER(10),
 10                  id9 NUMBER(10))
 11  ORGANIZATION EXTERNAL (
 12  TYPE oracle_loader
 13  DEFAULT DIRECTORY tpch_dir
 14  ACCESS PARAMETERS (
 15         RECORDS DELIMITED BY NEWLINE
 16         BADFILE 'bad_%a_%p.bad'
 17         LOGFILE 'log_%a_%p.log'
 18         FIELDS ltrim (
 19         id0 position(1:10) char(10),
 20         id1 position(12:21) char(10),
 21         id2 position(23:32) char(10),
 22         id3 position(34:43) char(10),
 23         id4 position(45:54) char(10),
 24         id5 position(56:65) char(10),
 25         id6 position(67:76) char(10),
 26         id7 position(78:87) char(10),
 27         id8 position(89:98) char(10),
 28         id9 position(100:109) char(10)
 29         ))
 30  LOCATION ('data1_1.csv'))
 31  parallel 1
 32  REJECT LIMIT 0
 33  NOMONITORING;

SQL> select count(*) from ints_ext;


  COUNT(*)
----------
   9761289

Elapsed: 00:00:50.38
SQL> select /*+ parallel(1) tracing(STRIP,1) */ * from ints_ext;

no rows selected

Elapsed: 00:00:45.26
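For reference, the "fixed record size" variant mentioned above (but not shown) might look roughly like the sketch below. It assumes each record in data1_1.csv is exactly 110 bytes — 10 fields of 10 characters plus 9 commas and the terminating newline, which the RECORDS FIXED length must include; the directory, file, and column names are reused from the scripts above.

CREATE TABLE ints_ext_fixed (
  id0 NUMBER(10), id1 NUMBER(10), id2 NUMBER(10), id3 NUMBER(10), id4 NUMBER(10),
  id5 NUMBER(10), id6 NUMBER(10), id7 NUMBER(10), id8 NUMBER(10), id9 NUMBER(10)
)
ORGANIZATION EXTERNAL (
  TYPE oracle_loader
  DEFAULT DIRECTORY tpch_dir
  ACCESS PARAMETERS (
    RECORDS FIXED 110            -- 10 fields x 10 chars + 9 commas + 1 newline
    BADFILE 'bad_%a_%p.bad'
    LOGFILE 'log_%a_%p.log'
    FIELDS (
      id0 position(1:10)    char(10),
      id1 position(12:21)   char(10),
      id2 position(23:32)   char(10),
      id3 position(34:43)   char(10),
      id4 position(45:54)   char(10),
      id5 position(56:65)   char(10),
      id6 position(67:76)   char(10),
      id7 position(78:87)   char(10),
      id8 position(89:98)   char(10),
      id9 position(100:109) char(10)
    )
  )
  LOCATION ('data1_1.csv')
)
REJECT LIMIT 0;

Even with fixed records and positional fields, the access driver still has to convert every CHAR(10) field to a NUMBER, so datatype conversion and I/O can dominate the record-boundary arithmetic, which would be consistent with the timings above.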


Source: https://stackoverflow.com/questions/14538386/how-do-fixed-length-records-and-fixed-length-fields-increases-database-perfo
