I'm processing multiple files as part of an external table. Is there any way I can get the file name being processed in external tables and place it in the table?
At the moment the only solution I can find is appending the file name to every record in the flat file which isn't ideal from an efficiency standpoint and involves modifying the original data. Surely external tables know what file is being processed at any time?
I'm not aware of any way to capture the file name within the access parameters. As a workaround, rather than modifying the original files you could use a preprocessor to append the file name on the fly. If you had two files, say file_1.csv
containing a,b,1
and file_2.csv
containing c,d,2
, you could have a small shell script like append_filename.sh
:
#!/bin/bash
while read line
do
printf "%s,%s\n" "${line}" "${1##*/}"
done < $1
which you can verify does something helpful by calling the script directly:
$ ./append_filename.sh file_1.csv
a,b,1,file_1.csv
You can then define your external table to call that via the preprocessor
clause, something like:
create table e42 (
col1 varchar2(10),
col2 varchar2(10),
col3 number,
filename varchar2(30)
)
organization external (
type oracle_loader
default directory d42
access parameters (
records delimited by newline
preprocessor 'append_filename.sh'
fields terminated by ','
)
location ('file_1.csv', 'file_2.csv')
);
Table E42 created.
Then the file name is picked up automatically:
select * from e42;
COL1 COL2 COL3 FILENAME
---------- ---------- ---------- ------------------------------
a b 1 file_1.csv
c d 2 file_2.csv
I've stripped the directory path so you only see the file name - you could keep the full path if you prefer, but that may not be necessary and could reveal OS details to people who can only query the table. Note the security guidelines; I've kept it simple here by using one directory for everything, but you should put the preprocessor somewhere else. And of course this is assuming a Unix-y platform or GNU tools; something similar should be possible with a batch file if you're using Windows.
This approach reading line-by-line will be relatively slow for large files; with a 1.5 million-row test file appending the file name took about 80 seconds on my platform. Other built-in tools will be faster; this version with sed
takes just over a second for the same file:
#!/bin/bash
sed -e 's!$!,'"${1##*/}"'!' $1
You could try other alternative such as awk
too; you'd probably need to test a few to see what works best (or fast enough) in your environment.
You might be able to extract that data from the all_external_locations
DDL. It contains the directory name and location parameters specified when you created the table.
来源:https://stackoverflow.com/questions/29104183/getting-an-external-tables-location-and-file-name