Question
I have a file, file.txt, with the following contents:
+----------------------------------------------------+
| createtab_stmt |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `dv.par_kst`( |
| `col1` string, |
| `col2` string, |
| `col3` int, |
| `col4` int, |
| `col5` string, |
| `col6` float, |
| `col7` int, |
| `col8` string, |
| `col9` string, |
| `col10` int, |
| `col11` int, |
| `col12` string, |
| `col13` float, |
| `col14` string, |
| `col15` string) |
| PARTITIONED BY ( |
| `part_col1` int, |
| `part_col2` int) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
| LOCATION |
| 'hdfs://nameservicets1/dv/hdfsdata/par_kst' |
| TBLPROPERTIES ( |
| 'spark.sql.create.version'='2.2 or prior', |
| 'spark.sql.sources.schema.numPartCols'='2', |
| 'spark.sql.sources.schema.numParts'='1', |
| 'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"col1","type":"string","nullable":true,"metadata":{}},{"name":"col2","type":"string","nullable":true,"metadata":{}},{"name":"col3","type":"integer","nullable":true,"metadata":{}},{"name":"col4","type":"integer","nullable":true,"metadata":{}},{"name":"col5","type":"string","nullable":true,"metadata":{}},{"name":"col6","type":"float","nullable":true,"metadata":{}},{"name":"col7","type":"integer","nullable":true,"metadata":{}},{"name":"col8","type":"string","nullable":true,"metadata":{}},{"name":"col9","type":"string","nullable":true,"metadata":{}},{"name":"col10","type":"integer","nullable":true,"metadata":{}},{"name":"col11","type":"integer","nullable":true,"metadata":{}},{"name":"col12","type":"string","nullable":true,"metadata":{}},{"name":"col13","type":"float","nullable":true,"metadata":{}},{"name":"col14","type":"string","nullable":true,"metadata":{}},{"name":"col15","type":"string","nullable":true,"metadata":{}},{"name":"part_col1","type":"integer","nullable":true,"metadata":{}},{"name":"part_col2","type":"integer","nullable":true,"metadata":{}}]}', |
| 'spark.sql.sources.schema.partCol.0'='part_col1', |
| 'spark.sql.sources.schema.partCol.1'='part_col2', |
| 'transient_lastDdlTime'='1587487456') |
+----------------------------------------------------+
From the above file I want to extract the PARTITIONED BY details.
Desired output:
part_col1 , part_col2
The number of PARTITIONED BY columns is not fixed: another file might contain 3 or more, so I want to extract all of them.
That is, all the values between PARTITIONED BY and ROW FORMAT SERDE, with the spaces, backquotes (`) and data types removed.
Could you please help me with this?
Answer 1:
sed -nr '/PARTITIONED BY/,/ROW FORMAT SERDE/p' file.txt | sed -n '/`/p' | cut -d '`' -f 2 | paste -sd, - | sed 's/,/ , /g'
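Answer 2:
use strict;
use warnings;

# Slurp the whole input (a file named on the command line, or STDIN).
my $text = do { local $/; <> };
my @partitioned;
# Capture the parenthesised column list that follows PARTITIONED BY,
# then collect every backquoted name inside it.
if ($text =~ /PARTITIONED BY\s*\(([^()]*)\)/) {
    my $fullcontent = $1;
    push @partitioned, $1 while $fullcontent =~ /`([^`]+)`/g;
}
print join(", ", @partitioned), "\n";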
Output:
part_col1, part_col2
Answer 3:
If the layout of the result doesn't matter, you can ask sed to consider only the lines between a start and an end tag, and to print such a line only when a field can be found between two backquotes.
sed -rn '/PARTITIONED BY/,/ROW FORMAT/s/.*`(.*)`.*/\1/p' file.txt
Combining the results into a single line, as desired, can be done with
printf "%s , " $(sed -rn '/PARTITIONED BY/,/ROW FORMAT/s/.*`(.*)`.*/\1 /p' file.txt) |
sed 's/ , $/\n/'
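The same start-tag/end-tag idea can also be sketched as a single awk pass, which avoids the second sed call. This is a swapped-in technique, not part of the answer above: the function name partition_cols is hypothetical, and the sketch assumes one backquoted column name per line, as in the sample file.

```shell
#!/bin/sh
# Hypothetical helper: print the backquoted names found between the
# "PARTITIONED BY" line and the "ROW FORMAT" line, joined with " , ".
# Assumes at most one backquoted name per line, as in the sample file.
partition_cols() {
    awk '
        /PARTITIONED BY/ { in_part = 1 }   # start tag: begin collecting
        /ROW FORMAT/     { in_part = 0 }   # end tag: stop collecting
        in_part && match($0, /`[^`]+`/) {
            # RSTART/RLENGTH include both backquotes, so trim one char per side
            name = substr($0, RSTART + 1, RLENGTH - 2)
            out = (out == "" ? name : out " , " name)
        }
        END { if (out != "") print out }
    ' "$1"
}
```

Called as partition_cols file.txt on the sample from the question, this should print part_col1 , part_col2. Because the names are accumulated inside awk, no trailing separator has to be stripped afterwards.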
Answer 4:
Small Perl script:
- read the whole file into the $data variable
- select everything between PARTITIONED BY ( ... )
- select into an array only the elements between backquotes
- print the result joined with ,
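use strict;
use warnings;
use feature 'say';

# Slurp the whole file named on the command line (or STDIN).
my $data = do { local $/; <> };
# Non-greedy match of the column list inside PARTITIONED BY ( ... ).
my ($cols) = $data =~ /PARTITIONED BY \((.*?)\)/s
    or die "No PARTITIONED BY clause found\n";
# Every backquoted name inside that list is a partition column.
my @part = $cols =~ /`(.*?)`/sg;
say join ', ', @part;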
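For readers without Perl at hand, the same whole-file approach can be roughly sketched in shell. This is only an approximation under stated assumptions: the function name partition_cols_slurp is hypothetical, grep -o and sed -E are assumed to be available (they are in GNU and BSD tools but are not strictly POSIX), and the input is assumed to contain a single PARTITIONED BY clause.

```shell
#!/bin/sh
# Hypothetical helper mirroring the slurp-then-match approach:
# 1. tr joins the file into one line (the "read whole file" step),
# 2. sed keeps only the text inside PARTITIONED BY ( ... ),
# 3. grep -o pulls out each backquoted name, tr strips the backquotes,
# 4. paste and sed join the names with ", ".
partition_cols_slurp() {
    tr '\n' ' ' < "$1" |
        sed -n -E 's/.*PARTITIONED BY *\(([^)]*)\).*/\1/p' |
        grep -o '`[^`]*`' |
        tr -d '`' |
        paste -sd, - |
        sed 's/,/, /g'
}
```

Called as partition_cols_slurp file.txt on the sample from the question, this should print part_col1, part_col2. Since nothing in the pipeline depends on the number of names, a file with three or more partition columns is handled the same way.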
Source: https://stackoverflow.com/questions/61352700/find-and-extract-value-after-specific-string-from-a-file-using-bash-shell-script