How to use sed/grep to extract text between two words?

春和景丽 2020-11-22 05:25

I am trying to output a string that contains everything between two words of a string:

input:

"Here is a String"

output:

"is a"

12 answers
  • 2020-11-22 05:48

    This might work for you (GNU sed):

    sed '/Here/!d;s//&\n/;s/.*\n//;:a;/String/bb;$!{n;ba};:b;s//\n&/;P;D' file 
    

    This prints each occurrence of the text between the two markers (in this instance Here and String) on its own line and preserves newlines within the text.
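To see the one-liner above at work on text that spans several lines, here is a minimal sketch, assuming GNU sed. The function name between_markers and the sample input are made up for illustration; only the sed script itself comes from the answer.

```shell
# Hypothetical wrapper around the GNU sed script from the answer above.
# Reads stdin (or the files given as arguments) and prints the text found
# between "Here" and "String", keeping any newlines inside that text.
between_markers() {
  sed '/Here/!d;s//&\n/;s/.*\n//;:a;/String/bb;$!{n;ba};:b;s//\n&/;P;D' "$@"
}

# Made-up sample input: the two markers sit on different lines.
printf 'foo Here one\ntwo\nthree String bar\n' | between_markers
```

With this input the output should be the three lines " one", "two" and "three ": the text outside the markers (foo, bar) is dropped, while the spaces directly next to the markers are kept.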

  • 2020-11-22 05:48

    To understand a sed command, it helps to build it up step by step.

    Here is your original text

    user@linux:~$ echo "Here is a String"
    Here is a String
    user@linux:~$ 
    

    Let's try to remove the string Here with sed's substitution command (s):

    user@linux:~$ echo "Here is a String" | sed 's/Here //'
    is a String
    user@linux:~$ 
    

    At this point, I believe you would be able to remove String as well

    user@linux:~$ echo "Here is a String" | sed 's/String//'
    Here is a
    user@linux:~$ 
    

    But this is not your desired output.

    To combine the two sed commands, use the -e option:

    user@linux:~$ echo "Here is a String" | sed -e 's/Here //' -e 's/String//'
    is a
    user@linux:~$ 
    

    Hope this helps
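One caveat worth checking: the two substitutions above delete the markers but leave any other text on the line in place. The sketch below contrasts that with patterns anchored by .*, which also discard the surrounding text. The function names and the sample sentence are made up for illustration.

```shell
# Hypothetical helpers; the sample sentence is made up for illustration.

# Deletes the two markers but keeps everything else on the line.
strip_markers() {
  printf '%s\n' "$1" | sed -e 's/Here //' -e 's/String//'
}

# Deletes everything up to and including "Here ",
# then " String" and everything after it.
extract_between() {
  printf '%s\n' "$1" | sed -e 's/.*Here //' -e 's/ String.*//'
}

strip_markers   'Well, Here is a String indeed'   # prints: Well, is a  indeed
extract_between 'Well, Here is a String indeed'   # prints: is a
```

For the original input "Here is a String" both variants print the same text, apart from the trailing space that the first one keeps, so the difference only shows up when there is text outside the markers.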

  • 2020-11-22 05:49

    You can use two s commands

    $ echo "Here is a String" | sed 's/.*Here//; s/String.*//'
     is a 
    

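    This also works when the markers occur more than once. Because .* is greedy, it returns the text between the last Here and the String that follows it: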

    $ echo "Here is a StringHere is a String" | sed 's/.*Here//; s/String.*//'
     is a
    
    $ echo "Here is a StringHere is a StringHere is a StringHere is a String" | sed 's/.*Here//; s/String.*//'
     is a 
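The same two s commands can be wrapped in a small function that takes the markers as arguments. This is only a sketch: the name between_last is made up, and it assumes the markers contain no regex metacharacters and no "/" characters, since they are pasted into the sed script unescaped.

```shell
# Hypothetical helper: $1 = start marker, $2 = end marker.
# Assumption: neither marker contains regex metacharacters or "/".
between_last() {
  # The greedy .* removes everything through the LAST start marker,
  # then everything from the first remaining end marker onward is removed.
  sed "s/.*$1//; s/$2.*//"
}

# Made-up input with two marker pairs; only the last pair is reported.
printf 'Here is a StringHere is b String\n' | between_last Here String
```

This should print " is b " (with the surrounding spaces). Lines that contain neither marker pass through unchanged, because a substitution that does not match leaves the line alone.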
    
  • 2020-11-22 05:49

    You can use \1 (refer to http://www.grymoire.com/Unix/Sed.html#uh-4):

    echo "Hello is a String" | sed 's/Hello\(.*\)String/\1/g'
    

    The text matched inside the escaped parentheses \( ... \) is captured and can be referenced in the replacement as \1.
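A possible refinement of the capture-group approach, sketched here with the question's markers Here and String: anchoring the pattern with .* on both sides discards text outside the markers, and sed -n with the p flag prints only lines where the substitution matched. The function name capture_between is made up for illustration.

```shell
# Hypothetical helper built on the \( ... \) capture group shown above.
capture_between() {
  # -n suppresses automatic printing; the p flag prints only lines
  # on which the substitution actually matched.
  sed -n 's/.*Here \(.*\) String.*/\1/p'
}

# The second line has no markers and is therefore not printed.
printf 'Here is a String\nno markers on this line\n' | capture_between
```

This should print only "is a". Note that \(.*\) is greedy, so on a line with several String markers the capture runs up to the last one.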

  • 2020-11-22 05:57

    Problem. My stored Claws Mail messages are wrapped as follows, and I am trying to extract the Subject lines:

    Subject: [SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular
     link in major cell growth pathway: Findings point to new potential
     therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is
     Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as
     a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway
     identified [Lysosomal amino acid transporter SLC38A9 signals arginine
     sufficiency to mTORC1]]
    Message-ID: <20171019190902.18741771@VictoriasJourney.com>
    

    Per A2 in this thread (How to use sed/grep to extract text between two words?), the first expression below works as long as the matched text does not contain a newline:

    grep -o -P '(?<=Subject: ).*(?=molecular)' corpus/01
    
    [SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key
    

    However, despite trying numerous variants (.+?; /s; ...), I could not get these to work, because grep matches one line at a time and these words sit on continuation lines:

    grep -o -P '(?<=Subject: ).*(?=link)' corpus/01
    grep -o -P '(?<=Subject: ).*(?=therapeutic)' corpus/01
    etc.
    

    Solution 1.

    Per "Extract text between two strings on different lines":

    sed -n '/Subject: /{:a;N;/Message-ID:/!ba; s/\n/ /g; s/\s\s*/ /g; s/.*Subject: \|Message-ID:.*//g;p}' corpus/01
    

    which gives

    [SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]                              
    

    Solution 2.

    Per "How can I replace a newline (\n) using sed?":

    sed ':a;N;$!ba;s/\n/ /g' corpus/01
    

    will replace newlines with a space.

    Chaining that with A2 in "How to use sed/grep to extract text between two words?", we get:

    sed ':a;N;$!ba;s/\n/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'
    

    which gives

    [SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular  link in major cell growth pathway: Findings point to new potential  therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is  Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as  a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway  identified [Lysosomal amino acid transporter SLC38A9 signals arginine  sufficiency to mTORC1]] 
    

    This variant also collapses each run of whitespace into a single space:

    sed ':a;N;$!ba;s/\n/ /g; s/\s\s*/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'
    

    giving

    [SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
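To try Solution 1 without a real mail file, the sketch below runs the same GNU sed script on a small made-up message with a folded Subject header. The function name unfold_subject, the temporary file and the sample addresses are all hypothetical.

```shell
# Hypothetical helper: prints the unfolded Subject of the message file in $1,
# using the GNU sed script from Solution 1.
unfold_subject() {
  # Starting at the Subject line, keep appending lines (N) until Message-ID
  # appears, then join the lines, squeeze whitespace and strip both markers.
  sed -n '/Subject: /{:a;N;/Message-ID:/!ba; s/\n/ /g; s/\s\s*/ /g; s/.*Subject: \|Message-ID:.*//g;p}' "$1"
}

# Made-up sample message with a Subject folded over three lines.
msg=$(mktemp)
printf 'From: someone@example.com\nSubject: first part\n second part\n third part\nMessage-ID: <id@example.com>\n' > "$msg"
unfold_subject "$msg"
rm -f "$msg"
```

This should print "first part second part third part " on one line. The trailing space is what remains of the newline before Message-ID. The script relies on Message-ID: appearing somewhere after the Subject header.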
    
  • 2020-11-22 06:00

    You can strip strings in Bash alone:

    $ foo="Here is a String"
    $ foo=${foo##*Here }
    $ echo "$foo"
    is a String
    $ foo=${foo%% String*}
    $ echo "$foo"
    is a
    $
    

    And if you have a GNU grep that includes PCRE, you can use a zero-width assertion:

    $ echo "Here is a String" | grep -Po '(?<=(Here )).*(?= String)'
    is a
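The two parameter expansions can be combined into a reusable function. This is a minimal sketch; the name between is made up. Quoting the markers inside the expansion makes the shell treat them literally rather than as glob patterns.

```shell
# Hypothetical helper: between TEXT START END
# Prints the part of TEXT after the last START and before the first END
# that follows it, using only shell parameter expansion.
between() {
  text=$1
  text=${text##*"$2"}   # remove the longest prefix ending in START
  text=${text%%"$3"*}   # remove the longest suffix beginning with END
  printf '%s\n' "$text"
}

between 'Here is a String' 'Here ' ' String'   # prints: is a
between 'x [a] y [b] z' '[' ']'                # prints: b
```

If a marker does not occur in the text, the corresponding expansion leaves the text unchanged, so a caller that needs to detect a missing marker would have to check for it separately.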
    