I need to delete all strings consisting of a hyphen followed by a whitespace, but only when the whitespace is not followed by the word "og". Example file:
Kult
This might work for you (GNU sed):
sed -r 's/(- (og|eller))|- /\1/g' file
This relies on alternation to re-replace specific cases and the empty backreference to replace the general case.
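To make the two branches concrete, here is a small sketch run on a shortened version of the sample line. The variable names input and result are only for illustration, and -E is used in place of -r on the assumption that your sed accepts it (GNU sed does).

```shell
# Shortened sample line (illustrative only).
input='Kultur- og idrettsavdelinga skapar- eller nyska- pande kunst'

# Branch 1 matches "- og" or "- eller" and captures it in \1, so it is
# replaced by itself. Branch 2 matches a bare "- " with \1 left empty,
# so it is replaced by nothing.
result=$(printf '%s\n' "$input" | sed -E 's/(- (og|eller))|- /\1/g')
printf '%s\n' "$result"
# -> Kultur- og idrettsavdelinga skapar- eller nyskapande kunst
```

Note that "- og" is matched as a plain prefix here, so a hyphen followed by a longer word that merely starts with "og" would be kept as well.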
Given this input file (I added "- eller"s since you said in a comment you need to handle them too):
$ cat file
Kultur- og idrettsavdelinga skapar- eller nyska- pande kunst og utvik- lar- eller samfunnet
here's the common idiomatic sed approach:
$ sed 's/a/aA/g; s/- og/aB/g; s/- eller/aC/g; s/- //g; s/aC/- eller/g; s/aB/- og/g; s/aA/a/g' file
Kultur- og idrettsavdelinga skapar- eller nyskapande kunst og utviklar- eller samfunnet
The above works by turning all "a"s (or whatever other char you like that's not in your target strings) into "aA" so we can then turn the strings we're interested in, "- og" and "- eller", into "a<some other character>", e.g. "aB" and "aC". At that point we know the only occurrences of "aB" and "aC" in the input are the newly transformed "- og" and "- eller", since all of the existing "a"s are now "aA".
Now we can just remove all remaining "- "s from the file, then convert the "aC"s back to "- eller" and the "aB"s back to "- og", and finally turn all "aA"s back into the original "a"s.
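The same command can be split into its stages to watch the text change. This is only a sketch on a shortened input. The variable names step1 to step3 and result are made up for illustration.

```shell
# Shortened sample input (illustrative only).
input='skapar- eller nyska- pande'

# 1. Escape every "a" so that "aB" and "aC" cannot occur in the text.
step1=$(printf '%s\n' "$input" | sed 's/a/aA/g')
# -> skaApaAr- eller nyskaA- paAnde

# 2. Hide the strings to keep behind the now-unique markers.
step2=$(printf '%s\n' "$step1" | sed 's/- og/aB/g; s/- eller/aC/g')
# -> skaApaAraC nyskaA- paAnde

# 3. Delete every remaining hyphen followed by a space.
step3=$(printf '%s\n' "$step2" | sed 's/- //g')
# -> skaApaAraC nyskaApaAnde

# 4. Restore the hidden strings, then undo the escaping of "a".
result=$(printf '%s\n' "$step3" | sed 's/aC/- eller/g; s/aB/- og/g; s/aA/a/g')
printf '%s\n' "$result"
# -> skapar- eller nyskapande
```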
You can also use a sed chain: first replace "- og" with something nonsensical (like "booogabooga"), then perform the replacement, then reverse the "booogabooga".
sed -e 's/- og/booogabooga/g; s/- //g; s/booogabooga/- og/g'
Some versions of sed may need:
sed -e 's/- og/booogabooga/g' -e 's/- //g' -e 's/booogabooga/- og/g'
This can be slower and more painful, especially if you have multiple replacements as @Kusalananda suggests, but it is easier to understand.
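With more than one string to protect, the chain needs one placeholder per string. A minimal sketch, assuming the hypothetical placeholders @OG@ and @ELLER@ never occur in the input (pick others if they might):

```shell
# Shortened sample line (illustrative only).
input='Kultur- og idrettsavdelinga skapar- eller nyska- pande kunst'

# Hide both protected strings, delete the remaining "- ", then restore.
# @OG@ and @ELLER@ are made-up placeholders, not anything sed defines.
result=$(printf '%s\n' "$input" | sed \
  -e 's/- og/@OG@/g' \
  -e 's/- eller/@ELLER@/g' \
  -e 's/- //g' \
  -e 's/@OG@/- og/g' \
  -e 's/@ELLER@/- eller/g')
printf '%s\n' "$result"
# -> Kultur- og idrettsavdelinga skapar- eller nyskapande kunst
```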
The lookahead feature isn't available with sed, but you can describe all possibilities:
sed -e 's/\(- \(- \)*\)\([^o]\|$\|o\([^g]\|$\)\)/\3/g'
You can test it with: - - - - og - - oa - o
=> - og oa o
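The same pattern can be written as an extended regular expression, which avoids the \| alternation (a GNU extension in basic regular expressions). This is a sketch assuming a sed that accepts -E. The variable names re, out1 and out2 are only for illustration.

```shell
# Group 3 is whatever follows the run of "- ": any character other than
# "o", the end of the line, or an "o" not followed by "g". It is consumed
# by the match and put back by \3, so only the "- " run is dropped.
re='s/(- (- )*)([^o]|$|o([^g]|$))/\3/g'

out1=$(printf '%s\n' '- - - - og - - oa - o' | sed -E "$re")
printf '%s\n' "$out1"
# -> - og oa o

out2=$(printf '%s\n' 'nyska- pande kunst- og utvik- lar' | sed -E "$re")
printf '%s\n' "$out2"
# -> nyskapande kunst- og utviklar
```

As written, this only protects "- og". Protecting "- eller" as well would need further alternatives in group 3.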