Find and replace URLs in postfix files - Linux/Ubuntu

[亡魂溺海] submitted on 2019-12-24 17:43:24

Question


I want to monitor a specific folder. Every new file in this folder should be scanned for URLs, and any URL whose domain is not on a defined whitelist should be rewritten.

Example:

blabla http://www.black.com/green/yellow.html blabla
sdfsdfsdfsdf http://www.white.com/red.html

Whitelist:

http://www.white.com

Result:

blabla httx://www.black.com/green/yellow.html blabla
sdfsdfsdfsdf http://www.white.com/red.html

What I have tried so far is iwatch with this XML:

<?xml version="1.0" ?>
<!DOCTYPE config SYSTEM "/etc/iwatch/iwatch.dtd" >
<config>
  <guard email="root@localhost" name="IWatch"/>
  <watchlist>
    <title>URL_Filter</title>
    <contactpoint email="admin@test.com" name="Administrator"/>
    <path type="single" syslog="on" alert="off" events="create" exec="sed -i 's/http/httx/g' %f">/var/test</path>
  </watchlist>
</config>

So with iwatch I can watch the folder "/var/test" for new files, and with the sed command I can replace every "http" with "httx". But I have no idea how to add a whitelist so that some URLs are not replaced...

--- edit --- Additional information: I want to edit all incoming Postfix mails so that they contain no clickable links, except for links to whitelisted domains. The goal is to protect against phishing mails.

Return-Path: <example@gmail.com>
X-Original-To: example@test.de
Delivered-To: example@test.de
Received: from mail-lf0-x236.google.com (mail-lf0-x236.google.com [IPv6:2a00:1450:4010:c07::236])
        by xxxxxxx.hosteurope.de (Postfix) with ESMTPS id D255223CB59
        for <example@test.de>; Mon, 11 Apr 2016 14:44:10 +0200 (CEST)
Received: by mail-lf0-x236.google.com with SMTP id c126so154788483lfb.2
        for <example@test.de>; Mon, 11 Apr 2016 05:39:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20120113;
        h=mime-version:date:message-id:subject:from:to;
        bh=WwH+NIkCWDEoIkwbeCI4pf0jP0ya/ctbQ81pUsA4G7s=;
        b=ZS3Uo/cpVGNw3k38Js2+/DxVda0y2136oy4D4hsR0G25x2UjhyVU/yUcPl6qEdxt8i
         CQXZHQbaf8pzCdDaSq4VL9RC/sIgZy3PQzj6Cyrp3WTi6SMmQ65NwNBWLVGnpPcuzNW1
         IGC5N3rjj96ndYUAxia/tTcBX7ajS3Tw9Mc8yIaO13hSXMUCrTDIFZNzHR1ib7tLDpmX
         6EVyFhquhIfJVOhcuPgWUUxHly/FmZ++ucoHR0Yozj+dc1GJ6/ZYzUAPdGICelDY7ieG
         nvA7KH6+v6/zoWlbfkO9BmGzAPs6M4LGHilOjpMf/09Z2oMiV/WRDxe0WrCebQptpm2c
         xHPg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20130820;
        h=x-gm-message-state:mime-version:date:message-id:subject:from:to;
        bh=WwH+NIkCWDEoIkwbeCI4pf0jP0ya/ctbQ81pUsA4G7s=;
        b=hAOSzKjertcsQIT/PHoZKsiKxLba8gaKOCmyNg7nmiPJjCWqobNvM5nf3sZP1Xhysi
         gGdvk9mmMugII8dsjc7mRhDkbCT1QKVz/0UBQ+CaP6sK7kGdWfdarphGgzUGA6Il5JZi
         lP4DpEQHUpG1wJ1r+dN2f+UT8tyfIwapXwo3g7FnkPLxmCq9CeqJeRlagL6vAacon8z7
         CjdTHB7fzEtYToSp+cDi3+yK4zS9p4rwF4H4Ds3bJqwM/PrcFJW0YYncDHdra5TwYf6U
         K6VRX19iUhQT4kTVFCtoNW9SU8Ri+Rc5VfvVTKRh4KwZ2uW5x8y07ucB0vZcAQdEnms4
         AWnQ==
X-Gm-Message-State: AD7BkJJEDmk9P+Kzcn1MT4lQxpU1aYU6x8uABSpohCbT7EeOFAXjT1y6n3sFcRj7tcfWc6eBAOL6bJ78jvVOlQ==
MIME-Version: 1.0
X-Received: by 10.112.63.196 with SMTP id i4mr8426739lbs.93.1460378359811;
 Mon, 11 Apr 2016 05:39:19 -0700 (PDT)
Received: by 10.114.66.51 with HTTP; Mon, 11 Apr 2016 05:39:19 -0700 (PDT)
Date: Mon, 11 Apr 2016 14:39:19 +0200
Message-ID: <CADF5gVU+C4BZCSFSiWeiBipBnDu5jTU+FVmLJbSQSbtMM9JZcQ@mail.gmail.com>
Subject: test
From: Example <example@gmail.com>
To: example@test.de
Content-Type: multipart/alternative; boundary=001a1133d4405fd878053034d55a
X-Scanned-By: MIMEDefang 2.71 on 5.38.258.144

--001a1133d4405fd878053034d55a
Content-Type: text/plain; charset=UTF-8

http://www.example.com
http://www.white.com

--001a1133d4405fd878053034d55a
Content-Type: text/html; charset=UTF-8

<div dir="ltr"><div><a href="http://www.example.com">http://www.example.com</a><br></div><a href="http://www.white.com">http://www.white.com</a><br></div>

--001a1133d4405fd878053034d55a--

Answer 1:


I just realized that the bash script is unnecessary. The following one-liner does the job, although it is really cryptic to read:

Input data:

$ cat data
sdfsdfsdfsdf http://www.whitedomain.com/red.html
bla http://www.black.com/green/yellow.html blabla
sdfsdfsdfsdf http://www.white.com/red.html
$ cat whitelist 
http://www.white.com
http://www.whitedomain.com
$

Final Output:

$ sed -r '/'"$(sed -r 's/\\/\\\\/g;s/\//\\\//g;s/\^/\\^/g;s/\[/\\[/g;s/'\''/'\'"\\\\"\'\''/g;s/\]/\\]/g;s/\*/\\*/g;s/\$/\\$/g;s/\./\\./g' whitelist | paste -s -d '|')"'/! s/http/httx/g' data
sdfsdfsdfsdf http://www.whitedomain.com/red.html
bla httx://www.black.com/green/yellow.html blabla
sdfsdfsdfsdf http://www.white.com/red.html
$

Explanation:

The inner command substitution outputs a regex, which the outer sed command uses to skip whitelisted lines during substitution:

$ sed -r 's/\\/\\\\/g;s/\//\\\//g;s/\^/\\^/g;s/\[/\\[/g;s/'\''/'\'"\\\\"\'\''/g;s/\]/\\]/g;s/\*/\\*/g;s/\$/\\$/g;s/\./\\./g' whitelist | paste -s -d '|'
http:\/\/www\.white\.com|http:\/\/www\.whitedomain\.com

Flow:

  1. Build the regex dynamically: the inner sed escapes the regex metacharacters in each whitelist entry, and paste joins the entries with | to form an alternation.
  2. Use that regex as an address in the outer sed command: lines that match no whitelist entry are selected, and on those lines http is replaced with httx.
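The two steps above can also be written as small shell functions, which may be easier to read and maintain than the one-liner. This is only a sketch of the same per-line technique: the names whitelist_regex and defang_lines are made up for illustration, the escape list is extended to cover the extended-regex metacharacters, and sed -E is assumed to be available (GNU and BSD sed both accept it).

```shell
#!/bin/sh
# Sketch only: whitelist_regex and defang_lines are hypothetical helper names.

# Print the whitelist (one URL prefix per line) as a single extended regex:
# blank lines are dropped, every metacharacter is backslash-escaped,
# and the lines are then joined with '|'.
whitelist_regex() {
    # '/' is escaped too, because it delimits the sed address in defang_lines.
    sed -e '/^$/d' -e 's,[][\\/.^$*+?(){}|],\\&,g' "$1" | paste -s -d '|' -
}

# Rewrite http -> httx on every line that matches no whitelist entry.
defang_lines() {
    regex=$(whitelist_regex "$1")
    if [ -z "$regex" ]; then
        # Empty whitelist: no line is exempt.
        sed -e 's/http/httx/g' "$2"
    else
        sed -E -e "/$regex/! s/http/httx/g" "$2"
    fi
}

# Usage: defang_lines whitelist data
```

Like the one-liner, this exempts whole lines: a line that contains one whitelisted URL keeps all of its other URLs unchanged as well.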

Edit1: Since sed is line-oriented, it exempts a whole line as soon as that line contains one whitelisted URL. You therefore have to split the data so that each URL sits on its own line, like this:

$ cat data1 
<div dir="ltr"><div><a href="http://www.white.com">http://www.white.com</a><br></div><a href="http://www.example.com">http://www.example.com</a><br></div>
$ cat whitelist 
http://www.white.com
http://www.whitedomain.com
$ sed 's/</\n</g' data1 | sed -r '/'"$(sed -r 's/\\/\\\\/g;s/\//\\\//g;s/\^/\\^/g;s/\[/\\[/g;s/'\''/'\'"\\\\"\'\''/g;s/\]/\\]/g;s/\*/\\*/g;s/\$/\\$/g;s/\./\\./g' whitelist | paste -s -d '|')"'/! s/http/httx/g'

<div dir="ltr">
<div>
<a href="http://www.white.com">http://www.white.com
</a>
<br>
</div>
<a href="httx://www.example.com">httx://www.example.com
</a>
<br>
</div>
$
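If inserting line breaks into the mail is not acceptable, one possible alternative is to work per URL instead of per line: first defang every scheme, then turn the whitelisted prefixes back. The sketch below is not part of the original answer. The function name defang_urls is hypothetical, https:// becomes httxs:// as a side effect of the same rule, and whitelist entries are treated as plain prefixes, so http://www.white.com would also let http://www.white.com.evil.example through.

```shell
#!/bin/sh
# Sketch only: defang_urls is a hypothetical helper name.
# Usage: defang_urls whitelist file

defang_urls() {
    whitelist=$1
    file=$2

    # Step 1: rewrite every scheme, http:// -> httx:// and https:// -> httxs://.
    script='s#http\(s\{0,1\}\)://#httx\1://#g'

    # Step 2: append one command per whitelist entry that turns it back.
    while IFS= read -r entry || [ -n "$entry" ]; do
        [ -n "$entry" ] || continue
        # Defanged form of the entry, escaped for use as a basic regex.
        pat=$(printf '%s\n' "$entry" | sed -e 's,^http,httx,' -e 's,[][\\.^$*#],\\&,g')
        # Original entry, escaped for use as replacement text.
        rep=$(printf '%s\n' "$entry" | sed -e 's,[\\&#],\\&,g')
        script="$script
s#$pat#$rep#g"
    done < "$whitelist"

    # Print the rewritten file on standard output; the input is not modified.
    sed -e "$script" "$file"
}
```

Because the result goes to standard output, the HTML stays on one line, and a line that mixes whitelisted and other URLs is handled URL by URL.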



Answer 2:


You can use Perl to do that. I recommend installing the Regexp::Common package from CPAN and using Regexp::Common::URI to find the URIs, then maintain a whitelist of host names and check those. It's a bit long for a one-liner though.

use strict;
use warnings;
use Regexp::Common qw /URI/;

my %whitelist = (
    'http://www.white.com' => 1,
    'http://www.example.org' => 1,
);

while (my $line = <>) {
    MATCH: foreach my $match ($line =~ /($RE{URI}{HTTP})/g ){
        # check the whitelist
        next MATCH if grep { $match =~ /^\Q$_\E/i } keys %whitelist;

        # no whitelist entry, replace
        my $match_updated = $match;
        $match_updated =~ s/^http/httx/;
        $line =~ s/\Q$match\E/$match_updated/;
    }
    print $line;
}

Save that under a meaningful name, such as remove_phishing_links.pl, in a directory that iwatch can access. I'm using ~ here, but I have not tested whether that works. You can then call the script from your iwatch file with something like this:

<path 
  type="single" 
  syslog="on" 
  alert="off" 
  events="create" 
  exec="perl -i ~/remove_phishing_links.pl %f">/var/test</path>

It will, just like the sed command, edit the file in %f in place. It reads line by line, finds http URIs, checks if they start with any of the whitelist entries, and if not, replaces the http with httx.

Note that this will not work for base64 encoded MIME emails, or if there are line breaks within the URIs.

If you don't want to install Regexp::Common, you can also borrow the regular expression for URIs from the URI module documentation on CPAN and alter it to only find https?.
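Before rewriting real mail, it can help to check what a URL pattern actually picks up. The sketch below lists the URLs in a file and filters out the ones that contain a whitelist entry. It assumes a grep that supports -o (GNU and BSD grep do), uses a deliberately simplified pattern rather than the expression from the URI module documentation, and the function names list_urls and unlisted_urls are hypothetical.

```shell
#!/bin/sh
# Sketch only: list_urls and unlisted_urls are hypothetical helper names.

# list_urls FILE: print each distinct http/https URL found in FILE.
# A URL is taken to end at whitespace, a double quote, '<' or '>'.
list_urls() {
    grep -oE 'https?://[^[:space:]"<>]+' "$1" | sort -u
}

# unlisted_urls WHITELIST FILE: print the URLs that contain no whitelist
# entry as a fixed substring (-F), i.e. the ones that would be rewritten.
unlisted_urls() {
    list_urls "$2" | grep -v -F -f "$1"
}

# Usage: unlisted_urls whitelist mailfile
```

This only reports URLs and changes nothing, so it can be run against a copy of a mail to sanity-check the whitelist.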



Source: https://stackoverflow.com/questions/36543377/find-and-replace-urls-in-postfix-files-linux-ubuntu
