Question
I'm writing a regular expression that can interactively validate SMTP response codes. Once the SMTP dialog is complete, the concatenated codes should match the following regex (some parentheses added for readability):
^(220)(250){3,}(354)(250)(221)$
Or, with optional authentication:
^(220)(250)((334){2}(235))?(250){2,}(354)(250)(221)$
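As a quick sanity check, the two complete-dialog patterns can be run through preg_match() on concatenated codes. The sample dialogs below are assumptions chosen for illustration (one recipient, with and without AUTH LOGIN).

```php
<?php
// The two complete-dialog patterns from the question.
$plain = '~^(220)(250){3,}(354)(250)(221)$~';
$auth  = '~^(220)(250)((334){2}(235))?(250){2,}(354)(250)(221)$~';

// Assumed full dialogs: greeting, HELO, [AUTH LOGIN, user, pass],
// MAIL FROM, one RCPT TO, DATA, end of data, QUIT.
$withoutAuth = '220' . '250' . '250' . '250' . '354' . '250' . '221';
$withAuth    = '220' . '250' . '334' . '334' . '235' . '250' . '250' . '354' . '250' . '221';

var_dump(preg_match($plain, $withoutAuth)); // int(1)
var_dump(preg_match($auth, $withoutAuth));  // int(1): the AUTH block is optional
var_dump(preg_match($auth, $withAuth));     // int(1)
var_dump(preg_match($plain, $withAuth));    // int(0): 334/235 are not allowed here
```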
I'm trying to rewrite the above regexes so that I can check interactively whether the dialog is going as expected and, if it isn't, politely send a QUIT command and close the connection, saving bandwidth and time. However, I'm having a hard time writing an optimal regex. So far I've managed to come up with:
^(220(250(334(235(250(354(250(221)?)?)?){0,})?){0,2})?)?$
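Besides only matching authenticated connections, this has some bugs. For instance, it matches both of the following, even though the first contains only one 334 and the second only one 250 after the 235: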
220250334235250354250221
220250334334235250354250221
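A quick check with preg_match() confirms that the first attempt accepts both strings above. The variable names are illustrative only.

```php
<?php
// First attempt from the question (authenticated dialogs only).
$attempt1 = '~^(220(250(334(235(250(354(250(221)?)?)?){0,})?){0,2})?)?$~';

// Both dialogs are invalid, yet both match.
$oneAuthPrompt = '220250334235250354250221';    // only one 334
$oneMailReply  = '220250334334235250354250221'; // only one 250 after 235

var_dump(preg_match($attempt1, $oneAuthPrompt)); // int(1)
var_dump(preg_match($attempt1, $oneMailReply));  // int(1)
```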
I've also tried the following modification:
^(220(250)?)?((334(235)?){2})?(250(354(250(221)?)?)?){0,}$
This one accepts non-authenticated responses, but it fails to match 220250334 and wrongly matches 220250334334235250354250221 (at least two 250 codes are needed before the 354 response code).
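The same kind of check for the second attempt shows both problems, and also that a complete dialog without authentication is accepted. The no-auth sample dialog (one recipient) is an assumed example.

```php
<?php
// Second attempt from the question.
$attempt2 = '~^(220(250)?)?((334(235)?){2})?(250(354(250(221)?)?)?){0,}$~';

var_dump(preg_match($attempt2, '220250250250354250221'));       // int(1): no-auth dialog accepted
var_dump(preg_match($attempt2, '220250334'));                   // int(0): valid prefix rejected
var_dump(preg_match($attempt2, '220250334334235250354250221')); // int(1): one 250 before 354 accepted
```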
Can someone help me out with this? Thanks in advance.
An example of what I'm trying to do:
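$smtp = fsockopen('mail.example.com', 25);
$result = ''; // concatenated 3-digit reply codes received so far
// Placeholder commands; each is sent only after the previous reply checks out.
$commands = array('HELO', 'AUTH LOGIN', 'user', 'pass', 'MAIL FROM', 'RCPT TO', 'RCPT TO', 'DATA', "\r\n.", 'QUIT');
foreach ($commands as $command)
{
    // Append the code of the reply to the previous command (or the greeting).
    $result .= substr(fgets($smtp), 0, 3);
    // Is the dialog so far still a valid prefix of an expected one?
    if (preg_match('~^(220(250)?)?((334){1,2}(235)?)?(250(354(250(221)?)?)?){0,}$~S', $result) > 0)
    {
        fwrite($smtp, $command . "\r\n");
    }
    else
    {
        // Unexpected code: give up early.
        fwrite($smtp, "QUIT\r\n");
        fclose($smtp);
        break;
    }
}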
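To experiment with the loop without a live server, the fgets() calls can be replaced by a list of canned reply codes. The sketch below assumes a hypothetical helper, simulate_dialog(), which is not part of the question: it replays the same accumulate-and-check logic with the pattern from the preg_match() call above and returns the commands that would have been written to the socket. The 535 reply in the usage example is an assumed rejection of the password.

```php
<?php
/**
 * Hypothetical offline harness (not from the question): replays the
 * accumulate-and-check loop, taking reply codes from $responses instead
 * of fgets(). Returns the commands the loop would have written.
 */
function simulate_dialog(array $commands, array $responses, string $pattern): array
{
    $result = '';
    $sent = [];
    foreach ($commands as $i => $command) {
        if (!isset($responses[$i])) {
            break; // no canned reply left for this step: stop the simulation
        }
        $result .= $responses[$i];
        if (preg_match($pattern, $result) > 0) {
            $sent[] = $command; // dialog still looks valid: send the next command
        } else {
            $sent[] = 'QUIT';   // unexpected code: the real loop sends QUIT and closes
            break;
        }
    }
    return $sent;
}

$pattern  = '~^(220(250)?)?((334){1,2}(235)?)?(250(354(250(221)?)?)?){0,}$~S';
$commands = ['HELO', 'AUTH LOGIN', 'user', 'pass', 'MAIL FROM', 'RCPT TO', 'RCPT TO', 'DATA', "\r\n.", 'QUIT'];

// Assumed scenario: the server rejects the password with 535 instead of 235.
print_r(simulate_dialog($commands, ['220', '250', '334', '334', '535'], $pattern));
// Sends HELO, AUTH LOGIN, user and pass, then QUIT.
```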
This should act as a replacement for the following procedural code:
$smtp = fsockopen('mail.example.com', 25);
$result = substr(fgets($smtp), 0, 3); // 220
if ($result == '220')
{
    fwrite($smtp, 'HELO' . "\r\n");
    $result = substr(fgets($smtp), 0, 3); // 250
    if ($result == '250')
    {
        fwrite($smtp, 'AUTH LOGIN' . "\r\n");
        $result = substr(fgets($smtp), 0, 3); // 334
        if ($result == '334')
        {
            fwrite($smtp, 'user' . "\r\n");
            $result = substr(fgets($smtp), 0, 3); // 334
            if ($result == '334')
            {
                fwrite($smtp, 'pass' . "\r\n");
                $result = substr(fgets($smtp), 0, 3); // 235
                if ($result == '235')
                {
                    fwrite($smtp, 'MAIL FROM' . "\r\n");
                    $result = substr(fgets($smtp), 0, 3); // 250
                    if ($result == '250')
                    {
                        // $to (the recipient list) is not defined in this snippet
                        foreach ($to as $mail)
                        {
                            fwrite($smtp, 'RCPT TO' . "\r\n");
                            $result = substr(fgets($smtp), 0, 3); // 250
                            if ($result != '250')
                            {
                                fwrite($smtp, 'QUIT' . "\r\n");
                                $result = substr(fgets($smtp), 0, 3); // 221
                                fclose($smtp);
                                break;
                            }
                        }
                        if ($result == '250')
                        {
                            fwrite($smtp, 'DATA' . "\r\n");
                            $result = substr(fgets($smtp), 0, 3); // 354
                            if ($result == '354')
                            {
                                fwrite($smtp, "\r\n.\r\n");
                                $result = substr(fgets($smtp), 0, 3); // 250
                                if ($result == '250')
                                {
                                    fwrite($smtp, 'QUIT' . "\r\n");
                                    $result = substr(fgets($smtp), 0, 3); // 221
                                    fclose($smtp);
                                    if ($result == '221')
                                    {
                                        echo 'SUCCESS!';
                                    }
                                }
                                else
                                {
                                    fwrite($smtp, 'QUIT' . "\r\n");
                                    $result = substr(fgets($smtp), 0, 3); // 221
                                    fclose($smtp);
                                }
                            }
                            else
                            {
                                fwrite($smtp, 'QUIT' . "\r\n");
                                $result = substr(fgets($smtp), 0, 3); // 221
                                fclose($smtp);
                            }
                        }
                    }
                    else
                    {
                        fwrite($smtp, 'QUIT' . "\r\n");
                        $result = substr(fgets($smtp), 0, 3); // 221
                        fclose($smtp);
                    }
                }
                else
                {
                    fwrite($smtp, 'QUIT' . "\r\n");
                    $result = substr(fgets($smtp), 0, 3); // 221
                    fclose($smtp);
                }
            }
            else
            {
                fwrite($smtp, 'QUIT' . "\r\n");
                $result = substr(fgets($smtp), 0, 3); // 221
                fclose($smtp);
            }
        }
        else
        {
            fwrite($smtp, 'QUIT' . "\r\n");
            $result = substr(fgets($smtp), 0, 3); // 221
            fclose($smtp);
        }
    }
    else
    {
        fwrite($smtp, 'QUIT' . "\r\n");
        $result = substr(fgets($smtp), 0, 3); // 221
        fclose($smtp);
    }
}
else
{
    fwrite($smtp, 'QUIT' . "\r\n");
    $result = substr(fgets($smtp), 0, 3); // 221
    fclose($smtp);
}
Answer 1:
I presume you're building a string with all the response codes you receive, stripping out the rest of the message?
This is probably not the answer you want, but I can't help feeling that regex is just not the right tool here. Regular expressions are good at parsing text into tokens or extracting interesting substrings from a larger string. But you already have tokens (SMTP response codes), and you're trying to ensure that they arrive in the expected order. I'd just add the response codes to a queue and, after every addition, check whether the start of the queue matches one of the expected patterns for the state you're in. If it does, remove that part from the queue and go to the next state. There are only a few states, so I'd just write code specific to those rather than try to abstract it into some kind of state machine.
If you do go the regex route, you might want to keep spaces in the string as separators: that would make the codes easier to match and the program easier to read.
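A minimal sketch of that suggestion, assuming the codes are collected into an array and joined with spaces via implode(). The example dialog is made up, and the pattern follows the structure of the question's authenticated/unauthenticated regex, with one token per code.

```php
<?php
// Assumed example: reply codes collected one per server reply.
$codes  = ['220', '250', '334', '334', '235', '250', '250', '354', '250', '221'];
$dialog = implode(' ', $codes); // "220 250 334 334 235 250 250 354 250 221"

// One token per code; same structure as the question's second regex.
$complete = '~^220 250( 334 334 235)?( 250){2,} 354 250 221$~';

var_dump(preg_match($complete, $dialog));                               // int(1)
var_dump(preg_match($complete, '220 250 334 235 250 250 354 250 221')); // int(0): only one 334
```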
Edit: Thanks for posting the code. It's pretty much what I assumed. You're basically trying to create an abstract solution to this problem, so that you can send a given array of commands and expect back a given pattern of responses. You really don't need to make it abstract; the added complexity is huge and unlikely to pay off in reuse. Just write code that says: send X; if you receive Y, continue, otherwise QUIT. It will be much easier to write and to read.
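A sketch of the straightforward approach this answer recommends, reusing the question's placeholder commands ('HELO', 'user', 'pass' and so on, which a real client would fill in). step() and send_mail() are hypothetical names: each step writes one command, reads one reply line, and sends QUIT if the code is not the expected one. Multi-line replies (250- continuation lines) are not handled, just as in the question's code.

```php
<?php
/**
 * Hypothetical helper: send $command (if any), read one reply line and
 * compare its 3-digit code with $expected. On a mismatch, send QUIT and
 * close the connection.
 */
function step($smtp, ?string $command, string $expected): bool
{
    if ($command !== null) {
        fwrite($smtp, $command . "\r\n");
    }
    $line = fgets($smtp);
    if ($line !== false && substr($line, 0, 3) === $expected) {
        return true;
    }
    fwrite($smtp, "QUIT\r\n");
    fclose($smtp);
    return false;
}

/** Hypothetical linear version of the question's nested procedural code. */
function send_mail($smtp, array $to): bool
{
    if (!step($smtp, null, '220')            // server greeting
        || !step($smtp, 'HELO', '250')
        || !step($smtp, 'AUTH LOGIN', '334')
        || !step($smtp, 'user', '334')
        || !step($smtp, 'pass', '235')
        || !step($smtp, 'MAIL FROM', '250')) {
        return false;                        // step() already sent QUIT
    }
    foreach ($to as $recipient) {
        // Placeholder as in the question; a real command would include $recipient.
        if (!step($smtp, 'RCPT TO', '250')) {
            return false;
        }
    }
    if (!step($smtp, 'DATA', '354') || !step($smtp, "\r\n.", '250')) {
        return false;
    }
    fwrite($smtp, "QUIT\r\n");
    fgets($smtp);                            // 221 expected; the message was already accepted
    fclose($smtp);
    return true;
}
```

A real caller would pass the stream returned by fsockopen('mail.example.com', 25) and the recipient list.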
Answer 2:
It's amazing how much easier regular expressions become after a good night's sleep. Here it is:
(?>220(?>250(?>(?>334){1,2}(?>235)?)?(?>(?>250){1,}(?>354(?>250(?>221)?)?)?)?)?)?
Which can be simplified to this:
^220(?>250(?>(?>334){1,2}(?>235)?)?(?>(?>250){1,}(?>354(?>250)?)?)?)?$
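One way to check the simplified pattern is to feed it every prefix of a dialog, as the interactive loop would. The reply sequence is an assumed example (authenticated, one recipient, stopping before QUIT since its 221 is no longer matched), and the 550 case stands for a rejected recipient.

```php
<?php
// Simplified prefix pattern from this answer.
$prefix = '~^220(?>250(?>(?>334){1,2}(?>235)?)?(?>(?>250){1,}(?>354(?>250)?)?)?)?$~';

// Assumed replies before QUIT: greeting, HELO, AUTH LOGIN, user, pass,
// MAIL FROM, RCPT TO, DATA, end of data.
$codes = ['220', '250', '334', '334', '235', '250', '250', '354', '250'];

// Each growing prefix is tested, just as the loop does after every reply.
$seen = '';
foreach ($codes as $code) {
    $seen .= $code;
    printf("%-27s %d\n", $seen, preg_match($prefix, $seen));
}

// A rejected recipient (550 after RCPT TO) is not a valid prefix.
var_dump(preg_match($prefix, '220250250550')); // int(0)
```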
This works because the first response code (220) is not optional, and we will always send the final QUIT command ourselves, so its 221 reply does not need to be matched.
Source: https://stackoverflow.com/questions/2917998/regex-to-validate-smtp-responses