Question
I'm working on a socket-based scanner (continuous stream) using Flex for pattern recognition. Flex doesn't find a match that overlaps 'array boundaries'. So I implemented yywrap() to set up new array content as soon as yylex() detects <<EOF>> (at which point it calls yywrap). No success so far.
Basically (to pinpoint my problem), this is my code:
%{
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define BUFFERSIZE 26
/* 0123456789012345678901234 */
char cbuf1[BUFFERSIZE] = "Hello everybody, lex is su"; // Warning, no '\0'
char cbuf2[BUFFERSIZE] = "per cool. Thanks! ";
char recvBuffer[BUFFERSIZE];
int packetCnt = 0;
YY_BUFFER_STATE bufferState1, bufferState2;
%}
%option nounput
%option noinput
%%
"super" { ECHO; }
. { printf( "%c", yytext[0] );}
%%
int yywrap()
{
    int retval = 1;
    printf(">> yywrap()\n");
    if( packetCnt <= 0 ) // Stop after 2
    {
        // Copy cbuf2 into recvBuffer
        memcpy(recvBuffer, cbuf2, BUFFERSIZE);
        //
        yyrestart(NULL); // ?? has no effect
        // Feed new data to flex
        bufferState2 = yy_scan_bytes(recvBuffer, BUFFERSIZE);
        //
        packetCnt++;
        // Tell flex to resume scanning
        retval = 0;
    }
    return(retval);
}
int main(void)
{
    printf("Lenght: %d\n", (int)sizeof(recvBuffer)) ;
    // Copy cbuf1 into recvBuffer
    memcpy(recvBuffer, cbuf1, BUFFERSIZE);
    //
    packetCnt = 0;
    //
    bufferState1 = yy_scan_bytes(recvBuffer, BUFFERSIZE);
    //
    yylex();
    yy_delete_buffer(bufferState1);
    yy_delete_buffer(bufferState2);
    return 0;
}
This is my output:
dkmbpro:test dkroeske$ ./text
Lenght: 26
Hello everybody, lex is su>> yywrap()
per cool. Thanks! >> yywrap()
So no match on 'super'. According to the docs, the lexer is not 'reset' between calls to yywrap. What am I missing? Thanks.
Answer 1:
The mechanism for providing a stream of input to flex is to provide a definition of the YY_INPUT macro, which is called every time flex needs to refill its buffer [note 1]. The macro is called with three arguments, roughly like this:
YY_INPUT(buffer, bytes_read, max_bytes)
The macro is expected to read up to max_bytes bytes into buffer, and to set bytes_read to the actual number of bytes read. Because YY_INPUT is a macro, bytes_read is an lvalue that is assigned directly, not a pointer. If there is no more input in this stream, YY_INPUT should set bytes_read to YY_NULL (which is 0). There is no way to flag an input error other than setting the end-of-file condition. Do not set bytes_read to a negative value.
Note that YY_INPUT does not provide an indication of where to read the input from or any sort of userdata argument. The only provided mechanism is the global yyin, which is a FILE*. (You could create a FILE* from a file/socket descriptor with fdopen and get the descriptor back with fileno. Other workarounds are beyond the scope of this answer.)
When the scanner encounters the end of a stream, as indicated by YY_INPUT returning 0, it finishes the current token [note 2], and then calls yywrap to decide whether there is another stream to process. As the manual indicates, it does not reset the scanner state (that is, which start condition it happens to be in, the current line number if line counting is enabled, etc.). However, it does not allow tokens to span two streams.
The yywrap mechanism is most commonly used when a parser/scanner is applied to a number of different files specified on the command line. In that use case, it would be a bit odd if a token could start in one file and continue into another one; most language implementations prefer their files to be somewhat self-contained. (Consider multi-line string literals, for example.) Normally, you actually want to reset more of the scanner state as well (the line number, certainly, and sometimes the start condition), but that is the responsibility of yywrap. [note 3]
For lexing from a socket, you'll probably want to call recv from your YY_INPUT implementation. But for experimentation purposes, here's a simple YY_INPUT which just returns data from a memory buffer:
/* Globals which describe the input buffer. */
const char* my_in_buffer = NULL;
const char* my_in_pointer = NULL;
const char* my_in_limit = NULL;

void my_set_buffer(const char* buffer, size_t buflen) {
    my_in_buffer = my_in_pointer = buffer;
    my_in_limit = my_in_buffer + buflen;
}

/* For debugging, limit the number of bytes YY_INPUT will
 * return.
 */
#define MY_MAXREAD 26

/* This is technically incorrect because it returns 0
 * on EOF, assuming that YY_NULL is 0.
 * flex passes ret as an lvalue (not a pointer), so it is
 * assigned directly. memcpy requires <string.h>.
 */
#define YY_INPUT(buf, ret, maxlen) do {                    \
    size_t avail = my_in_limit - my_in_pointer;            \
    size_t toread = (maxlen);                              \
    if (toread > avail) toread = avail;                    \
    if (toread > MY_MAXREAD) toread = MY_MAXREAD;          \
    (ret) = toread;                                        \
    memcpy((buf), my_in_pointer, toread);                  \
    my_in_pointer += toread;                               \
} while (0)
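To see the read contract in isolation, the same logic can be written as a plain function and exercised without flex. This is only a sketch: struct mem_input, mem_input_read and MAX_CHUNK are hypothetical names standing in for the globals and MY_MAXREAD above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the globals above, bundled in a struct. */
struct mem_input {
    const char *pos;    /* next unread byte */
    const char *limit;  /* one past the last byte */
};

#define MAX_CHUNK 26    /* mirrors MY_MAXREAD: cap on bytes per call */

/* Function form of the YY_INPUT contract: copy up to maxlen bytes into
 * buf, advance the read position, and return the number of bytes copied.
 * A return value of 0 means end of input, and stays 0 on later calls. */
size_t mem_input_read(struct mem_input *in, char *buf, size_t maxlen)
{
    size_t avail = (size_t)(in->limit - in->pos);
    size_t toread = maxlen;

    if (toread > avail) toread = avail;
    if (toread > MAX_CHUNK) toread = MAX_CHUNK;
    memcpy(buf, in->pos, toread);
    in->pos += toread;
    return toread;
}
```

Fed the 44-byte text from the question, successive calls return 26 bytes ("Hello everybody, lex is su"), then 18 bytes ("per cool. Thanks! "), then 0. Because both chunks arrive through the same refill path rather than through yywrap, the scanner sees one stream and can match "super" across the chunk boundary.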
Notes
1. This is not quite true; the buffer state includes a flag which indicates whether the buffer can be refilled. If you use yy_scan_bytes, the buffer state created is marked as non-refillable.
2. It's actually a bit more complicated than that, because flex scanners sometimes need to look ahead in order to decide which token has been matched, and the end-of-stream indication might occur during the lookahead. After the scanner backs up to the end of the recognized token, it still has to rescan the lookahead characters, which may contain several more tokens. To handle this, it sets a flag in the buffer state which indicates that end-of-stream has been reached, which prevents YY_INPUT from being called each time the scanner hits the end of the buffer. Despite this, it's probably a good idea to make sure that your YY_INPUT implementation will continue to return end-of-stream in case it is called again after an end-of-stream return.
3. For another concrete example, suppose you wanted to implement some kind of #include mechanism. flex provides the yypush_buffer_state/yypop_buffer_state mechanism which allows you to implement an include stack. You'd call yypush_buffer_state once the include directive has been scanned, but yypop_buffer_state needs to be called from yywrap. Again, very few languages would allow a token to start in the included source file and continue following the include directive.
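Note 2's advice (keep returning end-of-stream once it has been reported), combined with the earlier suggestion to call recv, could be sketched roughly as follows. The struct sock_reader and sock_reader_fill names are hypothetical, not flex API, and the sketch assumes a connected POSIX stream socket and treats any recv error as end-of-stream.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical reader state; these names are illustrative, not flex API. */
struct sock_reader {
    int fd;      /* connected stream socket */
    int at_eof;  /* set once end-of-stream (or an error) has been seen */
};

/* Read up to max_size bytes into buf. Returns the number of bytes read,
 * or 0 at end-of-stream. After the first 0, every later call returns 0
 * without touching the socket again. */
size_t sock_reader_fill(struct sock_reader *r, char *buf, size_t max_size)
{
    ssize_t n;

    if (r->at_eof)
        return 0;

    do {
        n = recv(r->fd, buf, max_size, 0);
    } while (n < 0 && errno == EINTR);  /* retry if interrupted by a signal */

    if (n <= 0) {       /* 0: peer closed the connection; <0: error */
        r->at_eof = 1;
        return 0;
    }
    return (size_t)n;
}
```

In a scanner definition this might be hooked up with something like `#define YY_INPUT(buf, result, max_size) ((result) = sock_reader_fill(&reader, (buf), (max_size)))`, where reader is an assumed global. With flags set to 0, recv may return fewer bytes than requested; that is fine here, since flex simply asks again when it needs more.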
Answer 2:
Thanks to rici, the answer is to redefine the YY_INPUT macro. So I did:
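#undef YY_INPUT
#define YY_INPUT(buf, result, max_size) inputToFlex(buf, &result, max_size)
....
void inputToFlex(char *buf, unsigned long int *result, size_t max_size)
{
    /* Never hand flex more bytes than it asked for. */
    size_t want = max_size < RECVBUFFERSIZE ? max_size : RECVBUFFERSIZE;
    /* recv returns the byte count, 0 when the peer closes, -1 on error. */
    ssize_t n = recv(psock, recvBuffer, want, MSG_WAITALL);
    if( n > 0 )
    {
        /* Copy only what was actually received. */
        memcpy(buf, recvBuffer, (size_t)n);
        *result = (unsigned long int)n;
    }
    else
    {
        /* Closed connection or error: report end of stream. */
        *result = YY_NULL;
    }
}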
This works perfectly: yywrap() is called when the socket is closed (by the client). Note that I'm using MSG_WAITALL instead of the more common 0.
Also see rici's note 2. If your scanner needs to look ahead, my solution is not sufficient and you need to implement '1-character overlapping buffer management'.
Thank you, flex. (It also works very nicely for binary streams.)
Source: https://stackoverflow.com/questions/23979378/flex-continuous-scanning-stream-from-socket-did-i-miss-something-using-yywra