how to implement lexical analyser and parser in Haskell

前端 未结 1 1349
攒了一身酷
攒了一身酷 2021-01-31 06:59

I got this piece of code here, it is a program written in an imperative programming language structured in Haskell, so the question is \"how can I implement a lexer analyser and

相关标签:
1条回答
  • 2021-01-31 07:30

    Getting Started Parsing with Parsec

    I'll not write the whole thing for you, but I'll get you started with each bit. The three stages we'll go through are:

    1. Define the grammar
    2. Make an Abstract Syntax Tree. (This will look like the grammar, so will be quite easy.)
    3. Write a parser. (This will also look like the grammar, so will be quite easy.)

    We could make a separate lexing stage between 2 and 3, but Parsec is happy to do both levels. Lexing is where you split the input into tokens - bits of the input that make sense - the equivalent of words in human language, which are also known as lexemes. Skipping a separate lexing phase means we need to be a little more explicit about whitespace etc.

    1. Grammar

    First you need to define the grammar. Best done with paper and pencil, but I'll get you started:

      program ::= line {[newline line]}
      line ::= num dot whitespace statement
      statement ::= declaration | write | ifstatement | goto | assignment | stop
      declaration = "Int" whitespace varname equals int
      varname = letter {[alphanum]}
      -- more things here, including the more interesting ifstatement:
      ifstatement = "if" whitespace expression whitespace equals expression whitespace statement
    
    
      -- lower level stuff:
      dot = "."
      int = digit {[digit]}
      digit = "0" | "1" | ...  | "9"
      whitespace = " " okspace | "\t" okspace
      okspace = "" | whitespace
    

    Have a think about how that matches your sample program, and think about how you'll finish it off:

    1. Int n=5
    2. write n
    3. Int fac=1
    4. if 0 n goto 8          -- unusual
    5. fac := fac * n
    6. n := n+1               -- complex
    7. goto 4
    8. write fac
    9. stop
    

    The if statement in line 4 is unusual because there's no = or == in it. Perhaps that's to simplify the grammar and it can only accept single variables or integers, with a space between. Perhaps that's a typo and you mean to have an equals sign and arbitrary expressions. Find out which and rewrite the ifstatement part of the grammar.

    The assignment in line 6 is complex, because here you have to parse arbitrary arithmetic expressions. As far as I recall there are loads of examples of that knocking about, so I'll gladly skip that for now. Make it another question if you get stuck with that bit, but hopefully you'll have built up your parsing skills with the rest of it first.

    2. Abstract Syntax Tree (AST)

    An Abstract Syntax Tree represents the combination of tokens that make up your input. In Haskell we can define our own to suit the context, which will make life much simpler

    I'm actually compiling this answer (a good way of checking for typos etc), which is why I need some declarations at the top of the code:

    module ParseLang where
    
    import Text.Parsec hiding (Line)
    import Text.Parsec.String
    import Control.Applicative hiding ((<|>), many)
    

    We'll just make a Program a list of Lines, but enforce with the parser that there has to be at least one.

    type Program = [Line]
    

    For a Line, it needs a number, and a statement, but the dot is just syntax we don't need to store. We can store the line number as an Int, so although that allows negative numbers in the type declaration, again the parser won't accept negative.

    data Line = Line {aNum::Int, aStatement :: Statement} 
       deriving Show
    

    Multiple options are easy to define:

    data Statement = Declaration VarName Int
                   | Write VarName
                   | IfStatement Expression Expression Statement
                   | Goto LineNumber
                   | Assignment VarName Expression
                   | Stop
       deriving Show
    

    Notice the absence of all the syntax cruft/connectives/equal signs leaving just the bits that change.

    I'm stopping there - you can finish:

    data Expression = Expression -- I've left this one for you
       deriving Show
    
    type VarName = String   -- could use newtype for type safety for these to
    type LineNumber = Int   
    

    The lower level syntax doesn't need representing in the AST because we'll use strings for it.

    3. The Parser

    This bit is now nice and easy. Let's start at the bottom of the syntax tree and work up.

    num :: Parser Int
    num = read <$> many digit
    

    We've used <$>, which is a synonym for fmap that we got by importing Control.Applicative. Here it alters the values returned by the parser using the pure function on the left, in this case, read. Have a look at this other answer for an introduction to fmap if you're not used to it.

    Let's make a parser that parses a string literal then some whitespace:

    whitespace = space >> spaces  -- a space then optional spaces
    
    lit :: String -> Parser String
    lit xs = string xs <* whitespace
    

    Now that <* is interesting. It looks like <*>, which in effect combines two parsers, and is used in conjunction with <$> which in effect maps a pure function onto the result. *> and <* combine two parsers, but ignore the output of one of them, so string "goto" <* whitespace parses a "goto" and some whitespace, but throws away the whitespace.

    Now we're ready to parse a goto statement:

    goto :: Parser Statement
    goto = Goto <$> (lit "goto" *> num)
    

    Now let's have a go at varName

    varName :: Parser VarName
    varName = (:) <$> letter <*> many (alphaNum <|> oneOf "'_")
    

    A couple of things are going on there.

    1. <|> is alternative choice - one or the other, so (alphaNum <|> oneOf "'_") accepts an alphanumeric character or one of that pair of innocent characters ' and _ you might want to include in a variable name.

    2. f <$> parser1 <*> parser2 is a really nice way of combining parsers. It runs parser1, then parser2, then maps the function f over the results f that they produced. It works with many parsers:

    --ifstatement = "if" whitespace expression whitespace equals whitespace expression whitespace statement
    ifStatement :: Parser Statement
    ifstatement = IfStatement 
              <$> (lit "if" >> expression) 
              <*> (lit "=" >> expression)    -- I put = in, but see below
              <*> (whitespace >> statement)  -- I'd be happier with a then in here
    

    If you only allow a VarName or Int instead of a general Expression, then you don't need the equals sign.

    Here's how you put it together:

    statement :: Parser Statement
    statement = goto
            <|> stop
            <|> declaration 
            <|> write
            <|> ifStatement
            <|> assignment
    
    
    --program ::= line {[newline line]}
    program :: Parser Program
    program = many1 line
    
    
    --line ::= num dot whitespace statement
    line :: Parser Line
    line = Line <$> (num <* char '.') <*> (statement <* char '\n')
    

    But I'll leave you with error messages every time you try to use a parser you've not finished yet, so the whole thing will compile OK and the bits you've defined should work.

    stop =        error "You've not yet defined stop"
    declaration = error "You've not yet defined declaration"
    write =       error "You've not yet defined write"
    ifStatement = error "You've not yet defined ifStatement"
    assignment =  error "You've not yet defined assignment"
    expression =  error "You've not yet defined expression"
    
    0 讨论(0)
提交回复
热议问题