How can we use our parser combinator function *chain* to construct more sophisticated parsers?

Well, let’s just try to build a simple parser chain:

`chain1 = chain item (\a -> item)`

While this does parse the input sequentially (you can easily verify this with `parse chain1 "123"`), it throws away the first result, namely *a*.

Something more must be done: we must explicitly combine the results of both sub-parses. Therefore we need explicit access to the result of the second parse, too. Thus:

    chain2 = chain item (\a ->
                 chain item (\b ->
                     inject [a,b]))
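To see the difference between the two chains in action, here is a minimal, self-contained sketch. It simply repeats the definitions of *Parser*, *parse*, *inject*, *item* and *chain* from the first part so that it loads on its own; the only change is that the unused variables are written as `_`.

```haskell
type ParseState = String

newtype Parser a = P (ParseState -> (Maybe a, ParseState))

parse :: Parser a -> ParseState -> (Maybe a, ParseState)
parse (P p) ps = p ps

inject :: a -> Parser a
inject s = P (\inp -> (Just s, inp))

item :: Parser Char
item = P (\inp -> case inp of
    []     -> (Nothing, [])
    (s:ss) -> (Just s, ss))

chain :: Parser a -> (a -> Parser b) -> Parser b
chain pa f = P resultParser
  where
    resultParser input = case parse pa input of
        (Nothing, _) -> (Nothing, input)
        (Just a, x)  -> parse (f a) x

-- parses two items but ignores the first result
chain1 :: Parser Char
chain1 = chain item (\_ -> item)

-- parses two items and combines both results into a list
chain2 :: Parser String
chain2 = chain item (\a ->
             chain item (\b ->
                 inject [a,b]))
```

Loaded into GHCi, `parse chain1 "123"` should give `(Just '2',"3")`, while `parse chain2 "123"` should give `(Just "12","3")`.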

*[The complete code so far can be downloaded here.]*

*“But where are the Monads?”* you may ask. Well, let’s just turn our whole parser into one. Very little code is needed. In fact just:

    instance Monad Parser where
        return = inject
        (>>=)  = chain
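One caveat if you try this with a recent compiler: since GHC 7.10, *Monad* has *Applicative* (and therefore *Functor*) as a superclass, so the three lines above are rejected unless those two instances exist as well. The following is a sketch of one way to add them, building both purely out of *chain* and *inject*. The two extra instances are my addition, not part of the original code, and newer compilers may warn that `return` should simply be `pure`.

```haskell
type ParseState = String

newtype Parser a = P (ParseState -> (Maybe a, ParseState))

parse :: Parser a -> ParseState -> (Maybe a, ParseState)
parse (P p) ps = p ps

inject :: a -> Parser a
inject s = P (\inp -> (Just s, inp))

item :: Parser Char
item = P (\inp -> case inp of
    []     -> (Nothing, [])
    (s:ss) -> (Just s, ss))

chain :: Parser a -> (a -> Parser b) -> Parser b
chain pa f = P resultParser
  where
    resultParser input = case parse pa input of
        (Nothing, _) -> (Nothing, input)
        (Just a, x)  -> parse (f a) x

-- added for newer GHC versions: apply a function to the parse result
instance Functor Parser where
    fmap f p = chain p (\a -> inject (f a))

-- added for newer GHC versions: run both parsers in sequence,
-- then apply the first result to the second
instance Applicative Parser where
    pure      = inject
    pf <*> pa = chain pf (\f -> chain pa (\a -> inject (f a)))

-- the instance from the text, unchanged
instance Monad Parser where
    return = inject
    (>>=)  = chain

-- do-notation now works for Parser
mchain1 :: Parser String
mchain1 = do
    a <- item
    b <- item
    return [a,b]
```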

So, take a deep breath and read on…

We just turned our parser into a Monad by adding

    instance Monad Parser where
        return = inject
        (>>=)  = chain

To see the benefit this gives us let’s return to our parser chain.

We can now express it like this:

    mchain1 = do
        a <- item
        b <- item
        return [a,b]

Previously we had written

    chain2 = chain item (\a ->
                 chain item (\b ->
                     inject [a,b]))

We still use the same variables and the same result, but the chaining itself actually got simpler. And the longer the chain, the more we gain with the Monad’s machinery.
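To illustrate the point about longer chains, here is a sketch with three items instead of two, once nested by hand and once in do-notation. The names *triple* and *mtriple* are made up for this example, and the *Functor* and *Applicative* instances are only there because GHC 7.10 and later insist on them.

```haskell
type ParseState = String

newtype Parser a = P (ParseState -> (Maybe a, ParseState))

parse :: Parser a -> ParseState -> (Maybe a, ParseState)
parse (P p) ps = p ps

inject :: a -> Parser a
inject s = P (\inp -> (Just s, inp))

item :: Parser Char
item = P (\inp -> case inp of
    []     -> (Nothing, [])
    (s:ss) -> (Just s, ss))

chain :: Parser a -> (a -> Parser b) -> Parser b
chain pa f = P resultParser
  where
    resultParser input = case parse pa input of
        (Nothing, _) -> (Nothing, input)
        (Just a, x)  -> parse (f a) x

-- required by newer GHC versions, not part of the original code
instance Functor Parser where
    fmap f p = chain p (\a -> inject (f a))

instance Applicative Parser where
    pure      = inject
    pf <*> pa = chain pf (\f -> chain pa (\a -> inject (f a)))

instance Monad Parser where
    return = inject
    (>>=)  = chain

-- three items, nested by hand: one more level of lambda per item
triple :: Parser String
triple = chain item (\a ->
             chain item (\b ->
                 chain item (\c ->
                     inject [a,b,c])))

-- three items in do-notation: one more line per item
mtriple :: Parser String
mtriple = do
    a <- item
    b <- item
    c <- item
    return [a,b,c]
```

Both parsers should behave identically, for instance `parse triple "1234"` and `parse mtriple "1234"` should each give `(Just "123","4")`.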

And all this without any magic. By saying

    instance Monad Parser where
        return = inject
        (>>=)  = chain

we turned our Parser into a Monad. All previous code will still work, but in addition to that we can now use the syntactic sugar that Monads provide (do-notation).

Before we analyze that sugar in more detail, let’s first have a look at the Monad instantiation of *Parser*. A Monad instance is built around two functions, *return* and *>>=*. Since our parser already came equipped with everything necessary, we just had to tell GHC (or whatever you’re using) that it should use the appropriate functions: *chain* as our implementation of *>>=* and *inject* as the implementation of *return*. (GHC 7.10 and later additionally expect *Functor* and *Applicative* instances for *Parser* before they accept the *Monad* instance.)

While *inject* is rather straightforward (it just returns its argument as the result of the parse and doesn’t touch the input), *chain* may require a little more thought to understand. The most interesting part is actually the last line:

`(Just a, x) -> parse (f a) x`

In case the first parse succeeded we have its result in *a* and the remaining input in *x*.

Executing *f* on the result gives us the parser we then use to parse the remaining input.

So, without using the Monad’s machinery we essentially had to manually build a function *f* that would give us a parser doing the second parse and combining it with the result of the first parse.

When using the Monad machinery we just make two parses (they’re chained implicitly by the Monad) and return the combined result. No manual function generation!
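The fact that *f* receives the first result and only then produces the second parser is easiest to appreciate when the second parser really depends on that result. The following sketch parses a length-prefixed string: the first character is a digit saying how many further characters to read. The names *count*, *noParse* and *lengthPrefixed* are invented for this example, and *noParse* is a simplified stand-in for a failing parser.

```haskell
import Data.Char (digitToInt, isDigit)

type ParseState = String

newtype Parser a = P (ParseState -> (Maybe a, ParseState))

parse :: Parser a -> ParseState -> (Maybe a, ParseState)
parse (P p) ps = p ps

inject :: a -> Parser a
inject s = P (\inp -> (Just s, inp))

item :: Parser Char
item = P (\inp -> case inp of
    []     -> (Nothing, [])
    (s:ss) -> (Just s, ss))

chain :: Parser a -> (a -> Parser b) -> Parser b
chain pa f = P resultParser
  where
    resultParser input = case parse pa input of
        (Nothing, _) -> (Nothing, input)
        (Just a, x)  -> parse (f a) x

-- hypothetical helper: a parser that always fails and leaves the input alone
noParse :: Parser a
noParse = P (\inp -> (Nothing, inp))

-- hypothetical helper: read exactly n items
count :: Int -> Parser String
count n
    | n <= 0    = inject []
    | otherwise = chain item (\c ->
                      chain (count (n - 1)) (\cs ->
                          inject (c:cs)))

-- the function passed to 'chain' looks at the first result 'd'
-- and decides which parser to run on the remaining input
lengthPrefixed :: Parser String
lengthPrefixed = chain item (\d ->
    if isDigit d then count (digitToInt d) else noParse)
```

With these definitions, `parse lengthPrefixed "3abcde"` should give `(Just "abc","de")`: the `'3'` is consumed by the first parse, and the parser built from it consumes the next three characters.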

Note that the values are sort of “extracted” from the parser monad: our *chain* function determines the value to be “extracted”. The *ParseState* remains hidden behind the scenes. The *return* then wraps our values again.

Note, too, that pattern matching can extract the value in very much the same way.

Instead of

`parse (do a <- item; "-> use value of 'a' <-") input`

you can write

`case parse item input of (Just a, _) -> "-> use value of 'a' <-"`

In both cases the parsed value is bound to *a*.
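The two snippets above are schematic, so here is a sketch of a version that actually compiles. The helper *describe* and the functions *viaDo* and *viaCase* are names made up for this example; the `Nothing` branch in *viaCase* is my addition so that the pattern match is complete, and the *Functor* and *Applicative* instances are only there for newer GHC versions.

```haskell
type ParseState = String

newtype Parser a = P (ParseState -> (Maybe a, ParseState))

parse :: Parser a -> ParseState -> (Maybe a, ParseState)
parse (P p) ps = p ps

inject :: a -> Parser a
inject s = P (\inp -> (Just s, inp))

item :: Parser Char
item = P (\inp -> case inp of
    []     -> (Nothing, [])
    (s:ss) -> (Just s, ss))

chain :: Parser a -> (a -> Parser b) -> Parser b
chain pa f = P resultParser
  where
    resultParser input = case parse pa input of
        (Nothing, _) -> (Nothing, input)
        (Just a, x)  -> parse (f a) x

-- required by newer GHC versions, not part of the original code
instance Functor Parser where
    fmap f p = chain p (\a -> inject (f a))

instance Applicative Parser where
    pure      = inject
    pf <*> pa = chain pf (\f -> chain pa (\a -> inject (f a)))

instance Monad Parser where
    return = inject
    (>>=)  = chain

-- hypothetical consumer of the parsed value
describe :: Char -> String
describe a = "first item is " ++ [a]

-- the value is bound to 'a' inside the monad
viaDo :: ParseState -> Maybe String
viaDo input = fst (parse (do a <- item; return (describe a)) input)

-- the value is bound to 'a' by pattern matching on the parse result
viaCase :: ParseState -> Maybe String
viaCase input = case parse item input of
    (Just a, _)  -> Just (describe a)
    (Nothing, _) -> Nothing
```

Both `viaDo "123"` and `viaCase "123"` should give `Just "first item is 1"`.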

Let’s now make the Monad’s syntactic sugar more explicit:

    mchain1 = do
        a <- item
        b <- item
        return [a,b]

is the same as

    mchain2 = do
        item >>= (\a ->
            item >>= (\b ->
                return [a,b]))

which is the same as (substituting our own function names):

    mchain3 = do
        item `chain` (\a ->
            item `chain` (\b ->
                inject [a,b]))

which is identical to

    mchain4 = do
        chain item (\a ->
            chain item (\b ->
                inject [a,b]))

Compare this to

    chain2 =
        chain item (\a ->
            chain item (\b ->
                inject [a,b]))
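If you would rather check this equivalence than take my word for it, the sketch below puts all five definitions into one module and compares their results on a given input. The function *agree* is a made-up helper for this example, and the *Functor* and *Applicative* instances are only there for newer GHC versions.

```haskell
type ParseState = String

newtype Parser a = P (ParseState -> (Maybe a, ParseState))

parse :: Parser a -> ParseState -> (Maybe a, ParseState)
parse (P p) ps = p ps

inject :: a -> Parser a
inject s = P (\inp -> (Just s, inp))

item :: Parser Char
item = P (\inp -> case inp of
    []     -> (Nothing, [])
    (s:ss) -> (Just s, ss))

chain :: Parser a -> (a -> Parser b) -> Parser b
chain pa f = P resultParser
  where
    resultParser input = case parse pa input of
        (Nothing, _) -> (Nothing, input)
        (Just a, x)  -> parse (f a) x

-- required by newer GHC versions, not part of the original code
instance Functor Parser where
    fmap f p = chain p (\a -> inject (f a))

instance Applicative Parser where
    pure      = inject
    pf <*> pa = chain pf (\f -> chain pa (\a -> inject (f a)))

instance Monad Parser where
    return = inject
    (>>=)  = chain

mchain1, mchain2, mchain3, mchain4, chain2 :: Parser String

mchain1 = do
    a <- item
    b <- item
    return [a,b]

-- a 'do' with a single expression is just that expression
mchain2 = do
    item >>= (\a ->
        item >>= (\b ->
            return [a,b]))

mchain3 = do
    item `chain` (\a ->
        item `chain` (\b ->
            inject [a,b]))

mchain4 = do
    chain item (\a ->
        chain item (\b ->
            inject [a,b]))

chain2 =
    chain item (\a ->
        chain item (\b ->
            inject [a,b]))

-- hypothetical helper: do all variants give the same result as chain2?
agree :: ParseState -> Bool
agree input = all (== parse chain2 input)
                  (map (`parse` input) [mchain1, mchain2, mchain3, mchain4])
```

Something like `all agree ["", "1", "12", "123"]` should then evaluate to `True`.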

*[The code for this part can be downloaded here.]*

A word of caution: being a Haskell beginner myself, I do not claim to have the widest of horizons when it comes to Monads; in fact, my understanding is still rather limited. But I do have a very fresh memory of where the difficulties were for me, which might be a plus. But well, let’s just get started…

Heavily inspired by *“Graham Hutton: Programming in Haskell”* and *“Bryan O’Sullivan, John Goerzen & Don Stewart: Real World Haskell”* this will be about a Parser.

The type definitions:

    type ParseState = String

    newtype Parser a = P (ParseState -> (Maybe a, ParseState))

That is: a *ParseState* is just a String (the remaining input to parse). A *Parser* is a function from a *ParseState* to a pair of *Maybe a* (the result) and another *ParseState* (the input left to parse).

It is now easy to write a function *parse* that drives parsing:

    parse :: Parser a -> ParseState -> (Maybe a, ParseState)
    parse (P p) ps = p ps

*parse* takes a *Parser* and a *ParseState* and returns a pair of *Maybe a* (the result) and another *ParseState* (the input left to parse). Exactly as we wanted – see above.

Ok, now let’s write three simple parsers. One to handle failures, one to yield a constant result and one that simply returns the first character of the parse state as result.

    -- fail a little more gracefully than just calling 'error'
    failure s = P (\inp -> (Nothing, ""))

    -- return parameter as parse result without touching the parser's input
    inject :: a -> Parser a
    inject s = P (\inp -> (Just s, inp))

    -- just return first item of parse input
    item :: Parser Char
    item = P (\inp -> case inp of
        []     -> (Nothing, [])
        (s:ss) -> (Just s, ss))

So far we can do simple parsing, for example extracting the first item:

`parse item "123"`
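For reference, here is a sketch that runs all three simple parsers on a small input. The type signature for *failure* is my assumption (the original has none; I read its argument as an error message that is currently ignored), and the list *examples* is a name made up for this illustration.

```haskell
type ParseState = String

newtype Parser a = P (ParseState -> (Maybe a, ParseState))

parse :: Parser a -> ParseState -> (Maybe a, ParseState)
parse (P p) ps = p ps

-- assumed signature: the argument is an (unused) error message
failure :: String -> Parser a
failure _ = P (\_ -> (Nothing, ""))

inject :: a -> Parser a
inject s = P (\inp -> (Just s, inp))

item :: Parser Char
item = P (\inp -> case inp of
    []     -> (Nothing, [])
    (s:ss) -> (Just s, ss))

-- one result per parser, in the order listed
examples :: [(Maybe Char, ParseState)]
examples =
    [ parse item "123"                 -- first character, rest of input
    , parse item ""                    -- nothing to take from empty input
    , parse (inject 'x') "123"         -- constant result, input untouched
    , parse (failure "no luck") "123"  -- no result, input discarded
    ]
```

Evaluating `examples` in GHCi should give `[(Just '1',"23"),(Nothing,""),(Just 'x',"123"),(Nothing,"")]`.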

In order to build more interesting parsers from these simple building blocks, however, we need the possibility to combine parsers. Two useful combinations are sequential composition (chaining parsers) and alternative parsing (use either parser1 or – if that fails – parser2).

Let’s write two functions that allow just that:

    -- combine two parsers into a sequential one
    -- Note that the second argument is a function, *not* a parser, that is
    -- supposed to return a parser which can/should consider its argument
    -- (the result of the first parse).
    chain :: Parser a -> (a -> Parser b) -> Parser b
    chain pa f = P resultParser
      where
        resultParser input = case parse pa input of
            (Nothing, x) -> (Nothing, input)
            (Just a, x)  -> parse (f a) x

    -- choice: parse input with 'p1' and return its result or, if that parse failed,
    -- parse with 'p2' and return its result no matter what
    por :: Parser a -> Parser a -> Parser a
    por p1 p2 = P (\inp -> case parse p1 inp of
        (Nothing, x) -> parse p2 inp
        x            -> x)
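As a small taste of what these two combinators allow, here is a sketch that builds a predicate parser with *chain* and then offers two alternatives with *por*. The names *sat* and *digitOrLetter* are made up for this example, and the type signature for *failure* is my assumption.

```haskell
import Data.Char (isAlpha, isDigit)

type ParseState = String

newtype Parser a = P (ParseState -> (Maybe a, ParseState))

parse :: Parser a -> ParseState -> (Maybe a, ParseState)
parse (P p) ps = p ps

-- assumed signature: the argument is an (unused) error message
failure :: String -> Parser a
failure _ = P (\_ -> (Nothing, ""))

inject :: a -> Parser a
inject s = P (\inp -> (Just s, inp))

item :: Parser Char
item = P (\inp -> case inp of
    []     -> (Nothing, [])
    (s:ss) -> (Just s, ss))

chain :: Parser a -> (a -> Parser b) -> Parser b
chain pa f = P resultParser
  where
    resultParser input = case parse pa input of
        (Nothing, _) -> (Nothing, input)
        (Just a, x)  -> parse (f a) x

por :: Parser a -> Parser a -> Parser a
por p1 p2 = P (\inp -> case parse p1 inp of
    (Nothing, _) -> parse p2 inp
    result       -> result)

-- hypothetical helper: accept the next item only if it satisfies the predicate
sat :: (Char -> Bool) -> Parser Char
sat p = chain item (\c ->
    if p c then inject c else failure "predicate not satisfied")

-- try a digit first; if that fails, try a letter on the same input
digitOrLetter :: Parser Char
digitOrLetter = sat isDigit `por` sat isAlpha
```

Here `parse digitOrLetter "7x"` should give `(Just '7',"x")` and `parse digitOrLetter "a1"` should give `(Just 'a',"1")`. The second case works because *por* hands the original input to the second parser after the first one has failed.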

While this already gives us everything we need to build rather sophisticated parsers, we still need to understand how best to make use of these building blocks.

So let’s continue here…
