Quantcast

scala 2.8 packrat parser combinators : unsupported left-recursion cases ?

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

scala 2.8 packrat parser combinators : unsupported left-recursion cases ?

Alex Repain
Hello,

first and as a newcomer, I apologize if it feels like I mailed the wrong list.

I've been working this summer on packrat algorithms and left-recursion support, using Scala as a main tool. While documenting on the subject, it appeared that the implementation of the Scala 2.8 Packrat parsers was using the algorithm we were studying ( here are the original algorithm description a report about the Scala parser combinator implementation of it ).

Our implementation of the algorithm was failing on some tricky left-recursion cases, so I tried to achieve them using Scala's parser combinators, and I obtained the same failing results.

As an example let's take the following left-recursive grammar :

S <- A | B
A <- Sa | a
B <- Sb | b

This example grammar is supposed to be able to produce any word composed of the letters 'a' or 'b', without limitation of length (empty word excepted).
Scala code is as follow :

package RDP

import scala.util.parsing.combinator.{ImplicitConversions, PackratParsers}

object Test extends PackratParsers with ImplicitConversions  { 
  type Elem = Char  
  lazy val S = A | B
  lazy val A:PackratParser[Any] = S ~ a | a
  lazy val B:PackratParser[Any] = S ~ b | b
  lazy val a = elem('a')
  lazy val b = elem('b')
  def printResult[T](in: java.lang.CharSequence) =   
    println(parse(S, new scala.util.parsing.input.CharSequenceReader(in)))
  def parse[T](p: Parser[T], in: scala.util.parsing.input.Reader[Char]): ParseResult[T] =   
    phrase(p)(in)
}

object CombinatorsTest extends Application {
 
List("a","b","aba","bab","bba","aaa","baaa","aab") foreach (Test go _)
}

and the results with the Scala packrat parsers' algorithm are the following :

[1.2] parsed: a
[1.2] parsed: b
[1.2] failure: `a' expected but b found
aba
 ^
[1.3] failure: `a' expected but b found
bab
  ^
[1.2] failure: `a' expected but b found
bba
 ^
[1.4] parsed: ((a~a)~a)
[1.5] parsed: (((b~a)~a)~a)
[1.3] failure: `a' expected but b found
aab
  ^

The global behaviour on that particular example is to refuse any word that would contain a 'b' which is not in first position of the word. In a more general way, we concluded that many cases of double left-recursion (same 'head' rule or not) are actually unsupported by the algorithm.

My questions are : was this behaviour detected when deciding to use the given algorithm ? If yes, is there a reason why it is considered correct, or at least why it has been implemented as such ?

Warth et al 's algorithm kinda seem to be approved as a solution to support indirect left-recursion, but are there some known limitations of it that could be the source of this misleading behaviour ?

Thanks for your attention,
Alex REPAIN
[hidden email]
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: scala 2.8 packrat parser combinators : unsupported left-recursion cases ?

Alex Repain
As no answer has been coming until now, and just to let you know, I've been working this summer, along with a research lab of the Kumamoto University (Japan), on an improved algorithm that could take in account the left-recursion cases that seems to be left behind by the current implementation of Scala Packrat parsers. Yet, we need to make sure the new algorithm we are developping is of any use. One of the biggest difficulties with Japanese research might be the international communication of their work, which is why I relayed the information here.

As Scala 2.8 implementation is currently the most famous implementation of the algorithm we tried to improve, that discussion might be of good help for us. Is the Scala Packrat Parsers support for left-recursion supposed to be complete, from the specs of the language itself ?

Alex Repain

2010/10/4 Repain Alex <[hidden email]>
Hello,

first and as a newcomer, I apologize if it feels like I mailed the wrong list.

I've been working this summer on packrat algorithms and left-recursion support, using Scala as a main tool. While documenting on the subject, it appeared that the implementation of the Scala 2.8 Packrat parsers was using the algorithm we were studying ( here are the original algorithm description a report about the Scala parser combinator implementation of it ).

Our implementation of the algorithm was failing on some tricky left-recursion cases, so I tried to achieve them using Scala's parser combinators, and I obtained the same failing results.

As an example let's take the following left-recursive grammar :

S <- A | B
A <- Sa | a
B <- Sb | b

This example grammar is supposed to be able to produce any word composed of the letters 'a' or 'b', without limitation of length (empty word excepted).
Scala code is as follow :

package RDP

import scala.util.parsing.combinator.{ImplicitConversions, PackratParsers}

object Test extends PackratParsers with ImplicitConversions  { 
  type Elem = Char  
  lazy val S = A | B
  lazy val A:PackratParser[Any] = S ~ a | a
  lazy val B:PackratParser[Any] = S ~ b | b
  lazy val a = elem('a')
  lazy val b = elem('b')
  def printResult[T](in: java.lang.CharSequence) =   
    println(parse(S, new scala.util.parsing.input.CharSequenceReader(in)))
  def parse[T](p: Parser[T], in: scala.util.parsing.input.Reader[Char]): ParseResult[T] =   
    phrase(p)(in)
}

object CombinatorsTest extends Application {
 
List("a","b","aba","bab","bba","aaa","baaa","aab") foreach (Test go _)
}

and the results with the Scala packrat parsers' algorithm are the following :

[1.2] parsed: a
[1.2] parsed: b
[1.2] failure: `a' expected but b found
aba
 ^
[1.3] failure: `a' expected but b found
bab
  ^
[1.2] failure: `a' expected but b found
bba
 ^
[1.4] parsed: ((a~a)~a)
[1.5] parsed: (((b~a)~a)~a)
[1.3] failure: `a' expected but b found
aab
  ^

The global behaviour on that particular example is to refuse any word that would contain a 'b' which is not in first position of the word. In a more general way, we concluded that many cases of double left-recursion (same 'head' rule or not) are actually unsupported by the algorithm.

My questions are : was this behaviour detected when deciding to use the given algorithm ? If yes, is there a reason why it is considered correct, or at least why it has been implemented as such ?

Warth et al 's algorithm kinda seem to be approved as a solution to support indirect left-recursion, but are there some known limitations of it that could be the source of this misleading behaviour ?

Thanks for your attention,
Alex REPAIN
[hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: scala 2.8 packrat parser combinators : unsupported left-recursion cases ?

Martin Odersky


On Tue, Oct 12, 2010 at 12:00 PM, Repain Alex <[hidden email]> wrote:
As no answer has been coming until now, and just to let you know, I've been working this summer, along with a research lab of the Kumamoto University (Japan), on an improved algorithm that could take in account the left-recursion cases that seems to be left behind by the current implementation of Scala Packrat parsers. Yet, we need to make sure the new algorithm we are developping is of any use. One of the biggest difficulties with Japanese research might be the international communication of their work, which is why I relayed the information here.

As Scala 2.8 implementation is currently the most famous implementation of the algorithm we tried to improve, that discussion might be of good help for us. Is the Scala Packrat Parsers support for left-recursion supposed to be complete, from the specs of the language itself ?

The language does not specify the parser combinator library at all. I am not sure what the library documentation says -- does it mention left recusion elimination?

Cheers

 -- Martin

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: scala 2.8 packrat parser combinators : unsupported left-recursion cases ?

Adriaan Moors-4
Hi Alex,

First of all, it's great to see improvements of our implementation of parser combinators!

Tiark (who's traveling at the moment) would know more about the packrat version -- I only worked on the initial naive implementation (which is described in our TR as not supporting LR: http://www.cs.kuleuven.be/publicaties/rapporten/cw/CW491.pdf)

the packrat implementation does mention supports for LR, but there might be a bug of course.

cheers
adriaan

On Tue, Oct 12, 2010 at 5:53 PM, martin odersky <[hidden email]> wrote:


On Tue, Oct 12, 2010 at 12:00 PM, Repain Alex <[hidden email]> wrote:
As no answer has been coming until now, and just to let you know, I've been working this summer, along with a research lab of the Kumamoto University (Japan), on an improved algorithm that could take in account the left-recursion cases that seems to be left behind by the current implementation of Scala Packrat parsers. Yet, we need to make sure the new algorithm we are developping is of any use. One of the biggest difficulties with Japanese research might be the international communication of their work, which is why I relayed the information here.

As Scala 2.8 implementation is currently the most famous implementation of the algorithm we tried to improve, that discussion might be of good help for us. Is the Scala Packrat Parsers support for left-recursion supposed to be complete, from the specs of the language itself ?

The language does not specify the parser combinator library at all. I am not sure what the library documentation says -- does it mention left recusion elimination?

Cheers

 -- Martin


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: scala 2.8 packrat parser combinators : unsupported left-recursion cases ?

Alex Repain
Hi Adriaan,

Well, I don't think of it as a real bug, but as a flaw in the algorithm used by the implementation. We simulated the algorithm by hand, and with other implementations. On the same corner cases, the results are the same. What we are trying to improve is the algorithm itself (the one described here), and the Scala packrat combinators is one of our references in terms of implementation for this algorithm. As far as I know, this pretty recent algorithm (publication is from 2007) has never been questioned deeply, that's why we have to be careful about our own work, and make sure that the problems we are detecting are really due to the algorithm, and not our implementation.

You can see the problem as a bug only if the specifications for the Scala packrat parsers are stating a complete support for left-recursion. If the flaws in the algorithm that Scala uses have been already detected and mentioned, then this is not a bug, just something to be improved in the future.

Anyway, thanks for your kind answer,
Alex

2010/10/12 Adriaan Moors <[hidden email]>
Hi Alex,

First of all, it's great to see improvements of our implementation of parser combinators!

Tiark (who's traveling at the moment) would know more about the packrat version -- I only worked on the initial naive implementation (which is described in our TR as not supporting LR: http://www.cs.kuleuven.be/publicaties/rapporten/cw/CW491.pdf)

the packrat implementation does mention supports for LR, but there might be a bug of course.

cheers
adriaan


On Tue, Oct 12, 2010 at 5:53 PM, martin odersky <[hidden email]> wrote:


On Tue, Oct 12, 2010 at 12:00 PM, Repain Alex <[hidden email]> wrote:
As no answer has been coming until now, and just to let you know, I've been working this summer, along with a research lab of the Kumamoto University (Japan), on an improved algorithm that could take in account the left-recursion cases that seems to be left behind by the current implementation of Scala Packrat parsers. Yet, we need to make sure the new algorithm we are developping is of any use. One of the biggest difficulties with Japanese research might be the international communication of their work, which is why I relayed the information here.

As Scala 2.8 implementation is currently the most famous implementation of the algorithm we tried to improve, that discussion might be of good help for us. Is the Scala Packrat Parsers support for left-recursion supposed to be complete, from the specs of the language itself ?

The language does not specify the parser combinator library at all. I am not sure what the library documentation says -- does it mention left recusion elimination?

Cheers

 -- Martin



Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: scala 2.8 packrat parser combinators : unsupported left-recursion cases ?

Alex Repain
In reply to this post by Martin Odersky


2010/10/12 martin odersky <[hidden email]>


On Tue, Oct 12, 2010 at 12:00 PM, Repain Alex <[hidden email]> wrote:
As no answer has been coming until now, and just to let you know, I've been working this summer, along with a research lab of the Kumamoto University (Japan), on an improved algorithm that could take in account the left-recursion cases that seems to be left behind by the current implementation of Scala Packrat parsers. Yet, we need to make sure the new algorithm we are developping is of any use. One of the biggest difficulties with Japanese research might be the international communication of their work, which is why I relayed the information here.

As Scala 2.8 implementation is currently the most famous implementation of the algorithm we tried to improve, that discussion might be of good help for us. Is the Scala Packrat Parsers support for left-recursion supposed to be complete, from the specs of the language itself ?

The language does not specify the parser combinator library at all. I am not sure what the library documentation says -- does it mention left recusion elimination?


Somehow I can't find other information than the scaladoc page itself, which states that :
Packrat Parsing is a technique for implementing backtracking, recursive-descent parsers, with the advantage that it guarantees unlimited lookahead and a linear parse time. Using this technique, left recursive grammars can also be accepted.

A reference is also given to :
Alessandro Warth, James R. Douglass, Todd Millstein: "Packrat Parsers Can Support Left Recursion." PEPM'08

That paper was the starting point of our work, and our main result has been to find corner-cases which the algorithm presented can't handle. We have been trying to find a more general algorithm able to handle more cases of left-recursion -- the absolute support is yet to be proven for our algorithm, though it seems promising. I'm really interested in carrying on this work, and if our algorithm takes a more formal, official shape, I might bother you about this again.

As of now, given the current implementation of Scala packrat parsers, it seems that the full support for left-recursion can't be guaranteed (unless we made a huge confusion about what to expect from very intricated left-recursive grammars, unfortunately I can't reject that possibility).

Cheers,
Alex REPAIN
Loading...