This week on Perl 6 (8/26 - 9/1, 2002)
by Piers Cawley
Rule, rx and sub
Deborah Ariel Pickett summarized the state of her understanding of the
difference between rule and rx and wondered if there was any
case where
... rule ...
and ... rx ...
(given the same ...s in both cases), led to valid, but different
semantics. Uri Guttman thinks not. Damian thinks so, and provided an
example. (It was joked, on the London.pm mailing list (by Damian
himself) that Damian is currently our only real, live Perl 6
interpreter.) Luke Palmer raised a red flag about Damian's example;
Damian thinks it wasn't a red flag, but left it to Larry to
adjudicate. This also provoked a certain amount of discussion about the
philosophy behind some of the design decisions so far.
Glenn Linderman wondered whether rx shouldn't be respelled, as the term
'regex' is being deprecated. Damian suggested that rx actually
stood for 'Rule eXpedient', but I'm not sure he convinced anyone
(himself included). Ever the linguist, Larry observed that 'we can
tweak what people mean by "regular expression", but there's no way on
earth we can stop them from using the term.' and that, no matter how
many editions it goes through, Friedl's book is always going to be
called Mastering Regular Expressions. So, Larry is 'encouraging use
of the technical term "regex" as a way to not precisely mean "regular
expression".'
Piers Cawley raised a question about when } terminates a
statement and got it wrong. This subthread led to a short discussion
on good Perl 6 style. Damian told us that 'Any subroutine/function
like if that has a signature that ends in a &sub argument can be
parsed without the trailing semicolon', which I don't remember seeing
in any Apocalypse. This led to a discussion about what was legal in a
prototype specifier, ending when Larry told us that it'd be possible
to specify a grammar as a function's prototype.
Auto deserialization
At the root of what turned into a large thread, Steve Canfield asked a
deceptively simple question: '[Will] code like this Do What I Mean:
my Date $bday = 'June 24, 2002''? We weren't entirely sure what
he meant by that...
The thread was long, and pretty much unsummarizable, but we ended up
with the rather pleasant looking my Date $date .= new('Jun 24,
2002'), the idea being that, because $date is known to be a
Date, even if it's undefined, then it's possible to make a static
method call on it. The response to this suggestion spilt over into
the next week, but 'favourable' would be a good description of it.
Hypothetical synonyms
Aaron Sherman wondered if he would be able to write
$stuff = $field if m{^\s*[
"(.*?)" {let $field = $1} |
(\S+) {let $field = $2}]};
Larry thought
my $stuff;
m{^\s*[
"$stuff:=(.*?)" |
$stuff:=(\S+)
]};
was a better way of doing it, saying that he saw no 'particular reason why a top-level regex can't refer to variables in the surrounding scope, either by default, or via a :modifier of some sort.'
Uri Guttman, in possibly the first ever Perl 6 golf post (he denies
it's really golf), suggested a way of shortening the pattern further,
and Larry trumped him by shortening it to my $field =
/<shellword>/, which led Nicholas Clark to wonder about one-liners
along the lines of my $data = /<xml>/ and whether the Perl
regex engine would be faster than using expat. Nick also wondered
if Perl 6 would give shorter golf solutions than Perl 5.
There was quite a bit more in this thread, but my summarizing skills are failing.
http://makeashorterlink.com/ -- Thread starts here, it's jolly good.
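For anyone who wants something runnable today, here is a rough Python analogue of the idea in Aaron's and Larry's patterns: whichever alternative matches, the captured word ends up in one place. This is only a sketch. Python's re module has no hypothetical variables or := binding, so the group names quoted and bare and the helper first_word are invented for illustration, and the two groups are merged by hand after the match.

```python
import re

# Leading whitespace, then either a double-quoted string or a bare word.
# The names "quoted" and "bare" are hypothetical; Python does not allow
# the same group name on both branches, so they are merged below.
SHELLWORD = re.compile(r'\s*(?:"(?P<quoted>.*?)"|(?P<bare>\S+))')


def first_word(line):
    """Return the first quoted or bare word in line, or None if there is none."""
    m = SHELLWORD.match(line)  # match() anchors at the start, like ^
    if m is None:
        return None
    quoted = m.group("quoted")
    # An empty quoted string "" is a legitimate result, so test against None.
    return quoted if quoted is not None else m.group("bare")


print(first_word('  "hello world" rest'))
print(first_word("  plain rest"))
```

The manual merge at the end is exactly the bookkeeping that binding both branches to a single variable is meant to remove.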
Does ::: constrain the pattern engine implementation
Deven T. Corzine wondered if the presence of ::: and friends in the
pattern language meant we'd constrained the possible implementation of
the pattern engine before we'd started, and if we could implement
something that didn't do backtracking. General opinion seemed to be
that we couldn't avoid backtracking, but Deven wondered if it wouldn't
be possible to use a non-backtracking implementation for some special
cases. The consensus appears to be 'If you build it, and it's faster,
we will come'.
Backtracking into { code }
Ken Fox wondered if
rule expr1 { <term> { /@operators/ or fail } <term> };
and
rule expr2 { <term> @operators <term> }
were equivalent. Damian thought not, and added that expr1 should
probably be rewritten as rule expr1 { <term> { m:cont/@operators/
or fail } <term> }. Larry says that we will backtrack into
subrules.
Again, the whole thread is worth reading if you're interested in the rules/patterns/regex engine.
Prebinding questions
Philip Hellyer asked a bunch of questions sparked by Damian's talk about Perl 6 to London.pm (at the Conway Hall no less, London.pm knows how to find appropriate venues). Damian answered them.
@array = %hash
Nicholas Clark noted that @array = %hash, for a hash of n elements,
would return an array of n pairs. The Perl 5 style, returning a list of
2n elements, keys and values interleaved, would be @array =
%hash.kv. All this led Nick to wonder what happened in the other
direction. Obviously %hash = @list_of_pairs was going to do the
right thing, but what about %hash = @kv_array? And, more
worryingly, what about
%hash = ("Something", "mixing", pairs => "and", "scalars");
It turns out that the @kv_array case will Just Work, and the last
case will cause discussion to break out. Damian thought that the
example above would throw an error because there are 5 elements in the
list. Another school thought that, because PAIRs are first class
objects in Perl 6, the code should work, with one of the keys of the
hash being the pair (pairs => 'and'). Damian thought not, and
discussion ensued. I'm afraid I'm not entirely well qualified to
summarize this thread as I'm one of those who thinks Damian is wrong,
or at least, not yet sufficiently correct. However, for now, the state
of the design is such that pairs are 'special' and it takes an effort
of compile time will to use them as keys (or values come to that) in
a hash.
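The two readings of the mixed list are easier to see side by side. The following is a Python analogue, not Perl 6: tuples stand in for pairs, and the helper names to_pairs, to_kv and from_kv are made up for this sketch. It assumes that assigning a flat list to a hash consumes it two items at a time.

```python
def to_pairs(h):
    """Analogue of @array = %hash: one (key, value) element per entry."""
    return list(h.items())


def to_kv(h):
    """Analogue of @array = %hash.kv: keys and values interleaved."""
    flat = []
    for key, value in h.items():
        flat.extend((key, value))
    return flat


def from_kv(flat):
    """Analogue of %hash = @kv_array: take the items two at a time."""
    if len(flat) % 2:
        raise ValueError(f"odd number of elements ({len(flat)}) in key/value list")
    return dict(zip(flat[0::2], flat[1::2]))


h = {"a": 1, "b": 2}
pairs = to_pairs(h)   # n elements
kv = to_kv(h)         # 2n elements

# Reading 1: the pair flattens into the list, giving 5 items and an error.
mixed_flat = ["Something", "mixing", "pairs", "and", "scalars"]

# Reading 2: the pair stays one first-class object, giving 4 items,
# and the pair itself becomes a hash key.
mixed_pair = ["Something", "mixing", ("pairs", "and"), "scalars"]
```

Calling from_kv(mixed_flat) raises the odd-count error, while from_kv(mixed_pair) builds a two-entry mapping whose second key is the tuple ("pairs", "and").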
Regex stuff
Choosing a deliberately vague subject line in an effort to give the summarizer a headache, Piers Cawley asked a question about binding to numeric hypotheticals. It turns out that binding to a numeric hypothetical variable in a regular expression is special cased (resetting the numeric 'counter') and even mentioned in the appropriate apocalypse, and the problem that Piers thought he saw doesn't actually exist.
Atomicness and \n
Aaron Sherman wondered what \n would be translated to in a Perl 6
pattern. Aaron proposed <[\x0a\x0d...]+>. Damian thought it was
<[\x0a\x0d]>, and Ken Fox thought it would be something like
\0xd \0xa | \x0d | \x0a. Personally I think it'll be [ \0xd
\0xa | <[\x0a\x0d...]> ]. (I also believe that whoever came up with
the idea of a two character end of line marker should be taken out and
shot, but that's another story entirely).
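The practical difference between the character-class reading and the alternation reading shows up on DOS line endings. Here is a small sketch in Python's re syntax rather than Perl 6 patterns. It covers only \x0a and \x0d, since the extra characters behind the "..." in the proposals above are not specified, and the names CHAR_CLASS, CRLF_FIRST and count_newlines are invented for the example.

```python
import re

# Character class: every CR or LF counts as a newline on its own.
CHAR_CLASS = re.compile(r"[\n\r]")

# Alternation: try the two-character CR LF form before a lone CR or LF.
CRLF_FIRST = re.compile(r"\r\n|[\n\r]")

text = "unix\nmac\rdos\r\nend"


def count_newlines(pattern, s):
    """Count the non-overlapping matches of pattern in s."""
    return len(pattern.findall(s))


print(count_newlines(CHAR_CLASS, text))  # CR LF is seen as two newlines
print(count_newlines(CRLF_FIRST, text))  # CR LF is seen as one newline
print(CRLF_FIRST.split(text))
```

With the character class, the CR LF pair is counted twice, so splitting on it would produce a spurious empty line; putting the two-character form first avoids that.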

