Exegesis 3
by Damian Conway
Editor's note: this document is out of date and remains here for historic interest. See Synopsis 3 for the current design information.
Better living through sigils
We should also note that the binding of the @costs array:
my @costs := @%data{$file}{costs};
shows yet another case where Perl 6's sigil semantics are much DWIM-mier
than those of Perl 5.
In Perl 5 we would probably have written that as:
local *costs = \ @$data{$file}{costs};
and then spent some considerable time puzzling out why it wasn't working,
before realising that we'd actually meant:
local *costs = \ @{$data{$file}{costs}};
instead.
That's because, in Perl 5, the precedence of a hash key is relatively low, so:
@$data{$file}{costs} # means: @{$data}{$file}{costs}
# i.e. (invalid attempt to) access the 'costs'
# key of a one-element slice of the hash
# referred to by $data
# problem is: slices don't have hash keys
whereas:
@{$data{$file}{costs}} # means: @{ $data{$file}{costs} }
# i.e. dereference of array referred to by
# $data{$file}{costs}
The problem simply doesn't arise in Perl 6, where the two would be written
quite distinctly, as:
%data{@($file)}{costs} # means: (%data{@($file)}).{costs}
# (still an error in Perl 6)
and:
@%data{$file}{costs} # means: @{ %data{$file}{costs} }
# i.e. dereference of array referred to by
# %data{$file}{costs}
respectively.
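For readers who want to try the Perl 5 side of this, here is a minimal runnable sketch. The contents of %data and the file name are made up for illustration; only the dereferencing expression comes from the text above.

```perl
use strict;
use warnings;

# Hypothetical data: %data maps a file name to a hash holding a 'costs' array reference.
my %data = ( 'budget.txt' => { costs => [ 10, 20, 30 ] } );
my $file = 'budget.txt';

# The braces make the whole expression $data{$file}{costs} the thing being dereferenced.
my @costs = @{ $data{$file}{costs} };

print "@costs\n";
```

Without the braces, Perl 5 tries to treat $data as a hash reference and take a slice of it, which is the mistake described above.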
That's not a number...now that's a number!
One of the perennial problems with Perl 5 is how to read in a number. Or rather, how to read in a string...and then be sure that it contains a valid number. Currently, most people read in the string and then either just assume it's a number (optimism) or use the regexes found in perlfaq4 or Regexp::Common to make sure (cynicism).
Perl 6 offers a simpler, built-in mechanism.
Just as the unary version of binary underscore (_) is Perl 6's explicit stringification specifier, so too the unary version of binary plus is Perl 6's explicit numerifier. That is, prefixing an expression with unary + evaluates that expression in a numeric context. Furthermore, if the expression has to be coerced from a string and the string does not begin with a valid number, the numerification operator returns NaN, the not-a-number value.
That makes it particularly easy to read in numeric data reliably:
my $inflation;
print "Inflation rate: " and $inflation = +<>
until $inflation != NaN;
The unary + takes the string returned by <> and converts
it to a number. Or, if the string can't be interpreted as a number, +
returns NaN. Then we just go back and try again until we do get
a valid number.
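For comparison, the same "keep asking until we get a number" loop can be sketched in Perl 5 today. This is only an approximation under stated assumptions: to_number and read_number are hypothetical helper names, validation is delegated to looks_like_number from the core Scalar::Util module, and the whole trimmed string must be numeric (stricter than "begins with a valid number"). An in-memory filehandle stands in for standard input so that the example is self-contained.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: numeric value of a string, or undef if it is not a number.
sub to_number {
    my ($text) = @_;
    return undef unless defined $text;
    $text =~ s/^\s+|\s+$//g;                 # trim whitespace, including the newline
    return looks_like_number($text) ? $text + 0 : undef;
}

# Hypothetical helper: keep reading lines from $fh until one is a valid number.
sub read_number {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) {
        my $number = to_number($line);
        return $number if defined $number;
    }
    return undef;                            # input ran out first
}

# Demonstration on an in-memory filehandle instead of STDIN.
my $input = "lots\n3.5\n";
open my $in, '<', \$input or die "Cannot open in-memory file: $!";
my $inflation = read_number($in);
print "Inflation rate: $inflation\n";
```

Here undef plays the role that NaN plays in the Perl 6 version: it marks input that could not be read as a number.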
Note that these new semantics for unary + are a little different from its role in Perl 5, where it is just the identity operator. In Perl 5 it's occasionally used to disambiguate constructs like:
print ($x + $y) * $z; # in Perl 5 means: ( print($x+$y) ) * $z;
print +($x + $y) * $z; # in Perl 5 means: print( ($x+$y) * $z );
To get the same effect in Perl 6, we'd use the adverbial colon instead:
print ($x + $y) * $z; # in Perl 6 means: ( print($x+$y) ) * $z;
print : ($x + $y) * $z; # in Perl 6 means: print( ($x+$y) * $z );
Schwartzian pairs
Another handy use for pairs is as a natural data structure for implementing the Schwartzian Transform. This caching technique is used when sorting a large list of values according to some expensive function on those values. Rather than writing:
my @sorted = sort { expensive($a) <=> expensive($b) } @unsorted;
and recomputing the same expensive function every time each value is compared
during the sort, we can precompute the function on each value once. We
then pass both the original value and its computed value to sort,
use the computed value as the key on which to sort the list, but then return
the original value as the result. Like this:
my @sorted = # step 4: store sorted originals
map { $_.[0] } # step 3: extract original
sort { $a.[1] <=> $b.[1] } # step 2: sort on computed
map { [$_, expensive($_) ] } # step 1: cache original and computed
@unsorted; # step 0: take originals
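The same transform can be run in Perl 5 as a sanity check. In this sketch the expensive function is a stand-in (string length) and the call counter is added purely to show that the function runs once per element; both are assumptions for illustration.

```perl
use strict;
use warnings;

my $calls = 0;

# Stand-in for an expensive function (hypothetical): here, just the string length.
sub expensive {
    my ($value) = @_;
    $calls++;
    return length $value;
}

my @unsorted = qw(pear fig banana apple);

my @sorted =
    map  { $_->[0] }                    # step 3: extract original
    sort { $a->[1] <=> $b->[1] }        # step 2: sort on computed
    map  { [ $_, expensive($_) ] }      # step 1: cache original and computed
    @unsorted;                          # step 0: take originals

print "@sorted\n";                      # fig pear apple banana
print "expensive() called $calls times\n";
```

With four input values, expensive() is called exactly four times, however many comparisons the sort performs.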
The use of arrays can make such transforms hard to read (and to maintain), so people sometimes
use hashes instead:
my @sorted =
map { $_.{original} }
sort { $a.{computed} <=> $b.{computed} }
map { {original=>$_, computed=>expensive($_)} }
@unsorted;
That improves the readability, but at the expense of performance. Pairs
are an ideal way to get the readability of hashes but with (probably) even
better performance than arrays:
my @sorted =
map { $_.value }
sort { $a.key <=> $b.key }
map { expensive($_) => $_ }
@unsorted;
Or in the case of our example program:
@costs = map { $_.value }
sort { $a.key <=> $b.key }
map { amortize($_) => $_ }
@costs ^* $inflation;
Note that we also used a hyper-multiplication (^*) to multiply
each cost individually by the rate of inflation before sorting them. That's
equivalent to writing:
@costs = map { $_.value }
sort { $a.key <=> $b.key }
map { amortize($_) => $_ }
map { $_ * $inflation }
@costs;
but spares us from the burden of yet another map.
More importantly, because @costs is an alias for @%data{$file}{costs}, when we assign the sorted list back to @costs, we're actually assigning it back into the appropriate sub-entry of %data.
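The nearest Perl 5 equivalent of that aliasing is to work through a reference to the nested array. The following sketch uses invented data and an invented inflation factor, and omits the amortize step for brevity.

```perl
use strict;
use warnings;

# Hypothetical data for illustration.
my %data      = ( 'budget.txt' => { costs => [ 30, 10, 20 ] } );
my $file      = 'budget.txt';
my $inflation = 2;

# $costs refers to the very array stored inside %data, not to a copy of it.
my $costs = $data{$file}{costs};

# Assigning through the reference therefore updates %data itself.
@$costs = sort { $a <=> $b } map { $_ * $inflation } @$costs;

print "@{ $data{$file}{costs} }\n";     # 20 40 60
```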
The ∑ of all our fears
Perl 6 will probably have a built-in sum operator, but we might still prefer to build our own for a couple of reasons. Firstly, sum is obviously far too long a name for so fundamental an operation; it really should be ∑. Secondly, we may want to extend the basic summation functionality somehow, for instance by allowing the user to specify a filter and summing only those arguments that the filter lets through.
Perl 6 allows us to create our own operators. Their names can be any combination of characters from the Unicode set. So it's relatively easy to build ourselves a ∑ operator:
my sub operator:∑ is prec(\&operator:+($)) (*@list) {
reduce {$^a+$^b} @list;
}
We declare the ∑ operator as a lexically scoped subroutine.
The lexical scoping eases the syntactic burden on the parser, the semantic
burden on other unrelated parts of the code, and the cognitive burden on
the programmer.
The operator subroutine's name is always operator:whatever_symbols_we_want. In this case, that's operator:∑, but it can be any sequence of Unicode characters, including alphanumerics:
my sub operator:*#@& is prec(\&operator:\) (STR $x) {
return "darn $x";
}
my sub operator:† is prec(\&CORE::kill) (*@tIHoH) {
kill(9, @tIHoH) == @tIHoH or die "batlhHa'";
return "Qapla!";
}
my sub operator:EQ is prec(\&operator:eq) ($a, $b) {
return $a eq $b # stringishly equal strings
|| $a == $b != NaN; # numerically equal numbers
}
# and then:
warn *#@& "QeH!" unless † $puq EQ "Qapla!";
Did you notice that cunning $a == $b != NaN test in operator:EQ?
This lovely Perl 6 idiom solves the problem of numerical comparisons
between non-numeric strings.
In Perl 5, a comparison like:
$a = "a string";
$b = "another string";
print "huh?" if $a == $b;
will unexpectedly succeed (and silently too, if you run without
warnings), because the non-numeric values of both the scalars are converted to zero in
the numeric context of the ==.
But in Perl 6, non-numeric strings numerify to NaN. So, using Perl
6's multiway comparison feature, we can add an
extra != NaN to the equality test to ensure that we compared genuine
numbers.
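The Perl 5 behaviour, and one way to guard against it, can be checked directly. In this sketch loosely_equal is a hypothetical helper that mirrors the intent of the EQ operator above; it relies on looks_like_number from the core Scalar::Util module rather than on NaN.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: equal as strings, or equal as numbers
# when both operands really are numbers.
sub loosely_equal {
    my ($x, $y) = @_;
    return 1 if $x eq $y;
    return 0 unless looks_like_number($x) && looks_like_number($y);
    return $x == $y ? 1 : 0;
}

{
    no warnings 'numeric';      # silence "isn't numeric" for the demonstration
    print "plain == says equal\n" if "a string" == "another string";
}

print "loosely_equal says different\n"
    unless loosely_equal("a string", "another string");
print "1.0 and 1 are equal\n" if loosely_equal("1.0", "1");
```

All three messages are printed: the bare == treats both strings as zero, while the guarded comparison only falls back to == when both sides are genuine numbers.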
Meanwhile, we also have to specify a precedence for each new operator we define. We do that with the is prec trait of the subroutine. The precedence is specified in terms of the precedence of some existing operator; in this case, in terms of Perl's built-in unary +:
my sub operator:∑ is prec( \&operator:+($) )
To do this, we give the is prec trait a reference to the existing operator. Note that, because there are two overloaded forms of operator:+ (unary and binary) of different precedences, to get the reference to the correct one we need to specify its complete signature (its name and parameter types) as part of the enreferencing operation. The ability to take references to signatures is a standard feature in Perl 6, since ordinary subroutines can also be overloaded, and may need the same kind of disambiguation when enreferenced.
If the operator had been binary, we might also have had to specify its associativity (left, right, or non), using the is assoc trait.
Note too that we specified the parameter of operator:∑ with a flattening asterisk, since we want @list to slurp up any series of values passed to it, rather than being restricted to accepting only actual array variables as arguments.
The implementation of operator:∑ is very simple: we just apply the built-in reduce function to the list, reducing each successive pair of elements by adding them.
Note that we used a higher-order function to specify the addition operation. Larry has decided that the syntax for higher-order functions requires that implicit parameters be specified with a $^ sigil (or @^ or %^, as appropriate) and that the whole expression be enclosed in braces.
So now we have a ∑ operator:
$result = ∑ $wins, $losses, $ties;
but it doesn't yet provide a way to filter its values. Normally, that would present a difficulty with an operator like ∑, whose *@list argument will gobble up every argument we give it, leaving no way -- except convention -- to distinguish the filter from the data.
But Perl 6 allows any subroutine -- not just built-ins like print -- to take one or more "adverbs" in addition to its normal arguments. This provides a second channel by which to transmit information to a subroutine. Typically that information will be used to modify the behaviour of the subroutine (hence the name "adverb"). And that's exactly what we need in order to pass a filter to ∑.
A subroutine's adverbs are specified as part of its normal parameter list, but separated from its regular parameters by a colon:
my sub operator:∑ is prec(\&operator:+($)) ( *@list : $filter //= undef) {...
This specifies that operator:∑ can take a single scalar adverb,
which is bound to the parameter $filter. When there is no adverb
specified in the call, $filter is default-assigned the value undef.
We then modify the body of the subroutine to pre-filter the list through a grep, but only if a filter is provided:
reduce {$^a+$^b} ($filter ?? grep &$filter, @list :: @list);
}
The ?? and :: are the new way we write the old ?:
ternary operator in Perl 6. Larry had to change the spelling because he
needed the single colon for marking adverbs. But it's a change for the
better anyway -- it was rather odd that all the other short-circuiting logical
operators (&& and || and //) used doubled
symbols, but the conditional operator didn't. Well, now it does. The doubling
also helps it stand out better in code, in part because it forces you to
put space around the :: so that it's not confused with a package
name separator.
You might also be wondering about the ambiguity of ??, which in Perl 5 already represents an empty regular expression with question-mark delimiters. Fortunately, Perl 6 won't be riddled with the nasty ?...? regex construct, so there's no ambiguity at all.
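Perl 5 has no adverbs, so a rough equivalent of the filtered sum has to fall back on the kind of convention mentioned earlier. In the sketch below, filtered_sum is a hypothetical name, and the convention (an optional leading code reference is the filter) is an assumption made for this example; reduce comes from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical Perl 5 analogue of the filtered sum: the optional filter is
# passed as a leading code reference rather than as an adverb.
sub filtered_sum {
    my $filter = ref($_[0]) eq 'CODE' ? shift : undef;
    my @list   = $filter ? grep { $filter->($_) } @_ : @_;
    return 0 unless @list;              # reduce returns undef on an empty list
    return reduce { $a + $b } @list;
}

print filtered_sum(1, 2, 3, 4), "\n";                           # 10
print filtered_sum(sub { $_[0] % 2 == 0 }, 1, 2, 3, 4), "\n";   # 6
```

The weakness of the convention is visible here: a caller can never sum a list whose first element happens to be a code reference, which is exactly the ambiguity that a separate adverb channel avoids.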
Adverbial semantics can be defined for any Perl 6 subroutine. For example:
sub mean (*@values : $type //= 'arithmetic') {
given ($type) {
when 'arithmetic': { return sum(@values) / @values; }
when 'geometric': { return product(@values) ** (1/@values) }
when 'harmonic': { return @values / sum( @values ^** -1 ) }
when 'quadratic': { return (sum(@values ^** 2) / @values) ** 0.5 }
}
croak "Unknown type of mean: '$type'";
}
Adverbs will probably become widely used for passing this type of "out-of-band"
behavioural modifier to subroutines that take an unspecified number of
data arguments.
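A Perl 5 approximation of mean shows the same idea with the tools available there. This is a sketch under assumptions: the "adverb" becomes an optional leading hash reference of options (a convention chosen for this example), the hyper-operators are replaced by map, and product requires a reasonably recent List::Util (1.35 or later).

```perl
use strict;
use warnings;
use List::Util qw(sum product);
use Carp qw(croak);

# Hypothetical Perl 5 analogue: options travel in an optional leading hash reference.
sub mean {
    my $opts   = ref($_[0]) eq 'HASH' ? shift : {};
    my $type   = $opts->{type} // 'arithmetic';
    my @values = @_;
    croak "No values given" unless @values;

    return sum(@values) / @values                       if $type eq 'arithmetic';
    return product(@values) ** (1 / @values)            if $type eq 'geometric';
    return @values / sum(map { 1 / $_ } @values)        if $type eq 'harmonic';
    return sqrt(sum(map { $_ ** 2 } @values) / @values) if $type eq 'quadratic';
    croak "Unknown type of mean: '$type'";
}

print mean(1, 2, 3), "\n";                          # 2
print mean({ type => 'geometric' }, 2, 8), "\n";    # 4
```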


