Apocalypse 3
Operators
by Larry Wall, October 02, 2001
Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 03 for the latest information.
To me, one of the most agonizing aspects of language design is coming up with a useful system of operators. To other language designers, this may seem like a silly thing to agonize over. After all, you can view all operators as mere syntactic sugar -- operators are just funny looking function calls. Some languages make a feature of leveling all function calls into one syntax. As a result, the so-called functional languages tend to wear out your parenthesis keys, while OO languages tend to wear out your dot key.
But while your computer really likes it when everything looks the same, most people don't think like computers. People prefer different things to look different. They also prefer to have shortcuts for common tasks. (Even the mathematicians don't go for complete orthogonality. Many of the shortcuts we typically use for operators were, in fact, invented by mathematicians in the first place.)
So let me enumerate some of the principles that I weigh against each other when designing a system of operators.
- Different classes of operators should look different. That's why filetest operators look different from string or numeric operators.
- Similar classes of operators should look similar. That's why the filetest operators look like each other.
- Common operations should be ``Huffman coded.'' That is, frequently used operators should be shorter than infrequently used ones. For how often it's used, the scalar operator of Perl 5 is too long, in my estimation.
- Preserving your culture is important. So Perl borrowed many of its operators from other familiar languages. For instance, we used Fortran's ** operator for exponentiation. As we go on to Perl 6, most of the operators will be ``borrowed'' directly from Perl 5.
- Breaking out of your culture is also important, because that is how we understand other cultures. As an explicitly multicultural language, Perl has generally done OK in this area, though we can always do better. Examples of cross-cultural exchange among computer cultures include XML and Unicode. (Not surprisingly, these features also enable better cross-cultural exchange among human cultures -- we sincerely hope.)
- Sometimes operators should respond to their context. Perl has many operators that do different but related things in scalar versus list context.
- Sometimes operators should propagate context to their arguments. The x operator currently does this for its left argument, while the short-circuit operators do this for their right argument.
- Sometimes operators should force context on their arguments. Historically, the scalar mathematical operators of Perl have forced scalar context on their arguments. One of the RFCs discussed below proposes to revise this.
- Sometimes operators should respond polymorphically to the types of their arguments. Method calls and overloading work this way.
- Operator precedence should be designed to minimize the need for parentheses. You can think of the precedence of operators as a partial ordering of the operators such that it minimizes the number of ``unnatural'' pairings that require parentheses in typical code.
- Operator precedence should be as simple as possible. Perl's precedence table currently has 24 levels in it. This might or might not be too many. We could probably reduce it to about 18 levels, if we abandon strict C compatibility of the C-like operators.
- People don't actually want to think about precedence much, so precedence should be designed to match expectations. Unfortunately, the expectations of someone who knows the precedence table won't match the expectations of someone who doesn't. And Perl has always catered to the expectations of C programmers, at least up till now. There's not much one can do up front about differing cultural expectations.
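To make the point about precedence and parentheses concrete, here is a minimal sketch in Python. It is not anything from Perl itself: the two-level table PREC and the helper show are hypothetical, and the sketch assumes only associative operators, so it ignores associativity entirely. It prints an expression tree with parentheses only where a looser operator sits beneath a tighter one.

```python
# Hypothetical two-level precedence table; a higher number binds tighter.
PREC = {"+": 1, "*": 2}

def show(node, parent_prec=0):
    """Render a nested (op, left, right) tuple with as few parentheses as possible."""
    if isinstance(node, str):
        return node  # a bare term never needs parentheses
    op, left, right = node
    text = f"{show(left, PREC[op])} {op} {show(right, PREC[op])}"
    # Parenthesize only the "unnatural" pairing: a looser operator under a tighter one.
    return f"({text})" if PREC[op] < parent_prec else text

natural = show(("+", ("*", "a", "b"), "c"))    # "a * b + c"
unnatural = show(("*", ("+", "a", "b"), "c"))  # "(a + b) * c"
```

If typical code contains many more trees of the first shape than the second, this ordering of + and * is the one that keeps the parenthesis count down.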
It would be easy to drive any one of these principles into the ground, at the expense of other principles. In fact, various languages have done precisely that.
My overriding design principle has always been that the complexity of the solution space should map well onto the complexity of the problem space. Simplification good! Oversimplification bad! Placing artificial constraints on the solution space produces an impedance mismatch with the problem space, with the result that using a language that is artificially simple induces artificial complexity in all solutions written in that language.
One artificial constraint that all computer languages must deal with is
the number of symbols available on the keyboard, corresponding roughly
to the number of symbols in ASCII. Most computer languages have
compensated by defining systems of operators that include digraphs,
trigraphs, and worse. This works pretty well, up to a point. But
it means that certain common unary operators cannot be used as the
end of a digraph operator. Early versions of C had assignment operators
in the wrong order. For instance, there used to be a =- operator.
Nowadays that's spelled -=, to avoid conflict with unary minus.
By the same token (no pun intended), you can't easily define a
unary = operator without requiring a space before it most of the time,
since so many binary operators end with the = character.
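The digraph problem can be sketched with a toy longest-match tokenizer in Python. This is an illustration under stated assumptions, not how any real C or Perl lexer is written: the function tokenize and the operator sets EARLY_C and MODERN_C are hypothetical, and each set holds only the three operators needed for the example.

```python
def tokenize(src, operators):
    """Greedy (longest-match) tokenizer over a fixed set of operator spellings."""
    # Try longer operators first so a digraph wins over its one-character prefix.
    ops = sorted(operators, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch.isalnum() or ch == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(src[i:j])
            i = j
        else:
            for op in ops:
                if src.startswith(op, i):
                    tokens.append(op)
                    i += len(op)
                    break
            else:
                raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens

EARLY_C = {"=", "-", "=-"}   # hypothetical subset using the old spelling
MODERN_C = {"=", "-", "-="}  # hypothetical subset using the current spelling

old = tokenize("x=-1", EARLY_C)      # ['x', '=-', '1']
new = tokenize("x=-1", MODERN_C)     # ['x', '=', '-', '1']
spaced = tokenize("x= -1", EARLY_C)  # ['x', '=', '-', '1']
```

With the old spelling, assigning a negative number works only if you remember the space. With the new spelling the digraph ends in = rather than in a unary operator, so the conflict never arises.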
Perl gets around some of these problems by keeping track of whether
it is expecting an operator or a term. As it happens, a unary operator
is simply one that occurs when Perl is expecting a term. So Perl could
keep track of a unary = operator, even if the human programmer might
be confused. So I'd place a unary = operator in the category
of ``OK, but don't use it for anything that will cause widespread
confusion.'' Mind you, I'm not proposing a specific use for a unary
= at this point. I'm just telling you how I think. If we ever do
get a unary = operator, we will hopefully have taken these issues
into account.
While we can disambiguate operators based on whether an operator or a
term is expected, this implies some syntactic constraints as well. For
instance, you can't use the same symbol for both a postfix operator and
a binary operator. So you'll never see a binary ++ operator in
Perl, because Perl wouldn't know whether to expect a term or operator
after that. It also implies that we can't use the ``juxtaposition''
operator. That is, you can't just put two terms next to each other,
and expect something to happen (such as string concatenation, as in
awk). What if the second term started with something that looked like an
operator? It would be misconstrued as a binary operator.
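The term-versus-operator bookkeeping can be sketched in a few lines of Python. This is only a rough model, not Perl's actual lexer: the function classify and the sets PREFIX, POSTFIX, and BINARY are hypothetical, and anything that is not a listed operator is treated as a term.

```python
PREFIX = {"-", "!", "++"}      # allowed where a term is expected
POSTFIX = {"++"}               # allowed where an operator is expected
BINARY = {"+", "-", "*", "="}  # allowed where an operator is expected

# The constraint from the text: no symbol may be both postfix and binary,
# or we would not know what to expect after it.
assert not (POSTFIX & BINARY)

def classify(tokens):
    """Label each token by tracking whether a term or an operator is expected."""
    out = []
    expect_term = True
    for tok in tokens:
        if expect_term:
            if tok in PREFIX:
                out.append((tok, "prefix"))   # still waiting for a term
            elif tok in BINARY:
                raise SyntaxError(f"term expected, found operator {tok!r}")
            else:
                out.append((tok, "term"))
                expect_term = False
        else:
            if tok in POSTFIX:
                out.append((tok, "postfix"))  # still waiting for an operator
            elif tok in BINARY:
                out.append((tok, "binary"))
                expect_term = True
            else:
                raise SyntaxError(f"operator expected, found term {tok!r}")
    return out

double_minus = classify(["a", "-", "-", "b"])
# [('a', 'term'), ('-', 'binary'), ('-', 'prefix'), ('b', 'term')]
increment = classify(["a", "++", "+", "b"])
# [('a', 'term'), ('++', 'postfix'), ('+', 'binary'), ('b', 'term')]
```

The same - symbol gets two different labels depending on the state. It also shows why juxtaposition fails: the tokens a, -, b can only be read as a subtraction, never as the term a followed by the term -b, and two plain terms in a row are rejected outright.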
Well, enough of these vague generalities. On to the vague specifics.
The RFCs for this apocalypse are (as usual) all over the map, but don't cover the map. I'll talk first about what the RFCs do cover, and then about what they don't. Here are the RFCs that happened to get themselves classified into chapter 3:
RFC  PSA  Title
---  ---  -----
024  rr   Data types: Semi-finite (lazy) lists
025  dba  Operators: Multiway comparisons
039  rr   Perl should have a print operator
045  bbb  C<||> and C<&&> should propagate result context to both sides
054  cdr  Operators: Polymorphic comparisons
081  abc  Lazily evaluated list generation functions
082  abc  Arrays: Apply operators element-wise in a list context
084  abb  Replace => (stringifying comma) with => (pair constructor)
104  ccr  Backtracking
138  rr   Eliminate =~ operator.
143  dcr  Case ignoring eq and cmp operators
170  ccr  Generalize =~ to a special "apply-to" assignment operator
283  ccc  C<tr///> in array context should return a histogram
285  acb  Lazy Input / Context-sensitive Input
290  bbc  Better english names for -X
320  ccc  Allow grouping of -X file tests and add C<filetest> builtin
Note that you can click on the following RFC titles to view a copy of the RFC in question. The discussion sometimes assumes that you've read the RFC.






