From: jmm@elegant.com (John Macdonald)
Subject: Why ?? should live.
To: perl5-porters@perl.org
Date: Fri, 30 Jul 1999 01:39:31 -0400 (EDT)
Cc: jmm@elegant.com
In-Reply-To: from "John Macdonald" at Jul 13, 99 12:56:00 pm

John Macdonald wrote :
|| Moore, Paul wrote :
|| || I think that the *first* thing needed is a "tutorial" style document,
|| || explaining what ?? is intended for, how to think about it, what the
|| || subtleties are ("if ($a ?? $b) tests the first defined value of $a and
|| || $b. There is no reason why the first defined value need be a true value,
|| || which is what if tests for"), and so on. A FMTYEWTK document, if you
|| || like, but also with a strong element of tutorial.
|| ||
|| || Once we have this, and everyone understands the issues, then the
|| || arguments over merit may be a little less heated.
|| ||
|| || (The next step is "put up or shut up", someone to submit a working
|| || patch, but let's get the understanding done first.)
|| ||
|| || Funny, this sounds a bit like what Sarathy suggested a while ago...
||
|| Funny thing, I had decided last night to do just that ... :-)
||
|| I'm really supposed to be wrapping up the last details of the wolf
|| book right now. Fortunately, my part of the wolf is very close to
|| complete.
||
|| So, while it will be a few days before I can write up an overview of
|| the issues regarding ?? (and variants like first_defined), I'll
|| certainly get it done in the next week.

Well, it has been over 2 weeks, but here it is finally. (My real job
was unreasonable enough to demand some of my time.) Sorry for the
delay.

The ?? Operator
---------------

This is written for p5p, as part of the discussion of the proposed ??
operator. It contains the following sub-sections:

Section 1.  summary for p5p of the issues addressed by ?? and
            alternatives, and my rationale/recommendations for
            which should be added to the language.

Section 2.  ?? operator documentation

            This is written along the lines of documentation for an
            existing operator, worded as if the ?? operator had
            already been accepted as part of the language. On the
            other hand, this description does not assume that any
            other change has been made to perl; in particular, it
            assumes that none of the other possible changes discussed
            later in section 1 has been added. If others were to also
            be added, some changes in this section would be useful.
            Further, it is not yet in POD format -- that can easily be
            changed if the operator is actually being added to the
            language.

            I've assumed that -w would warn about using an array
            variable as an argument to ??. There is a good argument
            for making it an error instead, and requiring an explicit
            scalar() if that is desired -- that would reserve the
            possibility of defining a meaning for array arguments to
            "??" at some later time, if someone ever happens to think
            of a useful meaning.

Section 1. rationale for ?? and alternative proposals
-----------------------------------------------------

Proposal 1: Add a pair of operators, "??" and the corresponding
----------- "??=". "??" would be a short-circuit operator that
            returns its left operand if it is defined, otherwise
            returning its right operand.

There are at least two situations where these would be valuable.

First, initializing a variable. Select the first from a number of
choices, some of which might not have been set (e.g. $opt_X might
depend upon whether the caller provided a -X command line argument,
$ENV{X_control} might have been set by the user or by a recursive
invocation from this same program, an optional argument might have
been omitted on a function call).

Second, implementing a cache. A particular instance of a function
might have been previously computed. (The "??=" does a better job of
implementing a cache than the Orcish "||=", since it will cache
defined false values too. It is not quite perfect for this purpose:
it will recompute a value if the previously computed result was
"undef". See below.)

Often, perl data structures can be designed so that they have a
convenient meaning when used in a boolean context. However,
initialization and caching considerations are orthogonal to the design
of the data. You still need to initialize data, you still can benefit
from caching data, even when it has been designed to have a useful
boolean meaning for normal use within the program. It is generally a
poor trade-off to make initialization easier at the cost of making
subsequent use harder.

When used for either initialization or caching, there is most often
going to be a single instance of the "??" operator rather than a
cascaded series of "??" operators. For caching, it will always be
one; for initialization there will sometimes be a cascade, but more
often there will just be a single default provided as an alternative
if the earlier execution has not already provided an explicit setting.
I'll return to this point in the discussion of first and first_defined
below.
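
To make the caching point above concrete, here is a rough sketch in
perl as it stands today, with the proposed "??=" spelled out as an
explicit defined test. The names computation, lookup_or and
lookup_defined are made up for this illustration and are not part of
the proposal; computation stands in for any expensive function that
can legitimately return 0.

```perl
use strict;
use warnings;

# Hypothetical expensive function; counts how often it really runs.
my $calls = 0;
sub computation {
    my ($key) = @_;
    $calls++;
    return length($key) % 2;    # 0 for even-length keys, 1 for odd
}

my (%or_cache, %def_cache);

# Orcish "||=": a cached 0 is false, so it is recomputed every time.
sub lookup_or {
    my ($key) = @_;
    return $or_cache{$key} ||= computation($key);
}

# Long-hand form of the proposed "??=": recompute only when undef.
sub lookup_defined {
    my ($key) = @_;
    defined $def_cache{$key} or $def_cache{$key} = computation($key);
    return $def_cache{$key};
}

lookup_or("ab") for 1 .. 3;         # "ab" caches the false value 0
my $or_calls = $calls;

$calls = 0;
lookup_defined("ab") for 1 .. 3;
my $defined_calls = $calls;

print "||= ran computation $or_calls times; ",
      "defined test ran it $defined_calls time(s)\n";
```

With a key whose result is 0, the "||=" version calls computation on
every lookup, while the defined-test version calls it once.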

For caching, what would be perfect would be to have:

    $cache{$key} ??= computation($key);

mean:

    exists $cache{$key} or $cache{$key} = computation($key);

That would permit the result of computation($key) to be cached even if
it returned "undef". The exists operator is defined only for hash
lookups, however, and that would significantly reduce the
applicability of the operator, so I'm happy to accept the not quite
perfect use of "defined" instead of "exists" for the definition of the
operator, and to require the use of the second alternative explicitly
for such cases where it is of value.

The short-circuit aspect of the definition is essential for caching,
and it is valuable for initialization since it permits expensive
operations (and operations with side-effects) to be bypassed when
possible.

Proposal 2: first, first_defined
-----------

Proposals have also been made for new function/keyword operators:

    first_defined LIST
    first { EXPR } LIST

(There are two alternatives for each of these, depending upon whether
the elements of LIST are subject to short-circuit evaluation. To have
either of them provide short-circuit evaluation would require changes
to the core of perl from the present. It is easy now to write subs
that accomplish these purposes without providing short-circuit
evaluation -- however, as mentioned above, the short-circuit
evaluation is often strongly desired or critical for the use of these
operators. Since the variants that omit short-circuiting require no
core changes as well as being often inadequate, I'll ignore them for
the rest of this discussion.)

"first_defined" would be the same as a cascade of "??" operators -- it
would return the first element of LIST that was defined.

"first" would return the first element of LIST for which EXPR returned
true. (EXPR would be invoked with $_ set to each element in turn.)

Thus:

    first { defined $_ } LIST
    first_defined LIST

would have the same meaning.
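
For reference, the non-short-circuit variants mentioned in the
parenthetical above can be sketched as ordinary subs along these
lines. This is only an illustration of the "easy now" versions, not
the proposed core operators: every element of the list is evaluated
by the caller before either sub runs. The variable names are made up
for the example.

```perl
use strict;
use warnings;

# Return the first defined argument, or nothing if there is none.
# No short-circuiting: all arguments were evaluated before the call.
sub first_defined {
    for (@_) {
        return $_ if defined $_;
    }
    return;
}

# Return the first argument for which the block is true, with $_
# aliased to each element in turn. The (&@) prototype is what allows
# the "first { ... } LIST" calling syntax.
sub first (&@) {
    my $test = shift;
    for (@_) {
        return $_ if $test->();
    }
    return;
}

my @candidates = (undef, 0, 5);

my $x = first_defined(@candidates);          # first defined value
my $y = first { defined $_ } @candidates;    # same meaning as above
my $z = first { $_ } @candidates;            # first true value

print "x=$x y=$y z=$z\n";
```

The List::Util module provides a "first" with this same calling
convention.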
"first" can be applied to a wider range of purposes - it is a
short-circuit scalar form of grep, while "first_defined" is slightly
less verbose for its one purpose.

Rather than define one of these as part of the language (to get the
short-circuit behaviour), a more general solution that has been
suggested has been to add a short-circuit capability to the language.
Then, both first and first_defined could be written as simple
subroutines. The "lazy" attribute has been proposed as one
alternative. At various times in the past there have been other
proposals for iterators; they fit well into short-circuit lists.

Recommendations:
----------------

1. The "??" and "??=" operators should be added to the language.
   They provide a light-weight convenience for the common simple case.

2. The first and first_defined operators should not be added by
   themselves (as a way to provide short-circuit operation for either
   of them). They are more powerful than "??", but in the majority of
   cases the difference is unimportant and "??", possibly cascaded, is
   exactly what is needed.

3. A general purpose short-circuit mechanism, such as the lazy
   attribute, would also be a valuable addition for a number of
   purposes, but even if such an addition is being considered, "??"
   would still be worth adding to the language.

Section 2. ?? and ??= operator specifications
---------------------------------------------

Binary "??" performs a short-circuit operation to select a defined
value (if any). If the left operand is defined it is returned and the
right operand is not even evaluated. If the left operand is not
defined, the right operand is evaluated and returned. Scalar context
is used when either operand is evaluated. If the ?? operator is used
in lvalue context, both operands must be lvalues.

The "??=" operator is the normal assignment variant for the "??"
operator. Its left operand must be an lvalue, its right operand need
not be.
If the left operand initially is undefined it will be assigned the
value of the right operand; otherwise it is left unchanged.

Applying "defined" to an array variable is not especially meaningful
-- it is always true -- so a warning is given if -w is in effect and
an array variable is provided as an operand of "??" or "??=".

These operators are often useful for initializing a variable from a
collection of possibly defined alternatives. The variable might be an
argument to a function:

    # The number of ordered permutations for k elements chosen
    # out of a set of n elements.
    sub choose {
        my $n = shift;
        my $k = shift ?? $n;
        my $res = 1;
        # multiply n * (n-1) * ... for k factors
        $res *= $n-- while $k--;
        return $res;
    }

If choose is invoked with only one argument, computing the number of
ways of ordering that many elements is a useful default. But it would
be perfectly reasonable for a program to compute 0 as an explicit
second argument -- so, if 0 is provided it should be used in place of
the default.

Another possibility is initializing a variable for the mainline of a
program:

    Getopts("d:");
    $debug = $opt_d ?? $ENV{PROG_DEBUG} ?? 0;

If the command were invoked as:

    export PROG_DEBUG=1
    # ...
    prog -d0

the $debug variable would be set to 0, using the explicit command line
argument to override the non-zero setting of the PROG_DEBUG variable.
Compare that to using the "||" operator:

    $debug = $opt_d || $ENV{PROG_DEBUG} || 0;

which would accept a "true" value of PROG_DEBUG even over an explicit
setting of the command line option.

In addition to initialization, this operator is useful for
implementing a cache for an expensive operation. Joseph Hall shows:

    $m{$a} ||= -M $a

in a sort as an example of what he calls "The Orcish Maneuver" (making
a pun on OR-Cache). However, using "||=" only saves the computation
of the value if the result previously computed was not one of perl's
"false" values. Using "??=" instead will skip the computation unless
the computed result is "undef".
This alternative can be called "The Quickish Maneuver" because it
provides a speedup using the QUEstion-question operator to implement
a CACHE.

The same effect as "??" can be achieved using:

    $debug = $opt_d;
    $debug = $ENV{PROG_DEBUG} unless defined $debug;
    $debug = 0 unless defined $debug;

or:

    defined($debug = $opt_d)
        or defined($debug = $ENV{PROG_DEBUG})
        or $debug = 0;

but using "??" is more concise. (The parentheses are needed in the
second form because "defined" binds more tightly than assignment.)

The short-circuit aspect of "??" can be used to avoid expensive
computations. This can be especially true of the "??=" assignment
operator form, when there is a complicated expression involved to find
the variable:

    $foo->{table}{$key} ??= Value::new($value);

rather than:

    my $ft = $foo->{table};
    defined $ft->{$key} or $ft->{$key} = Value::new($value);

(The "??=" form also has the advantage that it works correctly even if
$foo->{table} has not yet been defined.)

It is instructive to compare the meaning of the operators "??", "||",
and "&&" when used in a cascaded series.

    $x = $a || $b || $c || $d;   # first "true" value in the "list"
    $x = $a && $b && $c && $d;   # first "false" value in the "list"
    $x = $a ?? $b ?? $c ?? $d;   # first "defined" value in the "list"

In each case, it is the *value* that satisfies the requirement that is
returned (or the final value if none of them satisfies the
requirement). This aspect of returning the value instead of a boolean
true or false is a significant advantage of Perl over languages like
C, awk, and shells, especially for "||".

Using the "??" operator in a boolean context DOES NOT answer the
question "Are any of these values defined?" unless you explicitly test
its result with the defined operator:

    if (defined($a ?? $b ?? $c ?? $d))

Without such an explicit "defined" test on the result:

    if ($a ?? $b ?? $c ?? $d)

is answering the question "Is the first defined value TRUE?", so it
may provide a boolean false result if the first defined value is one
of perl's false values.
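
The difference between the two tests can be seen in today's perl with
a stand-in for the cascade. The sub first_defined below is a
hypothetical helper for this illustration only; unlike the proposed
"??", it does not short-circuit. The variable names are also made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for "$opt ?? $env ?? $fallback" (no
# short-circuiting: all arguments are evaluated before the call).
sub first_defined {
    for (@_) {
        return $_ if defined $_;
    }
    return;
}

my ($opt, $env, $fallback) = (undef, 0, 1);

# The first defined value is 0, which is one of perl's false values.
my $value = first_defined($opt, $env, $fallback);

my $is_true    = $value          ? "yes" : "no";   # truth of the value
my $is_defined = defined($value) ? "yes" : "no";   # any value defined?

print "value=$value true=$is_true defined=$is_defined\n";
```

Here the plain boolean test says "no" even though one of the values
was defined, while the explicit defined test says "yes".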
While this may seem a bit surprising, you'll find that the answer to
"Are any of these values defined?" is usually inadequate. Sometimes
you'll care WHICH one it was that was defined, but you'll almost
always want to know the value that one is defined to. Conveniently,
the "??" operator does give you such a value ready for you to use.

You may find it useful to consider cascaded short-circuit operators:

    if ($result = $a OP $b OP $c OP $d)

to be asking:

    OP   boolean meaning           $result
    --   ---------------           -------
    &&   "Are we all agreed?"      "YES or why not"
    ||   "Does anyone think so?"   "NO or why so"
    ??   "What's your opinion?"    "The first real opinion."

When you use the "??" operator in a boolean context, you may find it
useful to think of "undef" as "I don't care.".

There is no low-precedence named operator that corresponds to "??" (in
the way that "and" and "or" correspond to "&&" and "||").

--
objects:                                 | John Macdonald
Think of them as data with an attitude.  | jmm@elegant.com