Return-Path: owner-perl5-porters@perl.org
Delivery-Date: Thu Jul 29 23:49:24 1999
Received: from defender.perl.org (root@defender.perl.org [209.45.167.243])
	by jhereg.perl.com (8.9.0/8.9.0) with ESMTP id XAA10017
	for <tchrist@perl.com>; Thu, 29 Jul 1999 23:49:21 -0600
Received: (from majordomo@localhost)
	by defender.perl.org (8.9.3/8.9.3/Debian/GNU) id BAA26834
	for perl5-porters-outgoing; Fri, 30 Jul 1999 01:42:53 -0400
Received: from elegant.com (firewall.elegant.com [209.20.46.2])
	by defender.perl.org (8.9.3/8.9.3/Debian/GNU) with SMTP id BAA26827
	for <perl5-porters@perl.org>; Fri, 30 Jul 1999 01:42:47 -0400
Received: by elegant.com (Smail3.1.28.1 #3)
	id m11A5OF-0003mgC; Fri, 30 Jul 99 01:39 EDT
Message-ID: <m11A5OF-0003mgC@elegant.com>
From: jmm@elegant.com (John Macdonald)
Subject: Why ?? should live.
To: perl5-porters@perl.org
Date: Fri, 30 Jul 1999 01:39:31 -0400 (EDT)
Cc: jmm@elegant.com
In-Reply-To: <m1145qb-0003lpC@elegant.com> from "John Macdonald" at Jul 13, 99 12:56:00 pm
X-Mailer: ELM [version 2.4 PL23]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-perl5-porters@perl.org
Precedence: bulk
X-Loop: Perl5-Porters

John Macdonald wrote :
|| Moore, Paul wrote :
|| || I think that the *first* thing needed is a "tutorial" style document,
|| || explaining what ?? is intended for, how to think about it, what the
|| || subtleties are ("if ($a ?? $b) tests the first defined value of $a and
|| || $b. There is no reason why the first defined value need be a true value,
|| || which is what if tests for"), and so on. A FMTYEWTK document, if you
|| || like, but also with a strong element of tutorial.
|| || 
|| || Once we have this, and everyone understands the issues, then the
|| || arguments over merit may be a little less heated.
|| || 
|| || (The next step is "put up or shut up", someone to submit a working
|| || patch, but let's get the understanding done first.)
|| || 
|| || Funny, this sounds a bit like what Sarathy suggested a while ago...
|| 
|| Funny thing, I had decided last night to do just that ... :-)
|| 
|| I'm really supposed to be wrapping up the last details of the wolf
|| book right now.  Fortunately, my part of the wolf is very close to
|| complete.
|| 
|| So, while it will be a few days before I can write up an overview of
|| the issues regarding ?? (and variants like first_defined), I'll
|| certainly get it done in the next week.

Well, it has been over 2 weeks, but here it is at last.  (My real
job was unreasonable enough to demand some of my time.)  Sorry for
the delay.

The ?? Operator
---------------

This is written for p5p, as part of the discussion of the proposed ??
operator.  It contains the following sub-sections:

  Section 1. summary for p5p of the issues addressed by ?? and
	alternatives, and my rationale/recommendations for which
	should be added to the language.

  Section 2. ?? and ??= operator documentation
	This is written along the lines of documentation for an
	existing operator, worded as if the ?? operator had already
	been accepted as part of the language.  On the other hand,
	this description does not assume that any other change has
	been made to perl; in particular, it assumes that none of
	the other possible changes discussed in section 1 has been
	added.  If any of them were also added, some changes in this
	section would be useful.  Further, it is not yet in POD
	format -- that can easily be changed if the operator is
	actually being added to the language.

	I've assumed that -w would warn about using an array variable
	as an argument to ??.  There is a good argument for making it
	an error instead, and requiring an explicit scalar() if that
	is desired -- that would reserve the possibility of defining
	a meaning for array arguments to "??" at some later time, if
	someone ever happens to think of a useful meaning.

Section 1. rationale for ?? and alternative proposals
-----------------------------------------------------

Proposal 1:   Add a pair of operators, "??" and the corresponding
-----------   "??=".  "??" would be a short-circuit operator that
	      returns its left operand if it is defined, and
	      otherwise returns its right operand.

There are at least two situations where these would be valuable.

First, for initializing a variable: selecting the first of a number of
choices, some of which might not have been set (e.g. $opt_X might
depend upon whether the caller provided a -X command line argument,
$ENV{X_control} might have been set by the user or by a recursive
invocation from this same program, an optional argument might have
been omitted on a function call).

Second, for implementing a cache: the result of a function for a
particular argument might already have been computed.  (The "??="
does a better job of implementing a cache than the Orcish "||=",
since it will cache defined false values too.  It is not quite
perfect for this purpose; it will recompute a value if the
previously computed result was "undef".  See below.)

Often, perl data structures can be designed so that they have a
convenient meaning when used in a boolean context.  However,
initialization and caching considerations are orthogonal to the
design of the data.  You still need to initialize data, you still can
benefit from caching data, even when it has been designed to have a
useful boolean meaning for normal use within the program.  It is
generally a poor trade-off to make initialization easier at the cost
of making subsequent use harder.

When used for either initialization or caching, there is most often
going to be a single instance of the "??" operator rather than a
cascaded series of "??" operators.  For caching, it will always be
one; for initialization there will sometimes be a cascade, but more
often there will just be a single default provided as an alternative
in case earlier execution has not already provided an explicit
setting.  I'll return to this point in the discussion of first and
first_defined below.

For caching, what would be perfect would be to have:

    $cache{$key} ??= computation($key);

mean:

    exists $cache{$key} or $cache{$key} = computation($key);

That would permit the result of computation($key) to be cached even
if it returned "undef".  The exists operator is defined only for hash
lookups, however, and basing "??=" on it would significantly reduce
the applicability of the operator.  So, I'm happy to accept the not
quite perfect use of "defined" instead of "exists" in the definition
of the operator, and to require the explicit "exists ... or ..." form
above for those cases where caching an "undef" result matters.
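A minimal sketch of the difference in present-day Perl (which has no "??="), assuming a hypothetical computation() that counts its calls and returns "undef" for odd keys.  The defined-based test recomputes the cached "undef", while the exists-based form does not:

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical expensive function: returns undef for odd keys.
sub computation {
    my ($key) = @_;
    $calls++;
    return $key % 2 ? undef : $key * 10;
}

my (%by_defined, %by_exists);

for my $key (1, 2, 1, 2) {
    # What "$cache{$key} ??= computation($key)" would do.
    defined $by_defined{$key} or $by_defined{$key} = computation($key);
}
my $defined_calls = $calls;

$calls = 0;
for my $key (1, 2, 1, 2) {
    # The explicit exists form also caches an undef result.
    exists $by_exists{$key} or $by_exists{$key} = computation($key);
}
my $exists_calls = $calls;

print "defined: $defined_calls calls, exists: $exists_calls calls\n";
```

Key 1 is computed twice under the defined test (3 calls in all) but only once under the exists test (2 calls in all).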

The short-circuit aspect of the definition is essential for caching,
and it is valuable for initialization since it permits expensive
operations (and operations with side-effects) to be bypassed when
possible.

Proposal 2:   first, first_defined
-----------

Proposals have also been made for new function/keyword operators:

    first_defined LIST
    first { EXPR } LIST

(There are two alternatives for each of these, depending upon whether
the elements of LIST are subject to short-circuit evaluation.  Having
either of them provide short-circuit evaluation would require changes
to the perl core.  It is easy now to write subs that accomplish these
purposes without short-circuit evaluation; however, as mentioned
above, short-circuit evaluation is often strongly desired or critical
for the uses of these operators.  Since the variants that omit
short-circuiting require no core changes, and are often inadequate
anyway, I'll ignore them for the rest of this discussion.)

"first_defined" would be the same as a cascade of "??" operators:
it would return the first element of LIST that was defined.

"first" would return the first element of LIST for which EXPR
returned true.  (EXPR would be invoked with $_ set to each element in
turn.)  Thus:

    first { defined $_ } LIST
    first_defined LIST

would have the same meaning.  "first" can be applied to a wider range
of purposes - it is a short-circuit scalar form of grep, while
"first_defined" is slightly less verbose for its one purpose.
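For comparison, here is a sketch of the easy, non-short-circuit subroutine versions.  The names first_defined and first are used here as ordinary subs (with first taking a code ref rather than a bare block), which is an assumption for illustration.  The noisy() helper, also hypothetical, shows that every list element is evaluated before either sub runs:

```perl
use strict;
use warnings;

# Eager sketch: the caller evaluates every element of the list
# before this sub is entered, so nothing is short-circuited.
sub first_defined {
    for my $item (@_) {
        return $item if defined $item;
    }
    return undef;
}

# Eager sketch of "first": the test is a code ref, called with $_
# set to each element in turn (as grep does).
sub first {
    my $test = shift;
    for (@_) {
        return $_ if $test->();
    }
    return undef;
}

my $evaluated = 0;
sub noisy { $evaluated++; return $_[0] }

my $fd = first_defined(noisy(undef), noisy(0), noisy(7));
my $ft = first(sub { defined $_ }, undef, 0, 7);
print "first_defined: $fd, first: $ft, evaluated: $evaluated\n";
```

Both return 0, the first defined element, but all three noisy() calls happen even though only the first two elements were needed.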

Rather than define one of these as part of the language (to get the
short-circuit behaviour), a more general solution that has been
suggested is to add a short-circuit (lazy evaluation) capability to
the language.  Then, both first and first_defined could be written as
simple subroutines.  The "lazy" attribute has been proposed as one
alternative.  At various times in the past there have also been
proposals for iterators; they would fit well with short-circuit lists.
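Without any such core change, one way to get short-circuiting from a plain subroutine is to pass closures and call them only as needed.  The following is a sketch of that workaround, not of the lazy attribute itself; first_defined_lazy and %opt are hypothetical names:

```perl
use strict;
use warnings;

# Each argument is a code ref; they are called in order only until
# one returns a defined value, so later alternatives are skipped.
sub first_defined_lazy {
    for my $thunk (@_) {
        my $value = $thunk->();
        return $value if defined $value;
    }
    return undef;
}

my $expensive_calls = 0;
my %opt = (d => 0);

my $debug = first_defined_lazy(
    sub { $opt{d} },
    sub { $expensive_calls++; 42 },   # not reached: $opt{d} is defined
);
print "debug=$debug, expensive calls=$expensive_calls\n";
```

It works, but wrapping every alternative in "sub { ... }" is clumsy next to a simple "$opt{d} ?? 42", which is part of why a lightweight operator remains attractive for the common case.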

Recommendations:
----------------

1.  The "??" and "??=" operators should be added to the language.
    They provide a light-weight convenience for the common simple
    case.

2.  The first and first_defined operators should not be added by
    themselves (as a way to provide short-circuit operation for
    either of them).  They are more powerful than "??", but in
    the majority of cases the difference is unimportant and "??",
    possibly cascaded, is exactly what is needed.

3.  A general purpose short-circuit mechanism, such as the lazy
    attribute, would also be a valuable addition for a number of
    purposes, but even if such an addition is being considered, "??"
    would still be worth adding to the language.

Section 2. ?? and ??= operator specifications
---------------------------------------------

Binary "??" performs a short-circuit operation to select a defined
value (if any).  If the left operand is defined it is returned and
the right operand is not even evaluated.  If the left operand is not
defined, the right operand is evaluated and returned.  Scalar context
is used when either operand is evaluated.  If the ?? operator is used
in lvalue context, both operands must be lvalues.

The "??=" operator is the normal assignment variant for the "??"
operator.  Its left operand must be an lvalue, its right operand need
not be.  If the left operand initially is undefined it will be
assigned the value of the right operand; otherwise it is left
unchanged.
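Since "??" is only proposed, the following sketch spells out the intended semantics with existing operators: the left operand is evaluated once, and the right operand only when the left one is undefined.  The sub right_side() and the variable names are hypothetical:

```perl
use strict;
use warnings;

my ($left, $right_evals) = (0, 0);

sub right_side { $right_evals++; return 'default' }

# "$x = $left ?? right_side()" written with existing operators.
my $tmp;
my $x = defined($tmp = $left) ? $tmp : right_side();

# "$y ??= right_side()" written with existing operators.
my $y;
$y = right_side() unless defined $y;   # $y was undef: assigned
$y = right_side() unless defined $y;   # now defined: left unchanged

print "x=$x y=$y right side evaluated $right_evals time(s)\n";
```

Here $x keeps the defined-but-false 0 without calling right_side(), and the second "??=" equivalent leaves $y alone, so right_side() runs exactly once.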

An array variable evaluated in scalar context yields its element
count, which is always defined, so testing it with "defined" is not
meaningful; a warning is given if -w is in effect and an array
variable is provided as an operand of "??" or "??=".

These operators are often useful for initializing a variable from a
collection of possibly defined alternatives.  The variable might be
an argument to a function:

    # The number of ordered permutations for k elements chosen
    #    out of a set of n elements.
    sub choose {
	my $n = shift;
	my $k = shift ?? $n;

	my $res = 1;
	$res *= $n-- while $k--;
	return $res;
    }
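A runnable version of choose for current Perl has to spell the default out with "defined", since "??" is not available.  For contrast, the hypothetical choose_or uses "||", which silently replaces an explicit 0 with the default:

```perl
use strict;
use warnings;

# Ordered permutations: n * (n-1) * ... * (n-k+1).
sub choose {
    my $n = shift;
    my $k = shift;
    $k = $n unless defined $k;   # what "shift ?? $n" would do

    my $res = 1;
    $res *= $n-- while $k--;
    return $res;
}

# Same computation, but defaulting with "||" (hypothetical).
sub choose_or {
    my $n = shift;
    my $k = shift || $n;         # an explicit 0 is overridden

    my $res = 1;
    $res *= $n-- while $k--;
    return $res;
}

print choose(5), " ", choose(5, 2), " ", choose(5, 0), "\n";   # 120 20 1
print choose_or(5, 0), "\n";                                   # 120
```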

If choose is invoked with only one argument, computing the number of
ways of ordering that many elements is a useful default.  But it
would be perfectly reasonable for a program to compute 0 as an
explicit second argument -- so, if 0 is provided it should be used in
place of the default.  Another possibility is initializing a variable
for the mainline of a program:

    Getopts("d:");
    $debug = $opt_d ?? $ENV{PROG_DEBUG} ?? 0;

If the command were invoked as:

    export PROG_DEBUG=1

    # ...

    prog -d0

the $debug variable would be set to 0, using the explicit command
line argument to override the non-zero setting of the PROG_DEBUG
environment variable.  Compare that to using the "||" operator:

    $debug = $opt_d || $ENV{PROG_DEBUG} || 0;

which would let a "true" value of PROG_DEBUG override even an
explicit false setting of the command line option (such as -d0).
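The two behaviours can be compared directly.  This sketch simulates "prog -d0" run with PROG_DEBUG=1 by setting $opt_d and %ENV by hand instead of calling Getopts, and writes the "??" cascade as a chain of defined tests:

```perl
use strict;
use warnings;

# Simulated inputs: -d0 on the command line, PROG_DEBUG=1 exported.
my $opt_d = 0;
$ENV{PROG_DEBUG} = 1;

# "$opt_d ?? $ENV{PROG_DEBUG} ?? 0" written with existing operators.
my $debug_defined =
    defined $opt_d           ? $opt_d
  : defined $ENV{PROG_DEBUG} ? $ENV{PROG_DEBUG}
  :                            0;

my $debug_or = $opt_d || $ENV{PROG_DEBUG} || 0;

print "defined-based: $debug_defined, ||-based: $debug_or\n";
```

The defined-based cascade honours the explicit -d0, while the "||" cascade falls through to the environment value 1.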

In addition to initialization, this operator is useful for
implementing a cache for an expensive operation.  Joseph Hall shows:

    $m{$a} ||= -M $a

in a sort as an example of what he calls "The Orcish Maneuver"
(making a pun on OR-Cache).  However, using "||=" only saves the
computation of the value if the result previously computed was not
one of perl's "false" values.  Using "??=" instead will skip the
computation whenever a defined result has already been cached; it
recomputes only if nothing was cached yet or the cached result is
"undef".  This alternative can be called "The Quickish Maneuver"
because it provides a speedup using the QUEstion-question operator
to implement a CACHE.
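A sketch of the difference in call counts, assuming a hypothetical age_of() that stands in for "-M $file" (so no files are needed) and spelling "??=" with "defined".  A loop of repeated lookups stands in for the repeated comparisons a sort would make:

```perl
use strict;
use warnings;

# Hypothetical stand-in for "-M $file"; "alpha" has age 0 (false).
my %age = (alpha => 0, beta => 3);
my $calls = 0;
sub age_of { $calls++; return $age{ $_[0] } }

# Repeated lookups, as a sort comparison routine would make.
my @lookups = qw(alpha beta alpha beta alpha);

# Orcish Maneuver: a cached 0 is false, so it is recomputed.
my %orcish;
for my $name (@lookups) {
    $orcish{$name} ||= age_of($name);
}
my $or_calls = $calls;

# What "??=" would do: any defined value, including 0, is kept.
my %quick;
$calls = 0;
for my $name (@lookups) {
    defined $quick{$name} or $quick{$name} = age_of($name);
}
my $quick_calls = $calls;

print "||= made $or_calls calls, defined-based made $quick_calls\n";
```

With "||=", every lookup of "alpha" recomputes its age (4 calls in all); the defined-based cache computes each name once (2 calls).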

The same effect as "??" can be achieved using:

    $debug = $opt_d;
    $debug = $ENV{PROG_DEBUG}  unless defined $debug;
    $debug = 0                 unless defined $debug;

or:

    defined($debug = $opt_d)
	or defined($debug = $ENV{PROG_DEBUG})
	or $debug = 0;

but using "??" is more concise.

The short-circuit aspect of "??" can be used to avoid expensive
computations.  The "??=" assignment form is especially convenient
when a complicated expression is needed to find the variable, since
that expression only has to be written once:

    $foo->{table}{$key} ??= Value::new($value);

rather than:

    my $ft = $foo->{table};
    defined $ft->{$key} or $ft->{$key} = Value::new($value);

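(The "??=" form also has the advantage that it works correctly even
if $foo->{table} has not yet been defined; in the second form, $ft
would be autovivified into a new hash that is never stored back into
$foo->{table}, so the new value would be lost.)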
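The pitfall with the two-step form can be shown in current Perl.  In this sketch the key k and the string 'value' are hypothetical stand-ins for $key and Value::new($value), and the full-path assignment is written with "defined ... or" since "??=" does not exist:

```perl
use strict;
use warnings;

# $foo->{table} has not been created yet.
my $foo = {};

# Two-step form: $ft is undef, so "$ft->{k}" autovivifies a new
# hash in $ft that is never stored back into $foo.
my $ft = $foo->{table};
defined $ft->{k} or $ft->{k} = 'value';

my $lost = exists $foo->{table} ? 'stored' : 'lost';

# Writing the full path once (as "??=" would) autovivifies
# $foo->{table} itself, so the value lands in $foo.
defined $foo->{table}{k} or $foo->{table}{k} = 'value';

print "two-step: $lost, direct: $foo->{table}{k}\n";
```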

It is instructive to compare the meaning of the operators "??", "||",
and "&&" when used in a cascaded series.

    $x = $a || $b || $c || $d;	# first "true"    value in the "list"
    $x = $a && $b && $c && $d;	# first "false"   value in the "list"
    $x = $a ?? $b ?? $c ?? $d;	# first "defined" value in the "list"

In each case, it is the *value* that satisfies the requirement that
is returned (or the final value if none of them satisfies the
requirement).  This aspect of returning the value instead of a
boolean true or false is a significant advantage of Perl over
languages like C, awk, and the shells, especially for "||".
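A small worked example of the three cascades, using hypothetical values that are, in order, undefined, defined-but-false (0), defined-but-false (the empty string), and true.  The "??" cascade is written with chained defined tests:

```perl
use strict;
use warnings;

# Hypothetical values: undefined, two defined false values, a true one.
my ($p, $q, $r, $s) = (undef, 0, '', 7);

my $or_result  = $p || $q || $r || $s;   # first true value: 7
my $and_result = $p && $q && $r && $s;   # first false value: undef

# "$p ?? $q ?? $r ?? $s" written with existing operators.
my $dor_result = defined $p ? $p
               : defined $q ? $q
               : defined $r ? $r
               :              $s;        # first defined value: 0

printf "||: %s  &&: %s  ??: %s\n",
    $or_result, (defined $and_result ? $and_result : 'undef'), $dor_result;
```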

Using the "??" operator in a boolean context DOES NOT answer the
question "Are any of these values defined?" unless you explicitly
test its result with the defined operator (the parentheses are
needed, since "defined", a named unary operator, binds more tightly
than "??"):

    if (defined($a ?? $b ?? $c ?? $d))

Without such an explicit "defined" test on the result:

    if ($a ?? $b ?? $c ?? $d )

is answering the question "Is the first defined value TRUE?", so it
may provide a boolean false result if the first defined value is one
of perl's false values.
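A small sketch of the two questions, using a hypothetical sub first_defined_of() in place of the cascade (being a sub, it does not short-circuit, which does not matter here):

```perl
use strict;
use warnings;

# Hypothetical helper with the proposed "??" selection rule.
sub first_defined_of {
    for my $v (@_) { return $v if defined $v }
    return undef;
}

my $first = first_defined_of(undef, 0, 5);

# "Is the first defined value TRUE?"
my $is_true    = $first         ? 'true' : 'false';
# "Are any of these values defined?"
my $is_defined = defined $first ? 'yes'  : 'no';

print "boolean test: $is_true, defined test: $is_defined\n";
```

The first defined value is 0, so the boolean test is false even though a defined value was found.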

While this may seem a bit surprising, you'll find that the answer to
"Are any of these values defined?" is usually inadequate.  Sometimes
you'll care WHICH one it was that was defined, but you'll almost
always want to know the value it was defined to.  Conveniently,
the "??" operator gives you that value, ready for you to use.

You may find it useful to consider cascaded short-circuit operators:

    if ($result = $a OP $b OP $c OP $d )

to be asking:

    OP      boolean meaning           $result
    --      ---------------           -------
    &&      "Are we all agreed?"      "YES or why not"
    ||      "Does anyone think so?"   "NO or why so"
    ??      "What's your opinion?"    "The first real opinion."

When you use the "??" operator in a boolean context, you may find it
useful to think of "undef" as "I don't care."

There is no low-precedence named operator that corresponds to "??"
(in the way that "and" and "or" correspond to "&&" and "||").

-- 
objects:                                    | John Macdonald
    Think of them as data with an attitude. |   jmm@elegant.com


