Apocalypse 3
by Larry Wall

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 03 for the latest information.

RFC 283: tr/// in array context should return a histogram

Yes, but ...

While it's true that I put that item into the Todo list ages ago, I think that histograms should probably have their own interface, since the histogram should probably be returned as a complete hash in scalar context, but we can't guess that they'll want a histogram for an ordinary scalar tr///. On the other hand, it could just be a /h modifier. But we've already done violence to tr/// to make it do character counting without transliterating, so maybe this isn't so far-fetched.

One problem with this RFC is that it does the histogram over the input rather than the output string. The original Todo entry did not specify this, but it was what I had intended. But it's more useful to do it on the resulting characters because then you can use the tr/// itself to categorize characters into, say, vowels and consonants, and then count the resulting V's and C's.
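
As a minimal Perl 5 sketch of that categorize-then-count idea (the sample word and the variable names are illustrative assumptions, not part of the RFC or of any proposed histogram interface), the transliteration runs first and the counting is done over its output:

```perl
use strict;
use warnings;

# Hypothetical input; any lowercase ASCII word would do.
my $word = 'apocalypse';

# Step 1: categorize. Duplicates on the left prefer the first mention,
# so the vowels map to V and the rest of a-z falls through to C.
(my $classes = $word) =~ tr/aeioua-z/VVVVVC/;

# Step 2: build the histogram over the *output* characters.
my %histogram;
$histogram{$_}++ for split //, $classes;

printf "%s: %d\n", $_, $histogram{$_} for sort keys %histogram;
```

Here the counting is a separate step bolted on after tr///; a built-in histogram option would presumably fold the two together.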

On the other hand, I'm thinking that the tr/// interface is really rather lousy, and getting lousier every day. The whole tr/// interface is kind of sucky for any sort of dynamically generated data. But even without dynamic data, there are serious problems. It was bad enough when the character set was just ASCII. The basic problem is that the notation is inside out from what it should be, in the sense that it doesn't actually show which characters correspond, so you have to count characters. We made some progress on that in Perl 5 when, instead of:

    tr/abcdefghijklmnopqrstuvwxyz/VCCCVCCCVCCCCCVCCCCCVCCCCC/

we allowed you to say:

    tr[abcdefghijklmnopqrstuvwxyz]
      [VCCCVCCCVCCCCCVCCCCCVCCCCC]

There are also shenanigans you can play if you know that duplicates on the left side prefer the first mention to subsequent mentions:

    tr/aeioua-z/VVVVVC/

But you're still working against the notation. We need a more explicit way to put character classes into correspondence.

More problems show up when we extend the character set beyond ASCII. The use of tr/// for case translations has long been semi-deprecated, because a range like tr/a-z/A-Z/ leaves out characters with diacritics. And now with Unicode, the whole notion of what is a character is becoming more susceptible to interpretation, and the tr/// interface doesn't tell Perl whether to treat character modifiers as part of the base character. For some of the double-wide characters it's even hard to just look at the character and tell if it's one character or two. Counted character lists are about as modern as Hollerith strings in Fortran.

So I suspect the tr/// syntax will be relegated to being just one quote-like interface to the actual transliteration module, whose main interface will be specified in terms of translation pairs, the left side of which will give a pattern to match (typically a character class), and the right side will say what to translate anything matching to. Think of it as a series of coordinated parallel s/// operations. Syntax is still open for negotiation till Apocalypse 5.
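
To make the "translation pairs" idea concrete, here is a rough Perl 5 sketch. The function name translate_pairs and the pair layout are assumptions made up for illustration; no such module or syntax has been specified. Each character is tested against the patterns in order, and the first pair that matches decides the replacement:

```perl
use strict;
use warnings;

# Hypothetical helper: apply a list of [pattern, replacement] pairs,
# one character at a time, with the first matching pair winning.
sub translate_pairs {
    my ($string, @pairs) = @_;
    my $out = '';
    for my $char (split //, $string) {
        my $replacement = $char;    # unmatched characters pass through
        for my $pair (@pairs) {
            my ($pattern, $to) = @$pair;
            if ($char =~ /\A$pattern\z/) {
                $replacement = $to;
                last;
            }
        }
        $out .= $replacement;
    }
    return $out;
}

# The correspondence is now explicit: no counting of characters needed.
my $result = translate_pairs(
    'perl 6',
    [ qr/[aeiou]/ => 'V' ],
    [ qr/[a-z]/   => 'C' ],
);
print "$result\n";
```

Working character by character sidesteps the Unicode questions raised above rather than answering them; a real interface would have to decide what counts as one character.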

But there can certainly be a histogram option in there somewhere.

RFC 084: Replace => (stringifying comma) with => (pair constructor)

I like the basic idea of pairs because it generalizes to more than just hash values. Named parameters will almost certainly be implemented using pairs as well.

I do have some quibbles with the RFC. The proposed key and value built-ins should simply be lvalue methods on pair objects. And if we use pair objects to implement entries in hashes, the key must be immutable, or there must be some way of re-hashing the key if it changes.
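
A minimal Perl 5 sketch of what such a pair object might look like, taking the "immutable key" option. The package name SketchPair is hypothetical, and this says nothing about how pairs will actually be implemented; it only shows an lvalue method for the value next to a plain reader for the key:

```perl
use strict;
use warnings;

{
    package SketchPair;    # hypothetical name, not a real module

    sub new {
        my ($class, $key, $value) = @_;
        return bless { key => $key, value => $value }, $class;
    }

    # Plain accessor: it returns a copy of the key, so callers get
    # no setter for it.
    sub key { return $_[0]{key} }

    # Lvalue accessor: the value can be assigned through the method call.
    sub value :lvalue { $_[0]{value} }
}

my $pair = SketchPair->new(answer => 0);
$pair->value = 42;
print $pair->key, ' => ', $pair->value, "\n";
```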

The stuff about using pairs for mumble-but-false is bogus. We'll use properties for that sort of chicanery. (And multiway comparisons won't rely on such chicanery in any event. See above.)

RFC 081: Lazily evaluated list generation functions

Sorry, you can't have the colon--at least, not without sharing it. Colon will be a kind of "supercomma" that supplies an adverbial list to some previous operator, which in this case would be the prior colon or dotdot.

(We can't quite implement ?: as a : modifier on ?, because the precedence would be screwy, unless we limit : to a single argument, which would preclude its being used to disambiguate indirect objects. More on that later.)

The RFC's proposal concerning attributes::get(@a) stuff is superseded by value properties. So, @a.method() should just pull out the variable's properties directly, if the variable is of a type that supports the methods in question. A lazy list object should certainly have such methods.

Assignment of a lazy list to a tied array is a problem unless the tie implementation handles laziness. By default a tied array is likely to enforce immediate list evaluation. Immediate list evaluation doesn't work on infinite lists. That means it's gonna fill up your disk drive if you try to say something like:

    @my_tied_file = 1..Inf;

Laziness should be possible, but not necessarily the norm. It's all very well to delay the evaluation of "pure" functions in the realm of math, since presumably you get the same result no matter when you evaluate. But a lot of Perl programming is done with real world data that changes over time. Saying somefunc($a .. $b) can get terribly fouled up if $b can change, and the lazy function still refers to the variable rather than its instantaneous value. On the other hand, there is overhead in taking snapshots of the current state.

On the gripping hand, the lazy list object is the snapshot of the values, so that's not a problem in this case. Forget I mentioned it.
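
The snapshot point can be seen in a small Perl 5 sketch using a closure as a stand-in for a lazy list object. The helper lazy_range is a hypothetical name for illustration only. Because the endpoints are copied when the iterator is built, later changes to the caller's variable don't leak in:

```perl
use strict;
use warnings;

# Hypothetical helper: returns an iterator over $from .. $to.
sub lazy_range {
    my ($from, $to) = @_;    # copies: this is the snapshot
    my $next = $from;
    return sub { return $next <= $to ? $next++ : undef };
}

my $limit = 3;
my $iter  = lazy_range(1, $limit);
$limit = 100;                # changing the variable afterwards has no effect

# Values are only computed as they are asked for.
my @seen;
while (defined(my $n = $iter->())) {
    push @seen, $n;
}
print "@seen\n";
```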

The tricky thing about lazy lists is not the lazy lists themselves, but how they interact with the rest of the language. For instance, what happens if you say:

    @lazy = 1..Inf;
    @lazy[5] = 42;

Is @lazy still lazy after it is modified? Do we remember that @lazy[5] is an "exception", and continue to generate the rest of the values by the original rule? What if @lazy is going to be generated by a recursive function? Does it matter whether we've already generated @lazy[5]?

And how do we explain this simply to people so that they can understand? We will have to be very clear about the distinction between the abstraction and the concrete value. I'm of the opinion that a lazy list is a definition of the default values of an array, and that the actual values of the array override any default values. Assigning to a previously memoized element overrides the memoized value.
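
One way to picture "the lazy list defines the defaults, the actual values override them" is the following Perl 5 sketch. The class name SketchLazyArray and its get/set methods are assumptions for illustration, not a proposed interface. A generator supplies default values on demand, and a single store holds both memoized defaults and explicit assignments, so an assignment simply replaces whatever was there:

```perl
use strict;
use warnings;

{
    package SketchLazyArray;    # hypothetical name, not a real module

    sub new {
        my ($class, $generator) = @_;
        return bless { generator => $generator, stored => {} }, $class;
    }

    # Fetch: use the stored value if there is one, otherwise compute
    # the default from the generator and memoize it.
    sub get {
        my ($self, $index) = @_;
        $self->{stored}{$index} = $self->{generator}->($index)
            unless exists $self->{stored}{$index};
        return $self->{stored}{$index};
    }

    # Store: an actual value overrides the default, memoized or not.
    sub set {
        my ($self, $index, $value) = @_;
        $self->{stored}{$index} = $value;
        return $value;
    }
}

# Default definition standing in for 1..Inf: element $i is $i + 1.
my $lazy = SketchLazyArray->new(sub { $_[0] + 1 });

my $before = $lazy->get(5);    # default value, now memoized
$lazy->set(5, 42);             # the "exception"
my $after  = $lazy->get(5);
print "$before then $after; neighbours: ", $lazy->get(4), ' ', $lazy->get(6), "\n";
```

Under this reading the array is still lazy after the assignment: elements 4 and 6 keep following the original rule.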

It would help the optimizer to have a way to declare "pure" array definitions that can't be overridden.

Also consider this:

    @array = (1..100, 100..10000:100);

A single flat array can have multiple lazy lists as part of its default definition. We'll have to keep track of that, which could get especially tricky if the definitions start overlapping via slice definitions.
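
A rough Perl 5 sketch of the bookkeeping involved, assuming the trailing :100 is read as a step of 100 (that reading, the segment layout, and the function name element_at are all assumptions for illustration). The flat array is described as a list of range segments, and a flat index is resolved by walking them:

```perl
use strict;
use warnings;

# Hypothetical description of (1..100, 100..10000:100) as two segments,
# reading the trailing :100 as a step size.
my @segments = (
    { from => 1,   to => 100,   step => 1   },
    { from => 100, to => 10000, step => 100 },
);

# Resolve a flat, zero-based index without generating the whole list.
sub element_at {
    my ($index, @segs) = @_;
    for my $seg (@segs) {
        my $count = int(($seg->{to} - $seg->{from}) / $seg->{step}) + 1;
        return $seg->{from} + $index * $seg->{step} if $index < $count;
        $index -= $count;    # skip past this segment
    }
    return;                  # past the end of every segment
}

print join(' ', map { element_at($_, @segments) } 0, 99, 100, 101, 199), "\n";
```

This covers only non-overlapping segments laid end to end; overlapping slice definitions are exactly the part that gets tricky.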

In practice, people will treat the default values as real values. If you pass a lazy list into a function as an array argument, the function will probably not know or care whether the values it's getting from the array are being generated on the fly or were there in the first place.

I can think of other cans of worms this opens, and I'm quite certain I'm too stupid to think of them all. Nevertheless, my gut feeling is that we can make things work more like people expect rather than less. And I was always a little bit jealous that REXX could have arrays with default values. :-)
