Apocalypse 5
by Larry Wall

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 05 for the latest information.

RFC 361: Simplifying split()

The RFC makes five suggestions. I'll consider them one by one.

The first argument to split is currently interpreted as a regexp, regardless of whether or not it actually is one. (Yes, split '.', $foo doesn't split on dot -- it's currently the same as split /./, $foo.) I suggest that split be changed to treat only regexps as regexps, and everything else as literals.
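For reference, the behavior the RFC complains about can be reproduced in Perl 5 with a short sketch like the one below. The variable names are only illustrative and do not come from the RFC.

```perl
use strict;
use warnings;

my $foo = 'a.b.c';

# A string first argument is still compiled as a pattern, so '.' means
# "any character". Every character becomes a delimiter, every field is
# empty, and the empty trailing fields are dropped, leaving nothing.
my @as_string = split '.', $foo;

# To split on a literal dot in Perl 5, the dot has to be escaped,
# either by hand or with \Q...\E.
my @escaped = split /\./, $foo;
my @quoted  = split /\Q.\E/, $foo;

print scalar(@as_string), "\n";   # prints 0
print "@escaped\n";               # prints "a b c"
print "@quoted\n";                # prints "a b c"
```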

Fine, I think. If the first argument to split is untyped, it should parse correctly, either evaluating a quoted string immediately or deferring interpretation of a regex. One could even do something like split on the first delimiter matched by another pattern:

    split _/(,|;)/;

That would split on either all commas or all semicolons, depending on which it found first in the string. The _ forces the regex to return a string, which is whatever was captured by the parens in this case.
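The same idea can be approximated in Perl 5 in two steps. This is only a sketch of the intent, not the proposed Perl 6 syntax, and the sample string and the names $delim and @fields are assumptions made for illustration.

```perl
use strict;
use warnings;

my $string = 'a;b,c;d';

# Step 1: find whichever delimiter (comma or semicolon) appears first.
my ($delim) = $string =~ /(,|;)/;

# Step 2: split on that delimiter only, quoted so it is taken literally.
# If neither delimiter occurs, the whole string is the single field.
my @fields = defined $delim
    ? split(/\Q$delim\E/, $string)
    : ($string);

print join('|', @fields), "\n";   # prints "a|b,c|d"
```

Here the semicolon is found first, so the comma inside "b,c" is left alone.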

Empty trailing fields are currently suppressed (although a -1 as the third argument disables this). I suggest that empty trailing fields be retained by default.

Probably okay, though we need a way to translate old code. It was originally done this way because split on whitespace would typically return an extra field after the newline. But most newlines will be prechomped in Perl 6.
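As a reminder of what would change, here is a small Perl 5 sketch of the current default next to the -1 form. The sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $record = 'a,b,,,';

# Default: empty trailing fields are silently dropped.
my @default = split /,/, $record;

# A negative limit keeps them, which is what the RFC wants by default.
my @all = split /,/, $record, -1;

print scalar(@default), "\n";   # prints 2
print scalar(@all), "\n";       # prints 5

# The historical motivation: an unchomped line ends in whitespace, so
# keeping trailing fields yields an extra empty one after the newline.
my @line_fields = split /\s/, "a b\n", -1;
print scalar(@line_fields), "\n";   # prints 3
```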

When not in list context, split currently splits into @_. I suggest that this side-effect be removed.

Fine. It's easy enough to translate to an explicit assignment.

split ?pat? in any context currently splits into @_. I suggest that this side-effect be removed.

Fine. I don't think anyone uses that.

split ' ' (but not split / /) currently splits on whitespace, but also removes leading empty fields. I suggest that this irregularity be removed.
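For reference, the irregularity looks like this in Perl 5. The sample string is an assumption chosen to show both leading and repeated spaces.

```perl
use strict;
use warnings;

my $string = '  foo bar  baz';

# The special case: a single-space string splits on runs of whitespace
# and discards leading empty fields.
my @special = split ' ', $string;

# The regex form splits on each individual space and keeps the
# empty fields that result.
my @literal = split / /, $string;

print join('|', @special), "\n";   # prints "foo|bar|baz"
print join('|', @literal), "\n";   # prints "||foo|bar||baz"
```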

The question is, what to replace it with, since it's a very handy construct. We could use a different conventional pattern:

    @array = split /<ws>/, $string;

Or we could say that it's now a split on whitespace only if the split argument is unspecified. That wouldn't work very well with the old syntax, where we often have to supply the second argument. But given that the =~ operator now serves as a topicalizer for any term, we could translate:

    @array = split ' ', $string;

to this:

    @array = $string =~ split;

Oddly, this probably also works:

    $string =~ (@array = split);

or maybe even this:

    @array = split given $string;

But I think I like the OO notation better here anyway:

    @array = $string.split;

In fact, split may not be a function at all. The default split might just be a string method and use unary dot:

    @array = .split;

We still have the third argument to deal with, but that's likely to be specified like this:

    @array = $string.split(limit => 3);

We could conceivably make a different method for word splitting, much like REXX does:

    @array = .words;

Then a limit could be the first argument:

    @array = .words(3);

But there almost doesn't need to be such a method, since

    @array = m/ [ (\S*) \s* ]* /;

will do the right thing. Admittedly, a .words method would be much more readable...
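For comparison, the closest Perl 5 idiom is a global match in list context. This is a sketch of the equivalent behavior rather than a translation of the Perl 6 pattern above, and the sample string is illustrative.

```perl
use strict;
use warnings;

my $string = "  foo bar  baz\n";

# In list context, a /g match returns every capture, so this collects
# each run of non-whitespace characters.
my @matched = $string =~ /(\S+)/g;

# The awk-style split produces the same list of words.
my @split = split ' ', $string;

print "@matched\n";   # prints "foo bar baz"
print "@split\n";     # prints "foo bar baz"
```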

Fortunately, split is a function, so I can put off that decision till Apocalypse 29. :-)
