℞ 12: Explicit encode/decode
While the standard Perl Unicode preamble makes Perl's filehandles use UTF-8 encoding by default, filehandles aren't the only sources and sinks of data. On rare occasions, such as a database read, you may be given encoded text you need to decode.
The core Encode module
offers two functions to handle these conversions. (Remember that
decode() means to convert octets from a known encoding into Perl's
internal Unicode form and
encode() means to convet from Perl's
internal form into a known encoding.)
use Encode qw(encode decode); # given $bytes, containing octets in a known encoding my $chars = decode("shiftjis", $bytes, 1); # given $chars, a string encoded in Perl's internal format my $bytes = encode("MIME-Header-ISO_2022_JP", $chars, 1);
For streams all in the same encoding, don't use encode/decode; instead set the ﬁle encoding when you open the ﬁle or immediately after with
binmode as described in a future reference. Remember the canonical rule of Unicode: always encode/decode at the edges of your application.
Previous: ℞ 11: Names of CJK Codepoints
Series Index: The Standard Preamble