Perl Unicode Cookbook: Make All I/O Default to UTF-8

℞ 18: Make all I/O and args default to utf8

The core rule of Unicode handling in Perl is “always encode and decode at the edges of your program”.

If you’ve configured everything such that all incoming and outgoing data uses the UTF-8 encoding, you can make Perl perform the appropriate encoding and decoding for you. The -C command-line flag and the PERL_UNICODE environment variable, both documented in perldoc perlrun, control this behavior. Use the S option to make the standard input, output, and error filehandles use UTF-8 encoding. Use the D option to make all other filehandles use UTF-8 encoding. Use the A option to decode @ARGV elements as UTF-8:

     $ perl -CSDA ...
     # or
     $ export PERL_UNICODE=SDA

Within your program, you can achieve the same effects with the open pragma to set default encodings on filehandles and the Encode module to decode the elements of @ARGV:

     use open qw(:std :utf8);
     use Encode qw(decode_utf8);
     @ARGV = map { decode_utf8($_, 1) } @ARGV;
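To see what the in-program approach changes, here is a minimal sketch of a complete script built from the three lines above. The helper name decode_args is hypothetical, used only for illustration. The check against ${^UNICODE} is an added assumption rather than part of the recipe: it skips decoding when the A option (value 32 in perlrun) is already in effect, so arguments are not decoded twice.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use open qw(:std :utf8);      # UTF-8 layers on STDIN/STDOUT/STDERR and on later open() calls
use Encode qw(decode_utf8);

# Hypothetical helper: decode raw byte strings as UTF-8.
# The second argument (1) tells Encode to croak on malformed input.
sub decode_args {
    my @raw = @_;             # copy, because decode_utf8 may modify its argument
    return map { decode_utf8($_, 1) } @raw;
}

# Assumption: if -CA or PERL_UNICODE=A is active (bit 32 of ${^UNICODE}),
# @ARGV has already been decoded, so leave it alone.
my @args = (${^UNICODE} & 32) ? @ARGV : decode_args(@ARGV);

# length() now counts characters, not bytes, for each argument.
printf "%s has %d character(s)\n", $_, length($_) for @args;
```

Run with a non-ASCII argument, the script should report the number of characters rather than the number of UTF-8 bytes, and the output passes back through the UTF-8 layer on STDOUT.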

Previous: ℞ 17: Make File I/O Default to UTF-8

Series Index: The Standard Preamble

Next: ℞ 19: Specify a File’s Encoding
