PerlMonks  

Meditations


If you've discovered something amazing about Perl that you just need to share with everyone, this is the right place.

This section is also used for non-question discussions about Perl, and for any discussions that are not specifically programming related. For example, if you want to share or discuss opinions on hacker culture, the job market, or Perl 6 development, this is the place. (Note, however, that discussions about the PerlMonks web site belong in PerlMonks Discussion.)

Meditations is sometimes used as a sounding-board — a place to post initial drafts of perl tutorials, code modules, book reviews, articles, quizzes, etc. — so that the author can benefit from the collective insight of the monks before publishing the finished item to its proper place (be it Tutorials, Cool Uses for Perl, Reviews, or whatever). If you do this, it is generally considered appropriate to prefix your node title with "RFC:" (for "request for comments").

User Meditations
22 years, and about a quarter century of Perl
4 direct replies — Read more / Contribute
by talexb
on Dec 12, 2023 at 12:06

    It's that day again, my Monk Day. I'm up to 22 years on this site, and as the title says, probably 25 years noodling around with Perl. What a long, strange trip it's been.

    I still host the monthly Perl Mongers meeting in Toronto, and our discussion in November edged over to which editor people used. Like anything else, you use whatever tool works best for you. I'm very impressed by people who use emacs -- it seems insanely complicated. Using tmux with one panel as an editor and another as a bash prompt is about as complicated as I get.

    Perl's the same situation -- if it's a tool you like and can get things done with, great. If there's a new shiny thing you prefer, great. I'm too old to be swayed by people saying, "Oh, no one uses Perl anymore." I use it, thus negating their generalization.

    Dare I say .. here's to another 22 years? :)

    Edit: OK, I got my quip this time:

    Happy Monkday!!1! You've been here 22 invigorating years. Did you make a wish?
    Nope, no wish. Just a reminder that I hear in my head, now and again: "Your fear is boring." Just keep moving.

    Also, Happy Winter Solstice, for all those that observe.

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

App-lcpan: Amazing Dependency Graph
3 direct replies — Read more / Contribute
by kcott
on Dec 06, 2023 at 18:39

    I was asked to evaluate the lcpan script from the App-lcpan distribution for $work.

    I thought I'd just share its amazing dependency graph. ☺️

    — Ken

Google inactive account policy
No replies — Read more | Post response
by Cow1337killr
on Nov 28, 2023 at 17:33

    I don't know if any of my PerlMonks associates have source code or whatever on Google Drive or important email conversations with fellow programmers on GMail, but there is a distinct possibility that some of you have been heads-down coding and are not aware of...

    https://www.forbes.com/sites/daveywinder/2023/11/28/gmail-and-photos-content-purge-starts-in-5-days-protect-your-data-now/?sh=2dae705950e7

    It is not just your GMail account or pictures of you hanging out with Larry...

    https://support.google.com/accounts/answer/12418290#zippy=%2Cif-you-want-to-recover-your-account%2Chow-to-check-your-activity-status

    Time is not on your side. ("December 1, 2023 is the earliest a Google Account will be deleted due to this policy.")

    And now my editorial:

    Here we are on the cusp of A.I. and the world's biggest A.I. / Search engine / Fill in the blanks company has nothing better to do than send us this Christmas present (i.e., a lump of coal)?

PWC 244 task 2 in linear time
3 direct replies — Read more / Contribute
by Anonymous Monk
on Nov 27, 2023 at 13:22

    Disclaimer: the title is clickbait. The plot is curved and the solution isn't linear, despite the lack of nested loops -- but it is fast.

    Task 2: Group Hero
    Submitted by: Mohammad S Anwar
    
    You are given an array of integers representing the strength.
    
    Write a script to return the sum of the powers of all possible 
    combinations; power is defined as the square of the largest number 
    in a sequence, multiplied by the smallest.
    
    Example 1
    
    Input: @nums = (2, 1, 4)
    Output: 141
    
    Group 1: (2) => square(max(2)) * min(2) => 4 * 2 => 8
    Group 2: (1) => square(max(1)) * min(1) => 1 * 1 => 1
    Group 3: (4) => square(max(4)) * min(4) => 16 * 4 => 64
    Group 4: (2,1) => square(max(2,1)) * min(2,1) => 4 * 1 => 4
    Group 5: (2,4) => square(max(2,4)) * min(2,4) => 16 * 2 => 32
    Group 6: (1,4) => square(max(1,4)) * min(1,4) => 16 * 1 => 16
    Group 7: (2,1,4) => square(max(2,1,4)) * min(2,1,4) => 16 * 1 => 16
    
    Sum: 8 + 1 + 64 + 4 + 32 + 16 + 16 => 141
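
    For readers who want something to run against the example above, here is a minimal sketch of one way to avoid enumerating every combination. It is not the poster's code (which is not shown here); the name group_hero and the sort-based approach are assumptions for illustration. After sorting, each element takes its turn as the maximum, and a running total keeps the sum of the minimums of all groups that can be formed from the smaller elements.

```perl
use strict;
use warnings;

# Hypothetical helper: sum of max^2 * min over every non-empty combination.
sub group_hero {
    my @sorted = sort { $a <=> $b } @_;
    my $total  = 0;
    my $mins   = 0;    # sum of min(group) over all non-empty groups seen so far

    for my $x (@sorted) {
        # $x is the maximum of {$x} alone (min is $x) and of every earlier
        # group extended by $x (min is unchanged, already summed in $mins).
        $total += $x * $x * ($x + $mins);

        # Each earlier group exists with and without $x, plus the group {$x}.
        $mins = 2 * $mins + $x;
    }
    return $total;
}

print group_hero(2, 1, 4), "\n";    # 141, matching Example 1
```

    The single loop is linear, but the sort in front of it is not, which fits the "no nested loops, still not linear" description. For large inputs the totals grow quickly, so a real solution may need Math::BigInt or a modulus.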
    
Emojis for Perl Monk names
10 direct replies — Read more / Contribute
by eyepopslikeamosquito
on Nov 17, 2023 at 03:02
Perl Secret Operator Emojis
6 direct replies — Read more / Contribute
by eyepopslikeamosquito
on Nov 05, 2023 at 01:40

    To continue my learning of Unicode Emojis, I thought it'd be fun to try to concoct emojis for some of Perl's famous secret operators.

    Here's what I've come up with so far:

    Name      | Operator | Emoji | Notes
    Diamond   | <>       | 💎    | nicknamed by Geneva Wall circa 1994
    Spaceship | <=>      | 🚀    | nicknamed by Heidi Wall and Randal L. Schwartz (merlyn 🪄) circa 1994
    Saturn*   | =( )=    | 🪐    | scalar/list context
    Kite      | ~~<>     | 🪁    | a single line of input
    Babycart  | @{[ ]}   | 👒🛷  | list interpolation, invented by Larry 🧱 (TimToady) circa 1994

    *Note: I have chosen not to use the taboo name for this secret operator used at perlsecret
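
    In case the Notes column is too terse, here is a small sketch of what four of these operators do in practice. The variable names and sample data are made up for illustration, and the Kite example reads from an in-memory filehandle rather than STDIN so that it can run unattended. Diamond is left out because it needs real input.

```perl
use strict;
use warnings;

# Spaceship: three-way numeric comparison, typically used in sort blocks.
my @sorted = sort { $a <=> $b } (10, 9, 100);

# Saturn: force list context on the right, then count the elements.
my $commas =()= "a,b,c,d" =~ /,/g;

# Babycart: interpolate the result of an arbitrary expression in a string.
my $text = "sum: @{[ 1 + 2 ]}";

# Kite: read a single line even though the assignment is in list context.
my $data = "first\nsecond\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @lines = (~~<$fh>);
close $fh;

print "@sorted\n";                    # 9 10 100
print "$commas\n";                    # 3
print "$text\n";                      # sum: 3
print scalar(@lines), " line read\n"; # 1 line read
```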

    I found this exercise to be surprisingly challenging. Unable to find an emoji for BooK's quirky Babycart (or a pram), I opted instead to look for emojis that do justice to LanX's imaginative suggestion of Mexican Sledge -- because he visualizes @{[ ]} as a guy with a sombrero dragging a sled(ge) uphill.

    Please feel free to suggest alternative Unicode emojis for the secret operators above and to concoct emojis for other secret operators.

    See Also

    Updated: removed duplicate Mexican Sledge reference; added attribution to Heidi Wall; added Wall emoji for Larry and magic wand emoji for merlyn.

    👁️🍾👍🦟
2024 Perl Conference - Science Track Interest Survey
4 direct replies — Read more / Contribute
by oodler
on Nov 01, 2023 at 00:22
NHL Hockey Fans?
2 direct replies — Read more / Contribute
by stevieb
on Oct 24, 2023 at 02:39

    I'm Canadian, and hockey is in my blood. I've been on skates since before I can remember.

    Being from Toronto, I have my team.

    For the last couple of years, I used an app on my phone that would alert me 15 minutes before my teams played (Toronto and Edmonton) so that I had an opportunity to set up recording on my PVR.

    In the last two days, I had to refuse the changes in said app's T&S because they wanted info beyond what was needed, so I decided immediately to delete the app. I then of course had to figure out a way to write my own alert software.

    I have. As I've done for a long time, I take a problem, and create a Perl solution to produce data from a source that is external. This is no different.

    I already have prototype working software. My question is this:

    If you're a hockey fan, is there anything specifically you'd want to look up? I already have it pegged as NHL::API, so your feedback will dictate the interface.

    Thanks :)

    -stevieb

RFC: Export tags for builtin pragma
3 direct replies — Read more / Contribute
by kcott
on Oct 19, 2023 at 13:42

    G'day All,

    I'm intending to propose export tags for the builtin pragma. I'd appreciate any comments you may have about this. Thank you.

    A Brief History of the 'builtin' Pragma

    My Usage of the 'builtin' Pragma

    I often play around with experimental features when they are released; however, I never use them in production-grade code. I did the same with the functions of the builtin pragma and found many of them to be useful; in some cases, I also found the import lists to be quite unwieldy.

    When Perl v5.40.0 is released, presumably sometime next year, I will probably start using many of the stable functions provided by the builtin pragma in production-grade code. I would like easier-to-use import lists; accordingly, I'm proposing a number of export tags.

    Proposed Export Tags for the 'builtin' Pragma

    :bool
    Exports: true, false, is_bool.
    :weak
    Exports: weaken, unweaken, is_weak.
    :ref
    Exports: blessed, refaddr, reftype.
    :round
    Exports: ceil, floor.
    :stable
    Exports all stable (i.e. non-experimental) functions.
    :all
    Exports all functions.
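
    To make the proposal concrete, here is a minimal sketch of how such tags could expand into the plain import lists used today. It only models the four tags whose contents are spelled out above; the %TAGS table and the expand_tags helper are hypothetical and are not part of the builtin pragma.

```perl
use strict;
use warnings;

# Hypothetical tag table, copied from the proposal above.
my %TAGS = (
    ':bool'  => [qw(true false is_bool)],
    ':weak'  => [qw(weaken unweaken is_weak)],
    ':ref'   => [qw(blessed refaddr reftype)],
    ':round' => [qw(ceil floor)],
);

# Expand tags into function names; plain names pass through unchanged.
sub expand_tags {
    my @names;
    for my $item (@_) {
        if (exists $TAGS{$item}) {
            push @names, @{ $TAGS{$item} };
        }
        elsif ($item =~ /^:/) {
            die "unknown export tag '$item'\n";
        }
        else {
            push @names, $item;
        }
    }
    return @names;
}

print join(' ', expand_tags(qw(:bool :round trim))), "\n";
# true false is_bool ceil floor trim
```

    With the tags in place, "use builtin qw(:bool :round trim);" would stand in for the six-name list printed above.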

    — Ken

A Perl 3 bug in the debugger
2 direct replies — Read more / Contribute
by pemungkah
on Sep 20, 2023 at 17:25
    I recently posted about this on my blog, but it's worth a quick post here too.

    There's a fun little bug in the debugger, which you can see like this. Create a dumb little script. Anything will do.

        #!/bin/perl
        use strict;
        use warnings;
        print "we";
        print "just";
        print "need";
        print "something";
        print "to";
        print "list";

    Now let's start up the debugger.

        perl -d zz.pl

        Loading DB routines from perl5db.pl version 1.77
        Editor support available.

        Enter h or 'h h' for help, or 'man perldebug' for more help.

        main::(zz.pl:5): say "we";
          DB<1>

    All as expected, but now:

          DB<1> l 1.2
        1.2     use strict;
          DB<2> l
        2.2     use warnings;
        3.2     use feature 'say';
        4.2
        5.2:    say "we";
        6.2:    say "just";
        7.2:    say "need";
        8.2:    say "something";
        9.2:    say "to";

    That's kind of unexpected, but it gets better!

          DB<2> l 1.1.3.5
        1.1.3.5 use strict;
          DB<3> l
        2.1     use warnings;
        3.1     use feature 'say';
        4.1
        5.1:    say "we";
        6.1:    say "just";
        7.1:    say "need";
        8.1:    say "something";
        9.1:    say "to";

    Why does this happen? Well, it goes back to commit a687059cbaf, which is the one that moves the debugger into lib in Perl 3. The pattern used to capture the line number specification is [\d\$\.]+, which matches all kinds of things, including floating-point numbers, IPv4 addresses, and other junk. The overall pattern used to parse the l command arguments changes over time, but that basic match to extract a "line number" never does.

    You may be thinking, "yeah, okay, I see that, but why does the debugger show the floating-point line number?" The reason is that the line number spec is captured as a string. The debugger stores the source code of the current file in an array whose name is not a valid Perl variable name, and uses the line number spec captured by the l command to index it.

    When Perl indexes an array, the index value is converted to an integer if possible, because array indexes have to be integers. The line spec we have is captured and stored as a string, so when we try to index the source array with it, "1.22" becomes the integer 1, and we find line 1. The command uses the value of the index variable (remember, that's still a string!) to print the line number, and so we end up with a floating-point line number.

    Now, when we run the bare l command, the string "1.22" is still in the list command's "last line listed" variable, and Perl simply takes that variable and adds 1 to its contents to look for the next line. Since the contents are a string that looks like a floating point number, Perl converts it to a float, adds 1.0 (so we don't downgrade it from a float), and assigns that back to the current line number, so we get lines 2.22, 3.22, and so on.
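
    The two conversions described above are easy to reproduce outside the debugger. The following is a minimal sketch, assuming an ordinary array named @source in place of the debugger's internal source array; the variable names are made up, and the int() call at the end is just one possible normalisation, not the submitted patch.

```perl
use strict;
use warnings;

# Stand-in for the debugger's array of source lines (index 0 is unused).
my @source = (undef, 'use strict;', 'use warnings;', q{use feature 'say';});

my $line = "1.2";                       # captured from "l 1.2" as a string

# Indexing truncates "1.2" to 1, but the label still prints the string.
my $first = "$line $source[$line]";     # "1.2 use strict;"

# A bare "l" adds 1 to the string, which numifies it as a float.
$line = $line + 1;                      # 2.2
my $second = "$line $source[$line]";    # "2.2 use warnings;"

# Forcing an integer up front keeps the label and the index in step.
my $clean = int("1.2");                 # 1

print "$first\n$second\n$clean\n";
```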

    I've submitted a patch to fix this for 5.40, but it's pretty surprising that we've had this bug for 32 years!

A small step...and a giant leap for Bod
2 direct replies — Read more / Contribute
by Bod
on Sep 07, 2023 at 06:05

    My 1000th post!

    It's a little under 3 years since I created an account here on Perl Monks. For many years, I'd occasionally visited The Monastery thanks to Google leading me here when I asked a Perl-related question - which was quite frequently. But, back in November 2020, I had a few new projects going on and thought I needed to "raise my game". I had no concept of what that actually meant.

    My expectation was that I'd learn a few new coding styles, be a bit quicker and, perhaps, a bit clearer.

    What has actually happened, and continues to happen, is that my whole approach to writing code has changed...drastically!

    Allow me to illustrate by way of an example...

    Just this week I was writing some code for the admin part of my partner's website Pawsies. We need to be able to upload pictures of dogs that we look after, and it's helpful if those pictures are square so they display consistently.

    The whole website uses Template - something I discovered in The Monastery. The web scripts are in their own directory and not mixed up with the other site files, again a learning from The Monastery. The upshot is sites that are easier to navigate, easier to link to as everything is not in the cgi-bin and easier to maintain.

    However, it goes much further than that.

    In the past, if I wanted square images, I would have hard-coded the logic to produce them into the script that needed them. But this week, instead I wrote a module to do only that operation. It is where my thought process started, it was not an afterthought. The design started with deciding exactly what it was supposed to do and by jotting down how I would know if it was successful. The basis of a test!

    I then looked to see if there were any extra generalisations that could be made to make it more useful to other people or when I reuse it elsewhere. So, a resizing parameter was added to change the size of the square image and a position parameter was added to determine whereabouts in the original image the square is taken from.

    Only then was the code written followed by the tests...

    Once the tests all ran fine, it was bundled up and uploaded to CPAN for all to use. Currently as a dev release so I get some test results before the production release. It's all looking good...

    Before joining The Monastery this would have been a bit of messy, but functional code locked away somewhere in a difficult-to-maintain script. I considered CPAN modules to be for other, superior "proper" coders...not for me. Now I look for the best way to do it for my needs now, my needs in future when I come to maintain the code or need similar functionality and for the needs of the wider Perl community as The Monastery has shown me that I have something to contribute as well as to learn.

    Watch out for a testing question coming soon - one of the tests for Image::Square required visually inspecting the output image and I don't know how to convert that into a usable test...see, lots more to learn and that's something I fully embrace!

    Thank you to everyone who has helped, inspired, questioned and criticised me over the last 1000 posts - it really is appreciated 👍

[NTF] Nice Perl ideas I have no time for
3 direct replies — Read more / Contribute
by Discipulus
on Sep 07, 2023 at 05:32
    Hello dear community,

    Since our venerable halls are quiet nowadays, I propose this meditation as a place to share ideas for programs, modules and anything else that we do not have enough time to develop further.

    In my world ideas have no copyright and should instead circulate freely as there is the chance they are grasped by an enlightened soul who can squeeze the best from them.

    Even more: there are amateur programmers with nice ideas and professional ones with few ones. It is not something to complain about, we are different brains with different skills, inclinations and.. free hours :)

    I'd like to see at least some demo code for these ideas, with the goal well explained along with any possible path of implementation or critical parts, not just: /I'll save the world with a oneliner/.

    We can use a tag for these posts like [NTF] (No Time For) and post them in reply to this post or as a new Meditation.

    I'll start with a first one if you are OK with this ..nice perl idea :)

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Looking over the fence
1 direct reply — Read more / Contribute
by karlgoethebier
on Aug 27, 2023 at 15:27

    Clack

        (defvar *handler*
          (clack:clackup
            (lambda (env)
              (declare (ignore env))
              '(200 (:content-type "text/plain") ("Hello, Clack!")))))

    «The Crux of the Biscuit is the Apostrophe»

New built-in perl5.38 try/catch syntax
No replies — Read more | Post response
by eyepopslikeamosquito
on Aug 23, 2023 at 06:16

    After recently installing perl v5.38, I stumbled upon some cool improvements to Perl's built-in try/catch syntax while watching the excellent What's new in Perl v5.38 YouTube talk, delivered by Paul "LeoNerd" Evans at TPRC 2023 Toronto.

    New perl 5.38 use feature 'try'

    • perl v5.34 added try/catch syntax based on CPAN module Syntax::Keyword::Try
    • perl v5.36 added "finally" blocks to try/catch, also inspired by Syntax::Keyword::Try
    • perl v5.36 added use feature 'defer', allowing you to create defer blocks that run at the time that execution leaves the block it's declared inside (which seems to be inspired by the classic RAII programming idiom)
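    • use v5.38 does not by itself enable feature 'try'; it still has to be switched on with use feature 'try' or use experimental 'try' (the first version bundle to include it is :5.40)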

    Some perldoc References

    • die - is how you throw exceptions in Perl
    • eval - used to execute a little Perl program, trapping any errors encountered so they don't crash the calling program
    • Carp - alternative warn and die for modules
    • autodie - Replace functions with ones that succeed or die with lexical scope
    • Fatal - Replace functions with equivalents which succeed or die
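
    For comparison with the new syntax, here is a minimal sketch of the classic die/eval idiom that the references above describe. It runs on any perl 5; the subs risky and attempt are hypothetical names invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical function that throws an exception with die.
sub risky {
    my $n = shift;
    $n == 42 or die "no meaning (n=$n)\n";
    return "meaning found";
}

# Trap the exception with block eval; the trailing 1 makes success explicit,
# so a false or clobbered $@ cannot be mistaken for success.
sub attempt {
    my $n = shift;
    my $result;
    my $ok = eval { $result = risky($n); 1 };
    if (!$ok) {
        my $err = $@ || 'unknown error';
        chomp $err;
        $result = "caught '$err'";
    }
    return $result;
}

print attempt(13), "\n";    # caught 'no meaning (n=13)'
print attempt(42), "\n";    # meaning found
```

    The try/catch form expresses the same control flow without the bookkeeping around $@ and the "1" at the end of the eval block.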

    To get a feel for how all this works in practice, I created a simple example, consisting of two files in a scratch directory, TestTry.pm and trytest.pl, shown below.

    TestTry.pm

        package TestTry;
        use strict;
        use warnings;

        print "TestTry: module load\n";

        sub life {
            my $n = shift;
            defined($n) or die "error: no argument provided";
            print "TestTry::life n='$n'\n";
            $n =~ /^\d+$/ or die "input error: '$n' must consist of digits only";
            $n == 42 or die "Sadly there is no meaning in your life (n=$n)";
            print "TestTry: congrats, your life has meaning!\n";
            print "TestTry::life end\n";
        }

        1;

    trytest.pl

        # trytest.pl - a simple test of new perl 5.38 try syntax:
        # Put TestTry.pm in same dir as trytest.pl and run with:
        #   perl -I . trytest.pl
        # Note: use v5.38 implies use strict and warnings

        use v5.38;

        # use feature 'try';   # throws 'try/catch is experimental' warnings
        use experimental 'try';

        use TestTry;

        sub do_one {
            my $number = shift;
            try {
                TestTry::life($number);
            }
            catch ($e) {
                chomp $e;
                print "trytest: caught '$e'\n";
            }
            finally {
                print "trytest: in finally block\n";
            }
        }

        print "trytest: start\n";
        do_one("invalid");
        do_one(13);
        do_one(42);
        print "trytest: end\n";

    Example run

    With that done, assuming you have perl 5.38 installed, you can run:

        $ perl -I . trytest.pl
        TestTry: module load
        trytest: start
        TestTry::life n='invalid'
        trytest: caught 'input error: 'invalid' must consist of digits only at TestTry.pm line 11.'
        trytest: in finally block
        TestTry::life n='13'
        trytest: caught 'Sadly there is no meaning in your life (n=13) at TestTry.pm line 12.'
        trytest: in finally block
        TestTry::life n='42'
        TestTry: congrats, your life has meaning!
        TestTry::life end
        trytest: in finally block
        trytest: end

    Summary

    I really like this new try/catch syntax and am looking forward to Perl providing built-in exception handling without having to install CPAN modules, such as Try::Tiny and TryCatch.

    Remembering the smartmatch/Switch debacle, I'm also a fan of this new gentler way of introducing experimental new features into the Perl core.

    Reference

    See Also

    Updated: Expanded "See Also" section.

Handling of Unicode File Names
No replies — Read more | Post response
by NERDVANA
on Aug 22, 2023 at 23:27

    The Problem

    I have long been bothered by the problem where I read a directory name which happens to be a UTF-8 representation of unicode, then append a unicode string to that name, then try writing out to that new filename but get an error that the directory does not exist:

        $ perl -E 'mkdir("\x{100}")'
        $ perl -MB -E 'my @d= <*>; say B::perlstring($_) for @d'
        "\304\200"
        $ perl -E 'my ($d)= <*>; open(my $f, ">", "$d/\x{101}.txt") or die "$!"'
        No such file or directory at -e line 1.

    Why? Because Perl passes the scalar to C library's 'open' and that delivers a UTF-8 encoding of the entire string, and the bytes that came from glob (and were never decoded from UTF-8) get their individual UTF-8 bytes encoded as UTF-8 characters.

    Perl expects the user to keep track of which strings are unicode and which strings are bytes, and never mix the two. In the example above, the real problem/bug is that glob returns bytes, and "$d/\x{101}.txt" is mixing bytes with unicode, producing garbage.
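
    The double encoding can be seen without touching the filesystem. The sketch below assumes the directory name arrived as the raw bytes "\xC4\x80" (as glob returned it above) and uses Encode to show which bytes each string turns into once it is UTF-8 encoded on its way out; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $dir_bytes  = "\xC4\x80";       # what glob returned: UTF-8 bytes of U+0100
my $file_chars = "\x{101}.txt";    # a character string containing U+0101

# Mixing the two: the bytes 0xC4 and 0x80 are treated as the characters
# U+00C4 and U+0080, and each of them is encoded again on output.
my $mixed     = "$dir_bytes/$file_chars";
my $mixed_hex = unpack 'H*', encode('UTF-8', $mixed);

# Decoding the directory name first keeps everything in characters.
my $fixed     = decode('UTF-8', $dir_bytes) . "/$file_chars";
my $fixed_hex = unpack 'H*', encode('UTF-8', $fixed);

print "$mixed_hex\n";    # c384c2802fc4812e747874  (no such directory)
print "$fixed_hex\n";    # c4802fc4812e747874      (the real path)
```

    This decode-on-input, encode-on-output discipline is the by-the-book answer to the bug, and it is the boilerplate one would like to avoid.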

    While that answer is technically correct, I'm not satisfied with it, because it results in a sub-optimal user experience. A user *ought* to be able to list a directory, get Unicode back, append Unicode to it, and write the result back out. This process ought to be easy, instead of splattering the code with calls to encode() and decode(). Why can't we have nice things?

    (The problem is even worse on Windows, where you must configure your program to run with the UTF-8 codepage or else you get even worse garbage, since Perl internally uses the ANSI variants of the Win32 API which replaces unrepresentable characters with placeholders)

    What Does Python Do

    Python 2 had a system where unicode strings were represented differently from ascii strings, and so the solution in Python 2 was "unicode in, unicode out". In other words, if you call a directory listing with a unicode directory path, all the results come back as unicode strings. So what happens if you try reading an invalid UTF-8 sequence when you requested Unicode return values? It just returns a non-unicode string in the mix with the unicode ones.

        $ python2.7
        Python 2.7.18 (default, Oct 10 2021, 22:29:32)
        [GCC 11.1.0] on linux2
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import os
        >>> l=os.listdir(".")
        >>> l
        ['\xc4\x80']
        >>> l=os.listdir(u".")
        >>> l
        [u'\u0100']
    (now write a file alongside it which is one correct UTF8 character and one non-utf8 byte)
        $ perl -MB -E 'open(my $f, ">", "\x{C4}\x{80}\x{A0}.txt") or die "$!"'
        $ python2.7
        Python 2.7.18 (default, Oct 10 2021, 22:29:32)
        [GCC 11.1.0] on linux2
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import os
        >>> l=os.listdir('.')
        >>> l
        ['\xc4\x80\xa0.txt', '\xc4\x80']
        >>> l=os.listdir(u'.')
        >>> l
        ['\xc4\x80\xa0.txt', u'\u0100']
    So, does this API behavior result in a sensible developer experience?
        >>> open(l[1]+'/'+l[0], 'w')
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
    The answer to "what happens when you try combining ascii directory with unicode filename" is "it doesn't let you do that". So, that saves the developer from head-scratching i/o errors, and puts the exception closer to the source of the problem.

    Unfortunately, Perl can't adopt this solution because Perl doesn't have a logical separation between Unicode and Ascii strings. (yes there is Perl's utf8 flag, but that's not a logical difference between contents of scalars. References available upon request.)

    But, in Python 3.0, all strings are unicode! (similar in some ways to perl's stance) So what did they do for this situation?

        $ python3
        Python 3.11.3 (main, Jun 5 2023, 09:32:32) [GCC 13.1.1 20230429] on linux
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import os
        >>> l=os.listdir('.')
        >>> l
        ['Ā\udca0.txt', 'Ā']
        >>>
    So, er.... they return an invalid representation of the bytes? That is "\x{100}" followed by "\x{DCA0}" in place of the byte "\x{A0}". What is the Unicode 0xDC00 range? It's called the "Low Surrogate Area", and unicode.org says
        Low Surrogate Area
        Range: DC00-DFFF

        Isolated surrogate code points have no interpretation; consequently, no character code charts or names lists are provided for this range.

        See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts.
        ...
        For a complete understanding of high-surrogate code units, low-surrogate code units, and surrogate pairs used for the UTF-16 encoding form, see the appropriate sections of the Unicode Standard
    So basically, Python 3 encodes stray non-utf8 bytes as values in a reserved-for-other-uses set of codepoints which should never appear in a real unicode string. Does it work correctly for round trips?
        >>> open(l[1]+'/'+l[0], "w")
        <_io.TextIOWrapper name='Ā/Ā\udca0.txt' mode='w' encoding='UTF-8'>
        >>> l=os.listdir('\u0100')
        >>> l
        ['Ā\udca0.txt']
        ^d
        $ perl -E '
          sub escapestr { $_[0] =~ s/([^\x20-\x7E])/sprintf("\\x%02X", ord $1)/egr }
          say escapestr($_) for <\x{100}/*>'
        \xC4\x80/\xC4\x80\xA0.txt

    Sure enough, it round-trips those 0xDC00-0xDCFF codepoints back to the single non-unicode bytes they came from.
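
    For anyone who wants to experiment with the same trick in Perl, here is a minimal sketch of the mapping in both directions, built on Encode. The function names decode_escape and encode_escape are made up for this illustration and do not exist in any module I know of; the behaviour is modelled on what the Python 3 session above shows, not on Python's source.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes -> characters. Valid UTF-8 is decoded normally; every byte that is
# not part of a valid sequence becomes the code point 0xDC00 + byte.
sub decode_escape {
    my ($bytes) = @_;
    no warnings 'surrogate';
    my $out = '';
    while (length $bytes) {
        # FB_QUIET decodes the valid prefix and leaves the rest in $bytes.
        $out .= decode('UTF-8', $bytes, Encode::FB_QUIET());
        last unless length $bytes;
        $out .= chr(0xDC00 + ord(substr($bytes, 0, 1, '')));
    }
    return $out;
}

# Characters -> bytes. U+DC80..U+DCFF turn back into the single byte they
# stood for; everything else is encoded as ordinary UTF-8.
sub encode_escape {
    my ($chars) = @_;
    no warnings 'surrogate';
    my $out = '';
    for my $ch (split //, $chars) {
        my $cp = ord $ch;
        $out .= ($cp >= 0xDC80 && $cp <= 0xDCFF)
            ? chr($cp - 0xDC00)
            : encode('UTF-8', $ch);
    }
    return $out;
}

my $raw   = "\xC4\x80\xA0.txt";       # the odd file name from the example
my $chars = decode_escape($raw);      # U+0100, U+DCA0, then ".txt"
my $back  = encode_escape($chars);    # the original bytes again

print join(' ', map { sprintf 'U+%04X', ord } split //, $chars), "\n";
print $back eq $raw ? "round trip ok\n" : "round trip FAILED\n";
```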

    What Can We Do In Perl?

    The python3 +0xDC00 solution could be used in Perl to handle non-utf8 characters in a new unicode-friendly API. But, how does this work out alongside our other APIs?

    Let's suppose we add a new feature "unicodefilenames". (hopefully we wouldn't have to type that much, and could eventually lump it in with "use v5.50")

        use feature 'unicodefilenames';
        my ($d)= <*>;
        open(my $f, ">", "$d/\x{101}.txt") or die "$!";

    This works now. But what happens if we pass these file names to other modules in our program?

        package New;
        use v5.42;
        use feature 'unicodefilenames';
        Old->foo($_) for <*>;

        package Old;
        use v5.38;
        sub foo($class, $fname) {
          open my $fh, "<", $fname;
        }

    Whoops. The new unicode names get passed to a module that expects "a filename", and all filenames were previously strings of bytes, so it will get encoded as plain-old-utf8 which doesn't respect the conversion from "\xDCA0" to "\xA0". So, anyone with a European locale having lots of upper Latin-1 will end up with frequent breakage.

    What if Perl handled the "\xDC00" range specially regardless of the feature bit? This would break any old code that had been writing filenames using those characters. But nobody should ever be writing them... because it would only ever occur in a UTF-16 encoding. So the only reason anyone would legitimately want to write them was if they took a UTF-16 encoded string and then further encoded that as utf-8 and wanted it to be a filename.

    Assuming p5p decided that was an acceptable amount of back-compat breakage, what else could go wrong?

        package New;
        use v5.42;
        use feature 'unicodefilenames';
        Old->foo($_) for <*>;

        package Old;
        use v5.38;
        sub foo($class, $fname) {
          my $dir= "tmp\x{85}";
          mkdir $dir or die "$!";
          system("cp -a $fname $dir/$fname") == 0 or die "$!";
        }

    Whoops, there are two bugs here. First, the Old module doesn't know that it is being given a unicode filename. Then, not anticipating this to be a problem, it combines that string with a non-unicode string, resulting in garbage. Then as a second problem, it shells out to a command, and the Perl interpreter has no way of knowing whether this is a "filename" situation where 0xDC00 should be re-interpreted. Keep in mind that people might have all sorts of reasons for passing invalid unicode (or utf-16 codes) as arguments to external programs. (well, maybe not, but it seems a lot more likely than passing them as filenames to filesystem APIs)

    But wait, what does Python do for passing bytes to external programs if all their strings are unicode?

        $ python3
        Python 3.11.3 (main, Jun 5 2023, 09:32:32) [GCC 13.1.1 20230429] on linux
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import subprocess
        (Wrapped for readability)
        >>> subprocess.run([
          'perl','-E',
          'sub escapestr { $_[0] =~ s/([^\x20-\x7E])/sprintf("\\x%02X", ord $1)/egr }
           say escapestr($ARGV[0])',
          "\x80"])
        C280
        >>> subprocess.run([
          'perl','-E',
          'sub escapestr { $_[0] =~ s/([^\x20-\x7E])/sprintf("\\x%02X", ord $1)/egr }
           say escapestr($ARGV[0])',
          "\udc80"])
        80

    Woah! Pretty bold there, Python! If you want to pass the byte 0x80 as a parameter to an external program, you'd need to encode it as "\xDC80" in your always-unicode strings. (Or, use the Python3 "bytes" object instead of trying to carry around raw bytes inside unicode strings, which is what all the tutorials teach) Anyway, interesting and all, but I'm guessing this is a step too far for perl 5.

    So back to filenames. What can we do? It looks like the only way we can prevent bugs from erupting everywhere is to keep using strings of plain bytes, with unicode converted to UTF-8 (or perhaps encoded according to locale, if anyone ever uses non-utf8 locales anymore). But, what if we wrap filenames with objects?

        package New;
        use v5.36;
        use Path::UTiny; # imagine a unicode-aware Path::Tiny

        # Create directory named "\xC4\x80"
        path("\x{100}")->mkdir;

        for (path(".")->children) {
          # compares as unicode
          Old->foo($_) if $_->name eq "\x{100}";
        }

        package Old;
        use v5.36;
        sub foo($class, $dir) {
          # stringify to bytes, creates file "\xC4\x80/\x80.txt"
          open my $f, '>', "$dir/\x80.txt";
        }

    This actually works! To be clear, I'm proposing that the path object would track unicode internally (where it could use Python3's trick of remapping the ambiguous bytes) and any time it was coerced to a string by unsuspecting legacy code, or by PerlIO API calls, it would yield the usual UTF-8 bytes.
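
    The stringification part of this idea can be sketched with overload. The class below is hypothetical (My::UPath is a made-up name, and it leaves out the remapping of stray bytes as well as every real path operation); it only shows that legacy code interpolating the object ends up with UTF-8 bytes, while aware code can still ask for the character form.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical path class: characters inside, UTF-8 bytes when stringified.
package My::UPath {
    use overload
        '""'     => sub { Encode::encode('UTF-8', $_[0]{name}) },
        fallback => 1;

    sub new  { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub name { return $_[0]{name} }    # the character (unicode) form
}

my $dir = My::UPath->new("\x{100}");

# Unicode-aware code compares characters.
my $is_match = $dir->name eq "\x{100}";

# Legacy code interpolates the object and gets bytes: "\xC4\x80/\x80.txt".
my $legacy_path = "$dir/\x80.txt";
my $legacy_hex  = unpack 'H*', $legacy_path;

print $is_match ? "name matches\n" : "name differs\n";
print "$legacy_hex\n";    # c4802f802e747874
```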

    The downside is that you still can't write

    $path= path("$path/$unicode")
    because that would still be combining unicode with non-unicode. The ".=" operator could be overloaded to return new Path objects, but that might also surprise users when $x .= "/$y" has different results than $x= "$x/$y" so maybe not.

    Conclusion

    I don't see any practical way for Perl 5 to upgrade to unicode filenames in plain strings and native PerlIO functions. It would create about as many problems as it would solve. But, a new path object library that works with unicode internally but stringifies to bytes would have a chance of being useful for working with unicode without breaking too many common assumptions.

