Is there a module for object-oriented substring handling/substitution?

smls has asked for the wisdom of the Perl Monks concerning the following question:

Re: Is there a module for object-oriented substring handling/substitution? by choroba (Cardinal) on Jan 24, 2013 at 23:21 UTC
Do you know that a reference to substr works as you described? `my $string = 'abcdefghijklmnopqrstuvwxyz'; my $substring = \substr $string, 10, 10; $$substring = uc $$substring; print $string, "\n";` لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^2: Is there a module for object-oriented substring handling/substitution? (2 substrs) by tye (Sage) on Jan 25, 2013 at 01:57 UTC
"Do you know that a reference to substr works as you described?" Only if you use a single substring. I have a partially written module that does this type of thing and supports multiple simultaneous substrings of the same string such that they cooperate (which leads to some tricky bits that were fun to hash out). But tye having an unpublished module isn't really a reason to avoid writing one's own version of something similar. - tye
Re^3: Is there a module for object-oriented substring handling/substitution? (2 substrs) by BrowserUk (Patriarch) on Jan 25, 2013 at 02:37 UTC
They long since lifted the 'only one lvalue ref' limitation. This is 5.10.1: `$s = 'abcdefghijklmnopqrstuvwxyz';; @r = map \substr( $s, $_*4, 2), 0..6;; $$_ = uc $$_ for @r;; say $s;;` prints `ABcdEFghIJklMNopQRstUVwxYZ`, and `say $];;` prints `5.010001`.
It does have its limitations though. (Unsurprisingly) the lvalue refs do not adjust to accommodate replacements that alter the length of the string: `$$_ = $_ for @r;; say $s;;` prints `REF(REF(REF(REF(REF(REF(REF(0x3e82050)3e821e8)3e82260)3e820e0)11c458)3e820c8)335dc0)cdEFghIJklMNopQRstUVwxYZ`.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
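The length-change limitation can be seen on a smaller string. The following is only an illustrative sketch (the strings and the `@edits` list are made up for the example): it first shows a second lvalue reference reading from a stale position, then one common workaround, which is to apply replacements from the highest offset down using four-argument `substr`, so that earlier offsets are never invalidated.

```perl
use strict;
use warnings;

my $s = 'abcdefgh';

# Two lvalue references into the same string, at offsets 0 and 4.
my $first  = \substr $s, 0, 2;
my $second = \substr $s, 4, 2;

# Growing the first range shifts everything after it to the right...
$$first = 'XXXX';

# ...but $second still reads from offset 4, which now holds 'cd', not 'ef'.
print "$s\n";          # XXXXcdefgh
print "$$second\n";    # cd

# Workaround: collect [offset, length, replacement] triples and apply
# them right to left, so untouched offsets to the left stay valid.
my $t     = 'abcdefgh';
my @edits = ( [ 0, 2, 'XXXX' ], [ 4, 2, 'Y' ] );
for my $edit ( sort { $b->[0] <=> $a->[0] } @edits ) {
    my ( $offset, $length, $new ) = @$edit;
    substr( $t, $offset, $length, $new );
}
print "$t\n";          # XXXXcdYgh
```

The right-to-left trick only helps when all ranges are known up front and do not overlap; it does not give you persistent handles that survive later edits.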
Re^4: Is there a module for object-oriented substring handling/substitution? (cooperate) by tye (Sage) on Jan 25, 2013 at 03:15 UTC
Re: Is there a module for object-oriented substring handling/substitution? by roboticus (Chancellor) on Jan 25, 2013 at 02:36 UTC
smls: Sorry, I don't know of a module that does that. Just for my own curiosity... what would you use something like that for? I can't think of a reason I would want something like that, so I can't suggest any possible searches to help. ...roboticus When your only tool is a hammer, all problems look like your thumb.
Re^2: Is there a module for object-oriented substring handling/substitution? by Anonymous Monk on Jan 25, 2013 at 03:06 UTC
For speed? Helps with creating an editor :) See Why Emacs Lisp Is More Powerful Than Perl For Text Processing
Re^3: Is there a module for object-oriented substring handling/substitution? by roboticus (Chancellor) on Jan 26, 2013 at 00:49 UTC
Ah, thanks for that link! It led to a nice hour or so of reading & thinking. ...roboticus
Re^3: Is there a module for object-oriented substring handling/substitution? by smls (Friar) on Jan 26, 2013 at 15:43 UTC
Anonymous Monk: I don't care much for speed; I care about convenience and elegance.
Regarding the use case of creating editors, they might prefer to use their own special-purpose class for performance reasons, with integrated support for efficient feedback on state changes of tracked ranges. For example, the Kate editor (coded in C++ with Qt and KDE libs) uses a lightweight class called MovingRange for keeping track of persistent ranges within an opened document. They created this solution from scratch in 2010, dropping their previously used, more generic framework called "SmartRanges", in part due to performance reasons (see blog post). Now, if this is a performance-critical code path that benefits from special-case optimization even in an editor written in C++, it probably will be even more so in an editor written in Perl...
For my purposes, notification about state changes is not needed, nor is performance a critical consideration, so having a full substring class that stores a copy of its text (rather than just a thin "range" class pointing to a location within the parent string) should not be a problem.
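To make the "thin range" idea concrete, here is a minimal sketch of a string holder whose registered ranges shift when an earlier range changes length. Everything in it is hypothetical: `TrackedString` and its methods are invented for illustration, are not an existing CPAN module, and have nothing to do with Kate's MovingRange implementation. Overlapping or nested ranges are deliberately not handled.

```perl
use strict;
use warnings;

# Hypothetical package, for illustration only.
package TrackedString;

sub new {
    my ( $class, $text ) = @_;
    return bless { text => $text, ranges => [] }, $class;
}

# Register a range; the returned record is kept up to date by set().
sub range {
    my ( $self, $offset, $length ) = @_;
    my $range = { offset => $offset, length => $length };
    push @{ $self->{ranges} }, $range;
    return $range;
}

sub get {
    my ( $self, $range ) = @_;
    return substr $self->{text}, $range->{offset}, $range->{length};
}

# Replace one range's text, then shift every other range that starts
# at or after the old end of the edited range.
sub set {
    my ( $self, $range, $new ) = @_;
    my $old_end = $range->{offset} + $range->{length};
    my $delta   = length($new) - $range->{length};
    substr( $self->{text}, $range->{offset}, $range->{length}, $new );
    for my $other ( @{ $self->{ranges} } ) {
        next if $other == $range;
        $other->{offset} += $delta if $other->{offset} >= $old_end;
    }
    $range->{length} = length $new;
    return $self;
}

sub text { return $_[0]{text} }

package main;

my $doc    = TrackedString->new('abcdefgh');
my $first  = $doc->range( 0, 2 );
my $second = $doc->range( 4, 2 );

$doc->set( $first, 'XXXX' );                      # grows by two characters
$doc->set( $second, uc $doc->get($second) );      # still addresses 'ef'
print $doc->text, "\n";                           # XXXXcdEFgh
```

A class that stores a copy of its text per substring, as described above, would trade this offset bookkeeping for extra memory and a write-back step.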
Re^2: Is there a module for object-oriented substring handling/substitution? by smls (Friar) on Jan 26, 2013 at 15:14 UTC
roboticus: Well, there have been several times in the past I would have found a module like this useful. My current use case, which led to me writing this thread, is updating table values in a wiki page by programmatically editing the page's source code (which is available in MediaWiki format). More precisely, the problem at hand is like this:
Within a wiki page, there is a special section (identified by its section header). This section in turn can have an arbitrary number of subsections (each with a unique subsection header). Each of these subsections contains, among other things, a special table.
The Perl script is supposed to update the values in a specific column in each of these tables (identified by the word in the column's header cell). Which value goes into a particular table cell in that column depends on the corresponding value in the first column (i.e. the ID column), as well as the title of the subsection that the table belongs to.
Now, the Perl script should play nice with human editing of the same wiki page. Humans will fill in the remaining columns of the aforementioned tables, as well as the rest of the wiki page, and may freely add formatting, move things (like table rows and columns) around, etc.
The Perl script must not touch anything on that wiki page except for the specific values it substitutes for new values. This also means no whitespace or formatting changes, so using a generic wiki text parser and dumper is out of the question.
Last but not least, the solution should be elegant and easy to maintain and expand. For example, if the wiki page is radically refactored so that the script breaks, I want to be able to fix the script easily (even if I haven't looked at its Perl source code for months), i.e. without having to write complex five-line regexes from scratch.
And in the future I might want to add support for automatically adding new table rows if expected values in the ID column were not found in one of the tables, and things like that, so the design should be flexible enough to account for that. In the absence of a module like I described in the OP, I would be using `s/.../CODE/e` blocks for this, but as I hinted in the OP, this might not provide the desired maintainability and elegance.
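For readers unfamiliar with the `s/.../CODE/e` approach mentioned above, a minimal sketch might look like the following. The page text, the "Data" section name, and the `foo`/`bar` substitution are all invented for the example; real MediaWiki markup would need a more careful pattern.

```perl
use strict;
use warnings;

my $page = "== Intro ==\nfoo here\n== Data ==\nfoo there\n== End ==\nfoo again\n";

# The outer substitution isolates one region (the hypothetical "Data"
# section, up to the next "==" or the end of the text); the code block
# edits only the captured body and returns the replacement text.
$page =~ s{(== Data ==\n)(.*?)(?===|\z)}{
    my ( $header, $body ) = ( $1, $2 );   # copy before the inner regex resets $1, $2
    $body =~ s/foo/bar/g;
    $header . $body;
}se;

print $page;
```

Text outside the matched region is never rewritten, which fits the "touch nothing else" requirement, but every additional level of nesting (section, subsection, table, column) adds another layer of regex inside the code block.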
Re^3: Is there a module for object-oriented substring handling/substitution? by roboticus (Chancellor) on Jan 26, 2013 at 17:25 UTC
smls: OK, now I understand what you're asking for. I had a slightly different model in mind. So you're looking for the ability to do something like the following, where X is regex stuff to detect the start of the "interesting region" and Y detects the end: `if ($clob =~ /(.*)(X.*Y)(.*)/s) { my ($stuff_before, $stuff_to_edit, $stuff_after) = ($1, $2, $3); $stuff_to_edit =~ s/foo/bar/g; $clob = $stuff_before . $stuff_to_edit . $stuff_after; }` But without all the gymnastics of dismantling and rebuilding the string. I can see where that would be pretty nice, since a large $clob would force you to double the storage space and the associated string manipulations. ...roboticus
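One way to avoid capturing the text before and after the region is to take the match position from the built-in `@-` and `@+` arrays and splice the edited region back with four-argument `substr`. This is only a sketch: `<<` and `>>` stand in for the X and Y delimiters, and the sample string is made up.

```perl
use strict;
use warnings;

my $clob = 'foo <<foo foo>> foo';

# '<<' and '>>' are placeholders for the X and Y patterns.
if ( $clob =~ /<<.*?>>/s ) {
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    my ( $start, $length ) = ( $-[0], $+[0] - $-[0] );

    # Copy only the interesting region, edit it, and splice it back.
    my $region = substr $clob, $start, $length;
    $region =~ s/foo/bar/g;
    substr( $clob, $start, $length, $region );
}

print "$clob\n";    # foo <<bar bar>> foo
```

Only the region itself is copied, so the cost scales with the size of the edited part and not with the whole of `$clob`.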
Re^4: Is there a module for object-oriented substring handling/substitution? by smls (Friar) on Jan 26, 2013 at 22:44 UTC
Re^3: Is there a module for object-oriented substring handling/substitution? by LanX (Saint) on Jan 26, 2013 at 20:31 UTC
OK, now that I know your use case, I'd rather recommend working with a document tree representing your wiki page as a hash of hashes. Much like a DOM tree, you could traverse it for whatever markup element ("table") you want. Parse the wiki page into a tree, manipulate the tree, and rebuild the page again.
Otherwise: if you insist on attaching persistent meta-information to ranges of characters, then you had better work with arrays of characters. You could tie or bless the scalar elements with whatever info you want. If your user inserts or deletes anything from the array, your meta-info will move accordingly. And if you want to go the full "emacs way", you need to implement linked lists. The easiest way is having two-element arrays `[$value,$successor_ref]`.
EDIT: After some meditation, IMHO if you need full interactivity, better stay with the AoH with the document tree, and a "cursor" pointing to the current element. Whenever the user inserts characters, update the tree at the point the cursor points to. You'll also need to store information like "parent", "child", "nextSibling" ... Have a look at DOM or XML modules on CPAN for inspiration. Cheers Rolf
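The two-element-array linked list can be sketched in a few lines. The helper names (`list_from_string`, `insert_after`, `list_to_string`) are invented for this example; the point it illustrates is that a reference to a node keeps pointing at the same character no matter what is inserted elsewhere.

```perl
use strict;
use warnings;

# Build a singly linked list of characters; each node is [$value, $next].
sub list_from_string {
    my ($string) = @_;
    my $head;
    for my $char ( reverse split //, $string ) {
        $head = [ $char, $head ];
    }
    return $head;
}

# Insert text right after a node without touching any other node.
sub insert_after {
    my ( $node, $text ) = @_;
    for my $char ( reverse split //, $text ) {
        $node->[1] = [ $char, $node->[1] ];
    }
    return;
}

# Walk the list and glue the characters back together.
sub list_to_string {
    my ($node) = @_;
    my $out = '';
    for ( ; $node; $node = $node->[1] ) {
        $out .= $node->[0];
    }
    return $out;
}

my $head   = list_from_string('abcd');
my $marker = $head->[1][1];            # the node holding 'c'
insert_after( $head, 'XY' );           # insert after 'a'

print list_to_string($head), "\n";     # aXYbcd
print $marker->[0], "\n";              # c
```

One node per character is memory-hungry in Perl, so this is more useful as a model of the idea than as something to run on large documents.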
Re^4: Is there a module for object-oriented substring handling/substitution? by smls (Friar) on Jan 26, 2013 at 21:54 UTC
Re^5: Is there a module for object-oriented substring handling/substitution? by LanX (Saint) on Jan 26, 2013 at 22:08 UTC
Re: Is there a module for object-oriented substring handling/substitution? by BrowserUk (Patriarch) on Jan 26, 2013 at 23:32 UTC
"... supported by linked substrings." Seems to me you are looking for a complex solution when a simple one will do. In order to mark your substrings, you have to know where they start and end (or their length). If you simply unpack the string into an array on those same boundaries, you can edit the individual elements to your heart's content, and when you pack/join them back together, you will have exactly the same effect as your linked substrings without the overhead of all the behind-the-scenes jiggery-pokery required to make the latter work. It's simple. And I like simple :)
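A minimal sketch of the unpack, edit, and join approach, assuming the boundaries are known as fixed widths (the 10/10/rest split and the edits are made up for the example):

```perl
use strict;
use warnings;

my $string = 'abcdefghijklmnopqrstuvwxyz';

# The field widths are the same boundaries you would have used to mark
# the substrings; 'a*' picks up whatever is left over.
my @parts = unpack 'a10 a10 a*', $string;

$parts[1] = uc $parts[1];               # same-length edit
$parts[0] = '<' . $parts[0] . '>';      # length-changing edit; other parts are unaffected

my $result = join '', @parts;
print "$result\n";                      # <abcdefghij>KLMNOPQRSTuvwxyz
```

Because each piece is an independent scalar, a length change in one element cannot disturb the others; the positions only become fixed again at the final `join`.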
Re: Is there a module for object-oriented substring handling/substitution? by Anonymous Monk on Jan 28, 2013 at 09:31 UTC
https://www.mediawiki.org/wiki/VisualEditor/WikiDom_Specification "WikiDom is a serialization of Wikitext based on JSON and optimized for transport and adaptive processing. The structure is based on two basic types of nodes, branches and leafs. Branch nodes have child nodes and leaf nodes have content. A node can not be a branch and a leaf. Content objects in leaf nodes use offset annotations for formatting."
Re: Is there a module for object-oriented substring handling/substitution? by thundergnat (Deacon) on Jan 25, 2013 at 17:14 UTC
I am not aware of a module to do anything like that specifically, and doubt that one would be more speedy/useful in the general case than just doing regex substitutions directly. Strings in Perl are not just byte arrays like they are in many other languages, so you will likely not gain much by trying to treat them like one.
That being said, I can think of reasons why you might want to do something like that anyway. Many years ago I wrote a special-purpose text editor in perl/Tk to help produce texts for Project Gutenberg. The perl/Tk text widget provides some basic search and replace functionality but has many limitations, so I wrote something similar to what you are asking for as a workaround. It was not OO, and was pretty heavily tied to perl/Tk::Text, so it probably isn't useful as drop-in code, but you might be able to glean some useful bits if you end up writing something yourself.
Be aware that I started writing this in the heady days when perl 5.6 reigned supreme and I was a perl newbie, so many of the design decisions were questionable by modern standards. Also, I have not been associated with it since 2005 or so, so the codebase may have moved on. If you are still curious after all those caveats, the code is still available on sourceforge.

