The documentation below is, itself, incomplete. I don't show how the copy-on-write part actually happens, but give me a day or so.
Oh, and if anyone here is good with malloc() and free() in C and can tell me where in my code I should be free()ing things, I'd be very grateful.
NAME
COW - How Copy-On-Write Can Be Implemented
SYNOPSIS
Copy-On-Write (or COW) is a concept that delays the copying of potentially large data (strings, specifically) as long as possible. It can be implemented via ``magic'', and this should be trivial for the Perl programming language.
DESCRIPTION
The Problem
Copy-On-Write (or COW) is a concept that delays the copying of potentially large data (strings, specifically) as long as possible. Here is an example:
$big = "_" x 10_000;
$big is now 10,000 characters long. Let's say we want to store a substring of that in $small.
$small = substr $big, 500, 2000;
$small is now 2,000 characters long. That's a lot to copy. :( What if we could alias $small to the 2,000 characters starting at position 500 in $big, and only copy the 2,000 characters when we needed to (when $big changes)?
Aliasing via tie()
That'd be cool. $small would be magical. Here's a demonstration:
alias($small, \$big, 500, 2000);
sub alias { tie $_[0], 'SubstringAlias', \$_[0], @_[1,2,3] }
package SubstringAlias;
sub TIESCALAR { my ($class, $orig, $targ, $pos, $len) = @_; bless [ $orig, $targ, $pos, $len ], $class; }
# when we get $small's value # we return the substring of the # target variable sub FETCH { my ($self) = @_; my ($me, $str_ref, $pos, $len) = @$self; return substr $$str_ref, $pos, $len; }
# when we store a value in $small # we should stop it from being an alias! sub STORE { my ($self, $value) = @_; untie ${ $self->[0] }; ${ $self->[0] } = $value; }
Now $small is a tied variable. When we access its contents, the object that is tied to $small has its FETCH() method called:
print $small; # like (tied $small)->FETCH() # where (tied $small) returns the object
Self-Aware Aliasing
However, what happens when $big changes?! $small would reflect those changes. We don't want that. :( We need a way to make $small get those 2000 characters at the last possible moment -- when $big changes. This ``change'' is called ``writing'', because we're writing some value to $big. We want a ``lazy'' alias, and we can implement this with another tie():
#!/usr/bin/perl
$big = "Perl is a wonderful language"; alias($small, \$big, 5, 15);
print "<$big>\n"; print "<$small>\n\n";
$big = "foo";
print "\n<$big>\n"; print "<$small>\n";
sub alias { tie $_[0], 'SubstringAlias', \$_[0], @_[1,2,3] }
package SubstringAlias;
sub TIESCALAR { my ($class, $orig, $targ, $pos, $len) = @_; my $self = bless [ $orig, $targ, $pos, $len ], $class;
# make $big a tied variable TOO! tie $$targ, 'AliasTarget', $targ, $self; return $self; }
sub FETCH { print "* SubstringAlias::FETCH()\n"; my ($self) = @_; my ($me, $str_ref, $pos, $len) = @$self; return substr $$str_ref, $pos, $len; }
# stop being an alias if we # are assigned to # NOTE: unties both sub STORE { print "* SubstringAlias::STORE()\n"; my ($self, $value) = @_; my ($me, $targ) = @$self; untie $$me; untie $$targ; $$me = $value; }
package AliasTarget;
sub TIESCALAR { my ($class, $orig, $copy_to) = @_; bless [ $orig, $$orig, $copy_to ], $class; }
# just display our value sub FETCH { print "* AliasTarget::FETCH()\n"; my ($self) = @_; return $self->[1]; }
# now we copy our value # to the alias before # we change our value # NOTE: unties both sub STORE { print "* AliasTarget::STORE()\n"; my ($self, $value) = @_; my ($me, $str, $target) = @$self; untie $$me; $target->STORE(substr $str, $target->[2], $target->[3]); $$me = $value; }
Whew. That program works, and when run, produces the following output:
* AliasTarget::FETCH() <Perl is a wonderful language> * SubstringAlias::FETCH() * AliasTarget::FETCH() <is a wonderful >
* AliasTarget::STORE() * SubstringAlias::STORE()
<foo> <is a wonderful >
The diagnostic print() statements show us what was going on. We see that when we ask for $big's value, we call its FETCH() method, and when we ask for $small's value, we call ITS FETCH() method, which in turn has to call $big's FETCH() method.
When we store the value ``foo'' in $big, we copy the substring of $big to $small before we untie the two of them and leave them as normal strings. That was a bit of work. Wouldn't it be nice if there was a language that did that all transparently?
Let's make one up!
Let's call it COW.
Introduction to the COW Programming Language
COW is a very simple language. It supports the following operations:
# comments like Perl's
# assigning a string to a variable var = "string"; var = 'string';
# printing a variable or a string print var; print "string"; print 'string';
# assigning another variable to a variable var = other_var;
# "cow"ing another variable to a variable var = cow other_var;
Those are all pretty self-explanatory, except for the last one. The cow function means ``alias var to other_var until other_var is written to, at which point, copy the contents of other_var to var.`` That's a lot for that function to mean, but that's what it does. This is good, because if we never change other_var, we won't have to worry about the duplicated memory (if we copied other_var's contents to var, we'd have a copy of the string).
How is COW implemented? It's not all too hard. It takes a bit of ``magic'', though. Let's step through a program:
name = "japhy";
This creates a scalar value (SV) in the program's symbol table. All SVs have an internal structure like so:
struct _sv { list *dependecies; /* who is aliasing me? */ MG *magic; /* what type of magic do I have? */
int line; /* what line number was I defined on? */ int flags; /* what flags do I have on? */ int str_len; /* what is the length of my string value? */ STR str; /* the char* representing the string */ };
Whew. That's C code, if you're not familiar with it. Anyway, safe to say the first line of our program doesn't do much. Here's the internal view of the name variable:
name --> dependencies = NULL magic = NULL line = 1 flags = 0 str_len = 5 str = "japhy"
Ok. Now let's look at the next line of the program:
alias = cow name;
This creates another variable, alias, and tells it to point to name. This line does a lot of stuff. Let's see what it does to name first:
name --> dependencies = SV*(alias) magic = NULL line = 1 flags = SVf_COW str_len = 5 str = "japhy"
The dependencies list contains a pointer to alias's SV. Here's what alias looks like:
alias --> dependencies = NULL magic = { type = MGt_COW ftbl = { get = mg_cow_get set = mg_cow_set len = mg_cow_len clear = mg_cow_clear free = mg_cow_free } object = SV(name) } line = 2 flags = SVf_MAGIC str_len = 5 str = SvSTR(SV*(name))
In alias's representation, the SVf_MAGIC flag has been turned on. The type of magic is MGt_COW. The ftbl is a struct of pointers to functions to call for the requested action. Finally, the object field holds the SV the magic functions have access to -- in this case, the place we read from (the SV of name). str is not a copied string, but merely a pointer to name's str field.
Onto the next line:
print alias;
The print function works something like this:
void op_print_str (STR *s) { printf("%s", s); }
void op_print_sv (SV *sv) { /* this will update sv->str if needed */ if (SvMAGIC(sv)) { MG *mg; if ((mg = SvMG(sv)) && mg->ftable->get) CALL(mg->ftable->get)(sv, mg); } op_print_str(SvSTR(sv)); }
What's happening here is SvMAGIC(sv) checks to see if sv has the SVf_MAGIC flag on. If so, we extract the MG structure via SvMG(sv) and store it in mg. If that is true (not NULL) and there is a get function defined for it, that function is called -- the CALL() macro just turns a function pointer into a function:
#define CALL(fptr) (*fptr)
The get function will probably update the str field of the SV structure. After that's done, we pass the str field, gotten via SvSTR(sv), to the op_print_str() function.
CREDITS
I borrowed a lot of the design of the scalars and magic from Perl. But that's good, since I hope to translate the implementation directly to a Perl magic structure.
AUTHOR
Jeff japhy Pinyan, [mailto://japhy@perlmonk.org].
Download the source at: http://japhy.perlmonk.org/COW/.
_____________________________________________________
Jeff[japhy]Pinyan:
Perl,
regex,
and perl
hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;