Ruby has a cool feature called copy-on-write which I abbreviate as COW. Perl needs to use this, and I'm going to implement it (as magic, if I can). As an exercise, I'm writing a simple language. A brief introduction is presented here. If you feel like playing with the language, by all means, go ahead, but be aware that it's not implemented as much as this documentation tells. This document is the plan, and I have yet to code it all. The source code isn't documented much at all, but once the language gets the features I wrote it to have, I'll spend time documenting it.

The documentation below is, itself, incomplete. I don't show how the copy-on-write part actually happens, but give me a day or so.

Oh, and if anyone here is good with malloc() and free() in C and can tell me where in my code I should be free()ing things, I'd be very grateful.


COW - How Copy-On-Write Can Be Implemented


Copy-On-Write (or COW) is a concept that delays the copying of potentially large data (strings, specifically) as long as possible. It can be implemented via ``magic'', and this should be trivial for the Perl programming language.


The Problem

Copy-On-Write (or COW) is a concept that delays the copying of potentially large data (strings, specifically) as long as possible. Here is an example:

  $big = "_" x 10_000;

$big is now 10,000 characters long. Let's say we want to store a substring of that in $small.

  $small = substr $big, 500, 2000;

$small is now 2,000 characters long. That's a lot to copy. :( What if we could alias $small to the 2,000 characters starting at position 500 in $big, and only copy the 2,000 characters when we needed to (when $big changes)?

Aliasing via tie()

That'd be cool. $small would be magical. Here's a demonstration:

  alias($small, \$big, 500, 2000);

  sub alias { tie $_[0], 'SubstringAlias', \$_[0], @_[1,2,3] }

  package SubstringAlias;

    my ($class, $orig, $targ, $pos, $len) = @_;
    bless [ $orig, $targ, $pos, $len ], $class;

  # when we get $small's value
  # we return the substring of the
  # target variable
  sub FETCH {
    my ($self) = @_;
    my ($me, $str_ref, $pos, $len) = @$self;
    return substr $$str_ref, $pos, $len;

  # when we store a value in $small
  # we should stop it from being an alias!
  sub STORE {
    my ($self, $value) = @_;
    untie ${ $self->[0] };
    ${ $self->[0] } = $value;

Now $small is a tied variable. When we access its contents, the object that is tied to $small has its FETCH() method called:

  print $small;  # like (tied $small)->FETCH()
                 # where (tied $small) returns the object

Self-Aware Aliasing

However, what happens when $big changes?! $small would reflect those changes. We don't want that. :( We need a way to make $small get those 2000 characters at the last possible moment -- when $big changes. This ``change'' is called ``writing'', because we're writing some value to $big. We want a ``lazy'' alias, and we can implement this with another tie():


  $big = "Perl is a wonderful language";
  alias($small, \$big, 5, 15);

  print "<$big>\n";
  print "<$small>\n\n";

  $big = "foo";

  print "\n<$big>\n";
  print "<$small>\n";

  sub alias { tie $_[0], 'SubstringAlias', \$_[0], @_[1,2,3] }

  package SubstringAlias;

    my ($class, $orig, $targ, $pos, $len) = @_;
    my $self = bless [ $orig, $targ, $pos, $len ], $class;

    # make $big a tied variable TOO!
    tie $$targ, 'AliasTarget', $targ, $self;
    return $self;

  sub FETCH {
    print "* SubstringAlias::FETCH()\n";
    my ($self) = @_;
    my ($me, $str_ref, $pos, $len) = @$self;
    return substr $$str_ref, $pos, $len;

  # stop being an alias if we
  # are assigned to
  # NOTE: unties both
  sub STORE {
    print "* SubstringAlias::STORE()\n";
    my ($self, $value) = @_;
    my ($me, $targ) = @$self;
    untie $$me;
    untie $$targ;
    $$me = $value;

  package AliasTarget;

    my ($class, $orig, $copy_to) = @_;
    bless [ $orig, $$orig, $copy_to ], $class;

  # just display our value
  sub FETCH {
    print "* AliasTarget::FETCH()\n";
    my ($self) = @_;
    return $self->[1];

  # now we copy our value
  # to the alias before
  # we change our value
  # NOTE: unties both
  sub STORE {
    print "* AliasTarget::STORE()\n";
    my ($self, $value) = @_;
    my ($me, $str, $target) = @$self;
    untie $$me;
    $target->STORE(substr $str, $target->[2], $target->[3]);
    $$me = $value;

Whew. That program works, and when run, produces the following output:

  * AliasTarget::FETCH()
  <Perl is a wonderful language>
  * SubstringAlias::FETCH()
  * AliasTarget::FETCH()
  <is a wonderful >

  * AliasTarget::STORE()
  * SubstringAlias::STORE()

  <is a wonderful >

The diagnostic print() statements show us what was going on. We see that when we ask for $big's value, we call its FETCH() method, and when we ask for $small's value, we call ITS FETCH() method, which in turn has to call $big's FETCH() method.

When we store the value ``foo'' in $big, we copy the substring of $big to $small before we untie the two of them and leave them as normal strings. That was a bit of work. Wouldn't it be nice if there was a language that did that all transparently?

Let's make one up!

Let's call it COW.

Introduction to the COW Programming Language

COW is a very simple language. It supports the following operations:

  # comments like Perl's

  # assigning a string to a variable
  var = "string";
  var = 'string';

  # printing a variable or a string
  print var;
  print "string";
  print 'string';

  # assigning another variable to a variable
  var = other_var;

  # "cow"ing another variable to a variable
  var = cow other_var;

Those are all pretty self-explanatory, except for the last one. The cow function means ``alias var to other_var until other_var is written to, at which point, copy the contents of other_var to var.`` That's a lot for that function to mean, but that's what it does. This is good, because if we never change other_var, we won't have to worry about the duplicated memory (if we copied other_var's contents to var, we'd have a copy of the string).

How is COW implemented? It's not all too hard. It takes a bit of ``magic'', though. Let's step through a program:

  name = "japhy";

This creates a scalar value (SV) in the program's symbol table. All SVs have an internal structure like so:

  struct _sv {
    list *dependecies;  /* who is aliasing me? */
    MG *magic;          /* what type of magic do I have? */

    int line;           /* what line number was I defined on? */
    int flags;          /* what flags do I have on? */
    int str_len;        /* what is the length of my string value? */
    STR str;            /* the char* representing the string */

Whew. That's C code, if you're not familiar with it. Anyway, safe to say the first line of our program doesn't do much. Here's the internal view of the name variable:

  name -->      dependencies = NULL
                magic = NULL
                line = 1
                flags = 0
                str_len = 5
                str = "japhy"

Ok. Now let's look at the next line of the program:

  alias = cow name;

This creates another variable, alias, and tells it to point to name. This line does a lot of stuff. Let's see what it does to name first:

  name -->      dependencies = SV*(alias)
                magic = NULL
                line = 1
                flags = SVf_COW
                str_len = 5
                str = "japhy"

The dependencies list contains a pointer to alias's SV. Here's what alias looks like:

  alias -->     dependencies = NULL
                magic = {
                  type = MGt_COW
                  ftbl = {
                    get = mg_cow_get
                    set = mg_cow_set
                    len = mg_cow_len
                    clear = mg_cow_clear
                    free = mg_cow_free
                  object = SV(name)
                line = 2
                flags = SVf_MAGIC
                str_len = 5
                str = SvSTR(SV*(name))

In alias's representation, the SVf_MAGIC flag has been turned on. The type of magic is MGt_COW. The ftbl is a struct of pointers to functions to call for the requested action. Finally, the object field holds the SV the magic functions have access to -- in this case, the place we read from (the SV of name). str is not a copied string, but merely a pointer to name's str field.

Onto the next line:

  print alias;

The print function works something like this:

  void op_print_str (STR *s) {
    printf("%s", s);

  void op_print_sv (SV *sv) {
    /* this will update sv->str if needed */
    if (SvMAGIC(sv)) {
      MG *mg;
      if ((mg = SvMG(sv)) && mg->ftable->get)
        CALL(mg->ftable->get)(sv, mg);

What's happening here is SvMAGIC(sv) checks to see if sv has the SVf_MAGIC flag on. If so, we extract the MG structure via SvMG(sv) and store it in mg. If that is true (not NULL) and there is a get function defined for it, that function is called -- the CALL() macro just turns a function pointer into a function:

  #define CALL(fptr) (*fptr)

The get function will probably update the str field of the SV structure. After that's done, we pass the str field, gotten via SvSTR(sv), to the op_print_str() function.


I borrowed a lot of the design of the scalars and magic from Perl. But that's good, since I hope to translate the implementation directly to a Perl magic structure.


Jeff japhy Pinyan, [mailto://].

Download the source at:

Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;