Hungarian notation, kind of

Introduction

Hungarian notation (and any derivation thereof) has its cons and pros. I understand that there are people who like it, and that there are people opposed to it. Some would argue that Perl code doesn't need any kind of Hungarian notation, because we already have our sigils that mark $variable as a scalar, @variable as an array, and %variable as a hash.

However, a scalar is a flexible thing. I don't have to tell you that it doesn't just hold strings and numbers; it can also hold a multitude of different kinds of references. In my experience, it can be helpful to make it obvious what sort of data you expect to be in a scalar, so that you can see at all times what you're dealing with. I usually do that by adding a short two- or three-letter suffix to a variable name (and on rare occasions, just one or even four letters).

a) Non-object references

$child_nodes_ar     ar: Array Reference
$attributes_hr      hr: Hash Reference
$callback_cr        cr: Code (or Callback) reference
$next_node_it       it: Iterator
$report_fhi         fhi: File Handle for Input   *1
$summary_fho        fho: File Handle for Output  *1
$time_re            re: Regexp

*1: It could be argued that file handles are object references, but there aren't many situations in which I use them that way, hence I list them as non-object references.
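
As a minimal sketch of how these suffixes read in practice: the subroutine describe_node and its data are made up for illustration, not taken from real code. Each parameter name tells the caller which kind of reference to pass.

```perl
use strict;
use warnings;

# Hypothetical helper: each parameter's suffix says which kind of
# reference the sub expects (hash ref, array ref, code ref).
sub describe_node {
    my ( $name, $attributes_hr, $child_nodes_ar, $format_cr ) = @_;

    my @parts = ($name);
    for my $key ( sort keys %{$attributes_hr} ) {
        push @parts, $format_cr->( $key, $attributes_hr->{$key} );
    }
    push @parts, 'children=' . scalar @{$child_nodes_ar};
    return join ' ', @parts;
}

my $attributes_hr  = { id => 'main', class => 'wide' };
my $child_nodes_ar = [ 'p', 'ul' ];
my $pair_cr        = sub { my ( $key, $value ) = @_; return "$key=$value" };
my $time_re        = qr/\A\d{2}:\d{2}\z/;

my $summary = describe_node( 'div', $attributes_hr, $child_nodes_ar, $pair_cr );
print "$summary\n";    # div class=wide id=main children=2
print "looks like a time\n" if '12:30' =~ $time_re;
```

At the call site, passing @child_nodes instead of $child_nodes_ar would look wrong immediately, which is the point of the suffix.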

b) Object references

In applications where you have only one database, you could easily store the database handle in $dbh. But what if your application has to talk to multiple databases, for instance when you have to write code to glue a WordPress webshop to an external bookkeeping system? Names such as $wordpress_dbh and $bookkeeping_dbh clearly indicate which database each handle refers to.

It doesn't end just there. I like to use suffixes for many kinds of object types:

$yesterday_dt       dt: DateTime
$website_ht         ht: HTML::Template
$website_tt         tt: Template Toolkit
$sessions_slct      slct: IO::Select
$www_sck            sck: a socket of some kind (most likely IO::Socket)

b.1) GUI components

Some conventions I find myself using when programming a GUI around a program:

$settings_mw        mw: Tk Main Window
$login_tl           tl: Tk TopLevel window
$process_but        but: button
$trim_chk           chk: checkbox
$markup_frm         frm: frame
$encode_opt         opt: radio button

c) Non-reference variables

Even plain non-reference scalars could do with some clarification as to what they're supposed to contain, or how they're used:

$input_buf          buf: a buffer
$valid_b            b: used as a boolean
$entities_cnt       cnt: count, or amount
$log_f              f: a format for sprintf() or printf()
$article_id         id: a unique identifier
$element_idx        idx: an index
$session_ser        ser: a serialized object
$long_name_out      out: output value for the current subroutine
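
A short sketch of a few of these scalar suffixes working together. The data and the variable names other than the suffixes themselves are assumptions made for the example.

```perl
use strict;
use warnings;

my @entities     = ( '&amp;', '&lt;', '&gt;' );
my $entities_cnt = scalar @entities;                  # cnt: a count
my $log_f        = "%-8s %3d entities, valid=%s\n";   # f: a format for sprintf()
my $valid_b      = $entities_cnt > 0;                 # b: used as a boolean

# txt: the formatted line is plain text
my $log_line_txt = sprintf $log_f, 'parser', $entities_cnt, $valid_b ? 'yes' : 'no';
print $log_line_txt;

# idx: an index into @entities, not the element itself
for my $entity_idx ( 0 .. $#entities ) {
    print "$entity_idx: $entities[$entity_idx]\n";
}
```

The _f suffix in particular makes it harder to pass a format where a finished message is expected, or the other way around.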

c.1) Content format

$report_html        html: content in html format
$report_json        json: content in json format
$report_txt         txt: content in plain text format
$report_yaml        yaml: content in yaml format 
$report_xml         xml: content in xml format

c.2) Different kinds of arrays

There are also cases where I like to suffix my arrays to indicate their use:

@nodes_s            s: stack (so I should only use push, pop, and $nodes_s[-1] on it)
@nodes_q            q: a queue (so I should only use push, shift, and $nodes_q[0] on it)
@nodes_out          out: output value for the current subroutine
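
The stack and queue conventions can be sketched with two tree walks. The tree and the subroutine names are invented for this example; only the suffixes come from the lists above.

```perl
use strict;
use warnings;

# A small tree as nested hashes; the data is made up for illustration.
my $tree_hr = {
    name     => 'root',
    children => [
        { name => 'a', children => [ { name => 'a1', children => [] } ] },
        { name => 'b', children => [] },
    ],
};

# Depth-first walk: @nodes_s is a stack, so only push and pop touch it.
sub depth_first_names {
    my ($root_hr) = @_;
    my @nodes_s = ($root_hr);
    my @names_out;
    while (@nodes_s) {
        my $node_hr = pop @nodes_s;
        push @names_out, $node_hr->{name};
        # Reverse so that the first child ends up on top of the stack.
        push @nodes_s, reverse @{ $node_hr->{children} };
    }
    return @names_out;
}

# Breadth-first walk: @nodes_q is a queue, so only push and shift touch it.
sub breadth_first_names {
    my ($root_hr) = @_;
    my @nodes_q = ($root_hr);
    my @names_out;
    while (@nodes_q) {
        my $node_hr = shift @nodes_q;
        push @names_out, $node_hr->{name};
        push @nodes_q, @{ $node_hr->{children} };
    }
    return @names_out;
}

print join( ' ', depth_first_names($tree_hr) ),   "\n";    # root a a1 b
print join( ' ', breadth_first_names($tree_hr) ), "\n";    # root a b a1
```

The two subs differ only in pop versus shift, and the suffix is what makes a stray shift on @nodes_s stand out in review.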

Conclusion

I find that sticking to such suffixes greatly improves the readability of my code. When I come back to a piece of code because I discovered a bug or want to improve the way something works, I can dive straight into the relevant subroutine and see what each variable is supposed to be. Note that none of these suffixes tell me whether a scalar is a string or an int or what-have-you: those details should be clear from the main name of the variable, if they're not obvious from the suffix (for example, a $..._idx will hardly ever be a string ;) ).

Another thing, which is only tangentially relevant, is that I always use underscores_to_separate words and suffixes in a variable name, and never camelCasing. That saves me from wondering, "what was it again? $myAwesomeScalar, $MyAwesomeScalar, or $my_awesome_scalar?" It will always be the latter.

Replies are listed 'Best First'.
Re: Hungarian notation, kind of by tobyink (Canon) on Jan 21, 2013 at 16:15 UTC
The one place I find this sort of thing useful is where you're dealing with a bunch of different pieces of data, some of which are in a "processed" state, and others in an "unprocessed" state. For example, you have a `$title`, `$author` and `$body` for an article; the `$body` is stored in your database as a blob of HTML, so doesn't need entity encoding when you pump it into an HTML template, but the other two are stored as text in the database, so need entity encoding when inserted into the template. So I'd call them something like `$title_txt`, `$author_txt` and `$body_html`.
Other examples might be where some of your bits of data are UTF-8 encoded byte strings, and others are Unicode character strings; or some bits of data are tainted and others are not; etc.
`package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name`
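
A minimal sketch of the reply's example, not the poster's own code. The helper html_from_txt is a hypothetical stand-in for a proper entity encoder, and the sample values are invented.

```perl
use strict;
use warnings;

# Hypothetical minimal escaper; real code would more likely use a CPAN module.
sub html_from_txt {
    my ($value_txt) = @_;
    my %entity_for = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
    ( my $value_html = $value_txt ) =~ s/([&<>"])/$entity_for{$1}/g;
    return $value_html;
}

my $title_txt  = 'Cats & Dogs <revisited>';
my $author_txt = 'J. "Random" Hacker';
my $body_html  = '<p>Already <em>encoded</em> &amp; ready.</p>';

# Only the _txt values are encoded; the _html value is passed through untouched.
my $page_html = join "\n",
    '<h1>' . html_from_txt($title_txt) . '</h1>',
    '<address>' . html_from_txt($author_txt) . '</address>',
    $body_html;

print "$page_html\n";
```

With these names, html_from_txt($body_html) reads as a double-encoding bug, and interpolating a bare $title_txt into markup reads as a missing encoding step.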
Re: Hungarian notation, kind of by davido (Cardinal) on Jan 21, 2013 at 17:43 UTC
I find it particularly useful when unpacking args that are passed as references. Then I can look at the subroutine and remember immediately to pass a reference rather than a list.
    # Using two-letter notation...
    sub foo {
        my ( $something, $that_ar ) = @_;
        foreach my $item ( @{$that_ar} ) {
            ...
        }
    }
I know at a glance that this sub takes an array ref. Without some sort of notation, and with a sub whose code is a little more complicated, that would be much less obvious; the suffix helps to clarify.
I'm in the habit of using (mostly) four-letter versions: _aref, _href, _fh, _sref. Perhaps there's value in shortening the identifiers by a couple of strokes (line length), but old habits die hard.
Dave
Re: Hungarian notation, kind of by davies (Prior) on Jan 21, 2013 at 17:50 UTC
One of the first things I found when first considering Hungarian was ~~http://www.joelonsoftware.com/articles/wrong.html~~ update - moved to https://www.joelonsoftware.com/2005/05/11/making-wrong-code-look-wrong/. I don't agree with everything Spolsky writes (especially on Excel - he should be sentenced to use it), but I like this piece and have used Hungarian widely since reading it. There are several points to note, though.
The first is the difference between "Apps Hungarian" and "Systems Hungarian". Actually, I find even SH useful in some circumstances. The commonest is when dealing with columns in Excel. These can be specified numerically or alphabetically, so I will always specify "ncol" or "scol". I also find SH useful in Perl as I'm not used to the relatively weak typing and find SH helps me remember my original intentions.
Second, consistency is very useful. Another reason I prefer Hungarian is that Excel uses it internally for constants (I wish it were consistent and used it for all data types, but you can't have everything. Why does Selection get an initial capital? Ask Spolsky). It also uses a capital letter to start all functions, so I use lower case Hungarian prefixes and capitals to start functions. The effect of this is greater in VBA (Excel / VBA is where I spend most of my time) than in sensible languages. In a crash, there is no way of getting the return stack unless you organise it explicitly. I therefore wrap everything in error trapping and raise errors to enable me to see the return stack. So when I get a line like `If Initialise Then Err.Raise knCall, , ksCall`, it is obvious from the lack of a prefix that Initialise is a function returning a Boolean rather than a Boolean variable.
Another reason for including upper case characters in VBA (I don't know about other MS languages, but I believe that some have the same feature) is that the IDE automatically converts the case to the case of the original declaration. This means that anything that appears as 100% lower case is automatically a typo and is easily spotted.
I realise that I'm writing a lot about VBA here, but one of the points you raise is consistency. You can't be consistent with everything. Most of the Perl I write is to control Excel, and in that situation it makes sense to follow Excel's rules, even in Perl. If you read the Spolsky article I linked to, you will see him using upper case in various situations, so if you want to be consistent with him you will use CamelCase rather than underscores. But standards differ, so it helps to be flexible in your consistency so that you do not automatically use something that is inconsistent with someone else's standard.
You describe one thing I've never seen before, though, namely Hungarian suffixes. I (and I'm an accountant, not a programmer) have always seen them as prefixes. If you have any references for this use, I'd like to read the pros and cons. If not, be aware that you're likely to have to change to prefixes if coding in any sort of team.
Regards,
John Davies
Re: Hungarian notation, kind of by Anonymous Monk on Jan 21, 2013 at 16:44 UTC
If you find that it improves the clarity and the readability of your code, and that you actually use it consistently, then do so ... with blessings. Any practice that improves clarity, and that is actually used consistently, is beneficial. Code that consistently does something else that is also clear is also beneficial.
Re: Hungarian notation, kind of by sundialsvc4 (Abbot) on Jan 22, 2013 at 16:24 UTC
To me, consistency is key. If your application has to work alongside other code (e.g. in VB) that already uses this nomenclature, then follow suit. If you are working with a legacy application that didn’t use it, don’t introduce it. Figure out what the Romans were doing and keep doing it.
These notations can be unintentionally misleading when used with a “typeless” language, such as Perl is and as VB often is. You can legitimately find that your `szFoo` variable contains an integer, or that the value it contains has magically transmogrified itself into one. Thus, the social contract that has been implied by the naming can’t be upheld by the technology, which might produce confusion. Nevertheless, human beings will still benefit from consistency in naming, whatever that consistency is.