Code review: Start and End markers to ensure Serial data is valid

stevieb has asked for the wisdom of the Perl Monks concerning the following question:

Changing it up a bit here. I'm doing some prototype code that will eventually be translated into C/XS, but I'm having some brain fart moments here trying to sort out whether my logic is, actually logical.

Because the distribution isn't available yet, it's not test-worthy. What I'm looking for is eyes on the logic itself, not whether it works or not. What's supposed to happen is this:

caller sends in a start data marker ([), an end data marker (]), a digit that represents the number of times the start/end markers should be seen before identifying we have data (3), and finally, a test "reset" identifier (!) for the transmitter to say "wait, reset that crap, let's start over"

I've got it working in basic test cases, but I'm wondering if some keen-eyed Monks can have a glance at the situation (I've left everything as simple-to-understand if() statements for this review) to see how I could expand on the assurance aspect of ensuring the receiver only gets good data, and continues to work if not.

Receiver (rx):

use warnings;
use strict;

use RPi::Serial;

my $s = RPi::Serial->new('/dev/ttyUSB0', 9600);

my $data;
my ($rx_started, $rx_ended) = (0, 0);

while (1){
    if ($s->avail){
        my $data_populated = rx('[', ']', 3, '!');

        if ($data_populated){
            print "$data\n";
            rx_reset();
        }
    }
}

sub rx {
    my ($start, $end, $delim_count, $rx_reset) = @_;

    my $c = chr($s->getc); # getc() returns the ord() val on a char* p
+erl-wise

    if ($c ne $start && ! $rx_started == $delim_count){
        rx_reset();
        return;
    }
    if ($c eq $rx_reset){
        rx_reset();
        return;
    }
    if ($c eq $start){
        $rx_started++;
        return;
    }
    if ($c eq $end){
        $rx_ended++;
    }
    if ($rx_started == $delim_count && ! $rx_ended){
        $data .= $c;
    }
    if ($rx_started == $delim_count && $rx_ended == $delim_count){
        return 1;
    }
}
sub rx_reset {
    $rx_started = 0;
    $rx_ended = 0;
    $data = '';
}
[download]

Sender (tx):

use warnings;
use strict;

use RPi::Serial;

my $s = RPi::Serial->new('/dev/ttyS0', 9600);

for (qw(x [[[hello]]] ! [[[world]]] a b)){
    # x     = no start marker seen yet, reset
    # [[[   = start data ok
    # hello = data
    # ]]]   = end data ok
    # !     = RX reset command
    # [[[   = start data ok
    # world = data
    # ]]]   = end data ok
    # a     = no start marker set, reset
    # b     = no start marker set, reset

    $s->puts($_);
}
[download]

Works fine in such a situation, here's the output after starting rx in one window, and running tx in another window after that:

$ perl examples/rx.pl
hello
world
[download]

I apologize for the lack of a testable example, but this is a thought-process-only kind of post I'm trying to get ideas from by eye at this time.

Thanks all,

-stevieb

Comment on Code review: Start and End markers to ensure Serial data is valid Select or Download Code

Replies are listed 'Best First'.
Re: Code review: Start and End markers to ensure Serial data is valid (updated) by haukex (Archbishop) on Aug 03, 2019 at 21:22 UTC
This is a really classic use for state machines. The more variables and code paths you introduce, the more complex your logic gets and the more cases you have to cover. Instead, the really basic state machine structure looks either like this: `my $state = 'idle'; while ( my $input = <> ) { if ( $state eq 'idle' ) { if ( $input =~ /FOO/ ) { $state = 'foo'; } elsif ( $input =~ /BAR/ ) { $state = 'bar'; } elsif ( $input =~ ... else { die "unknown input: $input" } } elsif ( $state eq 'foo' ) { if ( $input =~ /FOO/ ) { $state = 'foo'; } elsif ( $input =~ ... else { die "unknown input: $input" } } elsif ( $state eq ... else { die "internal error: bad state $state" } }` [download] Or like this: `my $state = 'idle'; while ( my $input = <> ) { if ( $input =~ /FOO/ ) { if ( $state eq 'idle' ) { $state = 'foo'; } elsif ( $state eq 'foo' ) { $state = 'foo'; } elsif ( $state eq ... else { die "internal error: bad state $state" } } elsif ( $input =~ /BAR/ ) { if ( $state eq 'idle' ) { $state = 'bar'; } elsif ( $state eq ... else { die "internal error: bad state $state" } } elsif ( $input =~ ... else { die "unknown input: $input" } }` [download] Here's a solution to parsing your example data that uses only the state variables `$state` and `$curdata` to hold state, with output being temporarily stored in `@outdata`. I didn't implement the handling of the `[]3` sequence or `!`, since those can be implemented as extensions of this pattern. Note that it takes a bit of discipline to stick to this pattern and to remember to cover all cases in all states. For example, if you really are making the number of opening and closing brackets variable, you'll need two more state variables: one to store the expected number of brackets, and one to count the number of seen brackets. You'll need to remember reset these variables in all the right places. `<update>` In fact, it'd be the most formal approach to always cover every input in every state (or every state with every input), and to set all the state variables in every branch of the code, and think about what the output should be for every branch of the code. Of course, that gets quite redundant, and if a variable hasn't changed, it doesn't need to be re-set. But the takeaway here is that one at least needs to think about it, and sticking with the formal structure above helps in doing so. `</update>` In the following code, I've opted to make three different start states and two different end states, which may seem wasteful, but often encoding a couple of extra bits into the state is the "safer" way to code a state machine, instead of adding yet another state variable (and if you draw out a state diagram, it'll be easier to understand). If all the string comparisons bother you, then use constants and use integer comparisons on `$state`. use warnings; use strict; use Test::More tests=>2; is_deeply parse_data('x[[[hello]]]![[[world]]]ab'), ['hello','world']; is_deeply parse_data('x[y[[z[[[[[[[[[foo]1]]2]]]3'), ['[[[[[[foo]1]]2']; sub parse_data { my @indata = split //, shift; ; my $state = 'await_start1'; my $curdata = undef; my @outdata; for my $c (@indata) { if ( $state eq 'await_start1' ) { if ( $c eq '[' ) { $state = 'await_start2' } } elsif ( $state eq 'await_start2' ) { if ( $c eq '[' ) { $state = 'await_start3' } else { $state = 'await_start1' } } elsif ( $state eq 'await_start3' ) { if ( $c eq '[' ) { $state = 'in_data'; $curdata = '' } else { $state = 'await_start1' } } elsif ( $state eq 'in_data' ) { if ( $c eq ']' ) { $state = 'got_end1' } else { $curdata .= $c; $state='in_data' } } elsif ( $state eq 'got_end1' ) { if ( $c eq ']' ) { $state = 'got_end2' } else { $curdata .= ']'.$c; $state='in_data' } } elsif ( $state eq 'got_end2' ) { if ( $c eq ']' ) { push @outdata, $curdata; $curdata = undef; $state = 'await_start1'; } else { $curdata .= ']]'.$c; $state='in_data' } } else { die "internal error: bad state $state" } } return \@outdata; } [download] In general, when designing serial protocols, there are two routes to take: packet-based and stream-based. Packet-based protocols usually transmit a header that often includes a fixed start sequence (so that synchronization can be reacquired if a byte is lost or inserted) plus length byte(s) (depending on the maximum packet size), and a checksum at the end. Stream-based protocols, like this one, usually just include some kind of start sequence (like your `[[[`), an end sequence, and hopefully a checksum at the end. Personally I also include line-based protocols in this category, they're just a slight variation on the stream idea. Stream-based protocols have the issue that the start sequence and any other special bytes need to be escaped (which is probably why you've made the number of `[`s variable?), while packet-based protocols don't have that issue - instead, they have the problem that if transmitted over a system where bytes can be dropped or inserted, reacquiring synchronization in such cases takes more code, although state machines can handle that too, of course. Note that there are of course other ways to escape data than to wrap a bunch of `[[[]]]`s around it - think quoted strings, for example `:-)` Update: Minor edits for clarity. Also, if your goal, as the title says, is to ensure the data is transmitted correctly, then you really should use CRCs. For short pieces of data over mediums that are unlikely to corrupt the data, even something as simple as a CRC-8 is better than nothing. Update 2: A couple more edits in the text with some more thoughts.	[reply] [d/l] [select]
Re: Code review: Start and End markers to ensure Serial data is valid by roboticus (Chancellor) on Aug 04, 2019 at 17:30 UTC
stevieb: At first, I was going to give a reply somewhat like haukex's, but use a table-driven state machine, but as haukex already mentioned state machines with implementations, I thought I'd just talk about how I try to ensure my state machine code has as few error(s) as possible. Heh, I intended to make it short, but it still rambled on a bit longer than expected. 8^P First, even for a simple state machine, I'll sketch it out on a piece of paper with plenty of whitespace, as I tend to draw/erase a good bit while refining it. I'll also make the nodes kinda big with some partitions in them, like this: `+-----------------------+ \| Node_Name (notes) \| +-------+--------+------+ \| Enter \| Remain \| Exit \| +-------+--------+------+` [download] Generally, I try to keep the Enter/Remain/Exit boxes empty, but those are the boxes I use to document special actions to take when entering the node from a different node, when remaining in the same node after consuming an event, or to perform immediately before exiting the node. Your code gets complicated if you have too many actions occurring at various times. So I start off by putting in the actions I expect to need where I think I'll need them. As I refine the machine, I'll try to remove things or move them to places that are more consistent with the rest of the machine to simplify implementation. I also allow edges to have actions, which I'll put in parenthesis, if I need them. So I tried to draw a machine for your problem and I got this far: +--------------------+ +--------------------+ \| Idle \|--[BEG]->\| Open \| +------+------+------+ +------+------+------+ \| \| \| \|<-[RST]--\|cnt=1 \| \| \| +------+------+------+ +------+------+------+ ^ ^ ^ \| ^ \| \| \| +-[RST]-+ [eps:cnt==3] \| [BEG(++cnt)] [RST] [eps:cnt==3] \| \| \| [END(--cnt)] \| (trim,publish) \| \| \| \| \| \| \| \| +-----+ +--------------------+ \| v \| Close \| \| +--------------------+ +------+------+------+ +--\| Data \| \|cnt=1 \| \| \|<-[END]--+-----+-------+------+ +------+------+------+ \| \|addChar\| \| \| ^ +-----+-------+------+ [END(--cnt)] \| [BEG(++cnt)] \| [else] \| \| \| +---------+ [download] (Whew! ASCII art is difficult to draw as well as ugly. I should really try to start learning the Perlmonks codebase and try to implement my dream extension: a limited SVG handler to allow simple text/line/box/circle drawings.) During and after drawing a machine, I like to imagine what the problems are at each possible state, and look for edges that aren't handled to see what implications arise. Thinking about the variables we want to use to capture the state and looking at what cases we want to handle when examining each edge, you can come up with various questions and/or test cases to think about. A few I came up with are: Are the BEG/END delimiters allowed to be inside the data? `"[[[data[more[data]]]"` ==?== `"datamoredata"` `"[[[data[[[]data]]]"` ==?== ( `"data[[[]data"` or `"data[[[data"` ) Do the BEG/END delimiters allow you to escape-out noisy junk? `"[[[data]junk[data]]]"` ==?== `"datadata"` After coming up and deciding on what to do about all the special cases, I then start to review the machine to see how to simplify it. By this, I don't necessarily mean to reduce the number of states and/or edges (though if possibilities arise, I'll certainly try to do so). Rather, I want to simplify the implementation of the machine. Having actions occurring in various places in the machine makes implementation confusing. For example, I have actions occurring on edges, and in the onEntry, and onRemain slots on some nodes. Sometimes a bit of thinking can let you move some actions around to reduce the number of places you have to think about actions. Another thing in a state machine that can cause complexity are "special" edge types, such as the "else" edge type (meaning anything but END/BEG on the Close state) and the epsilon nodes that activate if we're in a state (such as Open/Close) and a condition happens to be true. Sometimes you can simplify the code by adding edges/nodes to the state machine: it may make the machine a bit larger, but can let you put actions in more consistent locations and/or let you remove some custom edge types you may otherwise need. Ah, well, enough rambling on this. I've got a couple Raspberry Pi computers to bring on-line. BTW: Since you're the Perlberry Pi guy, is there a hardcopy option available or coming soon for "Programming the Raspberry Pi with Perl"? I'm not really an eBook sorta guy, I'd rather buy something I can put on the bookshelf. (Yeah, I frequently print PDFs so I can tuck 'em in my clipboard and read 'em at lunch or whatever.) Edit: Added code tags to test cases to prevent linkification. ...roboticus When your only tool is a hammer, all problems look like your thumb.	[reply] [d/l] [select]
A reply falls below the community's threshold of quality. You may see it by logging in.


Perl: the Markov chain saw
	PerlMonks