Parsing Exercise set

David Arnold has asked for the wisdom of the Perl Monks concerning the following question:

All,

I have a file of exercises written in LaTeX that I'd like to parse. I am new to this, so be kind.

The exercise occurs between

\begin{exer}
\end{exer}

pairs. Inside this pair are pairs:

\begin{exertext}...\end{exertext} delimiting the problem statement.

\begin{soln}...\end{soln} delimiting the problem solution

\begin{answer}...\end{answer} delimiting a backanswer that goes in the back of the textbook.

Finally, there are sporadic directions that are included between \begin{instructions}...\end{instructions} pairs. A short example:

\begin{instructions} In Exercises~\ref{2.nformst} and~\ref{2.nformend},
given the function $\phi$, place the ordinary differential equation
$\phi(t,y,y')=0$ in normal form.
\end{instructions}

\begin{exer}
\begin{exertext}
\label{2.nformst} $\phi(x,y,z)=x^2z+(1+x)y$
\end{exertext}
\begin{soln}
$\phi(t,y,y') = t^2y'+(1+t)y = 0$ must be solved for $y'$.  We get
$$y'= -\frac{(1+t)y}{t^2}.$$
\end{soln}
\begin{answer}$y'= -\dfrac{(1+t)y}{t^2}.$\end{answer}
\end{exer}

\begin{exer}
\begin{exertext}
\label{2.nformend} $\phi(x,y,z)=xz-2y-x^2$
 \end{exertext}
\begin{soln}
 $\phi(t,y,y')=ty'-2y-t^2$ must be solved for $y'$. We get
  $$y'=\frac{2y+t^2}{t}.$$
 \end{soln}
\end{exer}

\begin{instructions}
In Exercises~\ref{2.1st}--\ref{exer2.1.6}, show that the given
solution is a general
solution of the differential equation. Use a computer or calculator to
sketch the solutions for the given values of the
arbitrary constant. Experiment with different intervals for $t$ until you
have a plot that shows what you consider to be the most important
behavior of the family.
\end{instructions}

\begin{exer}
\begin{exertext}
\label{2.1st} $y'=-ty$, $y(t)=Ce^{-(1/2)\,t^2}$, $C=-3,-2,\ldots,3$
\end{exertext}
\begin{soln}
$y(t)' = -Cte^{-(1/2)\,t^2}$ and $-ty(t) = -tCe^{-(1/2)\,t^2}$, so
$y'=-ty$. \par
\exfig{ex2.1ans01}
\end{soln}
\begin{answer}\exfig{ex2.1ans01}\end{answer}
\end{exer}

Is there a module on CPAN that I can use to parse this document? My goal is to put all the exercise problem statements (delimited by \begin{exertext}...\end{exertext}) in one file, all the solutions in a second file, all the backanswers in a third file, etc.

Thanks.


Re: Parsing Exercise set by jZed (Prior) on Apr 13, 2005 at 21:42 UTC
Perhaps LaTex::Parser.
Re: Parsing Exercise set by jimbojones (Friar) on Apr 13, 2005 at 22:25 UTC
Here's a go without using a big regex. It looks for begin{} and end{} keys and stores the text between the two in an anon array in a hash indexed by that key.

    use strict;
    use warnings;

    #-- hash that will be keyed by Latex element
    my %in = ();

    while ( <DATA> ) {
        if ( /\\begin\{(\S+?)\}/ ) {
            #-- tell that we're in a block
            $in{$1}->{Status} = "in";
            #-- add a new element to the anon array containing this info
            push @{$in{$1}->{Text}}, "";
        }
        if ( /\\end\{(\S+?)\}/ ) {
            $in{$1}->{Status} = "pending";
        }
        #-- now loop on all keys, see we are in that element.
        foreach my $key ( keys %in ) {
            my $status = $in{$key}->{Status};
            if ( $status eq "in" || $status eq "pending" ) {
                #-- add text to last element of the array
                $in{$key}->{Text}->[$#{$in{$key}->{Text}}] .= $_;
            }
            $in{$key}->{Status} = "out" if $status eq "pending";
        }
    }

    #-- write it out. Here loop over all possible keys. Could be restricted to
    #   'exertext', 'answers', 'soln'
    foreach my $key ( keys %in ) {
        my $file = $key . ".tex";
        print "Creating file '$file'\n";
        open( FILE, '>', $file) or die "Cannot open file '$file' for writing: $!\n";
        #-- write each element of array
        my $i = 1;
        foreach my $text ( @{$in{$key}->{Text}} ) {
            print FILE "%% Starting Element $i\n";
            print FILE $text;
            $i++;
        }
        close FILE;
    }

The output is

    instructions.tex
    soln.tex
    answer.tex
    exer.tex
    exertext.tex

With nested tags, the sub tags are included in the given tag (so that exer.tex includes exertext.tex, soln.tex, answer.tex). If you ran it on a full doc, presumably "document.tex" would be the same as your input doc.

- j
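The comment in the write-out loop above notes that the output could be restricted to a few keys. A minimal sketch of that idea follows, assuming the collected text has already been flattened into a plain hash of array references. The helper name `write_blocks` and its argument layout are hypothetical and not part of the reply. It also uses lexical filehandles instead of the bareword `FILE`.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper (not from the thread): write only the environments
# named in @wanted, one file per environment, into directory $dir.
# $blocks is assumed to be { env_name => [ text1, text2, ... ] }.
sub write_blocks {
    my ($dir, $blocks, @wanted) = @_;
    my @written;
    for my $key (@wanted) {
        next unless $blocks->{$key};    # skip environments never seen
        my $file = File::Spec->catfile($dir, "$key.tex");
        open my $fh, '>', $file
            or die "Cannot open file '$file' for writing: $!\n";
        my $i = 1;
        for my $text ( @{ $blocks->{$key} } ) {
            print {$fh} "%% Starting Element $i\n", $text;
            $i++;
        }
        close $fh or die "Cannot close file '$file': $!\n";
        push @written, $file;
    }
    return @written;
}

# Usage: 'exer' is collected but deliberately not written.
my %blocks = (
    exertext => [ "\\begin{exertext}\nQ1\n\\end{exertext}\n" ],
    exer     => [ "whole exercise\n" ],
);
my $dir   = tempdir( CLEANUP => 1 );    # scratch directory for the demo
my @files = write_blocks($dir, \%blocks, qw(exertext soln answer));
print "Wrote: @files\n";
```

Passing the wanted keys explicitly avoids creating exer.tex, which would duplicate the nested blocks.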
Re: Parsing Exercise set by davidrw (Prior) on Apr 13, 2005 at 22:12 UTC
Here's a quick & dirty regex method (note that it's destructive to $s, and that it wouldn't support additional tags, and the tag order matters, but hopefully it's a good start):

    #!/usr/bin/perl
    use strict;

    undef $/;
    my $s = <DATA>;
    my @questions;
    while( $s =~ s!
        (?:\\begin\{instructions\}(.*?)\\end\{instructions\}\s*)?
        \\begin\{exer\}\s*
        \\begin\{exertext\}(.*?)\\end\{exertext\}\s*
        \\begin\{soln\}(.*?)\\end\{soln\}\s*
        (?:\\begin\{answer\}(.*?)\\end\{answer\}\s*)?
        \\end\{exer\}
      !!sx ){
      push @questions, {
        instructions => $1,
        exertext => $2,
        soln => $3,
        answer => $4,
      };
    }
    use Data::Dumper;
    print Data::Dumper::Dumper \@questions;
    __DATA__
    #### place the sample input here ####
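One way to relax the caveats mentioned above (the input is consumed and the tag order matters) is to match each exer block first and then look up each inner environment on its own. The following is only a sketch under that assumption. The helper name `parse_exercises` is hypothetical, and instructions blocks are left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return one hash per exer block.
# Each inner environment is searched for independently, so the order of
# exertext, soln and answer inside an exercise does not matter.
sub parse_exercises {
    my ($tex) = @_;
    my @exercises;
    while ( $tex =~ /\\begin\{exer\}(.*?)\\end\{exer\}/sg ) {
        my $body = $1;
        my %ex;
        for my $env (qw(exertext soln answer)) {
            # A failed match returns an empty list, leaving the value undef.
            ($ex{$env}) = $body =~ /\\begin\{$env\}\s*(.*?)\s*\\end\{$env\}/s;
        }
        push @exercises, \%ex;
    }
    return @exercises;
}

# Usage on one exercise from the question (it has no answer block).
my $sample = <<'TEX';
\begin{exer}
\begin{exertext}
\label{2.nformend} $\phi(x,y,z)=xz-2y-x^2$
\end{exertext}
\begin{soln}
$\phi(t,y,y')=ty'-2y-t^2$ must be solved for $y'$.
\end{soln}
\end{exer}
TEX

for my $ex ( parse_exercises($sample) ) {
    print "exertext: $ex->{exertext}\n";
    print "answer:   ", ( defined $ex->{answer} ? $ex->{answer} : "(none)" ), "\n";
}
```

Because the match uses m//g rather than s///, the input string is left intact and can be scanned again for other environments.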
Re: Parsing Exercise set by chas (Priest) on Apr 13, 2005 at 21:36 UTC
It should be fairly easy to write a script to do what you want. For example, to collect the questions, I'd probably use an array @exertext and read the file line by line, concatenating as the lines are read; each time \begin{exertext} is matched, I'd start a new array element, and each time \end{exertext} is matched, I'd stop concatenating. Then, I could remove the 2 tags from the array element if I wished (but maybe they should be saved.) You can do the same for the solutions or answers.

The only time you might have a problem is if, for example, \end{exertext} and \begin{soln} occur on the same line, but maybe that doesn't occur. So each exercise would be a new array element. Then you can easily print them to a new file (maybe with numbering or other formatting.)

I've done very similar things many, many times and rolled my own script. Sometimes something won't be quite right, and you have to make a small change to correct things, but that's in the nature of coding. I don't know of a convenient module to use, but if there is one, I'm sure you'll get the info in a reply soon.

chas

(Update: If the file is formatted in a very disorganized way, it becomes more difficult to extract the data, but that doesn't seem to be the case in the example you gave. Using a module is fine if there is an appropriate one, but sometimes it can take a while to install and understand, while a handrolled script might be written in 20 minutes. Also fixed an error.)
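The line-by-line approach described above can be sketched roughly as follows. This is an illustration, not chas's own script. The helper name `collect_env` is hypothetical, and the tags are kept in each array element.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collect every
# \begin{$env}...\end{$env} block, one array element per block.
sub collect_env {
    my ($env, @lines) = @_;
    my @blocks;
    my $inside = 0;
    for my $line (@lines) {
        if ( $line =~ /\\begin\{\Q$env\E\}/ ) {
            $inside = 1;
            push @blocks, "";                   # start a new array element
        }
        $blocks[-1] .= $line if $inside;        # concatenate while inside
        $inside = 0 if $line =~ /\\end\{\Q$env\E\}/;    # stop concatenating
    }
    return @blocks;
}

# Usage on a fragment of the sample from the question.
my @sample = map { "$_\n" } split /\n/, <<'TEX';
\begin{exertext}
\label{2.nformst} $\phi(x,y,z)=x^2z+(1+x)y$
\end{exertext}
\begin{soln}
$\phi(t,y,y') = t^2y'+(1+t)y = 0$ must be solved for $y'$.
\end{soln}
\begin{answer}$y'= -\dfrac{(1+t)y}{t^2}.$\end{answer}
TEX

my @exertext = collect_env('exertext', @sample);
my @answer   = collect_env('answer',   @sample);
print scalar(@exertext), " exertext block(s), ", scalar(@answer), " answer block(s)\n";
```

A begin and end tag for the same environment on one line, as with the answer block, is handled. A line that closes one environment and opens a different one would be stored whole in both, which is the same-line caveat mentioned above.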

