PerlMonks  

Parsing Exercise set

by David Arnold (Initiate)
on Apr 13, 2005 at 21:06 UTC ( [id://447586]=perlquestion )

David Arnold has asked for the wisdom of the Perl Monks concerning the following question:

All,

I have a file of exercises written in LaTeX that I'd like to parse. I am new to this, so be kind.

Each exercise sits inside a

\begin{exer}...\end{exer}
pair. Inside each such pair are further pairs:

\begin{exertext}...\end{exertext} delimiting the problem statement.

\begin{soln}...\end{soln} delimiting the problem solution

\begin{answer}...\end{answer} delimiting a backanswer that goes in the back of the textbook.

Finally, there are sporadic directions that are included between \begin{instructions}...\end{instructions} pairs. A short example:

\begin{instructions}
In Exercises~\ref{2.nformst} and~\ref{2.nformend}, given the function $\phi$, place the ordinary differential equation $\phi(t,y,y')=0$ in normal form.
\end{instructions}
\begin{exer}
\begin{exertext}
\label{2.nformst}
$\phi(x,y,z)=x^2z+(1+x)y$
\end{exertext}
\begin{soln}
$\phi(t,y,y') = t^2y'+(1+t)y = 0$ must be solved for $y'$. We get
$$y'= -\frac{(1+t)y}{t^2}.$$
\end{soln}
\begin{answer}$y'= -\dfrac{(1+t)y}{t^2}.$\end{answer}
\end{exer}
\begin{exer}
\begin{exertext}
\label{2.nformend}
$\phi(x,y,z)=xz-2y-x^2$
\end{exertext}
\begin{soln}
$\phi(t,y,y')=ty'-2y-t^2$ must be solved for $y'$. We get
$$y'=\frac{2y+t^2}{t}.$$
\end{soln}
\end{exer}
\begin{instructions}
In Exercises~\ref{2.1st}--\ref{exer2.1.6}, show that the given solution is a general solution of the differential equation. Use a computer or calculator to sketch the solutions for the given values of the arbitrary constant. Experiment with different intervals for $t$ until you have a plot that shows what you consider to be the most important behavior of the family.
\end{instructions}
\begin{exer}
\begin{exertext}
\label{2.1st}
$y'=-ty$, $y(t)=Ce^{-(1/2)\,t^2}$, $C=-3,-2,\ldots,3$
\end{exertext}
\begin{soln}
$y(t)' = -Cte^{-(1/2)\,t^2}$ and $-ty(t) = -tCe^{-(1/2)\,t^2}$, so $y'=-ty$. \par
\exfig{ex2.1ans01}
\end{soln}
\begin{answer}\exfig{ex2.1ans01}\end{answer}
\end{exer}

Is there a module on CPAN that I can use to parse this document? My goal is to put all the exercise problem statements (delimited by \begin{exertext}...\end{exertext}) in one file, all the solutions in a second file, all the backanswers in a third file, etc.

Thanks.

Replies are listed 'Best First'.
Re: Parsing Exercise set
by jZed (Prior) on Apr 13, 2005 at 21:42 UTC
Re: Parsing Exercise set
by jimbojones (Friar) on Apr 13, 2005 at 22:25 UTC
    Here's a go without using a big regex. It looks for begin{} and end{} keys and stores the text between the two in an anon array in a hash indexed by that key.

    use strict;
    use warnings;

    #-- hash that will be keyed by LaTeX element
    my %in = ();

    while ( <DATA> ) {
        if ( /\\begin\{(\S+?)\}/ ) {
            #-- tell that we're in a block
            $in{$1}->{Status} = "in";
            #-- add a new element to the anon array containing this info
            push @{$in{$1}->{Text}}, "";
        }
        if ( /\\end\{(\S+?)\}/ ) {
            $in{$1}->{Status} = "pending";
        }
        #-- now loop on all keys, see if we are in that element.
        foreach my $key ( keys %in ) {
            my $status = $in{$key}->{Status};
            if ( $status eq "in" || $status eq "pending" ) {
                #-- add text to last element of the array
                $in{$key}->{Text}->[$#{$in{$key}->{Text}}] .= $_;
            }
            $in{$key}->{Status} = "out" if $status eq "pending";
        }
    }

    #-- write it out. Here loop over all possible keys. Could be restricted to
    #   'exertext', 'answer', 'soln'
    foreach my $key ( keys %in ) {
        my $file = $key . ".tex";
        print "Creating file '$file'\n";
        open( my $fh, '>', $file )
            or die "Cannot open file '$file' for writing: $!\n";
        #-- write each element of array
        my $i = 1;
        foreach my $text ( @{$in{$key}->{Text}} ) {
            print $fh "%% Starting Element $i\n";
            print $fh $text;
            $i++;
        }
        close $fh;
    }

    __DATA__
    #### place the sample input here ####
    The output is
    • instructions.tex
    • soln.tex
    • answer.tex
    • exer.tex
    • exertext.tex
    With nested tags, the sub tags are included in the given tag (so that exer.tex includes exertext.tex, soln.tex, answer.tex). If you ran it on a full doc, presumably "document.tex" would be the same as your input doc.

    - j

Re: Parsing Exercise set
by davidrw (Prior) on Apr 13, 2005 at 22:12 UTC
    Here's a quick & dirty regex method (note that it's destructive to $s, and that it wouldn't support additional tags, and the tag order matters, but hopefully it's a good start):
    #!/usr/bin/perl
    use strict;

    undef $/;    # slurp mode: read all of DATA in one go
    my $s = <DATA>;
    my @questions;

    # Braces are escaped so they are always treated as literal characters.
    while ( $s =~ s!
            (?:\\begin\{instructions\}(.*?)\\end\{instructions\}\s*)?
            \\begin\{exer\}\s*
            \\begin\{exertext\}(.*?)\\end\{exertext\}\s*
            \\begin\{soln\}(.*?)\\end\{soln\}\s*
            (?:\\begin\{answer\}(.*?)\\end\{answer\}\s*)?
            \\end\{exer\}
        !!sx ) {
        push @questions, {
            instructions => $1,
            exertext     => $2,
            soln         => $3,
            answer       => $4,
        };
    }

    use Data::Dumper;
    print Data::Dumper::Dumper \@questions;

    __DATA__
    #### place the sample input here ####
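The original goal was one file per kind of block, so here is a minimal sketch of how the @questions list built above might be written out that way. The helper name write_fields, the output directory argument, and the "%% Exercise N" separator are all assumptions made for illustration; none of them come from the thread.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of the reply above): write one .tex file per
# field, with one numbered entry per exercise that actually has that field.
# $questions is an array ref of hashes shaped like the ones in @questions;
# $dir is the directory to write into.
sub write_fields {
    my ($questions, $dir) = @_;
    my @written;
    for my $field (qw(instructions exertext soln answer)) {
        my $path = File::Spec->catfile($dir, "$field.tex");
        open my $fh, '>', $path
            or die "Cannot open '$path' for writing: $!\n";
        my $n = 0;
        for my $q (@$questions) {
            $n++;
            # instructions and answer are optional, so they may be undef
            next unless defined $q->{$field};
            # copy the text and trim leading/trailing whitespace
            (my $text = $q->{$field}) =~ s/^\s+|\s+$//g;
            print {$fh} "%% Exercise $n\n$text\n\n";
        }
        close $fh or die "Cannot close '$path': $!\n";
        push @written, $path;
    }
    return @written;
}
```

Calling write_fields(\@questions, '.') after the while loop would create instructions.tex, exertext.tex, soln.tex and answer.tex in the current directory. Numbering by exercise position keeps the files aligned even when an exercise has no answer.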
Re: Parsing Exercise set
by chas (Priest) on Apr 13, 2005 at 21:36 UTC
    It should be fairly easy to write a script to do what you want. For example, to collect the questions, I'd probably use an array @exertext and read the file line by line, concatenating the lines as they are read: each time \begin{exertext} is matched, I'd start a new array element, and each time \end{exertext} is matched, I'd stop concatenating, so each exercise ends up as its own array element. Then I could remove the two tags from the array element if I wished (but maybe they should be saved). You can do the same for the solutions or answers. The only time you might have a problem is if, for example, \end{exertext} and \begin{soln} occur on the same line, but maybe that doesn't occur. Then you can easily print the elements to a new file (maybe with numbering or other formatting). I've done very similar things many, many times and rolled my own script. Sometimes something won't be quite right, and you have to make a small change to correct things, but that's in the nature of coding.
    I don't know of a convenient module to use, but if there is one, I'm sure you'll get the info in a reply soon.
    chas
    (Update: If the file is formatted in a very disorganized way, it becomes more difficult to extract the data, but that doesn't seem to be the case in the example you gave. Using a module is fine if there is an appropriate one, but sometimes it can take a while to install and understand, while a handrolled script might be written in 20 minutes. Also fixed an error.)
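The line-by-line approach described above could be sketched roughly as follows. This is only one possible reading of it: the helper name collect_env is made up for illustration, and splitting each line on the two tags is an added assumption meant to cope with the case where a \begin and an \end share a line.

```perl
use strict;
use warnings;

# Hypothetical helper: return the body of every \begin{$env}...\end{$env}
# block found in @lines, in order, with the two tags themselves removed.
sub collect_env {
    my ($env, @lines) = @_;
    my $begin = "\\begin{$env}";
    my $end   = "\\end{$env}";
    my @items;
    my $inside = 0;
    for my $line (@lines) {
        # Split on the two tags but keep them as tokens, so a line holding
        # both a \begin and an \end (or tags of other environments) works.
        for my $piece (split /(\Q$begin\E|\Q$end\E)/, $line) {
            if    ($piece eq $begin) { push @items, ''; $inside = 1; }
            elsif ($piece eq $end)   { $inside = 0; }
            elsif ($inside)          { $items[-1] .= $piece; }
        }
    }
    return @items;
}
```

With the file read into @lines, collect_env('exertext', @lines), collect_env('soln', @lines) and collect_env('answer', @lines) would each give one list that can then be printed to its own file. The sketch does not handle an environment nested inside itself, which the sample input does not seem to need.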
