Re^3: Perl custom sort for Portuguese Lanaguage

Re^4: Perl custom sort for Portuguese Lanaguage by Galdor (Sexton) on Jul 08, 2020 at 06:38 UTC
Sure. No, there is no need to sort sections individually; merely strip all blank lines and all lines that start with a hash. Here are a few small samples (each is a separate *.dict file):

    # Drink
    a cerveja|beer
    a laranja|orange
    a água|water
    beber|to drink
    o copo de vinho|glass of wine
    o copo|glass or cup
    o sumo|juice

and ...

    # numbers
    zero|zero
    um|one
    dois|two
    três|three
    quatro|four
    cinco|five
    seis|six
    sete|seven
    oito|eight
    nove|nine
    dez|ten
    ## 11 - 19
    onze|eleven
    doze|twelve
    treze|thirteen
    catorze|fourteen
    quinze|fifteen
    dezasseis|sixteen
    dezasssete|seventeen
    dezoito|eighteen
    dezanove|nineteen

and even ...

    # time
    # DO NOT SORT!
    # the time
    o segundo|second
    o minuto|minute
    a ora|hour
    # the day
    o dia|day
    a noite|night
    a madrugada|early morning
    a manhã|morning
    a tarde|afternoon
    a noite|night
    o meio dia|midday
    a meia noite|midnight
    # days of week
    a semana|week
    o fim-de-semana (os fims-de-samana)|week-end
    a Segunda-feira|Monday
    a Terça-feira|Tuesday
    a Quarta-feira|Wednesaday
    a Quita-feira|Thursday
    a Sexta-feira|Friday
    o Sábado|Saturday
    o Domingo|Sunday
    # Months of the year
    o mês (os meses)|month
    o ano|year
    Janeiro|January
    Fevereiro|February
    Março|March
    Abril|April
    Maio|May

So in the end they will all be "filtered" into one big dictionary output file, one entry per line, in "pt dictionary" alphabetical order. They are kinda in markdown format, so I could print each out individually without doing any sorting, but also enjoy the benefit of having a "big database" of words to do word tests, a personal dictionary, and cool stuff like that... If I want "sub-section" sorting as you suggest I will just break them out into separate files (I guess) ... Thanks!
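The "filter everything into one sorted dictionary" step described here could be sketched roughly as follows. This is only a minimal illustration: it assumes the core Unicode::Collate module with its default settings is good enough for Portuguese ordering (an assumption, not something this post states), and the `@lines` list is a hypothetical stand-in for the lines of several *.dict files read as UTF-8.

```perl
use strict;
use warnings;
use utf8;               # this source contains UTF-8 string literals
use Unicode::Collate;   # core module; default collation is an assumption here

# Hypothetical stand-in for lines collected from several *.dict files.
my @lines = (
    "# Drink",
    "o sumo|juice",
    "",
    "a laranja|orange",
    "a água|water",
    "beber|to drink",
    "a cerveja|beer",
    "## 11 - 19",
    "zero|zero",
);

# Drop blank lines and lines that start with a hash.
my @entries = grep { /\S/ && !/^\s*#/ } @lines;

# Unicode collation compares base letters first, so "a água" lands
# before "a cerveja"; a plain codepoint sort would put it after
# "a laranja" because "á" has a higher codepoint than "l".
my @sorted = Unicode::Collate->new->sort(@entries);

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @sorted;
```

Note that entries sort on the whole line, article included, so everything beginning with "a " groups together; sorting on the noun alone would need an extra key-extraction step.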
Re^5: Perl custom sort for Portuguese Lanaguage (updated x2) by haukex (Archbishop) on Jul 08, 2020 at 18:03 UTC
In that case it's fairly easy. I used Text::CSV to read the data file, but AFAIK it doesn't support ignoring comment lines. If you are certain your files are always going to be as simple as you showed (only two columns separated by `|`, no `|`s anywhere else, no quoted fields, etc.), then it's also possible to parse the file manually with a regex, for example:

    open my $fh, '<:encoding(UTF-8)', $filename or die "$filename: $!";
    my @rows = map { /^([^|]+)\|([^|]+?)$/ or die $_; [$1,$2] }
        grep { /\S/ && !/^\s*#/ } <$fh>;
    close $fh;

And then you can use `@rows` instead of `@$rows` in my example above.

Update: Minor simplification to code.

Update 2: And soonix makes a good point that continuing to use Text::CSV is also most likely fine, since it's probably safe to assume that you don't have any actual data that starts with `#`.
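To try the regex approach without a file on disk, something like the following should work. It is a sketch only: the `$data` string is made-up sample input read through an in-memory filehandle, it is kept ASCII-only so no encoding layer is needed, and it assumes the separator is a plain `|`.

```perl
use strict;
use warnings;

# Made-up sample in the *.dict layout from this thread (ASCII only).
my $data = <<'END';
# Drink

o copo de vinho|glass of wine
beber|to drink
  # an indented comment
o sumo|juice
END

# Read from the string as if it were a file.
open my $fh, '<', \$data or die "in-memory open: $!";

# Skip blank and comment lines, then split each remaining line into
# exactly two fields; die on anything that does not fit that shape.
my @rows =
    map  { /^([^|]+)\|([^|]+?)$/ or die "unparsable line: $_"; [ $1, $2 ] }
    grep { /\S/ && !/^\s*#/ } <$fh>;
close $fh;

printf "%-20s => %s\n", @$_ for @rows;
```

Because `$` matches before the trailing newline and the second group is non-greedy, the captured translation comes out without a line ending, so no separate `chomp` is needed.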
Re^6: Perl custom sort for Portuguese Lanaguage by hippo (Bishop) on Jul 08, 2020 at 20:04 UTC
"I used Text::CSV to read the data file, but AFAIK it doesn't support ignoring comment lines."

This works for me:

    csv (in => 'quux.csv', filter => {1 => sub { !/^#/ }});
Re^7: Perl custom sort for Portuguese Lanaguage by haukex (Archbishop) on Jul 08, 2020 at 21:06 UTC
Re^8: Perl custom sort for Portuguese Lanaguage by choroba (Cardinal) on Jul 08, 2020 at 21:27 UTC
Re^8: Perl custom sort for Portuguese Lanaguage by soonix (Canon) on Jul 09, 2020 at 06:31 UTC
Re^7: Perl custom sort for Portuguese Lanaguage by Tux (Canon) on Jul 09, 2020 at 13:54 UTC
Re^8: Perl custom sort for Portuguese Lanaguage by haukex (Archbishop) on Jul 09, 2020 at 14:08 UTC


	PerlMonks