
Re^2: How to walk through convoluted data?

by perl-diddler (Chaplain)
on Jul 21, 2021 at 20:26 UTC ( [id://11135275]=note )


in reply to Re: How to walk through convoluted data?
in thread How to walk through convoluted data?

How/where should I post the files? I'd need to trim them heavily, but the two of interest are repomd.xml, which has the cpeid in it and the names of the other XML files of the group, and primary.xml, which lists all the rpms in the release.

Out of the 4 repos released per day (oss, non-oss, src-oss, src-non-oss), I've been using src-non-oss for recent test runs since it's the shortest, with repomd.xml at 8869 bytes and primary.xml at 41033 bytes.


Compare that with 'oss': the repomd.xml files are about the same size, but primary.xml varies a lot depending on the individual update. For the same date as the src-non-oss run, say, primary.xml is 162MB, with 3.2M lines and 67370 different rpm descriptions.

Here is repomd.xml from its beginning through the cpeid entry, plus the listing for the primary.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<repomd xmlns="http://linux.duke.edu/metadata/repo" xmlns:rpm="http://linux.duke.edu/metadata/rpm">
  <revision>1625990264</revision>
  <tags>
    <content>pool</content>
    <content>gpg-pubkey-3dbdc284-53674dd4.asc?fpr=22C07BA534178CD02EFE22AAB88B2FD43DBDC284</content>
    <content>gpg-pubkey-39db7c82-5f68629b.asc?fpr=FEAB502539D846DB2C0961CA70AF9E8139DB7C82</content>
    <content>gpg-pubkey-307e3d54-5aaa90a5.asc?fpr=4E98E67519D98DC7362A5990E3A5C360307E3D54</content>
    <repo>obsproduct://build.opensuse.org/openSUSE:Factory/openSUSE/20210710/i586</repo>
    <repo>obsproduct://build.opensuse.org/openSUSE:Factory/openSUSE/20210710/x86_64</repo>
    <distro cpeid="cpe:/o:opensuse:opensuse:20210710">openSUSE Tumbleweed</distro>
  </tags>
  <data type="primary">
    <checksum type="sha256">60ac248489df31c61277a6872279561730d27d51b3bb7d15368d75b69d1ac80c</checksum>
    <open-checksum type="sha256">d101bad38f3a987c9a790f927031cfcc68c1598b4d6f329447c6fe338cfb7128</open-checksum>
    <location href="repodata/60ac248489df31c61277a6872279561730d27d51b3bb7d15368d75b69d1ac80c-primary.xml.gz"/>
    <timestamp>1625990264</timestamp>
    <size>18659084</size>
    <open-size>171435824</open-size>
  </data>

That gives me my distro version or date (the cpeid number) and the location of the first primary.xml file of rpms that have changed since "yesterday" (previous release).
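As a rough illustration of that step, here is a minimal core-Perl sketch that pulls the cpeid and the primary.xml location out of repomd.xml text. The sub name `repomd_info` and its hash keys are made up for this example, and the regexes only assume the regular, machine-generated layout shown above; a real XML parser would be the more robust choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original code): extract the cpeid and
# the primary.xml location from repomd.xml text. Regex-based, so it relies
# on the regular layout shown above rather than on real XML parsing.
sub repomd_info {
    my ($xml) = @_;
    my %info;

    # <distro cpeid="cpe:/o:opensuse:opensuse:20210710">...</distro>
    if ($xml =~ /<distro\s[^>]*\bcpeid="([^"]+)"/) {
        $info{cpeid} = $1;
        # The last colon-separated field is the release date/version.
        ($info{release}) = $info{cpeid} =~ /([^:]+)\z/;
    }

    # <data type="primary"> ... <location href="..."/> ... </data>
    if ($xml =~ m{<data\s+type="primary">(.*?)</data>}s) {
        my $block = $1;
        ($info{primary}) = $block =~ /<location\s+href="([^"]+)"/;
    }
    return \%info;
}

# Sample input: a trimmed copy of the repomd.xml excerpt above.
my $repomd_sample = <<'XML';
<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <tags>
    <distro cpeid="cpe:/o:opensuse:opensuse:20210710">openSUSE Tumbleweed</distro>
  </tags>
  <data type="primary">
    <location href="repodata/60ac248489df31c61277a6872279561730d27d51b3bb7d15368d75b69d1ac80c-primary.xml.gz"/>
  </data>
</repomd>
XML

my $info = repomd_info($repomd_sample);
print "$_: $info->{$_}\n" for qw(cpeid release primary);
```

The non-greedy match on the `<data type="primary">` block keeps the location of primary.xml from being confused with the `<location>` entries of the other data files.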

The header and 1st package of a primary for an oss release are below:

<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://linux.duke.edu/metadata/common" xmlns:rpm="http://linux.duke.edu/metadata/rpm" packages="66746">
<package type="rpm">
  <name>2048-cli</name>
  <arch>i586</arch>
  <version epoch="0" ver="0.9.1+git.20181118" rel="1.11"/>
  <checksum type="sha256" pkgid="YES">310f3c8e912923da08eab8debafd6fc03afe9e1ae97304bcd029658959e099d0</checksum>
  <summary>A CLI version of the "2048" game</summary>
  <description>2048 is a mathematics-based puzzle game where the player has to slide tiles on a grid to combine them and create a tile with the number 2048. The player has to merge the similar number tiles (2n) by moving the arrow keys in four different directions. When two tiles with the same number touch, they will merge into one.</description>
  <packager>https://bugs.opensuse.org</packager>
  <url>https://github.com/tiehuis/2048-cli</url>
  <time file="1616702669" build="1616702650"/>
  <size package="20045" installed="26081" archive="27080"/>
  <location href="i586/2048-cli-0.9.1+git.20181118-1.11.i586.rpm"/>
  <format>
    <rpm:license>MIT</rpm:license>
    <rpm:vendor>openSUSE</rpm:vendor>
    <rpm:group>Amusements/Games/Strategy/Other</rpm:group>
    <rpm:buildhost>lamb25</rpm:buildhost>
    <rpm:sourcerpm>2048-cli-0.9.1+git.20181118-1.11.src.rpm</rpm:sourcerpm>
    <rpm:header-range start="5096" end="9153"/>
    <rpm:provides>
      <rpm:entry name="2048-cli" flags="EQ" epoch="0" ver="0.9.1+git.20181118" rel="1.11"/>
      <rpm:entry name="2048-cli(x86-32)" flags="EQ" epoch="0" ver="0.9.1+git.20181118" rel="1.11"/>
    </rpm:provides>
    <rpm:requires>
      <rpm:entry name="libncurses.so.6"/>
      <rpm:entry name="libncurses.so.6(NCURSEST6_5.7.20081102)"/>
      <rpm:entry name="libtinfo.so.6"/>
      <rpm:entry name="libtinfo.so.6(NCURSES6_TINFO_5.0.19991023)"/>
      <rpm:entry name="libtinfo.so.6(NCURSES6_TINFO_5.7.20081102)"/>
      <rpm:entry name="libc.so.6(GLIBC_2.7)"/>
    </rpm:requires>
    <file>/usr/bin/2048-cli</file>
  </format>
</package>

I'm NOT including most fields -- only the ones I need for downloading and sorting.
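To make that concrete, here is a hedged sketch of reading primary.xml one package record at a time and keeping only a handful of fields. The sub name `each_package` is invented for this example, and the choice of name, arch, version and location href as "the needed fields" is an assumption, not something taken from the real code. It uses only core Perl and regexes that depend on the regular layout above, so it is not a general XML parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original module): calls $callback once
# per <package> record, reading one record at a time so that a 162MB
# primary.xml never has to sit in memory as a whole.
sub each_package {
    my ($fh, $callback) = @_;
    local $/ = '</package>';                 # one read == one package record
    while (my $rec = <$fh>) {
        next unless $rec =~ /<package\b/;    # skip the trailing </metadata> chunk
        my %pkg;
        ($pkg{name}) = $rec =~ m{<name>([^<]*)</name>};
        ($pkg{arch}) = $rec =~ m{<arch>([^<]*)</arch>};
        @pkg{qw(epoch ver rel)} = $rec =~
            m{<version\s+epoch="([^"]*)"\s+ver="([^"]*)"\s+rel="([^"]*)"};
        ($pkg{href}) = $rec =~ m{<location\s+href="([^"]*)"};
        $callback->(\%pkg);
    }
    return;
}

# Sample input: the header and a trimmed copy of the first package above.
my $primary_sample = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://linux.duke.edu/metadata/common" xmlns:rpm="http://linux.duke.edu/metadata/rpm" packages="66746">
<package type="rpm">
  <name>2048-cli</name>
  <arch>i586</arch>
  <version epoch="0" ver="0.9.1+git.20181118" rel="1.11"/>
  <location href="i586/2048-cli-0.9.1+git.20181118-1.11.i586.rpm"/>
</package>
</metadata>
XML

# An in-memory filehandle stands in for the real (uncompressed) file here.
open my $fh, '<', \$primary_sample or die "cannot open in-memory handle: $!";
each_package($fh, sub {
    my ($pkg) = @_;
    print join(' ', @{$pkg}{qw(name arch ver rel href)}), "\n";
});
close $fh;
```

Setting `$/` to the closing tag trades parser correctness for low memory use; a pull parser would give the same streaming behaviour with proper XML handling.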

I'm also only downloading the archs useful to me, as determined by my constants section:

use constant RepoNames => qw(oss non-oss src-oss src-non-oss);
use constant ArchNames => qw(noarch nosrc src x86_64);
use constant RepoMDFile => 'repomd.xml';
use constant Wanted_Names => {qw(susedata 1 appdata 1 other 1 filelists 1 primary 1 appdata-icons 1)};
use constant RType => { map { $_ => $_ } @{[RepoNames]} };
use constant Archt => { map { $_ => $_ } @{[ArchNames]} };

sub Repo_valid($) { my $p = shift if HASH $_[0]; ErV RType, shift }
sub Arch_valid($) { my $p = shift if HASH $_[0]; ErV Archt, shift; }

our @EXPORT;
use mem(@EXPORT = (qw( RType Archt Repo_valid Arch_valid RepoMDFile Wanted_Names ) ) );
use Xporter;

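
For readers without the helper modules those constants lean on, the arch check can be approximated in core Perl. This is only a sketch: `arch_wanted` is a made-up stand-in for `Arch_valid`, it assumes the intent is "return the arch name if it is in the wanted list, otherwise undef", and the arch list being filtered is sample input.

```perl
use strict;
use warnings;

use constant ArchNames => qw(noarch nosrc src x86_64);

# Lookup table built the same way as Archt above: each name maps to itself.
my %ARCHT = map { $_ => $_ } (ArchNames);

# Hypothetical stand-in for Arch_valid: returns the arch name when it is
# one of the wanted ones, undef otherwise.
sub arch_wanted {
    my ($arch) = @_;
    return defined $arch ? $ARCHT{$arch} : undef;
}

# Sample input only: archs as they might appear in <arch> elements.
my @archs = qw(i586 x86_64 noarch src ppc64le);
my @keep  = grep { defined arch_wanted($_) } @archs;
print "@keep\n";    # prints "x86_64 noarch src"
```

A hash lookup keeps the per-package check constant-time, which matters when tens of thousands of package records are being filtered.
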
Hopefully that gives at least a bit more context. I can add more later if wanted, but I already feel like I'm overwhelming....
