Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

ELISHEVA's scratch non-scratchpad

by ELISHEVA (Prior)
on Aug 14, 2009 at 03:30 UTC ( [id://788502]=largedoc: print w/replies, xml ) Need Help??

PmDev: Technical notes and documentation


Table of contents

  1. Data model
  2. Code base

PerlMonks Data Model

Note 1:Subtypes noted only when subtypes change behavior without adding database tables.
Note 2:Supertypes chosen based on data inheritance.
Note 3:Highlighted items are things I'm not sure about.

Table groupings

General questions

  • Ids on many nodes are marked "autoincrement" and yet the documentation implies that the value in that field is the same as the related "node". Is the auto_increment ever used? Or is the field always populated? If auto-increment is used when? If not, why marked auto-increment? Are there ever problems with the id overwriting the defined id? Or would that only happen if the node is saved with id=0 or null (as in mysql)?

  • Nodetype and maintenance appear to be handling things that could be implemented with stored procedures and view based permissions. Also why is the db schema information reproduced in a table? - has any thought been given to moving some of that functionality into the database?
  • The data model contains a lot of objects with attributes in multiple tables. Normally the integrity of these objects would be preserved by encapsulating updates in a transaction. Have node caching issues related to early versions of mySQL not supporting transactions and row level locking (see Re: Recent slowness and outage and following discussion) been resolved? If not, why not? If so, how?

  • What happened to the "symlink" table in the Everything Bible?

Design philosophy

From a design point of view the database tables in the PerlMonks database can be divided into two groups: database tables storing nodes and database tables storing "other information".

Node tables

A node is anything that can be directly created, edited, or deleted by a human being and is not merely the history of a site process (e.g. node approvals or consideration). There is a wide range of information that fits that description and all of it is stored in the database: metadata and integrity rules, root nodes and replies on SOPW, the images displayed in the header, the code that runs each box in the sidebar of a PerlMonks page, the standard layout of each page (header, footer, sidebar on the right), and much more.

Each node has an entry in the node table. This entry assigns the node a node id and node type. It also tracks essential audit information about the node: the owner and the last update date, for example. For many nodes this is not enough information so each node may also link in a 1-to-1 fashion to several additional tables. These tables store extra attributes: document text, an ad-hoc list of settings, additional users who may edit the node, the position of the node within a hierarchy of nodes, and much more.

The set of tables to which a node links is determined by its node type. The node type determines all of the data, integrity constraints, processing and display requirements of the node. It also controls who may add, update, or delete the node and its attributes.

Though normal PerlMonks users will never see a page "list nodetypes", the definition of a node type is also considered a node since a human being must define its rules. In fact it is the very first node created in the database (id=1) and its assigned node type itself. (If that sounds confusing, you might want to first twist your mind into a mobius strip and then reread the sentence). The available node types and their relationships one to another can be seen in Type Tree. You can use that page to click through to a detailed description of the node type and/or to list of nodes belonging to that node type.

Non-node tables

The second group of database tables, non-node tables, tend to store one of two types of data: (a) data about a node that have a 1-to-Many relationship to that node. (b) data about the history of a process. For example, when the text field of certain nodes contain links to other nodes, the PerlMonks parsing engine extracts those links and saves them in the links table.which nodes does this apply to? which text fields?. The entries in the links table don't count as nodes because they are never edited independently of the containing node's text field. Similarly edithistory records aren't nodes in their own right because they are the by-product of the node editing process and have no existence apart from the node and its editing process.

The records in non-node database tables don't have display pages. In some cases, you can view the database table contents using one of the queries in sqlquery.TODO: list sqlqueries for each non-node table.

Basic site structure

  • node: A unit of site content, code or meta data. The node stores attributes common to all units: a name, an id, audit trail information. In addition, for convenience, a few other attributes that are applicable most but not all units (e.g. a vote tally) are defined for each node.

    Subtypes:dbtable, file, pmmodule
    Subtypes with settings (see below for definition of settings): nodeball, theme

    • node_id: Primary key. Auto-incrementing id that identifies a node.
    • type_nodetype: the id of the node type, nodetype: nodetype.nodetype_id.
    • title: The title of the node
    • author_user: The author of the node.is this always the user who hit the create button? If not, when and how can it change?
    • createtime: The time the node was created.???how is this set???
    • nodeupdated:??? Is this the last update? Default is CURRENT_TIMESTAMP: is it reset after each update?
    • hits:???is this in use? How is "hit" defined: number of times requested? number of distinct IPs requesting? other?
    • reputation.??? is this the net number of up/down votes? If so, where are the separate up and down vote counts stored? Or is it calculated by comparing reputation to votes cast?
    • votescast: The number of votes cast for the node.
    • lockedby_user: The user who locked this node.??? Who can lock? Who can edit after a lock? Does the node have to be unlocked or can the user who locked it edit it without unlocking? How is "unlawful" locking prevented? How is the user identified? user.user_id? or by node.node_id of their user node???
    • locktime: The time when the lock ws applied.
    • core: ???
    • package: ???
    • postbonus: ???
    • ucreatetime: ???
    • node_iip: ???

  • setting: Extra information for a node that stores a collection of settings.

    Supertype: node
    Subtypes: restricted_setting

    • setting_id node.node_id for this node
    • vars a URL encoded hash containing variables for this node.as in Perl code for a hash???

  • document: Extra attributes for node that contains text. May be used either for site content or to define some part of the site's look and feel.

    Supertype: node
    Subtypes: superdoc,restricted_superdoc, sillyscript, pmdevsuperdoc,includefile
    perlquestion,perlmeditation,perltutorial, monkdiscuss,CUFP,poem,obfusticated, offtopicroot
    sitefaqlet, sitedoclet, alphafaqlet, perlfaq_nodetype
    testquestion,fullpage,perlexercise, superquestion,superquestionarea, strangedoc,strangenode,request, testtype,pmdevtopic,sectioncontainer, perlman,data,perlcraft
    Subtypes with settings: doclist,faqlist,tutlist,catqalist

    • document_id: Primary key. Auto-incrementing id that uniquely identifies a document.
    • doctext: text of document
    • lastedit: last edit to document???

  • links: Connects one node to another via a link embedded in a document node's text. stores links parsed from document text?

    • from_node: node.node_id of node containing the link.
    • to_node: node.node_id of link target.
    • linktype: ???
    • hits: ???
    • food: ???

  • note: Extra attibutes for a document that participates in a hierarchy of nodes. Both the order among siblings and the immediate parent are stored.

    Supertype: node
    Subtypes: perlsolution, categorized_question, categorized_answer,pmdevnote

    • note_id - Primary Key. node.node_id of the node.
    • parent_node - node.node_id of the parent node
    • position - position relative to siblings - does not appear to be in use as of 2009-08-12 (researched by jdporter an ELISHEVA via code search/grep and examination of node's htmlpages)
    • root_node - node.node_id of hierarchy root

  • nodegroup (concept): This is a conceptual building block representing a collection of nodes. The members of the node are stored in a database table, known as the "group table". There is no database table corresponds (1to1) to this entity other than node (defined above).

    Supertype: node
    Subtypes: nodegroup (not to be confused with the db table which stores the members of the group, not the group definition),nodeletgroup, usergroup,sourcecodesection,QandASection

  • nodegroup (db table): A table storing the members of various groups. data structure implies that a node may belong to more than one group, i.e. it supports M-to-M relationships between nodes. Is this true?

    • nodegroup_id: one field of a two field primary key. node.node_id of a group to which a node belongs.
    • rank: second field of two field primary key. The position of this node relative to other nodes in this group.
    • node_id: node.node_id of a node belonging to this group.
    • orderby: Everything bible says used internally when nodes are reorderd. How???

  • nodetype: Extra information for nodes that define a node "class". Nodes belonging to the same class can have their own rules for user permissions, an SQL table to store special attributes and a table to store group members. Nodetypes may also be arranged in a hierarchy by inheriting from other node types.

    Supertype: node

    • nodetype_id: ???
    • readers_user: ???
    • writers_user: ???
    • deleters_user: ???
    • restrict_nodetype: ???
    • extends_nodetype the node id of the superclass node typelinks to node.node_id?
    • restrictdupes: ???
    • grouptable: ???
    • updaters_user: ???

  • htmlcode: Extra attributes for a node that contains a perl script. The Perl script must return a string containing content to be inserted. Htmlcode may be inserted into nodelets and htmlpages by using PM markup. When the nodelet or htmlpage code is parsed, the string returned by the Perl script will replace the link. Htmlcode nodes can be used to encapsulate HTML generation that is shared by two or more htmlpages or nodelets.

    Supertype: node
    Subtypes:opcode,accessrule
    Subtypes with document and settings: sqlquery

    • htmlcode_id: Primary key. The node's node.node_id
    • code: a Perl script (maximum length 64K) that returns a string.

Site look and feel (themes)

The PerlMonks site's look and feel is defined using a combination of htmlpages,containers and a variety of components that can be embedded within them. Htmlpages define how to render content for a particular display type. Containers define the HTML that surrounds that content.

These components may be either free standing or bundled together into a theme. Free standing components define site wide defaults. Themes define a set of htmlpages, containers and HTML/CSS default attributes that can override the site-wide defaults to give the website a particular look and feel.

Theme components

  • rawpage: page delivered as is without any evaluationwhat are scripts used for? are they part of site maintenance? content? look and feel? what are rawpages used for (all the nodes have no content)?

    Supertype: Document
    Subtypes: css,Dynamic_CSS (0 nodes),script (gods only)

    • rawpage_id: Primary key. The node.node_id of a raw page node.
    • datatype. ???
    • lastedit: ???

  • htmlpage: Extra attributes for a node that determines how to render a GET request based on the nodetype of the requested node and the value of the display type parameter. The content of the requested node is passed through an evaluation process before being rendered (evalCode, evalX, and finally eval). The name is somewhat of a misnomer because (a) the page may contain more than just HTML markup and (b) because the final output may be any mime type, not just HTML. How is the HTML page determined from node id and displaytype? Is it always nodetypename whitespace displaytype?

    Supertype: Node

    • htmlpage_id: Primary key. The node.node_id of the htmlpage node.
    • pagetype_nodetype. The node.node_id of the node type which this html page is designed to render.
    • displaytype: The value of the display type parameter which this html page is designed to render.
    • page: the text to insert in the "parent_container". This text may be a mixture of plain text, static HTML, PM markup, and embedded Perl code. Must be < 64K. It is responsible for formatting the content of the selected node in a manner consistent with the selected code type. The resulting output should be suitable for insertion between whatever tags "parent_container" has placed around "CONTAINED_STUFF".
    • parent_container: the container into which the contents of "page" should be rendered once they have been evaluated.
    • ownedby_theme: the node.node_id of the theme to which this htmlpage belongs
    • mimetype: the mime type of the generated page. If left blank, the mime type will default to HTML.

  • nodelet: Extra attributes for a node that contains that generates refreshable content for insertion into a div, table row or other html tag defined by a container. This container is specified s part of the nodelet definition. Nodelets may be inserted into an htmlpage using PM markup. Can the content may be refreshed independently of the page?
  • Supertype: Node

    • nodelet_id: Primary key. The node's node.node_id
    • nltext: cached contents of "nlcode" after parsing. Must be < 64K.
    • nlcode: A mix of HTML, PM markup and embedded Perl code similar to the content of a container, htmlpage, or htmlcode node. Must be < 64K.
    • updateinterval: The refresh rate for regenerating the node content. Where possible the contents of "nltext" will be used in lieu of parsing "nlcode" afresh.
    • parent_container: the container into which this nodelet's content will be inserted.
    • nlgoto: ???
    • lastupdate: Timestamp storing the most recent update of these attributes. This will be set whenever any (?) attribute is changed, including "nltext".

  • container: Extra information for a node that defines instructions for wrapping raw content in HTML. Containers usually define a single tag and its contents, for example a table, table row, div, or even the entire content of an <html tag. Is there an easy way to tell where a particular container is used? The themes on PM seem to have few if any containers and yet the PM site has many, many containers. Are containers omitted from the themes because all defined themes share the same containers? Because containers need only be made part of a theme if the theme is exported and we don't have a reason to export themes? Other?

    Supertype: Node
    Subtypes: theme containerwhat ist this subtype used for?

    • container_id: Primary key. Stores the node.node_id of a container node.
    • context: instructions for wrapping raw content in HTML. The instructions may be a mixture of static HTML, PM markup, and embedded Perl code. The insertion point for the raw context is represented by the string "CONTAINED_STUFF". Text before and after this string is optional.
    • parent_container: node.node_id of the container node where this container node is embedded. Container nodes may be nested within other container nodes. When raw content is inserted into a container, the PM website engine checks to see if the container has a parent. If so, the container and its raw content become the "raw content" for the parent and are inserted in place of the "CONTAINED_STUFF" string. This process of insertion continues until a container that has no parent is reached.

    The container insertion hierarchy (as of 2009-07-31).

  • image: Extra attributes for a node containing a single image. The attributes contain both the image and values for the <img> tag attributes. Image nodes may be inserted into htmlpages and nodelets using PM markup.

    Supertype: node

    • image_id: Primary key. Auto-incrementing image id.Only 23 rows so maybe this stores graphics used on the site?
    • src: URL for full image?
    • alt: text description for accessibility?
    • thumbsrc: URL for thumb image?
    • description: Full description, notes about reason/role of image?

  • string: Extra information for a node containing static HTML content only. This node is used in a variety of ways by the cabal groups. The SiteDocClan) uses it to store definitions of common PerlMonks terms. The pedagogues use it to store links to external tutorials. Could/has string be used to store boilerplate text for htmlpages and nodelets? Why a special table for strings? why not make them a subtype of document? performance? oversight?

    Supertype: node
    Subtypes:docstring,faqstring,tutstring,catqastring,linktype

    • string_id: node.node_id for the node that defines the string;
    • text: the text of the string.

  • rawdata: Extra information for a node storing data with an unspecified use. The rawdata display page merely inserts it between [%...%] implying that it can be used to store Perl code and/or literals. However, data from these nodes are not usually displayed directly. Rather the data is retrieved and processed by Perl scripts or Perl embedded into htmlpage or htmlcode nodes. For example, user image browser inserts the contents of this node within an <img> tag.

    • rawdata_id: Primary Key. node.node_id of the rawdata node
    • datatype: possible values?
    • databytes: the actual data
    • lastedit: audit trails

Themes

  • theme: This is a conceptual entity representing a collection of attributes that define the look and feel of the site. There is no dedicated database table for themes. Themes store their attributes in the database tables node and setting.

    If data alone is considered, themes are a subtype of setting. However, the node type heirarchy defined in nodetype classifies themes as a subtype of nodegroup. what are the implications, if any, of this discrepency?

  • themesetting: Extra attributes for a theme that inherits default attributes from another "parent" theme.

    Supertype: setting

    • themesetting_id: node.node_id for the node that defines the theme;
    • parent_theme: node.node_id of the theme that sets defaults for this theme.

Site help system

  • nodehelp: this only has one row. Was it ever used? How was it meant to differ from "hint"?
  • hint: PerlMonks site documentation??? Appears to be annotations describing key site compnonts. Also appears not to have been maintained? Could this be used to add annotations to nodes defining nodetypes and dbtables? Or was it intended as a tooltip (e.g. height, width attributes)?

    Supertype: document

    • hint_id: node.node_id of a hint.
    • height: ???
    • width: ???
    • explains_node: node.node_id of node explained by hint.

  • perlfunc: local PerlMonks documentation for a Perl (pre 5.6) function. Now obsolete - current documentation is at http://perldoc.perl.org.
  • perlman: A conceptual entity storing local PerlMonks documentation for perl syntax (pre 5.6). Now cbsolete - current documentation is at http://perldoc.perl.org. It has no attributes of its own - all of its attributes may be found in the dbtables used by document.

    Supertype: document

Group editing with revision history

Nodes are normally edited only by the user or user group that created them. However, PerlMonks supports three modes of group editing: group owned nodes, moderated nodes and wiki nodes.

The following types of nodes are all group edited. Unless noted otherwise, they all track edit history.

Group edited nodes need special tables to track who can edit and the history of edits performed so far:

  • contributor: extra attributes for moderated nodes. Moderated nodes are nodes where the owner of the node differs from the author of the node. This permits a node to have two editors: the original author and members of the owning group.

    Supertype: node
    Subtypes:categorized question, categorized answer

    • contributor_id: Primary key. Links to node.node_id of moderated node.
    • original_author: the node.node_id of the user who created the node and is the primary author of its content.

  • wiki: Extra attributes for wiki nodes. Wiki nodes can specify separate reader and writer groups.

    Supertype: document

    • wiki_id - node.node_id of the wiki node.
    • readers - user group's node id? other?
    • writers - user group's node id? other?

  • edithistory: dbtable storing a history of changes to nodes that may be edited by a group rather than an individual. Edits to the site documentation and wikis described above always create entries in this table. Edits to any other node create entries in this table only if edited by a janitor using the node editors page.

    • row_id: primary key
    • starttext: Is this the text or the position?
    • endtext: Is this the text or the position?
    • fieldname: name of edited dbfield?
    • editor_user: the node.node_id of the user who made the edit.
    • edithistory_id: ???
    • edittime: the date and time of the edit.
    • private: ???

Site management

Database management

  • dbtable: conceptual entity storing the definition of a database table. This entity has no attributes of its own. Rather it uses the attributes defined for node.

    Supertype: node

  • dbcolumn stores information about each column in a database table.
  • maintenance: Extra attributes for a node defining a Perl script that creates or deletes records the database backing the PM website.

    Supertype: htmlcode

    • maintence_id: Primary key. The node's node.node_id
    • maintence_nodetype: the node type or datbase table created or deleted by this script
    • maintence_type: the maintence action

  • sqlquery: A node that defines an SQL query. It has no attributes of its own beyond those provided by the htmlcode, document, and setting database tables.

    Supertype: htmlcode, document, and setting

Codebase management

Aside from patch, the tools for managing site enhancements seem little used. How do we manage bug reports in PM? enhancements? It seems like the process is very ad-hoc and informal. Would we benefit from something more formal? (i.e. would it enable us to bring on/manage more volunteers?) If so, would we want to build on what is already in PM? Integrate PM with bugzilla? other?

  • devtask: Extra attributes for a node that defines an enhancement to the site. this doesn't seem to be currently in use. why?

    Supertype: document

    • devtask_id: node.node_id of a site enhancement.
    • status: possible values???
    • priority: possible values???
    • lead_user: who is lead? author? implementer?

  • bug: Extra attributes for a node defining a bug report. All the existing records appear to be blank aside from a joke entry and some stray very old (2000) bugs that had no assignment nor followup.deadwood?

    Supertype: node

  • patch: Extra attributes for a node defining a proposed patch to the look and feel of the PerlMonks website or one of its themes.

    Supertype: htmlcode

    • patch_id: node.node_id for the node that defines the patch.
    • for_node: node.node_id of the htmlpage, nodelet, or htmlcode to be patched.
    • field: the name of the field that will be updated by the patch, e.g. 'code' for htmlcode nodes, 'page' for htmlpage nodes, and 'vars' for settings nodes.
    • reason:reason for patch
    • applied:timestamp wehn patch was applied.
    • applied_by:node.node_id of user who applied the patch

  • nodepin

    <TODO>

Node approval process

There are a number of different tables here, but the tables seem to overlap in function. Which of these tables are actually in use at this point?

  • approved: stores a user's approval of a node action, e.g. keep, edit, reap. <scan class="highlight">does this table have a primary key? All I see is a non-unique index. Shouldn't node_id+user_id+timestamp be unique? (on the off chance that a node is consider by a user, unconsidered by a janitor and then reconsidered by the same user?
    • user_id: node.node_id of user who approved node_id.
    • node_id: node.node_id of node that was approved.
    • action: code identifying what action was approved.
    • tstamp: timestamp of user's approval

  • considernodes: dbtable storing the description of a janitorial action request (a.k.a. consideration)

    • considernodes_id: id of a janitorial action request. is this the node or a separate id (there is no node id on the consideration!)?
    • del: tally of reap votes
    • keep: tally of keep votes
    • edit: tally of edit votes
    • considertime:datetime when consideration was requested
    • description:reason for consideration

  • considervote: dbtable storing a user's vote on a janitorial action request (a.k.a. consideration)

    • considervote_id: part 1 of two field primary key, links to considernodes.considernodes_id.
    • voter_user: part 2 of two field primary key, links to node.node_id of user who voted
    • voted: vote of user: one of the following: 'edit', 'keep', 'delete'

  • editorvote: stores each janitor's final decision about a request for janitorial action.

    • editorvote_id: part 1 of 3 field primary key, links to considernodes.considernodes_id.
    • voter_user: part 2 of 3 field primary key, links to node.node_id of the janitor who voted
    • votetype: part 3 of a 3 field primary key, one of the following: 'edit', 'keep', 'delete'
    • weight: a weighting factor applied to this janitor's voteThis field appears to be ignored in editor_vote. Are there other places it is currently used?

  • reapednode: describes the reason and final vote tally for a reaped node. some of the data that Visit Reaped Nodes seems to be taking from "data" have their own fields (author_user, createtime, type_nodetype). Are these fields currently used or have they been deprecated/ignored in favor of the hash?

    • node_id: Primary Key. node.node_id of the reaped node
    • data: a hash storing the title, author, and text of the original reaped node.
    • author_user: ???
    • createtime: ???
    • reason: reason for reaping - taken from the description on the janitorial action request (considernodes)
    • del: delete tally - taken from the janitorial action request (considernodes)
    • keep: keep tally - taken from the janitorial action request (considernodes)
    • edit: edit tally - taken from the janitorial action request (considernodes)
    • type_nodetype: ???
    • reputation: the node reputation at the time of reaping.

  • request - a document node that stores an ad-hoc janitorial request. This can be used for requests that are too complex to be handled via the consideration mechanism and for requests by monks who do not yet have consideration powers. Requests have no attributes of their own. They use the same tables as documents to store their attributes.

    Supertype: document

  • approval: dbtable storing who and what approved a node. the approval nodelet uses "approved". Is this table in use? kept for historic reasons? If historic, why wasn't data converted over? Is there interest in doing it in the future?
  • approvalstatus: this has very few rows (38) and much the same data as approval and approved. Is it in use?
  • approvalhistory: this has very few rows (90) and much the same data as approval and approved. Is it in use?

Request management

Is cachedinfo currently in use? How? Or are we using another mechanism for caching site code?

  • cachedinfo: Stores pre-parsed and evaluated code stored in the node identified via "code_id"

Site statistics

Which of these stat tables are in use? dbstats+dbstattype+dbstatduration would be able to track all statistics but there are several other tables that capture specific stats with or without start time and duration. Which of these are in use if any? is dbstat in use?

  • dbstats: indivual stat keyed by stat type, start date, duration.
  • dbstattype: types of stats tracked by dbstats
  • dbstatduration: duration enum used by dbstats
  • version: tracks updates on cached nodes. See Everything/NodeCache.pm

  • HTTP_USER_AGENT: number of hits for each user agent type. Has no start date or duration(!)
  • referrer: number of hits and time of most recent hit for each referrer URL (limit of 128 characters). Has no start date or duration(!)
  • traffic_stats: number of hits per hour ???
  • hitsinfo: number of hits on a day ???
  • dailystatistics: total number of users and nodes hit. called "daily status" but has fields for 1,2,4 weeks?
  • stats: same as daily stats but adds total number of votes, ips, impressions, and xpsum

Site content

Simple content nodes

Simple content nodes inherit their data from document and store any extra attributes in a single database table named after the nodetype.

  • quest: Extra attributes for a node defining a time bound activity (has start and end time). The document text stores a description of the activity. what ever happened to quests? How are they used today (aside from polls)? Are the XP rewards still used (since XP was revised)?

    Supertype: document

  • sourcecode: Extra attributes for data added to the code catecombs.

    Supertype: document

  • Snippet:

    Supertype: node
    Subtypes:reaped_snippet

  • perlnews

    Supertype: document

  • review

    Supertype: document
    Subtypes:modulereview,bookreview

  • scratchpad: Extra attributes for a node that defines a user's personal scratchpad.

    Supertype: document

    • scratchpad_id - node.node_id of the scratchpad node.
    • foruser_id - node.node_id of the user who owns this scratch pad.does this always equal node author? if not, why?
    • private_text - scratchpad text that is visible only to the user foruser_id.

  • mail

    Supertype: document

Polls

<TODO>

Node stats

<TODO>

Search

<TODO>

User data

  • protouser:

    Does this store default user attributes? dead wood? - table has no rows and fields seem to identify per-user information which does not have any natural defaults. Or were these meant to be stored validation rules?

    Subtypes:zombie_user

  • newuser:

    what is the reason for a separate "newuser" table? Do new users have to be approved or verified (been a while :-))? If so what code defines the process?

    • title: does this become user.nick?
    • realname: does this become user.realname?
    • email: does this become user.email?
    • lastupdate: when/how is this field set, if ever after initial creation? Is this really a "create date"?
  • newuserchit:

    what is this? is it part of a scenario where email is sent to new user with digest; when email link is clicked through, account is activated and converted to user? If so, how is expiration time calculated?

    • email: is this the email from newuser.email?
    • digest: digset inserted into email link that new users must click? Is this really a create date?
    • lasttime: time when the digest was generated and new user email was sent?
  • newuserimage:

    is this deadwood (user images are now a level 5 privilege)? Or does it store a newly added user image (so that the image can be approved or history kept?) What was/is it used for? Also how does it join to user?

    • newuserimage_id: Primary Key.is this a user with a new image, e.g. user.user_id? Or is it an image id, e.g. image.image_id?

    • timestamp:is this the date/time when the user image was added/changed?

  • user:
    • user_id: Primary key. Auto-incrementing id that identifies a user.
    • nick: The public name of the user. Must be 26 characters or less.
    • passwd: The user's password. Visible only to the user. Modifiable by user via "Edit Profile".
    • realname: The real name of the user. Visible only to user. Modifiable by user via "Edit Profile".
    • email: The email address of the user. Visible only to the user. Modifiable by user via "Edit Profile".
    • lasttime: The date on which the user last visited. ???How is "visited" defined???
    • karma:???Experience points to date? Dead wood? Backwards compatibility???
    • experience:???Experience points to date???
    • votesleft:???Votes left today:???
    • votes: Total number of votes allowed per day.
    • voteavg:???Is this in use? Average number of votes used? Average of up and down votes???
    • imgsrc: URL for user image. Modifiable by user via "Edit Profile". Access/use of field is a level privelege.??? How is level privilege controlled? Integrity rule to prevent non-"" values? Refuse to render if set? Prevent access to edit???
    • lastupdate:??? Audit field? Set automatically anytime this db record changes? Or is this the last significant update???
    • givevotes: Y/N flag. ???How is this used? Flags accounts that can have votes(e.g. non-initiate)? Flags accounts that need a bump in number of votes???
    • user_scratchpad:???node id of user scratchpad???
    • secret:???Is this in use? Password question answer? Other???

User activity

<TODO>

PerlMonks Code Architecture

Defining a webpage

The web pages served up by the PerlMonks website are defined by nodes that store text with a combination of HTML and special perl monks markup. The PerlMonks markup supports both embedded perl code and inclusion of perl code defined in htmlcode nodes. There are several categories of nodes that define themselves using PerlMonks markup. You can click on the links in the list below to see all the nodes in each category.

  • superdoc - entry point for HTTP requests. Superdocs are responsible for generating their own content.
  • htmlpage - entry point for HTTP requests that display a selected node's content.
  • fullpage - seems to be mostly tickers. When would one use this rather than a superdoc or htmlpage? Is this a page with neither display type nor node id?
  • nodelet - defines a subregion of a page.
  • container - defines HTML for standard headers, footers, and sidebars for each page or subregion (nodelet)
  • script-gods only

PerlMonks markup

Perl code is embedded into the above nodes using special perl markup:

["perl string"]
evaluates "perl string" as a Perl interpolated string and inserts the result in place of the markup.
[% any perl code %]
evaluates the perl code between [% and %] ???and inserts the return value in place of the markup.???
[{ code_call }] and [{code_call:param1,param2,...}]
evaluates a script stored in an htmlcode nodes and inserts the return value in place of the markup.

Perl version and configuration

The PerlMonks website is currently using perl-5.8.3. Full details of the configuration of the PerlMonks version of Perl may be found in Config.pm. Have there been any changes to it that will be lost when/if Perl is regenerated? Are we running our own custom compiled version of Perl?

Permitted modules

Any module included in the 5.8.3 core may be used in embedded Perl code. In addition, any class or function defined in a PM module object may be used in code embedded in the PerlMonks website. A full list of the available modules may be found here.

The modules that are displayed in the above link are in fact the files that are being used by the PerlMonks website engine. pmmodule nodes are displayed using the pmmodule display page htmlpage node. That node calls handle_pmmodule_display which calls readPmModule which searches for the module path in %INC and loads the module's text via do. Since we are reading in the module via do, we must be able to safely reload all of the pmmodule files. Has this constraint ever caused problems? Is this a bug waiting to happen if we should wish to move some of the code in htmlcode fragments into pmmodule file subroutines? Might it be better to read the file in line by line like a normal text file?

These modules come from a variety of sources: CPAN, the original everything engine, and the PerlMonks website development effort. (jdporter: Everything/HTML.pm is particularly important).

Data and Perl code stored in the database

The htmlcode node type has many subclasses. Nodes from any of the following categories may be used with this markup - to see a full list of nodes in each category, click on each link:

In addition to htmlcode nodes, the following node types can also embed code or data into a web page definition:

  • css - defines css styles used by HTML embedded in htmlpages, nodelets, and containers
  • data - text files with one row of data per line.
  • image - images for embedding in text.
  • rawdata - can be used to store pure perl code and/or byte data for images. Embedded Perl code can retrieves this data and insert it into generated HTML and PM markup. At present, it is most often used to store user images.

what are these used for: file (0 nodes), includefile (1 node, viewable only by gods), rawpage (6 nodes, mostly empty or with what looks like garbage)

Database engine

The PerlMonks website stores data in a variety of ways, not all of it accessible to a relational database engine. For example, significant amounts of metadata are stored as rows in database tables rather than in system catalogs. Also, some fields contain denormalized data and repeating groups that are difficult for SQL engines to work with. Additionally, at the time that the engine was originally designed, open source relational databases lacked built-in support for many features needed by the PerlMonks data model: row level locking and transaction management, to name two.

As a result of these factors PerlMonks data management is split between the built-in features of the database and wrapper modules that supplement or provide an alternative implementation of the built-in database management support. This custom database management can be compiled separately from the PerlMonks webserver by placing the following files in the @INC path:

  • Everything.pm: provides a routine to initialize the database connection, defines some convenience functions, and a global variable storing the database connection.
  • Everything/DBI.pm: provides a method to make the database connection and wraps the Everything.pm and Everything/Nodebase.pm query methods in logging code. Part of the logging code's responsibility is to track query statistics. This is done with the help of the Everything/DbStats.pm module.
  • Everything/Password.pm: required by Everything/DBI.pm when making the database connection.
  • Everything/DbStats.pm: database functions that retrieve and update query statistics. These are called by the Everything/DBI.pm wrappers.
  • DBI: CPAN module defining the interface to the database. This module isn't in the list of pmmodules. Why?

  • Everything/NodeBase.pm: Defines the Everything::Nodebase class. This class objects representing data connections and provides CRUD and metadata access to the database. It also manages caching of database nodes with the help of Everything/NodeCache.pm and Everything/CacheQueue.pm
  • Everything/Experience.pm - additional database functions to get or change node reputation, a user's level, a user's experience points, their votes on a node, or the number of unused votes. Reputation and experience points play a role in determining rights to create, update, and delete certain kinds of data or belong to certain user groups. They are an essential part of the database engine.
  • Everything/NodeCache.pm: defines the Everything::NodeCache class. This class holds nodes that have already been retrieved from the database.
  • Everything/CacheQueue.pm: defines the node queue used by Everything/NodeCache.pm. This module is a general utility and not specifically related to database processing. It is included in this list because the PerlMonks database engine cannot be compiled unless it is in the @INC path.

The following modules are sometimes used by the database engine but are required at runtime by certain branches of the code. They are only needed if those code branches are traversed and do not affect compilation:

  • Everything/HTML.pm - this is used in two places: data integrity enforcement and permissions management (see isApproved(). Before creating or deleting nodes, the database engine looks for integrity rules stored in maint nodes. The code in those nodes is executed by passing it through Everything::HTML::embedCode(...).

    Permissions management uses Everything/HTML.pm in two different places. First the current user and cached approval information is stored in Everything::HTML::HTMLVARS. Second, Everything::HTML is pre-required before loading and running access rule code as a convenience to access rule authors. Although, the access rule code itself is executed directly without passing it though embedCode, the methods in Everything/HTML.pm are frequently used by the access rule code, so this module is "required" before calling that code so that access rules don't have to do it themselves. A list of access rules may be found here. is this really necessary? why? seems like a violation of separation of concerns? Access rule code doesn't get passed through embedCode so why does maintenance node code?

  • Everything/CGI.pm - required when Everything/HTML.pm is loaded.
  • CGI.pm - required by Everything/HTML.pm.

Database engine questions and food for thought

  • what would be the performance benefit of splitting out the database management code and placing it on the same machine as the PM database (i.e. using it as a front end to database connections). Would that create more or less of a bottleneck? Would it improve security? What about with a persistant connection between the db server and webservers? We have to go to the server anyway to check to see if a node is dirty, so what are we gaining by having separate node caches on each Apache server machine? Or does splitting things up just make everything too complex to be worth the performance benefit, if any
  • Is there a way to support two way communication between dbserver and webservers? What if the dbserver kept track of which server was caching which node and broadcast the fact that a node was dirty. Is there a lightweight way to do this? The actual number of nodes changing is a relatively small portion of the whole(nodes voted on, nodes edited, users involved in voting).
  • Wikipedia/mediawiki uses a master/slave database arrangement. Could we benefit from something like that? Is there existing Perl code that does it? Could we port the MediaWiki code?
  • What is the current use of dbstats? If for site analytics, why are we doing our own? Any thought about integrating more complete third party tools? If for optimization, would it make sense to move this to the node table or version table? How many joins/queries would it save?
  • Does caching support invalidation of selected tables? (I'm guessing not since node retrieval is either node table only or all tables). Would we benefit by an implementation that could invalidate only part of a table? What would that mean? Would there be situations where integrity rules spanned more than one table? Should there be? (i.e. does that indicate a design problem?)
  • A lot of data is stored in setting records. Could we reduce the number of joins by moving this data into table fields?

SQL and persistance engines

The Perl Monks website currently uses mysql 5.0.45-log. Table types are MyISAM by default. For details about the configuration, you can run the query show sql variables.

There is also an unofficial port of the PM website to SQLite written by Corion.

Database connection management

$DB
Declared in Everything.pm. Global variable providing database access. The variable stores a reference to a Everything::NodeBase object created by initEverything.
initEverything($sDb, $bUseStaticNodetypes)
Declared in Everything.pm. The parameters to this function are passed without modification directly to Everything::Nodebase->new(...). What calls this function?
$sClass->new($sDb, $bUseStaticNodetypes)
Declared in Everything/Nodebase.pm. Creates a database connection to a database named $sDb. $sClass must be Everything::Nodetype or one of its subclasses. $bUseStaticNodetypes is a flag that determines caching behavior. If true, nodetypes will be cached the first time they are retrieved and changes to nodetypes will not take effect until the server is restarted. If false, changes to nodetypes take effect immediately.
$DB->getDatabaseHandle()
Defined in Everything/NodeBase.pm. Provides access to raw uncached database connection.
$dbh (deprecated)
Defined in Everything.pm. Global variable equivalent to $DB->getDatabaseHandle()

Nodetype management

Nodetypes supplement the metadata stored in the DBMS's data catalog with metadata for specific rows of nodes. The DBMS for which the PerlMonks database engine was originally designed did not provide all of the support needed. One needs to be aware of this history because some of the information stored in nodetype records would normally be stored directly in a modern database system.

Nodetypes are just regular nodes so most (but not all) the node management routines are used with nodetypes. The methods included in this section are particular to nodetypes and do not apply to general purpose node management.

$DB->getType($idOrName)
Defined in Everything/NodeBase.pm. Returns a hash containing all the attributes of a node type.
$DB->getAllTypes()
Defined in Everything/NodeBase.pm. Returns an array of hashes, one for each nodetype defined in the system. The individual hashes are returned by getType()
$DB->deriveType($hType)
Defined in Everything/NodeBase.pm. Calculate default values for nodetype attributes using inheritance. Called automatically by getType
$DB->getNodetypeTables($hType)
Defined in Everything/NodeBase.pm. Cleans up list of a nodetype's tables. Called automatically by deriveType
$DB->isGroup($hType)
Defined in Everything/NodeBase.pm. Returns true if $hType has a group table. Is it always true that a nodetype that is a subclass of NodeGroup (name?) also has a group table? How is this enforced? By virtue of inheritance? And if something has a group table how do we make sure that it also a subtype of NodeGroup (and do we even care)?

Permissions management

$DB->isApproved($hUser,$hGroupOrAccessRule,$hNode, $bIgnoreGod)
Defined in Everything/NodeBase.pm. Returns true if user $hUser can access node $hNode for the purposes defined in access rule $hGroupOrAccessRule. Normally, gods are authorized automatically, but if $bIgnoreGod is true, then user $hUser's access will be assessed based on their non-godly group memberships.
$DB->getLevel($xUserHashOrId)
Defined in Everything/Experience.pm
$DB->isGod($hUser, $hType)
Defined in Everything/NodeBase.pm. Returns true if user is in the gods group.
$DB->canCreateNode($hUser, $hType)
Defined in Everything/NodeBase.pm. Returns true if user $hUser can create a node of type $hType
$DB->canDeleteNode($hUser, $hType)
Defined in Everything/NodeBase.pm. Returns true if user$hUser can delete a node of type $hType
$DB->canUpdateNode($hUser, $hType)
Defined in Everything/NodeBase.pm. Returns true if user $hUser can update a node of type $hType
$DB->canReadNode($hUser, $hType)
Defined in Everything/NodeBase.pm. Returns true if user $hUser can read a node of type $hType

Node management

This section contains methods used to retrieve and generate data for node records. Special methods are needed because of the complex way node data is stored. Node attributes may be split among several different tables with 1-to-1 relationships to the node table. Additionally, some nodes are collections of nodes, or even deeply nested node hierarchy.

what about nodetype additions, updates and deletes? It seems that the general purpose node insert,update,and nuke routines are being used for nodetypes (needs to be verified by looking at the htmlpage/maintenance/htmlcode nodes for nodetype). If so, how are the differences between non node-type and node-type nodes being accounted for?

  • If we really do need to get all data from all 1-to-1 records associated with a node, would it speed things up or improve our ability to manage dev permissions if we generated SQL views that joined all 1-to-1 tables when we created each nodetype? MySQL view optimization was pretty underdeveloped in 2007, but other DBMS do a better job and even mySQL may have improved by now. Note: most of the problems seem related to the point where temporary tables kick in (since one can't use indices after that point - EXPLAIN can be used to see where that is happening for a particular view or query.
  • The $hData parameter quotes values unless less the key is preceded by '-'. Why is this necessary? Couldn't the data dictionary be responsible for deciding if a raw value needed to be quoted? And if so, wouldn't a data dictionary quoting process be more reliable?
  • SQL statements are prepared and then executed even when it would make sense to batch them? DBI doesn't seem to support this but mySQL does - perhaps we add such a batch process to the Everything::Nodebase? or Everything::DBI?
Retrieve and generate node data (public methods)

Although you can use these node retrieval routines with nodetype nodes, it isn't recommended because non- nodetype and nodetype nodes need to derive different data for each node. If you use one of the node retrieval routines to retrieve a node type you will get only part of the node information needed. To get the remainder, one would also need to pass the node id or hash to getType()

$DB->selectNodeWhere($hWhere, $hType, $orderby,$bNodeTableOnly)
Defined in Everything/NodeBase.pm Returns a reference to an array containing all node ids that match the criteria stored in $hWhere. For an explanation of the parameters see getNodeCursor
$DB->getNodeWhere($hWhere,$hType,$orderby)
Defined in Everything/NodeBase.pm Returns a reference to an array containing fully populated node hashes for all the nodes that match the criteria stored in $hWhere. For an explanation of the parameters see getNodeCursor
$DB->getNode($sTitle,$xNodetypeTitleOrHash)
Defined in Everything/NodeBase.pm. Retrieves a node hash by title+nodetype or by node_id. All data for the node will be retrieved.
$DB->getNodeById($id,$mode)
Defined in Everything/NodeBase.pm Returns the node hash for the node identified by $id. $mode may have one of the following values:
  • force: retrieves node from database even if node is already cached.
  • light: retrieves a reference to the node's nodetype hash.
  • "" or undef: adds extra table and a list of nodegroup members.
$DB->isNodetype($hNode)
Defined in Everything/NodeBase.pm. Returns true if $hNode belongs to the nodetype class.
Retrieve and generate node data (internal methods)
$DB->constructNode($hNode)
Defined in Everything/NodeBase.pm. Adds 1-to-1 table data and group members to a node hash that only has data from the Node table.
$DB->getRef($aIdOrHash)
Defined in Everything/NodeBase.pm Converts each member of $aIdOrHash from an id to a hash reference, if it is not already a hash reference.
$DB->getNodeCursor($hWhere,$hType,$orderby,$bNodeTableOnly)
Generates a query that joins all of the tables storing node attributes into a single view. $bNodeTableOnly is true if only the fields in the node table should be retrieved. If false or missing, then all fields in all tables with a 1-to-1, the relationship to the node will be returned. For an explanation of the remaining parameters, see genWhereString.
  • This method uses an all or nothing approach to node data retrieval. Would there be any benefit to allowing JIT retrieval from individual tables?
  • All 1-to-1 tables are joined via LEFT JOINS. Can we replace some or all of these with equijoins? Would it improve performance? If some tables are optional and some are required we could use a flag on the nodetype table and only use left joins for the optional tables. Or could we replace *all* joins with equijoins? Or would that cause problems when new tables are added for existing nodes (we'd need a routine to go and backfill records for existing nodes in the nodetype.
$DB->genWhereString($hWhere, [$hType], [$orderby])
Defined in Everything/Nodebase.pm Generates a where clause from a set of conditions defined in $hWhere. $hWhere is a hash reference whose keys are field names and whose values store a constraint on the value of that field. The generated where clause will "and" together each of the field constraints. This is not intended to be a general purpose where clause generator, so only a limited number of commonly needed conditions are supported.

If the key begins with '-', then the value is expected to be prefixed with some sort of comparison operator. The condition will be constructed by stripping the '-' from the key and the comparison operator from the beginning of the value:

  • '-X' => '! Y' becomes "X != Y"
  • '-X' => 'not ...' becomes
    "X not ..."
    . e.g. "X not in ...", "X not between ...", "X not like ...", "X not regexp ...".
  • '-X' => '= Y' becomes "X = Y"

If the key begins with anything else then the reference type of the value determines how the condition will be built:

  • when the value is an array reference, the contents are presumed to be node ids or things that can be converted easily into node ids using getId. These are combined into a set of "or"'d conditions. Would the query be faster if instead we generated "X IN (id1,id2,id3,...)" instead of a bunch of or statements that the optimizer may or may not realize constraint the same field?
  • when the value is a scalar, the contents are presumed to be a string that needs to be quoted with quote

The $hType parameters stores a hash containing the field values of a node type. If defined, a condition to limit the query to a particular nodetype will be added to the query. The $orderby parameter is the clause that follows an ORDERBY keyword. If this is defined, an ORDERBY clause will be appended to the end of the where clause.

  • I wonder if it would be a good idea to factor out the node specific from the non-node specific code. Most of this method is pretty generic - only the treatment of array reference values is specific to node management.</li
  • In its current form genWhereString doesn't seem to be used more than once in the everything code. What about in htmlcode nodes and other forms of embedded Perl?
  • This method has side effects if $hWhere contains an array reference value: it will convert any hashes in that value to node ids!!! Where is this used and why hasn't it been a problem so far? Should it be changed (IMHO this side effect is asking for bugs since the programmer has to be more aware of how the data in $hWhere will be used before and after. On the other hand, the database engine libraries seem to be written to treat ids and hashes as equivalent, converting back and forth as needed, so maybe this really isn't an issue after all.)
$DB->loadGroupNodeIDs($hNode)
$DB->selectNodeGroupFlat($hNode)
Defined in Everything/NodeBase.pm. Calls <a href="flattenNodeGroup> flattenNodegroup($hNode,{},[])
$DB->flattenNodeGroup($hNode, $seen,$array)
Defined in Everything/NodeBase.pm. Returns an array reference storing all leaf nodes of the group member tree (or subtree) of which $hNode is the root Any reason not to use a visitor pattern here? Actions that require traversal of a group tree would be less memory and processor intensive if they used a visitor pattern because they can stop traversal as soon as the answer is found. Also if a subroutine reference is used as the visitor we can easily support situations where the whole list all at once really is needed.</scan>
getId($aIdOrNodeHash)
Defined in Everything/Nodebase.pm Scans $aIdOrNodeHash for any elements that are hashes and replaces them with the node id. The hashes are presumed to be node hashes.
$DB->getCache()
returns the node cache. The node cache stores nodes that have been retrieved from the database and remain unchanged. The cache's contents are managed by the other node retrieval functions. Access to the cache is provided primarily for monitoring purposes.
Node reputation management (public methods)

The actions a user can take depend on the user's number of experience points. Experience points are made or lost by three kinds of actions: voting on nodes, writing nodes (which other users vote on), and simply showing up and reading nodes. See Voting/Experience System. Thus reputation management system and its database support is a core function of the database engine and its permissions management.

$DB->hasVoted($xNodeHashOrId,$xUserHashOrId)
Defined in Everything/Experience.pm. Returns a hash containing the vote a particular user has made on a particular node. For more information about data stored for each vote, see the database table for vote
$DB->getVotes($xNodeHashOrId,$xUserHashOrId)
Defined in Everything/Experience.pm. Returns an array reference containing information on each up or down vote made on a node. Each array element stores a hash reference describing the user, ip address, and direction and weight of the vote. If $xUserHashOrId is defined, the query will only return the votes for that user.
$DB->allocateVote($xUserHashOrId,$iVoteCount)
Defined in Everything/Experience.pm. Updates the unused vote code of user $xUserHashOrId so that it equals $iVoteCount. Votes are allocated by daily rounds of the votebot.
$DB->castVote($xNodeHashOrId,$xUserHashOrId,$weight,$ip)
Defined in Everything/Experience.pm. Decrements the unused vote count of user $xUserHashOrId and adds (or subtracts) one from the vote count of the node $xNodeHashOrId.
Node reputation management (internal methods)
$DB->insertVote($xNodeHashOrId,$xUserHashOrId,$weight,$ip)
Defined in Everything/Experience.pm. Called by castVote to do the actual creation of a vote record.
$DB->adjustRep($xNodeHashOrId,$iRepChange)
Defined in Everything/Experience.pm. Called by castVote to adjust the vote count for the node. Each node keeps an accumulated total of votes so that the individual votes don't have to be counted each time we want to display the total vote count.
$DB->adjustExp($xUserHashOrId,$iRepChange)
Defined in Everything/Experience.pm. Called by castVote to adjust the experience level of the node's author and of the voter $xUserHashOrId.
Create, modify, or delete nodes (public methods)
$DB->replaceNode($hNode,$hUser,$hData)
Defined in Everything/NodeBase.pm. Inserts $hNode into the database if it is a new node and updates it if it is an existing node. Why do we need the "-key" form in data? do we use it? can't we always put the raw value in $hData and use the SQL schema to determine whether or not to pass the value through quote?
replaceNodegroup($xNode,$xMember,$xUser)
Defined in Everything/NodeBase.pm. Generates and executes an SQL query to replace all the nodes in $xMembers into the group table of node $xNode. The replace is done by first deleting and then reinserting the nodes. The parameters are the same as insertIntoNodegroup
$DB->nukeNode($hNode,$hUser,$bNoTomb)
Defined in Everything/NodeBase.pm. Performs a cascading delete on a node object. It deletes records for a node from the node,links,nodegroup, and considernodes as well as any tables associated with the nodes nodetype (attribute tables, group member table)
  • What would be the benefit of moving this to the database via declared FK relationships and stored procedures?
  • What are tomb's used for? or is this method always called with $bNoTomb=1?
  • Is this node list complete or are there other tables that also need to be cleansed if we delete a node? (check data model for possibilities).
Create, modify, or delete nodes (internal methods)
insertNode($hNode,$hUser,$hData)
Defined in Everything/NodeBase.pm. Generates and executes an SQL query to insert a node. This method will fail if the node exists.
updateNode($hNode,$hUser)
Defined in Everything/NodeBase.pm. Generates and executes an SQL query to update a node. This method will fail if the node does not exist.
insertIntoNodegroup($xNode,$xUser,$xMembers,$orderby)
Defined in Everything/NodeBase.pm. Generates and executes an SQL query to insert all the nodes in $xMembers into the group table of node $xNode. $xNode is the id or hash for group. $xMembers may be a single node id, a node hash or an array of node hashes and ids. $xUser is the id or hash for the user trying to do the insert. $orderby is used to position the new nodes within a group of existing nodes.How does this method work when multiple nodes change the value of their orderby field? There seems to be a bug in the current implementation when more than one node changes its orderby field value, but I don't remember the details of what happened (though I think a node disappeared!)
removeFromNodegroup($xNode,$xMember,$xUser)
Defined in Everything/NodeBase.pm. Generates and executes an SQL query to remove $xMembers from the group table of node $xNode. $xNode and $xUser are defined as for insertIntoNodegroup. $xMember is id or hash of the group member to be removed.
$DB->nodeMaintenance($nodeIdOrHash, $op)
Defined in Everything/NodeBase.pm. Retrieves and executes content of a "maintenence" type node. The code is retrieved by calling <a href="#getMaintenanceCode> getMaintenanceCode. It is executed by calling Everything::HTML::embedCode. For an explanation of parameters, see getMaintenanceCode. What do we gain by passing this to embed code rather than executing it directly? Do maintenance nodes embed calls to htmlcode nodes? Or use any feature that would require using embedCode? Could we make things more efficient by moving all of the maintenance code to modules? If we want to segregate custom nodetypes from core engine nodetypes, do we really need a database to do it? Why not just load and write files from a special reserve place in the file system, ideally connected to a VCS?
$DB->getMaintenanceCode($nodeIdOrHash, $op)
Defined in Everything/NodeBase.pm. Retrieves the content of a "maintenance" type node storing the Perl code that must be run before executing database operation $op for nodes belonging to the same type as $nodeIdOrHash. $op can be "create", "update", or "delete" though in practice only the "create" and "delete" actions are currently used. (The code that triggers a call for the "update" operation has been commented out.)
$DB->tombstoneNode($hNode,$hUser)
Defined in Everything/NodeBase.pm. Takes a Data::Dumper snapshot of the node and saves it to a table (tomb). This method is called by nuke only when its $bNoTomb method to true. Any other uses? How often is nuke called with $bNoTomb set to true?

Catalog (metadata) queries

In the database engine, catalog queries are called directly only by the nodetype management routines. Node management routines interact with the data dictionary via their node type. None the less, these methods are split out into a separate group of functions because they are written in a very general way that could be used with any mySQL database. Their definition is not specific to nodetypes.

  • Is there an equivalent to ALTER TABLE somewhere or is the only way to alter a field is to drop it or go directly into mysql.
  • Why aren't we using a more full featured 3rd party web front end to the database? (e.g. phpAdmin, or some Perl equivalent)? Integration with nodebase integrity constraints? If so, are we really using them? (a) integrity constraints only partially implemented, e.g. there is a check on drop table but not drop fields. (b) no alter table (that I can see), so in order to change a column w/o losing data we need to go into mysql directly. But if we are doing that anyway we are bypassing any integrity constraints that exist.
$DB->tableExists($table)
Defined in Everything/Nodebase.pm. Returns true if $table exists in $DB and false otherwise.
.
$DB->createNodeTable($table)
Defined in Everything/Nodebase.pm. Creates a skeletal node table with a primary key named "${table}_id".
$DB->dropNodeTable($table)
Defined in Everything/Nodebase.pm. Removes a node table and all of its data from the database. Certain tables are hard coded so that they cannot be dropped. Is this list complete? Does it need to be updated or has it been continuously maintained?
.
$DB->addFieldToTable($table,$field,$type,$bPrimary,$default)
Defined in Everything/Nodebase.pm. Adds a field to a database table
$DB->dropFieldFromTable($table,$field)
Defined in Everything/Nodebase.pm. Drops a field from a table. <scan class="highlight">Any field can be dropped. No integrity checks here. Oversight? Reason?
$DB->getFields($table,$mode)
Defined in Everything/Nodebase.pm. Returns an array with one element per field in table $table. $mode determines what information will be retrieved for each field. If 0, only the field name will be retrieved. If 1, a hash reference containing all of the SQL catalog data for that field will be returned. Defaults to 1.

$DB->getFields($table)
Defined in Everything/Nodebase.pm. Returns an array whose elements are the names of the fields in $table. This is equivalent to calling $DB->getFieldHash($table,0).

SQL command generation

These methos are used by both node and nodetype management routines. sqlquery nodes also contain code that call these methods.

  • Everything engine's SQL execution methods are designed for single table actions. Any thought to extend this to support SQL actions on entire views and/or standadard Nodebase transactions?
  • Error handling is fairly binary in this module (return 0 or 1) and doesn't distinguish between reasons for failure. Is there any interest in making error handling a bit more refined or in using exceptions? This could have two possible benefits (a) the ability to move permissions into database (b) the ability to define error handling separate from database management. If the db is handling permissions we need to know why something failed, not just that it did fail. Also if we generate errors with reasons rather than just return 1 or 0, we could more easily enhance the development environment with customization of error handling look-and-feel or debugging level. This would let pmdevs choose a debugging level or error output suitable to their current work effort.
$DB->sqlSelectMany($select,$from,$where,$other)
Defined in Everything/Nodebase.pm. Builds, prepares and executes the query. $select is the clause after the SELECT keyword, $from is the clause after the FROM keyword. $where is the clause after the WHERE keyword and $other contains any other clauses such as GROUPBY and LIMIT clauses. If this method is successful it returns the populated query cursor. If it fails to execute it returns undef.
$DB->sqlSelect($select,$from,$where,$other)
Defined in Everything/Nodebase.pm. Returns a single query result as either a scalar or an array. $select is the clause after the SELECT keyword, $from is the clause after the FROM keyword. $where is the clause after the WHERE keyword and $other contains any other clauses such as GROUPBY and LIMIT clauses. mysql now supports limit clauses. Wouldn't it be more efficient to rewrite this to use a "limit 1" clause rather than fetch all matching rows? Or would this be a problem because callers are already setting $other to a limit clause?

If one field is selected by the query, then the field value from the first record found will be returned. If multiple fields are selected, an array containing the field values in the order they appear in $select. If the query failes to execute or finds no matching rows, this method returns undef. Just out of curiosity, why did this method use number of fields in the select statement rather than wantarray?

$DB->sqlSelect($select,$from,$where,$other)
Defined in Everything/Nodebase.pm. Returns a single query result as a hashref whose keys are field names and whose values are the values assigned to those fields. If the query fails to execute or executes but finds no rows, this method returns undef.
$DB->sqlDelete($from,$where)
Defined in Everything/Nodebase.pm. Generates, prepares, and executes an SQL DELETE command. $from is the clause after the FROM keyword. $whereis the clause after the WHERE keyword. Returns 1 if successful, even if no rows were deleted. Returns 0 if query failed to execute.
$DB->sqlUpdate($table,$hData,$where)
Defined in Everything/Nodebase.pm. Generates, prepares, and executes an SQL UPDATE command. $table is the clause after the UPDATE keyword $hData is a hash reference whose keys are field names and whose values are the value that should be assigned to each field. If the field name begins with a '-', the '-' will be stripped and the value will be used as is, e.g. -K=>V will be added to the SET clause as "K = V". Otherwise the key will be used as is and the value will be quoted using the quote()</c> method. $where is the clause after the WHERE keyword.

Returns 1 if successful, even if no rows were deleted. Returns 0 if query failed to execute. On failure a message will be printed out via Everything::printErr().

$DB->sqlInsert($table,$hData,$where)
Defined in Everything/Nodebase.pm. Generates, prepares, and executes an SQL INSERT command. $table is the clause after the INSERT keyword. Otherwise it behaves exactly like sqlUpdate.
$DB->quote($s)
Defined in Everything/Nodebase.pm. quotes a string value in accordance with ISO SQL2. Single quotes (') will be doubled and the entire string will be surrounded with single quotes.

XML generation

XML::Fling is the preferred way to generate XML feeds. XML::Fling define both an OO and functional interface for generating and storing unvalidated unvalidated XML. an object that generates and stores unvalidated XML. It was developed as a faster alternative to XML::Generator and XML::Writer. The OOP implementation studiously avoids inheritance via @ISA. What was the reasoning behind that? Or does it merely reflect the time when the code was originally written, i.e. a time when there was still no consensus on how OOP should be implemented in Perl?</scan>

The functional interface is defined in XML/Fling.pm (via a list of exported subroutines), but the actual subroutine definitions are stored in XML/Fling/Export.pm. There are seven functions in the interface, each of which return a string containing the requested snippet of XML:

xml_header([ $encoding [, $version ]])
generates an XML instruction identify the XML version and encoding: <?xml version="$version" encoding="$encoding"?>
xml_start($tag,[$attr=>$val]...)
generates an opening $tag tag with the attribute value pairs $attr => $val. The values should be unencoded. They will be encoded using the same logic as xml_quote below.
xml_quote($attrval)
encodes $attrval so that it may be used as an attribute value. This function does not add the quotes. It merely replaces the characters <&"> with their equivalent HTML entity. One will rarely call this function directly as xml_start,xml_element and xml_empty all handle attribute value encoding on the caller's behalf.
xml_escape($dataval)
encodes $dataval so that it may be used as a data between open and close tags. It replaces the characters <&> with their equivalent HTML entity.
xml_end($tag)
generates a closing $tag tag
xml_element($tag,[$attr=>$val]...,[$data])
generates an entire $tag element: open tag, data, and closing tag.
xml_empty($tag, [$attr=>$val]...)
generates an empty $tag element, e.g. <$tag $attr => $val />

New objects created by XML::Fling->new($buffer) and will be blessed into one of three classes depending on the type of reference stored in $buffer:

  • XML::Fling::String if $buffer is a SCALAR reference.
  • XML::Fling::Handle if $buffer is a glob or IO::HANDLE
  • XML::Fling::Array if $buffer is an ARRAY reference.

All three types of fling objects have the following seven methods. These methods take the same parameters as the xml_XXX equivalent method above. However, rather than return an XML snippet, they append the generated XML to the buffer passed into the constructor. Each method returns $xml so that methods may be called like this: $xml->start('p')->escape('Hello World')->end('p')

  • $xml->header
  • $xml->start
  • $xml->quote
  • $xml->escape
  • $xml->end
  • $xml->element
  • $xml->empty

XML::Fling is used in the following nodes (links display code):

For additional nodes using XML::Fling click here

Mail system

  • Everything/MAIL.pm - defines routines for converting nodes to email messages and vice versa. is this currently in use? If so, where? New user confirmation? Other? Or is it an abandoned feature of the original Everything engine. If so, how do we handle emails today?

Performance considerations

Are race conditions, performance - still an issue? how does it come about? If resolved, how? See

Miscellaneous topics

URLs

  • Everything.pm
    escape($chr)
    converts $chr to a URL encoded character: %NN
    unescape($chr)
    converts an URL encoded character $chr (i.e. %NN) to an unencoded character.

HTML generation

  • Everything/CGI.pm - overrides and extends methods defined in CGI.pm.
  • Everything/FormMenu.pm: Everything::FormMenu class
  • Other

    • Everything.pm
    • Everything/Diff.pm - functions to compare two texts. why does the everything engine have its own set of Algorithm::Diff wrappers? These don't seem to be using any of the PM specific global variables. Convenience/syntactic sugar?
    • Everything/CacheQueue.pm: implements a cache optimized so that the most-frequently used items always end up at the end of the queue and the least-frequently used items end up at the head.
    • Everything/WebServerId.pm - stores values for three global variables: $VERSION, $short, $long. where are these used?

    THE END

    Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Domain Nodelet?
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this?Last hourOther CB clients
    Other Users?
    Others taking refuge in the Monastery: (8)
    As of 2024-04-23 09:57 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      No recent polls found