PmDev: Technical notes and documentation
- Data model
- Code base
Note 1:Subtypes noted only when subtypes change behavior without adding database tables.
Note 2:Supertypes chosen based on data inheritance.
Note 3:Highlighted items are things I'm not sure about.
Table groupings
- Basic site structure: node, setting,
document, note, nodegroup, links,
nodetype, htmlcode
- Look and feel:
rawpage, htmlpage, nodelet,
container, image, string, rawdata, themesetting
- Site help system: hint, nodehelp
, perlfunc
- Group editing: contributor, wiki, edithistory
- Database management: dbcolumn, maintenance
- Codebase management:
patch, devtask, bug, nodepin
- Node approval: approval, approved,
approvalstatus, approvalhistory,
considernodes, considervote,
editorvote, reapednode
- Request management: cachedinfo
- Site statistics:
dbstats, dbstattype, dbstatduration,
HTTP_USER_AGENT, referrer,
traffic_stats, hitsinfo, dailystatistics,
stats, version
- Site content: mail, scratchpad
sourcecode, perlnews, quest, review, snippet
- Poll data: polls, pollvote, pollpopularity, testpolls
- Node stats: picked_nodes, repliesinfo
- Search: keywords,searchwords, moduleindex, modulelist
- User data:
user, iplog,
newuser, newuserchit, newuserimage
- User activity:
tickerlog, chatter, message, vote
- Empty tables:
approves,cachedresults, protouser
- Defunct tables:
relatednodes,
search words,
largedoc (experimental, never integrated into site),
tomb (predecessor to reapednode),
notepointers (no longer used - so says tye, 2009-08-03 in cb)
General questions
- Ids on many nodes are marked "autoincrement" and yet the documentation implies that the value in that field is the same as the related "node". Is the auto_increment ever used? Or is the field always populated? If auto-increment is used when? If not, why marked auto-increment? Are there ever problems with the id overwriting the defined id? Or would that only happen if the node is saved with id=0 or null (as in mysql)?
- Nodetype and maintenance appear to be handling things that could be implemented with stored procedures and view based permissions. Also why is the db schema information reproduced in a table? - has any thought been given to moving some of that functionality into the database?
- The data model contains a lot of objects with attributes in multiple tables. Normally the integrity of these objects would be preserved by encapsulating updates in a transaction. Have node caching issues related to early versions of mySQL not supporting transactions and row level locking (see Re: Recent slowness and outage and following discussion) been resolved? If not, why not? If so, how?
- What happened to the "symlink" table in the Everything Bible?
Design philosophy
From a design point of view the database tables in the PerlMonks database can be divided into two groups: database tables storing nodes and database tables storing "other information".
Node tables
A node is anything that can be directly created, edited, or deleted by a human being and is not merely the history of a site process (e.g. node approvals or consideration). There is a wide range of information that fits that description and all of it is stored in the database: metadata and integrity rules, root nodes and replies on SOPW, the images displayed in the header, the code that runs each box in the sidebar of a PerlMonks page, the standard layout of each page (header, footer, sidebar on the right), and much more.
Each node has an entry in the node table. This entry assigns the node a node id and node type. It also tracks essential audit information about the node: the owner and the last update date, for example. For many nodes this is not enough information so each node may also link in a 1-to-1 fashion to several additional tables. These tables store extra attributes: document text, an ad-hoc list of settings, additional users who may edit the node, the position of the node within a hierarchy of nodes, and much more.
The set of tables to which a node links is determined by its node type. The node type determines all of the data, integrity constraints, processing and display requirements of the node. It also controls who may add, update, or delete the node and its attributes.
Though normal PerlMonks users will never see a page "list nodetypes", the definition of a node type is also considered a node since a human being must define its
rules. In fact it is the very first node created in the database (id=1) and its assigned node type itself. (If that sounds confusing, you might want to first twist your mind into a mobius strip and then reread the sentence). The available node types and their relationships one to another can be seen in Type Tree. You can use that page to click through to a detailed description of the node type and/or to list of nodes belonging to that node type.
Non-node tables
The second group of database tables, non-node tables, tend to store one of two types of data: (a) data about a node that have a 1-to-Many relationship to that node. (b) data about the history of a process. For example, when the text field of certain nodes contain links to other nodes, the PerlMonks parsing engine extracts those links and saves them in the links table.which nodes does this apply to? which text fields?. The entries in the links table don't count as nodes because they are never edited independently of the containing node's text field. Similarly edithistory records aren't nodes in their own right because they are the by-product of the node editing process and have no existence apart from the node and its editing process.
The records in non-node database tables don't have display pages. In some cases, you can view the database table contents using one of the queries in sqlquery.TODO: list sqlqueries for each non-node table.
- node: A unit of site content, code or meta data. The node stores attributes common to all units: a name, an id, audit trail information. In addition, for convenience, a few other attributes that are applicable most but not all units (e.g. a vote tally) are defined for each node.
Subtypes:dbtable, file, pmmodule
Subtypes with settings (see below for definition of settings): nodeball, theme
- node_id: Primary key. Auto-incrementing id that identifies a node.
- type_nodetype: the id of the node type, nodetype: nodetype.nodetype_id.
- title: The title of the node
- author_user: The author of the node.is this always the user who hit the create button? If not, when and how can it change?
- createtime: The time the node was created.???how is this set???
- nodeupdated:??? Is this the last update? Default is CURRENT_TIMESTAMP: is it reset after each update?
- hits:???is this in use? How is "hit" defined: number of times requested? number of distinct IPs requesting? other?
- reputation.??? is this the net number of up/down votes? If so, where are the separate up and down vote counts stored? Or is it calculated by comparing reputation to votes cast?
- votescast: The number of votes cast for the node.
- lockedby_user: The user who locked this node.??? Who can lock? Who can edit after a lock? Does the node have to be unlocked or can the user who locked it edit it without unlocking? How is "unlawful" locking prevented? How is the user identified? user.user_id? or by node.node_id of their user node???
- locktime: The time when the lock ws applied.
- core: ???
- package: ???
- postbonus: ???
- ucreatetime: ???
- node_iip: ???
- setting: Extra information for a node that stores a collection of settings.
Supertype: node
Subtypes: restricted_setting
- setting_id node.node_id for this node
- vars a URL encoded hash containing variables for this node.as in Perl code for a hash???
- document: Extra attributes for node that contains text. May be used either for site content or to define some part of the site's look and feel.
Supertype: node
Subtypes:
superdoc,restricted_superdoc, sillyscript,
pmdevsuperdoc,includefile
perlquestion,perlmeditation,perltutorial,
monkdiscuss,CUFP,poem,obfusticated,
offtopicroot
sitefaqlet, sitedoclet, alphafaqlet,
perlfaq_nodetype
testquestion,fullpage,perlexercise,
superquestion,superquestionarea,
strangedoc,strangenode,request,
testtype,pmdevtopic,sectioncontainer,
perlman,data,perlcraft
Subtypes with settings:
doclist,faqlist,tutlist,catqalist
- document_id: Primary key. Auto-incrementing id that uniquely identifies a document.
- doctext: text of document
- lastedit: last edit to document???
- links: Connects one node to another via a link embedded in a document node's text. stores links parsed from document text?
- from_node: node.node_id of node containing the link.
- to_node: node.node_id of link target.
- linktype:
???
- hits:
???
- food:
???
- note: Extra attibutes for a document that participates in a hierarchy of nodes. Both the order among siblings and the immediate parent are stored.
Supertype: node
Subtypes: perlsolution, categorized_question, categorized_answer,pmdevnote
- note_id - Primary Key. node.node_id of the node.
- parent_node - node.node_id of the parent node
- position - position relative to siblings - does not appear to be in use as of 2009-08-12 (researched by jdporter an ELISHEVA via code search/grep and examination of node's htmlpages)
- root_node - node.node_id of hierarchy root
- nodegroup (concept): This is a conceptual building block representing a collection of nodes. The members of the node are stored in a database table, known as the "group table". There is no database table corresponds (1to1) to this entity other than node (defined above).
Supertype: node
Subtypes:
nodegroup (not to be confused with the db table which stores the members of the group, not the group definition),nodeletgroup,
usergroup,sourcecodesection,QandASection
- nodegroup (db table): A table storing the members of
various groups. data structure implies that a node may belong to more than one group, i.e. it supports M-to-M relationships between nodes. Is this true?
- nodegroup_id: one field of a two field primary key. node.node_id of a group to which a node belongs.
- rank: second field of two field primary key. The position of this node relative to other nodes in this group.
- node_id: node.node_id of a node belonging to this group.
- orderby: Everything bible says used internally when nodes are reorderd. How???
- nodetype: Extra information for nodes that define a node "class". Nodes belonging to the same class can have their own rules for user permissions, an SQL table to store special attributes and a table to store group members. Nodetypes may also be arranged in a hierarchy by inheriting from other node types.
Supertype: node
- nodetype_id: ???
- readers_user: ???
- writers_user: ???
- deleters_user: ???
- restrict_nodetype: ???
- extends_nodetype the node id of the superclass node typelinks to node.node_id?
- restrictdupes: ???
- grouptable: ???
- updaters_user: ???
- htmlcode: Extra attributes for a node that contains a perl script. The Perl script must return a string containing content to be inserted. Htmlcode may be inserted into nodelets and htmlpages by using PM markup. When the nodelet or htmlpage code is parsed, the string returned by the Perl script will replace the link. Htmlcode nodes can be used to encapsulate HTML generation that is shared by two or more htmlpages or nodelets.
Supertype: node
Subtypes:opcode,accessrule
Subtypes with document and settings:
sqlquery
- htmlcode_id: Primary key. The node's node.node_id
- code: a Perl script (maximum length 64K) that returns a string.
The PerlMonks site's look and feel is defined using a combination of htmlpages,containers and a
variety of components that can be embedded within them. Htmlpages define how to render content for a particular display type. Containers define the HTML that surrounds that content.
These components may be either free standing or bundled together into a theme. Free standing components define site wide defaults. Themes define a set of htmlpages, containers and HTML/CSS default attributes that can override the site-wide defaults to give the website a particular look and feel.
Theme components
- rawpage: page delivered as is without any evaluationwhat are scripts used for? are they part of site maintenance? content? look and feel? what are rawpages used for (all the nodes have no content)?
Supertype: Document
Subtypes: css,Dynamic_CSS (0 nodes),script (gods only)
- rawpage_id: Primary key. The node.node_id of a raw page node.
- datatype. ???
- lastedit: ???
- htmlpage: Extra attributes for a node that
determines how to render a GET request based on the nodetype of the requested node and the value of the display type parameter. The content of the requested node is passed through an evaluation process before being rendered (evalCode, evalX, and finally eval). The name is somewhat of a misnomer because (a) the page may contain more than just HTML markup and (b) because the final output may be any mime type, not just HTML. How is the HTML page determined from node id and displaytype? Is it always nodetypename whitespace displaytype?
Supertype: Node
- htmlpage_id: Primary key. The node.node_id of the htmlpage node.
- pagetype_nodetype. The node.node_id of the node type which this html page is designed to render.
- displaytype: The value of the display type parameter which this html page is designed to render.
- page: the text to insert in the "parent_container". This text may be a mixture of plain text, static HTML, PM markup, and embedded Perl code. Must be < 64K. It is responsible for formatting the content of the selected node in a manner consistent with the selected code type. The resulting output should be suitable for insertion between whatever tags "parent_container" has placed around "CONTAINED_STUFF".
- parent_container: the container into which the contents of "page" should be rendered once they have been evaluated.
- ownedby_theme: the node.node_id of the theme to which this htmlpage belongs
- mimetype: the mime type of the generated page. If left blank, the mime type will default to HTML.
- nodelet: Extra attributes for a node that contains that generates refreshable content for insertion into a div, table row or other html tag defined by a container. This container is specified s part of the nodelet definition. Nodelets may be inserted into an htmlpage using PM markup. Can the content may be refreshed independently of the page?
Supertype: Node
- nodelet_id: Primary key. The node's node.node_id
- nltext: cached contents of "nlcode" after parsing. Must be < 64K.
- nlcode: A mix of HTML, PM markup and embedded Perl code similar to the content of a container, htmlpage, or htmlcode node. Must be < 64K.
- updateinterval: The refresh rate for regenerating the node content. Where possible the contents of "nltext" will be used in lieu of parsing "nlcode" afresh.
- parent_container: the container into which this nodelet's content will be inserted.
- nlgoto: ???
- lastupdate: Timestamp storing the most recent update of these attributes. This will be set whenever any (?) attribute is changed, including "nltext".
- container: Extra information for a node that defines instructions for wrapping raw content in HTML. Containers usually define a single tag and its contents, for example a table, table row, div, or even the entire content of an <html tag. Is there an easy way to tell where a particular container is used? The themes on PM seem to have few if any containers and yet the PM site has many, many containers. Are containers omitted from the themes because all defined themes share the same containers? Because containers need only be made part of a theme if the theme is exported and we don't have a reason to export themes? Other?
Supertype: Node
Subtypes: theme containerwhat ist this subtype used for?
- container_id: Primary key. Stores the node.node_id of a container node.
- context: instructions for wrapping raw content in HTML. The instructions may be a mixture of static HTML, PM markup, and embedded Perl code. The insertion point for the raw context is represented by the string "CONTAINED_STUFF". Text before and after this string is optional.
- parent_container: node.node_id of the container node where this container node is embedded. Container nodes may be nested within other container nodes.
When raw content is inserted into a container, the PM website engine checks to see if the container has a parent. If so, the container and its raw content become the "raw content" for the parent and are inserted in place of the "CONTAINED_STUFF" string. This process of insertion continues until a container that has no parent is reached.
The container insertion hierarchy (as of 2009-07-31).
- DOCTYPE + Html (<html>) containers
- Html (<html>) containers
- Div (<div>) containers
- Table (<table>) containers
- Table row (<tr>) containers
- Embedded Perl containers
- Empty containers (content=1:). these don't contain CONTAINED_STUFF - is it in use?
- image: Extra attributes for a node containing a single image. The attributes contain both the image and values for the <img> tag attributes. Image nodes may be inserted into htmlpages and nodelets using PM markup.
Supertype: node
- image_id: Primary key. Auto-incrementing image id.Only 23 rows so maybe this stores graphics used on the site?
- src: URL for full image?
- alt: text description for accessibility?
- thumbsrc: URL for thumb image?
- description: Full description, notes about reason/role of image?
- string: Extra information for a node containing static HTML content only. This node is used in a variety of ways by the cabal groups. The SiteDocClan) uses it to store definitions of common PerlMonks terms. The pedagogues use it to store links to external tutorials. Could/has string be used to store boilerplate text for htmlpages and nodelets? Why a special table for strings? why not make them a subtype of document? performance? oversight?
Supertype: node
Subtypes:docstring,faqstring,tutstring,catqastring,linktype
- string_id: node.node_id for the node that defines the string;
- text: the text of the string.
- rawdata: Extra information for a node storing data with an unspecified use. The rawdata display page merely inserts it between [%...%] implying that it can be used to store Perl code and/or literals. However,
data from these nodes are not usually displayed directly. Rather the data is retrieved and processed by Perl scripts or Perl embedded into htmlpage or htmlcode nodes. For example, user image browser inserts the contents of this node within an <img> tag.
- rawdata_id: Primary Key. node.node_id
of the rawdata node
- datatype: possible values?
- databytes: the actual data
- lastedit: audit trails
Themes
- theme: This is a conceptual entity representing a collection of attributes that define the look and feel of the site. There is no dedicated database table for themes. Themes store their attributes in the database tables node and setting.
If data alone is considered, themes are a subtype of setting. However, the node type heirarchy defined in nodetype classifies themes as a subtype of nodegroup. what are the implications, if any, of this discrepency?
- themesetting: Extra attributes for a theme that inherits default attributes from another "parent" theme.
Supertype: setting
- themesetting_id: node.node_id for the node that defines the theme;
- parent_theme: node.node_id of the theme that sets defaults for this theme.
- nodehelp: this only has one row. Was it ever used? How was it meant to differ from "hint"?
- hint: PerlMonks site documentation??? Appears to be annotations describing key site compnonts. Also appears not to have been maintained? Could this be used to add annotations to nodes defining nodetypes and dbtables? Or was it intended as a tooltip (e.g. height, width attributes)?
Supertype: document
- hint_id: node.node_id of a hint.
- height: ???
- width: ???
- explains_node: node.node_id of node explained by hint.
- perlfunc: local PerlMonks documentation for a Perl (pre 5.6) function. Now obsolete - current documentation is at http://perldoc.perl.org.
- perlman: A conceptual entity storing local PerlMonks documentation for perl syntax (pre 5.6). Now cbsolete - current documentation is at http://perldoc.perl.org. It has no attributes of its own - all of its attributes may be found in the dbtables used by document.
Supertype: document
Nodes are normally edited only by the user or user group that created them. However, PerlMonks supports three modes of group editing: group owned nodes, moderated nodes and wiki nodes.
The following types of nodes are all group edited. Unless noted otherwise, they all track edit history.
Group edited nodes need special tables to track who can edit and the history of edits performed so far:
Site management
Aside from patch, the tools for managing site enhancements seem little used. How do we manage bug reports in PM? enhancements? It seems like the process is very ad-hoc and informal. Would we benefit from something more formal? (i.e. would it enable us to bring on/manage more volunteers?) If so, would we want to build on what is already in PM? Integrate PM with bugzilla? other?
- devtask: Extra attributes for a node that defines an enhancement to the site. this doesn't seem to be currently in use. why?
Supertype: document
- devtask_id: node.node_id of a site enhancement.
- status: possible values???
- priority: possible values???
- lead_user: who is lead? author? implementer?
- bug: Extra attributes for a node defining a bug report. All the existing records appear to be blank aside from a joke entry and some stray very old (2000) bugs that had no assignment nor followup.deadwood?
Supertype: node
- patch: Extra attributes for a node defining a proposed patch to the look and feel of the PerlMonks website or one of its themes.
Supertype: htmlcode
- patch_id: node.node_id for the node that defines the patch.
- for_node: node.node_id of the htmlpage, nodelet, or htmlcode to be patched.
- field: the name of the field that will be updated by the patch, e.g. 'code' for htmlcode nodes, 'page' for htmlpage nodes, and 'vars' for settings nodes.
- reason:reason for patch
- applied:timestamp wehn patch was applied.
- applied_by:node.node_id of user who applied the patch
- nodepin
<TODO>
There are a number of different tables here, but the tables seem to overlap in function. Which of these tables are actually in use at this point?
- approved: stores a user's approval of a node action, e.g. keep, edit, reap. <scan class="highlight">does this table have a primary key? All I see is a non-unique index. Shouldn't node_id+user_id+timestamp be unique? (on the off chance that a node is consider by a user, unconsidered by a janitor and then reconsidered by the same user?
- user_id: node.node_id of user who approved
node_id.
- node_id: node.node_id of node that was approved.
- action: code identifying what action was approved.
- tstamp: timestamp of user's approval
- considernodes: dbtable storing the description of a janitorial action request (a.k.a. consideration)
- considernodes_id: id of a janitorial action request. is this the node or a separate id (there is no node id on the consideration!)?
- del: tally of reap votes
- keep: tally of keep votes
- edit: tally of edit votes
- considertime:datetime when consideration was requested
- description:reason for consideration
- considervote: dbtable storing a user's vote on a janitorial action request (a.k.a. consideration)
- considervote_id: part 1 of two field primary key, links to considernodes.considernodes_id.
- voter_user: part 2 of two field primary key, links to node.node_id of user who voted
- voted: vote of user: one of the following:
'edit', 'keep', 'delete'
- editorvote: stores each janitor's final decision about a request for janitorial action.
- editorvote_id: part 1 of 3 field primary key, links to considernodes.considernodes_id.
- voter_user: part 2 of 3 field primary key, links to node.node_id of the janitor who voted
- votetype: part 3 of a 3 field primary key, one of the following: 'edit', 'keep', 'delete'
- weight: a weighting factor applied to this janitor's voteThis field appears to be ignored in editor_vote. Are there other places it is currently used?
- reapednode: describes the reason and final vote tally for a reaped node. some of the data that Visit Reaped Nodes seems to be taking from "data" have their own fields (author_user, createtime, type_nodetype). Are these fields currently used or have they been deprecated/ignored in favor of the hash?
- node_id: Primary Key. node.node_id of the reaped node
- data: a hash storing the title, author, and text of the original reaped node.
- author_user:
???
- createtime:
???
- reason: reason for reaping - taken from the description on the janitorial action request (considernodes)
- del: delete tally - taken from the janitorial action request (considernodes)
- keep: keep tally - taken from the janitorial action request (considernodes)
- edit: edit tally - taken from the janitorial action request (considernodes)
- type_nodetype:
???
- reputation: the node reputation at the time of reaping.
- request - a document node that stores an ad-hoc janitorial request. This can be used for requests that are too complex to be handled via the consideration mechanism and for requests by monks who do not yet have consideration powers. Requests have no attributes of their own. They use the same tables as documents to store their attributes.
Supertype: document
- approval: dbtable storing who and what approved a node. the approval nodelet uses "approved". Is this table in use? kept for historic reasons? If historic, why wasn't data converted
over? Is there interest in doing it in the future?
- approvalstatus: this has very few rows (38) and much the same data as approval and approved. Is it in use?
- approvalhistory: this has very few rows (90) and much the same data as approval and approved. Is it in use?
Is cachedinfo currently in use? How? Or are we using another mechanism for caching site code?
- cachedinfo: Stores pre-parsed and evaluated code stored in the node identified via "code_id"
Which of these stat tables are in use? dbstats+dbstattype+dbstatduration would be able to track all statistics but there are several other tables that capture specific stats with or without start time and duration. Which of these are in use if any? is dbstat in use?
- dbstats: indivual stat keyed by stat type, start date, duration.
- dbstattype: types of stats tracked by dbstats
- dbstatduration: duration enum used by dbstats
- version: tracks updates on cached nodes. See Everything/NodeCache.pm
- HTTP_USER_AGENT: number of hits for each user agent type. Has no start date or duration(!)
- referrer: number of hits and time of most recent hit for each referrer URL (limit of 128 characters). Has no start date or duration(!)
- traffic_stats: number of hits per hour
???
- hitsinfo: number of hits on a day
???
- dailystatistics: total number of users and nodes hit. called "daily status" but has fields for 1,2,4 weeks?
- stats: same as daily stats but adds total number of votes, ips, impressions, and xpsum
Simple content nodes inherit their data from document and store any extra attributes
in a single database table named after the nodetype.
<TODO>
<TODO>
<TODO>
- protouser:
Does this store default user attributes? dead wood? - table has no rows and fields seem to identify per-user information which does not have any natural defaults. Or were these meant to be stored validation rules? Subtypes:zombie_user
- newuser:
what is the reason for a separate "newuser" table? Do new users have to be approved or verified (been a while :-))? If so what code defines the process?
- title: does this become user.nick?
- realname: does this become user.realname?
- email: does this become user.email?
- lastupdate: when/how is this field set, if ever after initial creation? Is this really a "create date"?
- newuserchit:
what is this? is it part of a scenario where email is sent to new user with digest; when email link is clicked through, account is activated and converted to user? If so, how is expiration time calculated?
- email: is this the email from newuser.email?
- digest: digset inserted into email link that new users must click? Is this really a create date?
- lasttime: time when the digest was generated and new user email was sent?
- newuserimage:
is this deadwood (user images are now a level 5 privilege)? Or does it store a newly added user image (so that the image can be approved or history kept?) What was/is it used for? Also how does it join to user?
newuserimage_id: Primary Key.is this a user with a new image, e.g. user.user_id? Or is it an image id, e.g. image.image_id?
timestamp:is this the date/time when the user image was added/changed?
- user:
- user_id: Primary key. Auto-incrementing id that identifies a user.
- nick: The public name of the user. Must be 26 characters or less.
- passwd: The user's password. Visible only to the user. Modifiable by user via "Edit Profile".
- realname: The real name of the user. Visible only to user. Modifiable by user via "Edit Profile".
- email: The email address of the user. Visible only to the user. Modifiable by user via "Edit Profile".
- lasttime: The date on which the user last visited. ???How is "visited" defined???
- karma:???Experience points to date? Dead wood? Backwards compatibility???
- experience:???Experience points to date???
- votesleft:???Votes left today:???
- votes: Total number of votes allowed per day.
- voteavg:???Is this in use? Average number of votes used? Average of up and down votes???
- imgsrc: URL for user image. Modifiable by user via "Edit Profile". Access/use of field is a level privelege.??? How is level privilege controlled? Integrity rule to prevent non-"" values? Refuse to render if set? Prevent access to edit???
- lastupdate:??? Audit field? Set automatically anytime this db record changes? Or is this the last significant update???
- givevotes: Y/N flag. ???How is this used? Flags accounts that can have votes(e.g. non-initiate)? Flags accounts that need a bump in number of votes???
- user_scratchpad:???node id of user scratchpad???
- secret:???Is this in use? Password question answer? Other???
<TODO>
The web pages served up by the PerlMonks website are defined by nodes that store text with a combination of HTML and special perl monks markup. The PerlMonks markup supports both embedded perl code and inclusion of perl code defined in htmlcode nodes. There are several categories of nodes that define themselves using PerlMonks markup. You can click on the links in the list below to see all the nodes in each category.
- superdoc - entry point for HTTP requests. Superdocs are responsible for generating their own content.
- htmlpage - entry point for HTTP requests that display a selected node's content.
- fullpage - seems to be mostly tickers. When would one use this rather than a superdoc or htmlpage? Is this a page with neither display type nor node id?
- nodelet - defines a subregion of a page.
- container - defines HTML for standard headers, footers, and sidebars for each page or subregion (nodelet)
- script-gods only
Perl code is embedded into the above nodes using special perl markup:
- ["perl string"]
- evaluates "perl string" as a Perl interpolated string and inserts the result in place of the markup.
- [% any perl code %]
- evaluates the perl code between [% and %] ???and inserts the return value in place of the markup.???
- [{ code_call }] and [{code_call:param1,param2,...}]
- evaluates a script stored in an htmlcode nodes and inserts the return value in place of the markup.
The PerlMonks website is currently using perl-5.8.3. Full details of the configuration of the PerlMonks version of Perl may be found in Config.pm. Have there been any changes to it that will be lost when/if Perl is regenerated? Are we running our own custom compiled version of Perl?
Any module included in the 5.8.3 core may be used in embedded Perl code. In addition, any class or function defined in a PM module object may be used in code embedded in the PerlMonks website. A full list of the available modules may be found here.
The modules that are displayed in the above link are in fact the files that are being used by the PerlMonks website engine. pmmodule nodes are displayed using the pmmodule display page htmlpage node. That node calls handle_pmmodule_display which calls readPmModule which searches for the module path in %INC and loads the module's text via do. Since we are reading in the module via do, we must be able to safely reload all of the pmmodule files. Has this constraint ever caused problems? Is this a bug waiting to happen if we should wish to move some of the code in htmlcode fragments into pmmodule file subroutines? Might it be better to read the file in line by line like a normal text file?
These modules come from a variety of sources: CPAN, the original everything engine, and the PerlMonks website development effort.
(jdporter: Everything/HTML.pm is particularly important).
- CPAN modules: What
versions of these modules are being used? What is the
policy, if any, on keeping CPAN modules up-to-date?
What policy do we have for making use of other CPAN
modules?
- The original Everything engine:
Have these files been
modified? In what way?
- XML::Fling: (written by tye to expedite XML generation. XML/Fling/String.pm and XML/Fling/Handle.pm are "required" if XML::Fling::new is initialized with a glob or scalar ref, but they have no pmmodule nodes. Do they exist as files? Or do we only use array references with XML::Fling?
The htmlcode node type has many subclasses. Nodes from any of the following categories may be used with this markup - to see a full list of nodes in each category, click on each link:
In addition to htmlcode nodes, the following node types can also embed code or data into a web page definition:
- css - defines css styles used by HTML embedded in htmlpages, nodelets, and containers
- data - text files with one row of data per line.
- image - images for embedding in text.
- rawdata - can be used to store pure perl code and/or byte data for images. Embedded Perl code can retrieves this data and insert it into generated HTML and PM markup. At present, it is most often used to store user images.
what are these used for: file (0 nodes), includefile (1 node, viewable only by gods), rawpage (6 nodes, mostly empty or with what looks like garbage)
The PerlMonks website stores data in a variety of ways,
not all of it accessible to a relational database engine.
For example, significant amounts of metadata are stored as
rows in database tables rather than in system catalogs. Also, some fields contain denormalized data and
repeating groups that are difficult for SQL engines to work
with. Additionally, at the time that the engine was
originally designed, open source relational databases
lacked built-in support for many features needed by the PerlMonks data model: row level locking and transaction
management, to name two.
As a result of these factors PerlMonks data management
is split between the built-in features of the database
and wrapper modules that supplement or provide an
alternative implementation of the built-in database
management support. This custom database management can
be compiled separately from the PerlMonks webserver by
placing the following files in the @INC path:
- Everything.pm: provides a routine to initialize
the database connection, defines some convenience
functions, and a global variable storing the
database connection.
- Everything/DBI.pm: provides a method to make
the database connection and wraps the Everything.pm
and Everything/Nodebase.pm query methods in
logging code. Part of the logging code's
responsibility is to track query statistics. This
is done with the help of the Everything/DbStats.pm
module.
- Everything/Password.pm: required by
Everything/DBI.pm when making the database
connection.
- Everything/DbStats.pm: database functions that
retrieve and update query statistics. These are called
by the Everything/DBI.pm wrappers.
- DBI: CPAN module defining the interface to
the database. This module
isn't in the list of pmmodules. Why?
- Everything/NodeBase.pm: Defines the
Everything::Nodebase class. This class
objects representing data connections and provides
CRUD and metadata access to the database. It also
manages caching of database nodes with the help of
Everything/NodeCache.pm and
Everything/CacheQueue.pm
- Everything/Experience.pm - additional database
functions to get or change node reputation, a
user's level, a user's experience points, their
votes on a node, or the number of unused votes.
Reputation and experience
points play a role in determining rights to create,
update, and delete certain kinds of data or belong
to certain user groups. They are an essential part
of the database engine.
- Everything/NodeCache.pm: defines the
Everything::NodeCache class. This class
holds nodes that have already been retrieved from
the database.
- Everything/CacheQueue.pm: defines the node queue
used by Everything/NodeCache.pm. This module is
a general utility and not specifically related to
database processing. It is included in this list
because the PerlMonks database engine cannot be
compiled unless it is in the @INC path.
The following modules are sometimes used by the
database engine but are required at runtime by certain
branches of the code. They are only needed if those code
branches are traversed and do not affect compilation:
- Everything/HTML.pm - this is used in two places:
data integrity enforcement and permissions management
(see isApproved().
Before
creating or deleting nodes, the database engine looks
for integrity rules stored in
maint nodes. The
code in those nodes is executed by passing
it through Everything::HTML::embedCode(...).
Permissions management uses Everything/HTML.pm
in two different places. First the current user and
cached approval information is stored in
Everything::HTML::HTMLVARS. Second,
Everything::HTML is pre-required before loading
and running access rule code as a convenience to access
rule authors. Although, the access rule code itself is
executed directly without passing it though
embedCode, the methods in Everything/HTML.pm
are frequently used by the access rule code, so this
module is "required" before calling that code so that
access rules don't have to do it themselves. A list of
access rules may be found
here.
is this really necessary? why?
seems like a violation of separation of concerns?
Access rule code doesn't get passed through
embedCode so why does maintenance node
code?
- Everything/CGI.pm - required when
Everything/HTML.pm is loaded.
- CGI.pm - required by Everything/HTML.pm.
Database engine questions and food for thought
-
what would be the performance benefit of splitting out
the database management code and placing it on the same
machine as the PM database (i.e. using it as a front end
to database connections). Would that create more or less
of a bottleneck? Would it improve security? What about
with a persistant connection between the db server and
webservers? We have to go to the server anyway
to check to see if a node is dirty, so what are we
gaining by having separate node caches on each Apache
server machine? Or does splitting things up just make
everything too complex to be worth the performance
benefit, if any
-
Is there a way to support two way communication between
dbserver and webservers? What if the dbserver kept track
of which server was caching which node and broadcast the
fact that a node was dirty. Is there a lightweight way
to do this? The actual number of nodes changing is a
relatively small portion of the whole(nodes voted on,
nodes edited, users involved in voting).
-
Wikipedia/mediawiki uses a master/slave database
arrangement. Could we benefit from something like that?
Is there existing Perl code that does it? Could we port
the MediaWiki code?
-
What is the current use of dbstats? If for site
analytics, why are we doing our own? Any thought about
integrating more complete third party tools? If for
optimization, would it make sense to move this to the
node table or version table? How many joins/queries
would it save?
-
Does caching support invalidation of selected tables?
(I'm guessing not since node retrieval is either node
table only or all tables). Would we benefit by an
implementation that could invalidate only part of a
table? What would that mean? Would there be situations
where integrity rules spanned more than one table?
Should there be? (i.e. does that indicate a design
problem?)
-
A lot of data is stored in setting records. Could we
reduce the number of joins by moving this data into
table fields?
The Perl Monks website currently uses mysql 5.0.45-log. Table types are MyISAM by default. For details about the configuration, you can run the query show sql variables.
There is also an unofficial port of the PM website to SQLite written by Corion.
- $DB
- Declared in Everything.pm. Global variable providing
database access. The variable stores a reference to a
Everything::NodeBase object created by
initEverything.
- initEverything($sDb, $bUseStaticNodetypes)
- Declared in Everything.pm. The parameters to this
function are passed without modification directly to
Everything::Nodebase->new(...).
What calls this function?
- $sClass->new($sDb, $bUseStaticNodetypes)
- Declared in Everything/Nodebase.pm. Creates a
database connection to a database named $sDb.
$sClass must be Everything::Nodetype or
one of its subclasses.
$bUseStaticNodetypes is a flag that determines
caching behavior. If true, nodetypes will be cached
the first time they are retrieved and changes to
nodetypes will not take effect until the server is
restarted. If false, changes to nodetypes take effect
immediately.
- $DB->getDatabaseHandle()
- Defined in Everything/NodeBase.pm. Provides access
to raw uncached database connection.
- $dbh (deprecated)
- Defined in Everything.pm. Global variable equivalent
to $DB->getDatabaseHandle()
Nodetypes supplement the metadata stored in the
DBMS's data catalog with metadata for specific rows
of nodes. The DBMS for which the PerlMonks database
engine was originally designed did not provide all of
the support needed. One needs to be aware of this
history because some of the information stored in
nodetype records would normally be stored directly in a
modern database system.
Nodetypes are just regular nodes so most (but not all)
the node management routines are used with nodetypes.
The methods included in this section are particular
to nodetypes and do not apply to general purpose node
management.
- $DB->getType($idOrName)
- Defined in Everything/NodeBase.pm. Returns a hash
containing all the attributes of a node type.
- $DB->getAllTypes()
- Defined in Everything/NodeBase.pm. Returns an array
of hashes, one for each nodetype defined in the system.
The individual hashes are returned by
getType()
- $DB->deriveType($hType)
- Defined in Everything/NodeBase.pm. Calculate default
values for nodetype attributes using inheritance. Called
automatically by getType
-
$DB->getNodetypeTables($hType)
- Defined in Everything/NodeBase.pm. Cleans up list of
a nodetype's tables. Called automatically by
deriveType
- $DB->isGroup($hType)
- Defined in Everything/NodeBase.pm. Returns true if
$hType has a group table.
Is it always true that a
nodetype that is a subclass of NodeGroup (name?) also
has a group table? How is this enforced? By virtue of
inheritance? And if something has a group table
how do we make sure that it also a subtype of NodeGroup
(and do we even care)?
- $DB->isApproved($hUser,$hGroupOrAccessRule,$hNode, $bIgnoreGod)
- Defined in Everything/NodeBase.pm. Returns true if
user $hUser can access node $hNode for the
purposes defined in access rule
$hGroupOrAccessRule. Normally, gods are
authorized automatically, but if $bIgnoreGod is
true, then user $hUser's access will be assessed
based on their non-godly group memberships.
-
$DB->getLevel($xUserHashOrId)
- Defined in Everything/Experience.pm
- $DB->isGod($hUser, $hType)
- Defined in Everything/NodeBase.pm. Returns true if
user is in the gods group.
- $DB->canCreateNode($hUser, $hType)
- Defined in Everything/NodeBase.pm. Returns true if
user $hUser can create a node of type $hType
-
$DB->canDeleteNode($hUser, $hType)
- Defined in Everything/NodeBase.pm. Returns true if
user$hUser can delete a node of type
$hType
-
$DB->canUpdateNode($hUser, $hType)
- Defined in Everything/NodeBase.pm. Returns true if
user $hUser can update a node of type
$hType
-
$DB->canReadNode($hUser, $hType)
- Defined in Everything/NodeBase.pm. Returns true if
user $hUser can read a node of type
$hType
This section contains methods used to retrieve and
generate data for node records. Special methods are
needed because of the complex way node data is stored.
Node attributes may be split among several different
tables with 1-to-1 relationships to the node table.
Additionally, some nodes are collections of nodes,
or even deeply nested node hierarchy.
what about nodetype additions,
updates and deletes? It seems that the general purpose
node insert,update,and nuke routines are being used
for nodetypes (needs to be verified by looking at the
htmlpage/maintenance/htmlcode nodes for nodetype). If
so, how are the differences between non node-type and
node-type nodes being accounted for?
-
If we really do need to get all data from all 1-to-1
records associated with a node, would it speed things up
or improve our ability to manage dev permissions if we
generated SQL views that joined all 1-to-1 tables when
we created each nodetype? MySQL view optimization was
pretty underdeveloped in 2007, but other DBMS do a
better job and even mySQL may have improved by now.
Note: most of the problems seem related to the point
where temporary tables kick in (since one can't use
indices after that point - EXPLAIN can be used to see
where that is happening for a particular view or query.
-
The $hData parameter quotes values unless less
the key is preceded by '-'. Why is this necessary?
Couldn't the data dictionary be responsible for deciding
if a raw value needed to be quoted? And if so, wouldn't
a data dictionary quoting process be more reliable?
-
SQL statements are prepared and then executed even when
it would make sense to batch them? DBI doesn't seem to
support this but mySQL does - perhaps we add such a
batch process to the Everything::Nodebase? or
Everything::DBI?
Retrieve and generate node data (public methods)
Although you can use these node retrieval routines with
nodetype nodes, it isn't recommended because
non- nodetype and nodetype nodes need to derive
different data for each node. If you use one of the node
retrieval routines to retrieve a node type you will get
only part of the node information needed. To get the
remainder, one would also need to pass the node id or
hash to getType()
-
$DB->selectNodeWhere($hWhere, $hType, $orderby,$bNodeTableOnly)
- Defined in Everything/NodeBase.pm
Returns a reference to an array containing all node ids
that match the criteria stored in $hWhere. For
an explanation of the parameters see
getNodeCursor
-
$DB->getNodeWhere($hWhere,$hType,$orderby)
- Defined in Everything/NodeBase.pm
Returns a reference to an array containing fully
populated node hashes for all the nodes that match the
criteria stored in
$hWhere. For an explanation of the parameters see
getNodeCursor
-
$DB->getNode($sTitle,$xNodetypeTitleOrHash)
- Defined in Everything/NodeBase.pm. Retrieves a
node hash by title+nodetype or by node_id. All data
for the node will be retrieved.
- $DB->getNodeById($id,$mode)
- Defined in Everything/NodeBase.pm
Returns the node hash for the node identified by
$id. $mode may have one of the following
values:
- force: retrieves node from database even
if node is already cached.
- light: retrieves a reference to the node's
nodetype hash.
- "" or undef: adds extra table and a
list of nodegroup members.
-
$DB->isNodetype($hNode)
- Defined in Everything/NodeBase.pm. Returns true if
$hNode belongs to the nodetype class.
Retrieve and generate node data (internal methods)
-
$DB->constructNode($hNode)
- Defined in Everything/NodeBase.pm. Adds 1-to-1 table
data and group members to a node hash that only has data
from the Node table.
- $DB->getRef($aIdOrHash)
- Defined in Everything/NodeBase.pm
Converts each member of $aIdOrHash from an id to a
hash reference, if it is not already a hash reference.
- $DB->getNodeCursor($hWhere,$hType,$orderby,$bNodeTableOnly)
- Generates a query that joins all of the tables storing
node attributes into a single view.
$bNodeTableOnly is true if only the fields in the
node table should be retrieved. If false or missing,
then all fields in all tables with a 1-to-1, the
relationship to the node will be returned. For an
explanation of the remaining parameters, see
genWhereString.
-
This method uses an all or nothing approach to node
data retrieval. Would there be any benefit to
allowing JIT retrieval from individual tables?
-
All 1-to-1 tables are joined via LEFT JOINS. Can we
replace some or all of these with equijoins? Would it
improve performance? If some tables are optional and
some are required we could use a flag on the nodetype
table and only use left joins for the optional
tables. Or could we replace *all* joins with
equijoins? Or would that cause problems when
new tables are added for existing nodes (we'd need a
routine to go and backfill records for existing nodes
in the nodetype.
-
$DB->genWhereString($hWhere, [$hType], [$orderby])
- Defined in Everything/Nodebase.pm
Generates a where clause from a set of conditions defined
in $hWhere. $hWhere is a hash reference
whose keys are field names and whose values store a
constraint on the value of that field. The generated
where clause will "and" together each of the field
constraints. This is not intended to be a general
purpose where clause generator, so only a limited number
of commonly needed conditions are supported.
If the key begins with '-', then the value is expected
to be prefixed with some sort of comparison operator.
The condition will be constructed by stripping the '-'
from the key and the comparison operator from the
beginning of the value:
- '-X' => '! Y' becomes "X != Y"
- '-X' => 'not ...' becomes
"X
not ..."
. e.g. "X not in ...", "X not between
...", "X not like ...", "X not regexp ...".
- '-X' => '= Y' becomes "X = Y"
If the key begins with anything else then the
reference type of the value determines how the
condition will be built:
- when the value is an array reference, the contents
are presumed to be node ids or things that can be
converted easily into node ids using
getId. These are
combined into a set of "or"'d conditions.
Would the query be faster
if instead we generated "X IN (id1,id2,id3,...)"
instead of a bunch of or statements that the
optimizer may or may not realize constraint the
same field?
- when the value is a scalar, the contents are
presumed to be a string that needs to be quoted
with quote
The $hType parameters stores a hash containing
the field values of a node type. If defined, a
condition to limit the query to a particular nodetype
will be added to the query. The $orderby
parameter is the clause that follows an ORDERBY
keyword. If this is defined, an ORDERBY clause will
be appended to the end of the where clause.
- I wonder if it would be a
good idea to factor out the node specific from the
non-node specific code. Most of this method is
pretty generic - only the treatment of array
reference values is specific to node management.</li
- In its current form
genWhereString doesn't seem to be used more
than once in the everything code. What about in
htmlcode nodes and other forms of embedded
Perl?
- This method has
side effects if $hWhere contains an array
reference value: it will convert any hashes in that
value to node ids!!! Where is this used and why
hasn't it been a problem so far? Should it be changed
(IMHO this side effect is asking for bugs since the
programmer has to be more aware of how the
data in $hWhere will be used before and
after. On the other hand, the database engine
libraries seem to be written to treat ids and hashes
as equivalent, converting back and forth as needed,
so maybe this really isn't an issue after all.)
-
$DB->loadGroupNodeIDs($hNode)
-
$DB->selectNodeGroupFlat($hNode)
- Defined in Everything/NodeBase.pm. Calls
<a href="flattenNodeGroup>
flattenNodegroup($hNode,{},[])
-
$DB->flattenNodeGroup($hNode, $seen,$array)
- Defined in Everything/NodeBase.pm. Returns an array
reference storing all leaf nodes of the group member
tree (or subtree) of which $hNode is the root
Any reason not to use a visitor
pattern here? Actions that require traversal of a group
tree would be less memory and processor intensive if
they used a visitor pattern because they can stop
traversal as soon as the answer is found. Also if a
subroutine reference is used as the visitor we can
easily support situations where the whole list all at
once really is needed.</scan>
- getId($aIdOrNodeHash)
- Defined in Everything/Nodebase.pm
Scans $aIdOrNodeHash for any elements that are
hashes and replaces them with the node id. The hashes
are presumed to be node hashes.
- $DB->getCache()
- returns the node cache. The node cache stores nodes
that have been retrieved from the database and remain
unchanged. The cache's contents are managed by the
other node retrieval functions. Access to the cache is
provided primarily for monitoring purposes.
The actions a user can take depend on the user's
number of experience points. Experience points are made
or lost by three kinds of actions: voting on nodes,
writing nodes (which other users vote on), and simply
showing up and reading nodes.
See Voting/Experience System. Thus reputation management system
and its database support is a core function of the
database engine and its permissions management.
-
$DB->hasVoted($xNodeHashOrId,$xUserHashOrId)
- Defined in Everything/Experience.pm. Returns a
hash containing the vote a particular user
has made on a particular node. For more information
about data stored for each vote, see the database
table for vote
-
$DB->getVotes($xNodeHashOrId,$xUserHashOrId)
- Defined in Everything/Experience.pm. Returns
an array reference containing information on each up or
down vote made on a node. Each array element stores a
hash reference describing the user, ip address, and
direction and weight of the vote. If
$xUserHashOrId is defined, the query will
only return the votes for that user.
-
$DB->allocateVote($xUserHashOrId,$iVoteCount)
- Defined in Everything/Experience.pm. Updates the
unused vote code of user $xUserHashOrId so
that it equals $iVoteCount. Votes are
allocated by daily rounds of the
votebot.
- $DB->castVote($xNodeHashOrId,$xUserHashOrId,$weight,$ip)
- Defined in Everything/Experience.pm. Decrements the
unused vote count of user $xUserHashOrId and adds
(or subtracts) one from the vote count of the node
$xNodeHashOrId.
Node reputation management (internal methods)
-
$DB->insertVote($xNodeHashOrId,$xUserHashOrId,$weight,$ip)
- Defined in Everything/Experience.pm. Called by
castVote to do the
actual creation of a vote record.
-
$DB->adjustRep($xNodeHashOrId,$iRepChange)
- Defined in Everything/Experience.pm. Called by
castVote to adjust
the vote count for the node. Each node keeps an
accumulated total of votes so that the individual
votes don't have to be counted each time we want
to display the total vote count.
-
$DB->adjustExp($xUserHashOrId,$iRepChange)
- Defined in Everything/Experience.pm. Called by
castVote to adjust
the experience level of the node's author and of
the voter $xUserHashOrId.
Create, modify, or delete nodes (public methods)
-
$DB->replaceNode($hNode,$hUser,$hData)
- Defined in Everything/NodeBase.pm. Inserts
$hNode into the database if it is a new node and
updates it if it is an existing node.
Why do we need the "-key"
form in data? do we use it? can't we always put the raw
value in $hData and use the SQL schema to determine
whether or not to pass the value through quote?
-
replaceNodegroup($xNode,$xMember,$xUser)
- Defined in Everything/NodeBase.pm. Generates and
executes an SQL query to replace all the nodes in
$xMembers into the group table of node
$xNode. The replace is done by first deleting and
then reinserting the nodes. The parameters are the same
as insertIntoNodegroup
-
$DB->nukeNode($hNode,$hUser,$bNoTomb)
- Defined in Everything/NodeBase.pm. Performs a
cascading delete on a node object. It deletes records
for a node from the node,links,nodegroup, and
considernodes as well as any tables associated with
the nodes nodetype (attribute tables, group member table)
- What would be the benefit of
moving this to the database via declared FK
relationships and stored procedures?
- What are tomb's used for?
or is this method always called with
$bNoTomb=1?
- Is this node list complete
or are there other tables that also need to be
cleansed if we delete a node? (check data model for
possibilities).
Create, modify, or delete nodes (internal methods)
-
insertNode($hNode,$hUser,$hData)
- Defined in Everything/NodeBase.pm. Generates and
executes an SQL query to insert a node. This method will
fail if the node exists.
-
updateNode($hNode,$hUser)
- Defined in Everything/NodeBase.pm. Generates and
executes an SQL query to update a node. This method will
fail if the node does not exist.
- insertIntoNodegroup($xNode,$xUser,$xMembers,$orderby)
- Defined in Everything/NodeBase.pm. Generates and
executes an SQL query to insert all the nodes in
$xMembers into the group table of node
$xNode. $xNode is the id or hash for
group. $xMembers may be a single node id, a node
hash or an array of node hashes and ids. $xUser
is the id or hash for the user trying to do the insert.
$orderby is used to position the new nodes within
a group of existing nodes.How
does this method work when multiple nodes change the
value of their orderby field? There seems to be a bug in
the current implementation when more than
one node changes its orderby field value, but I don't
remember the details of what happened (though I think a
node disappeared!)
-
removeFromNodegroup($xNode,$xMember,$xUser)
- Defined in Everything/NodeBase.pm. Generates and
executes an SQL query to remove $xMembers from
the group table of node $xNode. $xNode and
$xUser are defined as for
insertIntoNodegroup. $xMember is
id or hash of the group member to be removed.
-
$DB->nodeMaintenance($nodeIdOrHash, $op)
- Defined in Everything/NodeBase.pm.
Retrieves and executes content of a "maintenence" type
node. The code is retrieved by calling
<a href="#getMaintenanceCode>
getMaintenanceCode. It is executed by calling
Everything::HTML::embedCode. For an explanation
of parameters, see getMaintenanceCode.
What do we gain by passing this
to embed code rather than executing it directly? Do
maintenance nodes embed calls to htmlcode nodes? Or use
any feature that would require using embedCode? Could we
make things more efficient by moving all of the
maintenance code to modules? If we want to segregate
custom nodetypes from core engine nodetypes, do we
really need a database to do it? Why not just load and
write files from a special reserve place
in the file system, ideally connected to a VCS?
-
$DB->getMaintenanceCode($nodeIdOrHash, $op)
- Defined in Everything/NodeBase.pm. Retrieves the
content of a "maintenance" type node storing the Perl
code that must be run before executing database operation
$op for nodes belonging to the same type as
$nodeIdOrHash. $op can be "create",
"update", or "delete" though in practice only the
"create" and "delete" actions are currently used.
(The code that triggers a call for the "update"
operation has been commented out.)
-
$DB->tombstoneNode($hNode,$hUser)
- Defined in Everything/NodeBase.pm. Takes a
Data::Dumper snapshot of the node and saves it
to a table (tomb). This method is called by nuke only
when its $bNoTomb method to true.
Any other uses? How often
is nuke called with $bNoTomb set to
true?
In the database engine, catalog queries are called
directly only by the
nodetype management
routines. Node management routines interact with the
data dictionary via their node type. None the less,
these methods are split out into a separate
group of functions because they are written in a very
general way that could be used with any mySQL database.
Their definition is not specific to nodetypes.
-
Is there an equivalent to ALTER TABLE somewhere or is
the only way to alter a field is to drop it or go
directly into mysql.
-
Why aren't we using a more full featured 3rd party web
front end to the database? (e.g. phpAdmin, or some Perl
equivalent)? Integration with nodebase integrity
constraints? If so, are we really using them?
(a) integrity constraints only partially implemented,
e.g. there is a check on drop table but not drop
fields. (b) no alter table (that I can see), so in
order to change a column w/o losing data we need to go
into mysql directly. But if we are doing that anyway we
are bypassing any integrity constraints that exist.
$DB->tableExists($table)
Defined in Everything/Nodebase.pm. Returns true if
$table exists in $DB and false otherwise..
$DB->createNodeTable($table)
Defined in Everything/Nodebase.pm. Creates a skeletal
node table with a primary key named "${table}_id".
$DB->dropNodeTable($table)
Defined in Everything/Nodebase.pm. Removes a node
table and all of its data from the database. Certain
tables are hard coded so that they cannot be dropped.
Is this list complete? Does it
need to be updated or has it been continuously
maintained?.
$DB->addFieldToTable($table,$field,$type,$bPrimary,$default)
Defined in Everything/Nodebase.pm. Adds a field to a
database table
$DB->dropFieldFromTable($table,$field)
Defined in Everything/Nodebase.pm. Drops a field from
a table. <scan class="highlight">Any field can be
dropped. No integrity checks here. Oversight? Reason?
$DB->getFields($table,$mode)
Defined in Everything/Nodebase.pm. Returns an array
with one element per field in table $table.
$mode determines what information will be
retrieved for each field. If 0, only the field name will
be retrieved. If 1, a hash reference containing all of
the SQL catalog data for that field will be returned.
Defaults to 1.
$DB->getFields($table)
Defined in Everything/Nodebase.pm. Returns an array
whose elements are the names of the fields in
$table. This is equivalent to calling
$DB->getFieldHash($table,0).
These methos are used by both node and nodetype
management routines. sqlquery nodes also contain code
that call these methods.
- Everything engine's SQL
execution methods are designed for single table
actions. Any thought to extend this to support SQL
actions on entire views and/or standadard Nodebase
transactions?
- Error handling is fairly binary
in this module (return 0 or 1) and doesn't distinguish
between reasons for failure. Is there any interest in
making error handling a bit more refined or in using
exceptions? This could have two possible benefits
(a) the ability to move permissions into database
(b) the ability to define error handling separate from
database management. If the db is handling permissions
we need to know why something failed, not just that it
did fail. Also if we generate errors with reasons
rather than just return 1 or 0, we could more easily
enhance the development environment with customization
of error handling look-and-feel or debugging level.
This would let pmdevs choose a debugging level
or error output suitable to their current work
effort.
-
$DB->sqlSelectMany($select,$from,$where,$other)
- Defined in Everything/Nodebase.pm. Builds, prepares
and executes the query. $select is the clause
after the SELECT keyword, $from is the clause
after the FROM keyword. $where is the clause
after the WHERE keyword and $other contains any
other clauses such as GROUPBY and LIMIT clauses. If this
method is successful it returns the populated query
cursor. If it fails to execute it returns
undef.
- $DB->sqlSelect($select,$from,$where,$other)
- Defined in Everything/Nodebase.pm. Returns a single
query result as either a scalar or an array.
$select is the clause after the
SELECT keyword, $from is the clause after the FROM
keyword. $where is the clause after the WHERE
keyword and $other contains any other clauses such
as GROUPBY and LIMIT clauses.
mysql
now supports limit clauses. Wouldn't it be more
efficient to rewrite this to use a "limit 1" clause
rather than fetch all matching rows? Or would this be a
problem because callers are already setting
$other to a limit clause?
If one field is selected by the query, then the field
value from the first record found will be returned.
If multiple fields are selected, an array containing
the field values in the order they appear in
$select. If the query failes to execute or
finds no matching rows, this method returns
undef. Just out of
curiosity, why did this method use number of fields in
the select statement rather than wantarray?
- $DB->sqlSelect($select,$from,$where,$other)
- Defined in Everything/Nodebase.pm. Returns a single
query result as a hashref whose keys are field names and
whose values are the values assigned to those fields. If
the query fails to execute or executes but finds no rows,
this method returns undef.
- $DB->sqlDelete($from,$where)
- Defined in Everything/Nodebase.pm. Generates,
prepares, and executes an SQL DELETE command.
$from is the clause after the FROM keyword.
$whereis the clause after the WHERE keyword.
Returns 1 if successful, even if no rows were deleted.
Returns 0 if query failed to execute.
-
$DB->sqlUpdate($table,$hData,$where)
- Defined in Everything/Nodebase.pm.
Generates, prepares, and executes an SQL UPDATE command.
$table is the clause after the UPDATE keyword
$hData is a hash reference whose keys are field
names and whose values are the value that should be
assigned to each field. If the field name begins with
a '-', the '-' will be stripped and the value will be
used as is, e.g. -K=>V will be added to the SET
clause as "K = V". Otherwise the key will be
used as is and the value will be quoted using the
quote()</c> method.
$where is the clause after the WHERE keyword.
Returns 1 if successful, even if no rows were
deleted. Returns 0 if query failed to execute. On
failure a message will be printed out via
Everything::printErr().
- $DB->sqlInsert($table,$hData,$where)
- Defined in Everything/Nodebase.pm.
Generates, prepares, and executes an SQL INSERT command.
$table is the clause after the INSERT keyword.
Otherwise it behaves exactly like
sqlUpdate.
- $DB->quote($s)
- Defined in Everything/Nodebase.pm.
quotes a string value in accordance with ISO SQL2.
Single quotes (') will be doubled and the entire
string will be surrounded with single quotes.
XML::Fling is the preferred way to generate XML feeds. XML::Fling define both an OO and functional interface for generating and storing unvalidated unvalidated XML. an object that generates and stores unvalidated XML. It was developed as a faster alternative to XML::Generator and XML::Writer. The OOP implementation studiously avoids inheritance via @ISA. What was the reasoning behind that? Or does it merely reflect the time when the code was originally written, i.e. a time when there was still no consensus on how OOP should be implemented in Perl?</scan>
The functional interface is defined in XML/Fling.pm (via a list of exported subroutines), but the actual subroutine definitions are stored in XML/Fling/Export.pm. There are seven functions in the interface, each of which return a string containing the requested snippet of XML:
- xml_header([ $encoding [, $version ]])
- generates an XML instruction identify the XML version and encoding: <?xml version="$version" encoding="$encoding"?>
- xml_start($tag,[$attr=>$val]...)
- generates an opening $tag tag with the attribute value pairs $attr => $val. The values should be unencoded. They will be encoded using the same logic as xml_quote below.
- xml_quote($attrval)
- encodes $attrval so that it may be used as an attribute value. This function does not add the quotes. It merely replaces the characters <&"> with their equivalent HTML entity. One will rarely call this function directly as xml_start,xml_element and xml_empty all handle attribute value encoding on the caller's behalf.
- xml_escape($dataval)
- encodes $dataval so that it may be used as a data between open and close tags. It replaces the characters <&> with their equivalent HTML entity.
- xml_end($tag)
- generates a closing $tag tag
- xml_element($tag,[$attr=>$val]...,[$data])
- generates an entire $tag element: open tag, data, and closing tag.
- xml_empty($tag, [$attr=>$val]...)
- generates an empty $tag element, e.g. <$tag $attr => $val />
New objects created by XML::Fling->new($buffer)
and will be blessed into one of three classes
depending on the type of reference stored in $buffer:
- XML::Fling::String if $buffer is a SCALAR reference.
- XML::Fling::Handle if $buffer is a glob or IO::HANDLE
- XML::Fling::Array if $buffer is an ARRAY reference.
All three types of fling objects have the following seven methods. These methods take the same parameters as the xml_XXX equivalent method above. However, rather than return an XML snippet, they append the generated XML to the buffer passed into the constructor. Each method returns $xml so that methods may be called like this: $xml->start('p')->escape('Hello World')->end('p')
- $xml->header
- $xml->start
- $xml->quote
- $xml->escape
- $xml->end
- $xml->element
- $xml->empty
XML::Fling is used in the following nodes (links display code):
For additional nodes using XML::Fling click here
- Everything/MAIL.pm - defines routines for converting
nodes to email messages and vice versa. is this currently in use? If so, where? New user confirmation? Other? Or is it an abandoned feature of the original Everything engine. If so, how do we handle emails today?
Are race conditions, performance - still an issue? how does it come about? If resolved, how? See
- Everything.pm
- escape($chr)
- converts $chr to a URL encoded character: %NN
- unescape($chr)
- converts an URL encoded character $chr (i.e. %NN) to an unencoded character.
Everything/CGI.pm - overrides and extends
methods defined in CGI.pm.
Everything/FormMenu.pm: Everything::FormMenu class
Other
- Everything.pm
- Everything/Diff.pm - functions to compare two texts.
why does the everything engine
have its own set of Algorithm::Diff wrappers?
These don't seem to be using any of the PM specific
global variables. Convenience/syntactic sugar?
- Everything/CacheQueue.pm: implements a cache
optimized so that the most-frequently used items always
end up at the end of the queue and the least-frequently
used items end up at the head.
- Everything/WebServerId.pm - stores values for three
global variables: $VERSION, $short, $long.
where are these used?
THE END
|
|