http://qs321.pair.com?node_id=1011074


in reply to Re: Machine learning pattern matching...
in thread Machine learning pattern matching...

The algorithm needs to be independent of the content, but the content will contain list data.

Not all result pages will contain the same information either, hence the need for a best guess at what the data actually means as well.

Re^3: Machine learning pattern matching...
by BrowserUk (Patriarch) on Dec 31, 2012 at 17:40 UTC

    That strengthens my argument that it would be much, much simpler to hand-craft a small routine to extract the required data from each type of page than to try and write a single routine that would attempt to recognise and extract whatever appropriate information exists on any page you give it.

    Indeed, depending upon the variety of possible inputs, I would suggest that the latter is close to impossible.

    And if you did expend the time, manpower and money on getting something working, it would no sooner be working than one or more of the input sources would decide to revamp their site and screw the whole thing up.
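A minimal Python sketch of the "one small hand-crafted routine per page type" approach, assuming each source can be identified by its hostname. The hostnames `site-a.example` and `site-b.example`, the HTML patterns, the field names, and the `LayoutChanged` exception are all hypothetical and exist only for illustration. Regexes keep the example short; a real scraper would more likely use a proper HTML parser. The required-field check ties in with the point about site revamps: when a source changes its layout, the matching routine fails loudly instead of silently returning bad data.

```python
import re
from urllib.parse import urlparse

# Sketch of "one small hand-crafted routine per page type".
# Hostnames, HTML patterns and field names below are hypothetical;
# regexes are used only to keep the example short.


class LayoutChanged(Exception):
    """Raised when a page no longer yields the fields its extractor expects."""


def extract_site_a(html):
    """Hypothetical 'site A': title in <h1 class="title">, price in a span."""
    title = re.search(r'<h1 class="title">(.*?)</h1>', html, re.S)
    price = re.search(r'<span class="price">([\d.]+)</span>', html)
    return {
        "title": title.group(1).strip() if title else None,
        "price": float(price.group(1)) if price else None,
    }


def extract_site_b(html):
    """Hypothetical 'site B': the same data laid out as <dt>/<dd> name/value pairs."""
    fields = dict(re.findall(r"<dt>(\w+)</dt>\s*<dd>(.*?)</dd>", html, re.S))
    return {
        "title": fields.get("Name"),
        "price": float(fields["Cost"]) if "Cost" in fields else None,
    }


# One entry per known source; adding a source means writing one more small routine.
EXTRACTORS = {
    "site-a.example": extract_site_a,
    "site-b.example": extract_site_b,
}

# Fields every extractor must find; everything else is optional, since pages differ.
REQUIRED = ("title",)


def extract(url, html):
    """Dispatch to the extractor for this URL's host and sanity-check the result."""
    host = urlparse(url).hostname
    extractor = EXTRACTORS.get(host)
    if extractor is None:
        raise ValueError(f"no extractor written for {host!r}")
    record = extractor(html)
    missing = [name for name in REQUIRED if not record.get(name)]
    if missing:
        # Most likely cause: the source revamped its site and this routine needs updating.
        raise LayoutChanged(f"{host}: missing {missing}")
    return record


if __name__ == "__main__":
    page = '<h1 class="title">Widget</h1><span class="price">9.99</span>'
    print(extract("http://site-a.example/item/1", page))
    # {'title': 'Widget', 'price': 9.99}
```

Each routine stays trivial, and when `LayoutChanged` fires only the one routine for the redesigned source needs rewriting, rather than a general-purpose recogniser that has to be retuned for every page.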

