See, what you are overlooking here is the "Content", i.e. the data itself.
I only gave a rough sketch of what came to my mind. If I get serious about this technique, I will build a more complex implementation.
To give a small explanation:
I will create a tree structure for each single page. When I say I will give more importance to the title tag, I mean it will be the ROOT of the tree. The H1 tags (or, to be precise, any bold HTML that shows up before plain text) will become nodes, and the content they discuss will become children of those nodes. To simplify lookup, the content is broken up into keywords, keeping only those that appear in a proper grammatical construct (checked the way the MS Word grammar check does it). These keywords go into an index table (just for that particular page, and for that specific subnode) along with their occurrence frequencies. Since I index only keywords that follow a proper construct, spammers gain nothing by writing the same keyword over and over again. After that I compute a diversity factor. Even then, a spammer could rewrite a sentence with the same keywords many times over. To cut that out, the diversity factor is calculated as a function of the words in each sentence construct. It also counts non-keywords (is, that, the, them, their, etc.), so a unique paragraph with meaningful text gets properly credited.
This, along with the frequency table, makes up the index table.
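To make the page-tree idea a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the names (PageTree, Section, diversity_factor, STOPWORDS) are made up, the diversity factor is taken to be distinct words divided by total words, and the grammar check on sentence constructs is left out entirely.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Assumed list of non-keywords; a real list would be much longer.
STOPWORDS = {"is", "that", "the", "them", "their", "a", "an", "and", "of", "to", "in"}


def words(text):
    """Lowercase the text and split it into plain words."""
    return re.findall(r"[a-z']+", text.lower())


def diversity_factor(text):
    """Distinct words / total words, non-keywords included (assumed formula)."""
    w = words(text)
    return len(set(w)) / len(w) if w else 0.0


@dataclass
class Section:
    """A heading node with the content it discusses as its child."""
    heading: str
    text: str
    freq: Counter = field(init=False)      # keyword -> occurrence count
    diversity: float = field(init=False)   # diversity factor of the text

    def __post_init__(self):
        self.freq = Counter(w for w in words(self.text) if w not in STOPWORDS)
        self.diversity = diversity_factor(self.text)


@dataclass
class PageTree:
    """The title is the root; each heading section is a node under it."""
    title: str
    sections: list = field(default_factory=list)

    def add(self, heading, text):
        self.sections.append(Section(heading, text))


page = PageTree("Cheap pills")
page.add("Spam", "cheap pills cheap pills cheap pills cheap pills")
page.add("Honest", "the clinic sells cheap pills to their patients")
```

With this definition, the repeated spam section ends up with a much lower diversity factor than the honest sentence, even though its keyword counts are higher.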
This index table is finally generated for the whole page and belongs to the tree structure. Such a tree is generated for every page that is submitted, and in the end these trees become part of the giant tree called the webspace. A page-tree enters the webspace by category. Categories are created on the basis of keywords, and a page-tree can of course belong to several of them, but it is linked to each through a weighted node, where the weight tells how prominent that keyword is in the page-tree.
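A rough Python sketch of how page-trees could be filed into the webspace by keyword category. The class and method names (WebSpace, add_page, lookup) are hypothetical, and the weights are simply passed in as numbers rather than computed.

```python
from collections import defaultdict


class WebSpace:
    """Keyword categories, each linking to page-trees through weighted nodes."""

    def __init__(self):
        # keyword -> {page_id: weight of that keyword in the page}
        self.categories = defaultdict(dict)

    def add_page(self, page_id, keyword_weights):
        """File one page under every keyword it carries a weight for."""
        for keyword, weight in keyword_weights.items():
            self.categories[keyword][page_id] = weight

    def lookup(self, keyword):
        """Return (page_id, weight) pairs, most prominent page first."""
        pages = self.categories.get(keyword, {})
        return sorted(pages.items(), key=lambda item: item[1], reverse=True)


web = WebSpace()
web.add_page("page-a", {"search": 2.5, "engine": 1.0})
web.add_page("page-b", {"search": 3.5})
results = web.lookup("search")
```

Here one page sits in two categories at once, and a lookup on a keyword returns the pages ordered by how prominent that keyword is in each.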
Remember that the keyword weight is a function of "where it appears in the page" plus the frequency plus the diversity factor. It could all turn into a complex mathematical equation if I sat down to work on it seriously.
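Just to show the shape such a function could take, here is a toy version in Python. Everything in it is an assumption on my part: the position scores, the use of relative frequency, and the plain sum are placeholders, not a worked-out equation.

```python
# Assumed scores for "where it appears in the page".
POSITION_SCORE = {"title": 3.0, "heading": 2.0, "body": 1.0}


def keyword_weight(where, count, total_words, diversity):
    """Position score + relative frequency + diversity factor (toy formula)."""
    rel_freq = count / total_words if total_words else 0.0
    return POSITION_SCORE[where] + rel_freq + diversity


# A keyword repeated 4 times in 8 words of low-diversity body text...
spam_weight = keyword_weight("body", 4, 8, 0.25)
# ...against one used once in 8 words of fully diverse body text.
honest_weight = keyword_weight("body", 1, 8, 1.0)
```

Even with numbers this crude, the keyword in the diverse paragraph comes out ahead of the one in the repeated spam text, which is the whole point of the diversity factor.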
But the point is... in a world dominated by Google, it's impossible to outperform it. Look at Accoona... a really fine search engine with little future.