Instructions & Caveats
This is for sure one of the most important parts of searchlores,
and you would be well advised to try some of the incredibly powerful search engines listed below.
If you limit yourself to Google (which in November 2004 suddenly doubled its index to 8 billion documents)
or to Yahoo (which in July 2005 claimed to have indexed almost 20 billion documents), you'll cover less than
half of the visible web (and not even 1/100 of the hidden one).
Just copy this page onto your hard disk as
c:\main.htm (or whatever), then bookmark it there and
use it (after having edited or thrown away anything you fancy)
in order to perform effective searches on the web
with any main search engine, starting from an unpolluted jumping-off place: a page that has as few frills as
possible and as many useful forms as we know of. A page that you can modify, and ameliorate, yourself (feedback, in that case,
would be appreciated).
The main reason you should use more than one main search engine is that
search engines' results overlap FAR less than you would think. Recent studies
indicate that around 3/4 of the results of a given search are UNIQUE to each search engine.
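If you want to check this overlap for your own queries, one simple way to measure it is sketched below: save the result URLs that two engines return for the same query, then compute what fraction of each engine's results the other engine did not return. This is only a sketch under stated assumptions. The URL normalization (trimming, lowercasing, dropping trailing slashes) is a crude choice of ours, since real URL paths can be case-sensitive, and the example URLs are hypothetical. It is not necessarily how the studies mentioned above measured overlap.

```python
from __future__ import annotations


def normalize(url: str) -> str:
    # Crude normalization (an assumption): trim, lowercase, drop trailing slashes
    return url.strip().lower().rstrip("/")


def unique_fractions(results_a: list[str], results_b: list[str]) -> tuple[float, float]:
    """Return the fraction of each engine's results NOT returned by the other engine."""
    a = {normalize(u) for u in results_a}
    b = {normalize(u) for u in results_b}
    frac_a = len(a - b) / len(a) if a else 0.0  # share of A's results unique to A
    frac_b = len(b - a) / len(b) if b else 0.0  # share of B's results unique to B
    return frac_a, frac_b


if __name__ == "__main__":
    # Hypothetical result lists for one query; only ".../d" is shared
    engine_a = ["http://example.org/a", "http://example.org/b",
                "http://example.org/c", "http://example.org/d"]
    engine_b = ["http://example.org/D/", "http://example.org/e",
                "http://example.org/f", "http://example.org/g"]
    print(unique_fractions(engine_a, engine_b))  # (0.75, 0.75)
```

Fractions near 0.75 for both engines would match the "around 3/4 unique" figure; fractions near 0 would mean the two engines are largely redundant for that query.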
Remember that search engines index only the first part of any BIG DOCUMENT,
and how much they index varies.
Google had a famous limit of 101K, which was abolished in January 2005; the new limit should be around 150K. These
limits are very annoying when dealing with large documents (or on-line books).
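The practical consequence of such a cutoff can be sketched as follows. We assume, hypothetically, that an engine indexes only the first N bytes of a page and nothing after them. The 101K and 150K figures above serve only as example values, and counting "K" as 1024 bytes is another assumption of ours. Under that model, a term that first appears past the cutoff simply cannot be found through that engine.

```python
def findable(document: str, term: str, limit_bytes: int = 101 * 1024) -> bool:
    """True if `term` occurs within the first `limit_bytes` bytes of `document`.

    Hypothetical model: the engine truncates the page at a fixed byte count
    (K taken as 1024 bytes) and indexes nothing after that point.
    """
    head = document.encode("utf-8")[:limit_bytes]
    # Drop any multi-byte character cut in half by the truncation
    return term in head.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    # A hypothetical 120,000-character page whose keyword sits at the very end
    page = "x" * 120_000 + " searchlores"
    print(findable(page, "searchlores"))                          # False (101K cutoff)
    print(findable(page, "searchlores", limit_bytes=150 * 1024))  # True (150K cutoff)
```

For a long on-line book, this is why searching for a phrase from a late chapter may fail even though the page itself is indexed.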
Note also that the fact that one, a hundred, or a thousand pages from a given site have been crawled and
made searchable through one of the main search engines does not guarantee that
every page from that site has really been crawled and indexed. This shortcoming
does not hit only 'new' pages, which can take MONTHS to be indexed: beehives
of spiders harvesting
a site often MISS whole subdirectories, old and new. Useful material may
be all but invisible to those who only use the 'main' search tools.
Moreover, anyone who regularly uses Google (for instance; other search engines are
not that different) will have noticed how badly commercial sites pollute the results
nowadays. If
a search engine introduced a new, simple "please hide all commercial sites from your SERPs" (Search
Engine Results Pages) option, switch, or slider, it would probably become king of the hill within a couple of months.
Therefore, given the commercial pollution of the web, you
would be
well advised to use regional engines, Usenet, and other
specialized or targeted search tools and combing
techniques, and to rely on your own bots as well when searching for your various targets.
Note that you can also easily search for, and find, targets
that no longer
exist :-)
A useful tool for comparing results from Google and Yahoo:
http://www.langreiter.com/exec/yahoo-vs-google.html?q=searchlores