I use mostly fast,
teoma
and google for my broad queries.
But there are MANY other good 'global' search engines, like for instance
hotbot, which can come in handy
at times.
See [elsewhere] on my site
WHICH search engines or
bots you should use for your specific queries. The number of
retrieved pages, given in parentheses,
will of course vary each time you re-perform each example search. Note also
how some of the examples given here represent
quite useful "links factories" per se.
1)USE VARIOUS RESOURCES!
If I were to give only ONE piece of advice, it would be this one. It is even more important than "keeping on
track" (see below). Never, never, never overestimate your search tool of choice. EACH search
engine, [main],
[regional]
or [local]
has its own quirks and its own blindness patterns ("shadows").
Every time I 're-play' a given
search on some specific "free-pages depository" local search engine (à la geocities) I get
amazing results... Thus you should
NEVER 'stick' to a 'given' search engine 'of choice'. Learn how much they differ and - even more
important - understand how much the result set of each given search engine changes over time! The web is a quicksand,
and search engines' databases AND POLICIES are continuously changing as well. Altavista and
ftpsearch, for instance,
are now actively 'censoring'
results. Try a good search there for MP3, DVD reversing, Napster, Gnutella or
Infraseek and you'll quickly
see what I mean.
2)KEEP ON TRACK!
Nothing is easier than to lose your thread when you are working on
the web. The examples used on my site represent
links to
interesting (I hope) searches / places /startpoints as well.
As you'll soon realize, the examples and links
offer you continuous opportunities to leave this site in
order to
browse to other
very promising ones. This is done on purpose: The hyper bastard approach to web page building is -on most sites-
to restrict click away opportunities to a bare minimum. Even when a
reference demands a link, new methods hide or reduce the
visibility of that same link. Everything in order to keep a visitor 'caged' or 'trapped'
in a given site. I'll do the exact contrary, since you must
learn some discipline if you're going to be a good seeker. You leave my site for good while
searching for a target?
Good riddance. My links will offer you a
lot of added
knowledge AND will at the same time test your
capability to keep on track :-)
3)LOWERCASE
Always enter your search terms in lower case (unless you
want to limit your search).
Most search engines will thus find both upper and lower case
occurrences of your search string. "pAris" (24) is
NOT the same as "paris" (4110800).
4)EXACT SEQUENCE [""]
Enclose terms in double quotation marks if you want to retrieve
those exact terms in that exact sequence. This may be very
useful in order to find a specific page. Thus "searchengines" will give
you (20940) pages with the two terms 'glued' together. Similarly
"saerch engine" will
retrieve some (11) pages WITH THIS SAME MISSPELLING ERROR.
5)NARROW DOWN [ AND | & | + ] and ELIMINATE MERCILESSLY [ AND NOT | ! | - ]
Narrow your searches by linking your search terms with AND or &,
or simply use the plus sign [+].
The search engine will find only those pages that contain all of
your search terms.
Similarly, exclude pages that are not relevant to your search by
preceding the search term with AND NOT or ! (in Altavista's syntax | means OR), or simply use the
minus sign [-].
+"search engines" +hints +tips +techniques -tits -sex -"make money" (5200)
is better than the simpler
+"search engines" +hints +tips +techniques (7700)
DOWNSIDE OF THE + & - SIGN
With the + sign you may miss related documents that don't have the
words you specify as required. For example, the search
"searching tips" +searchlores would not include documents
that have the words "searching tips", but not searchlores.
With the - sign it's easy to exclude too much. For example, if you were
looking for information on "bots scripts" but not in
javascript, the search +"bots scripts" -javascript would exclude
a document that was all about bots scripts, but that had
the sentence "this kind of bot would be impossible in javascript".
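The exclusion pitfall just described can be sketched in a few lines of Python. This is only a toy matcher under loud assumptions: the function name `matches` and both sample documents are invented here, and matching is plain case-insensitive substring search, which is not how any real search engine works.

```python
def matches(document, required=(), excluded=()):
    """Toy +/- filter: every required term must occur, no excluded term may occur.

    Assumption: plain case-insensitive substring matching; real engines
    tokenize and index documents quite differently.
    """
    text = document.lower()
    has_all_required = all(term.lower() in text for term in required)
    has_no_excluded = not any(term.lower() in text for term in excluded)
    return has_all_required and has_no_excluded


# Two hypothetical documents, both entirely about bots scripts.
doc_a = "A page all about bots scripts written in perl."
doc_b = ("A page all about bots scripts. "
         "This kind of bot would be impossible in javascript.")

# Equivalent of the query: +"bots scripts" -javascript
kept_a = matches(doc_a, required=["bots scripts"], excluded=["javascript"])
kept_b = matches(doc_b, required=["bots scripts"], excluded=["javascript"])
```

Here `kept_a` is true while `kept_b` is false: the second page is lost only because of one passing mention of the excluded word.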
6)DOWNSIDE OF THE BOOLEAN OPERATORS
It's often difficult to specify exactly what you want to include
or exclude.
You can also get unexpected results if you are not careful about
your use
of operators and parentheses. For example, the search seeking OR
searching
AND finding is the same as the search seeking OR (searching AND
finding).
Both queries will find documents that contain both searching and
finding,
together with documents that contain the word seeking. However,
the query
(seeking OR searching) AND finding is not the same. It will find
documents
containing the word finding and, in the same document, either
seeking or
searching.
Be careful with the
boolean operators!
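The precedence trap can be tried out directly in Python, whose own `and` binds tighter than `or`, just like the AND and OR described above. A minimal sketch: the helper `contains` and the sample document are made up for illustration, and a "match" here only means the word appears among the whitespace-separated words of the text.

```python
def contains(document, word):
    """Assumption: a 'match' is simply the word appearing among the document's words."""
    return word in document.lower().split()


def query_without_parentheses(document):
    # seeking OR searching AND finding
    # AND binds tighter than OR, so this reads: seeking OR (searching AND finding)
    return (contains(document, "seeking")
            or contains(document, "searching") and contains(document, "finding"))


def query_with_parentheses(document):
    # (seeking OR searching) AND finding
    return ((contains(document, "seeking") or contains(document, "searching"))
            and contains(document, "finding"))


# Hypothetical document: it has 'seeking' but lacks 'finding'.
doc = "seeking is an art"
```

The first query accepts `doc` (the word seeking alone is enough), the second rejects it (finding is missing).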
7)"PECULIAR" strings
You should always strive to use differentiating keywords when
searching
the web. Words that are commonly used will not help you much.
Extremely common words like articles and prepositions are so
worthless that they are
completely ignored. Try to use words which underline
the peculiarity of your target. Common words, when combined with
boolean qualifiers, can be very effective. You must identify the
main concepts in your topic and determine any synonyms,
alternate spellings, or variant word forms for the concepts.
Remember that the more "peculiar" a word, the more useful
it will be in order to sharpen your search.
+title:"search strateg*" +hints +tips
In this case we included the "search strateg*" string (which
already has
an elevated PEC) in the title: keyword.
8)SPECIAL KEYWORDS
Note the use of a keyword in the previous example.
Here is a short list of the main keywords (for altavista):
anchor:text
applet:class
domain:domainname (to avoid commercial crap, exclude with -"domain:com")
host:name
image:filename
link:URLtext
text:text
title:text (very useful for narrowing)
url:text
9)ASTERISK[*]
Note also the use of the asterisk [*] in
the previous example: it MUST be used after at least 3 characters,
and it stands for up to 5 characters; it can also be used as an element of a phrase. For
Altavista:
Asterisk (*): after 3 specified characters, will search for matches in up to 5 trailing letters.
Question Mark (?): after 3 specified characters, will match exactly one more character.
Double Asterisk (**): more flexible, as it will search for matches for an unlimited number of trailing characters.
You also have the ability to use the wildcards interchangeably and more
than once in the same search string.
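To get a feeling for what each wildcard accepts, the rules listed above can be translated into regular expressions. This is a rough sketch, not Altavista's actual matcher: the function name `wildcard_to_regex` is invented, and the reading of the rules (what counts as a "letter" or a "character") is my own assumption.

```python
import re


def wildcard_to_regex(pattern):
    """Translate the wildcard rules listed above into a regular expression.

    Assumptions: '*' = 0 to 5 trailing letters, '?' = exactly one character,
    '**' = any number of trailing characters, and at least 3 literal
    characters must precede the first wildcard.
    """
    first_wildcard = min(
        (i for i in (pattern.find("*"), pattern.find("?")) if i != -1),
        default=len(pattern),
    )
    if first_wildcard < 3 and first_wildcard != len(pattern):
        raise ValueError("a wildcard needs at least 3 leading characters")

    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(r"\w*")         # unlimited trailing characters
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[a-z]{0,5}")  # up to 5 trailing letters
            i += 1
        elif pattern[i] == "?":
            parts.append(r"\w")          # exactly one more character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)
```

With this reading, `wildcard_to_regex("strateg*").fullmatch(word)` accepts "strategy" and "strategies" but not "strategically" (six trailing letters), while "strateg**" accepts all three.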
10)ARCHIVE
You should archive your useful queries and repeat them over time.
Any query whose result URL contains the "cgi-bin" snippet
can be saved and used again later. Since the results of all queries VARY
WITH TIME (when traffic is particularly heavy the search engines "cut" the results) you
would be well advised, for important queries, to repeat them again and again.
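One possible way to keep such an archive is a small JSON file that records, for each saved query URL, when you ran it and how many pages it returned. A minimal sketch: the function name `archive_query` and the file layout are invented for this example, and fetching the result count from the engine is left out on purpose.

```python
import json
import time
from pathlib import Path


def archive_query(archive_path, query_url, result_count):
    """Append one observation of a saved query to a JSON archive file.

    Assumption: the archive layout (a dict mapping each query URL to a list
    of [timestamp, result_count] pairs) is invented for this sketch.
    """
    path = Path(archive_path)
    archive = json.loads(path.read_text()) if path.exists() else {}
    archive.setdefault(query_url, []).append([time.time(), result_count])
    path.write_text(json.dumps(archive, indent=2))
    return archive
```

Calling it each time you replay a query builds a history per URL, so you can see at a glance how the result count drifts over time.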
11)STOP WORDS
Stop words are words such as "and" "the" and "or" which search
engines exclude from their searches to make them more effective.
These terms are excluded because they are either extremely common or they
are used by the search engine for performing more specialized
searches. Just think about how many documents on the Web
contain the word "the" and you'll understand how important a good stop-word list is for
all search engines.
If you really do want to search for one of these terms, there is an
easy way to work around stop words. By bracketing words in
quotation marks, search engines will look for every word inside
the quotes, in the sequence you specify. Thus, if you wanted to
look for sites with the words search the web
you would use the searchstring "search the web".
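The effect of quoting on stop words can be imitated with a few lines of Python. This is a sketch only: the stop-word list is a tiny made-up sample and `effective_terms` is a hypothetical name, since every engine keeps its own list and its own parsing rules.

```python
import re

# Assumption: a tiny illustrative stop-word list; every engine keeps its own.
STOP_WORDS = {"and", "the", "or", "a", "of"}


def effective_terms(query):
    """Return the terms a stop-word-filtering engine would actually search for.

    Quoted phrases are kept whole; bare stop words outside quotes are dropped.
    """
    terms = []
    # Each match is either a "quoted phrase" (group 1) or a single bare word (group 2).
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            terms.append(phrase)
        elif word.lower() not in STOP_WORDS:
            terms.append(word)
    return terms
```

Unquoted, search the web shrinks to the two terms search and web; quoted, the whole phrase survives as one term, stop word included.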
12)SNOOPING BOLDLY AROUND -1
As you'll learn elsewhere on this site, there are many methods to access
some 'non public' portions of the web.
A quick tip is to look for a file called ROBOTS.TXT in the main directory of your
target site, entering the URL by hand with the following pattern: http://www.targetsite.com/robots.txt
This file is used to tell search engines which directories and files they should
not index on a specific site. Thus anything that has been listed inside a 'robots.txt' file
will normally not be found by your search queries. However, once you have seen the names, you can
still type them directly into your browser in order to access the various subdirectories
and pages.
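Reading the names out of a robots.txt file is easy to automate. The sketch below works on text you have already fetched yourself; the sample content is invented, `disallowed_urls` is a hypothetical helper, and only the Disallow lines are looked at (user-agent groups and wildcards are ignored).

```python
from urllib.parse import urljoin


def disallowed_urls(site_url, robots_txt):
    """List the URLs that a robots.txt asks crawlers not to index.

    Only 'Disallow:' lines are read; user-agent groups, wildcards and
    other directives are ignored in this sketch.
    """
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            urls.append(urljoin(site_url, value.strip()))
    return urls


# Hypothetical robots.txt content, invented for illustration.
sample = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/notes.html  # not ready yet
Disallow:
"""
found = disallowed_urls("http://www.targetsite.com/", sample)
```

`found` then holds the full URLs for /private/ and /drafts/notes.html, ready to be typed (or pasted) into your browser.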
13)GO REGIONAL!
A fantastic tip: go regional, both using local search engines and using
some linguistic tricks that work even for languages you do not
know and/or using ad hoc dictionaries and free translation services. The depth you can reach on a specific long term search using a -say- korean search engine
cannot be surpassed by any google search. Go regional every time you're stuck. You won't regret it!
14)DOWNLOADING FILES FROM BUSY SERVERS
If you are trying to download some (ahem) popular files, you are probably competing
with many other people for access. Pick a server in a country where it is
very early in the morning if you have this option; alternatively schedule
the download so that it will be carried out
when the time in EUROPE or IN THE STATES is early in the morning (GMT 05.00 or GMT 12.00)
or, MUCH MUCH better, use an automatic email downloader like downloadslave
instead (see the
accmail section) and spare yourself the hassle :-)
15)PROXIMITY SEARCHES... HIT PAYLOAD EVERY TIME YOU SEARCH!
Real ~S~eekers use proximity operators quite a lot (for obvious - ahem - reasons) as you'll
learn in the advanced sections of my site.
Altavista uses the NEAR command in order to select keywords
within 10 words of each other, useful but quite limited.
When you seriously worked with
proximity
searches THERE WAS ONLY ONE SEARCH ENGINE FOR YOU: Infoseek, which allowed
you to choose any of the following options... or to combine them... :-)
ADJ/# (# number of words apart - exact: no more, no less)
OADJ (adjacent words in specified order)
OADJ/# (# number of words apart - exact - in specified order)
NEAR (within 25 words)
NEAR/# (within # words)
ONEAR (within 25 words in specified order)
ONEAR/# (within # words in specified order)
FAR (more than 25 words from each other in at least ONE instance)
FAR/# (more than # words from each other in at least ONE instance)
OFAR (more than 25 words from each other in at least ONE instance in specified order)
OFAR/# (more than # words from each other in at least ONE instance in specified order)
As you can imagine, being something useful this has been ELIMINATED from the web :-(
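Since the real thing is gone, here is a small Python sketch of what the NEAR, ONEAR and FAR families of operators do, so you can at least run them over pages you have saved locally. The function names are mine, not Infoseek's, and I am assuming that "within # words" means the word positions differ by at most #, since the exact counting rule is not given above.

```python
def positions(words, term):
    """Indexes at which term occurs in the word list."""
    term = term.lower()
    return [i for i, w in enumerate(words) if w == term]


def near(text, first, second, distance=25, ordered=False):
    """NEAR/# (or ONEAR/# when ordered=True): the two terms occur within
    `distance` words of each other in at least one instance.

    Assumption: "within # words" means the word positions differ by at most #.
    """
    words = text.lower().split()
    for i in positions(words, first):
        for j in positions(words, second):
            gap = j - i
            if ordered and gap < 0:
                continue  # second term comes before the first: wrong order
            if 0 < abs(gap) <= distance:
                return True
    return False


def far(text, first, second, distance=25):
    """FAR/#: the terms are more than `distance` words apart in at least one instance."""
    words = text.lower().split()
    return any(abs(j - i) > distance
               for i in positions(words, first)
               for j in positions(words, second))


# Hypothetical sample text, invented for illustration.
sample = "proximity searches help real seekers hit the target quickly"
```

For instance `near(sample, "proximity", "seekers", distance=4)` succeeds, while the same call with `distance=3` fails, and swapping the two terms with `ordered=True` fails as well.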
16)THE YO-YO TECHNIQUE
The 'down yonder' problem is well known by searchers. Being easily spammed, the main search engines
suffer a terrible drawback: some interesting results may indeed be listed
somewhere, yet the
spam-cram keeps popping up in the first positions
while the juicy targets you are looking for lie buried somewhere
in the huge list, 'down yonder'. There is an interesting approach that can be used
for most main search engines. They are all subject to spam, the most
infamous one being Altavista... Hic alta, hic salta is a well-known proverb among seekers,
meaning that you should jump straight to the 6th or 7th page of results when searching with
Altavista, since all pages that are listed in the first positions are - mostly - irrelevant, or even paid
scum. This approach is called the Yo-yo approach.
For example, searches of 'where is my car'
type are obviously calls for quick answers, and if the
info you need is short, you can even get it in the SERP
summaries (the asterisk (*) trick is nice, and also 'asking the answer').
Yesterday I looked for some time at the metaspy erom mentioned
at the seeker's and saw with my own eyes the query [what
is brittany speers shoe size], I swear! With a quick fix of
the spelling and some quotes you can get ["britney spears" "shoe size *"] faster than
the guy can figure out why google asks him for an alternative spelling,
when his own is undoubtedly right.
Also, a useful thing to mention is what to do when you realize your search strategy is wrong - like:
If you get few or no results in a main SE, switch to metasearch.
If your keywords are too generic, try a SE that can cluster results (kartoo-like) until you learn enough specific keywords about your target.
If your target is very topic specific, find a topic-centric SE.