Submission means filling out a form on a
search engine's site to invite the engine to add your site to
its index. What many people don't realize is that this
is unnecessary. Engines find what's on the web by
following links. As long as there's a link to your site
from any site that's already in the search engines, the
engines will find your site. If you don't have any
incoming links, you're not going to rank well anyway.
Once your site is listed in an engine, you're in for
good (unless you get kicked out for trying to fool them,
as covered below under Black Hat
SEO). There's never any reason to resubmit your site
once it's already in. Resubmission is a waste of
time.
The overwhelming majority of search traffic comes from
the top five or so search engines. Some companies will
offer to submit your site to "thousands" of search
engines. This is a waste of money. If your site is linked
to from anywhere, you'll get in all the search engines
that matter, automatically, for free.
Search engines use automated robots to follow the
links around the web and grab the content from the web
pages they find. The robots are called spiders,
and when they follow links they're crawling the
web (also called spidering). Google's spider is
called Googlebot, and you'll see it listed as the
user agent in your server logs. Once a search engine has
gathered a site's data and analyzed it, the site is said
to be indexed. To see whether your site is in the
Google index, search Google for
site:yourdomain.com.
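If you'd like to see Googlebot's visits for yourself, here's a minimal sketch that scans server log lines for its user agent. It assumes your logs use the common Apache/Nginx "combined" format, where the user agent is the last quoted field on each line. The function name and the sample lines are made up for illustration. Keep in mind that anyone can send a user-agent string containing "Googlebot", so a match is a hint, not proof.

```python
import re

# Assumption: "combined" log format, where the user agent is the
# last double-quoted field on each line.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def googlebot_hits(log_lines):
    """Return the log lines whose user-agent field mentions Googlebot."""
    hits = []
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if match and "googlebot" in match.group(1).lower():
            hits.append(line)
    return hits


# Made-up sample lines, for illustration only.
sample_log = [
    '66.249.66.1 - - [10/Jan/2005:06:25:14 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Jan/2005:06:26:02 +0000] "GET /about.html HTTP/1.1" '
    '200 2048 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

for hit in googlebot_hits(sample_log):
    # In the combined format, the seventh whitespace-separated field is the path.
    print(hit.split()[6])
```

To run it against a real log, pass an open file object instead of the sample list.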
New sites don't always get listed right away. In some
cases it can take several months for a new site to show
up in the SERPs. Even when a site gets in the index, many
believe that Google puts new sites "in the
sandbox" and won't let them rank well for the initial
few months. Jennifer Laycock has a
better explanation: New sites can rank fine if
there's not much competition for that topic, but Google
will assume that a new site in an established,
competitive market isn't any better than the tons of
sites already there, unless that site proves itself to be
superior. (More in this
discussion thread.) The sandbox issue has been
discussed on WebmasterWorld ad nauseam. (Searching
WebmasterWorld for
all pages mentioning the sandbox results in nearly
1000 hits at present.) Here's a small sampling of
threads from 2004: Sept. 8, Nov. 20, Nov. 23, Nov. 29,
Dec. 2, Dec. 9.
Once a site is in a search engine, the engine's
spider will periodically revisit it and re-index it from
scratch. The engines understand that the Internet is
dynamic and changing, so they constantly re-evaluate the
pages in their indices. So not only will every engine
probably find your site on its own the first time, it
will keep visiting it over and over again on its own,
too.
Google appears to visit most pages in its database
about once a month, though some pages wait longer. Some pages
get visited every day. Sites with a higher PageRank
(i.e., sites that have a lot of inbound links from other
sites) get spidered more frequently than sites with a low
PR. And sites which update more frequently get spidered
more often than sites which rarely make updates. You can
try to invite more frequent spider visits by updating
your pages more often, even if the changes
themselves are minor, though the advantage of
doing so is questionable. This won't
necessarily let you test your page ranking ideas through
trial and error any faster because even if an engine
spiders your new content to see what you have on your
page, it won't necessarily figure out how those changes
should affect your rank for weeks or months. And
of course, more frequent spider visits by themselves do
nothing for your rankings.
Search engines find pages by following HTML
links. As long as the pages on your site are linked
up properly, the engines will find them. But if your pages
aren't linked properly, your pages will never make it
into the index. Here are some typical things that can
cause an engine to fail to find your pages.
1. Links are done in JavaScript. Many engines
don't follow links done in JavaScript, such as those
found in drop-down menus. If you have JavaScript links,
make certain you also have text links somewhere on the
page. It doesn't hurt to have JavaScript links as
long as you also have plain links on the page.
2. Links are done in Flash. Many engines can't
follow links in Flash. If you have Flash links, make
certain you also have text links somewhere on the
page.
3. Orphaned pages. If you forget to link to a
certain page from at least one other page, that page
is said to be orphaned. An engine can't find it
because it can't follow a link to it. Make sure every
page on your site is linked to from at least one other
page.
4. Dynamic pages. Search engines can generally
follow dynamic URLs (those with a question mark) as long
as they have only one or two parameters. Three parameters
-- hard to say. Four or more parameters is probably
pushing it. But even if the engines can follow dynamic
URLs, that doesn't necessarily mean that those pages will
rank well. Two noted experts stated
flatly in 2003 that pages with dynamic URLs rank
worse than those without. It's unclear whether that's
true today, but many webmasters aren't taking chances:
They're using the mod_rewrite module of the Apache web
server software to turn dynamic URLs into static ones.
There are many threads
on WebmasterWorld about how to do this.
If you prefer not to turn your dynamic URLs into
static ones, you should at least put the most important
parameters in your URLs first. There's some feeling that
the engines may try the first two or three parameters and
ignore the rest. For example, if your URL was:
http://domain.com?language=Eng&user=4873&style=15&article=238
Then instead try:
http://domain.com?article=238&language=Eng&user=4873&style=15
Incidentally, here's an article that gives other
reasons for removing the query string from URLs.
5. Site is down. An engine can't index a site
if it's down. Make certain you use a reliable webhost.
(You can also use monitoring software or subscribe to an
automated monitoring service to email, phone, or page you
if your site goes down.) It's unlikely that you'll be
removed from an engine just because your site was down
once when they tried to visit, but if your site is down
for several days that could spell bad news. An engine
doesn't want to list your site in the SERPs if visitors
can't actually get to it.
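Going back to item 4 for a moment: if you generate your dynamic URLs in code, putting the important parameters first is easy to automate. Here's a rough sketch using Python's standard urllib.parse module. The function name prioritize_params is made up for this example, and whether engines really ignore later parameters is, as noted above, only a feeling.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def prioritize_params(url, important):
    """Move the query parameters named in `important` to the front, in that order."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    rank = {name: position for position, name in enumerate(important)}
    # sorted() is stable, so parameters not listed keep their original order.
    ordered = sorted(pairs, key=lambda pair: rank.get(pair[0], len(rank)))
    return urlunsplit(parts._replace(query=urlencode(ordered)))


# The example URL from item 4, with "article" treated as the important parameter.
print(prioritize_params(
    "http://domain.com?language=Eng&user=4873&style=15&article=238",
    ["article"],
))
```

Only the order of the query string changes, so your scripts should keep working without modification.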
Images and Frames. Search engine spiders can
follow image links and links in framesets just fine --
despite what you sometimes read on the net.
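To check your own site for the link problems described above, you can imitate a very simple spider. The sketch below is not how any real engine works. It only collects the plain HTML links on each page (including links wrapped around images and frame sources), ignores anything that needs JavaScript, and reports pages that no other page links to. The names LinkCollector and find_orphans, the sample site, and the assumption that links are written as bare file names are all just for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect targets a simple spider could follow: a/area href and frame/iframe src."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("a", "area") and attrs.get("href"):
            self.links.add(attrs["href"])
        elif tag in ("frame", "iframe") and attrs.get("src"):
            self.links.add(attrs["src"])


def find_orphans(pages, entry="index.html"):
    """pages maps a file name to its HTML. Return pages no other page links to."""
    linked = set()
    for name, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        linked |= collector.links - {name}  # a page linking to itself doesn't count
    # The entry page is where the spider starts, so it is never reported.
    return sorted(set(pages) - linked - {entry})


# Made-up three-page site: news.html is reachable only through JavaScript.
site = {
    "index.html": (
        '<a href="about.html"><img src="about.gif" alt="About us"></a>'
        '<a href="#" onclick="location=\'news.html\'">News</a>'
    ),
    "about.html": '<a href="index.html">Home</a>',
    "news.html": '<a href="index.html">Home</a>',
}

print(find_orphans(site))  # ['news.html']
```

The image link to about.html is found without trouble, while the JavaScript-only link to news.html leaves that page looking orphaned.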
Can the spider read your page? Remember
that even if a search engine can find a page, it
might not be able to figure out what that page is about.
Spiders eat words, so they have to be able to see the
words on your site in order to index them. Spiders can't
read the text that's in graphics. Any text that you want
the spiders to read and index should be written out as
text. At the very least, put any text that appears in
graphics into the images' ALT attributes. Spiders are getting
better at reading the text that's in Flash but they're
still not very good at it. Make sure any Flash page you
have has a "Skip this intro..." link that takes visitors
(and spiders) to the text-rich content of your site.
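One way to get a feel for what a text-only spider can read is to strip a page down to its words. Here's a minimal sketch, assuming a spider that indexes body text plus image ALT text and ignores scripts and style sheets. Real spiders are more sophisticated, and the names VisibleText and indexable_words are made up for this example.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Gather the words a text-only spider could index: page text plus image ALT text."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.words.extend(alt.split())

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside script and style elements is not visible content.
        if not self._skip_depth:
            self.words.extend(data.split())


def indexable_words(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.words


# Made-up page: the logo has ALT text, the slogan graphic does not.
page = (
    '<h1><img src="logo.gif" alt="Acme Widgets"></h1>'
    '<img src="slogan.gif">'
    '<p>Hand-made widgets.</p>'
)

print(indexable_words(page))  # ['Acme', 'Widgets', 'Hand-made', 'widgets.']
```

Whatever the slogan graphic says never shows up in the word list, which is exactly the problem with putting important text in images.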
Now go to Part 4: Choosing
good keywords