Search Engines' Databases
There are many specialized searchable databases that can be queried
over the World Wide Web through a search box on a web page (for
example, the UCB Library Catalog Pathfinder or any other library
catalog, or statistics databases that are searchable on the web). The
terms entered in the search box are sent to that specialized database,
and the results are returned in another web page that is generated
dynamically in response.
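The request/response cycle described above can be sketched in Python, with an in-memory SQLite table standing in for the specialized database. The table, its sample rows, and the `build_catalog` and `search_page` functions are hypothetical; a real catalog such as Pathfinder uses its own schema and software. The point is only that the results page does not exist until the query runs.

```python
import html
import sqlite3


def build_catalog() -> sqlite3.Connection:
    """Create a small in-memory catalog (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
    conn.executemany(
        "INSERT INTO books VALUES (?, ?)",
        [
            ("Web Search Basics", "Author A"),
            ("Library Cataloging Handbook", "Author B"),
            ("Applied Statistics", "Author C"),
        ],
    )
    return conn


def search_page(conn: sqlite3.Connection, terms: str) -> str:
    """Send the search terms to the database and build an HTML results page on the fly."""
    # Parameterized query: the terms are passed as data, never spliced into the SQL text.
    rows = conn.execute(
        "SELECT title, author FROM books WHERE title LIKE ? ORDER BY title",
        (f"%{terms}%",),
    ).fetchall()
    # Escape database values so they cannot inject markup into the generated page.
    items = "".join(
        f"<li>{html.escape(title)} ({html.escape(author)})</li>"
        for title, author in rows
    )
    return (
        f"<html><body><h1>Results for {html.escape(terms)}</h1>"
        f"<ul>{items}</ul></body></html>"
    )


if __name__ == "__main__":
    print(search_page(build_catalog(), "Library"))
```

Because the page is assembled per query, a crawler that only follows links never sees it, which is why such databases often sit outside a general search engine's index.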
Databases provide the content storage for many sites that build their
web pages dynamically around that content, including e-commerce
catalog sites, online news sites, and even entertainment sites. These
databases generally have their own search functions, which may
appear to take the place of a full-text search engine.
Once a search engine finds a new site, it uses a crawler program
to retrieve and index the pages on the site. In general, pages can
only be found when they are reachable through visible hyperlinks.
However, some search engines, such as Google, have started to read
links created in Flash.
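A crawler of the kind described above can be sketched as a breadth-first traversal that only follows ordinary `<a href>` links. The `crawl` function, the `fetch` callback, and the in-memory example site are hypothetical; a production crawler would also fetch pages over HTTP, honor robots.txt, and throttle its requests. Links built in Flash or script are invisible to this sketch, which is exactly the limitation the paragraph describes.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags, i.e. ordinary visible hyperlinks."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the current page's URL.
                    self.links.append(urljoin(self.base_url, value))


def crawl(start_url: str, fetch) -> dict[str, list[str]]:
    """Breadth-first crawl from start_url; returns an index of page -> outgoing links.

    `fetch` is any callable mapping a URL to its HTML, or None if unavailable.
    """
    index: dict[str, list[str]] = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html_text = fetch(url)
        if html_text is None:
            continue  # unreachable page: nothing to index
        parser = LinkExtractor(url)
        parser.feed(html_text)
        index[url] = parser.links
        for link in parser.links:
            if link not in seen:  # enqueue each discovered URL only once
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical site: orphan.html exists but no page links to it.
SITE = {
    "http://example.com/": '<a href="/about.html">About</a> <a href="products.html">Products</a>',
    "http://example.com/about.html": '<a href="/">Home</a>',
    "http://example.com/products.html": "<p>No links here.</p>",
    "http://example.com/orphan.html": "<p>Nothing links to this page.</p>",
}

if __name__ == "__main__":
    print(sorted(crawl("http://example.com/", SITE.get)))
```

Running it indexes the home, about, and products pages but never reaches `orphan.html`, because no visible hyperlink points to it.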
Search engine crawlers may weigh a number of different factors
when crawling a site, and many pages from a site may not be
indexed until they gain more PageRank, links, or traffic. A page's
distance from the root directory of a site may also affect whether
it gets crawled, as may other importance metrics.
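The PageRank signal mentioned above can be illustrated with the classic power-iteration formulation, PR(p) = (1 - d)/N + d · Σ PR(q)/L(q) over the pages q linking to p, using the commonly cited damping factor d = 0.85. This is a textbook sketch, not how any current search engine computes or weights its ranking signals, and the example link graph is hypothetical.

```python
def pagerank(
    links: dict[str, list[str]], damping: float = 0.85, iterations: int = 100
) -> dict[str, float]:
    """Classic PageRank by power iteration.

    links maps each page to the pages it links to. Pages with no outgoing
    links (dangling pages) spread their rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the "random jump" share (1 - d) / N.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical site: "deep" links home, but nothing links to "deep".
    graph = {
        "home": ["about", "products"],
        "about": ["home"],
        "products": ["home"],
        "deep": ["home"],
    }
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(f"{page:10s} {score:.4f}")
```

A page with no inbound links, like `deep` here, keeps only the minimum (1 - d)/N share, which is one reason such pages may wait longer to be crawled and indexed.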
Google offers submission programs, such as Sitemaps, for which an
XML feed can be created and submitted. Alternatively, a simple link
from a site that is already indexed will get the search engines to
visit a new site and begin spidering its contents. It can take a few
days or even weeks after such a link is acquired for all the main
search engine spiders to begin indexing the new site, and there is
usually not much that can be done to speed up this process.
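A minimal sketch of generating such an XML feed in the Sitemaps protocol format: a `urlset` root in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace, one `url` element per page with a required `loc` and optional `lastmod`, `changefreq`, and `priority`. The `build_sitemap` helper and the example URLs are hypothetical, and submitting the file happens separately, for example through a search engine's webmaster tools or a `Sitemap:` line in robots.txt.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries: list[dict[str, str]]) -> str:
    """Return a sitemap XML document listing the given URLs.

    Each entry needs a "loc" key; "lastmod", "changefreq", and "priority" are optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in OPTIONAL_TAGS:
            if tag in entry:
                ET.SubElement(url, tag).text = entry[tag]
    # ElementTree escapes characters such as "&" in text, as the protocol requires.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        {"loc": "https://www.example.com/", "lastmod": "2024-01-01", "priority": "1.0"},
        {"loc": "https://www.example.com/about.html", "changefreq": "monthly"},
    ]))
```

The declaration is written by hand rather than via `xml_declaration=True`, because with `encoding="unicode"` ElementTree would declare the platform's locale encoding instead of UTF-8.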
A few search engines, such as Yahoo!, offer a paid submission
service that guarantees crawling for either a set fee or a
cost-per-click (CPC) charge. Such programs usually guarantee
inclusion in the database but do not guarantee a specific ranking
within the search results.