LM_NET Archive



Hello from D.C.

I know many LMNet readers visit ResourceShelf on a regular basis. Thanks!

Those of you who do not stop by, please do.
http://www.resourceshelf.com

It's updated daily.

Today, I posted an article about "collapsed results" from web engines that I
think is worthy of a direct post to the list.
You can find a hyperlinked version with plenty of examples at:
http://www.resourceshelf.com

Collapsing Results With Web Search Engines
A couple of weeks ago, Bill Dedman, a Pulitzer Prize-winning journalist who
also compiles PowerReporting.Com, reminded me about the presentation of results
on just about all general web engines. Nothing new here, but it's often
forgotten by many searchers, including me. Before we get started, I need to
thank my friend and colleague Greg Notess for some additional facts about site
collapsing that were incorporated into the article.
--
The Issue
Did you realize that in many cases you are seeing only two results from any one
web site when you do a search? Result page "clustering" or "collapsing" is done
to help reduce visible duplicates, but it can also cause you to miss useful
material.
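To make the idea concrete, here is a minimal Python sketch of per-site collapsing. It assumes a "site" means the URL's hostname and uses a limit of two hits per site, as described above; the engines don't publish their exact rules, so this is an illustration, not how Google or anyone else actually implements it. The function name collapse_by_site is hypothetical.

```python
from urllib.parse import urlsplit


def collapse_by_site(ranked_urls, per_site=2):
    """Show at most `per_site` hits per host, keeping the ranked order.

    Returns (shown, omitted): the visible URLs, and a dict mapping each
    host to the number of its hits that were hidden.
    """
    shown = []
    hits_per_host = {}
    omitted = {}
    for url in ranked_urls:
        # Assumption: the hostname (e.g. www.loc.gov) counts as the "site".
        host = urlsplit(url).hostname or ""
        hits_per_host[host] = hits_per_host.get(host, 0) + 1
        if hits_per_host[host] <= per_site:
            shown.append(url)
        else:
            # These are the hits you only reach via "see more results".
            omitted[host] = omitted.get(host, 0) + 1
    return shown, omitted
```

Setting per_site=1 models a one-result-per-site limit, where the extra hits are simply hidden.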
--

Examples
Let's do a Google search for "National Library of Canada" and "British
Library". You'll notice that the third and fourth results come from the Library
of Congress site (Loc.Gov). The second result from a particular site is always
indented. However, the Loc.Gov site has many more pages that might be of
interest. Because of site collapsing/clustering, Google shows only two results
from any one web site. YOU MUST click the "see more results" link to view all
of the hits from the Loc.Gov site that contain your search terms. When you do,
you'll find 139 more hits after Google creates and runs a site-restricted
search. To "turn off" the site collapsing feature with Google, add &filter=0 to
the end of a Google search URL. If there are fewer than 800 or so results, you
can also go to the last page of the results and click the "repeat the search
with the omitted results included" link.
--
Other Engines
* AllTheWeb also collapses results by default but offers you the option to turn
this function off. From the AllTheWeb home page, click "Customize Preferences",
then Advanced Settings, then "Site Collapsing". Unlike Google, AllTheWeb
includes clustered results later on in the results list. In other words,
pages that have been clustered will show up later in the list at their
relevance-ranked position, at least sometimes. However, many people look only
at the first few results.
* AltaVista and Teoma also collapse results. To turn the collapse function off
with AV, use the check box on the Advanced Search page. Teoma offers no option
but does offer a "see more results from" link below the second hit.
* By default, MSN Search shows all results, but you can limit it to only one
result per site (with no link to view all material) by selecting the box on the
Advanced Search page.

Bottom Line: Be aware of this issue, and use the "see more results" link to
view all of the content from a specific web site.

==
==
I'm also including some information about AltaVista's news search engine.


2) AltaVista News, Searching Beyond Thirty Days
http://news.altavista.com

News Search from AltaVista continues to improve and has become a favorite tool.
Have you noticed that AV News now offers an option to limit by date or date
range? Although I don't recommend limiting by date for general web searching, it
can be very useful with news, since every article has a specific publication
date associated with it. One of the date limits at AV is "anytime". Another limit
allows you to search using a range of dates. What does "anytime" mean? While
many news engines contain only about 2-4 weeks of news, AltaVista's archive goes
back well beyond 30 days. This doesn't mean it's time to start canceling your
fee-based services. If older content is available, it's because the various news
organizations are keeping the links active. AV checks the URLs regularly to see
if links are still "hot". In a time of declining budgets, we might as well
maximize what free and low-cost content is still available.
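Because every news article carries a publication date, a date limit is a simple filter. This Python sketch shows the two kinds of limit mentioned above: a date range, and "anytime" (no limit at all). The Article record and the filter_by_date function are hypothetical illustrations; they say nothing about how AltaVista implemented its date limits.

```python
from datetime import date
from typing import NamedTuple, Optional


class Article(NamedTuple):
    # Hypothetical record: a headline plus its publication date.
    title: str
    published: date


def filter_by_date(articles, start: Optional[date] = None,
                   end: Optional[date] = None):
    """Keep articles published between start and end, inclusive.

    A bound of None leaves that side open; leaving both as None is the
    "anytime" case, which returns every article.
    """
    return [
        a for a in articles
        if (start is None or a.published >= start)
        and (end is None or a.published <= end)
    ]
```

An "anytime" search is then only as deep as the archive behind it, which is why AltaVista's retention beyond 30 days matters.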

More Specifics
Andreas Hartmann from AV tells ResourceShelf, "The archive contains
approximately 4 million URLs (of fully indexed articles from a variety of
sources) which are older than 30 days. URLs are checked every 2-4 weeks for
404s or other issues." Content comes from several sources, including a Moreover
feed. Additionally, AV is now crawling selected news sites on its own.
Finally, you can use all of AV's advanced syntax with news search. This
includes the proximity operators NEAR and Within. Search Engine Showdown has a
complete list of the AV syntax.
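For readers new to proximity operators, this Python sketch shows the basic test behind a NEAR-style search: do two words occur within a fixed number of words of each other? AltaVista's NEAR is commonly described as a ten-word window, so 10 is used as the default here, but treat that figure, the word-splitting rule, and the function name near as assumptions. The sketch handles single words only and ignores word order.

```python
import re


def near(text, term_a, term_b, window=10):
    """Return True if single-word terms term_a and term_b occur at most
    `window` word positions apart anywhere in text (order ignored)."""
    # Assumption: a "word" is a run of letters or digits, case-insensitive.
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= window for i in positions_a for j in positions_b)
```

For instance, in "The British Library and the Library of Congress", british and congress are six words apart, so they match with the default window but not with window=3.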


Again, you can find hyperlinked versions of these stories at:
http://www.resourceshelf.com

cheers,
gary



--
Gary D. Price, MLIS
Librarian
Gary Price Library Research and Internet Consulting

Visit The ResourceShelf
http://www.resourceshelf.com

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-
All LM_NET postings are protected by copyright law.
To change your LM_NET status, e-mail to: listserv@listserv.syr.edu
In the message write EITHER: 1) SIGNOFF LM_NET  2) SET LM_NET NOMAIL
3) SET LM_NET MAIL  4) SET LM_NET DIGEST  * Allow for confirmation.
LM_NET Help & Information: http://ericir.syr.edu/lm_net/
Archive: http://askeric.org/Virtual/Listserv_Archives/LM_NET.shtml
LM_NET Select/EL-Announce: http://www.cuenet.com/archive/el-announce/
LM_NET Supporters: http://ericir.syr.edu/lm_net/ven.html
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--=-=-=-=-
