Hello from D.C.

I know many LMNet readers visit ResourceShelf on a regular basis. Thanks! Those of you who do not stop by, please do.
http://www.resourceshelf.com
It's updated daily.

Today, I posted an article about "collapsed results" from web engines that I think is worthy of a direct post to the list. You can find a hyperlinked version with plenty of examples at:
http://www.resourceshelf.com

Collapsing Results With Web Search Engines

A couple of weeks ago, Bill Dedman, a Pulitzer Prize-winning journalist who also compiles PowerReporting.Com, reminded me about the way results are presented on just about all general web engines. Nothing new here, but it's often forgotten by many searchers, including me. Before we get started, I need to thank my friend and colleague Greg Notess for some additional facts about site collapsing that were incorporated into the article.

-- The Issue

Did you realize that in many cases you are seeing only two results from any one web site when you do a search? Result page "clustering" or "collapsing" is done to reduce visible duplicates, but it can also cause you to miss useful material.

-- Examples

Let's do a Google search for "National Library of Canada" and "British Library". You'll notice that the third and fourth results come from the Library of Congress site (Loc.Gov). The second result from a particular site is always indented. However, the Loc.Gov site has many more pages that might be of interest. Because of site collapsing/clustering, Google shows only two results from any one web site. YOU MUST click the "see more results" link to view all of the hits from the Loc.Gov site that contain your search terms. When you do, you'll find 139 more hits after Google creates and runs a site-restricted search.

To "turn off" the site collapsing feature with Google, add &filter=0 to the end of a Google search URL.
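
For readers who script their searches, here is a minimal Python sketch of the two ideas above: adding filter=0 to a search URL, and what "two results per site" collapsing does to a ranked list. The function names with_filter_off and collapse_by_site are hypothetical, the example search URL and its q parameter are only illustrative, and the two-per-site rule is a simplified model of the behavior described here, not any engine's actual code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_filter_off(search_url):
    """Return search_url with filter=0 set in its query string (hypothetical helper)."""
    parts = urlsplit(search_url)
    # Drop any existing "filter" parameter so it is never duplicated.
    params = [(key, value)
              for key, value in parse_qsl(parts.query, keep_blank_values=True)
              if key != "filter"]
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))


def collapse_by_site(ranked_urls, per_site=2):
    """Keep at most per_site URLs per host, preserving ranked order.

    A simplified model of site collapsing: later hits from a host that has
    already reached its limit are hidden from the list.
    """
    shown_per_host = {}
    kept = []
    for url in ranked_urls:
        host = urlsplit(url).netloc.lower()
        shown_per_host[host] = shown_per_host.get(host, 0) + 1
        if shown_per_host[host] <= per_site:
            kept.append(url)
    return kept


if __name__ == "__main__":
    # Illustrative URL only; the query string is an assumption.
    print(with_filter_off("http://www.google.com/search?q=%22British+Library%22"))
    # Made-up result list: three hits from one host, one from another.
    hits = ["http://loc.gov/a", "http://loc.gov/b",
            "http://loc.gov/c", "http://bl.uk/x"]
    print(collapse_by_site(hits))
```

In the made-up list, the third loc.gov page is dropped from the collapsed list even though it matched the search, which is exactly the kind of hit that the "see more results" link brings back.
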
With Google, if a search returns fewer than 800 or so results, you can also go to the last page of the results and click on the "repeat the search with the omitted results included" link.

-- Other Engines

* AllTheWeb also collapses results by default but offers you the option to turn this function off. From the AllTheWeb home page, click "Customize Preferences", then "Advanced Settings", then "Site Collapsing". Unlike Google, AllTheWeb will include clustered results later on in the results list. In other words, pages that have been clustered will show up later, in their relevance-ranked position, at least sometimes. However, many people look at only the first few, very few, results.

* AltaVista and Teoma also collapse results. To turn the collapse function off with AV, use the check box on the Advanced Search page. Teoma offers no option to turn it off but does offer a "see more results" link below the second hit.

* By default, MSN Search shows all results, but you can limit to only one result per site (with no link to view all material) by selecting the box on the Advanced Page.

Bottom Line: Be aware of this issue, and use the "see more results" link to view all of the content from a specific web site.

== ==

I'm also including some information about AltaVista's news search engine.

2) AltaVista News, Searching Beyond Thirty Days
http://news.altavista.com

News Search from AltaVista continues to improve and is turning into a favorite tool. Have you noticed that AV News now offers an option to limit by date or date range? Although I don't recommend limiting by date for general web searching, it can be very useful with news, since every article has a specific publication date associated with it. One of the date limits at AV is "anytime". Another limit allows you to search using a range of dates.

What does Anytime mean? While many news engines contain only about 2-4 weeks of news, AltaVista's archive goes back well beyond 30 days. This doesn't mean it's time to start canceling your fee-based services.
If older content is available, it's because the various news organizations are keeping the links active. AV checks the URLs regularly to see if links are still "hot". In a time of declining budgets, we might as well make the most of the free and low-cost content that is still available.

More Specifics

Andreas Hartmann from AV tells ResourceShelf, "The archive contains approximately 4 million URLs (of fully indexed articles from a variety of sources) which are older than 30 days. URLs are checked every 2-4 weeks for 404s or other issues." Content comes from several sources, including a Moreover feed. Additionally, AV is now crawling selected news sites on its own.

Finally, you can use all of AV's advanced syntax with news search. This includes the proximity operators NEAR and Within. Search Engine Showdown has a complete list of the AV syntax.

Again, hyperlinked versions of these stories are at:
http://www.resourceshelf.com

cheers,
gary

--
Gary D. Price, MLIS
Librarian
Gary Price Library Research and Internet Consulting
Visit The ResourceShelf
http://www.resourceshelf.com

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-
All LM_NET postings are protected by copyright law.
To change your LM_NET status, e-mail to: listserv@listserv.syr.edu
In the message write EITHER:
1) SIGNOFF LM_NET
2) SET LM_NET NOMAIL
3) SET LM_NET MAIL
4) SET LM_NET DIGEST
* Allow for confirmation.
LM_NET Help & Information: http://ericir.syr.edu/lm_net/
Archive: http://askeric.org/Virtual/Listserv_Archives/LM_NET.shtml
LM_NET Select/EL-Announce: http://www.cuenet.com/archive/el-announce/
LM_NET Supporters: http://ericir.syr.edu/lm_net/ven.html
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--=-=-=-=-