The Ultimate Guide to the Invisible Web

December 19, 2006

There is information out there that is not (yet) indexed by the big search engines such as Google. This non-indexable part of the Web is called the Dark, Deep, Hidden, or Invisible Web. Fortunately, the Invisible Web is getting easier to search, with tools beyond the standard “big three” search engines. According to a recently published PhD dissertation (Shestakov 2008:5), the query-based, dynamic portion of the Web known as the deep Web remained poorly indexed by search engines even in 2008.

Shestakov refined the distinction between the Deep, Hidden, and Invisible Web:

“There is a slight uncertainty in the terms defining the part of the Web that is accessible via web search interfaces to databases. In the literature, one can observe the following three terms: invisible Web [97], hidden Web [46], and deep Web [25]. The first term, invisible Web, is superior to the latter two terms as it refers to all kinds of web pages which are non-indexed or badly indexed by search engines (i.e., the non-indexable Web). The terms hidden Web and deep Web are generally interchangeable, and it is only a matter of preference which to choose. In this thesis we use the term deep Web and define it as web pages generated as results of queries issued via search interfaces to databases available online. In this way, the deep Web is a large part, but still only a part, of the invisible Web” (Shestakov 2008:5).
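
To make that definition concrete, here is a minimal sketch in Python (not taken from the dissertation, and using hypothetical placeholder endpoints such as example.org/index.html, example.org/search, and a form field named q): a naive crawler can only follow the hyperlinks a page exposes, while a deep-Web result page comes into existence only after a query is submitted to a database’s search interface, so there is no stable, linked URL for the crawler to discover and index.

```python
# Minimal sketch: why query-generated pages elude a hyperlink-following crawler.
# All URLs and the "q" form field below are hypothetical placeholders.

import urllib.parse
import urllib.request
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags -- the only paths a naive crawler follows."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_static(url):
    """Fetch a page and return the hyperlinks it exposes.
    Anything not linked from somewhere is invisible to this kind of crawler."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def query_database(search_url, term):
    """Deep-Web access: the result page is generated only after a form query
    is POSTed, so it has no inbound hyperlink for the crawler above to find."""
    data = urllib.parse.urlencode({"q": term}).encode("utf-8")
    with urllib.request.urlopen(search_url, data=data) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical endpoints, for illustration only.
    surface_links = crawl_static("http://example.org/index.html")
    print("Crawler can reach:", surface_links)

    result_page = query_database("http://example.org/search", "Zygmunt Bauman")
    print("Deep-Web result (never linked, so never indexed):", len(result_page), "bytes")
```

In other words, the surface Web is whatever can be reached by chaining crawl_static-style link fetches; the deep Web, in Shestakov’s sense, is whatever only exists as the response to a query like the one in query_database.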


Garcia claimed that Texas-based university professor Jill H. Ellsworth (d. 2002), an Internet consultant for Fortune 500 companies, coined the term “Invisible Web” in 1996 to refer to websites that are not registered with any search engine. “Ellsworth is co-author with her husband, Matthew V. Ellsworth, of The Internet Business Book (John Wiley & Sons, Inc., 1994), Marketing on the Internet: Multimedia Strategies for the World Wide Web (John Wiley & Sons, Inc.), and Using CompuServe. She has also explored education on the Internet, and contributed chapters on business and education to the massive tome, The Internet Unleashed.”

[S]igns of an unsuccessful or poor site are easily identified, says Jill Ellsworth. “Without picking on any particular sites, I’ll give you a couple of characteristics. It would be a site that’s possibly reasonably designed, but they didn’t bother to register it with any of the search engines. So, no one can find them! You’re hidden. I call that the invisible Web.” Ellsworth also makes reference to the “dead Web”: sites that no one has visited for a long time and that haven’t been regularly updated (Garcia 1996).

I distinguish between the Invisible Web and the Deep Internet. Much of the research promoted by social media continues to focus primarily on business models of marketability, not just findability.

In 2008, the Deep Internet continues to be at cross purposes with the motivations of socially minded authors. Too many foundational texts and articles that could be so useful to robust conversations in civil society are restricted to those with access codes to the deep Internet, the dark place of open source and Web 2.0+. One would hope that writings by and about key thinkers concerned with ethics, economics, psychoanalysis, sociology, cultural studies, and related fields would be made available through the Creative Commons License 3.0, preferred by many engaged thinkers, including many academics, in 2008. Many of the services of the Deep Internet operate on a user-pays, private-sector model. Others are restricted to members of exclusive academic associations, an insular knowledge elite, which also charge obligatory membership fees. JSTOR, for example, keeps its texts behind a paywall, providing only summaries and a small section of text for free.

In a recent online search for biographical information on Zygmunt Bauman, for example, a number of results point to Deep Internet sites such as http://sociologyonline.net. One of the first sources available is http://www.megaessays.com.

Bauman has written that “sociologizing makes sense only in as far as it helps humanity” and that “sociology is first and foremost a moral enterprise.”

“To think sociologically can render us more sensitive and tolerant of diversity. Thus to think sociologically means to understand a little more fully the people around us in terms of their hopes and desires and their worries and concerns” (Bauman & May 2001).


A pioneer in knowledge management, Professor Kim Veltman of SUMS traced a history of major collections of recorded knowledge that changed the world, some of them taking centuries to construct. He argued that commercial offerings, with their short-term, albeit useful and profitable, solutions lack the essential long-term vision. Digital media, full digital scanning and preservation, and electronic networks could enable future generations in every corner of the world to access, study, and appreciate all the significant literary, artistic, and scientific works of mankind. He is concerned that privatization of this communal memory is already underway and, without intervention, will only increase, effectively limiting access to those who have the means. We have the means to shed light on the deep Internet. Is there the will?

“In a world where we make tens and even hundreds of millions of titles available online, readers need digital reference rooms. [T]he good news is that publishers have made many dictionaries, encyclopaedias and other standard reference works available in electronic form. Modern libraries now typically have an online section on Electronic Reference Sources. Special licences with publishers mean that some of these works are available free of charge at libraries and universities. Companies such as XReferplus now offer access to 100 or 150 standard reference works. The less good news is that the electronic versions of these reference works are frequently so expensive that they are beyond the reach of individual scholars. Meanwhile, there has been a trend for such reference works to be owned by a few key companies. In Germany, the pioneer in this field was K. G. Saur, which publishes “nearly 2000 print, microfilm, and electronic formats.” In 1987, Saur was acquired by Reed International. In 2000, it became part of the Gale Group owned by Thomson. In the United States, Dialog, which was founded in 1967 and “provides access to over 9 terabytes or more than 6 million pages of information”, was acquired by the same Thomson Company in 2000. Meanwhile, Bowker, founded in 1872, which publishes Ulrich’s International Periodicals Directory (1932) and Books In Print (1948-), was acquired by Xerox (1967), then Reed International (1981), then by Cambridge Information Group (2001), which has recently also acquired ProQuest Information and Learning (2006). Today, works such as Books in Print are available only to institutions and are no longer available to individual subscribers. Fifty years ago, only the richest libraries could hope to achieve near comprehensive coverage of secondary literature. Today, practically no library can hope to be comprehensive and most collections are retreating. For instance, Göttingen, which had over 70,000 serials in the 1970s, now covers 30,000 serials. The California Digital Library has 21,000 electronic journals, which is impressive until we recall that Ulrich’s Periodicals Index lists 250,000 journals and serials. Meanwhile, at the University of California San Francisco, we find another modern catalogue that looks objective until we look closely and discover that of the 20 headings nine are traditional subjects and the remainder are branches of medicine (Appendix 3) … Ever since Gutenberg went bankrupt from the first printing, it has been obvious that publishers need to be attentive to survival. For a very few companies this is not a problem. For instance, in 2004, Reed Elsevier listed an operating profit of £1126 million and an attributable profit of £675 million. Somewhat disturbing is a trend whereby the world of long-term recorded knowledge is increasingly being framed in terms of short-term business propositions, as if the whole of the public sphere were open to business exploitation” (Veltman 2007:12).

Webliography and Bibliography on the Deep Internet

Bergman, Michael K. 2001. “The Deep Web: Surfacing Hidden Value.” Journal of Electronic Publishing 7(1) (Taking License: Recognizing a Need to Change). Ann Arbor, MI: Scholarly Publishing Office, University of Michigan Library. August.

Ellsworth, Jill H.; Ellsworth, Matthew V. 1994. The Internet Business Book. John Wiley & Sons, Inc.

Ellsworth, Jill H.; Ellsworth, Matthew V. 1997. The Internet Business Book. John Wiley & Sons, Inc.

Ellsworth, Jill H.; Ellsworth, Matthew V. 1995. Marketing on the Internet: Multimedia Strategies for the World Wide Web. John Wiley & Sons, Inc.

Ellsworth, Jill H.; Ellsworth, Matthew V. 1996. Marketing on the Internet: Multimedia Strategies for the World Wide Web. 2nd Edition. John Wiley & Sons, Inc.

Ellsworth, Jill H.; Ellsworth, Matthew V. Using CompuServe. John Wiley & Sons, Inc.

Ellsworth, Jill H. Chapters on business and education in The Internet Unleashed.

Garcia, Frank. 1996. “Business and Marketing on the Internet.” Masthead 9(1). January. Alternate URL at web.archive.org.

Shestakov, Denis. 2008. “Search Interfaces on the Web: Querying and Characterizing.” PhD dissertation. Turku Centre for Computer Science, Finland. May.

Veltman, Kim H. 2007. “Framework for Long-term Digital Preservation from Political and Scientific Viewpoints.” Digitale Langzeitarchivierung: Strategien und Praxis europäischer Kooperation, Deutsche Nationalbibliothek, anlässlich der EU-Ratspräsidentschaft Deutschlands, 20-21 April 2007. Frankfurt: Deutsche Nationalbibliothek.

See also: Timeline: Deep Web (work in progress).
