ELISAD European Gateway on Alcohol and other Drugs / Final Research and Activity Report December 2003 


1. General state of the art and context of research.

1.1. Development of the World Wide Web

In the following chapter, general features of the World Wide Web (WWW) are summarised to outline the context of research into suitable methods of information management.

History of the www: The idea of a computerised network that manages information via associative links was first formulated in 1945 by Vannevar Bush, science adviser to US President Franklin D. Roosevelt. His knowledge management system was called "Memex" (1). In 1965, Theodore Nelson formulated a concept named "Xanadu", focusing on a structure that could carry information elements in various formats and is linked reciprocally to other similar structures (later known as websites). The physical network between computers was then created in 1969 by the company BBN (Bolt, Beranek and Newman) in Cambridge (Massachusetts) and the University of California, Los Angeles (UCLA). This first version was called "ARPAnet" after its initiator, the Advanced Research Projects Agency. Its technological base was the Interface Message Processor, the first version of a server (2). Since then, the concept of a decentralised network has been continuously developed. In the early 1970s, the Ethernet networking technology was created, and in 1983 TCP/IP was adopted as the standardised network protocol. Around 1990, Tim Berners-Lee created the World Wide Web together with the first browser, a graphical user interface that facilitated access to web-based information. The internet then developed rapidly to include a growing number of information resources and users (3).

Use of the web: Throughout the 1990s, the internet developed towards its current role as the main public information technology. Publication on the web has become commonplace for private as well as professional bodies and persons. Although the World Wide Web is not yet available to the entire population of Western European countries, it can be considered a major contemporary mass medium. Alongside its technical possibilities for information transfer, communication via email has become an important part of professional internet use. Because email allows communication with collaborators all around the world within seconds, regardless of physical distance, its use is widespread and well established.

Research into web-based information and online retrieval systems is confronted with several problems that are inherent to the structure of the world wide web.

Size of the www: The number of websites has increased exponentially in the past decade. Each day, many thousands of new websites are added to the World Wide Web. Nobody actually knows how many websites there are. In Germany, the threshold of one million registrations was exceeded at the end of September 1999.

For example, the service Searchenginewatch reports the number of web pages currently indexed by search engines. In its daily report of June 17, 2002, Searchenginewatch stated: "FAST announced today that it has expanded its index to 2.1 billion web pages, taking the lead from Google in the search engine size wars." (http://www.searchenginewatch.com/reports/sizes.html). According to NUA internet surveys, the world total of websites is 580.78 million. For Europe, it is 185.83 million (http://www.nua.ie/surveys/, September 2002).

Such reports can only give an incomplete estimate: they count only the pages that have been indexed, and cover only pages that are properly and legally registered by providers for private persons, institutional bodies and commercial companies.

Security of the www: Since the beginning, the web has also served as a means for several kinds of illegal activity and crime, a side effect of enormous extent. These include e.g. political-ideological, sexual, and commercial exchanges and interests, as well as hacking and piracy. Given the wide range of abuse of the internet, there is a large "black spot" concerning the real number of websites available online. Additionally, there is no overall control of the content of information on the web: anyone can publish anything. Copyright problems and other legal questions have not been satisfactorily or exhaustively resolved to date.

Structure of the www: No general ordering system exists on the internet. Because it is a fast-changing and distributed system of resources, the stability of its content is limited: websites may disappear suddenly, or their contents may change. Accordingly, a high degree of fluctuation is one main feature of the World Wide Web, and it makes the retrieval of information resources increasingly difficult.

 

1.1.1. Properties of websites

One objective of the project is to generate and systematically collect metadata about the content and general features of websites. Websites are a highly complex medium of information. They are subject to ongoing development, and include a wide range of technical possibilities for information transfer and types of resources.

General structure: In comparison to books and other traditional publications, websites as media are characterised by a high degree of complexity.
Generally, a website consists of several HTML pages, comparable to chapters in a book. These pages are usually ordered by menu points, which are comparable to a table of contents that structures the information presented. The menu serves for navigation between the varying number of HTML pages of which the website consists. The menu points are organised in a tree structure. Its branches may relate to the fields of activity of an institution, or to the anticipated points of interest of its audience(s), according to the type of website and the aims of the publisher.
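The tree structure of menu points described above can be illustrated with a minimal sketch in Python. The class name `MenuPoint`, the function `walk`, and the example menu titles and file names are hypothetical and chosen only for illustration; they do not describe any particular website or the project's own tools.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class MenuPoint:
    """One menu point of a website; children are its sub-menu points."""
    title: str
    page: str  # hypothetical file name of the HTML page this point opens
    children: List["MenuPoint"] = field(default_factory=list)


def walk(node: MenuPoint, depth: int = 0) -> Iterator[Tuple[int, MenuPoint]]:
    """Yield every menu point depth-first, together with its depth in the tree."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


# Hypothetical website of an institution, with branches by field of activity.
site = MenuPoint("Home", "index.html", [
    MenuPoint("Prevention", "prevention.html", [
        MenuPoint("School programmes", "schools.html"),
    ]),
    MenuPoint("Research", "research.html"),
    MenuPoint("Publications", "publications.html"),
])

# Render the menu as an indented outline, similar to a table of contents.
outline = ["  " * depth + node.title for depth, node in walk(site)]
print("\n".join(outline))
```

Walking the tree depth-first reproduces the order in which a reader would meet the entries in a table of contents, which is why the comparison to a book holds for simple websites.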

To set standards for navigation oriented to the user's requirements, principles of user-friendliness were formulated e.g. by Jürg Nievergelt in the early 1980s, and by other computer scientists active in HCI (human-computer interaction) research (www.jn.inf.ethz.ch/nievergelt).

Each website is connected to the world wide web by a provider that ensures online accessibility and functionality via a server computer system.

Central function: Websites are intended to be carriers of resources. They serve as a means of information transfer and publishing in a very wide sense. The types and formats of information provided via web pages, as well as their content and publishers, vary widely. There are increasing, and potentially nearly unlimited, technical possibilities to include online services and interactive facilities. Therefore, many websites are becoming highly complex.

From the variety of types and formats of information available on websites, the currently most significant and common types of resources are summarised as follows.

Electronic publications: Many websites serve to present fulltext publications of organisations or producers, which are included in the website and accessible through internal links within the pages. Electronic text formats other than HTML include PDF (Portable Document Format) and Word documents. Other formats include e.g. pictures (GIF, JPG), music (WAV, MP3) and animations (Flash).

Fulltext documents available on websites encompass all kinds of literature-like materials. The spectrum covers journals and articles, books and monographs, scientific reports or activity reports, brochures, leaflets, etc., including a large amount of "grey" publications and listings of references or annotated bibliographies (see 1.2.1.).

Interactivity: Websites can provide access to a large variety of interactive sections and tools, such as online subscription and ordering, person-to-person communication forums (chatrooms, newsgroups, mailing lists), automated communication (tests, questionnaires, educational games), and feedback and questions (FAQ).

Databases: Websites can provide direct access to data and metadata via searchable online databases that hold systematically selected collections. These can contain e.g. bibliographic data, fulltext documents, addresses, or project information. Subject gateways are based on the principle of databases of searchable metadata on online resources.
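The principle of searchable metadata on online resources can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the gateway's actual data model: the record fields (`title`, `url`, `language`, `keywords`), the function `search`, and the example records with their example.org addresses are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceRecord:
    """Hypothetical metadata record describing one online resource."""
    title: str
    url: str
    language: str        # language code of the resource, e.g. "en" or "de"
    keywords: List[str]  # subject terms assigned by a cataloguer


def search(records: List[ResourceRecord], term: str,
           language: Optional[str] = None) -> List[ResourceRecord]:
    """Return records whose title or keywords contain the term (case-insensitive).

    If a language is given, only records in that language are considered.
    """
    needle = term.lower()
    hits = []
    for record in records:
        if language is not None and record.language != language:
            continue
        searchable = [record.title] + record.keywords
        if any(needle in text.lower() for text in searchable):
            hits.append(record)
    return hits


# Hypothetical catalogue entries; the URLs are placeholders.
records = [
    ResourceRecord("Alcohol policy report", "http://example.org/alcohol-report",
                   "en", ["alcohol", "policy"]),
    ResourceRecord("Drogenberatung Adressen", "http://example.org/adressen",
                   "de", ["addresses", "counselling"]),
    ResourceRecord("Prevention bibliography", "http://example.org/bibliography",
                   "en", ["bibliography", "alcohol", "drugs"]),
]

for hit in search(records, "alcohol"):
    print(hit.title, hit.url)
```

Because the keywords are assigned in one common language, a resource published in another language (here the German address list) can still be found through its metadata, for example with `search(records, "addresses", language="de")`.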

Systematic collections: Other systematic compilations and directories can include addresses, weblinks, research abstracts, laws, and inventories of recent and current activities and projects.

Multilinguality: Another challenge is most striking when working on a European scale: the multilinguality of websites in the drugs field is one of the main factors limiting knowledge about and access to web-based information. While English is the main language of the web, and some institutional websites feature English translations of their content, the vast majority of information is published in the respective national languages. For some languages, this even means the use of non-Latin scripts: e.g. Greek letters or Serbian Cyrillic may limit comprehension to a small native audience. In some countries, web-based publications are even restricted to regional languages, e.g. Spain: Catalan, Andalusian etc. This fact contributes significantly to poor retrieval of subject-related resources.

Changeability: Websites are continuously developed and updated. The structure and content of websites change over time. Websites may disappear for several reasons, such as domain changes, or the cessation of the organisation's activity or funding.

Thus, the general complexity of this medium must be considered in combination with a high degree of fluctuation and a correspondingly low degree of stability: structures and contents develop and change over time, or services are suspended (e.g. the online library service bibliotox.it (Gruppo Abele) in Italy).

In line with this variety of information types and formats, websites vary widely in their content, functions, and publishing intentions.

