Digital Libraries

1.1: Digital Libraries


1.1.1: Content / Objects of a DL
1.1.1.1: Examples
1.1.1.2: Possible Definition:
1.1.2: Services
1.1.3: Aspects of Content

2: Information Retrieval

2.1: Introduction


2.1.1: Examples of Information Needs
2.1.1.1: Find a phone number
2.1.1.2: Browse the WWW

2.2: A Bit of History

2.3: Different Kinds of Information Systems

2.3.1: Searching for Scientific Documents


2.3.1.1: Methods
2.3.1.2: Characteristics and Problems

2.3.2: Example of a Search in a Scientific Database

Abb. 1: A document from INSPEC

Abb. 2: Number of documents found in INSPEC with different queries for the time interval January - June 1995

2.3.3: Data Retrieval

Abb. 3: Database entries

2.3.4: Hypertext Systems


2.3.4.1: Example

2.3.5: Expert Systems

2.3.6: Management Information Systems

2.3.7: Specific Systems for Specific Domains

3: Knowledge Representations and Retrieval Models

3.1: Models of Communication and Interaction

3.1.1: Transfer of Information

Abb. 4: Basic Scheme of Information Transfer

3.1.2: Dialogues

Abb. 5: Simple Dialog Scheme

Abb. 6: Basic Scheme of an Information System

3.2: Boolean Retrieval

3.2.1: The Boolean Model


3.2.1.1: Attributes
3.2.1.2: Queries

3.2.2: Boolean Text Retrieval

Abb. 7: Model of a Text-based Boolean Information Retrieval System

3.2.3: Implementation


3.2.3.1: Controlled Vocabulary
3.2.3.2: Construction of an inverted list
3.2.3.3: Processing of a query
3.2.3.4: Further Features

3.3: Strings, Terms, Concepts

Abb. 8: Truncations that do not only exclude animals (from Ferber, Wettler, Rapp 1995)

3.3.1: Stemming

Abb. 9: Various depths of reduction (from Kuhlen 1977, p. 58)


3.3.1.1: Replacement Rules

Abb. 10: Some of the Rules of the Kuhlen Algorithm

Abb. 11: Application of the Kuhlen Rules


3.3.1.2: Lexicon Based Approaches

Abb. 12: Morphological Analysis of "Flüssen" according to Lezius (1995)

3.3.2: Classifications

3.3.2.1: Classification


3.3.2.2: Dewey Decimal Classification (DDC)

Abb. 13: Top level classes of the DDC (according to http://www.oclc.org/oclc/fp/about/ddc21sm1.htm)

Abb. 14: Level 2 classes of the DDC (according to http://www.oclc.org/oclc/fp/about/ddc21sm2.htm)

Abb. 15: A path through the International Decimal Classification IDC (according to Manecke 1997)

Abb. 16: A path through the German version of the International Decimal Classification IDC (according to Fuhr 1995)

3.3.3: Thesauri


3.3.3.1: Thesaurus construction

3.4: Vector Space Model

3.4.1: The Model

3.4.1.1: Vector Space Model of IR:

Abb. 17: Vector Space Model of Text Retrieval

3.4.2: Relation to Boolean Retrieval

3.4.2.1: Inner Product:

3.4.3: Term Weighting


3.4.3.1: Local Weighting Strategies
3.4.3.1.1: Term frequency
3.4.3.1.2: Using document structure
3.4.3.2: Word Frequencies in Language
3.4.3.2.1: Zipf's Law

Abb. 18: Zipf's Law applied to the Brown and LOB corpora

Abb. 19: Qualitative View of Zipf's Law


3.4.3.3: Global Weighting Strategies

Abb. 20: Discrimination Power vs. Frequency (from Salton & McGill 1983)


3.4.3.3.1: Inverse document frequency

3.4.4: Relevance Feedback (Rocchio)

3.4.5: The SMART System


3.4.5.1: Automated Indexing

3.4.6: Similarity Measures


3.4.6.1: Inner Product
3.4.6.2: Cosine Measure
3.4.6.3: Other Similarity Measures

3.5: Evaluation of IR Systems

3.5.1: Measures

3.5.1.1: Relevance

3.5.1.2: Precision and Recall

3.5.1.3: Precision-Recall Diagram

Abb. 21: A Precision-Recall Diagram

Abb. 22: Precision-Recall Diagram Displayed in the Plane

3.5.2: Test Collections

Abb. 23: Test Collections (according to Griffiths, Luckhurst & Willett 1986 and Dumais 1991)

3.5.3: The TREC Experiments


3.5.3.1: Relevance Judgements

Abb. 24: Size of Document Pools Used for Relevance Assessment (from 1995 - WWW, 1996 - WWW)


3.5.3.2: Tracks

3.6: Advanced Vector Space Systems

3.6.1: Pseudo Relevance Feedback

3.6.2: Pairs of Terms

3.6.3: Passage Retrieval

3.6.4: Similarity Measures


3.6.4.1: Adapted Cosine
3.6.4.2: Robertson-Sparck Jones Formula

Abb. 25: Number of Documents in a Collection when Classified According to the Two Criteria "Contains a term t_k" and "Is Relevant to Query"

3.7: Advanced Models

3.7.1: Inference Networks

3.7.2: Co-occurrence Based Methods

Abb. 26: Associations Automatically Generated from the LOB and the Brown Corpus Using Co-occurrences of Terms


3.7.2.1: Associative Indexing and Query Expansion
3.7.2.2: Cross Language Retrieval

Abb. 27: Results of the Study on Cross Language Retrieval Using Similarity Measures Generated from Parallel Corpora (from Sheridan and Ballerini, 1996)

3.8: Meta Data

3.8.1: Dublin Core