Concepts and similar pages to Page 8

Concepts

Similarity

Concept

Information structure

Association Measures

Term

Measures of association

Keyword

Term frequency

Keyword frequency

Automatic content analysis

Correlation measure

Frequency of occurrence

In basing a theory of evaluation on the theory of measurement, is it possible to devise a measure of effectiveness not starting with precision and recall but simply with the set of relevant documents and the set of retrieved documents? If so, can we generalise such a measure to take account of degree of relevance? An alternative derivation of an E-type measure could be done in terms of recall and fallout ... Up to now the measurement of effectiveness has proved fairly intractable to statistical analysis ... I think the Robertson model described in Chapter 7 goes some way to being considered as a reasonable statistical model ... There may be laws of retrieval, such as the well-known trade-off between precision and recall, that are worth establishing either empirically or by theoretical argument ... 6 ... There is a need for more intensive research into the problems of what to use to represent the content of documents in a computer ... Information retrieval systems, both operational and experimental, have been keyword based ... The major reason for this rather simple-minded approach to document retrieval is a very good one ...
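The quantities named in the excerpt above can be computed directly from the set of relevant documents and the set of retrieved documents. The following is a minimal sketch under the usual set-based definitions; the function name `effectiveness`, its parameters, and the sample document ids are illustrative assumptions, not taken from the book.

```python
def effectiveness(relevant, retrieved, collection_size):
    """Return (precision, recall, fallout) from two sets of document ids.

    Assumes binary relevance and a collection of `collection_size` documents.
    """
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)               # relevant AND retrieved
    non_relevant = collection_size - len(relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # Fallout: proportion of non-relevant documents that were retrieved.
    fallout = (len(retrieved) - hits) / non_relevant if non_relevant else 0.0
    return precision, recall, fallout


# Hypothetical example: 10 documents, 4 relevant, 3 retrieved, 2 in common.
precision, recall, fallout = effectiveness({1, 2, 3, 4}, {3, 4, 5}, collection_size=10)
print(precision, recall, fallout)
```

Degree of relevance is not handled here; the sketch treats relevance as a yes/no property.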

The structure of the book. The introduction presents some basic background material, demarcates the subject and discusses loosely some of the problems in IR ... The two major chapters are those dealing with automatic classification and evaluation ... Outline. Chapter 2: Automatic Text Analysis contains a straightforward discussion of how the text of a document is represented inside a computer ... Chapter 3: Automatic Classification looks at automatic classification methods in general and then takes a deeper look at the use of these methods in information retrieval ... Chapter 4: File Structures - here we try and discuss file structures from the point of view of someone primarily interested in information retrieval ... Chapter 5: Search Strategies gives an account of some search strategies when applied to document collections structured in different ways ... Chapter 6: Probabilistic Retrieval describes a formal model for enhancing retrieval effectiveness by using sample information about the

Two: AUTOMATIC TEXT ANALYSIS. Introduction. Before a computerised information retrieval system can actually operate to retrieve some information, that information must have already been stored inside the computer ... The starting point of the text analysis process may be the complete document text, an abstract, the title only, or perhaps a list of words only ... The developments and advances in the process of representation have been reviewed every year by the appropriate chapters of Cuadra's Annual Review of Information Science and Technology ...

Sparck Jones has carried on this work using measures of association between keywords based on their frequency of co-occurrence, that is, the frequency with which any two keywords occur together in the same document ... The term information structure (for want of better words) covers specifically a logical organisation of information, such as document representatives, for the purpose of information retrieval ... The organisation of these files is produced by an automatic classification method ... Evaluation of retrieval systems has proved extremely difficult ...
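The co-occurrence frequency described in the excerpt above, the number of documents in which two keywords appear together, can be sketched as follows. This is only an illustration of the raw count; the function name `cooccurrence_counts` and the sample keyword lists are assumptions, and no particular association measure from the book is implied.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each keyword pair, the documents containing both keywords.

    `documents` is an iterable of keyword lists, one list per document.
    Pairs are stored as alphabetically ordered tuples.
    """
    counts = Counter()
    for keywords in documents:
        # set() ensures a pair is counted at most once per document.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Hypothetical document representatives.
docs = [
    ["retrieval", "index", "keyword"],
    ["retrieval", "keyword"],
    ["index", "cluster"],
]
counts = cooccurrence_counts(docs)
print(counts[("keyword", "retrieval")])
```

A normalised association measure would divide such counts by some function of the individual keyword frequencies; which normalisation to use is left open here.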

In practice, one seeks some sort of optimal trade-off between representation and discrimination ... The emphasis on representation leads to what one might call a document orientation: that is, a total preoccupation with modelling what the document is about ... This point of view is also adopted by those concerned with defining a concept of information; they assume that once this notion is properly explicated a document can be represented by the information it contains [37] ... The emphasis on discrimination leads to a query orientation ... Automatic keyword classification. Many automatic retrieval systems rely on thesauri to modify queries and document representatives to improve the chance of retrieving relevant documents ...

linguistics in information science ... The chapter therefore starts with the original ideas of Luhn on which much of automatic text analysis has been built, and then goes on to describe a concrete way of generating document representatives ... Luhn's ideas. In one of Luhn's [6] early papers he states: It is here proposed that the frequency of word occurrence in an article furnishes a useful measurement of word significance ... I think this quote fairly summarises Luhn's contribution to automatic text analysis ... Let f be the frequency of occurrence of various word types in a given position of text and r their rank order, that is, the order of their frequency of occurrence, then a plot relating f and r yields a curve similar to the hyperbolic curve in Figure 2 ...
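The relation between frequency f and rank r described in the excerpt above can be tabulated with a few lines of code. This is a minimal sketch, assuming whitespace tokenisation and lower-casing; the function name `rank_frequency` and the sample sentence are illustrative and not taken from the book.

```python
from collections import Counter


def rank_frequency(text):
    """Return a list of (rank, word, frequency), most frequent word first.

    Ties in frequency are broken alphabetically so the output is stable.
    """
    words = text.lower().split()
    counts = Counter(words)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ranked, start=1)]


# Hypothetical toy text; a real plot of f against r needs a much larger sample.
table = rank_frequency("the cat and the dog and the bird")
for rank, word, freq in table:
    print(rank, word, freq)
```

Plotting the frequency column against the rank column for a sizeable text is one way to inspect the shape of the curve the excerpt refers to.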

language input and storage more feasible ... The reader will have noticed that already, the idea of relevance has slipped into the discussion ... Intellectually it is possible for a human to establish the relevance of a document to a query ... An information retrieval system. Let me illustrate by means of a black box what a typical IR system would look like ... Starting with the input side of things ...

approaching this problem of speeding up clustering is to look for what one might call almost classifications ... A big question, that has not yet received much attention, concerns the extent to which retrieval effectiveness is limited by the type of document description used ... Document classification is a special case of a more general process which would also attempt to exploit relationships between documents ... An argument parallel to the one in the last paragraph could be given for automatic keyword classification, which in the more general context might be called automatic content unit classification ... H ...

that this principle works so well is not yet clear but see Yu and Salton's recent theoretical paper [39] ... The connection with term clustering was already made earlier on in the chapter ... It should be clear now that the quantitative model embodies within one theory such diverse topics as term clustering, early association analysis, document frequency weighting, and relevance weighting ...
