Bibliographic remarks
The early work of H.P. Luhn has been emphasised in this chapter.
Therefore, the reader may like to consult the book by Schultz[44], which contains a selection of his papers.
In particular, it contains his 1957 and 1958 papers cited in the text.
Other early papers that have had an impact on indexing are Maron and Kuhns[45] and its sequel, Maron[46].
The first paper contains an attempt to construct a probabilistic model for indexing.
Batty[47] provides useful background information on the early work on automatic keyword classification.
An interesting paper which seems to have been largely ignored in the IR literature is Simon[48].
Simon postulates a stochastic process which will generate a distribution for word frequencies similar to the Zipfian distribution.
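To make the idea concrete, here is a minimal Python sketch of the kind of process usually attributed to Simon (often called the Simon or Yule-Simon process). It is an illustration under stated assumptions, not a reproduction of the model in Simon[48]: the function names `simon_process` and `rank_frequency` and the parameter `alpha` are our own. At each step a brand-new word is introduced with probability `alpha`; otherwise an existing word is repeated with probability proportional to how often it has occurred so far, which is implemented by copying a uniformly chosen earlier token.

```python
import random
from collections import Counter


def simon_process(n_tokens, alpha, seed=0):
    """Generate a token sequence under a Simon-style stochastic model.

    Assumed model: with probability alpha the next token is a new word;
    otherwise it repeats an earlier word, chosen in proportion to its
    current frequency (by copying a uniformly random earlier token).
    Words are represented by integer ids.
    """
    rng = random.Random(seed)
    tokens = [0]      # the text starts with one occurrence of word 0
    next_word = 1     # id to assign to the next brand-new word
    for _ in range(n_tokens - 1):
        if rng.random() < alpha:
            tokens.append(next_word)       # introduce a new word
            next_word += 1
        else:
            tokens.append(rng.choice(tokens))  # frequency-proportional repeat
    return tokens


def rank_frequency(tokens):
    """Return word frequencies in decreasing order (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


if __name__ == "__main__":
    freqs = rank_frequency(simon_process(100_000, 0.1))
    for rank in (1, 10, 100, 1000):
        if rank <= len(freqs):
            print(rank, freqs[rank - 1], rank * freqs[rank - 1])
```

Plotting log(rank) against log(frequency) for such a run should give a roughly straight line; in this model a smaller `alpha` (fewer new words) pushes the slope towards -1, which is the Zipfian case where rank times frequency is approximately constant.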
Doyle[49] examines the role of statistics in text analysis.
A recent paper by Sparck Jones[50] compares many of the different approaches to index term weighting.
A couple of state-of-the-art reports on automatic indexing are Stevens[51] and Sparck Jones[52].
Salton[53] has compiled a report containing a theory of indexing.
Borko[54] has provided a convenient summary of some theoretical approaches to indexing.
For an interesting attack on the use of statistical methods in indexing, see Ghose and Dhawle[55].
References
1. DAMERAU, F.J., 'Automated language processing', Annual Review of Information Science and Technology, 11, 107-161 (1976).
2. SPARCK JONES, K. and KAY, M., Linguistics and Information Science, Academic Press, New York and London (1973).
3. MONTGOMERY, C.A., 'Linguistics and information science', Journal of the American Society for Information Science, 23, 195-219 (1972).
4. KEENAN, E.L., 'On semantically based grammar', Linguistic Inquiry, 3, 413-461 (1972).
5. KEENAN, E.L., Formal Semantics of Natural Language, Cambridge University Press (1975).
6. LUHN, H.P., 'The automatic creation of literature abstracts', IBM Journal of Research and Development, 2, 159-165 (1958).
7. ZIPF, G.K., Human Behaviour and the Principle of Least Effort, Addison-Wesley, Cambridge, Massachusetts (1949).
8. EDMUNDSON, H.P. and WYLLYS, R.E., 'Automatic abstracting and indexing: survey and recommendations', Communications of the ACM, 4, 226-234 (1961).
9. ANDREWS, K., 'The development of a fast conflation algorithm for English', Dissertation submitted for the Diploma in Computer Science, University of Cambridge (unpublished) (1971).
10. LOVINS, B.J., 'Development of a stemming algorithm', Mechanical Translation and Computational Linguistics, 11, 22-31 (1968).
11. LOVINS, B.J., 'Error evaluation for stemming algorithms as clustering algorithms', Journal of the American Society for Information Science, 22, 28-40 (1971).