Belew, 2000  Prior distribution index home

Search the other textbook

using the following terms: Prior distribution; Distribution joint probability; Distribution prior; Distribution stationary; Distribution Zipfian; Joint probability distribution; Zipfian distribution

Pages related to Prior distribution

These pages belong to the same textbook
1 266 It is also important to remember that changes made to a single document in response to a single query can make no guarantees about improved performance with respect to other documents and other queries. For example, two documents might both be moved closer to a query (as proposed by Brauen/Rocchio) while their relative rankings are not changed at all!
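The point about unchanged relative rankings can be seen in a small numeric sketch. The two-dimensional vectors, the cosine match, and the `move_toward` interpolation step are all assumptions made for illustration; they stand in for, and are not, the specific document-modification rules the text attributes to Brauen and Rocchio.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def move_toward(doc, query, alpha):
    """Hypothetical update: shift a document a fraction alpha toward the query."""
    return [d + alpha * (q - d) for d, q in zip(doc, query)]

# Illustrative vectors (assumed, not taken from the text).
query = [1.0, 0.0]
doc_a = [0.8, 0.6]
doc_b = [0.6, 0.8]

before = (cosine(query, doc_a), cosine(query, doc_b))
after = (cosine(query, move_toward(doc_a, query, 0.5)),
         cosine(query, move_toward(doc_b, query, 0.5)))

# Both documents now match the query better, yet doc_a still outranks doc_b.
print(before, after)
```

Both similarities rise, but the ordering of the two documents with respect to this query is the same before and after the update.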

User drift and event tracking
One interesting feature of the training set generated by the routing task is the odd distribution of positive and negative examples it generates. Initially we can imagine that this filter is very inaccurate;
2 268 Figure 7.5: Training Classifier
(In Figure 7.5, the classifications are imagined to be part of a hierarchical classification system, as discussed in detail in Section 7.4.5.) This data is used somehow (for now it's OK to think of it as magic :) to tune the set of parameters specifying a particular classifier. The second and dominant phase is then to use this classifier to automatically assign documents to classes in a manner analogous to those manually classified in the training set.
We seek the posterior probability of a particular class, given the evidence provided by a new document,
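As a reminder of how a prior distribution enters such a posterior, here is a minimal sketch of Bayes' rule over a small set of classes. The class names and all probability values are made up for illustration, and `posterior` is a hypothetical helper, not a function from the textbook.

```python
def posterior(priors, likelihoods):
    """Bayes' rule: P(c | d) = P(d | c) P(c) / sum over k of P(d | k) P(k).

    priors:      dict mapping class -> P(c), the prior distribution
    likelihoods: dict mapping class -> P(d | c) for one new document d
    """
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("document has zero probability under every class")
    return {c: j / evidence for c, j in joint.items()}

# Assumed numbers: the prior favors "politics", but the document's
# evidence is far more likely under "sports".
priors = {"sports": 0.25, "politics": 0.75}
likelihoods = {"sports": 0.08, "politics": 0.01}
post = posterior(priors, likelihoods)
print(post)
```

With these assumed values the evidence outweighs the prior, so the posterior favors "sports" even though it was the less probable class beforehand.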
3 340 passage retrieval in full text information systems. In SIGIR93.
Salton et al., 1994: Salton, G., Allan, J., Buckley, C., and Singhal, A. (1994). Automatic analysis, theme generation, and summarization of machine-readable texts. Science, 264:1421-1426.
Salton and Bergmark, 1979: Salton, G. and Bergmark, D. (1979). A citation study of computer science literature. IEEE Transactions on Professional Communication, 22(3):146-58.
Salton and Buckley, 1988a: Salton, G. and Buckley, C. (1988a). On the use of spreading activation methods in automatic information retrieval. Technical Report 88-907, Dept. Computer Science, Cornell Univ., Ithaca, NY.
Salton and
4 117 1. Real FOA versus laboratory retrieval
From the FOA perspective, users retrieve documents as part of an extended search process. They do this often, and because they need the information for something important to them. If we are to collect useful statistics about FOA, we must either capture large numbers of such users in the act (i.e., in the process of a real information-seeking activity), or we must attempt to create an artificial laboratory setting. The former is much more desirable, but makes strong requirements concerning a desirable corpus, a population of users, and access to their retrieval
6 121 retrieval procedure, the resulting assessments may have overlap with, and hence be useless for comparison of, methods producing significantly different retrieval sets. For the TREC collection, this problem was handled by drawing the top 200 documents from a wide range of 25 methods which had little overlap. Vogt has explored how similarities and differences between retrieval methods can be similarly exploited as part of combined, hybrid retrieval systems (cf. Section 7.4.4).
It is also possible to sample a small subset of a corpus, submit the entire sample to review by the human expert, and extrapolate from the number
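The sampling idea can be sketched as a simple proportional estimate: judge every document in a random sample, then scale the relevant fraction up to the corpus size. This is a minimal sketch under assumptions. The function name, the `is_relevant` callback standing in for the human expert, and the toy relevance rule are all hypothetical, and the excerpt does not say which estimator the text goes on to use.

```python
import random

def estimate_relevant(corpus_size, sample_size, is_relevant, seed=0):
    """Estimate how many documents in the corpus are relevant.

    Draws a simple random sample of document ids without replacement,
    has every sampled document judged, and extrapolates proportionally.
    """
    rng = random.Random(seed)
    sample = rng.sample(range(corpus_size), sample_size)
    hits = sum(1 for doc_id in sample if is_relevant(doc_id))
    return hits * corpus_size / sample_size

# Toy stand-in for the expert: every tenth document is relevant (assumed).
def judge(doc_id):
    return doc_id % 10 == 0

estimate = estimate_relevant(corpus_size=1000, sample_size=200, is_relevant=judge)
print(estimate)
```

The estimate varies with the sample drawn. It is exact only when the sample is the whole corpus, and small samples of a corpus with few relevant documents can be badly off.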
7 144 Figure 4.19: RAVE Interface
and LEGAL APPLICATIONS and the third is a scrolling pane containing the text of the long, RelFbk document (in this example, the thesis A PLANNING MODEL WITH PROBLEM ANALYSIS ). While the subject is asked to judge the documents shown to him or her as
8 163 The J measure provides a criterion for the retrieval function Match(). In experimental situations the only preferences available are that Rel is preferred to non-Rel, but in natural retrieval situations, users' richer RelFbk preference data can be used.
A particularly interesting use of this criterion is as part of error correction learning (Section 7.3). If we assume that the ranking function has certain free variables, that we again have a training set of documents, and that the criterion is differentiable with respect to these variables, then a gradient search procedure can be used to adjust them towards an optimal retrieval:
(Eq. 5.24)
For example, Bartell et al. consider
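The gradient search described above can be sketched in a few lines. Note the substitution: this sketch does not implement the J measure or Eq. 5.24, which are not reproduced in this excerpt. It uses a pairwise logistic loss over preference pairs as an assumed differentiable stand-in, with a linear scoring function whose weights `theta` play the role of the free variables. All names and data are hypothetical.

```python
import math

def score(theta, x):
    """Linear ranking function: weighted sum of document features."""
    return sum(t * xi for t, xi in zip(theta, x))

def pairwise_loss(theta, pairs):
    """Logistic loss over (preferred, other) feature pairs; lower is better."""
    return sum(math.log(1.0 + math.exp(-(score(theta, p) - score(theta, q))))
               for p, q in pairs)

def gradient(theta, pairs):
    """Gradient of pairwise_loss with respect to theta."""
    grad = [0.0] * len(theta)
    for p, q in pairs:
        margin = score(theta, p) - score(theta, q)
        weight = -1.0 / (1.0 + math.exp(margin))
        for i in range(len(theta)):
            grad[i] += weight * (p[i] - q[i])
    return grad

def gradient_descent(theta, pairs, rate=0.1, steps=200):
    """Repeatedly step against the gradient to reduce the loss."""
    for _ in range(steps):
        g = gradient(theta, pairs)
        theta = [t - rate * gi for t, gi in zip(theta, g)]
    return theta

# Assumed training preferences: the first document of each pair is preferred.
pairs = [([1.0, 0.0], [0.0, 1.0]),
         ([0.8, 0.2], [0.3, 0.9])]
theta0 = [0.0, 0.0]
theta = gradient_descent(theta0, pairs)
print(pairwise_loss(theta0, pairs), pairwise_loss(theta, pairs))
```

After training, every preferred document scores above its counterpart. Any criterion that is differentiable in the free variables could be dropped in for `pairwise_loss`, provided `gradient` is changed to match.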
9 116 Following a large number of such interactions documents which are wanted by the users will have been moved slowly into the active portion of the document space - that part in which large numbers of users' queries are concentrated, while items which are normally rejected will be located on the periphery of the space.
This provocative proposal, allowing a search engine to learn from its users, is considered in much greater detail in Chapter 7.

Summary
We have been discussing RelFbk from the individual user's point of view. We've focused on how this information might be
10 233 of indirect associations as well. To a first approximation the changes made by AIR to direct keyword-to-document associations are not unlike those proposed by Salton and Brauen (if I'd only known!). But AIR makes other changes, to more indirect associations as well.
Salton and Buckley have analyzed the spreading activation search used in some of these systems and concluded that it is inferior to more traditional retrieval methods. They point out:
... the relationships between terms or documents are specified by labeled links between the nodes .... the effectiveness of the procedure is crucially dependent on the availability