Belew, 2000  Domain of discourse index home


Pages related to Domain of discourse

These pages belong to the same textbook
1 13 prediction, and are submitting it to a journal. One of the things this particular journal requires is that the author provide up to six keywords under which this article is going to be indexed. If you are sending it to the Communications of the ACM, you might pick a set of keywords that identify, to the audience of computer scientists you think read this publication, connections between this new work and others' prior work in related areas: NONLINEAR REGRESSION; TIME SERIES PREDICTION.
But now imagine instead that you've decided to submit the same paper to Byte Magazine, but must again
2 11 Topical scope
The first constraint we can apply to the set of keywords we will allow in our vocabulary is to define a DOMAIN OF DISCOURSE - the subject area within which each and every user of our search engine is assumed to be searching. While we might imagine building a truly encyclopedic reference work, one capable of answering questions about any topic whatsoever, it is much more common to build a search engine with more limited goals, capable of answering questions about some particular subject. We will choose the simpler path (it will prove enough of a challenge!),
3 185 possible to reason about authors of documents. Section 6.4.1 discusses experiments exploiting Ph.D. genealogies in which dissertation authors are related to one another by shared advisors. Co-authorship and membership in the same research institution have also been proposed as ways to provide context on a particular author's words. In some cases, characterizations of the authors' expertise, independent of the documents themselves, are available.
The chapter concludes with several suggestions of just how these varied information sources can become integrated as part of next-generation FOA tools. Section 6.5 considers several modes of inference by which new conclusions about
4 12 other extreme, VACUUM CLEANERS FOR 747 AIRLINERS is almost certainly too specific.
The vocabulary size -- the total number of keywords -- depends on many factors, including the scope of the domain of discourse. A typical language user has a reading vocabulary of approximately 50,000 words. Web search engines and large test corpora formed from the union of many document types may require vocabularies ten times this large. It is unlikely that such a large lexicon of keywords is required for restricted corpora, but it is also true that even a narrow field can develop an extensive, specialized jargon
5 157 Figure 5.4: SVD Decomposition
capture the dominant modes of interaction in J:
J(k) (Eq. 5.17)
This operation is shown schematically in Figure 5.4.
As always, anytime we throw something away (viz., the small eigenvectors), the result must be an approximation. That is, there will be a difference between our reduced-dimension representation J(k) and the original J.
One easy way to measure this discrepancy is by referring to the inter-document similarities latent in the X matrix, and considering how different they are in the approximate matrix X(k) (Eq. 5.18)
X(k) = (Eq. 5.19)
using the L2 norm to measure deviation.
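The reduced-dimension approximation and its deviation can be illustrated with a minimal sketch. It is not the textbook's own formulation, and several things are assumptions made only for illustration: the toy matrix J, the orientation (rows as keywords, columns as documents), the definition of inter-document similarity as X = J^T J, and the reading of "L2 norm" as the spectral 2-norm. The names truncated_svd and similarity_deviation are hypothetical.

```python
import numpy as np

def truncated_svd(J, k):
    """Return the rank-k approximation of J and all singular values of J."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    J_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return J_k, s

def similarity_deviation(J, J_k):
    """Spectral-norm difference between the inter-document similarities
    of J and of its approximation (assumed here to be J^T J)."""
    X = J.T @ J
    X_k = J_k.T @ J_k
    return np.linalg.norm(X - X_k, ord=2)

# Toy keyword-by-document count matrix (made-up values).
J = np.array([
    [2.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 1.0],
    [0.0, 2.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 2.0, 1.0],
])

J_k, s = truncated_svd(J, k=2)
approx_error = np.linalg.norm(J - J_k, ord=2)
deviation = similarity_deviation(J, J_k)
print("singular values:", s)
print("error in J:", approx_error)
print("error in X:", deviation)
```

Under these assumptions, the error in J equals the largest discarded singular value, and the error in X equals its square, so the discarded singular values indicate directly how much is lost for a given k.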
6 26 typical not only of images but of documents as well: Watson and Crick's publication of the DNA code in Nature in 1953 was important even then, but what that paper means now could not have been anticipated.
The prospects for associating contentful descriptors with images and even richer media are not quite as bleak as they might seem. In many important cases (e.g., the archives of news photos maintained by magazines and newspapers), images are accompanied by captions, and video streams with transcripts. This additional manually constructed, textual data means that techniques for inferring semantic content directly from images
7 66 Benoit Mandelbrot's explanation
The early days of cybernetics were heady, and Zipf was not alone in seeking a grand, unifying theory that might explain the phenomena of communication on computational grounds like those proving so successful in physics. Benoit Mandelbrot was equally ambitious.
Mandelbrot's background as a physicist is clear when he considers the message decoder as a physical piece of apparatus, ... cutting a continuous incoming string of signs into groups, and recoding each group separately. This differentiator complements an integrator which reconstitutes new messages from individual words. Within this model communication can be considered fully analogous
8 219 much more refined than that of the LoC, it is still too crude for most research currently going on in any one sub-specialty. This has caused some practitioners in various sub-specialties to develop their own extensions. For example, David Waltz was commissioned by Scientific Data-Link to extend the ACM's Computing Reviews taxonomy for the sub-specialty of artificial intelligence (AI). Waltz's extension is extremely refined and helpful to AI practitioners. At the same time, it is even more ad hoc, its sponsoring institution has less impact, and consequently it is even less well accepted within libraries.
All three of these
9 162 coordinates.
p_ij: proximity (evaluation)
d_ij: distance (in 2-dimensional plane)
The MDS analysis described by Borg and Lingoes (they actually prefer the term Similarity Structure Analysis, SSA) is a key contribution. It is an algorithm for iteratively moving vectors corresponding to the objects of evaluation (pictures of facial expressions) within a space of arbitrary dimension so as to minimize as much as possible the stress they experience, relative to their pairwise proximities. We think that the pictures have been well-placed if those with similar proximities are close together as measured by this distance.
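The idea of iteratively moving points to reduce stress can be sketched as follows. This is a simplified metric variant using plain gradient descent on raw squared stress, not the SSA procedure itself. It assumes the proximities have already been converted into target dissimilarities (larger means less alike). The function names and the toy target matrix are hypothetical.

```python
import math
import random

def distance(a, b):
    """Euclidean distance d_ij between two points in the plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def stress(points, target):
    """Sum of squared differences between distances and target dissimilarities."""
    n = len(points)
    return sum((distance(points[i], points[j]) - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def random_layout(n, seed=0):
    """Random starting positions in the 2-dimensional plane."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]

def mds(target, points, steps=2000, lr=0.01):
    """Move points by gradient descent so as to reduce stress."""
    points = [p[:] for p in points]  # do not modify the caller's layout
    n = len(points)
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = distance(points[i], points[j])
                if d == 0.0:
                    continue  # coincident points give no direction
                coeff = 2.0 * (d - target[i][j]) / d
                grads[i][0] += coeff * (points[i][0] - points[j][0])
                grads[i][1] += coeff * (points[i][1] - points[j][1])
        for i in range(n):
            points[i][0] -= lr * grads[i][0]
            points[i][1] -= lr * grads[i][1]
    return points

# Toy target dissimilarities: the corners of a unit square (made-up data).
r2 = math.sqrt(2.0)
target = [
    [0.0, 1.0, r2, 1.0],
    [1.0, 0.0, 1.0, r2],
    [r2, 1.0, 0.0, 1.0],
    [1.0, r2, 1.0, 0.0],
]

start = random_layout(len(target))
final = mds(target, start)
initial_stress = stress(start, target)
final_stress = stress(final, target)
print("stress before:", initial_stress)
print("stress after:", final_stress)
```

Gradient descent of this kind can settle in a local minimum, so in practice several random starting layouts are usually tried and the one with the lowest stress is kept.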
Within any such space, we can replace the pairwise
10 182 Inference beyond the Index
The Index, that critical mapping between documents and descriptive keywords, has dominated our approach to FOA in all the preceding chapters. But there is of course a larger context of available information: FOA can be accomplished by showing a user relations among keywords, by acquainting him or her with important authors, by pointing to important journals where relevant documents are often published, etc. Retrieval of all these information resources, especially when structured in a meaningful interface, can tell a user much more than simply listing relevant documents.
This chapter is concerned with exploiting a variety of