Concept: Cosine correlation

Cosine correlation

Similar concepts

Similarity

Concept

Independence measurements

Additive independence

Correlation measure

Partial correlation coefficient

Serial search

Informational correlation measure

Discrimination gain hypothesis

Serial file

Cluster profile

Centroid

Pages with this concept

Similarity

Page

Snapshot

is another example of a matching function ... A popular one used by the SMART project, which they call cosine correlation, assumes that the document and query are represented as numerical vectors in t-space, that is, Q = (q_1, q_2, ...) ... or, in the notation for a vector space with a Euclidean norm, where θ is the angle between vectors Q and D ... Serial search: Although serial searches are acknowledged to be slow, they are frequently still used as parts of larger systems ... Suppose there are N documents D_i in the system, then the serial search proceeds by calculating N values M(Q, D_i), from which the set of documents to be retrieved is determined ... (1) the matching function is given a suitable threshold, retrieving the documents above the threshold and discarding the ones below ... (2) the documents are ranked in increasing order of matching function value ...
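The snapshot above describes cosine correlation as a matching function and a serial search that scores every document against the query. A minimal sketch of both follows, assuming documents and queries are plain lists of numbers of equal length. The names `cosine` and `serial_search` are hypothetical, and listing the best match first is an assumption about how the ranking is consumed.

```python
import math


def cosine(q, d):
    """Cosine of the angle between vectors q and d (0.0 if either is all zeros)."""
    dot = sum(qi * di for qi, di in zip(q, d))
    norm_q = math.sqrt(sum(x * x for x in q))
    norm_d = math.sqrt(sum(x * x for x in d))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


def serial_search(query, documents, threshold=None):
    """Score all N documents against the query, one after another.

    With a threshold, keep only the documents scoring above it.
    Without one, return every document ranked by score (best match
    first, which is an assumption of this sketch).
    """
    scores = [(i, cosine(query, d)) for i, d in enumerate(documents)]
    if threshold is not None:
        return [(i, s) for i, s in scores if s > threshold]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For example, `serial_search([1, 0], [[0, 1], [1, 0], [1, 1]], threshold=0.5)` keeps documents 1 and 2 and discards document 0, whose vector is orthogonal to the query.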

keyword is indicated by a zero or one in the i-th position respectively ... where summation is over the total number of different keywords in the document collection ... Salton considered document representatives as binary vectors embedded in an n-dimensional Euclidean space, where n is the total number of index terms ... can then be interpreted as the cosine of the angular separation of the two binary vectors X and Y ... where (X, Y) is the inner product and ... X = (x_1, ...) ... we get ... Some authors have attempted to base a measure of association on a probabilistic model [18] ... When x_i and x_j are independent, P(x_i) P(x_j) = P(x_i, x_j) and so I(x_i, x_j) = 0 ...
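Two ideas in the snapshot above can be checked numerically: the cosine between two binary keyword vectors, and the fact that an information-based association measure vanishes for independent keywords. The sketch below is illustrative only. The function names are hypothetical, and the formula used for I is the standard mutual-information sum over the four joint outcomes, which is an assumption, since the snapshot does not show the measure's definition.

```python
import math


def binary_cosine(x, y):
    """Cosine between two 0/1 vectors.

    For binary vectors the inner product is the number of shared
    keywords and each squared norm is the number of keywords present.
    """
    shared = sum(a & b for a, b in zip(x, y))
    nx, ny = sum(x), sum(y)
    if nx == 0 or ny == 0:
        return 0.0
    return shared / math.sqrt(nx * ny)


def mutual_information(joint):
    """I(x_i, x_j) from a 2x2 table joint[a][b] = P(x_i = a, x_j = b).

    Assumed form: sum of P(a, b) * log(P(a, b) / (P(a) * P(b))).
    """
    p_i = [joint[0][0] + joint[0][1], joint[1][0] + joint[1][1]]
    p_j = [joint[0][0] + joint[1][0], joint[0][1] + joint[1][1]]
    total = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p = joint[a][b]
            if p > 0:  # terms with zero probability contribute nothing
                total += p * math.log(p / (p_i[a] * p_j[b]))
    return total
```

With `joint = [[0.375, 0.125], [0.375, 0.125]]` every cell equals the product of its marginals, so the measure is zero. With `joint = [[0.5, 0.0], [0.0, 0.5]]` the two keywords always co-occur or are both absent, and the measure is positive.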

106

account of past performance ... Consider now a retrieval strategy that has been implemented by means of a matching function M ... It is the aim of every retrieval strategy to retrieve the relevant documents A and withhold the non-relevant documents Ā ... the decision procedure M(Q, D) - T > 0 corresponds to a linear discriminant function used to linearly separate two sets A and Ā in R^t ... M(Q_0, D) > T whenever D ∈ A and M(Q_0, D) < T whenever D ∈ Ā ... The interesting thing is that starting with any Q we can adjust it iteratively using feedback information so that it will converge to Q_0 ...

107

exists there is an iterative procedure which will ensure that Q will converge to Q_0 in a finite number of steps ... The iterative procedure is called the fixed increment error correction procedure ... It goes as follows: Q_i = Q_{i-1} + cD if M(Q_{i-1}, D) - T < 0 and D ∈ A; Q_i = Q_{i-1} - cD if M(Q_{i-1}, D) - T > 0 and D ∈ Ā; and no change made to Q_{i-1} if it diagnoses correctly ... The situation in actual retrieval is not as simple ... Once again this is not the whole story ... If M is taken to be the cosine function (Q, D) / (||Q|| ||D||) then it is easy to show that Φ is maximised by ... where c is an arbitrary proportionality constant ...
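The fixed increment error correction procedure in the snapshot above can be sketched as a short loop. This sketch rests on several assumptions: M is the plain inner product, the increment cD is added when a relevant document falls below the threshold and subtracted when a non-relevant document rises above it, and passes over the feedback documents repeat until none is misdiagnosed. The function names and the `max_passes` safety cap are hypothetical.

```python
def inner_product(q, d):
    """Matching function M(Q, D), taken here to be the inner product."""
    return sum(qi * di for qi, di in zip(q, d))


def fixed_increment(q, relevant, non_relevant, threshold, c=1.0, max_passes=100):
    """Adjust query q until it separates relevant from non-relevant documents.

    Returns (adjusted_query, converged). Exact ties (M - T == 0) are left
    unchanged, mirroring the strict inequalities in the text.
    """
    q = list(q)
    for _ in range(max_passes):
        changed = False
        for d in relevant:
            # Relevant document not retrieved: move q towards it.
            if inner_product(q, d) - threshold < 0:
                q = [qi + c * di for qi, di in zip(q, d)]
                changed = True
        for d in non_relevant:
            # Non-relevant document retrieved: move q away from it.
            if inner_product(q, d) - threshold > 0:
                q = [qi - c * di for qi, di in zip(q, d)]
                changed = True
        if not changed:
            return q, True  # every document diagnosed correctly
    return q, False  # cap reached; the sets may not be separable


q0, ok = fixed_increment(
    [0, 1],
    relevant=[[1, 0], [1, 1]],
    non_relevant=[[0, 1]],
    threshold=0.5,
)
print(q0, ok)  # [1.0, 0.0] True
```

In this small run the first pass adds the missed relevant document [1, 0] and subtracts the wrongly retrieved [0, 1], and the second pass makes no change, so the loop stops.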