A question about scoring function in Lucene
2004-12-15 - By Nhan Nguyen Dang
Hi all,
Lucene scores a document based on the correlation between the query q and the document d (this is the raw function; I am ignoring the boost_t and coord_q_d factors):
score_d = sum_t( tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t) (*)
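To make my reading of (*) concrete, here is a minimal sketch in Python. It follows the formula literally and is not Lucene's API: the function name, the dictionary layout, and the inputs are my own assumptions. In particular, norm_d_t is simply passed in per term, because its definition is exactly what I am asking about, and norm_q is taken to be sqrt(sum_t((tf_q*idf_t)^2)).

```python
import math

def score_star(tf_q, tf_d, idf, norm_d_t):
    """Literal reading of (*): every factor sits inside the sum over terms t.

    tf_q, tf_d : term -> frequency in the query / document (hypothetical layout)
    idf        : term -> inverse document frequency
    norm_d_t   : term -> per-term document norm, supplied by the caller
                 (assumption: its real definition is not given here)
    """
    # Assumed query norm: sqrt(sum_t (tf_q * idf_t)^2)
    norm_q = math.sqrt(sum((tf_q[t] * idf[t]) ** 2 for t in tf_q))
    if norm_q == 0.0:
        return 0.0  # empty query or all-zero weights: nothing to score

    total = 0.0
    for t in tf_q:
        if t not in tf_d:
            continue  # a term absent from the document contributes nothing
        total += (tf_q[t] * idf[t] / norm_q) * (tf_d[t] * idf[t] / norm_d_t[t])
    return total
```

For example, score_star({"a": 1, "b": 1}, {"a": 2}, {"a": 1.0, "b": 2.0}, {"a": 2.0}) only picks up a contribution from term "a", since "b" does not occur in the document.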
Could anybody explain it in detail? Or are there any papers or documents about this function? I ask because:
I have also read the book Modern Information Retrieval by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Addison Wesley (I hope you have read it too). On page 27, they suggest a scoring function for the vector model based on the correlation between query q and document d, as follows (I use different symbols):
score_d(d, q) = sum_t( weight_t_d * weight_t_q ) / (norm_d * norm_q)   (**)

where
  weight_t_d = tf_d * idf_t
  weight_t_q = tf_q * idf_t
  norm_d = sqrt( sum_t( (tf_d * idf_t)^2 ) )
  norm_q = sqrt( sum_t( (tf_q * idf_t)^2 ) )

Substituting the weights into (**):

score_d(d, q) = sum_t( tf_q*idf_t * tf_d*idf_t ) / (norm_d * norm_q)   (***)
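For comparison, here is how I would compute (***) in a minimal Python sketch. The function name and the dictionary layout are hypothetical and chosen only for illustration; the idf values are assumed to be given.

```python
import math

def score_cosine(tf_q, tf_d, idf):
    """Cosine-style score (***): dot product of tf*idf weights over both norms."""
    # weight_t_q = tf_q * idf_t and weight_t_d = tf_d * idf_t
    w_q = {t: tf * idf[t] for t, tf in tf_q.items()}
    w_d = {t: tf * idf[t] for t, tf in tf_d.items()}

    # Each norm is computed once, over all terms of the query / document
    norm_q = math.sqrt(sum(w * w for w in w_q.values()))
    norm_d = math.sqrt(sum(w * w for w in w_d.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # avoid dividing by zero for empty or all-zero vectors

    # Only terms shared by query and document contribute to the numerator
    dot = sum(w_q[t] * w_d[t] for t in w_q if t in w_d)
    return dot / (norm_d * norm_q)
```

With this definition, a document whose weights are proportional to the query's scores close to 1, and a document sharing no terms with the query scores 0.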
The two functions, (*) and (***), differ in two ways:
1. In (***), sum_t covers only the numerator, but in (*), sum_t covers everything. So, with norm_q = sqrt(sum_t((tf_q*idf_t)^2)), sum_t is calculated twice. Is this right? Please explain.
2. There is no factor in (*) that defines the norm of the document, i.e. norm_d. Can you explain this? What is the role of the factor norm_d_t?
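As a sanity check on difference 1: if norm_q is a single value that does not depend on t, then dividing by it inside the sum or once outside the sum should give the same result. A small Python check with made-up numbers (the values below are illustrative only, not taken from any index):

```python
import math

# Made-up per-term products a_t = tf_q*idf_t * tf_d*idf_t (illustrative only)
a = [0.5, 1.25, 2.0]
norm_q = 1.7  # any positive constant that does not depend on t

inside = sum(a_t / norm_q for a_t in a)  # norm_q applied inside sum_t, as in (*)
outside = sum(a) / norm_q                # norm_q applied once, as in (***)

# True when both placements agree up to floating-point rounding
same = math.isclose(inside, outside)
```

This only covers norm_q. A factor such as norm_d_t, which is indexed by t, cannot in general be moved outside the sum in the same way, which is why I ask about its role.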
One more question: could anybody point me to documents or papers that explain this function in detail, so that when I apply Lucene to my system I can adapt the documents and fields and still get correct scoring information from Lucene?
Best regards, and thanks everybody,
Dang Nhan