A question about scoring function in Lucene


2004-12-15       - By Nhan Nguyen Dang


Hi all,
Lucene scores a document based on the correlation between
the query q and the document d
(this is the raw function; I am ignoring the boost_t
and coord_q_d factors):

score_d = sum_t( tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t )  (*)
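
A minimal Python sketch of (*) exactly as written above, to make the grouping of the terms explicit. The function name and the plain dict inputs are hypothetical and are not part of the Lucene API; how Lucene itself computes tf, idf, norm_q and norm_d_t is not assumed here, so the caller supplies them.

```python
def lucene_style_score(tf_q, tf_d, idf, norm_q, norm_d_t):
    """Evaluate (*): sum over query terms t of
    (tf_q * idf_t / norm_q) * (tf_d * idf_t / norm_d_t).

    Hypothetical inputs (not Lucene objects): tf_q, tf_d, idf and
    norm_d_t are dicts keyed by term; norm_q is a single float.
    """
    score = 0.0
    for term, q_freq in tf_q.items():
        d_freq = tf_d.get(term, 0.0)
        if d_freq == 0.0:
            continue  # a term missing from the document adds nothing
        query_weight = q_freq * idf[term] / norm_q
        doc_weight = d_freq * idf[term] / norm_d_t[term]
        score += query_weight * doc_weight
    return score
```

Note that norm_d_t is looked up per term inside the loop, which is the point where (*) differs from a formula with a single document norm.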

Could anybody explain it in detail? Or are there any
papers or documents about this function? I ask because
I have also read the book Modern Information
Retrieval by Ricardo Baeza-Yates and Berthier
Ribeiro-Neto, Addison Wesley (I hope you have read it
too). On page 27, they suggest a scoring function
for the vector model, also based on the correlation between
query q and document d, as follows (I use different
symbols):

                 sum_t( weight_t_d * weight_t_q )
score_d(d, q) = ----------------------------------  (**)
                         norm_d * norm_q

where weight_t_d = tf_d * idf_t
     weight_t_q = tf_q * idf_t
     norm_d = sqrt( sum_t( (tf_d * idf_t)^2 ) )
     norm_q = sqrt( sum_t( (tf_q * idf_t)^2 ) )

Substituting the weights into (**) gives:

                 sum_t( tf_q*idf_t * tf_d*idf_t )
score_d(d, q) = ----------------------------------  (***)
                         norm_d * norm_q
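
For comparison, the vector-model score with tf*idf weights and the two norms defined above can be sketched in Python as follows. This is only an illustration of the formula; the function name and the dict inputs are hypothetical, and the idf values are assumed to be given.

```python
import math


def cosine_score(tf_q, tf_d, idf):
    """Vector-model score: dot product of the tf*idf weight vectors
    divided by norm_d * norm_q.

    Hypothetical inputs: tf_q and tf_d map term -> term frequency,
    idf maps term -> inverse document frequency.
    """
    w_q = {t: f * idf[t] for t, f in tf_q.items()}  # weight_t_q
    w_d = {t: f * idf[t] for t, f in tf_d.items()}  # weight_t_d
    # Numerator: sum over terms of weight_t_d * weight_t_q.
    dot = sum(w * w_d.get(t, 0.0) for t, w in w_q.items())
    # Denominator: each norm is computed once, over all terms.
    norm_q = math.sqrt(sum(w * w for w in w_q.values()))
    norm_d = math.sqrt(sum(w * w for w in w_d.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # an empty query or document has no direction
    return dot / (norm_d * norm_q)
```

Here both norms are computed outside the sum in the numerator, so each document has a single norm_d regardless of which query terms match.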

The two functions, (*) and (***), differ in two ways:
1. In (***), sum_t applies only to the numerator, but
in (*), sum_t applies to the whole expression. Since
norm_q = sqrt(sum_t((tf_q*idf_t)^2)) already contains a
sum_t, the sum over t is computed twice. Is this right?
Please explain.

2. (*) has no factor for the norm of the whole document,
norm_d. Can you explain this? What is the
role of the factor norm_d_t?
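
The first difference can be written out as a short derivation. It assumes only that norm_q has the same value for every term t, which follows from the definition of norm_q given earlier; nothing is assumed about how norm_d_t is computed, only that its subscript means it may vary with t.

```latex
\begin{align*}
\sum_t \frac{\mathit{tf}_q\,\mathit{idf}_t}{\mathit{norm}_q}
       \cdot \frac{\mathit{tf}_d\,\mathit{idf}_t}{\mathit{norm}_{d,t}}
  &= \frac{1}{\mathit{norm}_q}
     \sum_t \frac{(\mathit{tf}_q\,\mathit{idf}_t)\,(\mathit{tf}_d\,\mathit{idf}_t)}
                 {\mathit{norm}_{d,t}} \\
\intertext{and, only in the special case $\mathit{norm}_{d,t} = \mathit{norm}_d$ for every $t$,}
  &= \frac{\sum_t (\mathit{tf}_q\,\mathit{idf}_t)\,(\mathit{tf}_d\,\mathit{idf}_t)}
          {\mathit{norm}_d \cdot \mathit{norm}_q}
\end{align*}
```

So writing norm_q inside or outside the sum gives the same value, and the two functions can only disagree through norm_d_t, which stays inside the sum when it varies with t.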

One more question: could anybody point me to documents or
papers that explain this function in detail, so that when I
apply Lucene in my system I can adapt the documents
and fields and still get correct
scores from Lucene?

Best regards,
Thanks, everybody,

=====
Đặng Nhân





   