Lucene1.4.1 + OutOf Memory

2004-11-10       - By Rupinder Singh Mazara


Karthik,

I think the core problem in your case is the use of compound files. It would
be best to switch it off,
or alternatively issue an optimize as soon as the indexing is over.

 I am copying the file contents between <file> tags. The patch is to be
applied to TermInfosReader.java; it
was done to help with out-of-memory exceptions while indexing.
 <file>
Index: src/java/org/apache/lucene/index/TermInfosReader.java
===================================================================
RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/TermInfosReader.java,v
retrieving revision 1.9
diff -u -r1.9 TermInfosReader.java
--- src/java/org/apache/lucene/index/TermInfosReader.java   6 Aug 2004 20:50:29 -0000   1.9
+++ src/java/org/apache/lucene/index/TermInfosReader.java   10 Sep 2004 17:46:47 -0000
@@ -45,6 +45,11 @@
    readIndex();
  }

+  protected final void finalize() {
+    // patch for pre-1.4.2 JVMs, whose ThreadLocals leak
+    enumerators.set(null);
+  }
+
  public int getSkipInterval() {
    return origEnum.skipInterval;
  }
</file>
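
The patch above clears the ThreadLocal from finalize(). As a rough illustration of the same idea without relying on finalization (Object.finalize() is deprecated in current Java), here is a minimal sketch that releases a per-thread cache explicitly. CachingReader, lookup() and cachedTerms() are hypothetical names made up for this example and are not part of Lucene; ThreadLocal.remove() needs Java 5 or later, so it would not have helped on the JVMs discussed in this thread.

```java
import java.util.HashMap;
import java.util.Map;

public class ThreadLocalCleanupDemo {

    /** Hypothetical reader with a per-thread cache; not a Lucene class. */
    static final class CachingReader implements AutoCloseable {
        // Each thread that calls lookup() gets its own map.
        private final ThreadLocal<Map<String, Integer>> cache =
                ThreadLocal.withInitial(HashMap::new);

        /** Caches and returns a value derived from the term (here just its length). */
        int lookup(String term) {
            return cache.get().computeIfAbsent(term, String::length);
        }

        /** Number of terms cached for the calling thread. */
        int cachedTerms() {
            return cache.get().size();
        }

        /** Drops the calling thread's cache entry; other threads are not affected. */
        @Override
        public void close() {
            cache.remove();
        }
    }

    public static void main(String[] args) {
        CachingReader reader = new CachingReader();
        reader.lookup("lucene");
        reader.lookup("memory");
        System.out.println("cached before close: " + reader.cachedTerms());
        reader.close();
        // After remove(), the next get() starts again from an empty map.
        System.out.println("cached after close: " + reader.cachedTerms());
    }
}
```

Note that close() only clears the entry of the thread that calls it, so in a pooled-thread environment such as Tomcat each worker thread would have to release its own entry.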



However, Tomcat does react in strange ways to too many open files.
Try to restrict the number of IndexReader or Searchable objects
that you create while doing searches.
I usually keep one object to handle all my user requests:

// Returns the single Searcher shared by all requests, creating it on first use.
// fetchCitationReader(request) is not shown here; it supplies the IndexReader.
public static Searcher fetchCitationSearcher(HttpServletRequest request)
        throws Exception {
    ServletContext context = request.getSession().getServletContext();
    Searcher rval = (Searcher) context.getAttribute("luceneSearchable");
    if (rval == null) {
        rval = new IndexSearcher(fetchCitationReader(request));
        context.setAttribute("luceneSearchable", rval);
    }
    return rval;
}
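
One caveat with the method above: two requests arriving at the same time can both see null and each open a searcher. Below is a minimal sketch of a race-free variant of the same "one shared object" idea, using only the JDK. The CONTEXT map is a stand-in I am assuming for the ServletContext attributes, and ExpensiveSearcher is a hypothetical placeholder for IndexSearcher; neither is Lucene or servlet API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedSearcherDemo {

    // Stand-in for the ServletContext attribute map (assumption for this sketch).
    private static final Map<String, Object> CONTEXT = new ConcurrentHashMap<>();

    // Counts how many searchers were actually constructed.
    static final AtomicInteger OPENED = new AtomicInteger();

    /** Hypothetical placeholder for an object that is expensive to open. */
    static final class ExpensiveSearcher {
        ExpensiveSearcher() {
            OPENED.incrementAndGet();
        }
    }

    /** Returns the shared searcher, creating it at most once even under concurrency. */
    static ExpensiveSearcher fetchSearcher() {
        return (ExpensiveSearcher) CONTEXT.computeIfAbsent(
                "luceneSearchable", key -> new ExpensiveSearcher());
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(SharedSearcherDemo::fetchSearcher);
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println("searchers opened: " + OPENED.get()); // prints "searchers opened: 1"
    }
}
```

A real ServletContext has no computeIfAbsent, so in a web application the equivalent would be to synchronize the check-and-create step, or to create the searcher once at startup in a ServletContextListener.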




>-- --Original Message-- --
>From: Karthik N S [mailto:karthik@(protected)]
>Sent: 10 November 2004 11:41
>To: Lucene Users List
>Subject: RE: Lucene1.4.1 + OutOf Memory
>
>
>Hi
>
>  Rupinder Singh Mazara
>
>Apologies............
>
>
>
>  Can you paste the code into the mail instead of sending an attachment?
>
>  [ Because I am not able to get attachments on the company's mail ]
>
>
> Thx in advance
>Karthik
>
>
>-- --Original Message-- --
>From: Rupinder Singh Mazara [mailto:rsmazara@(protected)]
>Sent: Wednesday, November 10, 2004 3:10 PM
>To: Lucene Users List
>Subject: RE: Lucene1.4.1 + OutOf Memory
>
>
>hi all
>
> I had a similar problem with jdk1.4.1. Doug had sent me a patch, which I am
>attaching. Following is the mail from Doug:
>
> It sounds like the ThreadLocal in TermInfosReader is not getting
>correctly garbage collected when the TermInfosReader is collected.
>Researching a bit, this was a bug in JVMs prior to 1.4.2, so my guess is
>that you're running in an older JVM.  Is that right?
>
>I've attached a patch which should fix this.  Please tell me if it works
>for you.
>
>Doug
>
>Daniel Taurat wrote:
>> Okay, that (1.4rc3)worked fine, too!
>> Got only 257 SegmentTermEnums for 1900 objects.
>>
>> Now I will go for the final test on the production server with the
>> 1.4rc3 version  and about 40.000 objects.
>>
>> Daniel
>>
>> Daniel Taurat wrote:
>>
>>> Hi all,
>>> here is some update for you:
>>> I switched back to Lucene 1.3-final and now the  number of the
>>> SegmentTermEnum objects is controlled by gc again:
>>> it goes up to about 1000 and then it is down again to 254 after
>>> indexing my 1900 test-objects.
>>> Stay tuned, I will try 1.4RC3 now, the last version before FieldCache
>>> was introduced...
>>>
>>> Daniel
>>>
>>>
>>> Rupinder Singh Mazara wrote:
>>>
>>>> hi all
>>>>  I had a similar problem. I have a database of documents with 24
>>>> fields and an average content of 7K, with 16M+ records.
>>>>
>>>>  I had to split the jobs into slabs of 1M each and merge the
>>>> resulting indexes. Submissions to our job queue looked like
>>>>
>>>>  java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22
>>>>
>>>> and I still had OutOfMemory exceptions. The solution that I created
>>>> was to create a temp directory after every 200K documents and merge
>>>> them together. This was done for the first production run; updates
>>>> are now being handled incrementally.
>>>>
>>>>
>>>>
>>>> Exception in thread "main" java.lang.OutOfMemoryError
>>>>     at org.apache.lucene.store.RAMOutputStream(RAMOutputStream.java(Compiled Code))
>>>>     at org.apache.lucene.store.OutputStream(OutputStream.java(Inlined Compiled Code))
>>>>     at org.apache.lucene.store.OutputStream(OutputStream.java(Inlined Compiled Code))
>>>>     at org.apache.lucene.store.OutputStream(OutputStream.java(Compiled Code))
>>>>     at org.apache.lucene.index.CompoundFileWriter(CompoundFileWriter.java(Compiled Code))
>>>>     at org.apache.lucene.index.CompoundFileWriter(CompoundFileWriter.java(Compiled Code))
>>>>     at org.apache.lucene.index.SegmentMerger(SegmentMerger.java(Compiled Code))
>>>>     at org.apache.lucene.index.SegmentMerger(SegmentMerger.java(Compiled Code))
>>>>     at org.apache.lucene.index.IndexWriter(IndexWriter.java(Compiled Code))
>>>>     at org.apache.lucene.index.IndexWriter(IndexWriter.java:366)
>>>>     at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
>>>>     at lucene.Indexer.main(CDBIndexer.java:168)
>>>>
>>>>
>>>>
>>>>> -- --Original Message-- --
>>>>> From: Daniel Taurat [mailto:daniel.taurat@(protected)]
>>>>> Sent: 10 September 2004 14:42
>>>>> To: Lucene Users List
>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents
>>>>>
>>>>>
>>>>> Hi Pete,
>>>>> good hint, but we actually do have physical memory of  4Gb on the
>>>>> system. But then: we also have experienced that the gc of ibm
>>>>> jdk1.3.1 that we use is sometimes
>>>>> behaving strangely with too large heap space anyway. (Limit seems to
>>>>> be 1.2 Gb)
>>>>> I can say that gc is not collecting these objects since I  forced gc
>>>>> runs when indexing every now and then (when parsing pdf-type
>>>>> objects, that is): No effect.
>>>>>
>>>>> regards,
>>>>>
>>>>> Daniel
>>>>>
>>>>>
>>>>> Pete Lewis wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hi all
>>>>>>
>>>>>> Reading the thread with interest, there is another way I've come across
>>>>>> out of memory errors when indexing large batches of documents.
>>>>>>
>>>>>> If you have your heap space settings too high, then you get swapping
>>>>>> (which impacts performance) plus you never reach the trigger for garbage
>>>>>> collection, hence you don't garbage collect and hence you run out of
>>>>>> memory.
>>>>>>
>>>>>> Can you check whether or not your garbage collection is being
>>>>>> triggered?
>>>>>>
>>>>>> Anomalously, therefore, if this is the case, by reducing the heap space
>>>>>> you can improve performance and get rid of the out of memory errors.
>>>>>>
>>>>>> Cheers
>>>>>> Pete Lewis
>>>>>>
>>>>>> -- -- Original Message -- --
>>>>>> From: "Daniel Taurat" <daniel.taurat@(protected)>
>>>>>> To: "Lucene Users List" <lucene-user@(protected)>
>>>>>> Sent: Friday, September 10, 2004 1:10 PM
>>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Daniel Aber wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> I am facing an out of memory problem using  Lucene 1.4.1.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Could you try with a recent CVS version? There has been a fix about files
>>>>>>>> not being deleted after 1.4.1. Not sure if that could cause the problems
>>>>>>>> you're experiencing.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Well, it seems not to be files; it looks more like those SegmentTermEnum
>>>>>>> objects accumulating in memory.
>>>>>>> I've seen some discussion on these objects in the developer newsgroup
>>>>>>> that had taken place some time ago.
>>>>>>> I am afraid this is some kind of runaway caching I have to deal with.
>>>>>>> Maybe not correctly addressed in this newsgroup, after all...
>>>>>>>
>>>>>>> Anyway: any idea if there is an API command to re-init caches?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Daniel
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>-- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ------
>>>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@(protected)
>>>>>>> For additional commands, e-mail: lucene-user-help@(protected)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>>-- --Original Message-- --
>>From: Erik Hatcher [mailto:erik@(protected)]
>>Sent: 10 November 2004 09:35
>>To: Lucene Users List
>>Subject: Re: Lucene1.4.1 + OutOf Memory
>>
>>
>>On Nov 10, 2004, at 1:55 AM, Karthik N S wrote:
>>>
>>> Hi
>>> Guys
>>>
>>> Apologies..........
>>
>>No need to apologize for asking questions.
>>
>>> History
>>>
>>> 1st type: 40000 subindexes + MultiSearcher + Search on Content Field
>>
>>You've got 40,000 indexes aggregated under a MultiSearcher and you're
>>wondering why you're running out of memory?!  :O
>>
>>> Exception  [ Too many Files Open ]
>>
>>Are you using the compound file format?
>>
>>   Erik
>>
>>
>>
>>
>
>
>
>
>

