Setup: SolrCloud on Kubernetes (Bitnami chart), currently on Solr 8.11 (also planning to move to 9).
I've tried several ways to get a large (120 MB) file loaded into KeepWordFilterFactory.
The main problem was ZooKeeper timing out. I then tried embedding the file in the image and loading it from there:
<filter class="solr.KeepWordFilterFactory" words="${keepwords.file.path}" ignoreCase="true"/>
The problem is that SolrCloud prepends the configset path /configs/coreName/ to that value, so the resource it tries to look up becomes /configs/coreName//opt/bitnami/solr/server/solr/custom_resources/keepwords.txt
I also tried uploading it as part of the ZooKeeper config, but as I understand it ZooKeeper is not designed to distribute files that large (increasing -Djute.maxbuffer is not enough).
I also looked at managed resources, but these seem to exist only for stopwords and synonyms.
What is the right way to load such a file in the config? (I will probably need to change the KeepWordFilterFactory approach eventually, but for now I would like to keep using the existing config, since it worked nicely.)
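A quick size comparison shows why raising the limit is a losing battle. ZooKeeper's default jute.maxbuffer is 0xfffff bytes (1,048,575 bytes, just under 1 MiB), and each file in an uploaded configset is stored as a single znode. Taking 120 MB as 120 MiB, the keepwords file is 125,829,120 bytes, roughly 120 times the default, and ZooKeeper holds all znode data in memory on every ensemble member. The sketch below is a hypothetical pre-upload check (the function name and directory layout are illustrative, not part of Solr) that lists configset files over a given limit:

```python
from pathlib import Path
from typing import List, Tuple

# ZooKeeper's default jute.maxbuffer: 0xfffff bytes, just under 1 MiB.
# Raising it requires changing the setting on ZooKeeper servers *and* clients.
DEFAULT_JUTE_MAXBUFFER = 0xFFFFF


def oversized_files(configset_dir: str,
                    limit: int = DEFAULT_JUTE_MAXBUFFER) -> List[Tuple[str, int]]:
    """Return (relative path, size in bytes) for every file larger than `limit`.

    Each configset file becomes one znode, so anything reported here will not
    fit under the limit (the real ceiling is slightly lower because of
    request overhead).
    """
    root = Path(configset_dir)
    hits: List[Tuple[str, int]] = []
    for p in sorted(root.rglob("*")):
        if p.is_file():
            size = p.stat().st_size
            if size > limit:
                hits.append((str(p.relative_to(root)), size))
    return hits
```

Running it against the local configset directory before uploading would flag keepwords.txt as far over the default limit, which suggests moving it out of the configset rather than raising jute.maxbuffer further.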
The exact error is:
org.apache.solr.common.SolrException: Error CREATEing SolrCore 'corename_shard1_replica_n1': Unable to create core [corename_shard1_replica_n1] Caused by: Invalid path string "/configs/corename//opt/bitnami/solr/server/solr/custom_resources/keepwords.txt" caused by empty node name specified @21
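The "empty node name" message comes from ZooKeeper's client-side path validation (PathUtils.validatePath), which rejects any path containing two consecutive slashes. Joining the configset path /configs/corename/ with an absolute filesystem path produces exactly that. The sketch below models the concatenation observed above and a simplified version of the ZooKeeper rule. Both functions are hypothetical illustrations, not Solr or ZooKeeper APIs, and the validator only covers the empty-node check:

```python
from typing import Optional


def observed_zk_path(configset: str, words_value: str) -> str:
    # Models the behaviour reported in the question: the value of words="..."
    # is appended to the configset's ZooKeeper path, even when it is absolute.
    return f"/configs/{configset}/{words_value}"


def zk_path_error(path: str) -> Optional[str]:
    """Simplified re-implementation of ZooKeeper's empty-node-name check.

    Returns the reason string ZooKeeper would report, or None if this
    particular check passes.
    """
    if not path.startswith("/"):
        return "Path must start with / character"
    last = "/"
    for i in range(1, len(path)):
        c = path[i]
        if c == "/" and last == "/":
            # Two slashes in a row mean a path segment with an empty name.
            return f"empty node name specified @{i}"
        last = c
    return None


p = observed_zk_path(
    "corename", "/opt/bitnami/solr/server/solr/custom_resources/keepwords.txt"
)
print(zk_path_error(p))  # empty node name specified @18
```

The offset is just the index of the second slash, so it depends on the length of the real collection name; that is why the value here differs from the @21 in the posted log.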
Answer
In SolrCloud you can't load a 120 MB file into ZooKeeper (even with -Djute.maxbuffer), and absolute paths fail because Solr treats them as ZooKeeper configset resources unless you explicitly allow external paths. The way to fix this is:

1. Mount the file on a filesystem accessible to all Solr pods (e.g. via a Kubernetes PersistentVolume, or by embedding it in the image) at a stable location such as /solr-extra/keepwords.txt.
2. Start Solr with -Dsolr.allowPaths=/solr-extra -Dkeepwords.file.path=/solr-extra/keepwords.txt (in the Bitnami chart this can be passed through extraEnvVars or solrOpts).
3. In your schema, reference the file either with ${keepwords.file.path} or directly as an absolute path (words="/solr-extra/keepwords.txt").

Solr will then load it from disk rather than from ZooKeeper. This avoids the path mangling you saw (/configs/coreName/...) and is the only reliable way to use a large keepwords list in SolrCloud; ZooKeeper and managed resources are unsuitable for files of that size.
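Since every Solr pod now reads the file from its own local path, it may be worth sanity-checking the mounted file before creating the collection. The sketch below is a hypothetical helper (not part of Solr) that parses the file the way Lucene's word-list loader is understood to: one entry per line, a leading BOM stripped, lines starting with # skipped as comments, blank lines dropped, and entries lowercased when ignoreCase="true". Those parsing rules are an assumption and should be checked against the Lucene version in use.

```python
from pathlib import Path
from typing import Set


def load_keep_words(path: str, ignore_case: bool = True) -> Set[str]:
    """Hypothetical pre-flight check: parse a keepwords file into a set.

    Parsing rules are assumed to mirror Lucene's word-list loading
    (see the lead-in); verify against your Lucene version.
    """
    words: Set[str] = set()
    # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark if present.
    with Path(path).open(encoding="utf-8-sig") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith("#"):
                continue  # assumed comment syntax, checked before trimming
            word = line.strip()
            if not word:
                continue  # blank lines contribute no entries
            words.add(word.lower() if ignore_case else word)
    return words
```

Run on a pod, e.g. len(load_keep_words("/solr-extra/keepwords.txt")), this confirms that the file is readable at the mounted path and gives a rough idea of how many distinct entries the filter will hold in memory on every replica.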