How To Create a Simple Search Engine
According to Wikipedia, a web search engine is a software system designed to search for information on the World Wide Web. This article covers how to create a simple search engine like Google.
What You Need To Create A Simple Search Engine
To create a search engine, you need two main parts:
- A web crawler, to collect information from the internet.
- A search platform, to search the collected data, in this case web pages.
To set up a web crawler, you can use Apache Nutch. See my previous article on creating a web crawler with Apache Nutch. Once the web crawler is working, you need a search platform to index and display the information crawled by Apache Nutch. The search platform we will use is Apache Solr.
Apache Solr is a search platform built on top of Apache Lucene. It is a very powerful search platform because it provides full-text search, dynamic clustering, database integration, rich document handling, and much more.
How To Install Apache Solr
Follow these steps to install Apache Solr:
1. Download Apache Solr from Apache's website.
2. Extract the downloaded file with the following commands:
$ sudo tar xzf apache-solr-4.6.1.tgz
$ sudo mv apache-solr-4.6.1/ solr
These commands extract all of Apache Solr's files and rename the extracted folder to solr.
3. Open the ~/.bashrc file (for example, type gedit ~/.bashrc in a terminal) and add the following configuration to it:
# set SOLR home
export SOLR_HOME=/usr/local/solr/example/solr
This creates an environment variable called SOLR_HOME, which tells Apache Solr where its home directory is.
4. Test your Apache Solr installation by navigating to the example directory of Apache Solr and typing the following command to start it:
java -jar start.jar
If everything is set up correctly, you will get output like this:
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
23 Jan, 2014 4:25:24 AM org.apache.solr.servlet.SolrUpdateServlet init
INFO: SolrUpdateServlet.init() done
2014-01-23 04:25:24.762:INFO::Started SocketConnector@0.0.0.0:8983
5. Verify that Apache Solr is running by opening the following URL in your browser:
http://localhost:8983/solr/admin/
You should see the admin page of the running Apache Solr instance, as in the image below.
[Image: running Apache Solr admin page]
6. At this point, both Apache Nutch and Apache Solr are installed correctly. Next, we need to integrate Apache Solr with Apache Nutch.
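The installation steps above can be double-checked with a small script. The following is only a minimal sketch and not part of Apache Solr itself: the function names check_solr_layout, check_solr_home and solr_is_up are hypothetical helpers, and the last one assumes that curl is installed.

```shell
#!/bin/sh
# Hypothetical helpers for checking the installation steps above.

# Succeeds if the given Solr directory contains example/start.jar,
# the file that is started in step 4.
check_solr_layout() {
    [ -f "$1/example/start.jar" ]
}

# Succeeds if SOLR_HOME is set and points to an existing directory.
check_solr_home() {
    [ -n "${SOLR_HOME:-}" ] && [ -d "$SOLR_HOME" ]
}

# Succeeds if the given URL finally answers with HTTP status 200.
# Assumes curl is available; -L follows redirects, and a failed
# connection is reported by curl as status 000.
solr_is_up() {
    status=$(curl -s -L -o /dev/null -w '%{http_code}' "$1")
    [ "$status" = "200" ]
}
```

For example, calling check_solr_layout /usr/local/solr should succeed after step 2, and solr_is_up http://localhost:8983/solr/admin/ should succeed once Solr has been started in step 4.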
Integrate Apache Solr with Apache Nutch
Integration is required so that the URLs crawled by Apache Nutch are indexed in Apache Solr. Once Apache Nutch finishes crawling, the information is indexed by Apache Solr. To integrate Apache Solr with Apache Nutch, follow these steps:
1. Locate the schema.xml file in the conf directory of Apache Nutch (<apache nutch directory>/conf). It needs to be placed in the conf directory of Apache Solr.
2. Enter the following command to copy schema.xml:
cp <apache nutch directory>/conf/schema.xml <Apache Solr directory>/example/solr/conf/
3. Navigate to the example directory and type the following command to restart Apache Solr:
java -jar start.jar
4. Now you can start Apache Nutch with these commands:
cd <Apache Nutch directory>/runtime
bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr/ 2
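The integration steps above can also be wrapped in small helper functions. This is a rough sketch under a few assumptions: install_nutch_schema and print_crawl_command are hypothetical names, the backup file name schema.xml.bak is my own choice, and the crawl script is assumed to take its arguments in the usual Nutch 1.x order of seed, crawl directory, Solr URL and number of rounds.

```shell
#!/bin/sh
# Hypothetical helpers for the integration steps above.

# Copies schema.xml from Nutch into the Solr conf directory and
# keeps a backup of any schema.xml that is already there.
# $1 = Nutch conf directory, $2 = Solr conf directory
install_nutch_schema() {
    src="$1/schema.xml"
    dest="$2/schema.xml"
    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
    [ -d "$2" ] || { echo "missing directory $2" >&2; return 1; }
    if [ -f "$dest" ]; then
        cp "$dest" "$dest.bak"
    fi
    cp "$src" "$dest"
}

# Prints the crawl command for the given arguments without running it.
# $1 = seed, $2 = crawl directory, $3 = Solr URL, $4 = number of rounds
print_crawl_command() {
    case "$4" in
        ''|*[!0-9]*) echo "rounds must be a whole number" >&2; return 1 ;;
    esac
    echo "bin/crawl $1 $2 $3 $4"
}
```

Keeping a backup of the original schema.xml makes it easy to go back to the stock Solr example if the Nutch schema does not load. Printing the crawl command first is a cheap way to check the arguments before starting a long crawl.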
You now have a simple search engine. Apache Nutch provides many parameters that you can adjust to suit your requirements.
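Once pages are indexed, searching them comes down to sending a query to Solr over HTTP. The sketch below only builds such a query URL. The function name build_search_url is hypothetical, and it assumes Solr's standard select handler with the q and wt parameters. Depending on your Solr version and configuration, the path may also need to include a core name.

```shell
#!/bin/sh
# Hypothetical helper: builds a Solr query URL for a search term.
# $1 = Solr base URL (assumed, for example http://localhost:8983/solr)
# $2 = search term
# Only spaces are escaped here; other special characters would
# need full URL encoding.
build_search_url() {
    base=${1%/}
    term=$(printf '%s' "$2" | sed 's/ /+/g')
    printf '%s/select?q=%s&wt=json\n' "$base" "$term"
}
```

The resulting URL can be opened in a browser or fetched with a tool such as curl to see the matching documents as JSON.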