Installing Apache Solr

Apache Solr is a search platform built on top of Apache Lucene. It can index and search many kinds of data, such as web pages. Here, Apache Solr will index the URLs crawled by Apache Nutch, and you can then search the crawled content in Solr.
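Once pages are indexed, Solr is searched over plain HTTP through its /select request handler. The sketch below builds such a query URL and fetches the results as JSON. It assumes Solr is running locally on its default port (8983) and that the index lives in a core named "collection1"; both are assumptions to adjust for your setup, and the field name in the example query (title) depends on your schema.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: local Solr on the default port, index in a core called "collection1".
SOLR_BASE = "http://localhost:8983/solr/collection1"


def build_select_url(query: str, rows: int = 10) -> str:
    """Build a URL for Solr's standard /select search handler."""
    params = {
        "q": query,    # Solr query string, e.g. "title:nutch"
        "rows": rows,  # maximum number of documents to return
        "wt": "json",  # ask Solr to respond with JSON
    }
    return f"{SOLR_BASE}/select?{urlencode(params)}"


def search(query: str, rows: int = 10) -> list:
    """Run a query against the running Solr server and return the matching docs."""
    with urlopen(build_select_url(query, rows)) as resp:
        data = json.load(resp)
    # Solr's JSON response lists the matching documents under response.docs
    return data["response"]["docs"]


if __name__ == "__main__":
    print(build_select_url("title:nutch", rows=5))
```

Only build_select_url works without a server; search needs Solr to be up and the core to contain indexed pages.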

First download Apache Solr from

Installing and configuring Apache Nutch

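Apache Nutch comes in different branches, for example 1.x and 2.x. Architecturally, the main difference is storage: Nutch 1.x keeps its crawl data in Hadoop files, while Nutch 2.x stores it through Apache Gora, which is why HBase appears in the dependencies below. In practice, the difference that matters here is how a crawl is run. With the classic 1.x workflow, you type each crawl command (inject, generate, fetch, parse, updatedb) yourself, step by step. With Nutch 2.x, the developers provide a crawl script that runs the whole cycle for you, so there is no need to type the commands one at a time.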
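The step-by-step cycle that a crawl script automates can be sketched as a simple plan: inject the seed URLs once, repeat generate/fetch/parse/updatedb for each round, then index into Solr. This is only an illustration of the ordering, not the actual Nutch script; the real commands take arguments (seed directory, batch IDs, the Solr URL) that differ between Nutch versions, so they are deliberately left out.

```python
def crawl_plan(rounds: int) -> list[str]:
    """Return the ordered Nutch steps for a crawl of `rounds` rounds (arguments omitted)."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    steps = ["inject"]  # seed the crawl database once
    for _ in range(rounds):
        # one crawl round: pick URLs, download them, parse them, record new links
        steps += ["generate", "fetch", "parse", "updatedb"]
    steps.append("solrindex")  # push the parsed pages into Solr
    return steps


if __name__ == "__main__":
    for step in crawl_plan(2):
        # Run `bin/nutch <step>` with no arguments to see the usage
        # your Nutch version expects before running it for real.
        print("bin/nutch", step)
```

Each extra round follows links discovered in the previous one, which is why the middle four steps repeat while inject runs only once.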

Installation dependencies:

  • Apache Nutch 2.2.1
  • HBase 0.90.4
  • Ant
  • JDK 1.6
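Before building Nutch, it can help to confirm that the command-line tools from this list are reachable. The sketch below only checks that `java` and `ant` are on the PATH; it does not verify their versions, and HBase is left out because it is normally started from its own install directory rather than from the PATH.

```python
import shutil

# Tools from the dependency list that the Nutch build calls from the shell.
REQUIRED_TOOLS = ["java", "ant"]


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("java and ant found; check their versions separately.")
```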


Django and MySQL

The default database with Django is SQLite. This is fine for development, but a database such as MySQL is recommended for deployment. To use MySQL with Django, you need to have MySQL installed on your system; on OSX I use MAMP.
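Django is pointed at MySQL through the DATABASES setting in settings.py. The excerpt below is a sketch that assumes a MAMP-style local server: MAMP's default credentials (root/root) and MySQL port (8889), plus a database named "mysite_db" that you have already created. The database name is hypothetical; replace it and the credentials with your own.

```python
# settings.py (excerpt)
# Assumptions: MAMP's MySQL with its default credentials and port,
# and a hypothetical database "mysite_db" created beforehand.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",  # Django's built-in MySQL backend
        "NAME": "mysite_db",                   # hypothetical database name
        "USER": "root",
        "PASSWORD": "root",
        # 127.0.0.1 rather than "localhost" makes the MySQL client connect
        # over TCP, so the PORT below is actually used.
        "HOST": "127.0.0.1",
        "PORT": "8889",  # MAMP's default; a stock MySQL install uses 3306
    }
}
```

With this in place, running `python manage.py migrate` (or `syncdb` on older Django versions) is a quick way to confirm the connection works.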

The first step is to install the appropriate database bindings, which let Python and Django run commands against the database. To get the Python interface to MySQL, go to:

Unix for Mac OS X

Terminal & UNIX Shortcuts

Up/Down arrows: Review previous commands

Control + a: Move cursor to start of a line

Control + e: Move cursor to end of a line

Option + click line: Move cursor to click point

Tab: Complete command or file name

Tab + Tab: When Tab doesn't complete, show a list of options

Command + ~: Cycle between Terminal windows

MySQL command line client on OSX

To run the mysql command-line client in Terminal, the directory containing it must be on your execution path (PATH). If, after installing MySQL, typing mysql in Terminal gives "command not found", MySQL is installed but the PATH to the mysql executable is not set in the operating system.

OSX is based on Unix, and the PATH is set in the .profile file in your home directory. To check whether you have a .profile file, type the following in Terminal:

HD:~ ped$ ls .profile
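The fix is to add MySQL's bin directory to PATH in .profile. The sketch below checks whether `mysql` is already found and, if not, prints the line to append. It assumes MySQL was installed with the official OS X package, which places its binaries in /usr/local/mysql/bin; with MAMP the directory would instead be /Applications/MAMP/Library/bin. Both paths are assumptions to verify on your machine.

```python
import os
import shutil

# Assumption: official MySQL package location; MAMP users would use
# /Applications/MAMP/Library/bin instead.
MYSQL_BIN = "/usr/local/mysql/bin"


def mysql_on_path() -> bool:
    """True if a `mysql` executable is found on the current PATH."""
    return shutil.which("mysql") is not None


def profile_line(bin_dir: str) -> str:
    """Line to append to ~/.profile so new shells also search bin_dir."""
    return f'export PATH="{bin_dir}:$PATH"'


if __name__ == "__main__":
    if mysql_on_path():
        print("mysql found at", shutil.which("mysql"))
    else:
        profile = os.path.expanduser("~/.profile")
        print(f"Add this line to {profile}, then open a new Terminal window:")
        print(profile_line(MYSQL_BIN))
```

Putting the new directory before `$PATH` means this mysql wins if another copy is installed elsewhere.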

Git Essentials

Git is a free and open-source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. Git is easy to learn and has a tiny footprint with lightning-fast performance.

Git Installation

My preferred way to install Git on OSX is with Homebrew. First, install Homebrew by pasting the command from the link provided into Terminal. Once Homebrew is installed, type the command: