Installation of Apache Cassandra

Cassandra is a massively scalable NoSQL database from Apache. It has been widely adopted of late for real-time workloads because of its high performance. This post documents a few simple steps for installing Cassandra on a Unix server.

Cassandra can be installed in several ways:
  • Using the RPM packages
  • Using the tarballs of the binaries shipped
  • Using the Debian packages
We will focus on each of them briefly.

Before we begin the installation, let's understand the prerequisites to install Cassandra.

On your Unix machine, you will need the following packages installed.
  • Java JDK 1.8 or later (OpenJDK 8 works just as well)
  • Python 2.7 or later (even the latest update of 2.6 causes trouble when you run Cassandra later)
Once you have the above prerequisites installed on your machine, you can choose one of the above methods of installation. 
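The prerequisite check above can be scripted. The following is a minimal sketch, assuming a shell with GNU `sort -V` available and that `java` and `python` are the command names on your machine. The function names are made up for illustration and are not part of Cassandra.

```shell
#!/bin/sh
# Sketch: verify the Java and Python prerequisites before installing Cassandra.
# All function names here are illustrative, not part of any Cassandra tooling.

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on version sort (sort -V) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Reads `java -version` output on stdin and prints the quoted version string.
parse_java_version() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Reads `python --version` output on stdin and prints the version number.
parse_python_version() {
    awk '/^Python /{print $2; exit}'
}

# Prints one line per prerequisite and fails if either one is missing or too old.
check_prereqs() {
    java_v=$(java -version 2>&1 | parse_java_version)
    python_v=$(python --version 2>&1 | parse_python_version)
    status=0
    if [ -n "$java_v" ] && version_ge "$java_v" 1.8; then
        echo "Java $java_v: OK"
    else
        echo "Java 1.8 or later not found"
        status=1
    fi
    if [ -n "$python_v" ] && version_ge "$python_v" 2.7; then
        echo "Python $python_v: OK"
    else
        echo "Python 2.7 or later not found"
        status=1
    fi
    return "$status"
}

# Usage (commented out so that sourcing this file runs nothing):
# check_prereqs
```

Both `java -version` and older Python releases print their version on stderr, which is why the sketch redirects stderr into the pipe.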
Since Cassandra is mostly used alongside the Big Data ecosystem, you may also want to install it on more than one node (machine) within a cluster. There are therefore two types of Cassandra installation.
  1. Installing on a single node (Standalone)
  2. Installing in a cluster
Installing Cassandra in a cluster follows the same steps as a single-node installation. However, some additional configuration is needed to make each Cassandra node aware of its neighboring nodes in the same cluster; this is covered briefly later in this post.

Installing Cassandra on a Single Node

Using the RPM packages


This is perhaps the easiest way to install Apache Cassandra on a node. Follow these steps to get it done.
  • Log in to your Unix machine as the user that will install Cassandra
  • Ensure this user has sudo access
  • Create a file called cassandra.repo under /etc/yum.repos.d/
  • Add the content below to this file (use any editor you like, such as gedit or vim)

[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS

Note: The version used here is 3.11, which is why the base URL contains 311x. If you are installing another version, change the URL accordingly.
Older versions may not be available this way, since RPM packaging of Cassandra is a recent addition.
  • Save and close the file
  • In a terminal, type "sudo yum install cassandra" (without the quotes)
  • This starts the download along with all the GPG signature checks.
  • Type 'y' (without quotes) whenever you are prompted to continue.
  • The installation takes a while; once it completes, yum prints a message confirming that it succeeded.
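The repository steps above can also be done without opening an editor. This is a minimal sketch: `cassandra_repo` is a hypothetical helper name, and its argument is the release series that appears in the base URL (311x for version 3.11, as in this post).

```shell
#!/bin/sh
# Sketch: generate the yum repository definition instead of typing it by hand.
# cassandra_repo is an illustrative helper, not a command shipped with Cassandra.

# Prints the repo definition for the release series given as $1 (default 311x).
cassandra_repo() {
    series="${1:-311x}"
    cat <<EOF
[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/${series}/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS
EOF
}

# Usage (commented out so that sourcing this file changes nothing):
# cassandra_repo 311x | sudo tee /etc/yum.repos.d/cassandra.repo > /dev/null
# sudo yum install -y cassandra    # -y answers the confirmation prompts
```

Piping through `sudo tee` is used because a plain `>` redirect would run with your own permissions, not root's.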

Once the installation is done, start the service with the command below in your terminal.
sudo service cassandra start
The command reports back the status of the service.

That's it: the installation is done on a single node. You can start using Cassandra by logging in to the CQL shell with cqlsh

Using the tarball packages

  • Download the desired version of Cassandra from the Apache Cassandra downloads page.
  • Untar the file into the directory of your choice
tar -xzvf apache-cassandra-<x.x>-bin.tar.gz
  • The files are extracted into apache-cassandra-<x.x> (substitute x.x with the release number you downloaded)
  • Add apache-cassandra-<x.x>/bin to your PATH variable
  • Start Cassandra by running bin/cassandra -f from the extracted directory.
To check the status of Cassandra at any time, run bin/nodetool status from the terminal.
To change any of the default settings, edit the configuration files in the conf sub-directory.
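The tarball steps above can be sketched as a small script. The helper names and the version number 3.11.4 in the usage lines are examples only; substitute the release you actually downloaded.

```shell
#!/bin/sh
# Sketch: unpack a Cassandra binary tarball and locate its bin directory.
# Helper names are illustrative; the version in the usage lines is an example.

# Prints the directory name a binary tarball extracts into, e.g.
# apache-cassandra-3.11.4-bin.tar.gz gives apache-cassandra-3.11.4
extracted_dir() {
    name="${1##*/}"                       # drop any leading path
    printf '%s\n' "${name%-bin.tar.gz}"   # drop the -bin.tar.gz suffix
}

# Extracts tarball $1 into directory $2 (default: $HOME) and prints
# the path of the resulting bin directory.
unpack_cassandra() {
    tarball="$1"
    dest="${2:-$HOME}"
    tar -xzf "$tarball" -C "$dest" &&
        printf '%s/%s/bin\n' "$dest" "$(extracted_dir "$tarball")"
}

# Usage (commented out so that sourcing this file changes nothing):
# bin_dir=$(unpack_cassandra apache-cassandra-3.11.4-bin.tar.gz "$HOME")
# export PATH="$bin_dir:$PATH"
# cassandra -f        # runs in the foreground; stop it with Ctrl-C
# nodetool status     # run from a second terminal
```

The `export PATH` line only affects the current shell session; add it to your shell profile if you want it to persist.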

Installation from Debian packages

Installing from a Debian package is very similar to the RPM method.

  • As with the RPM installation, create a file called cassandra.sources.list under /etc/apt/sources.list.d
         echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
  • Then, again as with the RPM method, add the repository keys.
         curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
  • Update the package lists with the command below
         sudo apt-get update
  • Now install Cassandra with the following command
         sudo apt-get install cassandra

Note:
If you get a key error such as "The following signatures couldn't be verified because the public key is not available:....", try adding the key manually as shown below. The key ID in the command is an example; the error message shows the ID of the key that is actually missing.

sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key A278B781FE4B2BDA

If the error message shows a different key ID, repeat this step with that ID, then update the repository (sudo apt-get update) and proceed with the installation (sudo apt-get install cassandra).
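Reading the missing key ID off the error message can be automated. The sketch below assumes the usual apt wording, in which the ID follows the token NO_PUBKEY; `missing_key_ids` is a hypothetical helper name, and the keyserver is the one used in this post.

```shell
#!/bin/sh
# Sketch: pull missing GPG key IDs out of apt's error output.
# missing_key_ids is an illustrative helper, not an apt command.

# Reads apt output on stdin and prints each missing key ID once.
missing_key_ids() {
    grep -o 'NO_PUBKEY [0-9A-Fa-f]*' | awk '{print $2}' | sort -u
}

# Usage (commented out so that sourcing this file changes nothing):
# sudo apt-get update 2>&1 | missing_key_ids | while read -r key; do
#     sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key "$key"
# done
# sudo apt-get update
# sudo apt-get install cassandra
```

If apt reports no key problem, the helper prints nothing and the loop body never runs.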

Once the installation is done, the service starts automatically (unlike with the other two methods above).

To verify the status, use the command below.

nodetool status

To stop and start the service manually (for example, after a configuration change), use the commands below.

Stopping: sudo service cassandra stop
Starting: sudo service cassandra start
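After a restart the node needs a little time before it accepts requests. The following sketch combines the stop and start commands with a status check. It assumes that `nodetool status` marks a healthy node with a line starting with UN (Up/Normal); the helper names, retry count, and delay are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: restart Cassandra and wait until nodetool reports the node as up.
# Helper names, retry count, and delay are illustrative choices.

# Reads `nodetool status` output on stdin; succeeds if any node line
# starts with UN (Up/Normal). On a single node, that is the local node.
node_is_up() {
    grep -q '^UN[[:space:]]'
}

# Restarts the service, then polls nodetool up to 30 times, 2 seconds apart.
restart_and_wait() {
    sudo service cassandra stop
    sudo service cassandra start
    tries=0
    while [ "$tries" -lt 30 ]; do
        if nodetool status 2>/dev/null | node_is_up; then
            echo "Cassandra is up"
            return 0
        fi
        tries=$((tries + 1))
        sleep 2
    done
    echo "Cassandra did not come up in time"
    return 1
}

# Usage (commented out so that sourcing this file changes nothing):
# restart_and_wait
```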



Great! You now have a working instance of Apache Cassandra. I will cover the dockerized version in another post some other time!
