Installing Jethro
Jethro can be installed on any of the following configurations:
- On Hadoop - see "Installing Jethro on Hadoop"
- On standalone local storage
- On a shared file system using NFS
Installation Prerequisites
Before starting the installation of Jethro, ensure that the conditions specified in the following sections are met according to your environment:
Cloud Instance
If you install Jethro on a cloud instance, use a memory-optimized instance with at least the following configuration:
- 8 virtual processors (vCPUs); it is advisable to have at least 16 vCPUs.
- 64 GB RAM; it is advisable to have at least 128 GB RAM.
- Local SSD.
Additional requirements apply to the specific type of instances mentioned below:
- Amazon AWS: The minimum instance type is r3.2xlarge; r3.4xlarge or r3.8xlarge is recommended.
- Microsoft Azure: The minimum instance type is D13; D14 is recommended.
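The sizing figures above can be checked on a running Linux instance. The following is a minimal sketch, not part of the Jethro tooling: the function names classify and check_instance_size are hypothetical, and the thresholds are simply the 8/16 vCPU and 64/128 GB figures listed above. Linux reports somewhat less memory than the nominal instance size, so treat a reading slightly under a threshold as expected.

```shell
# Hypothetical helper: compare a measured value against a minimum and a
# recommended level. Prints "below-minimum", "minimum", or "recommended".
classify() {
    value=$1; minimum=$2; recommended=$3
    if [ "$value" -lt "$minimum" ]; then
        echo "below-minimum"
    elif [ "$value" -lt "$recommended" ]; then
        echo "minimum"
    else
        echo "recommended"
    fi
}

# Report how this host compares with the vCPU and RAM figures listed above.
check_instance_size() {
    vcpus=$(nproc)
    # MemTotal is reported in kB; convert to whole GB (rounded down)
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    mem_gb=$(( mem_kb / 1024 / 1024 ))
    echo "vCPUs: $vcpus ($(classify "$vcpus" 8 16))"
    echo "RAM:   ${mem_gb} GB ($(classify "$mem_gb" 64 128))"
}
```

Calling check_instance_size prints one line for the vCPU count and one for the memory size, each labeled with the level it reaches.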
Physical Hardware
If you use a physical server for Jethro software, ensure that your hardware meets the following criteria:
- CPU – Use at least a modern 2-socket server (for example: 2 x 8-core Intel server).
- Memory – Minimum 64GB of RAM, recommended 128GB/256GB (or higher).
- Disk – Verify there is sufficient local storage (by default, at least 5GB) under both /opt and /var, where Jethro stores binary files and logs (indexes are stored on HDFS, not locally).
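The free space under /opt and /var can be checked before installing. This is a minimal sketch under the assumption that the POSIX "df -P" output format is available; the function names free_gb and has_free_space are hypothetical, and the 5 GB default is taken from the requirement above.

```shell
# Print the free space, in whole GB, of the file system holding the given path.
free_gb() {
    # -P forces the POSIX one-line-per-file-system format;
    # column 4 is the available space in 1K blocks
    df -Pk "$1" | awk 'NR==2 {print int($4 / 1024 / 1024)}'
}

# Succeed only if the path has at least the required free space (default 5 GB).
has_free_space() {
    required_gb=${2:-5}
    [ "$(free_gb "$1")" -ge "$required_gb" ]
}
```

For example, "has_free_space /opt && has_free_space /var" succeeds only when both locations meet the default requirement.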
Network Bandwidth
The network bandwidth requirements are:
- For the Hadoop cluster – At least a 1Gb/s link; a 10Gb/s link is recommended.
- For SQL clients (BI tools, etc.) – A 1Gb/s link is recommended.
Operating Systems
Jethro is certified on the following Linux flavors:
- RedHat/CentOS 6.x/7.x 64 bit
- Ubuntu 14.x
- Amazon Linux (when using Amazon AWS).
Java
JDK 8, 64-bit. To check which version you have, run "java -version".
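The version check can also be scripted. The sketch below uses a hypothetical function, java_major_version, which extracts the major version from the text printed by "java -version". It assumes the usual output layout, in which the first line contains the word "version" followed by the quoted version string, and it handles both the legacy "1.8.0_x" numbering and the newer "11.0.x" numbering.

```shell
# Print the major Java version found in the given `java -version` output.
java_major_version() {
    echo "$1" | awk -F'"' '/version/ {
        # $2 is the quoted version string, for example 1.8.0_202
        split($2, parts, ".")
        # Legacy scheme: "1.8.0_202" means Java 8
        if (parts[1] == "1") print parts[2]; else print parts[1]
        exit
    }'
}
```

Because "java -version" writes to standard error, call it as java_major_version "$(java -version 2>&1)"; a result of 8 matches the requirement above. On HotSpot-based JVMs the same output normally also mentions "64-Bit" when a 64-bit JVM is in use.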
Local Cache Storage
Prepare a local directory on a dedicated disk (SSD recommended) with 200 GB or more on its own mount point, to be used by Jethro for caching files. As an administrator, you need to set up the directory and, if a separate mount point is used, configure it to be mounted automatically.
For example, the following commands can be issued:
sudo su - root   # switch to user root
# make a file system on the local SSD and auto-mount it (not shown)
chmod 700 /mnt/jethro_local_cache
If you cannot set up a dedicated disk for caching, you can use a dedicated directory on your internal disk instead. However, this option is not recommended.
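Once the directory is prepared, its state can be verified with a short check. This is a sketch only: the function name check_cache_dir is hypothetical, and it relies on GNU stat, which ships with the certified Linux flavors listed above.

```shell
# Print "ok" if the directory exists with mode 700; otherwise describe the problem.
check_cache_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing"
        return 1
    fi
    mode=$(stat -c '%a' "$dir")   # octal permission bits, for example 700
    if [ "$mode" != "700" ]; then
        echo "mode $mode, expected 700"
        return 1
    fi
    echo "ok"
}
```

For example, run check_cache_dir /mnt/jethro_local_cache. If a separate mount point is used, "mountpoint -q /mnt/jethro_local_cache" (from util-linux) additionally confirms that the directory is mounted.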
Installing Jethro Software
The installation of Jethro is a straightforward process, which automatically creates and configures the Jethro OS user and Linux service.
To install Jethro, do the following:
Switch to user root:
sudo su - root
Open a terminal and retrieve the latest jethro.rpm file (which is always available at http://jethro.io/latest-rpm):
wget http://jethro.io/latest-rpm
Run the following command to install the newly downloaded RPM file:
rpm -Uvh {latest-rpm}
Change the owner of the previously created Jethro local cache directory (see Installation Prerequisites) to the Jethro user, which was created during the installation of the jethro.rpm file:
chown jethro:jethro /mnt/{jethro_local_cache}
If using Kerberos, securely copy the keytab file to the Jethro home directory and change its ownership and permissions accordingly. The keytab file is generated by following the instructions in the section Creating Jethro OS User for the Hadoop Cluster under "Installing Jethro on Hadoop":
scp kdc_owner@kdc_server:jethro.hadoop.keytab /home/jethro
chown jethro:jethro /home/jethro/jethro.hadoop.keytab
chmod 600 /home/jethro/jethro.hadoop.keytab
Optionally, set the password for the jethro OS user:
passwd jethro
To proceed with the setup as a jethro OS user, run:
sudo su - jethro
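After the steps above, the ownership changes can be verified. The following sketch uses two hypothetical helper functions, owner_of and owned_by, built on GNU stat; the paths in the usage note are the ones used in the steps above.

```shell
# Print "user:group" for the given path.
owner_of() {
    stat -c '%U:%G' "$1"
}

# Succeed only if the path is owned by the expected user and group.
owned_by() {
    [ "$(owner_of "$1")" = "$2:$3" ]
}
```

For example, "owned_by /mnt/jethro_local_cache jethro jethro" succeeds once the chown step has been applied, and the same check can be run against /home/jethro/jethro.hadoop.keytab when Kerberos is used.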
Installing Under a Non-default User
To install the RPM under a different user than the default one (jethro), set the following environment variables, and install the rpm:
export JETHRO_INSTALL_USER={user}
export JETHRO_INSTALL_GROUP={group}
rpm -Uvh {rpm file}
- If a non-default user is to be used, then the user must already exist.
- If a non-default user is specified, the group must also be specified.
- If the rpm is already installed, the same user must be used.
- To use a new user for an installation over an existing installation, remove the rpm package first. Existing instances then need to be attached again, and the ownership of the storage directories or mount points and of the Jethro local cache directory must be changed manually.
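Because the non-default user must already exist and a group must be specified with it, it can help to confirm both before exporting the variables. This is a minimal sketch using the standard getent utility; the function name user_and_group_exist is hypothetical.

```shell
# Succeed only if both the OS user ($1) and the group ($2) already exist.
user_and_group_exist() {
    getent passwd "$1" >/dev/null && getent group "$2" >/dev/null
}
```

For example, run user_and_group_exist {user} {group} with the same values that will be assigned to JETHRO_INSTALL_USER and JETHRO_INSTALL_GROUP, and proceed with the rpm command only if it succeeds.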
Environment Variable Changes
$JETHRO_HOME - This environment variable (pre-defined for jethro OS user) points to the Jethro home directory (/opt/jethro/current).
Local Directory Structure
Jethro software is installed under /opt/jethro. The directory structure supports multiple installed binary versions, which simplifies management tasks such as upgrades and downgrades. It also supports multiple instances, such as running several development and test instances on the same host. These are the main sub-directories created:
- /opt/jethro – The top Jethro directory.
- /opt/jethro/jethro-version – one directory per installed version of Jethro.
- /opt/jethro/current – a symbolic link to the currently used version of Jethro.
- /opt/jethro/instances – metadata of all instances.
- /opt/jethro/instances/{instance_name} – metadata of a specific instance.
- /var/log/jethro – top directory of the Jethro logs.
- /var/log/jethro/instance_name – log directory of a specific instance.
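Since /opt/jethro/current is a symbolic link, the version currently in use can be read from its target. The sketch below assumes only the layout described above; the function name current_version_dir is hypothetical.

```shell
# Print the name of the version directory that the "current" symlink points to.
# The base directory defaults to /opt/jethro.
current_version_dir() {
    base=${1:-/opt/jethro}
    if [ ! -L "$base/current" ]; then
        echo "no 'current' symlink under $base" >&2
        return 1
    fi
    # readlink -f resolves the link to an absolute path; keep the last component
    basename "$(readlink -f "$base/current")"
}
```

Running current_version_dir without arguments prints the name of the version directory under /opt/jethro that is currently active.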
Configuring Kerberos Authentication
Kerberos is a third-party network authentication protocol that employs a system of shared secret keys to securely authenticate a user in an unsecured network environment.
Jethro's support for accessing a Kerberized Hadoop cluster relies on its 'Jethro Monitor' service, which is in charge of renewing Kerberos tickets. To allow the monitor to obtain, cache and renew Kerberos tickets for Jethro services, the Jethro administrator is required to configure a Kerberos keytab and a Kerberos principal, using the following command and parameters:
The Command
JethroAdmin set-host-config -Dhost-param-name=value [-Dhost-param-name=value]
The Parameters
- hdfs.kerberos.keytab (mandatory) - The location of the Kerberos keytab.
- hdfs.kerberos.principal (mandatory) - The Kerberos principal.
- hdfs.kerberos.reinit.interval.sec (optional) - The renewal interval in seconds. The default value is 3600 (one hour).
Example
JethroAdmin set-host-config -Dhdfs.kerberos.keytab=/home/jethro/jethro.hadoop.keytab -Dhdfs.kerberos.principal=jethro@JETHRO.COM -Dhdfs.kerberos.reinit.interval.sec=3600
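When the same settings are applied on several hosts, the command line can be assembled from variables. The sketch below is a dry-run helper with a hypothetical name, build_kerberos_config_cmd; it only prints the command, using the three parameter names documented above, and leaves the optional interval out when it is not supplied.

```shell
# Print (without running) the set-host-config command for the given settings.
# Arguments: keytab path, principal, optional renewal interval in seconds.
build_kerberos_config_cmd() {
    keytab=$1; principal=$2; interval=${3:-}
    cmd="JethroAdmin set-host-config -Dhdfs.kerberos.keytab=$keytab -Dhdfs.kerberos.principal=$principal"
    # The interval is optional; omitting it keeps the documented default of 3600
    if [ -n "$interval" ]; then
        cmd="$cmd -Dhdfs.kerberos.reinit.interval.sec=$interval"
    fi
    echo "$cmd"
}
```

Review the printed command before running it. If the MIT Kerberos client tools are installed, "klist -kt /home/jethro/jethro.hadoop.keytab" lists the principals stored in the keytab, which helps confirm that the principal value matches.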