Working on Amazon AWS

How to Set Up an Amazon AWS Instance

Amazon AWS can be used for running a Jethro server; you only need to set up an Amazon AWS account, and then you can run a Jethro server in a matter of minutes.
This post takes you through the steps of creating and running an Amazon AWS instance that can be used for running Jethro.

Creating a Server Instance

To create a server instance:

  1. Log in to the Amazon AWS console.
  2. Go to the EC2 dashboard.
  3. On the left menu, go to Images > AMIs.
     
  4. Select Public Images and search for Jethro:
  5. Select the type of instance you would like to launch:
    • Regular instance - In which case click Launch and proceed to select the instance type.
    • Spot instance - In which case proceed to the next step. A spot instance allows you to bid on unused EC2 capacity, which can lower your Amazon EC2 costs significantly. For further details, see the Amazon guide on spot instances. You should also review "Spot Instance Considerations".
  6. To launch a spot instance, click Actions to open a drop-down menu of possible actions.
  7. Select the option Spot Request.
  8. Select the instance type. For best results, choose r3.8xlarge; r3.2xlarge and r4.4xlarge are also supported choices. When done, click Next: Configure Instance Details on the bottom right to proceed to the next step.
  9. Configure the following parameters:
    • Number of instances - Enter a number in this field if you want to launch more than one Jethro server
    • Availability Zone - Based on your preferences or geographic location


  10. If you launch a regular instance, skip the next step.
  11. For a spot request, specify the maximum price you are willing to pay:

  12. Click Next: Add Storage to proceed to the next step.
  13. Configure the instance storage. By default the machine is configured with a root device of 128 GB and one or two additional instance store volumes.

    If you chose the r3.8xlarge machine, ensure that you add the additional instance store volume named "Instance Store 1". This should be enough if you are using Hadoop for storage. However, if you intend to use local disk for storage, it is advisable to add another EBS volume for the data. This ensures both that you have enough disk space for your data and that the data persists if the machine is terminated.
    It is possible to add another EBS volume after you launch your instance.
    To add another volume, click Add New Volume and set the requested size in GB. The following example displays an additional volume of 2048GB (2TB).

  14. Click Next: Tag Instance to proceed to the next step.

  15. Optionally, enter a tag name for the instance and click Next: Configure Security Group to proceed to the next step:

  16. Select whether to use an existing security group or to create a new one, in which case ensure that the security group allows inbound SSH and TCP port 9111 traffic.

    You can add rules to a security group at any time.

  17. Click Review and Launch. If the warning shown below appears, select the bottom (highlighted) option and click Next:

  18.  Review your settings and click Launch to launch the instance:

  19. Choose whether to select an existing key pair or to create a new one. When done, ensure that the check box next to the statement at the bottom is selected and click Launch Instances to continue.
  20. Go to your instance list and search for your instance by its name, according to the tag name you previously set for it:

  21. Copy the public IP address of the instance.
     

Voila!

You now have a running instance.
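
The security group rules from step 16 (inbound SSH and TCP port 9111) can also be added from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured. The group ID and CIDR range are placeholders, not values from this guide. With DRY_RUN=1 (the default here) the commands are only printed so that they can be reviewed first.

```shell
#!/bin/sh
# Sketch: open inbound SSH (22) and Jethro (9111) on a security group.
# SG_ID and CIDR are placeholder values (assumptions); replace with your own.
SG_ID="${SG_ID:-sg-0123456789abcdef0}"
CIDR="${CIDR:-203.0.113.0/24}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Add one inbound TCP rule per port.
open_jethro_ports() {
    for port in 22 9111; do
        run aws ec2 authorize-security-group-ingress \
            --group-id "$SG_ID" --protocol tcp --port "$port" --cidr "$CIDR"
    done
}

open_jethro_ports
```

Set DRY_RUN=0 to execute the commands instead of printing them.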


Spot Instance Considerations

Spot instances allow you to bid how much you are willing to pay for the instance. If the spot price exceeds your bid price, the instance is terminated and cannot be accessed again.
Therefore, there are a few considerations to take into account when deciding whether to use spot instances.

  • The instance may be terminated, resulting in a Jethro server shutting down. If Jethro server down time is not an option, you should not use spot instances.
  • If you use an EBS volume as storage for your Jethro instance, it is important to create this volume with the option Delete on Termination cleared (not selected), so that if the instance is terminated, the volume persists and can then be attached to a new instance. It is your responsibility to delete the volume if you choose to terminate the instance.


Note: Any data that you want to persist should be saved on an EBS volume that is created with the Delete on Termination option cleared, rather than on the root volume. Any data saved on the instance's root volume is lost if the instance is terminated. You cannot attach that root volume as the root volume of another instance, even if the Delete on Termination option is cleared.
For further details about spot instances, see Amazon documentation.
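
For a volume that is already attached, the Delete on Termination option can also be cleared from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured. The instance ID and device name are placeholders. The command is printed rather than executed so that it can be reviewed first.

```shell
#!/bin/sh
# Sketch: build the block-device mapping that clears Delete on Termination
# for one attached EBS device.
delete_on_termination_mapping() {
    # $1 = device name as shown in the EC2 console, for example /dev/sdf
    printf '[{"DeviceName":"%s","Ebs":{"DeleteOnTermination":false}}]\n' "$1"
}

INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"   # placeholder (assumption)
DEVICE="${DEVICE:-/dev/sdf}"                        # placeholder (assumption)

# Print the full command for review; run it manually once the values are correct.
echo aws ec2 modify-instance-attribute \
    --instance-id "$INSTANCE_ID" \
    --block-device-mappings "'$(delete_on_termination_mapping "$DEVICE")'"
```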

How to Configure an AWS Instance For Jethro Server

This post walks you through the steps of configuring an Amazon AWS instance to be used as a Jethro server. It assumes that you already have a running Amazon AWS instance created from the Jethro Query Node AMI.
Note: If you haven't created an instance yet, go to How to Set Up an Amazon AWS Instance for instructions on creating and running an instance.

Upgrading Jethro

To upgrade Jethro:

  1. Use the public IP address that you copied when creating the instance to log in to the instance over SSH with the following credentials:
    • The key you provided during the instance creation
    • The user ec2-user
  2. The machine contains a directory called scripts, which resides under the ec2-user home directory and includes the following scripts:
    • upgradeJethro.sh – To upgrade to the latest Jethro software
    • mountVolumes.sh – To mount the extra volumes that were added when creating the instance
    • mountS3Bucket.sh – To mount an Amazon S3 bucket as a file system
    • unmountS3Bucket.sh – To unmount an Amazon S3 bucket file system
       
  3. Run the script upgradeJethro.sh to ensure that you use the latest Jethro release.
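
The login in step 1 can be sketched as follows. The key file name and IP address are placeholders, not values from this guide. The helper only prints the ssh command, and the upgrade commands to run after logging in are shown as comments.

```shell
#!/bin/sh
# Sketch: print the ssh command for logging in as ec2-user.
# KEY_FILE and PUBLIC_IP are placeholder values (assumptions).
KEY_FILE="${KEY_FILE:-my-key.pem}"
PUBLIC_IP="${PUBLIC_IP:-203.0.113.10}"

ssh_command() {
    # $1 = private key file, $2 = public IP address of the instance
    echo "ssh -i $1 ec2-user@$2"
}

# ssh rejects private keys that other users can read, so restrict the key first:
#   chmod 400 "$KEY_FILE"
ssh_command "$KEY_FILE" "$PUBLIC_IP"

# After logging in, on the instance:
#   cd ~/scripts
#   ./upgradeJethro.sh
```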

Mounting Additional Volumes

The volumes to be used for Jethro cache and storage can be mounted either automatically or manually.

To mount additional volumes automatically:

  1. Go to the scripts directory mentioned in "Upgrading Jethro".
  2. Run the command ./mountVolumes.sh.

  3. Run the command df -h to display a volume mounted to /jethro. This directory holds two subdirectories, cache and instances, which can be used for storing cache and data, respectively.
    The ./mountVolumes.sh script also adds the newly mounted volumes to the /etc/fstab file, thereby allowing automatic mounting of the volumes after reboot.
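
To confirm that the script registered the mount for reboot, the fstab file can be checked for the mount point. The following is a minimal sketch; the helper name fstab_has_mount is hypothetical, and the mount point is assumed to be /jethro (adjust MOUNT_POINT if your system differs).

```shell
#!/bin/sh
# Sketch: succeed if an fstab file has a non-comment entry whose
# mount point (second field) equals the given path.
fstab_has_mount() {
    # $1 = fstab file, $2 = mount point
    [ -r "$1" ] || return 1
    awk -v mp="$2" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$1"
}

MOUNT_POINT="${MOUNT_POINT:-/jethro}"   # assumed mount point

if fstab_has_mount /etc/fstab "$MOUNT_POINT"; then
    echo "$MOUNT_POINT is listed in /etc/fstab"
else
    echo "$MOUNT_POINT is not listed in /etc/fstab"
fi
```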

To mount additional volumes manually:

  1. Go to the scripts directory mentioned in "Upgrading Jethro".
  2. Run the command lsblk to display all available volumes, as shown in the example below.

  3. In this example the volume /dev/xvda1 is mounted as the root device, and there are two additional devices, /dev/xvdb and /dev/xvdc, which are not mounted. 
    Note: The lsblk command omits the /dev prefix from the device name. If you have two additional devices or more, as shown in this example, you need to create an array. If you have only one additional device, skip the next step.
     
  4. To create a RAID 0 array of the additional devices, run the following command:

    sudo mdadm --create --verbose /dev/md0 --level=0 --raid-devices={number_of_devices} /dev/{additional_volume1} /dev/{additional_volume2} 
    In the example above:
    sudo mdadm --create --verbose /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc


    Running lsblk again displays the following:


  5. Create a file system by running the command below. 
    Note: If you have only one additional device and did not run the previous step, replace /dev/md0 with the device name (most likely /dev/xvdb).
     

    sudo mkfs -t ext4 /dev/md0



  6. Mount the file system by running the following command, replacing the device name if required:

    sudo mount /dev/md0 /jethro
  7. Run the following commands to create the cache and instances directories and to change the ownership to user jethro:

    sudo mkdir /jethro/cache /jethro/instances
    sudo chown -R jethro:jethro /jethro
  8. To enable automatic mounting of the volume after reboot, you can add the following line to /etc/fstab, replacing the device name if required:

    /dev/md0 /jethro auto noatime 0 0
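
The manual steps above differ only in whether a RAID 0 array is created first. The following is a minimal sketch of that decision; the helper name plan_mount is hypothetical. It prints the commands for review and runs nothing itself.

```shell
#!/bin/sh
# Sketch: print (not run) the manual mount commands for the given devices.
# With two or more devices a RAID 0 array /dev/md0 is created first;
# with a single device that device is formatted and mounted directly.
plan_mount() {
    # $1 = mount point, remaining arguments = additional devices
    if [ "$#" -lt 2 ]; then
        echo "usage: plan_mount MOUNT_POINT DEVICE [DEVICE...]" >&2
        return 1
    fi
    mount_point=$1
    shift
    if [ "$#" -ge 2 ]; then
        echo "sudo mdadm --create --verbose /dev/md0 --level=0 --raid-devices=$# $*"
        target=/dev/md0
    else
        target=$1
    fi
    echo "sudo mkfs -t ext4 $target"
    echo "sudo mount $target $mount_point"
    echo "sudo mkdir $mount_point/cache $mount_point/instances"
    echo "sudo chown -R jethro:jethro $mount_point"
}

# Device names taken from the example in this section.
plan_mount /jethro /dev/xvdb /dev/xvdc
```

Calling plan_mount /jethro /dev/xvdb with a single device skips the mdadm line and uses /dev/xvdb directly.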


Voila!

Now you have an Amazon AWS instance that is configured to be used as a Jethro Server, and you can proceed to setting up a Jethro server on an Amazon AWS instance.


How to Set up a Jethro Server on an Amazon AWS Instance

This article walks you through the steps of setting up a Jethro server on an Amazon AWS instance, assuming that you have a running Amazon AWS instance that is already configured to be used as a Jethro server.
If you haven't created an instance yet, go to How to Set Up an Amazon AWS Instance to create and run an instance, and then go to How to Configure an AWS Instance For Jethro Server to configure the instance.

If you intend to use local disk for storage, skip the 'Setting Up Hadoop to be Used for Jethro Storage' section.

Setting Up Hadoop to be Used for Jethro Storage

The following instructions specify the steps required for setting up Hadoop for storage. They assume that you have a Hadoop cluster running with enough space for your data, and that the Hadoop nodes can be accessed from the Jethro server instance.
To set up Hadoop for Jethro storage:

  1. Configure the Hadoop client by copying the files /etc/hadoop/conf/core-site.xml and /etc/hadoop/conf/hdfs-site.xml from any Hadoop data node to the same location on the Jethro server.
  2. Verify that you can connect to Hadoop by running the following command:

    hadoop fs -ls /	
  3. As the Hadoop hdfs user, create a root HDFS directory for Jethro files, owned by the jethro Hadoop user. This document assumes that the directory is /user/jethro/instances:

    hadoop fs -mkdir /user/jethro
    hadoop fs -mkdir /user/jethro/instances
    hadoop fs -chmod -R 740 /user/jethro
    hadoop fs -chown -R jethro /user/jethro
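
Before testing the connection in step 2, it can help to confirm that both client files from step 1 are in place. The following is a minimal sketch; the helper name check_hadoop_conf is hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that both Hadoop client files are present in a directory.
check_hadoop_conf() {
    # $1 = configuration directory (this guide uses /etc/hadoop/conf)
    for f in core-site.xml hdfs-site.xml; do
        if [ ! -f "$1/$f" ]; then
            echo "missing: $1/$f"
            return 1
        fi
    done
    echo "Hadoop client configuration found in $1"
}

check_hadoop_conf /etc/hadoop/conf || echo "Copy the files from a Hadoop node first."
```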


Setting Up Local Disk to be Used for Jethro Storage

To set up a local disk for Jethro storage:

  1. Create an EBS volume for the data and attach it to the instance.

    If you already created an EBS volume for the data when creating the instance and would now like to mount it manually, skip to step 9.
  2. Go to the EC2 console.
  3. Select Elastic Block Store > Volumes from the menu on the left.

  4. Click Create Volume to open the Create Volume dialog box.
  5. Specify the size in GB and the availability zone, which should be the same as the one specified for the instance.

  6. Click Create to create the volume.
  7. Select the newly created volume and go to Actions > Attach Volume.

  8. Select the instance and click Attach.
  9. Next, proceed to mount the EBS volume.

    If you used the mountVolumes.sh script to mount the volumes automatically, you can skip the rest of these instructions.
  10. Run the command lsblk to see the EBS volume which, in the example below, is called xvdd and has a size of 1TB:


  11. Run the following command:

    sudo file -s /dev/xvdd


    If the output shows only data (for example, /dev/xvdd: data), the volume has no file system and you need to create one. If the output is different, skip the next step.

  12. Run the following command to create a file system:

    sudo mkfs -t ext4 /dev/xvdd

  13. Run the following command to mount the file system to the directory /jethro/instances:

    sudo mount /dev/xvdd /jethro/instances
  14. Run the command df -h to display the mounted device. In this example you have 1TB mounted on /jethro/instances.


  15. To enable automatic mounting of the volume after reboot, you can add the following line to /etc/fstab:

    /dev/xvdd /jethro/instances auto noatime 0 0
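
The check in steps 11 and 12 can be sketched as follows. It relies on file -s reporting only "data" for a device with no recognizable file system. The helper name needs_filesystem is hypothetical, and the sample line stands in for real output so that the sketch runs anywhere.

```shell
#!/bin/sh
# Sketch: decide from `file -s` output whether a device still needs a file system.
needs_filesystem() {
    # $1 = one line of `sudo file -s <device>` output
    case "$1" in
        *": data") return 0 ;;   # no file system detected
        *)         return 1 ;;   # some file system or other content detected
    esac
}

DEVICE="${DEVICE:-/dev/xvdd}"   # device name from the example in this section
sample="$DEVICE: data"          # sample line; on the server use: sudo file -s "$DEVICE"

if needs_filesystem "$sample"; then
    echo "sudo mkfs -t ext4 $DEVICE"
else
    echo "$DEVICE already has a file system; skip mkfs"
fi
```

Running mkfs on a device that already holds data erases it, so the command is printed here rather than executed.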

Creating a Jethro Instance

To create a Jethro instance:

  1. Switch to user jethro by running the following command:

    su - jethro
  2. Run the JethroAdmin create-instance command, while providing the following parameters:
     
    • An instance name of your choice (demo in the following example).
    • The instance storage path - An HDFS storage path (an HDFS directory owned by user jethro), a local storage path, or shared storage.
    • Local caching parameters – A local root path and maximum size for the cache directory. When using local disk for storage, there is no need for local cache. In this case, set the local cache size to 0G.
    • When using a local storage path, an extra parameter is required: -Dstorage.type=POSIX
       
  3. Create-instance syntax:

    # Basic command (add -Dstorage.type=POSIX only for a local storage path):
    JethroAdmin create-instance {instance-name} -storage-path={storage-path} -cache-path={cache-path} -cache-size={cache-size} [-Dstorage.type=POSIX]

    # To create a local instance:
    JethroAdmin create-instance demo -storage-path=/jethro/instances -cache-path=/mnt/jethro_local_cache -cache-size=0G -Dstorage.type=POSIX

    # To create an HDFS instance:
    JethroAdmin create-instance demo -storage-path=/user/jethro/instances -cache-path=/mnt/jethro_local_cache -cache-size=80G

    The new instance is auto-configured to listen on port 9111.

  4. Start the Jethro service by running:

    service jethro start


    This starts both the Jethro Server and the Jethro maint services.

  5. Run the following command to verify that you can connect to the Jethro server and run queries:

    JethroClient demo localhost:9111 -p jethro
  6. Run the show tables query in the command prompt:
     


Voila!

You now have a Jethro instance ready to load data and run queries.
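
The create-instance command line depends on the storage type. The following is a minimal sketch of that rule; the helper name create_instance_cmd is hypothetical. It assumes the local storage path is /jethro/instances (the mount point used for the EBS volume) and the HDFS path is /user/jethro/instances (the directory created in the Hadoop section). It only prints the command.

```shell
#!/bin/sh
# Sketch: assemble the create-instance command line for local or HDFS storage.
create_instance_cmd() {
    # $1 = instance name, $2 = storage type (local or hdfs),
    # $3 = storage path, $4 = cache path, $5 = cache size
    cmd="JethroAdmin create-instance $1 -storage-path=$3 -cache-path=$4 -cache-size=$5"
    if [ "$2" = "local" ]; then
        # Local storage paths need the extra POSIX storage-type parameter.
        cmd="$cmd -Dstorage.type=POSIX"
    fi
    echo "$cmd"
}

# Local disk storage: no local cache is needed, so the cache size is 0G.
create_instance_cmd demo local /jethro/instances /mnt/jethro_local_cache 0G
# HDFS storage with an 80G local cache.
create_instance_cmd demo hdfs /user/jethro/instances /mnt/jethro_local_cache 80G
```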