
Getting Started with Oracle Big Data on Oracle Cloud Infrastructure

section 0 Before You Begin

This 20-minute tutorial shows you how to create an Oracle Big Data Cloud (BDC) cluster in an Oracle Cloud Infrastructure (OCI) region.

Background

Oracle Big Data Cloud combines open source technologies such as Apache Spark and Apache Hadoop to deliver a complete Big Data platform for running and managing Big Data Analytics applications.

Oracle Big Data Cloud leverages Oracle’s Infrastructure Cloud Services to deliver a secure, elastic, integrated platform for all Big Data workloads.

You can create Oracle Big Data Cloud clusters in Oracle Cloud Infrastructure and in Oracle Cloud Infrastructure Classic. The infrastructure a cluster gets created in depends on the region you select when you create the cluster. If you see the Availability Domain and Subnet fields when you select a region for the cluster you're creating, that means the cluster will be created in Oracle Cloud Infrastructure. Otherwise, the cluster is created in Oracle Cloud Infrastructure Classic.

What Do You Need?

  • Access to Oracle Cloud Infrastructure.
  • The oci-curl.sh script, downloaded to your local computer.
  • The openssl utility, to generate API signing keys. openssl is available by default on most UNIX-like systems. On Windows, you can use Git Bash.
  • Access to Oracle Cloud My Services. See How to Begin with Oracle Big Data Cloud Subscriptions.
  • Access to Oracle Big Data Cloud Service.
  • If you don't have the user name and password, contact your organization's Oracle Cloud administrator. Note that the credentials for Oracle Cloud My Services might be different from the credentials for Oracle Cloud Infrastructure.
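Before starting, it can help to confirm that the command-line tools from the list above are on your PATH. The following is a minimal sketch, not part of the original tutorial: the function name missing_tools is hypothetical, and including curl in the check is an assumption based on the oci-curl.sh script name.

```shell
#!/bin/sh
# missing_tools (hypothetical helper): print the name of every
# requested tool that cannot be found on the PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, running missing_tools openssl curl prints nothing when both tools are installed, and prints the name of each tool that still needs to be installed otherwise.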


section 1 Gather Required OCI Storage Information

  1. Sign in to the Oracle Cloud Infrastructure web console.
  2. In the OCI web console, check the region selector near the upper-right corner for your region identifier, which looks like us-ashburn-1 or us-phoenix-1. Substitute this identifier into the OCI Cloud Storage URL template https://objectstorage.<region>.oraclecloud.com/. For example: https://objectstorage.us-ashburn-1.oraclecloud.com/.
    Note: When you provision BDC, you must include the trailing slash in the OCI Cloud Storage URL.
    region
    Description of the illustration a2.png
  3. Open the navigation menu. Under Core Infrastructure, click Object Storage. A list of the buckets in the compartment that you have selected is displayed.

    Object  Storage
    Description of the illustration a3.png
    The OCI Cloud Storage Bucket URL is specific to your tenant and is specified in its own URI format.
  4. If you do not have any buckets, click Create Bucket to create a new bucket. Enter the bucket name as bdcsce and click Create Bucket.
    create bucket
    Description of the illustration a4.png
    create bucket
    Description of the illustration a4_a.png
  5. Click the bdcsce bucket to view the bucket details.
    Note: The URI template is oci://<bucket>@<namespace>/. The value for <namespace> should be based on your root compartment. For this tutorial, the OCI Cloud Storage Bucket URL is oci://bdcsce@yournamespace/. To view the namespace value, click the user icon and then click Tenancy.
    tenant
    Description of the illustration a5.png
    namespace
    Description of the illustration a5_a.png
    bdcsce bucket
    Description of the illustration a5_b.png
  6. Open the navigation menu and click Identity under Governance and Administration. Click Users. The OCI Cloud Storage User OCID is specific to a particular OCI user. This OCI user has access to all object storage data within the BDC cluster.
    OCI users
    Description of the illustration a6.png
  7. The OCI users are displayed. Select the user from the list and note the value of the OCID, which uniquely identifies the user within OCI.
    OCID
    Description of the illustration a7.png
  8. The user selected above must be granted the ability to manage bucket data via API. This is done by registering a cryptographic key for this user. On your local computer use the following command to create a directory to store the keys that you're going to generate.
    mkdir ~/.oci
  9. Enter the following command to generate a private key.
    openssl genrsa -out ~/.oci/oci_api_key.pem 2048

    Note: The PEM key must be generated without a passphrase.

  10. Change the permissions on the private key file so that only you can access it.
    chmod go-rwx ~/.oci/oci_api_key.pem
  11. Important: Make a note of the full path to the private key file.

    You’ll need the full path and file name while configuring the object storage settings for your Big Data Cloud cluster.

  12. Get the key’s fingerprint.
    openssl rsa -pubout -outform DER -in ~/.oci/oci_api_key.pem | openssl md5 -c
  13. Generate the public key for the private key that you generated earlier.
    openssl rsa -pubout -in ~/.oci/oci_api_key.pem -out ~/.oci/oci_api_key_public.pem
  14. View the public key. On macOS, you can pipe the output to pbcopy to copy the key to the clipboard instead of displaying it.
    cat ~/.oci/oci_api_key_public.pem

    Here's an example of a public key:

    -----BEGIN PUBLIC KEY-----
    MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA75bcmb2HuKwZNpYxeP5X
    WjxeKg3rFLUUp1RMK6YAnGOilON98jTGpDRMVMNbYXH4Y2D9c7QYByQa1qcdMH+I
    wumWZCfTIb6Y5yXXHHJLJoS8dclEX1p2lqQf73BUGPpd/3ymrhjZKjW14gpfBs7p
    ZNLa5k97TLV/IlqefIAJktndKLGggUJZLUReX9hZY2e27VpW1OSMyZImwnSnJct6
    3rRqi2bANUf7ojN5OcOnZ5bZrn2YfxpcZaWWV1/xFU+ODChnoM4Z7O1+JtFzw+Mi
    takns0aiDyvUjWkR3cYrO3g+MHHMgOzrhpfYBPrNHVZDLykhDfvR+A/gnWtqw5EB
    bQIDAQAB
    -----END PUBLIC KEY-----
  15. Copy the public key, including the BEGIN and END lines.
  16. In the Oracle Cloud Infrastructure web console, click your username in the top-right corner of the Console, and then click User Settings.
    User Settings menu option
    Description of the illustration a15.png
  17. In the Resources navigation pane, click API Keys, and then click Add Public Key. In the Add Public Key dialog box, paste the public key that you generated and copied earlier, and then click Add.
    Add Public Key dialog box
    Description of the illustration a16.png

    The fingerprint of the key is displayed. It should match the fingerprint that you generated earlier.

    fingerprint
    Description of the illustration a16_1.png
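Comparing the console's fingerprint with the one computed locally is easier when the local value is printed on its own. Depending on the OpenSSL version, the md5 command may print a prefix such as (stdin)= or MD5(stdin)= before the fingerprint. The following is a minimal sketch under that assumption; the function names are hypothetical and are not part of the original tutorial. The key_fingerprint function reuses the same openssl pipeline shown in the steps above.

```shell
#!/bin/sh
# strip_digest_prefix (hypothetical helper): read openssl digest output
# on stdin and drop everything up to and including the "= " prefix.
strip_digest_prefix() {
  sed 's/^.*= *//'
}

# is_fingerprint (hypothetical helper): succeed only if the argument is
# 16 colon-separated pairs of hexadecimal digits.
is_fingerprint() {
  printf '%s\n' "$1" | grep -Eq '^([0-9a-fA-F]{2}:){15}[0-9a-fA-F]{2}$'
}

# key_fingerprint (hypothetical helper): print only the fingerprint of
# the private key file given as the first argument.
key_fingerprint() {
  openssl rsa -pubout -outform DER -in "$1" 2>/dev/null \
    | openssl md5 -c \
    | strip_digest_prefix
}
```

For example, key_fingerprint ~/.oci/oci_api_key.pem prints a value that can be compared character by character with the fingerprint shown in the console.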

section 2 Start the Instance-Creation Wizard

  1. Sign in to Oracle Cloud My Services.
  2. On the My Services Dashboard page, locate the Big Data Cloud tile. Click the Action menu, and select Open Service Console.

    If this is the first time that you're accessing the web console of Oracle Big Data Cloud, the Welcome page is displayed. You can continue to the web console by clicking Go to Console.

  3. The Instances page is displayed. Click Create Instance.
    bdcsce create instance
    Description of the illustration b3.png

section 3 Configure Basic Cluster Information

On the Instance page of the wizard, complete the following steps:

  1. Enter a unique Instance Name for your Big Data Cloud instance.
  2. Enter your email address in the Notification Email field.
  3. In the Region field, select an Oracle Cloud Infrastructure region: us-phoenix-1, us-ashburn-1, eu-frankfurt-1, or uk-london-1. The field shows only those regions that are in the default region of your account.
  4. In the Availability Domain field, select an availability domain.
  5. In the Subnet field, select a subnet from a virtual cloud network (VCN) that you created previously in Oracle Cloud Infrastructure. Note: If you want the subnet to be assigned automatically, select No Preference.
  6. In the Tags field, click the drop-down icon to assign tags to the service instance or click the Add icon to add new tags.
  7. Click Next.
    bdcsce instance
    Description of the illustration c7.png

section 4 Configure Service Details

On the Service Details page of the wizard, complete the following steps:

  1. In the Cluster Configuration section, choose appropriate values for Deployment Profile, Number of Nodes, Compute Shape, Queue Profile, and Spark Version, or leave them at their default values.
    cluster configuration
    Description of the illustration d1.png
  2. In the Credentials section, click Edit next to the SSH Public Key field.
  3. Select Create a New Key and then click Enter. In the Download Keys dialog box, click Download and save the file sshkeybundle.zip to your local machine.

    Note: The option to download the SSH keys won't be available again. You'll need the private key to SSH to the compute nodes of the Big Data Cloud instance. So don't skip this step.

  4. Click Done.
  5. Enter the Administrative User Name and Password, and then reenter the password in the Confirm Password field.
    credentials
    Description of the illustration d5.png
  6. In the Associations section, select the Cloud Service that you want to associate with your Oracle Big Data Cloud cluster.
  7. In the Cloud Storage Credentials section, enter the following values:
    • OCI Cloud Storage URL: enter the OCI Cloud Storage URL that you built from your region identifier earlier, for example https://objectstorage.us-ashburn-1.oraclecloud.com/.
    • OCI Cloud Storage Bucket URL: enter the URL of the object-storage bucket that you created earlier in Oracle Cloud Infrastructure.
      Note: You must add the trailing slash / at the end of the OCI Cloud Storage Bucket URL for successful provisioning.
    • OCI Cloud Storage User OCID: enter the Oracle Cloud Infrastructure Object Storage User OCID.
    • OCI Cloud Storage PEM Key: click Browse and select the private key file (oci_api_key.pem) that you generated earlier on your local computer.
    • OCI Cloud Storage PEM Key Fingerprint: enter the fingerprint of the key that you obtained earlier.
      cloud storage
      Description of the illustration d7.png
  8. In the Block Storage Settings, specify the amount of HDFS storage that you want to allocate to the cluster. Click Next.
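Both URLs entered in the Cloud Storage Credentials section follow fixed templates and must end with a trailing slash, so they can be assembled and checked before you paste them into the wizard. The following is a minimal sketch using the templates given in this tutorial; the function names are hypothetical.

```shell
#!/bin/sh
# storage_url (hypothetical helper): build the OCI Cloud Storage URL
# from a region identifier such as us-ashburn-1.
storage_url() {
  printf 'https://objectstorage.%s.oraclecloud.com/\n' "$1"
}

# bucket_url (hypothetical helper): build the OCI Cloud Storage Bucket
# URL from a bucket name and a namespace.
bucket_url() {
  printf 'oci://%s@%s/\n' "$1" "$2"
}

# ensure_trailing_slash (hypothetical helper): append "/" to a URL
# only when it does not already end with one.
ensure_trailing_slash() {
  case "$1" in
    */) printf '%s\n' "$1" ;;
    *)  printf '%s/\n' "$1" ;;
  esac
}
```

For example, storage_url us-ashburn-1 prints https://objectstorage.us-ashburn-1.oraclecloud.com/ and bucket_url bdcsce yournamespace prints oci://bdcsce@yournamespace/.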

section 5 Complete the Instance Creation

  1. On the Confirm page of the wizard, review your selections, and click Create.
    confirmation
    Description of the illustration e1.png
  2. On the Oracle Big Data Cloud Instances page, the new cluster is displayed with the status Creating Service. Refresh the page periodically until the instance is created.
    creating service
    Description of the illustration e2.png
    After the instance is created, you'll receive a notification at the email address that you specified earlier.
  3. To view details of the instance, click the cluster name.

section 6 Validate OCI Storage Attributes

Perform the following steps to validate the OCI Storage details that you added to BDC:

  1. Open the oci-curl.sh script file and update the following values with your OCI Storage values.
    local tenancyId="<your tenancy OCID>"
    local authUserId="<your user OCID>"
    local keyFingerprint="<your fingerprint>"
    local privateKeyPath="<your oci_api_key.pem path location>"
    Add the following lines at the bottom of the script file:
    bucket_name="<your bucket name>"
    namespace_name="<your namespace name>"
    region="<your region>"
    tenancyId="<your tenancy OCID>"
    oci-curl "objectstorage.${region}.oraclecloud.com" get "/n/${namespace_name}/b/${bucket_name}?compartmentId=${tenancyId}"
  2. Run the script in your terminal to validate your bucket.
    bash oci-curl.sh
    Note: If the bucket exists, the response describes the bucket:
    {"namespace":"xxxx","name":"bdcsce","compartmentId":
    "ocid1.compartment.oc1..xxx...}
    If the bucket does not exist, the response contains an error:
    {"code":"BucketNotFound","message":"The bucket 'bdcsce' 
    does not exist in namespace 'xxxx' "}
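The two response shapes above can be told apart with a simple text match. The following is a minimal sketch, assuming the response body is available as a single string and has the same compact JSON layout as the examples in this tutorial; the function name classify_bucket_response is hypothetical, and plain pattern matching is used here in place of a real JSON parser.

```shell
#!/bin/sh
# classify_bucket_response (hypothetical helper):
#   $1 = response body returned by the script
#   $2 = expected bucket name
# Prints "not-found", "valid", or "unknown".
classify_bucket_response() {
  case "$1" in
    *'"code":"BucketNotFound"'*) echo "not-found" ;;
    *"\"name\":\"$2\""*)         echo "valid" ;;
    *)                           echo "unknown" ;;
  esac
}
```

For example, classify_bucket_response "$(bash oci-curl.sh)" bdcsce prints valid when the response names the bdcsce bucket. Any other result suggests rechecking the bucket name, namespace, region, and key values in the script.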


Want to Learn More?