Prerequisites

Before installing HBase or Hive:

  • Java 8 or 11 installed ($JAVA_HOME configured)
  • Hadoop installed and configured (HDFS working)
  • start-dfs.sh runs successfully
  • Linux environment (Ubuntu/Debian preferred)
  • Basic familiarity with HDFS commands and XML config files

Optional for Hive:

  • MySQL server installed and running
  • JDBC connector for MySQL placed in Hive’s lib/ directory
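
A quick way to sanity-check these prerequisites (the last command applies only if you plan to use MySQL for the Hive metastore):

java -version                 # expect Java 8 or 11
echo $JAVA_HOME               # should print the JDK path
hadoop version                # Hadoop is on the PATH
hdfs dfs -ls /                # HDFS is reachable (start-dfs.sh has run)
sudo systemctl status mysql   # optional: MySQL is running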

HBase Core Concepts

  • HBase is a distributed, column-oriented NoSQL database built on HDFS.
  • Data is organized as:
    Tables → Rows → Column Families → Columns → Cells (timestamped values)
  • HBase uses ZooKeeper for coordination.
  • Tables must be created with at least one column family.
  • Supports random, real-time read/write access to large-scale data.
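
To make the model concrete: a single cell is addressed by row key, column family:qualifier, and timestamp. The shell lookup below is illustrative (the 'users' table and its names are hypothetical):

# fetch the newest version of one cell: row 'u42', column 'info:email'
get 'users', 'u42', {COLUMN => 'info:email', VERSIONS => 1}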

HBase Installation (Standalone / Pseudo-Distributed)

1. Download and Configure

wget https://archive.apache.org/dist/hbase/1.4.13/hbase-1.4.13-bin.tar.gz
tar xvf hbase-1.4.13-bin.tar.gz
mv hbase-1.4.13 ~/hbase

2. Add to ~/.bashrc or conf/hbase-env.sh

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HBASE_HOME=~/hbase
export PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:$HBASE_HOME/lib/*
export HBASE_MANAGES_ZK=true   # HBase starts/stops its own bundled ZooKeeper
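
If the variables go in ~/.bashrc, reload it so they take effect in the current shell:

source ~/.bashrc
echo $HBASE_HOME   # should print the install path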

3. hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase_data</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hadoop/zookeeper</value>
  </property>
</configuration>
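
The hbase.rootdir host and port must match fs.defaultFS in Hadoop's core-site.xml; for the localhost:9000 value above, that entry would look like:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>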

4. Run HDFS and HBase

start-dfs.sh
start-hbase.sh
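
jps should now list both sets of daemons (exact names can vary by version and setup):

jps
# expect: NameNode, DataNode, HMaster, HRegionServer,
# and HQuorumPeer (since HBASE_MANAGES_ZK=true)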

HBase Shell Basics

hbase shell                            # open the interactive shell
create 'test', 'cf'                    # table 'test' with one column family 'cf'
put 'test', 'row1', 'cf:a', 'value1'   # write one cell
get 'test', 'row1'                     # read a single row
scan 'test'                            # read all rows
disable 'test'                         # a table must be disabled before dropping
drop 'test'
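
A few more shell commands that are handy for checking state:

list      # all tables
status    # cluster summary (servers, regions)
version   # HBase version
exit      # leave the shell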

Hive Core Concepts

  • Hive is a data warehouse built on Hadoop, with a SQL-like query interface (HiveQL).
  • Executes queries as MapReduce, Tez, or Spark jobs.
  • Uses a metastore (backed by MySQL, Derby, etc.) to store table metadata.
  • Best for batch-oriented analysis, not low-latency lookups.

Hive Installation

1. Download and Extract

wget https://archive.apache.org/dist/hive/hive-2.3.9/apache-hive-2.3.9-bin.tar.gz
tar -xzf apache-hive-2.3.9-bin.tar.gz
mv apache-hive-2.3.9 ~/hive

2. Add to ~/.bashrc

export HIVE_HOME=~/hive
export PATH=$HIVE_HOME/bin:$PATH
export HADOOP_USER_CLASSPATH_FIRST=true   # put user classpath entries ahead of Hadoop's own jars
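
Reload the environment and confirm Hive resolves (the banner should report 2.3.9):

source ~/.bashrc
hive --version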

3. Setup HDFS directories

hdfs dfs -mkdir -p /tmp /user/hive/warehouse
hdfs dfs -chmod g+w /tmp /user/hive/warehouse

Hive Metastore with MySQL

sudo apt install mysql-server
sudo systemctl start mysql
-- Run in the mysql shell; the password must match ConnectionPassword in hive-site.xml below
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';

Add JDBC Connector

wget https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-5.1.48.tar.gz
tar -xvzf mysql-connector-java-5.1.48.tar.gz
cp mysql-connector-java-5.1.48/mysql-connector-java-5.1.48.jar ~/hive/lib
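
Confirm the driver jar landed where Hive will pick it up:

ls ~/hive/lib | grep mysql-connector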

Configure $HIVE_HOME/conf/hive-site.xml

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
</configuration>

Initialize the Metastore

schematool -initSchema -dbType mysql
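
If initialization succeeds, you can verify the schema version and start the CLI:

schematool -info -dbType mysql   # prints the metastore schema version
hive                             # launch the Hive CLI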

HiveQL Commands

CREATE TABLE demo1 (id INT, name STRING);
INSERT INTO demo1 VALUES (1, 'joy');
SELECT * FROM demo1;
DROP TABLE demo1;

Load Data from Local or HDFS

CREATE TABLE demo2 (name STRING);
LOAD DATA INPATH '/demo.txt' INTO TABLE demo2;   -- without LOCAL, the path is in HDFS and the file is moved
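
For the local-filesystem case from the heading, add LOCAL, which copies the file instead of moving it (the path below is illustrative):

LOAD DATA LOCAL INPATH '/home/hadoop/demo.txt' INTO TABLE demo2;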

Hive Table Types

  • Managed Table: Data is stored in Hive’s warehouse directory. Dropping table deletes data.
  • External Table: Data is managed externally (e.g., already on HDFS). Hive only manages metadata.

CREATE EXTERNAL TABLE logs (
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 'hdfs://localhost:9000/data/logs';
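
Dropping an external table removes only the metastore entry; the files under /data/logs stay in HDFS. Dropping it here also frees the name logs for the partitioned example below:

DROP TABLE logs;   -- metadata only; the HDFS files remain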

Partitioning & Bucketing

Partitioned Table

CREATE TABLE logs (
  id INT,
  msg STRING
)
PARTITIONED BY (`date` STRING);   -- backticks needed: date is a reserved keyword in Hive
 
SET hive.exec.dynamic.partition=true;             -- enable dynamic partitioning
SET hive.exec.dynamic.partition.mode=nonstrict;   -- allow all partition values to be dynamic
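
With those settings, an INSERT can route rows to partitions by value; a minimal sketch, where staging_logs is a hypothetical source table with matching columns:

-- the partition column must come last in the SELECT
INSERT INTO TABLE logs PARTITION (`date`)
SELECT id, msg, `date` FROM staging_logs;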

Bucketed Table

CREATE TABLE users (
  id INT,
  name STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS;
 
SET hive.enforce.bucketing=true;   -- Hive 1.x only; removed in Hive 2.0, where bucketing is always enforced
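
One payoff of bucketing is cheap sampling; this query reads only the first of the four buckets:

SELECT * FROM users TABLESAMPLE (BUCKET 1 OUT OF 4 ON id);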