
A Summary of Operating HBase from Hive

2022-09-23 10:19:09 · Actually I'm real

First, create an HBase-backed table in Hive.

Type hive to enter the Hive command line, then execute the following table-creation statement:

CREATE TABLE lwb_test1 (key string, xm string, nl int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:xm,data:nl")
TBLPROPERTIES ("hbase.table.name" = "lwb_test1");

The first line creates the table and declares its columns. The key column must be present; the other columns and their Hive types follow it.

The second line is fixed: it names the HBase storage handler. The hbase.columns.mapping in the third line maps each Hive column to an HBase column; the entries must correspond one-to-one with the columns in the first line, and this is where the HBase column family and qualifiers are defined.

The fourth line specifies the HBase table name.

Note: a table built this way is called an internal (managed) table. An internal table is one created by Hive whose data is loaded with LOAD DATA INPATH; you can think of it as a table where the data and the table structure are stored together. When you delete the table structure from the metastore with DROP TABLE table_name, the table's data is also deleted from HDFS.

(In other words, when you delete the Hive table, the HBase table is deleted along with it. Since it was created through Hive in the first place, it is understandable that it goes away when the Hive table does.)
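To make the managed-table behavior concrete, here is a minimal sketch, assuming the lwb_test1 table created above:

```sql
-- Dropping the Hive-managed table also removes the backing HBase table.
DROP TABLE lwb_test1;
-- After this, `list` in the hbase shell no longer shows lwb_test1;
-- the data in HBase is gone along with the Hive metadata.
```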

The second way is to map a table that already exists in HBase. This is the mode I currently use: the HBase table was built long ago and already contains business data. To run queries, statistics, and so on with Hive, I need to create a Hive table that corresponds to the existing HBase table. It is written as follows:

CREATE EXTERNAL TABLE realtimetable (key string, vin string, soc int, speed int, mileage int, cds string, vehicles_status string, savetime bigint)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:vin,data:soc,data:speed,data:mileage,data:cds,data:vehicles_status,:timestamp")
TBLPROPERTIES ("hbase.table.name" = "RealtimeMessage");

The EXTERNAL keyword creates an external table. If the target table already exists in HBase, creating the Hive table without this keyword throws an exception.

An external table means the data was already stored before the table structure was created; creating the table structure simply overlays a schema onto the existing data. On DROP TABLE table_name, Hive deletes only the table structure from the metastore and does not touch the underlying files. Compared with an internal table, an external table can therefore be used with much more confidence. (Meaning: deleting the Hive table has no effect on the HBase table.)
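By contrast with the managed case, dropping the external table touches only Hive's metadata. A sketch, assuming the realtimetable mapping above:

```sql
-- Only the Hive metadata for the mapping is removed.
DROP TABLE realtimetable;
-- The RealtimeMessage table in HBase, and all its data, remain intact;
-- the mapping can simply be recreated later with the same CREATE EXTERNAL TABLE.
```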

Does deleting the table from HBase have any effect on Hive? I have not experimented with that.

Note: normally, to improve efficiency, you would query an internal table here, but an internal table associated with HBase is actually very slow to query.

My approach is to create a plain Hive table directly, then periodically insert data from the HBase-backed table into that pure Hive table.

It works like this:

INSERT INTO TABLE table2 SELECT * FROM table1;

Here table1 is the table associated with HBase, and table2 is a pure Hive table with no HBase association.

To further improve the query efficiency of table2, one more optimization is to store it in ORC format; otherwise Hive defaults to textfile, i.e. plain text. In my experiments with my own data, ORC storage is roughly 1/10 the size of textfile, and ORC queries are optimized as well; between 1 GB and 10 GB of data, the difference in query efficiency is obvious. You can look up Hive's storage formats yourself for the details.
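If you want to verify which storage format a table actually uses, Hive can report it:

```sql
DESCRIBE FORMATTED table2;
-- Look for the InputFormat / OutputFormat rows in the output;
-- an ORC table shows org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.
```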

The ORC table-creation statement:

CREATE TABLE lwb_test3(key string,vin string,soc int,speed int,mileage int,cds string,vehicles_status string) STORED AS ORC;
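Putting the pieces together, here is a sketch of the periodic refresh into the ORC table, assuming the realtimetable columns above; how the job is scheduled (e.g. cron running hive -e) is up to you:

```sql
-- Rewrite lwb_test3 from the HBase-backed table; OVERWRITE avoids
-- accumulating duplicates when the job runs repeatedly.
INSERT OVERWRITE TABLE lwb_test3
SELECT key, vin, soc, speed, mileage, cds, vehicles_status
FROM realtimetable;

-- Subsequent statistics then run against the fast ORC copy:
SELECT vehicles_status, count(*) AS cnt
FROM lwb_test3
GROUP BY vehicles_status;
```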


Finally, here is a simple Java program that queries the HBase-backed table through Hive's JDBC interface:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class HiveTest1 {
    /*
     * There was a wrong example on the Internet that loaded
     * org.apache.hadoop.hive.jdbc.HiveDriver -- the extra "hadoop"
     * belongs to an older version. The Hive version I use is 1.2.1.
     */
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) {
        try {
            Class.forName(driverName);
            // Older versions may use a "jdbc:hive://" URL prefix instead.
            Connection con = DriverManager.getConnection("jdbc:hive2://", "hive", "hive");
            Statement stmt = con.createStatement();
            String sql = "select * from lwb_test1";
            System.out.println("Running: " + sql);
            ResultSet res = stmt.executeQuery(sql);
            System.out.println("ok");
            ResultSetMetaData rsm = res.getMetaData(); // get the column set
            for (int i = 0; i < rsm.getColumnCount(); i++) {
                System.out.print(rsm.getColumnName(i + 1) + "\t");
            }
            System.out.println();
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getString(2) + "\t" + res.getInt(3) + "\t");
            }
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("error");
        }
    }
}
