
Python & Spark Study Notes Series [Chapter 3] Example: Python+Spark+Hbase

2022-09-23 10:17:55 · Actually I'm real

My original data lives in HBase. Below I introduce two ways to operate on HBase through Spark.

The first method: connecting Spark directly to HBase

The code is as follows, but I ran into a problem here: when I ran it locally from IDEA, one of the classes used in the code below could not be found, and the error was:

(error screenshot lost in the original)

That class lives under hbase-common-1.0.0.jar, and other versions should have it as well. I tried various ways of importing this jar package, all without success, and the error stayed the same, so I never actually got this example working. If you have a solution, I hope you can leave me a message.

```python
from pyspark import SparkContext, SparkConf
import os

os.environ['JAVA_HOME'] = r'D:\Java\jdk1.8.0_92'

conf = SparkConf().setMaster("local").setAppName("spark_hbase_test")
sc = SparkContext(conf=conf)

host = ',,'  # ZooKeeper quorum hosts (elided in the original)
table = '2:IndexMessage'
conf = {"hbase.zookeeper.quorum": host, "hbase.mapreduce.inputtable": table}
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",  # key class; empty in the original
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=conf)
count = hbase_rdd.count()  # original had `count = hbase_rdd`, which just prints the RDD object
print(count)
```
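Once the RDD is obtained, each value is a string produced by HBaseResultToStringConverter: one JSON document per HBase cell, separated by newlines (this is the format used by the hbase_inputformat.py example shipped with Spark). A minimal sketch of unpacking that format, assuming the field names emitted by the converter ("row", "columnFamily", "qualifier", "value") — the sample string below is made-up data, not real output:

```python
import json

def parse_hbase_value(value_str):
    """Split a converter value string into one dict per HBase cell."""
    return [json.loads(cell) for cell in value_str.split("\n")]

# Sample string in the converter's newline-separated-JSON format:
sample = ('{"row":"r1","columnFamily":"cf","qualifier":"a","value":"1"}\n'
          '{"row":"r1","columnFamily":"cf","qualifier":"b","value":"2"}')
cells = parse_hbase_value(sample)
print(cells[0]["qualifier"], cells[1]["value"])  # -> a 2
```

On the real RDD the same unpacking would be applied distributedly, e.g. `hbase_rdd.flatMapValues(lambda v: v.split("\n")).mapValues(json.loads)`, which is what the official Spark example does.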

The second method: create a Hive table mapped to the HBase table, then use Spark SQL to query Hive, which achieves the same goal of operating on HBase

Refer to my previous post for how to establish the mapping between Hive and HBase.

To be continued

Copyright notice
Author: [Actually I'm real]. Please include a link to the original when reprinting, thank you.
