How do I integrate HBase with Spark
Spark cannot get all of the HBase data in certain columns (hadoop, apache-spark, mapreduce, hbase)
My HBase table has 30 million records. Each record has the column raw:sample, where raw is the column family and sample is the column qualifier. This column is very large: its values range from a few KB to 50 MB. When I run my Spark code to read this column, it only gets 40,000 records, but it should get 30 million.
Right now I'm working around this by first getting the list of row IDs and then, inside a Spark foreach, using the plain HBase Java client to fetch the column for each ID. Any ideas why I can't get all of the column values from Spark? Is the column too big?
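The workaround described above, listing the row IDs first and then fetching the large column per ID, can be sketched without a cluster. This is a minimal, stdlib-only Java model under stated assumptions: a TreeMap stands in for the HBase table, and the class, method, and batchSize names are hypothetical. Grouping the IDs into small batches is an added assumption (one way to bound memory per request when cells reach 50 MB), not part of the asker's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Stdlib-only model of the two-pass workaround: list the row IDs first,
 * then fetch the large column for those IDs in small batches.
 * A TreeMap stands in for the HBase table; no HBase API is used.
 */
public class TwoPassFetchSketch {

    /** Hypothetical result holder: rows found, simulated requests, bytes read. */
    public record FetchResult(int rowsFetched, int requests, long bytesFetched) {}

    /** Pass 1: collect only the row keys (cheap, no large cell values). */
    public static List<String> listRowIds(Map<String, byte[]> table) {
        return new ArrayList<>(table.keySet());
    }

    /** Pass 2: fetch values for the IDs, batchSize IDs per simulated request. */
    public static FetchResult fetchInBatches(Map<String, byte[]> table, List<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int requests = 0;
        int fetched = 0;
        long bytes = 0;
        for (int start = 0; start < ids.size(); start += batchSize) {
            List<String> batch = ids.subList(start, Math.min(start + batchSize, ids.size()));
            requests++; // one simulated round trip per batch (assumption)
            for (String id : batch) {
                byte[] value = table.get(id);
                if (value != null) { // a missing row is simply skipped
                    fetched++;
                    bytes += value.length;
                }
            }
        }
        return new FetchResult(fetched, requests, bytes);
    }

    public static void main(String[] args) {
        Map<String, byte[]> table = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            // Values of growing size stand in for cells of very different sizes.
            table.put(String.format("row%03d", i), new byte[(i + 1) * 1024]);
        }
        FetchResult r = fetchInBatches(table, listRowIds(table), 3);
        System.out.println(r.rowsFetched() + " rows, " + r.requests() + " requests, "
                + r.bytesFetched() + " bytes");
    }
}
```

Running main prints 10 rows, 4 requests, 56320 bytes. In a real job the second pass would call the HBase client (for example one Get per ID, or a list of Gets per batch) inside each Spark partition; that mapping is an assumption, not something the question shows.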
A few days ago one of my ZooKeeper nodes and a DataNode went down, but I fixed it quickly; the replication factor is 3. Would it help if I ran it? Thank you very much!
Answer 1
TableInputFormat internally creates a Scan object to read the data from HBase.
Try creating a Scan object (without using Spark) that is configured to read the same column from HBase, and check whether the error repeats.
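The check suggested above, counting rows with a plain scan outside Spark, can be modelled like this. It is a stdlib-only sketch, not HBase client code: a TreeMap stands in for the table, nextPage is a hypothetical stand-in for one scanner round trip that returns up to caching rows, and resuming after the last returned key is a simplification of how a real scanner keeps its position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Stdlib-only model of a full-table scan that returns rows in pages of
 * `caching` rows per simulated round trip, used to count every row.
 * No HBase classes are involved.
 */
public class ScanCountSketch {

    /** Simulated round trip: up to `caching` keys strictly after lastKey (null means start). */
    static List<String> nextPage(NavigableMap<String, byte[]> table, String lastKey, int caching) {
        NavigableMap<String, byte[]> rest = (lastKey == null) ? table : table.tailMap(lastKey, false);
        List<String> page = new ArrayList<>(caching);
        for (String key : rest.keySet()) {
            if (page.size() == caching) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    /** Pages until an empty page comes back; returns {rowsCounted, roundTrips}. */
    public static long[] countRows(NavigableMap<String, byte[]> table, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        long rows = 0;
        long roundTrips = 0;
        String lastKey = null;
        while (true) {
            List<String> page = nextPage(table, lastKey, caching);
            roundTrips++;
            if (page.isEmpty()) {
                break;
            }
            rows += page.size();
            lastKey = page.get(page.size() - 1);
        }
        return new long[] {rows, roundTrips};
    }

    public static void main(String[] args) {
        NavigableMap<String, byte[]> table = new TreeMap<>();
        for (int i = 0; i < 25; i++) {
            table.put(String.format("row%03d", i), new byte[16]);
        }
        long[] result = countRows(table, 10);
        System.out.println("scanned " + result[0] + " of " + table.size()
                + " rows in " + result[1] + " round trips");
    }
}
```

Running main prints scanned 25 of 25 rows in 4 round trips. Against a real cluster, the equivalent check would iterate a ResultScanner for a Scan on the same column and compare the total with the expected 30 million; if that count also stops short, the problem lies in HBase rather than in Spark.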
In addition, TableInputFormat is configured by default to request only a very small block of data from the HBase server at a time, so try increasing that block size.
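To see why the block size matters at this scale, here is a back-of-the-envelope sketch in Java. It assumes the block size means the number of rows returned per round trip, uses the 30 million rows and 50 MB maximum cell size from the question, and ignores any separate result-size limit HBase may apply; the class and method names are hypothetical.

```java
/**
 * Back-of-the-envelope trade-off for the scanner block size (rows fetched
 * per round trip): fewer round trips versus a larger worst-case payload.
 * Ignores any per-request result-size limit HBase may also enforce.
 */
public class CachingTradeoffSketch {

    /** Round trips needed to scan totalRows when each request returns rowsPerRequest rows. */
    public static long roundTrips(long totalRows, long rowsPerRequest) {
        if (rowsPerRequest <= 0) {
            throw new IllegalArgumentException("rowsPerRequest must be positive");
        }
        return (totalRows + rowsPerRequest - 1) / rowsPerRequest; // ceiling division
    }

    /** Worst-case bytes in one response if every row hits the maximum row size. */
    public static long worstCaseBytesPerRequest(long rowsPerRequest, long maxRowBytes) {
        return Math.multiplyExact(rowsPerRequest, maxRowBytes); // throws on overflow
    }

    public static void main(String[] args) {
        long totalRows = 30_000_000L;         // table size from the question
        long maxRowBytes = 50L * 1024 * 1024; // largest cells in the question: 50 MB
        for (long rows : new long[] {1, 10, 100, 1000}) {
            System.out.printf("rows/request=%d  round trips=%d  worst-case MB/request=%d%n",
                    rows, roundTrips(totalRows, rows),
                    worstCaseBytesPerRequest(rows, maxRowBytes) / (1024 * 1024));
        }
    }
}
```

For 1, 10, 100, and 1,000 rows per request this gives 30,000,000, 3,000,000, 300,000, and 30,000 round trips, with worst-case responses of 50, 500, 5,000, and 50,000 MB. A larger block cuts round trips sharply, but with cells as large as 50 MB it can also make single responses very large.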
Answer 2
For throughput as high as yours, Apache Kafka is the best solution for integrating the data flow and keeping the data pipeline alive. See http://kafka.apache.org/08/uses.html for some Kafka use cases.
One more reference: http://sites.computer.org/debull/A12june/pipeline.pdf