I’ve been working with HBase on HDInsight for some time, and this series collects the tech notes I’ve accumulated along the way. This introductory post covers what HBase is and how it is implemented on HDInsight.
HBase is an open-source NoSQL database modeled on Google’s Bigtable. It provides random access with strong consistency for large amounts of unstructured and semi-structured data. It is a column-oriented database and is essentially schemaless: creating a table requires no more than a table name and its column family definitions, and the columns within a family are supplied at write time.
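To make the "table name plus column families" idea concrete, here is a minimal in-memory sketch in Python. It is not the HBase client API; `ToyTable`, its methods, and the `telemetry` table with its `d` family are hypothetical names made up for illustration. It only mimics the shape of the data model: column families are fixed when the table is created, while column qualifiers are chosen freely on each write.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for an HBase-style table (illustration only)."""

    def __init__(self, name, column_families):
        # The only "schema": a table name and its column families.
        self.name = name
        self.column_families = frozenset(column_families)
        # row key -> {"family:qualifier": value}
        self._rows = defaultdict(dict)

    def put(self, row_key, family, qualifier, value):
        # Families must be declared up front; qualifiers need not be.
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row_key][f"{family}:{qualifier}"] = value

    def get(self, row_key):
        # Return a copy of the row's cells, or an empty dict if absent.
        return dict(self._rows.get(row_key, {}))


# Hypothetical table with a single column family "d".
table = ToyTable("telemetry", ["d"])

# Different rows can carry entirely different columns.
table.put("device42", "d", "temp", "21.5")
table.put("device42", "d", "humidity", "40")
table.put("device7", "d", "firmware", "1.0.3")
```

Note that the two rows share a column family but no columns, which is what "essentially schemaless" means in practice. A write to an undeclared family is rejected, since families are the one part of the layout that has to be defined ahead of time.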
On HDInsight, Azure provides a managed HBase cluster configured to store its data in Azure Storage instead of HDFS. The cluster still directly supports MapReduce, Hive, and other Hadoop-native tools even though the underlying storage is not HDFS.
HBase is well suited to large-scale data and supports many different use cases, including:
- key-value store
- time-series data (telemetry-based streams)
- real-time queries (including SQL support via Phoenix)