What is Google Bigtable?
Google Bigtable is a NoSQL distributed storage system for managing petabyte-scale structured data. Bigtable is designed for fast, low-latency access to data, with scalability and reliability in mind.
Internally, Google uses Bigtable for a number of services, including Google Earth, web indexing, and Google Analytics. While Bigtable stores data in a tabular format, it is not a relational database.
Google started development of Bigtable in 2004 and published the Bigtable white paper in 2006. Google later offered Bigtable as a standalone cloud product, announcing it in 2015 and making it generally available in 2016.
At its core, Bigtable is a sparse, distributed, multidimensional sorted map that can handle millions of reads/writes per second and manage several terabytes of in-memory data, along with several petabytes of data on-disk, spread across multiple compute locations.
How does Bigtable work?
There are four major types of NoSQL databases: Key-Value, Column-Family, Graph, and Document. Bigtable is a column-family (or wide-column) NoSQL database, similar to Cassandra and HBase.
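Wide-column stores differ from traditional relational databases in that they use a two-level structure rather than a fixed set of columns per table. Despite the name, they are also not the same as columnar (column-oriented) analytics databases. Instead, columns are grouped into 'families' and stored together.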
Like a relational database, a Bigtable table has rows that each describe a single entity, and columns that contain individual values for each row.
Unlike a relational database, columns are grouped into column families. Each column is identified by a combination of the column family and a column qualifier, which is a unique name within the family.
Thus, in this structure, each intersection of a row and a column can hold multiple cells, where each cell is a timestamped version of the data for that row and column.
By storing multiple cells in a column, the table has access to a record of how the stored data for that row and column has changed over time.
Bigtable tables are sparse, meaning if a column is not used for a particular row, it does not take up space.
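The data model described above can be sketched as a nested map in plain Python. This is a toy, in-memory illustration only, not Bigtable's implementation or its client API; the class name `ToyTable`, its methods, and the sample row keys are hypothetical.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of a wide-column table (illustration only)."""

    def __init__(self):
        # row key -> (family, qualifier) -> list of (timestamp, value) cells
        self._rows = defaultdict(lambda: defaultdict(list))

    def write(self, row_key, family, qualifier, value, timestamp):
        cells = self._rows[row_key][(family, qualifier)]
        cells.append((timestamp, value))
        # Keep the newest version first.
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def read_latest(self, row_key, family, qualifier):
        cells = self._rows.get(row_key, {}).get((family, qualifier), [])
        return cells[0][1] if cells else None

    def read_versions(self, row_key, family, qualifier):
        return list(self._rows.get(row_key, {}).get((family, qualifier), []))

    def columns(self, row_key):
        # Only columns that were actually written exist for a row (sparse).
        return sorted(self._rows.get(row_key, {}).keys())


table = ToyTable()
table.write("user#1001", "profile", "email", "old@example.com", timestamp=1)
table.write("user#1001", "profile", "email", "new@example.com", timestamp=2)
table.write("user#1002", "profile", "name", "Ada", timestamp=1)

print(table.read_latest("user#1001", "profile", "email"))
print(table.read_versions("user#1001", "profile", "email"))
print(table.columns("user#1002"))
```

Row `user#1002` never wrote an `email` column, so nothing is stored for it, which mirrors the sparseness described above.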
When to Use Bigtable
Bigtable, like most wide-column databases, is ideal for use cases that require a large dataset that can be distributed across multiple database nodes, particularly datasets that can be easily broken apart for parallel processing.
Such datasets take advantage of Bigtable’s parallel architecture and allow for the rapid processing of petabyte-scale data. Typical use cases include:
- Log data
- Sensor data, for example from IoT applications
- Geographic information
- Reporting systems
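To illustrate why datasets like these parallelize well, the sketch below splits a sorted set of row keys into contiguous ranges and processes each range independently. The `sensor_id#date` row key format, the `split_into_ranges` and `count_readings` helpers, and the use of a thread pool are assumptions made for illustration; in Bigtable itself, partitioning is handled by the service.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_ranges(sorted_keys, num_ranges):
    """Split sorted row keys into up to num_ranges contiguous chunks."""
    if not sorted_keys:
        return []
    size = -(-len(sorted_keys) // num_ranges)  # ceiling division
    return [sorted_keys[i:i + size] for i in range(0, len(sorted_keys), size)]


def count_readings(key_range, data):
    """Work done independently on one range: count its readings."""
    return sum(len(data[key]) for key in key_range)


# Hypothetical sensor data: row key "sensor_id#date" -> list of readings
data = {
    "sensor-a#2024-01-01": [20.1, 20.4],
    "sensor-a#2024-01-02": [19.8],
    "sensor-b#2024-01-01": [30.0, 30.2, 30.1],
    "sensor-c#2024-01-01": [25.5],
}

ranges = split_into_ranges(sorted(data), num_ranges=2)
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(lambda r: count_readings(r, data), ranges))

total = sum(partial_counts)
print(partial_counts, total)
```

Because each range is processed without reference to the others, adding more workers (or, by analogy, more nodes) lets the same job cover more data in parallel.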
Wide-column databases have a much narrower range of applications than traditional SQL databases. Shopify, for example, uses Bigtable as a sink for streaming events but finds that the majority of its data can be modeled in a relational fashion.
Bigtable is a complicated solution that can be difficult to implement, and may not be the best fit for your organization. For smaller-scale datasets, creating an ETL/ELT data pipeline may be a more suitable solution. If that's the case, we recommend Zuar's ELT platform Runner.
Runner can gather data from hundreds of potential sources, model it, transport it to the data warehouse of your choosing and connect it to a visualization platform such as Tableau.
Google Bigtable FAQs
Is Google Bigtable free?
No, Bigtable users are billed according to three variables: compute (the type of Bigtable instance and the total number of nodes in the instance's clusters), the amount of storage that their tables use, and the amount of network bandwidth used. See Google's Bigtable pricing documentation for details.
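As a rough sketch of how those three variables combine, the function below adds up a monthly estimate. The function name, its parameters, and every rate used here are made-up placeholders chosen for easy arithmetic, not Google's actual prices; real rates vary by region and storage type.

```python
def estimate_monthly_cost(nodes, storage_gb, egress_gb,
                          node_hourly_rate, storage_gb_month_rate,
                          egress_gb_rate, hours_per_month=730):
    """Sum the three billing components into a monthly estimate."""
    compute = nodes * node_hourly_rate * hours_per_month
    storage = storage_gb * storage_gb_month_rate
    network = egress_gb * egress_gb_rate
    return {
        "compute": compute,
        "storage": storage,
        "network": network,
        "total": compute + storage + network,
    }


# Placeholder rates for illustration only (not real pricing).
estimate = estimate_monthly_cost(
    nodes=3,
    storage_gb=1000,
    egress_gb=200,
    node_hourly_rate=1.00,
    storage_gb_month_rate=0.10,
    egress_gb_rate=0.05,
)
print(estimate)
```

With these placeholder numbers, the node charge dominates the total, which is why the node count is usually the first thing to size carefully.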
What language is Bigtable written in?
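Bigtable's server implementation is proprietary, so this question usually concerns the languages you can use to work with it. Google provides Bigtable client libraries for languages including C++, Java, Python, Go, and Ruby.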
Is Bigtable serverless?
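Bigtable is not serverless. It is a managed service, but instances are built from clusters with a provisioned number of nodes. That is, capacity must be sized and monitored (manually or through autoscaling), and schemas and row keys need tuning to maintain good performance.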
Is Bigtable a data warehouse?
No, Bigtable is a wide-column NoSQL database, which differs from a traditional data warehouse.
What is the Bigtable equivalent in AWS?
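AWS has no exact equivalent of Bigtable. The closest managed offerings are Amazon DynamoDB, a key-value and wide-column store, and Amazon Keyspaces, a managed service compatible with Apache Cassandra. Alternatively, users may install a wide-column database on Amazon EC2 or EMR. Apache Cassandra is a popular choice.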
Is Bigtable OLAP or OLTP?
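Bigtable does not fit neatly into either OLAP (online analytical processing) or OLTP (online transaction processing), categories that are usually associated with SQL-based systems. Bigtable is a NoSQL database built for high-throughput, low-latency reads and writes. BigQuery is Google’s OLAP offering.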
Is Bigtable a DBMS?
Yes, Bigtable is a database management system. Loosely defined, database management systems are simply computerized data-keeping systems. Bigtable is a non-relational DBMS, meaning it does not organize data into fixed-schema relational tables that are joined and queried in the traditional SQL manner.
Does BigQuery use Bigtable?
No, BigQuery and Bigtable are two completely separate Google products.
What is the difference between Datastore and Bigtable?
While similar, the two products target different users. Bigtable offers an HBase-compatible API, whereas Datastore is geared towards Python/Java/Go web app developers. Furthermore, Datastore supports multiple indexes on a table's data, whereas Bigtable indexes only one thing: the row key. Additionally, there are significant differences in the pricing models for both products.
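The indexing difference can be illustrated with a toy example. With a single sorted row-key index, a lookup by key prefix can jump straight to the right place, while a lookup by any other field has to examine every row. The sample rows and the `prefix_scan` and `scan_by_value` helpers are hypothetical and are not part of either product's API.

```python
import bisect

# Hypothetical rows, keyed by a single row key.
rows = {
    "order#1001": {"customer": "alice", "total": 30},
    "order#1002": {"customer": "bob", "total": 12},
    "order#1003": {"customer": "alice", "total": 45},
    "user#alice": {"city": "Austin"},
}
sorted_keys = sorted(rows)  # the only index: row keys in sorted order


def prefix_scan(prefix):
    """Use the sorted row-key index to find keys that start with prefix."""
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        result.append(key)
    return result


def scan_by_value(field, value):
    """No index exists on this field, so every row must be examined."""
    return [key for key in sorted_keys if rows[key].get(field) == value]


print(prefix_scan("order#"))
print(scan_by_value("customer", "alice"))
```

This is why row key design matters so much in a single-index store: queries that cannot be phrased as a key or key-range lookup fall back to scanning.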
Is Bigtable open source?
No, Bigtable is a proprietary Google product. That being said, it can be accessed through a client that is compatible with the open-source Apache HBase API, in addition to Google's own client libraries.