Serving the “Data Warehouse Mass Market”

The term “data warehouse” was coined some 20 years ago to describe the processes and technologies used to maintain a copy of operational data for decision-support purposes. As companies processed more and more transactions (e.g., retail point-of-sale purchases, bank withdrawals and deposits, airline reservations), it made sense to store and maintain a copy of that data so people could analyze it and optimize the business.

Today, data warehousing is a $20B+ industry, and it is still growing. The market can be segmented by size into “high-end data warehouses,” holding tens to hundreds of terabytes, and “mass-market data warehouses,” holding less than 10 terabytes. The high end accounts for over half of the spend but only 10% of the potential deployments.
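Taking those figures at face value, the split implies a large gap in average spend per deployment. Writing s for the high end's share of total spend S across D potential deployments (our own notation, used only for this back-of-the-envelope check):

$$
\frac{\text{avg. high-end spend}}{\text{avg. mass-market spend}}
= \frac{sS/(0.1D)}{(1-s)S/(0.9D)}
= \frac{9s}{1-s} > 9 \quad \text{for } s > 0.5
$$

So, on these figures, an average high-end deployment represents more than nine times the spend of an average mass-market one.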

Predictably, most of the competition among data warehouse vendors is at the high end, where the money is. Companies like Teradata (TDC) and Netezza (NZ) have gone public by serving this end of the market, and several startups are chasing them in hopes of getting their own slice of the high-end pie.

Systems at the high end of the market, however, are very expensive, often costing hundreds of thousands and sometimes millions of dollars. To get high performance against large amounts of data, companies like Teradata have had to build massively parallel processing (MPP) machines that partition the data across many disks and cut query time by throwing many CPUs at each query in parallel. Netezza went a step further by developing a special chip that filters data at the disk controller, reducing the I/O bottleneck.
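As a rough software illustration of the partitioning idea (not Teradata's actual design), the Python sketch below hash-partitions rows across a hypothetical number of “nodes,” computes a partial aggregate on each node, and merges the results. Threads stand in for the separate processors of an MPP machine; in CPython they show the data flow rather than a real speedup. All function names here are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_nodes, key):
    """Hash-partition rows so each simulated node owns a disjoint slice of the data."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key]) % num_nodes].append(row)
    return parts


def local_sum(rows, group_key, value_key):
    """Partial GROUP BY/SUM computed independently on one node's slice."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return totals


def parallel_group_sum(rows, num_nodes, group_key, value_key):
    """Scatter the work across simulated nodes, then merge the partial results."""
    parts = partition(rows, num_nodes, group_key)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(lambda p: local_sum(p, group_key, value_key), parts))
    result = defaultdict(int)
    for partial in partials:
        for group, subtotal in partial.items():
            result[group] += subtotal
    return dict(result)


sales = [
    {"store": "A", "amount": 10},
    {"store": "B", "amount": 5},
    {"store": "A", "amount": 7},
    {"store": "C", "amount": 3},
]
print(sorted(parallel_group_sum(sales, num_nodes=2, group_key="store", value_key="amount").items()))
# [('A', 17), ('B', 5), ('C', 3)]
```

Because rows are partitioned on the grouping key, each group lives on exactly one node, so the merge step only has to combine disjoint partial results.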

The remaining 90% of companies that could benefit from data warehousing (i.e., the data warehousing mass market), however, are under-served. Why? There are a few key reasons:

(1) Mass-market companies expect high performance but don’t have much money;
(2) They don’t have the resources required to implement a high-end solution;
(3) They have been hard for startup companies to reach.

So, what is required to crack the data warehouse mass market? 

To address the price/performance issue, data warehouse vendors have had to find more cost-effective ways to deliver high performance and scalability. In Kickfire’s case, we have addressed the I/O bottleneck with a column-oriented rather than row-oriented database engine, which allows much higher data compression and retrieves only the fields a query needs rather than whole rows.
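To make these two benefits concrete, here is a toy Python sketch (not Kickfire’s engine) contrasting a row-oriented and a column-oriented layout of the same hypothetical five-row table. Summing one column touches every field in the row layout but only that column in the columnar layout, and because a column holds values of one type that often repeat, even simple run-length encoding compresses it. Run-length encoding is only an illustrative choice; the post does not say which compression schemes Kickfire uses.

```python
from itertools import groupby

# A tiny hypothetical "orders" table: (order_id, region, amount)
ROWS = [
    (1, "east", 120),
    (2, "east", 80),
    (3, "east", 45),
    (4, "west", 200),
    (5, "west", 60),
]
COLUMNS = ("order_id", "region", "amount")


def to_columns(rows):
    """Pivot row-oriented tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def sum_amount_row_store(rows):
    """Row layout: every field of every row is read, though only 'amount' is needed."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)
        total += row[2]
    return total, values_read


def sum_amount_column_store(cols):
    """Column layout: only the 'amount' column is read."""
    amounts = cols["amount"]
    return sum(amounts), len(amounts)


def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


cols = to_columns(ROWS)
print(sum_amount_row_store(ROWS))          # (505, 15)
print(sum_amount_column_store(cols))       # (505, 5)
print(run_length_encode(cols["region"]))   # [('east', 3), ('west', 2)]
```

Both layouts return the same answer, but the columnar scan reads a third as many values here, and the gap widens as tables get wider.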

More importantly, we have addressed the CPU bottleneck by delivering the industry’s first parallel-processing SQL chip, which pipelines data through the chip and executes SQL operators at clock speed. These two technological breakthroughs have allowed Kickfire to deliver high-end performance to our customers at one-tenth the cost. In fact, Kickfire holds the world record for price/performance according to the widely accepted TPC-H data warehouse benchmark.
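The idea of pipelining SQL operators can be approximated in software with Python generators, as in the sketch below: each row flows through scan, filter, and projection before the next row is read, so no intermediate table is ever materialized. This is only an analogy for what the post describes happening in hardware at clock speed; the operator names and the sample table are hypothetical.

```python
def scan(table):
    """Source operator: emits rows one at a time."""
    for row in table:
        yield row


def filter_op(rows, predicate):
    """WHERE clause: passes through only rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, columns):
    """SELECT list: keeps only the requested columns of each row."""
    for row in rows:
        yield {c: row[c] for c in columns}


def sum_agg(rows, column):
    """Aggregate operator: consumes the stream and returns a single value."""
    total = 0
    for row in rows:
        total += row[column]
    return total


orders = [
    {"id": 1, "region": "east", "amount": 120},
    {"id": 2, "region": "west", "amount": 200},
    {"id": 3, "region": "east", "amount": 45},
]

# Roughly: SELECT SUM(amount) FROM orders WHERE region = 'east'
pipeline = project(filter_op(scan(orders), lambda r: r["region"] == "east"), ["amount"])
print(sum_agg(pipeline, "amount"))  # 165
```

Nothing runs until `sum_agg` pulls rows through the chain, which is what lets the operators work on one row at a time instead of building temporary tables between steps.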

To address the limited-resources issue, data warehouse providers have had to deliver their offerings in new ways, sometimes adopting newer business models such as on-demand and cloud computing. The predominant delivery model for data warehousing, as proven by Teradata and Netezza, remains the appliance, in which the hardware, software, and storage are pre-configured and tuned to deliver high performance out of the box.

In Kickfire’s case, we have packaged our solution using standard commodity servers and storage combined with our SQL chip, which acts as a co-processor in much the same way an Nvidia graphics card acts as a co-processor to the general-purpose CPU. We have also plugged our column-store engine into MySQL through its storage engine API, so that to a user it looks like standard MySQL. This means Kickfire automatically benefits from all the third-party application support that MySQL enjoys as the world’s most popular open source database. As a result, a Kickfire appliance can be installed in a standard server rack and loaded with data copied from an operational MySQL database, giving customers a 10x to 100x performance improvement in minutes without any changes to their database or application.

Perhaps the toughest challenge for most data warehouse providers, though, has simply been reaching companies in the mass market. This end of the market is dominated by Microsoft SQL Server and its vast network of channel partners and resellers. Here again, MySQL provides an important business-model breakthrough. Because the open source model lets anyone in the world download an increasingly functional relational database for free, it’s no wonder MySQL has quickly become the third most deployed (some say the most deployed) database in the world. By leveraging MySQL and its vibrant, open, and accessible community of millions of users, Kickfire has been able to reach the mass market.

By providing high-end performance at a very affordable price, in an easy-to-use appliance, and based on a de facto open source database standard, Kickfire (and certainly others to follow) is enabling the 90% of companies in the mass market to enjoy the benefits of data warehousing. A true “Blue Ocean Strategy” at work.
