Kickfire is really different than anything you have seen before
The Kickfire column store and SQL chip combine to achieve database performance never before seen in a small footprint and power efficient database appliance, or in any other relational database to date for that matter.
I’d like to demonstrate the performance of the Kickfire model 2400 appliance running query #17 of the DBT-3 benchmark. Others have blogged about this query recently, so I figured it would be good to look at Kickfire performance on this query. I decided to present results not at one or ten gigabytes, but instead at one terabyte of data.
Have you ever experienced using MySQL on a very large database?
Let me begin by saying that I don’t think anybody has ever had the patience to allow this query to run to completion on MySQL with one terabyte of data. At the time of this writing, Kickfire is the only MySQL database capable of running this query at all, let alone quickly.
I’ve personally let the query run for three days on a 300 gigabyte MyISAM database with a large 24GB key cache, before finally killing it. I’m not saying that MySQL isn’t a great database, because really it is. It is just that MySQL was not designed to run queries which examine large volumes of data. It is optimized for running queries which examine a small amounts of data, even when the database is large. With Kickfire the exact opposite is true. Our appliances accel at running queries which examine vast amounts of data.
If any one query can attempt to demonstrate the power of the Kickfire appliance, then this is probably the one. Remember that I’m going to run this query on a database with seven hundred more gigabytes of data than that test at 300G.
A quick review of our hardware
Before I actually show the query results, I want to review quickly a little bit about the appliance architecture. The SQL chip is situated in what we call the QPM (query processing module) which is attached to the base Kickfire server (the BSM) with a PCI-X cable. The QPM contains the SQL chip and RAM, which constitutes a very large CPU cache for the chip. While various CPUs today feature tens of megabytes of direct-attached cache, the SQL chip has direct access to gigabytes of cache.
The SQL chip, which features a dataflow architecture, uses its cache for multiple purposes. From caching portions of columns in memory to storage of intermediate results during the execution of queries.
When a query is processed by the appliance it is broken up into a set of interconnected data flow operators which access data stored in QPM memory. This memory is managed very efficiently by the Kickfire OS, which also handles prefetching data from disk for upcoming tasks. The Kickfire model 2400 appliance comes with 128GB of QPM memory which provides plenty of performance.
Okay, now that you made it through the hardware review, on to the show:
Kickfire, like MyISAM, stores the count of rows in the table, so COUNT(*) with no predicate is always fast. I used COUNT(*) to verify that indeed I am using a 1TB database with nearly 6 billion rows:
mysql> select count(*) from lineitem;
+------------+
| count(*) |
+------------+
| 5999989709 |
+------------+
1 row in set (0.01 sec)
First run query results
mysql> select
-> sum(l_extendedprice) / 7.0 as avg_yearly
-> from
-> lineitem,
-> part
-> where
-> p_partkey = l_partkey
-> and p_brand = 'Brand#33'
-> and p_container = 'WRAP PACK'
-> and l_quantity < (
-> select
-> 0.2 * avg(l_quantity)
-> from
-> lineitem
-> where
-> l_partkey = p_partkey
-> );
+------------------+
| avg_yearly |
+------------------+
| 308023084.442857 |
+------------------+
1 row in set (10 min 49.89 sec)
The first run of the query is the most expensive run, and thus the slowest. The database was cold, so there was IO necessary to process the results of the query. The first run takes just under 11 minutes, which is very impressive. Remember that this query was still running after 3 days on regular MySQL and that was with much less data in the database.
It gets better
Are you impressed yet? If so, great, if not keep reading because Kickfire can do better. Don’t get me wrong, 11 minutes is a very impressive amount of time in which to complete this query, particularly with just 3U of hardware. However, remember that I said the first run would be the slowest. With most of the data prefetched in compressed form into the QPM the SQL chip can blaze through tens of millions of rows per second to produce results. In fact, around 99% of the time spent answering the query was spent on IO. Believe it or not, the SQL chip sat idle for most of the query.
Most of the information necessary to answer the query is already in the QPM at this time. At this point the IO bottleneck has been, on the whole, removed for this query. We can say in a real sense that the data is in memory for the query.
Everyone knows that a query will be fast when the data is in memory, and they are right, but there is still a very big instruction-flow bottleneck, also known as the Von Neumann bottleneck or the traditional CPU bottleneck that “in memory” databases must live with. This bottleneck is not often discussed, because it is nearly ubiquitous in computing. Almost all CPUs use instruction flow, so without different hardware it is impossible to avoid. The SQL chip is the first specialized dataflow hardware for processing SQL instructions.
To demonstrate results when both bottlenecks are removed, I’ve changed the query slightly from examining brand #33 to #32. The database must process roughly the same amount of data for both queries. The database is synthetically generated and its properties are well known. I made the change to demonstrate that the query results are not cached, only the data.
This new query completes in 5.55 seconds. Kindly pick your jaw up off the floor. When the von neumann bottleneck is removed, it is possible to process gigabytes of data per second with the SQL chip.
mysql> select
...
-> and p_brand = 'Brand#32'
-> and p_container = 'WRAP PACK'
...
-> );
+------------------+
| avg_yearly |
+------------------+
| 308023084.442857 |
+------------------+
1 row in set (5.55 sec)
We can even complicate the query, giving it a disjunction, proving that the query result isn’t simply cached or materialized or in some other way “faked” by the database engine. The SQL chip is really processing gigabytes per second, this time coming up with the answer in just over 7 seconds.
mysql> select
...
-> and p_brand = 'Brand#32'
-> and (p_container = 'WRAP PACK' or p_container = 'SM BOX')
...
-> );
+------------------+
| avg_yearly |
+------------------+
| 614714472.377143 |
+------------------+
1 row in set (7.07 sec)
How much faster will Kickfire be for you?
I hope you are impressed. I’ve demonstrated that the SQL chip uses a dataflow architecture, which eliminates the traditional CPU register (instruction flow) bottleneck in combination with an advanced column store and memory management system that reduces or eliminates IO bottlenecks.
Of course, not every query is going to perform as well as this one, and I wouldn’t be being honest if I said so. As with any benchmark your mileage may vary. This query in particular runs completely in our SQL hardware, but this isn’t possible for all queries. Speed improvements of 10x to 100x faster over a regular MySQL server are typically possible, with most queries achieving results somewhere between those two extremes. Then there are queries like these, which aren’t even possible with MySQL. I can’t really say how many times faster we are on this query because I don’t know how long it would take to eventually complete on a regular MySQL server. I will leave that judgement up to you.