Using TPC-H metrics to explain the three-fold benefit of Cascade Analytics System (II)

The first blog post on this topic introduced the TPC-H Composite Query-per-Hour Performance metric (QphH@Size), a formula that factors in the size of the dataset against which vendors must benchmark their systems (hence the large numbers), and the TPC-H Price/Performance metric, expressed as the ratio between the total cost of the analytic system and QphH@Size.

TPC’s metrics make it possible to compare systems objectively, by the numbers, as they run the standard set of queries against datasets of the same size and distribution. By those numbers, Cascade Engine showed outstanding performance.

This post drills further into the TPC-H methodology to recover metrics with more practical meaning for BI system users, such as the average query response time of a TPC-H-benchmarked analytic system and the cost of the resources needed to sustain that performance at larger data scales.

Business analysts care about speed and its sustainability as data size or the number of concurrent sessions grows. They expect the analytic system to maintain its response speed at scale (“scalability”).

Note: BI analysts have yet another concern: the speed of an “ad-hoc” query, one that has never been run before and that, no matter how well tuned the query processing engine is, may default to a full scan of the data, the slowest and most expensive operation in terms of time and resource consumption.

The query processing speed of a system, although not a “primary metric”, is embedded in the performance test required by the TPC-H benchmark methodology.

The performance test consists of two runs of a well-defined, representative set of queries, issued serially within each user session:
1. a "Power test", which measures the raw query execution power of the system with a single active user;
2. a "Throughput test", which measures the ability of the system to process the most queries in the least amount of time, using two or more sessions, one session per query stream, with each stream executing its queries serially (i.e., one after another).

The power test yields TPC-H Power@SF, a secondary metric built on the geometric mean of the query execution times. That mean, in seconds, is recovered by inverting the metric: mean query time = (3600 × SF) / TPC-H Power@SF.
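To make the inversion concrete, here is a minimal Python sketch (the function names are mine, and the refresh-function intervals that the full specification also folds into the mean are ignored):

```python
import math

def power_at_size(query_times_s: list[float], scale_factor: float) -> float:
    """TPC-H Power@SF from the power-test timing intervals (in seconds)."""
    # Geometric mean of the measured intervals.
    gm = math.exp(sum(math.log(t) for t in query_times_s) / len(query_times_s))
    return 3600.0 * scale_factor / gm

def mean_query_time_s(power: float, scale_factor: float) -> float:
    """Geometric-mean query time (seconds) implied by a published Power@SF."""
    return 3600.0 * scale_factor / power

# Round trip at SF 100: 22 queries averaging ~0.8 s each.
p = power_at_size([0.8] * 22, scale_factor=100.0)
print(round(p), mean_query_time_s(p, 100.0))  # -> 450000 0.8
```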

The calculated speed and the computing resources (memory and processing power) supporting it are tabulated below. Cascade System is compared against the same TPC-H Top Ten Price/Performance Results, Version 2 (results as of 25-Aug-2017 at 6:00 AM [GMT]), taking the top cluster and non-cluster performer at each Scale Factor. At a glance, at the Scale Factors at which it was benchmarked, Cascade Engine was at least twice as fast while using two to five times fewer computing resources, without even counting the advantage its data encoding and compression confer on the amount of RAM needed to keep all data “in-memory”.

Top TPC-H benchmarked systems by average speed of query response time (seconds)

| SF (~GB size) | Database size (rows) | Cluster | TPC best: Speed* (s) | TPC best: RAM (GB) | TPC best: #proc. | TPC best: #cores | Cascade: Speed* (s) | Cascade: RAM (GB) | Cascade: #proc. | Cascade: #cores |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 600M | N | 0.78 | 64 | 2 | 16 | 0.2 | 64 | 1 | 8 |
| 100 | 600M | Y | 0.28 | 96 | 12 | 120 | 0.18 | 192 | 6 | 48 |
| 300 | 1.8B | N | 2.09 | 128 | 2 | 16 | – | – | – | – |
| 300 | 1.8B | Y | 0.47 | 288 | 24 | 240 | 0.26 | 256 | 8 | 64 |
| 1000 | 6B | N | 4.09 | 512 | 2 | 44 | 0.87 | 960 | 2 | 20 |
| 1000 | 6B | Y | 0.82 | 640 | 40 | 400 | 0.4 | 384+ | 10 | 80 |
| 3000 | 18B | N | 5.02 | 3072 | 4 | 96 | 2.11 | 960+ | 2 | 20 |
| 3000 | 18B | Y | 1.59 | 1792 | 56 | 560 | – | – | – | – |
| 10000 | 60B | N | 20.27 | 6144 | 4 | 112 | – | – | – | – |
| 10000 | 60B | Y | 3.91 | 5440 | 68 | 680 | – | – | – | – |
| 30000 | 180B | N | 75.04 | 12800 | 8 | 144 | – | – | – | – |
| 30000 | 180B | Y | 10.31 | 12800 | 80 | 800 | – | – | – | – |
| 100000 | 600B | N | – | – | – | – | – | – | – | – |
| 100000 | 600B | Y | 30.63 | 38400 | 100 | 1000 | – | – | – | – |

+ A RAM shortage caused some data to be stored on disk; with all data in memory, the calculated “speed” could be even higher (i.e., lower query times).
– No result available: the system was not benchmarked in this configuration at this Scale Factor.

* The query processing speed (geometric mean) in the table above is derived as (3600 × SF) / TPC-H Power@Size and expressed in seconds. The TPC-H benchmark methodology requires a power test measuring the raw query "execution power" of the system with a single active user; its result, TPC-H Power@Size, is a secondary metric built on the geometric mean of the query execution times.

The other secondary metric, TPC-H Throughput@Size, is measured during the throughput test; its values demonstrate the ability of the system to process the most queries in the least amount of time (performance). The arithmetic mean of query execution time follows from it: the total elapsed time of the test is Ts = SF × (S × 22 × 3600) / TPC-H Throughput@Size for S query streams, so the mean per query is Ts / (S × 22) = (3600 × SF) / TPC-H Throughput@Size.

In performance comparisons, geometric means are a more objective representation of system capabilities than arithmetic means.

Note that the TPC believes "comparisons of TPC-H results measured against different database sizes are misleading" and discourages such comparisons.

The TPC-H Composite Query-per-Hour Metric (QphH@Size) is the geometric mean of TPC-H Power@Size and TPC-H Throughput@Size.
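A minimal Python companion to the power-test sketch above, covering the throughput side and the composite metric, under the same caveats (the names are mine, not the TPC's):

```python
import math

def throughput_at_size(num_streams: int, elapsed_s: float, scale_factor: float) -> float:
    """TPC-H Throughput@Size: S streams of 22 queries finishing in elapsed_s seconds."""
    return num_streams * 22 * 3600.0 / elapsed_s * scale_factor

def qphh_at_size(power: float, throughput: float) -> float:
    """Composite Query-per-Hour metric: geometric mean of the two secondary metrics."""
    return math.sqrt(power * throughput)

# Example: 5 streams finishing in 2 hours at SF 1000, with Power@Size = 500,000.
tput = throughput_at_size(num_streams=5, elapsed_s=7200.0, scale_factor=1000.0)
print(round(tput), round(qphh_at_size(power=500_000.0, throughput=tput)))
```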

Having “unbundled” system speed from the TPC-H metrics, it becomes clear that Cascade Engine’s outstanding performance is achieved with substantially fewer hardware resources (CPU and RAM). These are also the most expensive components of a server unit; using fewer of them yields the “lowest cost of performance” benefit claimed by Zeepabyte.

Each vendor participating in the TPC-H benchmark publishes the technical specifications of the hardware used. When the TPC-H Composite Performance metric values, “Query-per-Hour@Size” (at the equivalent Scale Factor), are plotted per unit of computation, whether processor “core” or “thread”, Cascade Analytic System (Z) demonstrates “sustainable scalability” with data volume and a well-engineered multi-threaded implementation of its data search algorithm.
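As a sketch of that per-core normalization (the figures below are placeholders for illustration, not published results):

```python
# Hypothetical QphH@Size results normalized by core count; a flat or rising
# per-core value as the Scale Factor grows is what "sustainable scalability"
# looks like in such a plot.
results = [
    {"system": "A", "sf": 1000, "qphh": 1_250_000, "cores": 400},
    {"system": "B", "sf": 1000, "qphh": 700_000, "cores": 80},
]
for r in results:
    print(f"{r['system']} @SF{r['sf']}: {r['qphh'] / r['cores']:,.0f} QphH per core")
```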

Why does “sustainable scalability” matter?

When business analysts are asked to deliver faster reports from more data, the pressure rises on the mid-level managers responsible for business technology to procure more “scalable” analytic solutions. Under tight implementation deadlines, there is no time to drill into arcane benchmark results from various vendors and design a “scalable solution” for their own business. Half a century of IT thinking conditioned on “support systems” pushes instead toward expanding access to centralized DWHs by building smaller data warehouses for each department, which then spawn hundreds of data marts and thousands of OLAP cubes.

Not surprisingly, this only solves the “scalability” problem temporarily: each of these smaller systems can be expanded with additional storage and computing resources at its own level (and at departmental expense!) up to a point of diminishing returns, where (or when) most of them become too slow and too expensive to keep expanding year over year. That is the point at which business analysts start complaining and a new analytic system is procured. At the level of the overall supporting system, this is just another island of temporary performance relief, because the reality is an ever-widening “data swamp”.

Data processing in analytic systems is algorithmic, through software. On a large data set, software can process different parts of it independently, on multiple CPUs. It is therefore important to distinguish “scalability” achieved by adding resources to sustain some system performance parameter (e.g., TPC-H QphH@Size) from the efficient use of those computing resources (i.e., the quality of the software implementation).

This idea is brilliantly explained in "Scalability! But at what COST?", a paper presented at the HotOS 2015 conference by three ex-Microsoft researchers working on distributed computing systems and data parallelism. They surveyed published measurements of data-parallel systems touting impressive scalability, implemented the same data processing algorithms in a single thread, and reproduced the benchmark conditions to assess to what degree the "scalable" parallel implementations truly improve performance, as opposed to merely amortizing the parallelization overheads they themselves introduce. Defining the "COST" of processing data with a given algorithm as the hardware configuration a platform requires before it outperforms a competent single-threaded implementation, they found that many systems have a surprisingly large COST: they used hundreds of cores just to match the performance of a single thread.
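The COST idea is easy to reproduce in miniature. The Python toy below (the workload and names are illustrative, not the paper's) times a competent single-threaded computation against process-parallel versions of the same work; the "COST" is the worker count, if any, at which the parallel version first beats the single thread:

```python
import time
from concurrent.futures import ProcessPoolExecutor

N = 20_000_000  # toy workload size

def partial_sum(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def parallel(workers):
    # Split [0, N) into contiguous chunks, one per worker process.
    step = N // workers
    chunks = [(w * step, N if w == workers - 1 else (w + 1) * step)
              for w in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    t0 = time.perf_counter()
    partial_sum((0, N))  # competent single-threaded baseline
    baseline = time.perf_counter() - t0
    for w in (2, 4, 8):
        t0 = time.perf_counter()
        parallel(w)  # pays process startup and serialization overhead
        print(f"{w} workers: {time.perf_counter() - t0:.2f}s "
              f"(single thread: {baseline:.2f}s)")
```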

A natural analogy: a well-engineered analytics system should behave like a watermill. The water wheel works faster and yields more power when more water flows through it; an additional wheel is engaged only when the previous one reaches maximum speed.

Cascade Analytic System, benchmarked in both cluster and non-cluster configurations, exhibits exactly these characteristics. The following graph illustrates its behavior on three different types and generations of processors.

The next post will drill further into the benefits of efficient use of computing power, translated into a real cost advantage at very large data volumes and analytic workloads.

Stay tuned and do not forget to download and use the trial version of Cascade Engine!

Author: Lucia Gradinariu
