Each member of Zeepabyte’s founding team has spent more than 15 years working with large-scale databases and infrastructure, delivering data-driven business intelligence to operational teams and processes. Whether the platform was Essbase, Informix, or an Oracle warehouse, each of us has aligned data from relational and semi-structured sources, built complex schemas, partitioned data and maintained indices, and dug into the guts of database engine optimizers and network protocols to speed up business reports and run ad-hoc queries just in time to support decision making.
We knew all too well that our expert efforts added high operating costs on top of the heavy bills for software and hardware licensing and maintenance. Still, we solved the “cost of performance” equation and managed to improve response times for our most demanding customers in the airline, financial, and telecommunications industries.
Over time, however, our frustration with analytic database technologies grew: most of them approached “infinite cost” while query response time plateaued far from near real-time (seconds). More often than we would like, important queries hit incomprehensible, insurmountable snags inside ever more distributed, interdependent, and auto-magically managed software stacks.
The flows of data to be scrutinized for business intelligence multiplied, diversified, and started to exhibit hard-to-tame dynamics. Big data analytics extended and expanded traditional analytic database systems, pushing a stream of technology innovations eager to be placed on market analysts’ maps, quadrants, and other visuals for rapid categorization, sorting, and ranking. A good collection can be downloaded here.
The new “Business Intelligence Architect” role is in demand, but also in pain. Today, our friends and former brothers in arms, business analysts and expert DBAs, maps in hand, pave the battlefield between the Governance guards overseeing enterprise data assets in the enterprise warehouse and the pressing business strategies in need of real-time, actionable insights to compete, with increased agility, in crowded and dynamic marketplaces.
“What is the real cost of Business Intelligence and Real-Time Analytics? Let me tell you what it takes to answer a critical business question these days,” said a friend of mine who builds OLAP cubes for Product Marketing BI at a major US telecommunications company.
“First I have to find the data sources for the BI report. There are hundreds of databases around the company connected to the Data Warehouse, and almost every time I need access to one of them, because either its data is newer or the data is missing from the Data Warehouse altogether. It takes days to find the owner of a database and get that access.
Sometimes I have to cross the swamp to get to the Data Lake, the innovation lab, and create a new data source from social media, data-anonymizing engines, or IoT edge devices.
Then I have to pass the scrutiny of the gatekeeper and, if I am lucky, filter a data view against which to align the new data or the new dimension. I could remove or reuse some of the old cubes to save storage space, but the time cost of getting the data access back and rebuilding them, in this environment, is prohibitive.
I do all the query optimization work back in my own swamps, because there is no tolerance for uncertainty over how long a query will run or how much memory and CPU it will take out of the daily operating budget of the IT department (on premises or in the cloud). Something unexpected happens the first time I run a new query.”
A BI Architect’s data swamp consists of numerous connectors into the Data Warehouse, thousands of OLAP cubes, hundreds of Data Marts, and one tap into the big Data Lake hanging in the cloud, with torrents of data pouring in over uncontrollable edges.
Driving near real-time query performance in this data swamp feels about as easy as speeding up the Orinoco River’s flow in this aerial picture taken in 2001!
Zeepabyte was born of Alex’s innovative idea for encoding and searching data at blazing speeds without burning hundreds of CPUs. But mastering the variables of the cost-of-performance equation across Business Intelligence data swamps? We needed an objective framework to measure expected performance metrics and gauge all sources of cost when running complex business queries against datasets that mirror the nature, organization, and scale of Business Intelligence and IoT analytics use cases.
TPC-H has long reigned over benchmarking the “cost of performance” of analytic systems used by business organizations. In the TPC’s own words:
“[…]This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.
The performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size), and reflects multiple aspects of the capability of the system to process queries. These aspects include the selected database size against which the queries are executed, the query processing power when queries are submitted by a single stream, and the query throughput when queries are submitted by multiple concurrent users. The TPC-H Price/Performance metric is expressed as $/QphH@Size.”
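To make the quoted definition concrete, here is a minimal sketch of how the composite metric combines its two components. TPC-H reports a power result (single query stream) and a throughput result (concurrent streams), both in queries per hour, and QphH@Size is their geometric mean; price/performance divides total system cost by that composite. The numbers below are purely illustrative, not results from any actual TPC-H run.

```python
import math

def qphh(power: float, throughput: float) -> float:
    """TPC-H composite metric: the geometric mean of the power and
    throughput components, both expressed in queries per hour."""
    return math.sqrt(power * throughput)

def price_performance(total_system_cost: float, qphh_at_size: float) -> float:
    """TPC-H Price/Performance: total system cost divided by QphH@Size."""
    return total_system_cost / qphh_at_size

# Illustrative numbers only (hypothetical system, hypothetical cost):
power = 120_000.0      # power test result, queries per hour
throughput = 80_000.0  # throughput test result, queries per hour
composite = qphh(power, throughput)
print(f"QphH@Size   = {composite:,.0f}")
print(f"$/QphH@Size = {price_performance(1_500_000, composite):.2f}")
```

The geometric mean keeps a system from hiding a weak throughput result behind a strong single-stream result (or vice versa): both components must be good for the composite to be good.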
In less than two years, using the TPC-H methodology, tools, and metrics, we tested Zeepabyte’s first Cascade Analytics System implementation with large infrastructure partners such as the Mellanox Technologies Lab in Silicon Valley and IBM’s Power Development Cloud.
We slashed costs and achieved near real-time performance on the Star Schema Benchmark early on. Recently, the Cascade Zippy Analytics System, running Version 2.0 of Zeepabyte’s Cascade Engine, queried 3TB of data in less than 3 seconds on IBM Power Systems.
Let us know how you are doing! The Zeepabyte Cascade Trial Forum is now open to support your evaluation and gather your feedback on our product.