The idea behind query automation is to schedule and orchestrate query execution and store the results. Query automation is an important part of fraud detection systems, especially in moving them toward real-time operation, because many downstream processes require data to be fully processed and complete before it can be used. I will describe one such system built as part of a client project. The client was a bank team whose daily work included manually running many queries to check whether data was ready, storing the results once they were available, and forwarding them downstream to detection and alerting systems. All of these processes require little human judgment, which makes them great candidates for automation, and the client was quick to recognize this.
The core feature set of a query automation engine (QAE) is:
The whole engine is built in Snowflake, apart from notifications. Snowflake brings many advantages, along with a few challenges. Its main advantage here is tasks: database objects that execute a SQL statement on a schedule or on demand. There are three levels of task use:
High-level engine overview
Building the QAE this way keeps the design high-level, since tasks solve most, if not all, of the low-level problems. The tasks themselves run stored procedures. The whole QAE is automated, apart from two stored procedures that users interact with: one for activating a query and one for deactivating it. Activation performs the validations and enters the query into the QAE. Deactivation suspends the task, marks the query as inactive in the execution table, and deletes it from the execution stack.
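The activation and deactivation steps above can be sketched in Snowflake SQL. All object names here (QAE_WH, RUN_QUERY, QAE_TASK_Q42, EXECUTION_TABLE, EXECUTION_STACK) are hypothetical, and the real activation procedure also performs validations not shown:

```sql
-- Activation (sketch): create a task that runs the query's stored
-- procedure on a schedule. Hypothetical names throughout.
CREATE TASK QAE_TASK_Q42
  WAREHOUSE = QAE_WH
  SCHEDULE  = '15 MINUTE'
AS
  CALL RUN_QUERY('Q42');

-- Tasks are created in a suspended state, so activation must resume them.
ALTER TASK QAE_TASK_Q42 RESUME;

-- Deactivation (sketch): suspend the task, mark the query inactive,
-- and remove it from the execution stack.
ALTER TASK QAE_TASK_Q42 SUSPEND;
UPDATE EXECUTION_TABLE SET IS_ACTIVE = FALSE WHERE QUERY_ID = 'Q42';
DELETE FROM EXECUTION_STACK WHERE QUERY_ID = 'Q42';
```

In production these statements would live inside the two user-facing stored procedures rather than being run by hand.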
The biggest challenge is something we discovered as the load in production increased. Snowflake is an OLAP system, so we assumed transactions are not its strong point. What Snowflake does is transaction queueing: it will queue up to 20 transactions against a single table and execute them when conditions allow. The problem is that every transaction after the 20th fails with an error. The first 20 error-logging writes then queue against the log table, at which point the log table itself is locked by those queued transactions; transactions after that fail and are recorded only in the information_schema system views.
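Once the log table itself is locked, the only trace of the failures is query history. A sketch of how one might spot them, using Snowflake's standard QUERY_HISTORY table function; the filter string is an assumption based on the approximate wording of Snowflake's lock-queue error message:

```sql
-- Sketch: find statements that failed because the per-table lock queue
-- was full. The error text is roughly "Number of waiters for this lock
-- exceeds the 20 statements limit" (approximate wording).
SELECT query_id, query_text, error_code, error_message, start_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE error_message ILIKE '%waiters for this lock%'
ORDER BY start_time DESC;
```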
This can be partially mitigated by increasing the warehouse size, the maximum concurrency level, and/or the maximum cluster count. However, all three methods only raise the limit; there is no way to remove it. Careful design can reduce the probability of the error further, for example by capping the total number of parallel tasks at or below 20 (we chose 15). Every task execution logs its work to the log table, updates the execution stack, and more. With the cap in place, the probability of more than 20 transactions queueing against either of these tables is reasonably low, and the error stopped occurring.
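The three tuning knobs can be set in a single statement. The warehouse name is hypothetical, and MAX_CLUSTER_COUNT assumes a multi-cluster warehouse (an Enterprise Edition feature); none of these removes the 20-transaction limit itself:

```sql
-- Sketch: raising the point at which lock queueing starts to fail,
-- for a hypothetical warehouse QAE_WH.
ALTER WAREHOUSE QAE_WH SET
  WAREHOUSE_SIZE        = 'MEDIUM'  -- more compute per cluster
  MAX_CLUSTER_COUNT     = 3         -- scale out across clusters
  MAX_CONCURRENCY_LEVEL = 12;       -- more concurrent statements per cluster
```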
There are already plenty of products similar to the QAE described here. Some offer a graphical UI and different query lifecycles. While most of them are general purpose, the QAE has specific features tailored to the client's needs. Total costs are also significantly lower, as there is no license fee (most of the QAE's cost depends on warehouse size and the number of task runs). In retrospect, using an OLAP system introduced the only significant limitation: the 20-simultaneous-transactions limit. Currently, another team in the client bank is interested in using the QAE, and the most likely solution will be a separate installation, possibly with some changes to the feature set. On the other hand, OLAP minimizes query execution times, which is possibly the most important metric, so using an OLTP system would have its own downsides as well.
Do you need to automate your processes? Feel free to contact us; our team of experts has years of experience and can help you reduce risk and optimize your time.
Banks deal with large amounts of sensitive data daily. If you would like to know more about our solutions for banking institutions, find out how we reduced risk with bank loan default prediction.