In the latest version of Tailer 1.3.6, we introduced a new flag for the criticality of a task within a "Table to Table" configuration. This new conditional flag called "break" allows building complex configurations that are triggered on some conditions.
In this tips and tricks post, we'll show you how to write such tasks, some frequent use cases that can improve your configurations.
Conditional Task
Imagine you have some data files that arrive multiple times a day. And you have a set of bigquery intensive sql tasks that are chained to those files. In the previous version of Tailer, if you wanted to have only one execution, you had to schedule your sql tasks once per day at some (precise) arbitrary time. And you had to hope to have that timing synchronized with the file arrival time.
A "Break" task can be written by positioning the "criticality" value to "break". When a task has this value, no matter what the sql result will be, it will not trigger a fail of the configuration. It will though stop the execution
at this task if the "break" query raise an exception.
{
"task_type": "sql",
"criticality": "break",
"id": "check_something",
"short_description": "Task short description",
"temporary_table": false,
"sql_file": "000099_TestConditional_check.sql",
"doc_md": "000099_TestConditional.md"
},
This means if we have the following sequence of tasks :
"task_dependencies": [
"A('criticality': 'break') >> B >> C"
]
If task A fails, B and C will not be executed but the entire configuration will be "success".
If task A succeed, B and C will be executed.
Note
In Tailer, the sequence of tasks is currently flatten into a linear topography. This means that a set of parallel tasks will be resolved into a linear non parallel tasks. This has a huge boost in performance as there is minimum latency between tasks but the drawback is the missing benefit of pure parallel executions.
This also means a "break" task can be inserted within a chain of tasks but the result in case of interruption ( by the "breaking" task) might occurs in a non deterministic location. For that reason, it is not recommended to insert a "break" in the middle of parallel branches of tasks.
Usage
Some interesting usage of the "break" task can be developed through assertions. Let's describe a sequence of two tasks ("Run_Query" and "Swap_Query") and we want this set of tasks to start only on some conditions. We can create a task called "Check_Condition" of type SQL with the criticality set to "break" and sequence it before tasks "Run_Query" and "Swap_Query".
The sequence :
"task_dependencies": [
"Check_Condition >> Run_Query >> Swap_Query"
]
The "break" task :
{
"task_type": "sql",
"criticality": "break",
"id": "Check_Condition",
"short_description": "Check Condition Before Start",
"temporary_table": false,
"sql_file": "000099_Check_Condition.sql",
"doc_md": "000099_Check_Condition.md"
},
SQL of "break" tasks
Now we can write the "000099_Check_Condition.sql" script.
Here are some patterns to enable/disable a sequence.
-- ASSERTION BASED ON A CONDITIONAL DATE EG BEGINNING OF THE MONTH
ASSERT CURRENT_DATE() = DATE_TRUNC(current_date(), month)
as 'This configuration should only be executed at the beginning of each month';
-- ASSERTION BASED ON A CONDITIONAL DATE EG BEGINNING OF THE WEEK
ASSERT CURRENT_DATE() = DATE_TRUNC(current_date(), week)
as 'This configuration should only be executed at the beginning of each week';
-- ASSERTION BASED ON A CONDITIONAL EXECUTION OF PREVIOUS OCCURENCE
ASSERT
current_date() != (
SELECT
cast(TIMESTAMP_MILLIS(last_modified_time) as date)
FROM
`dlk_demo_pda.__TABLES__`
WHERE
table_id = 'ats_example_products'
)
as 'This configuration can only be executed once a day';
-- ASSERTION BASED ON A CONDITIONAL STATE OF OTHER TABLES
Here we test the update date as we want to be agnostic of the content.
ASSERT
((
select count(*) from (
SELECT
'table1',
if (cast(TIMESTAMP_MILLIS(last_modified_time) as date) = current_date(), true, false) as uptodate
FROM
`dlk_demo_pda.__TABLES__`
WHERE
table_id = 'ats_example_products'
UNION ALL
SELECT
'table2',
if (cast(TIMESTAMP_MILLIS(last_modified_time) as date) = current_date(), true, false) as uptodate
FROM
`dlk_demo_pda.__TABLES__`
WHERE
table_id = 'ats_example_products'
UNION ALL
SELECT
'table3',
if (cast(TIMESTAMP_MILLIS(last_modified_time) as date) = current_date(), true, false) as uptodate
FROM
`dlk_demo_pda.__TABLES__`
WHERE
table_id = 'ats_example_products'
) where uptodate = false)
=
0)
AS 'This configuration can only be executed once every table are up to date';
There are many many more use cases that can be written with "break" tasks:
- Skip week ends (by using the day of the week)
- Triggering an execution based on the number of elements to process (row count of a table or number of table suffixes)
- Composite conditions
- A/B Task type
The sky's (sql's :D ) the limit. Feel free to share your use case in the comment section below.