Theory of Time Series Expressions

Innovyze

Howbery Park

Wallingford

Oxfordshire

OX10 8BA

United Kingdom

+44 (01491 821400

support@innovyze.com

www.innovyze.com

 

 

 

Technical Paper

April 2017

 

Contents

Theory of time series expressions

Quick review of conventional (numeric) expressions

Time series data as a new type of value

Properties of time series data values

How null values are generated within a time series data value

Interpolation rules and the effect of null values

Conversion rules for time series data values

Network object fields available for use in time series data expressions

Units in time series data expressions

Constructing time series data values

Standard operators that can be used with time series data values

Standard functions that can be used with time series data values

Simple functions taking numeric arguments

Functions that work with lists

Functions that work with dates

Conditional functions

Other standard functions

Specialised functions for use only with time series data values in InfoAsset Manager

Notes on using NULL in expressions

Theory of time series expressions

The theory of time series expressions is outlined in the following sections. Because time series expressions build on (non-time series) SQL expressions, a basic familiarity with SQL expressions is assumed – see SQL in InfoAsset Manager for more details.

Time series expressions can be used in SQL queries in InfoAsset Manager for the purpose of analysing and extracting individual values and statistics from time series data (TSD) streams that are linked to objects in a network.

In an SQL query, time series data is loaded using the TSLOAD function and each time series has a domain that is specified explicitly using arguments to that function. The units of measurement are specified as an argument of the TSLOAD function.

In a derived stream expression, time series data is loaded using the TSINPUT function and all time series required to satisfy a user’s request (e.g. for display or to answer a question) are defined over the same domain which is determined by the request (plus an allowance for resampling).

Quick review of conventional (numeric) expressions

The SQL engine can evaluate expressions composed of operators, functions and simple numeric values (and also boolean, date and string values, but these are less relevant here). It can also assign values to variables and use these variables in expressions. The value of an expression containing multiple semicolon-separated clauses is the value of the last clause, with the previous clauses used to calculate intermediate variables. Examples of expressions using various operators and functions are:

An expression is evaluated in the context of a specific object in a network. In InfoAsset Manager, it can be any object in the network that can have links to time series data. The expression can include the names of fields of this object and these are replaced by the values stored in those fields when the expression is evaluated. It can also include fields of linked objects . Examples are:

Time series data as a new type of value

To work with time series in expressions, we treat an individual, finite sequence of time series data points as a single value of a new type, which we call "time series data". This new type is on a par with the existing types (numeric, boolean, date, string and null) and can be used in expressions in very much the same way. Typically, the value of an expression containing time series data values will itself be a time series data value.

Note that in the discussion that follows, we are careful to distinguish between a time series data value, which is a value that encapsulates the whole of a finite period of a time series, and an individual (number or null) value taken from a particular data point within that time series.

The theory of time series expressions consists of defining the properties of time series data values and the way in which the different operators and functions act on these values to generate new values. The following sections provide these definitions.

Properties of time series data values

Time series data values used in an expression have the following properties:

How null values are generated within a time series data value

Null-valued data points can appear in a time series for any of the following reasons:

In general, operators and functions do not generate multiple sequential null-valued data points. Once a null-valued data point has been generated, subsequent null-valued data points are skipped until the next data point that has a non-null value is reached, or the end of the time domain is reached.

Interpolation rules and the effect of null values

As noted above, a time series defines a piece-wise linear or step function within the time domain, according to whether the interpolation property is "linear" or "extend" (note that "step" can be used as a synonym for "extend"). In the absence of nulls, this means that numeric values are well-defined at all times within the domain and that time-integrals and similar operations over arbitrary time periods can be computed.

If a data point has a null value, then values at intermediate times between this data point and the next non-null data point will be treated as null. Additionally, linear interpolation between a non-null and null value is not possible, so for a linear time series, intermediate values after the preceding data point are also treated as null. Thus null-valued data points create periods of null value within the time series.

Functions and operators that take values from specific times within such periods have their own rules about how to handle nulls, as discussed in the sections describing operators and functions. Typically, but not always, the resulting value is also null.

Functions that calculate statistics (e.g. mean, min, max) over specific intervals will usually calculate the statistic over the non-null periods in the interval, if any, returning null only if the entire interval is null. Statistics about null data are also available (e.g. countnonnull, coverage). Some statistics (e.g. count, percentile) work only on individual data points (i.e. not weighted by time). See TSRESAMPLE for more details.

Conversion rules for time series data values

A time series data value will never be automatically converted to any other type of value.

Other types of value will be automatically converted to a time series value as needed for use in an operator or function (see the following sections). The conversion rules are:

Network object fields available for use in time series data expressions

In a time series data expression that is part of an SQL query, any object field that is normally usable for SQL can be used, provided it is of an appropriate type (usually numeric).

Units in time series data expressions

Values loaded from network object fields used in SQL queries are evaluated using user units.

The approach taken for time series data values and expressions is to specify the required units explicitly where needed. Thus TSLOAD has an argument to specify the required units.

It is the responsibility of the user to ensure that all numeric values used in the expression are correct for the chosen units.

Constructing time series data values

Unlike a simple numeric value, which can be entered in an expression as a literal such as 3.14159 or obtained from a field of an object in the network, a time series data value can only be created as the result of an expression that involves time series data values, or as the result of a function that constructs a time series data value using parameters that are simple values. The most important such functions are the TSLOAD and TSPOINTS constructor functions. See Specialised functions for use only with time series data values for more information.

Standard operators that can be used with time series data values

Any of the standard set of arithmetic, logical or comparison operators, unary or binary, may be applied to a time series data value or pair of values. The operators are:

A binary operator may be applied to a time series data value and a non-time series data value. The latter is first converted to a time series data value as described under conversion rules above.

The output value of an operator acting on time series data value(s) is computed as follows:

Standard functions that can be used with time series data values

A wide variety of the standard SQL functions can be used with time series data values, as described below.

If all the input time series data values for a function use extend/step interpolation, the result also uses extend/step. Otherwise it uses linear.

Simple functions taking numeric arguments

All of the standard SQL functions that take one or more numeric arguments and that yield a numeric result can also be used with time series data arguments and will yield a time series data value as a result. The rules described above for operators apply for converting arguments to time series data values and for computing the times and values of individual data points. These functions are:

Functions that work with lists

The list-related functions LOOKUP, INDEX, RINDEX and MEMBER can take a time series data value as their first argument and a list as their last argument, returning a time series data value. Normally the list will be a number list. If used with date or string lists, then for each individual data point value, LOOKUP will convert the date/string to a number, MEMBER and RINDEX will always return 0 (not present) and INDEX will always return NULL.

Functions that work with dates

Functions that work with dates and have a numeric or Boolean result can also be used with time series data values. These functions are:

  • YEARPART
  • MONTHPART
  • DAYPART
  • DAYOFWEEK
  • DAYOFYEAR
  • DAYSINYEAR
  • DATEPART
  • TIMEPART
  • INYEAR
  • INMONTH
  • INYEARS
  • INMONTHS

The values in the time series are treated as being the numeric representation of dates (e.g. as set by the TSDATETIME function) and the various functions operate on these dates as documented. Note that TIMEPART is always truncated to a whole number of minutes.

Conditional functions

The standard conditional SQL functions IIF and NVL also work for time series data values. Unlike the other functions, they do not generate an output data point for every possible input time. Instead:

Other standard functions

If standard SQL functions other than those listed above are used with time series data arguments, the result is a null value.

Specialised functions for use only with time series data values in InfoAsset Manager

The following functions can be used only to create or manipulate time series data values. They cannot be used in "normal" SQL expressions and queries.

TSLOAD

TSLOAD (TSD_link_description, units, start_time, end_time)

Creates a time series from a linked TSD stream, identified by its description.

This function should be used only in an SQL query.

The TSD link description must be specified as an explicit string, or as a field or expression that evaluates to a string. The time series data links of the current object are then checked to find the first link that has a description that matches the string.

If no such link is found, or the link does not specify a TSD stream name, or the named stream is not found in the appropriate TSD object, the function returns null.

The appropriate TSD object is searched for within the database using the TSD path specified in the link or (if that is blank) the default TSD path set for the network or (if that is blank) the default TSD path set for the database. If the resulting path is blank or the TSD object cannot be found using the path, an error is raised.

The units must be specified as a string such as "mm" or "ft" or "" and the specified units must be compatible with the units (quantity) of the TSD stream. If not, an error is raised.

The start_time and end_time must be specified as dates or as numeric values that are equivalent to dates (i.e. a number of days after 00:00 30-Dec-1899). The latter allows for calculated dates such as NOW() - 7. These dates are interpreted as local clock time (i.e. including daylight saving) in the time zone of the machine. If these values cannot be interpreted as dates, or the end_time is at or before the start_time, the function returns null.

The returned value is obtained from the TSD stream for the specified time domain, noting that:

TSPOINTS

TSPOINTS(interval_unit, interval_multiplier, date_origin, time_of_day_origin, daylight_saving)

Constructs a time series containing data points that have a specified, repeating time interval. All data points in this series are given the same value (1.0). Intervals are not necessarily of equal duration (e.g. days adjusted for daylight saving; months; years). The created time series uses step/extend interpolation.

The main uses of such a series are to define the times for resampling of other series (using TSRESAMPLE) or to generate profiles that repeat for each interval (using TSAPPLYPROFILE).

When used in an SQL query, it will use a time domain that encompasses the time domains of all data previously loaded by calls to TSLOAD. It is therefore important that TSPOINTS is called after all relevant calls to TSLOAD.

The interval unit must be one of the following:

The chosen interval unit is multiplied by the interval multiplier to give the interval between data points. The multiplier may be fractional if the interval unit is a day or smaller and must be greater than or equal to 0.001.

If daylight saving is true, adjustments for daylight saving are made when crossing a transition between standard time and daylight saving time in the time zone of the domain. Daylight saving adjustment is only carried out if the interval unit is an hour or greater and the interval multiplier is a whole number. This can be used to create daily time stamps that are at the same local clock time every day (e.g. 09:00). Because InfoAsset Manager runs are always in local standard time, the visible time stamp is then one hour earlier during daylight saving (e.g. 08:00).

The procedure for aligning the time stamps of the data points is as follows:

The parameters of this function are somewhat complicated due to the wide range of possible time scales and possible variations in the length of intervals. The following examples of different intervals should help to make them clearer:

Note that it is not possible to create diary-style or specialised intervals (e.g "3rd Wednesday of every month"; "accounting month"; "financial quarter").

TSRESAMPLE

TSRESAMPLE(value_series, time_stamp_series, statistic, window_option, left_secs, right_secs, statistic_factor, output_interpolation)

Resamples the values in value_series using the time stamps of the non-null, non-zero data points in time_stamp_series, calculating the value for a window around each time stamp using the specified statistic and window_option.

Null or zero-valued data points in the time_stamp_series are used to generate output data points at the same time stamps with these points having null values. This is useful for calculating statistics of event periods where each event is represented by a non-null, non-zero data point at the start of the event and a null or zero data point at the end of the event.

The resulting time series uses the specified output_interpolation.

The window options are:

The statistics that may be calculated for values in the window are:

  • "MEAN" - time-mean value
  • "TMEAN" - triangular-weighted time-mean (used mainly for smoothing)
  • "INTEGRAL" - time-integral (with time measured in seconds)
  • "SDEV" - standard deviation
  • "FIRST" - first non-null value
  • "LAST" - last non-null value
  • "DUR" - duration(length) of the time window in seconds
  • "DVDT" - time derivative over the window
  • "MAX" - maximum value
  • "MIN" - minimum value
  • "MAXTIME" - time (in seconds after start of window) at which maximum occurs
  • "MINTIME" - time (in seconds after start of window) at which minimum occurs
  • "COUNT" - the count of data points (including those with null values)
  • "COUNTNONNULL" - the count of data points (excluding those with null values)
  • "TOTAL" - the sum of the values from each data point
  • "COVERAGE" - the percentage of time during which the value is non-null
  • "PERCENTILE" - value of the specified percentile of individual values in the window
  • "BWDPERCENTILE" - percentile calculated as prescribed by EU Bathing Water Directive

Note that:

For most of the above statistics, the statistic parameter is simply a multiplier that is applied after calculating the statistic. This is useful for applying a scaling factor and/or converting time from seconds to some other unit (for INTEGRAL, DUR, DVDT, MINTIME or MAXTIME). A value of 1.0 should be specified if no scaling is required. For PERCENTILE and BWDPERCENTILE, the statistic parameter is the percentile to be calculated (e.g. 95, 99).

The output interpolation must be one of

Resampling buffer and chained (continuous) simulation

There is no resampling buffer in InfoAsset Manager as the user explicitly specifies the time periods to load (using TSLOAD) and the time periods over which to calculate results (using TSAGGREGATE). The user must also ensure that the former loads sufficient data for the latter.

TSAGGREGATE

TSAGGREGATE(value_series, start_time, end_time, statistic, statistic_factor)

Calculates the specified statistic for values in value_series using the specified time window.

This function returns either null or a single numeric value – not a time series data value, and is intended for use in an SQL query.

The available statistics and the meaning of the statistic factor are exactly as described for the TSRESAMPLE function, with the following exception:

TSAPPLYPROFILE

TSAPPLYPROFILE(input_series, normalize, divide_by_duration, multiplier, output_interpolation, number_list)

This function takes a profile, defined as a number list of N values, and applies it to every data point in the input series that has a non-zero, non-null value, as follows:

For each input data point that has a null or zero value, an output data point with a null value is generated. If the output uses step/extend interpolation, the time stamp for this output point will be the same as the input. If the output uses linear interpolation, there will already be an output point at this time stamp (to allow interpolation up to the end of the interval) and so the null-valued point is output with a time stamp half way between this and the next input time stamp – this prevents interpolation between periods that represent distinct events.

The resulting time series uses the specified output interpolation, which must be one of:

TSBEFORE

TSBEFORE(date_time)

Constructs a series that is true (1.0) before the specified date-time and false (0.0) thereafter, with interpolation="extend". The resulting time series data value will contain either 3 data points (if the specified date-time is inside the time domain) or 2 data points (if the specified date-time is outside or on a boundary of the time-domain).

TSDATETIME

TSDATETIME(data_series, daylight_saving, output_interpolation)

Constructs a series with time stamps that match the points in the input data series and values that are the numeric equivalents of these time stamps. If daylight saving is non-zero, daylight saving adjustment is applied when converting the time stamps to numeric values.

The numeric equivalent of a time stamp is the number of days (including fractional days) since 00:00 30-Dec-1899. This is not usually of direct interest but can be used for:

Daylight saving adjustment should not be applied if the values are to be used for calculating time differences. It is intended for use in converting these values to time of day or day of week in local clock time.

The resulting time series uses the specified output interpolation, which must be one of

TSLAG

TSLAG (input_series, lag_seconds)

Constructs a series containing the same data points as the input series, but with time stamps shifted forward (lagged) by the specified number of seconds. A negative lag may be specified to shift the time stamps backwards. When used in a TVD connector expression, the resampling buffer for the TVD connector (in minutes) should be set to be at least as large as the lag time (in seconds) so that data from before the start of the simulation can be shifted into the simulation period (otherwise, the first lag_seconds of the simulation period will contain null data).

TIDE

TIDE(constituent_name, amplitude, phase_lag, timestep)

Constructs a time series for a specified tidal constituent, amplitude and phase lag, using linear interpolation.

The first output data point is at the start of the time domain and the last is at or immediately after the end of the time domain.

The choice of whether to reference phase lag to local standard time or GMT is a matter of convention. When using the TIDE function for time zones other than GMT, it is important to be clear that TIDE uses the convention that phase lags are referenced to local standard time. If a phase lag (G) referenced to GMT is provided, it can be converted to a phase lag (g) referenced to local standard time in time zone (UTC + N hours) as follows:

Where a is the angular speed of the constituent in degrees/hour.

Notes on using NULL in expressions

The behaviour of NULL is slightly complex:

  1. Essentially NULL is an 'I have no value' value.
  2. The general rule for SQL is that any operation involving anything and NULL gives the answer NULL e.g. 3 + NULL is NULL. Important exceptions to this rule are the operators OR, IS NULL and IS NOT NULL and the function NVL (OR treats NULL as false, while NVL is used to detect and replace NULL values).