Notes on time handling

UTC vs. local time

Initializing Event objects

All of the Event variants can be initialized with either UTC milliseconds since the epoch or an aware Python datetime object. If a naive datetime object is passed in, an exception will be raised. When passing in a datetime object, it is encouraged that it be in UTC as well.

If an aware datetime object in a non-UTC (local) time zone is passed in, a warning will be issued and the object will be converted to a UTC datetime object, which is what will be used internally. The conversion is done using the Python datetime library and the third-party pytz library, like this:

    dtime.astimezone(pytz.UTC)

The resulting datetime object will then be converted to milliseconds (see the section on precision).
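
The checks described above can be sketched with the standard library alone. This is an illustration under assumptions, not PyPond's actual code: to_epoch_ms is a hypothetical helper, datetime.timezone.utc stands in for pytz.UTC, and the exception type, the warning text and the rounding mode are all guesses.

```python
import warnings
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dtime):
    """Hypothetical helper: turn an aware datetime into UTC epoch milliseconds."""
    # A naive datetime has no UTC offset, so it cannot be placed on the timeline.
    if dtime.utcoffset() is None:
        raise ValueError('naive datetime objects are not accepted')
    # Aware but not UTC: warn, then convert (stdlib stand-in for pytz.UTC).
    if dtime.utcoffset() != timedelta(0):
        warnings.warn('non-UTC datetime converted to UTC')
        dtime = dtime.astimezone(timezone.utc)
    # Assumed rounding mode: nearest millisecond.
    return round((dtime - EPOCH) / timedelta(milliseconds=1))


# Midnight in a fixed UTC-7 zone is 07:00 UTC on the same day.
pacific = timezone(timedelta(hours=-7))
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    local_ms = to_epoch_ms(datetime(2015, 7, 14, tzinfo=pacific))
utc_ms = to_epoch_ms(datetime(2015, 7, 14, 7, tzinfo=timezone.utc))
```

Both calls describe the same instant, so local_ms and utc_ms are equal. Passing datetime(2015, 7, 14) with no tzinfo raises the ValueError.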

This is for consistency, for parity with the JavaScript Date library that uses epoch ms at its core, and because the Pond/PyPond wire format relies on epoch ms. You really can’t go wrong by reporting all of your events using milliseconds since the epoch from the start. Please consider doing that.

Rendering in local time

Of course, there are cases where it is desirable to represent time series data in the user’s local time zone, such as in a graphing application. Even though PyPond only does business in UTC internally, this is possible. It is controlled by how you window and aggregate the data.

See the section on Aggregation in the main Pond Pipeline documentation, and note how the .windowBy() (.window_by() in PyPond) value is specified in the pipeline chain. This can be a fixed value like 1d, which will aggregate the data into daily buckets. Fixed windows like that can only be rendered in UTC. Or it can be a non-fixed value like daily, which will also aggregate the data into daily buckets, but in that case the user can choose how to render the data.

Rendering always defaults to UTC; the user will always need to set utc=False where appropriate. When using a non-fixed window, the optional utc boolean can be set:

    kcol = (
        Pipeline()
        .from_source(timeseries)
        .window_by('daily', utc=False)
        .emit_on('eachEvent')
        .aggregate({'in': Functions.avg(), 'out': Functions.avg()})
        .to_keyed_collections()
    )

Then the aggregation keys/buckets will be daily averages in the local time zone.
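
To see why the choice matters, here is a standard-library sketch of daily bucketing that does not use PyPond. The daily_averages helper and the sample points are hypothetical, and a fixed UTC-7 offset is assumed as a stand-in for the local zone.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7))  # assumed fixed offset standing in for the local zone


def daily_averages(points, tz):
    """Group (epoch_ms, value) pairs by calendar day in tz and average each group."""
    buckets = defaultdict(list)
    for ms, value in points:
        day = datetime.fromtimestamp(ms / 1000, tz).strftime('%Y-%m-%d')
        buckets[day].append(value)
    return {day: sum(vals) / len(vals) for day, vals in buckets.items()}


points = [
    (1436832000000, 10.0),  # 2015-07-14 00:00 UTC == 2015-07-13 17:00 at UTC-7
    (1436857200000, 20.0),  # 2015-07-14 07:00 UTC == 2015-07-14 00:00 at UTC-7
]
utc_days = daily_averages(points, timezone.utc)
local_days = daily_averages(points, PDT)
```

In UTC both points land in the 2015-07-14 bucket and are averaged together. At UTC-7 they fall on two different calendar days, so each bucket holds a single value.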

There is also a trio of helper functions in the TimeSeries class that provide higher-level access to this functionality:

    TimeSeries.daily_rollup()
    TimeSeries.monthly_rollup()
    TimeSeries.yearly_rollup()

They all take a dict that maps column names to aggregation functions, as in the above example:

    TimeSeries.monthly_rollup({'in': Functions.avg(), 'out': Functions.avg()})

And the data will automatically be rendered in local time.

Conversion to local time

When the conversion covered in the previous section happens, the user has no control over what time zone it will be rendered to. All conversions will automatically happen using the local time zone as determined by the tzlocal library:

    LOCAL_TZ = tzlocal.get_localzone()

This is primarily for parity with the JavaScript library, which runs browser-side and localizes as appropriate. Moreover, it is not the purpose of this library to be a time-handling Swiss Army knife.
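
For readers who want to see what "the local time zone" means for a single value, the standard library offers a rough equivalent. This sketch does not use tzlocal and is not PyPond's code; it only shows that converting to the system zone changes how an instant is displayed, not the instant itself.

```python
from datetime import datetime, timezone

utc_dt = datetime(2015, 7, 14, 12, 0, tzinfo=timezone.utc)

# astimezone() with no argument converts to the system's local time zone.
local_dt = utc_dt.astimezone()

# Aware datetimes compare by instant, so the two values are still equal.
same_instant = local_dt == utc_dt
```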

Local time and the IndexedEvent class

The only Event class that explicitly takes a utc=False flag is the IndexedEvent class. It behaves somewhat differently from the Event and TimeRangeEvent classes, which do not. Rather than being initialized with an epoch ms timestamp or a datetime object, it is initialized with a string in one of the following formats:

        The index string arg may be of two forms:

        - 2015-07-14  (day)
        - 2015-07     (month)
        - 2015        (year)

        or:

        - 1d-278      (range, in n x days, hours, minutes or seconds)

        and return a TimeRange for that time. The TimeRange may be considered to be
        local time or UTC time, depending on the utc flag passed in.

A UTC conversion will still happen under the hood, just a little differently.
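
The range form can be read as "window size, then window number". The sketch below assumes that the number counts fixed-size windows since the epoch and that the range is half-open, which is how the Pond JavaScript library appears to treat it; range_from_index is a hypothetical helper, not PyPond's parser.

```python
import re

# Milliseconds per unit for the assumed unit letters: seconds, minutes, hours, days.
UNIT_MS = {'s': 1000, 'm': 60_000, 'h': 3_600_000, 'd': 86_400_000}


def range_from_index(index):
    """Hypothetical helper: return (begin_ms, end_ms) for a string like '1d-278'."""
    match = re.fullmatch(r'(\d+)([smhd])-(\d+)', index)
    if match is None:
        raise ValueError('not a range index: %r' % index)
    count, unit, position = int(match.group(1)), match.group(2), int(match.group(3))
    length = count * UNIT_MS[unit]
    begin = position * length  # assumed: windows are counted from the epoch
    return begin, begin + length


begin_ms, end_ms = range_from_index('1d-278')
```

Under these assumptions 1d-278 covers the 279th day-long window after the epoch, in UTC.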

Consider an Index (the underlying time-handling structure of IndexedEvent) initialized like this:

    utc = Index('2015-07-14')

That is a daily index, and it internally creates a range spanning that entire day. Looking at the internal timestamps yields this:

    print(utc.begin(), utc.end())
    2015-07-14 00:00:00+00:00 2015-07-14 23:59:59+00:00

But doing the same thing with utc=False (if you are in Pacific Time) yields this:

    local = Index('2015-07-14', utc=False)
    print(local.begin(), local.end())
    2015-07-14 07:00:00+00:00 2015-07-15 06:59:59+00:00

The time range is not internally held as spanning that day in the local time zone; it is converted to and held in UTC.
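
The numbers above can be reproduced without PyPond. In this sketch day_range is a hypothetical helper and a fixed UTC-7 offset is assumed for Pacific Time in July; the real Index class may compute its bounds differently.

```python
from datetime import datetime, timedelta, timezone


def day_range(index, tz=timezone.utc):
    """Hypothetical helper: UTC begin/end for a 'YYYY-MM-DD' index interpreted in tz."""
    begin = datetime.strptime(index, '%Y-%m-%d').replace(tzinfo=tz)
    end = begin + timedelta(days=1) - timedelta(seconds=1)
    return begin.astimezone(timezone.utc), end.astimezone(timezone.utc)


PDT = timezone(timedelta(hours=-7))  # assumed fixed offset for Pacific Daylight Time
begin, end = day_range('2015-07-14', PDT)
print(begin, end)
# 2015-07-14 07:00:00+00:00 2015-07-15 06:59:59+00:00
```

The day is defined in the local zone but both endpoints are stored as UTC datetimes, which matches the utc=False output shown above.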

This is yet another example of why it is preferred to input and store the data in UTC and view it in a localized way.

Precision

Internal timestamps are precise down to the millisecond even though the Python datetime object is precise down to the microsecond. This is primarily for parity with the JavaScript library, since the JS Date object is only accurate down to the millisecond. Unit testing showed that allowing microsecond accuracy exposed discrepancies between times that should have been “the same.”

It is perfectly fine to pass in Python datetime objects that have microsecond accuracy; just be aware that they will be rounded to milliseconds automatically.
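
A small standard-library sketch shows the effect. The to_ms helper is hypothetical, and rounding to the nearest millisecond is an assumption about the exact rounding mode.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ms(dtime):
    """Hypothetical helper: aware datetime to epoch ms, rounded to the nearest ms."""
    return round((dtime - EPOCH) / timedelta(milliseconds=1))


# Two datetimes that differ only below the millisecond level.
a = datetime(2015, 7, 14, 0, 0, 0, 123100, tzinfo=timezone.utc)
b = datetime(2015, 7, 14, 0, 0, 0, 123400, tzinfo=timezone.utc)

differ_as_datetimes = a != b
equal_as_ms = to_ms(a) == to_ms(b)
```

The two objects are unequal as datetimes but map to the same millisecond timestamp, which is the kind of "should have been the same" case the millisecond precision is meant to absorb.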