System Log API - Polling requests

We’re developing software that retrieves event logs from Okta and sends them to the Devo platform (devo.com).
Our software retrieves the messages every N seconds (this is configurable) and then exits. Before exiting, it saves the timestamp of the last message. To be completely sure that we don’t duplicate messages in our system, we also save the UUIDs of all messages with that same timestamp.
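The checkpoint described above could be sketched roughly as follows. This is a minimal illustration, not the actual collector: the helper names are hypothetical, and it assumes each event is a dict with `uuid` and `published` keys and that all `published` values share the same fixed-width ISO 8601 format (so that string comparison matches time order).

```python
def make_checkpoint(events):
    """Return (last_published, uuids_at_that_timestamp) for one batch.

    Assumes every "published" value uses the same fixed-width ISO 8601
    format, so comparing the strings gives chronological order.
    """
    if not events:
        return None, set()
    last = max(e["published"] for e in events)
    return last, {e["uuid"] for e in events if e["published"] == last}


def drop_seen(events, seen_uuids):
    """Filter out events whose UUID was already stored in the checkpoint."""
    return [e for e in events if e["uuid"] not in seen_uuids]


# Hypothetical batch from a previous run.
previous = [
    {"uuid": "a", "published": "2020-05-04T09:46:45.108Z"},
    {"uuid": "b", "published": "2020-05-04T09:46:46.000Z"},
    {"uuid": "c", "published": "2020-05-04T09:46:46.000Z"},
]
last_ts, last_uuids = make_checkpoint(previous)

# The next run re-reads from last_ts and may see "b" and "c" again.
next_batch = [
    {"uuid": "c", "published": "2020-05-04T09:46:46.000Z"},
    {"uuid": "d", "published": "2020-05-04T09:46:47.250Z"},
]
fresh = drop_seen(next_batch, last_uuids)
```

Storing only the UUIDs that share the last timestamp keeps the checkpoint small while still covering the boundary second.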

When we launch our software again, we send a request in this format:
{{url}}/api/v1/logs?since={{timestamp_with_iso_8601_format}}

We have noticed that we are receiving messages that were “published” before the value used in the “since” parameter.

This is our scenario:
1. We saved the timestamp: 2020-05-04T09%3A46%3A46Z
2. We send a request like this: {{url}}/api/v1/logs?since=2020-05-04T09%3A46%3A46Z
3. We receive messages whose “published” field is earlier than that value, such as “published”: “2020-05-04T09:46:45.108Z”
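One way to spot such events on the client side is sketched below. The function names are hypothetical, and the sketch assumes each event is a dict with a `published` key in ISO 8601 form. It only reports the early events; whether to keep or drop them is a separate decision.

```python
from datetime import datetime


def parse_ts(value):
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def published_before_since(events, since):
    """Return the events whose "published" time is earlier than `since`."""
    cutoff = parse_ts(since)
    return [e for e in events if parse_ts(e["published"]) < cutoff]


# Values taken from the scenario above, plus one hypothetical later event.
batch = [
    {"published": "2020-05-04T09:46:45.108Z"},
    {"published": "2020-05-04T09:46:46.500Z"},
]
early = published_before_since(batch, "2020-05-04T09:46:46Z")
```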

Yes, I had the same problem. Okta even opened an internal case to get it fixed, but that was a long time ago… I guess your post confirms that some things never change :slight_smile:

I suggest you open a ticket with Okta Support

I’ve followed your suggestion

Hi @felipe.conde (and @phi1ipp)

There is no guarantee that events will propagate through the Okta system log pipeline in the order they occurred. With this in mind, it is very important to follow the guidance for polling requests in the System Log API docs.

It also sounds like you are building a third-party Security Analytics integration, so I would strongly encourage you to follow the guidance for Security Analytics Integrations. That doc even includes pseudocode for performing monotonic polling of events from the System Log.

Hope this helps.

-Matt

Hi @matt.egan,

Thank you for finding time to answer the question. But I think the problem here is that the API produces output that does not match the query parameters. In what @felipe.conde observed, the difference is a fraction of a second, but in my experience Okta may return results from a few seconds before the given timestamp.

I’m not sure whether you are an Okta employee with access to support tickets, but if you are, you can have a look at ticket #00650356 and its resolution.

@phi1ipp - I was looking at the ask behind the ask.

It is clear to me that Devo is building a log collector, is aiming for monotonic polling, and is asking for high-precision time filters to reduce duplication. In this use case, regardless of precision, attempting to use a moving time window will absolutely result in missed events.

I can dig around to see if I can find an explanation for the lack of fidelity when using the since param.

For context (and education), what is the use case this lack of precision is blocking?

Thanks,
-Matt

@matt.egan, I was trying to build a system that tracks user creation/modification in order to replicate the transactions into local storage. The customer didn’t want to go with the OPP/SCIM approach for cost reasons, or with hooks for some other reasons. I can’t recall all the details, getting old :slight_smile:

Thank you for looking into that. I also appreciate you sharing the polling approach; I think I might use it as an alternative next time I need to revisit this area.

Cheers!

Thanks for pointing it out - as I looked at the details I recalled that I’d noticed this before.

I had a brief convo with the team responsible - they agreed to take a closer look at the behavior so they can address it or just clarify in the docs. I’ll try and relay the response here.

That said - for the use case you mention (and almost any other I can think of) the best thing to do is to retain the next or the self link from the link headers returned from Okta and use those as the starting point of your subsequent interval polling.

Use the next link if you have the ability to evaluate the number of results you’ve just received versus the limit you’ve specified. If the number of results you receive is less than you asked for, you can be rather certain that following the next link will return an empty response.

Use the self link if you just want to wait until you get an empty page.
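A rough sketch of the link-based bookkeeping described above is shown below. It assumes the link headers arrive as a single string of the form `<url>; rel="self", <url>; rel="next"`; the function names and the example URLs are hypothetical, and the HTTP call itself is left out.

```python
import re

# Matches one '<url>; rel="name"' entry inside a Link header value.
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(header_value):
    """Map each rel name (e.g. "self", "next") to its URL."""
    return {rel: url for url, rel in _LINK_RE.findall(header_value)}


def plan_next_poll(header_value, received, limit):
    """Return (bookmark_url, caught_up) for the page just fetched.

    bookmark_url: the next link if present, otherwise the self link.
    caught_up:    True when fewer results than `limit` came back, which
                  suggests the next link would currently be empty.
    """
    links = parse_link_header(header_value)
    bookmark = links.get("next") or links.get("self")
    return bookmark, received < limit


# Hypothetical header value for a page requested with limit=2.
header = (
    '<https://example.okta.com/api/v1/logs?limit=2>; rel="self", '
    '<https://example.okta.com/api/v1/logs?limit=2&after=abc>; rel="next"'
)
bookmark, caught_up = plan_next_poll(header, received=1, limit=2)
```

A collector would persist `bookmark` between runs in place of a timestamp, and keep fetching immediately only while `caught_up` is false.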

-Matt

Thank you for the insights, @matt.egan. It’s really good to know!

Quick update here:

This is caused by the difference in time reference between “polling” (open ended) and “bounded” API requests.

For an open-ended (polling) request, the API uses the “insertion timestamp” to filter records, to avoid missing events during ingestion. Events are sorted by “insertion timestamp”, so they may come out of order with respect to “publication time”.

A bounded request uses the “publication timestamp” to filter records. Events are sorted by “publication time”.
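To make the two request shapes concrete, here is a small sketch of a URL builder. It assumes that a bounded request is expressed by adding an `until` parameter next to `since`; that parameter name is not mentioned in this thread, so treat it as an assumption to check against the System Log API docs. The function name and host are hypothetical.

```python
from urllib.parse import urlencode


def logs_url(base_url, since, until=None):
    """Build a System Log URL.

    With only `since`, the request is open-ended (polling).
    With `until` as well (assumed parameter name), it is bounded.
    """
    params = {"since": since}
    if until is not None:
        params["until"] = until
    # urlencode percent-encodes ':' as '%3A', matching the thread's example.
    return f"{base_url}/api/v1/logs?{urlencode(params)}"


polling = logs_url("https://example.okta.com", "2020-05-04T09:46:46Z")
bounded = logs_url(
    "https://example.okta.com",
    "2020-05-04T09:46:46Z",
    until="2020-05-04T09:50:00Z",
)
```

Given the explanation above, only the bounded form should be expected to return events strictly within the requested “published” range.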