Hi guys,
My company uses Okta for a range of applications, and for compliance reasons we have been tasked with backing up the system logs on a daily basis. We have chosen to use the new System Log beta REST API to retrieve the logs; however, we seem to have run into an issue: there appears to be no effective way to do this with the current API.
The system log documentation states:
You can export your log events to a separate system for analysis or compliance. To obtain the entire dataset, query from the appropriate point of time in the past.
And provides the following example:
“https://{yourOktaDomain}.com/api/v1/logs?since=2017-10-01T00:00:00.000Z”
Simple enough, except that this generates an infinite number of rel="next" pagination links, as specified in the design principles documentation:
When you first make an API call and get a cursor-paged list of objects, the end of the list will be the point at which you do not receive another next link value with the response. This holds true for all but two cases:
Events API: The next link always exists, since the Events API is like a stream of data with a cursor.
System Log API: The next link will always exist in polling queries in the System Log API. A polling query is defined as an ASCENDING query with an empty or absent until parameter. Like in the Events API, the polling query is a stream of data.
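For context, this is roughly what we mean by following the next links. It is only a sketch: fetch is a hypothetical helper of ours (not part of any Okta SDK) that performs the GET and returns the decoded events together with the raw Link header values, and max_pages is our own safety guard, since a polling query never stops returning a next link.

```python
import re

# Matches entries such as: <https://example/api/v1/logs?after=...>; rel="next"
LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_link(link_headers):
    """Return the rel="next" URL from a list of Link header values, or None."""
    for header in link_headers:
        for url, rel in LINK_RE.findall(header):
            if rel == "next":
                return url
    return None


def follow_next_links(start_url, fetch, max_pages=1000):
    """Yield events page by page by following rel="next" links.

    fetch(url) is a hypothetical callable returning (events, link_headers).
    max_pages is our own guard: in a polling query the next link never
    disappears, so without it this loop would not terminate.
    """
    url = start_url
    pages = 0
    while url is not None and pages < max_pages:
        events, link_headers = fetch(url)
        yield from events
        url = next_link(link_headers)
        pages += 1
```

With a bounded query the loop ends when no next link comes back; with a polling query only the max_pages guard stops it, which is exactly our problem.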
So, to make a non-polling query and get around this, we would have to use the since and until parameters to split the data into manageable chunks so that we do not hit the limit of 60 API calls per minute. No problem, except that this can return duplicates or miss events entirely, according to the System Log documentation:
Do not attempt to transfer data by manually paginating using since and until as this may lead to skipped or duplicated events. Instead, always follow the next links.
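To be concrete about the chunking we had in mind (and which the documentation warns against), here is a minimal sketch. The base URL is a placeholder, the helper names are our own, and the one-day window size is just an example. We do not know how the API treats events that fall exactly on a window boundary, which is presumably where the skipped or duplicated events would come from.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def iso_z(dt):
    """Format a timezone-aware datetime like 2017-10-01T00:00:00.000Z."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{dt.microsecond // 1000:03d}Z"


def windows(start, end, step):
    """Yield consecutive (since, until) pairs covering [start, end)."""
    cur = start
    while cur < end:
        nxt = min(cur + step, end)
        yield cur, nxt
        cur = nxt


def window_urls(base_url, start, end, step):
    """Yield one bounded (non-polling) query URL per window."""
    for since, until in windows(start, end, step):
        query = urlencode({"since": iso_z(since), "until": iso_z(until)})
        yield f"{base_url}?{query}"


if __name__ == "__main__":
    start = datetime(2017, 10, 1, tzinfo=timezone.utc)
    # Placeholder domain; substitute your own Okta domain.
    for url in window_urls("https://example.okta.com/api/v1/logs",
                           start, start + timedelta(days=3), timedelta(days=1)):
        print(url)
```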
Our last thought was to manually craft queries using the after parameter, to make sure we don't run into any issues with duplicates, but again this is not recommended in the documentation:
The after parameter is system generated for use in “next” links. Users should not attempt to craft requests using this value and rely on the system generated links instead.
With all of these opposing constraints, is collecting an unbroken chain of system logs by querying the System Log REST API on a daily basis currently achievable?
Thank you for any help in this matter!