Sessions
For basic health tracking Sentry accepts envelopes containing session update events. These session update events can be used to inform Sentry about release and project associated project health.
Note: When working with Sessions locally, make sure that you update your configuration file ~/.sentry/sentry.config.py
with a necessary environment variable:
SENTRY_EVENTSTREAM = 'sentry.eventstream.kafka.KafkaEventStream'
Basic Operation
- Sessions are entirely client driven. The client determines when a session
starts, ends or transitions into an unhealthy state.
- The client can explicitly end a session to record time or exit condition (crash etc.)
- Clients should explicitly end sessions on restart if needed but it is acceptable for a session not to end.
- Sessions are updated through session change events that hold the entire session state.
- Sessions are updated from events sent in. The most recent event holds the entire session state. The initial session event that is sent to the server is marked explicitly.
- Session updates must not change the attributes or data corrupts when materialized. See the section below on Attribute Immutability.
- Sessions can only be updated for a period of 5 days. If a session did not receive a second event in 5 days it's permanently good.
- A session does not have to be started in order to crash. Just reporting a crash is sufficient.
Server Model
At present Sentry's session system is optimized towards ease of scalability and cost of operation. This means that the protocol is heavily geared towards achieving this goal. Some of these optimizations show in the protocol and it's important for the client to follow the protocol accurately to avoid creating bad data on the server.
The server has hourly buckets of pre-materialized session data. As a session update event comes in the server will immediately materialize the data into the correct bucket. This means that the protocol is restricted to being "additive". This also means that the client needs to store the entire state of the session on its side.
Session Update Payload
A session update is an item in an envelope called session
. It consists of a JSON
payload that looks roughly like this:
{
"sid": "7c7b6585-f901-4351-bf8d-02711b721929",
"did": "optional distinct user id",
"init": true,
"started": "2020-02-07T14:16:00Z",
"duration": 60,
"status": "exited",
"attrs": {
"release": "my-project-name@1.0.0",
"environment": "environment name",
"ip_address": "optional user ip address for filtering",
"user_agent": "optional user agent for filtering"
}
}
Note that this must be enclosed in an envelope. So the full event looks something like this:
{}
{"type":"session"}
{"sid":"..."}
The following fields exist:
sid
- String, optional. Session ID (unique and client generated).
Clients are allowed to skip it if the initial session state is
exited
.
did
- String, optional. The distinct ID. Should be a device or user ID. The system automatically hashes this ID before storing it.
seq
- Number, optional. A logical clock. Defaults to the current UNIX timestamp
in milliseconds during ingestion. The value
0
is reserved in the sense that a session withinit
set totrue
will automatically haveseq
forced to0
.
timestamp
- String, optional. The timestamp of when the session change event came in.
Must be an ISO DateTime string. If not sent, the server will assume the current
UTC timestamp. In the data model, this is called
received
.
started
- String, required. Timestamp when the session started. Must be an ISO DateTime string.
init
- Boolean, optional, default is
false
. If this is set totrue
it means that this was the first event of the session. This lets the server optimize the session counts because no deduplication is needed (client is authoritative anyways). Internally when this flag is setseq
is changed to0
on processing.
duration
- Number, optional. An optional field that can transmit the session duration when the event was received. This can be client controlled so, for instance, inactive time can be subtracted (seconds as float).
status
- String, optional, default is
ok
. The current status of the session. A session can only be in two states effectively:ok
which means the session is alive or one of the terminal states. When a session is moved away fromok
it must not be updated anymore.
ok
: The session is currently in progress but healthy. This can be the terminal state of a session.exited
: The session terminated normally.crashed
: The session terminated in a crash.abnormal
: The session encountered a non crash related abnormal exit.
errors
- Number, optional, default is
0
. A running counter of errors encountered while this session was ongoing. It's important that this counter is also incremented when a session goes tocrashed
. (eg: the crash itself is always an error as well). Ingest should forceerrors
to 1 if not set or 0.
attrs
- Object, required all keys but
release
optional. An object with the following attributes:
release
: The sentry release ID (release
), suggested formatmy-project-name@1.0.0
.environment
: The sentry environment (environment
).ip_address
: The primary IP address to be considered. This is normally the IP of the user. This data is not persisted but used for filtering. If not set the IP is filled in automatically.user_agent
: The user agent to be considered. This is normally the user agent of the user that caused the session. This data is not persisted but used for filtering.
Session Aggregates Payload
Especially for request-mode sessions (see below), it is common to have thousands of requests, and thus sessions, per second.
Under the assumption that these sessions will be short-lived, and tracking their duration is not desired, these sessions can be aggregated together on the SDK side, before they are sent to Sentry.
An SDK should aggregate closed sessions and group them by their started
time,
distinct_id
and their attrs
. These groups will be sent as sessions
envelope item.
It consists of a JSON payload that looks roughly like this:
{
"aggregates": [
{
"started": "2020-02-07T14:16:00Z",
"exited": 123
},
{
"started": "2020-02-07T14:16:00Z",
"did": "optional distinct user id",
"exited": 12,
"errored": 3
}
],
"attrs": {
"release": "my-project-name@1.0.0",
"environment": "development"
}
}
Note that this must be enclosed in an envelope. So the full envelope looks something like this:
{}
{"type": "sessions"}
{"aggregates": [...], "attrs": {...}}
aggregates
- Array, required. An array of aggregates grouped by their
started
timestamp and distinct id (did).
started
: Required. Timestamp of the group, rounded down to the minute. Must be an ISO DateTime string.did
: Optional. The distinct user id of the group.exited
: Optional. The number of sessions with status"exited"
without any errors.abnormal
: Optional. The number of sessions with status"abnormal"
.crashed
: Optional. The number of sessions with status"crashed"
.errored
: Optional. The number of sessions with status"exited"
that had a non-zeroerrors
count.
attrs
- Object, required. See above.
Crashes vs Sessions
Sessions and error events are two distinct systems within Sentry. Session updates can be done without error events to be sent and likewise, errors can be sent without session updates.
This gives the client full control over how session updates should be performed. The motivating factor is that the server is free to reject error events in some situations where it would still be interesting to record session information. For instance, if a project has a rate limit applied on error events they session data can still be routed to the project bypassing this rate limit.
However, it's strongly recommended to send session updates in the same envelope as the crash event in case the session transitions to the crashed status. This will ensure that the events arrive at the same time in the system if the network is unreliable.
Important Client Behavior
These are important rules that the clients must follow:
Attribute Immutability
It's currently not allowed for a session to change any of the attributes
in subsequent updates which includes the did
, started
or other attributes.
The only attributes which are allowed to change are the session status, duration
or error count. If a user is not known in the beginning then either the session
start should be delayed or the session should be restarted once the user is
known.
Session Counting / init
It's crucial that the initial session update sent to the system has init
set
to true
. This is necessary because the server currently does not
deduplicate the total session count as an optimization. If the initial
init: true
flag is missing, the session might not be ingested correctly by Sentry.
Terminal Session States
A session can exist in two states: in progress or terminated. A terminated
session must not receive further updates. exited
, crashed
and abnormal
are all terminal states. When a session reaches this state the client must
not report any more session updates or start a new session.
SDKs are encouraged to distinguish between different ways of ending a session:
exited
: this means the session ended cleanly. It's in no way different than the session staying inok
from the point of success reports. However only sessions ending inexited
will be considered for session durations. A session is allowed to go toexited
even if errors occurred.crashed
: a session should be reported as crashed under the following cases:- an unhandled error occurred and there is a natural session end (eg: end of HTTP request)
- a complete crash of the application occurred (crash to desktop, termination)
- a user feedback dialog is surfaced to the user. After this, the SDK must start a new session as if it fully crashed.;
abnormal
: SDKs are encouraged to always transition a session toexited
orcrashed
if they can do so. For SDKs that are capable of always ending sessions, they should end a session inabnormal
if they could not detect the application shutting down correctly. Examples for abnormal sessions:- a computer was shut down / lost power
- the user force closed an application through
kill -9
or the task manager
Abnormal session ends are normally to be recorded on application restart.
Crashed, Abnormal vs Errored
A session is supposed to transition to crashed
when it encountered an unhandled
error such as a full application crash. For applications that cannot fully
crash such as a website, it's acceptable to transition to the crashed state if
the user encountered an error dialog. For server environments where we create sessions
for every incoming request, crashed
is basically like status code 500
internal server error.
So if there is an unhandled error happening during the request, the session should be crashed
.
Abnormal are sessions of which their fate is unknown. For desktop applications, for instance, it makes sense to transition a session to abnormal if it was stored
but the exit of the application was not observed but also did not crash. These are situations where the user forced the app to close via the
task manager, the machine lost power or other situations. A session can be
stored by persisting it to disk eagerly. This saved file can be detected on
application restart to close the session as abnormal
.
Errored sessions are determined by an errors
counter greater than zero.
The client is required to count events that are considered errors and send the
count along with session updates. A session that is ok
and has an error
count of greater than zero is considered an errored session. All crashed and
abnormal sessions are also at all times considered errored but subtracted from
the final errored session count.
Exited
A session can transition to exited
which is exactly the same state as ok
with one difference: sessions transitioned to exited
have their session
duration averaged. This lets Sentry show you the duration of non crashed
sessions over time.
Alerts
Trigger an alert when an issue affects a specified percent of sessions. Create a new issue alert and select the "When" condition "An issue affects more than {X} percent of sessions". See more issue alert options in the Issue Alert Configuration docs.
SDK Considerations
Generally speaking, there are two separate modes for health reporting that SDKs can use. One is very short lived sessions, the other is user attended sessions.
Short lived sessions (server-mode / request-mode)
These sessions roughly correspond to HTTP requests or RPC calls in a server setting.
- high in volume, typically one session for each request
- the number of sessions is usually higher than the number of sentry events
- sessions are attached to a single hub / concurrency unit
- timing information is typically useless because session time in the milliseconds
User attended sessions (user-mode / application-mode)
These are sessions that are more corresponding to an actual user session or application run. This is what you would see in a web browser, mobile world, command line application or similar.
- typically just a single session from application start to quit
- when applicable, sessions can end once the app is put on the background for over 30 seconds (mobile SDKs)
- there are usually fewer sessions than Sentry events
- sessions span multiple hubs / threads
- session duration typically in the minutes, timing information is useful
Both of those cases look similar from the API point of view but different recommendations apply for SDKs.
Choosing the Session Mode
While it is in theory possible to use both session modes in a single application, it is recommended that the SDK default to a single mode that is most appropriate to the main use case of the language ecosystem. This is similar to and can be used in the same way as a global Hub mode that certain SDKs support.
When the SDK is configured to use user-mode sessions or global Hub mode, a single session should be started at the start of the application and should persist through the application's runtime. Depending on the SDK internals, this single session can be shared among all application threads and thread-local Hubs.
When using server-mode sessions, no application-wide session will be started, and it is up to integrations or the user to start the session when the request is received and end it when a response is returned.
Unified API Implications
The Unified API that SDKs should
adhere to defines the concepts of Hub
, Scope
and Client
.
Conceptually speaking, the session is a concern of the Hub
, and unlike scopes,
sessions should not be nested. When any kind of event happens, there should be
only one unambiguous session that keeps track of the error count.
When considering the flow of events through the SDK, from the static
capture_event
function, through the thread local Hub
, and into the
Client::capture_event(event, scope)
method; depending on the internal
implementation details of the SDK, it might make sense to attach the session to
the Scope
, which would make it possible for the Client
to bundle an event
and a session update into a single envelope to be sent to Sentry.
Session Updates and when to send Updates upstream
For all SDKs, the current session shall be automatically updated whenever data is
captured at a similar place where apply_to_scope
is called to increase the
error
count, or update the session based on the distinct ID / user ID.
SDKs should generally aim to decrease the number of envelopes sent upstream.
server-mode SDKs that track a great number of sessions should consider using
a periodic session flusher (every 60 secs) that pre-aggregates sessions into a
single session_aggregates
envelope item.
user-mode SDKs may instead opt into sending session updates along with captured events in the same envelope. The final session update that closes the session may be batched similar to server-mode sessions.
In either case, the init
flag must be set correctly for the first
transmission of the session, and session metadata such as the distinct ID must
be immutable after the initial transmission.
Pre-aggregation of Sessions
If an SDK is configured to use server-mode sessions, it should group and
pre-aggregate session counts before sending them to Sentry. Whenever a session
is being closed (transitions to a terminal state), and it was not previously
being sent upstream (its init
flag would be true
), it is eligible for
aggregation, which is performed like so:
- The
started
timestamp of the session should be rounded down to minutes. - The session must then be aggregated into the bucket identified by that rounded
timestamp, and the sessions distinct id (
did
). - In the appropriate bucket, increase the session count corresponding to the
sessions status. Contrary to individual session updates, the
"errored"
state is used to mark sessions that have the"exited"
state, and a non-zeroerrors
count.
Exposed API
The most basic API exposed is on the hub level and lets you start and stop session recording:
API:
Hub.start_session()
Stores a session on the current scope and starts tracking it. This normally attaches a brand new session to the scope, and implicitly ends any already existing session.
Hub.end_session()
Ends the session, setting an appropriate
status
andduration
, and enqueues it for sending to Sentry.
Hub.start_auto_session_tracking()
/ Hub.stop_auto_session_tracking()
Stops and reactivates automatic session tracking.
Init Options:
auto_session_tracking
This enables / disables automatic session tracking through integrations.
SDK Implementation Guideline
WIP
We track the health of each release of projects in Sentry by sending session payloads from SDKs. The session payload provide data such as session duration and the presence or absence of errors/crashes.
SDKs track sessions in one of two modes:
- Single Session
- Session Aggregates
Single session is the general case, and is a good fit for (relatively short-lived) applications that are typically involving only a single user. Examples:
- command-line utility like
craft
; every execution of acraft
subcommand report a single session to Sentry - user interacting with a mobile app
- user loading a web site with their favorite browser
Session aggregates are used when sending individual sessions would be undesirable or unpractical. To constrain resource usage (namely memory and network), SDKs keep track of summary information about a batch of sessions that occured in the recent past, never actually having to deal with session objects representing the individual sessions that make up the aggregate. This mode is the choice for applications that run for an arbitrarily long time and handle larger throughputs for potentially multiple users, such as web servers, background job workers, etc. Note that for those types of application, a better definition of session matches the execution of a single HTTP request or task, instead of a single execution of the whole application process.
In either case, SDKs should create and report sessions by default, choosing to report them individually or as aggregates depending on the type of application.
If an SDK can detect that an application is better served by session aggregates, then it must not report an application-wide session. The application-wide session may still be created during SDK initialization but must be aborted and never sent to Sentry. As an example, in the Node.js SDK, we can detect an application is probably a web server if it uses the requestHandler
integration that is provided.
Individual Session Functionality
Configuration
- On by default for global/static API; User should be able to disable sessions if they don't want to track them.
Pre-requisites for reporting sessions and determining Release Health of projects in Sentry, such as release should be automatically detected by the SDK such as by looking up env vars.
(Maybe, needs discussion) if I pre-requisite cannot be detect (for example, no good way to determine release version), then we set some default value so that we can always report sessions by default (depending on discussion, this might not be a change in SDK code, but in Relay, basically removing hard-requirements in the session payload).
Lifetime of a Session
Sessions should be enabled by default only for the global hub/client that is initialized by Sentry.init, and disabled by default for any other manually created client. A session is started when the SDK is initialized (ideally when the default client is bound to the global hub) and ended when one of these conditions happen: The Hub.endSession() method is explicitly called; or The program terminates without errors; or The program terminates with an unhandled exception; or The program terminates with an unhandled promise rejection.
Care must be taken to never attempt to send new session payloads to Sentry for a session that is already ended. For example, if the user manually ends the session with Hub.endSession(), there should not be any new updates to the session when the program terminates.
Session Atrributes and Mutability
Sending Session to Sentry
Session is sent initially after a certain (initially hard-coded, less config is better) delay (something between 1s to 30s TBD), and then updated with the duration and final status and error count when the program terminates. Note that, as an optimization, short lived programs will not send 2 session requests to Relay, but only the final one with status and duration.
Session Aggregates Functionality
Configuration
Sessions should be enabled by default, a session is started as soon as a request is received by a web server and ends as soon as the response is fully sent back.
Lifetime of a Session
Sessions are never tracked nor sent individually, instead they are aggregated and the aggregates are sent every 30s and a final time when the web server is terminating. As an implementation hint to the point above, when a "Client" is closed or flushed the associated "Session Flusher" shall also be flushed and submit the current aggregates the the transport, before the transport is flushed/closed. Make sure this works reasonably for Serverless — there we shall not use "request mode" and SessionFlusher because we cannot have any work that happens outside of the request-response flow. Provide an easy way to integrate with existing Node frameworks (Express, Next.js, Koa).
Session update filtering
Our filtering mechanisms (sample rate, rate limiting, beforeSend, event processors, or ignored exception types) may drop events. Only events dropped due to sampling and rate limiting should update the session despite being dropped, as we assume the event was dropped to save quota, but it would have been something the developer cares about. Events dropped for other reasons should not update the session, as we assume they are more likely to be dropped because the developer chooses to ignore them.
Filter order
The python SDK shall serve as a reference here. The order for filtering error events is:
- Check for ignored exception types (a.k.a
ignore_errors
) - Apply scoped
event_processor
(a.k.aerror_processor
) - Apply global
event_processor
- Apply
before_send
- Update the session if an event made it this far
- Apply sampling rate
Sending the session update
If the event has been dropped and the session updated, the session update should be sent to the server without the event in case the session changed from healthy to errored or from any state to crashed for user attended sessions.