Add remove method to synchronous instruments #4702

atoulme · 2025-10-25T07:14:41Z

Changes

Add the ability to remove a synchronous instrument identified by its attributes. The instrument will no longer report.


specification/metrics/api.md

dashpole · 2025-10-27T14:06:05Z

One thing that has been asked for is the ability to delete multiple series at once, since callers often don't have access to the complete list of attribute sets they've previously incremented. E.g. remove(http.target=foo) would remove all streams for that http.target. One way to solve this is to match every attribute set that contains the provided key-value pairs, ignoring any keys that weren't provided. Remove without any arguments would then match and remove all streams from the instrument. Unfortunately, it wouldn't be backwards-compatible to change this later, so we need to decide from the start whether this is important.
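A minimal sketch of the key-value matching semantics described above, assuming a hypothetical in-memory counter storage (`PartialMatchCounterStorage`) that models attribute sets as `Map<String, String>` instead of the OpenTelemetry `Attributes` type. A series is removed when its attribute set contains every provided key-value pair; keys that are not provided are ignored, so an empty match removes everything.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage for one synchronous counter; not an OpenTelemetry SDK class.
class PartialMatchCounterStorage {
  // Each distinct attribute set maps to its running sum.
  private final Map<Map<String, String>, Long> series = new HashMap<>();

  void add(long value, Map<String, String> attributes) {
    series.merge(Map.copyOf(attributes), value, Long::sum);
  }

  // Removes every series whose attribute set contains all key-value pairs in `match`.
  // Keys absent from `match` are ignored, so an empty match removes every series.
  // Returns the number of series removed.
  int remove(Map<String, String> match) {
    int before = series.size();
    series.keySet().removeIf(attrs -> attrs.entrySet().containsAll(match.entrySet()));
    return before - series.size();
  }

  int seriesCount() {
    return series.size();
  }
}
```

With this shape, `remove(Map.of("http.target", "foo"))` drops every series with `http.target=foo`, whatever its other attributes are.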

The other big question, which we need to resolve in the SDK spec PR, is how this impacts start time handling for cumulative metrics in the SDK. If an attribute set is deleted and recreated, the resulting metric data must have non-overlapping start-end time ranges since the cumulative value has been (presumably) reset. One way to solve this would be to require per-attribute-set start times in the SDK (#4184).
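One way to picture per-attribute-set start times (#4184) is a minimal sketch, assuming a hypothetical cumulative sum storage keyed by a `String` attribute-set identifier, an explicit `nowNanos` clock argument, and Java 16+ for records. Each series records the time of its first measurement as its start time, so a series that is removed and later recreated gets a fresh start time and its new range does not overlap the old one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cumulative sum storage with one start time per attribute set; not an SDK class.
class PerSeriesStartTimeStorage {
  // startNanos is the time of the series' first measurement; value is the running total since then.
  record CumulativePoint(long startNanos, long value) {}

  private final Map<String, CumulativePoint> series = new HashMap<>();

  void add(String attributes, long value, long nowNanos) {
    // A new (or recreated) series starts now; an existing series keeps its start time.
    series.merge(attributes, new CumulativePoint(nowNanos, value),
        (old, fresh) -> new CumulativePoint(old.startNanos(), old.value() + fresh.value()));
  }

  void remove(String attributes) {
    // Drops both the total and the start time, so recreation begins a new range.
    series.remove(attributes);
  }

  CumulativePoint get(String attributes) {
    return series.get(attributes);
  }
}
```

By contrast, with a single SDK-wide start time the recreated series would be reported with the original start time, so its range would overlap the range of the series that was removed.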

jack-berg · 2025-10-27T20:41:51Z

A more flexible version of @dashpole's suggestion might look like remove(Predicate<Attributes> predicate), where the predicate is invoked for each series, with usage in Java like:

```java
AttributeKey<String> httpRoute = AttributeKey.stringKey("http.route");
instrument.remove(attributes -> true); // Remove all series
instrument.remove(attributes -> "/v1/foo/bar".equals(attributes.get(httpRoute))); // Remove all series where http.route=/v1/foo/bar
instrument.remove(attributes -> { String route = attributes.get(httpRoute); return route != null && route.startsWith("/v1/foo"); }); // Remove all series matching pattern http.route=/v1/foo.*
```
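To make the predicate semantics concrete, here is a minimal runnable sketch, assuming a hypothetical storage class (`PredicateRemovalStorage`) that uses `Map<String, String>` in place of OpenTelemetry's `Attributes` so that it compiles without the API dependency. The predicate is invoked once per stored series and every series it accepts is dropped; the key-value matching suggested earlier in the thread becomes just one particular predicate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical storage for one synchronous instrument; not an OpenTelemetry SDK class.
class PredicateRemovalStorage {
  private final Map<Map<String, String>, Long> series = new HashMap<>();

  void add(long value, Map<String, String> attributes) {
    series.merge(Map.copyOf(attributes), value, Long::sum);
  }

  // Calls the predicate once per stored attribute set and drops every series it accepts.
  // Returns the number of series removed.
  int remove(Predicate<Map<String, String>> predicate) {
    int before = series.size();
    series.keySet().removeIf(predicate);
    return before - series.size();
  }

  int seriesCount() {
    return series.size();
  }
}
```

Note that predicates reading an attribute should tolerate its absence (the null checks in the usage above), since not every series necessarily carries `http.route`.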

> If an attribute set is deleted and recreated, the resulting metric data must have non-overlapping start-end time ranges since the cumulative value has been (presumably) reset. One way to solve this would be to require per-attribute-set start times in the SDK (#4184).

Yeah, for example, in Java we always use the same SDK start time for all cumulative series. Whether they see their first data at start time or days later, the start is always the same. I think your suggestion to track per-attribute-set start times in the SDK seems reasonable, but that seems to imply a behavior change from the single constant start time that we currently use in Java. Are there implications for this? Are there cases where a user with a cumulative backend would be upset to see new series with a start time corresponding to the window where data was first recorded?

jack-berg · 2025-10-27T20:46:14Z

Also, if I call something like instrument.remove(attributes -> true) to delete all series, it's not clear whether I intend to record additional data in the future. If I don't intend to record additional data, then I would probably view the fact that the SDK continues to have memory allocated to the instrument as a memory leak. But the SDK can't free up all the resources for that instrument without knowing for sure that I won't record again.

It makes me wonder if we need a top-level, instrument-level close / remove method, as well as the fine-grained method for removing specific series.
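A minimal sketch of the distinction, assuming a hypothetical `ClosableCounter` with `String` attribute keys: `remove` drops matching series while the instrument stays usable, whereas `close` is a promise not to record again, so all storage can be released. Silently dropping measurements after `close` is an assumption made for illustration; the thread does not specify it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical instrument contrasting series-level removal with instrument-level close.
class ClosableCounter {
  private Map<String, Long> series = new HashMap<>();
  private boolean closed;

  void add(String attributes, long value) {
    if (closed) {
      return; // assumption: measurements after close are silently dropped
    }
    series.merge(attributes, value, Long::sum);
  }

  // Fine-grained: removes matching series; the instrument keeps accepting measurements.
  void remove(Predicate<String> predicate) {
    if (!closed) {
      series.keySet().removeIf(predicate);
    }
  }

  // Coarse-grained: the caller promises not to record again, so all storage is released.
  void close() {
    closed = true;
    series = null;
  }

  int seriesCount() {
    return closed ? 0 : series.size();
  }
}
```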

dashpole · 2025-10-27T20:57:53Z

> Are there implications for this? Are there cases where a user with a cumulative backend would be upset to see new series with a start time corresponding to the window where data was first recorded?

I don't think it will impact Prometheus in any negative way (@ArthurSens might know more, since he opened the original issue).

Depending on how strict you want to be, it may make it harder to aggregate timeseries with different start timestamps. If you use the earliest start timestamp, you may be missing data and not produce an accurate cumulative for the entire time range.

carlosalberto · 2025-10-27T22:25:05Z

cc @jmacd

atoulme · 2025-10-28T06:33:05Z

> Makes me wonder if we need a top level instrument level close / remove method, as well as the fine grained method for removing specific series.

This can be added later and separately from this effort, from what I can tell.

atoulme · 2025-10-28T06:40:09Z

> If an attribute set is deleted and recreated, the resulting metric data must have non-overlapping start-end time ranges since the cumulative value has been (presumably) reset. One way to solve this would be to require per-attribute-set start times in the SDK (#4184).

My concrete use case is tied to a queue system where we report the count of events seen. See https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/ibm-mq-metrics/model/metrics.yaml#L216

When we lose contact with the queue manager, we can no longer report this information. If we get back in contact with the queue manager, we must recreate the time series with the new information, resetting the counter and its start time. We typically alert on delta changes, so we want the time series to be separate.

dashpole · 2025-10-28T20:06:07Z

> we must recreate the time series with the new information, resetting the counter and its start time. We typically alert on delta changes, so we want the time series to be separate.

How do you reset the start time? All SDKs I'm aware of in OTel today set the cumulative start time at instrument creation time and never reset it.

ArthurSens · 2025-10-28T20:17:34Z

> Are there implications for this? Are there cases where a user with a cumulative backend would be upset to see new series with a start time corresponding to the window where data was first recorded?

Actually, we would be happier to see this :) I opened #4184 a long time ago but never found the time to continue working on it. A start time per time series would help us provide more accurate increase rates.

atoulme · 2025-10-29T22:27:58Z

Since this is now sponsored, I am marking this PR ready for review. The discussion continues.

atoulme · 2025-10-29T22:28:36Z

> > we must recreate the time series with the new information, resetting the counter and its start time. We typically alert on delta changes, so we want the time series to be separate.
>
> How do you reset the start time? All SDKs I'm aware of in OTel today set the cumulative start time at instrument creation time and never reset it.

Let's put this in as a requirement, try it out in the POCs, and see how we fare.

github-actions · 2025-11-06T03:33:23Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

pellared · 2025-11-06T06:39:38Z

specification/metrics/api.md


+##### Remove
+
+Status: Development


Suggested change:

-Status: Development
+**Status**: [Development](../document-status.md)

pellared · 2025-11-06T06:39:46Z

specification/metrics/api.md


+##### Remove
+
+Status: Development


Suggested change:

-Status: Development
+**Status**: [Development](../document-status.md)

pellared · 2025-11-06T06:39:55Z

specification/metrics/api.md


+##### Remove
+
+Status: Development


Suggested change:

-Status: Development
+**Status**: [Development](../document-status.md)

pellared · 2025-11-06T06:50:19Z

specification/metrics/api.md

+
+Status: Development
+
+Unregister the Counter. It will no longer be reported.


(Assuming I'm not missing something) This is not true. This does not unregister the whole counter, but a data point (if I correctly remember/understand the terminology). Maybe it would be better to rename the operation to RemoveDataPoint so that we can add a Remove in the future that would unregister the whole instrument (and not only a given data point)?

The same comment applies to other instruments.

EDIT: I see similar comments like #4702 (comment) 😉

I agree we aren't unregistering the entire counter. I don't like RemoveDataPoint, as DataPoint is not really an API concept. I would suggest phrasing this as "Unregister the attribute set".

I'm wondering what the downsides would be of having this remove/unregister the entire Counter. If any attribute sets are still relevant, they get added back the next time something is reported for them.

I think it depends a bit on the implementation. If they all get new start times when they are added back, that would produce a lot of unnecessary resets for other attribute sets. If we send a "missing data point flag" to signal the end of a series, that would be even more disruptive for consumers. Even if we don't do either of those, it will probably cause a lot of churn for the SDK when it deletes series and then immediately re-creates most of them.

jmacd · 2025-11-12T17:34:18Z

@atoulme

> My concrete use case is tied to a queue system where we report the count of events seen. See https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/ibm-mq-metrics/model/metrics.yaml#L216

> When we lose contact with the queue manager, we can no longer report this information. If we get back in contact with the queue manager, we must recreate the time series with the new information, resetting the counter and its start time. We typically alert on delta changes, so we want the time series to be separate.

This issue has been raised and resisted a number of times in OpenTelemetry. I understand there is still an unanswered need, but I do not like the verb remove as the API method name, it's not clearly "removing" anything; it's not de-registering the instrument. We are not trying to erase the memory of the instrument, we're trying to get series out of memory. We need to report final measurements, "seal" the timeseries in some manner, and then forget about the data. If I could choose the verb for this action, it's "finish", it means "flush and forget". I think the idea of passing a predicate to select series for finishing makes sense.

Consumers should receive the correct finalized value of these series. As @ArthurSens points out in #4184, what we need is a specification for how the data should be transmitted so that the ending of a series is clear. In Prometheus, we have the NaN value, and in OTel we have the missing data point flag, but we've never specified how to set that flag. I would like to see a specification that dictates SDKs have to remember the "finishing" series long enough to send the NaN/missing-data-flag to each reader at least once, otherwise that reader would lose information. Then, to answer #4184, we need to specify that new series must be created with a start time >= the NaN/missing-data-flag previously issued for the same series.
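The rule above might be sketched roughly as follows, assuming a hypothetical storage class where attribute sets are `String`s, readers are identified by name, `noRecordedValue` stands in for OTLP's no-recorded-value data point flag, and Java 16+ is available for records. `finish` moves a series out of the active map; each reader then receives its final value plus an end-of-series marker exactly once, and the series is forgotten after the last reader has collected it. A series recreated under the same attributes gets a start time no earlier than the previous finish time. Emitting the final value and the marker as two points with the same timestamp is a choice made for this sketch, not something the thread settled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "finish" (flush and forget); not an OpenTelemetry SDK class.
class FinishableCounterStorage {
  // noRecordedValue stands in for OTLP's "no recorded value" data point flag.
  record Point(String attributes, long startNanos, long timeNanos, long value, boolean noRecordedValue) {}

  private static final class Series {
    final long startNanos;
    long value;

    Series(long startNanos) {
      this.startNanos = startNanos;
    }
  }

  // A finished series is kept only until every reader has been notified of its end.
  private record Finished(String attributes, Series series, long finishNanos, Set<String> pendingReaders) {}

  private final Set<String> readers;
  private final Map<String, Series> active = new HashMap<>();
  private final List<Finished> finished = new ArrayList<>();

  FinishableCounterStorage(Set<String> readers) {
    this.readers = Set.copyOf(readers);
  }

  void add(String attributes, long value, long nowNanos) {
    Series s = active.get(attributes);
    if (s == null) {
      long start = nowNanos;
      // A recreated series must not start before the previous one was finished.
      for (Finished f : finished) {
        if (f.attributes().equals(attributes)) {
          start = Math.max(start, f.finishNanos());
        }
      }
      s = new Series(start);
      active.put(attributes, s);
    }
    s.value += value;
  }

  void finish(String attributes, long nowNanos) {
    Series s = active.remove(attributes);
    if (s != null) {
      finished.add(new Finished(attributes, s, nowNanos, new HashSet<>(readers)));
    }
  }

  List<Point> collect(String reader, long nowNanos) {
    List<Point> out = new ArrayList<>();
    for (Map.Entry<String, Series> e : active.entrySet()) {
      out.add(new Point(e.getKey(), e.getValue().startNanos, nowNanos, e.getValue().value, false));
    }
    for (Iterator<Finished> it = finished.iterator(); it.hasNext(); ) {
      Finished f = it.next();
      if (!f.pendingReaders().remove(reader)) {
        continue; // this reader has already seen the end of the series
      }
      // Final value at finish time, followed by an end-of-series marker.
      out.add(new Point(f.attributes(), f.series().startNanos, f.finishNanos(), f.series().value, false));
      out.add(new Point(f.attributes(), f.series().startNanos, f.finishNanos(), 0, true));
      if (f.pendingReaders().isEmpty()) {
        it.remove(); // every reader has been told, so the series is forgotten
      }
    }
    return out;
  }

  int retainedSeries() {
    return active.size() + finished.size();
  }
}
```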

github-actions · 2025-11-20T03:31:17Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions · 2025-11-27T03:32:31Z

Closed as inactive. Feel free to reopen if this PR is still being worked on.

atoulme · 2025-12-04T22:27:56Z

Can I please ask for a maintainer to reopen the pull request? Thanks.

atoulme · 2025-12-04T22:36:13Z


> This issue has been raised and resisted a number of times in OpenTelemetry. I understand there is still an unanswered need, but I do not like the verb remove as the API method name, it's not clearly "removing" anything; it's not de-registering the instrument. We are not trying to erase the memory of the instrument, we're trying to get series out of memory. We need to report final measurements, "seal" the timeseries in some manner, and then forget about the data. If I could choose the verb for this action, it's "finish", it means "flush and forget". I think the idea of passing a predicate to select series for finishing makes sense.

Sure, finish works.

> Consumers should receive the correct finalized value of these series. As @ArthurSens points out in #4184, what we need is a specification for how the data should be transmitted so that the ending of a series is clear. In Prometheus, we have the NaN value, and in OTel we have the missing data point flag, but we've never specified how to set that flag. I would like to see a specification that dictates SDKs have to remember the "finishing" series long enough to send the NaN/missing-data-flag to each reader at least once, otherwise that reader would lose information. Then, to answer #4184, we need to specify that new series must be created with a start time >= the NaN/missing-data-flag previously issued for the same series.

Are you offering to work on that specification? Is it a requirement for this or a follow-up?

dprotaso · 2025-12-04T23:00:34Z

> We need to report final measurements, "seal" the timeseries in some manner, and then forget about the data.

In my use case, we report metrics while some external object is 'alive'. When the object is dead, we want to purge its attributes, since keeping them would just leak memory. However, the object could come back (say, if the user re-creates it), in which case we would start reporting metrics for it again.

For example, Knative has an Activator component that reports metrics using Revision name/namespace attributes. Revision names come and go.

@jmacd So I'm not sure 'seal' is right, because to me it implies the time series becomes immutable. Was that intentional?

I like what @dashpole suggests here (#4702 (comment)): we're actually 'unregistering' attributes.

dprotaso · 2025-12-04T23:18:35Z

Also, I think it's important not to stress about the name. We should focus and move this along. Users are reporting memory leaks in Knative because of the lack of this feature :/

cijothomas · 2025-12-05T03:02:01Z

CHANGELOG.md

-  ([#4746](https://github.com/open-telemetry/opentelemetry-specification/pull/4746))
- Allow instrument `Enabled` implementation to have additional optimizations and features.
-  ([#4747](https://github.com/open-telemetry/opentelemetry-specification/pull/4747))
+- Development: Define `remove` operations for synchronous metric instruments. [#4702](https://github.com/open-telemetry/opentelemetry-specification/pull/4702)


We need a corresponding SDK spec too, to document what the SDK is supposed to do for this method.

Also, is this relevant for delta, given that delta already "forgets" series not reported in an interval, so there is no need to explicitly ask it to forget anything?

I can think of a few (minor) benefits:

We could set the end time of the delta interval to be the time at which the attribute set was removed. This could make rates more accurate.

"in OTel we have the missing data point flag, but we've never specified how to set that flag". If we send this flag after removal of the series, it could be used to prevent extrapolation similar to how the NaN staleness marker does in Prometheus. An explicit signal that a series has disappeared can also help signal to downstream stateful components (e.g. prometheus exporter, deltatocumulative processor) that it is safe for them to "forget" the series as well.

But more generally, if users can't write instrumentation that works with cumulatives, they might just use a different library.
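The first benefit can be illustrated with a minimal sketch, assuming a hypothetical delta sum storage with `String` attribute keys, an explicit clock argument, and Java 16+ for records. A removed series is still reported for the interval in which it was removed, but its end time is the removal time rather than the collection time, so a rate computed from that point covers only the period the series was actually live. The storage clears every series after each collection, matching the "forgets" behavior mentioned for delta; that simplification, and reviving a series that is recorded again after removal within the same interval, are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical delta sum storage; not an OpenTelemetry SDK class.
class DeltaCounterStorage {
  record DeltaPoint(String attributes, long startNanos, long endNanos, long value) {}

  private static final class Entry {
    long value;
    Long removedAtNanos; // null while the series is live
  }

  private final Map<String, Entry> series = new LinkedHashMap<>();
  private long intervalStartNanos;

  DeltaCounterStorage(long startNanos) {
    this.intervalStartNanos = startNanos;
  }

  void add(String attributes, long value) {
    Entry e = series.computeIfAbsent(attributes, k -> new Entry());
    e.removedAtNanos = null; // recording again after a remove revives the series for this interval
    e.value += value;
  }

  void remove(String attributes, long nowNanos) {
    Entry e = series.get(attributes);
    if (e != null) {
      e.removedAtNanos = nowNanos;
    }
  }

  List<DeltaPoint> collect(long nowNanos) {
    List<DeltaPoint> out = new ArrayList<>();
    for (Map.Entry<String, Entry> me : series.entrySet()) {
      Entry e = me.getValue();
      // A removed series ends at its removal time instead of at the collection time.
      long end = e.removedAtNanos != null ? e.removedAtNanos : nowNanos;
      out.add(new DeltaPoint(me.getKey(), intervalStartNanos, end, e.value));
    }
    series.clear(); // this sketch forgets every series at the end of the interval
    intervalStartNanos = nowNanos;
    return out;
  }
}
```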

Got it. Thanks for clarifying.
After reading this, I think we should also add an SDK-side specification in this PR that clearly states how SDKs should behave for this.

Co-authored-by: Robert Pająk <[email protected]>

Add remove method to synchronous gauges, counters, updowncounters and histograms

70dab0d

atoulme force-pushed the add_remove branch from 32583e9 to 70dab0d October 25, 2025 08:14

dmathieu reviewed Oct 25, 2025

specification/metrics/api.md

atoulme force-pushed the add_remove branch from 14c9e2e to cdd66a9 October 26, 2025 03:44

Add to changelog and add status

646e391

atoulme force-pushed the add_remove branch from cdd66a9 to 646e391 October 26, 2025 03:48

atoulme mentioned this pull request Oct 27, 2025

Allow to unregister/stop/destroy instruments #2232

Open

atoulme marked this pull request as ready for review October 29, 2025 22:27

atoulme requested review from a team as code owners October 29, 2025 22:27

github-actions bot added the Stale label Nov 6, 2025

pellared reviewed Nov 6, 2025

pellared removed the Stale label Nov 6, 2025

github-actions bot added the Stale label Nov 20, 2025

github-actions bot closed this Nov 27, 2025

trask reopened this Dec 4, 2025

cijothomas reviewed Dec 5, 2025

github-actions bot removed the Stale label Dec 5, 2025

Update specification/metrics/api.md

159610d

Co-authored-by: Robert Pająk <[email protected]>


11 participants
