[DRAFT] feat: client side metrics handlers #16760
daniel-sanche wants to merge 6 commits into bigtable_csm_2_instrumentation_advanced
Conversation
Code Review
This pull request implements client-side metrics for the Bigtable library using OpenTelemetry, including a custom exporter for Google Cloud Monitoring. The review feedback focuses on improving resource efficiency by moving the MeterProvider and client_uid generation to the client level to avoid thread leaks and inconsistent identifiers across tables. Additionally, recommendations were made to handle potential KeyError exceptions in the exporter, improve logging for background export failures, and ensure non-negative timeouts during batch writes.
```python
gcp_reader = PeriodicExportingMetricReader(
    exporter, export_interval_millis=export_interval * 1000
)
# use private meter provider to store instruments and views
self.meter_provider = MeterProvider(
    metric_readers=[gcp_reader], views=VIEW_LIST
)
```
Creating a new PeriodicExportingMetricReader and MeterProvider for every GoogleCloudMetricsHandler is highly inefficient. Since a handler is created for every Table instance, and each PeriodicExportingMetricReader starts its own background thread, this will lead to a significant thread leak and excessive resource consumption (e.g., if a user accesses hundreds of tables).
The MeterProvider and its associated reader should be initialized once at the BigtableDataClient level and shared across all table handlers. This also ensures that metrics are properly flushed and threads are shut down when the client is closed.
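To make the suggested ownership concrete, here is a minimal sketch of the pattern: the client owns one provider and hands it to every table handler, and `close()` shuts it down exactly once. The class names mirror the PR, but `_SharedMeterProvider` is a stub standing in for the real OpenTelemetry `MeterProvider` plus reader, so this is illustrative rather than the PR's implementation.

```python
class _SharedMeterProvider:
    """Stub standing in for an OpenTelemetry MeterProvider + its reader thread."""

    def __init__(self):
        self.shutdown_called = False

    def shutdown(self):
        self.shutdown_called = True


class TableMetricsHandler:
    def __init__(self, table_name, meter_provider):
        self.table_name = table_name
        # Reuses the client's provider instead of creating its own.
        self.meter_provider = meter_provider


class BigtableDataClient:
    def __init__(self):
        # One provider (and one background exporter thread) per client,
        # not one per table.
        self._meter_provider = _SharedMeterProvider()

    def get_table(self, name):
        # Every table handler shares the client's provider.
        return TableMetricsHandler(name, self._meter_provider)

    def close(self):
        # Flush metrics and stop the exporter thread exactly once.
        self._meter_provider.shutdown()
```

With this shape, opening hundreds of tables creates zero additional exporter threads, and closing the client is the single shutdown point.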
| "instance": data_point.attributes[ | ||
| "resource_instance" | ||
| ], | ||
| "cluster": data_point.attributes[ | ||
| "resource_cluster" | ||
| ], | ||
| "table": data_point.attributes["resource_table"], | ||
| "zone": data_point.attributes["resource_zone"], |
Accessing attributes directly via keys will raise a KeyError if any of the expected resource labels are missing from the data point. This is a known issue mentioned in the PR description. Using .get() or validating the presence of these keys is necessary for a robust exporter.
"instance": data_point.attributes.get("resource_instance", ""),
"cluster": data_point.attributes.get("resource_cluster", ""),
"table": data_point.attributes.get("resource_table", ""),
"zone": data_point.attributes.get("resource_zone", ""),| for data_point in [ | ||
| pt for pt in metric.data.data_points if pt.attributes | ||
| ]: | ||
| if data_point.attributes: |
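An alternative to defaulting missing labels to `""` is to skip data points that lack any required label, so partial time series are never written. A sketch of that idea, where `REQUIRED_LABELS` and `extract_resource_labels` are hypothetical helper names, not identifiers from the PR:

```python
REQUIRED_LABELS = (
    "resource_instance",
    "resource_cluster",
    "resource_table",
    "resource_zone",
)


def extract_resource_labels(attributes):
    """Return the monitored-resource labels, or None if any are missing.

    Returning None lets the exporter skip the data point instead of
    raising KeyError (or exporting empty labels) mid-export.
    """
    if not all(key in attributes for key in REQUIRED_LABELS):
        return None
    return {
        "instance": attributes["resource_instance"],
        "cluster": attributes["resource_cluster"],
        "table": attributes["resource_table"],
        "zone": attributes["resource_zone"],
    }
```

Whether to skip or default is a product decision: skipping keeps the backend clean, while defaulting preserves partially labeled points for debugging.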
```python
try:
    self._batch_write(all_series, deadline)
    return MetricExportResult.SUCCESS
except Exception:
    return MetricExportResult.FAILURE
```
Catching all exceptions and returning FAILURE without logging makes it very difficult to diagnose issues when metrics fail to export. Since this code runs in a background thread, these errors will be silent. Following repository guidelines for background tasks and broad exception blocks, log the exception as a warning (e.g., using logger.warning) to aid in debugging without creating excessive noise from transient failures.
References
- For exceptions in background tasks that are designed to be retried, log them as warnings rather than errors to reduce noise from transient, recoverable failures.
- Avoid broad `except Exception:` blocks that silently return `None`. Instead, log the exception (e.g., using `logger.warning`) to aid in debugging and prevent masking underlying issues.
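Applying those guidelines to the export path might look like the following sketch. `export_with_logging` is a hypothetical standalone wrapper (the PR's version is a method using `MetricExportResult`); the point is the `logger.warning(..., exc_info=True)` call that surfaces the traceback from the background thread.

```python
import logging

logger = logging.getLogger(__name__)


def export_with_logging(batch_write, all_series, deadline):
    """Run a batch write, logging failures instead of swallowing them."""
    try:
        batch_write(all_series, deadline)
        return "SUCCESS"
    except Exception:
        # Background-thread failures are otherwise invisible; a warning
        # (not an error) keeps noise low for transient, retryable issues.
        logger.warning("failed to export metrics batch", exc_info=True)
        return "FAILURE"
```

`exc_info=True` attaches the full traceback to the log record, which is usually the only diagnostic available for errors raised off the main thread.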
```python
write_ind = 0
while write_ind < len(series):
    # find time left for next batch
    timeout = deadline - time.time() if deadline else gapic_v1.method.DEFAULT
```
If the deadline has already passed, the calculated timeout will be negative. It is safer to ensure the timeout is at least zero before passing it to the GAPIC client.
```diff
- timeout = deadline - time.time() if deadline else gapic_v1.method.DEFAULT
+ timeout = max(0, deadline - time.time()) if deadline else gapic_v1.method.DEFAULT
```
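The clamping logic can be isolated into a small helper, which also makes it easy to unit-test against a fixed clock. This is a sketch, not the PR's code: `remaining_timeout`, the injectable `now` parameter, and the `DEFAULT` sentinel (standing in for `gapic_v1.method.DEFAULT`) are all illustrative.

```python
import time

DEFAULT = object()  # stand-in for gapic_v1.method.DEFAULT


def remaining_timeout(deadline, now=None):
    """Time left before the deadline, clamped so it is never negative.

    Returns the DEFAULT sentinel when no deadline is set, mirroring
    the original `if deadline else DEFAULT` branch.
    """
    if not deadline:
        return DEFAULT
    now = time.time() if now is None else now
    # A deadline already in the past yields 0, not a negative timeout
    # that the GAPIC client could reject.
    return max(0, deadline - now)
```

Passing `now` explicitly in tests avoids flakiness from the real clock.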
```python
# fixed labels sent with each metric update
self.shared_labels = {
    "client_name": f"python-bigtable/{client_version}",
    "client_uid": client_uid or self._generate_client_uid(),
```
The client_uid is currently generated per handler instance. Since handlers are created per table, this results in different UIDs for the same client process across different tables. The client_uid should be stable for the lifetime of the BigtableDataClient to allow for consistent aggregation of metrics from a single client instance in the backend. Consider generating the UID once in the client and passing it to the handlers.
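A minimal sketch of the suggested ownership, generating the UID once on the client and threading it through to handlers. The UID format shown (uuid, pid, hostname) is an illustrative convention, not necessarily what `_generate_client_uid` produces in the PR, and `make_handler_labels` is a hypothetical helper.

```python
import os
import socket
import uuid


def generate_client_uid():
    """Build a process-stable identifier for this client instance."""
    # Format is illustrative: a random uuid plus pid and hostname.
    return f"python-{uuid.uuid4()}@{os.getpid()}@{socket.gethostname()}"


class BigtableDataClient:
    def __init__(self):
        # Generated exactly once per client; every table handler reuses it,
        # so the backend can aggregate metrics from one client instance.
        self._client_uid = generate_client_uid()

    def make_handler_labels(self):
        return {"client_uid": self._client_uid}
```

Two tables on the same client now report identical `client_uid` values, while two separate client instances remain distinguishable.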
Migrate googleapis/python-bigtable#1189 to the monorepo
This PR builds off of googleapis/python-bigtable#1187 to add handlers to the client-side metrics system, which can subscribe to the metrics stream, and export the results into different collection systems
We add two handlers to the system:
- `GoogleCloudMetricsHandler`: sends metrics to a private OpenTelemetry meter, and then periodically exports them to GCP. Built on top of `OpenTelemetryMetricsHandler`
- `OpenTelemetryMetricsHandler`: sends metrics to the root MeterProvider, so the user can access the exported metrics for their own systems. This will be off by default, but can be added alongside `GoogleCloudMetricsHandler` if needed

TODO: