Request queue stalls after the first request in SDK 24.1.5 (regression in Tasks.java) #271

@DanBoSlice

Description

SDK: ly.count.sdk:java
Version: 24.1.5
Regression from: 24.1.4


When the SDK is initialized with existing persisted request files on disk, only the first request is ever sent. All subsequent [CLY]_request_* files remain on disk indefinitely. Each event recorded after startup triggers exactly one more send, but the backlog of pre-existing requests never drains.

Root Cause

In 24.1.5, Tasks.java was refactored to wrap task execution in a try/finally block to guarantee cleanup after exceptions (fixing #264). However, the completion callback was moved inside the try block, so it now runs before the finally block sets running = null:

// Tasks.java — 24.1.5 (BROKEN)
running = task.id;
try {
    T result = task.call();
    if (callback != null) {
        callback.call(result);  // ← fires while running is still non-null
    }
    return result;
} finally {
    synchronized (pending) {
        running = null;         // ← too late; callback has already returned
    }
}

In 24.1.4, the order was correct:

// Tasks.java — 24.1.4 (CORRECT)
running = task.id;
T result = task.call();
synchronized (pending) {
    running = null;             // ← cleared FIRST
}
if (callback != null) {
    callback.call(result);      // ← callback fires after running is null
}

DefaultNetworking chains requests by calling check(config) from the send-task's completion callback:

// DefaultNetworking.java — submit()
tasks.run(transport.send(request), result -> {
    if (result) {
        storageForRequestQueue.removeRequest(request);
        check(config);   // ← meant to schedule the next request
    }
});

check() guards against concurrent execution:

// DefaultNetworking.java — check()
if (!shutdown && !tasks.isRunning() && config.getDeviceId() != null) {
    tasks.run(submit(config));  // only runs if isRunning() == false
}

isRunning() returns running != null. Because running is not cleared until after the callback returns, the check(config) call made from inside the callback always sees isRunning() == true and exits without scheduling the next request. By the time the finally block sets running = null, nothing is left to call check() again, so the queue stays stalled until something else triggers it.
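
The interaction described above can be reproduced in isolation. The following is a minimal sketch, not the SDK's code: MiniTasks, drain and the request-N strings are hypothetical stand-ins, and everything runs synchronously on one thread (the real Tasks class may dispatch work differently). The only thing it models is the relative order of clearing running and invoking the callback.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class StallModel {
    // Hypothetical stand-in for Tasks: runs a task synchronously and tracks `running`.
    static class MiniTasks {
        private Long running = null;
        private final boolean clearBeforeCallback;

        MiniTasks(boolean clearBeforeCallback) {
            this.clearBeforeCallback = clearBeforeCallback;
        }

        boolean isRunning() {
            return running != null;
        }

        <T> void run(Supplier<T> task, Consumer<T> callback) {
            running = 1L;
            T result = task.get();
            if (clearBeforeCallback) {
                running = null;              // 24.1.4 ordering: clear, then call back
                callback.accept(result);
            } else {
                try {
                    callback.accept(result); // 24.1.5 ordering: call back while still "running"
                } finally {
                    running = null;
                }
            }
        }
    }

    // Models check()/submit(): returns how many queued requests end up being sent.
    static int drain(boolean clearBeforeCallback, int queued) {
        MiniTasks tasks = new MiniTasks(clearBeforeCallback);
        Deque<String> queue = new ArrayDeque<>();
        for (int i = 0; i < queued; i++) {
            queue.add("request-" + i);
        }
        int[] sent = {0};
        Supplier<Boolean> send = () -> true;     // pretend every send succeeds
        Runnable[] check = new Runnable[1];
        check[0] = () -> {
            if (!tasks.isRunning() && !queue.isEmpty()) {
                String request = queue.peek();
                tasks.run(send, ok -> {
                    if (ok) {
                        queue.remove(request);
                        sent[0]++;
                        check[0].run();          // chain the next request
                    }
                });
            }
        };
        check[0].run();                          // the single check() at startup
        return sent[0];
    }

    public static void main(String[] args) {
        System.out.println("24.1.5 ordering sent: " + drain(false, 5));
        System.out.println("24.1.4 ordering sent: " + drain(true, 5));
    }
}
```

With five queued requests, the 24.1.5 ordering sends one and stops, while the 24.1.4 ordering sends all five.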

Steps to Reproduce

  1. Initialize the SDK with a storage directory that already contains one or more [CLY]_request_* files from a previous session.
  2. Observe that isSending() briefly returns true while the first file is sent.
  3. Observe that all remaining request files are never picked up and remain on disk.
  4. Record a new event → one more file is processed, then the queue stalls again.
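
To make step 3 easier to observe, a small helper can count the request files left in the storage directory. This is a sketch under assumptions: the directory is passed in by the caller (the issue does not say where the SDK stores its files), and the only thing taken from the report is the [CLY]_request_ file name prefix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class RequestBacklog {
    // Counts regular files whose name starts with the literal prefix "[CLY]_request_".
    // A plain prefix match is used because '[' and ']' are special in glob patterns.
    static long countPendingRequests(Path storageDir) throws IOException {
        try (Stream<Path> files = Files.list(storageDir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().startsWith("[CLY]_request_"))
                .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the storage directory is given as the first argument.
        Path storageDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(countPendingRequests(storageDir) + " request file(s) still on disk");
    }
}
```

Running it before and after SDK startup should show the count dropping by one on 24.1.5 and then staying flat.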

Expected Behavior

The completion callback should fire after running has been set to null, so that check(config) can successfully schedule the next request and drain the queue continuously.

Fix

Invoke callback.call(result) only after running has been cleared, either by moving it back to after the finally block or, as below, by clearing running first inside finally and then invoking the callback:

// Tasks.java — suggested fix
running = task.id;
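T result = null;
boolean succeeded = false;      // stays false if task.call() throws
try {
    result = task.call();
    succeeded = true;
    return result;
} finally {
    synchronized (pending) {
        if (!task.id.equals(0L)) {
            pending.remove(task.id);
        }
        running = null;         // ← clear first
    }
    // As in 24.1.4 and 24.1.5, skip the callback when task.call() threw,
    // so it never receives a null result.
    if (succeeded && callback != null) {
        callback.call(result);  // ← then invoke callback
    }
}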
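
Since the 24.1.5 refactor exists to guarantee cleanup after exceptions, any reordering should keep that guarantee. The sketch below shows the other shape mentioned above, with the callback placed after the finally block, in a hypothetical single-threaded model (CleanupModel is not the SDK's Tasks class and uses Supplier/Consumer in place of the SDK's own task and callback types). It assumes the callback should be skipped when the task throws, which is what both the 24.1.4 and 24.1.5 snippets do.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

public class CleanupModel {
    private Long running = null;

    boolean isRunning() {
        return running != null;
    }

    <T> T run(Supplier<T> task, Consumer<T> callback) {
        running = 1L;
        T result;
        try {
            result = task.get();
        } finally {
            running = null;      // always cleared, even if the task throws
        }
        // Only reached when the task completed normally.
        if (callback != null) {
            callback.accept(result);
        }
        return result;
    }

    public static void main(String[] args) {
        CleanupModel tasks = new CleanupModel();

        // Success path: the callback should observe isRunning() == false.
        boolean[] runningInCallback = {true};
        tasks.run(() -> "ok", r -> runningInCallback[0] = tasks.isRunning());
        System.out.println("running seen by callback: " + runningInCallback[0]);

        // Failure path: running is still cleared and the callback is skipped.
        boolean[] callbackInvoked = {false};
        Supplier<String> failing = () -> {
            throw new IllegalStateException("boom");
        };
        try {
            tasks.run(failing, r -> callbackInvoked[0] = true);
        } catch (IllegalStateException expected) {
            // the failure propagates to the caller
        }
        System.out.println("running after failure: " + tasks.isRunning());
        System.out.println("callback invoked on failure: " + callbackInvoked[0]);
    }
}
```

All three lines print false in this model. A regression test for the real Tasks class could assert the same two properties.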
