Derive all *PoolExecutor classes from an abstract PoolExecutor class derived from Executor #144723

@davidmcnabnz

Feature or enhancement

Proposal:

In their present form, the *PoolExecutor classes are unsafe for high-load production environments when used purely as documented.

(For example, library calls in an asyncio environment that directly or indirectly cause DNS lookups run on the event loop's default thread pool executor. If that executor is saturated, the DNS lookup can be delayed massively, which in turn can cause intermittent timeouts much higher up the call chain that are difficult and time-consuming to debug.)
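The queueing effect described above can be reproduced without any network access. The following is a minimal sketch, assuming a deliberately tiny default executor (one worker) and a `time.sleep` call standing in for a slow blocking job; the trivial second job stands in for a DNS lookup that has to wait its turn.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


async def main() -> float:
    loop = asyncio.get_running_loop()
    # Assumption for the demo: a one-worker default executor, so that
    # saturation is easy to trigger.
    loop.set_default_executor(ThreadPoolExecutor(max_workers=1))

    # Occupy the only worker with a blocking call.
    blocker = loop.run_in_executor(None, time.sleep, 0.3)

    # This job does nothing, but it is queued behind the blocker, just as
    # a DNS lookup routed through the default executor would be.
    start = time.monotonic()
    await loop.run_in_executor(None, lambda: None)
    waited = time.monotonic() - start

    await blocker
    return waited


queued_delay = asyncio.run(main())
print(f"trivial job waited {queued_delay:.2f}s behind a saturated pool")
```

Nothing in the documented API lets the caller see that the pool was full or bound how long the second job may wait for a worker.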

The only current remedy for these dangers is for a developer to subclass the pool executors and implement off-label features by studying the stdlib class internals. But this is unsafe in a different way, because internal implementations of library classes can change radically between Python versions.

Currently, the various pool executors, such as ThreadPoolExecutor, derive directly from Executor. Also, they are highly opaque black boxes, with very few documented methods.

For example, there appears to be no documented way to manage the pool for production-critical operations like:

  • determining the current worker limit
  • determining how many workers are active, and how long they've been active for
  • increasing or decreasing the worker limit
  • taking inventory of active workers
  • selectively killing arbitrary workers
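To illustrate the "off-label" route, here is a sketch of how some of this information can be read from ThreadPoolExecutor today. It relies on `_max_workers`, `_threads` and `_work_queue`, which are private, undocumented attributes found in recent CPython releases; they are used here only to show the problem and may change or disappear in any version.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=2)
release = threading.Event()

# Three blocking jobs for two workers: at least one job stays queued.
futures = [pool.submit(release.wait) for _ in range(3)]

# Private CPython internals, not part of the documented API.
limit = pool._max_workers            # configured worker limit
spawned = len(pool._threads)         # worker threads created so far
backlog = pool._work_queue.qsize()   # jobs not yet picked up by a worker

print(f"limit={limit} spawned={spawned} backlog={backlog}")

# Let the jobs finish and shut the pool down cleanly.
release.set()
pool.shutdown(wait=True)
```

Even with these internals there is no supported way to resize the pool, see how long a worker has been busy, or stop an individual worker.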

So, this ticket is a feature request to implement an abstract PoolExecutor class which:

  • is derived from Executor
  • becomes the parent class of all other worker executors such as ThreadPoolExecutor, ProcessPoolExecutor and InterpreterPoolExecutor
  • exposes abstract methods/attributes for the above management operations, which each stdlib subclass must implement with the same method signatures, although the internals will of course vary hugely
  • provides hooks for user-written subclasses to intercept adding/termination of workers, and other events
  • provides a timeout mechanism that propagates an exception up the call chain if a full pool takes too long to add a new worker
  • is ergonomic for subclassing, both directly from PoolExecutor, and also from subclasses like ThreadPoolExecutor
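The shape of such a base class could look roughly like the sketch below. Every name in it (`PoolExecutor`, `get_max_workers`, `get_active_count`, `on_task_start`, `on_task_done`, `TrackedThreadPool`) is hypothetical and chosen only for illustration; none of them exist in the stdlib. The thread-based subclass tracks activity by wrapping `submit()` rather than touching private attributes.

```python
import abc
import threading
from concurrent.futures import Executor, ThreadPoolExecutor


class PoolExecutor(Executor, abc.ABC):
    """Hypothetical abstract base; not part of concurrent.futures."""

    @abc.abstractmethod
    def get_max_workers(self) -> int:
        """Return the current worker limit."""

    @abc.abstractmethod
    def get_active_count(self) -> int:
        """Return how many workers are running a task right now."""

    def on_task_start(self, fn) -> None:
        """Hook: runs in the worker just before fn is called."""

    def on_task_done(self, fn) -> None:
        """Hook: runs in the worker after fn returns or raises."""


class TrackedThreadPool(ThreadPoolExecutor, PoolExecutor):
    """Illustrative subclass that counts in-flight tasks itself."""

    def __init__(self, max_workers: int) -> None:
        super().__init__(max_workers=max_workers)
        self._tracked_limit = max_workers
        self._tracked_active = 0
        self._tracked_lock = threading.Lock()

    def get_max_workers(self) -> int:
        return self._tracked_limit

    def get_active_count(self) -> int:
        with self._tracked_lock:
            return self._tracked_active

    def submit(self, fn, /, *args, **kwargs):
        def tracked():
            with self._tracked_lock:
                self._tracked_active += 1
            try:
                self.on_task_start(fn)
                return fn(*args, **kwargs)
            finally:
                with self._tracked_lock:
                    self._tracked_active -= 1
                self.on_task_done(fn)

        return super().submit(tracked)


started, release = threading.Event(), threading.Event()


def job() -> str:
    started.set()
    release.wait()
    return "done"


with TrackedThreadPool(max_workers=2) as pool:
    future = pool.submit(job)
    started.wait()
    busy = pool.get_active_count()   # one task is in flight here
    release.set()
    result = future.result()
    idle = pool.get_active_count()   # the task has finished
```

A process- or interpreter-based subclass would need an entirely different implementation behind the same two methods, which is the point of making them abstract.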

If users of PoolExecutor-based classes are able to monitor current/maximum worker counts and worker startup delays, and to modify resource limits in real time, it will go a long way towards increasing the safety of these classes in real-world production environments, and further erode the already diminishing case against using CPython as a production software platform.
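As one possible reading of the requested timeout mechanism, the sketch below bounds how long a caller waits for a free worker slot and raises if the pool stays full. `BoundedThreadPool`, `submit_within` and `PoolSaturatedError` are hypothetical names invented for this example, and the semaphore-based approach is an assumption, not a proposed implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PoolSaturatedError(TimeoutError):
    """Hypothetical error: no worker slot became free in time."""


class BoundedThreadPool(ThreadPoolExecutor):
    """Illustrative pool that refuses work when it stays full too long."""

    def __init__(self, max_workers: int) -> None:
        super().__init__(max_workers=max_workers)
        # One permit per worker; a permit is held while a task is in flight.
        self._free_slots = threading.BoundedSemaphore(max_workers)

    def submit_within(self, timeout: float, fn, /, *args, **kwargs):
        # Blocks the calling thread for at most `timeout` seconds.
        if not self._free_slots.acquire(timeout=timeout):
            raise PoolSaturatedError(f"no free worker within {timeout}s")
        try:
            future = self.submit(fn, *args, **kwargs)
        except BaseException:
            self._free_slots.release()
            raise
        # Return the permit once the task finishes or is cancelled.
        future.add_done_callback(lambda _f: self._free_slots.release())
        return future


release = threading.Event()
pool = BoundedThreadPool(max_workers=1)
first = pool.submit_within(1.0, release.wait)   # takes the only slot

try:
    pool.submit_within(0.05, print, "never runs")
    rejected = False
except PoolSaturatedError:
    rejected = True                             # pool was still full

release.set()
first.result()
answer = pool.submit_within(1.0, lambda: 42).result()
pool.shutdown(wait=True)
```

Because `submit_within` blocks its caller while waiting, code running on an asyncio event loop thread would need a non-blocking variant of the same idea.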

Has this already been discussed elsewhere?

Not to my knowledge

Links to previous discussion of this feature:

No response

Labels: stdlib (Standard Library Python modules in the Lib/ directory), type-feature (A feature request or enhancement)