Background
Currently, EPP uses pods with fixed ports as the basic unit for request scheduling. However, this approach does not adapt well to scenarios where multiple model servers may start dynamically on different ports. We would like to generalize EPP so that it can discover the inference server port dynamically from the matched pod, instead of taking it from the InferencePool definition.
One potential use case for this scenario is as follows:
https://github.com/llm-d-incubation/llm-d-fast-model-actuation is a project aiming to speed up scale-out, at least for simple model server deployment patterns. The key techniques are using vLLM sleep/wake to maintain some low-GPU-resource idle instances, and a launcher process to spring-load startup of child processes. The launcher has only one awake child at a time, plus some sleeping ones; switching which one is awake is fast. Thus, the launcher pod serves different models at different times. A controller manages this, including adjusting the launcher pod's labels to match the right InferencePool. Remember that even a vLLM instance in sleep mode is handling HTTP requests on its inference port, so the children of one launcher have to use different inference ports. That means the InferencePools have to have different inference ports. I do not like that; it adds constraints between InferencePool objects. So it would be better if the inference port number(s) could come from a label or annotation on the matching pods.
Proposal
Currently, in the InferencePool API, the ports of the backend model servers are identified through a required targetPorts field. However, this approach cannot identify inference servers that are started dynamically on different ports within a pod.
We suggest making the targetPorts field optional and adding a new annotation-based mechanism to the EPP Model Server Protocol that helps EPP dynamically discover model servers running on multiple dynamic ports, thereby serving the use case described above. Specifically, an example of the annotation would look like this:
```yaml
annotations:
  inference.networking.x-k8s.io/port-discovery: '[{"inferencePool":"qwen-pool","number":8007}]'
```

This can be parsed into the following data structure:
```go
type InferencePort struct {
	InferencePool string            `json:"inferencePool"`
	Number        int32             `json:"number"`
	Attributes    map[string]string `json:"attributes,omitempty"`
}

type InferencePorts []InferencePort
```

- Each `InferencePort` declares a model server instance running on that port.
- The `inferencePool` field specifies which inference pool the instance belongs to, since servers running on different ports may belong to different inference pools.
- For any additional requirements, an optional `Attributes` field allows users to attach metadata to the model server running on that port. This is particularly useful when the inference servers launched on different ports have distinct roles, such as prefill and decode.
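For concreteness, here is a minimal sketch of how EPP could parse this annotation into the structure above; the constant and function names here are illustrative, not part of the proposal:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InferencePort mirrors the structure proposed above.
type InferencePort struct {
	InferencePool string            `json:"inferencePool"`
	Number        int32             `json:"number"`
	Attributes    map[string]string `json:"attributes,omitempty"`
}

type InferencePorts []InferencePort

// portDiscoveryAnnotation is the proposed annotation key.
const portDiscoveryAnnotation = "inference.networking.x-k8s.io/port-discovery"

// parsePortDiscovery extracts the declared inference ports from a pod's
// annotations. It returns nil when the annotation is absent.
func parsePortDiscovery(annotations map[string]string) (InferencePorts, error) {
	raw, ok := annotations[portDiscoveryAnnotation]
	if !ok {
		return nil, nil // annotation absent: caller falls back to targetPorts
	}
	var ports InferencePorts
	if err := json.Unmarshal([]byte(raw), &ports); err != nil {
		return nil, fmt.Errorf("invalid %s annotation: %w", portDiscoveryAnnotation, err)
	}
	return ports, nil
}
```

Returning nil for pods without the annotation would let EPP fall back to the InferencePool's targetPorts, which keeps that field optional rather than removed.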
At the implementation level, EPP can watch pods and, based on the predefined annotation, transform a pod into one or more "virtual pods", which are then recorded in the local datastore. This keeps full compatibility with the rest of EPP's pod-based scheduling logic. Such an implementation has already been proposed in #1663 to support DP; we only need to modify the creation logic of the "virtual pods" so that EPP creates them according to the annotation.
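Building on the parsing sketch above, the creation logic might look roughly like the following. The `virtualPod` type and `virtualPodsFor` function are hypothetical stand-ins for whatever EPP's datastore actually records, not the real types from #1663:

```go
import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// virtualPod is a hypothetical stand-in for the record EPP keeps in its
// local datastore; the real type and fields may differ.
type virtualPod struct {
	Name       string            // e.g. "<pod-name>:<port>" to keep entries unique
	Address    string            // the real pod's IP
	Port       int32             // the dynamically discovered inference port
	Attributes map[string]string // role metadata, e.g. prefill vs. decode
}

// virtualPodsFor expands one real pod into virtual pods, one per declared
// InferencePort whose inferencePool matches the pool this EPP serves.
// It reuses parsePortDiscovery from the sketch above.
func virtualPodsFor(pod *corev1.Pod, poolName string) ([]virtualPod, error) {
	ports, err := parsePortDiscovery(pod.Annotations)
	if err != nil {
		return nil, err
	}
	var out []virtualPod
	for _, p := range ports {
		if p.InferencePool != poolName {
			continue // this entry belongs to a different pool
		}
		out = append(out, virtualPod{
			Name:       fmt.Sprintf("%s:%d", pod.Name, p.Number),
			Address:    pod.Status.PodIP,
			Port:       p.Number,
			Attributes: p.Attributes,
		})
	}
	return out, nil
}
```

Filtering on the `inferencePool` field means a single launcher pod can carry entries for several pools without its virtual pods colliding, and keying each virtual pod by pod name plus port keeps the datastore entries unique.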