K8s源码分析(二)-K8s调度队列介绍

本文首发在个人博客上,欢迎来踩!
本次分析参考的K8s版本是

文章目录

  • 调度队列简介
  • 调度队列源代码分析
    • 队列初始化
    • QueuedPodInfo元素介绍
    • ActiveQ源代码介绍
    • UnschedulableQ源代码介绍
    • **BackoffQ**源代码介绍
    • 队列弹出待调度的Pod
    • 队列增加新的待调度的Pod
    • pod调度失败返回队列的处理
    • flushBackoffQCompleted
    • flushUnschedulablePodsLeftover
  • 调度队列总结

调度队列简介

这里是官方对于K8s中调度队列的介绍,很值得一看:Scheduling queue in kube-scheduler。整体的架构如下图所示。

请添加图片描述

简单来说K8s中的调度队列主要有3种:

  • ActiveQ(heap结构):在每个调度周期开始时都会从这里取出一个Pod尝试调度。一开始提交的所有没有指定.spec.nodeName的Pod都会发送到这里,也会接收来自unschedulableQ和BackoffQ刷新来的pod。默认的排序规则是按照优先级进行排列,高优先级的Pod在前面。
  • UnschedulableQ(Map结构):存储调度失败的Pod,以等待资源更新、其他相关Pod调度成功等事件,从而将其的Pod其进行重调度。
  • BackoffQ(heap结构):用来暂时退避的队列,默认的排列规则是按退避时间的长度进行排序,需要退避的时间短的Pod在前面。为了防止Pod频繁的重调度,每个Pod都会记录自己的重调度次数,退避时间随着每次失败的调度尝试呈指数增长,直到达到最大值,例如尝试失败 3 次的 Pod 的目标退避超时设置为 curTime + 2s^3 (8s)。注意有两种情况下Pod会进入到BackoffQ队列中:
    • unscheduleableQ会定时对其中的所有pod进行重调度,那么就需要计算各个pod是否退避了足够的时间,如果没有就放入到BackoffQ中再退避一段时间。
    • 如果一个Pod调度失败时,正好这时又异步地发生了资源变更事件(p.moveRequestCycle **>=** podSchedulingCycle )(schedulingCycle 是当前调度的周期,ActiveQ队列每pop一个pod,就加1,moveRequestCycle是事件发生时schedulingCycle 的值),那么就不会放入UnschedulableQ中,而是会直接放入到BackoffQ中。

调度队列机制有两个在后台运行的定期刷新 go协程,负责将 pod 移动到活动队列,后续也将详细介绍相关代码:

  • **flushUnschedulablePodsLeftover:**每 30 秒运行一次,将 Pod 从UnschedulableQ中移动,以允许未由任何事件移动的不可调度的 Pod 再次重试。
  • **flushBackoffQCompleted:**每1秒运行一次,将BackoffQ中已经回避了足够久的Pod移动到ActiveQ队列中

移动请求(move request)会触发一个事件,该事件负责将 Pod 从UnschedulableQ移动到ActiveQ或BackoffQ。集群中许多事件可以触发移动请求的发生,包括了 Pod、节点、服务、PV、PVC、存储类和 CSI 节点的更改。例如当某些pod被调度时,UnschedulableQ中与其具有亲和性要求而导致之前无法调度的pod就会被移动出去,或者当某个新node加入时,原本因为资源不够导致无法调度的Pod也会被移动出去。

调度队列源代码分析

队列初始化

Scheduler中的调度队列SchedulingQueueinternalqueue.SchedulingQueue类型,该类型的实现在pkg/scheduler/internal/queue/scheduling_queue.go:92,如下。

// SchedulingQueue is an interface for a queue to store pods waiting to be scheduled.
// The interface follows a pattern similar to cache.FIFO and cache.Heap and
// makes it easy to use those data structures as a SchedulingQueue.
type SchedulingQueue interface {framework.PodNominatorAdd(pod *v1.Pod) error// Activate moves the given pods to activeQ iff they're in unschedulablePods or backoffQ.// The passed-in pods are originally compiled from plugins that want to activate Pods,// by injecting the pods through a reserved CycleState struct (PodsToActivate).Activate(pods map[string]*v1.Pod)// AddUnschedulableIfNotPresent adds an unschedulable pod back to scheduling queue.// The podSchedulingCycle represents the current scheduling cycle number which can be// returned by calling SchedulingCycle().AddUnschedulableIfNotPresent(pod *framework.QueuedPodInfo, podSchedulingCycle int64) error// SchedulingCycle returns the current number of scheduling cycle which is// cached by scheduling queue. Normally, incrementing this number whenever// a pod is popped (e.g. called Pop()) is enough.SchedulingCycle() int64// Pop removes the head of the queue and returns it. It blocks if the// queue is empty and waits until a new item is added to the queue.Pop() (*framework.QueuedPodInfo, error)Update(oldPod, newPod *v1.Pod) errorDelete(pod *v1.Pod) errorMoveAllToActiveOrBackoffQueue(event framework.ClusterEvent, preCheck PreEnqueueCheck)AssignedPodAdded(pod *v1.Pod)AssignedPodUpdated(pod *v1.Pod)PendingPods() ([]*v1.Pod, string)// Close closes the SchedulingQueue so that the goroutine which is// waiting to pop items can exit gracefully.Close()// Run starts the goroutines managing the queue.Run()
}

上述代码定义了其需要的对队列中的元素添加、删除、更新、获取、运行等方法。而其标准实现PriorityQueue

pkg/scheduler/internal/queue/scheduling_queue.go:145 中,首先查看其需要的变量:

// PriorityQueue implements a scheduling queue.
// The head of PriorityQueue is the highest priority pending pod. This structure
// has two sub queues and a additional data structure, namely: activeQ,
// backoffQ and unschedulablePods.
//   - activeQ holds pods that are being considered for scheduling.
//   - backoffQ holds pods that moved from unschedulablePods and will move to
//     activeQ when their backoff periods complete.
//   - unschedulablePods holds pods that were already attempted for scheduling and
//     are currently determined to be unschedulable.
type PriorityQueue struct {*nominatorstop  chan struct{}clock clock.Clock// pod initial backoff duration.podInitialBackoffDuration time.Duration// pod maximum backoff duration.podMaxBackoffDuration time.Duration// the maximum time a pod can stay in the unschedulablePods.podMaxInUnschedulablePodsDuration time.Durationcond sync.Cond// activeQ is heap structure that scheduler actively looks at to find pods to// schedule. Head of heap is the highest priority pod.activeQ *heap.Heap// podBackoffQ is a heap ordered by backoff expiry. Pods which have completed backoff// are popped from this heap before the scheduler looks at activeQpodBackoffQ *heap.Heap// unschedulablePods holds pods that have been tried and determined unschedulable.unschedulablePods *UnschedulablePods// schedulingCycle represents sequence number of scheduling cycle and is incremented// when a pod is popped.schedulingCycle int64// moveRequestCycle caches the sequence number of scheduling cycle when we// received a move request. Unschedulable pods in and before this scheduling// cycle will be put back to activeQueue if we were trying to schedule them// when we received move request.moveRequestCycle int64clusterEventMap map[framework.ClusterEvent]sets.String// preEnqueuePluginMap is keyed with profile name, valued with registered preEnqueue plugins.preEnqueuePluginMap map[string][]framework.PreEnqueuePlugin// closed indicates that the queue is closed.// It is mainly used to let Pop() exit its control loop while waiting for an item.closed boolnsLister listersv1.NamespaceListermetricsRecorder metrics.MetricAsyncRecorder// pluginMetricsSamplePercent is the percentage of plugin metrics to be sampled.pluginMetricsSamplePercent int
}

pkg/scheduler/internal/queue/scheduling_queue.go:291 中给出了生成了该队列的初始化方法

// NewPriorityQueue creates a PriorityQueue object.
func NewPriorityQueue(lessFn framework.LessFunc,informerFactory informers.SharedInformerFactory,opts ...Option,
) *PriorityQueue {options := defaultPriorityQueueOptionsif options.podLister == nil {options.podLister = informerFactory.Core().V1().Pods().Lister()}for _, opt := range opts {opt(&options)}comp := func(podInfo1, podInfo2 interface{}) bool {pInfo1 := podInfo1.(*framework.QueuedPodInfo)pInfo2 := podInfo2.(*framework.QueuedPodInfo)return lessFn(pInfo1, pInfo2)}pq := &PriorityQueue{nominator:                         newPodNominator(options.podLister),clock:                             options.clock,stop:                              make(chan struct{}),podInitialBackoffDuration:         options.podInitialBackoffDuration,podMaxBackoffDuration:             options.podMaxBackoffDuration,podMaxInUnschedulablePodsDuration: options.podMaxInUnschedulablePodsDuration,activeQ:                           heap.NewWithRecorder(podInfoKeyFunc, comp, metrics.NewActivePodsRecorder()),unschedulablePods:                 newUnschedulablePods(metrics.NewUnschedulablePodsRecorder(), metrics.NewGatedPodsRecorder()),moveRequestCycle:                  -1,clusterEventMap:                   options.clusterEventMap,preEnqueuePluginMap:               options.preEnqueuePluginMap,metricsRecorder:                   options.metricsRecorder,pluginMetricsSamplePercent:        options.pluginMetricsSamplePercent,}pq.cond.L = &pq.lockpq.podBackoffQ = heap.NewWithRecorder(podInfoKeyFunc, pq.podsCompareBackoffCompleted, metrics.NewBackoffPodsRecorder())pq.nsLister = informerFactory.Core().V1().Namespaces().Lister()return pq
}

可以看到其包含了许多我们上面介绍的概念,包括activeQunschedulablePodspodBackoffQschedulingCyclemoveRequestCycle

QueuedPodInfo元素介绍

这里也多次出现了QueuedPodInfo这个关键的数据结构,它是Pod中的基础元素,在此进行介绍,其定义在pkg/scheduler/framework/types.go:98 中,包括了PodInfo、添加时间、尝试次数等

// QueuedPodInfo is a Pod wrapper with additional information related to
// the pod's status in the scheduling queue, such as the timestamp when
// it's added to the queue.
type QueuedPodInfo struct {*PodInfo// The time pod added to the scheduling queue.Timestamp time.Time// Number of schedule attempts before successfully scheduled.// It's used to record the # attempts metric.Attempts int// The time when the pod is added to the queue for the first time. The pod may be added// back to the queue multiple times before it's successfully scheduled.// It shouldn't be updated once initialized. It's used to record the e2e scheduling// latency for a pod.InitialAttemptTimestamp time.Time// If a Pod failed in a scheduling cycle, record the plugin names it failed by.UnschedulablePlugins sets.String// Whether the Pod is scheduling gated (by PreEnqueuePlugins) or not.Gated bool
}

PodInfo的定义在pkg/scheduler/framework/types.go:131

// PodInfo is a wrapper to a Pod with additional pre-computed information to
// accelerate processing. This information is typically immutable (e.g., pre-processed
// inter-pod affinity selectors).
type PodInfo struct {Pod                        *v1.PodRequiredAffinityTerms      []AffinityTermRequiredAntiAffinityTerms  []AffinityTermPreferredAffinityTerms     []WeightedAffinityTermPreferredAntiAffinityTerms []WeightedAffinityTerm
}

Pod的定义在staging/src/k8s.io/api/core/v1/types.go:4202

// Pod is a collection of containers that can run on a host. This resource is created
// by clients and scheduled onto hosts.
type Pod struct {metav1.TypeMeta `json:",inline"`// Standard object's metadata.// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata// +optionalmetav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`// Specification of the desired behavior of the pod.// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status// +optionalSpec PodSpec `json:"spec,omitempty" protobuf:"bytes,2,opt,name=spec"`// Most recently observed status of the pod.// This data may not be up to date.// Populated by the system.// Read-only.// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status// +optionalStatus PodStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
}

ActiveQ源代码介绍

从初始化代码中可以看到ActiveQ是一个heap,其相关定义在pkg/scheduler/internal/heap/heap.go

// Heap is a producer/consumer queue that implements a heap data structure.
// It can be used to implement priority queues and similar data structures.
type Heap struct {// data stores objects and has a queue that keeps their ordering according// to the heap invariant.data *data// metricRecorder updates the counter when elements of a heap get added or// removed, and it does nothing if it's nilmetricRecorder metrics.MetricRecorder
}
// data is an internal struct that implements the standard heap interface
// and keeps the data stored in the heap.
type data struct {// items is a map from key of the objects to the objects and their index.// We depend on the property that items in the map are in the queue and vice versa.items map[string]*heapItem// queue implements a heap data structure and keeps the order of elements// according to the heap invariant. The queue keeps the keys of objects stored// in "items".queue []string// keyFunc is used to make the key used for queued item insertion and retrieval, and// should be deterministic.keyFunc KeyFunc// lessFunc is used to compare two objects in the heap.lessFunc lessFunc
}

可以看到他是用queue实现了一个heap。

ActiveQ的默认排序代码在pkg/scheduler/framework/plugins/queuesort/priority_sort.go:42中,即按优先级进行排序,如果优先级相同就提交时间的早晚进行排序。

// Less is the function used by the activeQ heap algorithm to sort pods.
// It sorts pods based on their priority. When priorities are equal, it uses
// PodQueueInfo.timestamp.
func (pl *PrioritySort) Less(pInfo1, pInfo2 *framework.QueuedPodInfo) bool {p1 := corev1helpers.PodPriority(pInfo1.Pod)p2 := corev1helpers.PodPriority(pInfo2.Pod)return (p1 > p2) || (p1 == p2 && pInfo1.Timestamp.Before(pInfo2.Timestamp))
}

UnschedulableQ源代码介绍

UnschedulableQ进行初始化的具体代码在pkg/scheduler/internal/queue/scheduling_queue.go:998

// newUnschedulablePods initializes a new object of UnschedulablePods.
func newUnschedulablePods(unschedulableRecorder, gatedRecorder metrics.MetricRecorder) *UnschedulablePods {return &UnschedulablePods{podInfoMap:            make(map[string]*framework.QueuedPodInfo),keyFunc:               util.GetPodFullName,unschedulableRecorder: unschedulableRecorder,gatedRecorder:         gatedRecorder,}
}

其具体的定义代码在pkg/scheduler/internal/queue/scheduling_queue.go:939 ,

// UnschedulablePods holds pods that cannot be scheduled. This data structure
// is used to implement unschedulablePods.
type UnschedulablePods struct {// podInfoMap is a map key by a pod's full-name and the value is a pointer to the QueuedPodInfo.podInfoMap map[string]*framework.QueuedPodInfokeyFunc    func(*v1.Pod) string// unschedulableRecorder/gatedRecorder updates the counter when elements of an unschedulablePodsMap// get added or removed, and it does nothing if it's nil.unschedulableRecorder, gatedRecorder metrics.MetricRecorder
}

可以看到他没有进行heap的包装,而是直接采用Map结构进行保存。

BackoffQ源代码介绍

BackoffQ也是一个heap,与ActiveQ不同的一点在于排序函数不同,其排序函数的定义在pkg/scheduler/internal/queue/scheduling_queue.go:888

func (p *PriorityQueue) podsCompareBackoffCompleted(podInfo1, podInfo2 interface{}) bool {pInfo1 := podInfo1.(*framework.QueuedPodInfo)pInfo2 := podInfo2.(*framework.QueuedPodInfo)bo1 := p.getBackoffTime(pInfo1)bo2 := p.getBackoffTime(pInfo2)return bo1.Before(bo2)
}

getBackoffTime的定义在pkg/scheduler/internal/queue/scheduling_queue.go:911中,即计算完成避让的时间

// getBackoffTime returns the time that podInfo completes backoff
func (p *PriorityQueue) getBackoffTime(podInfo *framework.QueuedPodInfo) time.Time {duration := p.calculateBackoffDuration(podInfo)backoffTime := podInfo.Timestamp.Add(duration)return backoffTime
}

可以看到队列排序时会将完成避让最早的pod放在前面。

然后再看其是如何计算避让时间的,在pkg/scheduler/internal/queue/scheduling_queue.go

// calculateBackoffDuration is a helper function for calculating the backoffDuration
// based on the number of attempts the pod has made.
func (p *PriorityQueue) calculateBackoffDuration(podInfo *framework.QueuedPodInfo) time.Duration {duration := p.podInitialBackoffDurationfor i := 1; i < podInfo.Attempts; i++ {// Use subtraction instead of addition or multiplication to avoid overflow.if duration > p.podMaxBackoffDuration-duration {return p.podMaxBackoffDuration}duration += duration}return duration
}

其计算可以理解为初次为p.podInitialBackoffDuration,每次需要的避让时间都是前一次的两倍,如果计算得到的避让时间大于p.podMaxBackoffDuration/2 ,就将避让时间设置为p.podMaxBackoffDuration

队列弹出待调度的Pod

其代码在pkg/scheduler/internal/queue/scheduling_queue.go:593

// Pop removes the head of the active queue and returns it. It blocks if the
// activeQ is empty and waits until a new item is added to the queue. It
// increments scheduling cycle when a pod is popped.
func (p *PriorityQueue) Pop() (*framework.QueuedPodInfo, error) {p.lock.Lock()defer p.lock.Unlock()for p.activeQ.Len() == 0 {// When the queue is empty, invocation of Pop() is blocked until new item is enqueued.// When Close() is called, the p.closed is set and the condition is broadcast,// which causes this loop to continue and return from the Pop().if p.closed {return nil, fmt.Errorf(queueClosed)}p.cond.Wait()}obj, err := p.activeQ.Pop()if err != nil {return nil, err}pInfo := obj.(*framework.QueuedPodInfo)pInfo.Attempts++p.schedulingCycle++return pInfo, nil
}

可以看到如果activeQ中没有需要调度的Pod了那么就会使用p.cond.Wait来进行等待,否则就冲activeQ中Pop一个元素QueuedPodInfo,同时这个QueuedPodInfo 的Attempts会+1,整个队列中的schedulingCycle也会增加。

队列增加新的待调度的Pod

其代码在pkg/scheduler/internal/queue/scheduling_queue.go:398

// Add adds a pod to the active queue. It should be called only when a new pod
// is added so there is no chance the pod is already in active/unschedulable/backoff queues
func (p *PriorityQueue) Add(pod *v1.Pod) error {p.lock.Lock()defer p.lock.Unlock()pInfo := p.newQueuedPodInfo(pod)gated := pInfo.Gatedif added, err := p.addToActiveQ(pInfo); !added {return err}if p.unschedulablePods.get(pod) != nil {klog.ErrorS(nil, "Error: pod is already in the unschedulable queue", "pod", klog.KObj(pod))p.unschedulablePods.delete(pod, gated)}// Delete pod from backoffQ if it is backing offif err := p.podBackoffQ.Delete(pInfo); err == nil {klog.ErrorS(nil, "Error: pod is already in the podBackoff queue", "pod", klog.KObj(pod))}klog.V(5).InfoS("Pod moved to an internal scheduling queue", "pod", klog.KObj(pod), "event", PodAdd, "queue", activeQName)metrics.SchedulerQueueIncomingPods.WithLabelValues("active", PodAdd).Inc()p.addNominatedPodUnlocked(pInfo.PodInfo, nil)p.cond.Broadcast()return nil
}

主要是将要加入的pod转化为QueuedPodInfo类型,然后添加到activeQ队列中,还需要检查其他队列中是否有这个pod,如果有就删除,同时做一些日志相关记录,然后还会调用p.cond.Broadcast()来解除上述提到的p.cond.Wait 的等待。

pod调度失败返回队列的处理

当Pod调度失败后,会调用来AddUnschedulableIfNotPresent函数来进行处理,其代码位置在pkg/scheduler/internal/queue/scheduling_queue.go 中。

// AddUnschedulableIfNotPresent inserts a pod that cannot be scheduled into
// the queue, unless it is already in the queue. Normally, PriorityQueue puts
// unschedulable pods in `unschedulablePods`. But if there has been a recent move
// request, then the pod is put in `podBackoffQ`.
func (p *PriorityQueue) AddUnschedulableIfNotPresent(pInfo *framework.QueuedPodInfo, podSchedulingCycle int64) error {p.lock.Lock()defer p.lock.Unlock()pod := pInfo.Podif p.unschedulablePods.get(pod) != nil {return fmt.Errorf("Pod %v is already present in unschedulable queue", klog.KObj(pod))}if _, exists, _ := p.activeQ.Get(pInfo); exists {return fmt.Errorf("Pod %v is already present in the active queue", klog.KObj(pod))}if _, exists, _ := p.podBackoffQ.Get(pInfo); exists {return fmt.Errorf("Pod %v is already present in the backoff queue", klog.KObj(pod))}// Refresh the timestamp since the pod is re-added.pInfo.Timestamp = p.clock.Now()// If a move request has been received, move it to the BackoffQ, otherwise move// it to unschedulablePods.for plugin := range pInfo.UnschedulablePlugins {metrics.UnschedulableReason(plugin, pInfo.Pod.Spec.SchedulerName).Inc()}if p.moveRequestCycle >= podSchedulingCycle {if err := p.podBackoffQ.Add(pInfo); err != nil {return fmt.Errorf("error adding pod %v to the backoff queue: %v", klog.KObj(pod), err)}klog.V(5).InfoS("Pod moved to an internal scheduling queue", "pod", klog.KObj(pod), "event", ScheduleAttemptFailure, "queue", backoffQName)metrics.SchedulerQueueIncomingPods.WithLabelValues("backoff", ScheduleAttemptFailure).Inc()} else {p.unschedulablePods.addOrUpdate(pInfo)klog.V(5).InfoS("Pod moved to an internal scheduling queue", "pod", klog.KObj(pod), "event", ScheduleAttemptFailure, "queue", unschedulablePods)metrics.SchedulerQueueIncomingPods.WithLabelValues("unschedulable", ScheduleAttemptFailure).Inc()}p.addNominatedPodUnlocked(pInfo.PodInfo, nil)return nil
}

这里首先检查了其他队列中是否含有该pod,如果有就返回错误,然后比较moveRequestCyclepodSchedulingCycle ,如果p.moveRequestCycle >= podSchedulingCycle 那就说明在刚刚调度这个pod的时候集群发生了变化,可能现在可以成功调度这个pod了,将其转入backoffQ中,不然就正常加入unschedulableQ中。

flushBackoffQCompleted

在队列运行时会初始化两个go协程,来分别不停检查backoffQunschedulableQ,以及时将相关的Pod移出。代码在pkg/scheduler/internal/queue/scheduling_queue.go:333

// Run starts the goroutine to pump from podBackoffQ to activeQ
func (p *PriorityQueue) Run() {go wait.Until(p.flushBackoffQCompleted, 1.0*time.Second, p.stop)go wait.Until(p.flushUnschedulablePodsLeftover, 30*time.Second, p.stop)
}

对于flushBackoffQCompleted即是每1s运行一次,直到接收到p.stop信息。对flushBackoffQCompleted 函数的具体定义在pkg/scheduler/internal/queue/scheduling_queue.go:537中,如下

// flushBackoffQCompleted Moves all pods from backoffQ which have completed backoff in to activeQ
func (p *PriorityQueue) flushBackoffQCompleted() {p.lock.Lock()defer p.lock.Unlock()activated := falsefor {rawPodInfo := p.podBackoffQ.Peek()if rawPodInfo == nil {break}pInfo := rawPodInfo.(*framework.QueuedPodInfo)pod := pInfo.Podif p.isPodBackingoff(pInfo) {break}_, err := p.podBackoffQ.Pop()if err != nil {klog.ErrorS(err, "Unable to pop pod from backoff queue despite backoff completion", "pod", klog.KObj(pod))break}if err := p.activeQ.Add(pInfo); err != nil {klog.ErrorS(err, "Error adding pod to the active queue", "pod", klog.KObj(pInfo.Pod))} else {klog.V(5).InfoS("Pod moved to an internal scheduling queue", "pod", klog.KObj(pod), "event", BackoffComplete, "queue", activeQName)metrics.SchedulerQueueIncomingPods.WithLabelValues("active", BackoffComplete).Inc()activated = true}}if activated {p.cond.Broadcast()}
}

其主要内容就是从backOffQ的首个元素开始查看,检查器是否已经过了避让时间,如果过了就将其放入到activeQ队列中,直到首个元素没有达到避让时间或者队列为空。

flushUnschedulablePodsLeftover

flushUnschedulablePodsLeftover每30s运行一次,这部分的代码在pkg/scheduler/internal/queue/scheduling_queue.go:572中,如下

// flushUnschedulablePodsLeftover moves pods which stay in unschedulablePods
// longer than podMaxInUnschedulablePodsDuration to backoffQ or activeQ.
func (p *PriorityQueue) flushUnschedulablePodsLeftover() {p.lock.Lock()defer p.lock.Unlock()var podsToMove []*framework.QueuedPodInfocurrentTime := p.clock.Now()for _, pInfo := range p.unschedulablePods.podInfoMap {lastScheduleTime := pInfo.Timestampif currentTime.Sub(lastScheduleTime) > p.podMaxInUnschedulablePodsDuration {podsToMove = append(podsToMove, pInfo)}}if len(podsToMove) > 0 {p.movePodsToActiveOrBackoffQueue(podsToMove, UnschedulableTimeout)}
}

可以看到其主要作用是遍历所有的pod,如果其在unschedulableQ中呆的时间如果超过了最大的p.podMaxInUnschedulablePodsDuration时间,就会将其移出去,至于是移动到activeQ中还是移动到backoffQ中,取决于movePodsToActiveOrBackoffQueue函数,在pkg/scheduler/internal/queue/scheduling_queue.go:771中,如下

// NOTE: this function assumes lock has been acquired in caller
func (p *PriorityQueue) movePodsToActiveOrBackoffQueue(podInfoList []*framework.QueuedPodInfo, event framework.ClusterEvent) {activated := falsefor _, pInfo := range podInfoList {// If the event doesn't help making the Pod schedulable, continue.// Note: we don't run the check if pInfo.UnschedulablePlugins is nil, which denotes// either there is some abnormal error, or scheduling the pod failed by plugins other than PreFilter, Filter and Permit.// In that case, it's desired to move it anyways.if len(pInfo.UnschedulablePlugins) != 0 && !p.podMatchesEvent(pInfo, event) {continue}pod := pInfo.Podif p.isPodBackingoff(pInfo) {if err := p.podBackoffQ.Add(pInfo); err != nil {klog.ErrorS(err, "Error adding pod to the backoff queue", "pod", klog.KObj(pod))} else {klog.V(5).InfoS("Pod moved to an internal scheduling queue", "pod", klog.KObj(pInfo.Pod), "event", event, "queue", backoffQName)metrics.SchedulerQueueIncomingPods.WithLabelValues("backoff", event.Label).Inc()p.unschedulablePods.delete(pod, pInfo.Gated)}} else {gated := pInfo.Gatedif added, _ := p.addToActiveQ(pInfo); added {klog.V(5).InfoS("Pod moved to an internal scheduling queue", "pod", klog.KObj(pInfo.Pod), "event", event, "queue", activeQName)activated = truemetrics.SchedulerQueueIncomingPods.WithLabelValues("active", event.Label).Inc()p.unschedulablePods.delete(pod, gated)}}}p.moveRequestCycle = p.schedulingCycleif activated {p.cond.Broadcast()}
}

注意这个函数不仅仅是在flushUnschedulablePodsLeftover中被调用,还会在处理其他移动请求时触发,只不过这里的移动请求是UnschedulableTimeout ,判断到底是如何移动也很容易从代码中看出,如果已经到达了避让时间,就加入到activeQ中,如果没有就加入到backoffQ中,注意到如果有移动进activeQ中,也是需要执行p.cond.Broadcast(),同时注意到这里更新了moveRequestCycleschedulingCycle,这也是其统一更新moveRequestCycle 的地方。

调度队列总结

考虑到调度队列的细节,我们可以用下图来对其进行归纳回顾。

请添加图片描述

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.rhkb.cn/news/325613.html

如若内容造成侵权/违法违规/事实不符,请联系长河编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

cmd输入mysql -u root -p无法启动

问题分析&#xff1a;cmd输入mysql -u root -p无法启动 解决方法&#xff1a;配置系统环境变量 1.找到mysql安装文件下的bin文件&#xff1a;&#xff08;复制改文件地址,如下图所示&#xff09; 2.电脑桌面下方直接搜索环境变量并进入&#xff0c;如下图 3.点击环境变量&a…

Python 中的 Lambda 函数:简单、快速、高效

大家好&#xff0c;今天再给大家介绍一个python的一个强大工具Lambda 函数&#xff0c;它允许你快速定义简单的匿名函数。这种函数是“匿名的”&#xff0c;因为它们不需要像常规函数那样被明确命名。 在本文中&#xff0c;我们将通过清晰的解释和实用的示例&#xff0c;深入了…

GoF之代理模式(静态代理+动态代理(JDK动态代理+CGLIB动态代理带有一步一步详细步骤))

1. GoF之代理模式&#xff08;静态代理动态代理(JDK动态代理CGLIB动态代理带有一步一步详细步骤)&#xff09; 文章目录 1. GoF之代理模式&#xff08;静态代理动态代理(JDK动态代理CGLIB动态代理带有一步一步详细步骤)&#xff09;每博一文案2. 代理模式的理解3. 静态代理4. 动…

打印图形(C语言)

一、N-S流程图&#xff1b; 二、运行结果&#xff1b; 三、源代码&#xff1b; # define _CRT_SECURE_NO_WARNINGS # include <stdio.h>int main() {//初始化变量值&#xff1b;int i, j;//循环打印&#xff1b;for (i 0; i < 5; i){//列&#xff1b;for (j 0; j &…

网络端口占用问题的综合调研与解决方案

原创 Randy 拍码场 问题背景 去年底信息安全团队进行网络权限治理&#xff0c;要求所有应用实例使用静态IP&#xff0c;公网访问策略与静态IP绑定&#xff1b;之后实例重启时偶现“端口被占用”错误。通过分析总结应用日志&#xff0c;共有以下4种错误类型&#xff0c;实质都是…

1-02-02:虚拟化与容器化Docker环境搭建

1.02.02 虚拟化与容器化Docker环境搭建 一. 虚拟化与容器化技术简介1. 虚拟机环境2. docker环境 二. Docker 架构与隔离机制2.1 Docker 架构2.2 Docker 隔离机制2.3 资源限制2.4 Docker应用场景 三. 实战:Docker在Centos7安装与镜像加速 ❤❤❤3.1 docker安装3.2 设置镜像加速 …

AI回答总不满意?你的提问方式可能完全错误!

大家好&#xff0c;我是卷福同学&#xff0c;一个专注AI大模型整活的前阿里程序员&#xff0c;腾讯云社区2023新秀突破作者 向AI提问想写一篇论文&#xff0c;结果AI就生成2000字左右的文章后就完了。小伙伴们是不是也会遇到这类情况呢。今天来教大家AI提示词的技巧&#xff0c…

Kubernetes基础理论介绍

前言 随着企业数字化转型的深入&#xff0c;为云而生的云原生架构和思想已被大量企业所接受。容器云、微服务、DevOps、 Serverless 已成为企业落地云原生的关键技术&#xff0c;而 Kubernetes 作为容器云的核心基础和事实标准&#xff0c;已成为当今互联网企业和传统 IT 企业…

DHCP原理

什么是DHCP DHCP (Dynamic Host Configuration Protocol,动态主机配置协议&#xff09;是由Internet工作任务小组设计开发的&#xff0c;专门用于为TCP/IP网络中的计算机自动分配TCP/IP参数的协议&#xff0c;是一个应用层协议&#xff0c;使用UDP的67和68端口。 DHCP的前身是B…

发布GPT-5的方式可能会与以往不同;开源vocode使用 AI 自动拨打电话;开源gpt智能对话客服工具;AI自动写提示词

✨ 1: vocode 用AI通过声音与用户进行实时交流 Vocode是一个旨在帮助开发者快速构建基于声音的大型语言模型&#xff08;LLM&#xff09;应用程序的开源库。简单来说&#xff0c;如果你想要开发一个能够通过声音与用户进行实时交流的应用&#xff0c;比如电话机器人、语音助手…

一套MySQL读写分离分库分表的架构,被秀到了!

&#x1f4e2;&#x1f4e2;&#x1f4e2;&#x1f4e3;&#x1f4e3;&#x1f4e3; 作者&#xff1a;IT邦德 中国DBA联盟(ACDU)成员&#xff0c;10余年DBA工作经验&#xff0c; Oracle、PostgreSQL ACE CSDN博客专家及B站知名UP主&#xff0c;全网粉丝10万 擅长主流Oracle、My…

FFmpeg 音视频处理工具三剑客(ffmpeg、ffprobe、ffplay)

【导读】FFmpeg 是一个完整的跨平台音视频解决方案&#xff0c;它可以用于音频和视频的转码、转封装、转推流、录制、流化处理等应用场景。FFmpeg 在音视频领域享有盛誉&#xff0c;号称音视频界的瑞士军刀。同时&#xff0c;FFmpeg 有三大利器是我们应该清楚的&#xff0c;它们…

HNU-操作系统OS-2024期中考试

前言 该卷为22计科/智能OS期中考卷。 感谢智能22毕宿同学记忆了考卷考题。 同学评价&#xff1a;总体简单&#xff1b;第1&#xff0c;7概念题较难需要看书&#xff1b;第4&#xff0c;5题原题。 欢迎同学分享答案。 【1】共10分 操作系统的设计目标有哪些&#xff1f; 【…

设计模式之拦截过滤器模式

想象一下&#xff0c;在你的Java应用里&#xff0c;每个请求就像一场冒险旅程&#xff0c;途中需要经过层层安检和特殊处理。这时候&#xff0c;拦截过滤器模式就化身为你最可靠的特工团队&#xff0c;悄无声息地为每一个请求保驾护航&#xff0c;确保它们安全、高效地到达目的…

Garden Planner for Mac v3.8.62注册激活版:园林绿化设计软件

Garden Planner for Mac是一款专为苹果Mac OS平台设计的园林景观设计软件。这款软件的主要功能是帮助用户设计梦想中的花园&#xff0c;包括安排植物、树木、建筑物和其他物体。 Garden Planner for Mac提供了一个包含1200多种植物和物体符号的库&#xff0c;这些符号都可以进行…

贪吃蛇(c实现)

目录 游戏说明&#xff1a; 第一个是又是封面&#xff0c;第二个为提示信息&#xff0c;第三个是游戏运行界面 游戏效果展示&#xff1a; 游戏代码展示&#xff1a; snack.c test.c snack.h 控制台程序的准备&#xff1a; 控制台程序名字修改&#xff1a; 参考&#xff1a…

【Android】Kotlin学习之Kotlin方法的声明和传参

方法声明 普通类的方法 静态类的方法 不需要构建实例对象, 可以通过类名直接访问静态方法 : NumUtil.double(1) companion object 伴生类的方法 使用companion object 在普通类里定义静态方法 参数 括号内传入方法 : 当参数是方法时, 并且是最后一个参数 , 可以使用括号外…

《二十二》Qt 音频编程实战---做一个音频播放器

1.UI界面制作 作为一个音乐播放器&#xff0c;最基础的肯定就是播放、暂停、上一首以及下一首&#xff0c;为了使这个界面好看一点&#xff0c;还加入了音量控制、进度条、歌曲列表等内容&#xff0c;至于这种配色和效果好不好看&#xff0c;我也不知道&#xff0c;个人审美一如…

腾讯云服务器之ssh远程连接登录及转发映射端口实现内网穿透(实现服务器访问本地电脑端口)

目录 一、创建密钥绑定实例二、设置私钥权限三、ssh远程连接到服务器四、修改root密码五、端口转发&#xff08;实现服务器访问本地电脑的端口&#xff09; 一、创建密钥绑定实例 创建密钥会自动下载一个私钥&#xff0c;把这个私钥复制到c盘 二、设置私钥权限 1、删除所有用户…

电商核心技术揭秘55:社群与粉丝经济的结合

相关系列文章 电商技术揭秘相关系列文章合集&#xff08;1&#xff09; 电商技术揭秘相关系列文章合集&#xff08;2&#xff09; 电商技术揭秘相关系列文章合集&#xff08;3&#xff09; 电商技术揭秘四十一&#xff1a;电商平台的营销系统浅析 电商技术揭秘四十二&#…