Skip to content

scheduler gets stuck without a trace #7935

@dimberman

Description

@dimberman

Apache Airflow version:

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
    What happened:

The scheduler gets stuck without a trace or error. When this happens, the CPU usage of scheduler service is at 100%. No jobs get submitted and everything comes to a halt. Looks it goes into some kind of infinite loop.
The only way I could make it run again is by manually restarting the scheduler service. But again, after running some tasks it gets stuck. I've tried with both Celery and Local executors but same issue occurs. I am using the -n 3 parameter while starting scheduler.

Scheduler configs,
job_heartbeat_sec = 5
scheduler_heartbeat_sec = 5
executor = LocalExecutor
parallelism = 32

Please help. I would be happy to provide any other information needed

What you expected to happen:

How to reproduce it:

Anything else we need to know:

Moved here from https://issues.apache.org/jira/browse/AIRFLOW-401

Metadata

Metadata

Assignees

Labels

area:Schedulerincluding HA (high availability) schedulerkind:bugThis is a clearly a bug

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions