Allow airflow.providers to be installed in multiple python folders #10806
Ah, so you found a way :). Good. I will test it tomorrow.
Hey @ash - I cancelled it due to the "watching the watchers" problem, so it's best you rebase it :)
63f4eb4 to cf60c58
👍 Done.
Ugh, mypy is seriously unhappy about this, even if we enable "support" with "--namespace-packages": python/mypy#5759 (comment)
DAMN IT.
Oh, it seems we can fix it by using |
df5bb55 to 41843cf
Code on this seems fine, but possibly some more tidying up to get the CI passing:
Question: do we actually want this eventually? I know that from a pure Python installation "versatility" point of view it might be a good idea, but I can hardly imagine it causing a serious problem if we do not support "multi-source" installation. And if we allow providers to be installed from multiple folders, that potentially opens the door to more trouble. In #10822 I currently have a solution, highly optimised for speed, that discovers provider-specific code based on the fact that there is a single providers "path". It can likely be improved, but that will come with potential speed penalties (and I can hardly imagine someone making a real production installation that would require this versatility). If we just limit it to the same "installation location", then anyone installing 2.0 will find out pretty quickly that they have to install providers in the same place where they installed Airflow, and that seems like a pretty natural thing to do. I am not against fixing it, but seeing how many tools have problems with this, I think it might eventually bring us more problems than the (simple to detect and solve) problem we are trying to solve here. Just a thought, @ashb and others.
My quick thoughts are:
Import time for connections/plugins isn't a huge concern so long as we aren't talking seconds of startup, and being able to support a system-wide install with extra user-level modules installed is a plus for some of our (astro's) users. So I think it remains to be seen how many tools/special cases we need to handle to make this work.
@potiuk Good news -- your solution in #10822 already Just Works with the change proposed here:
>>> list(pkgutil.walk_packages(path=airflow.providers.__path__, prefix=airflow.providers.__name__ + '.'))
[ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers'), name='airflow.providers.redis', ispkg=True),
ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers/redis'), name='airflow.providers.redis.hooks', ispkg=True),
ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers/redis/hooks'), name='airflow.providers.redis.hooks.redis', ispkg=False),
ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers/redis'), name='airflow.providers.redis.operators', ispkg=True),
ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers/redis/operators'), name='airflow.providers.redis.operators.redis_publish', ispkg=False),
ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers/redis'), name='airflow.providers.redis.sensors', ispkg=True),
ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers/redis/sensors'), name='airflow.providers.redis.sensors.redis_key', ispkg=False),
ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers/redis/sensors'), name='airflow.providers.redis.sensors.redis_pub_sub', ispkg=False),
ModuleInfo(module_finder=FileFinder('/root/.local/lib/python3.7/site-packages/airflow/providers'), name='airflow.providers.zendesk', ispkg=True),
ModuleInfo(module_finder=FileFinder('/root/.local/lib/python3.7/site-packages/airflow/providers/zendesk'), name='airflow.providers.zendesk.hooks', ispkg=True),
ModuleInfo(module_finder=FileFinder('/root/.local/lib/python3.7/site-packages/airflow/providers/zendesk/hooks'), name='airflow.providers.zendesk.hooks.zendesk', ispkg=False)]
Cool :). Yeah, I saw that paths can be a list, but I had not realized that 'airflow.providers.__path__' already returns a list. This is cool and I am far less concerned now, because it does seem to be well baked into the Python ecosystem. Thanks for checking! 👍
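To make the exchange above concrete, here is a minimal, self-contained sketch of `pkgutil.walk_packages` running over a `__path__` that spans two install locations. It does not use Airflow: the package name `nsdemo_walk`, the two temporary "site" directories and the provider names are assumptions made up for illustration, and the package here is a pure implicit namespace package (no `__init__.py` at the top level), which is simpler than Airflow's real layout.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def make_provider(site_dir: Path, provider: str) -> None:
    """Create <site_dir>/nsdemo_walk/providers/<provider>/ with one module."""
    pkg_dir = site_dir / "nsdemo_walk" / "providers" / provider
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "hooks.py").write_text("")


# Two separate install locations (hypothetical stand-ins for the system
# site-packages and the per-user site-packages directories).
system_site = Path(tempfile.mkdtemp(prefix="system_site_"))
user_site = Path(tempfile.mkdtemp(prefix="user_site_"))
make_provider(system_site, "redis")
make_provider(user_site, "zendesk")

sys.path[:0] = [str(system_site), str(user_site)]
importlib.invalidate_caches()

providers = importlib.import_module("nsdemo_walk.providers")

# __path__ holds one entry per location that contributes to the package.
search_path = list(providers.__path__)

# walk_packages accepts that multi-entry path and visits every location.
found = [
    info.name
    for info in pkgutil.walk_packages(
        path=providers.__path__, prefix=providers.__name__ + "."
    )
]
print(search_path)
print(found)
```

The point of the sketch is only that discovery code written against `__path__` does not need to know how many directories back the package.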
Great work @ashb! Thanks! This needs a rebase and the problems fixed, and then we can go ahead with it!
@potiuk Any tips for debugging this test failure? Running the script locally via ./breeze passes, so this is probably an edge case caused by an environment difference between breeze (which has full sources available) and how this check is run in CI.
39c4e71 to deda0d3
I think this might be another manifestation of #10471 - a rebase should fix it, if that's the case.
480884b to ac8a2ec
The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.
ac8a2ec to 6361b75
Change apache#10806 made Airflow work with implicit packages when "airflow" is imported. This is a good change, but it has an unforeseen consequence. The 'provider_packages' scripts copy all the providers' code for backports into the empty "airflow" directory in the provider_packages folder in order to refactor it. The apache#10806 change turned that empty folder into an 'airflow' package because it was in the same directory as the provider_packages scripts. Moving the scripts to dev solves this problem.
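The side effect described above can be reproduced in isolation. The sketch below assumes the mechanism is `pkgutil.extend_path` (the pkgutil facility the PR description points at); `nsdemo_shadow` and both temporary directories are hypothetical names. An empty directory that merely shares the package's name, sitting in the first `sys.path` entry as a script's own directory would, ends up as part of the package's `__path__`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: a real package in a "site" directory, plus an empty
# directory of the same name next to some scripts.
site_dir = Path(tempfile.mkdtemp(prefix="site_"))
scripts_dir = Path(tempfile.mkdtemp(prefix="scripts_"))

real_pkg = site_dir / "nsdemo_shadow"
real_pkg.mkdir()
(real_pkg / "__init__.py").write_text(
    "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"
)

staging_dir = scripts_dir / "nsdemo_shadow"  # empty, no __init__.py
staging_dir.mkdir()

# A script's directory is normally the first sys.path entry, so mimic that.
sys.path[:0] = [str(scripts_dir), str(site_dir)]
importlib.invalidate_caches()

pkg = importlib.import_module("nsdemo_shadow")
package_path = [Path(entry) for entry in pkg.__path__]

# The regular package wins the import, but extend_path then also appends the
# empty directory as an extra portion of the package.
print(package_path)
```

Anything later written into the empty directory would become importable as part of the package, which is consistent with the fix of moving the scripts elsewhere.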
This is a fix for a problem introduced in apache#10806. That change turned the provider packages into namespace packages, which made the find_packages function from setuptools ignore them. As a result, the production image that is built automatically and used by the Kubernetes tests did not have the provider packages installed. This PR fixes that and adds protection to the CI tests of the production image to make sure that provider packages are actually installed. Fixes apache#12150
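Why the provider packages disappeared can be shown without setuptools installed. `find_packages` only reports directories that contain an `__init__.py` and does not descend below a directory that lacks one; setuptools also ships `find_namespace_packages`, which drops that requirement. The function below is a stdlib-only imitation of that rule, not setuptools' own code, and the `demo_pkg` tree is a made-up example.

```python
import os
import tempfile
from pathlib import Path


def discover(where: str, require_init: bool) -> list:
    """Return dotted package names under `where`.

    With require_init=True a directory counts only if it has __init__.py,
    and nothing below a directory without one is visited.
    """
    found = []
    for root, dirs, _files in os.walk(where):
        kept = []
        for name in sorted(dirs):
            full = os.path.join(root, name)
            has_init = os.path.isfile(os.path.join(full, "__init__.py"))
            if "." in name or (require_init and not has_init):
                continue  # skipped, and its children are never visited
            found.append(os.path.relpath(full, where).replace(os.sep, "."))
            kept.append(name)
        dirs[:] = kept  # prune the walk in place
    return sorted(found)


# Hypothetical project: "providers" has no __init__.py (namespace package).
project = Path(tempfile.mkdtemp(prefix="project_"))
for rel in ("demo_pkg", "demo_pkg/utils", "demo_pkg/providers/redis"):
    (project / rel).mkdir(parents=True, exist_ok=True)
    (project / rel / "__init__.py").write_text("")

strict = discover(str(project), require_init=True)
loose = discover(str(project), require_init=False)
print(strict)
print(loose)
```

Under the strict rule everything below `demo_pkg/providers` is lost, including the `redis` package that does have its own `__init__.py`.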
This paves the way to a more seamless experience with AIP-8 coming.
For example, this allows some providers to be installed in site packages (/usr/local/python3.7/...) and others to be installed in the user folder (~/.local/lib/python3.7/...), with both being importable.
If we didn't have code in airflow/__init__.py this would be much easier to achieve (simply deleting the top-level init file would be enough), but sadly we can't take that route.
From the docs of pkgutil: https://docs.python.org/3/library/pkgutil.html#module-pkgutil
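As an illustration of the constraint described above, the following sketch shows how a package that must keep code in its `__init__.py` can still span several directories by using `pkgutil.extend_path`, and what happens without it. This is a sketch under assumptions, not the PR's diff: the package names `nsdemo_ext` and `nsdemo_plain`, the temporary directories and the provider names are all hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

EXTENDING_INIT = (
    "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"
    "VERSION = 'demo'  # stands in for real code living in __init__.py\n"
)
PLAIN_INIT = "VERSION = 'demo'\n"


def install_provider(site_dir: Path, package: str, provider: str) -> None:
    """Create <site_dir>/<package>/providers/<provider>/__init__.py."""
    provider_dir = site_dir / package / "providers" / provider
    provider_dir.mkdir(parents=True)
    (provider_dir / "__init__.py").write_text("")


def can_import(name: str) -> bool:
    try:
        importlib.import_module(name)
        return True
    except ModuleNotFoundError:
        return False


system_site = Path(tempfile.mkdtemp(prefix="system_site_"))
user_site = Path(tempfile.mkdtemp(prefix="user_site_"))

for package, init_source in (
    ("nsdemo_ext", EXTENDING_INIT),
    ("nsdemo_plain", PLAIN_INIT),
):
    # Core package plus one provider in the "system" location ...
    install_provider(system_site, package, "redis")
    (system_site / package / "__init__.py").write_text(init_source)
    # ... and a second provider alone in the "user" location.
    install_provider(user_site, package, "zendesk")

sys.path[:0] = [str(system_site), str(user_site)]
importlib.invalidate_caches()

results = {
    name: can_import(name)
    for name in (
        "nsdemo_ext.providers.redis",
        "nsdemo_ext.providers.zendesk",
        "nsdemo_plain.providers.redis",
        "nsdemo_plain.providers.zendesk",
    )
}
print(results)
```

Only the package whose `__init__.py` extends `__path__` can see the provider installed in the second location; the plain one is pinned to the directory it was first found in.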
Tested as follows: