Skip to content

Allow airflow.providers to be installed in multiple python folders #10806

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 1, 2020

Conversation

ashb
Copy link
Member

@ashb ashb commented Sep 8, 2020

This paves the way to a more-seamless experience with AIP-8 coming

For example, this allows some providers to be installed in site packages (/usr/local/python3.7/...) and others to be installed in the user folder (~/.local/lib/python3.7/...) and both be importable.

If we didn't have code in airflow/__init__.py this would be much easier to achieve (we simply delete the top level init file would be enough) - but sadly we can't take that route.

From the docs of pkgutil: https://docs.python.org/3/library/pkgutil.html#module-pkgutil

This will add to the package’s path all subdirectories of
directories on sys.path named after the package. This is useful if one
wants to distribute different parts of a single logical package as
multiple directories.

Tested as follows:

# Edit setup.py to `exclude=airflow.providers*'`
$ pip install /wheels/apache_airflow-2.0.0.dev0-py3-none-any.whl

# Verify it wasn't istalled.
$ ls -ald $(python -c 'import os; print(os.path.dirname(__import__("airflow").__file__))')/providers
ls: cannot access '/usr/local/lib/python3.7/site-packages/airflow/providers': No such file or directory

# Install some provider packages.
$ pip install --constraint <(echo 'apache-airflow==2.0.0.dev0') apache-airflow-backport-providers-redis
$ pip install --user --constraint <(echo 'apache-airflow==2.0.0.dev0') apache-airflow-backport-providers-imap

# See where they are imported from
$ python -c 'import airflow.providers.imap, airflow.providers.redis; print(airflow.providers.imap.__file__); print(airflow.providers.redis.__file__)'
/root/.local/lib/python3.7/site-packages/airflow/providers/imap/__init__.py
/usr/local/lib/python3.7/site-packages/airflow/providers/redis/__init__.py

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@ashb ashb requested a review from potiuk September 8, 2020 16:48
@potiuk
Copy link
Member

potiuk commented Sep 8, 2020

Ah, so you found a way :). Good . I will test it tomorrow

@potiuk
Copy link
Member

potiuk commented Sep 8, 2020

Hey @ash - I cancelled it due to "watching the watchers problem" so it's best you rebase it :)

@ashb ashb force-pushed the implicit-namespaces-for-providers branch from 63f4eb4 to cf60c58 Compare September 8, 2020 21:23
@ashb
Copy link
Member Author

ashb commented Sep 8, 2020

Hey @ash - I cancelled it due to "watching the watchers problem" so it's best you rebase it :)

👍 Done.

@ashb
Copy link
Member Author

ashb commented Sep 9, 2020

Ugh, mypy is seriously unhappy about this, even if we enable "support" with "--namespace-packages" python/mypy#5759 (comment)

airflow ❯ mypy --namespace-packages airflow/providers/docker/hooks/docker.py
airflow/providers/docker/hooks/docker.py:20: error: Module 'docker' has no attribute 'APIClient'
    from docker import APIClient
    ^
Found 1 error in 1 file (checked 1 source file)

~/code/airflow/airflow implicit-namespaces-for-providers*
airflow ❯ mypy --version
mypy 0.782

~/code/airflow/airflow implicit-namespaces-for-providers*
airflow ❯ touch airflow/providers/__init__.py

~/code/airflow/airflow implicit-namespaces-for-providers*
airflow ❯ mypy --namespace-packages airflow/providers/docker/hooks/docker.py
Success: no issues found in 1 source file

DAMN IT.

@ashb
Copy link
Member Author

ashb commented Sep 9, 2020

Oh, it seems we can fix it by using mypy -p <packages> instead of mypy <files>.

@ashb ashb force-pushed the implicit-namespaces-for-providers branch 4 times, most recently from df5bb55 to 41843cf Compare September 10, 2020 14:40
@ashb
Copy link
Member Author

ashb commented Sep 10, 2020

Code on this seems fine, but possibly some more tidying up to get the CI passing:

  • prepare backport is failing with (I think) diffs in the generated output that look unrelated
  • Docs were failing without the init.py -- hacked around it for now by creating (then removing) the file, but upgrading to 1.3.0 of autoapi might fix it better.
  • pylint didn't like it either (when running _everything) -- adding back the file for that seemed the easiest path.

@potiuk
Copy link
Member

potiuk commented Sep 10, 2020

Question - do we actually want it eventually ?

I know from pure python installation "versatilty" point of view it might be a good idea, but I hardly imagine this causing a serious problem if we do not support the "multi-source" installation. And even if we allow to install providers from multiple folders, that potentially opens the door for more troubles.

In the #10822 I currently have highly optimised (for speed) solution of discovering provider-specific code based on the fact that there is one provider's "path". It can likely be improved, but this will come with the potential penalties for speed (and I hardly imagine someone making a real production installation that would require this versatility.

If we just limit it to he same "installation location", then anyone installing 2.0 will find out pretty quickly that they have to install providers in the same place where they installed airflow and it seems like pretty natural thing to do.

I am not against fixing it, but seeing how many tools have problem with this, I think this might eventually bring us more problems than the (simple to detect and solve) problem we are trying to solve here.

Just a thought @ashb and others.

@ashb
Copy link
Member Author

ashb commented Sep 10, 2020

My quick thoughts are:

  • more tools are supporting it over time
  • this behaviour is supported/documented in python so a bug in tools that don't
  • the less special cases/departure from standard python semantics (re import paths of "plugins") the better to my mind.

Import time for connections/plugins isn't a huge concern so long as we aren't talking seconds of start up - and being able to support a system wide install with extra user level modules installed is a plus for some of our (astro's) users.

So I think it remains to see how many tools/special cases we need to have to make this work.

@ashb
Copy link
Member Author

ashb commented Sep 11, 2020

@potiuk Good news -- your solution in #10822 already Just Works with this change proposed:

>>> list(pkgutil.walk_packages(path=airflow.providers.__path__, prefix=airflow.providers.__name__ + '.'))
[ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers'), name='airflow.providers.redis', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers/redis'), name='airflow.providers.redis.hooks', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers/redis/hooks'), name='airflow.providers.redis.hooks.redis', ispkg=False),
 ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers/redis'), name='airflow.providers.redis.operators', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers/redis/operators'), name='airflow.providers.redis.operators.redis_publish', ispkg=False),
 ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers/redis'), name='airflow.providers.redis.sensors', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers/redis/sensors'), name='airflow.providers.redis.sensors.redis_key', ispkg=False),
 ModuleInfo(module_finder=FileFinder('/usr/local/lib/python3.7/site-packages/airflow/providers/redis/sensors'), name='airflow.providers.redis.sensors.redis_pub_sub', ispkg=False),
 ModuleInfo(module_finder=FileFinder('/root/.local/lib/python3.7/site-packages/airflow/providers'), name='airflow.providers.zendesk', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/root/.local/lib/python3.7/site-packages/airflow/providers/zendesk'), name='airflow.providers.zendesk.hooks', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/root/.local/lib/python3.7/site-packages/airflow/providers/zendesk/hooks'), name='airflow.providers.zendesk.hooks.zendesk', ispkg=False)]

@potiuk
Copy link
Member

potiuk commented Sep 11, 2020

Cool :). yeah. I saw that paths can be list but I have not realized 'airflow.providers.path' already returns a list. This is cool and I am far less concerned now because indeed it seems to be well baked into the python ecosystem.

Thanks for checking! 👍

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @ashb ! Thanks ! Need rebase and fixing the problems and we can go ahead with it !

@ashb
Copy link
Member Author

ashb commented Sep 14, 2020

@potiuk Any tips for debugging this test failure? Running the script locally via ./breeze passes, so this is probably an edge case somewhere in environment difference between breeze (which has full sources available) and how this check is run in CI.

@ashb ashb force-pushed the implicit-namespaces-for-providers branch 2 times, most recently from 39c4e71 to deda0d3 Compare September 14, 2020 12:23
@potiuk
Copy link
Member

potiuk commented Sep 14, 2020

@potiuk Any tips for debugging this test failure? Running the script locally via ./breeze passes, so this is probably an edge case somewhere in environment difference between breeze (which has full sources available) and how this check is run in CI.

I think this might be another manifestation of #10471 - rebase should fix it, if that's the case.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

@ashb ashb force-pushed the implicit-namespaces-for-providers branch from ac8a2ec to 6361b75 Compare October 21, 2020 10:08
@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

potiuk added a commit to PolideaInternal/airflow that referenced this pull request Nov 4, 2020
The change apache#10806 made airflow works with implicit packages
when "airflow" got imported. This is a good change, however
it has some unforeseen consequences. The 'provider_packages'
script copy all the providers code for backports in order
to refactor them to the empty "airflow" directory in
provider_packages folder. The apache#10806 change turned that
empty folder in 'airflow' package because it was in the
same directory as the provider_packages scripts.

Moving the scripts to dev solves this problem.
potiuk added a commit to PolideaInternal/airflow that referenced this pull request Nov 9, 2020
This is a fix to a problem introduced in apache#10806. The change
turned provider packages into namespace packages - which made
them ignored by find_packages function from setup tools - thus
prodiuction image build automatically and used by Kubernetes
tests did not have the provider packages installed.

This PR fixes it and adds future protection during CI tests of
production image to make sure that provider packages are
actually installed.

Fixes apache#12150
potiuk added a commit to PolideaInternal/airflow that referenced this pull request Nov 9, 2020
The change apache#10806 made airflow works with implicit packages
when "airflow" got imported. This is a good change, however
it has some unforeseen consequences. The 'provider_packages'
script copy all the providers code for backports in order
to refactor them to the empty "airflow" directory in
provider_packages folder. The apache#10806 change turned that
empty folder in 'airflow' package because it was in the
same directory as the provider_packages scripts.

Moving the scripts to dev solves this problem.
potiuk added a commit that referenced this pull request Nov 9, 2020
This is a fix to a problem introduced in #10806. The change
turned provider packages into namespace packages - which made
them ignored by find_packages function from setup tools - thus
prodiuction image build automatically and used by Kubernetes
tests did not have the provider packages installed.

This PR fixes it and adds future protection during CI tests of
production image to make sure that provider packages are
actually installed.

Fixes #12150
potiuk added a commit that referenced this pull request Nov 9, 2020
The change #10806 made airflow works with implicit packages
when "airflow" got imported. This is a good change, however
it has some unforeseen consequences. The 'provider_packages'
script copy all the providers code for backports in order
to refactor them to the empty "airflow" directory in
provider_packages folder. The #10806 change turned that
empty folder in 'airflow' package because it was in the
same directory as the provider_packages scripts.

Moving the scripts to dev solves this problem.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Providers should be installable in other folder than Airflow
2 participants