Python on MaxCompute (PyODPS) is the MaxCompute SDK for Python. You can use PyODPS to interact with MaxCompute and process data: develop MaxCompute jobs, analyze data, and manage MaxCompute resources. This topic describes how to use PyODPS.
Introduction to PyODPS
PyODPS supports the DataFrame framework and basic operations on MaxCompute objects. PyODPS runs on Python 2 (version 2.6 or later) and Python 3.
For more information about PyODPS, see the following documentation:
Initialization
Before you can use PyODPS, you must initialize a connection to MaxCompute by using the credentials of your Alibaba Cloud account. To initialize a connection, run the following code:
import os
from odps import ODPS
# Set the environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET to the AccessKey ID and AccessKey secret of your Alibaba Cloud account.
o = ODPS(
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='your-default-project',
    endpoint='your-end-point',
)
Parameters:
ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET: the environment variables that hold the AccessKey ID and AccessKey secret of your Alibaba Cloud account. Make sure that the account has the permissions required to manage objects in the target MaxCompute project. You can obtain the AccessKey ID on the AccessKey page.
your-default-project: the name of your MaxCompute project. To view the name, log on to the MaxCompute console, select a region in the top navigation bar, and choose Workspace > Projects in the left-side navigation pane.
your-end-point: the endpoint of the region where your MaxCompute project resides.
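Because os.getenv returns None for an unset variable, a missing credential may only surface later as an authentication error. The following is a minimal sketch of a helper that checks the settings before it creates the entry object. The MAXCOMPUTE_PROJECT and MAXCOMPUTE_ENDPOINT variable names and the load_odps_settings and connect helpers are assumptions made for this illustration; they are not part of PyODPS.

```python
import os

# Only the two AccessKey variable names come from this topic. The project and
# endpoint variable names are hypothetical and chosen for this sketch.
REQUIRED_VARS = {
    "access_id": "ALIBABA_CLOUD_ACCESS_KEY_ID",
    "secret_access_key": "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
    "project": "MAXCOMPUTE_PROJECT",
    "endpoint": "MAXCOMPUTE_ENDPOINT",
}


def load_odps_settings(env=None):
    """Collect connection settings and fail early if any of them is missing."""
    env = os.environ if env is None else env
    settings = dict((key, env.get(var)) for key, var in REQUIRED_VARS.items())
    missing = sorted(REQUIRED_VARS[key] for key, value in settings.items() if not value)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return settings


def connect(env=None):
    """Create the MaxCompute entry object from validated settings."""
    # Imported lazily so that load_odps_settings can be used without PyODPS installed.
    from odps import ODPS

    s = load_odps_settings(env)
    return ODPS(
        s["access_id"],
        s["secret_access_key"],
        project=s["project"],
        endpoint=s["endpoint"],
    )
```

Calling connect() with all four variables set returns the same kind of object as the initialization code in this section. Calling it with a variable unset raises a RuntimeError that names the missing variable.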
Description
The following table describes the methods that you can use to perform operations on MaxCompute objects.
Item | Operation | Description |
Projects | get_project(project_name) | Obtains a MaxCompute project. |
Projects | exist_project(project_name) | Checks whether a MaxCompute project exists. |
Tables | list_tables() | Lists all tables in a MaxCompute project. |
Tables | exist_table(table_name) | Checks whether a table exists. |
Tables | get_table(table_name, project=project_name) | Obtains a specified table. You can obtain a table from another MaxCompute project. |
Tables | create_table() | Creates a table. |
Tables | read_table() | Reads data from a table. |
Tables | write_table() | Writes data to a table. |
Tables | delete_table() | Deletes an existing table. |
Table partitions | exist_partition() | Checks whether a partition exists. |
Table partitions | get_partition() | Obtains information about a partition. |
Table partitions | create_partition() | Creates a partition. |
Table partitions | delete_partition() | Deletes an existing partition. |
SQL | execute_sql()/run_sql() | Executes SQL statements. |
SQL | open_reader() | Reads execution results of SQL statements. |
Instances | list_instances() | Lists all instances in a MaxCompute project. |
Instances | exist_instance() | Checks whether an instance exists. |
Instances | get_instance() | Obtains information about an instance. |
Instances | stop_instance() | Terminates an instance. |
Resources | create_resource() | Creates a resource. |
Resources | open_resource() | Opens a resource. |
Resources | get_resource() | Obtains information about a resource. |
Resources | list_resources() | Lists all existing resources. |
Resources | exist_resource() | Checks whether a resource exists. |
Resources | delete_resource() | Deletes an existing resource. |
Functions | create_function() | Creates a function. |
Functions | delete_function() | Deletes an existing function. |
Tunnel uploads and downloads | create_upload_session() | Creates a session that is used to upload data. |
Tunnel uploads and downloads | create_download_session() | Creates a session that is used to download data. |
You must specify parameters when you call the create_table(), read_table(), write_table(), and delete_table() methods. For more information, see Examples of using the SDK for Python: tables.
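As an illustration of how several of these methods fit together, the following sketch lists tables, checks whether a table exists, and reads the result of a SQL statement. It assumes that o is the entry object created in the Initialization section. The helper names find_tables, first_column, and count_rows are hypothetical and are not part of PyODPS.

```python
def find_tables(o, name_prefix):
    """Return the names of tables in the default project that start with name_prefix."""
    return [table.name for table in o.list_tables() if table.name.startswith(name_prefix)]


def first_column(o, sql):
    """Run a SQL statement and return the first column of every result record."""
    # execute_sql() waits for the statement to finish and returns an instance.
    instance = o.execute_sql(sql)
    # open_reader() on the instance reads the execution results.
    with instance.open_reader() as reader:
        return [record[0] for record in reader]


def count_rows(o, table_name):
    """Return the number of rows in a table, or None if the table does not exist."""
    if not o.exist_table(table_name):
        return None
    # The table name is interpolated into the statement, so pass only trusted names.
    return first_column(o, "SELECT COUNT(*) FROM {0}".format(table_name))[0]
```

For example, count_rows(o, 'my_table') would submit one SQL job if my_table exists and return None without submitting a job if it does not. The table name my_table is a placeholder.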