Azure Cosmos DB SQL API client library for Python
Disclaimer
Azure SDK Python package support for Python 2.7 ended on 01 January 2022. For more information or questions, please refer to https://github.com/Azure/azure-sdk-for-python/issues/20691
Azure Cosmos DB is a globally distributed, multi-model database service that supports document, key-value, wide-column, and graph databases.
Use the Azure Cosmos DB SQL API SDK for Python to manage databases and the JSON documents they contain in this NoSQL database service. High level capabilities are:
- Create Cosmos DB databases and modify their settings
- Create and modify containers to store collections of JSON documents
- Create, read, update, and delete the items (JSON documents) in your containers
- Query the documents in your database using SQL-like syntax
SDK source code | Package (PyPI) | Package (Conda) | API reference documentation | Product documentation | Samples
This SDK is used for the SQL API. For all other APIs, please check the Azure Cosmos DB documentation to evaluate the best SDK for your project.
Getting started
Important update on Python 2.x Support
New releases of this SDK won't support Python 2.x starting January 1st, 2022. Please check the CHANGELOG for more information.
Prerequisites
If you need a Cosmos DB SQL API account, you can create one with this Azure CLI command:
az cosmosdb create --resource-group <resource-group-name> --name <cosmos-account-name>
Install the package
pip install azure-cosmos
Configure a virtual environment (optional)
Although not required, you can keep your base system and Azure SDK environments isolated from one another if you use a virtual environment. Execute the following commands to configure and then enter a virtual environment with venv:
python3 -m venv azure-cosmosdb-sdk-environment
source azure-cosmosdb-sdk-environment/bin/activate
Authenticate the client
Interaction with Cosmos DB starts with an instance of the CosmosClient class. You need an account, its URI, and one of its account keys to instantiate the client object.
Use the Azure CLI snippet below to populate two environment variables with the database account URI and its primary master key (you can also find these values in the Azure portal). The snippet is formatted for the Bash shell.
RES_GROUP=<resource-group-name>
ACCT_NAME=<cosmos-db-account-name>
export ACCOUNT_URI=$(az cosmosdb show --resource-group $RES_GROUP --name $ACCT_NAME --query documentEndpoint --output tsv)
export ACCOUNT_KEY=$(az cosmosdb list-keys --resource-group $RES_GROUP --name $ACCT_NAME --query primaryMasterKey --output tsv)
Create the client
Once you've populated the ACCOUNT_URI and ACCOUNT_KEY environment variables, you can create the CosmosClient.
from azure.cosmos import CosmosClient
import os
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
client = CosmosClient(URL, credential=KEY)
AAD Authentication
You can also authenticate a client utilizing your service principal's AAD credentials and the azure identity package.
You can directly pass in the credentials information to ClientSecretCredential, or use the DefaultAzureCredential:
from azure.cosmos import CosmosClient
from azure.identity import ClientSecretCredential, DefaultAzureCredential
import os
url = os.environ['ACCOUNT_URI']
tenant_id = os.environ['TENANT_ID']
client_id = os.environ['CLIENT_ID']
client_secret = os.environ['CLIENT_SECRET']
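# Option 1: pass the service principal's credentials explicitly
aad_credentials = ClientSecretCredential(
    tenant_id=tenant_id,
    client_id=client_id,
    client_secret=client_secret)
# Option 2: let DefaultAzureCredential discover credentials from the environment
# (as written, this assignment replaces the one above; keep only the option you need)
aad_credentials = DefaultAzureCredential()
client = CosmosClient(url, aad_credentials)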
Always ensure that the managed identity you use for AAD authentication has readMetadata permissions.
More information on how to set up AAD authentication: Set up RBAC for AAD authentication
More information on allowed operations for AAD authenticated clients: RBAC Permission Model
Key concepts
Once you've initialized a CosmosClient, you can interact with the primary resource types in Cosmos DB:
- Database: A Cosmos DB account can contain multiple databases. When you create a database, you specify the API you'd like to use when interacting with its documents: SQL, MongoDB, Gremlin, Cassandra, or Azure Table. Use the DatabaseProxy object to manage its containers.
- Container: A container is a collection of JSON documents. You create (insert), read, update, and delete items in a container by using methods on the ContainerProxy object.
- Item: An Item is the dictionary-like representation of a JSON document stored in a container. Each Item you add to a container must include an id key with a value that uniquely identifies the item within the container.
For more information about these resources, see Working with Azure Cosmos databases, containers and items.
How to use enable_cross_partition_query
The enable_cross_partition_query keyword argument accepts two options: None (default) or True.
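As a rough sketch of how the two options play out, the helper below scopes the query to a single partition when the caller knows the partition key value, and opts in to a cross-partition query otherwise. The function name find_by_model is hypothetical, the productModel and productName fields are borrowed from the samples later in this README, and container is assumed to be a ContainerProxy.

```python
def find_by_model(container, model, product_name=None):
    """Return the items whose productModel equals `model`.

    `container` is assumed to be a ContainerProxy. `product_name` is the
    partition key value, assuming the container is partitioned on /productName.
    """
    query_kwargs = {
        "query": "SELECT * FROM products p WHERE p.productModel = @model",
        "parameters": [{"name": "@model", "value": model}],
    }
    if product_name is not None:
        # Partition key value known: the query is scoped to one partition.
        query_kwargs["partition_key"] = product_name
    else:
        # No partition key value: opt in to a cross-partition query.
        query_kwargs["enable_cross_partition_query"] = True
    return list(container.query_items(**query_kwargs))
```

Calling find_by_model(container, "Model 2", product_name="Widget") would run against one partition, while find_by_model(container, "Model 2") fans out across partitions.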
Note on using queries by id
When using queries that try to find items based on an id value, always make sure you are passing in a string type variable. Azure Cosmos DB only allows string id values and if you use any other datatype, this SDK will return no results and no error messages.
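One way to guard against this silent failure is to check the id type before building the query. The sketch below is only an illustration: the helper name id_query_args is hypothetical, and it returns keyword arguments intended for ContainerProxy.query_items.

```python
def id_query_args(item_id):
    """Build query_items keyword arguments that look up a single id.

    Raises TypeError for non-string ids, since a non-string id would
    otherwise yield no results and no error message.
    """
    if not isinstance(item_id, str):
        raise TypeError(
            "id must be a string, got {0}".format(type(item_id).__name__))
    return {
        "query": "SELECT * FROM c WHERE c.id = @id",
        "parameters": [{"name": "@id", "value": item_id}],
    }
```

A call such as container.query_items(**id_query_args("3"), enable_cross_partition_query=True) would then fail fast if an integer id slipped through.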
Note on client consistency levels
As of release version 4.3.0b3, if a user does not pass an explicit consistency level to their client initialization, the client uses the database account's default level. Previously, the default was set to Session consistency.
If you'd like to keep the previous behavior, you can pass the consistency level explicitly when initializing the client, as shown:
from azure.cosmos import CosmosClient
import os
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
client = CosmosClient(URL, credential=KEY, consistency_level='Session')
Limitations
Currently, the features below are not supported. For alternative options, check the Workarounds section below.
Data Plane Limitations:
- Group By queries
- Queries with COUNT from a DISTINCT subquery: SELECT COUNT (1) FROM (SELECT DISTINCT C.ID FROM C)
- Direct TCP Mode access
- Continuation token support for aggregate cross-partition queries like sorting, counting, and distinct. Streamable queries like SELECT * FROM WHERE do support continuation tokens.
- Change Feed: Processor
- Change Feed: Read multiple partitions key values
- Cross-partition ORDER BY for mixed types
- Enabling diagnostics for async query-type methods
Control Plane Limitations:
- Get CollectionSizeUsage, DatabaseUsage, and DocumentUsage metrics
- Get the connection string
- Get the minimum RU/s of a container
Workarounds
Control Plane Limitations Workaround
Typically, you can use the Azure Portal, the Azure Cosmos DB Resource Provider REST API, the Azure CLI, or PowerShell to work around the control plane limitations.
Using The Async Client as a Workaround to Bulk
While the SDK supports transactional batch, support for bulk requests is not yet implemented in the Python SDK. As a possible workaround, you can use the async client together with the concurrency sample we have developed as a reference.
WARNING: Using the asynchronous client for concurrent operations as shown in that sample consumes RUs very quickly. We strongly recommend testing against the Cosmos DB emulator first to verify that your code works and to avoid incurring charges.
Boolean Data Type
Python spells its Boolean values True and False, while Cosmos DB and its SQL language accept only the lowercase JSON literals true and false. The SDK handles the difference as follows:
- Items you create in Python must use True and False, as the language requires. The SDK converts them to true and false for you, so true and false are what is stored in Cosmos DB.
- If you retrieve those documents with the Cosmos DB Portal's Data Explorer, you will see true and false.
- If you retrieve those documents with this Python SDK, true and false values are automatically converted back to True and False.
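The same mapping can be observed locally with Python's standard json module, without touching a Cosmos DB account. This is only an illustration of the conversion, and the item fields inStock and discontinued are made up for the example.

```python
import json

item = {"id": "item1", "productName": "Widget", "inStock": True, "discontinued": False}

# Serializing produces the lowercase JSON literals that end up stored.
stored = json.dumps(item)
print(stored)
# {"id": "item1", "productName": "Widget", "inStock": true, "discontinued": false}

# Deserializing turns them back into Python's True and False.
loaded = json.loads(stored)
print(loaded["inStock"], loaded["discontinued"])
# True False
```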
SQL Queries x FROM Clause Subitems
This SDK uses the query_items method to submit SQL queries to Azure Cosmos DB.
The Cosmos DB SQL language lets you select subitems in the FROM clause to reduce the source to a smaller subset. For example, you can use select * from Families.children instead of select * from Families. But please note that:
- For SQL queries using the query_items method, this SDK requires that you specify the partition_key or use the enable_cross_partition_query flag.
- If you are getting subitems and specifying the partition_key, make sure that your partition key is included in the subitems, which is not true in most cases.
Max Item Count
This is a parameter of the query_items method, an integer indicating the maximum number of items to be returned per page. The None value can be specified to let the service determine the optimal item count. This is the recommended configuration value, and the default behavior of this SDK when it is not set.
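To see the effect of this parameter, you can walk the results page by page. The sketch below assumes container is a ContainerProxy and relies on the by_page() method of the paged result returned by query_items; the helper name page_sizes is hypothetical.

```python
def page_sizes(container, query, max_item_count=None):
    """Return the number of items in each page of a cross-partition query."""
    results = container.query_items(
        query=query,
        enable_cross_partition_query=True,
        max_item_count=max_item_count,  # None lets the service pick the page size
    )
    # by_page() yields one iterator of items per page returned by the service.
    return [len(list(page)) for page in results.by_page()]
```

Note that max_item_count is an upper bound per page, so individual pages may contain fewer items than requested.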
Examples
The following sections provide several code snippets covering some of the most common Cosmos DB tasks.
Create a database
After authenticating your CosmosClient, you can work with any resource in the account. The code snippet below creates a SQL API database, which is the default when no API is specified when create_database is invoked.
from azure.cosmos import CosmosClient, exceptions
import os
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
client = CosmosClient(URL, credential=KEY)
DATABASE_NAME = 'testDatabase'
try:
    database = client.create_database(DATABASE_NAME)
except exceptions.CosmosResourceExistsError:
    database = client.get_database_client(DATABASE_NAME)
Create a container
This example creates a container with default settings. If a container with the same name already exists in the database (generating a 409 Conflict
error), the existing container is obtained instead.
from azure.cosmos import CosmosClient, PartitionKey, exceptions
import os
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
client = CosmosClient(URL, credential=KEY)
DATABASE_NAME = 'testDatabase'
database = client.get_database_client(DATABASE_NAME)
CONTAINER_NAME = 'products'
try:
    container = database.create_container(id=CONTAINER_NAME, partition_key=PartitionKey(path="/productName"))
except exceptions.CosmosResourceExistsError:
    container = database.get_container_client(CONTAINER_NAME)
except exceptions.CosmosHttpResponseError:
    raise
Create an analytical store enabled container
This example creates a container with Analytical Store enabled, for reporting, BI, AI, and Advanced Analytics with Azure Synapse Link.
The options for analytical_storage_ttl are:
- 0, None, or not specified: Not enabled.
- -1: The data is retained indefinitely.
- Any other number: the actual TTL, in seconds.
CONTAINER_NAME = 'products'
try:
    container = database.create_container(id=CONTAINER_NAME, partition_key=PartitionKey(path="/productName"), analytical_storage_ttl=-1)
except exceptions.CosmosResourceExistsError:
    container = database.get_container_client(CONTAINER_NAME)
except exceptions.CosmosHttpResponseError:
    raise
The preceding snippets also handle the CosmosHttpResponseError exception if the container creation failed. For more information on error handling and troubleshooting, see the Troubleshooting section.
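As an alternative to catching CosmosResourceExistsError yourself, the SDK also provides create_database_if_not_exists and create_container_if_not_exists. The sketch below combines the two; the helper name ensure_container is hypothetical, client is assumed to be a CosmosClient, and partition_key is assumed to be a PartitionKey instance.

```python
def ensure_container(client, database_name, container_name, partition_key):
    """Return a container client, creating the database and container if needed."""
    # Returns the existing database when one with this id is already present.
    database = client.create_database_if_not_exists(id=database_name)
    # Likewise returns the existing container instead of raising a 409 Conflict.
    return database.create_container_if_not_exists(
        id=container_name,
        partition_key=partition_key,
    )
```

With the names used in this README, a call might look like ensure_container(client, 'testDatabase', 'products', PartitionKey(path="/productName")).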
Get an existing container
Retrieve an existing container from the database:
from azure.cosmos import CosmosClient
import os
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
client = CosmosClient(URL, credential=KEY)
DATABASE_NAME = 'testDatabase'
database = client.get_database_client(DATABASE_NAME)
CONTAINER_NAME = 'products'
container = database.get_container_client(CONTAINER_NAME)
Insert data
To insert items into a container, pass a dictionary containing your data to ContainerProxy.upsert_item. Each item you add to a container must include an id key with a value that uniquely identifies the item within the container.
This example inserts several items into the container, each with a unique id:
from azure.cosmos import CosmosClient
import os
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
client = CosmosClient(URL, credential=KEY)
DATABASE_NAME = 'testDatabase'
database = client.get_database_client(DATABASE_NAME)
CONTAINER_NAME = 'products'
container = database.get_container_client(CONTAINER_NAME)
for i in range(1, 10):
    container.upsert_item({
            'id': 'item{0}'.format(i),
            'productName': 'Widget',
            'productModel': 'Model {0}'.format(i)
        }
    )
Delete data
To delete items from a container, use ContainerProxy.delete_item. The SQL API in Cosmos DB does not support the SQL DELETE statement.
from azure.cosmos import CosmosClient
import os
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
client = CosmosClient(URL, credential=KEY)
DATABASE_NAME = 'testDatabase'
database = client.get_database_client(DATABASE_NAME)
CONTAINER_NAME = 'products'
container = database.get_container_client(CONTAINER_NAME)
for item in container.query_items(
        query='SELECT * FROM products p WHERE p.productModel = "Model 2"',
        enable_cross_partition_query=True):
    container.delete_item(item, partition_key='Widget')
NOTE: If you are using a partitioned collection, the value of partition_key in the example code above should be set to the value of the partition key for this particular item, not the name of the partition key column in your collection. This holds true for both point reads and deletes.
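To avoid hard-coding the partition key value, you can read it from each item returned by the query. The sketch below assumes container is a ContainerProxy partitioned on /productName; the helper name delete_by_model and its partition_key_field parameter are hypothetical.

```python
def delete_by_model(container, model, partition_key_field="productName"):
    """Delete every item with the given productModel; return how many were deleted."""
    deleted = 0
    items = container.query_items(
        query="SELECT * FROM products p WHERE p.productModel = @model",
        parameters=[{"name": "@model", "value": model}],
        enable_cross_partition_query=True,
    )
    for item in items:
        # Pass the item's own partition key VALUE (for example 'Widget'),
        # not the partition key path or field name.
        container.delete_item(item, partition_key=item[partition_key_field])
        deleted += 1
    return deleted
```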
Query the database
A Cosmos DB SQL API database supports querying the items in a container with ContainerProxy.query_items using SQL-like syntax.
This example queries a container for items with a specific id:
from azure.cosmos import CosmosClient
import os
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
client = CosmosClient(URL, credential=KEY)
DATABASE_NAME = 'testDatabase'
database = client.get_database_client(DATABASE_NAME)
CONTAINER_NAME = 'products'
container = database.get_container_client(CONTAINER_NAME)
import json
for item in container.query_items(
        query='SELECT * FROM products r WHERE r.id="item3"',
        enable_cross_partition_query=True):
    print(json.dumps(item, indent=True))
NOTE: Although you can specify any value for the container name in the FROM clause, we recommend you use the container name for consistency.
Perform parameterized queries by passing a list of dictionaries, each containing a parameter name and its value, to ContainerProxy.query_items:
discontinued_items = container.query_items(
    query='SELECT * FROM products p WHERE p.productModel = @model',
    parameters=[
        dict(name='@model', value='Model 7')
    ],
    enable_cross_partition_query=True
)
for item in discontinued_items:
    print(json.dumps(item, indent=True))
For more information on querying Cosmos DB databases using the SQL API, see Query Azure Cosmos DB data with SQL queries.
Get database properties
Get and display the properties of a database:
from azure.cosmos import CosmosClient
import os
import json
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
client = CosmosClient(URL, credential=KEY)
DATABASE_NAME = 'testDatabase'
database = client.get_database_client(DATABASE_NAME)
properties = database.read()
print(json.dumps(properties))
Get database and container throughputs
Get and display the throughput values of a database and of a container with dedicated throughput:
from azure.cosmos import CosmosClient
import os
import json
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
client = CosmosClient(URL, credential=KEY)
DATABASE_NAME = 'testDatabase'
database = client.get_database_client(DATABASE_NAME)
db_offer = database.get_throughput()
print('Found Offer \'{0}\' for Database \'{1}\' and its throughput is \'{2}\''.format(db_offer.properties['id'], database.id, db_offer.properties['content']['offerThroughput']))
CONTAINER_NAME = 'testContainer'
container = database.get_container_client(CONTAINER_NAME)
container_offer = container.get_throughput()
print('Found Offer \'{0}\' for Container \'{1}\' and its throughput is \'{2}\''.format(container_offer.properties['id'], container.id, container_offer.properties['content']['offerThroughput']))
Modify container properties
Certain properties of an existing container can be modified. This example sets the default time to live (TTL) for items in the container to 10 seconds:
from azure.cosmos import CosmosClient, PartitionKey
import os
import json
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
client = CosmosClient(URL, credential=KEY)
DATABASE_NAME = 'testDatabase'
database = client.get_database_client(DATABASE_NAME)
CONTAINER_NAME = 'products'
container = database.get_container_client(CONTAINER_NAME)
database.replace_container(
    container,
    partition_key=PartitionKey(path="/productName"),
    default_ttl=10,
)
container_props = container.read()
print(json.dumps(container_props['defaultTtl']))
For more information on TTL, see Time to Live for Azure Cosmos DB data.
Response headers include metadata from the executed operations, such as etag, which allows for optimistic concurrency scenarios, or x-ms-request-charge, which lets you know how many RUs were consumed by the request.
This applies to all item point operations in both the sync and async clients, and can be used by calling the get_response_headers() method of any response, as follows:
from azure.cosmos import CosmosClient
import os
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
DATABASE_NAME = 'testDatabase'
CONTAINER_NAME = 'products'
client = CosmosClient(URL, credential=KEY)
database = client.get_database_client(DATABASE_NAME)
container = database.get_container_client(CONTAINER_NAME)
operation_response = container.create_item({"id": "test_item", "productName": "test_item"})
operation_headers = operation_response.get_response_headers()
etag_value = operation_headers['etag']
request_charge = operation_headers['x-ms-request-charge']
Using the asynchronous client
The asynchronous cosmos client is a separate client that looks and works in a similar fashion to the existing synchronous client. However, the async client needs to be imported separately and its methods need to be used with the async/await keywords.
The async client needs to be initialized before use and closed after use, which can be done manually or with a context manager. The example below shows how to do so manually. We don't recommend doing it this way, since it requires that you manually call __aenter__() before using the client.
from azure.cosmos.aio import CosmosClient
import os
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
DATABASE_NAME = 'testDatabase'
CONTAINER_NAME = 'products'
async def create_products():
    client = CosmosClient(URL, credential=KEY)
    await client.__aenter__()
    database = client.get_database_client(DATABASE_NAME)
    container = database.get_container_client(CONTAINER_NAME)
    for i in range(10):
        await container.upsert_item({
                'id': 'item{0}'.format(i),
                'productName': 'Widget',
                'productModel': 'Model {0}'.format(i)
            }
        )
    await client.close()
Instead of manually opening and closing the client, it is highly recommended to use the async with keywords. This creates a context manager that will initialize and later close the client once you're out of the statement, as well as cache important information the SDK needs. The example below shows how to do so.
from azure.cosmos.aio import CosmosClient
import os
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
DATABASE_NAME = 'testDatabase'
CONTAINER_NAME = 'products'
async def create_products():
    async with CosmosClient(URL, credential=KEY) as client:
        database = client.get_database_client(DATABASE_NAME)
        container = database.get_container_client(CONTAINER_NAME)
        for i in range(10):
            await container.upsert_item({
                    'id': 'item{0}'.format(i),
                    'productName': 'Widget',
                    'productModel': 'Model {0}'.format(i)
                }
            )
Queries with the asynchronous client
Unlike the synchronous client, the async client does not have an enable_cross_partition_query flag in the request. Queries without a specified partition key value will attempt a cross-partition query by default.
Query results can be iterated, but the query's raw output returns an asynchronous iterator. This means that each object from the iterator is an awaitable object, and does not yet contain the true query result. In order to obtain the query results you can use an async for loop, which awaits each result as you iterate on the object, or manually await each query result as you iterate over the asynchronous iterator.
Since the query results are an asynchronous iterator, they can't be cast into lists directly; instead, if you need to create lists from your results, use an async for loop or an asynchronous list comprehension to populate a list:
from azure.cosmos.aio import CosmosClient
import os
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
client = CosmosClient(URL, credential=KEY)
DATABASE_NAME = 'testDatabase'
database = client.get_database_client(DATABASE_NAME)
CONTAINER_NAME = 'products'
container = database.get_container_client(CONTAINER_NAME)
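async def create_lists():
    results = container.query_items(
        query='SELECT * FROM products p WHERE p.productModel = "Model 2"')
    # Option 1: iterate over "results" with async for to build a list of the query results
    item_list = []
    async for item in results:
        item_list.append(item)
    # Option 2: an asynchronous list comprehension builds the same kind of list;
    # use one option or the other on a given "results" iterator
    item_list = [item async for item in results]
    await client.close()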
Using Integrated Cache
An integrated cache is an in-memory cache that helps you ensure manageable costs and low latency as your request volume grows. The integrated cache has two parts: an item cache for point reads and a query cache for queries. The code snippet below shows you how to use this feature with the point read and query cache methods.
The benefit of using this is that the point reads and queries that hit the integrated cache won't use any RUs. This means you will have a much lower per-operation cost than reads from the backend.
How to configure the Azure Cosmos DB integrated cache (Preview)
import azure.cosmos.cosmos_client as cosmos_client
import os
URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
client = cosmos_client.CosmosClient(URL, credential=KEY)
DATABASE_NAME = 'testDatabase'
database = client.get_database_client(DATABASE_NAME)
CONTAINER_NAME = 'testContainer'
container = database.get_container_client(CONTAINER_NAME)
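def integrated_cache_snippet(body):
    # body is the item dictionary; this snippet assumes the container is partitioned on /id
    item_id = body['id']
    query = 'SELECT * FROM c'
    # Point read that may be served from the item cache, with a staleness bound of 30 seconds
    container.read_item(item=item_id, partition_key=item_id, max_integrated_cache_staleness_in_ms=30000)
    # Query that may be served from the query cache, with the same staleness bound
    container.query_items(query=query,
        partition_key=item_id, max_integrated_cache_staleness_in_ms=30000)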
For more information on Integrated Cache, see Azure Cosmos DB integrated cache - Overview.
Using Transactional Batch
Transactional batch requests allow you to send several operations to be executed at once within the same partition key.
If all operations succeed in the order they're described within the transactional batch operation, the transaction will be committed.
However, if any operation fails, the entire transaction is rolled back.
Transactional batches have a limit of 100 operations per batch, and a total size limit of 1.2 MB for the batch operations being passed in.
Transactional batch operations look very similar to the single-operation APIs, and are tuples containing (operation_type_string, args_tuple, batch_operation_kwargs_dictionary), with the kwargs dictionary being optional:
batch_operations = [
    ("create", (item_body,), kwargs),
    ("replace", (item_id, item_body), kwargs),
    ("read", (item_id,), kwargs),
    ("upsert", (item_body,), kwargs),
    ("patch", (item_id, operations), kwargs),
    ("delete", (item_id,), kwargs),
]
batch_results = container.execute_item_batch(batch_operations=batch_operations, partition_key=partition_key)
The batch operation kwargs dictionary is limited and accepts only three keys. To use conditional patching within the batch, pass the filter_predicate key with the patch operation. To use etags with any of the operations, pass the if_match_etag or if_none_match_etag key.
batch_operations = [
    ("replace", (item_id, item_body), {"if_match_etag": etag}),
    ("patch", (item_id, operations), {"filter_predicate": filter_predicate, "if_none_match_etag": etag}),
]
We also have some samples showing these transactional batch operations in action with both the sync and async clients.
If an operation within the batch fails, the SDK raises a CosmosBatchOperationError that tells you which operation failed and contains the list of failed responses for the failed request.
For more information on Transactional Batch, see Azure Cosmos DB Transactional Batch.
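If you have more operations than the 100-operation limit allows, one possible approach is to split them into several batches. The helper below is a minimal sketch with a hypothetical name; it only enforces the operation count, not the total size limit, and each resulting batch is its own transaction, so atomicity does not extend across batches.

```python
MAX_BATCH_OPERATIONS = 100  # per-batch operation limit stated in this section


def split_into_batches(operations, max_operations=MAX_BATCH_OPERATIONS):
    """Split a list of batch operation tuples into lists of at most max_operations."""
    if max_operations < 1:
        raise ValueError("max_operations must be at least 1")
    return [
        operations[start:start + max_operations]
        for start in range(0, len(operations), max_operations)
    ]
```

Each returned list could then be passed to container.execute_item_batch with the shared partition key value.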
Public Preview - Vector Embeddings and Vector Indexes
We have added new capabilities to utilize vector embeddings and vector indexing for users to leverage vector
search utilizing our Cosmos SDK. These two container-level configurations have to be turned on at the account-level
before you can use them.
Each vector embedding should have a path to the relevant vector field in your items being stored, a supported data type
(float32, int8, uint8), the vector's dimensions, and the distance function being used for that embedding. Vectors indexed
with the flat index type can be at most 505 dimensions. Vectors indexed with the quantizedFlat index type can be at most 4,096 dimensions.
A sample vector embedding policy would look like this:
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": "/vector1",
            "dataType": "float32",
            "dimensions": 256,
            "distanceFunction": "euclidean"
        },
        {
            "path": "/vector2",
            "dataType": "int8",
            "dimensions": 200,
            "distanceFunction": "dotproduct"
        },
        {
            "path": "/vector3",
            "dataType": "uint8",
            "dimensions": 400,
            "distanceFunction": "cosine"
        }
    ]
}
Separately, vector indexes have been added to the already existing indexing_policy and only require two fields per index:
the path to the relevant field to be used, and the type of index from the possible options - flat, quantizedFlat, or diskANN.
A sample indexing policy with vector indexes would look like this:
indexing_policy = {
    "automatic": True,
    "indexingMode": "consistent",
    "compositeIndexes": [
        [
            {"path": "/numberField", "order": "ascending"},
            {"path": "/stringField", "order": "descending"}
        ]
    ],
    "spatialIndexes": [
        {"path": "/location/*", "types": [
            "Point",
            "Polygon"]}
    ],
    "vectorIndexes": [
        {"path": "/vector1", "type": "flat"},
        {"path": "/vector2", "type": "quantizedFlat"},
        {"path": "/vector3", "type": "diskANN"}
    ]
}
For vector index types of diskANN and quantizedFlat, there are additional options available as well. These are:
- quantizationByteSize: the number of bytes used in product quantization of the vectors. A larger value may result in better recall for vector searches at the expense of latency. This applies to index types diskANN and quantizedFlat. The allowed range is from 1 to the smaller of 512 and the vector dimensions. The default value is 64.
- indexingSearchListSize: the size of the candidate list of approximate neighbors stored while building the diskANN index as part of the optimization processes. This applies only to index type diskANN. The allowed range is between 25 and 500.
indexing_policy = {
    "automatic": True,
    "indexingMode": "consistent",
    "vectorIndexes": [
        {"path": "/vector1", "type": "quantizedFlat", "quantizationByteSize": 8},
        {"path": "/vector2", "type": "diskANN", "indexingSearchListSize": 50}
    ]
}
You would then pass in the relevant policies to your container creation method to ensure these configurations are used by it.
The operation will fail if you pass new vector indexes to your indexing policy but forget to pass in an embedding policy.
database.create_container(id=container_id, partition_key=PartitionKey(path="/id"),
                          indexing_policy=indexing_policy, vector_embedding_policy=vector_embedding_policy)
Note: vector embeddings and vector indexes CANNOT be edited by container replace operations. They are only available directly through creation.
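Because these policies can only be set at creation time, it may be worth checking them locally before calling create_container. The sketch below is not part of the SDK: the helper name and its checks are hypothetical and cover only the constraints described in this section (supported data types, dimension limits for flat and quantizedFlat, the two optional index settings, and the need for a matching embedding).

```python
SUPPORTED_DATA_TYPES = {"float32", "int8", "uint8"}
# Dimension limits stated in this section; no limit is listed here for diskANN,
# so that index type is left unchecked.
MAX_DIMENSIONS = {"flat": 505, "quantizedFlat": 4096}


def find_vector_policy_problems(vector_embedding_policy, indexing_policy):
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []
    embeddings = {
        e["path"]: e for e in vector_embedding_policy.get("vectorEmbeddings", [])
    }
    for path, embedding in embeddings.items():
        if embedding["dataType"] not in SUPPORTED_DATA_TYPES:
            problems.append(
                "{0}: unsupported dataType {1}".format(path, embedding["dataType"]))

    for index in indexing_policy.get("vectorIndexes", []):
        path, index_type = index["path"], index["type"]
        embedding = embeddings.get(path)
        if embedding is None:
            # A vector index without an embedding makes container creation fail.
            problems.append("{0}: vector index has no matching embedding".format(path))
            continue
        dimensions = embedding["dimensions"]
        limit = MAX_DIMENSIONS.get(index_type)
        if limit is not None and dimensions > limit:
            problems.append("{0}: {1} dimensions exceed the {2} limit of {3}".format(
                path, dimensions, index_type, limit))
        byte_size = index.get("quantizationByteSize")
        if byte_size is not None and not 1 <= byte_size <= min(512, dimensions):
            problems.append("{0}: quantizationByteSize out of range".format(path))
        list_size = index.get("indexingSearchListSize")
        if list_size is not None and not 25 <= list_size <= 500:
            problems.append("{0}: indexingSearchListSize out of range".format(path))
    return problems
```

An empty result does not guarantee that the service accepts the policies; it only means none of the checks above failed.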
Public Preview - Vector Search
With the addition of the vector indexing and vector embedding capabilities, the SDK can now perform order by vector search queries.
These queries specify the VectorDistance to use as a metric within the query text. They must always include a TOP or LIMIT clause, because without one a vector search has to look through a lot of data and may become too expensive or long-running.
Since these queries are relatively expensive, the SDK sets a default limit of 50000 max items per query. If you'd like to raise that limit, you can use the AZURE_COSMOS_MAX_ITEM_BUFFER_VECTOR_SEARCH environment variable to do so. However, be advised that queries with too many vector results may incur additional latency when searching in the service.
The query syntax for these operations looks like this:
VectorDistance(<embedding1>, <embedding2>, [,<exact_search>], [,<specification>])
Embeddings 1 and 2 are the arrays of values for the relevant embeddings, exact_search is an optional boolean indicating whether to do an exact search rather than an approximate one (default value of false), and specification is an optional JSON snippet with embedding specs that can include dataType, dimensions, and distanceFunction. The specifications within the query take precedence over any configurations previously set by a vector embedding policy.
A sample vector search query would look something like this:
query = "SELECT TOP 10 c.title,VectorDistance(c.embedding, [{}]) AS " \
"SimilarityScore FROM c ORDER BY VectorDistance(c.embedding, [{}])".format(embeddings_string, embeddings_string)
Or if you'd like to add the optional parameters to the vector distance, you could do this:
query = "SELECT TOP 10 c.title,VectorDistance(c.embedding, [{}], true, {{'dataType': 'float32' , 'distanceFunction': 'cosine'}}) AS " \
"SimilarityScore FROM c ORDER BY VectorDistance(c.embedding, [{}], true, {{'dataType': " \
"'float32', 'distanceFunction': 'cosine'}})".format(embeddings_string, embeddings_string)
The embeddings_string above would be your string made from your vector embeddings.
You can find our sync samples here and our async samples here for further reference.
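As an illustration of how embeddings_string might be produced, the sketch below joins a list of numbers with commas and substitutes the result into the first sample query. The helper name build_vector_search_query is hypothetical, and the c.title and c.embedding fields are taken from the sample query above.

```python
def build_vector_search_query(embedding, top=10):
    """Format a vector search query from a list of numeric embedding values."""
    # The query template wraps this string in [ ], so it holds only the
    # comma-separated values.
    embeddings_string = ", ".join(str(value) for value in embedding)
    return (
        "SELECT TOP {0} c.title, VectorDistance(c.embedding, [{1}]) AS "
        "SimilarityScore FROM c ORDER BY VectorDistance(c.embedding, [{1}])"
    ).format(int(top), embeddings_string)


query = build_vector_search_query([0.1, 0.2, 0.3], top=5)
print(query)
```

The resulting string can be passed as the query argument of query_items in the same way as the sample queries above.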
Note: For a limited time, if your query operates against a region or emulator that has not yet been updated, the client might fail to recognize the new NonStreamingOrderBy capability that makes vector search possible. If this happens, you can set the AZURE_COSMOS_DISABLE_NON_STREAMING_ORDER_BY environment variable to "True" to opt out of this functionality and continue operating as usual.
Public Preview - Full Text Policy and Full Text Indexes
We have added new capabilities to utilize full text policies and full text indexing for users to leverage full text search
utilizing our Cosmos SDK. These two container-level configurations have to be turned on at the account-level
before you can use them.
A full text policy allows the user to define the default language to be used for all full text paths, or to set
a language for each path individually in case the user would like to use full text search on data containing different
languages in different fields.
A sample full text policy would look like this:
full_text_policy = {
    "defaultLanguage": "en-US",
    "fullTextPaths": [
        {
            "path": "/text1",
            "language": "en-US"
        },
        {
            "path": "/text2",
            "language": "en-US"
        }
    ]
}
Currently, the only supported language is en-US, written as the relevant ISO-639 language code followed by the ISO-3166 country code. Any unsupported language or code raises an exception when you try to use it, and the exception includes the list of supported languages. This list will include more options in the future; for more information on supported languages, please see here.
Full text search indexes have been added to the already existing indexing_policy and only require the path to the
relevant field to be used.
A sample indexing policy with full text search indexes would look like this:
indexing_policy = {
"automatic": True,
"indexingMode": "consistent",
"compositeIndexes": [
[
{"path": "/numberField", "order": "ascending"},
{"path": "/stringField", "order": "descending"}
]
],
"fullTextIndexes": [
{"path": "/abstract"}
]
}
Modifying the index in a container is an asynchronous operation that can take a long time to finish. See here for more information.
For more information on using full text policies and full text indexes, see here.
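If you already have an indexing policy, a small helper can add full text indexes to a copy of it. This is an illustrative sketch; `with_full_text_indexes` is a hypothetical name, not an SDK function.

```python
import copy


def with_full_text_indexes(indexing_policy, paths):
    """Return a copy of the policy with a full text index for each path."""
    updated = copy.deepcopy(indexing_policy)  # leave the caller's dict untouched
    indexes = updated.setdefault("fullTextIndexes", [])
    known = {entry["path"] for entry in indexes}
    for path in paths:
        if path not in known:  # skip paths that are already indexed
            indexes.append({"path": path})
            known.add(path)
    return updated


base_policy = {"automatic": True, "indexingMode": "consistent"}
indexing_policy = with_full_text_indexes(base_policy, ["/abstract", "/abstract", "/title"])
print(indexing_policy)
```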
Public Preview - Full Text Search and Hybrid Search
With the addition of the full text indexing and full text policies, the SDK can now perform full text search and hybrid search queries.
These queries can utilize the new query functions FullTextContains, FullTextContainsAll, and FullTextContainsAny to efficiently search for the given terms within your item fields.
Beyond these, you can also utilize the new Order By RANK and Order By RANK RRF along with FullTextScore to execute the BM25 scoring algorithm or Reciprocal Rank Fusion (RRF) on your query, finding the items with the highest relevance to the terms you are looking for.
All of these mentioned queries would look something like this:
- SELECT TOP 10 c.id, c.text FROM c WHERE FullTextContains(c.text, 'quantum')
- SELECT TOP 10 c.id, c.text FROM c WHERE FullTextContainsAll(c.text, 'quantum', 'theory')
- SELECT TOP 10 c.id, c.text FROM c WHERE FullTextContainsAny(c.text, 'quantum', 'theory')
- SELECT TOP 10 c.id, c.text FROM c ORDER BY RANK FullTextScore(c.text, ['quantum', 'theory'])
- SELECT TOP 10 c.id, c.text FROM c ORDER BY RANK RRF(FullTextScore(c.text, ['quantum', 'theory']), FullTextScore(c.text, ['model']))
- SELECT TOP 10 c.id, c.text FROM c ORDER BY RANK RRF(FullTextScore(c.text, ['quantum', 'theory']), FullTextScore(c.text, ['model']), VectorDistance(c.embedding, {item_embedding}))
These queries must always include a TOP or LIMIT clause; without one, hybrid search queries would have to look through a large amount of data and could become too expensive or long-running.
Since these queries are relatively expensive, the SDK sets a default limit of 1000 items per query. To raise that limit, set the AZURE_COSMOS_HYBRID_SEARCH_MAX_ITEMS environment variable. However, be advised that queries with too many vector results may incur additional latency when searching in the service.
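The sketch below builds one of the ranked queries shown earlier and checks the TOP value on the client side before anything is sent. Both helper names are hypothetical, and reading the environment variable as a plain integer is an assumption; the SDK performs its own check regardless.

```python
import os

DEFAULT_MAX_ITEMS = 1000  # default limit stated in this section


def max_hybrid_search_items():
    # Assumption: the variable, when set, holds a plain integer.
    return int(os.environ.get("AZURE_COSMOS_HYBRID_SEARCH_MAX_ITEMS", DEFAULT_MAX_ITEMS))


def build_rank_query(field, terms, top):
    """Build an ORDER BY RANK FullTextScore query with a mandatory TOP clause."""
    limit = max_hybrid_search_items()
    if not 1 <= top <= limit:
        raise ValueError("TOP must be between 1 and {}".format(limit))
    if any("'" in term for term in terms):
        # Quoting rules are out of scope for this sketch.
        raise ValueError("terms containing single quotes are not handled here")
    quoted_terms = ", ".join("'{}'".format(term) for term in terms)
    return (
        "SELECT TOP {top} c.id, c.{field} FROM c "
        "ORDER BY RANK FullTextScore(c.{field}, [{terms}])"
    ).format(top=top, field=field, terms=quoted_terms)


query = build_rank_query("text", ["quantum", "theory"], top=10)
print(query)
```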
You can find our sync samples here and our async samples here as well for additional guidance.
Troubleshooting
General
When you interact with Cosmos DB using the Python SDK, exceptions returned by the service correspond to the same HTTP status codes returned for REST API requests:
HTTP Status Codes for Azure Cosmos DB
For example, if you try to create a container using an ID (name) that's already in use in your Cosmos DB database, a 409 error is returned, indicating the conflict. In the following snippet, the error is handled gracefully by catching the exception and displaying additional information about the error.
try:
    database.create_container(id=CONTAINER_NAME, partition_key=PartitionKey(path="/productName"))
except exceptions.CosmosResourceExistsError:
    print("""Error creating container
HTTP status code 409: The ID (name) provided for the container is already in use.
The container name must be unique within the database.""")
Logging Diagnostics
This library uses the standard logging library for logging diagnostics.
Basic information about HTTP sessions (URLs, headers, etc.) is logged at INFO level.
Detailed DEBUG level logging, including request/response bodies and unredacted headers, can be enabled on a client with the logging_enable argument:
import sys
import logging
from azure.cosmos import CosmosClient
logger = logging.getLogger('azure')
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler(stream=sys.stdout)
logger.addHandler(handler)
client = CosmosClient(URL, credential=KEY, logging_enable=True)
Similarly, logging_enable can enable detailed logging for a single operation, even when it isn't enabled for the client:
database = client.create_database(DATABASE_NAME, logging_enable=True)
Alternatively, you can log using the CosmosHttpLoggingPolicy, which extends the azure core HttpLoggingPolicy, by passing your logger to the logger argument.
By default, it uses the behaviour of HttpLoggingPolicy. Passing the enable_diagnostics_logging argument enables the CosmosHttpLoggingPolicy, which adds information to the response that is relevant to debugging Cosmos issues.
import logging
from azure.cosmos import CosmosClient
logger = logging.getLogger('azure')
logger.setLevel(logging.DEBUG)
handler = logging.FileHandler(filename="azure")
logger.addHandler(handler)
client = CosmosClient(URL, credential=KEY, logger=logger, enable_diagnostics_logging=True)
Similarly, logging can be enabled for a single operation by passing a logger to that request.
However, if you want to use the CosmosHttpLoggingPolicy to obtain additional information, the enable_diagnostics_logging argument must be passed to the client constructor.
client = CosmosClient(URL, credential=KEY, enable_diagnostics_logging=True)
database = client.create_database(DATABASE_NAME, logger=logger)
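The logger setup in the snippets above can be wrapped in a small helper so that repeated calls do not attach duplicate handlers. This is a sketch using only the standard logging module; `configure_azure_logger` and the marker attribute are hypothetical names.

```python
import logging
import sys


def configure_azure_logger(level=logging.DEBUG, stream=sys.stdout):
    """Configure the 'azure' logger once and return it."""
    logger = logging.getLogger("azure")
    logger.setLevel(level)
    # Marker attribute (hypothetical) so a second call adds no extra handler.
    already_configured = any(
        getattr(handler, "_readme_sketch", False) for handler in logger.handlers
    )
    if not already_configured:
        handler = logging.StreamHandler(stream=stream)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        handler._readme_sketch = True
        logger.addHandler(handler)
    return logger


logger = configure_azure_logger()
```

The returned logger can then be passed to the client or to a single operation through the logger argument, as shown above.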
Telemetry
Azure Core lets our Python SDKs use OpenTelemetry. The only packages that need to be installed
to use this functionality are the following:
pip install azure-core-tracing-opentelemetry
pip install opentelemetry-sdk
For more information on this, we recommend taking a look at this document
from Azure Core describing how to set it up. We have also added a sample file to show how it can
be used with our SDK. This works the same way regardless of the Cosmos client you are using.
Next steps
For more extensive documentation on the Cosmos DB service, see the Azure Cosmos DB documentation on docs.microsoft.com.
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct.
For more information see the Code of Conduct FAQ or
contact opencode@microsoft.com with any additional questions or comments.
Release History
4.9.0 (2024-11-18)
Features Added
- Added full text policy and full text indexing policy. See PR 37891.
- Added support for full text search and hybrid search queries. See PR 38275.
4.8.0 (2024-11-12)
This version and all future versions will support Python 3.13.
Features Added
- Added response headers directly to SDK item point operation responses. See PR 35791.
- SDK will now retry all ServiceRequestErrors (failing outgoing requests) before failing. Default number of retries is 3. See PR 36514.
- Added Retry Policy for Container Recreate in the Python SDK. See PR 36043.
- Added option to disable write payload on writes. See PR 37365.
- Added get feed ranges API. See PR 37687.
- Added feed range support in query_items_change_feed. See PR 37687.
- Added provisional helper APIs for managing session tokens. See PR 36971.
- Added ability to get feed range for a partition key. See PR 36971.
Breaking Changes
- Item-level point operations will now return CosmosDict and CosmosList response types. Responses will still be able to be used directly as previously, but will now have access to their response headers without need for a response hook. See PR 35791. For more information on this, see our README section here.
Bugs Fixed
- Consolidated Container Properties Cache to be in the Client to cache partition key definition and container rid to avoid unnecessary container reads. See PR 35731.
- Fixed bug with client hangs when running into WriteForbidden exceptions. See PR 36514.
- Added retry handling logic for DatabaseAccountNotFound exceptions. See PR 36514.
- Fixed SDK regex validation that would not allow for item ids to be longer than 255 characters. See PR 36569.
- Fixed issue where 'NoneType' object has no attribute error was raised when a session retry happened during a query. See PR 37578.
- Fixed issue where passing subpartition partition key values as a tuple in a query would raise an error. See PR 38136.
- Batch requests will now be properly considered as Write operation. See PR 38365.
Other Changes
- Getting offer throughput when it has not been defined in a container will now give a 404/10004 instead of just a 404. See PR 36043.
- Incomplete Partition Key Extractions in documents for Subpartitioning now gives 400/1001 instead of just a 400. See PR 36043.
- SDK will now make database account calls every 5 minutes to refresh location cache. See PR 36514.
4.7.0 (2024-05-15)
Features Added
- Adds vector embedding policy and vector indexing policy. See PR 34882.
- Adds support for vector search non-streaming order by queries. See PR 35468.
- Adds support for using the start time option for change feed query API. See PR 35090.
Bugs Fixed
- Fixed a bug where change feed query in Async client was not returning all pages due to case-sensitive response headers. See PR 35090.
- Fixed a bug when a retryable exception occurs in the first page of a query execution causing query to return 0 results. See PR 35090.
4.6.1 (2024-05-15)
4.6.0 (2024-03-14)
Features Added
- GA release of hierarchical partitioning, index metrics and transactional batch.
Bugs Fixed
- Keyword arguments were not being passed down for create_container_if_not_exists() methods. See PR 34286.
Other Changes
4.5.2b5 (2024-03-02)
Bugs Fixed
- Fixed bug with async lock not properly releasing on async global endpoint manager. See PR 34579.
Other Changes
- Marked computed_properties keyword as provisional, un-marked continuation_token_limit as provisional. See PR 34207.
4.5.2b4 (2024-02-02)
This version and all future versions will require Python 3.8+.
Features Added
- Added preview support for Computed Properties on Python SDK (Must be enabled on the account level before it can be used). See PR 33626.
Bugs Fixed
- Made use of response_hook thread-safe in the sync client. See PR 33790.
- Fixed bug with the session container not being properly maintained. See 33738.
4.5.2b3 (2023-11-10)
Features Added
- Added support for capturing Index Metrics in query operations. See PR 33034.
4.5.2b2 (2023-10-31)
Features Added
- Added support for Transactional Batch. See PR 32508.
- Added preview support for Priority Based Throttling/Priority Based Execution (Must be enabled at the account level before it can be used). See PR 32441.
4.5.2b1 (2023-10-17)
Features Added
- Added support for Hierarchical Partitioning, also known as Subpartitioning. See PR 31121.
Bugs Fixed
- Small fix to the offer_throughput option in the async client's create_database_if_not_exists method, which was previously misspelled as offerThroughput. See PR 32076.
Other Changes
- Marked the outdated diagnostics.py file for deprecation since we now recommend the use of our CosmosHttpLoggingPolicy for diagnostics. For more on the CosmosHttpLoggingPolicy see our README.
4.5.1 (2023-09-12)
Bugs Fixed
- Fixed bug when query with DISTINCT + OFFSET/LIMIT operators returns unexpected result. See PR 31925.
Other Changes
- Added additional checks for resource creation using specific characters that cause issues. See PR 31861.
4.5.0 (2023-08-09)
Features Added
- Added support for continuation tokens for streamable cross partition queries. See PR 31189.
Bugs Fixed
- Fixed bug with async create_database_if_not_exists method not working when passing offer_throughput as an option. See PR 31478.
Other Changes
- Renamed response_continuation_token_limit_in_kb to continuation_token_limit for GA. See PR 31532.
4.4.1b1 (2023-07-25)
Features Added
- Added ability to limit continuation token size when querying for items. See PR 30731
Bugs Fixed
- Fixed bug with async patch_item method. See PR 30804.
4.4.0 (2023-06-09)
Features Added
- GA release of Patch API and Delete All Items By Partition Key
4.4.0b2 (2023-05-22)
Features Added
- Added conditional patching for Patch operations. See PR 30455.
Bugs Fixed
- Fixed bug with non-English locales causing an error with the RFC 1123 Date Format. See PR 30125.
Other Changes
- Refactoring of our client connection_timeout and request_timeout configurations. See PR 30171.
4.4.0b1 (2023-04-11)
Features Added
Bugs Fixed
- Fixed bug in method create_container_if_not_exists() of async database client for unexpected kwargs being passed into read() method used internally. See PR 29136.
- Fixed bug with method query_items() of our async container class, where partition key and cross partition headers would both be set when using partition keys. See PR 29366.
- Fixed bug with client not properly surfacing errors for invalid credentials and identities with insufficient permissions. Users running into 'NoneType has no attribute ConsistencyPolicy' errors when initializing their clients will now see proper authentication exceptions. See PR 29256.
Other Changes
- Removed use of six package within the SDK.
4.3.1 (2023-02-23)
Features Added
- Added correlated_activity_id for query operations.
- Added cross regional retries for Service Unavailable/Request Timeouts for read/Query Plan operations.
- GA release of CosmosHttpLoggingPolicy and autoscale feature.
Bugs Fixed
- Bug fix to address queries with VALUE MAX (or any other aggregate) that run into an issue if the query is executed on a container with at least one "empty" partition.
4.3.1b1 (2022-09-19)
Features Added
- GA release of integrated cache functionality. For more information on integrated cache please see Azure Cosmos DB integrated cache.
- Added ability to replace analytical ttl on containers. For more information on analytical ttl please see Azure Cosmos DB analytical store.
- Added CosmosHttpLoggingPolicy to replace HttpLoggingPolicy for logging HTTP sessions.
- Added the ability to create containers and databases with autoscale properties for the sync and async clients.
- Added the ability to update autoscale throughput properties.
Bugs Fixed
- Fixed parsing of args for overloaded container.read() method.
- Fixed validate_cache_staleness_value() method to allow max_integrated_cache_staleness to be an integer greater than or equal to 0.
- Fixed __aiter__() method by removing the async keyword.
4.3.0 (2022-05-23)
Features Added
- GA release of Async I/O APIs, including all changes from 4.3.0b1 to 4.3.0b4.
Breaking Changes
- Method signatures have been updated to use keyword arguments instead of positional arguments for most method options in the async client.
- Bugfix: Automatic Id generation for items was turned on for upsert_items() method when no 'id' value was present in document body. Method call will now require an 'id' field to be present in the document body.
Other Changes
- Deprecated offer-named methods in favor of their new throughput-named counterparts (read_offer -> get_throughput).
- Marked the GetAuthorizationHeader method for deprecation since it will no longer be public in a future release.
- Added samples showing how to configure retry options for both the sync and async clients.
- Deprecated the connection_retry_policy and retry_options options in the sync client.
- Added user warning to non-query methods trying to use populate_query_metrics options.
4.3.0b4 (2022-04-07)
Features Added
- Added support for AAD authentication for the async client.
- Added support for AAD authentication for the sync client.
Other Changes
- Changed _set_partition_key return typehint in async client.
4.3.0b3 (2022-03-10)
[WARNING]
The default Session consistency bugfix will impact customers whose database accounts have a Bounded Staleness or Strong consistency level, and were previously not sending Session as a consistency_level parameter when initializing their clients.
Default consistency level for the sync and async clients is no longer "Session" and will instead be set to the consistency level of the user's cosmos account setting on initialization if not passed during client initialization.
Please see Consistency Levels in Azure Cosmos DB for more details on consistency levels, or the README section on this change here.
Features Added
- Added new provisional max_integrated_cache_staleness_in_ms parameter to read item and query items APIs in order to make use of the preview CosmosDB integrated cache functionality. See PR #22946. Please see Azure Cosmos DB integrated cache for more details.
- Added support for split-proof queries for the async client.
Bugs fixed
- Default consistency level for the sync and async clients is no longer Session and will instead be set to the consistency level of the user's cosmos account setting on initialization if not passed during client initialization. This change will impact client application in terms of RUs and latency. Users relying on default Session consistency will need to pass it explicitly if their account consistency is different than Session. Please see Consistency Levels in Azure Cosmos DB for more details.
- Fixed invalid request body being sent when passing in serverScript body parameter to replace operations for trigger, sproc and udf resources.
- Moved is_system_key logic in async client.
- Fixed TypeErrors not being thrown when passing in invalid connection retry policies to the client.
4.3.0b2 (2022-01-25)
This version and all future versions will require Python 3.6+. Python 2.7 is no longer supported.
We will also be removing support for Python 3.6 and will only support Python 3.7+ starting December 2022.
Features Added
- Added support for split-proof queries for the sync client.
Other Changes
- Added async user agent for async client.
4.3.0b1 (2021-12-14)
Features Added
- Added language native async i/o client.
4.2.0 (2020-10-08)
Bug fixes
- Fixed bug where continuation token is not honored when query_iterable is used to get results by page. Issue #13265.
- Fixed bug where resource tokens not being honored for document reads and deletes. Issue #13634.
New features
- Added support for passing partitionKey while querying changefeed. Issue #11689.
4.1.0 (2020-08-10)
- Added deprecation warning for "lazy" indexing mode. The backend no longer allows creating containers with this mode and will set them to consistent instead.
New features
- Added the ability to set the analytical storage TTL when creating a new container.
Bug fixes
- Fixed support for dicts as inputs for get_client APIs.
- Fixed Python 2/3 compatibility in query iterators.
- Fixed type hint error. Issue #12570 - thanks @sl-sandy.
- Fixed bug where options headers were not added to upsert_item function. Issue #11791 - thank you @aalapatirvbd.
- Fixed error raised when a non string ID is used in an item. It now raises TypeError rather than AttributeError. Issue #11793 - thank you @Rabbit994.
4.0.0 (2020-05-20)
- Stable release.
- Added HttpLoggingPolicy to pipeline to enable passing in a custom logger for request and response headers.
4.0.0b6
- Fixed bug in synchronized_request for media APIs.
- Removed MediaReadMode and MediaRequestTimeout from ConnectionPolicy as media requests are not supported.
4.0.0b5
- azure.cosmos.errors module deprecated and replaced by azure.cosmos.exceptions
- The access condition parameters (access_condition, if_match, if_none_match) have been deprecated in favor of separate match_condition and etag parameters.
- Fixed bug in routing map provider.
- Added query Distinct, Offset and Limit support.
- Default document query execution context now used for
  - ChangeFeed queries
  - single partition queries (partitionkey, partitionKeyRangeId is present in options)
  - Non document queries
- Errors out for aggregates on multiple partitions, with enable cross partition query set to true, but no "value" keyword present
- Hits query plan endpoint for other scenarios to fetch query plan
- Added __repr__ support for Cosmos entity objects.
- Updated documentation.
4.0.0b4
- Added support for a timeout keyword argument to all operations to specify an absolute timeout in seconds within which the operation must be completed. If the timeout value is exceeded, an azure.cosmos.errors.CosmosClientTimeoutError will be raised.
- Added a new ConnectionRetryPolicy to manage retry behaviour during HTTP connection errors.
- Added new constructor and per-operation configuration keyword arguments:
  - retry_total - Maximum retry attempts.
  - retry_backoff_max - Maximum retry wait time in seconds.
  - retry_fixed_interval - Fixed retry interval in milliseconds.
  - retry_read - Maximum number of socket read retry attempts.
  - retry_connect - Maximum number of connection error retry attempts.
  - retry_status - Maximum number of retry attempts on error status codes.
  - retry_on_status_codes - A list of specific status codes to retry on.
  - retry_backoff_factor - Factor to calculate wait time between retry attempts.
4.0.0b3
- Added create_database_if_not_exists() and create_container_if_not_exists functionalities to CosmosClient and Database respectively.
4.0.0b2
Version 4.0.0b2 is the second iteration in our efforts to build a more Pythonic client library.
Breaking changes
- The client connection has been adapted to consume the HTTP pipeline defined in azure.core.pipeline.
- Interactive objects have now been renamed as proxies. This includes:
  - Database -> DatabaseProxy
  - User -> UserProxy
  - Container -> ContainerProxy
  - Scripts -> ScriptsProxy
- The constructor of CosmosClient has been updated:
  - The auth parameter has been renamed to credential and will now take an authentication type directly. This means the master key value, a dictionary of resource tokens, or a list of permissions can be passed in. However the old dictionary format is still supported.
  - The connection_policy parameter has been made a keyword only parameter, and while it is still supported, each of the individual attributes of the policy can now be passed in as explicit keyword arguments:
    - request_timeout
    - media_request_timeout
    - connection_mode
    - media_read_mode
    - proxy_config
    - enable_endpoint_discovery
    - preferred_locations
    - multiple_write_locations
- A new classmethod constructor has been added to CosmosClient to enable creation via a connection string retrieved from the Azure portal.
- Some read_all operations have been renamed to list operations:
  - CosmosClient.read_all_databases -> CosmosClient.list_databases
  - Container.read_all_conflicts -> ContainerProxy.list_conflicts
  - Database.read_all_containers -> DatabaseProxy.list_containers
  - Database.read_all_users -> DatabaseProxy.list_users
  - User.read_all_permissions -> UserProxy.list_permissions
- For all operations that take request_options or feed_options parameters, these have been moved to keyword only parameters. In addition, while these options dictionaries are still supported, each of the individual options within the dictionary are now supported as explicit keyword arguments.
- The error hierarchy is now inherited from azure.core.AzureError instead of CosmosError which has been removed.
  - HTTPFailure has been renamed to CosmosHttpResponseError
  - JSONParseFailure has been removed and replaced by azure.core.DecodeError
  - Added additional errors for specific response codes:
    - CosmosResourceNotFoundError for status 404
    - CosmosResourceExistsError for status 409
    - CosmosAccessConditionFailedError for status 412
- CosmosClient can now be run in a context manager to handle closing the client connection.
- Iterable responses (e.g. query responses and list responses) are now of type azure.core.paging.ItemPaged. The method fetch_next_block has been replaced by a secondary iterator, accessed by the by_page method.
4.0.0b1
Version 4.0.0b1 is the first preview of our efforts to create a user-friendly and Pythonic client library for Azure Cosmos. For more information about this, and preview releases of other Azure SDK libraries, please visit https://aka.ms/azure-sdk-preview1-python.
Breaking changes: New API design
- Operations are now scoped to a particular client:
  - CosmosClient: This client handles account-level operations. This includes managing service properties and listing the databases within an account.
  - Database: This client handles database-level operations. This includes creating and deleting containers, users and stored procedures. It can be accessed from a CosmosClient instance by name.
  - Container: This client handles operations for a particular container. This includes querying and inserting items and managing properties.
  - User: This client handles operations for a particular user. This includes adding and deleting permissions and managing user properties.
  These clients can be accessed by navigating down the client hierarchy using the get_<child>_client method. For full details on the new API, please see the reference documentation.
- Clients are accessed by name rather than by Id. No need to concatenate strings to create links.
- No more need to import types and methods from individual modules. The public API surface area is available directly in the azure.cosmos package.
- Individual request properties can be provided as keyword arguments rather than constructing a separate RequestOptions instance.
3.0.2
- Added Support for MultiPolygon Datatype
- Bug Fix in Session Read Retry Policy
- Bug Fix for Incorrect padding issues while decoding base 64 strings
3.0.1
- Bug fix in LocationCache
- Bug fix endpoint retry logic
- Fixed documentation
3.0.0
- Multi-region write support added
- Naming changes
- DocumentClient to CosmosClient
- Collection to Container
- Document to Item
- Package name updated to "azure-cosmos"
- Namespace updated to "azure.cosmos"
2.3.3
- Added support for proxy
- Added support for reading change feed
- Added support for collection quota headers
- Bugfix for large session tokens issue
- Bugfix for ReadMedia API
- Bugfix in partition key range cache
2.3.2
- Added support for default retries on connection issues.
2.3.1
- Updated documentation to reference Azure Cosmos DB instead of Azure DocumentDB.
2.3.0
2.2.1
- bugfix for aggregate dict
- bugfix for trimming slashes in the resource link
- tests for unicode encoding
2.2.0
- Added support for Request Unit per Minute (RU/m) feature.
- Added support for a new consistency level called ConsistentPrefix.
2.1.0
- Added support for aggregation queries (COUNT, MIN, MAX, SUM, and AVG).
- Added an option for disabling SSL verification when running against DocumentDB Emulator.
- Removed the restriction of dependent requests module to be exactly 2.10.0.
- Lowered minimum throughput on partitioned collections from 10,100 RU/s to 2500 RU/s.
- Added support for enabling script logging during stored procedure execution.
- REST API version bumped to '2017-01-19' with this release.
2.0.1
- Made editorial changes to documentation comments.
2.0.0
- Added support for Python 3.5.
- Added support for connection pooling using the requests module.
- Added support for session consistency.
- Added support for TOP/ORDERBY queries for partitioned collections.
1.9.0
- Added retry policy support for throttled requests. (Throttled requests receive a request rate too large exception, error code 429.) By default, DocumentDB retries nine times for each request when error code 429 is encountered, honoring the retryAfter time in the response header. A fixed retry interval time can now be set as part of the RetryOptions property on the ConnectionPolicy object if you want to ignore the retryAfter time returned by server between the retries. DocumentDB now waits for a maximum of 30 seconds for each request that is being throttled (irrespective of retry count) and returns the response with error code 429. This time can also be overridden in the RetryOptions property on ConnectionPolicy object.
- DocumentDB now returns x-ms-throttle-retry-count and x-ms-throttle-retry-wait-time-ms as the response headers in every request to denote the throttle retry count and the cumulative time the request waited between the retries.
- Removed the RetryPolicy class and the corresponding property (retry_policy) exposed on the document_client class and instead introduced a RetryOptions class exposing the RetryOptions property on ConnectionPolicy class that can be used to override some of the default retry options.
1.8.0
- Added the support for geo-replicated database accounts.
- Test fixes to move the global host and masterKey into the individual test classes.
1.7.0
- Added the support for Time To Live(TTL) feature for documents.
1.6.1
- Bug fixes related to server side partitioning to allow special characters in partitionkey path.
1.6.0
- Added the support for server side partitioned collections feature.
1.5.0
- Added Client-side sharding framework to the SDK. Implemented HashPartitionResolver and RangePartitionResolver classes.
1.4.2
- Implement Upsert. New UpsertXXX methods added to support Upsert feature.
- Implement ID Based Routing. No public API changes, all changes internal.
1.3.0
- Release skipped to bring version number in alignment with other SDKs
1.2.0
- Supports GeoSpatial index.
- Validates id property for all resources. Ids for resources cannot contain the ?, /, #, or \ characters or end with a space.
- Adds new header "index transformation progress" to ResourceResponse.
1.1.0
- Implements V2 indexing policy
1.0.1
- Supports proxy connection