azure-ai-translation-document
Microsoft Azure Ai Translation Document Client Library for Python
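Azure Cognitive Services Document Translation is a cloud service that can be used to translate multiple and complex documents across languages and dialects while preserving original document structure and data format. Use the client library for Document Translation to translate documents stored in Azure Blob Storage containers, check the status and progress of translation operations, and apply glossaries or custom translation models to tailor translations.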
Source code | Package (PyPI) | Package (Conda) | API reference documentation | Product documentation | Samples
Support for Python 2.7 in Azure SDK Python packages ended on 01 January 2022. For more information and questions, please refer to https://github.com/Azure/azure-sdk-for-python/issues/20691
Install the Azure Document Translation client library for Python with pip:
pip install --pre azure-ai-translation-document
Note: This version of the client library defaults to the v2024-05-01 version of the service.
The Document Translation feature supports single-service access only. To access the service, create a Translator resource.
You can create the resource using either of the following options:
Option 1: Azure Portal
Option 2: Azure CLI. Below is an example of how you can create a Translator resource using the CLI:
# Create a new resource group to hold the Translator resource
# (if you are using an existing resource group, skip this step)
az group create --name my-resource-group --location westus2
# Create the Translator resource used for Document Translation
az cognitiveservices account create \
--name document-translation-resource \
--custom-domain document-translation-resource \
--resource-group my-resource-group \
--kind TextTranslation \
--sku S1 \
--location westus2 \
--yes
To interact with the Document Translation service, you need to create an instance of a client. An endpoint and a credential are required to instantiate the client object.
You can find the endpoint for your Translator resource using the Azure Portal.
Note that the service requires a custom domain endpoint, formatted as: https://{NAME-OF-YOUR-RESOURCE}.cognitiveservices.azure.com/
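As a minimal sketch, assuming you only need to turn a resource name into the custom-domain endpoint format shown above, a small helper can build the URL and sanity-check an endpoint before it is passed to the client. The functions `build_endpoint` and `is_custom_domain_endpoint` are hypothetical helpers, not part of the SDK.

```python
from urllib.parse import urlparse

CUSTOM_DOMAIN_SUFFIX = ".cognitiveservices.azure.com"


def build_endpoint(resource_name: str) -> str:
    """Hypothetical helper: build the custom-domain endpoint for a Translator resource."""
    # DNS names are case-insensitive, so normalize to lowercase.
    name = resource_name.strip().lower()
    if not name:
        raise ValueError("resource_name must not be empty")
    return f"https://{name}{CUSTOM_DOMAIN_SUFFIX}/"


def is_custom_domain_endpoint(endpoint: str) -> bool:
    """Hypothetical helper: check for HTTPS and a <name>.cognitiveservices.azure.com host."""
    parsed = urlparse(endpoint)
    host = parsed.hostname or ""
    return (
        parsed.scheme == "https"
        and host.endswith(CUSTOM_DOMAIN_SUFFIX)
        and len(host) > len(CUSTOM_DOMAIN_SUFFIX)
    )


print(build_endpoint("document-translation-resource"))
```

A regional endpoint (one without your resource name in the host) fails this check, which matches the requirement for a custom domain endpoint.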
The API key can be found in the Azure Portal or by running the following Azure CLI command:
az cognitiveservices account keys list --name "resource-name" --resource-group "resource-group-name"
To use an API key as the credential parameter, pass the key as a string into an instance of AzureKeyCredential.
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient
endpoint = os.environ["AZURE_DOCUMENT_TRANSLATION_ENDPOINT"]
key = os.environ["AZURE_DOCUMENT_TRANSLATION_KEY"]
document_translation_client = DocumentTranslationClient(endpoint, AzureKeyCredential(key))
AzureKeyCredential authentication is used in the examples in this getting started guide, but you can also authenticate with Azure Active Directory using the azure-identity library. To use the DefaultAzureCredential type shown below, or other credential types provided with the Azure SDK, install the azure-identity package:
pip install azure-identity
You will also need to register a new AAD application and grant access to your Translator resource by assigning the "Cognitive Services User" role to your service principal.
Once completed, set the values of the client ID, tenant ID, and client secret of the AAD application as environment variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET.
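Before constructing the credential, a quick check can report which of these three variables are unset. This is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper. It only covers the environment-variable path, since DefaultAzureCredential can also fall back to other credential sources.

```python
import os
from typing import List, Mapping, Optional

# The service-principal variables listed above.
REQUIRED_AAD_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET")


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: return required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AAD_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```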
"""DefaultAzureCredential will use the values from these environment
variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
"""
import os
from azure.identity import DefaultAzureCredential
from azure.ai.translation.document import DocumentTranslationClient
endpoint = os.environ["AZURE_DOCUMENT_TRANSLATION_ENDPOINT"]
credential = DefaultAzureCredential()
document_translation_client = DocumentTranslationClient(endpoint, credential)
The Document Translation service requires that you upload your files to an Azure Blob Storage source container and provide a target container where the translated documents can be written. Additional information about setting this up can be found in the service documentation.
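The containers are passed to the client as SAS URLs, that is, the container URL followed by a SAS token query string. As a rough pre-flight check, assuming the standard SAS query parameters, you could confirm that a URL uses HTTPS and carries a signature (`sig`) before submitting a job. `looks_like_sas_url` is a hypothetical helper; passing it does not mean the token is valid, unexpired, or has the right permissions.

```python
from urllib.parse import parse_qs, urlparse


def looks_like_sas_url(url: str) -> bool:
    """Hypothetical helper: HTTPS URL whose query string includes a non-empty 'sig'."""
    parsed = urlparse(url)
    # parse_qs drops parameters with blank values, so "sig=" counts as missing.
    query = parse_qs(parsed.query)
    return parsed.scheme == "https" and bool(query.get("sig"))


print(looks_like_sas_url("https://account.blob.core.windows.net/source?sv=2022-11-02&sp=rl&sig=abc123"))
```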
Interaction with the Document Translation client library begins with an instance of the DocumentTranslationClient.
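The client provides operations for starting document translations with begin_translation and for listing the translation operations submitted to your resource with list_translation_statuses.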
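Input to the begin_translation client method can be provided in two different ways: as a single source container URL, target container URL, and target language code (first example below), or as a list of DocumentTranslationInput objects, each with its own source container and one or more TranslationTarget entries (second example below).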
from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient
document_translation_client = DocumentTranslationClient("<endpoint>", AzureKeyCredential("<api_key>"))
poller = document_translation_client.begin_translation("<sas_url_to_source>", "<sas_url_to_target>", "<target_language>")
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient, DocumentTranslationInput, TranslationTarget
endpoint = os.environ["AZURE_DOCUMENT_TRANSLATION_ENDPOINT"]
key = os.environ["AZURE_DOCUMENT_TRANSLATION_KEY"]
source_container_url_1 = os.environ["AZURE_SOURCE_CONTAINER_URL_1"]
source_container_url_2 = os.environ["AZURE_SOURCE_CONTAINER_URL_2"]
target_container_url_fr = os.environ["AZURE_TARGET_CONTAINER_URL_FR"]
target_container_url_ar = os.environ["AZURE_TARGET_CONTAINER_URL_AR"]
target_container_url_es = os.environ["AZURE_TARGET_CONTAINER_URL_ES"]
client = DocumentTranslationClient(endpoint, AzureKeyCredential(key))
poller = client.begin_translation(
    inputs=[
        DocumentTranslationInput(
            source_url=source_container_url_1,
            targets=[
                TranslationTarget(target_url=target_container_url_fr, language="fr"),
                TranslationTarget(target_url=target_container_url_ar, language="ar"),
            ],
        ),
        DocumentTranslationInput(
            source_url=source_container_url_2,
            targets=[TranslationTarget(target_url=target_container_url_es, language="es")],
        ),
    ]
)
result = poller.result()
print(f"Status: {poller.status()}")
print(f"Created on: {poller.details.created_on}")
print(f"Last updated on: {poller.details.last_updated_on}")
print(f"Total number of translations on documents: {poller.details.documents_total_count}")
print("\nOf total documents...")
print(f"{poller.details.documents_failed_count} failed")
print(f"{poller.details.documents_succeeded_count} succeeded")
for document in result:
    print(f"Document ID: {document.id}")
    print(f"Document status: {document.status}")
    if document.status == "Succeeded":
        print(f"Source document location: {document.source_document_url}")
        print(f"Translated document location: {document.translated_document_url}")
        print(f"Translated to language: {document.translated_to}\n")
    elif document.error:
        print(f"Error Code: {document.error.code}, Message: {document.error.message}\n")
Note: the target_url for each target language must be unique.
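Because each target language needs its own target_url, a pre-flight check can catch accidental reuse before a job is submitted. This sketch works on plain (target_url, language) pairs rather than the SDK's TranslationTarget objects, and compares container URLs with the SAS query string stripped so that two different tokens for the same container are still flagged. `find_duplicate_target_urls` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_duplicate_target_urls(targets: Iterable[Tuple[str, str]]) -> List[str]:
    """Hypothetical helper: return container URLs used by more than one target."""
    # Drop the SAS token (everything after "?") so only the container URL is compared.
    counts = Counter(url.split("?", 1)[0] for url, _language in targets)
    return sorted(url for url, count in counts.items() if count > 1)


targets = [
    ("https://account.blob.core.windows.net/fr?sig=x", "fr"),
    ("https://account.blob.core.windows.net/ar?sig=y", "ar"),
    ("https://account.blob.core.windows.net/fr?sig=z", "es"),  # reuses the "fr" container
]
print(find_duplicate_target_urls(targets))
```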
To translate documents under a folder, or only translate certain documents, see sample_begin_translation_with_filters.py. See the service documentation for all supported languages.
Long-running operations are operations that consist of an initial request sent to the service to start an operation, followed by polling the service at intervals to determine whether the operation has completed or failed, and, if it has succeeded, to get the result.
Methods that translate documents are modeled as long-running operations. The client exposes a begin_<method-name> method that returns a DocumentTranslationLROPoller or AsyncDocumentTranslationLROPoller. Callers should wait for the operation to complete by calling result() on the poller object returned from the begin_<method-name> method.
Sample code snippets are provided to illustrate using long-running operations below.
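To show the polling pattern in isolation, here is a minimal stand-in for a poller: the operation starts in an initial state, and result() polls at a fixed interval until a terminal state is reached. `FakeTranslationPoller` and its status strings are hypothetical and illustrative only; the real DocumentTranslationLROPoller sends the HTTP requests and manages the polling interval for you.

```python
import time
from typing import List


class FakeTranslationPoller:
    """Hypothetical stand-in that reports a scripted sequence of statuses, one per poll."""

    TERMINAL = {"Succeeded", "Failed", "Cancelled"}

    def __init__(self, statuses: List[str], poll_interval: float = 0.01) -> None:
        self._statuses = list(statuses)
        self._current = self._statuses.pop(0)  # state right after the initial request
        self._poll_interval = poll_interval
        self.polls = 0

    def status(self) -> str:
        return self._current

    def done(self) -> bool:
        return self._current in self.TERMINAL

    def _poll_once(self) -> None:
        # A real poller would request the operation's current status from the service here.
        if not self._statuses:
            raise RuntimeError("operation never reached a terminal state")
        self._current = self._statuses.pop(0)
        self.polls += 1

    def result(self) -> str:
        # Block, polling at a fixed interval, until the operation finishes.
        while not self.done():
            time.sleep(self._poll_interval)
            self._poll_once()
        return self._current


poller = FakeTranslationPoller(["NotStarted", "Running", "Running", "Succeeded"])
print(poller.result(), poller.polls)
```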
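The following section provides code snippets covering some of the most common Document Translation tasks: translating the documents in a container, translating multiple inputs into multiple target languages, and listing translation operations.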
Translate all the documents in your source container to the target container. To translate documents under a folder, or only translate certain documents, see sample_begin_translation_with_filters.py.
from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient
endpoint = "https://<resource-name>.cognitiveservices.azure.com/"
credential = AzureKeyCredential("<api_key>")
source_container_sas_url_en = "<sas-url-en>"
target_container_sas_url_es = "<sas-url-es>"
document_translation_client = DocumentTranslationClient(endpoint, credential)
poller = document_translation_client.begin_translation(source_container_sas_url_en, target_container_sas_url_es, "es")
result = poller.result()
print(f"Status: {poller.status()}")
print(f"Created on: {poller.details.created_on}")
print(f"Last updated on: {poller.details.last_updated_on}")
print(f"Total number of translations on documents: {poller.details.documents_total_count}")
print("\nOf total documents...")
print(f"{poller.details.documents_failed_count} failed")
print(f"{poller.details.documents_succeeded_count} succeeded")
for document in result:
    print(f"Document ID: {document.id}")
    print(f"Document status: {document.status}")
    if document.status == "Succeeded":
        print(f"Source document location: {document.source_document_url}")
        print(f"Translated document location: {document.translated_document_url}")
        print(f"Translated to language: {document.translated_to}\n")
    elif document.error:
        # Only documents that carry an error object have a code and message to print.
        print(f"Error Code: {document.error.code}, Message: {document.error.message}\n")
Translate documents from multiple source containers into multiple target containers in different languages.
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient, DocumentTranslationInput, TranslationTarget
endpoint = os.environ["AZURE_DOCUMENT_TRANSLATION_ENDPOINT"]
key = os.environ["AZURE_DOCUMENT_TRANSLATION_KEY"]
source_container_url_1 = os.environ["AZURE_SOURCE_CONTAINER_URL_1"]
source_container_url_2 = os.environ["AZURE_SOURCE_CONTAINER_URL_2"]
target_container_url_fr = os.environ["AZURE_TARGET_CONTAINER_URL_FR"]
target_container_url_ar = os.environ["AZURE_TARGET_CONTAINER_URL_AR"]
target_container_url_es = os.environ["AZURE_TARGET_CONTAINER_URL_ES"]
client = DocumentTranslationClient(endpoint, AzureKeyCredential(key))
poller = client.begin_translation(
    inputs=[
        DocumentTranslationInput(
            source_url=source_container_url_1,
            targets=[
                TranslationTarget(target_url=target_container_url_fr, language="fr"),
                TranslationTarget(target_url=target_container_url_ar, language="ar"),
            ],
        ),
        DocumentTranslationInput(
            source_url=source_container_url_2,
            targets=[TranslationTarget(target_url=target_container_url_es, language="es")],
        ),
    ]
)
result = poller.result()
print(f"Status: {poller.status()}")
print(f"Created on: {poller.details.created_on}")
print(f"Last updated on: {poller.details.last_updated_on}")
print(f"Total number of translations on documents: {poller.details.documents_total_count}")
print("\nOf total documents...")
print(f"{poller.details.documents_failed_count} failed")
print(f"{poller.details.documents_succeeded_count} succeeded")
for document in result:
    print(f"Document ID: {document.id}")
    print(f"Document status: {document.status}")
    if document.status == "Succeeded":
        print(f"Source document location: {document.source_document_url}")
        print(f"Translated document location: {document.translated_document_url}")
        print(f"Translated to language: {document.translated_to}\n")
    elif document.error:
        print(f"Error Code: {document.error.code}, Message: {document.error.message}\n")
Enumerate over the translation operations submitted for the resource.
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient
endpoint = os.environ["AZURE_DOCUMENT_TRANSLATION_ENDPOINT"]
key = os.environ["AZURE_DOCUMENT_TRANSLATION_KEY"]
client = DocumentTranslationClient(endpoint, AzureKeyCredential(key))
operations = client.list_translation_statuses()
for operation in operations:
    print(f"ID: {operation.id}")
    print(f"Status: {operation.status}")
    print(f"Created on: {operation.created_on}")
    print(f"Last updated on: {operation.last_updated_on}")
    print(f"Total number of operations on documents: {operation.documents_total_count}")
    print(f"Total number of characters charged: {operation.total_characters_charged}")
    print("\nOf total documents...")
    print(f"{operation.documents_failed_count} failed")
    print(f"{operation.documents_succeeded_count} succeeded")
    print(f"{operation.documents_canceled_count} canceled\n")
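If you only need an overview across many operations, you can aggregate the fields printed above instead of printing each one. This sketch uses `types.SimpleNamespace` objects with made-up values as hypothetical stand-ins for the items returned by list_translation_statuses(), relying only on the `status` and `total_characters_charged` attributes shown in the example; `summarize_operations` is not part of the SDK.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Dict, Iterable


def summarize_operations(operations: Iterable) -> Dict[str, object]:
    """Hypothetical helper: count operations per status and total the characters charged."""
    status_counts: Counter = Counter()
    characters = 0
    for operation in operations:
        status_counts[operation.status] += 1
        characters += operation.total_characters_charged
    return {"by_status": dict(status_counts), "total_characters_charged": characters}


# Made-up sample values standing in for real operation status objects.
sample = [
    SimpleNamespace(status="Succeeded", total_characters_charged=1200),
    SimpleNamespace(status="Failed", total_characters_charged=0),
    SimpleNamespace(status="Succeeded", total_characters_charged=800),
]
print(summarize_operations(sample))
```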
To see how to use the Document Translation client library with Azure Storage Blob to upload documents, create SAS tokens for your containers, and download the finished translated documents, see this sample. Note that you will need to install the azure-storage-blob library to run this sample.
The following section provides insights into some advanced translation features, such as glossaries and custom translation models.
Glossaries are domain-specific dictionaries. For example, when translating medical documents, you may need support for terminology and idioms specific to the medical field that the standard translation dictionary does not cover, or you may simply need a term translated in a specific way. Document Translation provides glossary support for these cases.
Document Translation supports glossaries in the following formats:
File Type | Extension | Description | Samples |
---|---|---|---|
Tab-Separated Values/TAB | .tsv, .tab | Read more on Wikipedia | glossary_sample.tsv |
Comma-Separated Values | .csv | Read more on Wikipedia | glossary_sample.csv |
Localization Interchange File Format | .xlf, .xliff | Read more on Wikipedia | glossary_sample.xlf |
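As a rough illustration of the two delimited formats in the table, the sketch below reads a glossary in which each row is a source term followed by its target term, choosing the delimiter from the file extension. The two-column layout and the `load_glossary` helper are assumptions for illustration only: XLIFF files would need an XML parser instead, and it is the service, not this code, that applies the glossary during translation.

```python
import csv
import io
from pathlib import Path
from typing import Dict, TextIO

# Delimiter per extension for the delimited formats listed in the table.
DELIMITERS = {".tsv": "\t", ".tab": "\t", ".csv": ","}


def load_glossary(stream: TextIO, filename: str) -> Dict[str, str]:
    """Hypothetical helper: parse a two-column glossary (source term, target term)."""
    suffix = Path(filename).suffix.lower()
    if suffix not in DELIMITERS:
        raise ValueError(f"unsupported glossary extension: {suffix}")
    glossary: Dict[str, str] = {}
    for row in csv.reader(stream, delimiter=DELIMITERS[suffix]):
        # Skip blank or malformed rows that lack a source and a target term.
        if len(row) >= 2 and row[0].strip():
            glossary[row[0].strip()] = row[1].strip()
    return glossary


sample_tsv = io.StringIO("myocardial infarction\tinfarto de miocardio\nstent\tstent\n")
print(load_glossary(sample_tsv, "medical_glossary.tsv"))
```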
View all supported formats here.
To use glossaries with Document Translation, first upload your glossary file to a blob container, and then provide the SAS URL to the file, as shown in the code sample sample_translation_with_glossaries.py.
Instead of using Document Translation's engine for translation, you can use your own custom Azure machine/deep learning model.
For more info on how to create, provision, and deploy your own custom Azure translation model, please follow the instructions here: Build, deploy, and use a custom model for translation
In order to use a custom translation model with Document Translation, you first need to create and deploy your model, then follow the code sample sample_translation_with_custom_model.py to use with Document Translation.
The Document Translation client library raises exceptions defined in Azure Core.
This library uses the standard logging library for logging. Basic information about HTTP sessions (URLs, headers, etc.) is logged at INFO level.
Detailed DEBUG level logging, including request/response bodies and unredacted headers, can be enabled on the client or per-operation with the logging_enable keyword argument.
See full SDK logging documentation with examples here.
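Because the Azure SDK libraries log through loggers in the `azure` namespace of the standard logging module, a minimal setup that sends those records to stdout could look like the sketch below. Request and response bodies only appear once logging_enable=True is also passed to the client or operation, as described above; those calls are left as comments because they need the SDK installed plus a real endpoint and key.

```python
import logging
import sys

# Azure SDK loggers are children of "azure", e.g. "azure.ai.translation.document".
azure_logger = logging.getLogger("azure")
azure_logger.setLevel(logging.DEBUG)

handler = logging.StreamHandler(stream=sys.stdout)
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s"))
azure_logger.addHandler(handler)

# With the SDK installed, detailed logging could then be enabled on the client:
# client = DocumentTranslationClient(endpoint, AzureKeyCredential(key), logging_enable=True)
# or for a single operation:
# poller = client.begin_translation(source_url, target_url, "es", logging_enable=True)

# Child loggers inherit the DEBUG level and propagate records to the handler above.
logging.getLogger("azure.ai.translation.document").debug("logger configured")
```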
Optional keyword arguments can be passed in at the client and per-operation level. The azure-core reference documentation describes available configurations for retries, logging, transport protocols, and more.
More code samples illustrating common patterns and scenario operations with the Document Translation Python client library can be found under the samples directory.
This library also includes a complete set of async APIs. To use them, you must first install an async transport, such as aiohttp. Async clients are found under the azure.ai.translation.document.aio namespace.
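The async clients follow the same long-running-operation pattern, except that the calls are awaited and the results are iterated with `async for`. The sketch below mimics that flow with a hypothetical `FakeAsyncPoller` and asyncio only, so it runs without aiohttp or a Translator resource; the class and its status strings are illustrative, not the library's API. With the real library you would import DocumentTranslationClient from azure.ai.translation.document.aio instead.

```python
import asyncio
from typing import AsyncIterator, List, Optional

TERMINAL = {"Succeeded", "Failed", "Cancelled"}


class FakeAsyncPoller:
    """Hypothetical stand-in: polls without blocking until a terminal status is seen."""

    def __init__(self, statuses: List[str], document_ids: List[str]) -> None:
        self._statuses = list(statuses)
        self._document_ids = list(document_ids)
        self.final_status: Optional[str] = None

    async def result(self) -> AsyncIterator[str]:
        for status in self._statuses:
            # Yield control to the event loop between polls instead of blocking the thread.
            await asyncio.sleep(0)
            if status in TERMINAL:
                self.final_status = status
                return self._iter_documents()
        raise RuntimeError("operation never reached a terminal state")

    async def _iter_documents(self) -> AsyncIterator[str]:
        for document_id in self._document_ids:
            yield document_id


async def main() -> List[str]:
    poller = FakeAsyncPoller(["NotStarted", "Running", "Succeeded"], ["doc-1", "doc-2"])
    documents = await poller.result()
    return [document_id async for document_id in documents]


print(asyncio.run(main()))
```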
For more extensive documentation on Azure Cognitive Services Document Translation, see the Document Translation documentation on docs.microsoft.com.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.