This package includes data readers for data scientists (DS) to access data in an easy way.
Covered data platforms: Azure Data Explorer (Kusto), Dremio, and Azure Blob Storage. More platforms may be covered in the future.
The package also includes functions from the Pipelines class to help move data around.
The module is tested and usable with Python 3.9 and Python 3.10. Other versions (Python 3.6+) should also work.
Use pip to install the package and all of its dependencies:
pip install -U azdsdr
The -U flag will upgrade your old version to the newest.
Alternatively, you can clone the repository and copy the readers.py file into your project folder.
The installation will also install all the dependency packages automatically.
If you are working on a freshly set up OS, the all-in-one installation will also save you the time of installing individual packages one by one.
Most of the time, all dependent packages should install successfully without any additional intervention, but you may still see error messages depending on your OS and Python version.
Need elevated permission
Error: Could not install packages due to an OSError: [Errno 13] Permission denied:...
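If elevating permissions isn't an option, a common workaround is to install into your user site-packages instead:
pip install --user -U azdsdr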
pyodbc fails to install
This usually occurs on Linux and macOS.
Error message
Building wheel for pyodbc (setup.py) ... error
Solution
Linux: run this first
sudo apt-get install unixodbc-dev
https://github.com/mkleehammer/pyodbc/issues/276
macOS: run this first
brew install unixodbc
export LDFLAGS="-L/opt/homebrew/Cellar/unixodbc/2.3.9/lib"
export CPPFLAGS="-I/opt/homebrew/Cellar/unixodbc/2.3.9/include"
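Note that the paths above hard-code unixODBC version 2.3.9; the version installed on your machine may differ. Assuming Homebrew is installed, a version-independent variant is:
export LDFLAGS="-L$(brew --prefix unixodbc)/lib"
export CPPFLAGS="-I$(brew --prefix unixodbc)/include"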
Before running a Kusto query, please use
az login
to log in to Azure using AAD authentication. An authentication refresh token is generated by Azure and stored on your local machine. This token will be revoked after 90 days of inactivity.
For more details, read Sign in with Azure CLI.
After successfully authenticating with AAD, you should be able to run the following code without any pop-up auth request. The Kusto Reader is tested on Windows 10 and also works on Linux and macOS.
from azdsdr.readers import KustoReader
cluster = "https://help.kusto.windows.net"
db = "Samples"
kr = KustoReader(cluster=cluster,db=db)
kql = "StormEvents | take 10"
r = kr.run_kql(kql)
The function run_kql will return a Pandas DataFrame object held by r. The kr object will be reused in the following samples.
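Since r is a regular Pandas DataFrame, you can inspect the result with standard pandas calls (this snippet is plain pandas, nothing azdsdr-specific):
print(r.shape)   # (rows, columns) returned by the query
print(r.head())  # preview the first rows of the StormEvents sample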
Use run_kql_all to output multiple result sets.
kql = '''
StormEvents
| take 10
;
StormEvents
| summarize count()
'''
rs = kr.run_kql_all(kql=kql)
for r in rs:
    display(r)
List all tables:
kr.list_tables()
List tables with folder keyword:
kr.list_tables(folder_name='Covid19')
This function can be used before uploading CSV data to a Kusto table. Instead of manually creating a Kusto table from the CSV schema, use this function to create an empty Kusto table from the CSV file automatically. You can also specify the table's folder name.
kusto_table_name = 'target_kusto_table'
folder_name = 'target_kusto_folder'
csv_file_name = 'local_csv_path'
kr.create_table_from_csv(
kusto_table_name = kusto_table_name
,csv_file_path = csv_file_name
,kusto_folder = folder_name
)
Before uploading your data to Kusto, please make sure you have the right table created to hold the data. Ideally, you can use the above create_table_from_csv to create an empty table for you.
To enable data ingestion (upload), you should also initialize the KustoReader object with an additional ingest_cluster_str parameter. Here is a sample; ask your admin or check the documentation to find the ingestion cluster URL.
cluster = "https://help.kusto.windows.net"
ingest_cluster = "https://help-ingest.kusto.windows.net"
db = "Samples"
kr = KustoReader(cluster=cluster,db=db,ingest_cluster_str=ingest_cluster)
Note that you will need to create an empty table with an aligned schema to hold the data. You can also save the DataFrame object your_df_data as a CSV file first, and create an empty table from the CSV file.
your_df_data.to_csv('temp.csv',index=False)
target_kusto_table = 'upload_df_to_kusto_test'
kr.create_table_from_csv(
kusto_table_name = target_kusto_table
,kusto_folder = 'test'
,csv_file_path = 'temp.csv'
)
print('create empty table done')
Then upload Pandas Dataframe to Kusto:
target_kusto_table = 'kusto_table_name'
df_data = your_df_data
kr.upload_df_to_kusto(
target_table_name = target_kusto_table
,df_data = df_data
)
kr.check_table_data(target_table_name=target_kusto_table)
Upload CSV file to Kusto:
target_kusto_table = 'kusto_table_name'
csv_path = 'csv_file.csv'
kr.upload_csv_to_kusto(
target_table_name = target_kusto_table
,csv_path = csv_path
)
Upload an Azure Blob CSV file to Kusto. This is the best and fastest way to upload massive CSV data to a Kusto table.
target_kusto_table = 'kusto_table_name'
blob_sas_url = 'the sas url you generate from Azure portal or Azure Storage Explorer, or azdsdr'
kr.upload_csv_from_blob(
target_table_name = target_kusto_table
,blob_sas_url = blob_sas_url
)
I will cover how to generate blob_sas_url in the Azure Blob Reader section. [TODO]
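Until that section is written, here is a minimal sketch of building a SAS URL yourself with the azure-storage-blob package. The account, container, blob, and key values are placeholders, and this approach is an assumption about one workable method rather than azdsdr's own helper:
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

# Placeholder values: replace with your own storage account details.
account_name   = 'mystorageaccount'
container_name = 'mycontainer'
blob_name      = 'mydata.csv'
account_key    = 'my_account_key'

# Create a read-only SAS token valid for 24 hours.
sas_token = generate_blob_sas(
    account_name=account_name,
    container_name=container_name,
    blob_name=blob_name,
    account_key=account_key,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=24),
)
blob_sas_url = f"https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}?{sas_token}"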
You will need to install the Dremio ODBC driver first to use DremioReader from this package.
For Windows users
Please download the dremio-connector file from the drivers folder.
Note: you will have to log out of your Windows account and log back in for the new environment variable to take effect. Set the Data Source Name as Dremio Connector.
For Linux and Mac users
You can download the driver from Dremio's ODBC Driver page. It should work in theory, but it hasn't been tested yet.
from azdsdr.readers import DremioReader
import os
username = "name@host.com"
#token = "token string"
token = os.environ.get("DREMIO_TOKEN")
dr = DremioReader(username=username,token=token)
sql = '''
select
*
from
[workspace].[folder].[tablename]
limit 10
'''
r = dr.run_sql(sql)
Pipelines class [TODO]
When the exported data is very large (e.g. exceeding 1 billion rows), Kusto will export the data to several CSV files. This function will automatically combine the data into one single CSV file in the destination folder.
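The exact function and its signature are not documented here yet, but as a rough sketch of what the merge step involves (folder and file names below are hypothetical), the chunks can be streamed into a single file so nothing has to fit in memory:
import csv
import glob

# Hypothetical paths; the actual azdsdr function and arguments may differ.
chunk_paths = sorted(glob.glob('kusto_export_folder/*.csv'))

with open('combined.csv', 'w', newline='') as out_file:
    writer = None
    for path in chunk_paths:
        with open(path, newline='') as in_file:
            reader = csv.reader(in_file)
            header = next(reader)        # each exported chunk repeats the header
            if writer is None:
                writer = csv.writer(out_file)
                writer.writerow(header)  # write the header only once
            writer.writerows(reader)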
[TODO]
display_all
Display all DataFrame rows. IPython's display can show only a limited number of rows of data. This tool can display all rows, or a specified number of rows.
from azdsdr.tools import pd_tools
display_all = pd_tools().display_all
#...prepare pd data
# display all
display_all(pd_data)
# display top 20 rows
display_all(pd_data,top=20)
The Dremio ODBC Reader solution originated from KC Munnings. Glory and credits belong to KC.
Add bar1_chart in vis_tools, so that you can plot a bar chart using the vis_tools class.
Add Grid and XY Axes Label options for 1-line and 2-line charts.
Add show_data_label option for vis_tools's line1_chart function. If show_data_label=True is specified, the chart will show each data point's value.
Add get_table_schema for KustoReader.
Add get_table_folder for KustoReader.
Add download_file_list of class AzureBlobReader, to download a list of CSV files with the same schema and merge them into one target CSV file.
Add delete_blob_files of class AzureBlobReader, to delete a list of blob files.