teradataml makes a collection of analytic functions that reside on Teradata Vantage available to Python users, allowing them to perform analytics on Teradata Vantage with no SQL coding. In addition, the teradataml library provides functions for scaling data manipulation and transformation, and data filtering and sub-setting, and can be used in conjunction with other open-source Python libraries.
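As a quick illustration of that flow (a minimal sketch; the host credentials and the table name "sales" are placeholders, not part of this release):

```python
from teradataml import create_context, DataFrame, remove_context

# Placeholder credentials; replace with your Vantage host and user.
create_context(host="<host>", username="<user>", password="<password>")

# Reference an existing table as a teradataml DataFrame. The data stays in
# Vantage and operations are pushed down as SQL, so no SQL coding is needed.
df = DataFrame("sales")
print(df.describe())  # summary statistics computed in-database

remove_context()  # close the connection
```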
For community support, please visit the Teradata Community.
For Teradata customer support, please visit Teradata Support.
Copyright 2024, Teradata. All Rights Reserved.
teradataml no longer supports setting the auth_token using set_config_params(). Users should use set_auth_token() to set the token.
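For instance (a hedged sketch; the URL value is a placeholder, and the exact keyword arguments should be checked against the current set_auth_token() documentation):

```python
from teradataml import create_context, set_auth_token

# Placeholder credentials; replace with your Vantage host and user.
create_context(host="<host>", username="<user>", password="<password>")

# Set the token via set_auth_token() rather than set_config_params();
# 'base_url' (the CCP url) supersedes the deprecated 'ues_url' argument.
set_auth_token(base_url="<ccp-url>")
```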
- alias() - Creates a DataFrame with an alias name.
- db_object_name - Get the underlying database object name on which the DataFrame is created.
- alias() - Creates a GeoDataFrame with an alias name.
- DataFrameColumn.isnan() - Evaluates the expression to determine whether the floating-point argument is a NaN (Not-a-Number) value.
- DataFrameColumn.isinf() - Evaluates the expression to determine whether the floating-point argument is an infinite number.
- DataFrameColumn.isfinite() - Evaluates the expression to determine whether it is a finite floating value.
- apply() - Adds Feature, Entity, DataSource to a FeatureGroup.
- from_DataFrame() - Creates a FeatureGroup from a teradataml DataFrame.
- from_query() - Creates a FeatureGroup using a SQL query.
- remove() - Removes Feature, Entity, or DataSource from a FeatureGroup.
- reset_labels() - Removes the labels assigned to the FeatureGroup that are set using set_labels().
- set_labels() - Sets the Features as labels for a FeatureGroup.
- features - Get the features of a FeatureGroup.
- labels - Get the labels of a FeatureGroup.
- apply() - Adds Feature, Entity, DataSource, FeatureGroup to a FeatureStore.
- archive_data_source() - Archives a specified DataSource from a FeatureStore.
- archive_entity() - Archives a specified Entity from a FeatureStore.
- archive_feature() - Archives a specified Feature from a FeatureStore.
- archive_feature_group() - Archives a specified FeatureGroup from a FeatureStore. The method also archives the underlying Feature, Entity and DataSource.
- delete_data_source() - Deletes an archived DataSource.
- delete_entity() - Deletes an archived Entity.
- delete_feature() - Deletes an archived Feature.
- delete_feature_group() - Deletes an archived FeatureGroup.
- get_data_source() - Get the DataSources associated with the FeatureStore.
- get_dataset() - Get the teradataml DataFrame based on Features, Entities and DataSource from a FeatureGroup.
- get_entity() - Get the Entity associated with the FeatureStore.
- get_feature() - Get the Feature associated with the FeatureStore.
- get_feature_group() - Get the FeatureGroup associated with the FeatureStore.
- list_data_sources() - List DataSources.
- list_entities() - List Entities.
- list_feature_groups() - List FeatureGroups.
- list_features() - List Features.
- list_repos() - List available repos that are configured for FeatureStore.
- repair() - Repairs the underlying FeatureStore schema on the database.
- set_features_active() - Marks the Features as active.
- set_features_inactive() - Marks the Features as inactive.
- setup() - Sets up the FeatureStore for a repo.
- repo - Property for the FeatureStore repo.
- grant - Property to grant access on FeatureStore to a user.
- revoke - Property to revoke access on FeatureStore from a user.
- Image2Matrix() - Converts an image into a matrix.

New Analytics Database Analytic Functions:
- CFilter()
- NaiveBayes()
- TDNaiveBayesPredict()
- Shap()
- SMOTE()
- CopyArt()
- list_files() - List the installed files in Database.

OpensourceML (OpenML) feature
The following functionality is added in the current release:
- td_lightgbm - Interface object to run lightgbm functions and classes through Teradata Vantage.
Example usage below:
from teradataml import td_lightgbm, DataFrame
df_train = DataFrame("multi_model_classification")
feature_columns = ["col1", "col2", "col3", "col4"]
label_columns = ["label"]
part_columns = ["partition_column_1", "partition_column_2"]
df_x = df_train.select(feature_columns)
df_y = df_train.select(label_columns)
# Dataset creation.
# Single model case.
obj_s = td_lightgbm.Dataset(df_x, df_y, silent=True, free_raw_data=False)
# Multi model case.
obj_m = td_lightgbm.Dataset(df_x, df_y, free_raw_data=False, partition_columns=part_columns)
obj_m_v = td_lightgbm.Dataset(df_x, df_y, free_raw_data=False, partition_columns=part_columns)
# Model training.
# Single model case.
opt = td_lightgbm.train(params={}, train_set=obj_s, num_boost_round=30)
opt.predict(data=df_x, num_iteration=20, pred_contrib=True)
# Multi model case.
rec = {}  # Dictionary in which record_evaluation() stores evaluation results.
opt = td_lightgbm.train(params={}, train_set=obj_m, num_boost_round=30,
                        callbacks=[td_lightgbm.record_evaluation(rec)],
                        valid_sets=[obj_m_v, obj_m_v])
# Passing `label` argument to get it returned in output DataFrame.
opt.predict(data=df_x, label=df_y, num_iteration=20)
Refer to the Teradata Python Package User Guide for more details of this feature, its arguments, usage, examples, and supportability in Vantage.
- register() - Registers a user defined function (UDF).
- call_udf() - Calls a registered user defined function (UDF) and returns a ColumnExpression.
- list_udfs() - Lists all the UDFs registered using the register() function.
- deregister() - Deregisters a user defined function (UDF).
- table_operator - Specifies the name of the table operator.
- set_auth_token() - Added the base_url parameter, which accepts the CCP url. 'ues_url' will be deprecated in the future and users will need to specify 'base_url' instead.
- join() - Changes to the on and other arguments.
- SAX() - Default value added for window_size and output_frequency.
- DickeyFuller() - Changes to the max_lags, drift_trend_formula and algorithm parameters.
- AutoML, AutoRegressor and AutoClassifier
- TextParser() - Argument covert_to_lowercase changed to convert_to_lowercase.
- db_list_tables() now returns correct results when '%' is used.
- teradataml will no longer be supported with SQLAlchemy versions below 2.0.
- teradataml no longer shows the warnings from Vantage by default. Set display.suppress_vantage_runtime_warnings to False to display warnings.
- TFIDF()
- Pivoting()
- UnPivoting()
- AutoArima()
- DWT()
- DWT2D()
- FilterFactory1d()
- IDWT()
- IDWT2D()
- IQR()
- Matrix2Image()
- SAX()
- WindowDFFT()
- udf() - Creates a user defined function (UDF) and returns a ColumnExpression.
- set_session_param() is added to set the database session parameters.
- unset_session_param() is added to unset database session parameters.
- materialize() - Persists the DataFrame into the database for the current session.
- create_temp_view() - Creates a temporary view for the session on the DataFrame.
- DataFrameColumn.to_timestamp() - Converts a string or integer value to a TIMESTAMP data type or TIMESTAMP WITH TIME ZONE data type.
- DataFrameColumn.extract() - Extracts a date component to a numeric value.
- DataFrameColumn.to_interval() - Converts a numeric value or string value into an INTERVAL_DAY_TO_SECOND or INTERVAL_YEAR_TO_MONTH value.
- DataFrameColumn.parse_url() - Extracts a part from a URL.
- DataFrameColumn.log - Returns the logarithm value of the column with respect to 'base'.

AutoML(), AutoRegressor() and AutoClassifier():
- evaluate() - Performs evaluation on the data using the best model or the model of the user's choice from the leaderboard.
- load() - Loads the saved model from the database.
- deploy() - Saves the trained model inside the database.
- remove_saved_model() - Removes the saved model in the database.
- model_hyperparameters() - Returns the hyperparameters of fitted or loaded models.
- AutoML(), AutoRegressor() and AutoClassifier() - New arguments volatile and persist.
- predict() - Data input is now mandatory for generating predictions. Default model evaluation is now removed.
- DataFrameColumn.cast() - Accepts 2 new arguments, format and timezone.
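A sketch of the two new cast() arguments (assumes a connected session; the table name, column name, format string, and time zone value are illustrative, not from this release):

```python
from teradataml import DataFrame
from teradatasqlalchemy import TIMESTAMP

df = DataFrame("my_table")  # hypothetical table with a string column 'txt'
# 'format' tells Vantage how to parse the string; 'timezone' applies a zone.
df = df.assign(ts=df.txt.cast(type_=TIMESTAMP,
                              format="YYYY-MM-DD HH24:MI:SS",
                              timezone="GMT"))
```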
- DataFrame.assign() - Accepts ColumnExpressions returned by udf().
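These two additions combine as in the sketch below (assumes a connected session; the table and column names are illustrative, and the exact udf import path should be checked against the User Guide):

```python
from teradataml import DataFrame, udf
from teradatasqlalchemy import VARCHAR

@udf(returns=VARCHAR(100))
def upper_case(s):
    # Plain Python, executed row by row in Vantage.
    return s.upper()

df = DataFrame("employees")  # hypothetical table with a 'name' column
# assign() accepts the ColumnExpression returned by the UDF call.
df = df.assign(upper_name=upper_case("name"))
```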
- set_config_params() - Changes related to the ues_url and auth_token arguments.
- to_pandas() - Returns the pandas DataFrame with Decimal column types as float instead of object. If users want the datatype to be object, set argument coerce_float to False.
- list_td_reserved_keywords() - Accepts a list of strings as argument.
- ACF() - round_results parameter removed as it was used for internal testing.
- BreuschGodfrey() - Added default value 0.05 for parameter significance_level.
- GoldfeldQuandt() - Removed parameters weights and formula. Replaced parameter orig_regr_paramcnt with const_term. Changed description for parameter algorithm. Please refer to the document for more details.
- HoltWintersForecaster() - Default value of parameter seasonal_periods removed.
- IDFFT2() - Removed parameter output_fmt_row_major as it is used for internal testing.
- Resample() - Added parameter output_fmt_index_style.
- predict() function can now predict on test data which does not contain the target column.
- sklearn.ensemble - apply()
- sklearn.impute - transform(), fit_transform(), inverse_transform()
- sklearn.kernel_approximations - transform(), fit_transform()
- sklearn.neighbors - transform(), fit_transform()
- sklearn.preprocessing - transform(), inverse_transform()
- sklearn.feature_selection - transform(), fit_transform(), inverse_transform()
- sklearn.clustering - transform(), fit_transform()
Functions such as score(), predict() etc. can be run on top of the returned objects.
- predict() function now generates correct ROC-AUC value for positive class.
- deploy() method of Script and Apply classes retries model deployment if there are any intermittent network issues.
- teradataml no longer supports Python versions less than 3.8.
- set_auth_token() - teradataml now supports authentication via PAT in addition to OAuth 2.0 Device Authorization Grant (formerly known as the Device Flow). Added arguments username and expiration_time in seconds.
- ANOVA() - New arguments group_name_column, group_value_name, group_names, num_groups for data containing group values and group names.
- FTest() - New arguments sample_name_column, sample_name_value, first_sample_name, second_sample_name.
- GLM() - New arguments stepwise_direction, max_steps_num and initial_stepwise_columns; also attribute_data, parameter_data, iteration_mode and partition_column.
- GetFutileColumns() - Arguments category_summary_column and threshold_value are now optional.
- KMeans() - New argument initialcentroids_method.
- NonLinearCombineFit() - Argument result_column is now optional.
- ROC() - Argument positive_class is now optional.
- SVMPredict() - New argument model_type.
- ScaleFit() - New arguments ignoreinvalid_locationscale, unused_attributes, attribute_name_column, attribute_value_column. Arguments attribute_name_column, attribute_value_column and target_attributes are supported for sparse input. Arguments attribute_data, parameter_data and partition_column are supported for partitioning.
- ScaleTransform() - Arguments attribute_name_column and attribute_value_column are supported for sparse input.
- TDGLMPredict() - New arguments family and partition_column.
- XGBoost() - New argument base_score is added for initial prediction value for all data points.
- XGBoostPredict() - New argument detailed is added for detailed information of each prediction.
- ZTest() - New arguments sample_name_column, sample_value_column, first_sample_name and second_sample_name.
- AutoML(), AutoRegressor() and AutoClassifier() - New argument max_models is added as an early stopping criterion to limit the maximum number of models to be trained.
- DataFrame.agg()
- fastload() - Improved error and warning table handling with the below-mentioned new arguments: err_staging_db, err_tbl_name, warn_tbl_name, err_tbl_1_suffix, err_tbl_2_suffix.
- fastload() - Change in behaviour of the save_errors argument. When save_errors is set to True, error information is available in two persistent tables, ERR_1 and ERR_2. When save_errors is set to False, error information is available in a single pandas dataframe.
- Apply's deploy()
- ColumnTransformer function now processes its arguments in the order they are passed.

OpenML dynamically exposes opensource packages through Teradata Vantage. OpenML provides an interface object through which exposed classes and functions of opensource packages can be accessed with the same syntax and arguments.
The following functionality is added in the current release:
- td_sklearn - Interface object to run scikit-learn functions and classes through Teradata Vantage.
Example usage below:
from teradataml import td_sklearn, DataFrame
df_train = DataFrame("multi_model_classification")
feature_columns = ["col1", "col2", "col3", "col4"]
label_columns = ["label"]
part_columns = ["partition_column_1", "partition_column_2"]
linear_svc = td_sklearn.LinearSVC()
OpenML is supported in both Teradata Vantage Enterprise and Teradata Vantage Lake.

Use of X and y arguments - Scikit-learn users are familiar with using X and y as argument names which take data as pandas DataFrames, numpy arrays or lists etc. However, in OpenML, we pass teradataml DataFrames for the arguments X and y.
df_x = df_train.select(feature_columns)
df_y = df_train.select(label_columns)
linear_svc = linear_svc.fit(X=df_x, y=df_y)
Additional support for data, feature_columns, label_columns and group_columns arguments - Apart from the traditional arguments, OpenML supports additional arguments - data, feature_columns, label_columns and group_columns. These are used as alternatives to X, y and groups.

linear_svc = linear_svc.fit(data=df_train, feature_columns=feature_columns, label_columns=label_columns)
Support for classification and regression metrics - Metrics functions for classification and regression in the sklearn.metrics module are supported. Support for other metrics functions will be added in future releases.

Distributed Modeling and partition_columns argument support - Existing scikit-learn supports only single model generation. However, OpenML supports both the single model use case and the distributed (multi) model use case. For this, the user additionally passes the partition_columns argument to the existing fit(), predict() or any other function to be run. This generates multiple models for multiple partitions, using the data in the corresponding partition.
df_x_1 = df_train.select(feature_columns + part_columns)
linear_svc = linear_svc.fit(X=df_x_1, y=df_y, partition_columns=part_columns)
Support for load and deploy models - OpenML provides additional support for saving (deploying) the trained models. These models can be loaded later to perform operations like prediction, score etc. The following functions are provided by OpenML:
- <obj>.deploy() - Used to deploy/save the model created and/or trained by OpenML.
- td_sklearn.deploy() - Used to deploy/save the model created and/or trained outside teradataml.
- td_sklearn.load() - Used to load the saved models.

Refer to the Teradata Python Package User Guide for more details of this feature, arguments, usage, examples and supportability in both VantageCloud Enterprise and VantageCloud Lake.
AutoML is an approach to automate the process of building, training, and validating machine learning models. It involves automation of various aspects of the machine learning workflow, such as feature exploration, feature engineering, data preparation, model training and evaluation for a given dataset. The teradataml AutoML feature offers best model identification, model leaderboard generation, parallel execution, early stopping, model evaluation, model prediction, live logging, and customization of the default process.
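In outline, a run looks like the sketch below (assumes a connected session; the table names, target column, and constructor arguments are illustrative -- check the AutoML documentation for the exact signatures):

```python
from teradataml import AutoML, DataFrame

train = DataFrame("bank_churn_train")  # hypothetical training table
test = DataFrame("bank_churn_test")    # hypothetical scoring table

aml = AutoML(task_type="Classification")  # or "Regression"
aml.fit(train, train.churn)               # target column assumed to be 'churn'
print(aml.leaderboard())                  # ranked models with metrics
pred = aml.predict(test)                  # predict with the best model found
```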
AutoML
AutoML is a generic algorithm that supports all three tasks, i.e. 'Regression', 'Binary Classification' and 'Multiclass Classification'.
- __init__() - Instantiate an object of AutoML with given parameters.
- fit() - Perform fit on specified data and target column.
- leaderboard() - Get the leaderboard for the AutoML. Presents diverse models, feature selection method, and performance metrics.
- leader() - Show the best performing model and its details such as feature selection method and performance metrics.
- predict() - Perform prediction on the data using the best model or the model of the user's choice from the leaderboard.
- generate_custom_config() - Generate the custom config JSON file required for a customized run of AutoML.

AutoRegressor
AutoRegressor is a special purpose AutoML feature to run regression specific tasks.
- __init__() - Instantiate an object of AutoRegressor with given parameters.
- fit() - Perform fit on specified data and target column.
- leaderboard() - Get the leaderboard for the AutoRegressor. Presents diverse models, feature selection method, and performance metrics.
- leader() - Show the best performing model and its details such as feature selection method and performance metrics.
- predict() - Perform prediction on the data using the best model or the model of the user's choice from the leaderboard.
- generate_custom_config() - Generate the custom config JSON file required for a customized run of AutoRegressor.

AutoClassifier
AutoClassifier is a special purpose AutoML feature to run classification specific tasks.
- __init__() - Instantiate an object of AutoClassifier with given parameters.
- fit() - Perform fit on specified data and target column.
- leaderboard() - Get the leaderboard for the AutoClassifier. Presents diverse models, feature selection method, and performance metrics.
- leader() - Show the best performing model and its details such as feature selection method and performance metrics.
- predict() - Perform prediction on the data using the best model or the model of the user's choice from the leaderboard.
- generate_custom_config() - Generate the custom config JSON file required for a customized run of AutoClassifier.
- fillna - Replace the null values in a column with the value specified.
- cube() - Analyzes data by grouping it into multiple dimensions.
- rollup() - Analyzes a set of data across a single dimension with more than one level of detail.
- replace() - Replaces the values for columns.
- deploy() - Deploys the model, generated after execute_script(), in the database or user environment in lake. The function is available in both Script and Apply.
- fillna - Replaces every occurrence of a null value in a column with the value specified.
- DataFrameColumn.week_start() - Returns the first date or timestamp of the week that begins immediately before the specified date or timestamp value in a column as a literal.
- DataFrameColumn.week_begin() - An alias for the DataFrameColumn.week_start() function.
- DataFrameColumn.week_end() - Returns the last date or timestamp of the week that ends immediately after the specified date or timestamp value in a column as a literal.
- DataFrameColumn.month_start() - Returns the first date or timestamp of the month that begins immediately before the specified date or timestamp value in a column or as a literal.
- DataFrameColumn.month_begin() - An alias for the DataFrameColumn.month_start() function.
- DataFrameColumn.month_end() - Returns the last date or timestamp of the month that ends immediately after the specified date or timestamp value in a column or as a literal.
- DataFrameColumn.year_start() - Returns the first date or timestamp of the year that begins immediately before the specified date or timestamp value in a column or as a literal.
- DataFrameColumn.year_begin() - An alias for the DataFrameColumn.year_start() function.
- DataFrameColumn.year_end() - Returns the last date or timestamp of the year that ends immediately after the specified date or timestamp value in a column or as a literal.
- DataFrameColumn.quarter_start() - Returns the first date or timestamp of the quarter that begins immediately before the specified date or timestamp value in a column as a literal.
- DataFrameColumn.quarter_begin() - An alias for the DataFrameColumn.quarter_start() function.
- DataFrameColumn.quarter_end() - Returns the last date or timestamp of the quarter that ends immediately after the specified date or timestamp value in a column as a literal.
- DataFrameColumn.last_sunday() - Returns the date or timestamp of the Sunday that falls immediately before the specified date or timestamp value in a column as a literal.
- DataFrameColumn.last_monday() - Returns the date or timestamp of the Monday that falls immediately before the specified date or timestamp value in a column as a literal.
- DataFrameColumn.last_tuesday() - Returns the date or timestamp of the Tuesday that falls immediately before the specified date or timestamp value in a column as a literal.
- DataFrameColumn.last_wednesday() - Returns the date or timestamp of the Wednesday that falls immediately before the specified date or timestamp value in a column as a literal.
- DataFrameColumn.last_thursday() - Returns the date or timestamp of the Thursday that falls immediately before the specified date or timestamp value in a column as a literal.
- DataFrameColumn.last_friday() - Returns the date or timestamp of the Friday that falls immediately before the specified date or timestamp value in a column as a literal.
- DataFrameColumn.last_saturday() - Returns the date or timestamp of the Saturday that falls immediately before the specified date or timestamp value in a column as a literal.
- DataFrameColumn.day_of_week() - Returns the number of days from the beginning of the week to the specified date or timestamp value in a column as a literal.
- DataFrameColumn.day_of_month() - Returns the number of days from the beginning of the month to the specified date or timestamp value in a column as a literal.
- DataFrameColumn.day_of_year() - Returns the number of days from the beginning of the year to the specified date or timestamp value in a column as a literal.
- DataFrameColumn.day_of_calendar() - Returns the number of days from the beginning of the business calendar to the specified date or timestamp value in a column as a literal.
- DataFrameColumn.week_of_month() - Returns the number of weeks from the beginning of the month to the specified date or timestamp value in a column as a literal.
- DataFrameColumn.week_of_quarter() - Returns the number of weeks from the beginning of the quarter to the specified date or timestamp value in a column as a literal.
- DataFrameColumn.week_of_year() - Returns the number of weeks from the beginning of the year to the specified date or timestamp value in a column as a literal.
- DataFrameColumn.week_of_calendar() - Returns the number of weeks from the beginning of the calendar to the specified date or timestamp value in a column as a literal.
- DataFrameColumn.month_of_year() - Returns the number of months from the beginning of the year to the specified date or timestamp value in a column as a literal.
- DataFrameColumn.month_of_calendar() - Returns the number of months from the beginning of the calendar to the specified date or timestamp value in a column as a literal.
- DataFrameColumn.month_of_quarter() - Returns the number of months from the beginning of the quarter to the specified date or timestamp value in a column as a literal.
- DataFrameColumn.quarter_of_year() - Returns the number of quarters from the beginning of the year to the specified date or timestamp value in a column as a literal.
- DataFrameColumn.quarter_of_calendar() - Returns the number of quarters from the beginning of the calendar to the specified date or timestamp value in a column as a literal.
- DataFrameColumn.year_of_calendar() - Returns the year of the specified date or timestamp value in a column as a literal.
- DataFrameColumn.day_occurrence_of_month() - Returns the nth occurrence of the weekday in the month for the specified date or timestamp value in a column as a literal.
- DataFrameColumn.year() - Returns the integer value for year in the specified date or timestamp value in a column as a literal.
- DataFrameColumn.month() - Returns the integer value for month in the specified date or timestamp value in a column as a literal.
- DataFrameColumn.hour() - Returns the integer value for hour in the specified timestamp value in a column as a literal.
- DataFrameColumn.minute() - Returns the integer value for minute in the specified timestamp value in a column as a literal.
- DataFrameColumn.second() - Returns the integer value for seconds in the specified timestamp value in a column as a literal.
- DataFrameColumn.week() - Returns the number of weeks from the beginning of the year to the specified date or timestamp value in a column as a literal.
- DataFrameColumn.next_day() - Returns the date of the first weekday specified as 'day_value' that is later than the specified date or timestamp value in a column as a literal.
- DataFrameColumn.months_between() - Returns the number of months between the value in the specified date or timestamp column (as a literal) and the date or timestamp value in the argument.
- DataFrameColumn.add_months() - Adds an integer number of months to the specified date or timestamp value in a column as a literal.
- DataFrameColumn.oadd_months() - Adds an integer number of months to the date or timestamp value in the specified column as a literal.
- DataFrameColumn.to_date() - Converts a string-like representation of a DATE or PERIOD type to a Date type.
- DataFrameColumn.concat() - Concatenates the columns with a separator.
- DataFrameColumn.like() - Matches the string pattern. String match is case sensitive.
- DataFrameColumn.ilike() - Matches the string pattern. String match is not case sensitive.
- DataFrameColumn.substr() - Returns the substring from a string column.
- DataFrameColumn.startswith() - Checks whether the column value starts with the specified value or not.
- DataFrameColumn.endswith() - Checks whether the column value ends with the specified value or not.
- DataFrameColumn.format() - Formats the values in the column based on the formatter.
- DataFrameColumn.to_char() - Converts a numeric type or date type to a character type.
- DataFrameColumn.trim() - Trims the string values in the column.
- DataFrameColumn.cbrt() - Computes the cube root of values in the column.
- DataFrameColumn.hex() - Computes the hexadecimal from decimal for the values in the column.
- DataFrameColumn.unhex() - Computes the decimal from hexadecimal for the values in the column.
- DataFrameColumn.hypot() - Computes the hypotenuse for the values between two columns.
- DataFrameColumn.from_byte() - Encodes a sequence of bits into a sequence of characters.
- DataFrameColumn.greatest() - Returns the greatest values from columns.
- DataFrameColumn.least() - Returns the least values from columns.
- DataFrameColumn.replace() is changed.
- DataFrameColumn.to_byte() is changed. It now decodes a sequence of characters in a given encoding into a sequence of bits.
- DataFrameColumn.trunc() is changed. It now accepts Date type columns.
- url_encode is no longer used in create_context() and is deprecated.
- create_context() now uses the function argument password as it is, without changing special characters.
- fillna() in VAL transformation allows replacing NULL values with an empty string.
- setup_sandbox_env()
- copy_files_from_container()
- cleanup_sandbox_env()
- describe_model()
- delete_model()
- list_models()
- publish_model()
- retrieve_model()
- save_model()
- DataFrame.join() - lsuffix and rsuffix now add suffixes to new column names for the join operation.
- DataFrame.describe() - New argument columns is added to generate statistics on only those columns instead of all applicable columns.
- DataFrame.groupby() - Supports CUBE and ROLLUP with additional optional argument option.
- DataFrame.column.window() - Changes to the partition_columns and order_columns arguments.
- DataFrame.column.contains() allows ColumnExpressions for the pattern argument.
- DataFrame.window() - Changes to the partition_columns and order_columns arguments.
- create_env() - New argument conda_env is added to create a conda environment.
- list_user_envs() - Changes related to conda_env.
- columns argument for the FillNa function is made optional.
- ColumnExpression.nulls_first() - Displays NULL values first.
- ColumnExpression.nulls_last() - Displays NULL values last.
Bit Byte Manipulation Functions
- DataFrameColumn.bit_and() - Returns the logical AND operation on the bits from the column and corresponding bits from the argument.
- DataFrameColumn.bit_get() - Returns the bit specified by the input argument from the column and returns either 0 or 1 to indicate the value of that bit.
- DataFrameColumn.bit_or() - Returns the logical OR operation on the bits from the column and corresponding bits from the argument.
- DataFrameColumn.bit_xor() - Returns the bitwise XOR operation on the binary representation of the column and corresponding bits from the argument.
- DataFrameColumn.bitand() - An alias for the DataFrameColumn.bit_and() function.
- DataFrameColumn.bitnot() - Returns a bitwise complement on the binary representation of the column.
- DataFrameColumn.bitor() - An alias for the DataFrameColumn.bit_or() function.
- DataFrameColumn.bitwise_not() - An alias for the DataFrameColumn.bitnot() function.
- DataFrameColumn.bitwiseNOT() - An alias for the DataFrameColumn.bitnot() function.
- DataFrameColumn.bitxor() - An alias for the DataFrameColumn.bit_xor() function.
- DataFrameColumn.countset() - Returns the count of the binary bits within the column that are either set to 1 or set to 0, depending on the input argument value.
- DataFrameColumn.getbit() - An alias for the DataFrameColumn.bit_get() function.
- DataFrameColumn.rotateleft() - Returns an expression rotated to the left by the specified number of bits, with the most significant bits wrapping around to the right.
- DataFrameColumn.rotateright() - Returns an expression rotated to the right by the specified number of bits, with the least significant bits wrapping around to the left.
- DataFrameColumn.setbit() - Sets the value of the bit specified by the input argument to the value of the column.
- DataFrameColumn.shiftleft() - Returns the expression when the value in the column is shifted by the specified number of bits to the left.
- DataFrameColumn.shiftright() - Returns the expression when the column expression is shifted by the specified number of bits to the right.
- DataFrameColumn.subbitstr() - Extracts a bit substring from the column expression based on the specified bit position.
- DataFrameColumn.to_byte() - Converts a numeric data type to the Vantage byte representation (byte value) of the column expression value.

Regular Expression Functions
- DataFrameColumn.regexp_instr() - Searches the string value in the column for a match to the value specified in the argument.
- DataFrameColumn.regexp_replace() - Replaces the portions of the string value in a column that match the specified regex string with the replace string.
- DataFrameColumn.regexp_similar() - Compares the value in the column to the value in the argument and returns an integer value.
- DataFrameColumn.regexp_substr() - Extracts a substring from the column that matches a regular expression specified in the input argument.
- create_env() - New argument template allows creating the complete user environment, including file and library installation, by providing specifications in a template JSON file, in just a single function call.
- models - Supports listing of models in the user environment.
- install_model() - Installs a model in the user environment.
- uninstall_model() - Uninstalls a model from the user environment.
- snapshot() - Takes a snapshot of the user environment.
- DataRobotPredict() - Scores the data in Vantage using a model trained externally in DataRobot and stored in Vantage.
- DataFrame.describe() - New argument statistics, which specifies the aggregate operation to be performed.
- DataFrame.sort()
- view_log() downloads the Apply query logs based on query id.
- Analytics Database Analytic Functions
- ignore_nulls added to DataFrame.plot() to ignore the null values while plotting the data.
- DataFrame.sample()
- DataFrameColumn.cast() accepts all teradatasqlalchemy types.
- DataFrame.merge()

Hyperparameter tuning is an optimization method to determine the optimal set of hyperparameters for the given dataset and learning model. The teradataml hyperparameter tuning feature offers best model identification, parallel execution, early stopping, best data identification, model evaluation, model prediction, live logging, input data hyper-parameterization, input data sampling, numerous scoring functions, and hyper-parameterization for non-model trainer functions.
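In outline, a sweep looks like the sketch below (a sketch only; the wrapped function, table name, column names, and parameter grid are illustrative -- hyperparameter ranges are passed as tuples, and one model is trained per combination):

```python
from teradataml import GridSearch, DataFrame
from teradataml import DecisionForest  # a model trainer function to wrap

train = DataFrame("housing_train")  # hypothetical training table
# Fixed arguments plus tuple-valued hyperparameter ranges to sweep.
params = {
    "input_columns": ["area", "rooms", "age"],
    "response_column": "price",
    "max_depth": (6, 8, 10),
    "num_trees": (20, 50),
}
gs = GridSearch(func=DecisionForest, params=params)
gs.fit(data=train)
print(gs.best_params_)  # best hyperparameter combination
print(gs.best_score_)   # score of the best trained model
```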
GridSearch
GridSearch is an exhaustive search algorithm that covers all possible parameter values to identify optimal hyperparameters.
__init__() - Instantiate an object of GridSearch for the given model function and parameters.
evaluate() - Function to perform evaluation on the given teradataml DataFrame using the default model.
fit() - Function to perform hyperparameter tuning for the given hyperparameters and model on a teradataml DataFrame.
get_error_log() - Useful to get the error log if model execution failed, using the model identifier.
get_input_data() - Useful to get the input data using the data identifier, when input data is also parameterized.
get_model() - Returns the trained model for the given model identifier.
get_parameter_grid() - Returns the hyperparameter space used for hyperparameter optimization.
is_running() - Returns the execution status of hyperparameter tuning.
predict() - Function to perform prediction on the given teradataml DataFrame using the default model.
set_model() - Function to update the default model.
best_data_id - Returns the best data identifier used for model training.
best_model - Returns the best trained model.
best_model_id - Returns the identifier for the best model.
best_params_ - Returns the best set of hyperparameters.
best_sampled_data_ - Returns the best sampled data used to train the best model.
best_score_ - Returns the best trained model score.
model_stats - Returns the model evaluation reports.
models - Returns the metadata of all the models.
RandomSearch
The RandomSearch algorithm performs random sampling on the hyperparameter space to identify optimal hyperparameters.
__init__() - Instantiate an object of RandomSearch for the given model function and parameters.
evaluate() - Function to perform evaluation on the given teradataml DataFrame using the default model.
fit() - Function to perform hyperparameter tuning for the given hyperparameters and model on a teradataml DataFrame.
get_error_log() - Useful to get the error log if model execution failed, using the model identifier.
get_input_data() - Useful to get the input data using the data identifier, when input data is also parameterized.
get_model() - Returns the trained model for the given model identifier.
get_parameter_grid() - Returns the hyperparameter space used for hyperparameter optimization.
is_running() - Returns the execution status of hyperparameter tuning.
predict() - Function to perform prediction on the given teradataml DataFrame using the default model.
set_model() - Function to update the default model.
best_data_id - Returns the best data identifier used for model training.
best_model - Returns the best trained model.
best_model_id - Returns the identifier for the best model.
best_params_ - Returns the best set of hyperparameters.
best_sampled_data_ - Returns the best sampled data used to train the best model.
best_score_ - Returns the best trained model score.
model_stats - Returns the model evaluation reports.
models - Returns the metadata of all the models.
teradataml has different functions to generate a model, predict, transform and evaluate. Previously, all of these functions had to be invoked individually; predict(), evaluate() and transform() could not be invoked using the model trainer function output. This enhancement enables the user to invoke these functions as methods of the model trainer function output. Below is the list of functions updated with this enhancement:
BincodeFit() - Supports transform() method.
DecisionForest() - Supports predict(), evaluate() methods.
Fit() - Supports transform() method.
GLM() - Supports predict(), evaluate() methods.
GLMPerSegment() - Supports predict(), evaluate() methods.
KMeans() - Supports predict() method.
KNN() - Supports predict(), evaluate() methods.
NaiveBayesTextClassifierTrainer() - Supports predict(), evaluate() methods.
NonLinearCombineFit() - Supports transform() method.
OneClassSVM() - Supports predict() method.
OneHotEncodingFit() - Supports transform() method.
OrdinalEncodingFit() - Supports transform() method.
OutlierFilterFit() - Supports transform() method.
PolynomialFeaturesFit() - Supports transform() method.
RandomProjectionFit() - Supports transform() method.
RowNormalizeFit() - Supports transform() method.
ScaleFit() - Supports transform() method.
SimpleImputeFit() - Supports transform() method.
SVM() - Supports predict(), evaluate() methods.
TargetEncodingFit() - Supports transform() method.
XGBoost() - Supports predict(), evaluate() methods.
ArimaEstimate() - Supports forecast(), validate() methods.
DFFT() - Supports convolve(), inverse() methods.
IDFFT() - Supports inverse() method.
DFFT2() - Supports convolve(), inverse() methods.
IDFFT2() - Supports inverse() method.
DIFF() - Supports inverse() method.
UNDIFF() - Supports inverse() method.
SeasonalNormalize() - Supports inverse() method.
DataFrame.plot() - Generates the below types of plots on a teradataml DataFrame.
DataFrame.itertuples() - Iterates over teradataml DataFrame rows as namedtuples or lists.
GeoDataFrame.plot() - Generates the below types of plots on a teradataml GeoDataFrame.
Plot:
Axis - Generates the axis for a plot.
Figure - Generates the figure for a plot.
subplots - Helps in generating multiple plots on a single Figure.
Bring Your Own Model (BYOM) Function:
DataikuPredict - Scores the data in Vantage using a model trained externally in the Dataiku UI and stored in Vantage.
async_run_status() - Function to check the status of asynchronous run(s) using unique run id(s).
DataFrameColumn.abs() - Computes the absolute value.
DataFrameColumn.ceil() - Returns the ceiling value of the column.
DataFrameColumn.ceiling() - It is an alias for DataFrameColumn.ceil() function.
DataFrameColumn.degrees() - Converts radians value from the column to degrees.
DataFrameColumn.exp() - Raises e (the base of natural logarithms) to the power of the value in the column, where e = 2.71828182845905.
DataFrameColumn.floor() - Returns the largest integer equal to or less than the value in the column.
DataFrameColumn.ln() - Computes the natural logarithm of values in the column.
DataFrameColumn.log10() - Computes the base 10 logarithm.
DataFrameColumn.mod() - Returns the modulus of the column.
DataFrameColumn.pmod() - It is an alias for DataFrameColumn.mod() function.
DataFrameColumn.nullifzero() - Converts data from zero to null to avoid problems with division by zero.
DataFrameColumn.pow() - Computes the power of the column raised to expression or constant.
DataFrameColumn.power() - It is an alias for DataFrameColumn.pow() function.
DataFrameColumn.radians() - Converts degree value from the column to radians.
DataFrameColumn.round() - Returns the rounded off value.
DataFrameColumn.sign() - Returns the sign.
DataFrameColumn.signum() - It is an alias for DataFrameColumn.sign() function.
DataFrameColumn.sqrt() - Computes the square root of values in the column.
DataFrameColumn.trunc() - Provides the truncated value of columns.
DataFrameColumn.width_bucket() - Returns the number of the partition to which the column is assigned.
DataFrameColumn.zeroifnull() - Converts data from null to zero to avoid problems with null.
DataFrameColumn.acos() - Returns the arc-cosine value.
DataFrameColumn.asin() - Returns the arc-sine value.
DataFrameColumn.atan() - Returns the arc-tangent value.
DataFrameColumn.atan2() - Returns the arc-tangent value based on x and y coordinates.
DataFrameColumn.cos() - Returns the cosine value.
DataFrameColumn.sin() - Returns the sine value.
DataFrameColumn.tan() - Returns the tangent value.
DataFrameColumn.acosh() - Returns the inverse hyperbolic cosine value.
DataFrameColumn.asinh() - Returns the inverse hyperbolic sine value.
DataFrameColumn.atanh() - Returns the inverse hyperbolic tangent value.
DataFrameColumn.cosh() - Returns the hyperbolic cosine value.
DataFrameColumn.sinh() - Returns the hyperbolic sine value.
DataFrameColumn.tanh() - Returns the hyperbolic tangent value.
DataFrameColumn.ascii() - Returns the decimal representation of the first character in the column.
DataFrameColumn.char2hexint() - Returns the hexadecimal representation for a character string in a column.
DataFrameColumn.chr() - Returns the Latin ASCII character of a given numeric code value in the column.
DataFrameColumn.char() - It is an alias for DataFrameColumn.chr() function.
DataFrameColumn.character_length() - Returns the number of characters in the column.
DataFrameColumn.char_length() - It is an alias for DataFrameColumn.character_length() function.
DataFrameColumn.edit_distance() - Returns the minimum number of edit operations required to transform the string in a column into the string specified in the argument.
DataFrameColumn.index() - Returns the position in a column's string where the string specified in the argument starts.
DataFrameColumn.initcap() - Modifies a string column and returns the string with the first character of each word in uppercase.
DataFrameColumn.instr() - Searches the string in a column for occurrences of the search string passed as argument.
DataFrameColumn.lcase() - Returns a character string identical to the string values in the column, with all uppercase letters replaced with their lowercase equivalents.
DataFrameColumn.left() - Truncates the string in a column to a specified number of characters desired from the left side of the string.
DataFrameColumn.length() - It is an alias for DataFrameColumn.character_length() function.
DataFrameColumn.levenshtein() - It is an alias for DataFrameColumn.edit_distance() function.
DataFrameColumn.locate() - Returns the position of the first occurrence of a string in a column within the string in the argument.
DataFrameColumn.lower() - It is an alias for DataFrameColumn.lcase() function.
DataFrameColumn.lpad() - Returns the string in a column padded to the left with the characters specified in the argument so that the resulting string has the length specified in the argument.
DataFrameColumn.ltrim() - Returns the string in a column, with its left-most characters removed up to the first character that is not in the string specified in the argument.
DataFrameColumn.ngram() - Returns the number of n-gram matches between the string in a column and the string specified in the argument.
DataFrameColumn.nvp() - Extracts the value of a name-value pair where the name in the pair matches the name and the number of the occurrence specified.
DataFrameColumn.oreplace() - Replaces every occurrence of the search string in the column.
DataFrameColumn.otranslate() - Returns the string in a column with every occurrence of each character in the string in one argument replaced with the corresponding character in another argument.
DataFrameColumn.replace() - It is an alias for DataFrameColumn.oreplace() function.
DataFrameColumn.reverse() - Returns the reverse of the string in the column.
DataFrameColumn.right() - Truncates the input string to a specified number of characters desired from the right side of the string.
DataFrameColumn.rpad() - Returns the string in a column padded to the right with the characters specified in the argument so the resulting string has the length specified in the argument.
DataFrameColumn.rtrim() - Returns the string in a column, with its right-most characters removed up to the first character that is not in the string specified in the argument.
DataFrameColumn.soundex() - Returns a character string that represents the Soundex code for the string in a column.
DataFrameColumn.string_cs() - Returns a heuristically derived integer value that can be used to determine which KANJI1-compatible client character set was used to encode the string in a column.
DataFrameColumn.translate() - It is an alias for DataFrameColumn.otranslate() function.
DataFrameColumn.upper() - Returns a character string with all lowercase letters in a column replaced with their uppercase equivalents.
configure.indb_install_location - Specifies the installation location of the In-DB Python package.
set_auth_token() - set_auth_token() no longer accepts username and password. Instead, the function opens a browser session and the user should authenticate in the browser. auth_token is no longer set or retrieved from the configure option.
create_env() - Supports creation of R environment.
remove_env() - Supports removal of remote R environment.
remove_all_envs() - Supports removal of all remote R environments.
remove_env() and remove_all_envs() support asynchronous calls.
libs - Supports listing of libraries in remote R environment.
install_lib() - Supports installing of libraries in remote R environment.
uninstall_lib() - Supports uninstalling of libraries in remote R environment.
update_lib() - Supports updating of libraries in remote R environment.
ArimaEstimate() - Supports the CSS algorithm via the algorithm argument.
teradataml is now compatible with SQLAlchemy 2.0.X.
The execute() method is no longer available on the SQLAlchemy engine object returned by the get_context() and create_context() teradataml functions, because SQLAlchemy has removed support for execute() on the engine object. In user scripts where get_context().execute() and create_context().execute() are used, Teradata recommends replacing those calls with either the execute_sql() function exposed by teradataml or the exec_driver_sql() method on the Connection object returned by the get_connection() function in teradataml. Note that get_connection().execute() accepts only an executable SQLAlchemy object; refer to sqlalchemy.engine.base.execute() for more details. In such cases, use the execute_sql() function exposed by teradataml or the exec_driver_sql() method on the Connection object returned by the get_connection() function in teradataml.
New utility function execute_sql() is added to execute the SQL.
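The shape of the migration can be sketched with the standard library's sqlite3 module. This is an analogy only: the execute_sql() helper below is a hypothetical stand-in for teradataml's utility of the same name, and none of the objects here are the teradataml or SQLAlchemy API. The point is the pattern: stop calling execute() on the engine, and instead run SQL text through a connection-level helper.

```python
import sqlite3

# Old pattern (SQLAlchemy < 2.0): engine.execute("SELECT ...") directly on
# the engine object -- removed in SQLAlchemy 2.0, hence the teradataml change.

def execute_sql(conn, statement, params=()):
    """Minimal stand-in for a 'run this SQL text on a connection' helper."""
    return conn.execute(statement, params).fetchall()

# New pattern: obtain a connection, then run SQL text through the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
rows = execute_sql(conn, "SELECT sum(x) FROM t")
print(rows)  # [(6,)]
```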
Extended compatibility for Mac with ARM processors.
Added support for floor division (//) between two teradataml DataFrame Columns.
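As a reminder of what the operator computes in Python: // floors toward negative infinity rather than truncating toward zero. Whether the column-level operator matches Python or SQL semantics for negative operands is worth verifying against your database version; the snippet below only shows the Python behavior.

```python
# Python floor division floors toward negative infinity.
print(7 // 2)    # 3
print(-7 // 2)   # -4, not -3 (truncation toward zero would give -3)
print(7.0 // 2)  # 3.0 -- float operands yield a float result
```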
Analytics Database Analytic Functions:
GLMPerSegment()
GLMPredictPerSegment()
OneClassSVM()
OneClassSVMPredict()
SVM()
SVMPredict()
TargetEncodingFit()
TargetEncodingTransform()
TrainTestSplit()
WordEmbeddings()
XGBoost()
XGBoostPredict()
display.geometry_column_length - Option to display the default length of the geometry column in a GeoDataFrame.
set_auth_token() - The function can generate the client id automatically based on org_id when the user does not specify it.
ColumnTransformer() - Supports arguments onehotencoding_fit_data and ordinalencoding_fit_data.
OrdinalEncodingFit() - Arguments category_data, target_column_names, categories_column, ordinal_values_column; target_column, start_value, default_value.
OneHotEncodingFit() - Arguments category_data, approach, target_columns, categories_column, category_counts; target_column, other_column.
DataFrame.sample() method output is now deterministic.
copy_to_sql() now preserves the rows of the table even when the view content is copied to the same table name.
list_user_envs() does not raise a warning when no user environments are found.
Arguments lprefix and rprefix added. lsuffix and rsuffix will be changed in the future; use the new arguments instead.
ReadNOS and WriteNOS now accept dictionary values for the authorization and row_format arguments.
WriteNOS supports writing CSV files to external store.
copy_to_sql() bug related to NaT value has been fixed.
The value argument of FillNa(), a Vantage Analytic Library function, supports special characters.
The case function accepts a DataFrame column as a value in the whens argument.
set_auth_token() - Sets the JWT token automatically for using Open AF APIs.
display.suppress_vantage_runtime_warnings - Suppresses the VantageRuntimeWarning raised by teradataml, when set to True.
Arguments stats_columns and stats are made optional.
Argument table_format is added to ReadNOS().
Argument full_scan is changed to scan_pct in ReadNOS().
DataFrame.apply() supports hash by and local order by.
DataFrame.pivot() - Rotate data from rows into columns to create easy-to-read DataFrames.
DataFrame.unpivot() - Rotate data from columns into rows to create easy-to-read DataFrames.
DataFrame.drop_duplicate() - Drop duplicate rows from a teradataml DataFrame.
DataFrame.is_art - Check whether a teradataml DataFrame is created on an Analytic Result Table (ART) or not.
New Functions
ACF()
ArimaEstimate()
ArimaValidate()
DIFF()
LinearRegr()
MultivarRegr()
PACF()
PowerTransform()
SeasonalNormalize()
Smoothma()
UNDIFF()
Unnormalize()
ArimaForecast()
DTW()
HoltWintersForecaster()
MAMean()
SimpleExp()
BinaryMatrixOp()
BinarySeriesOp()
GenseriesFormula()
MatrixMultiply()
Resample()
BreuschGodfrey()
BreuschPaganGodfrey()
CumulPeriodogram()
DickeyFuller()
DurbinWatson()
FitMetrics()
GoldfeldQuandt()
Portman()
SelectionCriteria()
SignifPeriodicities()
SignifResidmean()
WhitesGeneral()
Convolve()
Convolve2()
DFFT()
DFFT2()
DFFT2Conv()
DFFTConv()
GenseriesSinusoids()
IDFFT()
IDFFT2()
LineSpec()
PowerSpec()
ExtractResults()
InputValidator()
MInfo()
SInfo()
TrackingOp()
New Features: Inputs to Unbounded Array Framework (UAF) functions
TDAnalyticResult() - Allows preparing function output generated by UAF functions to be passed.
TDGenSeries() - Allows generating a series that can be passed to a UAF function.
TDMatrix() - Represents a Matrix in time series, which can be created from a teradataml DataFrame.
TDSeries() - Represents a Series in time series, which can be created from a teradataml DataFrame.
display_analytic_functions() categorizes the analytic functions based on function type.
copy_to_sql updated to map data type timezone(tzinfo) to TIMESTAMP(timezone=True), instead of VARCHAR.
ANOVA()
ClassificationEvaluator()
ColumnTransformer()
DecisionForest()
GLM()
GetFutileColumns()
KMeans()
KMeansPredict()
NaiveBayesTextClassifierTrainer()
NonLinearCombineFit()
NonLinearCombineTransform()
OrdinalEncodingFit()
OrdinalEncodingTransform()
RandomProjectionComponents()
RandomProjectionFit()
RandomProjectionTransform()
RegressionEvaluator()
ROC()
SentimentExtractor()
Silhouette()
TDGLMPredict()
TextParser()
VectorDistance()
display_analytic_functions() categorizes the analytic functions based on function type.
list_base_envs() - List the available Python base versions.
create_env() - Create a new user environment.
get_env() - Get an existing user environment.
list_user_envs() - List the available user environments.
remove_env() - Delete a user environment.
remove_all_envs() - Delete all the user environments.
files - Get files in a user environment.
libs - Get libraries in a user environment.
install_file() - Install a file in a user environment.
remove_file() - Remove a file in a user environment.
install_lib() - Install a library in a user environment.
update_lib() - Update a library in a user environment.
uninstall_lib() - Uninstall a library in a user environment.
status() - Check the status of
refresh() - Refresh the environment details in the local client.
__init__() - Instantiate an object of Apply for script execution.
install_file() - Install a file in a user environment.
remove_file() - Remove a file in a user environment.
set_data() - Reset data and related arguments.
execute_script() - Executes Python script.
DataFrame.apply() - Execute a user defined Python function on VantageCloud Lake.
ONNXPredict() - Score using a model trained externally on ONNX and stored in Vantage.
The accumulate argument is working for ScaleTransform().
The accumulate argument added on Database Versions 17.20.x.x for: ConvertTo(), GetRowsWithoutMissingValues().
OutlierFilterFit() supports multiple outputs.
In the OutlierFilterFit() function, the below arguments are optional in teradataml 17.20.x.x: lower_percentile, upper_percentile, outlier_method, replacement_value, percentile_method. The columns argument is required.
The output_responses argument in the MLE function DecisionTreePredict() does not allow an empty string.
list_td_reserved_keywords() - Validates whether the specified string is a Teradata reserved keyword; otherwise lists all the Teradata reserved keywords.
create_context() - A password containing special characters requires URL encoding as per https://docs.microfocus.com/OMi/10.62/Content/OMi/ExtGuide/ExtApps/URL_encoding.html. teradataml has added a fix to take care of the URL encoding of the password while creating a context. Also, a new argument is added to give more control over the URL encoding to be done at the time of context creation.
The Geospatial feature in teradataml enables data manipulation, exploration and analysis on tables, views, and queries on Teradata Vantage that contain Geospatial data.
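The password URL-encoding that create_context() now handles internally can be reproduced with the standard library. A sketch only; the exact encoding teradataml applies may differ in detail:

```python
from urllib.parse import quote_plus

# Characters like '@', '/', and ':' in a password would break the
# connection URL unless percent-encoded first.
password = "p@ss/w:rd"
encoded = quote_plus(password)
print(encoded)  # p%40ss%2Fw%3Ard
```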
__getattr__()
__getitem__()
__init__()
__repr__()
assign()
concat()
count()
drop()
dropna()
filter()
from_query()
from_table()
get()
get_values()
groupby()
head()
info()
join()
keys()
merge()
sample()
select()
set_index()
show_query()
sort()
sort_index()
squeeze()
tail()
to_csv()
to_pandas()
to_sql()
buffer()
contains()
crosses()
difference()
disjoint()
distance()
distance_3D()
envelope()
geom_equals()
intersection()
intersects()
make_2D()
mbb()
mbr()
overlaps()
relates()
set_exterior()
set_srid()
simplify()
sym_difference()
to_binary()
to_text()
touches()
transform()
union()
within()
wkb_geom_to_sql()
wkt_geom_to_sql()
spherical_buffer()
spherical_distance()
spheriodal_buffer()
spheriodal_distance()
set_x()
set_y()
set_z()
end_point()
length()
length_3D()
line_interpolate_point()
num_points()
point()
start_point()
interiors()
num_interior_ring()
point_on_surface()
geom_component()
num_geometry()
clip()
get_final_timestamp()
get_init_timestamp()
get_link()
get_user_field()
get_user_field_count()
point_heading()
set_link()
speed()
intersects_mbb()
mbb_filter()
mbr_filter()
within_mbb()
buffer()
contains()
crosses()
difference()
disjoint()
distance()
distance_3D()
envelope()
geom_equals()
intersection()
intersects()
make_2D()
mbb()
mbr()
overlaps()
relates()
set_exterior()
set_srid()
simplify()
sym_difference()
to_binary()
to_text()
touches()
transform()
union()
within()
wkb_geom_to_sql()
wkt_geom_to_sql()
spherical_buffer()
spherical_distance()
spheriodal_buffer()
spheriodal_distance()
set_x()
set_y()
set_z()
endpoint()
length()
length_3D()
line_interpolate_point()
num_points()
point()
start_point()
interiors()
num_interior_ring()
point_on_surface()
geom_component()
num_geometry()
clip()
get_final_timestamp()
get_init_timestamp()
get_link()
get_user_field()
get_user_field_count()
point_heading()
set_link()
speed()
intersects_mbb()
mbb_filter()
mbr_filter()
within_mbb()
to_csv()
display_analytic_functions() API displays all the available SQLE Analytic functions based on database version.
Antiselect()
Attribution()
DecisionForestPredict()
DecisionTreePredict()
GLMPredict()
MovingAverage()
NaiveBayesPredict()
NaiveBayesTextClassifierPredict()
NGramSplitter()
NPath()
Pack()
Sessionize()
StringSimilarity()
SVMParsePredict()
Unpack()
Antiselect()
Attribution()
BincodeFit()
BincodeTransform()
CategoricalSummary()
ChiSq()
ColumnSummary()
ConvertTo()
DecisionForestPredict()
DecisionTreePredict()
GLMPredict()
FillRowId()
FTest()
Fit()
Transform()
GetRowsWithMissingValues()
GetRowsWithoutMissingValues()
MovingAverage()
Histogram()
NaiveBayesPredict()
NaiveBayesTextClassifierPredict()
NGramSplitter()
NPath()
NumApply()
OneHotEncodingFit()
OneHotEncodingTransform()
OutlierFilterFit()
OutlierFilterTransform()
Pack()
PolynomialFeaturesFit()
PolynomialFeaturesTransform()
QQNorm()
RoundColumns()
RowNormalizeFit()
RowNormalizeTransform()
ScaleFit()
ScaleTransform()
Sessionize()
SimpleImputeFit()
SimpleImputeTransform()
StrApply()
StringSimilarity()
SVMParsePredict()
UniVariateStatistics()
Unpack()
WhichMax()
WhichMin()
ZTest()
read_csv()
read_nos()
write_nos()
get_license()
set_byom_catalog()
set_license()
copy_to_sql() - New argument "chunksize" added to load data in chunks.
fastexport()
fastload()
to_pandas()
concat()
td_intersect()
td_except()
td_minus()
set_byom_catalog() and set_license() - Set session parameters such as table name, schema name and license details respectively.
delete_byom()
list_byom()
retrieve_byom()
save_byom()
view_log() - Allows the user to view BYOM logs.
db_python_package_details() function is fixed to support the latest STO release for pip and Python aliases used.
print() issue related to "Response Row size is greater than the 1MB allowed maximum" has been fixed to print the data with a lot of columns.
DataFrame.to_sql() and copy_to_sql() fixed for the issue where the function was failing with error "Request requires too many SPOOL files.". Reducing the chunksize from the default one will result in successful operation.
remove_context() is fixed to remove the active connection from the database.
Fixes in fastexport() and fastload() functions.
DataFrame.to_sql() is fixed to support temporary tables when the default database differs from the username.
DataFrame.to_pandas() now by default supports data transfer using the regular method. The change allows the data transfer if utility throttles are configured, i.e., when the TASM configuration does not support data export using FastExport.
save_byom() now notifies if a VARCHAR column is trimmed when data passed to the API is greater than the length of the VARCHAR column.
Fixed DataFrame.map_row() and DataFrame.map_partition() when executed in LOCAL mode.
Fixed show_query().
Fixed fastexport() to show the correct import statement.
Fixed [CS0733758] db_python_package_details() fails on recent STO release due to changes in pip and python aliases.
H2OPredict() - Score using a model trained externally in H2O and stored in Vantage.
PMMLPredict() - Score using a model trained externally in PMML and stored in Vantage.
save_byom() - Save externally trained models in Teradata Vantage.
delete_byom() - Delete a model from the user specified table in Teradata Vantage.
list_byom() - List models.
retrieve_byom() - Function to retrieve a saved model.
XmlToHtmlReport() - Transforms XML output of VAL functions to HTML.
DataFrame.window() - Generates a Window object on a teradataml DataFrame to run window aggregate functions.
DataFrame.csum() - Returns the column-wise cumulative sum for rows in the partition of the DataFrame.
DataFrame.mavg() - Returns the moving average for the current row and the preceding rows.
DataFrame.mdiff() - Returns the moving difference for the current row and the preceding rows.
DataFrame.mlinreg() - Returns the moving linear regression for the current row and the preceding rows.
DataFrame.msum() - Returns the moving sum for the current row and the preceding rows.
DataFrame.corr() - Returns the Sample Pearson product moment correlation coefficient.
DataFrame.covar_pop() - Returns the population covariance.
DataFrame.covar_samp() - Returns the sample covariance.
DataFrame.regr_avgx() - Returns the mean of the independent variable.
DataFrame.regr_avgy() - Returns the mean of the dependent variable.
DataFrame.regr_count() - Returns the count of the dependent and independent variable arguments.
DataFrame.regr_intercept() - Returns the intercept of the univariate linear regression line.
DataFrame.regr_r2() - Returns the coefficient of determination.
DataFrame.regr_slope() - Returns the slope of the univariate linear regression line.
DataFrame.regr_sxx() - Returns the sum of the squares of the independent variable expression.
DataFrame.regr_sxy() - Returns the sum of the products of the independent variable and the dependent variable.
DataFrame.regr_syy() - Returns the sum of the squares of the dependent variable expression.
ColumnExpression.window() - Generates a Window object on a teradataml DataFrameColumn to run window aggregate functions.
ColumnExpression.desc() - Sorts ColumnExpression in descending order.
ColumnExpression.asc() - Sorts ColumnExpression in ascending order.
ColumnExpression.distinct() - Removes duplicate values from ColumnExpression.
ColumnExpression.corr() - Returns the Sample Pearson product moment correlation coefficient.
ColumnExpression.count() - Returns the column-wise count.
ColumnExpression.covar_pop() - Returns the population covariance.
ColumnExpression.covar_samp() - Returns the sample covariance.
ColumnExpression.kurtosis() - Returns the kurtosis value for a column.
ColumnExpression.median() - Returns the column-wise median value.
ColumnExpression.max() - Returns the column-wise max value.
ColumnExpression.mean() - Returns the column-wise average value.
ColumnExpression.min() - Returns the column-wise min value.
ColumnExpression.regr_avgx() - Returns the mean of the independent variable.
ColumnExpression.regr_avgy() - Returns the mean of the dependent variable.
ColumnExpression.regr_count() - Returns the count of the dependent and independent variable arguments.
ColumnExpression.regr_intercept() - Returns the intercept of the univariate linear regression line.
ColumnExpression.regr_r2() - Returns the coefficient of determination.
ColumnExpression.regr_slope() - Returns the slope of the univariate linear regression line.
ColumnExpression.regr_sxx() - Returns the sum of the squares of the independent variable expression.
ColumnExpression.regr_sxy() - Returns the sum of the products of the independent variable and the dependent variable.
ColumnExpression.regr_syy() - Returns the sum of the squares of the dependent variable expression.
ColumnExpression.skew() - Returns the skew value for a column.
ColumnExpression.std() - Returns the column-wise population/sample standard deviation.
ColumnExpression.sum() - Returns the column-wise sum.
ColumnExpression.var() - Returns the column-wise population/sample variance.
ColumnExpression.percentile() - Returns the column-wise percentile.
Window.corr() - Returns the Sample Pearson product moment correlation coefficient.
Window.count() - Returns the count.
Window.covar_pop() - Returns the population covariance.
Window.covar_samp() - Returns the sample covariance.
Window.cume_dist() - Returns the cumulative distribution of values.
Window.dense_rank() - Returns the ordered ranking of all the rows.
Window.first_value() - Returns the first value of an ordered set of values.
Window.lag() - Returns data from the row preceding the current row at a specified offset value.
Window.last_value() - Returns the last value of an ordered set of values.
Window.lead() - Returns data from the row following the current row at a specified offset value.
Window.max() - Returns the column-wise max value.
Window.mean() - Returns the column-wise average value.
Window.min() - Returns the column-wise min value.
Window.percent_rank() - Returns the relative rank of all the rows.
Window.rank() - Returns the rank (1 … n) of all the rows.
Window.regr_avgx() - Returns the mean of the independent variable.
Window.regr_avgy() - Returns the mean of the dependent variable.
Window.regr_count() - Returns the count of the dependent and independent variable arguments.
Window.regr_intercept() - Returns the intercept of the univariate linear regression line.
Window.regr_r2() - Returns the coefficient of determination.
Window.regr_slope() - Returns the slope of the univariate linear regression line.
Window.regr_sxx() - Returns the sum of the squares of the independent variable expression.
Window.regr_sxy() - Returns the sum of the products of the independent variable and the dependent variable.
Window.regr_syy() - Returns the sum of the squares of the dependent variable expression.
Window.row_number() - Returns the sequential row number.
Window.std() - Returns the column-wise population/sample standard deviation.
Window.sum() - Returns the column-wise sum.
Window.var() - Returns the column-wise population/sample variance.
fastexport() - Exports teradataml DataFrame to Pandas DataFrame using the FastExport data transfer protocol.
display.blob_length - Specifies the default display length of a BLOB column in a teradataml DataFrame.
configure.temp_table_database - Specifies the database name for storing the tables created internally.
configure.temp_view_database - Specifies the database name for storing the views created internally.
configure.byom_install_location - Specifies the install location for the BYOM functions.
configure.val_install_location - Specifies the install location for the Vantage Analytic Library functions.
to_pandas() and read_sql() - Support arguments coerce_float and parse_dates.
FillNa()
Binning()
OneHotEncoder()
LabelEncoder()
Fixed the internal library load issue related to the GCC version discrepancies on CentOS platform.
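The moving window aggregates listed above (mavg(), msum(), mdiff()) each combine the current row with a fixed number of preceding rows. A pure-Python picture of the idea, not the teradataml API:

```python
def mavg(values, width):
    """Moving average over the current row and up to width-1 preceding rows."""
    out = []
    for i, _ in enumerate(values):
        # Window shrinks at the start of the partition, where fewer
        # preceding rows exist.
        window = values[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

prices = [10, 20, 30, 40]
print(mavg(prices, 2))  # [10.0, 15.0, 25.0, 35.0]
```

In teradataml the computation runs inside Vantage over an ordered (and optionally partitioned) window rather than over a Python list.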
Association()
AdaptiveHistogram()
Explore()
Frequency()
Histogram()
Overlaps()
Statistics()
TextAnalyzer()
Values()
DecisionTree()
DecisionTreePredict()
DecisionTreeEvaluator()
KMeans()
KMeansPredict()
LinReg()
LinRegPredict()
LogReg()
LogRegPredict()
LogRegEvaluator()
PCA()
PCAPredict()
PCAEvaluator()
Matrix()
BinomialTest()
ChiSquareTest()
KSTest()
ParametricTest()
RankTest()
Transform()
Binning() - Performs bin coding to replace a continuous numeric column with a categorical one, producing ordinal values.
Derive() - Performs a free-form transformation using an arithmetic formula.
FillNa() - Performs missing value/null replacement transformations.
LabelEncoder() - Re-expresses categorical column values in a new coding scheme.
MinMaxScalar() - Rescales data, limiting the upper and lower boundaries.
OneHotEncoder() - Re-expresses a categorical data element as one or more numeric data elements, creating a binary numeric field for each categorical data value.
Retain() - Copies one or more columns into the final analytic data set.
Sigmoid() - Rescales data using sigmoid or s-shaped functions.
ZScore() - Rescales data using Z-Score values.
DataFrame.map_row() - Function to apply a user-defined function to each row in the teradataml DataFrame.
DataFrame.map_partition() - Function to apply a user-defined function to a group or partition of rows in the teradataml DataFrame.
DataFrame.tdtypes - Get the teradataml DataFrame metadata containing column names and corresponding teradatasqlalchemy types.
db_python_package_details() - Lists the details of Python packages installed on Vantage.
print_options()
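The ZScore(), MinMaxScalar(), and Sigmoid() transforms listed above are standard rescalings. A plain-Python sketch of the formulas they apply (illustrative only; in teradataml these execute in-database via the Vantage Analytic Library):

```python
import math
import statistics

def z_score(xs):
    """Center and scale by the sample mean and standard deviation."""
    mu, sd = statistics.mean(xs), statistics.stdev(xs)
    return [(x - mu) / sd for x in xs]

def min_max(xs, lower=0.0, upper=1.0):
    """Rescale linearly into [lower, upper]."""
    lo, hi = min(xs), max(xs)
    return [lower + (x - lo) * (upper - lower) / (hi - lo) for x in xs]

def sigmoid(xs):
    """Squash values into (0, 1) with the logistic function."""
    return [1.0 / (1.0 + math.exp(-x)) for x in xs]

data = [1.0, 2.0, 3.0]
scaled = min_max(data)  # [0.0, 0.5, 1.0]
```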
view_log()
setup_sandbox_env()
copy_files_from_container()
cleanup_sandbox_env()
create_context()
test_script()
can now be executed in 'local' mode, i.e., outside of the sandbox.
Script.setup_sto_env() is deprecated. Use setup_sandbox_env() instead.
save_model() - Save a teradataml Analytic Function model.
retrieve_model() - Retrieve a saved model.
list_model() - List accessible models.
describe_model() - List the details of a model.
delete_model() - Remove a model from the Model Catalog.
publish_model() - Share a model.
setup_sto_env() - Set up the test environment.
test_script() - Test a user script in a containerized environment.
set_data() - Set test data parameters.
execute_script() - Execute a user script in Vantage.
install_file() - Install or replace a file in the Database.
remove_file() - Remove an installed file from the Database.
set_data() - Set test data parameters.
DataFrame.show_query() - Show the underlying query for a DataFrame.
kurtosis() - Calculate the kurtosis value.
skew() - Calculate the skewness of the distribution.
A new distinct argument is added to the following aggregates to exclude duplicate values:
count()
max()
mean()
min()
sum()
std()
A new population argument is added to std() to calculate the population standard deviation, and to var() to calculate the population variance.
kurtosis() - Calculate the kurtosis value.
count() - Get the total number of values.
max() - Calculate the maximum value.
mean() - Calculate the average value.
min() - Calculate the minimum value.
percentile() - Calculate the desired percentile.
skew() - Calculate the skewness of the distribution.
sum() - Calculate the column-wise sum value.
std() - Calculate the sample and population standard deviation.
var() - Calculate the sample and population variance.
db_drop_table()
db_drop_view()
db_list_tables()
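The distinct and population options above parallel well-known formulas: distinct aggregates drop duplicate values first, and population variance divides by N rather than N-1. A plain-Python sketch (illustrative only; the teradataml aggregates run in-database):

```python
import math

def variance(xs, population=False, distinct=False):
    """Sample (N-1) or population (N) variance, optionally over distinct values."""
    if distinct:
        xs = list(set(xs))
    n = len(xs)
    mu = sum(xs) / n
    ss = sum((x - mu) ** 2 for x in xs)
    return ss / (n if population else n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
pop_std = math.sqrt(variance(data, population=True))   # divides by N -> 2.0
samp_std = math.sqrt(variance(data))                   # divides by N-1
```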
install_file()
- Install a file in the Database.
remove_file() - Remove an installed file from the Database.
create_context()
A new database argument is added to the create_context() API that allows the user to specify the connecting database.
Betweenness
Closeness
FMeasure
FrequentPaths
IdentityMatch
Interpolator
ROC
show_query()
get_build_time()
get_prediction_type()
get_target_column()
response_column
numeric_columns
categorical_columns
all_columns
Fixed the DataFrame data display corruption issue observed with certain analytic functions.
Compatible with Vantage 1.1.1.
The following ML Engine (teradataml.analytics.mle
) functions have new and/or updated arguments to support the Vantage version:
AdaBoostPredict
DecisionForestPredict
DecisionTreePredict
GLMPredict
LDA
NaiveBayesPredict
NaiveBayesTextClassifierPredict
SVMDensePredict
SVMSparse
SVMSparsePredict
XGBoostPredict
show_versions() - to list the versions of teradataml and installed dependencies.
fastload() - for high-performance loading of large amounts of data into a table on Vantage. Requires teradatasql version 16.20.0.48 or above.
concat
td_intersect
td_except
td_minus
case() - to help construct SQL CASE-based expressions.
copy_to_sql updated to save multi-level index.
create_context() updated to support the 'JWT' logon mechanism.
NERTrainer
NERExtractor
NEREvaluator
GLML1L2
GLML1L2Predict
as_categorical() in the teradataml.common.formula module.
DataFrame.sample() - to sample data.
DataFrame.index - Property to access the index_label of a DataFrame.
groupby_time()
resample()
bottom()
count()
describe()
delta_t()
mad()
median()
mode()
first()
last()
top()
DataFrame.info() - Default value for the null_counts argument updated from None to False.
DataFrame.merge() updated to accept column expressions along with column names for the on, left_on, and right_on arguments.
cast() - to help cast a column to a specified type.
isin() and ~isin() - to check the presence of values in a column.
Functions under the teradataml.analytics module have been removed.
Newer versions of the functions are available under the teradataml.analytics.mle
and the teradataml.analytics.sqle
modules.
The modules removed are:
teradataml.analytics.Antiselect
teradataml.analytics.Arima
teradataml.analytics.ArimaPredictor
teradataml.analytics.Attribution
teradataml.analytics.ConfusionMatrix
teradataml.analytics.CoxHazardRatio
teradataml.analytics.CoxPH
teradataml.analytics.CoxSurvival
teradataml.analytics.DecisionForest
teradataml.analytics.DecisionForestEvaluator
teradataml.analytics.DecisionForestPredict
teradataml.analytics.DecisionTree
teradataml.analytics.DecisionTreePredict
teradataml.analytics.GLM
teradataml.analytics.GLMPredict
teradataml.analytics.KMeans
teradataml.analytics.NGrams
teradataml.analytics.NPath
teradataml.analytics.NaiveBayes
teradataml.analytics.NaiveBayesPredict
teradataml.analytics.NaiveBayesTextClassifier
teradataml.analytics.NaiveBayesTextClassifierPredict
teradataml.analytics.Pack
teradataml.analytics.SVMSparse
teradataml.analytics.SVMSparsePredict
teradataml.analytics.SentenceExtractor
teradataml.analytics.Sessionize
teradataml.analytics.TF
teradataml.analytics.TFIDF
teradataml.analytics.TextTagger
teradataml.analytics.TextTokenizer
teradataml.analytics.Unpack
teradataml.analytics.VarMax
remove_context() when context is created using a SQLAlchemy engine.
Antiselect, Pack, StringSimilarity, and Unpack.
NGrams function to work with Vantage 1.1.
shape, iloc, describe, get_values, merge, and tail.
isnull, notnull) and string processing (lower, strip, contains).
teradataml 16.20.00.00 is the first release version. Please refer to the Teradata Python Package User Guide for a list of Limitations and Usage Considerations.
Note: 32-bit Python is not supported.
Use pip to install the Teradata Python Package for Advanced Analytics.
| Platform | Command |
| --- | --- |
| macOS/Linux | pip install teradataml |
| Windows | py -3 -m pip install teradataml |
When upgrading to a new version of the Teradata Python Package, you may need to use pip install's --no-cache-dir
option to force the download of the new version.
| Platform | Command |
| --- | --- |
| macOS/Linux | pip install --no-cache-dir -U teradataml |
| Windows | py -3 -m pip install --no-cache-dir -U teradataml |
Your Python script must import the teradataml
package in order to use the Teradata Python Package:
>>> import teradataml as tdml
>>> from teradataml import create_context, remove_context
>>> create_context(host = 'hostname', username = 'user', password = 'password')
>>> df = tdml.DataFrame('iris')
>>> df
SepalLength SepalWidth PetalLength PetalWidth Name
0 5.1 3.8 1.5 0.3 Iris-setosa
1 6.9 3.1 5.1 2.3 Iris-virginica
2 5.1 3.5 1.4 0.3 Iris-setosa
3 5.9 3.0 4.2 1.5 Iris-versicolor
4 6.0 2.9 4.5 1.5 Iris-versicolor
5 5.0 3.5 1.3 0.3 Iris-setosa
6 5.5 2.4 3.8 1.1 Iris-versicolor
7 6.9 3.2 5.7 2.3 Iris-virginica
8 4.4 3.0 1.3 0.2 Iris-setosa
9 5.8 2.7 5.1 1.9 Iris-virginica
>>> df = df.select(['Name', 'SepalLength', 'PetalLength'])
>>> df
Name SepalLength PetalLength
0 Iris-versicolor 6.0 4.5
1 Iris-versicolor 5.5 3.8
2 Iris-virginica 6.9 5.7
3 Iris-setosa 5.1 1.4
4 Iris-setosa 5.1 1.5
5 Iris-virginica 5.8 5.1
6 Iris-virginica 6.9 5.1
7 Iris-setosa 5.1 1.4
8 Iris-virginica 7.7 6.7
9 Iris-setosa 5.0 1.3
>>> df = df[(df.Name == 'Iris-setosa') & (df.PetalLength > 1.5)]
>>> df
Name SepalLength PetalLength
0 Iris-setosa 4.8 1.9
1 Iris-setosa 5.4 1.7
2 Iris-setosa 5.7 1.7
3 Iris-setosa 5.0 1.6
4 Iris-setosa 5.1 1.9
5 Iris-setosa 4.8 1.6
6 Iris-setosa 4.7 1.6
7 Iris-setosa 5.1 1.6
8 Iris-setosa 5.1 1.7
9 Iris-setosa 4.8 1.6
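The last step above combines two column predicates with & (the parentheses are required because & binds tighter than comparison operators in Python). The row-level logic it expresses, sketched in plain Python over a small assumed in-memory sample:

```python
# Plain-Python equivalent of df[(df.Name == 'Iris-setosa') & (df.PetalLength > 1.5)]
# over an assumed sample of (Name, SepalLength, PetalLength) rows.
rows = [
    ("Iris-setosa", 5.1, 1.4),
    ("Iris-setosa", 4.8, 1.9),
    ("Iris-virginica", 6.9, 5.7),
    ("Iris-setosa", 5.4, 1.7),
]
filtered = [r for r in rows if r[0] == "Iris-setosa" and r[2] > 1.5]
# Keeps only the setosa rows with PetalLength > 1.5
```

In teradataml the same expression is not evaluated in Python; it is translated into a SQL WHERE clause and executed on Vantage.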
General product information, including installation instructions, is available on the Teradata Documentation website.
Use of the Teradata Python Package is governed by the License Agreement for the Teradata Python Package for Advanced Analytics.
After installation, the LICENSE
and LICENSE-3RD-PARTY
files are located in the teradataml
directory of the Python installation directory.
Teradata Vantage Python package for Advanced Analytics