snowflake-snowpark-python
The Snowpark library provides intuitive APIs for querying and processing data in a data pipeline. Using this library, you can build applications that process data in Snowflake without having to move data to the system where your application code runs.
Source code | Snowpark Python developer guide | Snowpark Python API reference | Snowpark pandas developer guide | Snowpark pandas API reference | Product documentation | Samples
If you don't have a Snowflake account yet, you can sign up for a 30-day free trial account.
You can use miniconda, anaconda, or virtualenv to create a Python 3.8, 3.9, 3.10, or 3.11 virtual environment.
For Snowpark pandas, only Python 3.9, 3.10, or 3.11 is supported.
For the best experience when using Snowpark with UDFs, we recommend creating a local conda environment with the Snowflake channel.
pip install snowflake-snowpark-python
To use the Snowpark pandas API, you can optionally install the following, which installs modin in the same environment. The Snowpark pandas API provides a familiar interface for pandas users to query and process data directly in Snowflake.
pip install "snowflake-snowpark-python[modin]"
from snowflake.snowpark import Session
connection_parameters = {
    "account": "<your snowflake account>",
    "user": "<your snowflake user>",
    "password": "<your snowflake password>",
    "role": "<snowflake user role>",
    "warehouse": "<snowflake warehouse>",
    "database": "<snowflake database>",
    "schema": "<snowflake schema>"
}
session = Session.builder.configs(connection_parameters).create()
# Create a Snowpark dataframe from input data
df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
df = df.filter(df.a > 1)
result = df.collect()
df.show()
# -------------
# |"A" |"B" |
# -------------
# |3 |4 |
# -------------
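You can also write the result back to a Snowflake table. A minimal sketch, where the table name is chosen for illustration:
# Save the filtered rows to a table (table name is illustrative)
df.write.save_as_table("my_filtered_table", mode="overwrite")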
import modin.pandas as pd
import snowflake.snowpark.modin.plugin
from snowflake.snowpark import Session
CONNECTION_PARAMETERS = {
    'account': '<myaccount>',
    'user': '<myuser>',
    'password': '<mypassword>',
    'role': '<myrole>',
    'database': '<mydatabase>',
    'schema': '<myschema>',
    'warehouse': '<mywarehouse>',
}
session = Session.builder.configs(CONNECTION_PARAMETERS).create()
# Create a Snowpark pandas dataframe from input data
df = pd.DataFrame([['a', 2.0, 1],['b', 4.0, 2],['c', 6.0, None]], columns=["COL_STR", "COL_FLOAT", "COL_INT"])
df
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1.0
# 1 b 4.0 2.0
# 2 c 6.0 NaN
df.shape
# (3, 3)
df.head(2)
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1
# 1 b 4.0 2
df.dropna(subset=["COL_INT"], inplace=True)
df
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1
# 1 b 4.0 2
df.shape
# (2, 3)
df.head(2)
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1
# 1 b 4.0 2
# Save the result back to Snowflake with a row_pos column.
df.reset_index(drop=True).to_snowflake('pandas_test2', index=True, index_label=['row_pos'])
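To pick the saved data back up lazily, the table can be read into a Snowpark pandas DataFrame; a minimal sketch using the table written above:
# Read the table back into a Snowpark pandas DataFrame
df2 = pd.read_snowflake('pandas_test2')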
The Snowpark Python developer guide, Snowpark Python API reference, Snowpark pandas developer guide, and Snowpark pandas API reference have basic sample code. Snowflake-Labs has more curated demos.
Configure the logging level for snowflake.snowpark to capture Snowpark Python API logs. Snowpark uses the Snowflake Python Connector, so you may also want to configure the logging level for snowflake.connector when the error originates in the Python Connector. For instance:
import logging
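# Enable DEBUG logging for both the Snowpark API and the underlying Python connector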
for logger_name in ('snowflake.snowpark', 'snowflake.connector'):
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)
Snowpark Python API supports reading from and writing to a pandas DataFrame via the to_pandas and write_pandas commands.
To use these operations, ensure that pandas is installed in the same environment. You can install pandas alongside Snowpark Python by executing the following command:
pip install "snowflake-snowpark-python[pandas]"
Once pandas is installed, you can convert between a Snowpark DataFrame and pandas DataFrame as follows:
df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
# Convert Snowpark DataFrame to pandas DataFrame
pandas_df = df.to_pandas()
# Write pandas DataFrame to a Snowflake table and return Snowpark DataFrame
snowpark_df = session.write_pandas(pandas_df, "new_table", auto_create_table=True)
Snowpark pandas API also supports writing to pandas:
import modin.pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
# Convert Snowpark pandas DataFrame to pandas DataFrame
pandas_df = df.to_pandas()
Note that the above Snowpark pandas commands will work if Snowpark is installed with the [modin] option; the additional [pandas] installation is not required.
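A Snowpark DataFrame can also be converted into a Snowpark pandas DataFrame once the modin plugin has been imported; a minimal sketch:
# Convert a Snowpark DataFrame to a Snowpark pandas DataFrame (requires the modin plugin import)
snowpark_df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
snowpark_pandas_df = snowpark_df.to_snowpark_pandas()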
Please refer to CONTRIBUTING.md.
snowflake.snowpark.dataframe
:
map
include_error
to Session.query_history
to record queries that have error during execution.Session.get_session_stage
is used instead of raising SnowparkSQLException
.Session.stored_procedure_profiler.set_active_profiler
.DataFrame
:
cache_result
In
expression were used in selects.AttributeError
while calling Session.stored_procedure_profiler.get_output
when Session.stored_procedure_profiler
is disabled.protobuf>=5.28
and tzlocal
at runtime.protoc-wheel-0
for the development profile.snowflake-connector-python>=3.12.0, <4.0.0
(was >=3.10.0
).modin
from 0.28.1 to 0.30.1.pandas
2.2.x versions.Index.to_numpy
.DataFrame.align
and Series.align
for axis=0
.size
in GroupBy.aggregate
, DataFrame.aggregate
, and Series.aggregate
.snowflake.snowpark.functions.window
pd.read_pickle
(Uses native pandas for processing).pd.read_html
(Uses native pandas for processing).pd.read_xml
(Uses native pandas for processing)."size"
and len
in GroupBy.aggregate
, DataFrame.aggregate
, and Series.aggregate
.Series.str.len
.pd.DataFrame([0]).agg(np.mean)
) would fail to transpose the result.DataFrame.dropna()
would:
subset
(e.g. []
) as if it specified all columns instead of no columns.TypeError
for a scalar subset
instead of filtering on just that column.ValueError
for a subset
of type pandas.Index
instead of filtering on the columns in the index.TableNotFoundError
when using dynamic pivot in notebook environment.snowflake.snowpark.functions
module.snowflake.snowpark.functions.any_value
Table.update
could not handle VariantType
, MapType
, and ArrayType
data types.DataFrame.join
, causing errors when selecting columns from a joined DataFrame.Table.update
and Table.merge
could fail if the target table's index was not the default RangeIndex
.Session
class to be thread-safe. This allows concurrent DataFrame transformations, DataFrame actions, UDF and stored procedure registration, and concurrent file uploads when using the same Session
object.
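A minimal sketch of what concurrent use of a single Session could look like, assuming an existing session and that the thread-safety feature is enabled for the account; table names are illustrative:
from concurrent.futures import ThreadPoolExecutor

def row_count(table_name):
    # Each worker thread reuses the same thread-safe Session object
    return session.table(table_name).count()

with ThreadPoolExecutor(max_workers=2) as executor:
    counts = list(executor.map(row_count, ["table_a", "table_b"]))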
FEATURE_THREAD_SAFE_PYTHON_SESSION
to True
for account.DataFrame.queries
API are not deterministic, and may be different when DataFrame actions are executed. This does not affect explicit user-created temporary tables.session.lineage.trace
API.copy_grants
parameter when registering UDxF and stored procedures.DataFrameWriter
to support daisy-chaining:
option
options
partition_by
snowflake_cortex_summarize
.snowflake.snowpark.functions.array_remove
it is now possible to use in python.df.sort().limit()
and df.limit().sort()
generates the same query with sort in front of limit. Now, df.limit().sort()
will generate query that reads df.limit().sort()
.df.limit().sort()
, because limit stops table scanning as soon as the number of records is satisfied.DataFrame.analytics.time_series_agg
function to handle multiple data points in same sliding interval.np.subtract
, np.multiply
, np.divide
, and np.true_divide
.__array_ufunc__
.np.float_power
, np.mod
, np.remainder
, np.greater
, np.greater_equal
, np.less
, np.less_equal
, np.not_equal
, and np.equal
.np.log
, np.log2
, and np.log10
DataFrameGroupBy.bfill
, SeriesGroupBy.bfill
, DataFrameGroupBy.ffill
, and SeriesGroupBy.ffill
.on
parameter with Resampler
.value_counts()
.snowflake_cortex_summarize
.DataFrame.attrs
and Series.attrs
.DataFrame.style
.np.full_like
head
and iloc
when the row key is a slice.tz_convert
and tz_localize
in Series
, DataFrame
, Series.dt
, and DatetimeIndex
.tz_convert
and tz_localize
in Series
, DataFrame
, Series.dt
, and DatetimeIndex
to specify the supported timezone formats.df.apply
and series.apply
( as well as map
and applymap
) when using snowpark functions. This allows for some position independent compatibility between apply and functions where the first argument is not a pandas object.iloc
and iat
when the row key is a scalar.iterrows
.Series.map
to reflect the unsupported features.np.may_share_memory
which is used internally by many scikit-learn functions. This method will always return false when called with a Snowpark pandas object.DataFrame
and Series
pct_change()
would raise TypeError
when input contained timedelta columns.replace()
would sometimes propagate Timedelta
types incorrectly through replace()
. Instead raise NotImplementedError
for replace()
on Timedelta
.DataFrame
and Series
round()
would raise AssertionError
for Timedelta
columns. Instead raise NotImplementedError
for round()
on Timedelta
.reindex
fails when the new index is a Series with non-overlapping types from the original index.__getitem__
on a DataFrameGroupBy object always returned a DataFrameGroupBy object if as_index=False
.NotImplementedError
.DataFrame.shift()
on axis=0 and axis=1 would fail to propagate timedelta types.DataFrame.abs()
, DataFrame.__neg__()
, DataFrame.stack()
, and DataFrame.unstack()
now raise NotImplementedError
for timedelta inputs instead of failing to propagate timedelta types.DataFrame.alias
raises KeyError
for input column name.to_csv
on Snowflake stage fails when data contains empty strings.snowflake.snowpark.functions
:
make_interval
Window.range_between()
when the order by column is TIMESTAMP or DATE type.thread_id
to QueryRecord
to track the thread id submitting the query history.Session.stored_procedure_profiler
.'NoneType' has no len() when trying to read default values from function
.TimedeltaIndex.mean
method.Timedelta
columns on axis=0
with agg
or aggregate
.by
, left_by
, right_by
, left_index
, and right_index
for pd.merge_asof
.include_describe
to Session.query_history
.DatetimeIndex.mean
and DatetimeIndex.std
methods.Resampler.asfreq
, Resampler.indices
, Resampler.nunique
, and Resampler.quantile
.resample
frequency W
, ME
, YE
with closed = "left"
.DataFrame.rolling.corr
and Series.rolling.corr
for pairwise = False
and int window
.window
and min_periods = None
for Rolling
.DataFrameGroupBy.fillna
and SeriesGroupBy.fillna
.Series
and DataFrame
objects with the lazy Index
object as data
, index
, and columns
arguments.Series
and DataFrame
objects with index
and column
values not present in DataFrame
/Series
data
.pd.read_sas
(Uses native pandas for processing).rolling().count()
and expanding().count()
to Timedelta
series and columns.tz
in both pd.date_range
and pd.bdate_range
.Series.items
.errors="ignore"
in pd.to_datetime
.DataFrame.tz_localize
and Series.tz_localize
.DataFrame.tz_convert
and Series.tz_convert
.sin
) in Series.map
, Series.apply
, DataFrame.apply
and DataFrame.applymap
.to_pandas
to persist the original timezone offset for TIMESTAMP_TZ type.dtype
results for TIMESTAMP_TZ type to show correct timezone offset.dtype
results for TIMESTAMP_LTZ type to show correct timezone.numeric_only
for groupby aggregations.sort_values
.convert_dtype
in Series.apply
.Index
object created from a Series
/DataFrame
incorrectly updates the Series
/DataFrame
's index name after an inplace update has been applied to the original Series
/DataFrame
.SettingWithCopyWarning
that sometimes appeared when printing Timedelta
columns.inplace
argument for Series
objects derived from other Series
objects.Series.sort_values
failed if series name overlapped with index column name.Timedelta
index levels to integer column levels.Resampler
methods on timedelta columns would produce integer results.pd.to_numeric()
would leave Timedelta
inputs as Timedelta
instead of converting them to integers.loc
set when setting a single row, or multiple rows, of a DataFrame with a Series value.date_add
and date_sub
functions failed for NULL
values.equal_null
could fail inside a merge statement.row_number
could fail inside a Window function.This is a re-release of 1.22.0. Please refer to the 1.22.0 release notes for detailed release content.
snowflake.snowpark.functions
:
array_remove
ln
Session.write_pandas
by making use_logical_type
option more explicit.DataFrameWriter.save_as_table
:
enable_schema_evolution
data_retention_time
max_data_extension_time
change_tracking
copy_grants
iceberg_config
A dictionary that can hold the following iceberg configuration options:
external_volume
catalog
base_location
catalog_sync
storage_serialization_policy
DataFrameWriter.copy_into_table
:
iceberg_config
A dictionary that can hold the following iceberg configuration options:
external_volume
catalog
base_location
catalog_sync
storage_serialization_policy
DataFrame.create_or_replace_dynamic_table
:
mode
refresh_mode
initialize
clustering_keys
is_transient
data_retention_time
max_data_extension_time
session.read.csv
that caused an error when setting PARSE_HEADER = True
in an externally defined file format.session.get_session_stage
that referenced a non-existing stage after switching database or schema.DataFrame.to_snowpark_pandas
without explicitly initializing the Snowpark pandas plugin caused an error.explode
function in dynamic table creation caused a SQL compilation error due to improper boolean type casting on the outer
parameter.Index.identical
.DataFrameWriter.save_as_table
incorrectly handled DataFrames containing only a subset of columns from the existing table.to_timestamp
does not set the default timezone of the column datatype.Timedelta
type, including the following features. Snowpark pandas will raise NotImplementedError
for unsupported Timedelta
use cases.
copy
, cache_result
, shift
, sort_index
, assign
, bfill
, ffill
, fillna
, compare
, diff
, drop
, dropna
, duplicated
, empty
, equals
, insert
, isin
, isna
, items
, iterrows
, join
, len
, mask
, melt
, merge
, nlargest
, nsmallest
, to_pandas
.astype
.NotImplementedError
will be raised for the rest of methods that do not support Timedelta
.Timedelta
.Timedelta
values.Timedelta
values and numeric values.TimedeltaIndex
.pd.to_timedelta
.GroupBy
aggregations min
, max
, mean
, idxmax
, idxmin
, std
, sum
, median
, count
, any
, all
, size
, nunique
, head
, tail
, aggregate
.GroupBy
filtrations first
and last
.TimedeltaIndex
attributes: days
, seconds
, microseconds
and nanoseconds
.diff
with timestamp columns on axis=0
and axis=1
TimedeltaIndex
methods: ceil
, floor
and round
.TimedeltaIndex.total_seconds
method.Series.dt.round
.DatetimeIndex
.Index.name
, Index.names
, Index.rename
, and Index.set_names
.Index.__repr__
.DatetimeIndex.month_name
and DatetimeIndex.day_name
.Series.dt.weekday
, Series.dt.time
, and DatetimeIndex.time
.Index.min
and Index.max
.pd.merge_asof
.Series.dt.normalize
and DatetimeIndex.normalize
.Index.is_boolean
, Index.is_integer
, Index.is_floating
, Index.is_numeric
, and Index.is_object
.DatetimeIndex.round
, DatetimeIndex.floor
and DatetimeIndex.ceil
.Series.dt.days_in_month
and Series.dt.daysinmonth
.DataFrameGroupBy.value_counts
and SeriesGroupBy.value_counts
.Series.is_monotonic_increasing
and Series.is_monotonic_decreasing
.Index.is_monotonic_increasing
and Index.is_monotonic_decreasing
.pd.crosstab
.pd.bdate_range
and included business frequency support (B, BME, BMS, BQE, BQS, BYE, BYS) for both pd.date_range
and pd.bdate_range
.Index
objects as labels
in DataFrame.reindex
and Series.reindex
.Series.dt.days
, Series.dt.seconds
, Series.dt.microseconds
, and Series.dt.nanoseconds
.DatetimeIndex
from an Index
of numeric or string type.Timedelta
objects.Series.dt.total_seconds
method.DataFrame.apply(axis=0)
.Series.dt.tz_convert
and Series.dt.tz_localize
.DatetimeIndex.tz_convert
and DatetimeIndex.tz_localize
.quoted_identifier_to_snowflake_type
to avoid making metadata queries if the types have been cached locally.pd.to_datetime
to handle all local input cases.NotImplementedError
for Index bitwise operators.Index.names
is set to a non-like-like object.pd.read_snowflake
include the creation reason when temp table creation is triggered.DataFrame.set_index
, or setting DataFrame.index
or Series.index
by avoiding checks that require eager evaluation. As a consequence, when the new index does not match the current Series
/DataFrame
object length, a ValueError
is no longer raised. Instead, when the Series
/DataFrame
object is longer than the provided index, the Series
/DataFrame
's new index is filled with NaN
values for the "extra" elements. Otherwise, the extra values in the provided index are ignored.NotImplementedError
when ambiguous/nonexistent are non-string in ceil
/floor
/round
.pd.Timedelta
scalars.Series.dt.isocalendar
using a named Seriesinplace
argument for Series objects derived from DataFrame columns.Series.reindex
and DataFrame.reindex
did not update the result index's name correctly.Series.take
did not error when axis=1
was specified.to_pandas_batches
with async jobs caused an error due to improper handling of waiting for asynchronous query completion.snowflake.snowpark.testing.assert_dataframe_equal
that is a utility function to check the equality of two Snowpark DataFrames.INFER_SCHEMA
options to DataFrameReader
via INFER_SCHEMA_OPTIONS
.parameters
parameter to Column.rlike
and Column.regexp
.df.cache_result()
in the current session, when the DataFrame is no longer referenced (i.e., gets garbage collected). It is still an experimental feature not enabled by default, and can be enabled by setting session.auto_clean_up_temp_table_enabled
to True
.fmt
parameter of snowflake.snowpark.functions.to_date
.*
column has an incorrect subquery.DataFrame.to_pandas_batches
where the iterator could throw an error if certain transformation is made to the pandas dataframe due to wrong isolation level.DataFrame.lineage.trace
to split the quoted feature view's name and version correctly.Column.isin
that caused invalid sql generation when passed an empty list.rank
dense_rank
percent_rank
cume_dist
ntile
datediff
array_agg
rlike
and regexp
changes above.ignore_nulls
properly.DataFrame.backfill
, DataFrame.bfill
, Series.backfill
, and Series.bfill
.DataFrame.compare
and Series.compare
with default parameters.Series.dt.microsecond
and Series.dt.nanosecond
.Index.is_unique
and Index.has_duplicates
.Index.equals
.Index.value_counts
.Series.dt.day_name
and Series.dt.month_name
.df.index[:10]
.DataFrame.unstack
and Series.unstack
.DataFrame.asfreq
and Series.asfreq
.Series.dt.is_month_start
and Series.dt.is_month_end
.Index.all
and Index.any
.Series.dt.is_year_start
and Series.dt.is_year_end
.Series.dt.is_quarter_start
and Series.dt.is_quarter_end
.DatetimeIndex
.Series.argmax
and Series.argmin
.Series.dt.is_leap_year
.DataFrame.items
.Series.dt.floor
and Series.dt.ceil
.Index.reindex
.DatetimeIndex
properties: year
, month
, day
, hour
, minute
, second
, microsecond
,
nanosecond
, date
, dayofyear
, day_of_year
, dayofweek
, day_of_week
, weekday
, quarter
,
is_month_start
, is_month_end
, is_quarter_start
, is_quarter_end
, is_year_start
, is_year_end
and is_leap_year
.Resampler.fillna
and Resampler.bfill
.Timedelta
type, including creating Timedelta
columns and to_pandas
.Index.argmax
and Index.argmin
.SnowflakeQueryCompiler.is_series_like
method.Dataframe.columns
now returns native pandas Index object instead of Snowpark Index object.query_compiler
argument in Index
constructor to create Index
from query compiler.pd.to_datetime
now returns a DatetimeIndex object instead of a Series object.pd.date_range
now returns a DatetimeIndex object instead of a Series object.pivot_table
raise NotImplementedError
instead of KeyError
.Series.drop_duplicates
and DataFrame.drop_duplicates
when called after sort_values
.Index.to_frame
where the result frame's column name may be wrong where name is unspecified.Series.reset_index(drop=True)
where the result name may be wrong.Groupby.first/last
ordering by the correct columns in the underlying window expression.DataFrame
:
_execute_and_get_query_id
arrays_zip
function.df._in
by avoiding unnecessary cast for numeric values. You can enable this optimization by setting session.eliminate_numeric_sql_value_cast_enabled = True
.write_pandas
when the target table does not exist and auto_create_table=False
.format_json
to the Session.SessionBuilder.app_name
function that sets the app name in the Session.query_tag
in JSON format. By default, this parameter is set to False
.lag(x, 0)
was incorrect and failed with error message argument 1 to function LAG needs to be constant, found 'SYSTEM$NULL_TO_FIXED(null)'
.patch
function when registering a mocked function:
distinct
allows an alternate function to be specified for when a sql function should be distinct.pass_column_index
passes a named parameter column_index
to the mocked function that contains the pandas.Index for the input data.pass_row_index
passes a named parameter row_index
to the mocked function that is the 0 indexed row number the function is currently operating on.pass_input_data
passes a named parameter input_data
to the mocked function that contains the entire input dataframe for the current expression.column_order
parameter to method DataFrameWriter.save_as_table
.DataFrameGroupBy.all
, SeriesGroupBy.all
, DataFrameGroupBy.any
, and SeriesGroupBy.any
.DataFrame.nlargest
, DataFrame.nsmallest
, Series.nlargest
and Series.nsmallest
.replace
and frac > 1
in DataFrame.sample
and Series.sample
.read_excel
(Uses local pandas for processing)Series.at
, Series.iat
, DataFrame.at
, and DataFrame.iat
.Series.dt.isocalendar
.Series.case_when
except when condition or replacement is callable.Index
and its APIs.DataFrame.assign
.DataFrame.stack
.DataFrame.pivot
and pd.pivot
.DataFrame.to_csv
and Series.to_csv
.Series.str.translate
where the values in the table
are single-codepoint strings.DataFrame.corr
.df.plot()
and series.plot()
to be called, materializing the data into the local clientDataFrameGroupBy
and SeriesGroupBy
aggregations first
and last
DataFrameGroupBy.get_group
.limit
parameter when method
parameter is used in fillna
.Series.str.translate
where the values in the table
are single-codepoint strings.DataFrame.corr
.DataFrame.equals
and Series.equals
.DataFrame.reindex
and Series.reindex
.Index.astype
.Index.unique
and Index.nunique
.Index.sort_values
.DataFrame
or Series
with dtype=np.uint64
.values
is set to index
when index
and columns
contain all columns in DataFrame during pivot_table
.Index.copy()
dtype
, values
, item()
, tolist()
, to_series()
and to_frame()
pd.pivot_table
and DataFrame.pivot_table
.inplace
parameter in DataFrame.sort_index
and Series.sort_index
.to_boolean
function.RecursionError: maximum recursion depth exceeded
when the DataFrame has more than 500 columns.AsyncJob.result("no_result")
doesn't wait for the query to finish execution.strict
parameter when registering UDFs and Stored Procedures.DateType
raises AttributeError
.to_char
that raises IndexError
when incoming column has nonconsecutive row index.CaseExpr
expressions that raises IndexError
when incoming column has nonconsecutive row index.Column.like
that raises IndexError
when incoming column has nonconsecutive row index.iff
.DataFrame.pct_change
and Series.pct_change
without the freq
and limit
parameters.Series.str.get
.Series.dt.dayofweek
, Series.dt.day_of_week
, Series.dt.dayofyear
, and Series.dt.day_of_year
.Series.str.__getitem__
(Series.str[...]
).Series.str.lstrip
and Series.str.rstrip
.DataFrameGroupBy.size
and SeriesGroupBy.size
.DataFrame.expanding
and Series.expanding
for aggregations count
, sum
, min
, max
, mean
, std
, var
, and sem
with axis=0
.DataFrame.rolling
and Series.rolling
for aggregation count
with axis=0
.Series.str.match
.DataFrame.resample
and Series.resample
for aggregations size
, first
, and last
.DataFrameGroupBy.all
, SeriesGroupBy.all
, DataFrameGroupBy.any
, and SeriesGroupBy.any
.DataFrame.nlargest
, DataFrame.nsmallest
, Series.nlargest
and Series.nsmallest
.replace
and frac > 1
in DataFrame.sample
and Series.sample
.read_excel
(Uses local pandas for processing)Series.at
, Series.iat
, DataFrame.at
, and DataFrame.iat
.Series.dt.isocalendar
.Series.case_when
except when condition or replacement is callable.Index
and its APIs.DataFrame.assign
.DataFrame.stack
.DataFrame.pivot
and pd.pivot
.DataFrame.to_csv
and Series.to_csv
.Index.T
.DataFrame.describe
on a frame with duplicate columns of differing dtypes could cause an error or incorrect results.DataFrame.rolling
and Series.rolling
so window=0
now throws NotImplementedError
instead of ValueError
DataFrame.aggregate
and Series.aggregate
with axis=0
.pd.read_csv
reads using the native pandas CSV parser, then uploads data to snowflake using parquet. This enables most of the parameters supported by read_csv
including date parsing and numeric conversions. Uploading via parquet is roughly twice as fast as uploading via CSV.pd.Index
directly in Snowpark pandas. Support for pd.Index
as a first-class component of Snowpark pandas is coming soon.len
, shape
, size
, empty
, to_pandas()
and names
. For df.index
, Snowpark pandas creates a lazy index object.df.columns
, Snowpark pandas supports a non-lazy version of an Index
since the data is already stored locally.{"infer_schema": True}
when reading csv file without specifying its schema.Session.create_dataframe
when called with more than 512 rows and using format
or pyformat
paramstyle
.DataFrame.cache_result
and Series.cache_result
methods for users to persist DataFrames and Series to a temporary table lasting the duration of the session to improve latency of subsequent operations.DataFrame.pivot_table
with no index
parameter, as well as for margins
parameter.DataFrame.shift
/Series.shift
/DataFrameGroupBy.shift
/SeriesGroupBy.shift
to match pandas 2.2.1. Snowpark pandas does not yet support the newly-added suffix
argument, or sequence values of periods
.Series.str.split
.Series.str.*
).csv
and json
:
False
UTF8
DataFrame.analytics.moving_agg
and DataFrame.analytics.cumulative_agg_agg
.if_not_exists
parameter during UDF and stored procedure registration.*
to fail.date_add
was unable to handle some numeric types.TimestampType
casting resulted in incorrect data.DecimalType
data to have incorrect precision in some cases.IndexError
.to_timestamp_ntz
can not handle None data.DataFrame.with_column_renamed
ignores attributes from parent DataFrames after join operations.Column.equal_nan
where null data is handled incorrectly.DataFrame.drop
ignore attributes from parent DataFrames after join operations.date_part
where Column type is set wrong.DataFrameWriter.save_as_table
does not raise exceptions when inserting null data into non-nullable columns.DataFrameWriter.save_as_table
where
pyarrow
as it is not used.Column.cast
, adding support for casting to boolean and all integral types.is_permanent
and anonymous
options in UDFs and stored procedures registration to make it more clear that those features are not yet supported.NotImplementedError
instead of warnings and unclear error information.DataFrameWriter.save_as_table
DataFrame.create_or_replace_view
DataFrame.create_or_replace_temp_view
DataFrame.create_or_replace_dynamic_table
{"infer_schema": True}
when reading CSV file without specifying its schema.to_timestamp_ltz
, to_timestamp_ntz
, to_timestamp_tz
and to_timestamp
.to_char
.snowflake.snowpark.mock.exceptions.SnowparkLocalTestingException
.sys.path
during the clean-up step.Session.get_current_[schema|database|role|user|account|warehouse]
returns upper-cased identifiers when identifiers are quoted.substr
and substring
can not handle 0-based start_expr
.SnowparkLocalTestingException
in error cases which is on par with SnowparkSQLException
raised in non-local execution.Session.write_pandas
method that NotImplementedError
will be raised when called.to_date
.NaT
and NaN
values to not be recognized.DataFrameReader.csv
was unable to handle quoted values containing a delimiter.None
value in an arithmetic calculation, the output should remain None
instead of math.nan
.sum
and covar_pop
that when there is math.nan
in the data, the output should also be math.nan
.DataFrame.to_pandas
should take Snowflake numeric types with precision 38 as int64
.truncate
save mode in DataFrameWrite
to overwrite existing tables by truncating the underlying table instead of dropping it.DataFrame
into one or more files in a stage:
DataFrame.write.json
DataFrame.write.csv
DataFrame.write.parquet
DataFrame
and DataFrameWriter
:
snowflake.snowpark.Session.file.get
and snowflake.snowpark.Session.file.get_stream
comment
.session.cte_optimization_enabled
to True
.statement_params
was not passed to query executions that register stored procedures and user defined functions.snowflake.snowpark.Session.file.get_stream
to fail for quoted stage locations.utils.py
might raise AttributeError in case the underlying module can not be found.to_time
.Session.builder.getOrCreate
should return the created mock session.process
method.SnowflakePlanBuilder
that save_as_table
does not filter column that name start with '$' and follow by number correctly.field_optionally_enclosed_by
is specified.pattern
is a Column
.KeyError
when updating null values in the rows.DataFrame.collect
.count_distinct
does not work correctly when counting.TypeError
.DataFrameReader
to raise FileNotFound
error when reading a path that does not exist or when there are no files under the path.date_part
argument in function last_day
.SessionBuilder.app_name
will set the query_tag after the session is created.DataFrame.to_local_iterator
where the iterator could yield wrong results if another query is executed before the iterator finishes due to wrong isolation level. For details, please see #945.Session.range
returns empty result when the range is large.split_blocks=True
by default during to_pandas
conversion, for optimal memory allocation. This parameter is passed to pyarrow.Table.to_pandas
, which enables PyArrow
to split the memory allocation into smaller, more manageable blocks instead of allocating a single contiguous block. This results in better memory management when dealing with larger datasets.DataFrame.to_pandas
that caused an error when evaluating on a Dataframe with an IntergerType
column with null values.statement_params
in StoredProcedure.__call__
.Session.add_import
.
chunk_size
: The number of bytes to hash per chunk of the uploaded files.whole_file_hash
: By default only the first chunk of the uploaded import is hashed to save time. When this is set to True each uploaded file is fully hashed instead.external_access_integrations
and secrets
when creating a UDAF from Snowpark Python to allow integration with external access.Session.append_query_tag
. Allows an additional tag to be added to the current query tag by appending it as a comma separated value.Session.update_query_tag
. Allows updates to a JSON encoded dictionary query tag.SessionBuilder.getOrCreate
will now attempt to replace the singleton it returns when token expiration has been detected.snowflake.snowpark.functions
:
array_except
create_map
sign
/signum
DataFrame.analytics
:
moving_agg
function in DataFrame.analytics
to enable moving aggregations like sums and averages with multiple window sizes.cummulative_agg
function in DataFrame.analytics
to enable commulative aggregations like sums and averages on multiple columns.compute_lag
and compute_lead
functions in DataFrame.analytics
for enabling lead and lag calculations on multiple columns.time_series_agg
function in DataFrame.analytics
to enable time series aggregations like sums and averages with multiple time windows.Fixed a bug in DataFrame.na.fill
that caused Boolean values to erroneously override integer values.
Fixed a bug in Session.create_dataframe
where the Snowpark DataFrames created using pandas DataFrames were not inferring the type for timestamp columns correctly. The behavior is as follows:
LongType()
, but will now be correctly maintained as timestamp values and be inferred as TimestampType(TimestampTimeZone.NTZ)
.TimestampType(TimestampTimeZone.NTZ)
and loose timezone information but will now be correctly inferred as TimestampType(TimestampTimeZone.LTZ)
and timezone information is retained correctly.PYTHON_SNOWPARK_USE_LOGICAL_TYPE_FOR_CREATE_DATAFRAME
to revert back to old behavior. It is recommended that you update your code to align with correct behavior because the parameter will be removed in the future.Fixed a bug that DataFrame.to_pandas
gets decimal type when scale is not 0, and creates an object dtype in pandas
. Instead, we cast the value to a float64 type.
Fixed bugs that wrongly flattened the generated SQL when one of the following happens:
DataFrame.filter()
is called after DataFrame.sort().limit()
.DataFrame.sort()
or filter()
is called on a DataFrame that already has a window function or sequence-dependent data generator column.
For instance, df.select("a", seq1().alias("b")).select("a", "b").sort("a")
won't flatten the sort clause anymore.DataFrame.limit()
. For instance, df.limit(10).select(row_number().over())
won't flatten the limit and select in the generated SQL.Fixed a bug where aliasing a DataFrame column raised an error when the DataFame was copied from another DataFrame with an aliased column. For instance,
df = df.select(col("a").alias("b"))
df = copy(df)
df.select(col("b").alias("c")) # threw an error. Now it's fixed.
Fixed a bug in Session.create_dataframe
that the non-nullable field in a schema is not respected for boolean type. Note that this fix is only effective when the user has the privilege to create a temp table.
Fixed a bug in SQL simplifier where non-select statements in session.sql
dropped a SQL query when used with limit()
.
Fixed a bug that raised an exception when session parameter ERROR_ON_NONDETERMINISTIC_UPDATE
is true.
to_pandas
operation, we rely on GS precision value to fix precision issues for large integer values. This may affect users where a column that was earlier returned as int8
gets returned as int64
. Users can fix this by explicitly specifying precision values for their return column.Session.call
in case of table stored procedures where running Session.call
would not trigger stored procedure unless a collect()
operation was performed.StoredProcedureRegistration
will now automatically add snowflake-snowpark-python
as a package dependency. The added dependency will be on the client's local version of the library and an error is thrown if the server cannot support that version.snowflake.snowpark.functions
:
from_utc_timestamp
to_utc_timestamp
Add the conn_error
attribute to SnowflakeSQLException
that stores the whole underlying exception from snowflake-connector-python
.
Added support for RelationalGroupedDataframe.pivot()
to access pivot
in the following pattern Dataframe.group_by(...).pivot(...)
.
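A minimal sketch of that pattern, with table and column names assumed for illustration:
# Pivot monthly amounts per employee; names are illustrative
df = session.table("monthly_sales")
pivoted = df.group_by("empid").pivot("month", ["JAN", "FEB"]).sum("amount")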
Added experimental feature: Local Testing Mode, which allows you to create and operate on Snowpark Python DataFrames locally without connecting to a Snowflake account. You can use the local testing framework to test your DataFrame operations locally, on your development machine or in a CI (continuous integration) pipeline, before deploying code changes to your account.
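A minimal sketch of creating such a local testing session, assuming the local_testing configuration flag described in the Snowpark local testing documentation:
from snowflake.snowpark import Session

# Create a session that evaluates DataFrame operations locally, with no Snowflake connection
session = Session.builder.config("local_testing", True).create()
df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
print(df.collect())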
Added support for arrays_to_object
new functions in snowflake.snowpark.functions
.
Added support for the vector data type.
cloudpickle==2.2.1
snowflake-connector-python
to 3.4.0
.session.read.with_metadata
creates inconsistent table when doing df.write.save_as_table
.DataFrame.to_local_iterator()
.input_names
in UDTFRegistration.register/register_file
and functions.pandas_udtf
. By default, RelationalGroupedDataFrame.applyInPandas
will infer the column names from current dataframe schema.sql_error_code
and raw_message
attributes to SnowflakeSQLException
when it is caused by a SQL exception.DataFrame.to_pandas()
where converting snowpark dataframes to pandas dataframes was losing precision on integers with more than 19 digits.session.add_packages
can not handle requirement specifier that contains project name with underscore and version.DataFrame.limit()
when offset
is used and the parent DataFrame
uses limit
. Now the offset
won't impact the parent DataFrame's limit
.DataFrame.write.save_as_table
where dataframes created from read api could not save data into snowflake because of invalid column name $1
.date_format
:
format
argument changed from optional to required.normal
, zipf
, uniform
, seq1
, seq2
, seq4
, seq8
) function is used, the sort and filter operation will no longer be flattened when generating the query.typing-extensions
.Dataframe.writer.save_as_table
which does not need insert permission for writing tables.PythonObjJSONEncoder
json-serializable objects for ARRAY
and OBJECT
literals.Added support for VOLATILE/IMMUTABLE keyword when registering UDFs.
Added support for specifying clustering keys when saving dataframes using DataFrame.save_as_table
.
Accept Iterable
objects input for schema
when creating dataframes using Session.create_dataframe
.
Added the property DataFrame.session
to return a Session
object.
Added the property Session.session_id
to return an integer that represents session ID.
Added the property Session.connection
to return a SnowflakeConnection
object .
Added support for creating a Snowpark session from a configuration file or environment variables.
snowflake-connector-python
to 3.2.0.ValueError
even when compatible package version were added in session.add_packages
.register_from_file
.invalid_identifier
error.DataFrame.copy
disables SQL simplfier for the returned copy.session.sql().select()
would fail if any parameters are specified to session.sql()
external_access_integrations
and secrets
when creating a UDF, UDTF or Stored Procedure from Snowpark Python to allow integration with external access.snowflake.snowpark.functions
:
array_flatten
flatten
apply_in_pandas
in snowflake.snowpark.relational_grouped_dataframe
.Session.replicate_local_environment
.session.create_dataframe
fails to properly set nullable columns where nullability was affected by order or data was given.DataFrame.select
could not identify and alias columns in presence of table functions when output columns of table function overlapped with columns in dataframe.is_permanent=False
will now create temporary objects even when stage_name
is provided. The default value of is_permanent
is False
which is why if this value is not explicitly set to True
for permanent objects, users will notice a change in behavior.types.StructField
now enquotes column identifier by default.snowflake.snowpark.functions
:
array_sort
sort_array
array_min
array_max
explode_outer
Session.add_requirements
or Session.add_packages
. They are now usable in stored procedures and UDFs even if packages are not present on the Snowflake Anaconda channel.
custom_packages_upload_enabled
and custom_packages_force_upload_enabled
to enable the support for pure Python packages feature mentioned above. Both parameters default to False
.Session.add_requirements
.DataFrame.rename
.params
in session.sql()
in stored procedures.TIMESTAMP_NTZ
, TIMESTAMP_LTZ
, TIMESTAMP_TZ
)
TimestampTimezone
as an argument in TimestampType
constructor.NTZ
, LTZ
, TZ
and Timestamp
to annotate functions when registering UDFs.typing-extensions
.DataFrame.cache_result
now creates temp table fully qualified names under current database and current schema.numpy.ufunc
.DataFrame.union
was not generating the correct Selectable.schema_query
when SQL simplifier is enabled.DataFrameWriter.save_as_table
now respects the nullable
field of the schema provided by the user or the inferred schema based on data from user input.snowflake-connector-python
to 3.0.4.DataFrame.agg
and DataFrame.describe
, no longer strip away non-printing characters from column names.snowflake.snowpark.functions
:
array_generate_range
array_unique_agg
collect_set
sequence
TABLE
return type.length
in StringType()
to specify the maximum number of characters that can be stored by the column.functions.element_at()
for functions.get()
.Column.contains
for functions.contains
.DataFrame.alias
.DataFrame
using DataFrameReader
.StructType.add
to append more fields to existing StructType
objects.execute_as
in StoredProcedureRegistration.register_from_file()
to specify stored procedure caller rights.Dataframe.join_table_function
did not run all of the necessary queries to set up the join table function when SQL simplifier was enabled.ColumnOrName
, ColumnOrLiteralStr
, ColumnOrSqlExpr
, LiteralType
and ColumnOrLiteral
that were breaking mypy
checks.DataFrameWriter.save_as_table
and DataFrame.copy_into_table
failed to parse fully qualified table names.session.getOrCreate
.Column.getField
.snowflake.snowpark.functions
:
date_add
and date_sub
to make add and subtract operations easier.daydiff
explode
array_distinct
.regexp_extract
.struct
.format_number
.bround
.substring_index
skip_upload_on_content_match
when creating UDFs, UDTFs and stored procedures using register_from_file
to skip uploading files to a stage if the same version of the files are already on the stage.DataFrameWriter.save_as_table
method to take table names that contain dots.DataFrame.filter()
or DataFrame.order_by()
is followed by a projection statement (e.g. DataFrame.select()
, DataFrame.with_column()
).Dataframe.create_or_replace_dynamic_table
.params
in session.sql()
to support binding variables. Note that this is not supported in stored procedures yet.strtok_to_array
where an exception was thrown when a delimiter was passed in.session.add_import
where the module had the same namespace as other dependencies.delimiters
parameter in functions.initcap()
.functions.hash()
to accept a variable number of input expressions.Session.RuntimeConfig
for getting/setting/checking the mutability of any runtime configuration.Row
results from DataFrame.collect
using case_sensitive
parameter.Session.conf
for getting, setting or checking the mutability of any runtime configuration.Row
results from DataFrame.collect
using case_sensitive
parameter.snowflake.snowpark.types.StructType
.log_on_exception
to Dataframe.collect
and Dataframe.collect_no_wait
to optionally disable error logging for SQL exceptions.DataFrame.substract
, DataFrame.union
, etc.) being called after another DataFrame set operation and DataFrame.select
or DataFrame.with_column
throws an exception.SNOWPARK_LEFT
, SNOWPARK_RIGHT
) by default. Users can disable this at runtime with session.conf.set('use_constant_subquery_alias', False)
to use randomly generated alias names instead.session.call()
.source_code_display=False
at registration.if_not_exists
when creating a UDF, UDTF or Stored Procedure from Snowpark Python to ignore creating the specified function or procedure if it already exists.snowflake.snowpark.functions.get
to extract value from array.functions.reverse
in functions to open access to Snowflake built-in function
reverse.require_scoped_url
in snowflake.snowflake.files.SnowflakeFile.open() (in Private Preview)
to replace is_owner_file
is marked for deprecation.paramstyle
to qmark
when creating a Snowpark session.df.join(..., how="cross")
fails with SnowparkJoinException: (1112): Unsupported using join type 'Cross'
.DataFrame
column created from chained function calls used a wrong column name.asc
, asc_nulls_first
, asc_nulls_last
, desc
, desc_nulls_first
, desc_nulls_last
, date_part
and unix_timestamp
in functions.DataFrame.dtypes
to return a list of column name and data type pairs.functions.expr()
for functions.sql_expr()
.functions.date_format()
for functions.to_date()
.functions.monotonically_increasing_id()
for functions.seq8()
functions.from_unixtime()
for functions.to_timestamp()
PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER
is True
after Snowflake 7.3 was released. In snowpark-python, session.sql_simplifier_enabled
reads the value of PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER
by default, meaning that the SQL simplfier is enabled by default after the Snowflake 7.3 release. To turn this off, set PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER
in Snowflake to False
or run session.sql_simplifier_enabled = False
from Snowpark. It is recommended to use the SQL simplifier because it helps to generate more concise SQL.Session.generator()
to create a new DataFrame
using the Generator table function.secure
to the functions that create a secure UDF or UDTF.Session.create_async_job()
to create an AsyncJob
instance from a query id.AsyncJob.result()
now accepts argument result_type
to return the results in different formats.AsyncJob.to_df()
returns a DataFrame
built from the result of this asynchronous job.AsyncJob.query()
returns the SQL text of the executed query.DataFrame.agg()
and RelationalGroupedDataFrame.agg()
now accept variable-length arguments.lsuffix
and rsuffix
to DataFram.join()
and DataFrame.cross_join()
to conveniently rename overlapping columns.Table.drop_table()
so you can drop the temp table after DataFrame.cache_result()
. Table
is also a context manager so you can use the with
statement to drop the cache temp table after use.Session.use_secondary_roles()
.first_value()
and last_value()
. (contributed by @chasleslr)on
as an alias for using_columns
and how
as an alias for join_type
in DataFrame.join()
.Session.create_dataframe()
that raised an error when schema
names had special characters.Session.read.option()
were not passed to DataFrame.copy_into_table()
as default values.DataFrame.copy_into_table()
raises an error when a copy option has single quotes in the value.Session.add_packages()
now raises ValueError
when the version of a package cannot be found in Snowflake Anaconda channel. Previously, Session.add_packages()
succeeded, and a SnowparkSQLException
exception was raised later in the UDF/SP registration step.FileOperation.get_stream()
to support downloading stage files as stream.functions.ntiles()
to accept int argument.functions.call_function()
for functions.call_builtin()
.functions.function()
for functions.builtin()
.DataFrame.order_by()
for DataFrame.sort()
DataFrame.orderBy()
for DataFrame.sort()
DataFrame.cache_result()
to return a more accurate Table
class instead of a DataFrame
class.session
as the first argument when calling StoredProcedure
.Session.sql_simplifier_enabled = True
.DataFrame.select()
, DataFrame.with_column()
, DataFrame.drop()
and other select-related APIs have more flattened SQLs.DataFrame.union()
, DataFrame.union_all()
, DataFrame.except_()
, DataFrame.intersect()
, DataFrame.union_by_name()
have flattened SQLs generated when multiple set operators are chained.Table.update()
, Table.delete()
, Table.merge()
try to reference a temp table that does not exist.block
to the following action APIs on Snowpark dataframes (which execute queries) to allow asynchronous evaluations:
DataFrame.collect()
, DataFrame.to_local_iterator()
, DataFrame.to_pandas()
, DataFrame.to_pandas_batches()
, DataFrame.count()
, DataFrame.first()
.DataFrameWriter.save_as_table()
, DataFrameWriter.copy_into_location()
.Table.delete()
, Table.update()
, Table.merge()
.DataFrame.collect_nowait()
to allow asynchronous evaluations.AsyncJob
to retrieve results from asynchronously executed queries and check their status.table_type
in Session.write_pandas()
. You can now choose from these table_type
options: "temporary"
, "temp"
, and "transient"
.list
, tuple
and dict
) as literal values in Snowpark.execute_as
to functions.sproc()
and session.sproc.register()
to allow registering a stored procedure as a caller or owner.DataFrame.copy_into_table()
and DataFrameWriter.save_as_table()
mistakenly created a new table if the table name is fully qualified, and the table already exists.create_temp_table
in Session.write_pandas()
.snowflake-connector-python
to 2.7.12.source_code_display
as False
when calling register()
or @udf()
.DataFrame.select()
, DataFrame.with_column()
and DataFrame.with_columns()
which now take parameters of type table_function.TableFunctionCall
for columns.overwrite
to session.write_pandas()
to allow overwriting contents of a Snowflake table with that of a pandas DataFrame.column_order
to df.write.save_as_table()
to specify the matching rules when inserting data into table in append mode.FileOperation.put_stream()
to upload local files to a stage via file stream.TableFunctionCall.alias()
and TableFunctionCall.as_()
to allow aliasing the names of columns that come from the output of table function joins.get_active_session()
in module snowflake.snowpark.context
to get the current active Snowpark session.statement_params
is not passed to the function.session.create_dataframe()
is called with dicts and a given schema.df.write.save_as_table()
.function.uniform()
to infer the types of inputs max_
and min_
and cast the limits to IntegerType
or FloatType
correspondingly.statement_params
to the following methods to allow for specifying statement level parameters:
collect
, to_local_iterator
, to_pandas
, to_pandas_batches
,
count
, copy_into_table
, show
, create_or_replace_view
, create_or_replace_temp_view
, first
, cache_result
and random_split
on class snowflake.snowpark.Dateframe
.update
, delete
and merge
on class snowflake.snowpark.Table
.save_as_table
and copy_into_location
on class snowflake.snowpark.DataFrameWriter
.approx_quantile
, statement_params
, cov
and crosstab
on class snowflake.snowpark.DataFrameStatFunctions
.register
and register_from_file
on class snowflake.snowpark.udf.UDFRegistration
.register
and register_from_file
on class snowflake.snowpark.udtf.UDTFRegistration
.register
and register_from_file
on class snowflake.snowpark.stored_procedure.StoredProcedureRegistration
.udf
, udtf
and sproc
in snowflake.snowpark.functions
.Column
as an input argument to session.call()
.table_type
in df.write.save_as_table()
. You can now choose from these table_type
options: "temporary"
, "temp"
, and "transient"
.session.use_*
methods.session.create_dataframe()
.session.create_dataframe()
mistakenly converted 0 and False
to None
when the input data was only a list.session.create_dataframe()
using a large local dataset sometimes created a temp table twice.function.trim()
with the SQL function definition.sum
vs. the Snowpark function.sum()
.create_temp_table
in df.write.save_as_table()
.snowflake.snowpark.functions.udtf()
to register a UDTF, or use it as a decorator to register the UDTF.
Session.udtf.register()
to register a UDTF.Session.udtf.register_from_file()
to register a UDTF from a Python file.snowflake.snowpark.functions.table_function()
to create a callable representing a table function and use it to call the table function in a query.snowflake.snowpark.functions.call_table_function()
to call a table function.over
clause that specifies partition by
and order by
when lateral joining a table function.Session.table_function()
and DataFrame.join_table_function()
to accept TableFunctionCall
instances.functions.udf()
and functions.sproc()
, you can now specify an empty list for the imports
or packages
argument to indicate that no import or package is used for this UDF or stored procedure. Previously, specifying an empty list meant that the function would use session-level imports or packages.__repr__
implementation of data types in types.py
. The unused type_name
property has been removed.ProgrammingError
from the Python connector.DataFrame.to_pandas()
.DataFrameReader.parquet()
failed to read a parquet file when its column contained spaces.DataFrame.copy_into_table()
failed when the dataframe is created by reading a file with inferred schemas.Session.flatten()
and DataFrame.flatten()
.
cloudpickle
<= 2.0.0
.current_session()
, current_statement()
, current_user()
, current_version()
, current_warehouse()
, date_from_parts()
, date_trunc()
, dayname()
, dayofmonth()
, dayofweek()
, dayofyear()
, grouping()
, grouping_id()
, hour()
, last_day()
, minute()
, next_day()
, previous_day()
, second()
, month()
, monthname()
, quarter()
, year()
, current_database()
, current_role()
, current_schema()
, current_schemas()
, current_region()
, current_available_roles()
, add_months()
, any_value()
, bitnot()
, bitshiftleft()
, bitshiftright()
, convert_timezone()
, uniform()
, strtok_to_array()
, sysdate()
, time_from_parts()
, timestamp_from_parts()
, timestamp_ltz_from_parts()
, timestamp_ntz_from_parts()
, timestamp_tz_from_parts()
, weekofyear()
, percentile_cont()
to snowflake.snowflake.functions
.DataFrame.groupByGroupingSets()
, DataFrame.naturalJoin()
, DataFrame.joinTableFunction
, DataFrame.withColumns()
, Session.getImports()
, Session.addImport()
, Session.removeImport()
, Session.clearImports()
, Session.getSessionStage()
, Session.getDefaultDatabase()
, Session.getDefaultSchema()
, Session.getCurrentDatabase()
, Session.getCurrentSchema()
, Session.getFullyQualifiedCurrentSchema()
.DataFrame
with a specific schema using the Session.create_dataframe()
method.INFO
to DEBUG
for several logs (e.g., the executed query) when evaluating a dataframe.Session.create_dataframe()
method.typing-extension
as a new dependency with the version >= 4.1.0
.Session.sproc
property and sproc()
to snowflake.snowpark.functions
, so you can register stored procedures.Session.call
to call stored procedures by name.UDFRegistration.register_from_file()
to allow registering UDFs from Python source files or zip files directly.UDFRegistration.describe()
to describe a UDF.DataFrame.random_split()
to provide a way to randomly split a dataframe.md5()
, sha1()
, sha2()
, ascii()
, initcap()
, length()
, lower()
, lpad()
, ltrim()
, rpad()
, rtrim()
, repeat()
, soundex()
, regexp_count()
, replace()
, charindex()
, collate()
, collation()
, insert()
, left()
, right()
, endswith()
to snowflake.snowpark.functions
.call_udf()
to accept literal values.distinct
keyword in array_agg()
.DataFrame.to_pandas()
to have a string column if Column.cast(IntegerType())
was used.DataFrame.describe()
when there is more than one string column.add_packages()
, get_packages()
, clear_packages()
, and remove_package()
, to class Session
.add_requirements()
to Session
so you can use a requirements file to specify which packages this session will use.packages
to function snowflake.snowpark.functions.udf()
and method UserDefinedFunction.register()
to indicate UDF-level Anaconda package dependencies when creating a UDF.imports
to snowflake.snowpark.functions.udf()
and UserDefinedFunction.register()
to specify UDF-level code imports.session
to function udf()
and UserDefinedFunction.register()
so you can specify which session to use to create a UDF if you have multiple sessions.Geography
and Variant
to snowflake.snowpark.types
to be used as type hints for Geography and Variant data when defining a UDF.Table
, a subclass of DataFrame
for table operations:
update
and delete
update and delete rows of a table in Snowflake.merge
merges data from a DataFrame
to a Table
.DataFrame.sample()
with an additional parameter seed
, which works on tables but not on view and sub-queries.DataFrame.to_local_iterator()
and DataFrame.to_pandas_batches()
to allow getting results from an iterator when the result set returned from the Snowflake database is too large.DataFrame.cache_result()
for caching the operations performed on a DataFrame
in a temporary table.
Subsequent operations on the original DataFrame
have no effect on the cached result DataFrame
.DataFrame.queries
to get SQL queries that will be executed to evaluate the DataFrame
.Session.query_history()
as a context manager to track SQL queries executed on a session, including all SQL queries to evaluate DataFrame
s created from a session. Both query ID and query text are recorded.Session
instance from an existing established snowflake.connector.SnowflakeConnection
. Use parameter connection
in Session.builder.configs()
.use_database()
, use_schema()
, use_warehouse()
, and use_role()
to class Session
to switch database/schema/warehouse/role after a session is created.DataFrameWriter.copy_into_table()
to unload a DataFrame
to stage files.DataFrame.unpivot()
.Column.within_group()
for sorting the rows by columns with some aggregation functions.listagg()
, mode()
, div0()
, acos()
, asin()
, atan()
, atan2()
, cos()
, cosh()
, sin()
, sinh()
, tan()
, tanh()
, degrees()
, radians()
, round()
, trunc()
, and factorial()
to snowflake.snowflake.functions
.ignore_nulls
in function lead()
and lag()
.condition
parameter of function when()
and iff()
now accepts SQL expressions.Session
and replaced them with their snake case equivalents: getImports()
, addImports()
, removeImport()
, clearImports()
, getSessionStage()
, getDefaultSchema()
, getDefaultSchema()
, getCurrentDatabase()
, getFullyQualifiedCurrentSchema()
.DataFrame
and replaced them with their snake case equivalents: groupingByGroupingSets()
, naturalJoin()
, withColumns()
, joinTableFunction()
.DataFrame.columns
is now consistent with DataFrame.schema.names
and the Snowflake database Identifier Requirements
.Column.__bool__()
now raises a TypeError
. This will ban the use of logical operators and
, or
, not
on Column
object, for instance col("a") > 1 and col("b") > 2
will raise the TypeError
. Use (col("a") > 1) & (col("b") > 2)
instead.PutResult
and GetResult
to subclass NamedTuple
.DataFrame.describe()
so that non-numeric and non-string columns are ignored instead of raising an exception.snowflake-connector-python
to 2.7.4.Column.isin()
, with an alias Column.in_()
.Column.try_cast()
, which is a special version of cast()
. It tries to cast a string expression to other types and returns null
if the cast is not possible.Column.startswith()
and Column.substr()
to process string columns.Column.cast()
now also accepts a str
value to indicate the cast type in addition to a DataType
instance.DataFrame.describe()
to summarize stats of a DataFrame
.DataFrame.explain()
to print the query plan of a DataFrame
.DataFrame.filter()
and DataFrame.select_expr()
now accepts a sql expression.bool
parameter create_temp_table
to methods DataFrame.saveAsTable()
and Session.write_pandas()
to optionally create a temp table.DataFrame.minus()
and DataFrame.subtract()
as aliases to DataFrame.except_()
.regexp_replace()
, concat()
, concat_ws()
, to_char()
, current_timestamp()
, current_date()
, current_time()
, months_between()
, cast()
, try_cast()
, greatest()
, least()
, and hash()
to module snowflake.snowpark.functions
.Session.createDataFrame(pandas_df)
and Session.write_pandas(pandas_df)
raise an exception when the pandas DataFrame
has spaces in the column name.DataFrame.copy_into_table()
sometimes prints an error
level log entry while it actually works. It's fixed now.DataFrame
APIs are missing from the docs.snowflake-connector-python
to 2.7.2, which upgrades pyarrow
dependency to 6.0.x. Refer to the python connector 2.7.2 release notes for more details.Session.createDataFrame()
method for creating a DataFrame
from a pandas DataFrame.Session.write_pandas()
method for writing a pandas DataFrame
to a table in Snowflake and getting a Snowpark DataFrame
object back.cume_dist()
, to find the cumulative distribution of a value with regard to other values within a window partition,
and row_number()
, which returns a unique row number for each row within a window partition.DataFrameStatFunctions
class.DataFrameNaFunctions
class.rollup()
, cube()
, and pivot()
to the DataFrame
class.GroupingSets
class, which you can use with the DataFrame groupByGroupingSets method to perform a SQL GROUP BY GROUPING SETS.FileOperation(session)
class that you can use to upload and download files to and from a stage.DataFrame.copy_into_table()
method for loading data from files in a stage into a table.when()
and otherwise()
now accept Python types in addition to Column
objects.replace
parameter to True
to overwrite an existing UDF with the same name.df.select(when(col("a") == 1, 4).otherwise(col("a"))), [Row(4), Row(2), Row(3)]
raised an exception.df.toPandas()
raised an exception when a DataFrame was created from large local data.Start of Private Preview