
Security Fundamentals
Turtles, Clams, and Cyber Threat Actors: Shell Usage
The Socket Threat Research Team uncovers how threat actors weaponize shell techniques across npm, PyPI, and Go ecosystems to maintain persistence and exfiltrate data.
By Ryan Gerard Wilson (https://ryangerardwilson.com)
This library is meant to protect your eyes (and brain) from OOP syntax. It is unbelievably sad that some of the best work done in creating math and data science libraries in Python has been corrupted by the OOP mind-virus.
By creating a functional-programming wrapper around these libraries, we aim to soothe. This library assumes a Linux environment and the existence of a .rgwfuncsrc
file for certain features (like database querying, sending data to Slack, etc.).
Install the package using:
pip install rgwfuncs
.rgwfuncsrc
FileA .rgwfuncsrc
file (located at `vi ~/.rgwfuncsrc) is required for MSSQL, CLICKHOUSE, MYSQL, GOOGLE BIG QUERY, SLACK, TELEGRAM, and GMAIL integrations.
{
"db_presets" : [
{
"name": "mssql_db9",
"db_type": "mssql",
"host": "",
"username": "",
"password": "",
"database": ""
},
{
"name": "clickhouse_db7",
"db_type": "clickhouse",
"host": "",
"username": "",
"password": "",
"database": ""
},
{
"name": "mysql_db2",
"db_type": "mysql",
"host": "",
"username": "",
"password": "",
"database": ""
},
{
"name": "bq_db1",
"db_type": "google_big_query",
"json_file_path": "",
"project_id": ""
},
{
"name": "athena_db1",
"db_type": "aws_athena",
"aws_access_key": "",
"aws_secret_key": "",
"aws_region: "",
"database": "logs",
"output_bucket": "s3://bucket-name"
}
],
"vm_presets": [
{
"name": "main_server",
"host": "",
"ssh_user": "",
"ssh_key_path": ""
}
],
"cloud_storage_presets": [
{
"name": "gcs_bucket_name",
"credential_path": "/path/to/your/credentials.json"
}
],
"telegram_bot_presets": [
{
"name": "rgwml-bot",
"chat_id": "",
"bot_token": ""
}
],
"slack_bot_presets": [
{
"name": "labs-channel",
"channel_id": "",
"bot_token": ""
}
],
"gmail_bot_presets": [
{
"name": "info@xyz.com",
"service_account_credentials_path": "/path/to/your/credentials.json"
}
]
}
Import the library:
import rgwfuncs
View available function docstrings in alphabetical order:
rgwfuncs.docs()
View specific docstrings by providing a filter (comma-separated). For example, to display docstrings about "numeric_clean":
rgwfuncs.docs(method_type_filter='numeric_clean')
To display all docstrings, use:
rgwfuncs.docs(method_type_filter='*')
Print a list of available function names in alphabetical order. If a filter is provided, print the docstrings of functions containing the term.
• Parameters:
method_type_filter
(str): Optional, comma-separated to select docstring types, or '*' for all.• Example:
from rgwfuncs import docs
docs(method_type_filter='numeric_clean,limit_dataframe')
This section includes functions that facilitate launching an interactive Python shell to inspect and modify local variables within the user's environment.
interactive_shell
Launches an interactive prompt for inspecting and modifying local variables, making all methods in the rgwfuncs library available by default. This REPL (Read-Eval-Print Loop) environment supports command history and autocompletion, making it easier to interact with your Python code. This function is particularly useful for debugging purposes when you want real-time interaction with your program's execution environment.
• Parameters:
local_vars
(dict, optional): A dictionary of local variables to be accessible within the interactive shell. If not provided, defaults to an empty dictionary.• Usage:
• Example:
from rgwfuncs import interactive_shell
import pandas as pd
import numpy as np
# Example DataFrame
df = pd.DataFrame({
'id': [1, 2, 3, 4, 5],
'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
'age': [30, 25, 35, 28, 22],
'city': ['New York', 'Los Angeles', 'Chicago', 'San Francisco', 'Boston']
})
# Launch the interactive shell with local variables
interactive_shell(locals())
Subsequently, in the interactive shell you can use any library in your python file, as well as all rgwfuncs methods (even if they are not imported). Notice, that while pandas and numpy are available in the shell as a result of importing them in the above script, the rgwfuncs method first_n_rows
was not imported - yet is available for use.
Welcome to the rgwfuncs interactive shell.
>>> pirst_n_rows(df, 2)
Traceback (most recent call last):
File "<console>", line 1, in <module>
NameError: name 'pirst_n_rows' is not defined. Did you mean: 'first_n_rows'?
>>> first_n_rows(df, 2)
{'age': '30', 'city': 'New York', 'id': '1', 'name': 'Alice'}
{'age': '25', 'city': 'Los Angeles', 'id': '2', 'name': 'Bob'}
>>> print(df)
id name age city
0 1 Alice 30 New York
1 2 Bob 25 Los Angeles
2 3 Charlie 35 Chicago
3 4 David 28 San Francisco
4 5 Eva 22 Boston
>>> arr = np.array([1, 2, 3, 4, 5])
>>> arr
array([1, 2, 3, 4, 5])
This section provides comprehensive functions for handling algebraic expressions, performing tasks such as computation, simplification, solving equations, and prime factorization, all outputted in LaTeX format.
compute_prime_factors
Computes prime factors of a number and presents them in LaTeX format.
• Parameters:
n
(int): The integer to factorize.• Returns:
str
: Prime factorization in LaTeX.• Example:
from rgwfuncs import compute_prime_factors
factors_1 = compute_prime_factors(100)
print(factors_1) # Output: "2^{2} \cdot 5^{2}"
factors_2 = compute_prime_factors(60)
print(factors_2) # Output: "2^{2} \cdot 3 \cdot 5"
factors_3 = compute_prime_factors(17)
print(factors_3) # Output: "17"
compute_constant_expression
Computes the numerical result of a given expression, which can evaluate to a constant, represented as a float. Evaluates an constant expression provided as a string and returns the computed result. Supports various arithmetic operations, including addition, subtraction, multiplication, division, and modulo, as well as mathematical functions from the math module.
• Parameters:
expression
(str): The constant expression to compute. This should be a string consisting of arithmetic operations and Python's math module functions.• Returns:
float
: The computed numerical result.• Example:
from rgwfuncs import compute_constant_expression
result1 = compute_constant_expression("2 + 2")
print(result1) # Output: 4.0
result2 = compute_constant_expression("10 % 3")
print(result2) # Output: 1.0
result3 = compute_constant_expression("math.gcd(36, 60) * math.sin(math.radians(45)) * 10000")
print(result3) # Output: 84852.8137423857
compute_constant_expression_involving_matrices
Computes the result of a constant expression involving matrices and returns it as a LaTeX string.
• Parameters:
expression
(str): The constant expression involving matrices. Example format includes operations such as "+", "-", "*", "/".• Returns:
str
: The LaTeX-formatted string representation of the computed matrix, or an error message if the operations cannot be performed due to dimensional mismatches.• Example:
from rgwfuncs import compute_constant_expression_involving_matrices
# Example with addition of 2D matrices
result = compute_constant_expression_involving_matrices("[[2, 6, 9], [1, 3, 5]] + [[1, 2, 3], [4, 5, 6]]")
print(result) # Output: \begin{bmatrix}3 & 8 & 12\\5 & 8 & 11\end{bmatrix}
# Example of mixed operations with 1D matrices treated as 2D
result = compute_constant_expression_involving_matrices("[3, 6, 9] + [1, 2, 3] - [2, 2, 2]")
print(result) # Output: \begin{bmatrix}2 & 6 & 10\end{bmatrix}
# Example with dimension mismatch
result = compute_constant_expression_involving_matrices("[[4, 3, 51]] + [[1, 1]]")
print(result) # Output: Operations between matrices must involve matrices of the same dimension
compute_constant_expression_involving_ordered_series
Computes the result of a constant expression involving ordered series, and returns it as a Latex string.
• Parameters:
expression
(str): A series operation expression. Supports operations such as "+", "-", "*", "/", and dd()
for discrete differences.• Returns:
str
: The string representation of the resultant series after performing operations, or an error message if series lengths do not match.• Example:
from rgwfuncs import compute_constant_expression_involving_ordered_series
# Example with addition and discrete differences
result = compute_constant_expression_involving_ordered_series("dd([2, 6, 9, 60]) + dd([78, 79, 80])")
print(result) # Output: [4, 3, 51] + [1, 1]
# Example with elementwise subtraction
result = compute_constant_expression_involving_ordered_series("[10, 15, 21] - [5, 5, 5]")
print(result) # Output: [5, 10, 16]
# Example with length mismatch
result = compute_constant_expression_involving_ordered_series("[4, 3, 51] + [1, 1]")
print(result) # Output: Operations between ordered series must involve series of equal length
python_polynomial_expression_to_latex
Converts a polynomial expression written in Python syntax to a LaTeX formatted string. This function parses algebraic expressions provided as strings using Python’s syntax and translates them into equivalent LaTeX representations, making them suitable for academic or professional documentation. The function supports inclusion of named variables, with an option to substitute specific values into the expression.
• Parameters:
expression
(str): The algebraic expression to convert to LaTeX. This should be a string formatted with Python syntax acceptable by SymPy.subs
(Optional[Dict[str, float]]): An optional dictionary of substitutions where the keys are variable names in the expression, and the values are the numbers with which to substitute those variables.• Returns:
str
: The LaTeX formatted string equivalent to the provided expression.• Raises:
ValueError
: If the expression cannot be parsed due to syntax errors.• Example:
from rgwfuncs import python_polynomial_expression_to_latex
# Convert a simple polynomial expression to LaTeX format
latex_result1 = python_polynomial_expression_to_latex("x**2 + y**2")
print(latex_result1) # Output: "x^{2} + y^{2}"
# Convert polynomial expression with substituted values
latex_result2 = python_polynomial_expression_to_latex("x**2 + y**2", {"x": 3, "y": 4})
print(latex_result2) # Output: "25"
# Another example with partial substitution
latex_result3 = python_polynomial_expression_to_latex("x**2 + y**2", {"x": 3})
print(latex_result3) # Output: "y^{2} + 9"
# Trigonometric functions included with symbolic variables
latex_result4 = python_polynomial_expression_to_latex("sin(x+z**2) + cos(y)", {"x": 55})
print(latex_result4) # Output: "cos y + sin \\left(z^{2} + 55\\right)"
# Simplified trigonometric functions example with substitution
latex_result5 = python_polynomial_expression_to_latex("sin(x) + cos(y)", {"x": 0})
print(latex_result5) # Output: "cos y"
expand_polynomial_expression
Expands a polynomial expression written in Python syntax and converts it into a LaTeX formatted string. This function takes algebraic expressions provided as strings using Python's syntax, applies polynomial expansion through SymPy, and translates them into LaTeX representations, suitable for academic or professional documentation. It supports expressions with named variables and provides an option to substitute specific values into the expression before expansion.
• Parameters:
expression
(str): The algebraic expression to expand and convert to LaTeX. This string should be formatted using Python syntax acceptable by SymPy.subs
(Optional[Dict[str, float]]): An optional dictionary of substitutions where the keys are variable names in the expression, and the values are the numbers with which to substitute those variables before expanding.• Returns:
str
: The LaTeX formatted string of the expanded expression.• Raises:
ValueError
: If the expression cannot be parsed due to syntax errors.• Example:
from rgwfuncs import expand_polynomial_expression
# Expand a simple polynomial expression and convert to LaTeX
latex_result1 = expand_polynomial_expression("(x + y)**2")
print(latex_result1) # Output: "x^{2} + 2 x y + y^{2}"
# Expand polynomial expression with substituted values
latex_result2 = expand_polynomial_expression("(x + y)**2", {"x": 3, "y": 4})
print(latex_result2) # Output: "49"
# Another example with partial substitution
latex_result3 = expand_polynomial_expression("(x + y)**2", {"x": 3})
print(latex_result3) # Output: "y^{2} + 6 y + 9"
# Handling trigonometric functions with symbolic variables
latex_result4 = expand_polynomial_expression("sin(x + z**2) + cos(y)", {"x": 55})
print(latex_result4) # Output: "cos y + sin \\left(z^{2} + 55\\right)"
factor_polynomial_expression
Factors a polynomial expression written in Python syntax and converts it into a LaTeX formatted string. This function parses an algebraic expression, performs polynomial factoring using SymPy, and converts the factored expression into a LaTeX representation, ideal for academic or professional use. Optional substitutions can be made before factoring.
• Parameters:
expression
(str): The polynomial expression to factor and convert to LaTeX. This should be a valid expression formatted using Python syntax.subs
(Optional[Dict[str, float]]): An optional dictionary of substitutions. The keys are variable names in the expression, and the values are numbers that replace these variables.• Returns:
str
: The LaTeX formatted string representing the factored expression.• Raises:
ValueError
: If the expression cannot be parsed due to syntax errors.• Example:
from rgwfuncs import factor_polynomial_expression
# Factor a polynomial expression and convert to LaTeX
latex_result1 = factor_polynomial_expression("x**2 - 4")
print(latex_result1) # Output: "\left(x - 2\right) \left(x + 2\right)"
# Factor with substituted values
latex_result2 = factor_polynomial_expression("x**2 - y**2", {"y": 3})
print(latex_result2) # Output: "\left(x - 3\right) \left(x + 3\right)"
simplify_polynomial_expression
Simplifies an algebraic expression in polynomial form and returns it in LaTeX format. Takes an algebraic expression, in polynomial form, written in Python syntax and simplifies it. The result is returned as a LaTeX formatted string, suitable for academic or professional documentation.
• Parameters:
expression
(str): The algebraic expression, in polynomial form, to simplify. For instance, the expression 'np.diff(8*x**30) where as 'np.diff([2,5,9,11)' is not a polynomial.subs
(Optional[Dict[str, float]]): An optional dictionary of substitutions where keys are variable names and values are the numbers to substitute them with.• Returns:
str
: The simplified expression formatted as a LaTeX string.• Example:
from rgwfuncs import simplify_polynomial_expression
# Example 1: Simplifying a polynomial expression without substitutions
simplified_expr1 = simplify_polynomial_expression("2*x + 3*x")
print(simplified_expr1) # Output: "5 x"
# Example 2: Simplifying a complex expression involving derivatives
simplified_expr2 = simplify_polynomial_expression("(np.diff(3*x**8)) / (np.diff(8*x**30) * 11*y**3)")
print(simplified_expr2) # Output: r"\frac{1}{110 x^{22} y^{3}}"
# Example 3: Simplifying with substitutions
simplified_expr3 = simplify_polynomial_expression("x**2 + y**2", subs={"x": 3, "y": 4})
print(simplified_expr3) # Output: "25"
# Example 4: Simplifying with partial substitution
simplified_expr4 = simplify_polynomial_expression("a*b + b", subs={"b": 2})
print(simplified_expr4) # Output: "2 a + 2"
cancel_polynomial_expression
Cancels common factors within a polynomial expression written in Python syntax and converts it to a LaTeX formatted string. This function parses an algebraic expression, cancels common factors using SymPy, and translates the reduced expression into a LaTeX representation. It can also accommodate optional substitutions to be made prior to simplification.
• Parameters:
expression
(str): The algebraic expression to simplify and convert to LaTeX. This string should be formatted using Python syntax.subs
(Optional[Dict[str, float]]): An optional dictionary of substitutions where the keys are variable names in the expression, and the values are the numbers to substitute.• Returns:
str
: The LaTeX formatted string of the simplified expression. If the expression involves indeterminate forms due to operations like division by zero, a descriptive error message is returned instead.• Raises:
ValueError
: If the expression cannot be parsed due to syntax errors or involves undefined operations, such as division by zero.• Example:
from rgwfuncs import cancel_polynomial_expression
# Cancel common factors within a polynomial expression
latex_result1 = cancel_polynomial_expression("(x**2 - 4) / (x - 2)")
print(latex_result1) # Output: "x + 2"
# Cancel with substituted values
latex_result2 = cancel_polynomial_expression("(x**2 - 4) / (x - 2)", {"x": 2})
print(latex_result2) # Output: "Undefined result. This could be a division by zero error."
solve_homogeneous_polynomial_expression
Solves a homogeneous polynomial expression for a specified variable and returns solutions in LaTeX format. Assumes that the expression is homoegeneous (i.e. equal to zero), and solves for a designated variable. May optionally include substitutions for other variables in the equation. The solutions are provided as a LaTeX formatted string. The method solves equations for specified variables, with optional substitutions, returning LaTeX-formatted solutions.
• Parameters:
expression
(str): A string of the homogeneous polynomial expression to solve.variable
(str): The variable to solve for.subs
(Optional[Dict[str, float]]): Substitutions for variables.• Returns:
str
: Solutions formatted in LaTeX.• Example:
from rgwfuncs import solve_homogeneous_polynomial_expression
solutions1 = solve_homogeneous_polynomial_expression("a*x**2 + b*x + c", "x", {"a": 3, "b": 7, "c": 5})
print(solutions1) # Output: "\left[-7/6 - sqrt(11)*I/6, -7/6 + sqrt(11)*I/6\right]"
solutions2 = solve_homogeneous_polynomial_expression("x**2 - 4", "x")
print(solutions2) # Output: "\left[-2, 2\right]"
plot_polynomial_functions
This function plots polynomial functions described by a list of expressions and their corresponding substitution dictionaries. It generates SVG markup of the plots, with options for specifying the domain, axis zoom, and legend display.
• Parameters:
functions
(List[Dict[str, Dict[str, Any]]]
): A list of dictionaries, each containing:
"x**2"
, "np.diff(x,2)"
)."x"
key, either as "*"
for default domain or a NumPy array.zoom
(float
): Determines the numeric axis range from -zoom
to +zoom
for both x and y axes (default is 10.0
).show_legend
(bool
): Specifies whether to include a legend in the plot (default is True
).open_file
(bool
): If saving to path is not desireable, opens the svg as a temp file, else opens the file from the actual location using the system's default viewer (defaults to False).save_path
(Optional[str]
): If specified, saves the output string as a .svg at the indicated path (defaults to None).• Returns:
str
: The raw SVG markup of the resulting plot.• Example:
from rgwfuncs import plot_polynomial_functions
# Generate the SVG
plot_svg_string = plot_polynomial_functions(
functions=[
{"x**2": {"x": "*"}}, # Single expression, "*" means plot all discernable points
{"x**2/(2 + a) + a": {"x": np.linspace(-3, 4, 101), "a": 1.23}},
{"np.diff(x**3, 2)": {"x": np.linspace(-2, 2, 10)}}
],
zoom=2,
open_file=True,
save_path="plot_polynomial_functions_example_1.svg"
)
• Displaying the SVG:
plot_x_points_of_polynomial_functions
This function plots one or more expressions described by a list of dictionaries. For each item in the list, the function evaluates the given Python/NumPy expression at the specified x-values (converted to NumPy arrays if they are Python lists) and plots the resulting points on a single figure.
• Parameters:
functions
(List[Dict[str, Dict[str, Any]]]
): A list of one or more items, each of which has exactly one key-value pair:
str
): A valid Python/NumPy expression (e.g., x**2
, np.sin(x)
, x - a
).Dict[str, Any]
): Must assign x
a valuezoom
(float
): Numeric range from -zoom to +zoom on both x and y axes (default 10.0).show_legend
(bool
): If True, includes a legend (default True).open_file
(bool):
save_path
is given, opens that file with the system viewer.save_path
is None, writes to a temporary file and opens it (default False).save_path
(Optional[str]
): If specified, writes the resulting SVG to this path (default None).• Returns:
str
: The raw SVG markup of the resulting scatter plot.• Example:
from rgwfuncs import plot_x_points_of_polynomial_functions
import numpy as np
svg_output = plot_x_points_of_polynomial_functions(
functions=[
{"x**2": {"x": np.array([-2, -1, 0, 1, 2])}},
{"np.sin(x)": {"x": np.linspace(0, 2*np.pi, 5)}},
{"x - 3": {"x": np.array([-2, 0, 2, 4, 6])}}
],
zoom=6,
show_legend=True,
open_file=True,
save_path="plot_x_points_of_polynomial_functions_example_5.svg"
)
• Displaying the SVG:
Send a message to a Telegram chat using a specified preset from your configuration file.
• Parameters:
preset_name
(str): The name of the preset to use for sending the message. This should match a preset in the configuration file.message
(str): The message text that you want to send to the Telegram chat.• Raises:
RuntimeError
: If the preset is not found in the configuration file or if necessary details (bot token or chat ID) are missing.• Example:
from rgwfuncs import send_telegram_message
preset_name = "daily_updates"
message = "Here is your daily update!"
send_telegram_message(preset_name, message)
Below is a quick reference of available functions, their purpose, and basic usage examples.
numeric_clean
Cleans the numeric columns in a DataFrame according to specified treatments.
• Parameters:
column_names
(str): A comma-separated string containing the names of the columns to clean.column_type
(str): The type to convert the column to (INTEGER
or FLOAT
).irregular_value_treatment
(str): How to treat irregular values (NAN
, TO_ZERO
, MEAN
).• Returns:
• Example:
from rgwfuncs import numeric_clean
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
'col1': [1, 2, 3, 'x', 4],
'col2': [10.5, 20.1, 'not_a_number', 30.2, 40.8]
})
# Clean numeric columns
df_cleaned = numeric_clean(df, 'col1,col2', 'FLOAT', 'MEAN')
print(df_cleaned)
limit_dataframe
Limit the DataFrame to a specified number of rows.
• Parameters:
num_rows
(int): The number of rows to retain.• Returns:
• Example:
from rgwfuncs import limit_dataframe
import pandas as pd
df = pd.DataFrame({'A': range(10), 'B': range(10, 20)})
df_limited = limit_dataframe(df, 5)
print(df_limited)
from_raw_data
Create a DataFrame from raw data.
• Parameters:
• Returns:
• Example:
from rgwfuncs import from_raw_data
headers = ["Name", "Age"]
data = [
["Alice", 30],
["Bob", 25],
["Charlie", 35]
]
df = from_raw_data(headers, data)
print(df)
append_rows
Append rows to the DataFrame.
• Parameters:
• Returns:
• Example:
from rgwfuncs import append_rows
import pandas as pd
df = pd.DataFrame({'Name': ['Alice'], 'Age': [30]})
new_rows = [
['Bob', 25],
['Charlie', 35]
]
df_appended = append_rows(df, new_rows)
print(df_appended)
append_columns
Append new columns to the DataFrame with None values.
• Parameters:
col_names
(list or comma-separated string): The names of the columns to add.• Returns:
• Example:
from rgwfuncs import append_columns
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
df_new = append_columns(df, ['Salary', 'Department'])
print(df_new)
update_rows
Update specific rows in the DataFrame based on a condition.
• Parameters:
• Returns:
• Example:
from rgwfuncs import update_rows
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
df_updated = update_rows(df, "Name == 'Alice'", {'Age': 31})
print(df_updated)
delete_rows
Delete rows from the DataFrame based on a condition.
• Parameters:
• Returns:
• Example:
from rgwfuncs import delete_rows
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
df_deleted = delete_rows(df, "Age < 28")
print(df_deleted)
drop_duplicates
Drop duplicate rows in the DataFrame, retaining the first occurrence.
• Parameters:
• Returns:
• Example:
from rgwfuncs import drop_duplicates
import pandas as pd
df = pd.DataFrame({'A': [1,1,2,2], 'B': [3,3,4,4]})
df_no_dupes = drop_duplicates(df)
print(df_no_dupes)
drop_duplicates_retain_first
Drop duplicate rows based on specified columns, retaining the first occurrence.
• Parameters:
• Returns:
• Example:
from rgwfuncs import drop_duplicates_retain_first
import pandas as pd
df = pd.DataFrame({'A': [1,1,2,2], 'B': [3,3,4,4]})
df_no_dupes = drop_duplicates_retain_first(df, 'A')
print(df_no_dupes)
drop_duplicates_retain_last
Drop duplicate rows based on specified columns, retaining the last occurrence.
• Parameters:
• Returns:
• Example:
from rgwfuncs import drop_duplicates_retain_last
import pandas as pd
df = pd.DataFrame({'A': [1,1,2,2], 'B': [3,3,4,4]})
df_no_dupes = drop_duplicates_retain_last(df, 'A')
print(df_no_dupes)
load_data_from_query
Load data from a specified database using a SQL query and return the results in a Pandas DataFrame. The database connection configurations are determined by a preset name specified in a configuration file.
db_preset_name
(str): The name of the database preset found in the configuration file. This preset determines which database connection details to use.query
(str): The SQL query string to be executed on the database.config
(Optional[Union[str, dict]], optional): Configuration source. Can be:
pd.DataFrame
: Returns a DataFrame that contains the results from the executed SQL query.~/.rgwfuncsrc
.name
: Name of the database preset.db_type
: Type of the database (mssql
, mysql
, clickhouse
, google_big_query
, aws_athena
).credentials
: Necessary credentials such as host, username, password, and potentially others depending on the database type.from rgwfuncs import load_data_from_query
# Load data using a preset configuration
df = load_data_from_query(
db_preset_name="MyDBPreset",
query="SELECT * FROM my_table"
)
print(df)
~/.rgwfuncsrc
) is secure and accessible only to authorized users, as it contains sensitive information.pymssql
for MSSQL, mysql-connector-python
for MySQL, and so on.ValueError
if the specified preset name does not exist or if the database type is unsupported. Additional exceptions may arise from network issues or database errors.load_data_from_path
Load data from a file into a DataFrame based on the file extension.
• Parameters:
file_path
(str): The absolute path to the data file.• Returns:
• Example:
from rgwfuncs import load_data_from_path
df = load_data_from_path("/absolute/path/to/data.csv")
print(df)
load_data_from_sqlite_path
Execute a query on a SQLite database file and return the results as a DataFrame.
• Parameters:
sqlite_path
(str): The absolute path to the SQLite database file.• Returns:
• Example:
from rgwfuncs import load_data_from_sqlite_path
df = load_data_from_sqlite_path("/path/to/database.db", "SELECT * FROM my_table")
print(df)
first_n_rows
Display the first n rows of the DataFrame (prints out in dictionary format).
• Parameters:
• Example:
from rgwfuncs import first_n_rows
import pandas as pd
df = pd.DataFrame({'A': [1,2,3], 'B': [4,5,6]})
first_n_rows(df, 2)
last_n_rows
Display the last n rows of the DataFrame (prints out in dictionary format).
• Parameters:
• Example:
from rgwfuncs import last_n_rows
import pandas as pd
df = pd.DataFrame({'A': [1,2,3,4,5], 'B': [6,7,8,9,10]})
last_n_rows(df, 2)
top_n_unique_values
Print the top n unique values for specified columns in the DataFrame.
• Parameters:
• Example:
from rgwfuncs import top_n_unique_values
import pandas as pd
df = pd.DataFrame({'Cities': ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']})
top_n_unique_values(df, 2, ['Cities'])
bottom_n_unique_values
Print the bottom n unique values for specified columns in the DataFrame.
• Parameters:
• Example:
from rgwfuncs import bottom_n_unique_values
import pandas as pd
df = pd.DataFrame({'Cities': ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']})
bottom_n_unique_values(df, 1, ['Cities'])
print_correlation
Print correlation for multiple pairs of columns in the DataFrame.
• Parameters:
column_pairs
(list of tuples): E.g., [('col1','col2'), ('colA','colB')]
.• Example:
from rgwfuncs import print_correlation
import pandas as pd
df = pd.DataFrame({
'col1': [1,2,3,4,5],
'col2': [2,4,6,8,10],
'colA': [10,9,8,7,6],
'colB': [5,4,3,2,1]
})
pairs = [('col1','col2'), ('colA','colB')]
print_correlation(df, pairs)
print_memory_usage
Print the memory usage of the DataFrame in megabytes.
• Parameters:
• Example:
from rgwfuncs import print_memory_usage
import pandas as pd
df = pd.DataFrame({'A': range(1000)})
print_memory_usage(df)
filter_dataframe
Return a new DataFrame filtered by a given query expression.
• Parameters:
filter_expr
(str)• Returns:
• Example:
from rgwfuncs import filter_dataframe
import pandas as pd
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [30, 20, 25]
})
df_filtered = filter_dataframe(df, "Age > 23")
print(df_filtered)
filter_indian_mobiles
Filter and return rows containing valid Indian mobile numbers in the specified column.
• Parameters:
mobile_col
(str): The column name with mobile numbers.• Returns:
• Example:
from rgwfuncs import filter_indian_mobiles
import pandas as pd
df = pd.DataFrame({'Phone': ['9876543210', '12345', '7000012345']})
df_indian = filter_indian_mobiles(df, 'Phone')
print(df_indian)
print_dataframe
Print the entire DataFrame and its column types. Optionally print a source path.
• Parameters:
• Example:
from rgwfuncs import print_dataframe
import pandas as pd
df = pd.DataFrame({'Name': ['Alice'], 'Age': [30]})
print_dataframe(df, source='SampleData.csv')
send_dataframe_via_telegram
Send a DataFrame via Telegram using a specified bot configuration.
• Parameters:
bot_name
(str)as_file
(bool)remove_after_send
(bool)config
(Optional[Union[str, dict]], optional): Configuration source. Can be:
• Example:
from rgwfuncs import send_dataframe_via_telegram
# Suppose your bot config is in "rgwml.config" under [TelegramBots] section
df = ... # Some DataFrame
send_dataframe_via_telegram(
df,
bot_name='MyTelegramBot',
message='Hello from RGWFuncs!',
as_file=True,
remove_after_send=True
)
send_data_to_email
Send an email with an optional DataFrame attachment using the Gmail API via a specified preset.
• Parameters:
preset_name
(str)to_email
(str)as_file
(bool)remove_after_send
(bool)config
(Optional[Union[str, dict]], optional): Configuration source. Can be:
• Example:
from rgwfuncs import send_data_to_email
df = ... # Some DataFrame
send_data_to_email(
df,
preset_name='MyEmailPreset',
to_email='recipient@example.com',
subject='Hello from RGWFuncs',
body='Here is the data you requested.',
as_file=True,
remove_after_send=True
)
send_data_to_slack
Send a DataFrame or message to Slack using a specified bot configuration.
• Parameters:
bot_name
(str)as_file
(bool)remove_after_send
(bool)config
(Optional[Union[str, dict]], optional): Configuration source. Can be:
• Example:
from rgwfuncs import send_data_to_slack
df = ... # Some DataFrame
send_data_to_slack(
df,
bot_name='MySlackBot',
message='Hello Slack!',
as_file=True,
remove_after_send=True
)
order_columns
Reorder the columns of a DataFrame based on a string input.
• Parameters:
column_order_str
(str): Comma-separated column order.• Returns:
• Example:
from rgwfuncs import order_columns
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25], 'Salary': [1000, 1200]})
df_reordered = order_columns(df, 'Salary,Name,Age')
print(df_reordered)
append_ranged_classification_column
Append a ranged classification column to the DataFrame.
• Parameters:
target_col
(str): The column to classify.new_col_name
(str): Name of the new classification column.• Returns:
• Example:
from rgwfuncs import append_ranged_classification_column
import pandas as pd
df = pd.DataFrame({'Scores': [5, 12, 25]})
df_classified = append_ranged_classification_column(df, [0, 10, 20, 30], 'Scores', 'ScoreRange')
print(df_classified)
append_percentile_classification_column
Append a percentile classification column to the DataFrame.
• Parameters:
target_col
(str)new_col_name
(str)• Returns:
• Example:
from rgwfuncs import append_percentile_classification_column
import pandas as pd
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
df_classified = append_percentile_classification_column(df, [25, 50, 75], 'Values', 'ValuePercentile')
print(df_classified)
append_ranged_date_classification_column
Append a ranged date classification column to the DataFrame.
• Parameters:
date_ranges
(list[str]): List of date strings in a format pandas can parse (e.g., ['2020-01-01', '2020-06-30', '2020-12-31']).target_col
(str)new_col_name
(str)• Returns:
• Example:
from rgwfuncs import append_ranged_date_classification_column
import pandas as pd
df = pd.DataFrame({'EventDate': pd.to_datetime(['2020-03-15', '2020-08-10'])})
df_classified = append_ranged_date_classification_column(
df,
['2020-01-01', '2020-06-30', '2020-12-31'],
'EventDate',
'DateRange'
)
print(df_classified)
rename_columns
Rename columns in the DataFrame.
• Parameters:
rename_pairs
(dict): Mapping old column names to new ones.• Returns:
• Example:
from rgwfuncs import rename_columns
import pandas as pd
df = pd.DataFrame({'OldName': [1,2,3]})
df_renamed = rename_columns(df, {'OldName': 'NewName'})
print(df_renamed)
cascade_sort
Cascade sort the DataFrame by specified columns and order.
• Parameters:
• Returns:
• Example:
from rgwfuncs import cascade_sort
import pandas as pd
df = pd.DataFrame({
'Name': ['Charlie', 'Alice', 'Bob'],
'Age': [25, 30, 22]
})
sorted_df = cascade_sort(df, ["Name::ASC", "Age::DESC"])
print(sorted_df)
append_xgb_labels
Append XGB training labels (TRAIN, VALIDATE, TEST) based on a ratio string.
• Parameters:
ratio_str
(str): e.g. "8:2", "7:2:1".• Returns:
• Example:
from rgwfuncs import append_xgb_labels
import pandas as pd
df = pd.DataFrame({'A': range(10)})
df_labeled = append_xgb_labels(df, "7:2:1")
print(df_labeled)
append_xgb_regression_predictions
Append XGB regression predictions to the DataFrame. Requires an XGB_TYPE
column for TRAIN/TEST splits.
• Parameters:
target_col
(str)feature_cols
(str): Comma-separated feature columns.pred_col
(str)boosting_rounds
(int, optional)model_path
(str, optional)• Returns:
• Example:
from rgwfuncs import append_xgb_regression_predictions
import pandas as pd
df = pd.DataFrame({
'XGB_TYPE': ['TRAIN','TRAIN','TEST','TEST'],
'Feature1': [1.2, 2.3, 3.4, 4.5],
'Feature2': [5.6, 6.7, 7.8, 8.9],
'Target': [10, 20, 30, 40]
})
df_pred = append_xgb_regression_predictions(df, 'Target', 'Feature1,Feature2', 'PredictedTarget')
print(df_pred)
append_xgb_logistic_regression_predictions
Append XGB logistic regression predictions to the DataFrame. Requires an XGB_TYPE
column for TRAIN/TEST splits.
• Parameters:
target_col
(str)feature_cols
(str)pred_col
(str)boosting_rounds
(int, optional)model_path
(str, optional)• Returns:
• Example:
from rgwfuncs import append_xgb_logistic_regression_predictions
import pandas as pd
df = pd.DataFrame({
'XGB_TYPE': ['TRAIN','TRAIN','TEST','TEST'],
'Feature1': [1, 0, 1, 0],
'Feature2': [0.5, 0.2, 0.8, 0.1],
'Target': [1, 0, 1, 0]
})
df_pred = append_xgb_logistic_regression_predictions(df, 'Target', 'Feature1,Feature2', 'PredictedTarget')
print(df_pred)
print_n_frequency_cascading
Print the cascading frequency of top n values for specified columns.
• Parameters:
order_by
(str): ASC
, DESC
, FREQ_ASC
, FREQ_DESC
.• Example:
from rgwfuncs import print_n_frequency_cascading
import pandas as pd
df = pd.DataFrame({'City': ['NY','LA','NY','SF','LA','LA']})
print_n_frequency_cascading(df, 2, 'City', 'FREQ_DESC')
print_n_frequency_linear
Prints the linear frequency of the top n
values for specified columns.
pd.DataFrame
): The DataFrame to analyze.int
): The number of top values to print for each column.list
): A list of column names to be analyzed.str
): The order of frequency. The available options are:
"ASC"
: Sort keys in ascending lexicographical order."DESC"
: Sort keys in descending lexicographical order."FREQ_ASC"
: Sort the frequencies in ascending order (least frequent first)."FREQ_DESC"
: Sort the frequencies in descending order (most frequent first)."BY_KEYS_ASC"
: Sort keys in ascending order, numerically if possible, handling special strings like 'NaN' as typical entries."BY_KEYS_DESC"
: Sort keys in descending order, numerically if possible, handling special strings like 'NaN' as typical entries.from rgwfuncs import print_n_frequency_linear
import pandas as pd
df = pd.DataFrame({'City': ['NY', 'LA', 'NY', 'SF', 'LA', 'LA']})
print_n_frequency_linear(df, 2, ['City'], 'FREQ_DESC')
This example analyzes the City
column, printing the top 2 most frequent values in descending order of frequency.
retain_columns
Retain specified columns in the DataFrame and drop the others.
• Parameters:
columns_to_retain
(list or str)• Returns:
• Example:
from rgwfuncs import retain_columns
import pandas as pd
df = pd.DataFrame({'A': [1,2], 'B': [3,4], 'C': [5,6]})
df_reduced = retain_columns(df, ['A','C'])
print(df_reduced)
mask_against_dataframe
Retain only rows with common column values between two DataFrames.
• Parameters:
other_df
(pd.DataFrame)column_name
(str)• Returns:
• Example:
from rgwfuncs import mask_against_dataframe
import pandas as pd
df1 = pd.DataFrame({'ID': [1,2,3], 'Value': [10,20,30]})
df2 = pd.DataFrame({'ID': [2,3,4], 'Extra': ['X','Y','Z']})
df_masked = mask_against_dataframe(df1, df2, 'ID')
print(df_masked)
mask_against_dataframe_converse
Retain only rows with uncommon column values between two DataFrames.
• Parameters:
other_df
(pd.DataFrame)column_name
(str)• Returns:
• Example:
from rgwfuncs import mask_against_dataframe_converse
import pandas as pd
df1 = pd.DataFrame({'ID': [1,2,3], 'Value': [10,20,30]})
df2 = pd.DataFrame({'ID': [2,3,4], 'Extra': ['X','Y','Z']})
df_uncommon = mask_against_dataframe_converse(df1, df2, 'ID')
print(df_uncommon)
union_join
Perform a union join, concatenating two DataFrames and dropping duplicates.
• Parameters:
df1
(pd.DataFrame): First DataFrame.df2
(pd.DataFrame): Second DataFrame.• Returns:
df1
and df2
, without duplicates.• Example:
from rgwfuncs import union_join
import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Value': [10, 20, 30]})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Value': [20, 30, 40]})
df_union = union_join(df1, df2)
print(df_union)
bag_union_join
Perform a bag union join, concatenating two DataFrames without dropping duplicates.
• Parameters:
df1
(pd.DataFrame): First DataFrame.df2
(pd.DataFrame): Second DataFrame.• Returns:
df1
and df2
.• Example:
from rgwfuncs import bag_union_join
import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Value': [10, 20, 30]})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Value': [20, 30, 40]})
df_bag_union = bag_union_join(df1, df2)
print(df_bag_union)
left_join
Perform a left join on two DataFrames.
• Parameters:
df1
(pd.DataFrame): The left DataFrame.df2
(pd.DataFrame): The right DataFrame.left_on
(str): Column name in df1
to join on.right_on
(str): Column name in df2
to join on.• Returns:
• Example:
from rgwfuncs import left_join
import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Value': [10, 20, 30]})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Extra': ['A', 'B', 'C']})
df_left_join = left_join(df1, df2, 'ID', 'ID')
print(df_left_join)
right_join
Perform a right join on two DataFrames.
• Parameters:
df1
(pd.DataFrame): The left DataFrame.df2
(pd.DataFrame): The right DataFrame.left_on
(str): Column name in df1
to join on.right_on
(str): Column name in df2
to join on.• Returns:
• Example:
from rgwfuncs import right_join
import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Value': [10, 20, 30]})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Extra': ['A', 'B', 'C']})
df_right_join = right_join(df1, df2, 'ID', 'ID')
print(df_right_join)
insert_dataframe_in_sqlite_database
Inserts a Pandas DataFrame into a SQLite database table. If the specified table does not exist, it will be created with column types automatically inferred from the DataFrame's data types.
Parameters:
db_path
(str): The path to the SQLite database file. If the database does not exist, it will be created.tablename
(str): The name of the table in the database. If the table does not exist, it is created with the DataFrame's columns and data types.df
(pd.DataFrame): The DataFrame containing the data to be inserted into the database table.Returns:
None
Notes:
int64
is mapped to INTEGER
float64
is mapped to REAL
object
is mapped to TEXT
datetime64[ns]
is mapped to TEXT
(dates are stored as text)bool
is mapped to INTEGER
(SQLite does not have a separate Boolean type)Example:
from rgwfuncs import insert_dataframe_in_sqlite_database import pandas as pd
df = pd.DataFrame({ 'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie'], 'Score': [88.5, 92.3, 85.0] })
db_path = 'my_database.db' tablename = 'students'
insert_dataframe_in_sqlite_database(db_path, tablename, df)
sync_dataframe_to_sqlite_database
Processes and saves a DataFrame to an SQLite database, adding a timestamp column and replacing the existing table if needed. Creates the table if it does not exist.
• Parameters:
db_path
(str): Path to the SQLite database file.tablename
(str): The name of the table in the database.df
(pd.DataFrame): The DataFrame to be processed and saved.• Returns:
• Example:
from rgwfuncs import sync_dataframe_to_sqlite_database
import pandas as pd
df = pd.DataFrame({'ID': [1, 2, 3], 'Value': [10, 20, 30]})
db_path = 'my_database.db'
tablename = 'my_table'
sync_dataframe_to_sqlite_database(db_path, tablename, df)
load_fresh_data_or_pull_from_cache
Retrieves data from a cache if a recent cache file exists, or fetches fresh data, saves it to the cache, and returns it. If the cache is too old or doesn’t exist, it uses a fetching function to get new data, which it caches and returns. An optional dtype
parameter allows specifying column data types when reading from the cache, preventing issues with mixed-type columns.
fetch_func
(typing.Callable[[], pd.DataFrame]): A callable function that fetches fresh data and returns it as a pandas DataFrame.cache_dir
(str): The directory where cache files are stored.file_prefix
(str): The prefix used for cache filenames to identify relevant cache files.cache_cutoff_hours
(int): The age in hours beyond which a cache file is considered obsolete.dtype
(dict, optional): A dictionary specifying the data types for columns when reading the CSV cache file. Passed to pd.read_csv()
to handle mixed-type columns explicitly. Defaults to None
, in which case pandas infers the types.pd.DataFrame
: The DataFrame containing cached or freshly unmarked data.from rgwfuncs import load_fresh_data_or_pull_from_cache
import pandas as pd
def fetch_data():
# This is your data-fetching logic. Replace with real fetching code.
return pd.DataFrame({'Column1': [1, 2, 3], 'Column2': ['4', '5', '6']})
cache_dir = 'cache_directory'
file_prefix = 'cached_data'
cache_cutoff_hours = 24
# Without dtype (pandas infers types)
df = load_fresh_data_or_pull_from_cache(fetch_data, cache_dir, file_prefix, cache_cutoff_hours)
# With dtype to handle mixed types
df = load_fresh_data_or_pull_from_cache(fetch_data, cache_dir, file_prefix, cache_cutoff_hours, dtype={'Column2': str})
print(df)
For more information, refer to each function’s docstring by calling:
rgwfuncs.docs(method_type_filter='function_name')
or display all docstrings with:
rgwfuncs.docs(method_type_filter='*')
© 2025 Ryan Gerard Wilson. All rights reserved.
FAQs
A functional programming paradigm for mathematical modelling and data science
We found that rgwfuncs demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security Fundamentals
The Socket Threat Research Team uncovers how threat actors weaponize shell techniques across npm, PyPI, and Go ecosystems to maintain persistence and exfiltrate data.
Security News
At VulnCon 2025, NIST scrapped its NVD consortium plans, admitted it can't keep up with CVEs, and outlined automation efforts amid a mounting backlog.
Product
We redesigned our GitHub PR comments to deliver clear, actionable security insights without adding noise to your workflow.