pyWBE.preliminary_functions
Contains preliminary functions used to aid data analysis.
Note: Add type-hints and docstrings to functions as they are implemented.
Module Contents
Functions
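plot_time_series | This function plots the given time-series data for easy visualization.
calculate_weekly_concentration_perc_change | This function computes the weekly percentage change in concentration levels in the given time-series data.
analyze_trends | This function computes the trend line for the given data.
change_point_detection | This function uses the PELT (Pruned Exact Linear Time) function of the Ruptures library to analyze the given time-series data for change point detection.
normalize_viral_load | This function normalizes the time-series data given in the “to_normalize” column of the data using the values in the “normalize_by” column of the data.
forecast_single_instance | This function predicts the value of the given time-series data a single time-step into the future using a Linear Regression model.
detect_seasonality | This function analyzes a given time-series data for seasonality.
get_lead_lag_correlations | This function computes the lead and lag correlations between two given time-series data.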
- pyWBE.preliminary_functions.plot_time_series(series_x: pandas.Series, series_y: pandas.Series, plt_save_pth: str, plot_type: str = 'linear')
This function plots the given time-series data for easy visualization.
- Parameters:
series_x (Pandas Series) – The independent variable, usually indicating time steps in arbitrary or specific units.
series_y (Pandas Series (of type float or int)) – The dependent variable, indicating values of the variable of interest over time.
plt_save_pth (str) – The path where the plot image will be saved.
plot_type (str) – Either ‘linear’ (default) or ‘log’. ‘linear’ plots series_y versus series_x; ‘log’ plots the natural log of series_y versus series_x.
- pyWBE.preliminary_functions.calculate_weekly_concentration_perc_change(conc_data: pandas.Series) → pandas.Series
This function computes the weekly percentage change in concentration levels in the given time-series data.
- Parameters:
conc_data (Pandas Series (of type float or int)) – The concentration data, assumed to have a periodicity of 1 week.
- Returns:
Returns the weekly percentage change in concentration levels.
- Return type:
pd.Series
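As a rough illustration of what a week-over-week percentage change looks like, the sketch below computes it directly with pandas. It does not call pyWBE and may differ from the library's implementation; the helper name `weekly_perc_change_sketch` and the sample values are hypothetical.

```python
import pandas as pd


def weekly_perc_change_sketch(conc_data: pd.Series) -> pd.Series:
    """Hypothetical helper: percentage change between consecutive weekly samples."""
    # pct_change() returns (x[t] - x[t-1]) / x[t-1]; multiply by 100 for percent.
    # The first entry has no predecessor, so it is NaN.
    return conc_data.pct_change() * 100


# Made-up weekly concentrations, one sample per week.
weekly = pd.Series(
    [100.0, 110.0, 99.0],
    index=pd.date_range("2024-01-01", periods=3, freq="W"),
)
change = weekly_perc_change_sketch(weekly)
```

Here the second week is roughly 10 percent above the first and the third roughly 10 percent below the second; the first entry is NaN because there is no earlier week to compare against.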
- pyWBE.preliminary_functions.analyze_trends(data: pandas.Series) → list[float]
This function computes the trend line for the given data.
- Parameters:
data (pd.Series) – The time-series data (assumed to be sorted in an increasing order of time).
- Returns:
Returns the trend line values, which can be plotted against the corresponding dates.
- Return type:
list
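The documentation above does not say how the trend line is fitted. The sketch below assumes an ordinary least-squares straight line over the sample positions, which is only one plausible reading; `trend_line_sketch` and the sample data are hypothetical and not part of pyWBE.

```python
import numpy as np
import pandas as pd


def trend_line_sketch(data: pd.Series) -> list[float]:
    """Hypothetical helper: least-squares straight line through the data."""
    # Use the sample position (0, 1, 2, ...) as the x-axis, which assumes
    # the series is sorted in time and evenly spaced.
    steps = np.arange(len(data))
    slope, intercept = np.polyfit(steps, data.to_numpy(dtype=float), deg=1)
    return (slope * steps + intercept).tolist()


# Made-up, perfectly linear data, so the fitted line reproduces the input.
series = pd.Series(
    [2.0, 4.0, 6.0, 8.0],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)
trend = trend_line_sketch(series)
```

The returned list has one value per input sample, so it can be plotted against `series.index`.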
- pyWBE.preliminary_functions.change_point_detection(data: pandas.Series, model: str = 'l2', min_size: int = 28, penalty: int = 1)
This function uses the PELT (Pruned Exact Linear Time) function of the Ruptures library to analyze the given time-series data for change point detection.
- Parameters:
data (pd.Series) – A Pandas Series containing the time-series data whose change points need to be detected.
model (str) – The model used by PELT to perform the analysis. Allowed types include “l1”, “l2”, and “rbf”.
min_size (int) – The minimum separation (time steps) between two consecutive change points detected by the model.
penalty (int) – The penalty value used during prediction of change points.
- Returns:
Returns a sorted list of breakpoints.
- Return type:
list
- pyWBE.preliminary_functions.normalize_viral_load(data: pandas.DataFrame, to_normalize: str, normalize_by: str | int) → pandas.Series
This function normalizes the time-series data given in the “to_normalize” column of the data using the values in the “normalize_by” column of the data.
- Parameters:
data (Pandas DataFrame) – The Pandas DataFrame containing the relevant data.
to_normalize (str) – The name of the column containing the data to be normalized.
normalize_by (str | int) – The name of the column containing the values to normalize by, or an integer value to normalize the data by.
- Returns:
The normalized data.
- Return type:
Pandas Series
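To make the two accepted forms of “normalize_by” concrete, the sketch below assumes that normalization means element-wise division, which the documentation does not state explicitly. It does not call pyWBE; `normalize_sketch` and the column names are hypothetical.

```python
import pandas as pd


def normalize_sketch(data: pd.DataFrame, to_normalize: str, normalize_by) -> pd.Series:
    """Hypothetical helper: divide one column by another column or by a constant."""
    # A string is treated as a column name; anything else as a scalar divisor.
    divisor = data[normalize_by] if isinstance(normalize_by, str) else normalize_by
    return data[to_normalize] / divisor


# Made-up values; the column names are illustrative only.
df = pd.DataFrame({"viral_load": [200.0, 300.0], "flow": [2.0, 3.0]})
by_column = normalize_sketch(df, "viral_load", "flow")
by_constant = normalize_sketch(df, "viral_load", 100)
```

Passing a column name divides row by row, while passing an integer scales every row by the same constant.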
- pyWBE.preliminary_functions.forecast_single_instance(data: pandas.Series, window: pandas.DatetimeIndex) → pandas.Series
This function predicts the value of the given time-series data a single time-step into the future using a Linear Regression model trained on the data specified by the parameter “window”.
- Parameters:
data (pd.Series) – A Pandas Series, assumed to have dates as its indices, containing the time-series data whose value needs to be predicted in the future.
window (pd.DatetimeIndex) – A Pandas DatetimeIndex containing the date range of “data” used to train the Linear Regression model. The minimum length is 1 week; the maximum is the entire date range of “data”.
- Returns:
Returns the original “data” with the next time-step prediction appended to it.
- Return type:
pd.Series
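The idea of fitting a line on a window and extrapolating one step can be sketched as follows. This is an assumption-laden illustration, not the library's code: it swaps in `numpy.polyfit` for whatever Linear Regression model pyWBE uses, and it assumes the next time step repeats the spacing of the last two samples. The name `forecast_next_sketch` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def forecast_next_sketch(data: pd.Series, window: pd.DatetimeIndex) -> pd.Series:
    """Hypothetical helper: fit a line on `window` and extrapolate one step."""
    train = data.loc[window]
    # Express dates as days since the first training date.
    days = (train.index - train.index[0]).days.to_numpy(dtype=float)
    slope, intercept = np.polyfit(days, train.to_numpy(dtype=float), deg=1)
    # Assume the next time step continues the spacing of the last two samples.
    next_date = data.index[-1] + (data.index[-1] - data.index[-2])
    next_day = (next_date - train.index[0]).days
    prediction = pd.Series([slope * next_day + intercept], index=[next_date])
    # Return the original data with the prediction appended.
    return pd.concat([data, prediction])


# Made-up daily data that rises by 1 per day; train on the last 7 days.
dates = pd.date_range("2024-01-01", periods=8, freq="D")
observed = pd.Series(np.arange(1.0, 9.0), index=dates)
extended = forecast_next_sketch(observed, window=dates[-7:])
```

Because the sample data is exactly linear, the appended value continues the line; real data would give a prediction that depends on which window is chosen.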
- pyWBE.preliminary_functions.detect_seasonality(data: pandas.Series, model_type: str = 'additive') → pandas.DataFrame
This function analyzes a given time-series data for seasonality.
- Parameters:
data (pd.Series) – A Pandas Series, assumed to have dates as its indices with the corresponding values of the time-series data.
model_type (str) – Can be “additive” or “multiplicative”, determines the type of seasonality model assumed for the data.
- Returns:
Returns a Pandas DataFrame that contains the Trend, Seasonal, and Residual components computed using the given model type. It can be plotted using the “plot” method of the Pandas DataFrame class.
- Return type:
pd.DataFrame
- pyWBE.preliminary_functions.get_lead_lag_correlations(x: pandas.Series, y: pandas.Series, time_instances: int, plt_save_pth: str, max_lag: int = 3)
This function computes the lead and lag correlations between two given time-series data.
- Parameters:
x (pd.Series) – The first time-series data.
y (pd.Series) – The second time-series data.
time_instances (int) – The number of time instances to be considered for the correlation analysis.
plt_save_pth (str) – The path where the plot image will be saved.
max_lag (int) – The maximum lag time to be considered for the correlation analysis.
- Returns:
Returns the lead and lag correlations between the given time-series data and the buffer where the time-series comparison is stored.
- Return type:
Tuple
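A minimal sketch of a lead/lag correlation scan is shown below. It uses plain pandas, skips the plotting and the “time_instances” parameter, and may use a different sign convention than pyWBE; `lead_lag_sketch` and the sample series are hypothetical.

```python
import pandas as pd


def lead_lag_sketch(x: pd.Series, y: pd.Series, max_lag: int = 3) -> dict[int, float]:
    """Hypothetical helper: Pearson correlation of x with y shifted by each lag."""
    # y.shift(lag) pairs x[t] with y[t - lag]; Series.corr ignores the NaN
    # pairs that shifting introduces at the edges.
    return {lag: x.corr(y.shift(lag)) for lag in range(-max_lag, max_lag + 1)}


# Made-up data: y is x delayed by one step.
x = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0])
y = x.shift(1)
correlations = lead_lag_sketch(x, y, max_lag=2)
# The lag with the highest correlation shows how far the two series are offset.
best_lag = max(correlations, key=lambda lag: correlations[lag])
```

In this convention the correlation peaks at the shift that undoes the one-step delay between the two series, so `best_lag` recovers the offset built into the sample data.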