pyWBE.preliminary_functions

Contains preliminary functions used to aid data analysis.

Note: Add type-hints and docstrings to functions as they are implemented.

Module Contents

Functions

plot_time_series(series_x, series_y, plt_save_pth[, ...])

This function plots the given time-series data for easy visualization.

calculate_weekly_concentration_perc_change(→ pandas.Series)

This function computes the weekly percentage change in concentration levels in the given time-series data.

analyze_trends(→ list[float])

This function computes the trend line for the given data.

change_point_detection(data[, model, min_size, penalty])

This function uses the PELT (Pruned Exact Linear Time) algorithm of the Ruptures library to detect change points in the given time-series data.

normalize_viral_load(→ pandas.Series)

This function normalizes the time-series data given in the “to_normalize” column using the values in the “normalize_by” column.

forecast_single_instance(→ pandas.Series)

This function predicts the value of the given time-series data a single time-step into the future.

detect_seasonality(→ pandas.DataFrame)

This function analyzes a given time-series data for seasonality.

get_lead_lag_correlations(x, y, time_instances, ...[, ...])

This function computes the lead and lag correlations between two given time series.

pyWBE.preliminary_functions.plot_time_series(series_x: pandas.Series, series_y: pandas.Series, plt_save_pth: str, plot_type: str = 'linear')

This function plots the given time-series data for easy visualization.

Parameters:
  • series_x (Pandas Series) – The independent variable, usually indicating time steps in arbitrary or specific units.

  • series_y (Pandas Series (of type float or int)) – The dependent variable, indicating values of the variable of interest over time.

  • plt_save_pth (str) – The path where the plot image will be saved.

  • plot_type (str) – Can be either ‘linear’ (default) or ‘log’. ‘linear’ plots series_y against series_x; ‘log’ plots the natural log of series_y against series_x.

pyWBE.preliminary_functions.calculate_weekly_concentration_perc_change(conc_data: pandas.Series) → pandas.Series

This function computes the weekly percentage change in concentration levels in the given time-series data.

Parameters:

conc_data (Pandas Series (of type float or int)) – The concentration data, assumed to be sampled once per week.

Returns:

Returns the weekly percentage change in concentration levels.

Return type:

pd.Series
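
As a rough illustration of the quantity described above, the sketch below computes a week-over-week percentage change with plain pandas. It assumes that the change is measured relative to the previous week and expressed in percent; whether the library handles the first week or scaling in exactly this way is not stated here, and the sample values are made up.

```python
import pandas as pd

# Hypothetical weekly concentration readings (one value per week).
conc_data = pd.Series(
    [100.0, 110.0, 99.0, 148.5],
    index=pd.date_range("2024-01-01", periods=4, freq="W-MON"),
)

# Change relative to the previous week, in percent (assumed convention).
# The first week has no predecessor, so its entry is NaN.
weekly_change = conc_data.pct_change() * 100

print(weekly_change)
```

A positive entry means the concentration rose compared with the week before; a negative entry means it fell.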

pyWBE.preliminary_functions.analyze_trends(data: pandas.Series) → list[float]

This function computes the trend line for the given data.

Parameters:

data (pd.Series) – The time-series data (assumed to be sorted in an increasing order of time).

Returns:

Returns the trend line values, which can be plotted against the dates of the input data.

Return type:

list
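
One common way to obtain such a trend line is a least-squares straight-line fit. The sketch below assumes a degree-1 fit against the position of each observation; the model actually used by the library is not documented here, so treat this as an illustration with made-up data rather than the library's method.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series, already sorted in increasing order of time.
data = pd.Series(
    [2.0, 4.1, 5.9, 8.2, 9.8],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Use the integer position of each observation as the regressor
# (an assumption; it treats the observations as evenly spaced).
positions = np.arange(len(data))
slope, intercept = np.polyfit(positions, data.to_numpy(), deg=1)

# One fitted value per observation, as a plain list of floats.
trend = (slope * positions + intercept).tolist()
```

The resulting list has the same length as the input, so it can be plotted against data.index directly.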

pyWBE.preliminary_functions.change_point_detection(data: pandas.Series, model: str = 'l2', min_size: int = 28, penalty: int = 1)

This function uses the PELT (Pruned Exact Linear Time) algorithm of the Ruptures library to detect change points in the given time-series data.

Parameters:
  • data (pd.Series) – A Pandas Series containing the time-series data whose change points need to be detected.

  • model (str) – The model used by PELT to perform the analysis. Allowed types include “l1”, “l2”, and “rbf”.

  • min_size (int) – The minimum separation (time steps) between two consecutive change points detected by the model.

  • penalty (int) – The penalty value used during prediction of change points.

Returns:

Returns a sorted list of breakpoints.

Return type:

list
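
To make the roles of min_size and penalty concrete, the sketch below implements plain optimal partitioning with an L2 (squared deviation from the segment mean) cost. This is the quadratic-time dynamic programme that PELT accelerates through pruning; it is not the Ruptures implementation, and the function names are hypothetical. Following the Ruptures convention, the last entry of the returned list is the length of the signal.

```python
import numpy as np


def l2_cost(prefix, prefix_sq, start, end):
    """Sum of squared deviations from the mean over signal[start:end]."""
    n = end - start
    total = prefix[end] - prefix[start]
    total_sq = prefix_sq[end] - prefix_sq[start]
    return total_sq - total * total / n


def penalized_breakpoints(signal, min_size=2, penalty=1.0):
    """Return sorted segment end indices; the last one is len(signal)."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    prefix = np.concatenate(([0.0], np.cumsum(signal)))
    prefix_sq = np.concatenate(([0.0], np.cumsum(signal ** 2)))

    # best[t] is the minimal penalized cost of segmenting signal[:t].
    best = np.full(n + 1, np.inf)
    best[0] = -penalty  # cancels the penalty added for the first segment
    last = np.zeros(n + 1, dtype=int)

    for end in range(min_size, n + 1):
        # Every segment must contain at least min_size samples.
        for start in range(0, end - min_size + 1):
            if not np.isfinite(best[start]):
                continue
            candidate = best[start] + l2_cost(prefix, prefix_sq, start, end) + penalty
            if candidate < best[end]:
                best[end], last[end] = candidate, start

    # Walk back through the stored segment starts.
    breakpoints = []
    t = n
    while t > 0:
        breakpoints.append(t)
        t = int(last[t])
    return sorted(breakpoints)


# A noise-free step from 0 to 5 after ten samples.
step_signal = [0.0] * 10 + [5.0] * 10
step_breaks = penalized_breakpoints(step_signal, min_size=3, penalty=1.0)
```

A larger penalty makes each additional change point more expensive and so yields fewer breakpoints, while min_size prevents two breakpoints from being placed closer together than the given number of time steps.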

pyWBE.preliminary_functions.normalize_viral_load(data: pandas.DataFrame, to_normalize: str, normalize_by: str | int) → pandas.Series

This function normalizes the time-series data given in the “to_normalize” column of the data using the values in the “normalize_by” column of the data.

Parameters:
  • data (Pandas DataFrame) – The Pandas DataFrame containing the relevant data.

  • to_normalize (str) – The name of the column containing the data to be normalized.

  • normalize_by (str or int) – Either the name of the column containing the data to normalize by, or an integer value to normalize the data by.

Returns:

The normalized data.

Return type:

Pandas Series
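
The two accepted forms of normalize_by can be illustrated with a small stand-in. The sketch assumes that normalization means element-wise division, which is not stated explicitly above; the helper name and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical measurements; the column names are illustrative only.
data = pd.DataFrame(
    {
        "viral_copies": [1200.0, 1500.0, 900.0],
        "flow_rate": [3.0, 5.0, 2.0],
    }
)


def normalize_sketch(frame, to_normalize, normalize_by):
    """Divide one column by another column or by a constant (assumed behaviour)."""
    # A string selects a column; anything else is treated as a constant divisor.
    divisor = frame[normalize_by] if isinstance(normalize_by, str) else normalize_by
    return frame[to_normalize] / divisor


by_column = normalize_sketch(data, "viral_copies", "flow_rate")
by_constant = normalize_sketch(data, "viral_copies", 100)
```

Passing a column name divides row by row, while passing an integer rescales every row by the same factor.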

pyWBE.preliminary_functions.forecast_single_instance(data: pandas.Series, window: pandas.DatetimeIndex) → pandas.Series

This function predicts the value of the given time-series data a single time-step into the future using a Linear Regression model trained on the portion of the data specified by the parameter “window”.

Parameters:
  • data (pd.Series) – A Pandas Series, assumed to have dates as its indices, containing the time-series data whose value needs to be predicted in the future.

  • window (pd.DatetimeIndex) – A Pandas DatetimeIndex containing the date range of the “data” that must be used to train the Linear Regression model. The minimum length is 1 week and the maximum length is the entire date range of the “data”.

Returns:

Returns the original “data” with the next time-step prediction appended to it.

Return type:

pd.Series
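
The idea of fitting on a window and extrapolating one step can be sketched with NumPy alone. The example assumes evenly spaced daily data, uses days elapsed since the start of the window as the regressor, and takes the step size from the first two index entries; all of these are assumptions for illustration, and the library may build its regression differently.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with dates as its index.
data = pd.Series(
    [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0],
    index=pd.date_range("2024-01-01", periods=8, freq="D"),
)

# Training window: the last seven days, i.e. the one-week minimum.
window = pd.date_range("2024-01-02", "2024-01-08", freq="D")
train = data.loc[window]

# Regress the values on the number of days since the window start.
days = (train.index - train.index[0]).days.to_numpy()
slope, intercept = np.polyfit(days, train.to_numpy(), deg=1)

# Next time step, assuming the series is evenly spaced.
step = data.index[1] - data.index[0]
next_date = data.index[-1] + step
next_day = (next_date - train.index[0]).days
prediction = slope * next_day + intercept

# Original data with the one-step-ahead prediction appended.
forecast = pd.concat([data, pd.Series([prediction], index=[next_date])])
```

A short window makes the forecast follow recent behaviour closely, whereas a window covering the whole series gives a smoother but slower-reacting extrapolation.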

pyWBE.preliminary_functions.detect_seasonality(data: pandas.Series, model_type: str = 'additive') → pandas.DataFrame

This function analyzes a given time-series data for seasonality.

Parameters:
  • data (pd.Series) – A Pandas Series, assumed to have dates as its indices with the corresponding values of the time-series data.

  • model_type (str) – Can be “additive” or “multiplicative”, determines the type of seasonality model assumed for the data.

Returns:

Returns a Pandas DataFrame that contains the Trend, Seasonal, and Residual components computed using the given model type. It can be plotted using the “plot” method of the Pandas DataFrame class.

Return type:

pd.DataFrame
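
For readers unfamiliar with the three components, the sketch below performs a hand-rolled classical additive decomposition with pandas. It is a simplified stand-in, not the routine the library calls: the helper name and the explicit period argument are hypothetical, an odd period is assumed, and the column names simply mirror the description above. In the multiplicative variant the subtractions would become divisions.

```python
import numpy as np
import pandas as pd


def additive_decompose_sketch(data, period):
    """Split a series into trend, seasonal and residual parts (additive model)."""
    # Trend: centred moving average over one full period (odd period assumed).
    trend = data.rolling(window=period, center=True).mean()
    detrended = data - trend

    # Seasonal: mean detrended value at each position within the period,
    # shifted so that the seasonal effects average to zero.
    position = np.arange(len(data)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.to_numpy()[position], index=data.index)

    # Residual: whatever trend and seasonality do not explain.
    residual = data - trend - seasonal
    return pd.DataFrame({"Trend": trend, "Seasonal": seasonal, "Residual": residual})


# Four weeks of hypothetical daily data: a linear rise plus a weekly pattern.
pattern = np.array([3.0, -1.0, -2.0, 0.0, 1.0, -3.0, 2.0])
index = pd.date_range("2024-01-01", periods=28, freq="D")
values = np.arange(28, dtype=float) + np.tile(pattern, 4)
decomposition = additive_decompose_sketch(pd.Series(values, index=index), period=7)
```

Because the centred moving average needs a full window, the Trend and Residual columns are NaN for the first and last three days in this example.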

pyWBE.preliminary_functions.get_lead_lag_correlations(x: pandas.Series, y: pandas.Series, time_instances: int, plt_save_pth: str, max_lag: int = 3)

This function computes the lead and lag correlations between two given time series.

Parameters:
  • x (pd.Series) – The first time-series data.

  • y (pd.Series) – The second time-series data.

  • time_instances (int) – The number of time instances to be considered for the correlation analysis.

  • plt_save_pth (str) – The path where the plot image will be saved.

  • max_lag (int) – The maximum lag time to be considered for the correlation analysis.

Returns:

Returns the lead and lag correlations between the given time-series data and the buffer where the time-series comparison is stored.

Return type:

Tuple
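
A lead and lag analysis can be sketched by shifting one series against the other and computing a Pearson correlation at each offset. The helper below is hypothetical and covers only the correlation part; it does not reproduce time_instances, the saved plot, or the returned buffer. The sign convention (a positive lag means x leads y) is an assumption.

```python
import pandas as pd


def lead_lag_sketch(x, y, max_lag=3):
    """Correlate x[t] with y[t + lag] for every lag in [-max_lag, max_lag]."""
    # Series.shift(-lag) aligns y[t + lag] with x[t]; Series.corr ignores
    # the NaN values that the shift introduces at the ends.
    return {lag: x.corr(y.shift(-lag)) for lag in range(-max_lag, max_lag + 1)}


# Hypothetical example: y repeats x two time steps later.
x = pd.Series([2.0, 9.0, 4.0, 7.0, 1.0, 8.0, 3.0, 6.0, 5.0, 10.0, 0.0, 11.0])
y = x.shift(2)

correlations = lead_lag_sketch(x, y, max_lag=3)
best_lag = max(correlations, key=correlations.get)
```

Here the strongest correlation appears at a lag of 2, matching the two-step delay built into y.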