Executables and modules

bin.OriginFinder

Script name:

OriginFinder.py

Usage:

python OriginFinder.py --configure /path/to/configuration.ini --override section0:option0:value0 section1:option1

where section0 is a section name, option0 is an option in that section, and value0 is the value you choose. If an option is boolean, omit the value and write it as section1:option1.
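As a sketch of how such `section:option[:value]` override strings could be applied to an INI configuration, consider the following; the helper `apply_override` and the section/option names are hypothetical, and OriginFinder's actual parsing may differ.

```python
import configparser

def apply_override(config, override):
    """Apply one hypothetical 'section:option[:value]' override string.

    'section:option:value' sets the option to the given value;
    'section:option' (no value) is treated as a boolean set to 'True'.
    """
    parts = override.split(":", 2)
    if len(parts) == 3:
        section, option, value = parts
    else:  # boolean option given as section:option
        section, option = parts
        value = "True"
    if not config.has_section(section):
        config.add_section(section)
    config.set(section, option, value)
    return config

cfg = configparser.ConfigParser()
apply_override(cfg, "run:snr_threshold:7.5")  # hypothetical option
apply_override(cfg, "run:verbose")            # boolean form
```

Multiple overrides on the command line would simply be applied in sequence, each replacing the value loaded from configuration.ini.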

  1. Online mode
    Description:
    • Online mode automatically creates a list of glitches using an external event trigger generator (Omicron or PyCBC Live in the current version).

    Processes:
    • It automatically queries a list of glitches generated by an external event trigger generator (e.g., up-to-date Omicron or PyCBC Live triggers).

    • It chooses the glitches that do not overlap with any other glitches.

    • It conditions auxiliary channels and quantifies each of them with an "importance" value, which is the fraction of frequency bins of the on-source window above an upper threshold of the off-source window in a given frequency band.

    • It produces a plot of the importance values of the auxiliary channels for each glitch as soon as those values are computed.

    • It creates summary plots: glitch indices vs. channels, SNR of h(t) vs. channels, time vs. channels, and averaged importance vs. channels.

    • It clusters glitches based on PCA and a Gaussian mixture model, plots the clusters, and reports the rate of each cluster.

    • Note that importance values are saved as .csv files.
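The importance statistic used in the steps above can be sketched as follows, assuming a mean + sigma * std upper bound for the off-source window; the function name, thresholding convention, and test data are invented for illustration and are not the package's actual code.

```python
import numpy as np

def importance(on_fft, off_fft, sigma=3):
    """Illustrative importance statistic: the fraction of on-source
    frequency bins whose whitened amplitude exceeds an upper threshold
    (assumed here to be mean + sigma * std) of the off-source window."""
    on = np.abs(on_fft)
    off = np.abs(off_fft)
    threshold = off.mean() + sigma * off.std()
    return np.mean(on > threshold)

# Synthetic example: a quiet off-source spectrum and a loud on-source one.
rng = np.random.default_rng(0)
off = rng.normal(size=1024)   # quiet off-source spectrum
on = off + 5.0                # on-source spectrum with excess power
rho = importance(on, off)     # close to 1 for a loud coincident channel
```

For a quiet channel (on-source statistically identical to off-source), the same function returns a value near zero.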

  2. Offline mode
    Description:
    • Offline mode is similar to online mode; the main difference is that the list of glitches is provided by the user.

    For instance, a user can use a .csv file generated by Gravity Spy (or potentially cWB or other pipelines). This mode is useful when external event trigger generators do not catch the glitches you are interested in.

    Processes:
    • Offline mode operates on a list of glitches supplied by the user; for instance, a Gravity Spy .csv file can be used.

    • It chooses the glitches that do not overlap with any other glitches.

    • It conditions auxiliary channels and quantifies each of them with an "importance" value, which is the fraction of frequency bins of the on-source window above an upper threshold of the off-source window in a given frequency band.

    • It creates summary plots: glitch indices vs. channels, SNR of h(t) vs. channels, time vs. channels, and averaged importance vs. channels.

    • It clusters glitches based on PCA and a Gaussian mixture model, plots the clusters, and reports the rate of each cluster.

    • It produces a plot of the importance values of the auxiliary channels for each individual glitch.

    • Note that importance values are saved as .csv files.

  3. Null sample generator
    Description:
    • In order to perform the statistics (see the following section), the null sample generator creates a null-hypothesis set whose samples are drawn from quiet times.

    Note that a large number of null samples is preferred.

    Processes:
    • It creates randomly distributed synthetic event time periods whose distribution of durations follows that of the target glitches.

    • It keeps only those synthetic time periods (null samples) that do not overlap with anything else, including all the actual glitches and the other null samples.

    • It analyzes null samples in the same way as online or offline mode.
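The null-sample generation above can be sketched as rejection sampling against a list of busy (glitch or already-accepted null) segments; all names here are hypothetical and the actual implementation may differ.

```python
import numpy as np

def make_null_samples(epoch_start, epoch_end, target_durations,
                      busy_segments, n_samples, seed=0):
    """Illustrative sketch: draw random (start, end) null segments inside
    an epoch, with durations resampled from the target-glitch duration
    distribution, rejecting any candidate that overlaps a busy segment
    (a glitch or a previously accepted null sample)."""
    rng = np.random.default_rng(seed)
    accepted = []
    busy = list(busy_segments)
    while len(accepted) < n_samples:
        dur = rng.choice(target_durations)          # resample a duration
        start = rng.uniform(epoch_start, epoch_end - dur)
        end = start + dur
        # reject on overlap with any busy segment
        if any(start < b_end and end > b_start for b_start, b_end in busy):
            continue
        accepted.append((start, end))
        busy.append((start, end))                   # nulls must not overlap
    return accepted

# Hypothetical epoch with one known glitch at 100-105 s.
nulls = make_null_samples(0.0, 1000.0, [0.5, 1.0, 2.0],
                          busy_segments=[(100.0, 105.0)], n_samples=5)
```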

  4. Statistics mode
    Description:
    • It finds channels that are responsible for the target glitches based on various statistical methods using the null samples.

    • Thanks to the null samples, it can find channels that are not obvious by eye.

    • A user must supply a confidence level used to decide which channels reject the null hypothesis.

    • Note that each statistical test is performed for each channel (in each frequency band) independently.

    Processes:
    • It performs a one-sided binomial test on the target glitches as a whole. It also calculates the "witness ratio statistic" (WRS), which is the fraction of the target samples with importance above a threshold, divided by the sum of the fractions of target and null samples above that threshold. The threshold is set to the mean importance of the null samples, based on the experiments conducted so far. Strictly, the threshold should be chosen from a separate set of null samples to obtain unbiased values; however, if the number of null samples is sufficiently large, both sets of null samples are expected to converge to the same threshold. Although the threshold is therefore not truly independent of the null samples used in the statistical test, the WRS is approximately the probability that a channel has glitches in coincidence with the target glitches in h(t).

    • It performs the one-sided Welch's t-test on the target glitches as a whole, makes a plot, and saves the table as a .csv file.

    • It makes a plot combining WRS and t-value to identify the channels that reject both the binomial test and the t-test.

    • It incorporates the binomial test and t-test into the clustering to reduce the number of redundant sub-classes.

    • It analyzes each individual glitch by means of a chi-square test and makes the corresponding plots.
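The witness ratio statistic (WRS) described above can be sketched as follows; the threshold convention (mean importance of the null samples) follows the text, but the function and example values are illustrative, not the package's code.

```python
import numpy as np

def witness_ratio_statistic(target_importance, null_importance):
    """Illustrative WRS: the fraction of target samples above a
    threshold, divided by the sum of the target and null fractions above
    that threshold, where the threshold is the mean importance of the
    null samples."""
    target = np.asarray(target_importance, dtype=float)
    null = np.asarray(null_importance, dtype=float)
    threshold = null.mean()
    f_target = np.mean(target > threshold)
    f_null = np.mean(null > threshold)
    denom = f_target + f_null
    return f_target / denom if denom > 0 else 0.0

# A channel whose target importances clearly exceed the null background:
# f_target = 1.0, f_null = 0.5 here, so WRS = 1.0 / 1.5.
wrs = witness_ratio_statistic([0.8, 0.9, 0.7], [0.1, 0.2, 0.15, 0.05])
```

As the null fraction above threshold shrinks, the WRS approaches 1, consistent with its interpretation as an approximate coincidence probability.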

  5. WitnessFlag mode
    Description:
    • The methodology consists of two major parts: 1) finding witness channels, and 2) constructing flags with the chosen witness channels. The witness channels are determined by statistical tests, which automatically reduce the list of channels and terminate the analysis. Flags can be determined using multiple channels. The following process uses null samples, i.e., samples taken at quiet times.

    Processes:
    Finding witness channels:
    • shuffle a list of glitches

    • take a few glitches and analyze them with all the safe channels

    • perform the one-sided binomial test and one-sided Welch’s t-test against null samples

    • reject the channels which do NOT pass both tests, i.e., which cannot reject the hypothesis that a channel is consistent with the null samples

    • calculate the error ratio of the t-value of the top-ranking channel to its previous t-value

    • analyze the next glitch using the channels that pass both the tests

    • add the values of importance to the target samples

    • repeat steps (3)-(7) above

    • terminate the process when the error ratio reaches the tolerance
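The error-ratio termination criterion used in the loop above might look like the following sketch; the function name and tolerance value are assumptions.

```python
def converged(t_value_history, tolerance=0.05):
    """Illustrative stopping criterion for the iterative witness search:
    compare the top-ranking channel's t-value with its value from the
    previous iteration and stop once the relative change (the "error
    ratio") falls below a tolerance."""
    if len(t_value_history) < 2:
        return False
    prev, curr = t_value_history[-2], t_value_history[-1]
    error_ratio = abs(curr - prev) / abs(prev)
    return error_ratio <= tolerance

# t-values of the top channel after each batch of analyzed glitches:
history = [4.0, 5.0, 5.1]  # hypothetical values
```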

    Finding flags:
    • select high-ranking witness channels

    • determine the upper cut for those high-ranking witness channels using the null samples

    • analyze all the glitches using the selected witness channels

    • make a flag when those channels give importance above the upper cut of the null samples

    • calculate efficiency and deadtime
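The efficiency and deadtime figures of merit in the last step can be sketched as below, assuming non-overlapping flag segments; the function and example numbers are hypothetical.

```python
def efficiency_and_deadtime(glitch_times, flag_segments, total_time):
    """Illustrative figures of merit: efficiency is the fraction of
    glitches falling inside a flag segment; deadtime is the fraction of
    the analyzed time covered by flags (assuming non-overlapping flag
    segments)."""
    flagged = sum(
        any(start <= t <= end for start, end in flag_segments)
        for t in glitch_times
    )
    efficiency = flagged / len(glitch_times)
    deadtime = sum(end - start for start, end in flag_segments) / total_time
    return efficiency, deadtime

# Two of four glitches fall inside flags covering 4 s of a 100 s stretch.
eff, dt = efficiency_and_deadtime(
    glitch_times=[10.0, 20.0, 30.0, 40.0],
    flag_segments=[(9.0, 11.0), (19.0, 21.0)],
    total_time=100.0,
)
```

A good flag has high efficiency at low deadtime; the trade-off is controlled by the upper cut determined from the null samples.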

origli.utilities.const

Script name: const.py

Description:

File containing the list of glitch names. This is intended to be used by import_data_hdf5.py under the bin directory.

origli.utilities.multiband_search_utilities

file name: multiband_search_utilities.py

This file contains the utilities used for the multi-frequency band search.

origli.utilities.multiband_search_utilities.CreateAllChannels_rho_multband(Listsegment, IFO, re_sfchs, number_process, PlusHOFT, sigma)[source]
description:
  1. use a single glitch time

  2. query timeseries of all the channels around a glitch

  3. condition (whitening and compare the on- and off-source window)

  4. quantify all the channels (compute values of importance of all the channels)

USAGE: IndexSatisfied, Mat_Count_in_multibands, list_sample_rates, re_sfchs, gpstime, duration, SNR, confi, ID = CreateAllChannels_rho_multband(Listsegment, IFO, re_sfchs, number_process, PlusHOFT, sigma)

Parameters

Listsegment – a list of segment parameters

IFO – a type of interferometer (H1, L1, V1)

channels – a list of safe channels

number_process – a number of processes in parallel

PlusHOFT – whether to get data of h(t), {'True' or 'False'}

sigma – an integer to be used for calculating values of importance

IndexSatisfied: glitch index
Mat_Count_in_multibands: a matrix of rho with frequencies in rows and channels in columns, numpy array
list_sample_rates: a list of sampling rates of channels, numpy array
re_sfchs: a list of channels without "IFO:" at the beginning
gpstime: a GPS time
duration: a value of duration
SNR: signal-to-noise ratio in h(t)
confi: a confidence level of the classification of a glitch, provided by Gravity Spy; otherwise None
ID: a glitch ID, usually provided by Gravity Spy

origli.utilities.multiband_search_utilities.CreateRho_multiband(full_timeseries, target_timeseries_start, target_timeseries_end, pre_background_start, pre_background_end, fol_background_start, fol_background_end, sigma)[source]
description:
  1. calculate the whitened FFT of the on- and off-source window for a single channel

  2. compute the value of importance for a single channel

USAGE: Counts_in_multibands, sample_rate = CreateRho_multiband(full_timeseries, target_timeseries_start, target_timeseries_end, pre_background_start, pre_background_end, fol_background_start, fol_background_end, sigma)

Parameters
  • full_timeseries – the full time series as a gwpy object, including the on- and off-source windows

  • target_timeseries_start – the start time of the on-source window

  • target_timeseries_end – the end time of the on-source window

  • pre_background_start – the start time of the preceding off-source window

  • pre_background_end – the end time of the preceding off-source window

  • fol_background_start – the start time of the following off-source window

  • fol_background_end – the end time of the following off-source window

  • sigma – an integer used to calculate the value of importance

Returns

Counts_in_multibands: values of importance in different frequency bands, where importance is the fraction of frequency bins in a frequency range above an upper bound of the off-source window for a single channel
sample_rate: a sampling rate of a single channel

origli.utilities.multiband_search_utilities.HierarchyChannelAboveThreshold_single_channel_multiband(whitened_fft_target, whitened_fft_PBG, whitened_fft_FBG, duration, sampling_rate, sigma)[source]
description:

calculate values of importance (the fraction of frequency bins in a frequency range above an upper bound of the off-source window) for a single channel, in different frequency bands

USAGE: Counts_in_multibands = HierarchyChannelAboveThreshold_single_channel_multiband(whitened_fft_target, whitened_fft_PBG, whitened_fft_FBG, duration, sampling_rate, sigma)

Parameters
  • whitened_fft_target – whitened fft of the on-source window

  • whitened_fft_PBG – whitened fft of the preceding off-source window

  • whitened_fft_FBG – whitened fft of the following off-source window

  • duration – a duration of the on-source window

  • sampling_rate – sampling rate of a channel

  • sigma – an integer to determine the upper bound of the off-source window

Returns

Counts_in_multibands: values of importance in different frequency bands, numpy array
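The band-wise importance computation this function performs can be sketched as follows; the mean + sigma * std upper bound, the example band edges, and all names are assumptions rather than the actual implementation.

```python
import numpy as np

def importance_per_band(on_psd, off_psd, freqs, freq_bands, sigma=3):
    """Illustrative sketch: within each frequency band, count the
    fraction of on-source bins above an upper bound (assumed here to be
    mean + sigma * std) of the off-source bins in that band."""
    counts = []
    for f_lo, f_hi in freq_bands:
        mask = (freqs >= f_lo) & (freqs < f_hi)
        off = off_psd[mask]
        threshold = off.mean() + sigma * off.std()
        counts.append(np.mean(on_psd[mask] > threshold))
    return np.array(counts)

# Synthetic spectrum with excess power only in the 128-256 Hz band.
freqs = np.linspace(0, 512, 1024)
rng = np.random.default_rng(1)
off = np.abs(rng.normal(size=1024))
on = off.copy()
on[(freqs >= 128) & (freqs < 256)] += 10.0
rho = importance_per_band(on, off, freqs, [[1, 128], [128, 256], [256, 512]])
```

Only the band containing the excess power yields an importance near 1; the other bands stay near 0, which is what makes the band-resolved search informative.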

class origli.utilities.multiband_search_utilities.PlotTableAnalysis_multiband[source]

Bases: object

CreateMatCount_multiband(g_Individual=None)[source]
description:
  1. find counts for each glitch

  2. stack over all the glitches

  3. make a matrix comprising importance versus channels

dependencies: self.HierarchyChannelAboveThreshold(g, LowerCutOffFreq, UpperCutOffFreq)

USAGE: MatCount, ListChannelName, ListSNR, ListConf, ListGPS, ListDuration, ListID, mat_rho, freq_bands, ListOriginalChannelName = CreateMatCount_multiband() # for all glitches

MatCount, ListChannelName, SNR, Conf, GPS, Duration, ID, mat_rho, freq_bands, ListOriginalChannelName = CreateMatCount_multiband(g_Individual) # for an individual glitch

Parameters

g_Individual – a HDF5 file group for a glitch

Returns

MatCount: a matrix comprising importance versus channels
ListChannelName: a list of channel names, combined with frequency band information
ListSNR: a list of SNRs
ListConf: a list of confidence levels
ListGPS: a list of GPS times
ListDuration: a list of durations
ListID: a list of IDs
mat_rho: a matrix of rho with frequencies in rows and channels in columns, numpy array
freq_bands: a matrix of frequency bands
ListOriginalChannelName: a list of original channel names

PlotCausalityVSChannelMultiBand(list_Causal_passed, list_Causal_fail, list_causal_passed_err, list_causal_failed_err, list_Test, ListChannelName, output_dir, output_file, BinomialTestConfidence, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:

plot the witness ratio statistics of channels in multiple frequency bands; the cells which do not pass the test are masked

USAGE: PlotCausalityVSChannelMultiBand(list_Causal_passed, list_Causal_fail, list_causal_passed_err, list_causal_failed_err, list_Test, ListChannelName, output_dir, output_file, BinomialTestConfidence)

Parameters
  • list_Causal_passed – a list of the causal probabilities that passed the one-tailed binomial test; zero otherwise

  • list_Causal_fail – a list of the causal probabilities that failed the one-tailed binomial test; zero otherwise

  • list_causal_passed_err – a list of the errors of the causal probabilities that passed the one-tailed binomial test; zero otherwise

  • list_causal_failed_err – a list of the errors of the causal probabilities that failed the one-tailed binomial test; zero otherwise

  • list_Test – a list of results of the Binomial test, ‘pass’ or ‘fail’

  • ListChannelName – a list of channel names

  • output_dir – (only used for all glitches)

  • output_file – (only used for all glitches)

  • BinomialTestConfidence – binomial test confidence level

  • freq_bands – frequency bands used for the multi-frequency band search, which is defined in const.py

Returns

None

PlotCausalityVSChannelMultiBandNoMaskNoTable(list_Causal_passed, list_Causal_fail, list_causal_passed_err, list_causal_failed_err, list_Test, ListChannelName, output_dir, output_file, BinomialTestConfidence, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:

plot the witness ratio statistics of channels in multiple frequency bands, without mask and table

USAGE: PlotCausalityVSChannelMultiBandNoMaskNoTable(list_Causal_passed, list_Causal_fail, list_causal_passed_err, list_causal_failed_err, list_Test, ListChannelName, output_dir, output_file, BinomialTestConfidence)

Parameters
  • list_Causal_passed – a list of the causal probabilities that passed the one-tailed binomial test; zero otherwise

  • list_Causal_fail – a list of the causal probabilities that failed the one-tailed binomial test; zero otherwise

  • list_causal_passed_err – a list of the errors of the causal probabilities that passed the one-tailed binomial test; zero otherwise

  • list_causal_failed_err – a list of the errors of the causal probabilities that failed the one-tailed binomial test; zero otherwise

  • list_Test – a list of results of the Binomial test, ‘pass’ or ‘fail’

  • ListChannelName – a list of channel names

  • output_dir – (only used for all glitches)

  • output_file – (only used for all glitches)

  • BinomialTestConfidence – binomial test confidence level

  • freq_bands – frequency bands used for the multi-frequency band search, which is defined in const.py

Returns

None

PlotFrequencyVSChannel(glitchtype, SNR, Conf, GPS, Duration, ID, URL, mat_rho, ListOriginalChannelName, freq_bands, output_dir, output_file)[source]
description:

make a plot of frequencies versus channels for a glitch

dependencies: CreateChannelTicks()

USAGE: PlotFrequencyVSChannel(glitchtype, SNR, Conf, GPS, Duration, ID, URL, mat_rho, ListOriginalChannelName, freq_bands, output_dir, output_file)

Parameters
  • glitchtype – a type of a glitch

  • SNR – SNR in h(t)

  • Conf – classification confidence level

  • GPS – a gps time

  • Duration – a duration of a glitch

  • ID – Gravity Spy ID

  • URL – the Q-transform in h(t) of a glitch stored in Gravity Spy

  • mat_rho – a matrix of rho with frequencies in rows and channels in columns, numpy array

  • ListOriginalChannelName – a list of original channel names

  • freq_bands – a matrix of frequency bands

  • output_dir

  • output_file

Returns

None

PlotIndividualFCS_ImportanceVSChannel_multiband(glitchtype, IFO, GravitySpy_df, output_dir, mode='offline', sigma=None, Listsegments=None, re_sfchs=None, Data_outputpath=None, Data_outputfilename=None, PlusHOFT='False', number_process=None)[source]
description:
  1. load a file comprising all glitches in a class

  2. create a plot comprising frequency versus channel & importance versus channel
    • dependency:

      self.CreateChannelTicks(ListChannel), self.make_subset_channel_based_on_samplingrate(), self.CreateMatCount(), self.PlotImportanceVSChannel()

  3. save a plot

dependencies: make_subset_channel_based_on_samplingrate(), CreateChannelTicks(), CreateMatCount(), PlotImportanceVSChannel()

USAGE: PlotIndividualFCS_ImportanceVSChannel_multiband(glitchtype, IFO, GravitySpy_df, output_dir, mode='offline', sigma=None, Listsegments=None, re_sfchs=None, Data_outputpath=None, Data_outputfilename=None, PlusHOFT='False', number_process=None)

Parameters
  • glitchtype – a type of glitch, used to create the name of a plot

  • IFO – a type of IFO, used in the name of a plot

  • GravitySpy_df – Gravity Spy metadata in a pandas DataFrame

  • output_dir – an output directory

  • mode – ‘offline’ or ‘online’

  • sigma – an integer to determine the upper bound of the off-source window

  • Listsegments – a list of allowed glitches, used for online mode only, None by default

  • re_sfchs – a list of safe channels excluding unused channels, used for online mode only, None by default

  • Data_outputpath – a directory for saving an HDF5 file, used for online mode only, None by default

  • Data_outputfilename – a file name for saving an HDF5 file, used for online mode only, None by default

  • PlusHOFT – whether to get data of h(t), {'True', 'False'}, used for online mode only, 'False' by default

  • number_process – a number of processes in parallel, used for online mode only, None by default

Returns

None

Plot_WRS_Welch_t_test_MultiBand(channels, list_Causal_passed, list_Causal_fail, list_Test_binomial, list_t_values_passed, list_t_values_failed, list_t_Test, confidence_level, output_dir, output_file, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:

plot the combined result of the WRS and the one-sided Welch t-test

USAGE: Plot_WRS_Welch_t_test_MultiBand(channels, list_Causal_passed, list_Causal_fail, list_Test_binomial, list_t_values_passed, list_t_values_failed, list_t_Test, confidence_level, output_dir, output_file)

Parameters
  • channels – a list of channels

  • list_t_values_passed – a list of t-values that pass the test

  • list_t_values_failed – a list of t-values that fail the test

  • list_Test – a list of the test results {‘pass’, ‘fail’}

  • confidence_level – a confidence level

  • output_dir – an output directory

  • output_file – an output file name

  • freq_bands – frequency bands used for the multi-frequency band search, which is defined in const.py

Returns

None

Plot_Welch_t_test_MultiBand(channels, list_t_values_passed, list_t_values_failed, list_Test, confidence_level, output_dir, output_file, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:

plot the result of one-sided Welch t-test

USAGE: Plot_Welch_t_test_MultiBand(channels, list_t_values_passed, list_t_values_failed, list_Test, confidence_level, output_dir, output_file)

Parameters
  • channels – a list of channels

  • list_t_values_passed – a list of t-values that pass the test

  • list_t_values_failed – a list of t-values that fail the test

  • list_Test – a list of the test results {‘pass’, ‘fail’}

  • confidence_level – a confidence level

  • output_dir – an output directory

  • output_file – an output file name

  • freq_bands – frequency bands used for the multi-frequency band search, which is defined in const.py

Returns

None

Plot_p_greater_MultiBand(channels, list_p_greater_passed, list_p_greater_failed, list_Test, confidence_level, output_dir, output_file, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:

plot the result of p_greater

USAGE: Plot_p_greater_MultiBand(channels, list_p_greater_passed, list_p_greater_failed, list_Test, confidence_level, output_dir, output_file)

Parameters
  • channels – a list of channels

  • list_p_greater_passed – a list of p_greater that pass the test

  • list_p_greater_failed – a list of p_greater that fail the test

  • list_Test – a list of the test results {‘pass’, ‘fail’}

  • confidence_level – a confidence level

  • output_dir – an output directory

  • output_file – an output file name

  • freq_bands – frequency bands used for the multi-frequency band search, which is defined in const.py

Returns

None

ReconstructFromFlattenedList(flattened_list, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:

reconstruct the matrix in its original order from the flattened list

USAGE: mat_originl, mat_flipped = ReconstructFromFlattenedList(flattened_list, freq_bands=Const.freq_bands)

Parameters
  • flattened_list – a flattened list obtained from the matrix using np.flatten(order='F')

  • freq_bands – frequency bands

Returns

mat_originl: a matrix with frequency bands in rows from top to bottom and channels in columns
mat_flipped: a matrix with frequency bands in rows from bottom to top and channels in columns
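Since the list is produced with numpy's flatten(order='F'), the reconstruction amounts to a column-major reshape plus a vertical flip; the following sketch (with invented names) illustrates the idea.

```python
import numpy as np

def reconstruct(flattened, n_bands):
    """Illustrative sketch: rebuild the (bands x channels) matrix from a
    list flattened with numpy's flatten(order='F') (column-major), plus
    the vertically flipped copy used for plotting."""
    flat = np.asarray(flattened)
    mat_original = flat.reshape((n_bands, -1), order="F")
    mat_flipped = np.flipud(mat_original)
    return mat_original, mat_flipped

# Round trip: flatten a 2-band x 3-channel matrix column-major, then rebuild.
mat = np.array([[1, 2, 3],
                [4, 5, 6]])
flat = mat.flatten(order="F")
orig, flipped = reconstruct(flat, n_bands=2)
```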

getHDF5Object()[source]

show an HDF5 file object

USAGE: getHDF5Object()

Returns

a dictionary

setHDF5Object(input_dir, input_file)[source]

set a HDF5 file object

Parameters

input_dir – an input directory containing an HDF5 file

input_file – an input file name of an HDF5 file

Returns

None

origli.utilities.multiband_search_utilities.SaveTargetAndBackGroundHDF5_multiband_OFFLINE(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, sigma, PlusHOFT='False')[source]
description:

THIS IS USED FOR "OFFLINE" MODE
  0. assume Listsegments is given by Findglitchlist()
  1. take the information of the list of allowed targets and the preceding and following segments
  2. whiten a target segment based on the average background segment
  3. compute the whitened FFT
  4. save the whitened target and background FFTs

Note: this depends on Multiprocess_whitening()

USAGE: SaveTargetAndBackGroundHDF5_multiband_OFFLINE(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, sigma, PlusHOFT='False')

Parameters
  • Listsegments – a list of segment parameters

  • channels – a list of safe channels

  • outputpath – a directory of an output file

  • outputfilename – a name of an output file

  • number_process – a number of processes in parallel

  • sigma – an integer to determine the upper bound of the off-source window

  • PlusHOFT – whether to get data of h(t), {'True' or 'False'}, 'False' by default

Returns

None

origli.utilities.multiband_search_utilities.SaveTargetAndBackGroundHDF5_multiband_ONLINE(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, sigma, PlusHOFT)[source]
description:

THIS IS USED FOR "ONLINE" MODE
  0. assume Listsegments is given by Findglitchlist()
  1. take the information of the list of allowed targets and the preceding and following segments
  2. whiten a target segment based on the average background segment
  3. compute the whitened FFT
  4. save the whitened target and background FFTs

Note: this depends on Multiprocess_whitening()

USAGE: SaveTargetAndBackGroundHDF5_multiband_ONLINE(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, sigma, PlusHOFT)

Parameters
  • Listsegments – a list of segment parameters

  • channels – a list of safe channels

  • outputpath – a directory of an output file

  • outputfilename – a name of an output file

  • number_process – a number of processes in parallel

  • sigma – an integer to determine the upper bound of the off-source window

  • PlusHOFT – whether to get data of hoft, {‘True’ or ‘False’}

Returns

None

origli.utilities.utilities

Script name: utilities.py

Description:

File containing utilities

origli.utilities.utilities.FindBGlist(state, number_trials, step, outputMother_dir, df, Epoch_lt, TargetGlitchClass, IFO, BGSNR_thre, targetSNR_thre, Confidence_thre, UpperDurationThresh, LowerDurationThresh, UserDefinedDuration, gap, TriggerPeakFreqLowerCutoff=0, TriggerPeakFreqUpperCutoff=8192, targetUpperSNR_thre=inf, flag='Both')[source]
description:
  1. load Gravity Spy data set (.csv file)

  2. get the data about the target glitch class

  3. get the subset of the target glitches based on the SNR and confidence-level thresholds a user defines

  4. accept glitches whose background segments do not coincide with any other glitches

  5. return the info of the accepted glitches

USAGE: Listsegments = FindBGlist(state, number_trials, step, outputMother_dir, df, Epoch_lt, TargetGlitchClass, IFO, BGSNR_thre, targetSNR_thre, Confidence_thre, UpperDurationThresh, LowerDurationThresh, UserDefinedDuration, gap)

Parameters
  • df – GravitySpy meta data in pandas format

  • Epochstart – starting time of an epoch

  • Epochend – end time of an epoch

  • Commissioning_lt – commissioning time in list

  • TargetGlitchClass – a target glitch class name (str)

  • IFO – a type of interferometer (H1, L1, V1) (str)

  • BGSNR_thre – an upper threshold of SNR for background glitches (i.e., quiet enough), float or int

  • targetSNR_thre – a lower threshold of SNR for target glitches, float or int

  • Confidence_thre – a threshold of confidence level (float or int)

  • UpperDurationThresh – an upper bound of duration in sec (float or int)

  • LowerDurationThresh – a lower bound of duration in sec (float or int)

  • UserDefinedDuration – user-defined duration of a glitch (float or int), 0 by default

  • gap – a time gap between the target and the background segments in sec, 1 sec by default

  • flag – 'Both' or 'Either', taking both backgrounds or either the preceding or the following background, respectively, to accept glitches

Returns

the list of parameters of glitches passing the above thresholds; Listsegments consists of

ListIndexSatisfied: a list of indices of glitches
Listtarget_timeseries_start: a list of target glitch start times
Listtarget_timeseries_end: a list of target glitch end times
Listpre_background_start: a list of preceding background start times
Listpre_background_end: a list of preceding background end times
Listfol_background_start: a list of following background start times
Listfol_background_end: a list of following background end times
Listgpstime: a list of GPS times
Listduration: a list of durations
ListSNR: a list of SNRs
Listconfi: a list of confidence levels
ListID: a list of IDs

origli.utilities.utilities.FindRadomlistPoints(state, IFO, Epoch_lt, number_samples, step, outputMother_dir, df_target)[source]
description:
  1. within an epoch, create a list of synthetic points at randomly chosen times, with durations following the duration distribution of the target glitch class

  2. make a pandas DataFrame dataset

USAGE: df = FindRadomlistPoints(state, IFO, Epoch_lt, number_samples, step, outputMother_dir, df_target)

Parameters
  • state – IFO state {observing, nominal-lock}

  • IFO – an observatory {H1, L1}

  • Epoch_lt – a list of epochs

  • number_samples – number of samples picked up

  • step – step of data points in sec

  • outputMother_dir – an output directory in which the data set is placed

  • df_target – true glitch samples generated by an ETG with SNR above an upper threshold of background

Returns

df: synthetic random data points within an epoch with durations generated from a distribution of a target glitch

origli.utilities.utilities.FindShiftedPoints(state, IFO, Epoch_lt, number_samples, step, outputMother_dir, df_target)[source]
description:
  1. within an epoch, create a list of synthetic points by shifting a target glitch class

  2. make a pandas DataFrame dataset

USAGE: df = FindShiftedPoints(state, IFO, Epoch_lt, number_samples, step, outputMother_dir, df_target)

Parameters
  • state – IFO state {observing, nominal-lock}

  • IFO – an observatory {H1, L1}

  • Epoch_lt – a list of epochs

  • number_samples – number of samples picked up

  • step – step of data points in sec

  • outputMother_dir – an output directory in which the data set is placed

  • df_target – true glitch samples generated by an ETG with SNR above an upper threshold of background

Returns

df: synthetic random data points within an epoch with durations generated from a distribution of a target glitch

origli.utilities.utilities.Findglitchlist(df, Epoch_lt, TargetGlitchClass, IFO, BGSNR_thre, targetSNR_thre, Confidence_thre, UserDefinedDuration, UpperDurationThresh, LowerDurationThresh, gap, position_duration_bfr_centr, TriggerPeakFreqLowerCutoff=0, TriggerPeakFreqUpperCutoff=8192, targetUpperSNR_thre=inf, flag='Both')[source]
description:
  1. load Gravity Spy data set (.csv file)

  2. get the data about the target glitch class

  3. get the subset of the target glitches based on the SNR and confidence-level thresholds a user defines

  4. accept glitches whose background segments do not coincide with any other glitches

  5. return the info of the accepted glitches

USAGE: Listsegments = Findglitchlist(df, Epoch_lt, TargetGlitchClass, IFO, BGSNR_thre, targetSNR_thre, Confidence_thre, UserDefinedDuration, UpperDurationThresh, LowerDurationThresh, gap, position_duration_bfr_centr)

Parameters
  • df – GravitySpy meta data in pandas format

  • Epochstart – starting time of an epoch

  • Epochend – end time of an epoch

  • Commissioning_lt – commissioning time in list

  • TargetGlitchClass – a target glitch class name (str)

  • IFO – a type of interferometer (H1, L1, V1) (str)

  • BGSNR_thre – an upper threshold of SNR for background glitches (i.e., quiet enough), float or int

  • targetSNR_thre – a lower threshold of SNR for target glitches, float or int

  • Confidence_thre – a threshold of confidence level (float or int)

  • UpperDurationThresh – an upper bound of duration in sec (float or int)

  • LowerDurationThresh – a lower bound of duration in sec (float or int)

  • UserDefinedDuration – user-defined duration of a glitch (float or int), 0 by default

  • gap – a time gap between the target and the background segments in sec, 1 sec by default

  • position_duration_bfr_centr – the proportion of the duration placed before the center time of a target segment; e.g., 0.5 indicates the duration is evenly distributed around the center time, and 0.83 indicates that 5/6 of it is before the center time

  • TriggerPeakFreqLowerCutoff – a lower limit cutoff value of the peak frequency of triggers given by an ETG for target glitches

  • TriggerPeakFreqUpperCutoff – an upper cutoff value of the peak frequency of triggers given by an ETG for target glitches

  • targetUpperSNR_thre – an upper cutoff value of SNR of triggers given by an ETG for target glitches

  • flag – 'Both' or 'Either', taking both backgrounds or either the preceding or the following background, respectively, to accept glitches

Returns

the list of parameters of glitches passing the above thresholds; Listsegments consists of

ListIndexSatisfied: a list of indices of glitches
Listtarget_timeseries_start: a list of target glitch start times
Listtarget_timeseries_end: a list of target glitch end times
Listpre_background_start: a list of preceding background start times
Listpre_background_end: a list of preceding background end times
Listfol_background_start: a list of following background start times
Listfol_background_end: a list of following background end times
Listgpstime: a list of GPS times
Listduration: a list of durations
ListSNR: a list of SNRs
Listconfi: a list of confidence levels
ListID: a list of IDs

origli.utilities.utilities.FindglitchlistLongestBG(df, Epoch_lt, TargetGlitchClass, IFO, BGSNR_thre, targetSNR_thre, Confidence_thre, UpperDurationThresh, LowerDurationThresh, UserDefinedDuration, gap, position_duration_bfr_centr, TriggerPeakFreqLowerCutoff=0, TriggerPeakFreqUpperCutoff=8192, targetUpperSNR_thre=inf, flag='Both')[source]
description:
  1. load Gravity Spy data set (.csv file)

  2. get the data about the target glitch class

  3. get the subset of the target glitches based on the SNR and confidence-level thresholds a user defines

  4. accept glitches whose background segments do not coincide with any other glitches

  5. return the info of the accepted glitches

USAGE: Listsegments = FindglitchlistLongestBG(df, Epoch_lt, TargetGlitchClass, IFO, BGSNR_thre, targetSNR_thre, Confidence_thre, UpperDurationThresh, LowerDurationThresh, UserDefinedDuration, gap, position_duration_bfr_centr, flag)

Parameters
  • df – GravitySpy meta data in pandas format

  • Epoch_lt – a list of epoch [start, end] GPS times

  • TargetGlitchClass – a target glitch class name (str)

  • IFO – a type of interferometer (H1, L1, V1) (str)

  • BGSNR_thre – an upper threshold of SNR for background glitches (i.e., quiet enough), float or int

  • targetSNR_thre – a lower threshold of SNR for target glitches, float or int

  • Confidence_thre – a threshold of confidence level (float or int)

  • UpperDurationThresh – an upper bound of duration in sec (float or int)

  • LowerDurationThresh – a lower bound of duration in sec (float or int)

  • UserDefinedDuration – user-defined duration of a glitch (float or int), 0 by default

  • gap – a time gap between the target and the background segments in sec, 1 sec by default

  • position_duration_bfr_centr – proportion of the duration of a target segment around the center time, e.g., 0.5 indicates the duration is evenly distributed around the center time, and 0.83 indicates 5/6 of it is before the center time

  • TriggerPeakFreqLowerCutoff – a lower cutoff value of the peak frequency of triggers given by an ETG for target glitches

  • TriggerPeakFreqUpperCutoff – an upper cutoff value of the peak frequency of triggers given by an ETG for target glitches

  • targetUpperSNR_thre – an upper cutoff value of SNR of triggers given by an ETG for target glitches

  • flag – ‘Both’ or ‘Either’: accept glitches requiring both backgrounds, or either the preceding or the following background, respectively

Returns

the list of parameters of glitches passing the above thresholds; Listsegments consists of

ListIndexSatisfied: a list of indices of glitches
Listtarget_timeseries_start: a list of target glitch start times
Listtarget_timeseries_end: a list of target glitch end times
Listpre_background_start: a list of preceding background start times
Listpre_background_end: a list of preceding background end times
Listfol_background_start: a list of following background start times
Listfol_background_end: a list of following background end times
Listgpstime: a list of GPS times
Listduration: a list of durations
ListSNR: a list of SNRs
Listconfi: a list of confidence levels
ListID: a list of IDs

origli.utilities.utilities.Findglitchlist_for_timeseries_analysis(df, Epoch_lt, TargetGlitchClass, IFO, BGSNR_thre, targetSNR_thre, Confidence_thre, UpperDurationThresh, LowerDurationThresh, UserDefinedDuration, gap, position_duration_bfr_centr, TriggerPeakFreqLowerCutoff=0, TriggerPeakFreqUpperCutoff=8192, targetUpperSNR_thre=inf, flag='Both')[source]
description:
  1. load Gravity Spy data set (.csv file)

  2. get the data about the target glitch class

  3. get the subset of the target glitches based on the SNR and confidence-level thresholds a user defines

  4. accept glitches whose background segments do not coincide with any other glitches

  5. return the info of the accepted glitches

USAGE: Listsegments = Findglitchlist_for_timeseries_analysis(df, Epoch_lt, TargetGlitchClass, IFO, BGSNR_thre, targetSNR_thre, Confidence_thre, UpperDurationThresh, LowerDurationThresh, UserDefinedDuration, gap, position_duration_bfr_centr, TriggerPeakFreqLowerCutoff=0, TriggerPeakFreqUpperCutoff=8192, targetUpperSNR_thre=np.inf, flag=‘Both’)

Parameters
  • df – GravitySpy meta data in pandas format

  • Epoch_lt – a list of epoch [start, end] GPS times

  • TargetGlitchClass – a target glitch class name (str)

  • IFO – a type of interferometer (H1, L1, V1) (str)

  • BGSNR_thre – an upper threshold of SNR for background glitches (i.e., quiet enough), float or int

  • targetSNR_thre – a lower threshold of SNR for target glitches, float or int

  • Confidence_thre – a threshold of confidence level (float or int)

  • UpperDurationThresh – an upper bound of duration in sec (float or int)

  • LowerDurationThresh – a lower bound of duration in sec (float or int)

  • UserDefinedDuration – user-defined duration of a glitch (float or int), 0 by default

  • gap – a time gap between the target and the background segments in sec, 1 sec by default

  • position_duration_bfr_centr – proportion of the duration of a target segment around the center time, e.g., 0.5 indicates the duration is evenly distributed around the center time, and 0.83 indicates 5/6 of it is before the center time

  • TriggerPeakFreqLowerCutoff – a lower cutoff value of the peak frequency of triggers given by an ETG for target glitches

  • TriggerPeakFreqUpperCutoff – an upper cutoff value of the peak frequency of triggers given by an ETG for target glitches

  • targetUpperSNR_thre – an upper cutoff value of SNR of triggers given by an ETG for target glitches

  • flag – ‘Both’ or ‘Either’: accept glitches requiring both backgrounds, or either the preceding or the following background, respectively

Returns

the list of parameters of glitches passing the above thresholds; Listsegments consists of

ListIndexSatisfied: a list of indices of glitches
Listtarget_timeseries_start: a list of target glitch start times
Listtarget_timeseries_end: a list of target glitch end times
Listpre_background_start: a list of preceding background start times
Listpre_background_end: a list of preceding background end times
Listfol_background_start: a list of following background start times
Listfol_background_end: a list of following background end times
Listgpstime: a list of GPS times
Listduration: a list of durations
ListSNR: a list of SNRs
Listconfi: a list of confidence levels
ListID: a list of IDs

origli.utilities.utilities.GrabGPStimesSafechannel(fileid, ifo, SNRthre, glitch, PathSafeChannel, Epochstart, Epochend, Commissioning_lt=None)[source]

description: This imports a file containing the output of GravitySpy.

It takes the GPS times during the O2 run, takes the list of safe channels, and modifies the list so that it works with gwpy.

USAGE: GPSs, ids, re_sfchs = GrabGPStimesSafechannel(‘/home/kentaro.mogushi/longlived/MachineLearningJointPisaUM/dataset/GravityspyTrainingset/gspy-db-20180813.csv’, ‘L1’, 7, ‘Blip’, ‘/home/kentaro.mogushi/longlived/MachineLearningJointPisaUM/dataset/ListSaveChannel/L1/O2_omicron_channel_list_hvetosafe_GDS.txt’)

Parameters
  • fileid – a file that contains all the metadata of glitches used for the GravitySpy training set

  • ifo – a kind of interferometer {L1, H1, V1}

  • SNRthre – the minimum threshold of SNR, e.g., 7

  • glitch – a kind of glitch

  • PathSafeChannel – the full path of the file of the metadata

  • Epochstart – GPS time when a science run begins, float or int

  • Epochend – GPS time when a science run ends, float or int

  • Commissioning_lt – the set of commissioning times in a list of lists, e.g., [[Cstart1, Cend1], [Cstart2, Cend2]], None by default

Returns

GPSs: a list of GPS times
ids: a list of unique IDs
re_sfchs: a list of safe channels

origli.utilities.utilities.GrabSafechannel(PathSafeChannel)[source]
description:

take the list of the safe channels and modify it so that it works with gwpy

USAGE: re_sfchs = GrabSafechannel(‘/home/kentaro.mogushi/longlived/MachineLearningJointPisaUM/dataset/ListSaveChannel/L1/O2_omicron_channel_list_hvetosafe_GDS.txt’)

Parameters

  • PathSafeChannel – the full path of the file containing the safe channel list

Returns

re_sfchs: a list of safe channels
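A minimal sketch of what GrabSafechannel() might do. The exact rewriting rule applied to each channel name is not documented here, so the underscore-to-colon normalization below is an assumption (gwpy expects names of the form ‘L1:SUS-…’).

```python
def grab_safe_channels(path_safe_channel):
    """Sketch of reading a safe-channel list for use with gwpy.

    ASSUMPTION: lists sometimes write 'L1_SUS-...' while gwpy expects
    'L1:SUS-...', so the first underscore is replaced with a colon
    when no colon is present. Blank lines and comments are skipped.
    """
    channels = []
    with open(path_safe_channel) as f:
        for line in f:
            name = line.strip()
            if not name or name.startswith("#"):
                continue
            if ":" not in name:
                name = name.replace("_", ":", 1)  # assumed normalization
            channels.append(name)
    return channels
```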

class origli.utilities.utilities.IdentifyGlitch[source]

Bases: object

CombinedIndentifyingProcess(IFO, ListSegments, TriggerDir)[source]
description:
  1. use OmimcronTriggerPath() to log in to either the L1 or H1 cluster and get a list of trigger file paths

  2. use CopyOmicroTriggerandUnzip() to copy trigger files and unzip them

  3. iterate with trigger XML files

    3-1. use readXML() to get metadata stored in a .xml file

    3-2. use ExtractOmcronTriggerMetadata() to re-arrange the metadata matrix

  4. save the matrix as .csv file

  5. copy the .csv file into the CIT cluster and go back to the CIT cluster

USAGE: CombinedIndentifyingProcess(‘L1’, ListSegments, ‘/home/kentaro.mogushi/longlived/OmicronTrigger’)

Parameters
  • IFO – ifo {H1, L1, V1}

  • ListSegments – a list of segments

  • TriggerDir – a mother directory of a trigger file

Returns

None

CopyOmicroTriggerandUnzip(input_file, input_dir, TriggerDir, output_dir='OmicronTriggerXML')[source]
description:
  1. load a file comprising all the paths of omicron trigger files you are interested in

  2. copy all the omicron trigger file in your working place

  3. go to an output directory

  4. unzip all the files and replace them with the zipped files

  5. go back to a working directory

USAGE: CopyOmicroTriggerandUnzip(‘omicron.txt’, trigger_dir, trigger_dir)

Parameters
  • input_file – an input file

  • input_dir – an input directory, the current directory by default

  • trigger_dir – a directory right above the output directory

  • output_dir – an output directory where all the omicron trigger files will be stored

Returns

None

ExtractOmcronTriggerMetadata(name_a)[source]
description:
  1. load the metadata matrix, expected to be created by readXML()

  2. re-arrange it for convenience and convert it to a numpy array

  3. return the re-arranged metadata matrix as a numpy array

USAGE: Matdataset = ExtractOmcronTriggerMetadata(name_a)

Parameters

name_a – omicron trigger info (list)

Returns

Matdataset: omicron trigger metadata (numpy array)

FindGlitchNearestGPS(PathMetadataFile, candidate_GPS)[source]
description:
  1. take an omicron trigger metadata file

  2. find the glitch nearest to an input GPS time of interest

  3. replace the label of this glitch from ‘arbitrary’ to ‘candidate’

This function is assumed to be used in DQR. Once GraceDB provides a GPS time, this function labels the Omicron trigger glitch nearest to that time as ‘candidate’, since it can be considered the most significant candidate of an astronomical event. In this way, one can study only the candidate glitch by specifying glitch_type = candidate in a configuration file.

USAGE: FindGlitchNearestGPS(PathMetadataFile, GPS)

Parameters
  • PathMetadataFile – a path to omicron meta data file

  • GPS – the GPS time of interest

Returns

None

FindObservingTimeSegments(IFO, startT, endT, outputMother_dir, state='observing')[source]
description:
  1. take a DataQualityFlag from a server

  2. save its segment data as a HDF file

  3. take active segments

  4. return active segments as a numpy array

USAGE: SegmentsMat, trigger_dir = FindObservingTimeSegments(IFO, startT, endT, outputMother_dir, state)

Parameters
  • IFO – a type of interferometer, ‘L1’ or ‘H1’

  • startT – (float, int or string: e.g., ‘Dec 8 2016’) starting time

  • endT – ending time

  • outputMother_dir – an output directory where the HDF5 file will be stored

  • state – state of an interferometer, {observing, nominal-lock}

Returns

SegmentsMat: active segments in a numpy array
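The gwpy query itself needs a server connection, but the segment arithmetic can be sketched with numpy alone. The helper below is hypothetical: it only shows how already-queried active segments might be clipped to an epoch and packed into a numpy array, as in the returned SegmentsMat.

```python
import numpy as np

def clip_active_segments(active_segments, startT, endT):
    """Sketch: intersect active [start, end] segments with an epoch and
    return them as an (N, 2) numpy array; empty intersections are dropped."""
    clipped = []
    for s, e in active_segments:
        s2, e2 = max(s, startT), min(e, endT)  # intersect with the epoch
        if s2 < e2:                            # keep only non-empty overlaps
            clipped.append([s2, e2])
    return np.array(clipped)
```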

GetTriggerMetadata(ListSegments, IFO, output_dir, number_process, trigger_pipeline='omicron', output_file='TriggerMetadata.csv', channel='GDS-CALIB_STRAIN')[source]
description:
  1. take the path of files storing omicron triggers during a segment

  2. make a metadata in pandas frame

  3. save it as a .csv file

Note: this function works the same as CombinedIndentifyingProcess() but is faster, and it is supposed to support pyCBC triggers as well.

USAGE: GetTriggerMetadata(ListSegments, IFO, output_dir, number_process, trigger_pipeline=’omicron’, output_file=’TriggerMetadata.csv’, channel=’GDS-CALIB_STRAIN’)

Parameters
  • ListSegments – (list of lists) [[s1, e1], [s2, e2], …]; segments of non-observing time are excluded

  • IFO – interferometer (L1, or H1)

  • output_dir – an output directory

  • output_file – a name of an output file

  • number_process – the maximum number of parallel processes

  • trigger_pipeline – trigger method ‘omicron’, ‘pycbc-live’

  • channel – (str) the name of a channel

Returns

ListXML: a list of trigger XML files

OmimcronTriggerPath(ListSegments, TriggerDir, IFO, output_file='omicron.txt', channel='GDS-CALIB_STRAIN')[source]
description:
  1. log-in either Livingston or Hanford cluster

  2. take the path of files storing omicron triggers during a segment

  3. save those paths into an output file

USAGE: trigger_dir = OmimcronTriggerPath(ListSegments, ‘/home/kentaro.mogushi/longlived/OmicronTrigger’, ‘L1’)

Parameters
  • IFO – interferometer (L1, or H1)

  • ListSegments – (list of lists) [[s1, e1], [s2, e2], …]; segments of non-observing time are excluded

  • channel – (str) the name of a channel

  • trigger_dir – (str) an output directory

  • output_file – (str) an output file


Returns

trigger_dir

SaveMetaDataAsCSV(MetaData, output_dir, output_file)[source]
description:
  1. load trigger meta data matrix, which is expected to be created by ExtractOmcronTriggerMetadata()

  2. label these triggers as ‘unknown’

  3. set imgUrl to None

  4. save this matrix as .csv file

USAGE: SaveMetaDataAsCSV(AllMatDataStr, trigger_dir, ‘OmicrontriggerMetadata.csv’)

Parameters
  • MetaData – numpy array, omicron trigger metadata

  • output_file – an output file

  • output_dir – an output directory

Returns

None

calculate_chisqr_weighted_snr(snr, chisq, chisq_dof)[source]
description:

calculate the chi-square weighted SNR. Reference: Macleod et al. 2015, Equation (21)

USAGE: chisqr_weighted_snr = calculate_chisqr_weighted_snr(snr, chisq, chisq_dof)

Parameters
  • snr – SNR

  • chisq – chi-square

  • chisq_dof – chi-square degrees of freedom

Returns

chisqr_weighted_snr: the chi-square weighted SNR
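Assuming the cited equation is the re-weighted (“new”) SNR commonly used in CBC searches, the calculation can be sketched as:

```python
def chisq_weighted_snr(snr, chisq, chisq_dof):
    """Sketch of the re-weighted SNR used in CBC searches.

    ASSUMPTION: this is the standard form
        snr / [((1 + chisq_r**3) / 2)] ** (1/6)   for chisq_r > 1,
        snr                                        otherwise,
    with chisq_r = chisq / chisq_dof; it is assumed (not verified here)
    to match the equation cited in the docstring above.
    """
    chisq_r = chisq / chisq_dof  # reduced chi-square
    if chisq_r <= 1.0:
        return snr
    return snr / ((1.0 + chisq_r ** 3) / 2.0) ** (1.0 / 6.0)
```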

pyCBC_SNR_filter(all_pycbc_triggers_frame, SNR_low_cut, SNR_high_cut)[source]
description:

band pass filter with SNR for pycbc triggers

USAGE: df_SNR_cut = pyCBC_SNR_filter(all_pycbc_triggers_frame, SNR_low_cut=7.5, SNR_high_cut=150)

Parameters
  • all_pycbc_triggers_frame – a pandas frame of all the pyCBC triggers

  • SNR_low_cut – a lower cutoff SNR

  • SNR_high_cut – a higher cutoff SNR

Returns

df_new: pycbc triggers that pass all the conditions defined above

pyCBC_chisqr_weighted_snr_filter(pycbc_triggers_frame, chisqr_weighted_snr_lower_cutoff)[source]
description

high pass filter with chi square weighted SNR for pycbc triggers

USAGE: df_chi_square_weighted_cut = pyCBC_chisqr_weighted_snr_filter(pycbc_triggers_frame, chisqr_weighted_snr_lower_cutoff)

Parameters
  • pycbc_triggers_frame – pyCBC triggers in pandas frame

  • chisqr_weighted_snr_lower_cutoff – a lower cutoff chi-square weighted snr

Returns

df_new: triggers with chi square weighted SNR above chisqr_weighted_snr_lower_cutoff

pyCBC_massratio_filter(pycbc_triggers_frame, low_cutoff_massratio, high_cutoff_massratio)[source]
description

band pass filter with mass ratio for pycbc triggers

USAGE: df_massratio_cut = pyCBC_massratio_filter(pycbc_triggers_frame, low_cutoff_massratio, high_cutoff_massratio)

Parameters
  • pycbc_triggers_frame – pyCBC triggers in pandas frame

  • low_cutoff_massratio – a lower cutoff of the mass ratio

  • high_cutoff_massratio – a higher cutoff of the mass ratio

Returns

df_new: triggers with mass ratio between low_cutoff_massratio and high_cutoff_massratio

pyCBC_query_outlier(pycbc_triggers_frame, BinNum, Nsigma, cut)[source]
description:
  1. bin the triggers with values of log10 of chi-square per DOF

  2. calculate the lower bound of log10 of chi-square per DOF in each bin

  3. (cut = ‘median’) calculate the median of log10 of SNR and of log10 of chi-square per DOF in each bin; (cut = ‘mad’) calculate the upper bound of log10 of SNR in each bin, where the lower bound is the median minus the median absolute deviation and the upper bound is the median plus the median absolute deviation

  4. polynomial-fit the upper bound of log10 of SNR as a function of the lower bound of log10 of chi-square per DOF

  5. split the triggers into loud and quiet using the polynomial fit

USAGE: pycbc_loud = pyCBC_query_outlier(pycbc_triggers_frame, BinNum=50, Nsigma=1, cut=’median’)

Parameters
  • pycbc_triggers_frame – pycbc triggers in pandas frame

  • BinNum – the number of bins of a histogram for log10 of chi-square per degree of freedom

  • Nsigma – an integer to determine the upper bound of the quiet triggers

  • cut – a method of cut {median or mad}

Returns

pycbc_loud: loud triggers
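The median/MAD split above can be sketched as follows. This illustrates the general technique only; the binning choices, the exact bound definitions, and the column names are assumptions, not the packaged code.

```python
import numpy as np
import pandas as pd

def split_loud_triggers(df, bin_num=20, nsigma=1, deg=1):
    """Sketch of the median-based loud/quiet split (details assumed):
    bin triggers in log10(chisq/dof), set a per-bin bound at the median of
    log10(SNR) plus nsigma * MAD, fit the bounds with a polynomial, and
    flag triggers above the fitted curve as loud."""
    logx = np.log10(df["chisq"] / df["chisq_dof"])
    logy = np.log10(df["snr"])
    edges = np.linspace(logx.min(), logx.max(), bin_num + 1)
    centers, bounds = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (logx >= lo) & (logx < hi)
        if sel.sum() < 2:          # skip under-populated bins
            continue
        med = np.median(logy[sel])
        mad = np.median(np.abs(logy[sel] - med))
        centers.append(0.5 * (lo + hi))
        bounds.append(med + nsigma * mad)
    coeffs = np.polyfit(centers, bounds, deg)
    loud = logy > np.polyval(coeffs, logx)
    return df[loud]
```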

pyCBC_template_duration_filter(pycbc_triggers_frame, low_cutoff_duration, high_cutoff_duration)[source]
description:

band pass filter with template duration for pycbc triggers

USAGE: df_template_duration_cut = pyCBC_template_duration_filter(pycbc_triggers_frame, low_cutoff_duration, high_cutoff_duration)

Parameters
  • pycbc_triggers_frame – pyCBC triggers in pandas frame

  • low_cutoff_duration – a lower cutoff of the template duration in sec

  • high_cutoff_duration – a higher cutoff of the template duration in sec

Returns

df_new: triggers with template duration between low_cutoff_duration and high_cutoff_duration

pyCBC_totalmass_filter(pycbc_triggers_frame, low_cutoff_totalmass, high_cutoff_totalmass)[source]
description

band pass filter with total mass for pycbc triggers

USAGE: df_totalmass_cut = pyCBC_totalmass_filter(pycbc_triggers_frame, low_cutoff_totalmass, high_cutoff_totalmass)

Parameters
  • pycbc_triggers_frame – pyCBC triggers in pandas frame

  • low_cutoff_totalmass – a lower cutoff of the total mass in solar mass

  • high_cutoff_totalmass – a higher cutoff of the total mass in solar mass

Returns

triggers with total mass between low_cutoff_totalmass and high_cutoff_totalmass

pycbc_clustering_timeslice(pycbc_trigger, IFO, startT, endT, window, extension_duration)[source]
description:

This is a clustering filter.
  1. take a frame of pycbc triggers

  2. pick the trigger with the highest SNR in each window, i.e., each time-sliced bin

  3. create the new columns required to run this code

USAGE: pycbc_trigger = pycbc_clustering_timeslice(pycbc_trigger, IFO, startT, endT, window=0.1, extension_duration=1.5)

Parameters
  • pycbc_trigger – pycbc triggers in pandas frame

  • IFO – ifo

  • startT – start time of an epoch

  • endT – end time of an epoch

  • window – a window length in sec for clustering

  • extension_duration – factor for extending the template duration, e.g., extension_duration = 1.5 makes the duration of the on-source window 1.5 times longer than that of the trigger

Returns

pycbc_trigger: clustered pycbc triggers
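The core of the time-slice clustering (step 2) can be sketched with a pandas groupby; the column names end_time and snr are assumptions about the trigger frame.

```python
import numpy as np
import pandas as pd

def cluster_timeslice(triggers, startT, window):
    """Sketch: keep only the loudest (highest-SNR) trigger per fixed
    time slice of length `window`; column names are assumed."""
    bins = np.floor((triggers["end_time"] - startT) / window)  # slice index
    idx = triggers.groupby(bins)["snr"].idxmax()               # loudest per slice
    return triggers.loc[idx].reset_index(drop=True)
```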

pycbc_clustering_window_around_trigger(pycbc_trigger, IFO, one_sided_window, extension_duration)[source]
description:

This is a clustering filter.
  1. take a frame of pycbc triggers

  2. pick the trigger with the highest SNR in a window around each trigger; the window size is twice “one_sided_window”

  3. create the new columns required to run this code

USAGE: pycbc_trigger = pycbc_clustering_window_around_trigger(pycbc_trigger, IFO, one_sided_window=0.1, extension_duration=1.5)

Parameters
  • pycbc_trigger – pycbc triggers in pandas frame format

  • IFO – ifo

  • one_sided_window – a one-sided window in seconds around a trigger for clustering

  • extension_duration – factor for extending the template duration, e.g., extension_duration = 1.5 makes the duration of the on-source window 1.5 times longer than that of the trigger

Returns

pycbc_trigger: clustered pycbc triggers

readXML(input_dir, input_file)[source]
description:
  1. take metadata stored as a .xml file in a directory named OmicronTriggerXML

  2. return the metadata matrix in the form of a list

USAGE: TriggerMat = readXML(input_dir, input_file)

Parameters
  • input_dir – an input directory

  • input_file – input file name

Returns

name_a: (numpy array)

origli.utilities.utilities.ListUsedSafeChannel(path_list_channel, ifo)[source]
description:
  1. take a path to a .csv file that has lists of channels

  2. remove unused safe channel from a list of safe channels

USAGE: sfchs = ListUsedSafeChannel(path_list_channel, ifo)

Parameters
  • path_list_channel – a path to a list of channels (.csv file)

  • ifo – observatory {L1, H1} in str

Returns

sfchs: subset of safe channels in a numpy array

origli.utilities.utilities.Multiprocess_ConvertToTable(cache_indiv, trigger_pipeline, IFO, Columns)[source]
description:

Multiprocessing does not work if this function is defined inside the class IdentifyGlitch(), so it is defined globally here

Parameters
  • cache_indiv – an individual cache (each trigger file)

  • trigger_pipeline – a name of trigger pipeline {omicron, pycbc-live}

  • IFO – a name of the detector {L1, H1}

  • Columns – a list of columns for the metadata. This is None for omicron triggers, as there is nothing to do

Returns

df_indiv: a metadata of triggers in pandas frame

origli.utilities.utilities.Multiprocess_whitening(full_timeseries, target_timeseries_start, target_timeseries_end, pre_background_start, pre_background_end, fol_background_start, fol_background_end)[source]
description:

This is used for multi processing for whitening segments

Parameters
  • full_timeseries – time series comprising target and BGs

  • target_timeseries_start – a start time of a target segment

  • target_timeseries_end – an end time of a target segment

  • pre_background_start – a start time of a preceding BG

  • pre_background_end – an end time of a preceding BG

  • fol_background_start – a start time of a following BG

  • fol_background_end – an end time of a following BG

Returns

whitened_fft_target: whitened fft of a target segment
whitened_fft_PBG: whitened fft of a preceding segment
whitened_fft_FBG: whitened fft of a following segment
sample_rate: sampling rate of this channel
DURATION: a duration of a target segment
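The whitening itself can be sketched in the frequency domain with numpy: estimate an amplitude spectral density (ASD) from the off-source data and divide the on-source FFT by it. This is a crude single-pass sketch with an arbitrary normalization convention; the actual pipeline (via gwpy) does this more carefully with overlapping FFTs and interpolation.

```python
import numpy as np

def whiten_on_source(on_source, off_source, fs):
    """Sketch of frequency-domain whitening (details assumed).

    The ASD is estimated by averaging periodograms of non-overlapping
    off-source chunks the same length as the on-source window."""
    n = len(on_source)
    segs = [off_source[i:i + n] for i in range(0, len(off_source) - n + 1, n)]
    psd = np.mean([np.abs(np.fft.rfft(s)) ** 2 for s in segs], axis=0) / (fs * n)
    asd = np.sqrt(psd)
    # normalization convention chosen for simplicity in this sketch
    return np.fft.rfft(on_source) / (asd * fs)
```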

origli.utilities.utilities.Multiprocess_whitening_timeseries(full_timeseries, target_timeseries_start, target_timeseries_end, pre_background_start, pre_background_end, fol_background_start, fol_background_end)[source]
description:

This is used for multiprocessing for whitening segments; it outputs the absolute values of the whitened timeseries of the on- and off-source windows instead of the frequency series. The deviation of the whitened timeseries is informative, so absolute values are calculated. This function was used to compare the metric evaluated in the time domain with the metric evaluated in the frequency domain. In conclusion, the metric in the frequency domain performs better, so this function is no longer used.

USAGE: whitened_target_timeseries_abs, whitened_pre_off_source_abs, whitened_fol_off_source_abs, sample_rate, DURATION = Multiprocess_whitening_timeseries(full_timeseries, target_timeseries_start, target_timeseries_end, pre_background_start, pre_background_end, fol_background_start, fol_background_end)

Parameters
  • full_timeseries – time series comprising target and BGs

  • target_timeseries_start – a start time of a target segment

  • target_timeseries_end – an end time of a target segment

  • pre_background_start – a start time of a preceding BG

  • pre_background_end – an end time of a preceding BG

  • fol_background_start – a start time of a following BG

  • fol_background_end – an end time of a following BG

Returns

whitened_target_timeseries_abs: the absolute values of the whitened timeseries in the on-source window
whitened_pre_off_source_abs: the absolute values of the whitened timeseries in the preceding off-source window
whitened_fol_off_source_abs: the absolute values of the whitened timeseries in the following off-source window
sample_rate: sampling rate of this channel
DURATION: a duration of a target segment

class origli.utilities.utilities.PlotTableAnalysis[source]

Bases: object

AutoDetermineTrendBin(SegmentStart, SegmentEnd)[source]

description: automatically determine the bins of the subclass trend plot

USAGE: trend = AutoDetermineTrendBin(SegmentStart, SegmentEnd)

Parameters
  • SegmentStart – start time of a segment

  • SegmentEnd – end time of a segment

Returns

trend: {‘mins’, ‘hours’, ‘days’, ‘month’}
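A sketch of how such a trend bin might be chosen from the segment span; the exact boundaries used by OriginFinder are assumptions.

```python
def auto_trend_bin(segment_start, segment_end):
    """Sketch of choosing a trend bin from the segment span (seconds).

    ASSUMPTION: the boundary values below are illustrative only."""
    span = segment_end - segment_start
    if span < 6 * 3600:        # under ~6 hours: minute bins
        return "mins"
    if span < 3 * 86400:       # under ~3 days: hour bins
        return "hours"
    if span < 60 * 86400:      # under ~2 months: day bins
        return "days"
    return "month"
```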

BinomialDist(k, n, likelihood)[source]

description: Binomial distribution

USAGE: out = BinomialDist(k, n, likelihood)

Parameters
  • k – the number of detections

  • n – the number of trials

  • likelihood – likelihood of detection

Returns

out: the probability of finding k detections out of n trials

BinomialTest(k, N, rate)[source]

description: compute the p-value of a one-tailed Binomial test against a null rate

USAGE: p_value = BinomialTest(k, N, rate)

Parameters
  • k – observed number of successes

  • N – total number of samples

  • rate – rate of successes drawn from a null hypothesis

Returns

p_value: probability of a number of successes equal to or greater than the observed number of successes
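Both BinomialDist() and BinomialTest() can be sketched with the standard library: the one-tailed p-value sums the binomial probability mass from k up to N.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_test_one_tailed(k, n, rate):
    """One-tailed p-value: probability of k or more successes under the null rate."""
    return sum(binomial_pmf(i, n, rate) for i in range(k, n + 1))
```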

Calculate_number_and_rate(df_target, df_null, d_c, err_cal, channels)[source]
description:
  1. use the target and null samples

  2. calculate the numbers of the target and null samples above threshold

  3. calculate the fraction of the target and null samples above threshold

USAGE: channels, list_num1, list_num0, total_sample1, total_sample0, list_Causal, list_causal_err = Calculate_number_and_rate(df_target, df_null, d_c=None, err_cal=False, channels=None)

Parameters
  • df_target – target samples in pandas frame

  • df_null – null samples in pandas frame

  • d_c – a threshold. if it is None, the threshold is the mean value of null samples

  • err_cal – boolean, whether to calculate the error of the statistics or not

  • channels – a list of channels

Returns

channels: a list of channels
list_num1: a list of the numbers of target samples above a threshold
list_num0: a list of the numbers of null samples above a threshold
total_sample1: the number of target samples, float
total_sample0: the number of null samples, float
list_Causal: a list of the statistics
list_causal_err: a list of the errors of the statistics; None if err_cal = False

CreateChannelTicks(ListChannelName)[source]
description:
  1. take the dominant sub-channel names

  2. get a list of index where a sub-sensor name changes

dependencies: (tacitly) CreateMatCount(), make_subset_channel_based_on_samplingrate()

USAGE: CenterTicks, ListInd, ListSubsys = CreateChannelTicks(ListChannelName)

Parameters

ListChannelName – a list of channel names

Returns

CenterTicks: the center positions of each dominant sub-sensor group
ListInd: the edge indices of each dominant sub-sensor group
ListSubsys: a list of dominant sensor names

CreateMatCount(sigma, g_Individual=None, LowerCutOffFreq='None', UpperCutOffFreq='None')[source]
description:
  1. find counts for each glitch

  2. stack over all the glitches

  3. make a matrix comprising importance versus channels

USAGE: MatCount, ListChannelName, ListSNR, ListConf, ListGPS, ListDuration = CreateMatCount(sigma, LowerCutOffFreq=’None’, UpperCutOffFreq=’None’) # for all glitches

MatCount, ListChannelName, ListSNR, ListConf = CreateMatCount(sigma, g_Individual, LowerCutOffFreq=’None’, UpperCutOffFreq=’None’) # for an individual glitch

Parameters
  • sigma – an integer to determine the upper bound of the off-source window

  • g_Individual – a group of a HDF5 file that has values of importance for a glitch

  • LowerCutOffFreq – a lower limit frequency cut to calculate a value of importance

  • UpperCutOffFreq – an upper limit frequency cut to calculate a value of importance

Returns

MatCount: a matrix comprising importance versus channels
ListChannelName: a list of channel names
ListSNR: a list of SNRs
ListConf: a list of confidences

dependencies: self.HierarchyChannelAboveThreshold(g, LowerCutOffFreq, UpperCutOffFreq)

Determine_number_of_subclass(Path_Target_Glitch_SubClassClustered_Dataset, Path_Null_Dataset, test_confidence)[source]
description:
  1. query the clustered target samples

  2. query null samples

  3. perform one-sided binomial test and one-sided Welch t-test on each subclass

  4. count the number of subclasses that have at least one channel passing both tests

USAGE: num_subclass = Determine_number_of_subclass(Path_Target_Glitch_SubClassClustered_Dataset, Path_Null_Dataset, test_confidence)

Parameters
  • Path_Target_Glitch_SubClassClustered_Dataset – a path to the clustered target samples

  • Path_Null_Dataset – a path to null samples

  • test_confidence – a statistical confidence level

Returns

num_subclass: the number of subclasses that have at least one channel passing both tests

FindNumberChannels(g)[source]
description:

count a number of channels that are analyzed

USAGE: NumberOfChannels = FindNumberChannels(g)

Parameters

g – a HDF file group object

Returns

NumberOfChannels: a number of channels that are analyzed

FindNumberSample()[source]
description:

find the number of samples

Returns

the number of samples

FindSubClass(MatCount, ListChannelName, ListGPS, ListDuration, output_dir, upper_number_cluster, applied_Transformation)[source]
description:
  1. use a clustering approach

  2. plot glitch index VS channel grouped by clusters

  3. make a table comprising a list of GPS times for a given cluster

  4. plot Importance VS channel of a given sub-class

  5. make a corresponding table

USAGE: FindSubClass(MatCount, ListChannelName, ListGPS, output_dir)

Parameters
  • MatCount – a matrix comprising importance versus channels

  • ListChannelName – a list of channel names

  • ListGPS – a list of GPS times

  • ListDuration – a list of durations

  • output_dir

  • upper_number_cluster – a value of upper limit of the number of clusters

  • applied_Transformation – decomposition applied {‘PCA’, ‘kernelPCA’, ‘None’}

Returns

None
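The decomposition step (applied_Transformation = ‘PCA’) can be sketched with a numpy SVD; the subsequent Gaussian-mixture clustering of the projected samples is omitted here, and the number of retained components is an illustrative choice.

```python
import numpy as np

def pca_reduce(mat_count, n_components=2):
    """Sketch of the PCA step of the sub-classification.

    Projects the (glitches x channels) importance matrix onto its top
    principal components via SVD of the mean-centered data. The pipeline
    pairs this with a Gaussian mixture model for the actual clustering."""
    centered = mat_count - mat_count.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T  # glitches projected onto top PCs
```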

FrequencyBandColor(PathGlitchChannel_Low, PathGlitchChannel_Mid, PathGlitchChannel_High)[source]
description:
  1. load files that have low, middle, and high frequency band importances (GPSImportanceChannels.csv)

  2. make a matrix that has RGB color based on each importance

  3. output this matrix

USAGE: RGBMat = FrequencyBandColor(PathGlitchChannel_Low, PathGlitchChannel_Mid, PathGlitchChannel_High)

Parameters
  • PathGlitchChannel_Low – a path to a file that has a low frequency band

  • PathGlitchChannel_Mid – a path to a file that has a middle frequency band

  • PathGlitchChannel_High – a path to a file that has a high frequency band

Returns

RGBMat

Get_MatCountGPSDuration(Path_Target_Glitch_SubClassClustered_Dataset)[source]
description:

get an importance matrix, a list of channels, a list of GPS times, and a list of durations from the clustered target samples

Parameters

Path_Target_Glitch_SubClassClustered_Dataset – a path to the clustered target samples

Returns

MatCount: an importance matrix
ListChannelName: a list of channels
ListGPS: a list of GPS times
ListDuration: a list of durations

HierarchyChannelAboveThreshold(g, sigma, LowerCutOffFreq='None', UpperCutOffFreq='None')[source]
description:

calculate values of importance, i.e., the fraction of frequency bins of the on-source window above an upper threshold of the off-source window, for each channel at a given glitch time

USAGE: RankingChannelAndCount, ListCount, ListChannelName, GPS, ID, SNR, confidence, duration = HierarchyChannelAboveThreshold(g, sigma, LowerCutOffFreq, UpperCutOffFreq)

Parameters
  • g – (hdf5 format) a group having a glitch

  • pt – the area of the distribution integrated up to the threshold

  • sigma – the number of standard deviations of the ratio of medians used to determine important channels

Returns

RankingChannelAndRatioMed: a list of channel names with their ratios, sorted in descending order of ratio
Importantchannels: channels whose ratio is greater than the ratio threshold, along with their ratios
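The bin-counting idea behind the importance statistic can be sketched as follows. The choice of threshold (median plus sigma standard deviations of the off-source magnitudes) and the return value as a fraction are assumptions for illustration; the statistic actually used internally may differ.

```python
import numpy as np

def importance_fraction(on_fft, off_fft, sigma=10):
    """Fraction of on-source frequency bins above an off-source threshold.

    The threshold here is median + sigma * std of the off-source
    magnitudes; `on_fft` and `off_fft` are magnitude spectra of the
    on-source and off-source windows for one channel.
    """
    threshold = np.median(off_fft) + sigma * np.std(off_fft)
    return float(np.mean(on_fft > threshold))
```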

MakeMatrixOccurrenceVSChannel(MatCount, ListChannelName, ListGPS, ListDuration, output_dir)[source]
description:

save matrix of GPS, duration, importance as .csv file

USAGE: PlotOccurrenceVSChannel(MatCount, ListChannelName, ListGPS, ListDuration, output_dir, output_file)

Parameters
  • MatCount – a matrix comprising importance versus channels

  • ListChannelName – a list of channel names

  • ListGPS – a list of GPS times

  • ListDuration – a list of durations

  • output_dir

Returns

None

dependencies: CreateChannelTicks()

PlotCausalityVSChannel(list_Causal_passed, list_Causal_fail, list_causal_passed_err, list_causal_failed_err, list_Test, ListChannelName, output_dir, output_file, BinomialTestConfidence, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:

Calculate the probabilities of the causality of channels

USAGE: PlotCausalityVSChannel(list_Causal_passed, list_Causal_fail, list_Test, ListChannelName, output_dir, output_file, BinomialTestConfidence)

Parameters
  • list_Causal_passed – a list of the probability of the causality that passed the one-tailed Binomial test, otherwise zero

  • list_Causal_fail – a list of the probability of the causality that failed the one-tailed Binomial test, otherwise zero

  • list_causal_passed_err – a list of the error of the causal probability that passed the one-tailed Binomial test, otherwise zero

  • list_causal_failed_err – a list of the error of the causal probability that failed the one-tailed Binomial test, otherwise zero

  • list_Test – a list of results of the Binomial test, ‘pass’ or ‘fail’

  • ListChannelName – a list of channel names

  • output_dir – (only used for all glitches)

  • output_file – (only used for all glitches)

  • BinomialTestConfidence – binomial test confidence level

  • freq_bands – frequency bands used for the multi-frequency band search, which is defined in const.py

Returns

None
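The one-tailed Binomial test that separates channels into passed and failed lists can be illustrated with a stdlib-only sketch. The null probability p0 = 0.5 and the pass criterion (p-value below 1 minus the confidence level) are assumptions, not the package's exact convention.

```python
from math import comb

def binomial_p_value(k, n, p0=0.5):
    """One-tailed (greater) binomial p-value: P(X >= k) for X ~ Bin(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

def one_tailed_binomial_test(k, n, p0=0.5, confidence=0.95):
    """Return the p-value and 'pass'/'fail' at the given confidence level."""
    p = binomial_p_value(k, n, p0)
    return p, 'pass' if p < (1.0 - confidence) else 'fail'
```

For example, a channel that witnessed 10 of 10 glitches yields a p-value of 0.5**10 against a fair-coin null, which passes at 95% confidence.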

PlotConfidenceVSChannel(MatCount, ListChannelName, ListConf, output_dir, output_file)[source]

make a plot of values of confidence level of Gravity Spy versus channels

dependencies: CreateChannelTicks()

USAGE: PlotConfidenceVSChannel(MatCount, ListChannelName, ListConf, output_dir, output_file)

Parameters
  • MatCount – a matrix comprising importance versus channels

  • ListChannelName – a list of channel names

  • ListConf – a list of confidence levels

  • output_dir

  • output_file

Returns

None

PlotImportanceVSChannel(MatCount, ListChannelName, output_dir, output_file, ax=None)[source]
description:
  • for all glitches in a class, this method works stand-alone

  1. convert number to channel names in x ticks

  2. color background based on channel types

  3. plot a bar showing importance of channels

USAGE: PlotImportanceVSChannel(MatCount, ListChannelName, output_dir, output_file)

  • for an individual glitch in a class, this method is used by ….

  1. convert number to channel names in x ticks

  2. color background based on channel types

  3. plot a bar showing importance of channels

dependencies: CreateChannelTicks(), tacitly CreateMatCount()

USAGE: PlotImportanceVSChannel(MatCount, ListChannelName, None, None, ax) # for on-line mode

USAGE: PlotImportanceVSChannel(MatCount, ListChannelName, output_dir, output_file) # for off-line mode

Parameters
  • MatCount – a matrix comprising importance versus channels

  • ListChannelName – a list of channel names

  • output_dir – (only used for all glitches)

  • output_file – (only used for all glitches)

  • ax – matplotlib.pyplot object (used only for an individual glitch)

Returns

None

PlotIndividualFCS_ImportanceVSChannel(glitchtype, IFO, GravitySpy_df, output_dir, sigma, LowerCutOffFreq='None', UpperCutOffFreq='None', mode='offline', Listsegments=None, re_sfchs=None, Data_outputpath=None, Data_outputfilename=None, PlusHOFT='False', number_process=None)[source]
description:
  1. load a file comprising all glitches in a class

  2. create a plot comprising frequency versus channel & importance versus channel
    • dependency:

self.CreateChannelTicks(ListChannel), self.make_subset_channel_based_on_samplingrate(), self.CreateMatCount(), self.PlotImportanceVSChannel()

  3. save a plot

dependencies: make_subset_channel_based_on_samplingrate(), CreateChannelTicks(), CreateMatCount(), PlotImportanceVSChannel()

USAGE: PlotIndividualFCS_ImportanceVSChannel(glitchtype, IFO, output_dir, sigma, LowerCutOffFreq, UpperCutOffFreq)

Parameters
  • glitchtype – the type of glitch, used to create the name of a plot

  • IFO – the type of IFO, used in the name of a plot

  • GravitySpy_df – Gravity Spy metadata in a pandas frame

  • output_dir – an output directory

  • sigma – an integer used for the upper bound of background noise

  • LowerCutOffFreq – the lower cut-off frequency used in CreateMatCount, ‘None’ in default

  • UpperCutOffFreq – the upper cut-off frequency used in CreateMatCount, ‘None’ in default

  • mode – ‘offline’ or ‘online’

  • Listsegments – a list of allowed glitches, which is used for online mode only, None in default

  • re_sfchs – a list of safe channels except unused channels, which is used for online mode only, None in default

  • Data_outputpath – a directory in which to save an HDF5 file, which is used for online mode only, None in default

  • Data_outputfilename – a file name for the saved HDF5 file, which is used for online mode only, None in default

  • PlusHOFT – whether to get data of HOFT {‘True’, ‘False’}, which is used for online mode only, ‘False’ in default

  • number_process – a number of processes in parallel, which is used for online mode only, None in default

Returns

None

PlotOccurrenceVSChannel(MatCount, ListChannelName, ListGPS, ListDuration, output_dir, output_file)[source]
description:

plot glitch indices versus channels

USAGE: PlotOccurrenceVSChannel(MatCount, ListChannelName, ListGPS, ListDuration, output_dir, output_file)

Parameters
  • MatCount – a matrix comprising importance versus channels

  • ListChannelName – a list of channel names

  • ListGPS – a list of GPS times

  • ListDuration – a list of durations

  • output_dir

  • output_file

Returns

None

dependencies: CreateChannelTicks()

PlotSNRVSChannel(MatCount, ListChannelName, ListSNR, output_dir, output_file)[source]
description:

make a plot of SNR of h(t) versus channels

dependencies: CreateChannelTicks()

USAGE: PlotSNRVSChannel(self, MatCount, ListChannelName, ListSNR, output_dir, output_file)

Parameters
  • MatCount – a matrix comprising importance versus channels

  • ListChannelName – a list of channel names

  • ListSNR – a list of SNRs

  • output_dir

  • output_file

Returns

None

PlotTimeVSChannel(MatCount, ListChannelName, ListGPS, output_dir, output_file, startT, endT, dt)[source]
description:

make a plot of glitch times versus channels, where time flows from top to bottom. If there is more than one glitch in a time bin, those glitches’ values of importance are averaged. If there are no glitches in a time bin, all values of importance are set to zero.

dependencies: CreateChannelTicks()

USAGE: PlotTimeVSChannel(MatCount, ListChannelName, output_dir, output_file, startT, endT, dt)

Parameters
  • MatCount – a matrix comprising importance versus channels

  • ListChannelName – a list of channel names

  • ListGPS – a list of GPS times

  • output_dir – a path to an output directory

  • output_file – a name of an output file

  • startT – start time of an epoch

  • endT – end time of an epoch

  • dt – step size in sec of the time slice

Returns

None
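The binning rule described above (average the importance rows within a time bin, leave empty bins at zero) might look like this hypothetical helper; the bin-edge convention is an assumption.

```python
import numpy as np

def bin_importance_by_time(mat_count, list_gps, start_t, end_t, dt):
    """Average importance rows falling in each time bin of width dt.

    mat_count has glitches in rows and channels in columns; bins with
    no glitches keep all-zero importance, as described for the plot.
    """
    edges = np.arange(start_t, end_t + dt, dt)
    binned = np.zeros((len(edges) - 1, mat_count.shape[1]))
    bin_idx = np.digitize(list_gps, edges) - 1
    for b in range(binned.shape[0]):
        mask = bin_idx == b
        if mask.any():
            binned[b] = mat_count[mask].mean(axis=0)
    return binned
```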

Plot_Welch_t_test(channels, list_t_values_passed, list_t_values_failed, list_Test, confidence_level, output_dir, output_file, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:

plot the result of one-sided Welch t-test

USAGE: Plot_Welch_t_test(channels, list_t_values_passed, list_t_values_failed, list_Test, output_dir, output_file)

Parameters
  • channels – a list of channels

  • list_t_values_passed – a list of t-values that pass the test

  • list_t_values_failed – a list of t-values that fail the test

  • list_Test – a list of the test results {‘pass’, ‘fail’}

  • confidence_level – a confidence level

  • output_dir – an output directory

  • output_file – an output file name

  • freq_bands – frequency bands used for the multi-frequency band search, which is defined in const.py

Returns

None

Plot_fap(channels_band, list_GPS, list_duration, mat_fap, mat_Test, confidence_level, output_dir, output_file=None, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:

plot the FAP results

USAGE: Plot_fap(channels, list_GPS, list_duration, mat_fap, mat_Test, confidence_level, output_dir, output_file, freq_bands=Const.freq_bands)

Parameters
  • channels_band – a list of channels in numpy array

  • list_GPS – a list of GPS times in numpy array

  • list_duration – a list of durations in numpy array

  • mat_fap – a matrix of FAP values

  • mat_Test – a list of the test results {‘pass’, ‘fail’}

  • confidence_level – a confidence level

  • output_dir – an output directory

  • output_file – an output file name

  • freq_bands – frequency bands used for the multi-frequency band search, which is defined in const.py

Returns

None

Plot_p_belong(channels_band, list_GPS, list_duration, mat_p_belong, mat_Test, confidence_level, output_dir, output_file=None, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:

plot the result of p_belong

USAGE: Plot_p_belong(channels, list_GPS, list_duration, mat_p_belong, mat_Test, confidence_level, output_dir, output_file, freq_bands=Const.freq_bands)

Parameters
  • channels – a list of channels in numpy array

  • list_GPS – a list of GPS times in numpy array

  • list_duration – a list of durations in numpy array

  • mat_p_belong – matrix of p_belong

  • mat_Test – a list of the test results {‘pass’, ‘fail’}

  • confidence_level – a confidence level

  • output_dir – an output directory

  • output_file – an output file name

  • freq_bands – frequency bands used for the multi-frequency band search, which is defined in const.py

Returns

None

Plot_p_greater(channels, list_p_greater_passed, list_p_greater_failed, list_Test, confidence_level, output_dir, output_file, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:

plot the result of p_greater

USAGE: Plot_p_greater(channels, list_p_greater_passed, list_p_greater_failed, list_Test, output_dir, output_file)

Parameters
  • channels – a list of channels

  • list_p_greater_passed – a list of p_greater above confidence level

  • list_p_greater_failed – a list of p_greater that fail the test

  • list_Test – a list of the test results {‘pass’, ‘fail’}

  • confidence_level – a confidence level

  • output_dir – an output directory

  • output_file – an output file name

  • freq_bands – frequency bands used for the multi-frequency band search, which is defined in const.py

Returns

None

Plot_point_chisqur_test(channels_band, list_GPS, list_duration, mat_chsqr_passed, mat_chsqr_failed, mat_Test, p_values, confidence_level, output_dir, output_file=None, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:

plot the result of point chi-square test

USAGE: Plot_point_chisqur_test(channels, list_GPS, list_duration, mat_chsqr_passed, mat_chsqr_failed, mat_Test, confidence_level, output_dir, output_file, freq_bands=Const.freq_bands)

Parameters
  • channels_band – a list of channels in numpy array

  • list_GPS – a list of GPS times in numpy array

  • list_duration – a list of durations in numpy array

  • mat_chsqr_passed – a matrix of “passed” chi-square values where glitch indices are in rows and channels are columns in numpy array note that channels in glitches that passed the test have non-zero values, otherwise zero

  • mat_chsqr_failed – a matrix of “failed” chi-square values where glitch indices are in rows and channels are columns in numpy array note that channels in glitches that failed the test have non-zero values, otherwise zero

  • mat_Test – a list of the test results {‘pass’, ‘fail’}

  • p_values – a matrix of p-values where glitch indices are in rows and channels are columns in numpy array

  • output_dir – an output directory

  • output_file – an output file name

  • freq_bands – frequency bands used for the multi-frequency band search, which is defined in const.py

Returns

None

Principal_component_analysis(MatCount, frac_component=0.9)[source]
description:

PCA decomposition applied to the feature matrix

USAGE: MatCount, MatCount_inverse_pca = Principal_component_analysis(self, MatCount, frac_component=0.9)

Parameters
  • MatCount – a feature matrix of glitches with samples in rows and features in columns

  • frac_component – a cumulative variance

Returns

MatCount_pca: a feature matrix in PCA space
MatCount_inverse_pca: a reconstructed feature matrix in the original space
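Selecting the number of components from the cumulative variance fraction, plus the inverse reconstruction, can be sketched with numpy's SVD. This is an illustrative reimplementation under that assumption; the package may use scikit-learn's PCA (where `n_components=0.9` has the same meaning) instead.

```python
import numpy as np

def pca_by_variance(mat, frac_component=0.9):
    """Keep the leading principal components whose cumulative explained
    variance reaches `frac_component`, and return both the projected
    matrix and its reconstruction in the original space.
    """
    mean = mat.mean(axis=0)
    centered = mat - mean
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    cum_var = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cum_var, frac_component)) + 1
    mat_pca = centered @ vt[:k].T                 # projection onto k components
    mat_inverse = mat_pca @ vt[:k] + mean         # reconstruction in original space
    return mat_pca, mat_inverse
```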

Save_fap_csv(channels, list_GPS, list_duration, mat_fap, output_dir)[source]
description:

save the FAP table as csv file

USAGE: Save_fap_csv(channels, list_GPS, list_duration, mat_fap, output_dir)

Parameters
  • channels – a list of channels

  • list_GPS – a list of GPS times

  • list_duration – a list of durations

  • mat_fap – a matrix of FAP of each channel for each glitch

  • output_dir – output directory

Returns

None

Save_p_belong_csv(channels_passed, list_GPS, list_duration, mat_p_belong, output_dir)[source]
description:

save the p_belong table as a .csv file. channels_passed is the list of channels whose p_greater is above 0.5, to avoid misinterpretation of p_belong.

USAGE: Save_p_belong_csv(channels_passed, list_GPS, list_duration, mat_p_belong, output_dir)

Parameters
  • channels_passed – a list of channels

  • list_GPS – a list of GPS times

  • list_duration – a list of durations

  • mat_p_belong – a matrix of p_belong of each channel for each glitch

  • output_dir – output directory

Returns

None

Student_t_independet_test(data1, data2, Welch_test=True)[source]

Independent Student t-test

USAGE: stat, p_value = self.Student_t_independet_test(data1, data0)

Parameters
  • data1 – a population

  • data2 – another population

Returns

t_value: the t statistic
p: the p-value
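With Welch_test=True this is the unequal-variance (Welch) form of the test. A minimal sketch of the statistic and the Welch–Satterthwaite degrees of freedom is below; converting them to a p-value additionally requires the Student-t survival function (e.g. scipy.stats.t.sf(t, df)), which this sketch omits.

```python
import numpy as np

def welch_t_statistic(data1, data2):
    """Welch's t statistic and Welch–Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    n1, n2 = len(data1), len(data2)
    v1, v2 = np.var(data1, ddof=1), np.var(data2, ddof=1)
    se2 = v1 / n1 + v2 / n2                      # squared standard error
    t = (np.mean(data1) - np.mean(data2)) / np.sqrt(se2)
    df = se2**2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df
```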

TableCausality(list_Causal_passed, list_Causal_fail, list_Test, ListChannelName, output_dir, output_file, BinomialTestConfidence, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]

make a table of witness ratio statistic (WRS) of channels as a .csv file

USAGE: TableCausality(self, list_Causal_passed, list_Causal_fail, list_Test, ListChannelName, output_dir, output_file)

Parameters
  • list_Causal_passed – a list of the probability of the causality that passed the one-tailed Binomial test, otherwise zero

  • list_Causal_fail – a list of the probability of the causality that failed the one-tailed Binomial test, otherwise zero

  • list_Test – a list of results of the Binomial test, ‘pass’ or ‘fail’

  • ListChannelName – a list of channel names

  • output_dir

  • output_file

  • BinomialTestConfidence – binomial test confidence level

  • freq_bands – frequency bands used for the multi-frequency band search, which is defined in const.py

Returns

None

TableImportance(MatCount, ListChannelName, output_dir, output_file)[source]

make a table of values of importance as a .csv file

USAGE: TableImportance(MatCount, ListChannelName, output_dir, output_file)

Parameters
  • MatCount – a matrix comprising importance versus channels

  • ListChannelName – a list of channel names

  • output_dir

  • output_file

Returns

None

Table_Welch_t_test(channels, list_t_values_passed, list_t_values_failed, list_Test, confidence_level, output_dir, output_file, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:

save the result of one-sided Welch t-test as a .csv file

USAGE: Table_Welch_t_test(channels, list_t_values_passed, list_t_values_failed, list_Test, output_dir, output_file)

Parameters
  • channels – a list of channels

  • list_t_values_passed – a list of t-values that pass the test

  • list_t_values_failed – a list of t-values that fail the test

  • list_Test – a list of the test results {‘pass’, ‘fail’}

  • confidence_level – a confidence level

  • output_dir – an output directory

  • output_file – an output file name

  • freq_bands – frequency bands used for the multi-frequency band search, which is defined in const.py

Returns

None

Table_p_greater(channels, list_p_greater_passed, list_p_greater_failed, list_Test, confidence_level, output_dir, output_file, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:

save the result of p_greater as a .csv file

USAGE: Table_p_greater(channels, list_p_greater_passed, list_p_greater_failed, list_Test, output_dir, output_file)

Parameters
  • channels – a list of channels

  • list_p_greater_passed – a list of p_greater that pass the test

  • list_p_greater_failed – a list of p_greater that fail the test

  • list_Test – a list of the test results {‘pass’, ‘fail’}

  • confidence_level – a confidence level

  • output_dir – an output directory

  • output_file – an output file name

  • freq_bands – frequency bands used for the multi-frequency band search, which is defined in const.py

Returns

None

TrendSubClass(SegmentStart, SegmentEnd, input_dir, input_file, output_dir, trend='months', norm=False)[source]

the input file is supposed to be ClusteredGPSImportanceChannels.csv

USAGE: TrendSubClass(SegmentStart, SegmentEnd, input_dir, input_file, output_dir, trend=’months’)

Parameters
  • SegmentStart – start GPS time of a segment

  • SegmentEnd – end GPS time of a segment

  • input_dir – input directory

  • input_file – an input file

  • output_dir – an output directory

  • trend – a trend ‘months’, ‘days’, ‘hours’ or ‘mins’

Returns

None

calculate_fap(target_mat, null_mat)[source]
description:

calculate the values of fap of each channel in each glitch

USAGE: faps = calculate_fap(target_mat, null_mat)

Parameters
  • target_mat – a matrix of target samples where glitch indices are in rows and channels are in columns

  • null_mat – a matrix of null samples where glitch indices are in rows and channels are in columns

Returns

fap
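One common definition of an empirical FAP, which may approximate what this function computes, is the fraction of null samples at least as large as the observed target value, per channel. `empirical_fap` is a hypothetical sketch under that assumption, not the package code.

```python
import numpy as np

def empirical_fap(target_mat, null_mat):
    """Empirical false-alarm probability per channel per glitch.

    For each (glitch, channel) entry, count the fraction of null-sample
    importances in the same channel that reach or exceed the target value.
    Glitches are in rows, channels in columns.
    """
    fap = np.empty_like(target_mat, dtype=float)
    for j in range(target_mat.shape[1]):
        null_col = null_mat[:, j]
        for i in range(target_mat.shape[0]):
            fap[i, j] = np.mean(null_col >= target_mat[i, j])
    return fap
```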

calculate_reweighted_importance(df1, df_FAP)[source]
description:

reweight the values of importance using FAP. The reweighted importance is defined as rho_new = rho / (FAP + 1).

USAGE: df1_new = calculate_reweighted_importance(df1, df_FAP)

Parameters
  • df1 – a target samples in pandas frame

  • df_FAP – a FAP matrix in pandas frame

Returns

df1_new: reweighted importance matrix in pandas frame
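The stated formula rho_new = rho / (FAP + 1) in array form (the real function operates on pandas frames with matching channel columns; this numpy sketch shows only the arithmetic):

```python
import numpy as np

def reweight_importance(rho, fap):
    """rho_new = rho / (FAP + 1): FAP = 0 keeps rho, FAP = 1 halves it."""
    return np.asarray(rho, dtype=float) / (np.asarray(fap, dtype=float) + 1.0)
```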

chisquare_test(target_mat, null_mat)[source]
description:

calculate the chi-square value of each channel in each glitch and output the chi-square values and corresponding p-values

USAGE: chi2_value, p_value = chisquare_test(self, target_mat, null_mat)

Parameters
  • target_mat – a matrix of target samples where glitch indices are in rows and channels are in columns

  • null_mat – a matrix of null samples where glitch indices are in rows and channels are in columns

Returns

chi2_value: a matrix of chi-square values where glitch indices are in rows and channels are in columns
p_value: a matrix of p-values where glitch indices are in rows and channels are in columns

find_channels(df_target)[source]
description:

find a list of channels from the target samples

USAGE: channels = find_channels(df_target)

Parameters

df_target – target samples in pandas frame

Returns

channels: a list of channels

find_meaing_ful_confidence(df1, df0, BinomialTestConfidence, d_c=None, err_cal=False, channels=0)[source]
description:
  1. load files of target glitches (df1) and a dummy quiet data set (df0)

  2. compute true positive probability

  3. perform one-tailed Binomial test

USAGE: list_Causal_passed, list_Causal_fail, list_causal_passed_err, list_causal_failed_err, list_Test, list_channels= find_meaing_ful_confidence(df1, df0, BinomialTestConfidence, d_c)

Parameters
  • df1 – target glitches in the pandas format

  • df0 – null dataset in the pandas format

  • BinomialTestConfidence – a confidence level used for one-tailed Binomial test

  • d_c – user defined threshold to claim detection of a glitch, None in default

  • channels – a list of channels, 0 in default. If d_c is None, the threshold values are given by the mean value of importance generated by the dummy quiet dataset.

Returns

list_Causal_passed: a list of the probability of the causality that passed the one-tailed Binomial test, otherwise zero
list_Causal_fail: a list of the probability of the causality that failed the one-tailed Binomial test, otherwise zero
list_causal_passed_err: a list of the error of the causal probability that passed the one-tailed Binomial test, otherwise zero
list_causal_failed_err: a list of the error of the causal probability that failed the one-tailed Binomial test, otherwise zero
list_Test: a list of results of the Binomial test, ‘pass’ or ‘fail’
channels: a list of channel names

getHDF5Object()[source]

show an HDF5 file object

USAGE: getHDF5Object()

Returns

a dictionary

make_subset_channel_based_on_samplingrate(g, target_sampling_rate)[source]

dependencies: PlotIndividualFCS_ImportanceVSChannel()

USAGE: X, duration, Listch_label_num, ListChannel, GPS, SNR, confidence, ID = make_subset_channel_based_on_samplingrate(f[‘gps00000’], 256)

Parameters
  • g – target glitch class’s group or a file itself (at a GPS time), HDF5 format

  • target_sampling_rate – the sampling rate used to group channels {256, 512, 1024, 2048, 4096, 8192, 16384}

Returns

X: a matrix comprising the channels with the same sampling rate for a target glitch class at a given time, whitened by a reference
duration: the duration of the time series
Listch_label_num: the list of channel labels
ListChannel: the list of channel names
GPS: the GPS time of this group
SNR: the SNR of this glitch
confidence: the confidence level of this glitch
ID: the Gravity Spy unique ID

perform_Welch_test(df_target, df_null, confidence_level, channels=0)[source]
description:

perform one-sided Welch t-test

USAGE: channels, list_t_values_passed, list_t_values_failed, list_Test = perform_Welch_test(df_target, df_null, confidence_level, channels=None)

Parameters
  • df_target – target samples in pandas frame

  • df_null – null samples in pandas frame

  • confidence_level – a confidence level

  • channels – a list of channels, 0 in default

Returns

channels: a list of channels
list_t_values_passed: a list of t-values that pass the test
list_t_values_failed: a list of t-values that fail the test
list_Test: a list of the test results {‘pass’, ‘fail’}

perform_beta_dist(df_target, df_null, channels=0)[source]
description:

create beta distribution fits for target and null samples

USAGE: rv_t_dict, rv_n_dict = perform_beta_dist(df_target, df_null, channels=0)

Parameters
  • df_target – target samples in pandas frame

  • df_null – null samples in pandas frame

  • channels – channels (optional)

Returns

rv_t_dict: a dictionary of beta distribution (scipy obj) for the target samples rv_n_dict: a dictionary of beta distribution (scipy obj) for the null samples
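A beta fit can be illustrated with a method-of-moments sketch, mapping a sample's mean and variance to the shape parameters (a, b). This is an assumption about the fitting method; the package may instead use a maximum-likelihood fit such as scipy.stats.beta.fit.

```python
import numpy as np

def fit_beta_moments(samples):
    """Method-of-moments fit of Beta(a, b) to samples in (0, 1).

    Matches the first two sample moments:
    a = m * c, b = (1 - m) * c, with c = m * (1 - m) / v - 1.
    """
    m, v = np.mean(samples), np.var(samples)
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common
```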

perform_fap(df_target, df_null, confidence_level, channels=0)[source]
description:

calculate a FAP for each channel at each glitch

USAGE: channels, list_GPS, list_duration, mat_fap, mat_Test, confidence_level= perform_fap(df_target, df_null, confidence_level, channels)

Parameters
  • df_target – target samples in pandas frame

  • df_null – null samples in pandas frame

  • confidence_level – a confidence level

  • channels – a list of channels, 0 in default

Returns

channels: a list of channels in numpy array
list_GPS: a list of GPS times in numpy array
list_duration: a list of durations in numpy array
mat_fap: a matrix of FAP values
mat_Test: a list of the test results {‘pass’, ‘fail’}
confidence_level: the user-defined confidence level used for the test

perform_p_belong(df_target, p_greater_dict, rv_t_dict, rv_n_dict, confidence_level, channels=0)[source]
description:

  1. calculate p_belong for each channel, in each frequency band, in each glitch

  2. keep only channels whose p_greater is greater than 0.5

USAGE: list_channels_passed, list_GPS, list_duration, mat_p_belong, mat_Test, confidence_level = perform_p_belong(df_target, p_greater_dict, rv_t_dict, rv_n_dict, confidence_level, channels=0)

Parameters
  • df_target – target samples in pandas frame

  • p_greater_dict – a dictionary of p_greater values

  • rv_t_dict – a dictionary of beta distribution (scipy obj) for the target samples

  • rv_n_dict – a dictionary of beta distribution (scipy obj) for the null samples

  • confidence_level – a confidence level

  • channels – channels (optional)

Returns

list_channels_passed: a list of channels whose p_greater is above 0.5
list_GPS: a list of GPS times of the target samples
list_duration: a list of durations
mat_p_belong: a matrix of p_belong where glitch samples are in rows and passed channels are in columns
mat_Test: a matrix of test results {‘pass’, ‘fail’}
confidence_level: the confidence level used

perform_p_greater(df_target, rv_t_dict, rv_n_dict, confidence_level, channels=0)[source]
description:

calculate p_greater for channels. Note that p_greater is set to 0.5 if p_greater is not monotonically growing for the target samples.

USAGE: channels, p_greater_dict, list_p_greater, list_p_greater_passed, list_p_greater_failed, list_Test, confidence_level = perform_p_greater(df_target, rv_t_dict, rv_n_dict, confidence_level, channels=0)

Parameters
  • df_target – target samples in pandas frame

  • rv_t_dict – a dictionary of beta distribution (scipy obj) for the target samples

  • rv_n_dict – a dictionary of beta distribution (scipy obj) for the null samples

  • confidence_level – confidence level

  • channels – channels

Returns

channels: a list of channels
p_greater_dict: a dictionary of p_greater
list_p_greater: a list of p_greater
list_p_greater_passed: a list of p_greater where the value is kept if greater than the confidence level, otherwise 0
list_p_greater_failed: a list of p_greater where the value is kept if less than the confidence level, otherwise 0
list_Test: a list of {‘pass’, ‘fail’}
confidence_level: the confidence level
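p_greater can be read as the probability that a random target-importance draw exceeds a random null draw. A sample-based sketch of that quantity is below; it is an illustrative analogue, since the function above computes the corresponding quantity from the fitted beta distributions rather than from raw samples.

```python
import numpy as np

def empirical_p_greater(target, null):
    """Probability that a random target draw exceeds a random null draw,
    estimated over all pairs of the two samples."""
    t = np.asarray(target)[:, None]   # shape (n_target, 1)
    n = np.asarray(null)[None, :]     # shape (1, n_null)
    return float(np.mean(t > n))      # mean over the (n_target, n_null) grid
```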

perform_point_chisqr_test(df_target, df_null, confidence_level, channels=0)[source]
description:

perform a single point chi-square test for each channel at each glitch

USAGE: channels, list_GPS, list_duration, mat_chsqr_passed, mat_chsqr_failed, mat_Test, p_values, confidence_level= perform_point_chisqr_test(df_target, df_null, confidence_level, channels)

Parameters
  • df_target – target samples in pandas frame

  • df_null – null samples in pandas frame

  • confidence_level – a confidence level

  • channels – a list of channels, 0 in default

Returns

channels: a list of channels in numpy array
list_GPS: a list of GPS times in numpy array
list_duration: a list of durations in numpy array
mat_chsqr_passed: a matrix of “passed” chi-square values in numpy array, where glitch indices are in rows and channels are in columns; channels in glitches that passed the test have non-zero values, otherwise zero
mat_chsqr_failed: a matrix of “failed” chi-square values in numpy array, where glitch indices are in rows and channels are in columns; channels in glitches that failed the test have non-zero values, otherwise zero
mat_Test: a list of the test results {‘pass’, ‘fail’}
p_values: a matrix of p-values in numpy array, where glitch indices are in rows and channels are in columns
confidence_level: the user-defined confidence level used for the test

query_targetglitch_null(path_target_glitch_dataset, path_null_dataset)[source]
description:

load files, otherwise, end the program

Parameters
  • path_target_glitch_dataset – a path to the .csv file of a target glitch class

  • path_null_dataset – a path to .csv file of a null dataset

Returns

df1: a pandas dataframe of a target glitch class
df0: a pandas dataframe of a null dataset

ranking_channels(list_ranking_statistic, list_Test)[source]
description:
  1. sort based on the value of the ranking statistic

  2. sort based on the test with “pass” and “fail”, where “pass” comes before “fail”

  3. find the indices based on sorts 1) and 2)

USAGE: list_sorted_base_index_pass_fail, list_sorted_ranking_statistic_pass_fail, list_sorted_Test_pass_fail = ranking_channels(list_ranking_statistic, list_Test)

Parameters
  • list_ranking_statistic

  • list_Test

Returns

list_sorted_base_index_pass_fail: the channel indices in the sorted order
list_sorted_ranking_statistic_pass_fail: the ranking statistics in the sorted order
list_sorted_Test_pass_fail: the test results in the sorted order

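The two-level sort in steps 1–3 can be sketched as follows; `rank_channels` is a hypothetical reimplementation of the ordering rule (pass before fail, descending statistic within each group), not the package code.

```python
def rank_channels(list_ranking_statistic, list_test):
    """Order indices so 'pass' entries precede 'fail' entries, with the
    ranking statistic descending within each group."""
    order = sorted(
        range(len(list_ranking_statistic)),
        # sort key: (is-fail, negated statistic) -> pass first, large stats first
        key=lambda i: (list_test[i] != 'pass', -list_ranking_statistic[i]),
    )
    stats = [list_ranking_statistic[i] for i in order]
    tests = [list_test[i] for i in order]
    return order, stats, tests
```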
setHDF5Object(input_dir, input_file)[source]

set a HDF5 file object

Parameters

  • input_dir – an input directory

  • input_file – an input file name

Returns

None

origli.utilities.utilities.RemoveChannelUnused(re_sfchs, PathListChannelUnused)[source]
Description:

K.M. found that some of the channels in the list of safe channels were not used in O2, so gwpy cannot get the time series of those channels. This function removes those channels.

USAGE:

re_sfchs = RemoveChannelUnused(re_sfchs, ‘/home/kentaro.mogushi/longlived/MachineLearningJointPisaUM/dataset/ListSaveChannel/L1/O2_omicron_channel_list_hvetosafe_GDS.txt’)

Parameters

re_sfchs – the list of safe channels in numpy array format

Returns

re_sfchs: the list of safe channels without unused channels, in numpy array format

origli.utilities.utilities.SaveTargetAndBackGroundHDF5_OFFLINE(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, PlusHOFT='False')[source]
description:

THIS IS USED FOR “OFFLINE” MODE

  0. assume Listsegments is given by Findglitchlist()

  1. take the information of the list of allowed targets and the preceding and following segments

  2. whiten a target segment based on the average background segment

  3. find the whitened FFT

  4. save the whitened target and background FFTs

Note this depends on

USAGE: SaveTargetAndBackGroundHDF5(Listsegments, re_sfchs, IFO, outputpath, outputfilename, mode=’offline’)

Parameters
  • Listsegments – a list of segment parameters

  • re_sfchs – a list of safe channels

  • outputpath – a directory of an output file

  • outputfilename – a name of an output file

  • number_process – a number of processes in parallel

  • PlusHOFT – whether to get data of hoft, {‘True’ or ‘False’}, ‘False’ in default

origli.utilities.utilities.SaveTargetAndBackGroundHDF5_ONLINE(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, PlusHOFT)[source]
description:

THIS IS USED FOR “ONLINE” MODE

  0. assume Listsegments is given by Findglitchlist()

  1. take the information of the list of allowed targets and the preceding and following segments

  2. whiten a target segment based on the average background segment

  3. find the whitened FFT

  4. save the whitened target and background FFTs

Note this depends on Multiprocess_whitening()

USAGE: SaveTargetAndBackGroundHDF5_ONLINE(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, PlusHOFT)

Parameters
  • Listsegments – a list of segment parameters

  • channels – a list of safe channels

  • outputpath – a directory of an output file

  • outputfilename – a name of an output file

  • number_process – a number of processes in parallel

  • PlusHOFT – whether to get data of hoft, {‘True’ or ‘False’}

origli.utilities.utilities.SaveTargetAndBackGroundHDF5_TimeShift(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, PlusHOFT='False')[source]
description:

THIS IS USED FOR “OFFLINE” MODE
  0. assumes Listsegments is given by Findglitchlist()

  1. take the information of the list of allowed targets and the preceding and following segments

  2. whiten a target segment based on the average background segment

  3. find the whitened FFT

  4. save the whitened target and background FFTs

USAGE: SaveTargetAndBackGroundHDF5_TimeShift(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, PlusHOFT='False')

Parameters
  • Listsegments – a list of segment parameters

  • channels – a list of safe channels

  • outputpath – a directory of an output file

  • outputfilename – a name of an output file

  • number_process – a number of processes in parallel

  • PlusHOFT – whether to get data of hoft, {‘True’ or ‘False’}, ‘False’ by default

origli.utilities.utilities.TimeShiftingSamplePrecedingBGonly(df, Epoch_lt, TargetGlitchClass, IFO, BGSNR_thre, targetSNR_thre, Confidence_thre, UserDefinedDuration, gap, position_duration_bfr_centr, TriggerPeakFreqLowerCutoff=0, TriggerPeakFreqUpperCutoff=8192, targetUpperSNR_thre=inf, flag='Both')[source]
description:
  1. load Gravity Spy data set (.csv file)

  2. get the data about the target glitch class

  3. get the subset of the target glitches based on SNR and confidence level threshold a user defines

  4. accept glitches whose background segments do not coincide with any other glitches

  5. return the info of the accepted glitches

USAGE: Listsegments = TimeShiftingSamplePrecedingBGonly(df, Epoch_lt, TargetGlitchClass, IFO, BGSNR_thre, targetSNR_thre, Confidence_thre, UserDefinedDuration, gap, position_duration_bfr_centr, TriggerPeakFreqLowerCutoff, TriggerPeakFreqUpperCutoff, targetUpperSNR_thre, flag)

Parameters
  • df – GravitySpy meta data in pandas format

  • Epochstart – starting time of an epoch

  • Epochend – end time of an epoch

  • Commissioning_lt – commissioning time in list

  • TargetGlitchClass – a target glitch class name (str)

  • IFO – a type of interferometer (H1, L1, V1) (str)

  • BGSNR_thre – an upper threshold of SNR for background glitches (i.e., quiet enough), float or int

  • targetSNR_thre – a lower threshold of SNR for target glitches, float or int

  • Confidence_thre – a threshold of confidence level (float or int )

  • UserDefinedDuration – user-defined duration of a glitch (float or int), 0 by default

  • gap – a time gap between the target and the background segments in sec, 1 sec by default

  • position_duration_bfr_centr – the proportion of the duration of a target segment placed before the center time, e.g., 0.5 means the duration is evenly distributed around the center time, while 0.83 means 5/6 of it lies before the center time

  • TriggerPeakFreqLowerCutoff – a lower limit cutoff value of the peak frequency of triggers given by an ETG for target glitches

  • TriggerPeakFreqUpperCutoff – an upper limit cutoff value of the peak frequency of triggers given by an ETG queries for target glitches

  • targetUpperSNR_thre – an upper limit cutoff value of SNR of triggers given by an ETG queries for target glitches

  • flag – ‘Both’ or ‘Either’: require both backgrounds, or either the preceding or the following background, respectively, to accept glitches

Returns

the list of parameters of glitches passing the above thresholds. Listsegments consists of:

ListIndexSatisfied: a list of glitch indices
Listtarget_timeseries_start: a list of target glitch start times
Listtarget_timeseries_end: a list of target glitch end times
Listpre_background_start: a list of preceding background start times
Listpre_background_end: a list of preceding background end times
Listfol_background_start: a list of following background start times
Listfol_background_end: a list of following background end times
Listgpstime: a list of GPS times
Listduration: a list of durations
ListSNR: a list of SNRs
Listconfi: a list of confidence levels
ListID: a list of IDs

origli.utilities.utilities.select_some_trials(iterate, maximum_iterate=5)[source]
description:

this function is used to reduce the number of trials for the off-source window

USAGE: list_index_trials = select_some_trials(iterate, maximum_iterate=5)

Parameters
  • iterate – a number of trials

  • maximum_iterate – a maximum number of trials

Returns

randomly chosen trials, where the total number is maximum_iterate or less
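The selection can be sketched with a short hypothetical reimplementation, assuming trials are indexed 0 to iterate-1 and chosen without replacement:

```python
import numpy as np

def select_some_trials(iterate, maximum_iterate=5):
    """Return at most `maximum_iterate` randomly chosen trial indices."""
    n = min(iterate, maximum_iterate)
    # choose without replacement so no trial index repeats
    return np.sort(np.random.choice(iterate, size=n, replace=False))

idx = select_some_trials(20, maximum_iterate=5)
```

Capping the number of trials this way bounds the cost of estimating the off-source statistics for long windows.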

origli.utilities.veto_utilities

file name: veto_utilities.py

this file contains the utilities to be used for finding veto channels

origli.utilities.veto_utilities.BackgroundCut(df_null, channel, background_upper_cut)[source]
description:

calculate the upper cut of channels using the FAP distribution

USAGE: cut = BackgroundCut(df_null, channel, background_upper_cut)

Parameters
  • df_null – null samples in pandas frame

  • channel – a list of channels

  • background_upper_cut – confidence level of the upper cut of null samples of witness channel(s), e.g., 1sigma = 0.68268, 2sigma = 0.95449, 3sigma = 0.997300204, 4sigma = 0.99993666, and 5sigma = 0.999999426

Returns

cut: an upper cut of the null samples of those channels
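A sketch of the idea, assuming df_null holds one column of null importance samples per channel; the column name and the quantile-based cut are assumptions, not the package's exact method:

```python
import numpy as np
import pandas as pd

def background_cut(df_null, channel, background_upper_cut):
    """Upper cut = the `background_upper_cut` quantile of the null samples."""
    null_samples = df_null[channel].dropna().to_numpy()
    # e.g. background_upper_cut = 0.95449 corresponds to a 2-sigma cut
    return float(np.quantile(null_samples, background_upper_cut))

# toy null distribution: importance values spread uniformly in [0, 1]
df_null = pd.DataFrame({"AUX_CHANNEL": np.linspace(0.0, 1.0, 101)})
cut = background_cut(df_null, "AUX_CHANNEL", 0.95)
```

An importance value of the witness channel above this cut then corresponds to a false-alarm probability below 1 - background_upper_cut.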

origli.utilities.veto_utilities.CreateAllChannels_rho(Listsegment, IFO, re_sfchs, number_process, PlusHOFT, sigma, LowerCutOffFreq, UpperCutOffFreq)[source]
description:
  1. use a single glitch time

  2. query timeseries of all the channels around a glitch

  3. condition the data (whiten, and compare the on- and off-source windows)

  4. quantify all the channels (compute values of importance of all the channels)

USAGE: List_Count, re_sfchs, gpstime, duration, SNR, confi, ID = CreateAllChannels_rho(Listsegment, IFO, re_sfchs, number_process, PlusHOFT, sigma, LowerCutOffFreq, UpperCutOffFreq)

Parameters
  • Listsegment – a list of segment parameters

  • IFO – ifo

  • channels – a list of safe channels

  • number_process – a number of processes in parallel

  • PlusHOFT – whether to get data of hoft, {‘True’ or ‘False’}

  • sigma – an integer to be used for calculating values of importance

  • LowerCutOffFreq – a lower cutoff frequency

  • UpperCutOffFreq – an upper cutoff frequency

origli.utilities.veto_utilities.CreateRho(full_timeseries, target_timeseries_start, target_timeseries_end, pre_background_start, pre_background_end, fol_background_start, fol_background_end, sigma, LowerCutOffFreq, UpperCutOffFreq)[source]
description:
  1. calculate the whitened FFT of the on- and off-source window for a single channel

  2. compute the value of importance for a single channel

Parameters
  • full_timeseries – the full time series in gwpy object including on- and off source windows

  • target_timeseries_start – the start time of the on-source window

  • target_timeseries_end – the end time of the on-source window

  • pre_background_start – the start time of the preceding off-source window

  • pre_background_end – the end time of the preceding off-source window

  • fol_background_start – the start time of the following off-source window

  • fol_background_end – the end time of the following off-source window

  • sigma – an integer to calculate the value of importance

  • LowerCutOffFreq – a lower cutoff frequency

  • UpperCutOffFreq – an upper cutoff frequency

Returns

Count: the importance, i.e., the fraction of frequency bins in a frequency range above an upper bound of the off-source window for a single channel
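The importance statistic in step 2 can be illustrated with a toy version. The mean + sigma·std upper bound used here is an assumption for the example; the package derives the bound from the whitened off-source window:

```python
import numpy as np

def importance(on_spec, off_spec, sigma=2):
    """Fraction of on-source frequency bins above the off-source upper bound."""
    # hypothetical upper bound: mean + sigma * std of the off-source spectrum
    upper_bound = off_spec.mean() + sigma * off_spec.std()
    return float(np.mean(on_spec > upper_bound))
```

A value near 1 means most on-source bins exceed the background, i.e., the channel witnessed excess power during the glitch.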

origli.utilities.veto_utilities.FlagFinder(Epoch_lt, Listsegments, IFO, channels, list_statistics, num_high_rank_channels_to_be_used, df_null, background_upper_cut, number_process, sigma, PlusHOFT, LowerCutOffFreq, UpperCutOffFreq, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:
  1. select high ranking witness channels

  2. determine the upper cut of the null samples for those high ranking witness channels

  3. analyze all the glitches using the selected witness channels

  4. make a flag when those channels give importance above the upper cut of the null (flags are made only if ALL the chosen witness channels have values of importance above the upper cut of the null samples)

  5. calculate efficiency and deadtime

USAGE: efficiency, deadtime_frac, df = FlagFinder(Epoch_lt, Listsegments, IFO, channels, list_statistics, num_high_rank_channels_to_be_used, df_null, background_upper_cut, number_process, sigma, PlusHOFT, LowerCutOffFreq, UpperCutOffFreq)

Parameters
  • Epoch_lt – a list of an epoch

  • Listsegments – a list of glitches

  • IFO – ifo

  • channels – a list of channels, which are expected to be witness channels

  • list_statistics – a list of ranking statistics, either witness ratio statistics or t-value

  • num_high_rank_channels_to_be_used – number of high ranking channels to be used for making flag

  • df_null – null samples in pandas frame

  • background_upper_cut – confidence level of the upper cut of null samples of witness channel(s), e.g., 1sigma = 0.68268, 2sigma = 0.95449, 3sigma = 0.997300204, 4sigma = 0.99993666, and 5sigma = 0.999999426

  • number_process – number of processors

  • sigma – an integer to determine the upper bound of the off-source window

  • PlusHOFT – boolean, whether to analyze hoft

  • LowerCutOffFreq – a lower cutoff frequency

  • UpperCutOffFreq – an upper cutoff frequency

Returns

efficiency: the ratio of flagged glitches to all glitches analyzed without data-availability issues
deadtime_frac: the ratio of the total on-source window time to the total analysis time
df: a matrix of GPS time, duration, SNR, classification confidence level, flag, and importance of glitches, in a pandas frame
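The efficiency and deadtime fraction in step 5 reduce to simple ratios; a toy sketch with hypothetical names:

```python
import numpy as np

def efficiency_and_deadtime(flags, durations, analysis_time):
    """flags: 1 if a glitch was flagged, 0 otherwise; durations in sec."""
    efficiency = float(np.mean(flags))              # fraction of flagged glitches
    deadtime_frac = float(np.sum(durations)) / analysis_time  # vetoed time fraction
    return efficiency, deadtime_frac

# 3 of 4 glitches flagged; 4 sec of on-source windows in a 100 sec analysis
eff, dt = efficiency_and_deadtime([1, 1, 0, 1], [0.5, 0.5, 1.0, 2.0], 100.0)
```

A useful witness channel has high efficiency at a small deadtime fraction; their ratio is the usual figure of merit.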

origli.utilities.veto_utilities.FlagFinder_all_witnesses(Proportion_Duration_Bfr_Centr, Listsegments, IFO, channels, list_statistics, num_high_rank_channels_to_be_used, df_null, background_upper_cut, number_process, sigma, PlusHOFT, LowerCutOffFreq, UpperCutOffFreq, freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:
  1. select high ranking witness channels

  2. determine the upper cut of the null samples for those high ranking witness channels

  3. analyze all the glitches using the selected witness channels

  4. make flags when those channels give importance above the upper cut of the null (flags are made for individual channels)

USAGE: df_flag = FlagFinder_all_witnesses(Proportion_Duration_Bfr_Centr, Listsegments, IFO, channels, list_statistics, num_high_rank_channels_to_be_used, df_null, background_upper_cut, number_process, sigma, PlusHOFT, LowerCutOffFreq, UpperCutOffFreq)

Parameters
  • Proportion_Duration_Bfr_Centr – a fraction of the on-source window before the peak GPS time

  • Listsegments – a list of glitches

  • IFO – ifo

  • channels – a list of channels, which are expected to be witness channels

  • list_statistics – a list of ranking statistics, either witness ratio statistics or t-value

  • num_high_rank_channels_to_be_used – number of high ranking channels to be used for making flag

  • df_null – null samples in pandas frame

  • background_upper_cut – confidence level of the upper cut of null samples of witness channel(s), e.g., 1sigma = 0.68268, 2sigma = 0.95449, 3sigma = 0.997300204, 4sigma = 0.99993666, and 5sigma = 0.999999426

  • number_process – number of processors

  • sigma – an integer to determine the upper bound of the off-source window

  • PlusHOFT – boolean, whether to analyze hoft

  • LowerCutOffFreq – a lower cutoff frequency

  • UpperCutOffFreq – an upper cutoff frequency

Returns

df_flag: a matrix of GPS time, duration, SNR, classification confidence level of glitches, the importance of the witness channels, and the flags of the witness channels, in a pandas frame

origli.utilities.veto_utilities.HierarchyChannelAboveThreshold_single_channel(whitened_fft_target, whitened_fft_PBG, whitened_fft_FBG, duration, sampling_rate, sigma, LowerCutOffFreq='None', UpperCutOffFreq='None')[source]
description:

calculate the importance: a fraction of frequency bins in a frequency range above an upper bound of the off-source window for a single channel

USAGE: Count = HierarchyChannelAboveThreshold_single_channel(whitened_fft_target, whitened_fft_PBG, whitened_fft_FBG, duration, sampling_rate, sigma, LowerCutOffFreq='None', UpperCutOffFreq='None')

Parameters
  • whitened_fft_target – whitened fft of the on-source window

  • whitened_fft_PBG – whitened fft of the preceding off-source window

  • whitened_fft_FBG – whitened fft of the following off-source window

  • duration – a duration of the on-source window

  • sampling_rate – sampling rate of a channel

  • sigma – an integer to determine the upper bound of the off-source window

  • LowerCutOffFreq – a lower cutoff frequency

  • UpperCutOffFreq – an upper cutoff frequency

Returns

Count: importance

origli.utilities.veto_utilities.WitnessFinder(Listsegments, IFO, re_sfchs_init, sigma, number_process, first_chunk, tolerance, confidence_level, df_null, shuffle='True', PlusHOFT='False', LowerCutOffFreq='None', UpperCutOffFreq='None', freq_bands=[[1, 50], [1, 128], [128, 256], [256, 512], [512, 1024], [1024, 2048], [2048, 4096], [4096, 8192], ['None', 'None']])[source]
description:
  1. use a list of glitches

  2. analyze the first ‘first_chunk’ glitches with all the channels in ‘re_sfchs_init’

  3. perform one-sided binomial test and Welch one-sided t-test

  4. reject the channels that do NOT pass both tests, i.e., for which the hypothesis that the channel is consistent with the null samples cannot be rejected

  5. calculate the error ratio of the t-value of the top ranking channel to the previous t-value

  6. analyze the next glitch using the channels that pass both tests

  7. add the values of importance to the passed analyzed samples

  8. repeat (3)-(7)

  9. terminate the process when the error ratio reaches the tolerance

USAGE: re_sfchs, MatCount, list_Causal_passed_final, list_t_values_passed_final = WitnessFinder(Listsegments, IFO, re_sfchs_init, sigma, number_process, first_chunk, tolerance, confidence_level, df_null, shuffle='True', PlusHOFT='False', LowerCutOffFreq='None', UpperCutOffFreq='None')

Parameters
  • Listsegments – a list of glitches

  • IFO – ifo

  • re_sfchs_init – all the safe channels to be used at the beginning

  • sigma – an integer to determine the upper bound of the off-source window

  • number_process – number of processes of a machine

  • first_chunk – the number of samples to be used for the first chunk, where all the channels are to be used

  • tolerance – tolerance number at which to stop the analysis

  • confidence_level – confidence level for the one-sided binomial test and Welch one-sided t-test

  • df_null – null samples in pandas frame, which are expected to be already created

  • shuffle – boolean, whether to shuffle the list of glitches

  • PlusHOFT – boolean, whether to analyze hoft

  • LowerCutOffFreq – a lower cutoff frequency

  • UpperCutOffFreq – an upper cutoff frequency

Returns

re_sfchs: a list of channels that have passed the tests until the tolerance is reached
MatCount: a matrix of importance of the channels that passed the tests
list_Causal_passed_final: a list of witness ratio statistics of the channels that passed the tests
list_t_values_passed_final: a list of t-values of the channels that passed the tests
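Steps 3 and 4 can be sketched as below. This is an illustrative stand-in, not the package's implementation: the Welch one-sided t-test p-value is replaced by a comparison against the large-sample normal critical value (1.645 at 95% confidence) to keep the example dependency-free.

```python
import math
import numpy as np

def channel_passes(target_vals, null_vals, null_cut, confidence_level=0.95):
    """Keep a channel only when both one-sided tests reject the null."""
    alpha = 1.0 - confidence_level
    target_vals = np.asarray(target_vals, dtype=float)
    null_vals = np.asarray(null_vals, dtype=float)
    # one-sided binomial test: under the null, a target importance
    # exceeds the null cut with probability alpha
    n = len(target_vals)
    k = int(np.sum(target_vals > null_cut))
    p_binom = sum(math.comb(n, i) * alpha**i * (1.0 - alpha)**(n - i)
                  for i in range(k, n + 1))
    # Welch one-sided t statistic, compared against the large-sample
    # normal critical value (1.645 at 95%) instead of a t p-value
    se = math.sqrt(target_vals.var(ddof=1) / n
                   + null_vals.var(ddof=1) / len(null_vals))
    t = (target_vals.mean() - null_vals.mean()) / se
    return bool(p_binom < alpha and t > 1.645)
```

A channel whose importance values look like the null distribution fails at least one test and is dropped from the next iteration.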

origli.utilities.veto_utilities.make_veto_omicron_in_aux(epoch_start, epoch_end, IFO, channel, df_foreground, glitch_type, Proportion_Duration_Bfr_Centr, SNR_thresh, df_flag, OutputHDF5_dir, ifostate, N_processes)[source]
description:
  1. query omicron triggers (aux) of a witness channel

  2. find the aux omicron triggers which are coincident with the glitches that are analyzed

  3. find the aux SNR cut which corresponds to the importance cut of this witness channel

  4. find the aux omicron triggers which are coincident with all the glitches with label being studied

  5. veto glitches when the coincident aux triggers have SNR above the aux SNR cut (given in the step 3)

USAGE: rho_cut, snr_cut, deadtime, efficiency, efficiency_over_deadtime, df_target = make_veto_omicron_in_aux(epoch_start, epoch_end, IFO, channel, df_foreground, glitch_type, Proportion_Duration_Bfr_Centr, SNR_thresh, df_flag, OutputHDF5_dir, ifostate, N_processes)

Parameters
  • epoch_start – start time of the analysis period

  • epoch_end – end time of the analysis period

  • IFO – ifo {‘H1’ or ‘L1’}

  • channel – a witness channel name, it could be a channel in a particular frequency band

  • df_foreground – a pandas data frame of all the glitches in the strain channel fed into pychChoo

  • glitch_type – a glitch type that is focused on

  • Proportion_Duration_Bfr_Centr – a fraction of the on-source window before the peak GPS time of a glitch

  • SNR_thresh – a lower SNR threshold to select glitches that are studied

  • df_flag – a pandas data frame of flagged (including ‘Y’ and ‘N’) of the witness channels for the glitches that have been analyzed with FlagFinder_all_witnesses()

  • OutputHDF5_dir – an output directory where the omicron triggers of a witness channel are stored

  • ifostate – state of an ifo

  • N_processes – number of cores

Returns

rho_cut: the lower cut of importance of a witness channel
snr_cut: the corresponding SNR cut of this witness channel
deadtime: the fraction of the analysis time that is vetoed
efficiency: the fraction of glitches that are vetoed
efficiency_over_deadtime: the ratio of efficiency to deadtime
df_target: a pandas data frame for this witness channel, where the glitches that are vetoed are marked ‘Y’

origli.utilities.condor_utilities

Script name: condor_utilities.py

Description:

File containing utilities for creating HTCondor submission and DAG files

class origli.utilities.condor_utilities.CondorUtils[source]

Bases: object

create_condor_dag_file(path_sub, work_dir, num_glitches_to_be_analyzed)[source]
create_condor_submission_file(work_dir, abs_path_executable, abs_path_config, obs_run='o3')[source]

origli.utilities.burn_in_utilities

origli.utilities.burn_in_utilities.BG_upper_threshold_single_channel_given_freqband(list_dummy_duration, list_whitened_fft, sampling_rate, list_num_trial_used, sigma, LowerCutOffFreq='None', UpperCutOffFreq='None')[source]
description:

For a single channel per glitch, this function calculates a list of the background upper threshold across dummy on-source windows

USAGE: list_bg_upper_threshold = BG_upper_threshold_single_channel_given_freqband(list_dummy_duration, list_whitened_fft, sampling_rate, list_num_trial_used, sigma, LowerCutOffFreq='None', UpperCutOffFreq='None')

Parameters
  • list_dummy_duration – a list of dummy on-source windows

  • list_whitened_fft – a list of the normalized spectra, where each element is a spectrum for a given dummy on-source window

  • sampling_rate – sampling rate of a channel

  • list_num_trial_used – a list of trials of dummy on-source window within the total on-source window

  • sigma – an integer to determine the upper bound of the off-source window

  • LowerCutOffFreq – a lower cutoff frequency

  • UpperCutOffFreq – an upper cutoff frequency

Returns

list_bg_upper_threshold: a list of the background upper thresholds across the dummy on-source windows

origli.utilities.burn_in_utilities.BG_upper_threshold_single_channel_multiband(list_dummy_duration, list_whitened_fft, sampling_rate, list_num_trial_used, sigma)[source]
description:

calculate values of the background upper threshold per dummy on-source window and per frequency band

USAGE: MatBGUpperThresh = BG_upper_threshold_single_channel_multiband(list_dummy_duration, list_whitened_fft, sampling_rate, list_num_trial_used, sigma)

Parameters
  • list_dummy_duration – a list of dummy on-source windows

  • list_whitened_fft – a list of the normalized spectra, where each element is a spectrum for a given dummy on-source window

  • sampling_rate – sampling rate of a channel

  • list_num_trial_used – a list of trials of dummy on-source window within the total on-source window

  • sigma – an integer to determine the upper bound of the off-source window

Returns

MatBGUpperThresh: values of the background upper threshold, numpy array, where the frequency bands are rows from the top to bottom, the dummy on-source windows are in columns from left to right
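The layout of MatBGUpperThresh can be illustrated with synthetic values (the bands, durations, and threshold formula below are placeholders; the real values come from the whitened off-source spectra):

```python
import numpy as np

freq_bands = [[1, 128], [128, 256], [256, 512]]
dummy_durations = [0.5, 1.0, 2.0]

# frequency bands in rows (top to bottom), dummy on-source
# windows in columns (left to right)
MatBGUpperThresh = np.zeros((len(freq_bands), len(dummy_durations)))
for i, _band in enumerate(freq_bands):
    for j, dur in enumerate(dummy_durations):
        MatBGUpperThresh[i, j] = 0.1 * (i + 1) + 0.01 * dur  # placeholder value
```

Row i, column j thus holds the threshold for frequency band i at dummy on-source duration j.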

origli.utilities.burn_in_utilities.CreateAllChannels_BGUpperThresh_multband(Listsegment, IFO, re_sfchs, number_process, PlusHOFT, sigma, duration_max, trial_duration_sample)[source]
description:
  1. use a single glitch time

  2. query timeseries of all the channels around a glitch

  3. calculate values of the background upper threshold per dummy on-source window per frequency band

  4. iterate through channels

USAGE: IndexSatisfied, list_mat_BG_upper_thresh, array_dummy_duration, list_sample_rates, re_sfchs, gpstime, duration, SNR, confi, ID = CreateAllChannels_BGUpperThresh_multband(Listsegment, IFO, re_sfchs, number_process, PlusHOFT, sigma, duration_max = 15, trial_duration_sample = 20)

Parameters
  • Listsegment – a list of segment parameters

  • IFO – ifo

  • re_sfchs – a list of safe channels

  • number_process – a number of processes in parallel

  • PlusHOFT – whether to get data of hoft, {‘True’ or ‘False’}

  • sigma – an integer to be used for calculating values of importance

  • duration_max – the maximum length of a dummy on-source window in sec

  • trial_duration_sample – the number of dummy on-source windows within the total on-source window

Returns

IndexSatisfied: glitch index
list_mat_BG_upper_thresh: a list of matrices, one per channel, where each element is a value of the background upper threshold; numpy array with frequency bands in rows (top to bottom) and dummy on-source windows in columns (left to right)
array_dummy_duration: numpy array of the dummy on-source windows
list_sample_rates: a list of sampling rates of channels, numpy array
re_sfchs: list of channels without “IFO:” at the beginning
gpstime: a GPS time
duration: a value of duration
SNR: signal-to-noise ratio in h(t)
confi: a confidence level of classification of a glitch, provided by Gravity Spy; otherwise None
ID: a glitch ID, usually provided by Gravity Spy

origli.utilities.burn_in_utilities.CreateAllChannels_rho_multband_from_bg_up_bd_prior(Listsegment, IFO, re_sfchs, number_process, PlusHOFT, hdf5_obj_bg_up_thresh)[source]
description:
  1. use a single glitch time

  2. query timeseries of all the channels around a glitch

  3. calculate the normalized spectrum

  4. compute the value of importance for a single channel across frequency bands

USAGE: IndexSatisfied, Mat_Count_in_multibands, list_sample_rates, re_sfchs, gpstime, duration, SNR, confi, ID = CreateAllChannels_rho_multband_from_bg_up_bd_prior(Listsegment, IFO, re_sfchs, number_process, PlusHOFT, hdf5_obj_bg_up_thresh)

Parameters
  • Listsegment – a list of segment parameters

  • IFO – ifo

  • channels – a list of safe channels

  • number_process – a number of processes in parallel

  • PlusHOFT – whether to get data of hoft, {‘True’ or ‘False’}

  • hdf5_obj_bg_up_thresh – an HDF5 object that contains the polynomial parameters of the fit representing the background upper threshold as a function of the on-source window length, for all the channels and frequency bands

Returns

IndexSatisfied: glitch index
Mat_Count_in_multibands: a matrix of rho with frequency bands in rows and channels in columns, numpy array
list_sample_rates: a list of sampling rates of channels, numpy array
re_sfchs: list of channels without “IFO:” at the beginning
gpstime: a GPS time
duration: a value of duration
SNR: signal-to-noise ratio in h(t)
confi: a confidence level of classification of a glitch, provided by Gravity Spy; otherwise None
ID: a glitch ID, usually provided by Gravity Spy

origli.utilities.burn_in_utilities.CreateBGUpperThresh_single_channel_multiband(full_timeseries, target_timeseries_start, target_timeseries_end, array_dummy_duration, sigma)[source]
description:
  1. make list_whitened_fft: a list of numpy arrays of the normalized spectrum, where each element is the normalized spectrum for one trial of a given dummy on-source window.

     The spectra are concatenated into a vector from left to right, e.g., np.array([sp0_try0, sp1_try0, …, sp0_try1, sp1_try1, …]); hence this list is [ (sp for dummy 0), (sp for dummy 1), … ]

     Also record: sample_rate, the sampling rate of this channel; DURATION, the duration of a target segment; and list_num_trial_used, a list of the number of trials per dummy on-source window. The number of trials per dummy on-source window varies because of the limited length of the extended total on-source window: the longer the dummy on-source window, the fewer trials are available.

  2. calculate values of the background upper threshold per dummy on-source window and per frequency band

  2. calculate values of the background upper threshold per dummy on-source window and per frequency band

USAGE: MatBGUpperThresh, sample_rate = CreateBGUpperThresh_single_channel_multiband(full_timeseries, target_timeseries_start, target_timeseries_end, array_dummy_duration, sigma)

Parameters
  • full_timeseries – the full time series in gwpy object including on- and off source windows

  • target_timeseries_start – the start time of the on-source window

  • target_timeseries_end – the end time of the on-source window

  • array_dummy_duration – numpy array of dummy on-source windows

  • sigma – an integer to calculate the value of importance

Returns

MatBGUpperThresh: values of the background upper threshold, numpy array, with frequency bands in rows (top to bottom) and dummy on-source windows in columns (left to right)
sample_rate: the sampling rate of a single channel

origli.utilities.burn_in_utilities.CreateRho_single_channel_multiband_from_bg_up_bd_prior(full_timeseries, target_timeseries_start, target_timeseries_end, list_poly_para)[source]
description:
  1. calculate the normalized spectrum

  2. compute the value of importance for a single channel across frequency bands

USAGE: Counts_in_multibands, sample_rate = CreateRho_single_channel_multiband_from_bg_up_bd_prior(full_timeseries, target_timeseries_start, target_timeseries_end, list_poly_para)

Parameters
  • full_timeseries – the full time series in gwpy object including on- and off source windows

  • target_timeseries_start – the start time of the on-source window

  • target_timeseries_end – the end time of the on-source window

  • list_poly_para – a list of polynomial fit of the background upper threshold per freq band

Returns

Counts_in_multibands: values of importance in different frequency bands, where importance is the fraction of frequency bins in a frequency range above an upper bound of the off-source window for a single channel
sample_rate: the sampling rate of a single channel
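The use of the polynomial fit in step 2 can be sketched with numpy. The threshold-vs-duration samples below are synthetic; in the package, the coefficients come from the burn-in stage per channel and per frequency band:

```python
import numpy as np

# synthetic background-upper-threshold vs. on-source-duration samples
dummy_durations = np.array([0.1, 0.5, 1.0, 2.0, 5.0, 10.0])
thresholds = 0.3 + 0.02 * dummy_durations

# fit the relation once (burn-in stage) ...
poly_para = np.polyfit(dummy_durations, thresholds, deg=1)

# ... then read off the threshold for the actual on-source duration
upper_threshold = float(np.polyval(poly_para, 3.0))
```

Storing only the polynomial coefficients lets the online analysis evaluate the threshold for any on-source length without redoing the dummy-window trials.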

origli.utilities.burn_in_utilities.FindBGlis_extendBG(state, number_trials, step, outputMother_dir, df, Epoch_lt, TargetGlitchClass, IFO, BGSNR_thre, targetSNR_thre, Confidence_thre, UpperDurationThresh, LowerDurationThresh, UserDefinedDuration, gap, TriggerPeakFreqLowerCutoff=0, TriggerPeakFreqUpperCutoff=8192, targetUpperSNR_thre=inf)[source]
description: get the null samples whose durations are drawn from the target set. Only the subset of the null samples whose on-source windows do not coincide with any other glitches is kept.
  1. load the glitch file (.csv file)

  2. get the target samples

  3. create random time stamps with their durations drawn from the target set

  4. accept the random time stamps where their on-source windows do not coincide with any other glitches

  5. return the info of the accepted glitches

USAGE: Listsegments = FindBGlis_extendBG(state, number_trials, step, outputMother_dir, df, Epoch_lt, TargetGlitchClass, IFO, BGSNR_thre, targetSNR_thre, Confidence_thre, UpperDurationThresh, LowerDurationThresh, UserDefinedDuration, gap, TriggerPeakFreqLowerCutoff, TriggerPeakFreqUpperCutoff, targetUpperSNR_thre)

Parameters
  • df – GravitySpy meta data in pandas format

  • Epochstart – starting time of an epoch

  • Epochend – end time of an epoch

  • Commissioning_lt – commissioning time in list

  • TargetGlitchClass – a target glitch class name (str)

  • IFO – a type of interferometer (H1, L1, V1) (str)

  • BGSNR_thre – an upper threshold of SNR for background glitches (i.e., quiet enough), float or int

  • targetSNR_thre – a lower threshold of SNR for target glitches, float or int

  • Confidence_thre – a threshold of confidence level (float or int )

  • UpperDurationThresh – an upper bound of duration in sec (float or int)

  • LowerDurationThresh – a lower bound of duration in sec (float or int)

  • UserDefinedDuration – user-defined duration of a glitch (float or int), 0 by default

  • gap – a time gap between the target and the background segments in sec, 1 sec by default

  • TriggerPeakFreqUpperCutoff – an upper limit cutoff value of the peak frequency of triggers for selecting the target set

  • targetUpperSNR_thre – the upper SNR threshold for selecting the target set

Returns

the list of parameters of glitches passing the above thresholds. Listsegments consists of:

ListIndexSatisfied: a list of glitch indices
Listtarget_timeseries_start: a list of target glitch start times
Listtarget_timeseries_end: a list of target glitch end times
Listpre_background_start: a list of preceding background start times
Listpre_background_end: a list of preceding background end times
Listfol_background_start: a list of following background start times
Listfol_background_end: a list of following background end times
Listgpstime: a list of GPS times
Listduration: a list of durations
ListSNR: a list of SNRs
Listconfi: a list of confidence levels
ListID: a list of IDs

origli.utilities.burn_in_utilities.FindBGlistBurnIn(state, duration_max, number_trials, step, outputMother_dir, df, Epoch_lt, IFO, BGSNR_thre, UserDefinedDuration, gap)[source]

description: From the random time stamps created with FindRadomlistPointsForBurnIn(), select the subset whose on-source windows do not overlap with any other glitches. Note that the on-source window is extended whenever possible.

  1. load the glitch data set (.csv file)

  2. accept time stamps whose on-source windows do not coincide with any other glitches

  3. return the info of the accepted time stamps

USAGE: Listsegments = FindBGlistBurnIn(state, duration_max, number_trials, step, outputMother_dir, df, Epoch_lt, IFO, BGSNR_thre, UserDefinedDuration, gap)

Parameters
  • df – GravitySpy meta data in pandas format

  • Epochstart – starting time of an epoch

  • Epochend – end time of an epoch

  • Commissioning_lt – commissioning time in list

  • TargetGlitchClass – a target glitch class name (str)

  • IFO – a type of interferometer (H1, L1, V1) (str)

  • BGSNR_thre – an upper threshold of SNR for background glitches (i.e., quiet enough), float or int

  • targetSNR_thre – a lower threshold of SNR for target glitches, float or int

  • Confidence_thre – a threshold of confidence level (float or int )

  • UpperDurationThresh – an upper bound of duration in sec (float or int)

  • LowerDurationThresh – a lower bound of duration in sec (float or int)

  • UserDefinedDuration – user-defined duration of a glitch (float or int), 0 by default

  • gap – a time gap between the target and the background segments in sec, 1 sec by default

  • flag – ‘Both’ or ‘Either’: require both backgrounds, or either the preceding or the following background, respectively, to accept glitches

Returns

the list of parameters of glitches passing the above thresholds Listsegments contains of

ListIndexSatisfied: a list of indices of glitches
Listtarget_timeseries_start: a list of target glitch starting times
Listtarget_timeseries_end: a list of target glitch ending times
Listpre_background_start: a list of preceding background starting times
Listpre_background_end: a list of preceding background ending times
Listfol_background_start: a list of following background starting times
Listfol_background_end: a list of following background ending times
Listgpstime: a list of GPS times
Listduration: a list of durations
ListSNR: a list of SNRs
Listconfi: a list of confidence levels
ListID: a list of IDs
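The overlap rejection in step 2 amounts to a simple interval test: a candidate on-source window is accepted only if it intersects no known glitch interval. A minimal sketch, with hypothetical function and argument names (not the package's API):

```python
import numpy as np

def reject_overlapping_windows(candidates, glitch_times, glitch_durations):
    """Keep candidate (start, end) windows that overlap no known glitch.

    glitch_times / glitch_durations: GPS centre times and durations of
    the glitches to avoid (hypothetical layout).
    """
    glitch_starts = np.asarray(glitch_times) - np.asarray(glitch_durations) / 2
    glitch_ends = np.asarray(glitch_times) + np.asarray(glitch_durations) / 2
    accepted = []
    for start, end in candidates:
        # Intervals [a, b] and [c, d] overlap iff a < d and c < b.
        if not np.any((glitch_starts < end) & (start < glitch_ends)):
            accepted.append((start, end))
    return accepted
```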

origli.utilities.burn_in_utilities.FindRadomlistPointsForBurnIn(state, IFO, Epoch_lt, number_samples, step, duration_max, outputMother_dir)[source]
description: create random time stamps whose durations are uniformly distributed in log10 between 0.02 sec and duration_max
  1. within an epoch, create a list of synthetic points with randomly chosen durations

  2. make a pandas DataFrame of the data set

USAGE: df = FindRadomlistPointsForBurnIn(state, IFO, Epoch_lt, number_samples, step, duration_max, outputMother_dir)

Parameters
  • state – IFO state {observing, nominal-lock}

  • IFO – an observer {H1, L1}

  • Epoch_lt – a list of epochs

  • number_samples – number of samples picked up

  • step – step of data points in sec

  • duration_max – a maximum value of the duration in sec

  • outputMother_dir – an output directory in which the data set is placed

Returns

df: synthetic random data points within an epoch
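The log10-uniform draw of durations described above can be sketched as follows; the column names, signature, and seed handling are assumptions, not the package's actual interface:

```python
import numpy as np
import pandas as pd

def random_points_log_uniform(epoch_start, epoch_end, number_samples,
                              duration_max, duration_min=0.02, seed=0):
    """Draw random time stamps within an epoch, with durations uniform in
    log10 between duration_min (0.02 s per the docstring) and duration_max."""
    rng = np.random.default_rng(seed)
    gpstime = rng.uniform(epoch_start, epoch_end, number_samples)
    log_dur = rng.uniform(np.log10(duration_min), np.log10(duration_max),
                          number_samples)
    # Hypothetical column names for the synthetic data set.
    return pd.DataFrame({"gpstime": gpstime, "duration": 10.0 ** log_dur})
```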

origli.utilities.burn_in_utilities.FindglitchlistextendBG(df, Epoch_lt, TargetGlitchClass, IFO, BGSNR_thre, targetSNR_thre, Confidence_thre, UpperDurationThresh, LowerDurationThresh, UserDefinedDuration, gap, position_duration_bfr_centr, TriggerPeakFreqLowerCutoff=0, TriggerPeakFreqUpperCutoff=8192, targetUpperSNR_thre=inf)[source]
description:
  1. load the glitch data set (.csv file)

  2. get the data about the target glitch class

  3. get the subset of the target glitches based on SNR and confidence level threshold a user defines

  4. the preceding and following BGs are 64 sec long for every sample, regardless of overlap with any other glitches

  5. return the info of the accepted glitches

USAGE: Listsegments = FindglitchlistextendBG(df, Epoch_lt, TargetGlitchClass, IFO, BGSNR_thre, targetSNR_thre, Confidence_thre, UpperDurationThresh, LowerDurationThresh, UserDefinedDuration, gap, position_duration_bfr_centr, TriggerPeakFreqLowerCutoff=0, TriggerPeakFreqUpperCutoff=8192, targetUpperSNR_thre=np.inf)

Parameters
  • df – GravitySpy meta data in pandas format

  • Epochstart – starting time of an epoch

  • Epochend – end time of an epoch

  • Commissioning_lt – commissioning time in list

  • TargetGlitchClass – a target glitch class name (str)

  • IFO – a type of interferometer (H1, L1, V1) (str)

  • BGSNR_thre – an upper threshold of SNR for background glitches (i.e., quiet enough), float or int

  • targetSNR_thre – a lower threshold of SNR for target glitches, float or int

  • Confidence_thre – a threshold of confidence level (float or int )

  • UpperDurationThresh – an upper bound of duration in sec (float or int)

  • LowerDurationThresh – a lower bound of duration in sec (float or int)

  • UserDefinedDuration – user defined duration of a glitch (float or int), 0 in default

  • gap – a time gap between the target and the background segments in sec, 1 sec in default

  • position_duration_bfr_centr – proportion of the duration of a target segment placed before the center time, e.g., 0.5 indicates the duration is evenly distributed around the center time; 0.83 indicates 5/6 of it is before the center time

  • TriggerPeakFreqLowerCutoff – a lower limit cutoff value of the peak frequency of triggers given by an ETG for target glitches

  • TriggerPeakFreqUpperCutoff – an upper limit cutoff value of the peak frequency of triggers given by an ETG queries for target glitches

  • targetUpperSNR_thre – an upper limit cutoff value of SNR of triggers given by an ETG queries for target glitches

  • flag – ‘Both’ or ‘Either’: accept glitches using both backgrounds, or either the preceding or the following background, respectively

Returns

the list of parameters of glitches passing the above thresholds. Listsegments consists of:

ListIndexSatisfied: a list of indices of glitches
Listtarget_timeseries_start: a list of target glitch starting times
Listtarget_timeseries_end: a list of target glitch ending times
Listpre_background_start: a list of preceding background starting times
Listpre_background_end: a list of preceding background ending times
Listfol_background_start: a list of following background starting times
Listfol_background_end: a list of following background ending times
Listgpstime: a list of GPS times
Listduration: a list of durations
ListSNR: a list of SNRs
Listconfi: a list of confidence levels
ListID: a list of IDs
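The class/SNR/confidence selection in steps 2-3 amounts to boolean masking of the Gravity Spy table; the column names below ("label", "snr", "confidence") are assumptions about the .csv layout, not guaranteed by the package:

```python
import pandas as pd

def select_target_glitches(df, target_class, snr_low, snr_high, conf_thre):
    """Subset a Gravity Spy-style table by class, SNR band, and confidence.

    Sketch only: actual column names in the Gravity Spy .csv may differ.
    """
    mask = (
        (df["label"] == target_class)
        & (df["snr"] >= snr_low)       # lower SNR threshold for targets
        & (df["snr"] <= snr_high)      # optional upper SNR cutoff
        & (df["confidence"] >= conf_thre)
    )
    return df[mask]
```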

origli.utilities.burn_in_utilities.HierarchyChannelAboveThreshold_single_channel_multiband_from_bg_up_bd_prior(whitened_fft_target, sampling_rate, duration, list_poly_para)[source]
description:

calculate the importance for a single channel across frequency bands

USAGE: Counts_in_multibands = HierarchyChannelAboveThreshold_single_channel_multiband_from_bg_up_bd_prior(whitened_fft_target, sampling_rate, duration, list_poly_para)

Parameters
  • whitened_fft_target – whitened fft of the on-source window

  • duration – a duration of the on-source window

  • sampling_rate – sampling rate of a channel

  • list_poly_para – a list of polynomial fit of the background upper threshold per freq band

Returns

Counts_in_multibands: values of importance in different frequency bands, numpy array
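The "importance" statistic is defined in the overview as the fraction of on-source frequency bins above the background upper threshold in a given band. A minimal sketch of the multiband version, assuming one scalar threshold per band (the package interpolates a polynomial fit instead, and the interface below is hypothetical):

```python
import numpy as np

def importance_multiband(whitened_fft, sampling_rate, duration, band_edges,
                         band_thresholds):
    """Fraction of on-source bins exceeding the per-band threshold.

    band_edges: [f0, f1, ..., fN] in Hz; band_thresholds: N scalars,
    one per band.  Assumed stand-ins for the fitted background bounds.
    """
    n = int(duration * sampling_rate)
    freqs = np.fft.rfftfreq(n, d=1.0 / sampling_rate)
    spectrum = np.abs(whitened_fft)
    counts = []
    for lo, hi, thresh in zip(band_edges[:-1], band_edges[1:], band_thresholds):
        in_band = (freqs >= lo) & (freqs < hi)
        n_bins = in_band.sum()
        counts.append((spectrum[in_band] > thresh).sum() / n_bins
                      if n_bins else 0.0)
    return np.asarray(counts)
```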

origli.utilities.burn_in_utilities.Multiprocess_whitening_for_burn_in(full_timeseries, target_timeseries_start, target_timeseries_end, array_dummy_duration)[source]
description:

This is used for normalizing the spectrum of each trial in the on-source window of random time stamps created with FindBGlistBurnIn(). For each dummy on-source window:
  1. take the time series of a single channel

  2. calculate how many trials are available per dummy on-source window within the extended on-source window

  3. iterate over the trials per dummy on-source window

USAGE: list_whitened_fft, sample_rate, DURATION, list_num_trial_used = Multiprocess_whitening_for_burn_in(full_timeseries, target_timeseries_start, target_timeseries_end, array_dummy_duration)

Parameters
  • full_timeseries – time series comprising target and BGs

  • target_timeseries_start – a start time of a target segment

  • target_timeseries_end – an end time of a target segment

  • array_dummy_duration – numpy array of dummy on-source windows

Returns

list_whitened_fft: a list of numpy arrays of the normalized spectrum, where each element of this list is the normalized spectrum for each trial with a given dummy on-source window.

These spectra are concatenated into a vector from left to right, e.g., np.array([sp0_try0, sp1_try0, …, sp0_try1, sp1_try1, …]). Hence, this list is [(sp for dummy 0), (sp for dummy 1), …]

sample_rate: sampling rate of this channel
DURATION: a duration of a target segment
list_num_trial_used: a list of the number of trials per dummy on-source window. Note that the number of trials per dummy on-source window varies because of the limited length of the extended total on-source window: the longer the dummy on-source window, the fewer trials are available.
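The trade-off noted above (longer dummy on-source windows leave room for fewer trials) reduces to a floor division in the simplest case; the real code may stride or overlap trials differently, so treat this as a sketch:

```python
import numpy as np

def trials_per_dummy_duration(extended_window_length, dummy_durations):
    """Number of non-overlapping trials that fit in the extended on-source
    window for each dummy duration (sketch: simple floor division)."""
    dummy = np.asarray(dummy_durations, dtype=float)
    return np.floor(extended_window_length / dummy).astype(int)
```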

origli.utilities.burn_in_utilities.Multiprocess_whitening_for_target(full_timeseries, target_timeseries_start, target_timeseries_end)[source]
description:

This is used for multi processing for whitening segments

USAGE: whitened_fft_target, sample_rate, DURATION = Multiprocess_whitening_for_target(full_timeseries, target_timeseries_start, target_timeseries_end)

Parameters
  • full_timeseries – time series comprising target and BGs

  • target_timeseries_start – a start time of a target segment

  • target_timeseries_end – an end time of a target segment

Returns

whitened_fft_target: whitened fft of a target segment
sample_rate: sampling rate of this channel
DURATION: a duration of a target segment
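A minimal sketch of the normalization: divide the on-source FFT by an amplitude spectral density estimated from the off-source data. The package relies on gwpy's ASD estimator; the crude single-FFT background estimate below is an illustration only, not the actual whitening routine:

```python
import numpy as np

def whiten_target_fft(target, background):
    """Normalize the on-source spectrum by an ASD estimated from the
    off-source (background) samples.  Crude sketch: a single background
    FFT stands in for a proper (e.g. Welch / gwpy) ASD estimate."""
    n = len(target)
    bg_fft = np.fft.rfft(background[:n])
    asd = np.abs(bg_fft) / np.sqrt(n)   # rough per-bin amplitude scale
    asd[asd == 0] = np.inf              # avoid division by zero
    return np.fft.rfft(target) / asd
```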

origli.utilities.burn_in_utilities.SaveDummyTargetHDF5_OFFLINE_burn_in(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, PlusHOFT='False', trial_duration_sample=20)[source]
description:

Save the normalized spectrum for every channel for every glitch, with the use of Multiprocess_whitening_for_burn_in():
  1. iterate over all the samples

  2. whiten the on-source window for every channel

  3. save the whitened spectrum as an HDF5 file

each group in the HDF5 file is for a single channel, and each of the datasets per group has the normalized spectrum for a dummy on-source window

USAGE: SaveDummyTargetHDF5_OFFLINE_burn_in(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, PlusHOFT=’False’, trial_duration_sample=20)

Parameters
  • Listsegments – a list of segment parameters

  • re_sfchs – a list of safe channels

  • outputpath – a directory of an output file

  • outputfilename – a name of an output file

  • number_process – a number of processes in parallel

  • PlusHOFT – whether to get data of hoft, {‘True’ or ‘False’}, ‘False’ in default

  • trial_duration_sample – a number of dummy on-source windows

Returns

None

origli.utilities.burn_in_utilities.SaveOnlyTargetHDF5_OFFLINE(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, PlusHOFT='False')[source]
description:

THIS IS USED FOR “OFFLINE” MODE
  0. assuming Listsegments is given by Findglitchlist()

  1. take the information of the list of allowed target and the preceding and following segments

  2. whiten a target segment based on the average background segment

  3. find the whitened FFT

  4. save the whitened target and background FFTs

Note this depends on

USAGE: SaveOnlyTargetHDF5_OFFLINE(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, PlusHOFT=’False’)

Parameters
  • Listsegments – a list of segment parameters

  • re_sfchs – a list of safe channels

  • outputpath – a directory of an output file

  • outputfilename – a name of an output file

  • number_process – a number of processes in parallel

  • PlusHOFT – whether to get data of hoft, {‘True’ or ‘False’}, ‘False’ in default

origli.utilities.burn_in_utilities.SaveOnlyTargetHDF5_multiband_OFFLINE_from_bg_up_bd_prior(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, path_hdf5_bg_up_thresh, PlusHOFT='False')[source]
description:

THIS IS USED FOR “OFFLINE” MODE
  1. take the information of the list of allowed target and the preceding and following segments

  2. query the time series, get the normalized spectrum, and calculate the importance

  3. save the whitened target and background FFTs

Note this depends on

USAGE: SaveOnlyTargetHDF5_multiband_OFFLINE_from_bg_up_bd_prior(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, path_hdf5_bg_up_thresh, PlusHOFT=’False’)

Parameters
  • Listsegments – a list of segment parameters

  • re_sfchs – a list of safe channels

  • outputpath – a directory of an output file

  • outputfilename – a name of an output file

  • number_process – a number of processes in parallel

  • path_hdf5_bg_up_thresh – path to an HDF5 file that contains the polynomial fit of the background upper threshold

  • PlusHOFT – whether to get data of hoft, {‘True’ or ‘False’}, ‘False’ in default

Returns

None

origli.utilities.burn_in_utilities.SaveUppperThreshodBG_multiband_OFFLINE_burn_in(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, sigma, PlusHOFT='False', duration_max=15, trial_duration_sample=20)[source]
description:

THIS IS USED FOR “OFFLINE” MODE
  1. iterate over the glitch samples

  2. get values of the background upper threshold per dummy on-source window per frequency band for each of the channels

  3. save the values

Note this depends on

USAGE: SaveUppperThreshodBG_multiband_OFFLINE_burn_in(Listsegments, re_sfchs, IFO, outputpath, outputfilename, number_process, sigma, PlusHOFT=’False’, duration_max=15, trial_duration_sample=20)

Parameters
  • Listsegments – a list of segment parameters

  • re_sfchs – a list of safe channels

  • outputpath – a directory of an output file

  • outputfilename – a name of an output file

  • number_process – a number of processes in parallel

  • PlusHOFT – whether to get data of hoft, {‘True’ or ‘False’}, ‘False’ in default

  • sigma – an integer to determine the upper bound of the off-source window

  • duration_max – a maximum value of length of dummy on-source window in sec

  • trial_duration_sample – a number of dummy on-source window within the total on-source window

Returns

None

origli.utilities.burn_in_utilities.cal_importance_single_channel_singl_freqband_from_bg_up_bd_prior(whitened_fft_target, sampling_rate, DURATION, poly_para, LowerCutOffFreq, UpperCutOffFreq)[source]
description:

calculate the importance for a single channel in a given frequency band

USAGE: Count = cal_importance_single_channel_singl_freqband_from_bg_up_bd_prior(whitened_fft_target, sampling_rate, DURATION, poly_para, LowerCutOffFreq, UpperCutOffFreq)

Parameters
  • whitened_fft_target – on-source window normalized spectrum

  • sampling_rate – sample rate

  • DURATION – duration of on-source window

  • poly_para – polynomial parameters of the fit of the background upper threshold as a function of on-source window length

  • LowerCutOffFreq

  • UpperCutOffFreq

Returns

Count: the value of importance for the channel in the given frequency band

origli.utilities.burn_in_utilities.clean_duration_for_asd(duration)[source]
description:

This function is used to clean the decimal points in the value of duration, to avoid the error raised by the ASD estimator in gwpy

Parameters

duration – duration in sec

Returns

duration: cleaned duration in sec
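One plausible cleaning policy, shown purely as an illustration (the actual rounding rule used by the package is not documented here, and the sampling rate and precision below are assumptions), is to snap the duration to an integer number of samples and round the result:

```python
def clean_duration(duration, sampling_rate=16384, decimals=4):
    """Snap a duration to an integer number of samples and round away
    stray decimal digits.  Illustrative policy only; sampling_rate and
    decimals are assumptions, not the package's defaults."""
    n_samples = round(duration * sampling_rate)
    return round(n_samples / sampling_rate, decimals)
```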

origli.utilities.burn_in_utilities.make_bg_up_thesh_interpolate_hdf5(input_dir, input_hdf5_file)[source]
description:

make a dictionary of values of the background upper threshold across dummy on-source windows and frequency bands per channel

USAGE: list_all_dummy_duration_sorted, mat_ch_sorted_dict = make_bg_up_thesh_interpolate_hdf5(input_dir, input_hdf5_file)

Parameters
  • input_dir – input directory

  • input_hdf5_file – name of an HDF5 file

Returns

list_all_dummy_duration_sorted: an ascending list of dummy on-source windows
mat_ch_sorted_dict: a dictionary in which each key contains an array of values of the background upper threshold per dummy on-source window per frequency band, where frequency bands are rows from top to bottom and dummy on-source windows are columns from left to right

origli.utilities.burn_in_utilities.save_interpolate_bg_upper_thres_hdf5(list_all_dummy_duration_sorted, mat_ch_sorted_dict, output_dir, output_hdf5_file, med_abs_sigma=6, poly_degree=10)[source]
description:
  1. fit the polynomial function against the background upper threshold as a function of dummy on-source window per freq band per channel

  2. save the polynomial parameters to an HDF5 file

USAGE: save_interpolate_bg_upper_thres_hdf5(list_all_dummy_duration_sorted, mat_ch_sorted_dict, output_dir, output_hdf5_file, med_abs_sigma=6, poly_degree=10)

Parameters
  • list_all_dummy_duration_sorted – ascending list of dummy on-source windows

  • mat_ch_sorted_dict – a dictionary in which each key contains an array of values of the background upper threshold per dummy on-source window per frequency band, where frequency bands are rows from top to bottom and dummy on-source windows are columns from left to right

  • output_dir – output directory

  • output_hdf5_file – output file name

  • med_abs_sigma – an integer number of median absolute errors used to remove outliers for the fitting

  • poly_degree – polynomial degree

Returns

None
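The MAD-based outlier rejection plus polynomial fit described above can be sketched with numpy; the function name and interface below are hypothetical, not the package's:

```python
import numpy as np

def fit_bg_upper_threshold(durations, thresholds, med_abs_sigma=6,
                           poly_degree=10):
    """Fit a polynomial to the background upper threshold as a function
    of dummy on-source window length, discarding points farther than
    med_abs_sigma median-absolute-deviations from the median (sketch)."""
    x = np.asarray(durations, dtype=float)
    y = np.asarray(thresholds, dtype=float)
    dev = np.abs(y - np.median(y))
    mad = np.median(dev)
    keep = dev <= med_abs_sigma * max(mad, np.finfo(float).tiny)
    degree = min(poly_degree, keep.sum() - 1)   # polyfit needs enough points
    return np.polyfit(x[keep], y[keep], degree)
```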