Python-based glitch characterization tool (PyChChoo)

  • Excess transient noise events, or glitches, degrade the data quality of ground-based gravitational-wave detectors and impair the detection of signals from astrophysical sources. Identifying the causes of these glitches is a crucial starting point for improving the detectability of gravitational-wave signals. However, glitches are the product of linear and non-linear couplings among interrelated detector-control systems, including those that mitigate ground motion and regulate optic motion, which generally makes their origin difficult to find. PyChChoo uses information from instrumental control systems and environmental sensors around the times when glitches appear in the detector's output to reveal essential clues about their origin.

  • PyChChoo performs several tasks. It quantifies auxiliary channels at the times of glitches, using glitches identified by external pipelines (e.g., Gravity Spy, omicron triggers, pycbc-live triggers, or your own list of glitches). It statistically finds witness channels for a class of glitches using a one-sided binomial test and a one-sided Welch's t-test. It also identifies the auxiliary channels responsible for each glitch using a chi-square test. Combining these statistical tests with machine learning, it clusters glitches into sub-classes on the basis of auxiliary channels alone, without any morphological approach.
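The one-sided binomial test idea can be illustrated with a toy calculation (a sketch only, not PyChChoo's implementation; all counts and rates are made up): if a channel coincides with k of n glitches while null times suggest a chance coincidence rate p0, the channel is a witness candidate when the one-sided p-value P(X >= k) falls below 1 - confidence.

```python
from math import comb

def binomial_sf(k, n, p):
    """One-sided (greater) binomial p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical numbers: a channel shows excess power at 40 of 50 glitch times,
# while null (glitch-free) times suggest a 10% chance coincidence rate.
n_glitches, n_coincident, p_null = 50, 40, 0.10

pvalue = binomial_sf(n_coincident, n_glitches, p_null)
confidence = 0.95
is_witness = pvalue < 1 - confidence
print(f"p-value = {pvalue:.3g}, witness candidate: {is_witness}")
```

A coincidence fraction far above the null rate yields a tiny p-value, so the channel would be flagged as a witness candidate at the 95% confidence level.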

Documentation

How to set up a virtual environment in a LIGO cluster

  • configure conda

source /cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/etc/profile.d/conda.sh
  • create a new conda environment

conda create --name your_env_name python=3.7
  • activate your environment

conda activate your_env_name
  • install nds-2

conda install python-nds2-client
  • install required python modules

pip install gwpy pandas scikit-learn gwtrigfind
  • install ipython

conda install ipython
  • install a package so that gwpy can read frame files

conda install -c conda-forge python-ldas-tools-framecpp
  • install ligo-lw

conda install python-ligo-lw
  • install numpyencoder to solve an issue with json

pip install numpyencoder

How to execute

  • PyChChoo performs several tasks, all of which are tuned with a configuration file. Run it with

python OriginFinder.py -c /path/to/config.ini
  • Options in the configuration file can be overridden on the command line as follows

python OriginFinder.py -c /path/to/config.ini --override section0:option0:value0 section1:option1

where section0 is a section, option0 is one of its options, and value0 is the value you choose. If an option is a boolean flag, omit the value, as in section1:option1.
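Such overrides can be layered onto Python's standard configparser; the following is a minimal sketch of that pattern (not PyChChoo's actual code, though the section and option names are taken from this document):

```python
import configparser

def apply_overrides(cfg, overrides):
    """Apply 'section:option:value' overrides to a ConfigParser.
    The two-part form 'section:option' toggles a boolean flag on."""
    for item in overrides:
        parts = item.split(":", 2)
        if len(parts) == 3:
            section, option, value = parts
        else:  # boolean flag form: section:option
            section, option = parts
            value = "True"
        if not cfg.has_section(section):
            cfg.add_section(section)
        cfg.set(section, option, value)

cfg = configparser.ConfigParser()
cfg.read_string("[Glitch-Selection]\nSNR_thresh = 8\n")
apply_overrides(cfg, ["Glitch-Selection:SNR_thresh:10",
                      "Analysis-parameters:PlUSHOFT"])
print(cfg.get("Glitch-Selection", "SNR_thresh"))          # 10
print(cfg.getboolean("Analysis-parameters", "PlUSHOFT"))  # True
```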

Structure of a configuration file

  • A * mark indicates a required option

  • Online-mode means that a list of GPS times and durations (plus SNR, confidence level, etc.) is given a priori in a .csv file, e.g., a Gravity Spy file

  • Offline-mode means that a list of GPS times, durations, etc. is generated with an external trigger generator (e.g., omicron or pycbc-live)

  • A candidate search analyzes a single specific trigger

  • The order of options under a section in the configuration file does not matter
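For orientation, a minimal configuration combining a few of the sections documented below might look like the following (all paths and values are placeholders, and only a subset of sections and options is shown):

```ini
[PathOutputHDF5]
dir = /home/albert.einstein/pychchoo/HDF5data/
file = WhistleL1.h5

[Glitch-Selection]
IFO = L1
glitch_type = Whistle
SNR_thresh = 8

[Analysis-parameters]
mode = offline
gap = 1
sigma = 3
```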

[PathOutputHDF5]
; a path to a directory for the result file in HDF5 format, which contains the conditioned frequency series of the safe channels at the times of glitches present in the detector's output. If LowerCutOffFreq = UpperCutOffFreq = Multiband, the output HDF5 file instead contains a matrix of importance values, where each value is the number of frequency bins above the upper threshold of the off-source window in a given frequency band. The importance matrix has frequency bands in rows (top to bottom) and channels in columns.
dir = /path/to/directory/of/HDF5/ *
; a name of the result file. If it is None, a suitable name is chosen automatically, normally {GlitchClass}{IFO}.h5, e.g., WhistleL1.h5. If you set a name manually, please follow the same format
file = /name/of/HDF5 *

[PathSafeChannel]
; a path to a directory of a list of safe channels in .txt format
dir = /path/to/directory/of/safechannel *
; a name of the file which has a list of safe channels
file = /name/of/file/of/safechannel *

[PathUnusedSafeChannel]
; a path to a directory of a file of a list of unused safe channels
dir = /path/to/directory/of/unused_safechannels *
; a name of a file of a list of unused safe channels
file = /name/of/file/of/unused_safechannels *

[PlotsOutPutMother]
; a path to a directory where all result plots are stored. A directory under ~/public_html is recommended in a LIGO cluster
dir = /path/to/directory/of/plots *

[Epoch]
; a matrix of start and end times of segments to be analyzed
Epoch_lt = [[start_GPS0, end_GPS0], [start_GPS1, end_GPS1], ....]
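Since Epoch_lt is a nested Python-style list literal, it can be parsed safely with the standard library's ast.literal_eval; here is a sketch with made-up GPS times (not PyChChoo's code):

```python
import ast

# A nested list of [start, end] GPS pairs, as written in the .ini file.
epoch_str = "[[1262304018, 1262390418], [1262476818, 1262563218]]"
epochs = ast.literal_eval(epoch_str)

for start_gps, end_gps in epochs:
    print(f"segment: {start_gps} -> {end_gps} ({end_gps - start_gps} s)")
```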

[Glitch-Selection]
; a detector to be analyzed, {L1, H1}
IFO = ifo *
; in OFF-LINE mode, any class of glitches in Gravity Spy as well as "arbitrary" is allowed; in ON-LINE mode, "unknown" or "arbitrary" is allowed
glitch_type = a class of glitches
; a value of SNR of triggers above which triggers are analyzed
SNR_thresh = value *
; a value of SNR of triggers below which triggers are analyzed, infinity by default
UpperSNR_thresh = value
; a value of SNR below which triggers are regarded as quiet enough to be ignored
SNR_BG_thresh = value *
; an upper duration threshold in sec above which triggers are not used for the analysis, infinite by default
UpperDurationThresh = value
; None, or a lower duration threshold in sec below which triggers are not used for the analysis, 0.01 by default
LowerDurationThresh = value
; a lower cut-off on the peak frequency of triggers, above which triggers are analyzed, 0 by default. Only quoted for OFF-LINE mode and ON-LINE mode with omicron triggers
TriggerPeakFreqLowerCutoff = value
; an upper cut-off on the peak frequency of triggers, below which triggers are analyzed, 8192 by default. Only quoted for OFF-LINE mode and ON-LINE mode with omicron triggers
TriggerPeakFreqUpperCutoff = value

[Analysis-parameters]
; a mode, one of {online, offline, statistics, WitnessFlag}
mode = Mode
; a duration in sec for each trigger. By default (same as value = 0) the duration given by the external trigger generator is used; a value above 0 overrides the durations of the glitch class of interest
user_defined_duration = value
; a time window in sec between the on-time window and the off-time windows, 1 sec by default
gap = value
; an integer setting the upper bound of the background noise, 3 by default
sigma = value
; the fraction of the duration placed before the trigger time, 0.5 by default
ProportionDurationBfrCentr = value
; Flag. If set, the strain channel is analyzed as well
PlUSHOFT =
; a lower cut-off frequency for each channel to be analyzed, None by default. Setting it to Multiband performs a multi-frequency-band search with the bands defined in const.py
LowerCutOffFreq = value
; an upper cut-off frequency for each channel to be analyzed, None by default. Setting it to Multiband performs a multi-frequency-band search with the bands defined in const.py
UpperCutOffFreq = value
; Flag. If set, the process of fetching and conditioning data is skipped.
HDF5FileExist =
; the number of parallel processes, used for conditioning and for fetching triggers in ON-LINE mode, 5 by default
Nprocesses = value
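To make the gap and ProportionDurationBfrCentr parameters concrete, here is one plausible layout of on-source and off-source windows around a trigger. This layout is an illustrative assumption, not necessarily PyChChoo's exact definition:

```python
def windows(trigger_gps, duration, proportion=0.5, gap=1.0):
    """Return (on_source, off_before, off_after) GPS intervals.

    Assumed layout: the on-source window covers `duration` seconds with a
    fraction `proportion` of it before the trigger time; off-source windows
    of the same length sit `gap` seconds on either side of it.
    """
    on_start = trigger_gps - proportion * duration
    on_end = trigger_gps + (1 - proportion) * duration
    off_before = (on_start - gap - duration, on_start - gap)
    off_after = (on_end + gap, on_end + gap + duration)
    return (on_start, on_end), off_before, off_after

on, before, after = windows(1262304018.0, duration=2.0, proportion=0.5, gap=1.0)
print(on)      # (1262304017.0, 1262304019.0)
print(before)  # (1262304014.0, 1262304016.0)
print(after)   # (1262304020.0, 1262304022.0)
```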

[Offline]
; a confidence level of classified glitches above which triggers are analyzed
confidence_thresh = value
; a path to a directory of a metadata file which has a list of GPS times and durations (SNR, confidence, etc.) of glitches in .csv format. Gravity Spy's .csv file works with no problem. Only quoted in OFF-LINE mode
MetaDataSetDir = /path/to/directory/of/metadata
; a name of the metadata file of glitches present in the detector's output, in .csv format, only quoted in OFF-LINE mode
MetaDataSetFile = /name/of/metadata

; Please make sure to run online mode in a cluster which hosts the trigger data
[Online]
; quoted only for ON-LINE mode, {observing, nominal-lock}
IFOstate = ifo_state
; required only for ON-LINE mode. An external trigger generator used for online mode {omicron, pycbc-live}
TriggerPipeline = ETG

[Null_dataset]
; Flag. If set, a null hypothesis dataset to be used for the statistics is generated.
NullHypothesisGenerate =
; the number of samples of the null hypothesis dataset to be generated before checking overlaps with any other triggers, quoted only if NullHypothesisGenerate is set
NullHypothesisSampleSize = value
; a time separation in sec between GPS data points used for generating the null hypothesis dataset
NullHypothesisStep = value
; quoted only for ON-LINE mode, {observing, nominal-lock}
IFOstate = ifo_state
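The null-dataset options above can be pictured with a toy generator that samples GPS times at a fixed step and rejects any that fall near a known glitch (an illustrative sketch with made-up times and padding, not PyChChoo's code):

```python
def null_timestamps(seg_start, seg_end, glitch_times, step=1.0, pad=2.0):
    """Pick candidate GPS times every `step` seconds inside a segment,
    rejecting any within `pad` seconds of a known glitch."""
    samples = []
    t = seg_start
    while t <= seg_end:
        if all(abs(t - g) > pad for g in glitch_times):
            samples.append(t)
        t += step
    return samples

glitches = [105.0, 112.5]
nulls = null_timestamps(100.0, 115.0, glitches, step=1.0, pad=2.0)
print(nulls)
```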


[Candidate-event]
; a GPS time of a single trigger, None by default
candidate_GPS = GPStime


[Statistics]
; a path to a .csv file of glitch index VS channel for a target glitch class
PathTargetGlitchDataset = /path/to/directory/of/target_glitch_metadata
; a path to a .csv file of glitch index VS channel for a null dataset
PathNullDataset = /path/to/directory/of/null_metadata
; a path to an output directory where a plot and a .csv file of the causal probability are stored
OutputDirCausalPro = path/to/output_directory
; a name of a plot of the causal probability
OutputPlotFileCausalPro = Value.pdf
; a name of .csv file of the causal probability
OutputCsvFileCausalPro = Value.csv
; a confidence level of the one-tailed binomial test, between 0 and 1, 0.95 by default
BinomialTestConfidence = Value

[WitnessFlag]
; offline mode analyzes a list of glitches generated in advance; online mode generates the list of glitches itself
WitnessFlagMode = {offline, online}
;a path to null samples
PathNullDataset = /path/to/Null_Dataset
; whether to shuffle the list of glitches before analyzing them and performing the statistical tests. This parameter is used for finding witness channels
Shuffle = {True, False}
; the number of samples in the first chunk to which the statistical tests are applied. This parameter is used for finding witness channels
FirstChunk = 3
; the tolerance at which the process of finding witness channels terminates: it stops when the relative error between the t-value of the current top-ranking channel and the previous t-value reaches this value
Tolerance = 0.001
; a confidence level for the one-sided binomial test and the one-sided Welch's t-test
TestConfidence = 0.95
; a confidence level for the upper cut of the background distribution, which is built from the null samples
BackgroundUpperCut = 0.999936
; the number of high-ranking channels to be used for flagging
NumHighRankChannelsToBeUsed = 1
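The one-sided Welch's t-test used for witness ranking is based on the Welch statistic for two samples of unequal variance. It can be computed with the standard library alone, as in this sketch (the importance values are made up; this is not PyChChoo's code):

```python
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and effective degrees of freedom
    (Welch-Satterthwaite) for two samples with unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)
    se2 = va / na + vb / nb
    t = (mean(sample_a) - mean(sample_b)) / se2 ** 0.5
    dof = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, dof

# Hypothetical importance values at glitch times vs. null times.
on_source = [5.1, 4.8, 6.0, 5.5, 5.9]
null = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3]
t, dof = welch_t(on_source, null)
print(f"t = {t:.2f}, dof = {dof:.2f}")
```

A large positive t indicates that the channel's response at glitch times sits well above its null-time behavior, so the channel ranks high as a witness candidate.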

Run via condor

Extended background mode

  • Burn-in mode

First, make a configuration file. The following part is needed for Burn-in mode; an example is shown below.

[BurnIn]
; Flag. Burn-in on or off
BurnIn =
; maximum duration in seconds
DurationMax = 35
; a number of trials per dummy on-source
trial_duration_sample = 30
; an integer used to remove outliers from the background upper threshold before fitting
med_abs_sigma = 6
; polynomial degree to fit
poly_degree = 10
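The med_abs_sigma and poly_degree parameters suggest a robust fit: reject points whose deviation from the median exceeds med_abs_sigma times the median absolute deviation, then fit a polynomial to the survivors. Here is a rough sketch of that idea on toy data (my reading of the parameters, not PyChChoo's code; the degree is lowered to match the toy curve):

```python
import numpy as np

def robust_polyfit(x, y, med_abs_sigma=6, poly_degree=2):
    """Drop MAD-based outliers, then fit a polynomial to the survivors."""
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    mad = np.median(np.abs(y - med))
    keep = np.abs(y - med) <= med_abs_sigma * mad
    coeffs = np.polyfit(np.asarray(x)[keep], y[keep], poly_degree)
    return coeffs, keep

x = np.arange(10.0)
y = 0.5 * x**2 + 1.0      # a smooth background-threshold curve
y[4] = 1000.0             # one wild outlier
coeffs, keep = robust_polyfit(x, y, med_abs_sigma=6, poly_degree=2)
print(keep[4])  # False: the outlier was rejected before fitting
```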


[Null-dataset]
; Flag. If set, random timestamps during the absence of glitches are generated as the null dataset
NullHypothesisGenerate =
; the number of null hypothesis samples to be generated
NullHypothesisSampleSize = 300
; a separation of each data point of a null hypothesis data set, in sec
NullHypothesisStep = 1
; IFOstate is used for the ONLINE search and does not affect the OFFLINE search {observing, nominal-lock}
IFOstate = observing

After creating the appropriate configuration file, you are ready to run. Here is the condor mode (recommended) …

python /path/to/OriginFinder.py -c /path/to/config.ini --condor

You will see Listsegments.json, which contains the list of glitches; the condor submission file trigger_generation.sub; the condor DAG file trigger_generation.dag; and the log directory logs.

Now you are ready to submit condor jobs by typing …

condor_submit_dag -maxjobs 500 trigger_generation.dag

The output files will be stored in the directory HDF5data_individual under the directory specified in the configuration file, like

[PathOutputHDF5]
dir = /path/to/HDF5data/

After the condor jobs finish, you can assemble those files by typing …

python /path/to/OriginFinder.py -c /path/to/config.ini --assemble-hdf5

Finally, you can obtain the background upper threshold. Uncomment one line in the configuration file to let PyChChoo know that the HDF5 file has been assembled.

Then, to calculate the values, you can run the command by typing …

python /path/to/OriginFinder.py -c /path/to/config.ini

You will see a file called bg_up_thresh.h5 under the directory specified in the configuration file (/path/to/HDF5data/).

  • Extended background mode

To use the extended background mode, you need to specify the following in the configuration file. Write the following lines in [Analysis-parameters]

[Analysis-parameters]
; Flag for the extended background mode
extendBG =
; a path to the file containing the stationarity upper thresholds
path_hdf5_bg_up_thresh = /path/to/...

Next, you can prepare condor jobs by typing …

python /path/to/OriginFinder.py -c /path/to/config.ini --condor

Then, you can submit condor jobs by typing

condor_submit_dag -maxjobs 500 trigger_generation.dag

After the condor jobs have finished, you can assemble the individual HDF5 files by typing

python /path/to/OriginFinder.py -c /path/to/config.ini --assemble-hdf5

Now, you can uncomment the setting HDF5FileExist = in the configuration file like …

[Analysis-parameters]
HDF5FileExist =

Then run by typing …

python /path/to/OriginFinder.py -c /path/to/config.ini