Abstract
Regulation of gene expression depends on the binding of specific proteins, including transcription factors, to genomic DNA. Identifying where these proteins bind is therefore an important problem in biology. Cleavage Under Targets and Tagmentation (CUT&Tag) was recently developed as a sensitive method for identifying protein localization on genomic DNA. In addition to identifying true localizations (which manifest as "peaks" of tag reads at specific genomic positions), CUT&Tag, like all localization techniques, can suffer from both false-positive and false-negative peak identification. In this study, we evaluated CUT&Tag data for the EWS/FLI transcription factor in Ewing sarcoma, a pediatric bone tumor. We used a neural network to build a model based on the features of each EWS/FLI peak and used that model to distinguish false peaks from true peaks. Among the densely connected neural network models we trained, the best-performing model distinguished false peaks from true peaks with an F1 score of 0.82 on the training set and 0.67 on an additional test cell line. This study demonstrates that a neural network approach can be valuable for classifying genomic localization data in biological systems.
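The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), equivalently 2TP / (2TP + FP + FN). As a minimal sketch (not the authors' evaluation code), assuming binary labels in which 1 marks a false peak and 0 a true peak (a hypothetical encoding chosen for illustration), it can be computed from confusion counts as follows:

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 for binary labels, treating `positive` as the class of interest.

    Hypothetical encoding for illustration: 1 = false peak, 0 = true peak.
    """
    pairs = list(zip(y_true, y_pred))
    # Confusion counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels (made up): 2 true positives, 1 false positive, 1 false negative.
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0]
    print(round(f1_score(y_true, y_pred), 3))  # 0.667
```

Because F1 ignores true negatives, it is a common choice when the class of interest (here, false peaks) may be a minority among all called peaks.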