1 Introduction

This Biological Entity Dictionary (BED) has been developed to address three main challenges. The first is the completeness of identifier mappings. Direct mapping information provided by the different systems is not always complete and can be enriched by mappings provided by other resources. More interestingly, direct mappings not identified by any of these resources can be inferred indirectly by using mappings to a third reference. For example, many human Ensembl gene IDs are not directly mapped to any Entrez gene ID, but such mappings can be inferred from their respective mappings to HGNC IDs. The second challenge is the mapping of deprecated identifiers. Entity identifiers can change from one resource release to another. The identifier history is provided by some resources, such as Ensembl or the NCBI, but it is generally not used by mapping tools. The third challenge is the automation of the mapping process according to the relationships between the biological entities of interest. Mapping between gene and protein ID scopes should not be done in the same way as mapping between two gene ID scopes. Also, converting identifiers between organisms should be possible using gene ortholog information.
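The indirect inference described for the first challenge can be illustrated with a small base R sketch. It is not how BED implements the inference (BED relies on its Neo4j database); all identifiers and table names below are invented for illustration.

```r
# Toy mapping tables: every identifier here is made up for illustration
ens_to_hgnc <- data.frame(
  ensembl = c("ENSG_A", "ENSG_B", "ENSG_C"),
  hgnc    = c("HGNC:1", "HGNC:2", NA),
  stringsAsFactors = FALSE
)
entrez_to_hgnc <- data.frame(
  entrez = c("101", "102", "103"),
  hgnc   = c("HGNC:1", "HGNC:2", "HGNC:3"),
  stringsAsFactors = FALSE
)

# Direct Ensembl-to-Entrez mappings known so far: only ENSG_A
direct <- data.frame(ensembl = "ENSG_A", entrez = "101",
                     stringsAsFactors = FALSE)

# Drop rows without an HGNC ID, because merge() would otherwise match NA to NA
ens_known <- ens_to_hgnc[!is.na(ens_to_hgnc$hgnc), ]

# Infer Ensembl-to-Entrez pairs through the shared HGNC reference
inferred <- merge(ens_known, entrez_to_hgnc, by = "hgnc")[, c("ensembl", "entrez")]

# Combine direct and inferred mappings, removing duplicated pairs
all_maps <- unique(rbind(direct, inferred))
print(all_maps)
```

In this toy example ENSG_B gains an Entrez mapping through HGNC:2, while ENSG_C stays unmapped because it has no HGNC ID.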

This document shows how to use the BED (Biological Entity Dictionary) R package to get and explore mappings between identifiers of biological entities (BE). The package provides a way to connect to a BED Neo4j database in which the relationships between identifiers from different sources are recorded.

1.2 Installation

1.2.1 Dependencies

This BED package depends on the following packages available in the CRAN repository:

  • neo2R
  • visNetwork
  • dplyr
  • readr
  • stringr
  • utils
  • shiny
  • DT
  • miniUI
  • rstudioapi

All these packages must be installed before installing BED.
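A minimal sketch for checking which of these dependencies are still missing is given below. The helper name missing_pkgs is hypothetical and not part of BED; utils is left out because it ships with base R.

```r
# CRAN dependencies listed above; utils is omitted because it ships with base R
deps <- c("neo2R", "visNetwork", "dplyr", "readr", "stringr",
          "shiny", "DT", "miniUI", "rstudioapi")

# missing_pkgs() is a hypothetical helper: it returns the packages whose
# namespace cannot be loaded from the current library paths
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_pkgs(deps)

# Install only in an interactive session, so that sourcing this script
# non-interactively never modifies the library
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```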

1.2.2 Installation from GitHub

devtools::install_github("patzaw/BED")

1.2.3 Possible issue when updating from releases <= 1.3.0

If you get an error like the following…

Error: package or namespace load failed for ‘BED’:
 .onLoad failed in loadNamespace() for 'BED', details:
  call: connections[[connection]][["cache"]]
  error: subscript out of bounds

… remove the BED folder located at the path below. The command only checks whether that folder exists; it does not delete it:

file.exists(file.path(Sys.getenv("HOME"), "R", "BED"))
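If the folder exists, it can be deleted from R. The following is a minimal sketch; remove_dir is a hypothetical helper, not a BED function, and the deletion is permanent, so check the path before running it.

```r
# remove_dir() is a hypothetical helper, not part of BED.
# It deletes the directory if present and returns TRUE when the path
# no longer exists afterwards.
remove_dir <- function(path) {
  if (dir.exists(path)) {
    unlink(path, recursive = TRUE)
  }
  !dir.exists(path)
}

# Same path as in the check above
bed_dir <- file.path(Sys.getenv("HOME"), "R", "BED")
removed <- remove_dir(bed_dir)
```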

1.3 Connection

Before using BED, a connection must be established with the underlying Neo4j database. The url, username and password arguments should be adapted to your own setup.

library(BED)
connectToBed(url="localhost:5454", remember=FALSE, useCache=FALSE)

The remember parameter can be set to TRUE in order to save connection information that will be automatically used the next time the connectToBed() function is called or the next time the BED library is loaded. By default, this parameter is set to FALSE to comply with CRAN policies. Saved connections can be managed with the lsBedConnections() and the forgetBedConnection() functions.

The useCache parameter is by default set to FALSE to comply with CRAN policies. However, it is recommended to set it to TRUE to improve the speed of recurrent queries: the results of some large queries are saved locally in a file.
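As an illustration, the sketch below connects with caching enabled and reports whether the connection works. connect_bed_safely is a hypothetical wrapper, not part of BED, and the URL is only a placeholder for your own Neo4j instance. The wrapper returns FALSE when BED is not installed or when the connection attempt raises an error.

```r
# connect_bed_safely() is a hypothetical wrapper, not part of BED
connect_bed_safely <- function(url, ...) {
  # Return FALSE rather than failing when BED is not installed
  if (!requireNamespace("BED", quietly = TRUE)) {
    return(FALSE)
  }
  tryCatch({
    BED::connectToBed(url = url, ...)
    # checkBedConn() returns a logical value carrying a "dbVersion" attribute;
    # as.logical() drops the attribute before the test
    isTRUE(as.logical(BED::checkBedConn(verbose = FALSE)))
  }, error = function(e) FALSE)
}

ok <- connect_bed_safely(
  url = "localhost:5454",  # placeholder: adapt to your own instance
  remember = FALSE,        # do not save the connection information
  useCache = TRUE          # cache the results of large queries locally
)
if (!ok) {
  message("No BED connection available; check url, username and password.")
}
```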

The connection can be checked as follows.

checkBedConn(verbose=TRUE)
## http://bel040344:5454
## BED
## UCB-Human-Internal
## 2022.04.25
## Cache ON
## [1] TRUE
## attr(,"dbVersion")
##   name           instance    version
## 1  BED UCB-Human-Internal 2022.04.25

If the verbose parameter is set to TRUE, the URL and the content version are displayed as messages.

lsBedConnections()
## [[1]]
## [[1]]$url
## [1] "bel040344:5454"
## 
## [[1]]$username
## [1] NA
## 
## [[1]]$password
## [1] NA
## 
## [[1]]$cache
## [1] TRUE
## 
## [[1]]$name
## [1] "BED"
## 
## [[1]]$instance
## [1] "UCB-Human-Internal"
## 
## [[1]]$version
## [1] "2021.12.16"
## 
## 
## [[2]]
## [[2]]$url
## [1] "localhost:5420"
## 
## [[2]]$username
## [1] NA
## 
## [[2]]$password
## [1] NA
## 
## [[2]]$cache
## [1] TRUE
## 
## [[2]]$name
## [1] "BED"
## 
## [[2]]$instance
## [1] "UCB-Human-Internal"
## 
## [[2]]$version
## [1] "2021.12.16"
## 
## 
## [[3]]
## [[3]]$url
## [1] "localhost:5410"
## 
## [[3]]$username
## [1] NA
## 
## [[3]]$password
## [1] NA
## 
## [[3]]$cache
## [1] TRUE
## 
## [[3]]$name
## [1] "BED"
## 
## [[3]]$instance
## [1] "UCB-Human"
## 
## [[3]]$version
## [1] "2021.12.16"
## 
## 
## [[4]]
## [[4]]$url
## [1] "localhost:5454"
## 
## [[4]]$username
## [1] NA
## 
## [[4]]$password
## [1] NA
## 
## [[4]]$cache
## [1] TRUE
## 
## [[4]]$name
## [1] "BED"
## 
## [[4]]$instance
## [1] "UCB-Human"
## 
## [[4]]$version
## [1] "2020.05.03"

The connection parameter of the connectToBed() function can be used to connect to a saved connection other than the last one.
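The nested list printed by lsBedConnections() is easier to scan as a table. The sketch below works on a mock list that copies the first two entries of the output above; summarize_connections is a hypothetical helper, not a BED function, and it assumes that the connection parameter refers to the position in this list.

```r
# Mock of the structure printed by lsBedConnections() (first two entries only)
saved <- list(
  list(url = "bel040344:5454", username = NA, password = NA, cache = TRUE,
       name = "BED", instance = "UCB-Human-Internal", version = "2021.12.16"),
  list(url = "localhost:5420", username = NA, password = NA, cache = TRUE,
       name = "BED", instance = "UCB-Human-Internal", version = "2021.12.16")
)

# summarize_connections() is a hypothetical helper: one row per saved connection
summarize_connections <- function(conns) {
  if (length(conns) == 0) {
    return(data.frame())
  }
  fields <- c("url", "instance", "version", "cache")
  rows <- lapply(conns, function(x) {
    as.data.frame(x[fields], stringsAsFactors = FALSE)
  })
  data.frame(connection = seq_along(conns), do.call(rbind, rows),
             stringsAsFactors = FALSE)
}

tab <- summarize_connections(saved)
print(tab)

# With a live setup (assumption: the index matches the list position):
# summarize_connections(BED::lsBedConnections())
# BED::connectToBed(connection = 2)
```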

1.4 Data model

The BED underlying data model can be shown at any time using the following command.

showBedDataModel()