
Security News
Open Source Maintainers Demand Ability to Block Copilot-Generated Issues and PRs
Open source maintainers are urging GitHub to let them block Copilot from submitting AI-generated issues and pull requests to their repositories.
bioconductor
Advanced tools
Bioconductor classes, generics and methods, implemented in Javascript.
This package aims to provide Javascript implementations of Bioconductor data structures for use in web applications. Much like the original R code, we focus on the use of common generics to provide composability, allowing users to construct complex objects that "just work". We also attempt to circumvent Javascript's pass-by-reference behavior to avoid unintended modifications to unrelated objects when calling setter methods from their nested child objects.
Here, we perform some generic operations on a DataFrame
object, equivalent to Bioconductor's S4Vectors::DFrame
class.
// Import using ES6 notation
import * as bioc from "bioconductor";
// Construct a DataFrame
let results = new bioc.DataFrame(
{
logFC: new Float64Array([-1, -2, 1.3, 2.1]),
pvalue: new Float64Array([0.01, 0.02, 0.001, 1e-8])
},
{
rowNames: [ "p53", "SNAP25", "MALAT1", "INS" ]
}
);
// Run generics
bioc.LENGTH(results);
bioc.SLICE(results, [ 2, 3, 1 ]);
bioc.CLONE(results);
let more_results = new bioc.DataFrame(
{
logFC: new Float64Array([0, 0.1, -0.1]),
pvalue: new Float64Array([1e-5, 1e-4, 0.5])
},
{
rowNames: [ "GFP", "mCherry", "tdTomato" ]
}
);
bioc.COMBINE([results, more_results]);
See the reference documentation for more details.
Our generics allow users to operate on different objects in a consistent manner.
For example, a DataFrame
allows us to store any object as a column as long as it defines methods for the LENGTH
, SLICE
, CLONE
and COMBINE
generics.
This enables the construction of complex objects like a DataFrame
nested inside another DataFrame
.
let genomic_results = new bioc.DataFrame(
{
logFC: new Float64Array([-1, -2, 1.3, 2.1]),
pvalue: new Float64Array([0.01, 0.02, 0.001, 1e-8]),
location: new bioc.DataFrame({
"chromosome": [ "chrA", "chrB", "chrC", "chrD" ],
"start": [ 1, 2, 3, 4 ],
"width": [ 10, 20, 30, 40 ],
"strand": new Uint8Array([-1, 1, 1, -1 ])
})
},
{
rowNames: [ "p53", "SNAP25", "MALAT1", "INS" ]
}
);
let subset = bioc.SLICE(genomic_results, { start: 2, end: 4 });
bioc.LENGTH(subset);
subset.column("location");
Alternatively, we could store an IRanges
(see below) as a column of our DataFrame
.
All generics on the parent DataFrame
will be automatically applied to the IRanges
column.
let old_location = genomic_results.column("location");
let new_location = new bioc.GRanges(old_location.column("chromosome"),
new bioc.IRanges(old_location.column("start"), old_location.column("width")),
{ strand: old_location.column("strand") });
genomic_results.$setColumn("location", new_location);
subset = bioc.SLICE(genomic_results, { start: 2, end: 4 });
subset.column("location");
We mimic R's S4 generics using methods in Javascript classes.
For example, each vector-like class should define a _bioconductor_LENGTH
method to quantify its concept of "length".
The LENGTH
function will then call this method to obtain a length value for any instance of any supported class.
We prefix this method with _bioconductor_
to avoid collisions with other properties;
this allows safe monkey patching of third-party classes if they are sufficiently vector-like.
(Admittedly, the LENGTH
function is not really necessary, as users could just call _bioconductor_LENGTH
directly.
However, the latter is long and unpleasant to type, so we might as well wrap it in something that's easier to remember.
It would also require monkey patching of built-in classes like Arrays and TypedArrays, which is somewhat concerning as it risks interfering with the behavior of other packages.
By defining our own LENGTH
function, we can safely handle the built-in classes as special cases without modifying their prototypes.)
We mimic R's copy-on-write behavior by returning a new object from any setter, rather than mutating the existing object.
This avoids silent pass-by-reference changes in separate objects, which would be particularly problematic in complex classes that contain many child objects.
In the example below, another_reference
still retains the original set of row names while only modified
has its row names removed.
// Construct a DataFrame
let results = new bioc.DataFrame(
{
logFC: new Float64Array([-1, -2, 1.3, 2.1]),
pvalue: new Float64Array([0.01, 0.02, 0.001, 1e-8])
},
{
rowNames: [ "p53", "SNAP25", "MALAT1", "INS" ]
}
);
let another_reference = results;
let modified = results.setRowNames(null);
For users who are very sure that they are only operating on a single instance of the object,
or for those who wish to exploit pass-by-reference behavior to multiple multiple objects at once,
we can use mutating setters for slightly more efficiency.
These are prefixed with $
signs to indicate their potentially unexpected behavior.
results.$setRowNames(null);
another_reference.rowNames(); // this will now be null.
Note that this copy-on-write paradigm only applies to the setters defined in the bioconductor.js classes.
Assignments to base objects (e.g., arrays, TypedArrays) will still exhibit pass-by-reference behavior.
If there is a risk of inadvertently modifying a shared object, users should consider CLONE
ing their object before modifying it.
// Returns a base object, i.e., Float64Array of log-fold changes.
let lfc = results.column("logFC");
// We clone it so that changes don't propagate to 'results' by reference.
// We can then apply our arbitrary modifications to the copy.
let lfc_copy = bioc.CLONE(lfc);
lfc_copy[0] = 100;
// Only 'more_modified' will contain the new log-FC's;
// 'results' itself is not affected.
let more_modified = results.setColumn("logFC", lfc_copy);
We can construct equivalents of Bioconductor's IRanges
and GRanges
objects, representing integer and genomic ranges respectively.
Similarly, Bioconductor's GRangesList
is implemented as a GroupedGRanges
in this package.
let ir = new bioc.IRanges(/* start = */ [1,2,3], /* width = */ [ 10, 20, 30 ]);
let gr = new bioc.GRanges([ "chrA", "chrB", "chrC" ], ir, { strand: [ 1, 0, -1 ] });
// Generics still work on these range objects:
bioc.LENGTH(gr);
bioc.SLICE(gr, [ 2, 1, 0 ]);
bioc.CLONE(gr);
We can find overlaps between two sets of ranges, akin to Bioconductor's findOverlaps()
function:
let index = gr.buildOverlapIndex();
let gr2 = new bioc.GRanges([ "chrA", "chrC", "chrA" ], new bioc.IRanges([5, 3, 2], [9, 9, 9]));
let overlaps = index.overlap(gr2);
We can store per-range metadata in the elementMetadata
field of each object, just like Bioconductor's mcols()
.
let meta = gr.elementMetadata();
meta.$setColumn("symbol", [ "Nanog", "Snap25", "Malat1" ]);
gr.$setElementMetadata(meta);
gr.elementMetadata().columnNames();
The SummarizedExperiment
object is a data structure for storing experimental data in a matrix-like object,
along with further annotations on the rows (usually features) and samples (usually columns).
To illustrate, let's mock up a small count matrix, ostensibly from an RNA-seq experiment:
// Making a column-major dense matrix of random data.
let ngenes = 100;
let nsamples = 20;
let expression = new Int32Array(ngenes * nsamples);
expression.forEach((x, i) => expression[i] = Math.random() * 10);
let mat = new bioc.DenseMatrix(ngenes, nsamples, expression);
// Mocking up row names, column annotations.
let rownames = [];
for (var g = 0; g < ngenes; g++) {
rownames.push("Gene_" + String(g));
}
let treatment = new Array(nsamples);
treatment.fill("control", 0, 10);
treatment.fill("treated", 10, nsamples);
let sample_meta = new bioc.DataFrame({ group: treatment });
We can now store all of this information in a SummarizedExperiment
:
let se = new bioc.SummarizedExperiment({ counts: mat },
{ rowNames: rownames, columnData: sample_meta });
This can be manipulated by generics for two-dimensional objects:
bioc.NUMBER_OF_ROWS(se);
bioc.SLICE_2D(se, { start: 0, end: 50 }, [0, 2, 4, 8, 10, 12, 14, 16, 18]);
bioc.COMBINE_COLUMNS([se, se]);
Similar implementations are provided for the RangedSummarizedExperiment
and SingleCellExperiment
classes.
For classes:
Javascript | R/Bioconductor equivalent |
---|---|
DataFrame | S4Vectors::DFrame |
IRanges | IRanges::IRanges |
GRanges | GenomicRanges::GRanges |
GroupedGRanges | GenomicRanges::GRangesList |
SummarizedExperiment | SummarizedExperiment::SummarizedExperiment |
RangedSummarizedExperiment | SummarizedExperiment::RangedSummarizedExperiment |
SingleCellExperiment | SingleCellExperiment::SingleCellExperiment |
For generics:
Javascript | R/Bioconductor equivalent |
---|---|
LENGTH | base::NROW |
SLICE | S4Vectors::extractROWS |
COMBINE | S4Vectors::bindROWS |
CLONE | - |
NUMBER_OF_ROWS | base::NROW |
NUMBER_OF_COLUMNS | base::NCOL |
SLICE_2D | base::"[" |
COMBINE_ROWS | S4Vectors::bindROWS |
COMBINE_COLUMNS | S4Vectors::bindCOLS |
A high-level description of Bioconductor data structures is given in the "Orchestrating high-throughput genomic analysis with Bioconductor" paper.
The formulation of the generics was mostly based on the code in the S4Vectors package.
The implementation of each class is based on the code in the corresponding R package, e.g., GRanges
in GenomicRanges.
FAQs
Bioconductor classes, generics and methods, implemented in Javascript.
The npm package bioconductor receives a total of 23 weekly downloads. As such, bioconductor popularity was classified as not popular.
We found that bioconductor demonstrated a not healthy version release cadence and project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Open source maintainers are urging GitHub to let them block Copilot from submitting AI-generated issues and pull requests to their repositories.
Research
Security News
Malicious Koishi plugin silently exfiltrates messages with hex strings to a hardcoded QQ account, exposing secrets in chatbots across platforms.
Research
Security News
Malicious PyPI checkers validate stolen emails against TikTok and Instagram APIs, enabling targeted account attacks and dark web credential sales.