Framework for machine learning on source code. Provides API and tools to train and use models based on source code features extracted from Babelfish's UASTs.
This project is the foundation for MLonCode research and development. It abstracts feature extraction and model training, so that you can focus on higher-level tasks.
Currently, the following models are implemented:
It is written in Python 3 and has been tested on Linux and macOS. source{d} ml is tightly coupled with source{d} engine and delegates all the feature extraction parallelization to it.
Here is the list of proof-of-concept projects which are built using sourced.ml:
Whether you wish to include Spark in your installation or would rather use an existing installation, to use sourced-ml you will need to have some native libraries installed, e.g. on Ubuntu you must first run: apt install libxml2-dev libsnappy-dev. Tensorflow is also a requirement - we support both the CPU and GPU versions. In order to select which version you want, modify the package name in the next section to either sourced-ml[tf] or sourced-ml[tf-gpu] depending on your choice. If you don't, neither version will be installed.
pip3 install sourced-ml
If you already have Apache Spark installed and configured in your environment at $SPARK_HOME,
you can reuse it and avoid downloading 200 MB through pip "editable installs" by running:
pip3 install -e "$SPARK_HOME/python"
pip3 install sourced-ml
In both cases, you will need to have some native libraries installed. E.g., on Ubuntu: apt install libxml2-dev libsnappy-dev. Some parts require Tensorflow.
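Since Tensorflow is only pulled in by the optional extras, it can be useful to check for it before running the parts that need it. The following is a minimal sketch using only the Python standard library; the helper name has_module is hypothetical and is not part of sourced-ml.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


if not has_module("tensorflow"):
    # Hypothetical hint; the extras names come from the installation notes above.
    print("Tensorflow not found: install sourced-ml[tf] or sourced-ml[tf-gpu].")
```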
This project exposes two interfaces: an API and a command line. The command line is available natively or through Docker:
srcml --help
docker run -it --rm srcd/ml --help
If the docker command fails with
Cannot connect to the Docker daemon. Is the docker daemon running on this host?
and you are sure that the daemon is running, then you need to add your user to the docker
group: refer to the Docker documentation.
Contributions are welcome! See CONTRIBUTING and CODE_OF_CONDUCT.md.
We build the source code identifier co-occurrence matrix for every repository.
1. Read Git repositories.
2. Classify files using enry.
3. Extract UAST from each supported file.
4. Split and stem all the identifiers in each tree.
5. Traverse UAST, collapse all non-identifier paths and record all identifiers on the same level as co-occurring. Besides, connect them with their immediate parents.
6. Write the global co-occurrence matrix.
7. Train the embeddings using Swivel (requires Tensorflow). Interactively view the intermediate results in Tensorboard using --logs.
8. Write the identifier embeddings model.
Steps 1-5 are performed with the repos2coocc command, 6 with id2vec_preproc, 7 with id2vec_train, 8 with id2vec_postproc.
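The identifier splitting and co-occurrence counting described above can be illustrated with a small, self-contained sketch. This is not the sourced.ml implementation: the tree is a plain nested dict standing in for a UAST, stemming is omitted, and the names split_identifier, collapsed_children and count_cooccurrences are hypothetical. It also assumes that the sub-tokens of one identifier count as co-occurring with each other.

```python
import re
from collections import Counter
from itertools import combinations


def split_identifier(name):
    """Split snake_case / camelCase into lowercase tokens (no stemming)."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def collapsed_children(node):
    """Nearest descendants that carry an identifier, skipping nodes without one."""
    result = []
    for child in node.get("children", []):
        if child.get("identifier"):
            result.append(child)
        else:
            result.extend(collapsed_children(child))
    return result


def count_cooccurrences(root):
    """Count unordered token pairs: same-level pairs plus parent-child pairs."""
    counts = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        kids = collapsed_children(node)
        level = [t for k in kids for t in split_identifier(k["identifier"])]
        # Identifiers on the same (collapsed) level co-occur.
        for a, b in combinations(level, 2):
            if a != b:
                counts[tuple(sorted((a, b)))] += 1
        # Connect each of them with the immediate identifier parent.
        if node.get("identifier"):
            for p in split_identifier(node["identifier"]):
                for t in level:
                    if p != t:
                        counts[tuple(sorted((p, t)))] += 1
        stack.extend(kids)
    return counts


# Toy tree: readFile -> (non-identifier node) -> file_name, buffer
tree = {"children": [{
    "identifier": "readFile",
    "children": [{"children": [
        {"identifier": "file_name"},
        {"identifier": "buffer"},
    ]}],
}]}
cooc = count_cooccurrences(tree)
```

Summing such counters over all files gives a global co-occurrence matrix in sparse form, which is the kind of input that an embedding trainer like Swivel consumes.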
We represent every repository as a weighted bag-of-vectors, provided that we have document frequencies ("docfreq") and identifier embeddings ("id2vec").
Steps 1-7 are performed with the repos2bow command.
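As a rough illustration of how document frequencies and identifier embeddings can combine into a weighted bag-of-vectors, here is a minimal sketch. The TF-IDF weighting is an assumption made for the example (the text above only states that docfreq and id2vec are required), and weighted_bag and its parameters are hypothetical names, not the repos2bow API.

```python
import math
from collections import Counter


def weighted_bag(tokens, docfreq, n_docs, embeddings):
    """Map each known token to (weight, vector), with an assumed TF-IDF weight."""
    bag = {}
    for token, tf in Counter(tokens).items():
        # Skip tokens missing from either model.
        if token not in docfreq or token not in embeddings:
            continue
        idf = math.log(n_docs / docfreq[token])
        bag[token] = (tf * idf, embeddings[token])
    return bag


# Toy inputs: identifier tokens of one repository and two tiny "models".
tokens = ["read", "file", "file", "unknown"]
docfreq = {"read": 10, "file": 100}           # documents containing each token
embeddings = {"read": [0.1, 0.9], "file": [0.7, 0.2]}
bag = weighted_bag(tokens, docfreq, n_docs=100, embeddings=embeddings)
```

In this toy case "file" appears in every document, so its weight is zero even though it occurs twice, while the rarer "read" gets a positive weight.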
See here.
See here.