
Security News
CISA Kills Off RSS Feeds for KEVs and Cyber Alerts
CISA is discontinuing official RSS support for KEV and cybersecurity alerts, shifting updates to email and social media, disrupting automation workflows.
github.com/exa-graph/vite
Vite (ˈviːte/)
Vite
is an MPI+OpenMP implementation of Louvain method for (undirected)
graph clustering or community detection, that supports a number of heuristics
to improve the scalability. Vite can use both real-world and synthetic graph
(uses an in-memory random geometric graph generator).
Please refer to the following paper that discusses the distributed implementation and associated heuristics: https://ieeexplore.ieee.org/abstract/document/8425242/
We recently added an option to improve the distribution to be more edge-balanced rather than standard vertex-based ("-b"), which can avoid communication and improve execution time. Relevant discussion in the following paper: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8916299
If quality is important for you, please consider the coloring options. Coloring will increase the overall execution time and communication, but it is possible to combine coloring with other heuristics to improve the performance. Relevant discussion in the following paper: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8547534
Vite can be downloaded from (please don't use the past PNNL/PNL link to download Vite mentioned in the paper, the current GitHub link is the correct one): https://github.com/Exa-Graph/vite
This code requires an MPI library (preferably MPI-3 compatible) and C++11 compliant compiler for building.
Please contact the following for any queries or support:
Sayan Ghosh, PNNL (sg0 at pnnl dot gov) Mahantesh Halappanavar, PNNL (hala at pnnl dot gov) Antonino Tumeo, PNNL (antonino.tumeo at pnnl dot gov)
Please '*' this repository on GitHub if the code is useful to you.
Pass -DDEBUG_PRINTF if detailed diagonostics is required along program run. This program requires OpenMP and C++11 support, so pass -fopenmp (for g++)/-qopenmp (for icpc) and -std=c++11/ -std=c++0x. For Cray systems, use CC.
Pass -DUSE_32_BIT_GRAPH if number of nodes in the graph are within 32-bit range (2 x 10^9), else 64-bit range is assumed.
Pass -DDONT_CREATE_DIAG_FILES if you dont want to create 2 files per process with detail diagonostics.
Upon building, the program will generate two binaries: bin/graphClustering (parallel) and bin/fileConvert (serial).
Please use bin/fileConvert for input graph conversion from native formats to a binary format that bin/graphClustering will be able to read.
Compiling on Intel KNL:
We made some modifications to the code to port it to Cray XC systems with Intel KNL. We use lib-memkind to allocate some heavily used data structures to KNL MCDRAM (for usage in flat-mode). Please ensure lib-memkind module is loaded and pass -DUSE_AUTOHBW_MEMALLOC, and pass -xmic-AVX512 instead of -xHost in the Makefile.
Experimental device offload:
The modularity computation is the only arithmetic logic in this code. Even though, due to the limited computation associated with the modularity computation, it is doubtful how much GPU can help in improving the performance relative to using OpenMP host pragmas. However, we have added an option to enable OpenMP offload only for the modularity computation kernel. Vite must be built with -DOMP_TARGET_OFFLOAD and a compiler that supports OpenMP 4.5. The default compiler options in the Makefile needs to be updated as well, for instance, on Summit, for enabling OpenMP offload in the XL compiler, one would need to pass: "-qsmp=omp -qoffload" currently. Initial testing suggests that for certain graphs, it is possible to achieve about 10-15% improvement in performance relative to host OpenMP.
Convert input files to binary from various formats. Possible options (can be combined):
Typical example: bin/./fileConvert -n -f karate.txt -o karate.bin
If your input file does not have edge weights, and you want random edge weights, then pass option "-r".
bin/./fileConvert -m -r -f karate.mtx -o karate.bin
Note: DIMACS file could be directed or undirected, so pass 0 if input is directed, or a number > 0 input graph is undirected. Also, we expect DIMACS inputs to have weights, therefore passing -r has no effect.
bin/./fileConvert -d 0 -f sample.gr -o sample.bin
Same as the one before, except initializes all weights to 1, no matter what is in the input file.
bin/./fileConvert -d 0 -w -f sample.gr -o sample.bin
Note: If the input file is index 1-based, then indicate that by passing -z, such that it will be converted to 0-based internally.
bin/./fileConvert -s -z -f network.dat -o lfrtest.bin
Note: We consider files containing edge list, as 'simple'
formatted files (option "-s" for directed, and option "-u"
for undirected). If you have a weighted simple format file
(i.e., is u v w
format) then you have to additionally
pass "-i".
If there are weights in such an edge-list file and for some
reason you do not want to consider them, pass "-w" or "-r"
in addition to "-s" or "-u" such that we can assign one-weights
or random-weights internally.
By default, we assume edge list files to not have any weights
(as in, just u v
per line).
bin/./fileConvert -f uk2007-edges.txt -s -w -o uk2007.bin
-s
and -u
optionsThe -s
or -u
option is just to tell the implementation about
your native graph format (as in if you pass -s
, we assume in
your edge list just u v
exists and not its complement; whereas
when you pass -u
, we will assume that both u v
and v u
exists).
Regardless of the native format, internally, we always use an
undirected graph representation, because the algorithm itself
assumes an undirected input, and accordingly prints the number of
edges as double if your native graph was directed (on the other
hand, if your native graph was undirected, and for e.g., you used
-u
to convert to binary, the #edges printed by the implementation
should exactly match with the #edges of your native graph).
Apart from the simple edge list formats and matrix-market formats, we do not actively check the correctness of the other formats.
Apart from real world graphs, users can use specific options to generate a Random Geometric Graph (RGG) in parallel. RGGs have been known to have good community structure: https://arxiv.org/pdf/1604.03993.pdf
The way we have implemented a parallel RGG generator, vertices owned by a process will only have cross edges with its logical neighboring processes (each process owning 1x1/p chunk of the 1x1 unit square). If MPI process mapping is such that consecutive processes (for e.g., p and p+1) are physically close to each other, then there is not much communication stress in the application. Therefore, we allow an option to add extra edges between randomly chosen vertices, whose owners may be physically far apart. Relevant discussion in paper: https://ieeexplore.ieee.org/abstract/document/8641631
We require the total number of processes to be a power of 2 and total number of vertices to be perfectly divisible by the number of processes when parallel RGG generation options are used.
An n-D random geometric graph (RGG), is generated by randomly placing N vertices in an n-D space and connecting pairs of vertices whose Euclidean distance is less than or equal to d. We only consider 2D RGGs contained within a unit square, [0,1]^2. We distribute the domain such that each process receives N/p vertices (where p is the total number of processes). Each process owns (1 * 1/p) portion of the unit square and d is computed as (please refer to Section 4 of above paper for details):
d = (dc + dt)/2; where, dc = sqrt(ln(N) / piN); dt = sqrt(2.0736 / piN)
Therefore, the number of vertices (N) passed during miniVite execution on p processes must satisfy the condition -- 1/p > d.
Please note, the default distribution of graph generated from the in-built random geometric graph generator causes a process to only communicate with its two immediate neighbors. If you want to increase the communication intensity for generated graphs, please use the "-e" option to specify an extra percentage of edges that will be generated, linking random vertices. As a side-effect, this option significantly increases the time required to generate the graph.
E.g.: mpiexec -n 2 bin/./minivite -f karate.bin mpiexec -n 2 bin/./minivite -l -n 100 mpiexec -n 2 bin/./minivite -n 100 mpiexec -n 2 bin/./minivite -e 2 -n 100
We also have bindings to NetworKit[*], that can help generate random and scale free graphs, but this option is actively supported and may be purged from future Vite versions. This is mostly serial, but NetworKit internally may use OpenMP.
Note: We have only tested with NetworKit GitHub main (circa Oct 2023), and have no plans on supporting other NetworKit versions (we will most probably remove this option from future releases).
[*] https://networkit.github.io/
Networkit can be built from GitHub: git clone --recursive https://github.com/networkit/networkit.git
Possible options (can be combined):
Input file conversion to Vite binary format from the respective native file formats: convert the input file to binary format using fileConvert.
Set the desired number of processes and threads.
Run distributed parallel Louvain algorithm (graphClustering binary) using the binary file created in Step #0:
E.g., pass a binary file using "-f" option and include other heuristics: mpiexec -n 2 bin/./graphClustering -i -f karate.bin
E.g., generate a random geometric graph by passing #vertices=64: mpiexec -n 2 bin/./graphClustering -n 64
E.g., generate a random geometric graph by passing #vertices=64 and store the resultant binary graph: mpiexec -n 2 bin/./graphClustering -n 64 -s rgg64.bin
E.g., read a previously generated random geometric binary graph: mpiexec -n 2 bin/./graphClustering -f rgg64.bin
Possible options (can be combined):
Coloring:
mpiexec -n 2 bin/graphClustering -c "4" -f karate.bin mpiexec -n 2 bin/graphClustering -c "8 false" -i -f karate.bin
By default, the coloring method is run for multiple iterations, to make it run for just a single iteration, pass "true". For e.g.: mpiexec -n 2 bin/graphClustering -c "4 true" -f karate.bin
*Note: Option to update inactive vertices percentage is defined as macro ET_CUTOFF in louvain.hpp, and the default is 2%.
Generate benchmark graphs with ground truth community information. For e.g., we have tested with LFR benchmark. (https://sites.google.com/site/santofortunato/inthepress2) LFR generates an edge list (index 1-based), along with community membership of each vertex.
Convert the generated network to binary using fileConvert binary.
Execute the program as shown in the e.g. below: mpiexec -n 2 bin/./graphClustering -r 1 -f lfrtest.bin -z -g community.dat
The difference between standard way to execute the program and this way
is that one needs to pass the ground truth vertex-community mapping
using the "-g" option. Also, the "-z" option tells the program that
the input ground truth file -- community.dat
is 1-based. If "-z" is
not passed, then we assume that the file is 0-based.
We return some statistics such as F-score, Gini-coefficient, false positives/negatives and true positives. We expect that the isolated vertices will be placed into their respective communities, and not into a single one.
However, extra communication per phase is required when these metrics are to be computed, so the program execution time will increase significantly. Plus, the community comparison part is currently multithreaded (uses OpenMP), so everything is brought to a single node/PE and then communities are compared.
If -DDONT_CREATE_DIAG_FILES is passed during compilation,
then output is send to stdout.
Otherwise, the output result is dumped per process on files
named as dat.out.. Check dat.out.0 to review
program diagonostics.
Output files are cleared with: make clean
.
FAQs
Unknown package
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
CISA is discontinuing official RSS support for KEV and cybersecurity alerts, shifting updates to email and social media, disrupting automation workflows.
Security News
The MCP community is launching an official registry to standardize AI tool discovery and let agents dynamically find and install MCP servers.
Research
Security News
Socket uncovers an npm Trojan stealing crypto wallets and BullX credentials via obfuscated code and Telegram exfiltration.