
github.com/noobyscoob/grpc-map-reduce
This assignment is an implementation of the MapReduce framework in Go using gRPC (a remote procedure call framework by Google). It can run operations on large datasets in a distributed setting while achieving parallelism and concurrency.
Key Words and Phrases: Golang, RPC, gRPC, Protocol Buffers.
In this implementation of MapReduce, the master spawns the map and reduce workers on a single machine as OS processes. They communicate through gRPC remote procedure calls, which run over HTTP/2. The current design tests the implementation on two operations, word count and inverted index. It serializes data with protocol buffers, uses parallel connections, and processes map and reduce tasks concurrently with goroutines (lightweight threads managed by the Go runtime).
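The goroutine-based concurrency described above can be illustrated with a minimal sketch. It is not the repository's code: `countWords` and `runConcurrently` are hypothetical names, and in-memory strings stand in for input files and gRPC calls.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// countWords is a hypothetical map task: it counts the words in one input text.
func countWords(text string) map[string]int {
	counts := make(map[string]int)
	for _, w := range strings.Fields(strings.ToLower(text)) {
		counts[w]++
	}
	return counts
}

// runConcurrently starts one goroutine per input and merges each partial
// result into a shared map under a mutex.
func runConcurrently(inputs []string) map[string]int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total = make(map[string]int)
	)
	for _, in := range inputs {
		wg.Add(1)
		go func(text string) {
			defer wg.Done()
			partial := countWords(text)
			mu.Lock()
			defer mu.Unlock()
			for w, n := range partial {
				total[w] += n
			}
		}(in)
	}
	wg.Wait() // block until every map task has finished
	return total
}

func main() {
	total := runConcurrently([]string{"a b a", "b c"})
	fmt.Println(total["a"], total["b"], total["c"])
}
```

The mutex keeps the shared map safe; a channel-based merge would work equally well.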
File : main.go
The client is the user program that initiates the MapReduce process: it starts the master with the user-specified configuration. In this implementation, the client creates a separate master process immediately after it starts with the required input. The client then listens on the connection until the jobs are processed and is notified about the output.
Input: input files, type of operation, ports configuration, number of mappers and reducers.
Output: human-readable output files
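As a rough illustration of starting the master in a separate OS process, the sketch below builds such a command with `os/exec`. The `master` role argument and the `masterCommand` helper are assumptions made for this example; the repository's actual arguments are not shown in this document.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// masterCommand builds (but does not start) a command that runs the given
// binary in a hypothetical "master" role with the client's input settings.
func masterCommand(binary, inputDir, operation string) *exec.Cmd {
	cmd := exec.Command(binary, "master", inputDir, operation)
	cmd.Stdout = os.Stdout // forward the child's output to the client's terminal
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	self, err := os.Executable() // path of the currently running binary
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot locate executable:", err)
		os.Exit(1)
	}
	cmd := masterCommand(self, "./input/small/", "wc")
	fmt.Println("would start:", cmd.Args)
	// cmd.Start() would launch the master without blocking the client;
	// cmd.Wait() would later collect its exit status.
}
```

The sketch only prints the command so that it can be run safely; a real client would call `cmd.Start()` and then connect to the master over gRPC.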
Command : $go run main.go client ./input/filepath/ wc
Arguments
File : master.go
Protocol Definition : master.proto
The master program is started by the client in a separate OS process and listens for RPC connections. After the client establishes a connection with the master, it calls the initialize-MapReduce function to spin up the mappers and reducers.
File : mapper.go
Protocol Definition : mapper.proto
Each mapper process spawned by the master takes one input file from the master and generates intermediate files.
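A word count map step might look like the sketch below. It assumes that intermediate pairs are partitioned across reducers by hashing the key, as in the original MapReduce paper; the document does not say how this repository partitions its intermediate files. `KeyValue`, `partition`, and `mapWordCount` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
	"unicode"
)

// KeyValue is a hypothetical intermediate record emitted by a mapper.
type KeyValue struct {
	Key   string
	Value string
}

// partition picks a reducer bucket for a key by hashing it (an assumption).
func partition(key string, nReduce int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(nReduce))
}

// mapWordCount splits text into lowercase words and emits one (word, "1")
// pair per occurrence, grouped by the reducer bucket that should receive it.
func mapWordCount(text string, nReduce int) [][]KeyValue {
	buckets := make([][]KeyValue, nReduce)
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, w := range words {
		b := partition(w, nReduce)
		buckets[b] = append(buckets[b], KeyValue{Key: w, Value: "1"})
	}
	return buckets
}

func main() {
	buckets := mapWordCount("the quick fox, the lazy dog", 2)
	for i, b := range buckets {
		fmt.Printf("intermediate bucket %d: %d pairs\n", i, len(b))
	}
}
```

Each bucket would correspond to one intermediate file sent to one reducer, so every occurrence of a given word ends up at the same reducer.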
File : reducer.go
Protocol Definition : reducer.proto
All mappers and reducers are started at the same time, so the reducers idle, listening for intermediate files from the mappers.
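The waiting behaviour can be sketched with a Go channel standing in for the gRPC connection, which is a deliberate simplification rather than the repository's transport. The reducer starts first, blocks until pairs arrive, and finishes when the channel is closed. `KeyValue` and `reduceCounts` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
)

// KeyValue is a hypothetical intermediate record received from a mapper.
type KeyValue struct {
	Key   string
	Value string
}

// reduceCounts blocks on the channel until it is closed, summing the
// numeric values seen for each key.
func reduceCounts(in <-chan KeyValue) map[string]int {
	totals := make(map[string]int)
	for kv := range in {
		n, err := strconv.Atoi(kv.Value)
		if err != nil {
			continue // skip malformed values in this sketch
		}
		totals[kv.Key] += n
	}
	return totals
}

func main() {
	in := make(chan KeyValue)
	done := make(chan map[string]int)

	// The reducer is started first and idles until pairs arrive.
	go func() { done <- reduceCounts(in) }()

	// A stand-in mapper sends its pairs later, then signals completion.
	for _, w := range []string{"go", "rpc", "go"} {
		in <- KeyValue{Key: w, Value: "1"}
	}
	close(in)

	totals := <-done
	fmt.Println(totals["go"], totals["rpc"])
}
```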
Implementations of word count and inverted index are provided. Example:
Respective functions are implemented in mapper.go and reducer.go files.
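For the inverted index operation, a map and reduce pair could look like the following sketch. It is an illustration under assumptions, not the code in mapper.go and reducer.go: `Posting`, `mapInvertedIndex`, and `reduceInvertedIndex` are hypothetical names, and the input is passed as in-memory strings.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Posting is a hypothetical intermediate record: a word and a file containing it.
type Posting struct {
	Word string
	File string
}

// mapInvertedIndex emits one posting per distinct word found in the file.
func mapInvertedIndex(file, text string) []Posting {
	seen := make(map[string]bool)
	var out []Posting
	for _, w := range strings.Fields(strings.ToLower(text)) {
		if !seen[w] {
			seen[w] = true
			out = append(out, Posting{Word: w, File: file})
		}
	}
	return out
}

// reduceInvertedIndex groups postings by word and sorts each file list.
func reduceInvertedIndex(postings []Posting) map[string][]string {
	index := make(map[string][]string)
	for _, p := range postings {
		index[p.Word] = append(index[p.Word], p.File)
	}
	for w := range index {
		sort.Strings(index[w])
	}
	return index
}

func main() {
	var all []Posting
	all = append(all, mapInvertedIndex("a.txt", "go rpc go")...)
	all = append(all, mapInvertedIndex("b.txt", "go proto")...)
	index := reduceInvertedIndex(all)
	fmt.Println(index["go"], index["rpc"], index["proto"])
}
```

Word count follows the same shape, with the map step emitting a count per word and the reduce step summing the counts instead of collecting file names.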
Grouping implementation is split into two stages where:
$make: builds the binary files.
$make test: runs the word count and inverted index tests.
$make run: builds and runs the program.
Go version 1.20 was used for development. Install the latest Go version from https://go.dev/doc/install
Manual Execution:
After running each test case, use the command below to clean up ports:
$killall main
Word Count:
Test1: $go run main.go client ./input/small/ wc
Test2: $go run main.go client ./input/large/ wc
Inverted Index:
Test1: $go run main.go client ./input/small/ ii
Test2: $go run main.go client ./input/large/ ii
You can use $./bin/main_linux instead of $go run main.go.
Known Edge Cases: unsupported characters in the text file, large input files (>5 MB), and connections and files that are not closed.
This program was tested with around 10 files (200 to 800 KB each); the tasks ran in 1-4 seconds on my local machine (M2 Mac, 8 GB memory).
This implementation sends files across processes using the gRPC framework, with protocol buffers used to serialize the intermediate files. Sending a file (about 500 KB on average) over the local network took around 50 ms.