Processing of CSV files
An unofficial Ruby library for quickly parsing 23andMe raw data files into plain Ruby structures for processing and analysis.
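The 23andMe raw-data format is a plain tab-separated text file (comment lines start with '#'; data lines carry rsid, chromosome, position and genotype), so a minimal parsing sketch needs only the standard library. This is not the gem's API; the Snp struct name is made up for illustration:

```
# Minimal sketch of parsing 23andMe raw data into plain Ruby structures
# (illustration only; Snp is a hypothetical name, not the gem's API).
Snp = Struct.new(:rsid, :chromosome, :position, :genotype)

def parse_23andme(path)
  File.foreach(path).reject { |line| line.start_with?('#') }.map do |line|
    rsid, chromosome, position, genotype = line.chomp.split("\t")
    Snp.new(rsid, chromosome, position.to_i, genotype)
  end
end

snps = parse_23andme('genome_data.txt')
puts snps.count { |snp| snp.genotype == 'AA' }
```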
My SAKURA gem with various utilities. This is my swiss-army knife for Linux and Mac. See README.md for amazing examples, like:
  richelp ubuntu # shows a richelp of my 'ubuntu' cheatsheet
  richelp sakura synopsis # shows a richelp of my 'sakura' cheatsheet, grepping for 'synopsis'
  ls | act # randomly scrambles the lines! Taken from cat/atc ;)
  ps | rainbow # colors all lines differently
  twice itunes - # lowers volume of iTunes... twice :)
  10 echo Bart Simpson likes it DRY # tells you this 10 times. Very sarcastic script!
  seq 100 | 1suN 7 # prints every 7th element of the list
  zombies # prints processes that show zombies (plus funny options to kill them)
  find . -size +300M | xargs mvto /tmp/bigfiles/ # moves big files to that directory
  alias gp='never_as_root git pull' # runs only if you're not root!
  tellme-time # tells you the time with Riccardo's voice in Italian. Brilliant!
  find-duplicates . # tells you files with the same size/MD5 in this directory
  facter is_google_vm # tells if it's a GCE Virtual Machine
DirectoryTemplate is a library which lets you generate directory structures and files from a template structure. The template structure can be a real directory structure on the filesystem, or it can be stored in a YAML file. Take a look at the examples directory in the gem to get an idea of how a template can look. When generating a new directory structure from a template, DirectoryTemplate will process the pathname of each directory and file using the DirectoryTemplate#path_processor. It will also process the contents of each file with all processors that apply to a given file. The standard path processor allows you to use `%{variables}` in pathnames. The gem comes with a .erb processor (renders ERB templates) and a .html.markdown processor (renders markdown to html). You can use the existing processors or define your own.
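The `%{variables}` substitution that the standard path processor provides mirrors Ruby's own format-string expansion, which makes the idea easy to picture. A tiny illustration of that expansion (not DirectoryTemplate's internal code):

```
# Illustration of %{variables} expansion in a template pathname using
# Ruby's built-in String#% (not DirectoryTemplate's internals).
template_path = 'lib/%{gem_name}/%{gem_name}.rb'
puts template_path % { gem_name: 'my_gem' }   # => lib/my_gem/my_gem.rb
```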
Rack middleware for SassC that processes Sass/SCSS files in the development environment.
Jog is a simple command-line tool that simplifies the process of logging what you've worked on, storing plain-text files in a sensible file structure.
CSV Files: I hate them, you probably do too, but sometimes you need to get data into your system and this is the only way it's happening. If you're deploying a rails app in a cloud setup, you may have trouble if you're trying to store an uploaded file locally and process it later in a background thread (I know I have). cumulus_csv is one way to solve that problem. You can save your file to your S3 account, and loop over the data inside it at your convenience later. So it doesn't matter where you're doing the processing, you just need to have the key you used to store the file, and you can process away.
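The same store-now, process-later idea can be sketched with the aws-sdk-s3 gem and the CSV standard library; this is not cumulus_csv's API, and the bucket and key names are made up:

```
# Sketch of the store-now, process-later idea (not cumulus_csv's API;
# the bucket and key names are made up).
require 'aws-sdk-s3'
require 'csv'

object = Aws::S3::Resource.new(region: 'us-east-1')
           .bucket('my-uploads')
           .object('imports/users.csv')

# In the web request: store the uploaded file and remember the key.
object.upload_file('/tmp/uploaded_users.csv')

# Later, in a background job: fetch the file by key and loop over the rows.
CSV.parse(object.get.body.read, headers: true).each do |row|
  # process each row at your convenience
end
```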
A gem for Ruby 1.8 to start processes in parallel: 1) don't wait for them to finish, 2) don't inherit IO file handles, 3) works on both Windows and Linux. Ruby 1.9 doesn't need it because it has the spawn method.
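For reference, the built-in spawn it alludes to looks like this on Ruby 1.9+ (this is the standard library, not the gem's API; the command name is a placeholder):

```
# Ruby 1.9+ built-in equivalent: start a process without waiting for it
# and without handing it our IO handles; works on Windows and Linux.
pid = Process.spawn('long_running_task',
                    out: File::NULL, err: File::NULL,
                    close_others: true)
Process.detach(pid)   # reap it in the background instead of waiting
```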
Easily process flat files with Flat. Specify the format in a subclass of Flat::File and read and write until the cows come home.
SiteFuel is a Ruby program and lightweight API for processing the source code behind your static and dynamic websites. SiteFuel can remove comments and unneeded whitespace from your CSS, HTML, and JavaScript files (as well as fragments in RHTML and PHP). It can also losslessly compress your PNG and JPEG images. SiteFuel can also deploy your website from SVN or Git. Support for more formats and repositories is planned for future versions.
Miscellaneous methods that may or may not be useful.
sh:: Safely pass untrusted parameters to sh scripts. Raise an exception if the script returns a non-zero value.
fork_and_check:: Run a block in a forked process and raise an exception if the process returns a non-zero value.
do_and_exit, do_and_exit!:: Run a block. If the block does not run exit!, a successful exec or equivalent, run exit(1) or exit!(1) ourselves. Useful to make sure a forked block either runs a successful exec or dies. Any exceptions from the block are printed to standard error.
overwrite:: Safely replace a file. Writes to a temporary file and then moves it over the old file (see the sketch below).
tempname_for:: Generates a unique temporary path based on a filename. The generated filename resides in the same directory as the original one.
try_n_times:: Retries a block of code until it succeeds or a maximum number of attempts (default 10) is exceeded.
Exception#to_formatted_string:: Return a string that looks like how Ruby would dump an uncaught exception.
IO#best_datasync:: Try fdatasync, falling back to fsync, falling back to flush.
Random#exp:: Return a random integer 0 ≤ n < 2^argument (using SecureRandom).
Random#float:: Return a random float 0.0 ≤ n < argument (using SecureRandom).
Random#int:: Return a random integer 0 ≤ n < argument (using SecureRandom).
Password:: A small wrapper for String#crypt that does secure salt generation and easy password verification.
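As one example of what happens under the hood, the write-to-a-temporary-file-then-rename pattern behind overwrite can be sketched with the standard library alone; this is an illustration of the technique, not the gem's implementation:

```
# Illustration of the safe-replace technique behind overwrite (not the
# gem's own code): write to a temporary file in the same directory,
# then atomically rename it over the original.
def safe_overwrite(path)
  tmp = File.join(File.dirname(path), ".#{File.basename(path)}.#{Process.pid}.tmp")
  File.open(tmp, 'w') do |io|
    yield io        # the caller writes the replacement contents
    io.fsync        # push the data to disk before the rename
  end
  File.rename(tmp, path)   # atomic within a single filesystem
ensure
  File.unlink(tmp) if tmp && File.exist?(tmp)
end

safe_overwrite('config.yml') { |io| io.write("updated: true\n") }
```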
Integrates data into MS Word docx template files. Processing supports loops and replacement of strings of data both outside and within loops.
Processes, evaluates, and compares two different CSS files based on their ASTs.
Starts n AVDs based on a JSON config file. AVDs are created and configured to the user's liking before the instrumentation test process (started either via a shell command or Gradle) and killed/deleted after the test process finishes.
CLI tool for batch CSV file processing via the BriteVerify API
Literate programming using Markdown! Converts files to HTML or extracts the code snippets into one or more source files. To use:
  $ literate_md --help
  Options:
    --weave, -w: Produce documentation
    --tangle, -t: Produce code
    --outputdir, -o <s>: Directory to write files to
    --lang, -l <s>: Default language of code (default: ruby)
    --files, -f <s>: Files to process
    --standalone, -s: Weaves in html and body tags
    --help, -h: Show this message
Command-line tool handling the steps to deliver music album masters from recordings. Handles Track Mixing, Track Mastering, Track Master Delivery, Album Mastering and Album Master Delivery. Easy-to-use configuration files drive the complete process.
The FBO gem manages the process of downloading and parsing file-based notice information from the Federal Business Opportunities (https://www.fbo.gov/) application / database. The FBO feed files include new and updated opportunities, information about awarded contracts, and other details concerning the offer and disposition of federal government contracts and tenders.
DoverToCalais allows the user to send a wide range of data sources (files & URLs) to OpenCalais and receive asynchronous responses when OpenCalais has finished processing the inputs. In addition, DoverToCalais enables the filtering of the response in order to find relevant tags and/or tag values.
:title: The Ruby API

:section: PYAPNS::Client

There's python in my ruby! This is a class used to send notifications, provision applications and retrieve feedback using the Apple Push Notification Service. PYAPNS is a multi-application APS provider, meaning it is possible to send notifications to any number of different applications from the same application and the same server. It is also possible to scale the client to any number of processes and servers, simply balanced behind a simple web proxy. It may seem like overkill for such a bare interface - after all, the APS service is rather simplistic. However, PYAPNS takes no shortcuts when it comes to completeness/compliance with the APNS protocol and allows the user many optimization and scaling vectors not possible with other libraries. No bandwidth is wasted, connections are persistent and the server is asynchronous, therefore notifications are delivered immediately. PYAPNS takes after the design of 3rd-party push notification services that charge a fee each time you push a notification, and charge extra for so-called 'premium' service which supposedly gives you quicker access to the APS servers. However, PYAPNS is free, as in beer, and offers more scaling opportunities without the financial draw.

:section: Provisioning

To add your app to the PYAPNS server, it must be `provisioned` at least once. Normally this is done once upon the start-up of your application, be it a web service, desktop application or whatever... It must be done at least once for the server you're connecting to. Multiple instances of PYAPNS will have to have their applications provisioned individually. To provision an application manually, use the `PYAPNS::Client#provision` method.

  require 'pyapns'
  client = PYAPNS::Client.configure
  client.provision :app_id => 'cf', :cert => '/home/ss/cert.pem', :env => 'sandbox', :timeout => 15

This basically says "add an app reference named 'cf' to the server and start a connection using the certificate, and if it can't within 15 seconds, raise a `PYAPNS::TimeoutException`". That's all it takes to get started. Of course, this can be done automatically by using the PYAPNS::ClientConfiguration middleware. `PYAPNS::Client` is a singleton class that is configured using the class method `PYAPNS::Client#configure`. It is sensibly configured by default, but can be customized by specifying a hash. See the docs on `PYAPNS::ClientConfiguration` for a list of available configuration parameters (some of these are important, and you can specify initial applications to be configured by default).

:section: Sending Notifications

Once your client is configured, and your application provisioned (again, these should be taken care of before you write notification code), you can begin sending notifications to users. If you're wondering how to acquire a notification token, you've come to the wrong place... I recommend using Google. However, if you want to send hundreds of millions of notifications to users, here's how it's done, one at a time... `PYAPNS::Client#notify` is a sort of polymorphic method which can notify any number of devices at a time.
Its basic form is as follows:

  client.notify 'cf', 'long ass app token', {:aps=> {:alert => 'hello?'}}

However, as stated before, it is sort of polymorphic:

  client.notify 'cf', ['token', 'token2', 'token3'], [alert, alert2, alert3]
  client.notify :app_id => 'cf', :tokens => 'mah token', :notifications => alertHash
  client.notify 'cf', 'token', PYAPNS::Notification('hello tits!')

As you can see, the method accepts parallel arrays of tokens and notifications, meaning any number of notifications can be sent at once. Hashes will be automatically converted to `PYAPNS::Notification` objects so they can be optimized for the wire (nil values removed, etc...), and you can pass `PYAPNS::Notification` objects directly if you wish.

:section: Retrieving Feedback

The APS service offers a feedback functionality that allows application servers to retrieve a list of device tokens it deems to be no longer in use, and the time it thinks they stopped being useful (the user uninstalled your app, better luck next time...). Sounds pretty straightforward, and it is. Apple recommends you do this at least once an hour. PYAPNS will return a list of 2-element lists with the date and the token:

  feedbacks = client.feedback 'cf'

:section: Asynchronous Calls

PYAPNS::Client will, by default, perform no funny stuff and operate entirely within the calling thread. This means that certain applications may hang when, say, sending a notification, if only for a fraction of a second. Obviously not a desirable trait; to avoid it, all `provision`, `feedback` and `notify` methods also take a block, which indicates to the method that you want to call PYAPNS asynchronously, and it will be done so handily in another thread, calling back your block with a single argument when finished. Note that `notify` and `provision` return absolutely nothing (nil, for you rub--wait you are ruby developers!). It is probably wise to always use this form of operation so your calling thread is never blocked (especially important in UI-driven apps and asynchronous servers). Just pass a block to provision/notify/feedback like so:

  PYAPNS::Client.instance.feedback do |feedbacks|
    feedbacks.each { |f| trim_token f }
  end

:section: PYAPNS::ClientConfiguration

A middleware class to make `PYAPNS::Client` easy to use in web contexts. Automates configuration of the client in Rack environments using a simple configuration middleware. To use `PYAPNS::Client` in Rack environments with the least code possible, `use PYAPNS::ClientConfiguration` (no, really, in some cases, that's all you need!) middleware with an optional hash specifying the client variables. Options are as follows:

  use PYAPNS::ClientConfiguration(
    :host => 'http://localhost/',
    :port => 7077,
    :initial => [{
      :app_id => 'myapp',
      :cert => '/home/myuser/apps/myapp/cert.pem',
      :env => 'sandbox',
      :timeout => 15 }])

Where the configuration variables are defined:

  :host     String  the host where the server can be found
  :port     Number  the port to which the client should connect
  :initial  Array   OPTIONAL - an array of INITIAL hashes

INITIAL HASHES:

  :app_id   String  the id used to send messages with this certificate; can be a totally arbitrary value
  :cert     String  a path to the certificate, or the certificate file as a string
  :env      String  the environment to connect to apple with, always either 'sandbox' or 'production'
  :timeout  Number  the timeout for the server to use when connecting to the apple servers

:section: PYAPNS::Notification

An APNS Notification. You can construct notification objects ahead of time by using this class.
However unnecessary, it allows you to programmatically generate a Notification like so:

  note = PYAPNS::Notification.new 'alert text', 9, 'flynn.caf', {:extra => 'guid'}
  # -- or --
  note = PYAPNS::Notification.new 'alert text'

These can be passed to `PYAPNS::Client#notify` the same as hashes.
Used for locking processes via PID and file (daemon style).
Stylist provides powerful stylesheet management for your Rails app. You can organize your CSS files by media, add, remove or prepend stylesheets in the stylesheets stack from your controllers and views, and process them using Less or Sass. And as if that wasn't awesome enough, you can even minify them using YUI Compressor and bundle them into completely incomprehensible, but bandwidth-friendly mega-stylesheets.
This Ruby gem helps manage processes in your application so that a new process won't start while the previous one is still running. It uses the well-known 'lock file' approach to figure out whether a process is running or not.
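The generic lock-file technique it refers to can be sketched with File#flock from the standard library; this is an illustration of the approach, not this gem's API, and run_the_job is a placeholder:

```
# Generic lock-file technique (illustration only, not this gem's API).
# A second process that cannot take the lock gives up immediately.
lock = File.open('/tmp/my_job.lock', File::CREAT | File::RDWR)
if lock.flock(File::LOCK_EX | File::LOCK_NB)
  begin
    run_the_job                       # placeholder for the real work
  ensure
    lock.flock(File::LOCK_UN)
    lock.close
  end
else
  warn 'previous run still in progress, exiting'
end
```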
RStore makes batch processing of CSV files a breeze. Automatically fetches data from files, directories, and URLs :: Customizable using additional options :: Validation of field values :: Descriptive error messages :: Safe and transparent data storage using database transactions
`fingerpuppet` is a simple library and commandline tool to interact with Puppet's REST API without needing to have Puppet itself installed. This may be integrated, for example, into a provisioning tool to allow your provisioning process to remotely sign certificates of newly built systems. Alternatively, you could use it to request known facts about a node from your Puppet Master, or even to request a catalog for a node to, for example, perform acceptance testing against a new version of Puppet before upgrading your production master. Install the binford2k/fingerpuppet puppet module to get a class that can automatically configure your `auth.conf` file under Puppet Enterprise, where that file is managed.
Process files using pipelines
rightmove_wrangler is a command line utility for processing a directory of Rightmove .zip files and submitting them to an API.
This is a very simple command line tool for HTML file pre-processing
Process and View Grin Phaserunner ASIObjectDictionary.xml and BOD.json files
A regex-based parser that processes the iTunes Music Library.xml file and generates a sqlite3 database for additional data mining.
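A rough sketch of that kind of regex-driven extraction feeding a sqlite3 database (not this parser's actual code; the one-table schema is made up for illustration):

```
# Rough sketch of regex-driven extraction into SQLite (not this
# parser's code; the one-column schema is made up for illustration).
require 'sqlite3'

db = SQLite3::Database.new('itunes.db')
db.execute('CREATE TABLE IF NOT EXISTS tracks (name TEXT)')

xml = File.read('iTunes Music Library.xml')
xml.scan(%r{<key>Name</key>\s*<string>([^<]+)</string>}) do |(name)|
  db.execute('INSERT INTO tracks (name) VALUES (?)', [name])
end
```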
This is a Ruby interface to the once-popular Ispell package. Please keep in mind that every instance forks an ispell process. Ispell has since been mostly superseded by Aspell, but it still remains quite useful; in particular, it has good support for Russian using ru-ispell dictionaries. Ispell is a fast screen-oriented spelling checker that shows you your errors in the context of the original file, and suggests possible corrections when it can figure them out. Compared to UNIX spell, it is faster and much easier to use. Ispell can also handle languages other than English.
Converts Rich Text Format (RTF) word processing files to plain text. Uses the rtf-filter C++ executable
Bulk upload data in a file (e.g. CSV), process in the background, then send a success or failure report
Extension that fixes image processing with CarrierWave on re-upload when the file extension changes
This agent consolidates and manages multiple New Relic plugins. It pulls the agents defined in a .yml file and runs them all in one process.
Readorder orders a list of files into a more effective read order. You would possibly want to use readorder in a case where you know ahead of time that you have a large quantity of files on disc to process. You can give it that list of files and it will report back the order in which you should process them to make the most effective use of your disc I/O. Given a list of filenames, either on the command line or via stdin, readorder will output the filenames in an order that should increase the I/O throughput when the files corresponding to the filenames are read off of disc. The output order of the filenames can either be in inode order or physical disc block order. This is dependent upon operating system support and the permission level of the user running readorder.
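The inode-order mode is the easier of the two to picture; a sketch of that idea with File.stat (physical-block order additionally depends on OS support, as noted above):

```
# Illustration of the inode-order idea (not readorder's code): sort the
# filenames by inode number so reads sweep the disc more sequentially
# than the arbitrary input order would.
filenames = ARGF.readlines.map(&:chomp)
filenames.sort_by { |f| File.stat(f).ino }.each { |f| puts f }
```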
Switch helps you add multiple languages to your site by leveraging the power of Google Spreadsheets. It is a command-line tool providing you with an easy way to automate the process and avoid common mistakes. The most common use case of switch is switching between a locale representation in JSON/YAML and a CSV (spreadsheet) based one, and vice versa.

# Install

```
gem install switch-cli
```

# Usage

```
switch json2csv [input-dir] [output-file]
```

Converts multiple JSON files into a single CSV file with a column for each file, with the file name as the column header. If you do not specify an input-dir it will be taken as ./locales, and output-file will be the directory name + .csv.

```
switch csv2json [input-file] [output-dir]
```

Converts a single CSV file into multiple JSON files, with a file for each column, using the key and order columns to construct the files.
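The json2csv direction is, at its core, a flatten-and-tabulate step; a simplified sketch of that idea with the standard library (not switch-cli's implementation; nested keys and the order column are ignored here):

```
# Simplified sketch of the json2csv idea (not switch-cli's code): one
# column per locale file, keys as rows; nested keys and the 'order'
# column handled by the real tool are ignored.
require 'json'
require 'csv'

files   = Dir['locales/*.json']
locales = files.to_h { |f| [File.basename(f, '.json'), JSON.parse(File.read(f))] }
keys    = locales.values.flat_map(&:keys).uniq

CSV.open('locales.csv', 'w') do |csv|
  csv << ['key', *locales.keys]
  keys.each { |k| csv << [k, *locales.keys.map { |name| locales[name][k] }] }
end
```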
A simple way to upload big files directly to Amazon S3 storage from Ruby applications and process them with a cloud encoding service.
Log2json lets you read, filter and send logs as JSON objects via Unix pipes. It is inspired by Logstash, and is meant to be compatible with it at the JSON event/record level so that it can easily work with Kibana.

Reading logs is done via a shell script (e.g., `tail`) running in its own process. You then configure (see the `syslog2json` or the `nginxlog2json` script for examples) and run your filters in Ruby using the `Log2Json` module and its contained helper classes. `Log2Json` reads the logs from stdin (one log record per line), parses the log lines into JSON records, and then serializes and writes the records to stdout, which can then be piped to another process for further processing or for sending them somewhere else.

Currently, Log2json ships with a `tail-log` script that can be run as the input process. It's the same as using the Linux `tail` utility with the `-v -F` options, except that it also tracks the positions (as the numbers of lines read from the beginning of the files) in a few files in the file system, so that if the input process is interrupted, it can continue reading from where it left off next time if the files had been followed. This feature is similar to the sincedb feature in Logstash's file input.

Note: If you don't need the tracking feature (i.e., you are fine with always tailing from the end of file with `-v -F -n0`), then you can just use the `tail` utility that comes with your Linux distribution (or, more specifically, the `tail` from the GNU coreutils). Other versions of the `tail` utility may also work, but are not tested. The input protocol expected by Log2json is very simple and documented in the source code.

** The `tail-log` script uses a patched version of `tail` from the GNU coreutils package. A binary of the `tail` utility compiled for Ubuntu 12.04 LTS is included with the Log2json gem. If the binary doesn't work for your distribution, then you'll need to get GNU coreutils-8.13, apply the patch (it can be found in the src/ directory of the installed gem), and then replace the bin/tail binary in the directory of the installed gem with your version of the binary. **

P.S. If you know of a way to configure and compile ONLY the tail program in coreutils, please let me know! The reason I'm not building tail post gem installation is that it takes too long to configure && make, because that actually builds every utility in coreutils.

For shipping logs to Redis, there's the `lines2redis` script that can be used as the output process in the pipe. For shipping logs from Redis to ElasticSearch, Log2json provides a `redis2es` script.

Finally, here's an example of Log2json in action. From a client machine:

  tail-log /var/log/{sys,mail}log /var/log/{kern,auth}.log | syslog2json |
    queue=jsonlogs \
    flush_size=20 \
    flush_interval=30 \
    lines2redis host.to.redis.server 6379 0  # use redis DB 0

On the Redis server:

  redis_queue=jsonlogs redis2es host.to.es.server

Resources that help writing log2json filters:
  - look at log2json.rb source and example filters
  - http://grokdebug.herokuapp.com/
  - http://www.ruby-doc.org/stdlib-1.9.3/libdoc/date/rdoc/DateTime.html#method-i-strftime
Daemon launching and management made dead simple. With daemon-spawn you can start, stop and restart processes that run in the background. Processes are tracked by a simple PID file written to disk. In addition, you can choose to either execute ruby in your daemonized process or 'exec' another process altogether (handy for wrapping other services).
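The PID-file tracking it mentions boils down to a few lines; a generic illustration (not daemon-spawn's own code, and the file path is made up):

```
# Generic PID-file tracking (illustration only, not daemon-spawn's code).
PID_FILE = '/tmp/my_daemon.pid'

case ARGV.first
when 'start'
  File.write(PID_FILE, Process.pid.to_s)  # in real life, after daemonizing
  # ... run the background work ...
when 'stop'
  pid = File.read(PID_FILE).to_i
  Process.kill('TERM', pid)
  File.delete(PID_FILE)
end
```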
Processing.rb runs a Processing sketch written in Ruby, and reloads it automatically when files in the same directory change.
Ruby Library for processing External Call History (ECHI) files from the Avaya CMS
Extends the Markdown parser Kramdown to support hieroglyphs, inline multi-column glosses, and output to BBCode for use on forums. Includes an executable for processing files and a webfont version of the Gardiner signs.
Keeps your files more secure by ensuring saved files are chmoded 0700 by the same user who is running the process.
Vera helps you automate the process of organizing, cleaning and sorting your photo and video files.
The checking process is composed of two steps: file size and checksum hash. It does not use the diff command. The user can stop at the first step by indicating an option.
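That two-step check reads naturally in plain Ruby; an illustration of the approach (not this tool's code):

```
# Illustration of the two-step check (not this tool's code): compare
# sizes first, and only compute checksums when the sizes match.
require 'digest'

def same_file?(a, b, skip_checksum: false)
  return false unless File.size(a) == File.size(b)
  return true if skip_checksum   # the option to stop at the first step
  Digest::SHA256.file(a).hexdigest == Digest::SHA256.file(b).hexdigest
end
```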
Simple command-line tool that monitors files and processes and sends notifications or takes corrective actions when problems arise. Monitors log files for errors, watches process CPU and memory consumption (and can kill processes that exceed limits), and respawns dead processes.
Walks one or more text files, which may be gzipped, allowing line-by-line processing of the contents
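A sketch of the underlying technique with Zlib from the standard library (not this gem's API; the filenames and the .gz extension check are illustrative):

```
# Sketch of the underlying technique (not this gem's API): pick a gzip
# or plain reader per file and yield every line to the block.
require 'zlib'

def each_line_in(paths)
  paths.each do |path|
    if path.end_with?('.gz')
      Zlib::GzipReader.open(path) { |gz| gz.each_line { |line| yield line } }
    else
      File.foreach(path) { |line| yield line }
    end
  end
end

each_line_in(%w[access.log archive.log.gz]) { |line| puts line if line.include?('ERROR') }
```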
The first argument is a variables file, the second is an ERB template; the output goes to STDOUT.
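One way such a tool can work with the standard library alone (a sketch assuming the variables file is YAML; the actual tool may accept other formats):

```
# Sketch assuming a YAML variables file (the real tool may differ):
# render the ERB template with those variables and print to STDOUT.
require 'erb'
require 'yaml'

vars_file, template_file = ARGV
vars = YAML.safe_load(File.read(vars_file)).transform_keys(&:to_sym)
puts ERB.new(File.read(template_file)).result_with_hash(vars)
```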