fastcsv

  • 0.0.7
  • Rubygems
FastCSV


A fast Ragel-based CSV parser, compatible with Ruby's CSV.

Usage

FastCSV.raw_parse is implemented in C and is the fastest way to read CSVs with FastCSV.

require 'fastcsv'

# Read from file.
File.open(filename) do |f|
  FastCSV.raw_parse(f) do |row|
    # do stuff
  end
end

# Read from an IO object.
FastCSV.raw_parse(StringIO.new("foo,bar\n")) do |row|
  # do stuff
end

# Read from a string.
FastCSV.raw_parse("foo,bar\n") do |row|
  # do stuff
end

# Transcode like with the CSV module.
FastCSV.raw_parse("\xF1\n", encoding: 'iso-8859-1:utf-8') do |row|
  # ["ñ"]
end
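The encoding option in the last example uses the same "external:internal" form as Ruby's IO and CSV: the input bytes are treated as ISO-8859-1 and the rows come back as UTF-8. As a rough, FastCSV-free illustration of what that transcoding amounts to (the variable names are only for this sketch):

```ruby
# The bytes as they would arrive from an ISO-8859-1 source: 0xF1 is "ñ".
raw = "\xF1\n".b

# Split the option into its two halves (names chosen for illustration).
external, internal = 'iso-8859-1:utf-8'.split(':')

# Label the bytes with the external encoding, then transcode to the internal one.
text = raw.force_encoding(external).encode(internal)

puts text.encoding        # the internal encoding
puts text.valid_encoding? # the transcoded string is well-formed
```

This only mirrors the idea with core String methods; it says nothing about how FastCSV performs the conversion internally.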

FastCSV can be used as a drop-in replacement for CSV (replace CSV with FastCSV) except:

  • The :row_sep option is ignored; only the default, :auto, is implemented (#9).
  • The :col_sep option must be a single-byte string, like the default "," (#8). Python and PHP support single-byte delimiters only, as do the major libraries in JavaScript, Java, C, Objective-C and Perl. One major Node library does support multi-byte delimiters. The CSV Dialect Description Format allows only single-byte delimiters.
  • If FastCSV raises an error, you can't continue reading (#3). Its error messages don't perfectly match those of CSV.
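Because "single-byte" is about bytes rather than characters, a delimiter that looks like one character can still be too long. A minimal sketch of a pre-flight check follows; single_byte_col_sep? is a hypothetical helper written for this example, not part of FastCSV:

```ruby
# Hypothetical helper (not part of FastCSV): true when the separator is a
# String that is exactly one byte long.
def single_byte_col_sep?(col_sep)
  col_sep.is_a?(String) && col_sep.bytesize == 1
end

puts single_byte_col_sep?(",")  # comma: one byte
puts single_byte_col_sep?("\t") # tab: one byte
puts single_byte_col_sep?("||") # two characters, two bytes
puts single_byte_col_sep?("¦")  # one character, but two bytes in UTF-8
```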

A few minor caveats:

  • Use FastCSV.parse_line(string, options) instead of string.parse_csv(options).
  • If you were passing CSV an IO object on which you had wrapped #gets (for example, as described in this article), #gets will not be called.
  • The :field_size_limit option is ignored. If you need to prevent DoS attacks – the ostensible reason for this option – limit the size of the input, not the size of quoted fields.
  • FastCSV doesn't support UTF-16 or UTF-32. See UTF-8 Everywhere.
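For the :field_size_limit caveat, one way to cap the size of the input before parsing is sketched below. read_limited and InputTooLarge are hypothetical names introduced for this example, and the 1024-byte limit is arbitrary:

```ruby
require 'stringio'

# Hypothetical error class and helper (not part of FastCSV).
class InputTooLarge < StandardError; end

# Reads at most max_bytes from io and raises if more input is available.
def read_limited(io, max_bytes)
  # Ask for one extra byte so that oversized input can be detected.
  # IO#read returns nil at end of input when given a length.
  data = io.read(max_bytes + 1) || ''
  raise InputTooLarge, "input exceeds #{max_bytes} bytes" if data.bytesize > max_bytes
  data
end

small = read_limited(StringIO.new("foo,bar\n"), 1024)
puts small.bytesize
# small could now be handed to a parser, e.g. FastCSV.raw_parse(small) { |row| ... }
```

Note that reading with an explicit length returns a binary string, so the caller may need to set the encoding before parsing.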

Development

ragel -G2 ext/fastcsv/fastcsv.rl
ragel -Vp ext/fastcsv/fastcsv.rl | dot -Tpng -o machine.png
rake compile
gem uninstall fastcsv
rake install
rake
rspec test/runner.rb test/csv

Implementation

FastCSV implements its Ragel-based CSV parser in C at FastCSV::Parser.

FastCSV is a subclass of CSV. It overrides #shift, replacing the parsing code, in order to act as a drop-in replacement.

FastCSV's raw_parse requires a block to which it yields one row at a time. FastCSV uses Fibers to pass control back to #shift while parsing.

CSV delegates IO methods to the IO object it's reading. IO methods that move the pointer within the file, like rewind, change the behavior of CSV's #shift. However, FastCSV's C code won't take notice. We therefore null the Fiber whenever the pointer is moved, so that #shift uses a new Fiber.
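The Fiber technique described above can be sketched in plain Ruby. This is not FastCSV's code: stub_raw_parse stands in for the C parser, and PullReader is a hypothetical class that only shows how a block-yielding parser can back a pull-style #shift, and why the Fiber is discarded on rewind.

```ruby
require 'fiber' # needed for Fiber#alive? on older Rubies
require 'stringio'

# Stand-in for a block-based parser such as FastCSV.raw_parse: it yields one
# row at a time and does not return until the input is exhausted.
def stub_raw_parse(io)
  io.each_line { |line| yield line.chomp.split(',') }
end

# Hypothetical pull-style reader, for illustration only.
class PullReader
  def initialize(io)
    @io = io
    @fiber = nil
  end

  # Returns the next row, or nil at the end of the input.
  def shift
    @fiber ||= Fiber.new do
      stub_raw_parse(@io) { |row| Fiber.yield(row) }
      nil # value of the final resume, once the parser has finished
    end
    @fiber.alive? ? @fiber.resume : nil
  end

  # Moving the pointer discards the Fiber, so the next #shift starts a
  # fresh parse from the new position.
  def rewind
    @io.rewind
    @fiber = nil
  end
end

reader = PullReader.new(StringIO.new("a,b\nc,d\n"))
p reader.shift # first row
reader.rewind
p reader.shift # first row again, from a new Fiber
```

Without the reset in rewind, the suspended Fiber would resume the parse where it left off rather than from the new position.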

CSV's #shift runs the regular expression in the :skip_lines option against a row's raw text. FastCSV::Parser implements a row method, which returns the most recently parsed row's raw text.
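To show why the parser needs to expose the raw text, here is a small stand-alone sketch. StubParser and parse_skipping are hypothetical; they imitate a parser with a row method and a caller that applies a :skip_lines-style regular expression to it.

```ruby
# Hypothetical parser: yields parsed fields and remembers the raw text of
# the most recently parsed row, similar in spirit to a row method.
class StubParser
  attr_reader :row

  def raw_parse(input)
    input.each_line do |line|
      @row = line
      yield line.chomp.split(',')
    end
  end
end

# Collects rows, skipping any whose raw text matches skip_lines.
def parse_skipping(input, skip_lines)
  parser = StubParser.new
  rows = []
  parser.raw_parse(input) do |fields|
    # The match runs against the raw line, not the parsed fields.
    next if skip_lines && skip_lines.match?(parser.row)
    rows << fields
  end
  rows
end

p parse_skipping("# comment\nfoo,bar\n", /\A#/)
```

Matching against the parsed fields instead would lose information such as quoting and leading whitespace, which is why the raw text is kept.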

FastCSV is tested against the same tests as CSV. See TESTS.md for details.

Why?

I evaluated many CSV Ruby gems, and they were either too slow or had implementation errors. rcsv is fast and libcsv-based, but it skips blank rows (Ruby's CSV module returns an empty array) and silently fails on input with an unclosed quote. bamfcsv is well implemented, but it's considerably slower on large files. I looked for Ragel-based CSV parsers to copy, but they either had implementation errors or could not handle large files. commas looks good, but it performs a memory check on each character, which is overkill.

Acknowledgements

Started as a Ruby 2.1 fork of CSVScan by MoonWolf (moonwolf@moonwolf.com), found in this commit. CSVScan uses Ragel code from HPricot from this commit. Most of the Ruby (i.e. non-C, non-Ragel) methods are copied from CSV.

Copyright (c) 2014 James McKinney, released under the MIT license

Package last updated on 03 Jan 2023
