
Research
/Security News
Critical Vulnerability in NestJS Devtools: Localhost RCE via Sandbox Escape
A flawed sandbox in @nestjs/devtools-integration lets attackers run code on your machine via CSRF, leading to full Remote Code Execution (RCE).
#+TITLE: Dreader #+AUTHOR: Adolfo Villafiorita #+STARTUP: showall
Dreader is a simple DSL built on top of [[https://github.com/roo-rb/roo][Roo]] to read and process tabular data (CSV, LibreOffice, Excel) in a simple and structured way.
Main advantages:
We use Dreader for importing fairly big files (in the order of 10K-100K records) in MIP, an ERP to manage distribution of bins to the population. The main issues we had before using Dreader were errors and exceptional cases in the input data. We also had to manage several small variations in the input files (coming from different ERPs) and Dreader helped us standardizing the input code.
The gem depends on =roo=, from which it leverages all data reading/parsing facilities keeping its size in about 250 lines of code.
It should be relatively easy to use; /dreader/ stands for /d/ata /r/eader.
Add this line to your application's Gemfile:
#+BEGIN_EXAMPLE ruby gem 'dreader' #+END_EXAMPLE
And then execute:
#+BEGIN_EXAMPLE $ bundle #+END_EXAMPLE
Or install it yourself as:
#+BEGIN_EXAMPLE $ gem install dreader #+END_EXAMPLE
** Quick start
Print name and age of people from the following data:
| Name | Date of birth | |------------------+-----------------| | Forest Whitaker | July 15, 1961 | | Daniel Day-Lewis | April 29, 1957 | | Sean Penn | August 17, 1960 |
#+BEGIN_EXAMPLE ruby require 'dreader'
class Reader extend Dreader::Engine
options do
# we start reading from row 2
first_row 2
end
column :name do
doc "column A contains :name, a string; doc is optional"
colref 'A'
end
# column B contains :birthdate, a date. We can use a Hash and omit
# colref
column({ birthdate: 'B' }) do
process do |c|
Date.parse(c)
end
end
# add as many example lines as you want to show examples of good
# records these example lines are added to the template generated with
# generate_template
example { name: "John", birthday: "27/03/2020" }
# for each line, :age is computed from :birthdate
virtual_column :age do
process do |row|
birthdate = row[:birthdate][:value]
birthday = Date.new(Date.today.year, birthdate.month, birthdate.day)
today = Date.today
[0, today.year - birthdate.year - (birthday < today ? 1 : 0)].max
end
end
# this is how we process each line of the input file
mapping do |row|
r = Dreader::Util.simplify(row)
puts "#{r[:name]} is #{r[:age]} years old (born on #{r[:birthdate]})"
end
end
reader = Reader
reader.read filename: "Birthdays.ods"
reader.virtual_columns
reader.mappings
File.open("ages.txt", "w") do |file| reader.table.each do |row| unless row[:row_errors].any? file.puts "#{row[:name][:value]} #{row[:age][:value]}" end end end #+END_EXAMPLE
** Gentler Introduction
To write an import function with Dreader:
Dreader now knows ho to collect, shape, and tranform (map) data according to your instructions. We are now ready to do the actual work. This consists of the following steps, various of which can be performed together:
Each step is described in more details in the following sections.
*** Declare which is the input file and where we can find data
Require =dreader= and declare a class which extends =Dreader::Engine=:
#+BEGIN_EXAMPLE ruby require 'dreader'
class Reader extend Dreader::Engine [...] end #+END_EXAMPLE
Specify parsing option in the class, using the following syntax:
#+BEGIN_EXAMPLE ruby options do filename 'example.ods' # this optional. Use it when the file does not have an extension extension :ods
sheet 'Sheet 1'
first_row 1
last_row 20
# optional (this allows to integrate with other applications already
# using a logger)
logger Logger.new
logger_level Logger::INFO
end #+END_EXAMPLE
where:
You can override some of the defaults by passing a hash as argument to the =read= function. For instance:
#+BEGIN_EXAMPLE ruby Reader.read filename: another_filepath #+END_EXAMPLE
will read data from =another_filepath=, rather than from the filename specified in the options. This might be useful, for instance, if the same specification has to be used for different files.
*** Declare the content of columns and how to parse them
Declare the columns you want to read by assigning them a name and a column reference.
There are two notations:
#+BEGIN_EXAMPLE ruby
column :name do colref 'A' end
column({ name: 'A' }) do end #+END_EXAMPLE
The reference to a column can either be a letter or a number. First column is ='A'= or =1=.
The =column= declaration can contain Ruby blocks:
#+begin_example column({ name: 'A' }) do check_raw do |cell| !cell.nil? end end #+end_example
#+begin_quote If you declare more than a check block of the same type per column, use a unique symbol to distinguish the blocks or the error messages will be overwritten. #+end_quote
#+begin_example column({ name: 'A' }) do check_raw :must_be_non_nil do |cell| !cell.nil? end
check_raw :first_letter_must_be_a do |cell|
cell[0] == 'A'
end
end #+end_example
#+begin_quote =process= is always executed before =check=. If you want to check raw data use the =check_raw= directive. #+end_quote
#+begin_quote There can be only one process block. If you define more than one per column, only the last one is executed. #+end_quote
#+begin_example column({ name: 'A' }) do check_raw do |cell| # Here cell is like in the input file end
process do |cell|
cell.upcase
end
check do |cell|
# Here cell is upcase and
end
end #+end_example
For instance, given the tabular data:
| Name | Date of birth | |------------------+-----------------| | Forest Whitaker | July 15, 1961 | | Daniel Day-Lewis | April 29, 1957 | | Sean Penn | August 17, 1960 |
we could use the following declaration to specify the data to read:
#+BEGIN_EXAMPLE ruby
column :name do colref 1 check do |x| x and x.length > 0 end end
column :birthdate do colref 2
# make sure the column is transformed into a Date
process do |x|
Date.parse(x)
end
# check age is a date (check is invoked on the value returned
# by process)
check do |x|
x.class == Date
end
end #+END_EXAMPLE
#+BEGIN_NOTES
If there are different columns that have to be read and processed in the same way, =columns= (notice the plural form) allows for a more compact representation:
#+BEGIN_EXAMPLE ruby columns { a: 'A', b: 'B' } #+END_EXAMPLE
is equivalent to:
#+BEGIN_EXAMPLE ruby column :a do colref 'A' end
column :b do colref 'B' end #+END_EXAMPLE
=columns= accepts a code block, which can be used to add =process= and =check= declarations:
#+BEGIN_EXAMPLE ruby columns({ a: 'A', b: 'B' }) do process do |cell| ... end end #+END_EXAMPLE
See [[file:examples/wikipedia_us_cities/us_cities_bulk_declare.rb][us_cities_bulk_declare.rb]] for an example of =columns=.
#+BEGIN_NOTES If you use code blocks, don't forget to put in parentheses the column mapping, or the Ruby parser won't be able to distinguish the hash from the code block. #+END_NOTES
The data read from each row of our input data is stored in a hash. The hash uses column names as the primary key and stores the values in the =:value= key.
*** Add virtual columns
Sometimes it is convenient to aggregate or otherwise manipulate the data read from each row, before doing the actual processing.
For instance, we might have a table with dates of birth, while we are really interested in the age of people.
In such cases, we can use virtual column. A virtual column allows one to add a column to the data read, computed using the values of other cells in the same row.
The following declaration adds an =age= column to each row of the data read from the previous example:
#+BEGIN_EXAMPLE ruby
virtual_column :age do
process do |row|
# the function compute_birthday
has to be defined
compute_birthday(row[:birthdate])
end
end
#+END_EXAMPLE
Virtual columns are, of course, available to the =mapping= directive (see below).
*** Specify which lines to reject
You can reject some lines using the =reject= declaration, which is applied row by row, can predicate over columns and virtual columns, and has to return a Boolean value.
All lines returning a truish value will be be rejected, that is, not stored in the =@table= variable (and, consequently, passed to the mapping function).
For instance, the following declaration rejects all lines in which the population column is higher than =3_000_000=:
#+begin_src ruby reject { |row| row[:population][:value] > 3_000_000 } #+end_src
*** Specify how to process each line
The =mapping= directive specifies what to do with each line read. The =mapping= declaration takes an arbitrary piece of ruby code, which can reference the fields using the column names we declared.
For instance the following code gets the value of column =:name=, the value of column =:age= and prints them to standard output
#+BEGIN_EXAMPLE ruby mapping do |row| puts "#{row[:name][:value]} is #{row[:age][:value]} years old" end #+END_EXAMPLE
To invoke the =mapping= declaration on a file, use the =mappings= method, which invokes =map= to each row and it stores in the =@table= variable whatever value mapping returns.
*** Process data
If =mapping= does not work for your data processing activities (e.g., you need to make elaborations on data which span different rows), you can add your perform your elaborations on the data transformed by =mappings=.
A typical scenario works as follows:
#+BEGIN_EXAMPLE ruby i = Reader
i.read
Reader.read #+END_EXAMPLE
#+BEGIN_EXAMPLE ruby array_of_hashes = i.errors array_of_hashes.each do |error_hash| puts error_hash end #+END_EXAMPLE
#+BEGIN_EXAMPLE ruby i.virtual_columns #+END_EXAMPLE
(Optionally: check again for errors.)
#+BEGIN_EXAMPLE ruby i.mappings #+END_EXAMPLE
(Optionally: check again for errors.)
#+begin_quote Notice that =mappings= does a side effect and invoking the mapping twice in a row won't work: you need to reload the file first. #+end_quote
Look in the examples directory for further details and a couple of working examples.
*** Improving performances
While debugging your specification executing =read=, =virtual_columns=, and =mappings= in distinct steps is a good idea. When you go in production, you might want to reduce the number of passes you perform on the data.
You can pass the option =virtual: true= to =read= to compute virtual columns while you are reading data.
You can pass the option =mapping: true= to =read= to compute virtual columns and perform the mapping while you are reading data. Notice that:
*** Managing Errors
**** Finding errors in input data
Dreader collects errors in three specific ways:
The following, for instance checks that name or surname have a valid value:
#+begin_example ruby virtual_column :global_check do doc "Name or Surname must exist" check :name_or_surname_must_be_defined do |row| row[:name] || row[:surname] end end #+end_example
If you prefer, you can also define a virtual column that contains the value of the check:
#+begin_example ruby virtual_column :name_or_surname_exist do doc "Name or Surname must exist" process do |row| row[:name] || row[:surname] end end #+end_example
You can then act in the mapping directive according to value returned by the virtual column:
#+begin_example ruby mapping do |row| unless row[:global_check][:value] == false [...] end #+end_example
**** Managing Errors
You can check for errors in two different ways:
The first is in the =mapping= directive, where can check whether some checks for the =row= failed, by:
checking from the =:error= boolean key associated to each column, that is:
=row[<column_name>][:error]=
looking at the value of the =:row_errors= key, which contains all error messages generated for the row:
=row[:row_errors]=
After the processing, by using the method =errors=, which lists all the errors.
The utility function =Dreader::Util.errors= takes as input the errors generated by Dreader and extract those of a specific row and, optionally column:
#+begin_example ruby
Dreader::Util.errors i.errors, 2
Dreader::Util.errors i.errors, 2, 3 #+end_example
From version 0.6.0 =dreader= allows to generate a template starting from the specification.
The template is generated by the following call:
#+begin_example ruby generate_template template_filename: "template.xlsx" #+end_example
(The =template_filename= directive can also be specified in the =options= section).
The template contains the following rows:
The position of the first row is determined by the value of =first_row=, that is, if =first_row= is 2 (content starts from the second row), the header row is put in row 1.
Only Excel is supported, at the moment.
An example of template generation can be found in the Examples.
** Digging deeper
If you need to perform elaborations which cannot be performed row by row you can access all data, with the =table= method:
#+BEGIN_EXAMPLE ruby i.read i.table #+END_EXAMPLE
The function =i.table= returns an array of Hashes. Each element of the array is a row of the input file. Each element/row has the following structure:
#+BEGIN_EXAMPLE ruby { col_name1: { }, [...] col_nameN: { }, row_errors: [ ], row_number: } #+END_EXAMPLE
where =col_name1=, ..., =col_nameN= are the names you have assigned to the columns and the information stored for each cell is the following:
#+BEGIN_EXAMPLE ruby { value: ..., # the result of calling process on the cell row_number: ..., # the row number col_number: ..., # the column number error: ... # the result of calling check on the cell processed value } #+END_EXAMPLE
(Note that virtual columns only store =value= and a Boolean =virtual=, which is always =true=.)
Thus, for instance, given the example above returns:
#+BEGIN_EXAMPLE ruby i.table [ { name: { value: "John", row_number: 1, col_number: 1, errors: nil }, age: { value: 30, row_number: 1, col_number: 2, errors: nil } }, { name: { value: "Jane", row_number: 2, col_number: 1, errors: nil }, age: { value: 31, row_number: 2, col_number: 2, errors: nil } } ] #+END_EXAMPLE
The =Dreader::Util= class provides some functions to simplify the hashes built by =dreader=. This is useful to simplify the code you write and to genereate hashes you can pass, for instance, to ActiveRecord creators.
** Simplify removes everything but the values
=Dreader::Util.simplify(hash)= removes all information but the value and makes the value accessible directly from the name of the column.
#+BEGIN_EXAMPLE ruby i.table[0] { name: { value: "John", row_number: 1, col_number: 1, errors: nil }, age: { value: 30, row_number: 1, col_number: 2, errors: nil } }
Dreader::Util.simplify i.table[0] { name: "John", age: 30 } #+END_EXAMPLE
** Slice and Clean select columns
=Dreader::Util.slice(hash, keys)= and =Dreader::Util.clean(hash, keys)=, where =keys= is an arrays of keys, are respectively used to select or remove some keys from the hash returned by Dreader. (Notice that the Ruby Hash class already provides similar methods.)
#+BEGIN_EXAMPLE ruby i.table[0] { name: { value: "John", row_number: 1, col_number: 1, errors: nil }, age: { value: 30, row_number: 1, col_number: 2, errors: nil } }
Dreader::Util.slice i.table[0], :name { name: { value: "John", row_number: 1, col_number: 1, errors: nil}
Dreader::Util.clean i.table[0], :name { age: { value: 30, row_number: 1, col_number: 2, errors: nil } #+END_EXAMPLE
The methods =slice= and =clean= are more useful when used in conjuction with =simplify=:
#+BEGIN_EXAMPLE ruby hash = Dreader::Util.simplify i.table[0] { name: "John", age: 30 }
Dreader::Util.slice hash, [:age] { age: 30 }
Dreader::Util.clean hash, [:age] { name: "John" } #+END_EXAMPLE
The output produced by =slice= and =simplify= is a hash which can be used to create an =ActiveRecord= object.
** Better Integration with ActiveRecord
Finally, the =Dreader::Util.restructure= method helps building hashes to create [[http://api.rubyonrails.org/classes/ActiveModel/Model.html][ActiveModel]] objects with nested attributes.
The starting point is a simplified row.
#+BEGIN_EXAMPLE ruby hash = { name: "John", surname: "Doe", address: "Unknown", city: "NY" }
Dreader::Util.restructure hash, [:name, :surname, :address_attributes, [:address, :city]] { name: "John", surname: "Doe", address_attributes: { address: "Unknown", city: "NY" } } #+END_EXAMPLE
The =debug= function prints the current configuration, reads some records from the input file(s), and shows the records read:
#+BEGIN_EXAMPLE ruby i.debug i.debug n: 40 # read 40 lines (from first_row) i.debug n: 40, filename: filepath # like above, but read from filepath #+END_EXAMPLE
By default =debug= invokes the =check_raw=, =process=, and =check= directives. Pass the following options, if you want to disable this behavior; this might be useful, for instance, if you intend to check only what data is read:
#+BEGIN_EXAMPLE ruby i.debug process: false, check: false #+END_EXAMPLE
Notice that =check= implies =process=, since =check= is invoked on the output of the =process= directive.`
If you prefer, in alternative to =debug= you can also use configuration variables (but then you need to change the configuration according to the environment):
#+begin_example ruby i.options do debug true end #+end_example
See [[file:CHANGELOG.org][CHANGELOG]].
At the moment:
Some known bugs and an unknown number of unknown bugs.
(See the open issues for the known bugs.)
After checking out the repo, run =bin/setup= to install dependencies. You can also run =bin/console= for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run =bundle exec rake install=. To release a new version, update the version number in =version.rb=, and then run =bundle exec rake release=, which will create a git tag for the version, push git commits and tags, and push the =.gem= file to [[https://rubygems.org][rubygems.org]].
Bug reports and pull requests are welcome.
You need to get in touch with me by email, till I figure how to enable it in Gitea.
[[https://opensource.org/licenses/MIT][MIT License]].
FAQs
Unknown package
We found that dreader demonstrated a not healthy version release cadence and project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
/Security News
A flawed sandbox in @nestjs/devtools-integration lets attackers run code on your machine via CSRF, leading to full Remote Code Execution (RCE).
Product
Customize license detection with Socket’s new license overlays: gain control, reduce noise, and handle edge cases with precision.
Product
Socket now supports Rust and Cargo, offering package search for all users and experimental SBOM generation for enterprise projects.