New Case Study:See how Anthropic automated 95% of dependency reviews with Socket.Learn More →

hirsute

Package Overview

Dependencies

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

hirsute

0.1.0
Rubygems

Version published: 12 years ago

Maintainers: 1

Created: 12 years ago

Source

Hirsute

Hirsute is a Ruby-based domain specific language for generating plausible fake data. You might need fake data for any of the following reasons:

demoing to a potential customer and providing a realistic experience
building a "real" database for testing (versus the often messy, inaccurate databases in dev systems)
building a database for load testing before a launch

In Hirsute, you define a template for what an object might look like, and then generate however many copies you need. Then you can work with those collections as needed. There is a full manual, but here is a quick example to give the flavor. Say you're building a system where you have a bunch of users, and you want to build in a "friend" concept that allows each user to have 10 friends. You want to generate a random sample of data, but you think most users will only have 2-4 friends.

The relevant Hirsute script might look like this:


# define a user template that has an id that is an incrementing counter starting at 1 and an email address that we're defining as testuser1@gmail.com,testuser2@yahoo.com,testuser3@aol.com, and so forth
storage :mysql

a('user') {
    has :id => counter,
        :email => combination(
                  "testuser",counter,"@",one_of(['gmail','aol','yahoo']),".com")
    is_stored_in "users"
}

#make 1000 users
users = user * 1000

# define a friendship object that maps two users together. Just define the user ids as literals so they can be defined but can be filled in later
a('friendship') {
    has :user1 => 1, 
        :user2 => 1
    is_stored_in "friendship"
}
friendships = collection_of friendship

# for each user, pick an appropriate number of friends and create the friendship objects
foreach user do |cur_user|
  # figure out a number of friends this user might have. Pass in a histogram to steer the probability the way we want
  
  # while you can pass in an array of probabilities, you can also define the histogram more intuitively
  friend_dist = <<-HIST
  num_friends
         0 **
         1 **********
         2 ******************************
         3 ******************************
         4 ********************
         5 *
         6 *
         7 *
         8 *
         9 **
        10 *
  HIST
  # the first argument is the options to draw from, the second argument (optional) is a histogram representing distribution of probabilities
  num_friends = pick_from([0,1,2,3,4,5,6,7,8,9,10], HIST)

  # since this in Ruby, you can just write in it as needed
  (0...num_friends).each do |idx|
     # grab a random user that isn't this one
     friend = any(user) {|friend| friend.id != cur_user.id}

     new_friendship = friendship.make # because there's only one collection holding these, it's added automatically
     new_friendship.user1 = friend.id
     new_friendship.user2 = cur_user.id
  end
end

# and now write them all out to files
finish users
finish friendships

This will create files that have data such as this:


INSERT INTO users ('email','id') VALUES ('testuser1@yahoo.com',1);
INSERT INTO users ('email','id') VALUES ('testuser2@yahoo.com',2);
INSERT INTO users ('email','id') VALUES ('testuser3@aol.com',3);
INSERT INTO users ('email','id') VALUES ('testuser4@yahoo.com',4);
INSERT INTO users ('email','id') VALUES ('testuser5@yahoo.com',5);

and


INSERT INTO friendship ('user1','user2') VALUES (624,1);
INSERT INTO friendship ('user1','user2') VALUES (808,1);
INSERT INTO friendship ('user1','user2') VALUES (81,1);
INSERT INTO friendship ('user1','user2') VALUES (15,2);

To run this script, cd to the hirsute directory and run


ruby lib/hirsute.rb samples/readme.hrs

Roadmap

Hirsute is still early in development, but my goal is to continue adding output formats (currently only mysql and csv are supported) and generators as well as continuing to allow for more declarative syntax that would make the data generation more flexible, terse, and intuitive.

I also don't think it will yet meet one of my needs, which is to generate the data for a multimillion-user system. So far, it does everything in memory, which will obviously cause problems for large data sets.

Why Hirsute?

The name was a joke with a friend. When I wanted something like this, I asked him, since he's up on many open-source projects. He said he didn't know of something like this, so I replied, "No, no. You're supposed to say 'Look at Hirsute' or something like that."

FAQs

What is hirsute?

Is hirsute well maintained?

Package last updated on 01 Jul 2013

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

hirsute

Hirsute

Roadmap

Why Hirsute?

Related posts

Michigan TypeScript Founder Successfully Runs Doom Inside TypeScript's Type System

vlt Launches "reproduce": A New Tool Challenging the Limits of Package Provenance