Dragonfly is a framework that enables on-the-fly processing for any content type. It is especially suited to image handling. Its uses range from image thumbnails to standard attachments to on-demand text generation.
Google Speech-to-Text enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes more than 120 languages and variants to support your global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio, using Google's machine learning technology.
Google Speech-to-Text enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes more than 120 languages and variants to support your global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio, using Google's machine learning technology. Note that google-cloud-speech-v1 is a version-specific client library. For most uses, we recommend installing the main client library google-cloud-speech instead. See the readme for more details.
Google Speech-to-Text enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes more than 120 languages and variants to support your global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio, using Google's machine learning technology. Note that google-cloud-speech-v1p1beta1 is a version-specific client library. For most uses, we recommend installing the main client library google-cloud-speech instead. See the readme for more details.
A simple multilingual tokenizer for NLP tasks. This tool provides a CLI and a library for linguistic tokenization which is an anavoidable step for many HLT (human language technology) tasks in the preprocessing phase for further syntactic, semantic and other higher level processing goals. Use it for tokenization of German, English and French texts.
This library performs diffs of CSV data, or any table-like source. Unlike a standard diff that compares line by line, and is sensitive to the ordering of records, CSV-Diff identifies common lines by key field(s), and then compares the contents of the fields in each line. Data may be supplied in the form of CSV files, or as an array of arrays. The diff process provides a fine level of control over what to diff, and can optionally ignore certain types of changes (e.g. changes in position). CSV-Diff is particularly well suited to data in parent-child format. Parent- child data does not lend itself well to standard text diffs, as small changes in the organisation of the tree at an upper level can lead to big movements in the position of descendant records. By instead matching records by key, CSV-Diff avoids this issue, while still being able to detect changes in sibling order. This gem implements the core diff algorithm, and handles the loading and diffing of CSV files (or Arrays of Arrays). It also supports converting data in XML format into tabular form, so that it can then be processed like any other CSV or table-like source. It returns a CSVDiff object containing the details of differences in object form. This is useful for projects that need diff capability, but want to handle the reporting or actioning of differences themselves. For a pre-built diff reporting capability, see the csv-diff-report gem, which provides a command-line tool for generating diff reports in HTML, Excel, or text formats.
git-scribe is a workflow tool for starting, writing, reviewing and publishing multiple forms of a book. it allows you to use asciidoc plain text markup to write, review and translate a work and provides a simple toolkit for generating common digital outputs for publishing - epub, mobi, pdf and html. \ it is also integrated into github functionality, letting you automate the publishing and collaboration process.
Lookout Lookout is a unit testing framework for Ruby¹ that puts your results in focus. Tests (expectations) are written as follows expect 2 do 1 + 1 end expect ArgumentError do Integer('1 + 1') end expect Array do [1, 2, 3].select{ |i| i % 2 == 0 } end expect [2, 4, 6] do [1, 2, 3].map{ |i| i * 2 } end Lookout is designed to encourage – force, even – unit testing best practices such as • Setting up only one expectation per test • Not setting expectations on non-public APIs • Test isolation This is done by • Only allowing one expectation to be set per test • Providing no (additional) way of accessing private state • Providing no setup and tear-down methods, nor a method of providing test helpers Other important points are • Putting the expected outcome of a test in focus with the steps of the calculation of the actual result only as a secondary concern • A focus on code readability by providing no mechanism for describing an expectation other than the code in the expectation itself • A unified syntax for setting up both state-based and behavior-based expectations The way Lookout works has been heavily influenced by expectations², by {Jay Fields}³. The code base was once also heavily based on expectations, based at Subversion {revision 76}⁴. A lot has happened since then and all of the work past that revision are due to {Nikolai Weibull}⁵. ¹ Ruby: http://ruby-lang.org/ ² Expectations: http://expectations.rubyforge.org/ ³ Jay Fields’s blog: http://blog.jayfields.com/ ⁴ Lookout revision 76: https://github.com/now/lookout/commit/537bedf3e5b3eb4b31c066b3266f42964ac35ebe ⁵ Nikolai Weibull’s home page: http://disu.se/ § Installation Install Lookout with % gem install lookout § Usage Lookout allows you to set expectations on an object’s state or behavior. We’ll begin by looking at state expectations and then take a look at expectations on behavior. § Expectations on State: Literals An expectation can be made on the result of a computation: expect 2 do 1 + 1 end Most objects, in fact, have their state expectations checked by invoking ‹#==› on the expected value with the result as its argument. Checking that a result is within a given range is also simple: expect 0.099..0.101 do 0.4 - 0.3 end Here, the more general ‹#===› is being used on the ‹Range›. § Regexps ‹Strings› of course match against ‹Strings›: expect 'ab' do 'abc'[0..1] end but we can also match a ‹String› against a ‹Regexp›: expect %r{a substring} do 'a string with a substring' end (Note the use of ‹%r{…}› to avoid warnings that will be generated when Ruby parses ‹expect /…/›.) § Modules Checking that the result includes a certain module is done by expecting the ‹Module›. expect Enumerable do [] end This, due to the nature of Ruby, of course also works for classes (as they are also modules): expect String do 'a string' end This doesn’t hinder us from expecting the actual ‹Module› itself: expect Enumerable do Enumerable end or the ‹Class›: expect String do String end for obvious reasons. As you may have figured out yourself, this is accomplished by first trying ‹#==› and, if it returns ‹false›, then trying ‹#===› on the expected ‹Module›. This is also true of ‹Ranges› and ‹Regexps›. § Booleans Truthfulness is expected with ‹true› and ‹false›: expect true do 1 end expect false do nil end Results equaling ‹true› or ‹false› are slightly different: expect TrueClass do true end expect FalseClass do false end The rationale for this is that you should only care if the result of a computation evaluates to a value that Ruby considers to be either true or false, not the exact literals ‹true› or ‹false›. § IO Expecting output on an IO object is also common: expect output("abc\ndef\n") do |io| io.puts 'abc', 'def' end This can be used to capture the output of a formatter that takes an output object as a parameter. § Warnings Expecting warnings from code isn’t very common, but should be done: expect warning('this is your final one!') do warn 'this is your final one!' end expect warning('this is your final one!') do warn '%s:%d: warning: this is your final one!' % [__FILE__, __LINE__] end ‹$VERBOSE› is set to ‹true› during the execution of the block, so you don’t need to do so yourself. If you have other code that depends on the value of $VERBOSE, that can be done with ‹#with_verbose› expect nil do with_verbose nil do $VERBOSE end end § Errors You should always be expecting errors from – and in, but that’s a different story – your code: expect ArgumentError do Integer('1 + 1') end Often, not only the type of the error, but its description, is important to check: expect StandardError.new('message') do raise StandardError.new('message') end As with ‹Strings›, ‹Regexps› can be used to check the error description: expect StandardError.new(/mess/) do raise StandardError.new('message') end § Queries Through Symbols Symbols are generally matched against symbols, but as a special case, symbols ending with ‹?› are seen as expectations on the result of query methods on the result of the block, given that the method is of zero arity and that the result isn’t a Symbol itself. Simply expect a symbol ending with ‹?›: expect :empty? do [] end To expect it’s negation, expect the same symbol beginning with ‹not_›: expect :not_nil? do [1, 2, 3] end This is the same as expect true do [].empty? end and expect false do [1, 2, 3].empty? end but provides much clearer failure messages. It also makes the expectation’s intent a lot clearer. § Queries By Proxy There’s also a way to make the expectations of query methods explicit by invoking methods on the result of the block. For example, to check that the even elements of the Array ‹[1, 2, 3]› include ‹1› you could write expect result.to.include? 1 do [1, 2, 3].reject{ |e| e.even? } end You could likewise check that the result doesn’t include 2: expect result.not.to.include? 2 do [1, 2, 3].reject{ |e| e.even? } end This is the same as (and executes a little bit slower than) writing expect false do [1, 2, 3].reject{ |e| e.even? }.include? 2 end but provides much clearer failure messages. Given that these two last examples would fail, you’d get a message saying “[1, 2, 3]#include?(2)” instead of the terser “true≠false”. It also clearly separates the actual expectation from the set-up. The keyword for this kind of expectations is ‹result›. This may be followed by any of the methods • ‹#not› • ‹#to› • ‹#be› • ‹#have› or any other method you will want to call on the result. The methods ‹#to›, ‹#be›, and ‹#have› do nothing except improve readability. The ‹#not› method inverts the expectation. § Literal Literals If you need to literally check against any of the types of objects otherwise treated specially, that is, any instances of • ‹Module› • ‹Range› • ‹Regexp› • ‹Exception› • ‹Symbol›, given that it ends with ‹?› you can do so by wrapping it in ‹literal(…)›: expect literal(:empty?) do :empty? end You almost never need to do this, as, for all but symbols, instances will match accordingly as well. § Expectations on Behavior We expect our objects to be on their best behavior. Lookout allows you to make sure that they are. Reception expectations let us verify that a method is called in the way that we expect it to be: expect mock.to.receive.to_str(without_arguments){ '123' } do |o| o.to_str end Here, ‹#mock› creates a mock object, an object that doesn’t respond to anything unless you tell it to. We tell it to expect to receive a call to ‹#to_str› without arguments and have ‹#to_str› return ‹'123'› when called. The mock object is then passed in to the block so that the expectations placed upon it can be fulfilled. Sometimes we only want to make sure that a method is called in the way that we expect it to be, but we don’t care if any other methods are called on the object. A stub object, created with ‹#stub›, expects any method and returns a stub object that, again, expects any method, and thus fits the bill. expect stub.to.receive.to_str(without_arguments){ '123' } do |o| o.to_str if o.convertable? end You don’t have to use a mock object to verify that a method is called: expect Object.to.receive.name do Object.name end As you have figured out by now, the expected method call is set up by calling ‹#receive› after ‹#to›. ‹#Receive› is followed by a call to the method to expect with any expected arguments. The body of the expected method can be given as the block to the method. Finally, an expected invocation count may follow the method. Let’s look at this formal specification in more detail. The expected method arguments may be given in a variety of ways. Let’s introduce them by giving some examples: expect mock.to.receive.a do |m| m.a end Here, the method ‹#a› must be called with any number of arguments. It may be called any number of times, but it must be called at least once. If a method must receive exactly one argument, you can use ‹Object›, as the same matching rules apply for arguments as they do for state expectations: expect mock.to.receive.a(Object) do |m| m.a 0 end If a method must receive a specific argument, you can use that argument: expect mock.to.receive.a(1..2) do |m| m.a 1 end Again, the same matching rules apply for arguments as they do for state expectations, so the previous example expects a call to ‹#a› with 1, 2, or the Range 1..2 as an argument on ‹m›. If a method must be invoked without any arguments you can use ‹without_arguments›: expect mock.to.receive.a(without_arguments) do |m| m.a end You can of course use both ‹Object› and actual arguments: expect mock.to.receive.a(Object, 2, Object) do |m| m.a nil, 2, '3' end The body of the expected method may be given as the block. Here, calling ‹#a› on ‹m› will give the result ‹1›: expect mock.to.receive.a{ 1 } do |m| raise 'not 1' unless m.a == 1 end If no body has been given, the result will be a stub object. To take a block, grab a block parameter and ‹#call› it: expect mock.to.receive.a{ |&b| b.call(1) } do |m| j = 0 m.a{ |i| j = i } raise 'not 1' unless j == 1 end To simulate an ‹#each›-like method, ‹#call› the block several times. Invocation count expectations can be set if the default expectation of “at least once” isn’t good enough. The following expectations are possible • ‹#at_most_once› • ‹#once› • ‹#at_least_once› • ‹#twice› And, for a given ‹N›, • ‹#at_most(N)› • ‹#exactly(N)› • ‹#at_least(N)› § Utilities: Stubs Method stubs are another useful thing to have in a unit testing framework. Sometimes you need to override a method that does something a test shouldn’t do, like access and alter bank accounts. We can override – stub out – a method by using the ‹#stub› method. Let’s assume that we have an ‹Account› class that has two methods, ‹#slips› and ‹#total›. ‹#Slips› retrieves the bank slips that keep track of your deposits to the ‹Account› from a database. ‹#Total› sums the ‹#slips›. In the following test we want to make sure that ‹#total› does what it should do without accessing the database. We therefore stub out ‹#slips› and make it return something that we can easily control. expect 6 do |m| stub(Class.new{ def slips raise 'database not available' end def total slips.reduce(0){ |m, n| m.to_i + n.to_i } end }.new, :slips => [1, 2, 3]){ |account| account.total } end To make it easy to create objects with a set of stubbed methods there’s also a convenience method: expect 3 do s = stub(:a => 1, :b => 2) s.a + s.b end This short-hand notation can also be used for the expected value: expect stub(:a => 1, :b => 2).to.receive.a do |o| o.a + o.b end and also works for mock objects: expect mock(:a => 2, :b => 2).to.receive.a do |o| o.a + o.b end Blocks are also allowed when defining stub methods: expect 3 do s = stub(:a => proc{ |a, b| a + b }) s.a(1, 2) end If need be, we can stub out a specific method on an object: expect 'def' do stub('abc', :to_str => 'def'){ |a| a.to_str } end The stub is active during the execution of the block. § Overriding Constants Sometimes you need to override the value of a constant during the execution of some code. Use ‹#with_const› to do just that: expect 'hello' do with_const 'A::B::C', 'hello' do A::B::C end end Here, the constant ‹A::B::C› is set to ‹'hello'› during the execution of the block. None of the constants ‹A›, ‹B›, and ‹C› need to exist for this to work. If a constant doesn’t exist it’s created and set to a new, empty, ‹Module›. The value of ‹A::B::C›, if any, is restored after the block returns and any constants that didn’t previously exist are removed. § Overriding Environment Variables Another thing you often need to control in your tests is the value of environment variables. Depending on such global values is, of course, not a good practice, but is often unavoidable when working with external libraries. ‹#With_env› allows you to override the value of environment variables during the execution of a block by giving it a ‹Hash› of key/value pairs where the key is the name of the environment variable and the value is the value that it should have during the execution of that block: expect 'hello' do with_env 'INTRO' => 'hello' do ENV['INTRO'] end end Any overridden values are restored and any keys that weren’t previously a part of the environment are removed when the block returns. § Overriding Globals You may also want to override the value of a global temporarily: expect 'hello' do with_global :$stdout, StringIO.new do print 'hello' $stdout.string end end You thus provide the name of the global and a value that it should take during the execution of a block of code. The block gets passed the overridden value, should you need it: expect true do with_global :$stdout, StringIO.new do |overridden| $stdout != overridden end end § Integration Lookout can be used from Rake¹. Simply install Lookout-Rake²: % gem install lookout-rake and add the following code to your Rakefile require 'lookout-rake-3.0' Lookout::Rake::Tasks::Test.new Make sure to read up on using Lookout-Rake for further benefits and customization. ¹ Read more about Rake at http://rake.rubyforge.org/ ² Get information on Lookout-Rake at http://disu.se/software/lookout-rake/ § API Lookout comes with an API¹ that let’s you create things such as new expected values, difference reports for your types, and so on. ¹ See http://disu.se/software/lookout/api/ § Interface Design The default output of Lookout can Spartanly be described as Spartan. If no errors or failures occur, no output is generated. This is unconventional, as unit testing frameworks tend to dump a lot of information on the user, concerning things such as progress, test count summaries, and flamboyantly colored text telling you that your tests passed. None of this output is needed. Your tests should run fast enough to not require progress reports. The lack of output provides you with the same amount of information as reporting success. Test count summaries are only useful if you’re worried that your tests aren’t being run, but if you worry about that, then providing such output doesn’t really help. Testing your tests requires something beyond reporting some arbitrary count that you would have to verify by hand anyway. When errors or failures do occur, however, the relevant information is output in a format that can easily be parsed by an ‹'errorformat'› for Vim or with {Compilation Mode}¹ for Emacs². Diffs are generated for Strings, Arrays, Hashes, and I/O. ¹ Read up on Compilation mode for Emacs at http://www.emacswiki.org/emacs/CompilationMode ² Visit The GNU Foundation’s Emacs’ software page at http://www.gnu.org/software/emacs/ § External Design Let’s now look at some of the points made in the introduction in greater detail. Lookout only allows you to set one expectation per test. If you’re testing behavior with a reception expectation, then only one method-invocation expectation can be set. If you’re testing state, then only one result can be verified. It may seem like this would cause unnecessary duplication between tests. While this is certainly a possibility, when you actually begin to try to avoid such duplication you find that you often do so by improving your interfaces. This kind of restriction tends to encourage the use of value objects, which are easy to test, and more focused objects, which require simpler tests, as they have less behavior to test, per method. By keeping your interfaces focused you’re also keeping your tests focused. Keeping your tests focused improves, in itself, test isolation, but let’s look at something that hinders it: setup and tear-down methods. Most unit testing frameworks encourage test fragmentation by providing setup and tear-down methods. Setup methods create objects and, perhaps, just their behavior for a set of tests. This means that you have to look in two places to figure out what’s being done in a test. This may work fine for few methods with simple set-ups, but makes things complicated when the number of tests increases and the set-up is complex. Often, each test further adjusts the previously set-up object before performing any verifications, further complicating the process of figuring out what state an object has in a given test. Tear-down methods clean up after tests, perhaps by removing records from a database or deleting files from the file-system. The duplication that setup methods and tear-down methods hope to remove is better avoided by improving your interfaces. This can be done by providing better set-up methods for your objects and using idioms such as {Resource Acquisition Is Initialization}¹ for guaranteed clean-up, test or no test. By not using setup and tear-down methods we keep everything pertinent to a test in the test itself, thus improving test isolation. (You also won’t {slow down your tests}² by keeping unnecessary state.) Most unit test frameworks also allow you to create arbitrary test helper methods. Lookout doesn’t. The same rationale as that that has been crystallized in the preceding paragraphs applies. If you need helpers you’re interface isn’t good enough. It really is as simple as that. To clarify: there’s nothing inherently wrong with test helper methods, but they should be general enough that they reside in their own library. The support for mocks in Lookout is provided through a set of test helper methods that make it easier to create mocks than it would have been without them. Lookout-rack³ is another example of a library providing test helper methods (well, one method, actually) that are very useful in testing web applications that use Rack⁴. A final point at which some unit test frameworks try to fragment tests further is documentation. These frameworks provide ways of describing the whats and hows of what’s being tested, the rationale being that this will provide documentation of both the test and the code being tested. Describing how a stack data structure is meant to work is a common example. A stack is, however, a rather simple data structure, so such a description provides little, if any, additional information that can’t be extracted from the implementation and its tests themselves. The implementation and its tests is, in fact, its own best documentation. Taking the points made in the previous paragraphs into account, we should already have simple, self-describing, interfaces that have easily understood tests associated with them. Rationales for the use of a given data structure or system-design design documentation is better suited in separate documentation focused at describing exactly those issues. ¹ Read the Wikipedia entry for Resource Acquisition Is Initialization at http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization ² Read how 37signals had problems with slow Test::Unit tests at http://37signals.com/svn/posts/2742-the-road-to-faster-tests/ ³ Visit the Lookout-rack home page at http://disu.se/software/lookout-rack/ ⁴ Visit the Rack Rubyforge project page at http://rack.rubyforge.org/ § Internal Design The internal design of Lookout has had a couple of goals. • As few external dependencies as possible • As few internal dependencies as possible • Internal extensibility provides external extensibility • As fast load times as possible • As high a ratio of value objects to mutable objects as possible • Each object must have a simple, obvious name • Use mix-ins, not inheritance for shared behavior • As few responsibilities per object as possible • Optimizing for speed can only be done when you have all the facts § External Dependencies Lookout used to depend on Mocha for mocks and stubs. While benchmarking I noticed that a method in Mocha was taking up more than 300 percent of the runtime. It turned out that Mocha’s method for cleaning up back-traces generated when a mock failed was doing something incredibly stupid: backtrace.reject{ |l| Regexp.new(@lib).match(File.expand_path(l)) } Here ‹@lib› is a ‹String› containing the path to the lib sub-directory in the Mocha installation directory. I reported it, provided a patch five days later, then waited. Nothing happened. {254 days later}¹, according to {Wolfram Alpha}², half of my patch was, apparently – I say “apparently”, as I received no notification – applied. By that time I had replaced the whole mocking-and-stubbing subsystem and dropped the dependency. Many Ruby developers claim that Ruby and its gems are too fast-moving for normal package-managing systems to keep up. This is testament to the fact that this isn’t the case and that the real problem is instead related to sloppy practices. Please note that I don’t want to single out the Mocha library nor its developers. I only want to provide an example where relying on external dependencies can be “considered harmful”. ¹ See the Wolfram Alpha calculation at http://www.wolframalpha.com/input/?i=days+between+march+17%2C+2010+and+november+26%2C+2010 ² Check out the Wolfram Alpha computational knowledge engine at http://www.wolframalpha.com/ § Internal Dependencies Lookout has been designed so as to keep each subsystem independent of any other. The diff subsystem is, for example, completely decoupled from any other part of the system as a whole and could be moved into its own library at a time where that would be of interest to anyone. What’s perhaps more interesting is that the diff subsystem is itself very modular. The data passes through a set of filters that depends on what kind of diff has been requested, each filter yielding modified data as it receives it. If you want to read some rather functional Ruby I can highly recommend looking at the code in the ‹lib/lookout/diff› directory. This lookout on the design of the library also makes it easy to extend Lookout. Lookout-rack was, for example, written in about four hours and about 5 of those 240 minutes were spent on setting up the interface between the two. § Optimizing For Speed The following paragraph is perhaps a bit personal, but might be interesting nonetheless. I’ve always worried about speed. The original Expectations library used ‹extend› a lot to add new behavior to objects. Expectations, for example, used to hold the result of their execution (what we now term “evaluation”) by being extended by a module representing success, failure, or error. For the longest time I used this same method, worrying about the increased performance cost that creating new objects for results would incur. I finally came to a point where I felt that the code was so simple and clean that rewriting this part of the code for a benchmark wouldn’t take more than perhaps ten minutes. Well, ten minutes later I had my results and they confirmed that creating new objects wasn’t harming performance. I was very pleased. § Naming I hate low lines (underscores). I try to avoid them in method names and I always avoid them in file names. Since the current “best practice” in the Ruby community is to put ‹BeginEndStorage› in a file called ‹begin_end_storage.rb›, I only name constants using a single noun. This has had the added benefit that classes seem to have acquired less behavior, as using a single noun doesn’t allow you to tack on additional behavior without questioning if it’s really appropriate to do so, given the rather limited range of interpretation for that noun. It also seems to encourage the creation of value objects, as something named ‹Range› feels a lot more like a value than ‹BeginEndStorage›. (To reach object-oriented-programming Nirvana you must achieve complete value.) § News § 3.0.0 The ‹xml› expectation has been dropped. It wasn’t documented, didn’t suit very many use cases, and can be better implemented by an external library. The ‹arg› argument matcher for mock method arguments has been removed, as it didn’t provide any benefit over using Object. The ‹#yield› and ‹#each› methods on stub and mock methods have been removed. They were slightly weird and their use case can be implemented using block parameters instead. The ‹stub› method inside ‹expect› blocks now stubs out the methods during the execution of a provided block instead of during the execution of the whole except block. When a mock method is called too many times, this is reported immediately, with a full backtrace. This makes it easier to pin down what’s wrong with the code. Query expectations were added. Explicit query expectations were added. Fluent boolean expectations, for example, ‹expect nil.to.be.nil?› have been replaced by query expectations (‹expect :nil? do nil end›) and explicit query expectations (‹expect result.to.be.nil? do nil end›). This was done to discourage creating objects as the expected value and creating objects that change during the course of the test. The ‹literal› expectation was added. Equality (‹#==›) is now checked before “caseity” (‹#===›) for modules, ranges, and regular expressions to match the documentation. § Financing Currently, most of my time is spent at my day job and in my rather busy private life. Please motivate me to spend time on this piece of software by donating some of your money to this project. Yeah, I realize that requesting money to develop software is a bit, well, capitalistic of me. But please realize that I live in a capitalistic society and I need money to have other people give me the things that I need to continue living under the rules of said society. So, if you feel that this piece of software has helped you out enough to warrant a reward, please PayPal a donation to now@disu.se¹. Thanks! Your support won’t go unnoticed! ¹ Send a donation: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=now%40disu%2ese&item_name=Lookout § Reporting Bugs Please report any bugs that you encounter to the {issue tracker}¹. ¹ See https://github.com/now/lookout/issues § Contributors Contributors to the original expectations codebase are mentioned there. We hope no one on that list feels left out of this list. Please {let us know}¹ if you do. • Nikolai Weibull ¹ Add an issue to the Lookout issue tracker at https://github.com/now/lookout/issues § Licensing Lookout is free software: you may redistribute it and/or modify it under the terms of the {GNU Lesser General Public License, version 3}¹ or later², as published by the {Free Software Foundation}³. ¹ See http://disu.se/licenses/lgpl-3.0/ ² See http://gnu.org/licenses/ ³ See http://fsf.org/
AYLIEN Text Analysis API is a package of Natural Language Processing and Machine Learning-powered tools for analyzing and extracting various kinds of information from text and images.
Textoken is a Ruby library for text tokenization. This gem extracts words from text with many customizations. It can be used in many fields like Web Crawling and Natural Language Processing.
This library runs GNU Go in a separate process and allows you to communicate with it using the Go Text Protocol (GTP). This makes it easy to manage full games of Go, work with SGF files, analyze Go positions, and more.
A Ruby extension to process MultiMarkdown-formatted text, using Fletcher Penney's C peg-multimarkdown implementation.
The Rosette Text Analytics Platform uses natural language processing, statistical modeling, and machine learning to analyze unstructured and semi-structured text across 364 language-encoding-script combinations, revealing valuable information and actionable data. Rosette provides endpoints for extracting entities and relationships, translating and comparing the similarity of names, categorizing and adding linguistic tags to text and more.
A Rails engine gem for deriving discrete data points from narrative text via natural language processing (NLP). The gem includes a user interface to present the abstracted data points for confirmation/revision by curator.
FreeLing::Analyzer is a Ruby wrapper around `analyzer`, a binary tool included in FreeLing's package that allows the user to process a stream of text with FreeLing.
wortsammler is an environment to manage documentation. Basically it comprises of * a directory structure to organize the document sources * a manifest file to control the publication process * a tool to produce the doucments Particular features of wortsammler are * various output formats * support of requirement management * generate documents for different audiences based on single sources * text snippets (markdown and xlsx) wortsammler is based on ruby, pandoc, latex
Text processing utilities for Brazilian Portuguese
Dragonfly is a framework that enables on-the-fly processing for any content type. It is especially suited to image handling. Its uses range from image thumbnails to standard attachments to on-demand text generation.
Use Mysql AUTO_INCREMENT to support key value cache, which should be combined by an integer and string. It means to reduce the database storage size, and improve query performance. All cache will store in process memory, and will never be expired, until the process dies, so the less kvs you use, the better performance you will get. BTW, 100,000 general strings use 10MB memory. Some relatived articles: http://en.wikipedia.org/wiki/Correlation_database Usage ------------------------------------------ ## setup ```ruby create_table :kv_browser_names, :options => 'ENGINE=MyISAM DEFAULT CHARSET=utf8' do |t| t.string :name t.timestamps end class KvBrowserName < ActiveRecord::Base include IdNameCache end ``` or ```ruby create_table :common_tag, :options => 'ENGINE=MyISAM DEFAULT CHARSET=utf8' do |t| t.integer :tagid t.string :tagname end class CommonTag < ActiveRecord::Base self.table_name = :common_tag self.primary_key = :tagid include IdNameCache; set_key_value :tagid, :tagname # include IdNameCache; set_key_value_without_create :tagid, :tagname # if you dont want create it automately end ``` ### use cases ```text ruby-1.9.3-rc1 :001 > QuizTag[1] QuizTag Load (0.3ms) SELECT `common_tag`.* FROM `common_tag` WHERE `common_tag`.`tagid` = 1 LIMIT 1 => "Android" ruby-1.9.3-rc1 :002 > QuizTag[1] => "Android" ruby-1.9.3-rc1 :003 > QuizTag['Android'] QuizTag Load (0.5ms) SELECT `common_tag`.* FROM `common_tag` WHERE `common_tag`.`tagname` = 'Android' LIMIT 1 => 1 ruby-1.9.3-rc1 :004 > QuizTag['Android'] => 1 ``` == Copyright MIT, David Chen at eoe.cn
Sometimes you need to display activity on the text console to inform the user that the program is actually doing something. Be funny with a lot predefined sets or make you own widget to display an animation text during a long-running process in your console app.
== DESCRIPTION: websitary (formerly known as websitiary with an extra "i") monitors webpages, rss feeds, podcasts etc. It reuses other programs (w3m, diff etc.) to do most of the actual work. By default, it works on an ASCII basis, i.e. with the output of text-based webbrowsers like w3m (or lynx, links etc.) as the output can easily be post-processed. It can also work with HTML and highlight new items. This script was originally planned as a ruby-based websec replacement. By default, this script will use w3m to dump HTML pages and then run diff over the current page and the previous backup. Some pages are better viewed with lynx or links. Downloaded documents (HTML or ASCII) can be post-processed (e.g., filtered through some ruby block that extracts elements via hpricot and the like). Please see the configuration options below to find out how to change this globally or for a single source. This user manual is also available as PDF[http://websitiary.rubyforge.org/websitary.pdf]. == FEATURES/PROBLEMS: * Handle webpages, rss feeds (optionally save attachments in podcasts etc.) * Compare webpages with previous backups * Display differences between the current version and the backup * Provide hooks to post-process the downloaded documents and the diff * Display a one-page report summarizing all news * Automatically open the report in your favourite web-browser * Experimental: Download webpages on defined intervalls and generate incremental diffs.
Input parser for records which require minor text processing before they can be parsed as JSON
Save processed version of a text in the database.
Google Speech-to-Text enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes more than 120 languages and variants to support your global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio, using Google's machine learning technology. Note that google-cloud-speech-v2 is a version-specific client library. For most uses, we recommend installing the main client library google-cloud-speech instead. See the readme for more details.
git-scribe is a workflow tool for starting, writing, reviewing and publishing multiple forms of a book. it allows you to use asciidoc plain text markup to write, review and translate a work and provides a simple toolkit for generating common digital outputs for publishing - epub, mobi, pdf and html. it is also integrated into github functionality, letting you automate the publishing and collaboration process.
Processr is a simple text processing and concatenation library. It takes a number of input strings (or files) and outputs a single string (or file) containing the result. Text can be passed through filters to modify the output.
= Simple task organizer syctask can be used to create, plan, prioritize and schedule tasks. ==Install The application can be installed with $ gem install syc-task == Usage syctask provides basic task organizer functions as create, update, list and complete a task. Additional functions are to plan tasks you want to accomplish today. If you are not sure in which sequence to conduct the task you can prioritize them with a pair wise comparisson. You can time tasks with start and stop and you can finally extract tasks from a minutes of meetings file. The schedule task command will print a graphical timeline of the working day assigning the planned tasks to the timeline. Busy times are marked red. Meetings are listed with associated tasks that are assigned to the meetings. With the statistics command you can print statistical evaluation of tasks duration and count. ===Create tasks with new Create a new task in the default task directory ~/.tasks $ syctask new "My first task" Provide a description $ syctask new "My first task" --description "Explanation of my first task" Schedule a task with a follow-up and due date $ syctask new "My first task" --follow-up "2013-02-25" --due "2013-03-11" Set a proirity for a task $ syctask new "My first task" --prio 3 Prompt for task input $ syctask new will prompt for task titles. Ctrl-D will end input. Except for --description you can also provide short forms for the options. ===Create tasks by scanning from files When writing minutes of meetings tasks that should be followed up in syctask can be annotated so they will be recognized by the scan command. The following structure shows how to annotade tasks Some text before @task; title;description;follow_up;due_date,prio Schedule meeting;Invite all developers;2016-09-12;2016-10-12;1 Write letter;Practice writing letters;;;3 Some text after The above annotation will only scan the next task because of the singular 'task' where the task values are separated with ';'. The line after the annotation '@task' lists the sequence of the fields of the task. It is also possible to list the tasks in a table, e.g. markdown Some text before @tasks| title |description |follow_up |due_date |prio ----------------|--------------------------|----------|----------|---- Schedule meeting|Invite all developers |2016-09-12|2016-10-12|1 Write letter |Practice writing letters | | |3 Some text after Call partner |Ask for project's progress|2016-09-14| |1 Even more text The example above scans all tasks due to the plural 'tasks'. It also scans all tasks that are separated with non-task text and occur after the annotation and confirm to the field structure. Lines that start with '-' will be ignored. So if you want to skip only a few tasks within a task list prepend them with '-'. If you have tasks with different fields then you have to add another annotation with the new field structure. Possible fields are title - the title of the task - mandatory field! description - the description of the task follow_up - the follow-up date of the task in the form yyyy-mm-dd due_date - the due-date of the task in the form yyyy-mm-dd prio - the priority of the task tags - tags the task is annotated with note - a note for the task Note: follow_up and due_date can also be written as Follow-up and Due-Date. Also case is ignored. As inidcated in the list the title column is mandatory. Without the title column scan will raise an error during a scan. Fields that are not part of the above list will be ignored. # | Title | Who - | ------------------------------------ | --- 1 | Schedule meeting with all developers | Me 2 | Write letter to practice writing | You In the table only the column Title will be scanned. The '#' and 'Who' column will be ignored during scan. This table is also a table for a minimum scan structure. You need at least to provide a title column so the scan function will recognize the table as a task list. Scanning tasks from files $ syctask scan 2016-09-10-mom.md 2016-09-09-mom.md ===Plan tasks The plan command will print tasks and prompts whether to (a)dd or (s)kip the task. If (q)uit is selected the tasks already added will be add to the today's task list. If (c)omplete is selected the complete task will be printed and the user will be prompted again for adding the task. Invoke plan without filter $ syctask plan 1 - My first task (a)dd, (c)omplete, (s)kip, (q)uit? a Duration (1 = 15 minutes, return 30 minutes): 3 --> 1 task(s) planned Invoke plan with a filter $ syctask plan --id "1,3,5,8" 1 - My first task (a)dd, (c)omplete, (s)kip, (q)uit? Move tasks to another days plan $ syctask plan today --move tomorrow --id 3,5 This will move the tasks with ID 3 and 5 from the today's plan to the tomorrow's plan. The duration will be set to the remaining processing time but at least to 30 minutes. ===Prioritize tasks Planned tasks can be prioritized in a pair wise comparisson. So each task is compared to all other tasks. The task with the highest priority will bubble on top followed by the task with the next highest priority and so on. $ syctask prio 1: My first task 2: My second task Task 1 has (h)igher or (l)ower priority, or (q)uit: h 1: My first task 2: My third task Task 1 has (h)igher or (l)ower priority, or (q)uit: l 1: My third task 2: My fourth task Task 1 has (h)igher or (l)ower priority, or (q)uit: h ... syctask schedule will then print tasks as follows Tasks ----- 0: 10 - My fourth task 1: 7 - My third task 2: 3 - My first task 3: 9 - My second task ... Instead of conducting pairwise comparisson the order of the tasks in the plan can be specified with the -o flag $ syctask plan -o 7,3,10,9 The plan or schedule command will print the tasks in the specified order Tasks ----- 0: 7 - My third task 1: 3 - My first task 2: 10 - My fourth task 3: 9 - My second task If only a part of the tasks is provided the rest of the tasks is appended to the end of the task plan. If you specify a position flag the prioritized tasks are added at the provided position. $ syctask plan -o 7,9 -p 2 Tasks ----- 0: 3 - My first task 1: 10 - My fourth task 2: 7 - My third task 3: 9 - My second task ===Create schedule The schedule command will print a graphical schedule with assigning the tasks selected with plan. When schedule command is invoked the planned tasks are added at or after the current time within the time schedule. Tasks that are done and scheduled in the future are not shown. Tasks done and in the past are shown with the actual processing time. The day starts at 00:00 and ends at 23:59. So 24:00 should be 00:00. Create a schedule with working time from 8a.m. to 6p.m. and meetings between 9a.m. and 9.30a.m. and 1p.m. and 2.45p.m. $ syctask schedule -w "8:00-18:00" -b "9:00-9:30,13:00-14:45" Add titles to the meetings $ syctask schedule -m "Project status,Management meeting" The output will be Meetings -------- A - Project status B - Management meeting A B xxx-///-|---|---|---///////-|---|---|---| 8 9 10 11 12 13 14 15 16 17 18 1 Tasks ----- 0 - 1: My first task Adding a task to a meeting $ syctask schedule -a "A:0" will print Meetings -------- A - Project status 1 - My first task B - Management meeting A B ----///-|---|---|---///////-|---|---|---| 8 9 10 11 12 13 14 15 16 17 18 Tasks ----- 0: 1 - My first task A task that is re-scheduled with $ syctask update 1 -f tomorrow will be shown as done (green) in the schedule and instead of separator - it shows ~. Tasks ---- 0: 1 ~ My first task A started task will be indicated by * $ syctask start 1 $ syctask sche Tasks ----- 0: 1 * My first task ===List tasks List tasks that are not marked as done in short form $ syctask list List all tasks in long form $ syctask list --all --complete Search tasks that match a pattern $ syctask list --id "<10" --follow_up ">2013-02-25" --title "My \w task" ===Inspect tasks Lists each unplanned task and allows to edit, delete, mark as done or plan for today or another day $ syctask inspect 0016 Create command for inspection (e)dit, (d)one, de(l)ete, (p)lan, da(t)e, (c)omplete, (s)kip, (b)ack, (q)uit ===Edit task Edit a task with ID 10 in vi $ syctask edit 10 ===Update tasks Except for title and id all values can be updated. Note and tags are not overridden rather supplemented with the update value. Update task with ID 1 and provide some informative note $ syctask update 1 --note "Some explanation about the progress on the task" ===Complete tasks Complete the task with ID 1 and provide a final note $ syctask done 1 --note "Finalize my first task" ===Delete tasks Delete tasks with ID 1,3 and 5 from the default task directory $ syctask delete --id 1,3,5 Delete tasks with ID 8 and 12 from the planned tasks of today. The tasks are only removed from the planned tasks and not physically deleted. $ syctask delete --plan today --id 8,12 ===Settings The settings command allows to define default values for task directory and to create general purpose tasks that can be used for tracking and later statistical evaluation. Create general purpose tasks for phone and talk $ syctask setting --general PHONE,TALK List all settings $ syctask setting --list ===Info Info searches for the location of a task and lists all task directories Search for task with id 102 $ syctask info --id 102 List all task directories $ syctask info --taskdir ===Statistics Shows statistics for work and meeting times as well as for task processing Evaluate the complete log file $ syctask statistics Evaluate work times, meetings and tasks between 2013-01-01 and 2013-04-14 $ syctask statistics 2013-01-01 2013-04-14 Evaluate yesterday and today $ syctask statistics yesterday today ===Task directory and project directory The global options --taskdir and --project determine where the command finds or creates the tasks. The default task directory is ~/.tasks, so if no task directory is specified all commands obtain tasks from or create tasks in ~/.tasks. If a project is specified the tasks will be saved to or obtained from the task directories subdirectory specified with the --project flag. --taskdir --project Tasks in - - default_task_dir x - task_dir - x default_task_dir/project x x task_dir/project In the table the relation of commands to --taskdir and --project are listed. Command --taskdir --project Comment delete x x deletes the tasks in taskdir/project done x x marks tasks in taskdir/project as done help - - inspect x x lists task to edit, done, delete, plan list x x lists tasks in taskdir/project new x x creates tasks in taskdir/project plan x x retrieves tasks to plan from taskdir/projekt prio - - input to prio are planned tasks (see plan) scan x x creates scanned tasks in taskdir/project schedule - - schedules the planned tasks (see plan) start - - starts task from planned tasks (see plan) statistics - - shows statistics of time and count stop - - stops task from planned task update x x updates task in taskdir/project ===Files * ID id file contains the last issued id. * IDS ids file contains all issued ids. * Task files The tasks are named ID.task where ID is any Integer as 10.task. The files are saved as YAML files and can be edited directly. * Planned tasks files The planned tasks are save to YYYY-MM-DD_planned_tasks in syctask's system directory. Each task is saved with the task's directory and the ID. * Schedule files The schedule is saved to YYYY-MM-DD_time_schedule in the default task directory. The files are saved as YAML files and can be changed manually. * Log file Creating schedule and task processings is logged to tasks.log. For example when a task is started and stopped this is action is saved to tasks.log. * Tracked file A started task is saved to tracked_tasks. A semaphore file is created with ID.track when the task ID is started. When the task is stopped the semaphore file is deleted. * General purpose tasks With syctask setting -g PHONE so called general purpose tasks can be created. These tasks can be used for time tracking and later statistic evaluation to determine the amount of disturbences e.g. by phone. These tasks are saved to default_tasks. The general purpose tasks itself are also saved to the .syc/syctask directory as regular task files. * Default task dir The default task that is used e.g. with list is saved to default_tasks_dir. This can be set with the setting command. ==Working with syctask To work with syctask and get the most out of it there is to follow a certain process. ===Creating a schedule ==== View tasks In the morning before I start to work I scan my tasks with syctask list or syctask inspect to get an overview of my open tasks. $ syctask list ==== Plan tasks Next I start the planning phase with syctask plan. If I have a specific schedule for the day I will filter for the respective tasks $ syctask plan ==== Prioritize tasks (optionally) If I want to process the tasks in a specific sequence I prioritize the tasks with $ syctask prio ==== Create schedule I create a schedule with my working hours and meetings that have been scheduled with $ syctask schedule -w "8:00-18:00" -b "9:00-10:00,14:30-16:00" -m "Team,Status" ==== Create an agenda I assign the topics I want to discuss in the meetings to the meetings with syctask schedule -a "A:1,3,6;B:3,5" ==== Start a task To begin I start the first task in the schedule with syctask start -p ID (where ID is the ID of the planned (-p) tasks) $ syctask start -p 10 ==== End a task To end the task I invoke $ syctask stop This will stop the last started task ==== Re-schedule a task If I cannot finish a task than I update the task with a new follow-up date $ syctask update 23 -f tomorrow The task will be shown in the today's schedule as done. ==== Complete a task When the task is done I call $ syctask done 23 ===Attachements * E-mails If an e-mail creates a task I create a new task with syctask new title_of_task. The subject of the e-mail I prepend with the ID and move the e-mail to a <b>open topics</b> directory. * Files If I create files in the course of a task I create a folder in the task directory with the ID and save the files in this directory. If there is an existing directory I link to the file from the ID directory ==Supported platform syc-task up to version 0.4.2 has been tested with Ruby 1.9.3. Version 0.4.2 also runs with Ruby 2.7. It also works in Windows using Cygwin. Version 1.0.0 has been upgraded to Ruby 3.2. ==Add TAB-completion to syctask To activate bash's TAB-completion following lines have to be added to ~/.bashrc complete -F get_syctask_commands syctask function get_syctask_commands { if [ -z $2 ] ; then COMPREPLY=(`syctask help -c`) else COMPREPLY=(`syctask help -c $2`) fi } After ~/.bashrc has been updated the shell session has to be restarted with $ source ~/.bashrc Now syctask followed by TAB TAB will print $ syctask <TAB><TAB> delete done list plan scan stop _doc help new prio schedule start update To complete a command we can type $ syctask sch<TAB> which will complete to $ syctask schedule ==Output to Printer To print syctask's output to a printer pipe the command to lpr $ syctask schedule | lpr This will print the schedule to the default printer. To determine all available printer lpstat can be used with the lpstat -a command $ lpstat -a Canon-LBP6650-3470 accepting requests since Sat 16 Mar 2013 04:26:15 PM CET Dell-B1160w-Mono accepting requests since Sat 16 Mar 2013 04:27:45 PM CET To print to Dell-B1160w-Mono the following command can be used $ syctask schedule | lpr -P Dell-B1160w-Mono ==Release Notes ===Version 0.0.1 Implementation of new, update, list and done commands. ===Version 0.0.4 * delete: deleting tasks or remove tasks from a task plan * plan: plan tasks and add them to the task plan * schedule: create a schedule with work and busy time and assign the tasks from the task plan to the free times ===Version 0.0.6 * start: start a task and track the lead time * stop: stop the tracking and print the lead time of the task * start, stop: the task is logged in the ~/.tasks/task.log file when added and when stopped * prio: prioritize tasks in the task plan, that is specifying the sequence in that the tasks should be conducted * plan: --move flag added to move tasks from the specified plan to another days task plan * update, new: when a follow-up or a due date is provided the task is added to the provided dates task plan. If both dates are set the task is added to both dates task plans ===Version 0.0.7 * updated rdoc ===Version 0.1.15 * IDs are now unique independent of the task or project directory. After upgrading from a version 0.0.7 or older the user asked whether to re-index the tasks. It is adviced to tar the tasks before re-indexing with $ tar cvfz tasks.tar.gz .tasks other_task_directories * start will now show a timer in the upper right corner of the screen when started with the -t (--timer) flag. $ syctask start 10 -t In order to use the task timer ncurses has to be installed as the task timer uses tput from the ncurses library. * The schedule has a heading with the schedule's date and the working time * Planned tasks are now added at or after the current time if they are not done yet. Done tasks are shown in the past with the actual processing time. Tasks done before the start of the schedule are not shown in the schedule. * Meetings that are at the current time are indicated with a *. Active tasks are indicated with a star, re-scheduled tasks are indicated with a ~. * Assigning tasks to meetings in a schedule is now done with the task ID * Statistics show statistics about work time, meeting times, general purpose tasks and task processing. Total, min, max and average time and count is listed. If you have used version 0.0.7 it is adviced to delete tasks.log that lives in ~/.tasks before upgrading or in ~/.syc/syctask after upgrading. Otherwise the statistic results seem odd. * Meeting time in time line now shows correct duration * Info command searches for the location of a task and lists all task task directories with the tasks contained. * Plan move command sets the duration to the remaining processing time but at least to 15 minutes * With the setting command the default task directory can be set and general purpose tasks can be created. A general purpose task can be used for tracking to analyse how much time for phone calls is occupied. setting -l list all general purpose tasks and the default task directory * Prio command now takes a position flag together with the order flag to determine where to insert the newly ordered tasks * All commands that take an ID as argument (done, edit, start, update) look up the task file associated to the id in the ids file. If it is found the provided task directory is not considered for the task file. If the id is not contained in the ids file the task is looked up in the provided directory * Inspect command allows to list each today's unplanned task to edit, delete, mark as done or plan * Update command now has a duration flag to set the task's duration ====Version 0.2.0 * Migrated from TestUnit to Minitest * Implemented _timeleap_ {<img src="https://badge.fury.io/rb/timeleap.svg" alt="Gem Version" />}[http://badge.fury.io/rb/timeleap] which allows to specify additional time distances to yesterday, today tomorrow. Time distances come in two flavors as long and short forms. Examples for long forms are - yesterday|today|tomorrow - next|previous_monday|tuesday|...|sunday - monday|tuesday|...|sunday_in|back_1_week|month|year - in|back_10_days|weeks|months|years Examples for short forms are - y|tod|tom - n|pmo|tu|..|su - mo|tu|...|sui|b1w|m|y - i|b10d|w|m|y ====Version 0.2.1 * Fix a bug in `syctask delete --plan` * Add indicator '>' to task list when task contains notes * Refactor migration from version 0.0.7 and when user has deleted system files. The user can now specify the directories where the tasks are located and can also define directories to be excluded. This is especially helpful to omit search in large mounted directories, like from NAS servers. ====Version 0.3.1 * Add csv output spearated by ';' to the list command * Fix bug when schedule file is empty * Add scan command to scan tasks from files ====Version 0.3.2 * Fix bugs of missing class lib/syctask/scanner.rb ====Version 0.4.2 * delete command can take now ranges of ids, e.g. 1,2,4-8,5,20-25 * inspect can now go back in the task list * inspect will now show the updated task after making changes to the task in edit * inspect allows to specify a follow_up date * scan will ignore columns that are not part of a syctask task * scan recognizes 'Follow-up' as well as 'follow_up' now. That is an underscore can be replaced with '-' * Fix bug when scanning tables that have spaces between separator and column * When tasks.log file is missing `syctask inspect` prints warning with reason why statistics cannot be printed ====Version 1.0.0 * Upgrade to Ruby 3.2.2 ==Development Pull from Github and then run $ bundle install New classes have to be added to 'lib/syctask.rb' Debugging the interface can be done with GLI_DEBUG: $ bundle exec env GLI_DEBUG=true bin/syctask Building and pushing the gemfile to Rubygems $ gem build syctask.gemspec $ gem push syc-task-0.2.1.gem ==Tests The test files live in the folder test and start with test_. There is a rake file available to run all tests $ rake test The CLI is tested with Cucumber. To run the Cucumber features in verbose mode $ cucumber or if you prefer cleaner output run $ rake features ==License syc-task is released under the {MIT License}[http://opensource.org/licenses/MIT] ==Links * [http://www.github.com/sugaryourcoffee/syc-task] - Source code on GitHub * [https://rubygems.org/gems/syc-task] - RubyGems
[Groonga](http://groonga.org/) is a full text search engine. It also provides column store based database feature. Column store based database means that aggregate calculation is fast. You can use Groonga server as fast full text search server and fast [OLAP](https://en.wikipedia.org/wiki/Online_analytical_processing) (OnLine Analytical Processing) server.
FatTable is a gem that treats tables as a data type. It provides methods for constructing tables from a variety of sources, building them row-by-row, extracting rows, columns, and cells, and performing aggregate operations on columns. It also provides as set of SQL-esque methods for manipulating table objects: select for filtering by columns or for creating new columns, where for filtering by rows, order_by for sorting rows, distinct for eliminating duplicate rows, group_by for aggregating multiple rows into single rows and applying column aggregate methods to ungrouped columns, a collection of join methods for combining tables, and more. Furthermore, FatTable provides methods for formatting tables and producing output that targets various output media: text, ANSI terminals, ruby data structures, LaTeX tables, Emacs org-mode tables, and more. The formatting methods can specify cell formatting in a way that is uniform across all the output methods and can also decorate the output with any number of footers, including group footers. FatTable applies formatting directives to the extent they makes sense for the output medium and treats other formatting directives as no-ops. FatTable can be used to perform operations on data that are naturally best conceived of as tables, which in my experience is quite often. It can also serve as a foundation for providing reporting functions where flexibility about the output medium can be quite useful. Finally FatTable can be used within Emacs org-mode files in code blocks targeting the Ruby language. Org mode tables are presented to a ruby code block as an array of arrays, so FatTable can read them in with its .from_aoa constructor. A FatTable table can output as an array of arrays with its .to_aoa output function and will be rendered in an org-mode buffer as an org-table, ready for processing by other code blocks.
Simple text processing for small data sets.
SSL Transport Agent is a foundation for all applications that may be classified as Transport Agents (TA). A TA listens to one or more TCP ports and when a connection is made to a listening port, a process is dispatched to communicate with that connection. The most common examples of this type of application are Mail Transport Agents (commonly known as Mail Servers), HTTPS Server (commonly known as a Web Server), Mail Delivery Agents (DOVECOT, for example), and other applications that exchange data through the internet. This gem only handles the interface to the network. The application which will process the data (yours) sits on top of this layer. This gem can operate in plain text or encrypted mode, and provides methods for issuing queries to MySQL and DNS. At the time of this writing, it contains only an AUTH PLAIN authentication method. The test application is a full, multi-port, multi-process SMTP receiver with TLS encryption and PLAIN authentication which demonstrates how the SSL Transport Agent is used. This gem is also an excellent demonstration of how to make SSLSockets work, for those interested in such things. This gem (C) 2015 Michael J. Welch, Ph.D. <mjwelchphd@gmail.com> Source code and documentation can be found on GitHub: https://github.com/mjwelchphd/ssltransportagent
Jog is a simple command-line tool that simplifies the process of logging what you've worked on, storing plain-text files in a sensible file structue.
:title: The Ruby API :section: PYAPNS::Client There's python in my ruby! This is a class used to send notifications, provision applications and retrieve feedback using the Apple Push Notification Service. PYAPNS is a multi-application APS provider, meaning it is possible to send notifications to any number of different applications from the same application and same server. It is also possible to scale the client to any number of processes and servers, simply balanced behind a simple web proxy. It may seem like overkill for such a bare interface - after all, the APS service is rather simplistic. However, PYAPNS takes no shortcuts when it comes to completeness/compliance with the APNS protocol and allows the user many optimization and scaling vectors not possible with other libraries. No bandwidth is wasted, connections are persistent and the server is asynchronous therefore notifications are delivered immediately. PYAPNS takes after the design of 3rd party push notification service that charge a fee each time you push a notification, and charge extra for so-called 'premium' service which supposedly gives you quicker access to the APS servers. However, PYAPNS is free, as in beer and offers more scaling opportunities without the financial draw. :section: Provisioning To add your app to the PYAPNS server, it must be `provisioned` at least once. Normally this is done once upon the start-up of your application, be it a web service, desktop application or whatever... It must be done at least once to the server you're connecting to. Multiple instances of PYAPNS will have to have their applications provisioned individually. To provision an application manually use the `PYAPNS::Client#provision` method. require 'pyapns' client = PYAPNS::Client.configure client.provision :app_id => 'cf', :cert => '/home/ss/cert.pem', :env => 'sandbox', :timeout => 15 This basically says "add an app reference named 'cf' to the server and start a connection using the certification, and if it can't within 15 seconds, raise a `PYAPNS::TimeoutException` That's all it takes to get started. Of course, this can be done automatically by using PYAPNS::ClientConfiguration middleware. `PYAPNS::Client` is a singleton class that is configured using the class method `PYAPNS::Client#configure`. It is sensibly configured by default, but can be customized by specifying a hash See the docs on `PYAPNS::ClientConfiguration` for a list of available configuration parameters (some of these are important, and you can specify initial applications) to be configured by default. :section: Sending Notifications Once your client is configured, and application provisioned (again, these should be taken care of before you write notification code) you can begin sending notifications to users. If you're wondering how to acquire a notification token, you've come to the wrong place... I recommend using google. However, if you want to send hundreds of millions of notifications to users, here's how it's done, one at a time... The `PYAPNS::Client#notify` is a sort of polymorphic method which can notify any number of devices at a time. It's basic form is as follows: client.notify 'cf', 'long ass app token', {:aps=> {:alert => 'hello?'}} However, as stated before, it is sort of polymorphic: client.notify 'cf', ['token', 'token2', 'token3'], [alert, alert2, alert3] client.notify :app_id => 'cf', :tokens => 'mah token', :notifications => alertHash client.notify 'cf', 'token', PYAPNS::Notification('hello tits!') As you can see, the method accepts paralell arrays of tokens and notifications meaning any number of notifications can be sent at once. Hashes will be automatically converted to `PYAPNS::Notification` objects so they can be optimized for the wire (nil values removed, etc...), and you can pass `PYAPNS::Notification` objects directly if you wish. :section: Retrieving Feedback The APS service offers a feedback functionality that allows application servers to retrieve a list of device tokens it deems to be no longer in use, and the time it thinks they stopped being useful (the user uninstalled your app, better luck next time...) Sounds pretty straight forward, and it is. Apple recommends you do this at least once an hour. PYAPNS will return a list of 2-element lists with the date and the token: feedbacks = client.feedback 'cf' :section: Asynchronous Calls PYAPNS::Client will, by default, perform no funny stuff and operate entirely within the calling thread. This means that certain applications may hang when, say, sending a notification, if only for a fraction of a second. Obviously not a desirable trait, all `provision`, `feedback` and `notify` methods also take a block, which indicates to the method you want to call PYAPNS asynchronously, and it will be done so handily in another thread, calling back your block with a single argument when finished. Note that `notify` and `provision` return absolutely nothing (nil, for you rub--wait you are ruby developers!). It is probably wise to always use this form of operation so your calling thread is never blocked (especially important in UI-driven apps and asynchronous servers) Just pass a block to provision/notify/feedback like so: PYAPNS::Client.instance.feedback do |feedbacks| feedbacks.each { |f| trim_token f } end :section: PYAPNS::ClientConfiguration A middleware class to make `PYAPNS::Client` easy to use in web contexts Automates configuration of the client in Rack environments using a simple confiuration middleware. To use `PYAPNS::Client` in Rack environments with the least code possible `use PYAPNS::ClientConfiguration` (no, really, in some cases, that's all you need!) middleware with an optional hash specifying the client variables. Options are as follows: use PYAPNS::ClientConfiguration( :host => 'http://localhost/' :port => 7077, :initial => [{ :app_id => 'myapp', :cert => '/home/myuser/apps/myapp/cert.pem', :env => 'sandbox', :timeout => 15 }]) Where the configuration variables are defined: :host String the host where the server can be found :port Number the port to which the client should connect :initial Array OPTIONAL - an array of INITIAL hashes INITIAL HASHES: :app_id String the id used to send messages with this certification can be a totally arbitrary value :cert String a path to the certification or the certification file as a string :env String the environment to connect to apple with, always either 'sandbox' or 'production' :timoeut Number The timeout for the server to use when connecting to the apple servers :section: PYAPNS::Notification An APNS Notification You can construct notification objects ahead of time by using this class. However unnecessary, it allows you to programmatically generate a Notification like so: note = PYAPNS::Notification.new 'alert text', 9, 'flynn.caf', {:extra => 'guid'} -- or -- note = PYAPNS::Notification.new 'alert text' These can be passed to `PYAPNS::Client#notify` the same as hashes
A Ruby extension to process MultiMarkdown-formatted text, using Fletcher Penney's C peg-multimarkdown implementation.
Converts Rich Text Format (RTF) word processing files to plain text. Uses the rtf-filter C++ executable
OWLScribble converts a specific set of wiki text markup into HTML. (The syntax used in the markup is a knockoff of the markup used by OpenWiki; the 'OWL' in OWLScribble means "OpenWiki-like".) The OWLScribble.each_wiki_link method provides a way to customize the HTML produced for individual in-wiki page links. (Since the URLs for such links is custom to each site, and the user may wish to perform DB queries to control the display and/or linking of various links.) The OWLScribble.each_wiki_command method provides a way to handle special processing instructions used in the markup.
The powerful Natural Language Processing APIs let you perform part of speech tagging, entity identification, sentence parsing, and much more to help you understand the meaning of unstructured text.
A framework for combining natural speech processing tools with public APIs. Basic functions work out of the box, and with a bit of configuration you can get weather information, manage your google calendar, or access wolfram alpha, all using your voice or natural language text. If you want more functionality, it's easy to associate your own code with a keyword or speech category. Try the demo interface by tweeting @Cogibara
Walks one or more text files, which may be gzipped, allowing line-by-line processing of the contents
Germinate is a tool for writing about code. With Germinate, the source code IS the article. For example, given the following source code: # #!/usr/bin/env ruby # :BRACKET_CODE: <pre>, </pre> # :PROCESS: ruby, "ruby %f" # :SAMPLE: hello def hello(who) puts "Hello, #{who}" end hello("World") # :TEXT: # Check out my amazing program! Here's the hello method: # :INSERT: @hello:/def/../end/ # And here's the output: # :INSERT: @hello|ruby When we run the <tt>germ format</tt> command the following output is generated: Check out my amazing program! Here's the hello method: <pre> def hello(who) puts "Hello, #{who}" end </pre> And here's the output: <pre> Hello, World </pre> To get a better idea of how this works, please take a look at link:examples/basic.rb, or run: germ generate > basic.rb To generate an example article to play with. Germinate is particularly useful for writing articles, such as blog posts, which contain code excerpts. Instead of forcing you to keep a source code file and an article document in sync throughout the editing process, the Germinate motto is "The source code IS the article". Specially marked comment sections in your code file become the article text. Wherever you need to reference the source code in the article, use insertion directives to tell Germinate what parts of the code to excerpt. An advanced selector syntax enables you to be very specific about which lines of code you want to insert. If you also want to show the output of your code, Germinate has you covered. Special "process" directives enable you to define arbitrary commands which can be run on your code. The output of the command then becomes the excerpt text. You can define an arbitrary number of processes and have different excerpts showing the same code as processed by different commands. You can even string processes together into pipelines. Development of Germinate is graciously sponsored by Devver, purveyor of fine cloud-based services to busy Ruby developers. If you like this tool please check them out at http://devver.net.
Germinate is a tool for writing about code. With Germinate, the source code IS the article. For example, given the following source code: # #!/usr/bin/env ruby # :BRACKET_CODE: <pre>, </pre> # :PROCESS: ruby, "ruby %f" # :SAMPLE: hello def hello(who) puts "Hello, #{who}" end hello("World") # :TEXT: # Check out my amazing program! Here's the hello method: # :INSERT: @hello:/def/../end/ # And here's the output: # :INSERT: @hello|ruby When we run the <tt>germ format</tt> command the following output is generated: Check out my amazing program! Here's the hello method: <pre> def hello(who) puts "Hello, #{who}" end </pre> And here's the output: <pre> Hello, World </pre> To get a better idea of how this works, please take a look at link:examples/basic.rb, or run: germ generate > basic.rb To generate an example article to play with. Germinate is particularly useful for writing articles, such as blog posts, which contain code excerpts. Instead of forcing you to keep a source code file and an article document in sync throughout the editing process, the Germinate motto is "The source code IS the article". Specially marked comment sections in your code file become the article text. Wherever you need to reference the source code in the article, use insertion directives to tell Germinate what parts of the code to excerpt. An advanced selector syntax enables you to be very specific about which lines of code you want to insert. If you also want to show the output of your code, Germinate has you covered. Special "process" directives enable you to define arbitrary commands which can be run on your code. The output of the command then becomes the excerpt text. You can define an arbitrary number of processes and have different excerpts showing the same code as processed by different commands. You can even string processes together into pipelines. Development of Germinate is graciously sponsored by Devver, purveyor of fine cloud-based services to busy Ruby developers. If you like this tool please check them out at http://devver.net.
farsi_processor is a Ruby gem to process (stem and normalize) persian/farsi text
Not a markup language - rather, an library for processing user-contributed text that handles the common ways in which ordinary users add meaning, structure and formatting using plain text.
This gem is intended to be used in Rails pre-processing, after the page has been generated but before it is delivered to the requestor. It does a case-insensitive search in the source text for the pseudo-tag <toc />, which marks where the table of contents will be placed. If the tag is not found, the unmodified source is returned. If the tag is found, it searches the text for header tags in a given range, and add an id attribute if the header does not already have one. If no headers were found, it will remove the tag and return the modified source. If there are headers, a link is generated for each one, using the header's text and id for the link's text and href. The links are wrapped in some divs, with classes and ids added so the table of contents can be styled. The <toc /> pseudo-tag is then replaced with the table of contents, and the the modified source is returned.
This gem pretends to make easier the process of text analysis through the Textalytics APIs
== ICU4R - ICU Unicode bindings for Ruby ICU4R is an attempt to provide better Unicode support for Ruby, where it lacks for a long time. Current code is mostly rewritten string.c from Ruby 1.8.3. ICU4R is Ruby C-extension binding for ICU library[1] and provides following classes and functionality: * UString: - String-like class with internal UTF16 storage; - UCA rules for UString comparisons (<=>, casecmp); - encoding(codepage) conversion; \ - Unicode normalization; - transliteration, also rule-based; Bunch of locale-sensitive functions: - upcase/downcase; - string collation; \ - string search; - iterators over text line/word/char/sentence breaks; \ - message formatting (number/currency/string/time); - date and number parsing. * URegexp - unicode regular expressions. * UResourceBundle - access to resource bundles, including ICU locale data. * UCalendar - date manipulation and timezone info. * UConverter - codepage conversions API * UCollator - locale-sensitive string comparison == Install and usage > ruby extconf.rb > make && make check > make install Now, in your scripts just require 'icu4r'. To create RDoc, run > sh tools/doc.sh == Requirements To build and use ICU4R you will need GCC and ICU v3.4 libraries[2]. == Differences from Ruby String and Regexp classes === UString vs String 1. UString substring/index methods use UTF16 codeunit indexes, not code points. 2. UString supports most methods from String class. Missing methods are: capitalize, capitalize!, swapcase, swapcase! %, center, ljust, rjust chomp, chomp!, chop, chop! \ count, delete, delete!, squeeze, squeeze!, tr, tr!, tr_s, tr_s! crypt, intern, sum, unpack dump, each_byte, each_line hex, oct, to_i, to_sym reverse, reverse! succ, succ!, next, next!, upto 3. Instead of String#% method, UString#format is provided. See FORMATTING for short reference. 4. UStrings can be created via String.to_u(encoding='utf8') or global u(str,[encoding='utf8']) calls. Note that +encoding+ parameter must be value of String class. 5. There's difference between character grapheme, codepoint and codeunit. See UNICODE reports for gory details, but in short: locale dependent notion of character can be presented using more than one codepoint - base letter and combining (accents) (also possible more than one!), and each codepoint can require more than one codeunit to store (for UTF8 codeunit size is 8bit, though \ some codepoints require up to 4bytes). So, UString has normalization and locale dependent break iterators. 6. Currently UString doesn't include Enumerable module. 7. UString index/[] methods which accept URegexp, throw exception if Regexp passed. 8. UString#<=>, UString#casecmp use UCA rules. === URegexp UString uses ICU regexp library. Pattern syntax is described in [./docs/UNICODE_REGEXPS] and ICU docs. There are some differences between processing in Ruby Regexp and URegexp: 1. When UString#sub, UString#gsub are called with block, special vars ($~, $&, $1, ...) aren't set, as their values are processed through deep ruby core code. Instead, block receives UMatch object, which is essentially immutable array of matching groups: "test".u.gsub(ure("(e)(.)")) do |match| \ puts match[0] # => 'es' <--> $& puts match[1] # => 'e' \ <--> $1 puts match[2] # => 's' <--> $2 end 2. In URegexp search pattern backreferences are in form \n (\1, \2, ...), in replacement string - in form $1, $2, ... NOTE: URegexp considers char to be a digit NOT ONLY ASCII (0x0030-0x0039), but any Unicode char, which has property Decimal digit number (Nd), e.g.: a = [?$, 0x1D7D9].pack("U*").u * 2 puts a.inspect_names <U000024>DOLLAR SIGN <U01D7D9>MATHEMATICAL DOUBLE-STRUCK DIGIT ONE <U000024>DOLLAR SIGN <U01D7D9>MATHEMATICAL DOUBLE-STRUCK DIGIT ONE puts "abracadabra".u.gsub(/(b)/.U, a) abbracadabbra \ 3. One can create URegexp using global Kernel#ure function, Regexp#U, Regexp#to_u, or from UString using URegexp.new, e.g: /pattern/.U =~ "string".u 4. There are differences about Regexp and URegexp multiline matching options: t = "text\ntest" # ^,$ handling : URegexp multiline <-> Ruby default t.u =~ ure('^\w+$', URegexp::MULTILINE) => #<UMatch:0xf6f7de04 @ranges=[0..3], @cg=[\u0074\u0065\u0078\u0074]> t =~ /^\w+$/ => 0 # . matches \n : URegexp DOTALL <-> /m t.u =~ ure('.+test', URegexp::DOTALL) \ => #<UMatch:0xf6fa4d88 ... t.u =~ /.+test/m 5. UMatch.range(idx) returns range for capturing group idx. This range is in codeunits. === References 1. ICU Official Homepage http://ibm.com/software/globalization/icu/ 2. ICU downloads \ http://ibm.com/software/globalization/icu/downloads.jsp 3. ICU Home Page http://icu.sf.net 4. Unicode Home Page http://www.unicode.org ==== BUGS, DOCS, TO DO The code is slow and inefficient yet, is still highly experimental, so can have many security and memory leaks, bugs, inconsistent documentation, incomplete test suite. Use it at your own risk. Bug reports and feature requests are welcome :) === Copying This extension module is copyrighted free software by Nikolai Lugovoi. You can redistribute it and/or modify it under the terms of MIT License. Nikolai Lugovoi <meadow.nnick@gmail.com>
== DESCRIPTION: html-me converts text to html for posting in the web. It does this two ways. First, it processes the text unsing RedCloth (a textile engine), then it finds all of the _pre_ tags and adds syntax highlighting. == FEATURES/PROBLEMS: * The syntax highlighting embeds the styles in the HTML tags. I currently don't have access to my blog's stylesheet (damned Wordpress) so that's how it needed to be. CSS class names should be added soon.
A Ruby extension that can process MultiMarkdown-formatted text using the multimarkdown command line tool.
RubyTokenizer is a simple language processing command-line tool. It performs low-level tokenization and returns the top 10 most frequent words in a body of text. At the moment it's only available for English texts and it segments words by filtering whitespaces, punctuation marks, parantheses and other special characters.
== DESCRIPTION: This is a script for monitoring webpages that reuses other programs (w3m, diff, webdiff etc.) to do most of the actual work. By default, it works on an ASCII basis, i.e. with the output of text-based webbrowsers like w3m (or lynx, links etc.) as the output can easily be post-processed. With the help of some friends (see the section below on requirements), it can also work with HTML. E.g., if you have websec installed, you can also use its webdiff program to show colored diffs. By default, this script will use w3m to dump HTML pages and then run diff over the current page and the previous backup. Some pages are better viewed with lynx or links. Downloaded documents (HTML or ASCII) can be post-processed (e.g., filtered through some ruby block that extracts elements via hpricot and the like). Please see the configuration options below to find out how to change this globally or for a single source. === CAVEAT: The script also includes experimental support for monitoring whole websites. Basically, this script supports robots.txt directives (see requirements) but this is hardly tested and may not work in some cases. While it is okay for your own websites to ignore robots.txt, it is not for others. Please make sure that the webpages you run this program on allow such a use. Some webpages disallow the use of any automatic downloader or offline reader in their user agreements.