Pismo extracts and retrieves content-related metadata from HTML pages - you can use the resulting data in an organized way, such as a summary/first paragraph, body text, keywords, RSS feed URL, favicon, etc.
Lookout Lookout is a unit testing framework for Ruby¹ that puts your results in focus. Tests (expectations) are written as follows expect 2 do 1 + 1 end expect ArgumentError do Integer('1 + 1') end expect Array do [1, 2, 3].select{ |i| i % 2 == 0 } end expect [2, 4, 6] do [1, 2, 3].map{ |i| i * 2 } end Lookout is designed to encourage – force, even – unit testing best practices such as • Setting up only one expectation per test • Not setting expectations on non-public APIs • Test isolation This is done by • Only allowing one expectation to be set per test • Providing no (additional) way of accessing private state • Providing no setup and tear-down methods, nor a method of providing test helpers Other important points are • Putting the expected outcome of a test in focus with the steps of the calculation of the actual result only as a secondary concern • A focus on code readability by providing no mechanism for describing an expectation other than the code in the expectation itself • A unified syntax for setting up both state-based and behavior-based expectations The way Lookout works has been heavily influenced by expectations², by {Jay Fields}³. The code base was once also heavily based on expectations, based at Subversion {revision 76}⁴. A lot has happened since then and all of the work past that revision are due to {Nikolai Weibull}⁵. ¹ Ruby: http://ruby-lang.org/ ² Expectations: http://expectations.rubyforge.org/ ³ Jay Fields’s blog: http://blog.jayfields.com/ ⁴ Lookout revision 76: https://github.com/now/lookout/commit/537bedf3e5b3eb4b31c066b3266f42964ac35ebe ⁵ Nikolai Weibull’s home page: http://disu.se/ § Installation Install Lookout with % gem install lookout § Usage Lookout allows you to set expectations on an object’s state or behavior. We’ll begin by looking at state expectations and then take a look at expectations on behavior. § Expectations on State: Literals An expectation can be made on the result of a computation: expect 2 do 1 + 1 end Most objects, in fact, have their state expectations checked by invoking ‹#==› on the expected value with the result as its argument. Checking that a result is within a given range is also simple: expect 0.099..0.101 do 0.4 - 0.3 end Here, the more general ‹#===› is being used on the ‹Range›. § Regexps ‹Strings› of course match against ‹Strings›: expect 'ab' do 'abc'[0..1] end but we can also match a ‹String› against a ‹Regexp›: expect %r{a substring} do 'a string with a substring' end (Note the use of ‹%r{…}› to avoid warnings that will be generated when Ruby parses ‹expect /…/›.) § Modules Checking that the result includes a certain module is done by expecting the ‹Module›. expect Enumerable do [] end This, due to the nature of Ruby, of course also works for classes (as they are also modules): expect String do 'a string' end This doesn’t hinder us from expecting the actual ‹Module› itself: expect Enumerable do Enumerable end or the ‹Class›: expect String do String end for obvious reasons. As you may have figured out yourself, this is accomplished by first trying ‹#==› and, if it returns ‹false›, then trying ‹#===› on the expected ‹Module›. This is also true of ‹Ranges› and ‹Regexps›. § Booleans Truthfulness is expected with ‹true› and ‹false›: expect true do 1 end expect false do nil end Results equaling ‹true› or ‹false› are slightly different: expect TrueClass do true end expect FalseClass do false end The rationale for this is that you should only care if the result of a computation evaluates to a value that Ruby considers to be either true or false, not the exact literals ‹true› or ‹false›. § IO Expecting output on an IO object is also common: expect output("abc\ndef\n") do |io| io.puts 'abc', 'def' end This can be used to capture the output of a formatter that takes an output object as a parameter. § Warnings Expecting warnings from code isn’t very common, but should be done: expect warning('this is your final one!') do warn 'this is your final one!' end expect warning('this is your final one!') do warn '%s:%d: warning: this is your final one!' % [__FILE__, __LINE__] end ‹$VERBOSE› is set to ‹true› during the execution of the block, so you don’t need to do so yourself. If you have other code that depends on the value of $VERBOSE, that can be done with ‹#with_verbose› expect nil do with_verbose nil do $VERBOSE end end § Errors You should always be expecting errors from – and in, but that’s a different story – your code: expect ArgumentError do Integer('1 + 1') end Often, not only the type of the error, but its description, is important to check: expect StandardError.new('message') do raise StandardError.new('message') end As with ‹Strings›, ‹Regexps› can be used to check the error description: expect StandardError.new(/mess/) do raise StandardError.new('message') end § Queries Through Symbols Symbols are generally matched against symbols, but as a special case, symbols ending with ‹?› are seen as expectations on the result of query methods on the result of the block, given that the method is of zero arity and that the result isn’t a Symbol itself. Simply expect a symbol ending with ‹?›: expect :empty? do [] end To expect it’s negation, expect the same symbol beginning with ‹not_›: expect :not_nil? do [1, 2, 3] end This is the same as expect true do [].empty? end and expect false do [1, 2, 3].empty? end but provides much clearer failure messages. It also makes the expectation’s intent a lot clearer. § Queries By Proxy There’s also a way to make the expectations of query methods explicit by invoking methods on the result of the block. For example, to check that the even elements of the Array ‹[1, 2, 3]› include ‹1› you could write expect result.to.include? 1 do [1, 2, 3].reject{ |e| e.even? } end You could likewise check that the result doesn’t include 2: expect result.not.to.include? 2 do [1, 2, 3].reject{ |e| e.even? } end This is the same as (and executes a little bit slower than) writing expect false do [1, 2, 3].reject{ |e| e.even? }.include? 2 end but provides much clearer failure messages. Given that these two last examples would fail, you’d get a message saying “[1, 2, 3]#include?(2)” instead of the terser “true≠false”. It also clearly separates the actual expectation from the set-up. The keyword for this kind of expectations is ‹result›. This may be followed by any of the methods • ‹#not› • ‹#to› • ‹#be› • ‹#have› or any other method you will want to call on the result. The methods ‹#to›, ‹#be›, and ‹#have› do nothing except improve readability. The ‹#not› method inverts the expectation. § Literal Literals If you need to literally check against any of the types of objects otherwise treated specially, that is, any instances of • ‹Module› • ‹Range› • ‹Regexp› • ‹Exception› • ‹Symbol›, given that it ends with ‹?› you can do so by wrapping it in ‹literal(…)›: expect literal(:empty?) do :empty? end You almost never need to do this, as, for all but symbols, instances will match accordingly as well. § Expectations on Behavior We expect our objects to be on their best behavior. Lookout allows you to make sure that they are. Reception expectations let us verify that a method is called in the way that we expect it to be: expect mock.to.receive.to_str(without_arguments){ '123' } do |o| o.to_str end Here, ‹#mock› creates a mock object, an object that doesn’t respond to anything unless you tell it to. We tell it to expect to receive a call to ‹#to_str› without arguments and have ‹#to_str› return ‹'123'› when called. The mock object is then passed in to the block so that the expectations placed upon it can be fulfilled. Sometimes we only want to make sure that a method is called in the way that we expect it to be, but we don’t care if any other methods are called on the object. A stub object, created with ‹#stub›, expects any method and returns a stub object that, again, expects any method, and thus fits the bill. expect stub.to.receive.to_str(without_arguments){ '123' } do |o| o.to_str if o.convertable? end You don’t have to use a mock object to verify that a method is called: expect Object.to.receive.name do Object.name end As you have figured out by now, the expected method call is set up by calling ‹#receive› after ‹#to›. ‹#Receive› is followed by a call to the method to expect with any expected arguments. The body of the expected method can be given as the block to the method. Finally, an expected invocation count may follow the method. Let’s look at this formal specification in more detail. The expected method arguments may be given in a variety of ways. Let’s introduce them by giving some examples: expect mock.to.receive.a do |m| m.a end Here, the method ‹#a› must be called with any number of arguments. It may be called any number of times, but it must be called at least once. If a method must receive exactly one argument, you can use ‹Object›, as the same matching rules apply for arguments as they do for state expectations: expect mock.to.receive.a(Object) do |m| m.a 0 end If a method must receive a specific argument, you can use that argument: expect mock.to.receive.a(1..2) do |m| m.a 1 end Again, the same matching rules apply for arguments as they do for state expectations, so the previous example expects a call to ‹#a› with 1, 2, or the Range 1..2 as an argument on ‹m›. If a method must be invoked without any arguments you can use ‹without_arguments›: expect mock.to.receive.a(without_arguments) do |m| m.a end You can of course use both ‹Object› and actual arguments: expect mock.to.receive.a(Object, 2, Object) do |m| m.a nil, 2, '3' end The body of the expected method may be given as the block. Here, calling ‹#a› on ‹m› will give the result ‹1›: expect mock.to.receive.a{ 1 } do |m| raise 'not 1' unless m.a == 1 end If no body has been given, the result will be a stub object. To take a block, grab a block parameter and ‹#call› it: expect mock.to.receive.a{ |&b| b.call(1) } do |m| j = 0 m.a{ |i| j = i } raise 'not 1' unless j == 1 end To simulate an ‹#each›-like method, ‹#call› the block several times. Invocation count expectations can be set if the default expectation of “at least once” isn’t good enough. The following expectations are possible • ‹#at_most_once› • ‹#once› • ‹#at_least_once› • ‹#twice› And, for a given ‹N›, • ‹#at_most(N)› • ‹#exactly(N)› • ‹#at_least(N)› § Utilities: Stubs Method stubs are another useful thing to have in a unit testing framework. Sometimes you need to override a method that does something a test shouldn’t do, like access and alter bank accounts. We can override – stub out – a method by using the ‹#stub› method. Let’s assume that we have an ‹Account› class that has two methods, ‹#slips› and ‹#total›. ‹#Slips› retrieves the bank slips that keep track of your deposits to the ‹Account› from a database. ‹#Total› sums the ‹#slips›. In the following test we want to make sure that ‹#total› does what it should do without accessing the database. We therefore stub out ‹#slips› and make it return something that we can easily control. expect 6 do |m| stub(Class.new{ def slips raise 'database not available' end def total slips.reduce(0){ |m, n| m.to_i + n.to_i } end }.new, :slips => [1, 2, 3]){ |account| account.total } end To make it easy to create objects with a set of stubbed methods there’s also a convenience method: expect 3 do s = stub(:a => 1, :b => 2) s.a + s.b end This short-hand notation can also be used for the expected value: expect stub(:a => 1, :b => 2).to.receive.a do |o| o.a + o.b end and also works for mock objects: expect mock(:a => 2, :b => 2).to.receive.a do |o| o.a + o.b end Blocks are also allowed when defining stub methods: expect 3 do s = stub(:a => proc{ |a, b| a + b }) s.a(1, 2) end If need be, we can stub out a specific method on an object: expect 'def' do stub('abc', :to_str => 'def'){ |a| a.to_str } end The stub is active during the execution of the block. § Overriding Constants Sometimes you need to override the value of a constant during the execution of some code. Use ‹#with_const› to do just that: expect 'hello' do with_const 'A::B::C', 'hello' do A::B::C end end Here, the constant ‹A::B::C› is set to ‹'hello'› during the execution of the block. None of the constants ‹A›, ‹B›, and ‹C› need to exist for this to work. If a constant doesn’t exist it’s created and set to a new, empty, ‹Module›. The value of ‹A::B::C›, if any, is restored after the block returns and any constants that didn’t previously exist are removed. § Overriding Environment Variables Another thing you often need to control in your tests is the value of environment variables. Depending on such global values is, of course, not a good practice, but is often unavoidable when working with external libraries. ‹#With_env› allows you to override the value of environment variables during the execution of a block by giving it a ‹Hash› of key/value pairs where the key is the name of the environment variable and the value is the value that it should have during the execution of that block: expect 'hello' do with_env 'INTRO' => 'hello' do ENV['INTRO'] end end Any overridden values are restored and any keys that weren’t previously a part of the environment are removed when the block returns. § Overriding Globals You may also want to override the value of a global temporarily: expect 'hello' do with_global :$stdout, StringIO.new do print 'hello' $stdout.string end end You thus provide the name of the global and a value that it should take during the execution of a block of code. The block gets passed the overridden value, should you need it: expect true do with_global :$stdout, StringIO.new do |overridden| $stdout != overridden end end § Integration Lookout can be used from Rake¹. Simply install Lookout-Rake²: % gem install lookout-rake and add the following code to your Rakefile require 'lookout-rake-3.0' Lookout::Rake::Tasks::Test.new Make sure to read up on using Lookout-Rake for further benefits and customization. ¹ Read more about Rake at http://rake.rubyforge.org/ ² Get information on Lookout-Rake at http://disu.se/software/lookout-rake/ § API Lookout comes with an API¹ that let’s you create things such as new expected values, difference reports for your types, and so on. ¹ See http://disu.se/software/lookout/api/ § Interface Design The default output of Lookout can Spartanly be described as Spartan. If no errors or failures occur, no output is generated. This is unconventional, as unit testing frameworks tend to dump a lot of information on the user, concerning things such as progress, test count summaries, and flamboyantly colored text telling you that your tests passed. None of this output is needed. Your tests should run fast enough to not require progress reports. The lack of output provides you with the same amount of information as reporting success. Test count summaries are only useful if you’re worried that your tests aren’t being run, but if you worry about that, then providing such output doesn’t really help. Testing your tests requires something beyond reporting some arbitrary count that you would have to verify by hand anyway. When errors or failures do occur, however, the relevant information is output in a format that can easily be parsed by an ‹'errorformat'› for Vim or with {Compilation Mode}¹ for Emacs². Diffs are generated for Strings, Arrays, Hashes, and I/O. ¹ Read up on Compilation mode for Emacs at http://www.emacswiki.org/emacs/CompilationMode ² Visit The GNU Foundation’s Emacs’ software page at http://www.gnu.org/software/emacs/ § External Design Let’s now look at some of the points made in the introduction in greater detail. Lookout only allows you to set one expectation per test. If you’re testing behavior with a reception expectation, then only one method-invocation expectation can be set. If you’re testing state, then only one result can be verified. It may seem like this would cause unnecessary duplication between tests. While this is certainly a possibility, when you actually begin to try to avoid such duplication you find that you often do so by improving your interfaces. This kind of restriction tends to encourage the use of value objects, which are easy to test, and more focused objects, which require simpler tests, as they have less behavior to test, per method. By keeping your interfaces focused you’re also keeping your tests focused. Keeping your tests focused improves, in itself, test isolation, but let’s look at something that hinders it: setup and tear-down methods. Most unit testing frameworks encourage test fragmentation by providing setup and tear-down methods. Setup methods create objects and, perhaps, just their behavior for a set of tests. This means that you have to look in two places to figure out what’s being done in a test. This may work fine for few methods with simple set-ups, but makes things complicated when the number of tests increases and the set-up is complex. Often, each test further adjusts the previously set-up object before performing any verifications, further complicating the process of figuring out what state an object has in a given test. Tear-down methods clean up after tests, perhaps by removing records from a database or deleting files from the file-system. The duplication that setup methods and tear-down methods hope to remove is better avoided by improving your interfaces. This can be done by providing better set-up methods for your objects and using idioms such as {Resource Acquisition Is Initialization}¹ for guaranteed clean-up, test or no test. By not using setup and tear-down methods we keep everything pertinent to a test in the test itself, thus improving test isolation. (You also won’t {slow down your tests}² by keeping unnecessary state.) Most unit test frameworks also allow you to create arbitrary test helper methods. Lookout doesn’t. The same rationale as that that has been crystallized in the preceding paragraphs applies. If you need helpers you’re interface isn’t good enough. It really is as simple as that. To clarify: there’s nothing inherently wrong with test helper methods, but they should be general enough that they reside in their own library. The support for mocks in Lookout is provided through a set of test helper methods that make it easier to create mocks than it would have been without them. Lookout-rack³ is another example of a library providing test helper methods (well, one method, actually) that are very useful in testing web applications that use Rack⁴. A final point at which some unit test frameworks try to fragment tests further is documentation. These frameworks provide ways of describing the whats and hows of what’s being tested, the rationale being that this will provide documentation of both the test and the code being tested. Describing how a stack data structure is meant to work is a common example. A stack is, however, a rather simple data structure, so such a description provides little, if any, additional information that can’t be extracted from the implementation and its tests themselves. The implementation and its tests is, in fact, its own best documentation. Taking the points made in the previous paragraphs into account, we should already have simple, self-describing, interfaces that have easily understood tests associated with them. Rationales for the use of a given data structure or system-design design documentation is better suited in separate documentation focused at describing exactly those issues. ¹ Read the Wikipedia entry for Resource Acquisition Is Initialization at http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization ² Read how 37signals had problems with slow Test::Unit tests at http://37signals.com/svn/posts/2742-the-road-to-faster-tests/ ³ Visit the Lookout-rack home page at http://disu.se/software/lookout-rack/ ⁴ Visit the Rack Rubyforge project page at http://rack.rubyforge.org/ § Internal Design The internal design of Lookout has had a couple of goals. • As few external dependencies as possible • As few internal dependencies as possible • Internal extensibility provides external extensibility • As fast load times as possible • As high a ratio of value objects to mutable objects as possible • Each object must have a simple, obvious name • Use mix-ins, not inheritance for shared behavior • As few responsibilities per object as possible • Optimizing for speed can only be done when you have all the facts § External Dependencies Lookout used to depend on Mocha for mocks and stubs. While benchmarking I noticed that a method in Mocha was taking up more than 300 percent of the runtime. It turned out that Mocha’s method for cleaning up back-traces generated when a mock failed was doing something incredibly stupid: backtrace.reject{ |l| Regexp.new(@lib).match(File.expand_path(l)) } Here ‹@lib› is a ‹String› containing the path to the lib sub-directory in the Mocha installation directory. I reported it, provided a patch five days later, then waited. Nothing happened. {254 days later}¹, according to {Wolfram Alpha}², half of my patch was, apparently – I say “apparently”, as I received no notification – applied. By that time I had replaced the whole mocking-and-stubbing subsystem and dropped the dependency. Many Ruby developers claim that Ruby and its gems are too fast-moving for normal package-managing systems to keep up. This is testament to the fact that this isn’t the case and that the real problem is instead related to sloppy practices. Please note that I don’t want to single out the Mocha library nor its developers. I only want to provide an example where relying on external dependencies can be “considered harmful”. ¹ See the Wolfram Alpha calculation at http://www.wolframalpha.com/input/?i=days+between+march+17%2C+2010+and+november+26%2C+2010 ² Check out the Wolfram Alpha computational knowledge engine at http://www.wolframalpha.com/ § Internal Dependencies Lookout has been designed so as to keep each subsystem independent of any other. The diff subsystem is, for example, completely decoupled from any other part of the system as a whole and could be moved into its own library at a time where that would be of interest to anyone. What’s perhaps more interesting is that the diff subsystem is itself very modular. The data passes through a set of filters that depends on what kind of diff has been requested, each filter yielding modified data as it receives it. If you want to read some rather functional Ruby I can highly recommend looking at the code in the ‹lib/lookout/diff› directory. This lookout on the design of the library also makes it easy to extend Lookout. Lookout-rack was, for example, written in about four hours and about 5 of those 240 minutes were spent on setting up the interface between the two. § Optimizing For Speed The following paragraph is perhaps a bit personal, but might be interesting nonetheless. I’ve always worried about speed. The original Expectations library used ‹extend› a lot to add new behavior to objects. Expectations, for example, used to hold the result of their execution (what we now term “evaluation”) by being extended by a module representing success, failure, or error. For the longest time I used this same method, worrying about the increased performance cost that creating new objects for results would incur. I finally came to a point where I felt that the code was so simple and clean that rewriting this part of the code for a benchmark wouldn’t take more than perhaps ten minutes. Well, ten minutes later I had my results and they confirmed that creating new objects wasn’t harming performance. I was very pleased. § Naming I hate low lines (underscores). I try to avoid them in method names and I always avoid them in file names. Since the current “best practice” in the Ruby community is to put ‹BeginEndStorage› in a file called ‹begin_end_storage.rb›, I only name constants using a single noun. This has had the added benefit that classes seem to have acquired less behavior, as using a single noun doesn’t allow you to tack on additional behavior without questioning if it’s really appropriate to do so, given the rather limited range of interpretation for that noun. It also seems to encourage the creation of value objects, as something named ‹Range› feels a lot more like a value than ‹BeginEndStorage›. (To reach object-oriented-programming Nirvana you must achieve complete value.) § News § 3.0.0 The ‹xml› expectation has been dropped. It wasn’t documented, didn’t suit very many use cases, and can be better implemented by an external library. The ‹arg› argument matcher for mock method arguments has been removed, as it didn’t provide any benefit over using Object. The ‹#yield› and ‹#each› methods on stub and mock methods have been removed. They were slightly weird and their use case can be implemented using block parameters instead. The ‹stub› method inside ‹expect› blocks now stubs out the methods during the execution of a provided block instead of during the execution of the whole except block. When a mock method is called too many times, this is reported immediately, with a full backtrace. This makes it easier to pin down what’s wrong with the code. Query expectations were added. Explicit query expectations were added. Fluent boolean expectations, for example, ‹expect nil.to.be.nil?› have been replaced by query expectations (‹expect :nil? do nil end›) and explicit query expectations (‹expect result.to.be.nil? do nil end›). This was done to discourage creating objects as the expected value and creating objects that change during the course of the test. The ‹literal› expectation was added. Equality (‹#==›) is now checked before “caseity” (‹#===›) for modules, ranges, and regular expressions to match the documentation. § Financing Currently, most of my time is spent at my day job and in my rather busy private life. Please motivate me to spend time on this piece of software by donating some of your money to this project. Yeah, I realize that requesting money to develop software is a bit, well, capitalistic of me. But please realize that I live in a capitalistic society and I need money to have other people give me the things that I need to continue living under the rules of said society. So, if you feel that this piece of software has helped you out enough to warrant a reward, please PayPal a donation to now@disu.se¹. Thanks! Your support won’t go unnoticed! ¹ Send a donation: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=now%40disu%2ese&item_name=Lookout § Reporting Bugs Please report any bugs that you encounter to the {issue tracker}¹. ¹ See https://github.com/now/lookout/issues § Contributors Contributors to the original expectations codebase are mentioned there. We hope no one on that list feels left out of this list. Please {let us know}¹ if you do. • Nikolai Weibull ¹ Add an issue to the Lookout issue tracker at https://github.com/now/lookout/issues § Licensing Lookout is free software: you may redistribute it and/or modify it under the terms of the {GNU Lesser General Public License, version 3}¹ or later², as published by the {Free Software Foundation}³. ¹ See http://disu.se/licenses/lgpl-3.0/ ² See http://gnu.org/licenses/ ³ See http://fsf.org/
Implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm in Ruby, a multi-word keywords extraction.
A Lita extension to extract CLI-style arguments from messages.
Implementation of TextRank solution to ranked keyword extraction. See https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf
Using a variety of APIs (Yahoo term Extractor and Alchemy are currently supported), semantic_extraction can automatically return a collection of keywords for an arbitrary block of text. If using Alchemy, it can also return named entities.
Contains stop words lists and methods to extract keywords from strings.
NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, grammar and spelling correction, keywords and keyphrases extraction, chatbot, product description and ad generation, intent classification, text generation, image generation, code generation, question answering, automatic speech recognition, machine translation, language detection, semantic search, semantic similarity, speech synthesis, tokenization, POS tagging, embeddings, and dependency parsing. It is ready for production, served through a REST API. This is the Ruby client for the API. More details here: https://nlpcloud.com. Documentation: https://docs.nlpcloud.com.
Use webtagger to use keyword extraction web services (yahoo, tagthe and alchemy) to extract from a text terms suitable for tagging, summarization, query building, etc.
Pismo extracts and retrieves content-related metadata from HTML pages - you can use the resulting data in an organized way, such as a summary/first paragraph, body text, keywords, RSS feed URL, favicon, etc.
SERP Scraper is a ruby library that extracts keyword rankings from Google.
Habariki is a template match engine for strings that can extract part of text by template keywords.
Extracts keywords from ember documentation.
Implementation of the Rapid Automatic Spanish Keyword Extraction (RASKE) algorithm
Extract Keywords from Arabic Text
Extract metadata, tags, keywords from a torrent name.
Implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm in Ruby. Forked from the original rake_text gem.
This gem allows users to create simple and complex GPT functions for various applications such as translation and keyword extraction.